
Positive Definite Matrices: Compression, Decomposition, Eigensolver, and Concentration

Thesis by
De Huang

In Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

CALIFORNIA INSTITUTE OF TECHNOLOGY
Pasadena, California

2020
Defended May 14, 2020


© 2020

De Huang
ORCID: 0000-0003-4023-9895

All rights reserved


ACKNOWLEDGEMENTS

First of all, I would like to express my deepest gratitude to my adviser, Prof. Thomas Yizhao Hou, for his consistently wholehearted mentoring, support, and inspiration. Prof. Hou introduced me to the frontier of many important fields, including multiscale PDEs, fluid dynamics, data analysis, and machine learning, and I have truly enjoyed my interdisciplinary research among these fascinating areas. Prof. Hou's enthusiastic dedication to mathematics and steadfast resolution in solving hard problems have aroused my passion for research, toughened my will to work hard, and, moreover, taught me how to become a better man. Beyond the academic, Prof. Hou is not only a good mentor but also a caring friend, continuously providing me with a lot of help in my personal life. To this point, I would also like to thank Prof. Hou's wife, Prof. Chang, for sharing with us her knowledge of Buddhism and for bringing us delicious food on occasion.

I want to thank Prof. Houman Owhadi, who has been an inspiration for a large portion of the work in my thesis. He has patiently taught me many useful techniques in multiscale analysis and probability theory.

I must also thank Prof. Joel Tropp, who guided me to the realm of random matrix theory. His excellent skills and keen insight in matrix analysis have greatly helped me build up my understanding of this field and develop my own work.

I am grateful to Prof. Andrew Stuart and Prof. Peter Schroeder for kindly serving on my thesis committee. They have provided me with invaluable advice on my thesis.

I feel honored to have had stimulating discussions with many brilliant mathematicians, especially Prof. Oscar Bruno, Prof. Venkat Chandrasekaran, Prof. Thomas Vidick, Prof. Stanley Osher, Prof. Alexander Kiselev, Prof. Vladimír Šverák, Prof. Jack Xin, Prof. Jonathan Niles-Weed, Prof. Rachel Ward, Dr. Kachun Lam, Dr. Pengchuan Zhang, Dr. Pengfei Liu, Mr. Yifan Chen, Mr. Jiajie Chen, Mr. Shumao Zhang, Ms. Ziyun Zhang, and more. Their knowledge truly inspired me. Besides that, I want to thank all the professors in the wonderful department of Computing and Mathematical Sciences at Caltech, who have never shown hesitation in answering my questions.

Thanks are also due to all staff members of the CMS department, especially Maria Lopez, Carmen Sirois, Diana Bohler, and Sydney Garstang. They have been taking good care of me in various ways over the past five years.


Coming to friends, I have been truly fortunate to have their company during my days of joy and days of bitterness. First of all, there is Muzhe Zeng, my closest confidant and best friend, with whom I share the happiest memories of more than ten years. He is like the mainspring to my gears of joy and the piston to my engine of passion.

Then there are my three closest friends at Caltech, Jialin Song, Florian Schaefer, and Mandy Huo. We have had a lot of unforgettably interesting discussions on a variety of topics, from the deepest mysteries of humanity to the farthest void of the universe. Moreover, they treated me like true family.

Besides, there is a long list of lovely names, including Yufei Ou, Qingcan Wang, Zichao Long, Yingying Hu, Jinyang Huang, Zhuoheng Liu, Jiahui Tang, Qingwen Xu, Andrea Coladangelo, Navid Azizan, Xinying Ren, Karena Cai, Bo Sun, Xinliang Liu, and many, many more.

Last but not least, I owe tons of thanks to my fairylike girlfriend, Su Wang. You have been supporting me all along and bringing me endless happiness. I could not have survived the last year of my strenuous Ph.D. journey without you.

This thesis is especially dedicated to my mother, Ms. Youping Zuo, the strongest woman I have ever known. I am truly grateful to her for bringing me up the way she did. If there is any goodness in my character, it is from her. She gives me all her support in my pursuit of knowledge, even though it means I have to leave her in years of worry and longing. I love you, Mom.


ABSTRACT

For many decades, the study of positive definite (PD) matrices has been one of the most popular subjects among a wide range of scientific research. A huge mass of successful models on PD matrices have been proposed and developed in the fields of mathematics, physics, biology, etc., leading to a celebrated richness of theories and algorithms. In this thesis, we draw our attention to a general class of PD matrices characterized by the generic form A = ∑_{k=1}^m E_k, where {E_k}_{k=1}^m is a finite sequence of positive semidefinite matrices reflecting the topological and structural nature of A. For this class of PD matrices, we will develop theories and algorithms on operator compression, multilevel decomposition, eigenpair computation, and spectrum concentration. We divide these contents into three main parts.

In the first part, we propose an adaptive fast solver for the preceding class of PD matrices, which includes the well-known graph Laplacians. We achieve this by establishing an adaptive operator compression scheme and a multiresolution matrix factorization algorithm which have nearly optimal performance on both complexity and well-posedness. To develop our methods, we introduce a novel notion of energy decomposition for PD matrices and two important local measurement quantities, which provide theoretical guarantees and computational guidance for the construction of an appropriate partition and a nested adaptive basis.

In the second part, we propose a new iterative method to hierarchically compute a relatively large number of leftmost eigenpairs of a sparse PD matrix under the multiresolution matrix compression framework. We exploit the well-conditioned property of every decomposition component by integrating the multiresolution framework into the Implicitly Restarted Lanczos Method. We achieve this combination by proposing an extension-refinement iterative scheme, in which the intrinsic idea is to decompose the target spectrum into several segments such that the corresponding eigenproblem in each segment is well-conditioned.

In the third part, we derive concentration inequalities on partial sums of eigenvalues of random PD matrices by introducing the notion of k-trace. For this purpose, we establish a generalized Lieb's concavity theorem, which extends the original Lieb's concavity theorem from the normal trace to k-traces. Our argument employs a variety of matrix techniques and concepts, including exterior algebra, mixed discriminant, and operator interpolation.


PUBLISHED CONTENT AND CONTRIBUTIONS

[1] Thomas Y. Hou, De Huang, Ka Chun Lam, and Pengchuan Zhang. An adaptive fast solver for a general class of positive definite matrices via energy decomposition. Multiscale Modeling & Simulation, 16(2):615–678, 2018. URL https://doi.org/10.1137/17M1140686.
D.H. proposed this project, generalized the operator compression method and the multiresolution matrix decomposition framework, designed the adaptive partitioning algorithm, proved the main theoretical results, conducted the numerical experiments, and participated in the writing of the manuscript.

[2] Thomas Y. Hou, De Huang, Ka Chun Lam, and Ziyun Zhang. A fast hierarchically preconditioned eigensolver based on multiresolution matrix decomposition. Multiscale Modeling & Simulation, 17(1):260–306, 2019. URL https://doi.org/10.1137/18M1180827.
D.H. proposed this project, designed the hierarchical framework of the eigensolver, proposed the spectral preconditioner, proved the main theoretical results, conducted the numerical experiments, and participated in the writing of the manuscript.

[3] De Huang. A generalized Lieb's theorem and its applications to spectrum estimates for a sum of random matrices. Linear Algebra and its Applications, 579:419–448, 2019. ISSN 0024-3795. URL https://doi.org/10.1016/j.laa.2019.06.013.
D.H. is the sole author of the manuscript.

[4] De Huang. Generalizing Lieb's concavity theorem via operator interpolation. Advances in Mathematics, 369:107208, 2020. ISSN 0001-8708. URL https://doi.org/10.1016/j.aim.2020.107208.
D.H. is the sole author of the manuscript.


TABLE OF CONTENTS

Acknowledgements
Abstract
Published Content and Contributions
Bibliography
Table of Contents
List of Illustrations
List of Tables
Chapter I: Introduction
  1.1 Operator Compression and Fast Linear Solver
  1.2 Hierarchically Preconditioned Eigensolver
  1.3 Concentration of Eigenvalue Sums
  1.4 Summary of the Thesis
Chapter II: Operator Compression via Energy Decomposition
  2.1 Energy Decomposition
  2.2 Operator Compression
  2.3 Construction of Partition
  2.4 Numerical Examples
Chapter III: Multiresolution Matrix Decomposition
  3.1 Multiresolution Matrix Decomposition (MMD)
  3.2 MMD with Localization
  3.3 Multilevel Operator Compression
  3.4 Numerical Example for MMD
Chapter IV: Hierarchically Preconditioned Eigensolver
  4.1 Implicitly Restarted Lanczos Method (IRLM)
  4.2 The Compressed Eigen Problem
  4.3 Hierarchical Spectrum Completion
  4.4 Cross-Level Refinement of Eigenspace
  4.5 Overall Algorithms
  4.6 Numerical Examples
  4.7 Comparison with the IRLM
  4.8 On Compressed Eigenproblems
Chapter V: Concentration of Eigenvalue Sums and Generalized Lieb's Concavity Theorem
  5.1 Concentration of Eigenvalue Sums
  5.2 K-trace
  5.3 Generalized Lieb's Concavity Theorems
  5.4 Proof of Concavity Theorems
  5.5 From Lieb's Theorem to Concentration
  5.6 Supporting Materials


  5.7 Other Results on K-trace
Chapter VI: Concluding Discussions
Bibliography


LIST OF ILLUSTRATIONS

1.1 Process flowchart for compressing A^{-1}.
2.1 An illustration of a graph partition.
2.2 An example showing the relationship between mesh size, the error factor ε(P_j, 1), the condition factor δ(P_j, 1), and contrast.
2.3 An illustration of the running time of Algorithm 4.
2.4 Intrinsic dimension of the sets of graphs.
2.5 Error and well-posedness studies of the compressed operators.
2.6 Compression error ε²_com and the mean radius of Ψ.
2.7 Profiles of (localized) basis functions ψ.
2.8 Spectrum of L^{-1}, L^{-1} − P^L_Ψ L^{-1}, and L^{-1} − P^L_{Ψ̃} L^{-1}.
2.9 Difference between corresponding eigenpairs.
2.10 Plot of some eigenfunctions of L^{-1}.
2.11 Partitioning result of an elliptic PDE.
2.12 Samples of the localized basis functions.
2.13 Samples of the localized basis functions from a regular partition.
3.1 Process flowchart of Algorithm 6.
3.2 A "Roll surface" constructed by (3.33).
3.3 Spectrum and complexity of a 4-level MMD.
3.4 Spectrum and complexity of a 5-level MMD.
4.1 A flowchart illustrating the procedure of Algorithm 13.
4.2 Point clouds of datasets.
4.3 The error, the residual, and the relative error.
4.4 Colormaps of eigenvectors: Bunny.
4.5 Colormaps of eigenvectors: Brain.
4.6 Heaping up more eigenvectors leads to finer partition.
4.7 Convergence of computed spectrum in different errors.
4.8 The completion and convergence process of the target spectrum.
4.9 #A(k) versus α in the 4-level SwissRoll example.
4.10 A comparison of effective condition numbers.
4.11 A comparison of eigenvalues and localized eigenfunctions.


LIST OF TABLES

2.1 Comparison between our partitioning and the uniform regular partitioning.
3.1 Complexity results of a 4-level MMD.
3.2 Complexity results of a 5-level MMD.
3.3 Case 1: Performance of the 4-level and the 5-level decompositions.
3.4 Case 2: Performance of the 4-level and the 5-level decompositions.
4.1 Matrix decomposition time (in seconds) for different examples.
4.2 Decomposition parameters of different datasets.
4.3 Decomposition information of different datasets.
4.4 Eigenpair computation information. m ≔ nnz(A^{(0)}).
4.5 4-level eigenpair computation of Brain data with (η, c) = (0.2, 20).
4.6 3-level eigenpair computation of SwissRoll data with (η, c) = (0.1, 20).
4.7 4-level eigenpair computation of SwissRoll data: (η, c) = (0.2, 20).
4.8 Eigenpair computation time of different methods.


Chapter 1

INTRODUCTION

1.1 Operator Compression and Fast Linear Solver

Fast algorithms for solving symmetric positive definite (PD) linear systems have found broad applications across both theory and practice, including machine learning [11, 21, 106], computer vision [10, 15, 135], image processing [5, 23, 81], computational biology [32, 83], etc. For instance, solving linear systems of graph Laplacians, which has a deep connection to the spectral properties of the underlying graphs, is one of the foundational problems in data analysis. Performing finite element simulation on a wide range of physical systems introduces the corresponding stiffness matrix, which is also symmetric and positive definite.

Many related works have drawn inspiration from spectral graph theory [30, 84], in which the spectrum and the geometry of graphs are found to be highly correlated. By computing the spectrum of the graph, the intrinsic geometric information can be directly obtained, and various applications can be found [16, 57]. However, the price for finding the spectrum of a graph is relatively high, as it involves solving global eigenproblems. On the contrary, the Algebraic Multigrid method [118, 123, 125] is a purely matrix-based multiresolution-type solver. It simply uses the interaction of nonzero entries within the matrix as an indicator to describe the geometry implicitly. These stimulating techniques and concepts motivate us to look into the problems from two different points of view and search for a brand new framework that can integrate the advantages from both ends.

In the first part of the thesis, we propose an adaptive fast solver for a general class of PD matrices. We achieve this by developing an adaptive operator compression scheme and a multiresolution matrix factorization algorithm, both with nearly optimal performance on complexity and well-posedness. These methods are developed based on a newly introduced framework, namely, the energy decomposition of a PD matrix A, which extracts its hidden geometric information. For ease of discussion, we first consider A = L, the graph Laplacian of an undirected graph G. Under this framework, we reformulate the connectivity of subgraphs in G as the interaction between energies. These interactions reveal the intrinsic geometric information hidden in L. In particular, this framework naturally leads to two important local measurements, which are the error factor and the condition factor. Computing these two measurements only involves solving a localized eigenvalue problem, and consequently no global computation or information is involved. These two measurements serve as guidance for defining an appropriate partition of the graph G. Using this partition, a modified coarse space and a corresponding basis with an exponential decay property can be constructed. Compression of L^{-1} can thus be achieved. Furthermore, the systematic clustering procedure of the graph, which requires no knowledge of the geometric information, allows us to introduce a multiresolution matrix decomposition (MMD) framework for graph Laplacians and, more generally, PD linear systems. In particular, following the work in [88], we propose a nearly-linear time fast solver for general PD matrices. Given the prescribed well-posedness requirement (i.e., the condition factor), every component from the MMD will be a well-conditioned, lower dimensional PD linear system. Any generic iterative solver can then be applied in parallel to obtain an approximate solution of the given arbitrary PD matrix satisfying the prescribed accuracy.

Problem Setting

Given an n × n PD matrix A, our ultimate goal is to develop a fast algorithm to efficiently solve Ax = b, or equivalently, to compress the solver A^{-1} with a desired compression error. We make the following assumptions on the matrix A. First of all, λ_min(A) = O(1) for well-posedness, where λ_min(A) is the minimum eigenvalue of the matrix A. Second, the spectrum of the matrix A is broad-banded. Third, A stems from a summation of symmetric and positive semidefinite (PSD) matrices, i.e., A = ∑_{k=1}^m E_k for a sequence {E_k}_{k=1}^m of PSD matrices. We remark that the second assumption can be interpreted as a sparsity requirement on A, that is, the existence of some intrinsic, localized geometric information. For instance, if A = L is a graph Laplacian, such sparsity can be described by the requirement

#N_k(i) = O(k^d), ∀i,

where #N_k(i) is the number of vertices near the vertex i with logic distance no greater than k, and d is the geometric dimension of the graph (i.e., the optimal embedding dimension). This is equivalent to assuming that the portion of long-interaction edges is small. The third assumption, in many cases of interest, is a natural consequence of the assembling of the matrix A. In particular, a graph Laplacian L can be viewed as a summation of 2-by-2 matrices representing edges in the graph G. These 2-by-2 matrices are PSD matrices and can be obtained automatically if G is given.


Another illustrative example is the patch-wise stiffness matrix of a finite element arising from the discretization of a partial differential equation (PDE) using finite element-type methods.
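The graph Laplacian case of the third assumption is easy to make concrete. The following Python sketch (not from the thesis; the toy graph, weights, and names are hypothetical) assembles a graph Laplacian as a sum of rank-one PSD energy elements, one per weighted edge, which is exactly an energy decomposition A = ∑_k E_k in the sense above.

```python
import numpy as np

# Energy decomposition of a graph Laplacian: each weighted edge {i, j} contributes
# a rank-one PSD "energy element" E_k = w * (e_i - e_j)(e_i - e_j)^T.
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 0.5), (2, 3, 1.0)]  # hypothetical (i, j, weight) list
n = 4

def edge_energy(i, j, w, n):
    v = np.zeros(n)
    v[i], v[j] = 1.0, -1.0
    return w * np.outer(v, v)

energy_elements = [edge_energy(i, j, w, n) for i, j, w in edges]
L = sum(energy_elements)  # the graph Laplacian assembled from its energy elements

# Every E_k is positive semidefinite, and their sum is the usual Laplacian D - W.
assert all(np.linalg.eigvalsh(E).min() > -1e-12 for E in energy_elements)
print(L)
```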

Operator Compression

To compress the operator A^{-1} (where A satisfies the above assumptions) with a desired error bound, we adopt the idea of constructing a modified coarse space as proposed in [53, 75, 88]. The procedure is summarized in Figure 1.1. Note that one common strategy of these PDE approaches is to make use of the natural partition under some a priori geometric assumption on the computational domain. In contrast, we adaptively construct an appropriate partition using the energy decomposition framework, which requires no a priori knowledge of the underlying geometry of the domain. This partitioning idea is requisite and advantageous in scenarios where no explicit geometric information is provided, especially in the case of graph Laplacians. Therefore, one of our main contributions is to develop various criteria and systematic procedures to obtain an appropriate partition P = {P_j}_{j=1}^M (e.g., a graph partitioning in the case of graph Laplacians) which reveals the structural property of A.

Figure 1.1: Process flowchart for compressing A^{-1}.

Leaving aside the difficulties of finding an appropriate partition, our next task is to define a coarse space Φ such that ‖x − P_Φ x‖_2 ≤ ε ‖x‖_A. As shown in [53, 88], with this requirement, together with the modification of the coarse space Φ into Ψ = A^{-1}(Φ), we have ‖A^{-1} − P^A_Ψ A^{-1}‖_2 ≤ ε^2. This confirms that Φ must be constructed carefully in order to achieve the prescribed compression accuracy. Further, under the energy decomposition setting, we can ensure this requirement by simply considering the local accuracy ‖x − P_{Φ_j} x‖_2, which in turn gives a local computation procedure for checking the qualification of the local Φ_j. Specifically, we introduce the error factor

ε(P, q) = max_{P_j ∈ P} 1 / √(λ_{q+1}(P_j))

(where λ_{q+1}(P_j) corresponds to the (q + 1)-th smallest eigenvalue of some eigenproblem defined on the patch P_j) as our defender to strictly control the overall compression error. The error factor guides us to construct a subspace Φ_j^q ⊂ span(P_j) for every patch P_j satisfying the local accuracy, and eventually the global accuracy requirement. Afterwards, we apply the formulation of the modified coarse space Ψ to build up the exponentially decaying basis ψ_j for the operator compression. To be general, we reformulate the coarse space modification procedure using purely matrix-based arguments. In addition, this reformulation immediately introduces another criterion, called the condition factor δ(P_j), over every patch P_j ∈ P. This second measurement serves as another defender to control the well-posedness of our compressed operator. Similar to the error factor, the condition factor is a local measurement which can be obtained by solving a partial local eigenproblem. This local restriction naturally carries over in a global sense to bound the maximum eigenvalue of the compressed operator. In particular, we prove that the compressed operator A_st satisfies κ(A_st) ≤ max_{P_j ∈ P} δ(P_j) ‖A^{-1}‖_2.
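For illustration only, the sketch below evaluates a quantity of this form, max_{P_j ∈ P} 1/√(λ_{q+1}(P_j)), by taking λ_{q+1}(P_j) to be the (q+1)-th smallest eigenvalue of the principal submatrix of A on each patch. This is a hypothetical simplification of the patch-local eigenproblem (the thesis defines it through the energy decomposition); it is meant only to show that the measurement is a purely local computation.

```python
import numpy as np

def error_factor(A, patches, q):
    # eps(P, q) = max over patches of 1 / sqrt(lambda_{q+1}(P_j)), where the local
    # eigenproblem is approximated here by the principal submatrix A[P_j, P_j].
    eps = 0.0
    for patch in patches:
        idx = np.asarray(patch)
        lam = np.linalg.eigvalsh(A[np.ix_(idx, idx)])  # ascending local eigenvalues
        eps = max(eps, 1.0 / np.sqrt(lam[q]))          # lam[q] is the (q+1)-th smallest
    return eps

# Hypothetical 6x6 PD matrix and a partition into two patches of size 3.
A = np.diag([4.0, 5.0, 6.0, 4.0, 5.0, 6.0]) + 0.5
print(error_factor(A, [[0, 1, 2], [3, 4, 5]], q=1))
```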

Up to this point, we can see that the choice of the partition has a direct influence on both the accuracy and the well-posedness of the compressed operator. In this work, we propose a nearly-linear time partitioning algorithm which is purely matrix-based, with complexity

O(d · s^2 · log s · n) + O(log s · n · log n),

where s is the average patch size and d is the intrinsic geometric dimension. With the relationship between the error factor and the condition factor, we can reversely treat the local properties as the blueprint to govern the process of partitioning. This in turn regularizes the condition number of the compressed operator such that the computational complexity of solving the original linear system can be significantly reduced.

Multiresolution Matrix Decomposition

Having a generic operator compression scheme, we follow the idea in [88] to extend the compression scheme hierarchically to form a multiresolution matrix decomposition (MMD) algorithm. However, instead of using a precedent nested partitioning of the domain, we perform the decomposition level by level in a recursive manner. In other words, every new level is generated inductively from the previous level (subject to some uniform well-posedness constraints) by applying our adaptive partitioning technique. This provides more flexibility and convenience to deal with the various, and even unknown, multiresolution behaviors appearing in the matrix. This decomposition further leads us to develop a fast solver for PD matrices with time complexity

O(nnz(A) · log n · (log ε^{-1} + log n)^c · log ε^{-1}),

where nnz(A) is the number of nonzero entries in A and c is some absolute constant depending only on the geometric property of A. We would like to emphasize that the construction of the appropriate partition is essential to the formation of the hierarchical decomposition procedure. Our study shows that the hidden geometric information of A can be subtly recovered from the inherited energy decomposition of the compressed operator using our framework. The interaction between these inherited energies serves a purpose similar to that of the energy elements of A. Therefore we can treat the compressed operator as the initial operator at the second level of the decomposition procedure and proceed to the next level repeatedly. We would like to emphasize that with an appropriate choice of partitioning, the sparsity and the well-posedness properties of the compressed operator can be inherited through the levels. This nice property enables us to decompose the original problem of solving A^{-1} into sets of independent problems with similar complexities and condition numbers, which favors a parallel implementation of the solver.

Related Previous Works

In the past few years, several works relevant to the compression of elliptic operators with heterogeneous and highly varying coefficients have been proposed. Målqvist, Peterseim, et al. [63, 75] constructed localized multiscale basis functions from the modified coarse space V_H^ms = V_H − F V_H, where V_H is the original coarse space spanned by conforming nodal bases, and F is the energy projection onto the space (V_H)^⊥. The exponential decay property of this modified basis has been shown both theoretically and numerically. Meanwhile, a beautifully insightful work by Owhadi [88] reformulated the problem from the perspective of decision theory using the idea of Gamblets as the modified basis. In particular, a coarse space Φ of measurement functions is constructed from a Bayesian perspective, and the gamblet space is explicitly given as Ψ = A^{-1}(Φ), which turns out to be a counterpart of the modified coarse space in [75]. In addition, the basis of Φ is generalized to nonconforming measurement functions, and the gamblets still enjoy the exponential decay property which makes localization possible. Hou and Zhang in [53] extended these works such that localized basis functions can also be constructed for higher-order strongly elliptic operators. Owhadi further generalized these frameworks to a more unified methodology for arbitrary elliptic operators on Sobolev spaces in [89] using the Gaussian process interpretation. Note that for the above-mentioned works, since the problems they considered originate from PDE-type modeling, the computational domains are naturally assumed to be given; that is, the partition P can be obtained directly (which is not available for graph Laplacians or general PD matrices). This assumption greatly helps the immersion of the nested coarse spaces with different scales into the computational domain. In other words, the exponential decay property of the basis can be precisely achieved. Recently, Schäfer et al. [104] proposed a near-linear running time algorithm to compress a large class of dense kernel matrices Θ ∈ R^{n×n}. The authors also provided rigorous complexity analyses and showed that the storage complexity of the proposed algorithm is O(n log(n) log^d(n/ε)) and the running time complexity is O(n log^2(n) log^{2d}(n/ε)), where d is the intrinsic dimension of the problem.

Recall that for solving linear systems exhibiting multiple scales of behavior, the class of multiresolution methods decomposes the problem additively in terms of different scales of resolution. This captures the features of different scales and allows us to treat these components differently. For instance, the widely used Geometric Multigrid (GMG) methods [28, 103, 129] provide fast solvers for linear systems which stem from the discretization of linear elliptic differential equations. The main idea is to accelerate the convergence of basic iterative methods by introducing a nested structure on the computational domain so that successive subspace correction can be performed. However, the performance is hindered when the regularity of the coefficients is lost. To overcome this deficiency, an enormous amount of significant progress has been made. Numerous methods ranging from geometry-specific to purely algebraic/matrix-based approaches have been developed (see [7, 44, 118] for a review). Using the tools of compressing the operator A^{-1} possessing multiple scale features, Owhadi in [88] also proposed a straightforward but intelligible way to resolve the roughness issue. By introducing a natural nested structure on the given computational domain, a systematic multiresolution algorithm for hierarchically decomposing elliptic operators was proposed. This in turn provides a near-linear complexity solver with guaranteed prescribed error bounds. The efficiency of this multilevel solver is guaranteed by carefully choosing a nested structure of measurement functions Φ which satisfies (i) the Poincaré inequality; (ii) the inverse Poincaré inequality; and (iii) the frame inequality. In [89], Owhadi and Scovel extended the result to problems with general PD matrices, where the existence of Φ satisfying (i), (ii), and (iii) is assumed. In particular, for discretizations of continuous linear bijections from H^s(Ω) to H^{-s}(Ω) or L^2(Ω), these assumptions are shown to hold true using prior information on the geometry of the computational domain Ω.


However, the practical construction of this nested global structure Φ still presents some essential difficulty when no intrinsic geometric information is provided a priori. To solve this generic problem, we introduce the energy decomposition and the inherited system of energy elements. Instead of assuming a priori the existence of such a nested structure Φ, we use the idea of inherited energy decomposition to construct Φ and the corresponding energy decomposition level by level, using local spectral information and an adaptive clustering technique.

On the other hand, to mimic the functionality and convergence behavior of GMG without providing the nested meshes, the Algebraic Multigrid (AMG) methods [118, 123, 125] take advantage of the connectivity information given in the matrix A itself to define intergrid transfer operators, which avoids the direct construction of the restriction and relaxation operators in GMG methods. Intuitively, the connectivity information subtly reveals the hidden geometry of the problem. This purely algebraic framework bypasses the "geometric" requirement in GMG, and is widely used in practice on graphs with sufficiently nice topologies. In particular, a recent AMG method called LAMG has been proposed by Livne and Brandt [71], where the run time and storage of the algorithm are empirically demonstrated to scale linearly with the number of edges. We would like to emphasize that the difference between our proposed solver and a general recursive-type iterative solver is the absence of nested iterations. Our solver decomposes the matrix adaptively according to the inherited multiple scales of the matrix itself. The matrix decomposition divides the original problem into components of well-conditioned, lower dimensional PD linear systems with controllable condition numbers, which can then be solved in parallel using any generic iterative solver. In other words, this decomposition also provides a parallelizable framework for solving PD linear systems.

Another inspiring stream of nearly-linear time algorithms for solving graph Laplacian systems was given by Spielman and Teng [112–115]. With innovative discoveries in spectral graph theory and graph algorithms, such as the fast construction of low-stretch spanning trees and clustering schemes, they successfully employed all these important techniques in developing an effective preconditioned iterative solver. Later, Koutis, Miller, and Peng [59] followed these ideas and simplified the solver with computational complexity O(m log n log log n log ε^{-1}), where m and n are the number of edges and vertices, respectively. In contrast, we employ the idea of the modified coarse space to compress a general PD matrix (i.e., the graph Laplacian in this case) hierarchically with control of sparsity and well-posedness.


1.2 Hierarchically Preconditioned Eigensolver

The computation of eigenpairs for large and sparse matrices, particularly for positive semidefinite (PSD) matrices, is one of the most fundamental tasks in many scientific applications. For example, the leftmost eigenpairs (i.e., the eigenpairs associated with the N smallest eigenvalues for some N ∈ ℕ) of a graph Laplacian L help reveal the topological information of the corresponding network from real data. One illustrative example is that the multiplicity of the smallest eigenvalue λ_1 of L coincides with the number of connected components of the corresponding graph G. In particular, the second-smallest eigenvalue of L is well known as the algebraic connectivity or the Fiedler value of the graph G, which is applied to develop algorithms for graph partitioning [30, 82, 84]. Another important example regarding the use of leftmost eigenpairs is the computation of the betweenness centrality of graphs, as mentioned in [12, 17, 18]. Computing the leftmost eigenpairs of large and sparse PSD matrices also stems from the problem of predicting electronic properties in complex structural systems [41]. Such prediction is achieved by solving the Schrödinger equation HΨ = EΨ, where H is the Hamiltonian operator for the system, E corresponds to the total energy, and |Ψ(r)|^2 represents the charge density at location r. Solving this equation using the self-consistent field method requires computing the eigenpairs of H repeatedly, which dominates the total computational cost of the overall iterations. Thus, an efficient algorithm to solve the eigenproblem is indispensable. Usage of leftmost eigenpairs can also be found in vibrational analysis in mechanical engineering [78]. In [31], the authors also suggest that the leftmost eigenpairs of the covariance matrix between residues are important for extracting functional and structural information about protein families. Efficient algorithms for computing the p smallest eigenpairs for relatively large p are therefore crucial in various applications.

Iterative Methods

As most of the linear systems from engineering problems or networks are typically large and sparse in nature, iterative methods are preferred. Recently, several efficient algorithms have been developed to obtain the leftmost eigenpairs of A. These include the Jacobi–Davidson method [108], the implicitly restarted Arnoldi/Lanczos method [22, 65, 110], and the deflation-accelerated Newton method [13]. All these methods give promising results [12, 76], especially for finding a small number of leftmost eigenpairs. Other methods for computing accurate leftmost eigenpairs using hierarchical refinement/correction techniques were proposed in [69, 133, 134].


However, as reported in [76], the Implicitly Restarted Lanczos Method (IRLM) is still the best performing algorithm when a large number of smallest eigenpairs is required. Therefore, it is highly desirable to develop a new algorithm, based on the architecture of the IRLM, that can further optimize the performance.

The main purpose of this part of the work is to explore the possibility of exploiting the advantageous energy decomposition framework under the architecture of the IRLM. In particular, we propose a new spectrum-preserving preconditioned hierarchical eigensolver for computing a large number of the leftmost eigenpairs. This eigensolver takes full advantage of the intrinsic structure of the given matrix, the nice spectral property of the Lanczos procedure, and also the preconditioning characteristics of the Conjugate Gradient (CG) method. Given a sparse symmetric positive definite matrix A which is assumed to be energy decomposable (see Section 2.1 for details), we integrate the well-behaved matrix properties that are inherited from the MMD with the IRLM. The preconditioner we propose for the CG method (which hence becomes the Preconditioned Conjugate Gradient (PCG) method) can also preserve the narrowed residual spectrum of A during the Lanczos procedure. Throughout this thesis, the theoretical performance of our proposed algorithm is analyzed rigorously, and we conduct a number of numerical experiments to verify the efficiency and effectiveness of the algorithm in practice. To summarize, our contributions are threefold:

• We establish a hierarchical framework to compute a relatively large number of leftmost eigenpairs of a sparse symmetric positive definite matrix. This framework employs the MMD algorithm to further optimize the performance of the IRLM. In particular, a specially designed spectrum-preserving preconditioner is introduced for the PCG method to compute x = A^{-1}b for some vector b.

• The proposed framework improves the running time of finding the m_tar leftmost eigenpairs of a matrix A ∈ R^{n×n} from O(m_tar · κ(A) · nnz(A) · log(1/ε)) (which is achieved by the classical IRLM) to O(m_tar · nnz(A) · (log(1/ε) + log n)^C), where κ(A) is the condition number of A, nnz(·) is the number of nonzero entries, and C is some small constant independent of m_tar, nnz(A), and κ(A).

• We also provide a rigorous analysis of both the accuracy and the asymptotic computational complexity of our proposed algorithm. This ensures the correctness and efficiency of the algorithm even in large-scale, ill-conditioned scenarios.


In many practical applications, the operator A may not be explicitly stored entry-wise, and only the evaluation of Ax is available. In this situation, our proposed algorithm also works, as it only requires the storage of the stiffness matrices corresponding to some hierarchical basis Ψ according to the accuracy requirement. To construct these stiffness matrices, we only need to evaluate Ax. Therefore, for ease of discussion, we simply assume that the given operator A is a finite-dimensional, accessible matrix.

Overview of Our Method

In this part of the work, we propose and develop an iterative scheme for computing a relatively large number of the leftmost eigenpairs of a PSD matrix, under the framework of operator compression and decomposition introduced in the first part of the thesis. Note that we can transform a PSD matrix into a PD matrix by adding to it a constant multiple of the identity matrix, which does not change the eigenvectors and only shifts all eigenvalues by one uniform constant. Under our preceding framework, we can decompose the inverse A^{-1} of a PD matrix A ∈ R^{n×n} into

A^{-1} = P^A_U A^{-1} + P^A_Ψ A^{-1} := P^A_U A^{-1} + Θ,

where [U, Ψ] corresponds to a basis of R^n, and P^A_U and P^A_Ψ are the corresponding subspace projections. Recursively, we can also consider Θ as a "new" A^{-1} and further decompose Θ in the same manner. This gives an MMD of A^{-1} = ∑_{k=1}^K P^A_{U^{(k)}} A^{-1} + Θ^{(K)}. To illustrate, we first consider a 1-level decomposition, i.e., K = 1.

One important observation regarding this decomposition is that the spectrum of the original operator A^{-1} resembles that of the compressed operator Θ. In particular, if λ_{i,Θ}^{-1} is the i-th largest eigenvalue of Θ and ζ_{i,Θ} is the corresponding eigenvector, then (λ_{i,Θ}^{-1}, ζ_{i,Θ}) is a good approximation of (λ_i^{-1}, q_i) for small λ_i, where (λ_i^{-1}, q_i) denotes the i-th eigenpair of A^{-1}. These approximate eigenpairs (λ_{i,Θ}^{-1}, ζ_{i,Θ}) can then be used as the initial approximations of the targeted eigenpairs. Notice that compression errors are introduced into these eigenpairs by the matrix decomposition. Therefore, a refinement procedure should be carried out to reduce these errors to the prescribed accuracy. Once we obtain the refined eigenpairs, we may extend the spectrum in order to obtain the targeted number of eigenpairs. As observed in [76], the IRLM is the best performing algorithm when a large number of eigenpairs is considered; we therefore employ the Krylov subspace extension technique to extend the spectrum up to some prescribed control of the well-posedness. Intuitively, the MMD decomposes the spectrum of A^{-1} into different segments of different scales. Using a subset of the decomposed components to approximate A^{-1} yields a great reduction of the relative condition number. Thus, we can further trim down the complexity of the IRLM by approximating A^{-1} during the shifting process.
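A small numerical sketch of this spectrum-resembling property is given below (a synthetic PD matrix and a hypothetical coarse basis Ψ stand in for the localized basis produced by the MMD): the leading eigenvalues of Θ = Ψ(Ψ^T A Ψ)^{-1}Ψ^T track the leading eigenvalues of A^{-1}.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 10
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
A = Q @ np.diag(np.linspace(0.1, 50.0, n)) @ Q.T        # synthetic PD matrix

# Hypothetical coarse basis: a perturbed copy of the true leftmost eigenvectors.
Psi = Q[:, :k] + 1e-3 * rng.standard_normal((n, k))
A_st = Psi.T @ A @ Psi                                   # compressed (stiffness) operator
Theta = Psi @ np.linalg.solve(A_st, Psi.T)               # Theta = Psi A_st^{-1} Psi^T = P^A_Psi A^{-1}

lead_Theta = np.sort(np.linalg.eigvalsh(Theta))[-k:]     # k largest eigenvalues of Theta
lead_Ainv = np.sort(1.0 / np.linalg.eigvalsh(A))[-k:]    # k largest eigenvalues of A^{-1}
print(np.max(np.abs(lead_Theta - lead_Ainv) / lead_Ainv))  # relative deviation (small here)
```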

To generalize from the 1-level algorithm to the K-level algorithm, we develop a hierarchical scheme to compute the leftmost eigenpairs of an energy decomposable matrix. Given the K-level multiresolution decomposition {Θ^{(k)}}_{k=1}^K of an energy decomposable matrix A, we first compute the eigendecomposition [V_ex^{(K)}, D_ex^{(K)}] of Θ^{(K)} (with dimension N^{(K)}) corresponding to the coarsest level by using some standard direct method. Then we propose a compatible refinement scheme for both V_ex^{(K)} and D_ex^{(K)} to obtain V_ini^{(K−1)} and D_ini^{(K−1)}, which will then be used as the initial spectrum on the consecutive finer level. The efficiency of the cross-level refinement is achieved by a modified version of the orthogonal iteration with Ritz acceleration, where we exploit the proximity of the eigenspaces across levels to accelerate the CG/PCG method within the refinement step. Using this refined initial spectrum, our second stage is to extend the spectrum up to some prescribed control of the well-posedness using the Implicitly Restarted Lanczos architecture. Recall that a shifting approach is introduced to reduce the iteration number for the extension, which again requires solving A^{(K−1)} x = w with the PCG method in each iteration. However, the preconditioner for the PCG method when solving A^{(K−1)} x = w must be chosen carefully. Otherwise, the orthogonality property brought about by the Krylov subspace methods may not be utilized, and a large CG iteration number will be required (see Section 4.7). In view of this, we propose a spectrum-preserving hierarchical preconditioner M^{(K−1)} := (Ψ^{(K−1)})^T Ψ^{(K−1)} for accelerating the CG iterations during the Lanczos iteration. In particular, we can show that using the preconditioner M^{(K−1)}, the number of PCG iterations needed to achieve a relative error ε in the A^{(K−1)}-norm can be controlled in terms of the condition factor δ(P) (from the energy decomposition of the matrix) and an extension threshold µ_ex^{(K−1)}.

This process then repeats hierarchically until we reach the finest level. Under this framework, the condition numbers of all the corresponding operators within each level are controlled. The overall accuracy of our proposed algorithm is also determined by the prescribed compression error at the highest level.
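Since each stage above leans on CG/PCG solves, a brief reminder of the preconditioned conjugate gradient iteration may help. The sketch below is generic: the preconditioner solve is a hypothetical stand-in (a simple Jacobi/diagonal preconditioner), not the hierarchical preconditioner M^{(K−1)} itself.

```python
import numpy as np

def pcg(A, b, prec_solve, tol=1e-10, maxiter=500):
    """Generic preconditioned conjugate gradient for SPD A.
    prec_solve(r) should return M^{-1} r for an SPD preconditioner M."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = prec_solve(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = prec_solve(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Hypothetical test: diagonal (Jacobi) preconditioning on a random SPD matrix.
rng = np.random.default_rng(4)
B = rng.standard_normal((100, 100))
A = B @ B.T + 100 * np.eye(100)
b = rng.standard_normal(100)
x = pcg(A, b, prec_solve=lambda r: r / np.diag(A))
print(np.linalg.norm(A @ x - b))
```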

Related Previous Works

Several important iterative methods have been proposed to tackle the eigenproblems of PD matrices. One of the well-established algorithms is the Implicitly Restarted Lanczos Method (IRLM) (or the Implicitly Restarted Arnoldi Method (IRAM) for non-symmetric sparse matrices), which has been implemented in various popular scientific computing packages like MATLAB, R, and ARPACK. The IRLM combines the techniques of the implicitly shifted QR method and the shifting of the operator to avoid the difficulties of obtaining the leftmost eigenpairs. Another popular algorithm for finding leftmost eigenpairs is the Jacobi–Davidson method. The main idea is to minimize the Rayleigh quotient q(x) = (x^T A x)/(x^T x) using a Newton-type methodology. Efficacy and stability of the algorithm are then achieved by using a projected simplification of the Hessian of the Rayleigh quotient, namely J(x_k) := (I − x_k x_k^T)(A − q(x_k) I)(I − x_k x_k^T), with the update of x_k given by

x_{k+1} = x_k − J(x_k)^{-1} (A x_k − q(x_k) x_k).    (1.1)

Notice that the advantage of such an approach is the low accuracy requirement for solving (1.1). A parallel implementation of this algorithm was also proposed [99]. In [13], the authors proposed the Deflation-Accelerated Conjugate Gradient (DACG) method designed for solving the eigenproblem of PD matrices. The main idea is to replace Newton's minimization procedure for the Rayleigh quotient r(x) by the nonlinear CG method, which avoids solving linear systems within the algorithm. A comprehensive numerical comparison between the three algorithms was reported in [12]. Recently, Martínez [76] studied a class of tuned preconditioners for accelerating both the DACG and the IRLM for the computation of the smallest set of eigenpairs of large and sparse PD matrices. However, as reported in [76], the IRLM still outperforms the others when a relatively large number of leftmost eigenpairs is desired. By virtue of this, we are motivated to develop a more efficient algorithm particularly designed for computing a large number of leftmost eigenpairs.
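To make the update (1.1) concrete, here is a minimal sketch on a hypothetical test matrix. Since J(x_k) is singular along x_k, its inverse is realized here with a pseudo-inverse (one common reading of the formula); solved exactly, the step behaves like Rayleigh quotient iteration and converges rapidly to some eigenpair.

```python
import numpy as np

def rq(A, x):
    return float(x @ A @ x) / float(x @ x)

def newton_step(A, x):
    # One update of the form (1.1): x <- x - J(x)^{-1} (A x - q(x) x),
    # with J(x) = (I - x x^T)(A - q(x) I)(I - x x^T) inverted via pseudo-inverse.
    x = x / np.linalg.norm(x)
    q = rq(A, x)
    P = np.eye(len(x)) - np.outer(x, x)
    J = P @ (A - q * np.eye(len(x))) @ P
    return x - np.linalg.pinv(J) @ (A @ x - q * x)

rng = np.random.default_rng(1)
B = rng.standard_normal((20, 20))
A = B @ B.T + 20 * np.eye(20)        # hypothetical symmetric PD test matrix
x = rng.standard_normal(20)
for _ in range(15):
    x = newton_step(A, x)
print(np.linalg.norm(A @ x - rq(A, x) * x))   # eigen-residual after the iterations
```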

Another class of methods is designed for constructing localized/compressed eigenmodes that capture the space spanned by the true leftmost eigenvectors. One of the representative pioneering works was proposed by Ozolinš et al. in [90]. The goal of this work is to obtain a spatially localized solution of a class of problems in mathematical physics by constructing compressed modes. In particular, finding these localized modes can be formulated as the optimization problem

Ψ_N = argmin_{Ψ_N} (1/µ) ‖Ψ_N‖_1 + Tr(Ψ_N^T H Ψ_N)   such that   Ψ_N^T Ψ_N = I.

The authors of [90] proposed an algorithm based on the split Bregman iteration to solve the L1 minimization problem. Replacing the Hamiltonian operator H by a graph Laplacian matrix, one obtains the L1-regularized Principal Component Analysis. In particular, if there is no L1 regularization term in the optimization problem, the optimal Ψ_N will consist of the first N eigenvectors of A. Furthermore, when N is reasonably larger than some integer m_tar, the localized basis functions Ψ_N produced with L1 regularization can approximately span the m_tar leftmost eigenspace (i.e., the eigenspace spanned by the m_tar leftmost eigenvectors). Similarly, the MMD framework provides us with the hierarchical and sparse/localized basis Ψ. These localized basis functions capture the compressed modes and eventually provide us with a convenient way to control the complexity of the eigensolver.

1.3 Concentration of Eigenvalue Sums

As we have discussed in the preceding sections, theories and algorithms on the leftmost eigenpairs of PSD matrices have drawn great attention from a wide range of studies, due to their importance in many scientific models. In the study of physical Hamiltonian systems, the leftmost eigenpairs of a Hamiltonian operator represent the ground states and their energies [61, 128]. Computing the leftmost eigenpairs is therefore a fundamental task for understanding a Hamiltonian system. As we have mentioned in the previous section, the smallest eigenvalues of a graph Laplacian reflect the connectivity of the graph, and the corresponding eigenvectors divide the graph into closely connected clusters. Based on this, a great number of spectral clustering techniques have been developed over the last decades [29, 84, 127].

The importance of developing efficient algorithms for computing the leftmost eigenpairs of PSD matrices is then beyond question. However, the efficiency of the hierarchical eigensolver we proposed, and of many other iterative methods, relies heavily on the sparsity of the target PSD matrix. When we deal with a large and dense matrix in practice, random sparsification or random sampling provides an effective way to remarkably reduce the computational cost [111, 113]. The purpose of random sparsification is to reduce the complexity of a system without significantly perturbing its spectrum. Then, by applying eigensolvers to the sparsified matrices, we can still be confident that the resulting eigenpairs are close to the ground truth. The practicability of these random approaches is guaranteed by the well-established theories of matrix concentration.

In this thesis, we will establish concentration results on the sum of the smallest eigenvalues of PSD matrices. The reason why we study this quantity is that it carries meaningful algebraic and geometric information about the matrix. By an extended version of the Courant–Fischer characterization of eigenvalues, the sum of the k smallest eigenvalues of a PSD matrix A (denoted by S_k(A)) is the minimum of the optimization problem

min_{Q ∈ C^{n×k}, Q^*Q = I_k} Tr[Q^* A Q],

and this minimum is achieved by the orthonormal matrix V = [v_1, v_2, · · · , v_k] ∈ C^{n×k} whose columns are the k leftmost eigenvectors of A. Therefore, if we obtain an approximation V̂ of V, the accuracy of this approximation can be measured by the ratio Tr[V̂^* A V̂]/S_k(A), which requires knowing the value of S_k(A). When A is large and dense, computing S_k(A) can be a computationally intractable problem. However, if we can efficiently compute the smallest eigenvalues of a random sparsification Â of A, and if we have a concentration guarantee on S_k(Â) such that S_k(Â) − S_k(A) is small, then we can use S_k(Â) as a substitute to measure the accuracy of the approximated eigenvectors.
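The variational characterization above is easy to check numerically. The snippet below (with a synthetic matrix, purely for illustration) verifies that the k leftmost eigenvectors attain S_k(A) and that any other orthonormal k-frame gives a larger trace.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 5
B = rng.standard_normal((n, n))
A = B @ B.T                                   # synthetic PSD matrix

w, V = np.linalg.eigh(A)                      # ascending eigenvalues, orthonormal eigenvectors
S_k = w[:k].sum()                             # S_k(A): sum of the k smallest eigenvalues
print(S_k, np.trace(V[:, :k].T @ A @ V[:, :k]))  # the leftmost eigenvectors attain the minimum

# Any other orthonormal Q (real here, for simplicity) gives Tr[Q^T A Q] >= S_k(A).
Q = np.linalg.qr(rng.standard_normal((n, k)))[0]
assert np.trace(Q.T @ A @ Q) >= S_k - 1e-10
```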

Another illustration of the importance of the sum of the smallest eigenvalues arises in the connectivity study of graphs. By spectral theory, the number of zero eigenvalues of the Laplacian of a graph indicates the number of connected components in the graph. A relaxed version is that the number of the Laplacian's eigenvalues that are close to zero indicates the number of major clusters in the graph. Based on this observation, many researchers have developed clustering methods by investigating the smallest eigenvalues of graph Laplacians. Therefore, if the sum of the smallest eigenvalues has a concentration property, we can indirectly study the clustering of a graph by only looking at the Laplacian of an associated sparsified graph. Moreover, in many cases of interest the number k of clusters is assumed to be known a priori [29, 96, 98]; then the smallness of the sum of the k smallest eigenvalues of a sparsified Laplacian is a good indicator that it preserves the connectivity property of the original graph.

We remark that the rightmost eigenpairs of PSD matrices also play a significant role in many applications. In fact, studying the leading rightmost eigenpairs of a correlation matrix is the foundation of Principal Component Analysis [55, 56], which aims to extract an intrinsic low-dimensional structure from high-dimensional data. We will therefore simultaneously develop concentration results on the sum of the smallest eigenvalues and the sum of the largest eigenvalues.

Matrix Concentration Inequalities

Matrix concentration describes how likely a random matrix is to be close to its expectation. As noncommutative extensions of their scalar counterparts, matrix concentration inequalities have been widely studied through many efforts and have had a profound impact on a wide range of areas in computational mathematics and statistics. In particular, many important results on concentration phenomena for sums of independent random matrices [73, 87, 101, 121] and matrix-valued martingale sequences [86, 94, 120] have been developed over the past decades; we refer to the monograph [122] by Tropp for a detailed overview of this subject and an extensive bibliography.

These matrix concentration results provide rich theoretical supports for studies anddevelopments in stochastic models and algorithms for random matrices [77, 130] infields ranging from quantum physics [43] to financial statistics [62, 95]. A typicalexample is the study of clustering of random graphs [1, 38] arising from researchon social networks [79], image classification [9, 20] and so on.

In our setting, we are interested in the concentration phenomena of a class ofrandom PSD matrices that admit an energy decomposition. To be specific, we wantto study how far does the spectrum of a random PSD matrix Y deviate from thespectrum of its expectation EY , where Y =

∑mi=1 X (i) is the sum of a sequence of

independent random PSD matrices X (i)mi=1. For this type of random matrices,concentration inequalities on the extreme eigenvalues have been well developed[122]. For example, the following Chernoff-type tail bounds regarding the largestand the smallest eigenvalues are due to Tropp [121, Theorem 1.1]:

P λmax(Y ) ≥ (1 + ε)λmax(EY ) ≤ n(

(1 + ε)1+ε

)λmax(EY )/c

, ε ≥ 0,

P λmin(Y ) ≤ (1 − ε)λmin(EY ) ≤ n(

e−ε

(1 − ε)1−ε

)λmin(EY )/c

, ε ∈ [0, 1),

where n is the dimension of Y and c = maxi ‖X (i) ‖. This inequality indicates that,with a high probability, the largest (or smallest) eigenvalue ofY still provides a goodapproximation to that of EY .

The main purpose of this part of the thesis is to generalize these concentrationinequalities on the extreme eigenvalues to their counterparts on the sum of the k

largest (or smallest) eigenvalues for any natural number k. In particular, we willextend the preceding Tropp’s inequalities to the following new Chernoff-type tail

Page 26: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

16

bounds:

P

k∑i=1

λi (Y ) ≥ (1 + ε)k∑

i=1λi (EY )

(nk

) 1k(

(1 + ε)1+ε

) 1ck

∑ki=1 λi (EY )

, ε ≥ 0,

P

k∑i=1

λn−i+1(Y ) ≤ (1 − ε)k∑

i=1λn−i+1(EY )

(nk

) 1k(

e−ε

(1 − ε)1−ε

) 1ck

∑ki=1 λn−i+1(EY )

, ε ∈ [0, 1),

where λi (Y ) denotes the ith largest eigenvalue of Y so∑k

i=1 λn−i+1(Y ) is the sum ofthe smallest k eigenvalues of Y . Our result shows that, the sum of the k largest (orsmallest) eigenvalues obeys a similar concentration law as the largest (or smallest)eigenvalue.

Lieb’s Concavity TheoremTropp’s proof of his concentration inequalities on extreme eigenvalues is basedon the matrix Laplace transform method (see, for example, [74]) and also reliescritically on a concavity theorem by Lieb: the function

A 7−→ Tr[exp(H + log A)

], (1.2)

is concave on H++n , for any n × n Hermitian matrix H . Here H++n denotes theconvex cone of all Hermitian, PD matrices. Note that this part of theories doesnot restrict to real symmetric matrices, hence we will be dealing with Hermitianmatrices. The concavity of (1.2) is equivalent to a celebrated result in the study oftrace inequalities, the joint concavity of the function

(A, B) 7−→ Tr[K∗ApK Bq] (1.3)

on H+n × H+m, for any K ∈ Cn×m, p, q ∈ [0, 1], p + q ≤ 1, known as Lieb’s Con-cavity Theorem [67]. Here H+n is the convex cone of all n × n Hermitian, positivesemidefinite matrices. This theorem answered affirmatively an important conjec-ture by Wigner, Yanase and Dyson [131] in information theory. It also led to Lieb’sthree-matrix extension of the Golden–Thompson inequality

Tr[eA+B+C]

≤ Tr[eA

∫ ∞

0(e−B + t I)−1eC (e−B + t I)−1dt

](1.4)

Page 27: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

17

for arbitrary Hermitian matrices A, B,C of the same size, which was then used byLieb and Ruskai [68] to prove the strong subadditivity of quantum entropy.

In this part of work, we will generalize Lieb’s concavity theorem from trace to aclass of homogeneous matrix functions. In particular, we will prove that the function

(A, B) 7−→ Trk[(B

q2 K∗ApK B

q2 )s] 1

k (1.5)

is jointly concave on H+n × H+m, for any K ∈ Cn×m and any p, q ∈ [0, 1], s ∈[0, 1

p+q ]. The k-trace functionTrk[A] of a squarematrix Adenotes the kth elementarysymmetric polynomial of the eigenvalues of A:

Trk[A] =∑

1≤i1<i2<···<ik≤n

λi1λi2 · · · λik , 1 ≤ k ≤ n, (1.6)

where λ1, λ2, · · · , λn are all eigenvalues of A. In particular, we have Tr1[A] = Tr[A]and Trn[A] = det[A].

In the case k = 1, the concavity of function (1.5) has been studied by many andresults with an increasing range of s have been obtained over time: 1 ≤ s ≤ 1

p+q

[49] and 12 ≤ s ≤ 1

p+q [47] by Hiai, and 0 ≤ s ≤ 11+q by Carlen, Frank and Lieb

[25]. These partial results together already suffice to conclude the concavity forthe full range 0 ≤ p, q ≤ 1, 0 ≤ s ≤ 1

p+q . The first complete proof of concavitycovering the full range is due to Hiai [48]. Meanwhile, the convexity of function(1.5) with k = 1 for different ranges of p, q, s has also been established. A completeconvexity/concavity result for the full range of p, q, s was recently accomplished byZhang [136] using an elegant variational approach that is modified from a variationalmethod developed by Carlen and Lieb [27]. Here “full” means the conditions arealso necessary for the corresponding convexity/concavity to hold for all dimensionsn,m. We refer to the papers [26] by Carlen, Frank and Lieb, and [136] by Zhang forhistorical overviews on this topic in more detail. In this thesis, we will only workon the concavity of function (1.5), since the map A 7→ Trk[A]

1k is homogeneous of

order 1 and concave on H+n for k ≥ 2. Our work therefore extends the establishedconcavity results from the normal trace to a class of k-trace functions.

After establishing our generalized Lieb’s concavity theorem, it is easy to furtherderive the concavity of the map

A 7−→ Trk[exp(H + log A)

] 1k (1.7)

on H++n , for any Hermitian matrix H of the same size. This will then yield thedesired concentration inequalities on sums of eigenvalues. Our proof of this part

Page 28: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

18

is similar in spirit to Tropp’s argument based on the original Lieb’s theorem (1.2).However, the extension from trace to k-trace will endow this argument the power tocontrol the partial sums of eigenvalues rather than just the extreme eigenvalues.

Since Lieb’s original establishment of his concavity theorem, alternative proofs havebeen developed from different aspects of matrix theories, including matrix tensors(Ando [3], Carlen [24], Nikoufar et al. [85]), the theory of Herglotz functions(Epstein [37]), and interpolation theories (Uhlmann [124], Kosaki [58]). The tensorapproaches prove the theorem elegantly by translating the concavity of (1.3) tothe operator concavity of the map (A, B) 7→ Ap ⊗ Bq, but have difficulties ingeneralizing to our k-trace case due to the nonlinearity of Trk . However, the k-tracehas two good properties that are most essential to the desired results: (i) the mapA 7→ Trk[A]

1k is concave on H+n , and (ii) the k-trace satisfies Hölder’s inequality

as the normal trace does. We, therefore, turned to the more generalizable methodsof operator interpolation based essentially on Hölder’s inequality. Originating fromthe Hadamard three-lines theorem [45], the interpolation of operators has been apowerful tool in operator and functional analysis, with variants including the Riesz-Thorin interpolation theorem [97], Stein’s interpolation of holomorphic operators[116], Peetre’s K-method [92] and many others. In particular, we found Stein’scomplex interpolation technique most compatible and easiest to use in the k-tracesetting. Our use of interpolation technique was inspired by a recent work of Sutteret al. [119], in which they applied Stein’s interpolation to derive a multivariateextension of the Golden–Thompson inequality. This interpolation technique willhelp us first prove a key lemma that the function

A 7−→ Trk[(K∗ApK )s] 1

k (1.8)

is concave onH+n , for any K ∈ Cn×n and any p ∈ [0, 1], s ∈ [0, 1p ]. Note that function(1.8) is a special case of function (1.5) with q = 0. Given this lemma, the concavityin the more general case can be obtained via a powerful variational argument thatoriginates in [27] by Carlen and Lieb. This kind of variational methods, introducingsupremum/infimum characterizations of trace functions, has been widely used in thestudy of the convexity/concavity of function (1.5) and its variants in the trace case(see, e.g., [24–27]). Our proof for the k-trace case is similar in inspirit to a refinedvariational approach by Zhang [136], which is again based on Hölder’s inequalities.

1.4 Summary of the ThesisThe remaining thesis is organized as follows.

Page 29: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

19

• In Chapter 2, we present the general framework of operator compression viaenergy decomposition. The overall approach consists of three major parts:(i) the construction of an adaptive partition P, (ii) the choice of the coarsespace Φ and its basis, and (iii) the construction of the modified coarse spaceΨ and its (localized) basis. We prove the spatially exponential decay propertyof the basis functions of Ψ, which ensures the nearly optimal complexity ofour method. We introduce two local measurement factors, the error factorand the condition factor, to provide a theoretical guarantee (error estimate andcomplexity estimate) for our method and practical guidance for the design ofour algorithms. The performance of our operator compression framework isthen tested on a PDE example.

• In Chapter 3, we extend our operator compression framework to a MMDscheme. The development of our MMD relies on the inheritance of theenergy decomposition through a pyramid-like hierarchical structure, fromthe finest level to the coarsest level. Our MMD scheme naturally leads to aparallelizable fast linear solver for large sparse PD linear systems. We illustrateby several graph Laplacian examples the effectiveness of our MMD approachin resolving the large condition number of a PD matrix by decomposing itinto many well-conditioned pieces.

• In Chapter 4, we develop our MMD framework into a hierarchically pre-conditioned eigensolver for sparse positive semidefinite matrices, based onthe celebrated scheme of the Implicit Restarted Lanczos Method. Our hi-erarchical eigensolver consists of two alternating procedures: the level-wisespectrum extension process that finds new eigenpair candidates, and the cross-level spectrum refinement process that refines the computed eigenspace. Theextension process relies critically on the careful design of a spectral preserv-ing preconditioner. The refinement process follows the Orthogonal Iterationwith Ritz Acceleration which converges exponentially fast with properly cho-sen parameters. Our hierarchical eigensolver is tested on multiple numericalexamples, which provide evidence that our method could outperform otherexisting methods.

• In Chapter 5, we prove some concentration inequalities on eigenvalue sumsof random positive semidefinite matrices, extending existing concentrationresults on extreme eigenvalues. We introduce the notion of k-traces to pro-vide bounds on sums of the largest (or smallest) eigenvalues. These new

Page 30: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

20

concentration inequalities are consequences of a generalized Lieb’s concavitytheorem, which extends the famous Lieb’s concavity theorem from normaltrace to k-traces. The proof of our generalized Lieb’s concavity theorem in-corporates a variety of mathematical concepts and techniques, including themixed discriminant, the exterior algebra, derivatives of matrix functions andoperator interpolation.

Page 31: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

21

C h a p t e r 2

OPERATOR COMPRESSION VIA ENERGY DECOMPOSITION

In this chapter, we introduce a novel framework of operator compression via en-ergy decomposition. It provides a robust methodology for extracting the leadingcomponents of the inverse of a general class of positive definite (PD) matrix. Ourapproach consists of three major parts: the construction of an adaptive partition P,the choice of the coarse space Φ and its basis, and the construction of the modifiedcoarse space Ψ and its (localized) basis. We provide rigorous compression errorestimate and complexity estimate by introducing two local measurements whichcan be calculated efficiently by solving a local and partial eigen problem. Theseconcepts and methods will be discussed under the following outline.

In Section 2.1, we will introduce the foundation of our work, which is the notion ofenergy decomposition of general PD matrices. Section 2.2 discusses the construc-tion of the coarse space and its corresponding modified coarse space, which servesto construct the basis with exponential decaying property. Concurrently, the localmeasurements error factor and the condition factor are introduced. The analysisin this section will guide us to design the systematic algorithm for constructingthe partition P, which is described in Section 2.3. Discussion of the computa-tional complexity is also included. To demonstrate the efficacy of our partitioningalgorithm, two numerical results are reported in Section 2.4.

2.1 Energy DecompositionWe start by considering the linear system Lx = b, where L is the Laplacian of anundirected, positive-weighted graphG = V ;E,W , i.e.

Li j =

∑(i, j ′)∈E wi j ′ if i = j;

−wi j if i , j and (i, j) ∈ E;

0 otherwise.

(2.1)

We allow for the existence of self-loops (i, i) ∈ E. When L is singular we mean tosolve x = L†b, where L† is the pseudo-inverse of L. Our algorithm will base on afast clustering technique using local spectral information to give a good partition ofthe graph, upon which special local basis will be constructed and used to compress

Page 32: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

22

the operator L−1 into a low dimensional approximation L−1com subject to a prescribedaccuracy.

As we will see, our clustering technique exploits local path-wise information of thegraph G by operating on each single edge in E, which can be easily adapted to alarger class of linear systems with symmetric, positive semidefinite matrix. Noticethat the contribution of an edge (i, j) ∈ E with weight wi j to the Laplacian matrixL is simply

Eii ,

i

*..,

+//-

0wii i

0, i = j; Ei j ,

i j

*.........,

+/////////-

0wi j −wi j i

. . .

−wi j wi j j

0

, i , j .

(2.2)And we have L =

∑(i, j)∈E Ei j . In view of such matrix decomposition, our algo-

rithm works for any symmetric, positive semidefinite matrix A that has a similardecomposition A =

∑mk=1 Ek with each Ek 0. Here Ek 0 means Ek is posi-

tive semidefinite. Therefore, we will theoretically develop our method for generaldecomposable PD matrices. Also we assume that A is invertible, as we can easilygeneralize our method to the case when A†b is pursued.

We therefore introduce the idea of energy decomposition and the correspondingmathematical formulation which motivates the methodology for solving linear sys-tems with energy decomposable linear operator. Let A be a n×n symmetric, positivedefinite matrix. We define the energy decomposition as follows:

Definition 2.1.1 (Energy Decomposition). We call Ek mk=1 an energy decomposi-

tion of A and Ek to be an energy element of A if

A =m∑

k=1Ek, Ek 0 ∀k = 1, . . . ,m. (2.3)

Intuitively, the underlying structural(geometric) information of the original matrixA can be realized through an appropriate energy decomposition. And to preserveas much detailed information of A as possible, it is better to use the finest energy

Page 33: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

23

decomposition that we can have, which actually comes naturally from the generat-ing of A as we will see in some coming examples. More precisely, for an energydecomposition E = Ek

mk=1 of A, if there is some Ek that has its own energy decom-

position Ek = Ek,1 + Ek,2 that comes naturally, then the finer energy decompositionE f ine = Ek

mk=2∪Ek,1, Ek,2 is more preferred as it gives us more detailed informa-

tion of A. However one would see that any Ek can have some trivial decompositionEk =

12Ek +

12Ek , which makes no essential difference. To make it clear what should

be the finest underlying energy decomposition of A that we will use in our algorithm,we first introduce the neighboring relation between energy elements and basis.

Let E = Ek mk=1 be an energy decomposition of A, andV = vi

ni=1 be an orthonor-

mal basis of Rn. we introduce the following notation:

• For any E ∈ E and any v ∈ V , we denote E ∼ v if vT Ev > 0 ( or equivalentlyEv , 0, since E 0 );

• For any u, v ∈ V , we denote u ∼ v if ∃E ∈ E such that uT Ev , 0.

As an immediate example, if we take V to be the set of all distinct eigen vectorsof A, then v / u for any two v, u ∈ V , namely all basis functions are isolated andeverything is clear. But such choice of V is not trivial in that we know everythingabout A if we know its eigen vectors. Therefore, instead of doing things in thefrequency space, we assume the least knowledge of A and work in the physicalspace, that is we will chooseV to be the natural basis ei

ni=1 of R

n in all practicaluse. But for theoretical analysis, we still use the general basis notationV = vi

ni=1.

Also, for those who are familiar with graph theory, it is more convenient to un-derstand the sets V, E from graph perspective. Indeed, one can keep in mind thatG = V, E is the generalized concept of undirected graphs, whereV stands for theset of vertices, and E stands for the set of edges. For any vertices (basis) v, u ∈ V ,and any edge (energy) E ∈ E, v ∼ E means that E is an edge of v, and v ∼ u meansthat v and u share some common edge. However, different from the traditionalgraph setting, here one edge(energy) E may involve multiple vertices instead of justtwo, and two vertices(basis) v, u may share multiple edges that involve different setsof vertices. Further, the spectrum magnitude of the “multi-vertex edge" E can beviewed as a counterpart of edge weight in graph setting. Conversely, if the problemcomes directly from a weighted graph, then one can naturally construct the sets Vand E from the vertices and edges of the graph as we will see in Example 2.1.10.

Page 34: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

24

Definition 2.1.2 (Neighboring). Let E = Ek mk=1 be an energy decomposition of

A, and V = vini=1 be an orthonormal basis of Rn. For any E ∈ E, We define

N (E;V ) := v ∈ V : E ∼ v to be the set of v ∈ V neighboring E. Similarly, forany v ∈ V , we define N (v; E) := E ∈ E : E ∼ v and N (v) := u ∈ V : u ∼ v

to be the set of E ∈ E and the set of u ∈ V neighboring v ∈ V respectively.Furthermore, for any S ⊂ V and any E ∈ E, we denote E ∼ S ifN (E;V )∩S , ∅and E ∈ S if N (E;V ) ⊂ S.

In what follows, we will see that if two energy elements Ek, Ek ′ have the sameneighbor basis, namelyN (Ek ;V ) = N (Ek ′;V ), then there is no need to distinguishbetween them, since it is the neighboring relation between energy elements and basisthat matters in how we make use of the energy decomposition. Therefore we say anenergy decomposition E = Ek

mk=1 is the finest underlying energy decomposition

of A if no Ek ∈ E can be further decomposed as

Ek = Ek,1 + Ek,2,

where eitherN (Ek,1;V ) & N (Ek ;V ) orN (Ek,2;V ) & N (Ek ;V ). From now on,we will always assume that E = Ek

mk=1 is the finest underlying energy decomposi-

tion of A that comes along with A.

Using the neighboring concept between energy elements and orthonormal basis, wecan then define various energies of a subset S ⊂ V as follows:

Definition 2.1.3 (Restricted, Interior and Closed energy). Let E = Ek mk=1 be a

energy decomposition of A. Let S be a subset of V , and PS be the orthogonalprojection onto S. The restricted energy of S with respect to A is defined as

AS := PSAPS; (2.4)

The interior energy of S with respect to A and E is defined as

AES=

∑E∈S

E; (2.5)

The closed energy of S with respect to A and E is defined as

AE

S =∑E∈S

E +∑

E<S,E∼S

PSEd PS, (2.6)

whereEd =

∑v∈V

( ∑u∈V

vT Eu)vvT =

∑v∼E

(∑u∼v

vT Eu)vvT (2.7)

Page 35: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

25

is called the diagonal concentration of E, and we have

PSEd PS =∑

v∈S,v∼E

(∑u∼v

vT Eu)vvT (2.8)

Remark2.1.4. The restricted energy ofS can be simply viewed as the restriction of A

on the subsetS. The interior energy (closed energy) ofS is AS excluding (including)contributions from other energy elements E < S neighboring S. The followingexample illustrates the idea of various energies introduced in Definition 2.1.3 byconsidering the 1-dimensional discrete Laplace operator with Dirichlet boundaryconditions.

Example 2.1.5. Consider A to be the (n + 1) × (n + 1) tridiagonal matrix withentries -1 and 2 on off-diagonals and diagonal respectively. Let

E1 =( 2 −1−1 1

0), En =

( 01 −1−1 2

), and Ek = *

,

01 −1−1 1

0+-

(2.9)

for k = 2, . . . n − 1. Let V = eini=0 to be the standard orthonormal basis for

Euclidean space Rn+1. Formally Ek is the edge between ek−1 and ek . If S =e3, e4, e5, e6, then we have

AS =*...,

02 −1−1 2 −1−1 2 −1−1 2

0

+///-

, AES=

*...,

01 −1−1 2 −1−1 2 −1−1 1

0

+///-

,

and AE

S =*...,

03 −1−1 2 −1−1 2 −1−1 3

0

+///-

.

Recall that the interior energy AES=

∑Ek∈S

Ek =∑6

k=4 Ek , while the closed energy

AE

S = AES+

∑E<S,E∼S

PSEd PS

= AES+ |eT

3 E3e2 |e3eT3 + |e

T3 E3e3 |e3eT

3 + |eT7 E7e6 |e6eT

6 + |eT6 E7e6 |e6eT

6

includes the partial contributions from other energy elements E < S neighboringS, which are E3 and E7 respectively.

Remark 2.1.6.

- Notice that any eigenvector x of AS (or AES, AE

S) corresponding to nonzeroeigenvalue must satisfy x ∈ span(S). In this sense, we also say AS (or AE

S,

AE

S) is local to S.

Page 36: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

26

- For any energy E, we have E Ed since for any x =∑n

i=1 civi, we have

xT Ex =∑

i

c2i vTi Evi +

∑i, j

2cic jvTi Ev j

≤∑

i

c2i vTi Evi +

∑i, j

(c2i + c2j )vTi Ev j

=∑

i

∑j

c2i vTi Ev j

= xT Ed x.

Proposition 2.1.7. For any S ⊂ V , we have that AES AS A

E

S .

Proof. We have

AES=

∑E∈S

E ∑E∈S

E +∑E<S,E∼S

PSEPS ∑E∈S

E +∑E<S,E∼S

PSEd PS = AE

S .

Notice that PSEPS = E for E ∈ S, and PSEPS = 0 for E / S, thus

AS = PSAPS =∑E∈S

E +∑E<S,E∼S

PSEPS,

and the desired result follows.

Definition 2.1.8 (Partition of basis). LetV = vini=1 be an orthonormal basis ofR

n.We say P = Pj

Mj=1 is a partition ofV = vi

ni=1 if (i) Pj ⊂ V ∀ j; (ii) Pj ∩ Pj ′ = ∅

if j , j′; and (iii)⋃M

j=1 Pj = V .

Again one can see the partition of basis as partition of vertices. This partitionP is thekey to construction of local basis for operator compression purpose. The followingproposition serves to bound the matrix A from both sides with blocked(patched)matrices, which will further serve to characterize properties of local basis.

Proposition 2.1.9. Let E = Ek mk=1 be an energy decomposition of A, and P =

Pj Mj=1 be a partition ofV . Then

M∑j=1

AEPj A

M∑j=1

AE

Pj. (2.10)

Page 37: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

27

Proof. Let EP = E ∈ E : ∃Pj ∈ P such that E ∈ Pj , and EcP= E\EP . Recall

that E ∈ Pj if N (E,V ) ⊂ Pj (See Definition 2.1.2). We will use Pj to denote theorthogonal projection onto Pj . Since Pj ∩ Pj ′ = ∅ for j , j′, we have

∑j Pj = Id.

Then ∑j

AEPj=

∑j

∑E∈Pj

E

j

∑E∈Pj

E +∑

E∈EcP

E

j

∑E∈Pj

E +∑

E∈EcP

[(∑j

Pj)Ed (∑

j ′Pj ′

)]

=∑

j

∑E∈Pj

E +∑

E∈EcP

(∑j

Pj Ed Pj)

=∑

j

( ∑E∈Pj

E +∑

E<Pj,E∼Pj

Pj Ed Pj)

=∑

j

AE

Pj.

We have used the fact that Pj Ed Pj ′ = 0 for j , j′. Notice that

A =∑

E∈EP

E +∑

E∈EcP

E =∑

j

∑E∈Pj

E +∑

E∈EcP

E,

and the desired result follows.

Throughout the thesis, we will always assume that A has a finest energy decompo-sition E = Ek

mk=1, and all the other discussed energies of A are constructed from

E with respect to some orthonormal basis V (by taking interior or closed energy).Therefore we will simply use A

S, AS to denote AE

S, AE

S for any S ⊂ V .

Example 2.1.10. Consider L to be the graph Laplacian matrix of the graph given inFigure 2.1. For graph Laplacian, an intrinsic energy decomposition arises duringthe assembling of the matrix in which the energy element is defined over eachedge(see Equation (2.2)). Now suppose we have given the partition P =

Pj

3j=1

with P1 = [e1, e2, e3], P2 = [e4, e5, e6, e7] and P3 = [e8, e9, e10, e11], where ei are thestandard basis of R11. Then we can obtain LPj

and LPj as follows:

LP1=

(4 −2 −2−2 4 −2−2 −2 4

)P1, LP2

=

( 5 −2 −1 −2−2 4 −2 0−1 −2 5 2−2 0 −2 4

)P2

, LP3=

( 4 −2 −2 0−2 5 −1 −2−2 −1 5 −20 −2 −2 4

)P3

LP1 =

(6 −2 −2−2 6 −2−2 −2 8

)P1, LP2 =

( 7 −2 −1 −2−2 4 −2 0−1 −2 7 2−2 0 −2 8

)P2

, LP3 =

( 6 −2 −2 0−2 5 −1 −2−2 −1 9 −20 −2 −2 6

)P3

Page 38: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

28

(a) (b)

Figure 2.1: (a) An illustration of a graph example . (b) An illustration of a partitionP = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.

Here we denote the matrix (·)Pjto be the matrix in R11×11 but with nonzero entries

on Pj only.

2.2 Operator CompressionAs mentioned in the Section 1.1, inspired by the Finite Element Method (FEM)approach for solving partial differential equation (PDE) in which the variational for-mulation naturally gives the energy decomposition of the operator, we adopt a similarstrategy of FEM to find a subspace Φ approximating the solution space of a linearsystem involving A that are energy decomposable. In particular, approximation ofA−1 can also be obtained.

For traditional FEM, the accuracy of these approximations relies on the regularityof the given coefficients. Without assuming any smoothness on coefficients, onepromising way to approximate the operator is to consider projecting the operatorinto the modified subspace Ψ = Φ − PA

UΦ in [75], or Ψ = A−1(Φ) as proposed in[53, 88]. Here U = (Φ)⊥ is the l2-orthogonal complement space and PA

U is the A-orthogonal projection operator. In the case where A is invertible, these two modifiedspaces are equivalent. Therefore, we propose to employ a similar methodology forcompressing a general symmetric, PD matrix A.

We first obtain a general error estimate for projecting the matrix A into a subspaceΨ = A−1(Φ) ofRn, given the projection type approximation property of the subspaceΦ. With this observation, the operator compression problem is narrowed down intochoosing an appropriate Φ which satisfies condition (2.11). The following theoremalso gives us a general idea on how we can control the errors introduced during thecompression of the operator A.

Page 39: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

29

Theorem 2.2.1. Let Φ be a subspace of Rn, and PΦ be the orthogonal projectiononto Φ with respect to 〈·, ·〉2. Let Ψ be the subspace of Rn given by Ψ = A−1(Φ),and PA

Ψbe the orthogonal projection onto Ψ with respect to 〈·, ·〉A. If

‖x − PΦx‖2 ≤ ε ‖x‖A, ∀x ∈ Rn, (2.11)

for some ε > 0, then

1. For any x ∈ Rn, and b = Ax, we have

‖x − PAΨ

x‖A ≤ ε ‖b‖2. (2.12)

2. For any x ∈ Rn, and b = Ax, we have

‖x − PAΨ

x‖2 ≤ ε2‖b‖2. (2.13)

3. We have‖A−1 − PA

ΨA−1‖2 ≤ ε2. (2.14)

Proof. 1. Let y = A−1(PΦb) ∈ Ψ, then

‖x − y‖2A = (x − y, A(x − y)) = (x − y − PΦ(x − y), b − PΦb)

≤ ‖x − y − PΦ(x − y)‖2‖b − PΦb‖2 ≤ ε ‖x − y‖A‖b‖2,

and thus we have

‖x − PAΨ

x‖A ≤ ‖x − y‖A ≤ ε ‖b‖2.

2. Let z = A−1(x − PAΨ

x), then

‖x − PAΨ

x‖22 = (x − PAΨ

x, x − PAΨ

x) = (x − PAΨ

x, Az)

= (x − PAΨ

x, z − PAΨ

z)A ≤ ‖x − PAΨ

x‖A‖z − PAΨ

z‖A

≤ ε ‖x − PAΨ

x‖A‖Az‖2,

and thus‖x − PA

Ψx‖2 ≤ ε ‖x − PA

Ψx‖A ≤ ε2‖b‖2.

3. Immediate result of 2.

Page 40: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

30

Interchangeably, we will use Φ and Ψ to denote the basis matrix of the space Φ andΨ respectively, such that ΦTΦ = IN and Ψ = A−1ΦT . Here N is the dimension ofΦ, and T is some N × N nonsingular matrix to be determined. Then we have

PAΨ= Ψ(ΨT AΨ)−1ΨT A = A−1Φ(ΦT A−1Φ)−1ΦT, (2.15)

which is the A-orthogonal projectionmatrix into the subspaceΨ. The correspondingcompressed approximation of A−1 is given by

PAΨ

A−1 = Ψ(ΨT AΨ)−1ΨT . (2.16)

In [75], Målqvist and Petersein proposed the use of modified coarse space in orderto handle roughness of coefficients when solving elliptic equations with FEM.Assuming that the finite elements are conforming and if we see Φ as the originalcoarse space VH in [75], then Ψ is exactly the modified coarse space V ms

H as theyproposed, and the first error estimate in Theorem 2.2.1 is consistent to their erroranalysis. More generally, Owhadi in [89] makes use of the Gamblet framework toconstruct the basis of modified coarse space such that the conforming propertiesof that basis is no longer required. In particular, (2.15) is an analogy of ΨΦ =KΨT (ΦKΦT )−1Φ in page 9 of [88] and the error estimate in (2.12) is correspondingto Proposition 3.6 in that paper.

As a FEM type method, the choice of Φ determines the operator compression erroror the solution approximation error. We know that the optimal rank-N approximatorof A−1 is given by taking Φ to be the eigenspace of A corresponding to the first N

smallest eigenvalues, which is essentially the Principal Component Analysis (PCA)[55]. And the optimal compression error is given by ε2 = (λN+1(A))−1. Thoughwith optimal approximation property, the drawback of the PCA is nonnegligible inthat the eigenvectors of A are almost always dense even when A has strong localproperties. While the sparse PCA [34, 80, 137] provides a strategy to obtain a sparseapproximation of A−1, it implicitly assumes that the operators inherit a low rankcharacteristics such that l1 minimization approach is effective. To fully use the localproperties of A, we would prefer to choose Φ that can be locally computed but stillhas good approximation property, namely satisfying condition (2.11) with a prettygood error ε and a nearly optimal dimension N . Also we hope the a priori errorbound ε can be estimated locally.

Indeed when solving elliptic PDEs using FEM, the nodal basis can be chosen as(discretized) piece-wise polynomials with compact local supports, and the error

Page 41: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

31

is given by the resolution of the partition of the computational domain [8, 19].However, such choices of partition of the computational domain do not dependon the operator A in traditional FEM. Yet it only depends on the geometry of thecomputational domain, and thus the performance relies on the regularity of A. Soa natural question arises: can we do better if we choose the partition and the nodalbasis using the local information of A?

Furthermore, what can we do if we do not have a priori the computational domain?Such scenario arises, for example, when the underlying geometry of some operatorsA, like graph Laplacian, is unknown and no embedding maps to the physical domaincan be found easily. In this case, one of the promising ways to accomplish such tasklies in the deep connection between our energy representation of the operator A andits hidden geometric structure. More specifically, the energy decomposition of theoperator introduced in Section 2.1 reveals the intrinsic locality of the underlyinggeometry in an algebraic way, so that we can construct an optimal partition ofthe computational space and choose a proper subspace/basis Φ using only localinformation.

After constructing the partition and the basis Φ, the next mission is to find a goodbasis Ψ of the space A−1(Φ). The choice of Ψ serves to preserve the locality of thestiffness matrix Ast = Ψ

T AΨ inherited from A, and to give a reasonable bound onthe condition number of Ast.

Algorithm 1 Operator CompressionInput: Energy decomposition E, underlying basisV , desire accuracy ε1: Construct partition P subject to ε using Algorithm 4;2: Construct Φ using Algorithm 2;3: Construct Ψ using Algorithm 3 subject to ε ;4: Compute PA

ΨA−1 = Ψ(ΨT AΨ)−1ΨT as the compressed operator;

In summary, as mentioned in Section 1.1, our approach is to (i) construct a partitionof the computational space/basis using local information of A; (ii) constructΦ that islocally computable in each patch of the partition and satisfies error condition (2.11);(iii) construct Ψ that provides stiffness matrix Ast with locality and reasonablecondition number. The whole process can be summarized as the Algorithm 1, andeach step will be discussed in following sections.

But to theoretically develop our approach, we first assume that we are given animaginary partition P, and then derive proper constructions of Φ and Ψ serving the

Page 42: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

32

desired purposes based on this partition. In the derivation process, we come up withsome desired conditions that will, in return, guide us how to construct the adaptivepartition P with the desirable properties.

2.2.1 Choice of ΦAs discussed in the last subsection, the underlying geometry of the operator maynot be given. Therefore determining Φ which archives the condition (2.11) is nota trivial task. Instead of tackling this problem directly, the following propositionprovides us a more apparent and local criterion on choosing Φ.

Proposition 2.2.2. Let P = Pj Mj=1 be a partition of V , and APj

Mj=1 be thecorresponding interior energies as defined in Definition 2.1.3. For each 1 ≤ j ≤ M ,let Φ j be some subspace of spanPj such that

‖x − PΦ j x‖2 ≤ ε ‖x‖APj, ∀x ∈ spanPj , (2.17)

for some constant ε . Then we have

‖x − PΦx‖2 ≤ ε ‖x‖A, ∀x ∈ Rn, (2.18)

where Φ =⊕

j Φ j .

Proof. Since P = Pj Mj=1 is a partition ofV , we have PΦ =

∑j PΦ j , and thus

‖x − PΦx‖22 = ‖∑

j

(Pj x − PΦ j x)‖22 =∑

j

‖Pj x − PΦ j x‖22 ≤ ε

2∑

j

‖Pj x‖2APj

.

Notice that ∑j

‖Pj x‖2APj

=∑

j

‖x‖2APj

= ‖x‖2A ≤ ‖x‖2A,

and the conclusion follows.

Intuitively, given a partition P ofV , we can constructΦ locally by choosingΦ j thatsatisfies (2.17) for each Pj . Apparently, the choice of Φ j depends on the partitionP and the feasibility of this problem is guaranteed since we can always set P = Vto fulfill (2.17). But this choice is not optimal. We should adaptively choose P andΦ in such a way that it minimizes N , the dimension of Φ.

Suppose we are given the partition P = Pj Mj=1. Then minimizing N is equivalent

to minimizing the dimension of each Φ j . In the following, we will first definethe notion of interior spectrum of the interior energy AS for a subset S ⊂ V .Lemma 2.2.4 will then show the relationship between the interior spectrum and theminimum dimension that can be achieved for each Φ j .

Page 43: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

33

Definition 2.2.3 (Interior Spectrum). Let S be a subset of V . We define theinterior spectrum λ j (S, A)sj=1 as the set of eigenvalues of A

S, where s = #S =

dim(spanS) and ASis the interior energy of S with respect to A (as an operator

restricted to the space spanS).

In what follows, since A is generally given and fixed, we will write λ j (S; A) asλ j (S). Also we will always assume the ordering λ1(S) ≤ λ2(S) ≤ · · · ≤ λs (S).

Lemma 2.2.4. Given a set S ⊂ V and a constant ε , let q(ε ) be the smallestinteger such that 1

ε2≤ λq(ε )+1(S). Also define G(ε ) = Θ ⊂ spanS : ‖x −

PΘx‖2 ≤ ε ‖x‖AS, ∀x ∈ spanS, and let p(ε ) = minΘ∈G(ε ) dimΘ. Then we have

q(ε ) = p(ε ).

Proof. Let Φk ⊂ spanS denote the eigenspace of AS(as an operator restricted

to spanS) corresponding to interior eigenvalues λ1(S) ≤ λ2(S) ≤ · · · ≤ λk (S).On the one hand, for all x ∈ spanS, we have

‖x‖2AS≥ ‖x − PΦq (ε) x‖2A

S

= (x − PΦq (ε) x)T AS

(x − PΦq (ε) x) ≥ λq(ε )+1(S)‖x − PΦq (ε) x‖22,

which is‖x − PΦq (ε) x‖2 ≤

1√λq(ε )+1(S)

‖x‖AS≤ ε ‖x‖A

S.

Thus Φq(ε ) ∈ G(ε ), q(ε ) ≥ p(ε ). On the other hand, assume that the minimump(ε ) is achieved by some space Θ, then one can check that

λp(ε )+1 = maxΘ⊂spanSdimΘ=p(ε )

minx∈spanS

‖x − PΘx‖2AS

‖x − PΘx‖22

≥ minx∈spanS

‖x − PΘ

x‖2AS

‖x − PΘ

x‖22

= minx∈spanS

‖x − PΘ

x‖2AS

‖x − PΘ

x − PΘ

(x − PΘ

x)‖22

(since P

Θ(x − P

Θx) = 0

)= min

y=x−PΘ

xx∈spanS

‖y‖2AS

‖y − PΘy‖22

≥ miny∈spanS

‖y‖2AS

‖y − PΘy‖22≥

1ε2,

Page 44: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

34

which implies p(ε ) ≥ q(ε ) by the definition of q(ε ). Therefore, we have p(ε ) =q(ε ).

By Lemma 2.2.4, one optimal way to minimize dimΦ j for each Pj subject tocondition (2.17), is to take Φ j = Φ

qj (ε )j , the eigenspace corresponding to interior

eigenvalues λ1(Pj ) ≤ λ2(Pj ) ≤ · · · ≤ λqj (ε ) (Pj ), where qj (ε ) is the smallest integersuch that 1

ε2≤ λqj (ε )+1(Pj ). Recall that this criterion for choosing Φ j is based on

the fact that the partition P is given. Then one shall ask a more practical question:how do we construct an “optimal” partition P, in the sense that it has a smallesttotal dimension of Φ?

Instead of answering this question directly, we consider the problem in a moretractable way. We fix an integer q, and choose a q-dimensional local space Φ j foreach Pj . Then the problem of minimizing dimΦ subject to the condition (2.17)is reduced to finding a partition P = Pj

Mj=1 with a minimal patch number. Still

guided by Lemma 2.2.4, we know that we should chooseΦ j = Φqj , and the condition

(2.17) is satisfied if and only if 1ε2≤ λq+1(Pj ) for each Pj .

Definition 2.2.5 (Error factor). Let P = Pj Mj=1 be a partition of V . The error

factors of P are defined as

ε(Pj, q) =1√

λq+1(Pj ), 1 ≤ j ≤ M, and ε(P, q) = max

j

1√λq+1(Pj )

.

(2.19)

Therefore, given a constant ε , we need to minimize the patch number of P subjectto ε(P, q) ≤ ε .

Construction 2.2.6 (Construction of Φ). We choose Φ =⊕M

j=1Φqj , where Φ

qj ⊂

spanPj is the eigenspace corresponding to the first q interior eigenvalues of patchPj . We also require (Φq

j )TΦqj = Iq, i.e. ΦTΦ = IN . Then the condition (2.17) is

satisfied if ε(P, q) ≤ ε .

Guided by Construction 2.2.6, we propose Algorithm 2 to construct Φ. Notice thatit also computes the compliment space Uj of Φ j in each span(Pj ), which will servefor the purpose of performing MMD in Section 3.1.

Remark 2.2.7.

Page 45: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

35

- The construction of Uj can be implicitly done, for example, by extending Φ j

to an orthonormal basis of span(Pj ) with local QR factorization, where onlyq Householder vectors [h1, h2, · · · , hq] need to be stored. In fact, we canapply economic QR factorization toΦ j to obtain (I − h1hT

1 )(I − h2hT2 ) · · · (I −

hqhTq ) = [Q j,Uj] where [Q j,Uj] is orthogonal and Span(Φ j ) = Span(Q j ).

In following algorithms there are only two kinds of operation that involveUj , namely UT

j x for some x ∈ Rs and Uj x for some x ∈ Rs−q. The formerone can be done by computing y = (I − hqhT

q ) · · · (I − h2hT2 )(I − h1hT

1 )x

and then taking the last s − q entries of y; the latter one can be done byextending x to x = [0, x] with additional q 0s in front and then computing(I − h1hT

1 )(I − h2hT2 ) · · · (I − hqhT

q ) x.

- The integer q is given before the partition is constructed, and the choice of q

will be discussed in Section 2.3.

Algorithm 2 Construction of ΦInput: Energy decomposition E, partition P subject to ε(P, q) ≤ ε .1: for each Pj ∈ P do2: Extract APj

from E;3: Find the first q normalized eigenvectors of APj

as Φ j ;4: Find Uj such that [Φ j,Uj] is an orthonormal basis of span(Pj );5: end for6: Collect all Φ j as Φ, and all Uj as U .

Complexity of Algorithm 2For simplicity, we assume that all patches in partition P have the same patch size s.Then number of patches is #P = n

s . Let F (s) denote the local patch-wise complexityof solving partial eigen problem and extending Φ j to [Φ j,Uj]. Then the complexityof Algorithm 2 is

O(F (s)

s· n). (2.20)

2.2.2 Choice of ΨSuppose that we have determined the space Φ = [ϕ1, ϕ2, · · · , ϕN ], the next step isto find Ψ = [ψ1, ψ2, · · · , ψN ] = A−1ΦT , namely to determine T , so that

1. each ψi is locally computable, or can be approximated by some ψi that islocally computable;

Page 46: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

36

2. the stiffness matrix Ast = ΨT AΨ has relatively small condition number, or the

condition number can be bounded by some local information.

Generally each A−1φi is not local (sparse), so it may be impossible to find even oneψ ∈ spanA−1Φ that is locally computable. A more promising idea is to find ψ thatcan be well approximated by some ψi which is locally computable.

Lemma 2.2.8. Assume that Ψ = [ψ1, ψ2, · · · , ψN ] satisfies ‖x − PAΨ

x‖A ≤ ε ‖Ax‖2and ‖A−1st ‖2 ≤ ‖A−1‖2, and that Ψ = [ψ1, ψ2, · · · , ψN ] satisfies ‖ψi−ψi‖A ≤

Cε√N, 1 ≤

i ≤ N for some constant C. Then we have

1. For any x ∈ Rn, and b = Ax, we have

‖x − PAΨ

x‖A ≤ (1 + C‖A−1‖2)ε ‖b‖2.

2. For any x ∈ Rn, and b = Ax, we have

‖x − PAΨ

x‖2 ≤ (1 + C‖A−1‖2)2ε2‖b‖2.

3. We have‖A−1 − PA

ΨA−1‖2 ≤ (1 + C‖A−1‖2)2ε2.

Proof. We only need to prove property 1, properties 2 and 3 follow by using thesame argument as in Theorem 2.2.1. Recall that we have

‖x − PAΨ

x‖A = ‖x − Ψc‖A ≤ ε ‖b‖2,

with c = A−1st ΨT Ax. Let y1 = Ψc =

∑Ni=1 ciψi, y2 = Ψc =

∑Ni=1 ciψi. Then we have

‖y1−y2‖A =

N∑i=1

ci (ψi − ψi) A

N∑i=1|ci |‖ψi−ψi‖A ≤

Cε√

N√

N

( N∑i=1

c2i) 12 = Cε

√cT c.

Notice thatcT c = xT AΨA−2st Ψ

T Ax ≤ ‖A12ΨA−2st Ψ

T A12 ‖2‖x‖2A,

‖A12ΨA−2st Ψ

T A12 ‖2 = ‖A−1st Ψ

T AΨA−1st ‖2 = ‖A−1st ‖2,

‖x‖2A = bT A−1b ≤ ‖A−1‖2‖b‖22,

therefore we get

‖y1 − y2‖A ≤ Cε√‖A−1st ‖2‖A−1‖2‖b‖2 ≤ Cε ‖A−1‖2‖b‖2.

Page 47: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

37

Then we have

‖x−y2‖A ≤ ‖x−y1‖A+‖y1−y2‖A ≤ ε ‖b‖2+Cε ‖A−1‖2‖b‖2 = (1+C‖A−1‖2)ε ‖b‖2.

Since y2 ∈ spanΨ, we obtain

‖x − PAΨ

x‖A ≤ ‖x − y2‖A ≤ (1 + C‖A−1‖2)ε ‖b‖2.

Guided by Lemma 2.2.8, in order to preserve the compression accuracy, we requirethat each ψi be approximated accurately in energy norm ‖ · ‖A by some ψi that islocally computable. To implement this idea, we consider the problem reversely.Suppose we already have some ψi that is locally computable, so the construction ofΨ is to find ψi ∈ A−1(Φ) so that ‖ψi − ψi‖A is small for each i. Since ψi is given,minimizing ‖ψi − ψi‖A can be simply solved by taking ψi = PA

Ψψi. Thanks to the

expression PAΨ= A−1Φ(ΦT A−1Φ)−1ΦT , we can perform the energy projection PA

Ψ

as long as we know Φ. Therefore we have

Ψ = PAΨΨ = A−1Φ(ΦT A−1Φ)−1ΦT

Ψ, (2.21)

=⇒ ΦTΨ = ΦT A−1Φ(ΦT A−1Φ)−1ΦT

Ψ = ΦTΨ. (2.22)

Then we shall discuss how to describe the locality of each ψi. Similar to the localityof Φ, though seems greedy, we can also require that ψi ∈ spanPji for some ji, andthis requirement implies that ϕT

i′ψi = 0, for all ϕi′ < spanPji . Then to determineΦTΨ = ΦT Ψ, we still need to determine ϕT

i′ψi for each ϕi′ ∈ spanPji . But actually,in the following proof of exponential decay of ψi, we can see that the value of ϕT

i′ψi

for each ϕi′ ∈ spanPji does not essentially change the decay property of ψi. Weonly need to make sure that Ψ has the same dimension as Φ. So for simplicity, werequire that

ϕTi′ψi = δi′,i, 1 ≤ i′ ≤ N, i.e. Φ

TΨ = ΦT

Ψ = IN . (2.23)

Adding this extra localization constraint to the form of Ψ = A−1ΦT , we can chooseΨ as follows:

Construction 2.2.9 (Construction ofΨ). We chooseΨ = A−1ΦT so thatΦTΨ = IN ,that is

Ψ = A−1Φ(ΦT A−1Φ)−1, T = (ΦT A−1Φ)−1, (2.24)

Page 48: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

38

and we haveAst = Ψ

T AΨ = (ΦT A−1Φ)−1. (2.25)

Remark 2.2.10. Our choice of Ψ is inspired by the result proposed by Owhadi in[88], where the author obtained the same format ofΨ from amarvelous probabilisticperspective. In this work, the idea of Gamblet Transformation is introduced. Suchtransformation gives a particular choice of basis in the modified coarse space, whichensures the exponential decay feature of Ψ. Our derivation of the choice of Ψ canbe seen as a algebraic interpretation of Owhadi’s probabilistic construction.

Though we construct each ψi from some local vector ψi, the error ‖ψi − ψi‖A isnot necessarily small. To have both good locality and small error, we need to usesomething in between. The following lemma (see also Section 3.2 in [88]) showsthat the construction of Ψ in (2.24) is equivalent to the optimizer of a minimizationproblem.

Lemma 2.2.11. Let Ψ be constructed as in (2.24). Then for each i, ψi satisfies

ψi = argminx∈Rn ‖x‖A,

subject to ϕTi′x = δi′,i, ∀ i′ = 1, . . . , N .

Proof. Notice that Aψi ∈ Φ, thus for any x that satisfies ϕTi′x = δi′,i, we have

ΦT (x − ψi) = 0, and ψTi A(x − ψi) = 0. Then we have

‖x‖2A = ‖x − ψi + ψi‖2A = ‖ψi‖

2A + ‖x − ψi‖

2A + 2ψ

Ti A(x − ψi) ≥ ‖ψi‖

2A. (2.26)

By the construction given in (2.24) and guided by Lemma 2.2.11, we can obtainevery ψi by solving the optimization problem. Our next step is to make use of thisminimal property to construct local ψi that will be proved exponentially convergentto ψi.

Definition 2.2.12 (Layers of neighbors). Let P = Pj Mj=1 be a partition ofV . For

any Pj ∈ P, we recursively define S0(Pj ) = Pj , and

Sk+1(Pj ) =⋃

Pj ′∼Sk (Pj )

Pj ′, k = 0, 1, 2, · · · . (2.27)

Page 49: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

39

Sk (Pj ) is called the kth neighbor patch ball of Pj , and Sk (Pj )/Sk−1(Pj ) the kthneighbor patch layer of Pj .

Remark 2.2.13. By making use of the notion of neighboring introduced in Defi-nition 2.1.2, we can construct the “algebraic neighbor layers” starting from anyinitial patch Pj . Still we do not implicitly assume any underlying physical domainto the operator A.

Definition 2.2.14 (Local approximator). For each ψi, let Pji be the patch such thatϕi ∈ Φ ji ⊂ spanPji . Then for each k ≥ 0, we define the k-local approximator ofψi as

ψki = argmin

x∈spanSk (Pji)‖x‖A,

subject to ϕTi′x = δi′,i, ∀ i′ = 1, . . . , N .

(2.28)

Remark 2.2.15. Here k is called the radius of ψki . The condition ϕT

i′ψki = δi′,i is

equivalent to ΦTψki = Φ

Tψi. By Lemma 2.2.11 and the definition of ψki , we have

(ψki − ψi)T Aψi = 0, (ψk−1

i − ψki )T Aψk

i = 0, ∀k, (2.29)

and hence

‖ψki ‖

2A = ‖ψi‖

2A + ‖ψ

ki − ψi‖

2A, (2.30)

‖ψk−1i ‖2A = ‖ψ

ki ‖

2A + ‖ψ

k−1i − ψk

i ‖2A. (2.31)

Definition 2.2.16 (Condition factor). Let P = Pj Mj=1 be a partition of V , and let

A−1Pj

denote the inverse of APj as an operator restricted on spanPj . The conditionfactors are defined as

δ(Pj,Φ j ) = maxx∈Φ j

xT x

xT A−1Pj

x, 1 ≤ j ≤ M, and δ(P,Φ) = max

Pj∈Pδ(Pj,Φ j ).

(2.32)

Remark 2.2.17.

- In what follows, since we always fix a choice of Φ for a partition P, we willsimply use δ(Pj ) and δ(P) to denote δ(Pj,Φ j ) and δ(P,Φ) respectively. Inparticular, when we use the Construction 2.2.6 for Φ with some integer q, wecorrespondingly use the notations δ(P, q).

Page 50: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

40

- If we follow the construction ΦTjΦ j = Iqj , where qj is the dimension of Φ j ,

then we have

δ(Pj,Φ j ) = maxc∈Rq j

cTΦTjΦ jc

cTΦTj A−1PjΦ jc= max

c∈Rq j

cT c

cTΦTj A−1PjΦ jc= ‖(ΦT

j A−1PjΦ j )−1‖2,

(2.33)that is

(ΦTj A−1PjΦ j )−1 δ(Pj )Iqj δ(P)Iqj . (2.34)

Moreover, by block-wise inequalities we have

(ΦT (M∑

j=1APj )

−1Φ)−1 δ(P)IN . (2.35)

This analysis will help us to bound the maximum eigenvalue of the stiffnessmatrix Ast = Ψ

T AΨ by δ(P).

Example 2.2.18. In this example, we consider the operator A to be the discretizationof 2-D second-order elliptic operator by standard 5-point Finite Difference scheme.Similar to the case of graph Laplacian in Example 2.1.10, we have a natural energydecomposition inherited from the assembling of such discretization. Specifically,for every pair of vertices ehori := [(i, j), (i, j + 1)] and evert := [(i, j), (i + 1, j)] inthe finite difference grid, the energy elements are

Ei jhori = −

1|ehori |

2

(i, j) (i, j + 1)

*..........,

+//////////-

0ai, j+ 1

2−ai, j+ 1

2(i, j)

. . .

−ai, j+ 12

ai, j+ 12

(i, j + 1)

0

, and

Ei jvert = −

1|evert |

2

(i, j) (i + 1, j)

*..........,

+//////////-

0ai+ 1

2 , j−ai+ 1

2 , j(i, j)

. . .

−ai+ 12 , j

ai+ 12 , j

(i + 1, j)

0

.

Now suppose we are given a partition P, we focus on a particular local patch Pj tostudy how mesh size and contrast affect the error factor and the condition factor.Figure 2.2a shows the high-contrast field (colored in black) in Pj . For simplicity,we set Pj as a square domain with fixed length H = |ehori | = |evert |. We also

Page 51: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

41

set the coefficient in high-contrast field to be 103 (and 1 otherwise). Figure 2.2bshows the decreasing trend of the error factor ε (Pj, 1) as the vertex number #V

inside the patches (i.e. the vertex density in Pj) increases. Here we choose q =

1 for illustration purpose. Figure 2.2c shows a similar decreasing trend of thecondition factor δ(Pj, 1) and Figure 2.2d plots ε (Pj, 1)2 ·δ(Pj, 1) versus #V . Fixingthe vertex number #V in the patch Pj , we also study the relationship of ε (Pj, 1),δ(Pj, 1) and the contrast. In particular, we double the contrast by 2 in each singlecomputation and investigate the trend of ε (Pj, 1) and δ(Pj, 1). Figure 2.2e shows thedecrease of ε (Pj, 1) as contrast increases. For δ(Pj, 1), although it also increasesas the contrast increases, we can clearly see that there is an upper bound (around220 in this example), even when the contrast jumps up to 220. Figure 2.2g plotsε (Pj, 1)2 · δ(Pj, 1) versus contrast.

(a)

5 10 15Vertex number#104

0.2384

0.2386

0.2388

0.239

0.2392

0(P,1) " h-1

(b)

5 10 15Vertex number #104

211

212

213

214

/(P) " h2

(c)

5 10 15Vertex number#104

12

12.05

12.1

12.15

12.2

12.25

/(P) " 02(P,1)

(d)

5 10 15 20Contrast (log2)

0.241

0.242

0.243

0.244

0.245

0.246

0(P,1) " h-1

(e)

5 10 15 20Contrast (log2)

140

160

180

200

220

/(P) " h2

(f)

5 10 15 20Contrast (log2)

8

9

10

11

12

/(P) " 02(P,1)

(g)

Figure 2.2: An example showing the relationship between mesh size, the errorfactor ε (Pj, 1), the condition factor δ(Pj, 1) and contrast.

The following theorem shows the scaling properties of ψi, ψki under Construc-

tion 2.2.9 and Definition 2.2.14, which will help to prove the exponential decay ofthe basis function ψi.

Theorem 2.2.19. For each ψi, we have

‖ψi‖A ≤ ‖ψki ‖A ≤ ‖ψ

0i ‖A ≤

√δ(Pji ). (2.36)

Page 52: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

42

Proof. ‖ψi‖A ≤ ‖ψki ‖A ≤ ‖ψ

0i ‖A has been proved in the construction of local

approximators. We only need to prove ‖ψ0i ‖A ≤

√δ(Pji ). Recall that ψ0

i is definedas

ψ0i = argmin

x∈spanPji

‖x‖A,

subject to ϕTi′x = δi′,i, ∀ i′ = 1, . . . , N .

Without loss of generality, we can assume that ϕi is the first column of Φ ji . Andnotice that ‖x‖A = ‖x‖APji

, therefore the optimization formation can be rewrittenas

ψ0i = argmin

x∈spanPji

‖x‖APji,

subject to ΦTji

x = zi,(2.37)

where zi = (1, 0, 0, · · · , 0)T ∈ Rqji . This optimization problem can be uniquely andexplicitly solved as

ψ0i = A−1Pji

Φ ji (ΦTji A−1Pji

Φ ji )−1zi, (2.38)

where again A−1Pjidenotes the inverse of APji

as an operator restricted to spanPji .And thus we have

‖ψ0i ‖

2A = ‖ψ

0i ‖

2APji

= zTi (ΦT

ji A−1PjiΦ ji )

−1zi . (2.39)

Notice that

APji APji

⇒ A−1Pji A

−1Pji⇒ Φ

Tji A−1Pji

Φ ji ΦTji A−1PjiΦ ji

⇒ (ΦTji A−1PjiΦ ji )

−1 (ΦTji A−1Pji

Φ ji )−1,

since APjiand APji

are both symmetric, positive definite as operators restricted tospanPji . Therefore by (2.35) we have

‖ψ0i ‖

2A ≤ zT

i (ΦTji A−1PjiΦ ji )

−1zi ≤ ‖(ΦTji A−1PjiΦ ji )

−1‖2‖zi‖22 = δ(Pji ). (2.40)

Compliment Space

For each Pj , without causing any ambiguity, we use Uj interchangeably to denoteboth the orthogonal compliment of Φ j with respect to spanPj , or an orthonormalbasis matrix ofUj . Namely we haveUj ⊂ spanPj , and ΦT

j Uj = 0. Then we define

α(Pj ) = maxx∈Uj

xT APj x

xT APjx, 1 ≤ j ≤ M, and α(P) = max

Pj∈Pα(Pj ). (2.41)

Page 53: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

43

Remark 2.2.20.

- If we choose Φ j so that it satisfies condition (2.17), then we have

xT APjx = ‖x‖2APj

≥1ε2‖x − PΦ j x‖

22 =

1ε2‖x‖22, ∀x ∈ Uj,

and xT APj x ≤ ‖APj ‖2‖x‖22 . Thus α(Pj ) ≤ ε2‖APj ‖2. This argument is

meant to show that we can have α(P) < +∞ if we choose P and Φ properly.But this bound is not tight, as α(P) can be much smaller in general.

- An immediate result of the definition of α(Pj ) is that

UTj APjUj α(Pj )UT

j APjUj, ∀ j .

The following theorem shows that the local basis function ψ_i^k converges exponentially to ψ_i as its support S_k(P_{j_i}) extends (that is, as k increases). Indeed, the exponential decay of ψ_i has been proved in [53, 75, 88, 89] in different manners, based on the common observation that the energy of ψ_i in the region beyond a certain single layer of patches is comparable to its energy on that layer alone, which reflects the local interaction feature of the operator A itself. Based on this observation, we modify the proof in Section 6 of [89] using a matrix framework coherent with our energy setting.

Theorem 2.2.21 (Exponential decay). For each ψ_i, we have

‖ψ_i^k − ψ_i‖²_A ≤ ((α(P) − 1)/α(P))^k ‖ψ_i^0 − ψ_i‖²_A ≤ ((α(P) − 1)/α(P))^k δ(P_{j_i}). (2.42)

Proof. For simplicity, we will write ψ_i as ψ, ψ_i^k as ψ^k, P_{j_i} as P, and S_k(P_{j_i}) as S_k. Let Y_k denote the joint space of all U_j such that P_j ⊂ S_k, and Z_k the joint space of all U_j such that P_j ⊂ V\S_k. We still use Y_k, Z_k as the basis matrices for the spaces Y_k, Z_k, so that each U_j is a bunch of columns of either Y_k or Z_k. We use U to denote Y_∞. Notice that we can always arrange the U_j in a particular order so that the matrix form U = [Y_k, Z_k] holds. We define

r^k = ψ^k − ψ, k ≥ 0; w^k = ψ^{k−1} − ψ^k, k ≥ 1, (2.43)

then according to (2.31) we have

‖r^{k−1}‖²_A = ‖r^k‖²_A + ‖w^k‖²_A. (2.44)


Since Φ^T(ψ^k − ψ) = Φ^T(ψ^{k−1} − ψ^k) = 0, we have r^k ∈ U and w^k ∈ Y_k. Then, by the minimal properties of ψ^k and ψ, we actually have

r^{k−1} = P^A_U ψ^{k−1} = U (U^T A U)^{−1} U^T A ψ^{k−1}, (2.45)

w^k = P^A_{Y_k} ψ^{k−1} = Y_k (Y_k^T A Y_k)^{−1} Y_k^T A ψ^{k−1} = Y_k (Y_k^T Ā_{S_k} Y_k)^{−1} Y_k^T Ā_{S_k} ψ^{k−1}. (2.46)

By the definition of S_k, we know that S_{k−1} ≁ V\S_k, and therefore Z_k^T A ψ^{k−1} = 0. Then we get

‖r^{k−1}‖²_A = ψ^{k−1,T} A U (U^T A U)^{−1} U^T A ψ^{k−1} = ψ^{k−1,T} A [Y_k, 0] (U^T A U)^{−1} [Y_k, 0]^T A ψ^{k−1}.

Due to the locality of A_{P_j} and Ā_{P_j}, we obtain

U^T A U ⪰ U^T ( ∑_{j=1}^M A_{P_j} ) U = ∑_{j=1}^M diag( 0, U_j^T A_{P_j} U_j, 0 ) ⪰ (1/α(P)) ∑_{j=1}^M diag( 0, U_j^T Ā_{P_j} U_j, 0 ).

As a simple inference of Proposition 2.1.9, we have

∑_{P_j ⊂ S_k} Ā_{P_j} ⪰ Ā_{S_k}, ∑_{P_j ⊂ V\S_k} Ā_{P_j} ⪰ Ā_{V\S_k},

and therefore

∑_{j=1}^M diag( 0, U_j^T Ā_{P_j} U_j, 0 ) = diag( Y_k^T ( ∑_{P_j ⊂ S_k} Ā_{P_j} ) Y_k, Z_k^T ( ∑_{P_j ⊂ V\S_k} Ā_{P_j} ) Z_k ) ⪰ diag( Y_k^T Ā_{S_k} Y_k, Z_k^T Ā_{V\S_k} Z_k ).

Combining all the results above, we have

(U^T A U)^{−1} ⪯ α(P) diag( Y_k^T Ā_{S_k} Y_k, Z_k^T Ā_{V\S_k} Z_k )^{−1} = α(P) diag( (Y_k^T Ā_{S_k} Y_k)^{−1}, (Z_k^T Ā_{V\S_k} Z_k)^{−1} ),

and thus

‖r^{k−1}‖²_A ≤ α(P) ψ^{k−1,T} A [Y_k, 0] diag( (Y_k^T Ā_{S_k} Y_k)^{−1}, (Z_k^T Ā_{V\S_k} Z_k)^{−1} ) [Y_k, 0]^T A ψ^{k−1}
 = α(P) ψ^{k−1,T} A Y_k (Y_k^T Ā_{S_k} Y_k)^{−1} Y_k^T A ψ^{k−1}
 = α(P) ψ^{k−1,T} Ā_{S_k} Y_k (Y_k^T Ā_{S_k} Y_k)^{−1} Y_k^T Ā_{S_k} ψ^{k−1}
 = α(P) ‖w^k‖²_A.


This gives us

‖r^{k−1}‖²_A = ‖r^k‖²_A + ‖w^k‖²_A ≥ ‖r^k‖²_A + (1/α(P)) ‖r^{k−1}‖²_A ⟹ ‖r^k‖²_A ≤ ((α(P) − 1)/α(P)) ‖r^{k−1}‖²_A.

Applying this recursively, we have

‖ψ^k − ψ‖²_A ≤ ((α(P) − 1)/α(P))^k ‖ψ^0 − ψ‖²_A. (2.47)

Notice that ‖ψ^0 − ψ‖²_A = ‖ψ^0‖²_A − ‖ψ‖²_A ≤ ‖ψ^0‖²_A ≤ δ(P), and this completes our proof.

Remark 2.2.22.

- Recall that in Lemma 2.2.8, to make a k-layer approximator Ψ^k a good approximator, we need ‖ψ_i − ψ_i^k‖_A ≤ Cε/√N for some constant C, and thus Theorem 2.2.21 guides us to choose

k = O(log(1/ε) + log N + log δ(P)). (2.48)

The locality of ψ_i^k lies in the local connection property of the matrix A. (A small numerical illustration of this choice of k is sketched after this remark.)

- By the definition in Equation (2.41), α(P) is locally scaling invariant. Therefore the layer-wise decay rate is unchanged when A is locally multiplied by some scaling constant.
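To see (2.48) in action, the snippet below solves ((α(P) − 1)/α(P))^k · δ(P) ≤ ε²/N for the smallest integer k. All numerical values are illustrative only and are not taken from the thesis.

import math

def layers_needed(alpha, delta_P, eps, N):
    """Smallest k with ((alpha-1)/alpha)^k * delta_P <= eps**2 / N,
    i.e. the localization radius suggested by Theorem 2.2.21 and (2.48)."""
    rate = (alpha - 1.0) / alpha
    return math.ceil(math.log(eps**2 / (N * delta_P)) / math.log(rate))

print(layers_needed(alpha=1.25, delta_P=1e4, eps=1e-2, N=1000))  # prints 16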

Corollary 2.2.23. For any ψ_i, the interior energy of ψ_i on V\S_k(P_{j_i}) = S_k^c(P_{j_i}) decays exponentially with k, namely

‖ψ_i‖²_{A_{S_k^c(P_{j_i})}} ≤ ((α(P) − 1)/α(P))^k δ(P_{j_i}).

Moreover, for any two ψ_i, ψ_{i'}, we have

|ψ_i^T A ψ_{i'}| ≤ ((α(P) − 1)/α(P))^{k_{ii'}/4 − 1/2} δ(P),

where k_{ii'} is the largest integer such that P_{j_{i'}} ⊂ S_{k_{ii'}}^c(P_{j_i}) (or equivalently P_{j_i} ⊂ S_{k_{ii'}}^c(P_{j_{i'}})).

Proof. This proof basically follows the idea in [88]. For any ψ_i, recall that ψ_i^k ∈ span{S_k(P_{j_i})}, thus A_{S_k^c(P_{j_i})} ψ_i^k = 0, and

‖ψ_i‖²_{A_{S_k^c(P_{j_i})}} = ‖ψ_i − ψ_i^k‖²_{A_{S_k^c(P_{j_i})}} ≤ ‖ψ_i − ψ_i^k‖²_A ≤ ((α(P) − 1)/α(P))^k δ(P_{j_i}).


For any two ψ_i, ψ_{i'}, since (ψ_i − ψ_i^k)^T Φ = 0 and Aψ_{i'} ∈ span{Φ}, we have (ψ_i − ψ_i^k)^T A ψ_{i'} = 0 for all k. Also notice that (ψ_i^k)^T A ψ_{i'}^k = 0 for k < k_{ii'}/2, since S_k(P_{j_i}) ∩ S_k(P_{j_{i'}}) = ∅ when 2k < k_{ii'}. Therefore, taking k = ⌈k_{ii'}/2⌉ − 1, we have

|ψ_i^T A ψ_{i'}| = |(ψ_i^k)^T A (ψ_{i'} − ψ_{i'}^k)| ≤ ‖ψ_i^k‖_A ‖ψ_{i'} − ψ_{i'}^k‖_A ≤ ((α(P) − 1)/α(P))^{k/2} δ(P) ≤ ((α(P) − 1)/α(P))^{k_{ii'}/4 − 1/2} δ(P).

Remark 2.2.24. Recall from Lemma 2.2.8 that, for the compression error with localization ε²_com to be bounded by some prescribed accuracy ε², we need the localization error to satisfy ε²_loc ≤ ε²/N. With the exponential decay of Ψ, however, we empirically observe that the requirement on the localization error can be relaxed to ε²_loc ≤ O(ε²) in practice.

Now we have constructed a Ψ that can be approximated by locally computable basis functions in the energy norm. The remaining task is to address the second criterion: to control the condition number of A_st = Ψ^T A Ψ.

Theorem 2.2.25. Let λ_min(A_st) and λ_max(A_st) denote the smallest and largest eigenvalues of A_st respectively. Then we have

λ_min(A_st) ≥ λ_min(A), λ_max(A_st) ≤ δ(P), (2.49)

and so we have

κ(A_st) = λ_max(A_st)/λ_min(A_st) ≤ δ(P) ‖A^{−1}‖₂. (2.50)

Proof. Thanks to (2.25), and since Φ^T Φ = I_M, we have

‖A_st^{−1}‖₂ = ‖Φ^T A^{−1} Φ‖₂ ≤ ‖A^{−1}‖₂ ⟹ λ_min(A_st) ≥ λ_min(A). (2.51)

Thanks to (2.35), and since A ⪯ ∑_{j=1}^M Ā_{P_j}, we have

δ(P) I_M ⪰ ( Φ^T ( ∑_{j=1}^M Ā_{P_j} )^{−1} Φ )^{−1} ⪰ ( Φ^T A^{−1} Φ )^{−1} = A_st ⟹ λ_max(A_st) ≤ δ(P). (2.52)


Corollary 2.2.26. Let Ψ̃ be the local approximator of Ψ defined in Lemma 2.2.8 such that ‖ψ_i − ψ̃_i‖_A ≤ ε/√N. Then we have

λ_min(Ã_st) ≥ λ_min(A), λ_max(Ã_st) ≤ (1 + ε/√(δ(P)))² δ(P), (2.53)

and thus

κ(Ã_st) = λ_max(Ã_st)/λ_min(Ã_st) ≤ (1 + ε/√(δ(P)))² δ(P) ‖A^{−1}‖₂. (2.54)

Proof. Noticing that P^A_{Ψ̃} = Ψ̃ Ã_st^{−1} Ψ̃^T A is a projection with respect to the energy inner product, we have

A Ψ̃ Ã_st^{−1} Ψ̃^T A ⪯ A ⟹ Ψ̃ Ã_st^{−1} Ψ̃^T ⪯ A^{−1},

and since Φ^T Ψ̃ = I_N, we have

‖Ã_st^{−1}‖₂ ≤ ‖Φ^T A^{−1} Φ‖₂ ≤ ‖A^{−1}‖₂ ⟹ λ_min(Ã_st) ≥ λ_min(A).

For any c ∈ R^N, using a similar argument as in Lemma 2.2.8, we have

‖Ψc − Ψ̃c‖_A ≤ Cε ‖c‖₂,

and then we get

c^T Ã_st c = ‖Ψ̃c‖²_A ≤ ( ‖Ψc − Ψ̃c‖_A + ‖Ψc‖_A )² ≤ ( Cε + √(λ_max(A_st)) )² ‖c‖₂²,

and using λ_max(A_st) ≤ δ(P), we have

λ_max(Ã_st) ≤ (1 + Cε/√(δ(P)))² δ(P).

Guided by Theorem 2.2.25, we obtain a simple methodology for controlling the condition number of the stiffness matrix A_st. As shown in (2.49) and (2.50), the only variable is the choice of the partition P, so the burden again falls on the construction of P. This new criterion allows us to regulate the quality of a partition directly by avoiding large δ(P).

Now we can design an algorithm to construct the local approximator Ψ̃ of Ψ subject to a desired localization error ε_loc. Intuitively, a straightforward way is to choose a


large enough uniform decay radius r and directly compute Ψ̃ = Ψ^r. The localization error can then be guaranteed by Lemma 2.2.8. But redundant computation will probably occur, since some ψ_i may decay much faster than others. Instead, we propose to compute each ψ_i hierarchically from the center patch P_{j_i} by making use of the optimization property (2.28). Suppose that we have already obtained ψ_i^{k−1}; then, by the optimization property (2.28), one can check that w_i^k = ψ_i^{k−1} − ψ_i^k satisfies the following optimization problem:

w_i^k = argmin_{w ∈ span{S_k(P_{j_i})}} ‖ψ_i^{k−1} − w‖_A, subject to Φ^T w = 0. (2.55)

Similar to the proof of Theorem 2.2.21, let Y_{i,k} denote the joint space of all U_j such that P_j ⊂ S_k(P_{j_i}). Then the constraints in the optimization problem (2.55) imply that w_i^k ∈ span{Y_{i,k}}. Therefore we can explicitly compute w_i^k as

w_i^k = P^A_{Y_{i,k}} ψ_i^{k−1} = Y_{i,k} ( Y_{i,k}^T A_{S_k(P_{j_i})} Y_{i,k} )^{−1} Y_{i,k}^T A_{S_k(P_{j_i})} ψ_i^{k−1}, (2.56)

and thus

ψ_i^k = ψ_i^{k−1} − Y_{i,k} ( Y_{i,k}^T A_{S_k(P_{j_i})} Y_{i,k} )^{−1} Y_{i,k}^T A_{S_k(P_{j_i})} ψ_i^{k−1}. (2.57)

In particular, we can compute ψ_i^0 by (2.57) with an initial guess ψ_i^{−1} ∈ span{P_{j_i}} satisfying φ_{i'}^T ψ_i^{−1} = δ_{i',i}, ∀ i' = 1, ..., N. Notice that the main cost of computing ψ_i^k lies in inverting the matrix Y_{i,k}^T A Y_{i,k}, whose condition number can be bounded by ε(P, q)² λ_max(A) κ(Y_{i,k}^T Y_{i,k}), as we will see in Lemma 3.1.1. By choosing all U_j orthonormal, we have κ(Y_{i,k}^T Y_{i,k}) = 1, and the computational efficiency is then governed by ε(P, q)² λ_max(A) if we use a CG-type method. When we prescribe a certain accuracy ε(P, q)² but λ_max(A) is really large, a multiresolution strategy will be adopted to ensure the efficiency of computing ψ_i^k.

To summarize, the process of computing a sufficient approximator ψ̃_i starts with the formation of ψ_i^0, then inductively computes ψ_i^k by solving the inverse problem (2.57) with initializer ψ_i^{k−1}, and finally ends with ψ̃_i = ψ_i^r when some stopping criterion is attained at k = r. Such inductive computation suggests using the CG method to take advantage of the exponential convergence of ψ_i^k. Trusting the exponential decay of ‖ψ_i^k − ψ_i‖_A, we choose the stopping criterion as

(η_k² / (1 − η_k²)) ‖ψ_i^{k−1} − ψ_i^k‖²_A ≤ ε²_loc, for η_k = ‖ψ_i^{k−1} − ψ_i^k‖_A / ‖ψ_i^{k−2} − ψ_i^{k−1}‖_A.


The reason is that, if ‖ψ_i^k − ψ_i‖_A does decay as ‖ψ_i^k − ψ_i‖_A = O(η^k) for some constant η ∈ (0, 1), then

‖ψ_i^k − ψ_i‖²_A = (η²/(1 − η²)) O( ‖ψ_i^{k−1} − ψ_i‖²_A − ‖ψ_i^k − ψ_i‖²_A ) = (η²/(1 − η²)) O( ‖ψ_i^{k−1} − ψ_i^k‖²_A ),

where we have used (2.31). With the analysis above, we propose Algorithm 3 for constructing Ψ̃.
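The core of this iteration is the layer-by-layer correction (2.57) combined with the stopping criterion above. The following sketch shows one possible way to organize it in numpy. It is deliberately simplified: A is treated as a global dense matrix, whereas the actual algorithm restricts all solves to S_k(P_{j_i}); the callable layer_Y, which must return the complement-space basis Y_{i,k} for layer k, is assumed to be supplied by the caller, and all names are illustrative.

import numpy as np

def localize_psi(A, psi0, layer_Y, eps_loc, max_layers=50):
    """Hierarchical localization of one basis function (the essence of Algorithm 3).
    Update (2.57): psi^k = psi^{k-1} - Y (Y^T A Y)^{-1} Y^T A psi^{k-1};
    stop when eta^2/(1-eta^2) * ||psi^{k-1}-psi^k||_A^2 <= eps_loc^2."""
    def a_norm(v):
        return np.sqrt(v @ (A @ v))

    psi_prev = psi0.copy()
    increments = []                                   # ||psi^{k-1} - psi^k||_A per layer
    for k in range(1, max_layers + 1):
        Y = layer_Y(k)
        w = Y @ np.linalg.solve(Y.T @ (A @ Y), Y.T @ (A @ psi_prev))  # A-orthogonal correction
        psi = psi_prev - w
        increments.append(a_norm(w))
        if len(increments) >= 2 and increments[-2] > 0:
            eta = increments[-1] / increments[-2]
            if eta < 1 and eta**2 / (1 - eta**2) * increments[-1]**2 <= eps_loc**2:
                return psi, k
        psi_prev = psi
    return psi_prev, max_layers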

Complexity of Algorithm 3

For simplicity, we assume that all patches in the partition P have the same patch size s. Let r be the necessary number of layers for ‖ψ_i^r − ψ_i‖_A ≤ ε/√N, where N is the dimension of Φ (or Ψ); then we have

r = O(log(1/ε) + log N + log δ(P)). (2.58)

Since we are actually compressing A^{−1}, we can bound N by the original dimension n. Further, we assume that the locality conditions (2.63), (2.64), and (2.65) are satisfied; then the support size of each ψ_i^r is O(s · r^d). Since we also only need to solve (2.57) up to the same relative accuracy O(ε/√N) using the CG method, the cost of computing ψ_i^r can be estimated by

O( κ(Y_{i,k}^T A Y_{i,k}) · s · r^d · (log(1/ε) + log N + log δ(P)) ) ≤ O( ε(P, q)² · λ_max(A) · s · (log(1/ε) + log n + log δ(P))^{d+1} ). (2.59)

Finally, the total complexity of Algorithm 3 is N times the cost for each ψ_i^r, i.e.

O( n · q · ε(P, q)² · λ_max(A) · (log(1/ε) + log n + log δ(P))^{d+1} ), (2.60)

where we have used the relation N = nq/s.

2.3 Construction of Partition

With the analysis in the previous sections, we now have a blueprint for the construction of the partition P = {P_j}_{j=1}^M. Given an underlying energy decomposition E = {E_k}_{k=1}^m of a PD matrix A and an orthonormal basis V of R^n, the basic idea is to find a partition P of V with small patch number #P and small condition factor δ(P, q), subject to a prescribed bound on the error factor ε(P, q). In particular, our goal is to find the optimizer of the following problem:

P = argmin_P f₁(#P) + f₂(δ(P, q)), subject to ε(P, q) ≤ ε,


Algorithm 3 Construction of Ψ̃
Input: energy decomposition E, partition P, Φ, desired accuracy ε_loc
Output: Ψ̃
1: for i = 1, 2, ..., dim(Φ) do
2:   Compute ψ_i^0 by solving (2.57) on S_0(P_{j_i});
3:   Compute ψ_i^1 by solving (2.57) on S_1(P_{j_i}) with initializer ψ_i^0;
4:   repeat
5:     Compute ψ_i^k by solving (2.57) on S_k(P_{j_i}) with initializer ψ_i^{k−1};
6:     η ← ‖ψ_i^k − ψ_i^{k−1}‖_A / ‖ψ_i^{k−1} − ψ_i^{k−2}‖_A;
7:   until η²/(1 − η²) · ‖ψ_i^k − ψ_i^{k−1}‖²_A < ε²_loc;
8:   ψ̃_i ← ψ_i^k;
9: end for
10: Ψ̃ = [ψ̃_1, ψ̃_2, ..., ψ̃_{dim(Φ)}].

where f₁, f₂ are penalty functions, q is a chosen integer, and ε is the desired accuracy. This ideal optimization problem is intractable, since such a discrete optimization in general requires searching over all possible combinations of patches. Instead, we propose a local clustering approach to ensure efficiency.

Generally, if we have a priori knowledge of the underlying computational domain of the problem, say Ω ⊂ R^d, one of the optimal choices of partition is a uniform regular partition. For instance, in [19], regular partitions are used in the sense that each patch (finite element) has a circumcircle of radius H and an inscribed circle of radius ρH for some ρ ∈ (0, 1). The performance under regular partitioning relies on the regularity of the coefficients of A (low contrast, strong ellipticity) and on the equivalence between the energy norm defined by A and some universal norm independent of A. In particular, since a regular partitioning of the computational domain is constructed without regard to the properties of A, its performance cannot be ensured when A loses regularity in some local or micro-scale regions.

In view of this, a more reasonable approach is to construct a partition P based on the information extracted from A, which is represented in our framework by the local energy decomposition E of A. For computational efficiency, the construction procedure should rely only on local information (rather than on global spectral information as in an eigendecomposition). This explains why we introduce the local measurements in Section 2.2, the error factor ε(P, q) and the condition factor δ(P, q), which keep track of the performance of the partition in our


search procedure. These measurements are locally (patch-wise) computable and thus make it possible to construct the partition with local operations that interact only with neighboring data.

To make use of the local spectral information, we propose to construct the desired partition P of V by iteratively clustering basis functions in V into patches. In particular, small patches (sets of basis functions) are combined into larger ones, and the scale of the partition becomes coarser and coarser. For every newly generated patch P_j, we check whether ε(P_j, q) still satisfies the required accuracy (see (2.17)). The whole clustering process stops when no patch combination occurs, that is, when the partition reaches its resolution limit. Also, for each patch P_j to be well-conditioned, we impose a bound c on δ(P_j, q)ε(P_j, q)². The motivation for this bound will be explained in Section 3.1. And for large δ(P_j, q) to diminish, patches with large condition factor are combined first. To realize the partitioning procedure while maintaining computational efficiency, we combine patches pairwise. Our proposed clustering algorithm is summarized in Algorithm 4 and Algorithm 5.

Algorithm 4 Pair-Clustering
Input: energy decomposition E, underlying basis V, desired accuracy ε, condition bound c.
Output: Partition P.
1: Initialize: P_j = {v_j}, δ(P_j, q) = A_{P_j} (scalar), 1 ≤ j ≤ n;
2: while Number of active patches > 0 do
3:   Sort active P_j with respect to δ(P_j, q) in descending order;
4:   Mark all (active and inactive) P_j as unoperated;
5:   for each active P_j in descending order of δ(P_j, q) do
6:     Find_Match(P_j, ε, c);
7:     if Find_Match succeeds then
8:       Mark P_j as operated;
9:     else if all neighbor patches of P_j are unoperated then
10:      Mark P_j as inactive;
11:    end if
12:   end for
13: end while

Remark 2.3.1.

- If we view 1/ε(P_j ∪ P_{j'}, q)² as the gain and δ(P_j ∪ P_{j'}, q) as the cost, the well-conditioning bound ε(P_j ∪ P_{j'}, q)² δ(P_j ∪ P_{j'}, q) ≤ c means that the cost is kept proportional to the gain.


Algorithm 5 Find_Match
Input: P_j, ε, c.
Output: Succeeds or Fails.
1: for P_{j''} ∼ P_j do
2:   Find the largest Con(P_j, P_{j''}) among all unoperated P_{j''} (stored as P_{j'});
3: end for
4: Compute ε(P_j ∪ P_{j'}, q) and δ(P_j ∪ P_{j'}, q);
5: if ε(P_j ∪ P_{j'}, q) ≤ ε and δ(P_j ∪ P_{j'}, q) ε(P_j ∪ P_{j'}, q)² ≤ c then
6:   Combine P_j and P_{j'} to form P_j (P_{j'} no longer exists);
7:   Update δ(P_j, q);
8:   return Find_Match succeeds.
9: else
10:  return Find_Match fails.
11: end if

- If we want the patch sizes to grow homogeneously, we can take the patch size into consideration when sorting the patches (Line 3 in Algorithm 4).

- The local basis functions Φ_j are also computed in the sub-function Find_Match, and can be stored for future use.

The sub-function Find_Match in Line 6 of Algorithm 4 takes a patch P_j as input and finds another patch P_{j'} that will be absorbed by P_j. As a local operation, the possible candidates for P_{j'} are just the neighboring patches of P_j. To further accelerate the algorithm, we avoid checking the error factor for every possible pair (P_j, P_{j'}) with P_{j'} ∼ P_j. Instead, we check the patch P_{j'} that has the largest "connection" (correlation) with P_j. This quantity can certainly be defined in different ways; here we propose the connection between P_j and P_{j'} as:

Con(P_j, P_{j'}) = ∑_{E ∼ P_j, E ∼ P_{j'}} ( ∑_{u ∈ P_j, v ∈ P_{j'}, u ∼ v} |u^T E v| ). (2.61)

On the one hand, Con(P_j, P_{j'}) can be easily computed and is inherited directly after patch combination, since one can check that Con(P_j ∪ P_{j'}, P_{j''}) = Con(P_j, P_{j''}) + Con(P_{j'}, P_{j''}). On the other hand, we observe that

A_{P_j ∪ P_{j'}} = A_{P_j} + A_{P_{j'}} + ∑_{E ∼ P_j, E ∼ P_{j'}, E ⊂ P_j ∪ P_{j'}} E = A_{P_j} + A_{P_{j'}} + Cross Energy. (2.62)

In other words, a larger cross energy implies larger interior eigenvalues of A_{P_j ∪ P_{j'}}, which means that A_{P_j ∪ P_{j'}} is less likely to violate the accuracy requirement. One can


also recall the similarity of this observation to findings in spectral graph theory, where stronger connectivity of a graph corresponds to larger eigenvalues of its graph Laplacian L. These observations motivate us to simplify the procedure by examining only the patch candidate P_{j'} with the largest connection to P_j.
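The connection (2.61) and the "largest connection" selection are straightforward to compute from an energy decomposition. A minimal sketch is given below; the representation of each energy element as a dictionary of its nonzero entries, and the helper names connection and best_match, are illustrative choices and not the thesis implementation.

def connection(patch_a, patch_b, elements):
    """Cross connection (2.61) between two patches.
    patch_a, patch_b : sets of dof indices
    elements : list of energy elements, each stored as {(u, v): value} with both
               symmetric entries present."""
    con = 0.0
    for E in elements:
        dofs = {u for (u, v) in E} | {v for (u, v) in E}
        if dofs & patch_a and dofs & patch_b:
            con += sum(abs(val) for (u, v), val in E.items()
                       if u in patch_a and v in patch_b)
    return con

def best_match(j, patches, neighbors, elements):
    """Pick the neighbor of patch j with the largest connection, as in the
    Find_Match heuristic (names here are illustrative)."""
    return max(neighbors[j],
               key=lambda jp: connection(patches[j], patches[jp], elements))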

Though our algorithm does not assume any a priori structural information on A, its efficiency and effectiveness may rely on hidden locality properties of A. To perform a complexity analysis of Algorithm 4, we first introduce some notation. Similar to the layers of neighbors defined in (2.2.12), we define N_1(v) = N(v), and

N_{k+1}(v) = N(N_k(v)) = {u ∈ V : u ∼ N_k(v)},

that is, for any u ∈ N_k(v), there is a path of length at most k that connects u and v with respect to the adjacency relation "∼" defined in Definition 2.1.2. The following definition describes the local interaction property of A:

Definition 2.3.2 (Locality/Sparsity of A). A is said to be local of dimension d with respect to V if

#N_k(v) = O(k^d), ∀ k ≥ 1, ∀ v ∈ V. (2.63)

The following definition describes the local spectral properties of an energy decomposition of A. It states that a smaller local patch corresponds to a smaller scale, and that ε(P, q) tends to increase and δ(P, q) tends to decrease as the patch size of P increases. This explains why we combine patches from finer scales to coarser scales to construct the desired partition P.

Definition 2.3.3 (Local energy decomposition). E = {E_k}_{k=1}^m is said to be a local energy decomposition of A of order (q, p) with respect to V if there exists some constant h such that

ε(N_k(v), q) = O((hk)^p), ∀ k ≥ 1, ∀ v ∈ V. (2.64)

Moreover, E is said to be well-conditioned if there is some constant c such that

ε(N_k(v), q)² δ(N_k(v), q) ≤ c, ∀ k ≥ 1, ∀ v ∈ V. (2.65)

Remark 2.3.4.

- The locality of A implies that #N_{k+1}(v) − #N_k(v) = O(d · k^{d−1}). In particular, #N_1(v) = O(d), and thus the number of nonzero entries of A is m = O(d · n).


- Let P be a partition of V such that each patch P ∈ P satisfies diam(P) = O(r) and #P = O(r^d), where r is an integer and "diam" is the path diameter with respect to the adjacency relation "∼" defined in Definition 2.1.2. Let S_k(P) be the layers of neighbors (patch layers) defined in (2.2.12), and let #_P S_k(P) denote the number of patches in S_k(P). Then the locality of A implies that

#_P S_k(P) = O( #S_k(P) / r^d ) = O( (rk)^d / r^d ) = O(k^d).

This means that V with the adjacency relation defined by A has a self-similar property between fine scales and coarse scales.

These abstract formulations and notations actually cover a large class of problems of interest. For instance, suppose A is assembled from the FEM discretization of a well-posed elliptic equation with homogeneous Dirichlet boundary conditions:

Lu = ∑_{0 ≤ |σ|, |γ| ≤ p} (−1)^{|σ|} D^σ ( a_{σγ} D^γ u ) = f, u ∈ H_0^p(Ω),

where Ω ⊂ R^d is a bounded domain. Let V be the nodal basis of the discretization, and let each energy element E in E be the energy inner product matrix (i.e., the stiffness matrix) of the neighboring nodal functions on a fine mesh patch. The locality of L and the underlying dimension of Ω ensure that A is local of dimension d with respect to V. With a consistent discretization of L on local domains, the interior energy corresponds to a Neumann boundary condition, while the closed energy corresponds to a Dirichlet boundary condition. In this sense, using a continuous limit argument and the strong ellipticity assumption, Hou and Zhang prove in [53] that if q ≥ (p+d−1 choose d), then ε(N_k(v), q) ≲ (hk)^p (generalized Poincaré inequality), where h is the fine mesh scale and p is half the order of the elliptic equation; and ε(N_k(v), q)² δ(N_k(v), q) ≤ c (inverse estimate) for some scaling-invariant constant c. Unfortunately, these arguments would be compromised if strong ellipticity is not assumed, especially when high-contrast coefficients are present. However, as we saw in Example 2.2.18, ε(N_k(v), q) and δ(N_k(v), q) actually converge when the contrast of the coefficient becomes large, which is not explained by the general analysis. So we can still hope that the matrix A and the energy decomposition E have the desired locality that can be numerically learned, even when conventional analysis fails.


Estimate of patch number

Intuitively, if A is local of dimension d, and E is well-conditioned and local of order (q, p), then the patch number of an ideal partition P subject to accuracy ε should be

#P = O( n / (ε^{1/p}/h)^d ) = O( n h^d / ε^{d/p} ), δ(P, q) = O( c / ε² ), (2.66)

where we estimate the path diameter of each patch in P by O(ε^{1/p}/h), and thus the patch size by O((ε^{1/p}/h)^d).
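As a quick sanity check of (2.66), the following arithmetic plugs in illustrative numbers only (they are not taken from the thesis) and recovers the predicted patch count.

# illustrative numbers only: patch count predicted by (2.66)
n, h, d, p, eps = 1.0e6, 1.0e-3, 2, 1, 1.0e-2      # chosen so that n * h**d = O(1)
patch_diam = eps ** (1.0 / p) / h                   # O(eps^{1/p} / h) vertices across a patch
print(round(n / patch_diam ** d))                   # #P ~ n h^d / eps^{d/p} = 10000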

Inherited locality

As we have made the locality assumption on A, we would hope that the compressed operator P^A_Ψ A^{−1} = Ψ A_st^{−1} Ψ^T can also take advantage of such locality. In fact, the localization of Ψ not only ensures the efficiency of the construction of Ψ̃, but also conveys the locality of A to the stiffness matrix Ã_st. Suppose A is local of dimension d. Let Ψ̃ be the local approximator obtained in Algorithm 3 such that ψ̃_i = ψ_i^r, 1 ≤ i ≤ N, for some uniform radius r. Let V = {v_i}_{i=1}^N be the orthonormal basis of R^N such that

v_i^T Ã_st v_j = v_i^T Ψ̃^T A Ψ̃ v_j = ψ̃_i^T A ψ̃_j, ∀ 1 ≤ i, j ≤ N.

We then similarly define the adjacency relation between basis vectors in V with respect to Ã_st. Since each localized basis function ψ̃_i interacts with patches within r patch layers (and thus interacts with the ψ̃_{i'} corresponding to patches within 2r layers), and each patch corresponds to q localized basis functions, using the result in Remark 2.3.4 we have

#N_k(v) = O((rk)^d · q) = q · r^d · O(k^d), ∀ k ≥ 1, ∀ v ∈ V.

That means Ã_st inherits the locality of dimension d from A. In addition, by the same argument as in Remark 2.3.4, we have #N_1(v) = O(d · q · r^d), which is the number of nonzero entries (NNZ) of a single column of Ã_st. Therefore the NNZ of Ã_st can be bounded by O(N · d · q · r^d) = O(m · q² · r^d / s), where m = O(d · n) is the NNZ of A, s is the average patch size, and we have used the relation N = nq/s. In particular, if the localization error is subject to ε_loc = ε/√N, then the radius has the estimate r = O(log(1/ε) + log n + log δ(P, q)), and thus the bound on the NNZ of Ã_st becomes

O( m · q² · (1/s) · (log(1/ε) + log n + log δ(P, q))^d ). (2.67)


Choice of q

Recall that, instead of choosing a large enough q_j for each patch P_j so that ε(P_j, q_j) ≤ ε, we use a uniform integer q for all patches and leave the task of achieving the accuracy to the construction of the partition. So before we proceed to the algorithm, we still need to know what q to choose. In some problems, q can be determined by theoretical analysis. For example, when solving an elliptic equation of order 2p with FEM, we should choose at least q = (p+d−1 choose d) to obtain an optimal rate of convergence. As for graph Laplacians, which are generally considered as discrete second-order elliptic problems (p = 1), we can thus choose q = 1. But when the problem is more complicated and has no intrinsic order, the choice of q can be tricky. One practical strategy is to start from q = 1 and increase q when the resulting partition is not acceptable.

Complexity of Algorithm 4

For simplicity, we assume that all patches in the final output partition P have the same patch size s. Under the locality assumption on A given in (2.63), the local operation cost of Find_Match(P_j) is approximately

O(d · F(size(P_j))) ≤ O(d · F(s)).

Here F(#P_j) is a function of #P_j that depends only on the complexity of solving local eigenproblems (with respect to A_{P_j}) and local inverse problems (with respect to Ā_{P_j}) on the patch P_j; thus we can bound F(s) by O(s³). Since patches are combined pairwise, the number of while-loops starting at Line 2 is of order O(log s), and in each while-loop the operation cost can be bounded by

O( d · (F(s)/s) · n ) + O( n · log n ) = O( d · s² · n ) + O( n · log n ),

where O(d · (F(s)/s) · n) comes from operating Find_Match(P_j) for all surviving active patches P_j, and O(n · log n) comes from the sorting operations. Therefore the total complexity of Algorithm 4 is

O( d · s² · log s · n ) + O( log s · n · log n ). (2.68)

Complexity of Algorithm 1

Combining all procedures together, and noticing that Algorithm 2 can be absorbed into Algorithm 4, the complexity of Algorithm 1 is

O( d · s² · log s · n ) + O( log s · n · log n ) + O( q · n · ε² · ‖A‖₂ · (log(1/ε) + log n + log δ(P))^{d+1} ), (2.69)


where n is the original dimension of the basis, s is the maximal patch size, ε is the prescribed accuracy, and we have used the fact that ε(P, q) ≤ ε.

Remark 2.3.5.

- The maximum patch size s can be viewed as the compression rate, since s ∼ n/M where M is the patch number. The complexity analysis above implies that for a fixed compression rate s, the complexity of Algorithm 4 is linear in n. However, when the locality conditions (2.63) and (2.64) are assumed, one can see that s ∼ (ε^{1/p}/h)^d, where ε is the desired (input) accuracy. As a consequence, while the desired accuracy is fixed, if n increases then the complexity is no longer linear in n, since h (the finest scale of the problem) may change as n changes. In other words, Algorithm 4 loses its near-linear complexity when the desired accuracy ε is set too large compared to the finest scale of the problem. To overcome this limitation, we consider the hierarchical partitioning introduced in Section 3.1.

- As mentioned before, the factor ε²‖A‖₂ also suggests a hierarchical compression strategy when ‖A‖₂ is too large compared to the prescribed accuracy ε².

2.4 Numerical Examples

In this section, two numerical examples are reported to demonstrate the efficacy and effectiveness of our proposed operator compression algorithm. For consistency, all experiments are performed on a single machine equipped with an Intel(R) Core(TM) i5-4460 CPU at 3.2 GHz and 8 GB of DDR3 1600 MHz RAM.

2.4.1 Numerical Example 1: A Graph Laplacian Example

The first numerical example arises from solving the finite graph Laplacian system Lx = b, where L is the Laplacian matrix of a d-dimensional undirected random graph G = [V, E]. Vertices x_i = (x_i^1, ..., x_i^d) ∈ V ⊂ R^d are generated subject to a uniform distribution over the domain Ω = [0, 1]^d. Edge weights are then given by

w_{ii} = c_i, ∀ i; w_{ij} = r_{ij}^{−2} if r_{ij}² ≤ η/n^{2/d}, and w_{ij} = 0 otherwise, ∀ i ≠ j,

where r_{ij} = ‖x_i − x_j‖₂, n = #V is the number of vertices, and η > 0 is a density factor for truncating long-distance interactions. We set c_i = 1 for all i for the sake of well-posedness and invertibility of the graph Laplacian, which gives ‖L^{−1}‖₂ = 1.


(Figure 2.3, panels: (a) running time versus vertex number for 3-D graphs with η = 4 and 2-D graphs with η = 2.5 and η = 3; (b) running time versus NNZ for 2-D graphs with η = 2.5; (c) running time versus NNZ for 2-D graphs with η = 3; (d) running time versus NNZ for 3-D graphs with η = 4.)

Figure 2.3: An illustration of the running time of Algorithm 4. (a) plots the running time against the number of vertices of randomly generated graphs in different dimensions and with different density factors η. (b)-(d) plot the running time against the number of nonzero entries (NNZ) in the graph Laplacian operators.

We also remark that our choice of η ensures that the graph is locally connected and that the second smallest eigenvalue of L is of order O(1). This also gives n ∝ NNZ of L in this example (which is actually not required by our algorithm). The basis V ⊂ R^n is given by the natural basis with respect to vertex values, and the energy decomposition E = {E_k}_{k=1}^m is collected as described in Example 2.1.10, where each E_k corresponds to an edge in G. Since p = 1 for graph Laplacians, we set q = 1 throughout this numerical example.
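A graph of this type is simple to generate. The sketch below builds the Laplacian and records one energy element per edge; the truncation radius η/n^{2/d} is our reading of the construction described above, and the function name and parameters are illustrative rather than the code used for the experiments.

import numpy as np
import scipy.sparse as sp

def random_graph_laplacian(n, d=2, eta=3.0, seed=0):
    """Random geometric graph of the type used in this example (a sketch).
    Vertices are uniform in [0,1]^d; w_ij = 1/r_ij^2 when r_ij^2 <= eta / n^(2/d),
    and every diagonal weight is c_i = 1 so that L is invertible."""
    rng = np.random.default_rng(seed)
    X = rng.random((n, d))
    cutoff = eta / n ** (2.0 / d)
    rows, cols, vals, edges = [], [], [], []
    for i in range(n):
        r2 = np.sum((X[i + 1:] - X[i]) ** 2, axis=1)
        for k in np.nonzero(r2 <= cutoff)[0]:
            j, w = i + 1 + k, 1.0 / r2[k]
            rows += [i, j, i, j]; cols += [j, i, i, j]
            vals += [-w, -w, w, w]
            edges.append((i, j, w))    # energy element: w * (e_i - e_j)(e_i - e_j)^T
    L = sp.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
    L = L + sp.eye(n, format="csr")    # the c_i = 1 diagonal term
    return L, edges

L, edges = random_graph_laplacian(2000)
print(L.shape, L.nnz, len(edges))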

We first verify the complexity of Algorithm 4 by applying it to partition random graphs generated as described above. To be consistent, we set the prescribed accuracy so that 1/ε^d ∝ n, and the upper bound c on δ(P, 1)ε(P, 1)² is set to 100 in all cases, which is large enough for patches to combine with each other. Figure 2.3 illustrates the nearly linear time complexity of our algorithm with respect to the number of vertices n, which is consistent with our complexity estimate in Section 2.3. Every dot


represents the partitioning result for one given 2-D/3-D graph. In particular, the red and blue dots are the partition results for 2-D graphs constructed with η = 2.5 and η = 3 respectively, while the magenta dots represent the partitioning of the random 3-D graphs with η = 4. Similarly, Figures 2.3b-2.3d show, respectively, the time complexity of Algorithm 4 versus the NNZ of L. Notice that for graphs with NNZ up to the order of 10^5, the running time is still within seconds. These results demonstrate the lightweight nature of Algorithm 4.

(Figure 2.4, panels: (a) and (b) patch number #P versus 1/ε² for 2-D graphs, showing #P ∝ 1/ε²; (c) patch number #P versus 1/ε³ for 3-D graphs, showing #P ∝ 1/ε³.)

Figure 2.4: Intrinsic dimension of the sets of graphs.

Second, we record the patch numbers of the partitions P obtained from Algorithm 4. We fix the domain Ω = [0, 1] × [0, 1] and gradually increase the density of vertices in the graph. Therefore we have n ∝ 1/h^d and thus #P ∝ 1/ε^d by the observation in (2.66). The relationship between 1/ε^d and the patch number #P for the three cases is plotted in Figures 2.4a-2.4c respectively. Figures 2.4a and 2.4b show the linear relationship between #P and 1/ε², meaning that d = 2 in these cases. Similarly, the plot in Figure 2.4c discloses that the dimension of the input graphs is 3, as #P ∝ 1/ε³. These results precisely justify the capability of our framework in capturing the geometric information of the domain.

Third, we focus on one particular 2-D random graph with vertex number n = 10000 to verify the performance of our algorithm in controlling the error and the well-posedness of the corresponding compressed operator. We employ the concept of the k-nearest neighbors (KNN) to impose local interaction. Specifically, for each vertex x_i, we denote by NN(x_i, k_i) the set of the k_i nearest vertices of x_i. Any two vertices x_i and x_j have an edge of weight w_{ij} = 1/r_{ij}² if and only if x_i ∈ NN(x_j; k_j) or x_j ∈ NN(x_i; k_i). Let y = (0.5, 0.5) be the center of Ω. We set k_i = 15 if ‖x_i − y‖₂ ≤ 0.25 and k_i = 5 otherwise. Therefore the sub-graph inside the disk B(y, 0.25) has a stronger connectivity than the sub-graph outside.


We perform Algorithm 1 with a fixed condition bound c = 50 and a prescribed accuracy ε² varying from 0.001 to 0.0001. Figure 2.5a shows the ratios ε²_com/ε² and ε(P, 1)²/ε², where ε²_com = ‖L^{−1} − P^L_Ψ L^{−1}‖₂ is the compression error. Using Algorithm 4, we achieve a nearly optimal local error control. Also notice that the global compression error ratio ε²_com/ε² is strictly bounded by 1 but stays above 0.5, meaning that our approach neither underdoes nor overdoes the compression. Figure 2.5b shows the condition number of A_st, which is consistent with the prescribed accuracy and is strictly bounded by ε² · δ(P, 1) and by the prescribed condition bound c. Figure 2.5c plots the patch number #P versus ε^{−2} (the blue curve). Though the graph has different connectivity in different parts, it is still a 2-D graph and is locally connected in the sense of Equation (2.63). Therefore the curve is below linear, which is consistent with estimate (2.66) with d = 2. As a comparison, the red curve is the optimal compression dimension subject to the same prescribed accuracy, given by the eigenvalue decomposition. Since Algorithm 4 combines patches pairwise, the output patch number #P can be up to 2 times the optimal value. Figure 2.5d shows the partition result with #P = 298 for the case ε² = 0.001, where the black lines outline the boundaries of the patches. Figure 2.5e illustrates the patch sizes of the partition. We can see that patches near the center of the domain have larger sizes than those near the boundary, since the graph has higher connectivity inside the disk B((0.5, 0.5), 1/4).

We also fix the prescribed accuracy ε² = 0.0001 to study the performance of compression with localization. In this case we have N = #P = dim(Φ) = 1446. Let Ψ̃ be the local approximator of Ψ constructed by Algorithm 3 subject to the localization error ‖ψ_i − ψ̃_i‖²_A ≤ ε²_loc. Figure 2.6 shows the compression error ε²_com = ‖L^{−1} − P^L_{Ψ̃} L^{−1}‖₂, the mean radius, and the mean support size of Ψ̃ for ε²_loc varying from 0.1 to 0.0001. Recall that Lemma 2.2.8 requires a localization error ε²_loc = ε²/N to ensure ε²_com ≤ ε², but Figure 2.6a shows that ε²_loc = ε² is adequate. Figure 2.6b shows the linear relation between the mean radius and log(1/ε_loc), which is consistent with the exponential decay of Ψ proved in Theorem 2.2.21. Figure 2.6c shows the quadratic relation between the mean support size and the mean radius of Ψ̃, which again reflects that the geometric dimension of the graph is 2.

By fixing ε²_loc = 0.0001, we obtain a mean radius of Ψ̃ of approximately 4.5 and a mean support size of approximately 449. We pick three functions ψ_1, ψ_2, ψ_3 such that ψ_1 is close to the center of Ω, ψ_2 is near the boundary where the connectivity changes, and ψ_3 is close to the boundary of Ω. Figures 2.7a-2.7f (the first two rows of Figure 2.7) show the profiles of |ψ_i| and


Figure 2.5: Error and well-posedness studies of the compressed operators.

log₁₀|ψ_i| for i = 1, 2, 3. Though all of them decay exponentially from their center patches to the outer layers, ψ_1 decays more slowly than ψ_3, since the graph has higher connectivity (i.e., larger patch sizes) near the center. Figure 2.7g and Figure 2.7h show the profiles of |ψ̃_i| and log₁₀|ψ̃_i| for i = 1, 2, 3. The bird's-eye view of their supports is shown in Figure 2.7i. Similarly, ψ̃_1 needs a larger support than ψ̃_3 to achieve the same accuracy, which implies that ψ̃_1 decays more slowly than ψ̃_3. Figure 2.8 shows the spectra of L^{−1}, L^{−1} − P^L_Ψ L^{−1}, and L^{−1} − P^L_{Ψ̃} L^{−1}. Notice that if we truncate the fine-scale part of L^{−1} with prescribed accuracy ε², then L^{−1} − P^L_Ψ L^{−1} has rank n − N and the compression error is 8.13 × 10^-5 < ε². Similarly, if the local approximator Ψ̃ is applied (instead of Ψ), then L^{−1} − P^L_{Ψ̃} L^{−1} also has rank n − N, and the compression error is also 8.13 × 10^-5 < ε². This means that the same compression error is achieved by the compression P^L_{Ψ̃} L^{−1} with localization.

Fourth, we compare our results with the compression given by PCA [55], that is, L^{−1} ≈ ∑_{i=1}^{N_PCA} λ_i^{−1} q_i q_i^T, where λ_i is the i-th smallest eigenvalue of L and q_i is the corresponding normalized eigenvector. On the one hand, to achieve the same compression error 8.13 × 10^-5, we need N_PCA = 893, which is the optimal compression dimension for this accuracy. But we remark that this achievement requires solving a global eigenproblem, and the compressed operator ∑_{i=1}^{N_PCA} λ_i^{−1} q_i q_i^T


Figure 2.6: Compression error ε²_com and the mean radius of Ψ̃.


Figure 2.7: Profiles of ψ_1, ψ_2, ψ_3, and of the corresponding approximators ψ̃_1, ψ̃_2, ψ̃_3.

usually loses the original sparsity features. On the other hand, our approach yields a larger number of basis functions (N = 1446), since only local eigen-information is used and patches are combined pairwise. But our approach gives local basis functions with a compressed dimension at most about 2 times the optimal dimension. Furthermore, it turns out that we can recover the eigenvectors of L corresponding to relatively small eigenvalues by solving an eigenvalue problem for A_st (or Ã_st). Figure 2.10 shows the


2nd, 10th, 20th and 50th eigenvectors corresponding to the small eigenvalues of L, computed from L^{−1} (first row) and from Ã_st^{−1} (second row) respectively. Let λ_{i,st} be the i-th smallest eigenvalue of Ã_st, and let ξ_i be the corresponding eigenvector, normalized so that q̃_i = Ψ̃ ξ_i has unit l²-norm. From the experiment, we observe that λ_{i,st}^{−1} is a good approximation of λ_i^{−1} and q̃_i is a good approximation of q_i for small λ_i, as shown in Figure 2.9. In other words, this procedure provides a convenient way to compute the first few eigenvalues and eigenvectors of L, since Ã_st has the compressed size and a much smaller condition number (κ(Ã_st) = 1.14 × 10^5 vs. κ(L) = 4.29 × 10^8).

(a) Spectrum of L^{−1} and L^{−1} − P^L_Ψ L^{−1}. (b) Spectrum of L^{−1} and L^{−1} − P^L_{Ψ̃} L^{−1}.

Figure 2.8: Spectra of L^{−1}, L^{−1} − P^L_Ψ L^{−1}, and L^{−1} − P^L_{Ψ̃} L^{−1}.


Figure 2.9: Difference between the true eigenvalues/eigenvectors and the approximated ones.

2.4.2 Numerical Example 2: A PDE Example

Our second numerical example arises from using GFEM to solve the following elliptic equation with homogeneous Dirichlet boundary conditions:

−∇ · (a · ∇u) = f, u ∈ H_0^1(Ω), (2.70)


Figure 2.10: Plot of the 2nd, 10th, 20th and 50th eigenvectors corresponding to L^{−1} (first row) and Ã_st^{−1} (second row) respectively.

where Ω = [0, 1] × [0, 1], f ∈ L²(Ω), and the coefficient a is a 2-by-2 matrix function of (x, y) of the form

a = [ a₁₁ a₁₂ ; a₁₂ a₂₂ ] = [ cos θ sin θ ; −sin θ cos θ ] [ µ·e₁ 0 ; 0 µ·e₂ ] [ cos θ −sin θ ; sin θ cos θ ]. (2.71)

Here θ = θ(x, y) ∈ C(Ω) is the rotation (deformation) factor, µ = µ(x, y) ∈ L^∞(Ω) is the contrast factor, and e_i = e_i(x, y) ∈ L^∞(Ω), i = 1, 2, are the roughness factors. For the problem to be elliptic and well-posed, we require that µe₁, µe₂ > C for some uniform constant C > 0. To increase the level of difficulty in solving this elliptic PDE, we choose e₁ and e₂ to be highly oscillatory and let µ vary from O(1) to O(10^6) (high contrast). More precisely, e₁, e₂ are generated with extreme roughness as e_i(x, y) = 1 + w_i(x, y), i = 1, 2, where for each point (x, y) we take w₁(x, y), w₂(x, y) i.i.d. ∼ U([−0.1, 0.1]), the uniform distribution on [−0.1, 0.1]. The contrast factor µ(x, y) is generated from the background permeability shown in Figure 2.11a. θ is given by θ(x, y) = π·(x + y). The magnitude of |a₁₁(x, y)| in Ω is also plotted in Figure 2.11d as a reference.
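A coefficient field of the form (2.71) is easy to synthesize on a grid. The sketch below writes out the entries of a = M(θ) diag(µe₁, µe₂) M(θ)^T with M(θ) the rotation matrix in (2.71). The function name and the fact that the contrast field mu is passed in as an array are our own choices; the channel geometry behind mu is not specified here and must come from the caller.

import numpy as np

def coefficient_field(nx, ny, mu, seed=0):
    """Rough, rotated, high-contrast coefficient a(x, y) as in (2.71) (a sketch).
    mu : (ny, nx) array of contrast values, e.g. 1 in the background and up to 1e6
         inside the high-permeability channels."""
    rng = np.random.default_rng(seed)
    x, y = np.meshgrid(np.linspace(0, 1, nx), np.linspace(0, 1, ny))
    theta = np.pi * (x + y)                       # rotation factor
    e1 = 1.0 + rng.uniform(-0.1, 0.1, (ny, nx))   # roughness factors
    e2 = 1.0 + rng.uniform(-0.1, 0.1, (ny, nx))
    c, s = np.cos(theta), np.sin(theta)
    d1, d2 = mu * e1, mu * e2
    # entrywise product of the three matrices in (2.71)
    a11 = c**2 * d1 + s**2 * d2
    a22 = s**2 * d1 + c**2 * d2
    a12 = c * s * (d2 - d1)
    return a11, a12, a22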

We use GFEM [19] with a regular triangulation to form a finite system that is fine enough to capture the details of the background field. The basis V ⊂ R^n is the vector representation of the Galerkin nodal basis, and the PD matrix A is the stiffness matrix of the nodal basis with respect to the energy inner product ∫_Ω (∇·)^T a (∇·) dxdy. The energy decomposition E = {E_k}_{k=1}^m is the collection of all patch-wise stiffness


matrices E_k on the triangles τ_k. Specifically, each E_k has the form

E_k = [ w_{1k,1k} w_{1k,2k} w_{1k,3k} ; w_{2k,1k} w_{2k,2k} w_{2k,3k} ; w_{3k,1k} w_{3k,2k} w_{3k,3k} ], w_{ik,jk} = ∫_{τ_k} (∇φ_{ik})^T a (∇φ_{jk}) dxdy, i, j = 1, 2, 3,

where φ_{ik}, i = 1, 2, 3, are the three nodal basis functions surrounding τ_k. In this case, every finest energy element involves three functions, which generalizes the concept of graph edges mentioned before. One should also notice that for a triangle τ_k touching the boundary of Ω, the corresponding E_k reduces to involve only two functions or one function, since nodal basis functions on the boundary are not needed for homogeneous Dirichlet boundary conditions. Moreover, following the discussion in Section 2.3, we choose q = 1 since the problem (2.70) is of second order. That is, we only construct one measurement function φ on each patch of the partition.
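Returning to the element matrices E_k defined above, the following minimal P1-FEM sketch computes one such 3 × 3 element for a single triangle. The thesis does not specify the quadrature rule; here the coefficient a is treated as constant on the triangle (one-point rule), which is an assumption made purely for illustration.

import numpy as np

def p1_element_energy(vertices, a_bar):
    """Element energy matrix E_k for one triangle (a minimal P1-FEM sketch).
    vertices : (3, 2) coordinates of the triangle
    a_bar    : (2, 2) SPD coefficient tensor, treated as constant on the triangle
    Returns the 3x3 matrix with entries area * (grad phi_i)^T a_bar (grad phi_j)."""
    v = np.asarray(vertices, dtype=float)
    J = np.column_stack((v[1] - v[0], v[2] - v[0]))        # reference-to-physical map
    area = 0.5 * abs(np.linalg.det(J))
    grads_ref = np.array([[-1.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
    grads = grads_ref @ np.linalg.inv(J)                   # rows are grad phi_i
    return area * grads @ a_bar @ grads.T

E = p1_element_energy([(0, 0), (1, 0), (0, 1)], np.eye(2))
print(E.sum())   # rows sum to zero for constant functions -> prints ~0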

Resolution | Partition | #P | ε(P)² | δ(P) | κ(A_st) | δ(P)‖A^{-1}‖₂ | max_j δ(P_j)ε(P_j)²
200 × 200 | Ours    | 1396 | 9.99 × 10^-5 | 3.87 × 10^6  | 4.80 × 10^3 | 1.15 × 10^4 | 233.31
200 × 200 | Regular | 1156 | 5.77 × 10^-5 | 2.53 × 10^11 | 2.53 × 10^8 | 7.49 × 10^8 | 454.70
400 × 400 | Ours    | 1199 | 9.99 × 10^-5 | 4.05 × 10^6  | 3.12 × 10^3 | 1.11 × 10^4 | 277.60
400 × 400 | Regular | 1156 | 5.05 × 10^-5 | 3.14 × 10^11 | 2.26 × 10^8 | 8.59 × 10^8 | 1.06 × 10^3

Table 2.1: Comparison with the uniform regular partitioning for the 200 × 200 and 400 × 400 resolutions.

In this example, we compare our partitioning technique with regular partitioning to illustrate the adaptivity of our algorithm to the features of the coefficient a(x, y). Here we consider two different resolutions, namely regular triangulations with 200 × 200 and 400 × 400 vertices respectively. We set q = 1, the prescribed accuracy ε² = 10^-4, and the upper bound c = 300. We apply Algorithm 1 to compute the compressed operator in both cases. For the regular case, we use a uniform and regular partition of Ω that achieves the prescribed accuracy (i.e., ε(P, 1)² < 10^-4). Notice that in this case the first eigenvector Φ_j on every patch is a constant. Such a choice of Φ, along with the use of a regular partition, is equivalent to the setup in [53, 88]. Therefore, the only modification in this numerical example is the adaptive construction of the partition P, which remarkably improves the behavior of the compressed operator. We also remark that in the case without high-contrast channels, Algorithm 4 coherently gives a regular partitioning of the domain, as conventional partition methods do.

Table 2.1 summarizes the partitioning results in both cases. Under regular partitioning, we have #P = 1156 and each P_j has a size of at most 6 × 6 (Figure 2.11b) and


(a) Distribution of µ(x, y) (b) 200 × 200 regular (c) 400 × 400 regular

(d) |a11(x, y) | for (x, y) ∈ Ω (e) 200 × 200 proposed (f) 400 × 400 proposed

Figure 2.11: Partitioning results for the operator stemming from the FEM discretization of an elliptic PDE with high-contrast coefficients. (a) shows the distribution of the high-contrast factor in the domain Ω. (b) and (c) show the regular partitions at resolutions 200 × 200 and 400 × 400 respectively. (d) shows the value of a₁₁(x, y) on Ω. (e) and (f) show the partition results obtained from Algorithm 4.

12 × 12 (Figure 2.11c) vertices respectively. Notably, the condition factors in both cases go up to 10^11 and the corresponding true condition numbers κ(A_st) are of the order of 10^8, which shows that such a partition produces an ill-conditioned compressed operator. Using our approach, the square error factors ε(P, 1)² achieved in both cases are strictly bounded above by the prescribed accuracy ε² = 10^-4, and δ(P, 1) is only of the order of 10^6. Indeed, the true compression error is even smaller, as the square error factor ε(P, 1)² is only the theoretical upper bound required in Proposition 2.2.2. Furthermore, the true condition numbers κ(A_st) are of the order of 10^3 (compared to 10^8 for regular partitioning), which again is bounded by (and is much smaller than) δ(P, 1)‖A^{−1}‖₂, as stated in Theorem 2.2.25. Moreover, the patch number #P is comparable to that of the regular partitioning. Also notice that max_{P_j∈P} δ(P_j, 1)·ε(P_j, 1)² < c = 300 in both cases, which is coherent with the prescribed requirement. These results illustrate the consistency between the numerical results and the theoretical findings in the previous sections. The partition results obtained by Algorithm 4 are shown in Figure 2.11e and Figure 2.11f


respectively.


Figure 2.12: Samples of the localized basis functions.

We remark that the huge difference in the condition numbers is caused by the non-adaptivity of the regular partitioning to the given operator A. Specifically, if a patch is fully covered by a high-contrast channel region, the accuracy achieved by this patch is very good, but the corresponding patch-wise condition factor jumps up to the order of the contrast. In other words, patches that are fully covered by regions of high contrast should be avoided. As shown in Figure 2.11e and Figure 2.11f, our proposed Algorithm 4 automatically extracts the intrinsic geometric information of the operator (namely the distribution of the high-contrast regions) and prevents patches from being fully enclosed in the high-contrast regions.

We also consider the profiles, log-profiles, and supports of three localized functions ψ̃_1, ψ̃_2, ψ̃_3 obtained by Algorithm 3 with prescribed localization error ε²_loc = 10^-4 in the 200 × 200 resolution case. The plots are shown in Figure 2.12. We can see


Figure 2.13: Samples of the localized basis functions from a regular partition.

in Figure 2.12a that ψ̃_1 has multiple peaks (three peaks, in fact), with one high-contrast channel cutting through its support. This means that this single function characterizes a local feature of the operator. Though the exponential decay is still evident, to achieve the prescribed localization error some local functions (ψ̃_1, ψ̃_3) have to extend along the high-permeability channels and thus end up with relatively large supports. Recall that Remark 2.2.22 implies that the decay rate of ψ_i, and thus the radius of ψ̃_i, is invariant under local scaling (contrast scaling). But higher permeability means stronger connectivity and consequently larger patch sizes, and therefore ψ̃_i extends farther in physical distance along high-permeability channels. As a limitation of our approach, this long-range decay compromises the sparsity of the localized basis and of the stiffness matrix, which is an issue that we plan to resolve in future work. As a comparison, Figure 2.13 plots the log-profiles and the supports of three localized functions ψ̃_1^re, ψ̃_2^re, ψ̃_3^re obtained similarly but with the regular partition in the 200 × 200 resolution case. We can see that some localized functions under the regular partition also have relatively large support sizes. However, such long-distance extension is not the result of large patch sizes (since the regular partition has uniform patch size), but of the large condition factor δ(P, 1). Recall that to achieve a desired localization error, the radius of a localized basis function is also affected by log δ(P, 1).


Chapter 3

MULTIRESOLUTION MATRIX DECOMPOSITION

In this chapter, we extend our preceding operator compression method to a MMD framework. Our MMD method is the product of a hierarchical partitioning procedure and an associated construction of a nested basis structure, which passes the energy decomposition from the finest level to the coarsest level. It resolves the large condition number of a PD matrix by decomposing the matrix into many well-conditioned pieces. This MMD scheme naturally leads to a parallelizable fast linear solver for PD linear systems.

In Section 3.1, we systematically explain how to generalize our operator compression method to a MMD framework based on a hierarchically inherited energy decomposition. In Section 3.2, we propose the concept of localization of the MMD and the inherited locality of the compressed operator. These essential ingredients guide us in developing a parallelizable solver with nearly linear time complexity for large, sparse PD linear systems. We further discuss our MMD framework in Section 3.3 from the perspective of multilevel operator compression. Section 3.4 is dedicated to several graph Laplacian examples that illustrate the effectiveness of our MMD approach in resolving the large condition number of a PD matrix. Error estimates and numerical results are reported to show the efficacy of the proposed algorithm.

3.1 Multiresolution Matrix Decomposition (MMD)

One can see that what we have been doing with operator compression is essentially to truncate the microscopic/fine-scale part of A while preserving the necessary macroscopic/coarse-scale part that dominates the accuracy. Meanwhile, the condition number of the compressed operator also drops to a level consistent with the prescribed accuracy. This consistency inspires us to perform the compression procedure hierarchically, in order to separate the operator into multiple scales of resolution rather than just two.

Now suppose that, instead of just compressing the inverse A^{−1}, we want to solve the problem Ax = b. Due to the sparsity of A, a straightforward idea is to employ iterative methods. But these methods suffer from the large condition number of A. Alternatively, we would like to use the energy decomposition of A, and


the locality (i.e., sparsity) of the energy decomposition, to resolve the difficulty of the large condition number. The main idea is to decompose the computation of A^{−1} into hierarchical resolutions such that (i) the relative condition number on each scale/level can be well bounded, and (ii) the sub-system to be solved on each level is as sparse as the original A. Here we will make use of the partition P and the bases Φ and Ψ obtained in Section 2.2 and Section 2.3 to serve the purpose of multiresolution operator decomposition. In the following, we first implement a one-level decomposition.

3.1.1 One-level Operator Decomposition

Let P, Φ, Ψ and U be constructed as in Algorithm 1, namely

(i) P = {P_j}_{j=1}^M is a partition of V;

(ii) Φ = [Φ_1, Φ_2, ..., Φ_M] such that every Φ_j ⊂ span{P_j} has dimension q_j;

(iii) Ψ = A^{−1}Φ(Φ^T A^{−1}Φ)^{−1};

(iv) U = [U_1, U_2, ..., U_M] such that every U_j ⊂ span{P_j} has dimension dim(U_j) = dim(span{P_j}) − q_j and satisfies Φ_j^T U_j = 0.

Then [U, Ψ] forms a basis of R^n, and we have

U^T A Ψ = U^T Φ (Φ^T A^{−1} Φ)^{−1} = 0. (3.1)

Thus the inverse of A can be written as

A^{−1} = [U, Ψ] ( [U, Ψ]^T A [U, Ψ] )^{−1} [U, Ψ]^T = U (U^T A U)^{−1} U^T + Ψ (Ψ^T A Ψ)^{−1} Ψ^T, (3.2)

where the second equality uses (3.1).
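Identity (3.2) is easy to verify numerically. The dense sketch below does exactly that for a small random SPD matrix: it builds Ψ from Φ, takes any orthonormal complement U of Φ, and checks that the two pieces add up to A^{−1}b. This is a consistency check only (it uses dense global solves), not the fast solver developed in this chapter, and all names are illustrative.

import numpy as np

def solve_via_decomposition(A, Phi, U, b):
    """Apply the one-level identity (3.2) to solve A x = b (dense sketch)."""
    AinvPhi = np.linalg.solve(A, Phi)
    Psi = AinvPhi @ np.linalg.inv(Phi.T @ AinvPhi)
    x_coarse = Psi @ np.linalg.solve(Psi.T @ A @ Psi, Psi.T @ b)   # A_st part
    x_fine = U @ np.linalg.solve(U.T @ A @ U, U.T @ b)             # B_st part
    return x_coarse + x_fine

rng = np.random.default_rng(1)
n, N = 12, 4
B = rng.standard_normal((n, n)); A = B @ B.T + n * np.eye(n)
Phi, _ = np.linalg.qr(rng.standard_normal((n, N)))
U = np.linalg.svd(Phi.T)[2][N:].T          # orthonormal complement of Phi
b = rng.standard_normal(n)
print(np.allclose(solve_via_decomposition(A, Phi, U, b), np.linalg.solve(A, b)))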

In the following we denote A_st = Ψ^T A Ψ and B_st = U^T A U. We also use the phrase "solving A^{−1}" to mean "solving A^{−1}b for any b". From (3.2), we observe that solving A^{−1} is equivalent to solving A_st^{−1} and B_st^{−1} separately. For B_st, notice that since the space/basis U is constructed locally with respect to each patch P_j, B_st inherits the sparsity of A if A is local/sparse. Thus it will be efficient to solve B_st^{−1} using iterative methods if the condition number of B_st is bounded. In the following, we introduce Lemma 3.1.1, which provides an upper bound on κ(B_st) that ensures the efficiency of solving B_st^{−1}. The proof of the


lemma imitates the proof from Theorem 10.9 of [89], where the required condition(2.18) corresponds to Equation (2.3) in [89].

Lemma 3.1.1. If Φ satisfies condition (2.18) with constant ε, then

λ_max(B_st) ≤ λ_max(A) · λ_max(U^T U), λ_min(B_st) ≥ (1/ε²) · λ_min(U^T U), (3.3)

and thus

κ(B_st) ≤ ε² · λ_max(A) · κ(U^T U). (3.4)

Proof. For λ_max(B_st), we have

λ_max(B_st) = ‖B_st‖₂ = ‖U^T A U‖₂ ≤ ‖A‖₂ ‖U‖₂² = ‖A‖₂ ‖U^T U‖₂ = λ_max(A) λ_max(U^T U).

For λ_min(B_st), since Φ satisfies condition (2.18) with constant ε and Φ^T U = 0, we have

‖x‖₂² ≤ (1/λ_min(U^T U)) x^T U^T U x ≤ (ε²/λ_min(U^T U)) x^T U^T A U x = (ε²/λ_min(U^T U)) x^T B_st x,

thus λ_min(B_st) ≥ (1/ε²) λ_min(U^T U).

We shall have some discussions on the bound ε2 · λmax(A) · κ(UTU) separately intotwo parts, namely (i) ε2 · λmax(A); and (ii) κ(UTU). Notice that UTU is actuallyblock-diagonal with blocks UT

j Uj , therefore

κ(UTU) =λmax(UTU)λmin(UTU)

=max1≤ j≤M λmax(UT

j Uj )

min1≤ j≤M λmin(UTj Uj )

. (3.5)

In other words, we can bound κ(UTU) well by choosing proper Uj for each Pj . Forinstance, if we allow any kind of local computation on Pj , we may simply extendΦ j to an orthonormal basis of spanPj to get Uj by using QR factorization [102].In this case, we have κ(UTU) = 1.

For the part ε2λmax(A), recall that we construct Φ based on a partition P and aninteger q so thatΦ satisfies condition (2.18) with constant ε(P, q), thus the posteriorbound of κ(Bst) is ε(P, q)2λmax(A)(when κ(UTU) = 1). Recall that κ(Ast) isbounded by δ(P, q)‖A−1‖2, therefore κ(Ast)κ(Bst) ≤ ε(P, q)2δ(P, q)κ(A). That is,Ast and Bst divide the burden of the large condition number of Awith an amplification

Page 82: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

72

factor ε(P, q)2δ(P, q). We call κ(P, q) , ε(P, q)2δ(P, q) the qth-order conditionnumber of the partition P. This explains why we attempt to bound κ(P, q) in theconstruction of the partition P.

Ideally, we hope the one-level operator decomposition gives κ(Ast) ≈ κ(Bst), sothat the two parts equally share the burden in parallel. But such result may notbe good enough when κ(Ast) and κ(Bst) are still large. To fully decompose thelarge condition number of A, a simple idea is to recursively apply the one-leveldecomposition. That is, we first set a small enough ε to sufficiently bound κ(Bst);then if κ(Ast) is still large, we apply the decomposition to A−1st again to furtherdecompose κ(Ast). However, the decomposition of A−1 is based on the constructionof P and Φ, namely on the underlying energy decomposition E = Ek

mk=1 of A.

Hence, we have to construct the corresponding energy decomposition of Ast beforewe implement the same operator decomposition on A−1st .

3.1.2 Inherited Energy DecompositionLet E = Ek

mk=1 be the energy decomposition of A, then the inherited energy

decomposition of Ast = ΨT AΨ with respect to E is simply given by EΨ = EΨ

k mk=1

whereEΨ

k = ΨT EkΨ, k = 1, 2, · · · ,m. (3.6)

Notice that this inherited energy decomposition of Ast with respect to E has the samenumber of energy elements as E, which is not preferred and actually redundantin practice. Therefore we shall consider reducing the energy decomposition ofAst. Indeed we will use Ψ instead of Ψ in practice, where each ψi is some localapproximator of ψi (obtained by Construction 2.2.9). Specifically, we will actuallydeal with Ast = Ψ

T AΨ and thus we shall consider to find a proper condensed energydecomposition of Ast.

If we see Ast as a matrix with respect to the reduced space RN , then for any vectorx = x1, . . . , xN ∈ R

N , the connection x ∼ EΨ between x and some EΨ = ΨT EΨ

comes from the connection between E and those ψi corresponding to nonzero xi,and such connections are the key to constructing a partition Pst for Ast. Recall thatthe support of each ψi(Sk (Pji ) for some k) is a union of patches, there is no needto distinguish among energy elements interior to the same patch when we deal withthe connections between these elements and the basis Ψ. Therefore we introducethe reduced inherited energy decomposition of Ast = Ψ

T AΨ as follows:

Definition 3.1.2 (Reduced inherited energy decomposition). With respect to the

Page 83: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

73

underlying energy decomposition E of A, the partition P and the correspondingΨ, the reduced inherited energy decomposition of Ast = Ψ

T AΨ is given by EΨre =

AΨPjMj=1 ∪ E

Ψ : E ∈ EcP with

AΨPj= ΨT APj

Ψ, j = 1, 2, · · · , M, and (3.7)

EΨ = ΨT EΨ, ∀E ∈ EcP, (3.8)

where EcP= E\EP with EP = E ∈ E : ∃Pj ∈ P s.t. E ∈ Pj .

Once we have the underlying energy decomposition of Ast (or Ast), we can repeatthe procedure to decompose A−1st (or A−1st ) in RN as what we have done to A−1 in Rn.We will introduce the multi-level decomposition of A−1 in the following subsection.

3.1.3 Multiresolution Matrix DecompositionLet A(0) = A, and we construct A(k), B(k) recursively from A(0). More precisely,let E (k−1) be the underlying energy decomposition of A(k−1), and P (k), Φ(k), Ψ(k)

and U (k) be constructed corresponding to A(k−1) and E (k−1) in space RN (k−1) , whereN (k−1) is the dimension of A(k−1). We use one-level operator decomposition todecompose (A(k−1))−1 as

(A(k−1))−1 =U (k) ((U (k))T A(k−1)U (k))−1(U (k))T

+ Ψ(k) ((Ψ(k))T A(k−1)Ψ

(k))−1(Ψ(k))T,

and then define A(k) = (Ψ(k))T A(k−1)Ψ(k), B(k) = (U (k))T A(k−1)U (k), and E (k) =

(E (k−1))Ψ(k )

re as in Definition 3.1.2. Moreover, if we write

Φ(1) = Φ(1), Φ(k) = Φ(1)Φ

(2) · · ·Φ(k−1)Φ

(k), k ≥ 1, (3.9a)

U (1) = U (1), U (k) = Ψ(1)Ψ

(2) · · ·Ψ(k−1)U (k), k ≥ 1, (3.9b)

Ψ(1) = Ψ(1), Ψ(k) = Ψ(1)Ψ

(2) · · ·Ψ(k−1)Ψ

(k), k ≥ 1, (3.9c)

then one can prove by induction that for k ≥ 1,

A(k) = (Ψ(k))T AΨ(k) =((Φ(k))T A−1Φ(k))−1, B(k) = (U (k))T AU (k),

(Φ(k))TΦ(k) = (Φ(k))TΨ(k) = IN (k ), Ψ(k) = A−1Φ(k) ((Φ(k))T A−1Φ(k))−1,and for any integer K ,

A−1 = (A(0))−1

=

K∑k=1U (k) ((U (k))T AU (k))−1(U (k))T + Ψ(K ) ((Ψ(K ))T AΨ(K ))−1(Ψ(K ))T .

(3.10)

Page 84: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

74

Remark 3.1.3.

- One shall notice that the partition P (k) on each level k is not a partitionof the whole space Rn, but a partition of the reduced space RN (k−1) , andΦ(k),Ψ(k),U (k) are all constructed corresponding to this P (k) in the samereduced space. Intuitively, if the average patch sizes (basis number in apatch) for partition P (k) is s(k), then we have N (k) =

q(k )

s(k ) N (k−1), where q(k)

is the integer for constructing Φ(k).

- Generally, methods of multiresolution type use nested partitions/meshes thatare generated only based on the computational domain [8, 123]. But herethe nested partitions are replaced by level-wisely constructed ones whichare adaptive to A(k) on each level and require no a priori knowledge of thecomputational domain/space.

- In the Gamblet setting introduced in [88], equations (3.9) together with (3.10)can be viewed as the Gamblet Transform.

The multiresolution operator decomposition up to a level K is essentially equivalentto a decomposition of the whole space Rn [88] as

Rn = U (1) ⊕ U (2) ⊕ · · · ⊕ U (K ) ⊕ Ψ(K ),

where again we also useU (k)( or Ψ(k)) to denote the subspace spanned by the basisU (k)( or Ψ(k)). Due to the A-orthogonality between these subspaces, using thisdecomposition to solve A−1 is equivalent to solving A−1 in each subspace separately(or more precisely solving (B(k))−1, k = 1, · · · , K , or (A(K ))−1), and by doing so wedecompose the large condition number of A into bounded pieces as the followingcorollary states.

Corollary 3.1.4. If on each level Φ(k) is given by Construction 2.2.6 with integerq(k), then for k ≥ 1 we have

λmax(A(k)) ≤ δ(P (k), q(k)), λmin(A(k)) ≥ λmin(A),

λmax(B(k)) ≤ δ(P (k−1), q(k−1))λmax((U (k))TU (k)),

λmin(B(k)) ≥1

ε(P (k), q(k))2λmin

((U (k))TU (k)),

and thusκ(A(k)) ≤ δ(P (k), q(k))‖A−1‖2,

Page 85: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

75

κ(B(k)) ≤ ε(P (k), q(k))2δ(P (k−1), q(k−1))κ((U (k))TU (k)) .

For consistency, we write δ(P (0), q(0)) = λmax(A(0)) = λmax(A).

Proof. These results follow directly from Theorem 2.2.25 and Lemma 3.1.1.

Remark 3.1.5. The fact λmin(B(k)) & 1ε(P (k ),q(k ) )2 implies that the level k is a level

with resolution of scale no greater than ε(P (k), q(k)), namely the space U (k) is asubspace of the whole space Rn of scale finer than ε(P (k), q(k)) with respect to A.This is essentially what multiresolution means in this decomposition.

Now we have a multiresolution decomposition of A−1, the applying of A−1 (namelysolving linear system Ax = b)) can break into the applying of (B(k))−1 on each leveland the applying of (A(K ))−1 on the bottom level. In what follows, we always assumeκ((U (k))TU (k)) = 1. Then the efficiency of the multiresolution decomposition inresolving the difficulty of large condition number of A lies in the effort to boundeach ε(P (k), q(k))2δ(P (k−1), q(k−1)) so that B(k) has a controlled spectrum widthand can be efficiently solved using the CG type method. Define κ(P (k), q(k)) =ε(P (k), q(k))2δ(P (k), q(k)) and γ (k) =

ε(P (k ),q(k ) )ε(P (k−1),q(k−1) ) , then we can write

κ(B(k)) ≤ ε(P (k), q(k))2δ(P (k−1), q(k−1)) = (γ (k))2κ(P (k−1), q(k−1)). (3.11)

The partition condition number κ(P (k), q(k))) is a level-wise information only con-cerning the partition P (k). Similar to what we do in Algorithm 4, we will impose auniform bound c in the partitioning process so that κ(P (k), q(k))) ≤ c on every level.The ratio γ (k) reflects the scale gap between level k−1 and k, which is why it shouldmeasure the condition number(spectrum width) of B(k). However, it turns out thatthe choice of γ (k) is not arbitrary, and it will be subject to a restriction derived outof concern of sparsity.

So far the A(k) and B(k+1) are dense for k ≥ 1 since the basisΨ(k) are global. It wouldbe pointless to bound the condition number of B(k) if we cannot take advantage of thelocality/sparsity of A. So in practice, the multiresolution operator decomposition isperformed with localization on each level to ensure locality/sparsity. Thus we havethe modified multiresolution operator decomposition in the following subsection.

3.2 MMD with LocalizationLet A(0) = A, and we construct A(k), B(k) recursively from A(0). More precisely,let E (k−1) be the underlying energy decomposition of A(k−1), and P (k), Φ(k), Ψ(k)

Page 86: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

76

and U (k) be constructed corresponding to A(k−1) and E (k−1) in space RN (k−1) . Wedecompose ( A(k−1))−1 as

( A(k−1))−1 =U (k) ((U (k))T A(k−1)U (k))−1(U (k))T

+ Ψ(k) ((Ψ(k))T A(k−1)Ψ

(k))−1(Ψ(k))T .(3.12)

Let Ψ(k) be a local approximator of Ψ(k). Then we define

A(k) = (Ψ(k))T A(k−1)Ψ

(k), B(k) = (U (k))T A(k−1)U (k), (3.13)

and E (k) = (E (k−1))Ψ(k )

re as in Definition 3.1.2.

Similar to Corollary 3.1.4, we have the following estimates on the condition numbersof A(k) and B(k):

Corollary 3.2.1. If on each level Φ(k) is given by Construction 2.2.6 with integerq(k), and Ψ(k) is a local approximator of Ψ(k) subject to localization error ‖ψ (k)

i −

ψ (k)i ‖A(k−1) ≤

ε√N (k )

, then for k ≥ 1 we have

λmax( A(k)) ≤(1 +

ε√δ(P (k), q(k))

)2δ(P (k), q(k)), λmin( A(k)) ≥ λmin(A),

λmax(B(k)) ≤(1 +

ε√δ(Pk−1), q(k−1))

)2δ(P (k−1), q(k−1))λmax

((U (k))TU (k)),

λmin(B(k)) ≥1

ε(P (k), q(k))2λmin

((U (k))TU (k)),

and thusκ( A(k)) ≤

(1 +

ε√δ(P (k), q(k))

)2δ(P (k), q(k))‖A−1‖2,

κ(B(k)) ≤(1+

ε√δ(P (k−1), q(k−1))

)2ε(P (k), q(k))2δ(P (k−1), q(k−1))κ

((U (k))TU (k)) .

For consistency, we write δ(P (0), q(0)) = λmax( A(0)) = λmax(A).

Proof. These results follow directly from the proof of Theorem 2.2.25, Corol-lary 2.2.26 and Lemma 3.1.1.

One can indeed prove that δ(P (k), q(k)) ≥ 1ε(P (k ),q(k ) )2 , and thus ε√

δ(P (k−1),q(k−1) )≤

εε(P (k), q(k)) is a small number. Therefore Corollary 3.2.1 states that the mul-tiresolution decomposition with localization has estimates on condition numbers of

Page 87: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

77

the same order as in Corollary 3.1.4, i.e. κ( A(k)) ≤ O(δ(P (k), q(k))‖A−1‖2) andκ(B(k)) ≤ O(ε(P (k), q(k))2δ(P (k−1), q(k−1))). Having this in hand, we proceed todiscuss the desired sparsity of A(k) and B(k).

Locality Preservation: Similar to the locality discussion of Ast in Section 2.3,under the locality condition (2.63), we have the following recursive estimate on thenumber of nonzero entries of each A(k) as

nnz( A(k)) = O(nnz( A(k−1)) · (q(k))2 ·1

s(k) · (r(k))d), (3.14)

where s(k) is the average patch size of P (k), and r (k) is the decay radius of Ψ(k).Also, noticing that B(k) = (U (k))T A(k−1)U (k) and that the basisU (k) are local vectorsof support size s(k), we have

nnz(B(k)) = O(nnz( A(k−1)) · s(k)). (3.15)

In fact, the basisU (k) can be computed fromΦ(k) using the implicit QR factorization[102], and thus the matrix multiplication with respect to U (k) can be done by usingthe Householder vectors in time linear to q(k) · N (k). Therefore, when we evaluateB(k) = (U (k))T A(k−1)U (k) (in iterative method), only the NNZ of A(k−1) matters.In brief, we need to preserve the locality of A(k) down through all the levels toensure the efficiency of the multiresolution decomposition with localization. Butthe accumulation of the factor (q(k ) )2(r (k ) )d

s(k ) , if not well controlled, will compromisethe sparsity inherited from A(0) = A. Therefore a necessary condition for thedecomposition to keep sparsity is

o(s(k)) ≥ (q(k))2(r (k))d, k ≥ 1,

under which we have the sparsity estimate nnz( A(k)) = O(nnz(A)). In particular,when we impose the localization error ‖ψ (k)

i −ψ(k)i ‖A(k−1) ≤

ε√N (k )

on each level k forsome uniform ε , we have r (k) = O(log 1

ε + log N (k) + log δ(P (k), q(k))) accordingthe discussions in Section 2.2.2. Then the sparsity condition becomes

o(s(k)) ≥ (q(k))2(log

1ε+ log N (k) + log δ(P (k), q(k))

)d, k ≥ 1. (3.16)

This lower bound of the patch size s(k) means that we need to compress enoughdimensions from higher level to lower level in order to preserve sparsity due to theoutreaching support of the localized basis Ψ(k).

Page 88: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

78

In practice, we will choose some ε smaller than the top level scale ε(P (1), q(1)) anda uniform integer q. By imposing uniform condition bound κ(P (k), q(k)) ≤ c wehave δ(P (k), q(k)) ≤ c

ε(P (k ),q(k ) )2 ≤cε2. Therefore a safe uniform criterion for patch

size s(k) isO(s(k)) = s = q2(log

1ε+ log n)d+l, (3.17)

for some small l > 0, which asymptotically, when n goes large and ε goes small,will ensure nnz(A(k)) = O(nnz(A)) down through the decomposition. Since thedecomposition should stop when N (K ), the dimension of A(K ) is small enough,namely when n = O((s/q)K ), (3.17) also gives us an estimate of the total levelnumber as

K = O(logs/q n) = O( log n

log(q(log 1ε + log n)d+l )

)= O

( log n

log(log 1ε + log n)

). (3.18)

Choice of scale ratio γRecall that the partition P (k) is a partition of basis in the space RN (k−1) . By tracingback to the top level, we can also see it as a partition in the original space Rn.Denoting R(k) to be the average radius (with respect to adjacency defined by A)and S(k) to be the average patch size of the patches (with respect to Rn) in P (k),we have S(k) = O((R(k))d) under the locality condition (2.63), and an intuitivegeometry estimate gives S(k )

S(k−1) = s(k). As a consequence, under the local energydecomposition condition (2.64) of order (q, p), we have the following estimate

γ (k) =ε(P (k), q)ε(P (k−1), q)

= O(( R(k)

R(k−1)

) p) = O(

( S(k)

S(k−1)

) pd ) = O((s(k))

pd ). (3.19)

Such estimate arises naturally in a lot of PDE problems, especially when the smallesteigenvalues of local operators have clear dependence on the domain size, the dimen-sion of the space and the order of the equation [53]. Under the sparsity condition(3.16) and considering q as a constant, we require

o(γ (k)) ≥ (log1ε+ log N (k) + log δ(P (k), q(k)))p, (3.20)

to ensure the sparsity of the decomposition, and similarly a safe, uniform choice ofthe scale ratio γ (k) is

γ (k) = γ = (log1ε+ log n)p+l, (3.21)

for some small l > 0. Such choice provides a uniform bound on the conditionnumber of B(k) as

κ(B(k)) ≤ O(ε(P (k), q(k))2δ(P (k−1), q(k−1))) ≤ O((log1ε+ log n)p+l ) (3.22)

Page 89: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

79

when a uniform condition bound κ(P (k), q(k)) ≤ c is imposed by algorithm. Noticethat the ratio γ (k) is only defined for k ≥ 2, thus the estimate (3.22) is valid fork ≥ 2. For consistency, we choose ε(P (1), q(1))2 = O( (log 1

ε+log n)p+l

‖A‖2) so that (3.22)

is also valid for k = 1.

Remark 3.2.2. By estimate (3.22), the bound on κ(B(k)) will go to infinity whenn goes to infinity. In our construction of the multiresolution decomposition forresolving large condition number of A, we cannot asymptotically have an absoluteconstant bound for κ(B(k)) on all levels, due to the required preservation of sparsity.This difficulty comes from the inductive nature of the algorithm that the posteriorestimate of the sparsity of A(k) is based on the sparsity of A(k−1), as shown in (3.14).However, in [89], the existence of nested measurement function Φ is assumed a-priori before the construction of the multiresolution structure, and thus the sparsityof A(k) can be inherited directly from A(0), which avoids the accumulation of thefactor (q(k ) )2(r (k ) )d

s(k ) through levels. As a result, the sparsity of A(k) does not contradictthe uniform bound of κ(B(k)).

Error estimateUsing multiresolution operator decomposition with localization to solve A−1, theerror on each level comes from two main sources: (i) the localization error betweenΨ(k) ((Ψ(k))T A(k−1)Ψ(k))−1(Ψ(k))T and Ψ(k) ((Ψ(k))T A(k−1)Ψ(k))−1(Ψ(k))T ; (ii) theerror caused by solving (A(k))−1 =

((Ψ(k))T A(k−1)Ψ(k))−1 (or (B(k))−1) with iter-

ative type methods. To come up with an estimate of the total error, we perform astandard analysis of error accumulation in an inductive manner.

Theorem 3.2.3. Given an integer K , let Inv(A) denote the solver for A−1 usingK-level’s multiresolution operator decomposition with localization. Assume that

(i) each (B(k))−1 can be solved efficiently subject to a uniform relative error bounderrB in the sense that the solver Inv(B(k)) (as a linear operator) satisfies

‖(B(k))−1b− Inv(B(k))b‖B(k ) ≤ errB‖b‖(B(k ) )−1, ∀b ∈ RN (k ), ∀1 ≤ k ≤ K ;

(3.23)

(ii) at level K , ( A(K ))−1 can be solved efficiently subject to a relative error err(K )A

in the sense that the solver Inv( A(K )) satisfies

‖( A(K ))−1b − Inv( A(K ))b‖A(K ) ≤ err(K )A ‖b‖( A(K ) )−1, ∀b ∈ RN (K )

. (3.24)

Page 90: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

80

(iii) each Ψ(k) satisfies the localization approximation property

‖ψ (k)i − ψ (k)

i ‖A(k−1) ≤errloc

2√

N (k) ‖A−1‖2, 1 ≤ i ≤ N (k), (3.25)

with a uniform constant errloc.

Then we have

‖A−1b − Inv(A)b‖A ≤ errtotal ‖b‖A−1, ∀b ∈ Rn,

and in ‖ · ‖2,

‖A−1b − Inv(A)b‖2 ≤ errtotal ‖A−1‖2‖b‖2, ∀b ∈ Rn.

whereerrtotal = K (errB + errloc) + err(K )

A .

Proof. First by assumption (ii), we have

‖( A(K ))−1b − Inv( A(k))b‖A(K ) ≤ err(K )A ‖b‖( A(K ) )−1, ∀b ∈ RN (K )

.

To perform induction, we assume that at level k, ( A(k))−1 can be solved subject to arelative error err(k)

A in the sense that the solver Inv( A(k)) satisfies

‖( A(k))−1b − Inv( A(k))b‖A(k ) ≤ err(k)A ‖b‖( A(k ) )−1, ∀b ∈ RN (k )

.

Recall that ( A(k−1))−1 and the solver Inv( A(k−1)) are given by

( A(k−1))−1 = U (k) ((U (k))T A(k−1)U (k))−1(U (k))T+Ψ(k) ((Ψ(k))T A(k−1)Ψ

(k))−1(Ψ(k))T,

(3.26)

Inv( A(k−1)) = U (k)Inv(B(k))(U (k))T + Ψ(k)Inv( A(k))(Ψ(k))T, (3.27)

then for any b ∈ RN (k−1) , we have

‖( A(k−1))−1b − Inv( A(k))b‖A(k−1)

≤ ‖U (k) ((U (k))T A(k−1)U (k))−1(U (k))T b −U (k)Inv(B(k))(U (k))T b‖A(k−1)

+ ‖Ψ(k) ((Ψ(k))T A(k−1)Ψ

(k))−1(Ψ(k))T b − Ψ(k) ((Ψ(k))T A(k−1)Ψ

(k))−1(Ψ(k))T b‖A(k−1)

+ ‖Ψ(k) ((Ψ(k))T A(k−1)Ψ

(k))−1(Ψ(k))T b − Ψ(k)Inv( A(k))(Ψ(k))T b‖A(k−1)

= I1 + I2 + I3.

Page 91: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

81

Recall that A(k) = (Ψ(k))T A(k−1)Ψ(k), B(k) = (U (k))T A(k−1)U (k), then by assump-tion (i) we have

I1 = ‖(B(k))−1(U (k))T b − Inv(B(k))(U (k))T b‖B(k )

≤ errB(bTU (k) (B(k))−1(U (k))T b

) 12

≤ errB‖( A(k−1))12U (k) ((U (k))T A(k−1)U (k))−1(U (k))T ( A(k−1))

12 ‖2‖b‖( A(k−1) )−1

≤ errB‖b‖( A(k−1) )−1 .

Similarly by the assumption of induction, we have

I3 = ‖( A(k))−1(Ψ(k))T b − Inv( A(k))(Ψ(k))T b‖A(k ) ≤ err(k)A ‖b‖( A(k−1) )−1 .

Let x = ( A(k−1))−1b, then we get

I2 = ‖P A(k−1)

Ψ(k ) x − P A(k−1)

Ψ(k ) x‖A(k−1)

≤ ‖P A(k−1)

Ψ(k ) x − P A(k−1)

Ψ(k ) P A(k−1)

Ψ(k ) x‖A(k−1) + ‖P A(k−1)

Ψ(k ) P A(k−1)

U (k ) x‖A(k−1) .

Using a similar argument in Lemma 2.2.8 we can actually prove by assumption (iii)that

‖P A(k−1)

Ψ(k ) x − P A(k−1)

Ψ(k ) P A(k−1)

Ψ(k ) x‖A(k−1) ≤12errloc‖P A(k−1)

Ψ(k ) x‖A(k−1) ≤12errloc‖b‖( A(k−1) )−1,

and

‖P A(k−1)

Ψ(k ) P A(k−1)

U (k ) x‖A(k−1)

≤ ‖((Ψ(k))T A(k−1)Ψ

(k))−1‖122 ‖(Ψ

(k))T A(k−1) P A(k−1)

U (k ) x‖2

≤ ‖A−1‖122 ‖(Ψ

(k) − Ψ(k))T A(k−1) P A(k−1)

U (k ) x‖2

≤ ‖A−1‖122 ‖(Ψ

(k) − Ψ(k))T A(k−1) (Ψ(k) − Ψ(k))‖122 ‖P

A(k−1)

U (k ) x‖A(k−1)

≤12errloc‖b‖( A(k−1) )−1,

thus I2 ≤ errloc‖b‖( A(k−1) )−1 . Finally we have

‖( A(k−1))−1b − Inv( A(k−1))b‖A(k−1) ≤ (errB + errloc + err(k)A )‖b‖( A(k−1) )−1,

that is we haveerr(k−1)

A = (errB + errloc + err(k)A ).

Then by induction the relative total error using K-level’s decomposition with local-ization for solving A−1 is

errtotal = err(0)A = K (errB + errloc) + err(K )

A .

Page 92: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

82

Remark 3.2.4. Assumption (i) is reasonable since each B(k) inherit the sparsity fromA, and its condition numbers can be well bounded in orderO((log 1

errloc + log n)p+l ).Assumption ((ii)) is reasonable since each A(K ) is of small dimension when K isas large as in (3.18). Assumption (iii) is reasonable due to the exponential decayproperty of each Ψ(k). Indeed, to ensure locality of reduced energy decomposition,the localization error control can be relaxed in practice. Such relaxed error can befixed by doing compensation computation as we will see in Section 3.4.

Figure 3.1: Process flowchart of Algorithm 6.

3.2.1 AlgorithmWe now summarize the procedure of MMD with localization as Algorithm 6, andthe use of MMD to solve linear system as Algorithm 7. Also, Figure 3.1 shows theflowchart of Algorithm 6.

Algorithm 6 MMD with localization

Input: PD matrix A = A(0), energy decomposition E = E (0), underlying basisV , localization constant ε , level number K , q(k), error factor bound ε(k) andcondition bound c(k) for each level.

Output: A(K ), Ψ(k),U (k), and B(k).1: for k = 1 : K do2: Construct P (k),Φ(k),U (k), Ψ(k) with Algorithm 1, with respect to

A(k−1), E (k−1), and subject to q(k), ε(k), c(k) and localization error ε√N (k )

;3: Compute A(k) and B(k) by (3.13);4: Compute reduced energy E (k) by (3.8);5: output/store Ψ(k),U (k), B(k);6: end for7: output/store A(K ).

Remark 3.2.5.

- Once the MMD is obtained, the first for-loop (Line 1) in Algorithm 7 can beperformed in parallel, which makes it much more efficient than nonparalleliz-able iterative methods.

- Once the whole decomposition structure is completed, we can a posterioromit the level-wise energy decompositions and partitions. Then if we see our

Page 93: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

83

Algorithm 7 Solving linear system with MMD with localization

Input: Ψ(k),U (k), B(k) for k = 1, 2, · · · , K , A(K ), load vector b = b(0), prescribedrelative accuracy ε

Output: Approximated solution x (0).1: for k = 1 : K do2: z(k) = (U (k))T b(k−1);3: Solve B(k)y(k) = z(k) up to relative error ε ;4: b(k) = (Ψ(k))T b(k−1);5: end for6: Solve A(K ) x (K ) = b(K ) up to relative error ε ;7: for k = K : 1 do8: x (k−1) = U (k)y(k) + Ψ(k) x (k);9: end for

level-wisely constructed Φ as a nested sequence (3.9a), our decompositionis structurally equivalent to the result obtained in [89], where the existenceof such nested Φ is a priori assumed. Therefore, the required properties ofthe nested sequence in Condition 2.3 of [89] are similar to the assumption inTheorem 3.2.3.

Complexity of Algorithm 6Assume that locality conditions (2.63),(2.64),(2.65) are true with constant d, p, q, c.Then all q(k) and c(k) are chosen uniformly over levels to be q, c respectively. ε(k)

is chosen subject to scale ratio choice (3.21), (ε(1))2 = (log 1ε+log n)p+l

‖A‖2for some small

l > 0, and ε is chosen so that ε ≤ ε(1). Due to the condition bound c, we havecondition number estimate (3.22). Then the complexity of Line 2 can be modifiedfrom (2.69) as

O(d · s2 · log s ·n

)+O(log s ·n · log n)+O

(q ·n · (log

1ε+log n)p+l · (log

1ε+log n)d+1),

where s = O((log 1ε + log n)d(1+l/p)) according to estimate (3.19). The complexity

of Line 3 and 4(sparse matrices multiplication) together can be bounded by

O(n · (log1ε+ log n)3d)

due to the locality of Ψ(k) and the inherited locality of A(k−1) and B(k−1). Thereforethe complexity on each level can be bounded by

O(d · s2 · log s · n

)+O(log s · n · log n) +O

(q · n · (log

1ε+ log n)3d+p), (3.28)

Page 94: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

84

where we have assumed that d ≥ 1 ≥ l. The total complexity of Algorithm 6 isthen the level number K times (3.28). By (3.18), we have K = O

( log nlog(log 1

ε+log n)

)≤

O(log n) and K log s = O(log n). Thus the total complexity of Algorithm 6 is

O(d · s2 · log n · n

)+O(n · (log n)2) +O

(K · q · n · (log

1ε+ log n)3d+p)

≤ O(m · log n · (log1ε+ log n)3d+p). (3.29)

where m = O(d · n) is the number of nonzero entries of A.

Complexity of Algorithm 7Assume that the relative accuracy ε is the same as the ε in Algorithm 6. Recall thatthe number of nonzero entries of each A(k) is bounded by O(nnz(A)) = O(m), andthe condition number each B(k) can be bounded by O((log 1

ε + log n)p+l ), then thecomplexity of solving linear system in Line 3 using a CG type method is boundedby

O(m · (log1ε+ log n)p+l · log

).

Therefore if we use a CG type method to solve all inverse problems involved inAlgorithm 7, based on the MMD with localization given by Algorithm 6, therunning time of Algorithm 7 subject to level-wise relative accuracy ε is

O(K · m · (log1ε+ log n)p+l · log

) ≤ O(m · (log1ε+ log n)p+l · log

1ε· log n).

However, by Theorem 3.2.3, the total accuracy is ε total = O(Kε ). Thus the com-plexity of Algorithm 7 subject to a total relative accuracy ε total is

O(m · (log1

ε total+ log n)p+l · (log

1ε total

+ log log n) · log n). (3.30)

3.3 Multilevel Operator CompressionWe can also consider the MMD from the perspective of operator compression. Forany K , by omitting the finer scale subspaces U (k), k = 1, 2, · · · , K , we get aneffective approximator of A−1 as

A−1 ≈ Ψ(K ) ((Ψ(K ))T AΨ(K ))−1(Ψ(K ))T = PAΨ(K ) A−1. (3.31)

Intuitively, this approximation lies above the scale of ε(P (K ), q(K )), and thereforeshould have a corresponding dominant compression error. However we should againnotice that the composite basis Φ(k) is not given a priori and directly in Rn, butconstructed level-by-level using the information of A(k) on each level, and recall that

Page 95: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

85

the error factor ε(P (k), q(k)) is computed with respect to the reduced space RN (k ) ,not to the whole spaceRn. Thus the total error of compression (3.31) is accumulatedover all levels finer than level K . To quantify such compression error, we introducethe following theorem:

Theorem 3.3.1. Assume that on each level Φ(k) is given by Construction 2.2.6 withinteger q(k). Then we have

‖x − PΦ(K ) x‖2 ≤( K∑

k=0ε(P (k), q(k))2

) 12‖x‖A, ∀x ∈ Rn, (3.32)

and thus for any x ∈ Rn and b = Ax, we have

‖x − PAΨ(K ) x‖A ≤

( K∑k=1

ε(P (k), q(k))2) 12‖b‖2,

‖x − PAΨ(K ) x‖2 ≤

( K∑k=1

ε(P (k), q(k))2)‖b‖2,

‖A−1 − PAΨ(K ) A−1‖2 ≤

( K∑k=1

ε(P (k), q(k))2).

Proof. Again by Theorem 2.2.1, we only need to prove (3.32). For consistency, wewriteΦ(0) = In, and correspondingly Ψ(0) = In, PΦ(0) = In, PA

Ψ(0) = In. Using (3.9),it is easy to check that for any x ∈ Rn and any k1 ≤ k2 ≤ k3,

(PΦ(k1) x − PΦ(k2) x)T (PΦ(k2) x − PΦ(k3) x) = 0,

thus we have

‖x − PΦ(K ) x‖22 = ‖K∑

k=1(PΦ(k−1) x − PΦ(k ) x)‖22 =

K∑k=1‖PΦ(k−1) x − PΦ(k ) x‖22 .

Notice that

PΦ(k−1) x − PΦ(k ) x = Φ(k−1) (Φ(k−1))T x −Φ(k−1)Φ

(k) (Φ(k))T (Φ(k−1))T x,

thus by the construction of Φ(k)( or Φ(k)), we have

‖PΦ(k−1) x − PΦ(k ) x‖22 = ‖(Φ(k−1))T x − Φ(k) (Φ(k))T (Φ(k−1))T x‖22

≤ ε(P (k), q(k))2‖(Φ(k−1))T x‖2A(k−1)

= ε(P (k), q(k))2xTΦ(k−1) ((Φ(k−1))T A−1Φ(k−1))−1(Φ(k−1))T x

= ε(P (k), q(k))2‖PAΨ(k−1) x‖

2A

≤ ε(P (k), q(k))2‖x‖2A.

Page 96: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

86

We have used the fact that

‖PAΨ(k ) x‖

2A = xT AΨ(k) ((Ψ(k))T AΨ(k))−1(Ψ(k))T Ax

= xTΦ(k) ((Φ(k))T A−1Φ(k))−1(Φ(k))T x, ∀k ≥ 0.

Therefore we have

‖x − PΦ(K ) x‖22 ≤( K∑

k=0ε(P (k), q(k))2

)‖x‖2A.

Remark 3.3.2.

- Though the compression error is in a cumulative form, if we assume thatε(P (k), q(k)) increases with k at a certain ratio ε(P (k ),q(k ) )

ε(P (k−1),q(k−1) ) ≥ γ for someγ > 1, then it is easy to see that

K∑k=1

ε(P (k), q(k))2 ≤γ2

γ2 − 1ε(P (K ), q(K ))2,

which is an error of scale ε(P (K ), q(K ))2 as we expected.

- Again one shall be aware of the difference between the one-level compres-sion with error factor ε(P (K ), q(K )) and the multi-level compression in Sec-tion 3.1.3. A one-level compression with error factor ε(P (K ), q(K )) requiresto construct P (K ),Φ(k) and so on directly with respect to A in the whole spaceRn, which involves solving eigenvalue problems on considerably large patchesin P (K ) when ε(P (K ), q(K )) is a coarse scale. But the multi-level compres-sion in Section 3.1.3 is computed hierarchically with bounded compressionratio between levels, and thus only involves eigenvalue problems on patchesof well-bounded size(s = O(log 1

ε + log n)d+l) in each reduced space RN (k ) ,and is thus more tractable in practice.

- One can also analyze the compression error when localization of each Ψ(k)

is considered. The analysis would be similar to the one in Theorem 3.3.1.

3.4 Numerical Example for MMDOur third numerical example shows the effectiveness of using MMD with localiza-tion to solve a graph Laplacian system. Again we use the same setup in the Example1 in Section 2.4.1, with density factor η = 2. But this time the vertices of the graph

Page 97: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

87

(a) Front view (b) Top view

Figure 3.2: A “Roll surface” constructed by (3.33).

are randomly distributed around a 2-dimensional roll surface of area 1 in R3. Thedistribution is a combination of a uniform distribution over the surface and an upto 10% random displacement off the surface. More precisely, the two-dimensionalroll is characterized as

(x(t), y(t), z) = (ρ(t) cos(θ(t)), ρ(t) sin(θ(t)), z), t ∈ [0, 1], z ∈ [0, 1],

where

θ(t) =1alog

(1 + t

(e4πa − 1

)), ρ(t) =

a√1 + a2

(t +

1e4πa − 1

),

and so√

(ρ′(t))2 + (ρ(t)θ′(t))2 = 1. Each vertex (xi, yi, zi) is generated by

(xi, yi, zi) = (ηi ρ(ti) cos(θ(ti)), ηi ρ(ti) sin(θ(ti)), zi), i = 1, 2, · · · , n, (3.33)

where tii.i.d∼ U [0, 1], zi

i.i.d∼ U [0, 1], and ηi

i.i.d∼ U [0.9, 1.1]. In this example we

take n = 10000 and a = 0.1. Figure 3.2a and Figure 3.2b show the point cloud ofall vertices. This explicit expression, however, is considered as a hidden geometricinformation, and is not employed in our partitioning algorithm.

The Laplacian L = A0 and the energy decomposition E are given as in Exam-ple 2.1.10, and we apply a 4-level multiresolution matrix decomposition with lo-calization using Algorithm 6 to decompose the problem of solving L−1. In thisparticular case we have λmax(L) = 1.93 × 107 and λmin(L) = 1. Again for graphLaplacian, we choose q = 1. On each level k, the partition is constructed sub-ject to ε(P (k), 1)2 = 10k−6 (i.e. ε(P (k), 1)24k=1 = 0.00001, 0.0001, 0.001, 0.01)and δ(P (k), 1)ε(P (k), 1)2 ≤ 50. The compliment space U (k) are extended from

Page 98: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

88

Level Size #Nonzeros Condition Number Complexity

L 10000 × 10000 128018 , m 1.93 × 107 2.47 × 1012

B(1) 1898 × 1898 9812 ≈ 0.07m 1.72 × 102 1.69 × 106

B(2) 6639 × 6639 391499 ≈ 3.06m 1.80 × 101 7.04 × 106

B(3) 1244 × 1244 417156 ≈ 3.26m 7.27 × 101 3.03 × 107

B(4) 186 × 186 34596 ≈ 0.27m 4.47 × 101 1.55 × 106

A(4) 33 × 33 1025 ≈ 0.008m 2.83 × 103 2.90 × 106

total - 854088 ≈ 6.67m - 4.34 × 107

Table 3.1: Complexity results of 4-level MMDwith localization using Algorithm 6.

(a) Spectrum (b) Complexity

Figure 3.3: Spectrum and complexity of each layer in a 4-level MMD obtained fromAlgorithm 6.

Φ(k) using a patch-wise QR factorization. So according to Corollary 3.2.1, eachκ(B(k)), k = 2, 3, 4 is expected to be bounded by

δ(P (k−1), 1)ε(P (k), 1)2 = δ(P (k−1), 1)ε(P (k−1), 1)2ε(P (k), 1)2

ε(P (k−1), 1)2≤ 500,

κ(B(1)) is expected bounded by λmax(L)ε(P (0), 1)2 = 1.93 × 102, and κ( A(4)) isexpected to be bounded by

δ(P (3), 1)λmin(L)−1 = δ(P (3), 1)ε(P (3), 1)2λmin(L)−1

ε(P (3), 1)2≤ 5000.

Since we will use a CG type method to compare the effectiveness of solving L−1

directly and using the 4-level decomposition, the complexities of both approaches areproportional to the product of the number of nonzero (NNZ) entries and the conditionnumber of the matrix concerned, given a fixed prescribed relative accuracy [102].Therefore we define the complexity of a matrix as the product of its NNZ entries and

Page 99: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

89

its condition number. Though here we use the sparsity of B(k) = (U (k))T A(k−1)U (k),in practice only the sparsity of A(k−1) matters and the matrix multiplication withrespect to U (k) can be done by using the Householder vectors from the implicit QRfactorization [102]. The results not only satisfy the theoretical prediction, but alsoturn out to be much better than expected as shown in Table 3.1 and Figure 3.3. We

Level Size #Nonzero Condition Number Complexity

L 10000 × 10000 128018 , m 1.93 × 107 2.47 × 1012

B(1) 1898 × 1898 9812 ≈ 0.07m 1.72 × 102 1.69 × 106

B(2) 6639 × 6639 391499 ≈ 3.06m 1.80 × 101 7.04 × 106

B(3) 1014 × 1014 237212 ≈ 1.85m 2.56 × 101 6.09 × 106

B(4) 313 × 313 86323 ≈ 0.67m 1.62 × 101 1.40 × 106

B(5) 114 × 114 12996 ≈ 0.10m 5.54 × 101 7.20 × 105

A(5) 22 × 22 442 ≈ 0.004m 1.60 × 103 7.07 × 105

total - 738284 ≈ 5.77m - 1.76 × 107

Table 3.2: Complexity results of a 5-levelMMDwith localization usingAlgorithm6.

(a) (b)

Figure 3.4: Spectrum and complexity of each layer in a 5-level MMD obtained fromAlgorithm 6.

now verify the performance of the 4-level and the 5-level MMDs by solving twoparticular systems Lu∗ = b, where

Case 1: u∗i = (x2i + y2i + z2i )12 , i = 1, 2, · · · , n, b = Lu∗;

Case 2: u∗i = xi + yi + sin(zi), i = 1, 2, · · · , n, b = Lu∗.

For both cases, we set the prescribed relative accuracy to be ε = 10−5 such that‖u − u∗‖L ≤ ε ‖b‖2. By Theorem 3.2.3, this accuracy can be achieved by imposing

Page 100: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

90

a corresponding accuracy control on each level’s linear system relative error (i.e.,err(K )

A and errB ) and localization error (errloc). In practice, instead of imposing ahard error control, we relax the localization error to ε(P (k), 1) (instead of ε ) in orderto ensure sparsity, which is actually how we obtain the 4-level decomposition withlocalization. Such relaxed localization error can be fixed by doing a compensationcorrection at level-0, which takes the output of Algorithm 7 as an initialization tosolve Lu = b. As shown in the gray columns “# Iteration" in Table 3.3 and Table 3.4,the number of iterations in the compensation calculation (which is the computationat Level 0) are fewer than 25 in both cases, which indicate that the localization erroron each level is still small even when relaxed.

Scale Matrix # Iteration Main Cost

Direct λ−1max (L) ∼ λ−1min (L) L 572 7.32 × 107

Level 4-level 5-level 4-lvl 5-lvl 4-lvl 5-lvl 4-lvl 5-lvl

0 10−5 ∼ λ−1max (L) 10−5 ∼ λ−1max (L) B(1) B(1) 22 22 2.15 × 105 2.15 × 105

1 10−5 ∼ 10−4 10−5 ∼ 10−4 B(2) B(2) 11 11 4.31 × 106 4.31 × 106

210−4 ∼ 10−3 10−4 ∼ 3 × 10−3 B(3) B(3) 25 14 1.04 × 107 3.32 × 106

3 × 10−3 ∼ 10−3 B(4) 14 1.21 × 106

3 10−3 ∼ 10−2 10−3 ∼ 10−2 B(4) B(5) 23 22 7.96 × 105 2.86 × 105

4 10−2 ∼ λ−1min (L) 10−2 ∼ λ−1min (L) A(4) A(5) 31 22 3.18 × 104 9.72 × 103

0 - - L L 25 30 3.20 × 106 3.84 × 106

Total - - - - 137 135 1.90 × 107 1.31 × 107

Parallel - - - - - - 1.36 × 107 8.15 × 106

Table 3.3: Case 1: Performance of the 4-level and the 5-level decompositions.

Scale Matrix # Iteration Main Cost

Direct λ−1max (L) ∼ λ−1min (L) L 586 7.50 × 107

Level 4-level 5-level 4-lvl 5-lvl 4-lvl 5-lvl 4-lvl 5-lvl

0 10−5 ∼ λ−1max (L) 10−5 ∼ λ−1max (L) B(1) B(1) 24 24 2.35 × 105 2.35 × 105

1 10−5 ∼ 10−4 10−5 ∼ 10−4 B(2) B(2) 11 11 4.31 × 106 4.31 × 106

210−4 ∼ 10−3 10−4 ∼ 3 × 10−3 B(3) B(3) 25 14 1.04 × 107 3.32 × 106

3 × 10−3 ∼ 10−3 B(4) 14 1.21 × 106

3 10−3 ∼ 10−2 10−3 ∼ 10−2 B(4) B(5) 23 23 7.96 × 105 2.99 × 105

4 10−2 ∼ λ−1min (L) 10−2 ∼ λ−1min (L) A(4) A(5) 30 22 3.08 × 104 9.72 × 103

0 - - L L 16 23 2.05 × 106 2.94 × 106

Total - - - - 129 131 1.78 × 107 1.23 × 107

Parallel - - - - - - 1.24 × 107 7.25 × 106

Table 3.4: Case 2: Performance of the 4-level and the 5-level decompositions.

In particular, we use a preconditioned CG method to solve any involved linearsystems Au = b in Algorithm 7. The precondition matrix D is chosen as the

Page 101: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

91

diagonal part of A and we take 0 (all-zeros vector) as initials if no preconditioningvector is provided. The main computational cost of a single use of the PCG methodis measured by the product of the number of iterations and the NNZ entries of thematrix involved (See the gray column “Main Cost” in both Table 3.3 and Table 3.4).

In both cases, we can see that the total computational costs of our approach areobviously reduced compared to the direct use of PCG.Moreover, since the downwardlevel-wise computation can be done in parallel, the effective computational costs ofour approach are even less, which is the sum of the maximal cost among all levelsand the cost of the compensation correction (See “Parallel" row in Table 3.3 andTable 3.4 respectively).

Further, from the results we can see that the costs among all levels in both casesare mainly concentrated on level 2, namely, the inverting of B(3). This observationimplies that the structural/geometrical details of L have more proportion on thescale corresponding to level 2 than on other scales, which is consistent to the factthat B(3) on level 2 has the largest complexity of all. Though in practice we do nothave the information in Table 3.3 and Table 3.4, we may observe the dominanceof time complexity on level 2 after numbers of calls of our solver. As a naturalimprovement, we can simply further decompose the problem at level 2. Moreprecisely, to relieve the dominance of level 2, we add one extra scale of 0.0003between the scales of 0.0001 and 0.001. Consequently, we obtain a similar 5-leveldecomposition with ε(P (k), 1)25k=1 = 0.00001, 0.0001, 0.0003, 0.001, 0.01 (SeeTable 3.2 and Figure 3.4). From the “Main Cost 5-level” columns of Table 3.3 andTable 3.4), we can see that this simple improvement of further decomposition basedon the feedback of computational results does reduce the computational cost.

Page 102: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

92

C h a p t e r 4

HIERARCHICALLY PRECONDITIONED EIGENSOLVER

In this chapter, we develop ourMMD framework into a hierarchically preconditionedeigensolver for large and sparse positive semidefinite matrices. The foundation ofour eigensolver is the celebrated Implicit Restarted Lanczos Method (IRLM), whichis integrated into a hierarchical structure consisting of two alternating processes: thelevel-wise spectrum extension process that finds new eigenpair candidates, and thecross-level spectrum refinement process that refines the computed eigenspace. In theextension process, we propose a spectral preserving preconditioner to guarantee thecomputational efficacy of each iteration. We apply theOrthogonal IterationwithRitzAcceleration to design a efficient refinement process that converges exponentiallyfast with properly chosen parameters.

The layout of the rest of this chapter is as follows: We briefly review the ImplicitlyRestarted Lanczos Method in Section 4.1. Some spectrum error analysis and per-turbation theories subject to our operator compression framework are discussed inSection 4.2. Theoretical developments and algorithms of the hierarchical spectrumextension process and the eigenpair refinement process are then proposed in Sec-tion 4.3 and Section 4.4 respectively. Combining these two procedures, we proposeour hierarchical eigensolver in Section 4.5, where details of the choice of param-eters are discussed. Section 4.6 is devoted to experimental results to justify theeffectiveness of our proposed algorithm. In Section 4.7, we provide a quantitativenumerical comparison with the conventional IRLM. The numerical results show thatour proposed algorithm gives a promising results in terms of runtime complexity.In Section 4.8, we further compare our approach to a class of existing works oneigenspace approximation via the computation of compressed eigenmodes.

4.1 Implicitly Restarted Lanczos Method (IRLM)The Arnoldi iteration is a widely used method to find eigenvalues of general asym-metric sparse matrices. It belongs to the family of Krylov subspace methods. Inthe symmetric case, we can further simplify it as the Lanczos iteration. A directapplication of Lanczos iteration gives the largest eigenvalues of an operator by cal-culating the eigenvalues of its projection on a Krylov subspace. In each step thealgorithm expands the Krylov subspace and finds an orthogonal basis of the space.

Page 103: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

93

Namely, after k steps, the factorization is

AVk = VkTk + f k eTk . (4.1)

where we recall that Tk is a tridiagonal matrix when A is symmetric. Denote (θ, y)as an eigenpair of Tk . Let x = Vk y. Then we have

‖Ax − xθ‖2 = ‖AVk y − Vk yθ‖2= ‖ f k ‖2 |eTk y |. (4.2)

Therefore θ is a good approximation of the eigenvalue of A if and only if ‖ f k ‖2 |eTk y |

is small. The latter is called the Ritz residual. An analogy to the power methodshows that, to compute the largest m eigenvalues, the convergence rate of the largestm eigenvalues of A is (λm+1/λm)k where λi is the ith largest eigenvalue of A.

The direct Lanczos method is not practical due to the fact that ‖ f k ‖2 rarely becomessmall enough until the size of Tk approaches that of A. An improvement is theIRLM [64, 109]. The IRLM employs the idea analogous to the implicitly shiftedQR-iteration [39]. With this approach, the “unwanted” eigenvalues (in this case theleftmost ones) are shifted away implicitly in each round of implicit restart, and Tk

is kept with a small size equal to the number of desired eigenvalues. This is one ofthe state-of-the-art algorithms for large-scale partial eigen problems.

Yet, it is still complicated if we want to find the leftmost eigenvalues. One possibleapproach is to use a shifted IRLM. Namely, to find eigenvalues nearest to σ, wecan replace A with (A − σI)−1 as the target operator. By taking σ = 0 we get theeigenvalues with smallest magnitude. Such approach usually converges with a fewiterations, but it requires solving A−1 in every iteration. For large sparse problems,A−1 is usually solved by the CGmethod. The complexity of CG is the complexity ofmatrix-vector product times the number of CG iterations. The former is equal to thenumber of nonzero entries of A (denoted as nnz(A)), while the latter is controlledby the condition number κ(A). Therefore, the total complexity of the shifted IRLMfor solving mtar smallest eigenvalues is

O(RIRLM · mtar · nnz(A) · κ(A)), (4.3)

where RIRLM is the number of IRLM rounds. In the following, we will develop theextension-refinement algorithm to integrate the MMD framework with the shiftedIRLM which gives considerable improvement in terms of iteration numbers of CGand PCG throughout the algorithm.

Page 104: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

94

Algorithm 8 Lanczos Iteration (p-step extension)Input: V , T , f , target operator op(·), p.Output: V , T , f .1: k = column number of V ;2: for i = 1 : p do3: β = ‖ f ‖2;4: if β < ε then5: generate a new random f , β = ‖ f ‖2;6: end if7: T ←

( TβeT

k+i−1

), v = f /β, V ← [V, v];

8: w = op(v);9: h = VTw, T ← [T, h];10: f = w − V h;11: Re-orthogonalize to adjust f ;12: end for

Algorithm 9 Inner Iteration of the IRLMInput: V , T , f , p.Output: V , T , f .1: k = column number of V ;2: Perform Algorithm 8 on V , T and f for p steps;3: Set Q = Ik+p and σ j to be the p smallest eigenvalues of T ;4: for j = 1 : p do5: T − σ j I = Q j R j ;6: T = QT

j TQ j,Q ← QQ j ;7: end for8: f ← V · Q(:, k + 1) · T (k + 1, k) + f · Q(k + p, k);9: V ← V · Q(:, 1 : k), T ← T (1 : k, 1 : k);

4.2 The Compressed Eigen ProblemIn the previous section, we introduced an effective compression technique for a PDmatrix A subject to a prescribed compression error ε . The compressed operatoris also symmetric and positive definite. Therefore, by the well-known eigenvalueperturbation theory, we know that the eigenpairs of the compressed operator can beused as good approximations for the eigenpairs of the original matrix. In particular,we have the following estimate:

Lemma 4.2.1. Let Θ = Ψ(ΨT AΨ)−1ΨT be a rank-N compressed approximation ofA−1 introduced in Theorem 2.2.1 such that ‖A−1 − Θ‖2 ≤ ε. Let µ1 ≥ µ2 ≥ · · · ≥

µn > 0 be the eigenvalues of A−1 in a descending order, and µ1 ≥ µ2 ≥ · · · ≥ µN >

Page 105: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

95

0 be the nonzero eigenvalues of Θ in a descending order. Then we have

|µi − µi | ≤ ε, 1 ≤ i ≤ N ; µi ≤ ε, N < i ≤ n.

Moreover, let vi, i = 1, · · · , N , be the corresponding normalized eigenvectors of Θsuch that Θvi = µi vi, then we have

‖A−1vi − µi vi‖2 ≤ 2ε, 1 ≤ i ≤ N .

Since the nonzero eigenvalues of Θ and the corresponding eigenvectors actually re-sult from the nonsingular stiffness matrix Ast = Ψ

T AΨ, we will call these eigenpairsthe essential eigenpairs of Θ in what follows. We will also need the followinglemma for developing our algorithms.

Lemma 4.2.2. Let ( µi, vi), i = 1, · · · , N , be the N essential eigenpairs of Θ givenin Lemma 4.2.1.

(i) Let wi = ΨT vi, then

ΨTΨA−1st wi = µiwi, 1 ≤ i ≤ N .

(ii) Let zi = Ψ†vi = (ΨTΨ)−1ΨT vi, then

A−1st ΨTΨzi = µi zi, 1 ≤ i ≤ N,

where Ast = ΨT AΨ is the stiffness matrix. Conversely, if either (i) or (ii) is true,

then ( µi, vi), i = 1, · · · , N , are eigenpairs of Θ.

Similar to Lemma 4.2.1, we have the following estimates for multiresolution de-composition.

Lemma 4.2.3. Given an integer K , let Θ(k) = Ψ(k) ((Ψ(k))T AΨ(k))−1(Ψ(k))T , k =

1, 2, · · · , K , with Ψ(k) given in (3.9). Write A−1 = Θ(0). Let (µ(k)i , v (k)

i ), i =

1, 2, · · · , N (k), be the essential eigenpairs of Θ(k) where µ(k)1 ≥ µ(k)

2 ≥ · · · ≥

µ(k)N (k ) > 0. Then for any 0 ≤ k′ < k ≤ K , we have

|µ(k ′)i − µ(k)

i | ≤ εk, 1 ≤ i ≤ N (k); |µ(k ′)i | ≤ εk, N (k) < i ≤ N (k ′),

and‖Θ(k ′)v (k)

i − µ(k ′)i v (k)

i ‖2 ≤ 2εk, 1 ≤ i ≤ N (k) .

Page 106: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

96

Proof. By Theorem 3.3.1 we have that ‖Θ(0) − Θ(k) ‖2 = ‖A−1 − Θ(k) ‖2 ≤ εk ,k = 1, 2, · · · , K . From the definition of Θ(k) and the decomposition (3.10), one caneasily check that

A−1 = Θ(0) Θ(1) · · · Θ(K−1) Θ(K ) .

Then the results follow immediately.

On Compressed EigenproblemsRecall that the efficiency of constructing the compressed operator we propose relieson the exponential decay property of the basis Ψ. This spacial exponential decayfeature allows us to localize Ψ and to construct sparse stiffness matrix Ast = Ψ

T AΨ

without compromising compression accuracy ε in O(nnz(A) · (log( 1ε ) + log n)c)time. In fact, the problem of using spatially localized/compact basis to compresshigh-dimensional operator and to approximate eigenspace of smallest eigenvalueshas long been studied in different ways. A representative pioneer work is themethod of compressed modes proposed by Ozolinš et al. [90], intended originallyfor Schrödinger’s equation in quantum physics. By adding a L1 regularization tothe variational form of an eigenproblem, they obtained spatially compressed basismodes that well span the desired eigenspace. Though the way they obtain sparsityis quite different from what we do, both methods obtain interestingly similar resultsfor some model problems. It can be inspiring to make comparison between theirmethod and ours, so that readers can have better understanding of our approach. Weleave the detailed comparison to Section 4.2.

4.3 Hierarchical Spectrum CompletionNow that we have a sequence of compressed approximations, we next seek to usethis decomposition to compute the dominant spectrum of A−1 down to a prescribedvalue in a hierarchical manner. In particular, we propose to decompose the targetspectrum into several segments of different scales, and then allocate the computationof each segment to a certain level of the compressing sequence so that the problemon each level is well-conditioned.

To implement this idea, we first go back to the one-level compression settings.Suppose that we have accurately obtained the first m essential eigenpairs (µi, vi),i = 1, · · · ,m, of Θ = Ψ(ΨT AΨ)−1ΨT = ΨA−1st Ψ

T , and our aim is to compute thefollowing mtar − m eigenpairs (namely extend to the first mtar eigenpairs) using theLanczos method. Define Vm = spanvi : 1 ≤ i ≤ m and Vm+ = spanvi : m < i ≤

Page 107: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

97

N = V⊥m ∩ spanΨ. Then to perform the Lanczos method to compute the nextsegment of eigenpairs of Θ, we need to repeatedly apply the operator ΨA−1st Ψ

T tovectors in Vm+ , which requires to compute A−1st w for w ∈ Wm+ = Ψ

T (Vm+ ).

Ideally we want the computation of the following mtar −m eigenpairs to be restrictedto a problem with bounded spectrum width that is proportional to µm/µmtar . Thisis possible since we assume that we have accurately obtained the span space Vm

of the first m eigenvectors, and thus we can consider our problem in the reducedspace orthogonal to Vm. In this case, the CG method will be efficient for computinginverse matrix operations.

Definition 4.3.1. Let A be a symmetric, positive definite matrix, and V be aninvariant subspace of A. We define the condition number of A with respect to V as

κ(A,V ) =λmax (A,V )λmin(A,V )

,

where

λmax (A,V ) = maxv∈V,v,0

vT AvvTv

, λmin(A,V ) = minv∈V,v,0

vT AvvTv

.

Theorem4.3.2. Let A be a symmetric, positive definite matrix, andV be an invariantsubspace of A. When using the conjugate gradient method to solve Ax = b withinitial guess x0 such that r0 = b − Ax0 ∈ V , we have the following estimate

‖xk − x∗‖A ≤ 2(√κ(A,V ) − 1√κ(A,V ) + 1

) k‖x0 − x∗‖A,

and‖xk − x∗‖2 ≤ 2

√κ(A,V )

(√κ(A,V ) − 1√κ(A,V ) + 1

) k‖x0 − x∗‖2,

where x∗ is the exact solution, and xk ∈ x∗ + V is the solution at the kth stepof CG iteration. Thus it takes k = O(κ(A,V ) · log 1

ε ) steps (or k = O(κ(A,V ) ·

(log κ(A,V ) + log 1ε )

)steps) to obtain a solution subject to relative error ε in the

energy norm (or l2 norm).

Proof. We only need to notice that the k-order Krylov subspace K (A, r0, k) gener-ated by A and r0 satisfies

K (A, r0, k) ⊂ V, ∀k ∈ Z.

Page 108: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

98

Notice that, for any i = m + 1, · · · , N , though vi ∈ Vm+ is an eigenvector ofΘ = ΨA−1st Ψ

T , wi = ΨTvi is not an eigenvector of A−1st (but an eigenvector ofΨTΨA−1st ) since we do not require Ψ to be orthonormal. Therefore the space Wm+

is not an invariant space of Ast , and if we directly use the CG method to solveAst x = w, the convergence rate will depend on κ(Ast ) = λmax (Ast )λmin(Ast )−1 ,instead of λmax (Ast )µm as intended. Though we bound λmax (Ast ) from above byδ(P) and λmin(Ast ) from below by λmin(A) (See Theorem 2.2.25), κ(Ast ) can bestill large since we prescribe a bounded compression rate in practice to ensure theefficiency of the compression algorithm.

Therefore, we need to find a proper invariant space, so that we can make use of theknowledge of the space Vm and restrict the computation of A−1st w to a problem ofnarrower spectrum.

Lemma 4.3.3. Let (µi, vi), i = 1, · · · , N , be the essential eigenpairs of Θ =Ψ(ΨT AΨ)−1ΨT = ΨA−1st Ψ

T , such that µ1 ≥ µ2 ≥ · · · ≥ µN > 0. Let (ΨTΨ)12

be the square root of the symmetric, positive definite matrix ΨTΨ. Then (µi, zi),i = 1, · · · , N , are all eigenpairs of (ΨTΨ)

12 A−1st (ΨTΨ)

12 , where

zi = (ΨTΨ)−

12Ψ

Tvi, 1 ≤ i ≤ N .

Moreover, for any subset S ⊂ 1, 2, · · · , N , and ZS = spanzi : i ∈ S, we have

K (AΨ, z, k) ⊂ ZS, ∀z ∈ ZS, ∀k ∈ Z,

where AΨ = (ΨTΨ)−12 Ast (ΨTΨ)−

12 .

Lemma 4.3.4. Let Ψ be given in Construction 2.2.9. Then we have

λ_min(Ψ^TΨ) ≥ 1,   λ_max(Ψ^TΨ) ≤ 1 + ε(P)δ(P),

and thus

κ(Ψ^TΨ) ≤ 1 + ε(P)δ(P).

Proof. Let U be the orthogonal complement basis of Φ as given in (3.2), so [Φ, U] is an orthonormal basis of R^n, and we have ΦΦ^T + UU^T = I_n. Since Φ^TΨ = Φ^T A^{-1}Φ(Φ^T A^{-1}Φ)^{-1} = I_N, we have

Ψ^TΨ = Ψ^TΦΦ^TΨ + Ψ^TUU^TΨ = I_N + Ψ^TUU^TΨ.


We then immediately obtain Ψ^TΨ ⪰ I_N, and thus λ_min(Ψ^TΨ) ≥ 1. To obtain an upper bound for λ_max(Ψ^TΨ), we notice that from the construction of Φ we have

‖x − P_Φ x‖_2^2 ≤ ε(P) x^T A x, ∀x ∈ R^n   ⟹   (I_n − P_Φ)^2 ⪯ ε(P) A,

where P_Φ = ΦΦ^T denotes the orthogonal projection onto span{Φ}. Since ΦΦ^T + UU^T = I_n, we have

UU^T = I_n − ΦΦ^T = (I_n − ΦΦ^T)^2 ⪯ ε(P) A.

Therefore we have

Ψ^TΨ = I_N + Ψ^TUU^TΨ ⪯ I_N + ε(P)Ψ^T AΨ = I_N + ε(P) A_st,

and by Theorem 2.2.25 we obtain

λ_max(Ψ^TΨ) ≤ 1 + ε(P)λ_max(A_st) ≤ 1 + ε(P)δ(P).

Theorem 4.3.5. Let A_Ψ and (µ_i, z_i) be defined as in Lemma 4.3.3. Let Z_{m+} = span{z_i : m < i ≤ N}. Then Z_{m+} is an invariant space of A_Ψ, and we have

κ(A_Ψ, Z_{m+}) ≤ µ_{m+1}δ(P).

Proof. By Lemma 4.3.3 and Lemma 4.3.4, we have

λ_max(A_Ψ, Z_{m+}) ≤ λ_max(A_Ψ) = ‖(Ψ^TΨ)^{-1/2}A_st(Ψ^TΨ)^{-1/2}‖_2 ≤ ‖A_st‖_2 ‖(Ψ^TΨ)^{-1}‖_2 ≤ δ(P).

And by the definition of Z_{m+}, we have

λ_min(A_Ψ, Z_{m+}) = 1/λ_max(A_Ψ^{-1}, Z_{m+}) = 1/λ_max((Ψ^TΨ)^{1/2}A_st^{-1}(Ψ^TΨ)^{1/2}, Z_{m+}) = 1/µ_{m+1}.

Inspired by Lemma 4.3.4 and Theorem 4.3.5, we now consider solving A_st x = w efficiently for w ∈ W_{m+} = Ψ^T(V_{m+}) = (Ψ^TΨ)^{1/2}(Z_{m+}) by making use of the controlled condition numbers κ(A_Ψ, Z_{m+}) and κ(Ψ^TΨ). Theoretically, we can compute x = A_st^{-1}w by the following steps:


(i) Compute b = (Ψ^TΨ)^{-1/2}w ∈ Z_{m+};

(ii) Use the CG method to compute y = A_Ψ^{-1}b with initial guess y_0 such that b − A_Ψ y_0 ∈ Z_{m+};

(iii) Compute x = (Ψ^TΨ)^{-1/2}y.
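In practice one never forms (Ψ^TΨ)^{±1/2}; steps (i)-(iii) collapse into a single PCG solve with preconditioner M = Ψ^TΨ, as noted next. A minimal SciPy sketch of that equivalent solve (the matrices A and Psi below are random placeholders, not the basis of Construction 2.2.9):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(1)
n, N = 400, 60                                                  # ambient / compressed dimensions (illustrative)
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)    # generic SPD stand-in for A
Psi = np.eye(n)[:, :N] + 0.1 * rng.standard_normal((n, N))      # placeholder for the basis Psi
A_st = Psi.T @ A @ Psi                                          # A_st = Psi^T A Psi
M = Psi.T @ Psi                                                 # preconditioner M = Psi^T Psi
M_chol = cho_factor(M)
M_inv = LinearOperator((N, N), matvec=lambda r: cho_solve(M_chol, r))

w = Psi.T @ rng.standard_normal(n)                              # a right-hand side in range(Psi^T)
x, info = cg(A_st, w, M=M_inv, maxiter=10 * N)                  # PCG with preconditioner M
print(info, np.linalg.norm(A_st @ x - w) / np.linalg.norm(w))
```

Here (Ψ^TΨ)^{-1} is applied through a Cholesky factorization; in the hierarchical algorithm it is itself applied by an inner CG, as discussed below.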

Notice that this procedure is exactly solving A_st x = w using the preconditioned CG (PCG) method with preconditioner Ψ^TΨ, which only involves applying A_st and (Ψ^TΨ)^{-1} to vectors, but still enjoys the good conditioning property of A_Ψ restricted to Z_{m+}. Therefore we have the following estimate:

Corollary 4.3.6. Consider using the PCG method to solve A_st x = w for w ∈ W_{m+} with preconditioner Ψ^TΨ and initial guess x_0 such that r_0 = w − A_st x_0 ∈ W_{m+}. Let x_* be the exact solution, and x_k be the solution at the k-th step of the PCG iteration. Then we have

‖x_k − x_*‖_{A_st} ≤ 2 ( (√κ(A_Ψ, Z_{m+}) − 1) / (√κ(A_Ψ, Z_{m+}) + 1) )^k ‖x_0 − x_*‖_{A_st},

and

‖x_k − x_*‖_2 ≤ 2 √(κ(Ψ^TΨ)κ(A_Ψ, Z_{m+})) ( (√κ(A_Ψ, Z_{m+}) − 1) / (√κ(A_Ψ, Z_{m+}) + 1) )^k ‖x_0 − x_*‖_2.

Proof. Let y_k = (Ψ^TΨ)^{1/2}x_k and y_* = (Ψ^TΨ)^{1/2}x_*. Then we have

‖y_k − y_*‖_2^2 = (x_k − x_*)^TΨ^TΨ(x_k − x_*),

and

‖y_k − y_*‖_{A_Ψ}^2 = (y_k − y_*)^T A_Ψ (y_k − y_*) = ‖x_k − x_*‖_{A_st}^2.

Noticing that (Ψ^TΨ)^{-1/2}r_0 ∈ Z_{m+} and K(A_Ψ, (Ψ^TΨ)^{-1/2}r_0, k) ⊂ Z_{m+} for all k, the results follow from Theorem 4.3.2.

By Corollary 4.3.6, to compute a solution of A_st x = w subject to a relative error ε in the A_st-norm, the number of needed PCG iterations is

O(κ(A_Ψ, Z_{m+}) · log(1/ε)) = O(µ_{m+1}δ(P) · log(1/ε)).


This is also an estimate of the number of needed PCG iterations for a relative error ε in the l_2-norm, if we assume that κ(Ψ^TΨ), κ(A_Ψ, Z_{m+}) ≤ 1/ε.

In what follows we will denote M = Ψ^TΨ. Notice that the nonzero entries of M are due to the overlapping supports of the column basis vectors of Ψ, while the nonzero entries of A_st = Ψ^T AΨ result from interactions between the column basis vectors of Ψ with respect to A. Thus we can reasonably assume that nnz(M) ≤ nnz(A_st). Suppose that in each iteration of the whole PCG procedure we also use the CG method to compute M^{-1}b for some b, subject to a relatively higher precision ε̃, which requires a cost of O(nnz(M) · κ(M) · log(1/ε̃)). In practice it is sufficient to take ε̃ smaller than but comparable to ε (e.g. ε̃ = 0.1ε), so log(1/ε̃) = O(log(1/ε)). By Lemma 4.3.4 we have κ(M) = O(ε(P)δ(P)). Then the computational complexity of each single PCG iteration can be bounded by

O(nnz(A_st)) + O(nnz(M) · κ(M) · log(1/ε̃)) = O(nnz(A_st) · ε(P)δ(P) · log(1/ε)),

and the total cost of computing a solution of A_st x = w subject to a relative error ε is

O(µ_{m+1}δ(P) · nnz(A_st) · ε(P)δ(P) · (log(1/ε))^2).    (4.4)

We remark that when the original size of A ∈ R^{n×n} is large, the eigenvectors V are long and dense. It would be expensive to compute inner products with these long vectors over and over again. In fact, in the previous discussions the operator Θ = ΨA_st^{-1}Ψ^T (of the same size as A) and the eigenvectors V are only used for analysis purposes, to explain the idea of our method. In practice, for a long vector v = Ψv̂, we do not need to keep track of the whole vector, but only need to store its much shorter coefficient vector v̂ of compressed dimension N instead. When we compute v_2 = Θv_1 = ΨA_st^{-1}Ψ^T v_1, it is equivalent to computing v̂_2 = A_st^{-1}M v̂_1, where v_j = Ψv̂_j, j = 1, 2, and M = Ψ^TΨ. One can check that the analysis presented above still applies. So in the implementation of our method, we only deal with the operator A_st^{-1}M and the short coefficient vectors V̂, and the long eigenvectors V and Ψ will not appear until the very end, when we recover V = ΨV̂. We remark that since the eigenvectors of Θ are orthogonal, their coefficient vectors V̂ are M-orthogonal, i.e. V̂^T M V̂ = I. We use ‖x‖_M to denote the norm √(x^T M x).
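A two-line check of this equivalence (again with random placeholder matrices, purely to illustrate the bookkeeping):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 300, 40
Psi = rng.standard_normal((n, N))                         # placeholder basis with independent columns
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
A_st, M = Psi.T @ A @ Psi, Psi.T @ Psi

v1_hat = rng.standard_normal(N)                           # short coefficient vector, v1 = Psi @ v1_hat
v2_long = Psi @ np.linalg.solve(A_st, Psi.T @ (Psi @ v1_hat))   # v2 = Theta v1 (analysis form)
v2_hat = np.linalg.solve(A_st, M @ v1_hat)                # v2_hat = A_st^{-1} M v1_hat (implemented form)
print(np.allclose(v2_long, Psi @ v2_hat))                 # True
```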

Recall that in the Lanczos method with respect to the operator Θ, the upper Hessenberg matrix T in the Arnoldi relation

ΘV = VT + f e^T


is indeed tridiagonal, since Θ is symmetric and V^T[V, f] = [I, 0]. The upper Hessenberg matrix T being tridiagonal is the reason why the implicit restarting process (Algorithm 9) is efficient. Now since we are actually dealing with the operator A_st^{-1}M and the coefficient vectors V̂ = M^{-1}Ψ^T V, the Arnoldi relation becomes

A_st^{-1}M V̂ = V̂T + f̂ e^T,

where f̂ = M^{-1}Ψ^T f. So as long as we keep V̂ M-orthonormal and f̂ M-orthogonal to V̂, T will still be tridiagonal, since

T = V̂^T M V̂ T = V̂^T M (V̂T + f̂ e^T) = V̂^T M A_st^{-1} M V̂

is symmetric. We therefore modify Algorithm 8 into Algorithm 10 to take the M-orthogonality into consideration.

Summarizing the analysis above, we propose Algorithm 11 for extending a given collection of eigenpairs using the Lanczos-type method. The operator OP(·; A_st, M, ε_op) exploits our key idea of using M = Ψ^TΨ as the preconditioner to effectively reduce the number of PCG iterations in every application of A_st^{-1}M. For convenience, we will use "x = pcg(A, b, M, x_0, ε)" to represent the operation of computing x = A^{-1}b using the PCG method with preconditioner M and initial guess x_0, subject to relative error ε. "x = pcg(A, b, −, x_0, ε)" means that no preconditioner is used (i.e. the plain CG method), and "x = pcg(A, b, M, −, ε)" means that the zero vector is used as the initial guess.

Algorithm 10 General Lanczos Iteration (p-step extension)

Input: V, T, f, target operator op(·), p, inner product matrix M
Output: V, T, f
1: k = column number of V;
2: for i = 1 : p do
3:   β = ‖f‖_M;
4:   if β < ε then
5:     generate a new random f, β = ‖f‖_M;
6:   end if
7:   T ← [T; β e_{k+i−1}^T], v = f/β, V ← [V, v];
8:   w = op(v);
9:   h = V^T M w, T ← [T, h];
10:  f = w − V h;
11:  Re-orthogonalize to adjust f (with respect to M-orthogonality);
12: end for
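A compact NumPy transcription of Algorithm 10 (a sketch only: `op` stands for the application of A_st^{-1}M, e.g. through the PCG operator OP defined next, the columns of V are assumed M-orthonormal on entry, and the breakdown handling is simplified):

```python
import numpy as np

def lanczos_extend(V, T, f, op, p, M, eps=1e-12, rng=np.random.default_rng(3)):
    """p-step M-orthogonal Lanczos extension (sketch of Algorithm 10)."""
    for _ in range(p):
        beta = np.sqrt(f @ (M @ f))
        if beta < eps:                       # breakdown: restart with a random vector
            f = rng.standard_normal(V.shape[0])
            f -= V @ (V.T @ (M @ f))         # keep it M-orthogonal to V
            beta = np.sqrt(f @ (M @ f))
        T = np.pad(T, ((0, 1), (0, 0))); T[-1, -1] = beta   # append the row beta * e_{k+i-1}^T
        v = f / beta
        V = np.column_stack([V, v])
        w = op(v)
        h = V.T @ (M @ w)
        T = np.column_stack([T, h])          # append the column h
        f = w - V @ h
        f -= V @ (V.T @ (M @ f))             # re-orthogonalize with respect to M
    return V, T, f
```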


Function y = OP(x; A_st, M, ε_op)
1: w = M x;
2: y = pcg(A_st, w, M, −, ε_op);

Algorithm 11 Eigenpair Extension

Input: V_ini, D_ini, OP(·; A_st, M, ε_op), target number m_tar, prescribed accuracy ε, eigenvalue threshold µ, searching step d.
Output: V_ex, D_ex.
1: Generate a random initial vector V = v that is M-orthogonal to V_ini;
2: repeat
3:   Perform d steps of the general Lanczos iteration (Algorithm 10) with operator OP to extend V, T;
4:   while Lanczos residual > ε do
5:     Perform c·d steps of shifts to restart Lanczos (Algorithm 9) and renew V, T;
6:   end while
7:   Find the smallest eigenvalue of T as µ̃;
8: until µ̃ < µ or dim(V) ≥ m_tar − dim(V_ini).
9: m_new = dim(V);
10: while Lanczos residual > ε do
11:  Perform c·m_new steps of shifts to restart Lanczos (Algorithm 9) and renew V, T;
12: end while
13: PSP^T = T (Schur decomposition);
14: V_ex = [V_ini, VP], D_ex = diag(D_ini, S);

Given an existing eigenspace V_ini = ΨV̂_ini, Algorithm 11 basically uses the Lanczos method to find the following eigenpairs of Θ in the space V_ini^⊥. Notice that the output V̂_ex gives the coefficients of the desired eigenvectors V_ex in the basis Ψ. However, different from the classical Lanczos method, we do not prescribe a specific number of output eigenpairs. Instead, we set a threshold µ to bound the last output eigenvalue. As we will develop our idea into a multi-level algorithm that pursues a number of target eigenpairs hierarchically, the output of the current level will be used to generate the initial eigenspace for the higher level. Therefore, the purpose of setting a threshold µ on the current level is to bound the restricted condition number on the higher level, just as the initial eigenspace V_ini from the lower level helps to bound the restricted condition number on the current level.

The choice of the threshold µ will be discussed in detail after we introduce the refinement procedure. Here, to develop a hierarchical spectrum completion method


using the analysis above, we state the hierarchical versions of Lemma 4.3.4 and Theorem 4.3.5.

Lemma 4.3.7. Let Ψ^(k) be given in (3.9), and M^(k) = (Ψ^(k))^TΨ^(k). Then we have

λ_min(M^(k)) ≥ 1,   λ_max(M^(k)) ≤ 1 + ε_kδ_k,

and thus

κ(M^(k)) ≤ 1 + ε_kδ_k.

Proof. The proof is similar to the proof of Lemma 4.3.4. Let U^(k) = (Φ^(k))^⊥ be the orthogonal complement basis of Φ^(k). According to Theorem 3.3.1, we have

‖x − P_{Φ^(k)}x‖_2^2 ≤ ε_k ‖x‖_A^2,

which implies that

U^(k)(U^(k))^T = (I_n − Φ^(k)(Φ^(k))^T) ⪯ ε_k A.

Noticing that (Φ^(k))^TΨ^(k) = I_{N^(k)} and Φ^(k)(Φ^(k))^T + U^(k)(U^(k))^T = I_n, we thus have

M^(k) = (Ψ^(k))^TΦ^(k)(Φ^(k))^TΨ^(k) + (Ψ^(k))^TU^(k)(U^(k))^TΨ^(k) = I_{N^(k)} + (Ψ^(k))^TU^(k)(U^(k))^TΨ^(k),

⟹ I_{N^(k)} ⪯ M^(k) ⪯ I_{N^(k)} + ε_k(Ψ^(k))^T AΨ^(k) = I_{N^(k)} + ε_k A^(k).

Therefore we have λ_min(M^(k)) ≥ 1, and by Corollary 3.1.4 we have

λ_max(M^(k)) ≤ 1 + ε_kλ_max(A^(k)) ≤ 1 + ε_kδ_k.

Theorem 4.3.8. Let A^(k) and Ψ^(k) be given in (3.9), and M^(k) = (Ψ^(k))^TΨ^(k). Let (µ_i^(k), v_i^(k)), i = 1, ..., N^(k), be the essential eigenpairs of Θ^(k) = Ψ^(k)(A^(k))^{-1}(Ψ^(k))^T. Define

z_i^(k) = (M^(k))^{-1/2}(Ψ^(k))^T v_i^(k),   1 ≤ i ≤ N^(k).

Given an integer m_k, let Z_{m_k^+}^(k) = span{z_i^(k) : m_k < i ≤ N^(k)}. Then Z_{m_k^+}^(k) is an invariant space of A_Ψ^(k) = (M^(k))^{-1/2}A^(k)(M^(k))^{-1/2}, and we have

κ(A_Ψ^(k), Z_{m_k^+}^(k)) ≤ µ_{m_k+1}^(k)δ_k.


Moreover, consider using the PCG method to solve A^(k)x = w for w ∈ W_{m_k^+}^(k) with preconditioner M^(k) and initial guess x_0 such that r_0 = w − A^(k)x_0 ∈ W_{m_k^+}^(k), where W_{m_k^+}^(k) = span{(Ψ^(k))^T v_i^(k) : m_k < i ≤ N^(k)}. Let x_* be the exact solution, and x_t be the solution at the t-th step of the PCG iteration. Then we have

‖x_t − x_*‖_{A^(k)} ≤ 2 ( (√(µ_{m_k+1}^(k)δ_k) − 1) / (√(µ_{m_k+1}^(k)δ_k) + 1) )^t ‖x_0 − x_*‖_{A^(k)},

and

‖x_t − x_*‖_2 ≤ 2 √(ε_k µ_{m_k+1}^(k) δ_k^2) ( (√(µ_{m_k+1}^(k)δ_k) − 1) / (√(µ_{m_k+1}^(k)δ_k) + 1) )^t ‖x_0 − x_*‖_2.

Recall that we will use the CG method to implement the Lanczos iteration on each level k to complete the target spectrum. To ensure the efficiency of the CG method, namely to bound the restricted condition number κ(A_Ψ^(k), Z_{m_k^+}^(k)) on each level, we need a priori knowledge of the spectrum {(µ_i^(k), v_i^(k)) : 1 ≤ i ≤ m_k} such that µ_{m_k+1}^(k)δ_k is uniformly bounded. This given spectrum should be inductively computed on the lower level k + 1. But notice that there is a compression error between every two neighboring levels, which would compromise the orthogonality, and thus the theoretical bound on the restricted condition number, if we directly used the spectrum of the lower level as the a priori spectrum of the current level. Therefore we introduce a refinement method in Section 4.4 to overcome this difficulty.

4.4 Cross-Level Refinement of Eigenspace

In the previous section we established a one-level spectrum extension method, given that a partial, accurate spectrum is provided. To develop this method into an inductive hierarchical spectrum completion procedure, a natural idea is to use the spectrum computed on the lower level as the initial spectrum for the higher level. However, such an initial spectrum is not accurate enough, since there is a compression error between every two neighboring levels. Thus we need a compatible refinement technique to refine the initial spectrum.

Now consider the cross-level spectrum refinement between two consecutive levels, the h-level and the l-level. The two operators are Θ^h = Ψ^h((Ψ^h)^T AΨ^h)^{-1}(Ψ^h)^T and Θ^l = Ψ^l((Ψ^l)^T AΨ^l)^{-1}(Ψ^l)^T respectively. Writing Ψ̄^l and Ū^l for the coefficient matrices of Ψ^l and U^l in the basis Ψ^h, we have the relations

Ψ^l = Ψ^hΨ̄^l,   U^l = Ψ^hŪ^l,


A_st^l = (Ψ^l)^T AΨ^l = (Ψ̄^l)^T(Ψ^h)^T AΨ^hΨ̄^l = (Ψ̄^l)^T A_st^h Ψ̄^l,

B_st^l = (U^l)^T A U^l = (Ū^l)^T(Ψ^h)^T AΨ^hŪ^l = (Ū^l)^T A_st^h Ū^l,

(A_st^h)^{-1} = Ψ̄^l(A_st^l)^{-1}(Ψ̄^l)^T + Ū^l(B_st^l)^{-1}(Ū^l)^T,

Θ^h = Ψ^h( Ψ̄^l(A_st^l)^{-1}(Ψ̄^l)^T + Ū^l(B_st^l)^{-1}(Ū^l)^T )(Ψ^h)^T = Θ^l + U^l(B_st^l)^{-1}(U^l)^T.    (4.5)

Now suppose that we have obtained the first m_l essential eigenpairs (µ_{l,i}, v_{l,i}), i = 1, ..., m_l, of Θ^l. We want to use these eigenpairs as initial guesses to obtain the first m_h essential eigenpairs of Θ^h. Recall that we have the estimates

|µ_{h,i} − µ_{l,i}| ≤ ε_l,   1 ≤ i ≤ m_l,

and

‖Θ^hv_{l,i} − µ_{h,i}v_{l,i}‖_2 ≤ 2ε_l,   1 ≤ i ≤ m_l,

where ε_l is the compression error bound. These estimates give us confidence that we can obtain (µ_{h,i}, v_{h,i}), i = 1, ..., m_h, efficiently from (µ_{l,i}, v_{l,i}), i = 1, ..., m_l, by using some refinement technique.

Indeed, we will use the Orthogonal Iteration with Ritz Acceleration as our refinement method. Consider an initial guess Q^(0) of the first m eigenvectors of a PD operator Θ. To obtain more accurate eigenvalues and a more accurate eigenspace, the Orthogonal Iteration with Ritz Acceleration runs as follows:

Q^(0) ∈ R^{n×m} given with (Q^(0))^TQ^(0) = I_m
F^(0) = ΘQ^(0)
for k = 1, 2, ...
    Q^(k)R^(k) = F^(k−1)   (QR factorization)
    F^(k) = ΘQ^(k)    (∗)
    S^(k) = (Q^(k))^TF^(k)
    P^(k)D^(k)(P^(k))^T = S^(k)   (Schur decomposition)
    Q^(k) ← Q^(k)P^(k)
    F^(k) ← F^(k)P^(k)
end
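A direct NumPy transcription of the iteration above (the operator Θ is replaced by a synthetic SPD matrix, and a fixed number of iterations is used instead of a convergence test; all names and sizes are illustrative):

```python
import numpy as np

def orthogonal_iteration_ritz(apply_theta, Q0, n_iter=20):
    """Orthogonal Iteration with Ritz Acceleration, as in (*)."""
    Q = Q0
    F = apply_theta(Q)
    D = None
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(F)                  # QR factorization
        F = apply_theta(Q)                      # (*)
        S = Q.T @ F
        D, P = np.linalg.eigh(S)                # Schur decomposition of the symmetric S
        order = np.argsort(D)[::-1]             # Ritz values in decreasing order
        D, P = D[order], P[:, order]
        Q, F = Q @ P, F @ P
    return Q, D                                 # Ritz vectors and Ritz values

rng = np.random.default_rng(4)
n, m = 200, 8
B = rng.standard_normal((n, n)); Theta = B @ B.T + np.eye(n)    # synthetic PD operator
Q0, _ = np.linalg.qr(rng.standard_normal((n, m)))
Q, D = orthogonal_iteration_ritz(lambda X: Theta @ X, Q0)
print(D[:3], np.sort(np.linalg.eigvalsh(Theta))[::-1][:3])      # leading Ritz values vs. true eigenvalues
```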


To state the convergence property of the Orthogonal Iteration with Ritz Acceleration, we first define the distance between two spaces. Let V_1, V_2 ⊂ R^n be two linear spaces, and let P_{V_1}, P_{V_2} be the orthogonal projections onto V_1, V_2 respectively. We define the distance between V_1 and V_2 as

dist(V_1, V_2) = ‖P_{V_1} − P_{V_2}‖_2.

We also use the same notation dist(V_1, V_2) when V_1, V_2 are matrices of column vectors. In this case dist(V_1, V_2) means dist(span{V_1}, span{V_2}).

Suppose that the diagonal entries µ_i^(k), i = 1, ..., m, of D^(k) are in decreasing order. Then µ_i^(k) is a good approximation of the i-th eigenvalue of Θ, and span{Q_i^(k)} is a good approximation of the eigenspace spanned by the first i eigenvectors of Θ, where Q_i^(k) denotes the first i columns of Q^(k). We would like to emphasize that the meaning of the superscript (k) of µ_i^(k) here is different from that in Section 4.3. More precisely, we have the following convergence estimate:

Theorem (Stewart [117], 1968): Let (µ_i, v_i), i = 1, ..., N, be the ordered (essential) eigenpairs of Θ, and let µ_i^(k), i = 1, ..., m, be the ordered eigenvalues of D^(k) = (Q^(k))^TΘQ^(k) given in the Orthogonal Iteration with Ritz Acceleration (∗). Let V_m = [v_1, v_2, ..., v_m], and d^(0) = dist(V_m, Q^(0)). Then we have

|µ_i − µ_i^(k)| ≤ O( (µ_{m+1}/µ_i)^{2k} · ‖Θ‖_2 · (d^(0))^2/(1 − (d^(0))^2) ),   1 ≤ i ≤ m.

Moreover, we have

dist(V_m, Q^(k)) ≤ O( (µ_{m+1}/µ_m)^k · d^(0)/√(1 − (d^(0))^2) ),

and for i = 1, ..., m − 1, if we further assume that α_i = µ_i − µ_{i+1} > 0, then we have

dist(V_i, Q_i^(k)) ≤ O( (µ_{m+1}/µ_i)^k · d^(0)/√(1 − (d^(0))^2) ) + O( (√i/α_i) · (µ_{m+1}^2/(µ_mµ_i))^k · ‖Θ‖_2 · (d^(0))^2/(1 − (d^(0))^2) ),

where V_i and Q_i^(k) are the first i columns of V_m and Q^(k) respectively.

Now we go back to our problem, where we have Θ = Θ^h, m = m_l, and Q^(0) = V_{m_l}^l = [v_{l,1}, ..., v_{l,m_l}]. We next consider the efficiency of this refinement technique


in our problem. As long as the initial distance d^(0) = dist(V_{m_l}^h, V_{m_l}^l) < 1, the first m_h eigenvalues and the eigenspace of the first m_h eigenvectors of Θ^h converge exponentially fast at the rate (µ_{h,m_l+1}/µ_{h,m_h})^k. We can expect that a few iterations of refinement will be sufficient to give an accurate eigenspace for narrowing down the residual spectrum of Θ^h, provided we can ensure that the ratio µ_{h,m_l+1}/µ_{h,m_h} is small enough. This will be verified in the numerical examples presented in Section 4.6. In particular, to refine the first m_h eigenpairs subject to a prescribed accuracy ε, we need K = O(log(1/ε)/log(µ_{h,m_h}/µ_{h,m_l+1})) refinement iterations.

The main cost of the refinement procedure comes from the computation of Θ^hQ^(0) and the computation of Θ^hQ^(k) in each iteration. We will reduce the computational cost by using the fact that Q^(k) is a good approximation of the eigenvectors of Θ^h. We first consider how to compute Θ^hQ^(0) efficiently.

Notice that in our problem, we take Q^(0) = V_{m_l}^l, whose columns are the first m_l eigenvectors of Θ^l. Therefore by (4.5), we have

Θ^hQ^(0) = Θ^hV_{m_l}^l = Θ^lV_{m_l}^l + U^l(B_st^l)^{-1}(U^l)^TV_{m_l}^l = V_{m_l}^lD_{m_l}^l + U^l(B_st^l)^{-1}(U^l)^TV_{m_l}^l,

where D_{m_l}^l is a diagonal matrix whose diagonal entries are µ_{l,1}, µ_{l,2}, ..., µ_{l,m_l}. Recall that by Lemma 3.1.1 and Corollary 3.1.4, κ(B_st^l) is bounded by ε_lδ_h, which can be well controlled in the decomposition procedure. Thus it is efficient to apply (B_st^l)^{-1} using the CG method. As we have mentioned before, applying (U^l)^T or U^l from the left is performed by doing patch-wise Householder transformations that involve only one local Householder vector on each patch, which takes O(N^h) computational cost, where N^h is the compressed dimension on level h, i.e. the size of A_st^h. Therefore, in the CG method, the cost of multiplication by B_st^l = (Ū^l)^TA_st^hŪ^l mainly comes from the number of nonzero entries of A_st^h. Then the total computational cost of computing Θ^hQ^(0) subject to a relative error ε can be bounded by

O(m_l · nnz(A_st^h) · ε_lδ_h · log(1/ε)).

Next, we consider how to compute Θ^hQ^(k). To do so, we first compute w_i^(k) = (Ψ^h)^Tq_i^(k), where q_i^(k) is the i-th column of Q^(k), then compute (A_st^h)^{-1}w_i^(k), and apply Ψ^h. Again we will use the PCG method with preconditioner M^h = (Ψ^h)^TΨ^h to compute (A_st^h)^{-1}w_i^(k). As we have discussed in Section 4.3, this is equivalent to using the CG method to compute (A_Ψ^h)^{-1}z_i^(k), where A_Ψ^h = (M^h)^{-1/2}A_st^h(M^h)^{-1/2} and z_i^(k) = (M^h)^{-1/2}w_i^(k) = (M^h)^{-1/2}(Ψ^h)^Tq_i^(k). Inspired by Corollary 4.3.6, we seek to provide


a good initial guess for the CG method to ensure efficiency. In the Orthogonal Iteration with Ritz Acceleration (∗), one can check that (Q^(k))^T(Θ^hQ^(k) − Q^(k)D^(k)) = 0, where D^(k) is a diagonal matrix with diagonal entries µ_1^(k), µ_2^(k), ..., µ_{m_l}^(k), and therefore

(Z^(k))^T((A_Ψ^h)^{-1}Z^(k) − Z^(k)D^(k))
= (Q^(k))^TΨ^h(M^h)^{-1/2}( (A_Ψ^h)^{-1}(M^h)^{-1/2}(Ψ^h)^TQ^(k) − (M^h)^{-1/2}(Ψ^h)^TQ^(k)D^(k) )
= (Q^(k))^T( Ψ^h(A_st^h)^{-1}(Ψ^h)^TQ^(k) − Ψ^h(M^h)^{-1}(Ψ^h)^TQ^(k)D^(k) )
= (Q^(k))^T(Θ^hQ^(k) − Q^(k)D^(k))
= 0,

where we have used that Q^(k) ∈ span{Ψ^h}, so that Ψ^h(M^h)^{-1}(Ψ^h)^TQ^(k) = Q^(k). This observation implies that if we use µ_i^(k)z_i^(k) as the initial guess for computing (A_Ψ^h)^{-1}z_i^(k) using the CG method, the initial residual z_i^(k) − A_Ψ^h(µ_i^(k)z_i^(k)) is orthogonal to (A_Ψ^h)^{-1}Z^(k). Since the columns of Q^(k) are already good approximate essential eigenvectors of Θ^h, the columns of Z^(k) are good approximate eigenvectors of (A_Ψ^h)^{-1}, so we can expect that the target eigenspace Z_{m_h}, namely the eigenspace of the first m_h eigenvectors of (A_Ψ^h)^{-1}, is well captured by span{(A_Ψ^h)^{-1}Z^(k)}. Therefore we can reasonably assume that z_i^(k) − A_Ψ^h(µ_i^(k)z_i^(k)) ∈ Z_{m_h^+} = Z_{m_h}^⊥, and so again we can benefit from the restricted condition number κ(A_Ψ^h, Z_{m_h^+}) ≤ µ_{h,m_h+1}δ_h, as introduced in Section 4.3.

Moreover, we notice that the spectral residual ‖Θ^hq_i^(k) − µ_i^(k)q_i^(k)‖_2 is bounded by 2ε_l by Lemma 4.2.3, and we have

‖(A_st^h)^{-1}w_i^(k) − µ_i^(k)(M^h)^{-1}w_i^(k)‖_2 ≤ ‖(A_Ψ^h)^{-1}z_i^(k) − µ_i^(k)z_i^(k)‖_2 = ‖Θ^hq_i^(k) − µ_i^(k)q_i^(k)‖_2,    (4.6)

where we have used λ_min(M^h) ≥ 1 (Lemma 4.3.7). Thus if we use µ_i^(k)z_i^(k) as the initial guess, the initial error will be bounded by 2ε_l at most, and the CG procedure will only need

O(κ(A_Ψ^h, Z_{m_h^+}) · log(ε_l/ε)) = O(µ_{h,m_h+1}δ_h · log(ε_l/ε))

iterations to achieve a relative accuracy ε, instead of O(κ(A_Ψ^h, Z_{m_h^+}) · log(1/ε)). Notice that using the initial guess µ_i^(k)z_i^(k) for (A_Ψ^h)^{-1}z_i^(k) is equivalent to using the initial guess µ_i^(k)(M^h)^{-1}w_i^(k) for (A_st^h)^{-1}w_i^(k).
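In code, the warm start is simply the x0 argument of the PCG solve; a schematic SciPy call for one column (the names A_st_h, M_h, q_hat, mu are ours and merely stand for the level-h quantities A_st^h, M^h, q̂_i^(k), µ_i^(k)):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import cg, LinearOperator

def refine_column(A_st_h, M_h, M_chol, q_hat, mu, maxiter=500):
    """Solve A_st^h f = M^h q_hat by PCG, warm-started at mu * q_hat (coefficient form)."""
    N = q_hat.shape[0]
    M_inv = LinearOperator((N, N), matvec=lambda r: cho_solve(M_chol, r))
    w = M_h @ q_hat                      # w_i = (Psi^h)^T q_i = M^h q_hat_i
    f, info = cg(A_st_h, w, x0=mu * q_hat, M=M_inv, maxiter=maxiter)
    return f, info                       # coefficients of (A_st^h)^{-1} w_i in the basis Psi^h
```

Here M_chol = cho_factor(M_h) would be computed once and reused for all columns; in the actual algorithm (M^h)^{-1} is applied by an inner CG instead of a factorization.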

Supported by the analysis above, we will compute (A_st^h)^{-1}w_i^(k) using the preconditioned CG method with preconditioner M^h and initial guess µ_i^(k)(M^h)^{-1}w_i^(k). Again suppose that in each PCG iteration we also use the CG method to apply (M^h)^{-1},


subject to a higher relative accuracy ε̃, which takes O(nnz(M^h) · κ(M^h) · log(1/ε̃)) computational cost. In practice, it is sufficient to take ε̃ comparable to ε. Recalling that nnz(M^h) ≤ nnz(A_st^h) and κ(M^h) ≤ O(ε_hδ_h) (Lemma 4.3.7), the cost of computing Θ^hQ^(k) subject to a relative error ε is then bounded by

O(m_l · µ_{h,m_h+1}δ_h · log(ε_l/ε) · nnz(A_st^h) · ε_hδ_h · log(1/ε)).

Notice that in each refinement iteration we also need to perform one QR factorization and one Schur decomposition, which together cost O(N^h · m_l^2). However, as we have mentioned in the introduction, we only consider the asymptotic complexity of our method when the original A becomes very large. In this case, the number m_tar of target eigenpairs is considered a fixed constant, and so the term O(N^h · m_l^2) ≤ O(N^h m_tar^2) is considered minor and will be omitted in our complexity analysis. Therefore, the total cost of refining the first m_h eigenpairs subject to a prescribed accuracy ε can be bounded by

O(m_l · nnz(A_st^h) · ε_lδ_h · log(1/ε))    (4.7)
+ O(m_l · µ_{h,m_h+1}δ_h · log(ε_l/ε) · nnz(A_st^h) · ε_hδ_h · log(1/ε) · log(1/ε)/log(µ_{h,m_h}/µ_{h,m_l+1})).

Again we remark that the operator Θ^h and the long vectors Q^(k), F^(k), V^l and V^h are only for analysis use. Operations on long vectors of size n would be very expensive and unnecessary, especially on lower levels where the compressed dimension N^h (the size of A_st^h) is small. Notice that all long vectors on the h-level are in span{Ψ^h}, as

Q^(k) = Ψ^hQ̂^(k),   F^(k) = Ψ^hF̂^(k),   V_{m_l}^l = Ψ^hV̂_{m_l}^l,   V_{m_h}^h = Ψ^hV̂_{m_h}^h,

so we only operate on their coefficients in the basis Ψ^h. Correspondingly, whenever we need to consider orthogonality of long vectors, we replace it by the M^h-orthogonality of their coefficient vectors. One can check that all the discussions above still apply. Another advantage of using the coefficient vectors is that the good initial guess µ_i^(k)(M^h)^{-1}w_i^(k) = µ_i^(k)(M^h)^{-1}(Ψ^h)^Tq_i^(k) = µ_i^(k)q̂_i^(k) discussed above is obtained explicitly, at no extra cost.

Summarizing the analysis above, we propose the following Algorithm 12 as our refinement method. Since we want the eigenspace spanned by the first m_h eigenvectors of Θ^h to be computed accurately, the refinement stops when dist(Q_{m_h}^{(k−1)}, Q_{m_h}^{(k)}) < ε


for some prescribed accuracy ε, where Q_{m_h}^{(k)} denotes the first m_h columns of Q^(k). Since Q^(k) is orthogonal, one can check that

dist(Q_{m_h}^{(k−1)}, Q_{m_h}^{(k)}) = ‖Q_{m_h}^{(k)} − Q_{m_h}^{(k−1)}(Q_{m_h}^{(k−1)})^TQ_{m_h}^{(k)}‖_2
= ‖Q̂_{m_h}^{(k)} − Q̂_{m_h}^{(k−1)}(Q̂_{m_h}^{(k−1)})^TM^hQ̂_{m_h}^{(k)}‖_{M^h}
≤ √(λ_max(M^h)) ‖Q̂_{m_h}^{(k)} − Q̂_{m_h}^{(k−1)}(Q̂_{m_h}^{(k−1)})^TM^hQ̂_{m_h}^{(k)}‖_2
≤ √(1 + ε_hδ_h) ‖Q̂_{m_h}^{(k)} − Q̂_{m_h}^{(k−1)}(Q̂_{m_h}^{(k−1)})^TM^hQ̂_{m_h}^{(k)}‖_F.

In practice, we use ‖Q̂_{m_h}^{(k)} − Q̂_{m_h}^{(k−1)}(Q̂_{m_h}^{(k−1)})^TM^hQ̂_{m_h}^{(k)}‖_F < ε/√(1 + ε_hδ_h) as the stopping criterion, since it is easy to check. We have used Lemma 4.3.7 to bound λ_max(M^h).

Algorithm 12 Eigenpair Refinement

Input: V_{m_l}^l, D_{m_l}^l, prescribed accuracy ε, target eigenvalue threshold µ_h.
Output: V_{m_h}^h, D_{m_h}^h.
1: Set Q^(0) = V_{m_l}^l, D^(0) = D_{m_l}^l, k = 0;
2: for i = 1 : m_l do
3:   g_i = pcg(B_st^l, (Ū^l)^TM^hq_i^(0), −, −, ε);   (Q = [q_1, ..., q_{m_l}])
4: end for
5: F^(0) = Q^(0)D^(0) + Ū^lG;   (G = [g_1, ..., g_{m_l}])
6: repeat
7:   k ← k + 1;
8:   Q^(k)R^(k) = F^(k−1);   (QR factorization with respect to M^h-orthogonality, i.e. (Q^(k))^TM^hQ^(k) = I)
9:   W^(k) = M^hQ^(k);
10:  for i = 1 : m_l do
11:    f_i^(k) = pcg(A_st^h, w_i^(k), M^h, µ_i^(k−1)q_i^(k), ε);   (F = [f_1, ..., f_{m_l}])
12:  end for
13:  S^(k) = (W^(k))^TF^(k);
14:  P^(k)D^(k)(P^(k))^T = S^(k)   (Schur decomposition, diagonal entries of D^(k) in decreasing order);
15:  Renew m_h so that µ_{m_h}^(k) ≥ µ_h > µ_{m_h+1}^(k);
16:  Q^(k) ← Q^(k)P^(k), F^(k) ← F^(k)P^(k);
17: until ‖Q_{m_h}^(k) − Q_{m_h}^{(k−1)}(Q_{m_h}^{(k−1)})^TM^hQ_{m_h}^(k)‖_F < ε.
18: V_{m_h}^h = Q_{m_h}^(k), D_{m_h}^h = D_{m_h}^(k).   (D_{m_h}^(k) denotes the first m_h × m_h block of D^(k))

4.5 Overall Algorithms

Combining the refinement method and the extension method, we now propose our overall Algorithm 13 for computing partial eigenpairs of a PD matrix A. It utilizes the a priori multiresolution decomposition of A to compute the first m_tar eigenpairs of A^{-1}, by passing approximate eigenpairs from lower levels to higher levels to


finally reach a prescribed accuracy. In particular, this algorithm starts with the eigendecomposition on the lowest level (whose dimension is small enough), refines and extends the approximate eigenpairs on each level, and stops at the highest level. The overall accuracy is achieved by the prescribed compression error of the highest level. The flow chart in Figure 4.1 illustrates the procedure of our method. If we see the eigenproblem of the original matrix A as a complicated model, our algorithm resolves the model complexity by hierarchically simplifying/coarsening the original model into an inductive sequence of approximate models.

Figure 4.1: A flow chart illustrating the procedure of Algorithm 13.

Recall that the outputs V̂_ex^(k) of the extension process and of the initializing process are the coefficients of V_ex^(k) in the basis Ψ^(k). When passing these results from level k to level k − 1, we need to recover the coefficients of V_ex^(k) in the basis Ψ^(k−1). This can be done by simply reforming V̂_ex^(k) ← Ψ̄^(k)V̂_ex^(k) (Line 3 in Algorithm 13), where Ψ̄^(k) is the coefficient matrix of Ψ^(k) in the basis Ψ^(k−1), since V_ex^(k) = Ψ^(k)V̂_ex^(k) = Ψ^(k−1)Ψ̄^(k)V̂_ex^(k).

In Algorithm 13, the parameters should be chosen carefully to ensure computational efficiency, using the analysis in the previous sections. We shall discuss the choice of each parameter separately. To be consistent, we first clarify some notation. Let m̃_k and m_k be the numbers of output eigenpairs of the refinement process and of the extension process, respectively, on level k. Ignoring numerical errors, let (µ_i^(k), v_i^(k)), i = 1, ..., N^(k), be the essential eigenpairs of the operator Θ^(k) as in Section 4.3, and let (µ_i^(k), v_i^(k)), i = 1, ..., m_k, denote the output eigenpairs on level k. Notice that (µ_i^(k), v_i^(k)), i = 1, ..., m̃_k, are the output of the refinement process, and (µ_i^(k), v_i^(k)), i = m̃_k + 1, ..., m_k, are the output of the extension process.


Choice of Multi-level Accuracies ε^(k): Notice that there is a compression error ε_k between level k and level k − 1. That is to say, no matter how accurately we compute the eigenpairs of Θ^(k), they are approximations of eigenpairs of Θ^(k−1) subject to an accuracy no better than ε_k. Therefore, on the one hand, the choice of the algorithm accuracy ε^(k) for the eigenpairs of Θ^(k) on each level should not compromise the compression error. On the other hand, the accuracy should not be over-achieved, due to the presence of the compression error. Therefore, we choose ε^(k) = 0.1 × ε_k in practice.

Choice of Thresholds (µ_re^(k), µ_ex^(k))_{k=1}^K: These thresholds provide control on the smallest eigenvalues of the output eigenpairs of both the refinement process and the extension process, in that

µ_{m̃_k}^(k) ≥ µ_re^(k) > µ_{m̃_k+1}^(k),   µ_ex^(k) ≥ µ_{m_k}^(k),   k = 1, 2, ..., K.

Recall that the outputs of the refinement process are the inputs of the extension process, and the outputs of the extension process are the inputs of the refinement process on the higher level. By Theorem 4.3.8, to ensure the efficiency of the extension process, we need to uniformly control the restricted condition number

κ(A_Ψ^(k), Z_{m̃_k^+}^(k)) ≤ µ_{m̃_k+1}^(k)δ_k < µ_re^(k)δ_k.

Recall that in Section 4.4 the convergence rate of the refinement process is given by µ_{h,m_l+1}/µ_{h,m_h}, where l corresponds to k + 1 and h corresponds to k on each level k. Thus, to ensure the efficiency of the refinement process, we need to uniformly control the ratio

µ_{m_{k+1}+1}^(k)/µ_{m̃_k}^(k) ≤ µ_{m_{k+1}+1}^(k)/µ_re^(k) ≤ (µ_{m_{k+1}+1}^(k+1) + ε_{k+1})/µ_re^(k) ≤ (µ_ex^(k+1) + ε_{k+1})/µ_re^(k),

where ε_{k+1} is the compression error between level k + 1 and level k, and we have used Lemma 4.2.3.

where εk+1 is the compression error between level k+1 and level k, andwe have usedLemma 4.2.3. Thus, more precisely, we need to choose thresholds (µ(k)

re , µ(k)ex )Kk=1

so that there exist uniform constants κ > 0, γ ∈ (0, 1) so that

(i) µ(k)re δk ≤ κ, (ii)

µ(k+1)ex + εk+1

µ(k)re

≤ γ. (4.8)

Due to the presence of ε_k, condition (ii) implies that there is no need to choose µ_ex^(k) much smaller than ε_k, which would lead to over-computing while barely improving the efficiency of the refinement process. So one convenient way is to choose

µ_re^(k) = αε_{k+1},   µ_ex^(k) = βε_k,    (4.9)


for some uniform constants α, β > 0 such that α > 1 + β. Recall that when constructing the multiresolution decomposition, we impose the conditions ε_kδ_k ≤ c and ε_k = ηε_{k+1} for some uniform constants c > 0 and η ∈ (0, 1). Thus we have

µ_re^(k)δ_k = (α/η)ε_kδ_k ≤ αc/η = κ,   (µ_ex^(k+1) + ε_{k+1})/µ_re^(k) = (1 + β)/α = γ < 1.
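These two identities make the effect of (α, β) easy to tabulate; a trivial helper (function name is ours) reproduces the values of κ and γ reported for the experiments in Section 4.6:

```python
def threshold_constants(alpha, beta, c, eta):
    """kappa and gamma implied by (4.9), assuming eps_k * delta_k <= c and eps_k = eta * eps_{k+1}."""
    kappa = alpha * c / eta          # bound on mu_re^(k) * delta_k
    gamma = (1 + beta) / alpha       # bound on (mu_ex^(k+1) + eps_{k+1}) / mu_re^(k)
    return kappa, gamma

print(threshold_constants(5, 2, 20, 0.2))   # approximately (500, 0.6), as in Table 4.4
print(threshold_constants(3, 1, 20, 0.2))   # approximately (300, 2/3)
```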

Choice of Searching Step d: In the first part of the extension algorithm, we explore the number m_k such that µ_{m_k}^(k) ≤ µ_ex^(k), and we do this by setting an exploring step size d and examining the last few eigenvalues every d steps of the Lanczos iteration. The step size d should be neither too large, to avoid over-computing, nor too small, to ensure efficiency. In practice, we choose d = min{⌊dim(Ψ^(k))/10⌋, ⌊m_tar/10⌋}.

Complexity: Now we summarize the complexity of Algorithm 13 for computing the first m_tar largest eigenpairs of A^{-1} for a PD matrix A ∈ R^{n×n} subject to an error ε. Suppose we are provided a K-level MMD of A with ε_kδ_k ≤ c, ε_k = ηε_{k+1}, and ε_1 = ε. In what follows, we will uniformly estimate nnz(A_st^(k)) ≤ nnz(A), ε^(k) ≥ ε^(1) = 0.1ε_1 and m_k ≤ m_tar.

We first consider the complexity of all the refinement processes. Notice that by our choice ε_{k+1}/ε^(k) = ε_{k+1}/(0.1ε_k) = 1/(0.1η), the factor log(ε_l/ε) in (4.7), which is now log(ε_{k+1}/ε^(k)), can be estimated as O(log(1/η)). Since we will make sure that µ_{m_{k+1}+1}^(k)/µ_{m̃_k}^(k) ≤ γ for some constant γ < 1, the factor log(µ_{h,m_h}/µ_{h,m_l+1}) in (4.7), which is now log(µ_{m̃_k}^(k)/µ_{m_{k+1}+1}^(k)), can be seen as a constant. Also using the estimates µ_{h,m_h}δ_h ≤ αc/η = O(c/η), ε_lδ_h ≤ c/η, ε_hδ_h ≤ c and log(1/ε^(k)) = O(log(1/ε)), we modify (4.7) to obtain the complexity of all K levels of the refinement process:

O(m_tar · nnz(A) · (c^2/η) · log(1/η) · (log(1/ε))^2 · K).    (4.10)

Next we consider the complexity of all the extension processes. As we have discussed in Section 4.3, the major cost of the extension process comes from the operation of adding a new vector (the adding operation) to the Lanczos vectors (Line 7 of Algorithm 10, which happens in Line 3 of Algorithm 11). Using the estimates µ_mδ(P) ≤ αc/η = O(c/η), ε(P)δ(P) ≤ c and log(1/ε^(k)) = O(log(1/ε)), we modify (4.4) to obtain the cost of every single call of the adding operation as

O((c^2/η) · nnz(A) · (log(1/ε))^2).


On every level, the indices contributing to adding operations go from m̃_k + 1 to m_k. Due to the refinement process, we have m̃_k ≤ m_{k+1}, and so a single index from 1 to m_tar may contribute more than one adding operation. But if we reasonably assume that µ_ex^(k+1) > µ_re^(k−1), namely β > αη under the parameter choice (4.9), we will have m_{k+1} < m̃_{k−1}, and so every index from 1 to m_tar will contribute no more than two adding operations. Therefore the total cost of all the extension processes can be estimated as

O(m_tar · (c^2/η) · nnz(A) · (log(1/ε))^2).    (4.11)

We remark that the cost of the implicit restarting process is only a constant multiple of (4.11). Combining (4.10) and (4.11), we obtain the total complexity of our method:

O(m_tar · nnz(A) · (c^2/η) · log(1/η) · (log(1/ε))^2 · K).    (4.12)

To further simplify (4.12), we need to use the estimates for the MMD given in Section 3.2. In particular, to preserve the sparsity nnz(A_st^(k)) ≤ nnz(A), we need to choose the scale ratio η^{-1} = (log(1/ε) + log n)^p as in (3.21) for some constant p. We remark that for graph Laplacians, p = 1. The resulting level number is K = O(log n / log(log(1/ε) + log n)). The condition bound c can be imposed to be a uniform constant by Algorithm 6 in Section 3.2.1. Then the overall complexity of Algorithm 13 can be estimated as

O(m_tar · nnz(A) · (log(1/ε) + log n)^p · (log(1/ε))^2 · log n) = O(m_tar · nnz(A) · (log(1/ε) + log n)^{p+3}).    (4.13)

4.6 Numerical Examples

In this section we present several numerical examples for the eigensolver. We will use Algorithm 13 to compute a relatively large number of eigenpairs of large matrices subject to prescribed accuracies.

4.6.1 Dataset Description

The datasets we use are drawn from different physical contexts. They are generated as 3D point clouds and transformed into graphs by adding edges in the KNN setting.

• The first dataset is the well-known "Stanford Bunny" from the Stanford 3D Scanning Repository¹. The reconstructed bunny has 35947 vertices that can be embedded into a surface in R^3 with 5 holes in the bottom.

1http://graphics.stanford.edu/data/3Dscanrep/


Algorithm 13 Hierarchical Eigenpair Computation

Input: K-level decomposition {Θ^(k)}_{k=1}^K of the PD matrix A, target number m_tar, searching step d, prescribed multi-level accuracies {ε^(k)}, extension thresholds {µ_ex^(k)}_{k=1}^K, refinement thresholds {µ_re^(k)}_{k=1}^K.
Output: V, D.
1: Find the eigenpairs [V_ex^(K), D_ex^(K)] of the generalized eigenproblem (A_st^(K))^{-1}M^(K)x = µx;
2: for k = K − 1 : 1 do
3:   V_ex^(k+1) ← Ψ̄^(k+1)V_ex^(k+1);
4:   [V_ini^(k), D_ini^(k)] = Eigen_Refine([V_ex^(k+1), D_ex^(k+1)]; ε^(k), µ_re^(k));
5:   op = OP(·; A^(k), M^(k), ε^(k));
6:   [V_ex^(k), D_ex^(k)] = Eigen_Extend([V_ini^(k), D_ini^(k)]; op, ε^(k), µ_ex^(k), d, m_tar);
7: end for
8: V = Ψ^(1)V_ex^(1), D = D_ex^(1).

• The second dataset is an MRI scan of a brain from the Open Access Series of Imaging Studies (OASIS)². FreeSurfer is used to reconstruct the surface from the MRI scan, which yields a point cloud with 48463 points.

• The third dataset is a "SwissRoll" model, which is popular in manifold learning. Vertices are generated by

(x_i, y_i, z_i) = (t_i cos(t_i), y_i, t_i sin(t_i)) + η_i,   i = 1, 2, ..., n,    (4.14)

where t_i ~ U[1.5π, 4.5π], y_i ~ U[0, 20], and η_i ~ N(0, 0.05 I_3), all i.i.d. It can be viewed as a spiral of one and a half rounds plus random noise. In our examples the roll has n = 20000 points (see the code sketch following this list).

With the point clouds at hand, we apply the KNN setting to construct graphs with k_bunny = 20, k_brain = 20 and k_swissroll = 10. Each existing edge e_ij is weighted as e^{−r_{i,j}^2/σ}, where r_{i,j} is the Euclidean distance between the vertices v_i and v_j, and σ is a parameter. We have σ_bunny = 10^{-6}, σ_brain = 10^{-4} and σ_swiss = 0.1. Figure 4.2 shows the point clouds of the datasets.

² http://www.oasis-brains.org/

From the graphs given above, we construct their related graph Laplacians L in


the general setting:

L_ij = Σ_{k∼i} w_ik  if i = j,   L_ij = −w_ij  if i ≠ j.

Further, without loss of generality, we rescale all graph Laplacians and add uniform self-loops of weight 1 to them, so that each of them satisfies (i) λ_1 = 1, (ii) λ_2 = O(1). Under this construction, we obtain three graph Laplacian matrices L_bunny, L_brain, L_swissroll. L_bunny has size n = 35947, sparsity nnz = 714647 and condition number κ(L_bunny) = 1.86 × 10^4; L_brain has size n = 48463, sparsity nnz = 1038065 and condition number κ(L_brain) = 1.14 × 10^5; L_swissroll has size n = 20000, sparsity nnz = 248010 and condition number κ(L_swissroll) = 1.15 × 10^6.

Figure 4.2: Point clouds of the datasets. From left to right: (1) bunny (point cloud and sculpture); (2) brain; (3) Swiss roll.

4.6.2 Numerical MMD

Before computing eigenpairs of the graph Laplacians from our datasets using Algorithm 13, we need to apply Algorithm 6 in Section 3.2.1 to obtain the MMDs, which is the only pre-computation step in our proposed algorithm. For each graph Laplacian, we perform the decomposition with a prescribed condition bound c and a series of multi-level resolutions (compression errors) {ε_k}_{k=1}^K. Note that we perform two decompositions with different multiresolutions for the SwissRoll data. The decomposition time for each example is reported in Table 4.1. Recall that the total complexity of Algorithm 6 is O(nnz(A) · log n · (log(1/ε) + log n)^{3d+p}), where d denotes the intrinsic geometric dimension of the graph. Comparing with the complexity estimate in (4.13), when m_tar ≳ (log(1/ε) + log n)^{3d−2}, the pre-computation time for constructing the MMD only takes up a relatively small portion of the overall time. As illustrated later in Table 4.8, even with the pre-computation time taken into account, our proposed algorithm is still faster than other well-established methods.


Data | 2-level Bunny | 4-level Brain | 4-level SwissRoll | 3-level SwissRoll
Time | 10.245 | 34.589 | 8.124 | 9.430

Table 4.1: Matrix decomposition time (in seconds) for the different examples.

Table 4.2 and Table 4.3 give the detailed information of all decompositions we will use for the eigenpair computation. In Table 4.2, K is the number of levels, ε_1 is the finest (prescribed) accuracy, η is the ratio ε_k/ε_{k+1}, and c is the condition bound such that ε_kδ_k ≤ c. By Lemma 4.3.7, the condition number of M^(k) is bounded as κ(M^(k)) ≤ 1 + ε_kδ_k ≈ c, and by Corollary 3.1.4, the condition number of B^(k) is bounded as κ(B^(k)) ≤ ε_kδ_{k−1} ≤ c/η. We can see in Table 4.3 that these bounds are well satisfied. Recall that the bounded condition number of M^(k) is essential for the efficiency of Algorithm 11, and the bounded condition number of B^(k) is essential for the efficiency of Algorithm 12.

Table 4.3 also shows the detailed information for all four decompositions. The 2-norm of A^(k), namely λ_max(A^(k)), decreases as k increases, and is well bounded as ‖A^(k)‖_2 ≤ δ_k ≤ cε_k^{-1}, as expected from Corollary 3.1.4 (we have normalized µ_1 = ‖L^{-1}‖_2 to 1). The sparsities of A^(k) and M^(k) are of the same order as the sparsity of A^(0) = L, i.e. nnz(A^(k)), nnz(M^(k)) = O(nnz(A^(0))), as ensured by our choice of parameters discussed in Section 4.5.

Data | K | ε_1 | η | c | Bound on κ(M^(k)) | Bound on κ(B^(k))
Bunny | 2 | 10^−3 | 0.1 | 20 | 20 | 200
Brain | 4 | 10^−4 | 0.2 | 20 | 20 | 100
SwissRoll | 3 | 10^−5 | 0.1 | 20 | 20 | 200
SwissRoll | 4 | 10^−5 | 0.2 | 20 | 20 | 100

Table 4.2: Decomposition parameters for the different datasets.

4.6.3 The Coarse Level Eigenpair Approximation

We first use the decompositions given above to compute the first few eigenpairs of the graph Laplacians with relatively low accuracies. Numerical results reveal that even on the coarse levels, the compressed (low dimensional) operators show good spectral approximation properties with regard to the smallest eigenvalues of L (or the largest eigenvalues of L^{-1}). Here we take the bunny data and the brain data as examples. For the bunny data, we use the lowest level k = 2 with compression error ε_2 = 0.01; for the brain data, we use level k = 3 with compression error


Level k | ε_k | dim(A^(k)) | nnz(A^(k)) | ‖A^(k)‖_2 | nnz(M^(k)) | κ(M^(k)) | κ(B^(k))

The 2-level decomposition of the Bunny data:
0 | - | 35947 | 714k = m | 1.86 × 10^4 | - | - | -
1 | 10^−3 | 2641 | 613k ≈ 0.86m | 1.05 × 10^4 | 203k ≈ 0.28m | 1.45 | 5.58
2 | 10^−2 | 198 | 27k ≈ 0.04m | 1.37 × 10^3 | 10k ≈ 0.02m | 2.05 | 45.03

The 4-level decomposition of the Brain data:
0 | - | 48463 | 1038k = m | 1.14 × 10^5 | - | - | -
1 | 10^−4 | 11622 | 2546k ≈ 2.45m | 7.82 × 10^4 | 725k ≈ 0.70m | 1.29 | 5.80
2 | 5 × 10^−4 | 1713 | 431k ≈ 0.42m | 2.01 × 10^4 | 189k ≈ 0.18m | 1.84 | 18.34
3 | 2.5 × 10^−3 | 252 | 37k ≈ 0.04m | 3.33 × 10^3 | 20k ≈ 0.02m | 2.19 | 28.23
4 | 1.25 × 10^−2 | 35 | 1k < 0.01m | 4.53 × 10^2 | 1k < 0.01m | 2.02 | 35.08

The 3-level decomposition of the SwissRoll data:
0 | - | 20000 | 248k = m | 1.15 × 10^6 | - | - | -
1 | 10^−5 | 5528 | 689k ≈ 2.78m | 4.31 × 10^5 | 197k ≈ 0.79m | 1.45 | 10.06
2 | 10^−4 | 723 | 108k ≈ 0.44m | 7.44 × 10^4 | 35k ≈ 0.14m | 2.30 | 67.47
3 | 10^−3 | 55 | 2k < 0.01m | 5.45 × 10^3 | 1k < 0.01m | 3.92 | 185.93

The 4-level decomposition of the SwissRoll data:
0 | - | 20000 | 248k = m | 1.15 × 10^6 | - | - | -
1 | 10^−5 | 5528 | 689k ≈ 2.78m | 4.31 × 10^5 | 197k ≈ 0.79m | 1.45 | 10.06
2 | 5 × 10^−5 | 1347 | 215k ≈ 0.87m | 9.36 × 10^4 | 65k ≈ 0.26m | 1.90 | 26.52
3 | 2.5 × 10^−4 | 203 | 18k ≈ 0.08m | 1.89 × 10^4 | 9k ≈ 0.04m | 3.06 | 98.87
4 | 1.25 × 10^−3 | 53 | 2k < 0.01m | 3.72 × 10^3 | 1k < 0.01m | 3.36 | 51.14

Table 4.3: Decomposition information of (i) Bunny (2-level), (ii) Brain (4-level) and (iii) SwissRoll (3-, 4-level) data. m := nnz(A^(0)). 1k = 1000.

ε_3 = 0.0025. We compute the first 50 eigenpairs {(λ_i, v_i)} of the compressed operator by directly solving the generalized eigenproblem (Lemma 4.2.2)

A^(k)z_i = λ_i M^(k)z_i,   v_i = Ψ^(k)z_i,   i = 1, ..., 50.
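Such a compressed generalized eigenproblem can be handed to an off-the-shelf sparse eigensolver; a sketch with SciPy (A_k, M_k, Psi_k stand for A^(k), M^(k), Ψ^(k) and are assumed to be given as sparse matrices):

```python
from scipy.sparse.linalg import eigsh

def coarse_eigenpairs(A_k, M_k, Psi_k, n_pairs=50):
    # Smallest lambda of A^(k) z = lambda M^(k) z, i.e. largest mu = 1/lambda of Theta^(k),
    # computed in shift-invert mode about sigma = 0 (A^(k) is PD, hence invertible).
    lam, Z = eigsh(A_k, k=n_pairs, M=M_k, sigma=0, which="LM")
    V = Psi_k @ Z                      # lift coefficients back to long eigenvectors v_i = Psi^(k) z_i
    return lam, V
```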

The computation of the coarse level eigenproblem is much more efficient due to the compressed dimension. To assess the error of the approximate eigenvalues, the ground truth is obtained by using the Eigen C++ Library³. Figure 4.3 shows the absolute and relative errors of these eigenvalues. In both cases µ_i is the i-th largest eigenvalue of L^{-1} and λ_i = 1/µ_i; µ̂_i is the i-th largest eigenvalue of the compressed problem Θ^(k) and λ̂_i = 1/µ̂_i. By Lemma 4.2.3, |µ_i − µ̂_i| is bounded by ε_k and ‖L^{-1}v̂_i − µ̂_iv̂_i‖_2 is bounded by 2ε_k. We can see in Figure 4.3 that both estimates are well satisfied. In particular, the error of the first eigenvalue is close to the bound ε_k. However, the first eigenpair is already known. Therefore, we are only interested in the 2nd up to the 50th eigenvalues, and we embed sub-plots of these eigenvalue


errors as shown in Figure 4.3a and Figure 4.3c respectively.

Figure 4.3: The error, the residual, and the relative error. Top: Bunny data; bottom: Brain data. Panels (a), (c): error |µ_i − µ̂_i| and residual ‖L^{-1}v̂_i − µ̂_iv̂_i‖_2; panels (b), (d): relative errors of µ_i and λ_i.

Next, we qualitatively examine the accuracy of the approximate eigenvectors of the compressed operators by comparing their behavior in image segmentation to that of the true eigenvectors of the original Laplacian operators. In image segmentation, the eigenvectors of the graph Laplacian provide a solution to the graph partitioning problem. Namely, for a partition (A, B) that satisfies A ∪ B = V and A ∩ B = ∅, a measure of their disassociation, called the normalized cut (Ncut), is defined as [107]

Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V),    (4.15)

where

cut(A, B) = Σ_{u∈A, v∈B} w(u, v),   assoc(A, V) = Σ_{u∈A, t∈V} w(u, t).

3Eigen C++ Library is available at http://eigen.tuxfamily.org/index.php?title=Main_Page


Shi and Malik [107] showed that, for a connected graph, minimizing Ncut can be rephrased as finding the eigenvector v_2 corresponding to the second smallest eigenvalue λ_2 of the graph Laplacian (since we always have λ_1 = 0 with v_1 a uniform vector). Taking sign(v_2) transforms it into a binary vector which gives a satisfactory cut. Moreover, the next few eigenvectors provide further cuts of the previously partitioned fractions. Therefore, our eigensolver may serve as a powerful tool for graph partitioning, as well as for its applications, including image segmentation and manifold learning.
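A sketch of the resulting two-way partition for the PD-shifted Laplacians used here (following the text, we simply threshold the eigenvector of the second smallest eigenvalue at zero; this is the spectral relaxation, not an exact Ncut minimizer):

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def spectral_bipartition(L):
    """Two-way cut from the eigenvector of the 2nd smallest eigenvalue of a PD graph Laplacian L."""
    lam, V = eigsh(L, k=2, sigma=0, which="LM")   # two smallest eigenpairs via shift-invert
    v2 = V[:, np.argsort(lam)[1]]
    return np.sign(v2)                            # +1 / -1 labels of the two parts
```

The same call applied to the compressed operator uses the generalized problem of Lemma 4.2.2 and lifts the eigenvector back through Ψ^(k) before taking signs.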

We test graph partitioning on the bunny and brain datasets using the eigenvectors of both the original and the compressed operators. Figures 4.4 and 4.5 show the colormap and the partition generated by some selected eigenvectors. From the pictures we can see that the original and the compressed operators give very similar results when it comes to graph partitioning. The compressed operator is not only easier to compute, but also gives a satisfactory partition in practical settings.

Figure 4.4: Colormap (left) and partition (right) using the 2nd, 4th and 6th eigenvectors of the original/compressed operator.

Figure 4.6 gives an example of refining the partition with more eigenvectors. In the brain data, a fraction that is left intact by the first 5 eigenvectors (the light green part on the left) is divided into many more fractions when the eigenvectors pile up to 15.

4.6.4 The Multi-level Eigenpair Computation

To test the efficacy of the hierarchical structure in our approach, we use our main Algorithm 13 to compute a relatively large number of eigenpairs of the Laplacian ma-


Figure 4.5: Colormap (left) and partition (right) using the 2nd, 4th and 6th eigenvectors of the original/compressed operator.

Figure 4.6: Heaping up more eigenvectors leads to a finer partition. Left: partition using the first 5 eigenvectors. Middle: a uniform fraction from the previous partition. Right: further partition using the next 10 eigenvectors.

trices subject to the prescribed accuracy. In particular, for both the Brain data and the SwissRoll data, we compute the first 500 eigenpairs of the graph Laplacian subject to the prescribed accuracy |λ_i^{-1} − λ̂_i^{-1}| = |µ_i − µ̂_i| ≤ ε = ε_1. The three decompositions of these two datasets are used in this experiment; for each decomposition, we apply Algorithm 13 with two sets of parameters, (α, β) = (5, 2) and (α, β) = (3, 1). The details of the results obtained by Algorithm 13 are summarized in Table 4.4-Table 4.7. In Table 4.4, the parameters α, β, κ, γ are defined in Section 4.5. In Table 4.5-Table 4.7, we collect numerical results that reflect the efficiency of each single process (refinement or extension). Here we give a detailed description of the notation used in these tables:

• #I and #O denote the numbers of input and output eigenpairs. To be consistent with the notation defined in Section 4.5, we use (#I, #O) = (m_{k+1}, m̃_k) for the refinement process on level k, and (#I, #O) = (m̃_k, m_k) for the extension process on level k.


• #Iter denotes the number of orthogonal iterations in the refinement process. Note that this number is controlled by the ratio γ.

• #cg(B^(k)) denotes the number of CG calls concerning B^(k) in the refinement process; #pcg(A^(k)) denotes the number of PCG calls concerning A^(k) in the refinement process and the extension process. #(B^(k)) and #(A^(k)) denote the average numbers of matrix-vector multiplications concerning B^(k) and A^(k) respectively, namely the average numbers of iterations, in one single call of CG or PCG. Note that #(B^(k)) is controlled by log(1/ε^(k))κ(B^(k)) ≤ log(1/ε^(k))c/η, and #(A^(k)) by log(1/ε^(k))κ(A_Ψ^(k), Z_{m̃_k^+}^(k)) ≤ log(1/ε^(k))αc/η.

• As the extension process proceeds, the target spectrum still to be computed on the current level shrinks, and so does the restricted condition number of the operator. Thus the numbers of iterations in the later PCG calls become much smaller than the expected control log(1/ε^(k))αc/η, which is a good thing in practice. So to study how the theoretical bound log(1/ε^(k))αc/η really affects the efficiency of the PCG calls, it is more reasonable to investigate the maximal number of iterations in one PCG call on each level. We use #max(A^(k)) to denote the largest number of iterations in one single PCG call on level k.

• #(M^(k)) denotes the average number of matrix-vector multiplications concerning M^(k) in one single CG call. Such CG calls occur inside the PCG calls concerning A^(k), where M^(k) acts as the preconditioner. Note that #(M^(k)) is controlled by log(1/ε^(k))κ(M^(k)) ≤ log(1/ε^(k))(1 + c).

• "Main Cost" denotes the main computational cost, contributed by matrix-vector multiplication flops. In the refinement process we have

Main Cost = #cg(B^(k+1)) · #(B^(k+1)) · nnz(A^(k)) + #pcg(A^(k)) · #(A^(k)) · (nnz(A^(k)) + #(M^(k)) · nnz(M^(k))),

while in the extension process we have

Main Cost = #pcg(A^(k)) · #(A^(k)) · (nnz(A^(k)) + #(M^(k)) · nnz(M^(k))).

Table 4.5-Table 4.7 show the efficiency of our algorithm. We can see that #(B^(k)) and #(M^(k)) are well bounded as expected, due to the artificial imposition of the condition bound c. #max(A^(k)) and the numerical condition number #max(A^(k))/log(1/ε^(k)) are also well controlled by choosing α properly to bound κ = αc/η. It is worth mentioning that #max(A^(k))/log(1/ε^(k)) appears to be uniformly bounded for all levels, actually


Data | MMD | (α, β) | (η, c) | κ | γ | Total #Iter | Total Main Cost
Brain | 4-level | (5, 2) | (0.2, 20) | 500 | 3/5 | 12 | 4.37 × 10^5 · m
Brain | 4-level | (3, 1) | (0.2, 20) | 300 | 2/3 | 15 | 4.13 × 10^5 · m
SwissRoll | 3-level | (5, 2) | (0.1, 20) | 1000 | 3/5 | 13 | 7.56 × 10^5 · m
SwissRoll | 3-level | (3, 1) | (0.1, 20) | 600 | 2/3 | 16 | 7.17 × 10^5 · m
SwissRoll | 4-level | (5, 2) | (0.2, 20) | 500 | 3/5 | 19 | 7.00 × 10^5 · m
SwissRoll | 4-level | (3, 1) | (0.2, 20) | 300 | 2/3 | 28 | 5.86 × 10^5 · m

Table 4.4: Eigenpair computation information. m := nnz(A^(0)).

much smaller than κ, which reflects our uniform control of the efficiency. #Iter is well bounded due to the proper choice of β for bounding γ = (1 + β)/α.

We may also compare the results for the same decomposition obtained with two different sets of parameters (α, β). For all three decompositions, the experiments with (α, β) = (5, 2) have a smaller γ = 3/5, and thus are more efficient in the refinement process (fewer #Iter and smaller refinement main cost), while the experiments with (α, β) = (3, 1) have a smaller κ that leads to better efficiency in the extension process (smaller #max(A^(k))/log(1/ε^(k)) and smaller extension main cost). But since the dominant cost of the whole process comes from the extension process, the experiments with (α, β) = (3, 1) have a smaller total main cost.

We remark that the choice of (α, β) not only determines (κ, γ), which affect the algorithm efficiency, but also determines the segmentation of the target spectrum and its allocation to the different levels of the decomposition. Smaller values of α and β mean more eigenpairs being computed on coarser levels (larger k), which relieves the burden of the extension process on finer levels, but also increases the load of the refinement process. There could be an optimal choice of (α, β) that minimizes the total main cost, balancing between the refinement and the extension processes. However, without a priori knowledge of the distribution of the eigenvalues, which is the case in practice, a safe choice of (α, β) would be α, β = O(1).

To further investigate the behavior of our algorithm, we focus on the numerical experiments carried out on the 4-level decomposition of the SwissRoll data. Figure 4.7 shows the convergence of the computed spectrum in different error measures. Figure 4.8 shows the completion and the convergence process of the target spectrum in the case of (α, β) = (3, 1) (corresponding to Table 4.7). We use a log-scale plot to illustrate the error |µ_i − µ̂_i^(k)| after we complete the refinement process and the


(α, β) = (5, 2)

Refinement:
Lvl k | (#I, #O) | #Iter | #cg(B) | #(B) | #pcg(A) | #(A) | #(M) | Main Cost
3 | (7, 4) | 4 | 7 | 24.43 | 28 | 10.97 | 6.10 | 5.66 × 10^1 · m
2 | (41, 17) | 4 | 41 | 25.90 | 164 | 16.26 | 6.12 | 4.50 × 10^3 · m
1 | (207, 84) | 4 | 207 | 23.44 | 828 | 19.17 | 4.64 | 1.02 × 10^5 · m

Extension:
Lvl k | (#I, #O) | #max(A) | ε^(k) | #max(A)/log(1/ε^(k)) | #pcg(A) | #(A) | #(M) | Main Cost
3 | (4, 41) | 43 | 2.5 × 10^−4 | 5.18 | 175 | 16.93 | 5.39 | 4.37 × 10^2 · m
2 | (17, 207) | 75 | 5.0 × 10^−5 | 7.57 | 500 | 32.27 | 5.47 | 2.27 × 10^4 · m
1 | (84, 500) | 82 | 10^−5 | 7.12 | 1248 | 44.23 | 4.45 | 3.07 × 10^5 · m

(α, β) = (3, 1)

Refinement:
Lvl k | (#I, #O) | #Iter | #cg(B) | #(B) | #pcg(A) | #(A) | #(M) | Main Cost
3 | (15, 6) | 5 | 15 | 24.54 | 75 | 7.74 | 6.07 | 1.08 × 10^2 · m
2 | (78, 28) | 5 | 78 | 25.85 | 390 | 11.17 | 6.01 | 7.39 × 10^3 · m
1 | (276, 140) | 5 | 276 | 23.43 | 1380 | 14.28 | 4.67 | 1.29 × 10^5 · m

Extension:
Lvl k | (#I, #O) | #max(A) | ε^(k) | #max(A)/log(1/ε^(k)) | #pcg(A) | #(A) | #(M) | Main Cost
3 | (6, 78) | 37 | 2.5 × 10^−4 | 4.46 | 225 | 14.12 | 5.41 | 4.70 × 10^2 · m
2 | (28, 276) | 57 | 5.0 × 10^−5 | 5.75 | 600 | 27.91 | 5.43 | 2.34 × 10^4 · m
1 | (140, 500) | 63 | 10^−5 | 5.47 | 1080 | 42.09 | 4.46 | 2.53 × 10^5 · m

Table 4.5: 4-level eigenpair computation of the Brain data with (η, c) = (0.2, 20).

extension process respectively on each level k. As we can see, each application of the refinement process improves the accuracy of the first m̃_k eigenvalues at least by a factor of η = ε_k/ε_{k+1}, but at the price of discarding the last m_{k+1} − m̃_k computed eigenvalues. So the computation of the last m_{k+1} − m̃_k eigenvalues on the coarser level k + 1 actually serves as preconditioning to ensure the efficiency of the refinement process on level k. Then the extension process extends the spectrum to m_k eigenpairs, where m_k is determined by the threshold µ_ex^(k). The whole computation is an iterative process that improves the accuracy of the eigenvalues by applying the hierarchical Lanczos method to each eigenvalue at most twice.

We also further verify our critical control of the restricted condition number κ(A_Ψ^(k), Z_{m_k^+}^(k)) by the parameter κ = αc/η, by showing the dependence of #(A^(k)) (or #(A^(k))/log(1/ε^(k))) on κ. Recall that #(A^(k)) denotes the largest number of iterations in one single PCG call concerning A^(k) on level k. Using the 4-level decomposition of the SwissRoll data with (η, c) = (0.2, 20),



(α, β) = (5, 2), Refinement:
Lvl k   (#I, #O)     #Iter   #cg(B)   #(B)    #pcg(A)   #(A)    #(M)   Main Cost
2       (21, 12)     7       21       52.14   147       17.61   6.33   3.91 × 10^3 · m
1       (232, 100)   6       232      47.23   1392      16.08   5.29   1.86 × 10^5 · m

(α, β) = (5, 2), Extension:
Lvl k   (#I, #O)     #(A)   ε       #(A)/log(1/ε)   #pcg(A)   #(A)    #(M)   Main Cost
2       (12, 232)    94     10^-5   8.16            650       28.20   7.25   2.67 × 10^4 · m
1       (100, 500)   101    10^-6   7.31            1200      59.44   6.10   5.42 × 10^5 · m

(α, β) = (3, 1), Refinement:
Lvl k   (#I, #O)     #Iter   #cg(B)   #(B)    #pcg(A)   #(A)    #(M)   Main Cost
2       (35, 19)     8       35       51.89   280       13.13   6.45   5.74 × 10^3 · m
1       (315, 165)   8       315      46.85   2520      12.73   5.37   2.66 × 10^5 · m

(α, β) = (3, 1), Extension:
Lvl k   (#I, #O)     #(A)   ε       #(A)/log(1/ε)   #pcg(A)   #(A)    #(M)   Main Cost
2       (19, 315)    69     10^-5   5.99            700       25.10   7.29   2.57 × 10^4 · m
1       (165, 500)   78     10^-6   5.65            1005      54.91   6.11   4.20 × 10^5 · m

Table 4.6: 3-level eigenpairs computation of the SwissRoll data with (η, c) = (0.1, 20).

we perform Algorithm 13 with fixed β = 1 but different α ∈ [3, 5]. Figure 4.9 shows #(A^(k)) versus α for all three levels. By Theorem 4.3.8, we expect that #(A^(k)) ∝ κ · log(1/ε^(k)) ∝ α · log(1/ε^(k)). This linear dependence is confirmed in Figure 4.9. It is also important to note that the curve (green) corresponding to level 1 is below the curve (blue) corresponding to level 2 in Figure 4.9b, which again implies that #(A^(k))/log(1/ε^(k)) is uniformly bounded over all levels.

4.7 Comparison with the IRLM

Owing to the observation in [76] that the IRLM is still one of the best-performing and most widely known algorithms for finding a large portion of the smallest eigenpairs, in this section we compare the computational complexity of our proposed algorithm with that of the IRLM.

To quantitatively compare the two methods, we record the computation time and the number of CG iterations as benchmarks. The reasons for doing this are as follows:

• In the large-scale setting, directly solving with the sparse matrix A (i.e., forming A^{-1}) is in general not practical since it requires large memory storage. Instead, iterative methods, especially the CG method (as A is PD in our case), are employed.



(α, β) = (5, 2), Refinement:
Lvl k   (#I, #O)     #Iter   #cg(B)   #(B)    #pcg(A)   #(A)    #(M)   Main Cost
3       (18, 10)     6       18       22.61   108       7.19    7.87   3.39 × 10^2 · m
2       (84, 44)     8       84       43.45   672       10.42   6.49   2.11 × 10^4 · m
1       (390, 195)   5       390      28.85   1950      11.68   5.42   1.92 × 10^5 · m

(α, β) = (5, 2), Extension:
Lvl k   (#I, #O)     #(A)   ε             #(A)/log(1/ε)   #pcg(A)   #(A)    #(M)   Main Cost
3       (10, 84)     42     2.5 × 10^-5   3.96            200       18.32   8.43   1.53 × 10^3 · m
2       (44, 390)    63     5 × 10^-6     5.16            1050      29.30   7.24   8.47 × 10^4 · m
1       (195, 50)    71     10^-6         5.13            915       57.47   6.10   4.00 × 10^5 · m

(α, β) = (3, 1), Refinement:
Lvl k   (#I, #O)     #Iter   #cg(B)   #(B)    #pcg(A)   #(A)    #(M)   Main Cost
3       (31, 16)     7       31       22.45   217       6.09    8.09   5.89 × 10^2 · m
2       (95, 67)     12      95       43.44   1140      7.66    6.66   2.63 × 10^4 · m
1       (459, 314)   7       459      28.75   3656      8.71    5.56   2.65 × 10^5 · m

(α, β) = (3, 1), Extension:
Lvl k   (#I, #O)     #(A)   ε             #(A)/log(1/ε)   #pcg(A)   #(A)    #(M)   Main Cost
3       (16, 95)     31     2.5 × 10^-5   2.92            200       16.61   8.48   1.39 × 10^3 · m
2       (67, 459)    49     5 × 10^-6     4.01            1100      25.66   7.27   7.79 × 10^4 · m
1       (314, 500)   55     10^-6         3.98            558       50.61   6.12   2.15 × 10^5 · m

Table 4.7: 4-level eigenpairs computation of the SwissRoll data with (η, c) = (0.2, 20).

• In both the IRLM and our proposed algorithm, the dominating complexity comes from the operation of computing A^{-1}b for some vector b.

Remark 4.7.1. For small-scale problems, a direct solver (such as sparse Cholesky factorization) for A^{-1} is preferred in the IRLM. In this way, only one factorization step for A is required prior to the IRLM. Moreover, solving with A^{-1} in each iteration is replaced by solving two triangular systems. This brings a significant speedup for the IRLM. However, recall that we are aiming at understanding the asymptotic behavior and performance of these methods. Therefore, the IRLM discussed in this section employs the iterative solver instead of a direct solver.
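As a concrete illustration of the operation that dominates both methods, the following is a minimal sketch (not the thesis implementation, which uses C++ with the Eigen Library) of applying A^{-1}b with CG and with a preconditioned CG. The test matrix, the tolerances, and the use of scipy's ILU factorization as a stand-in for an incomplete Cholesky preconditioner are all assumptions made only for illustration.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# A sparse SPD test matrix: a shifted 1D Laplacian (a stand-in for A).
n = 2000
A = sp.diags([-1.0, 2.01, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.random.default_rng(0).standard_normal(n)

def solve(A, b, M=None):
    """Apply A^{-1} b with (preconditioned) CG and report the iteration count."""
    count = {"n": 0}
    def cb(_xk):
        count["n"] += 1
    x, info = spla.cg(A, b, M=M, callback=cb)
    return x, count["n"]

# Plain CG: each iteration only needs one matrix-vector product with A.
x_cg, n_cg = solve(A, b)

# PCG: scipy's ILU factorization plays the role of an incomplete-Cholesky-type
# preconditioner, i.e., M applies an approximation of A^{-1}.
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
M = spla.LinearOperator((n, n), matvec=ilu.solve)
x_pcg, n_pcg = solve(A, b, M=M)

print("CG iterations:", n_cg, "  PCG iterations:", n_pcg)
print("residuals:", np.linalg.norm(A @ x_cg - b), np.linalg.norm(A @ x_pcg - b))
```

The point of the comparison below is precisely how many such inner iterations each outer (Lanczos) step needs under different preconditioning strategies.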

To be consistent, all the experiments are performed on a single machine equipped with an Intel(R) Core(TM) i5-4460 CPU at 3.2 GHz and 8 GB of DDR3 1600 MHz RAM.



Figure 4.7: Convergence of the computed spectrum in different error measures, for i = 1, · · · , m_k: (a) log10(µ_i − µ_i^(k)); (b) log10(‖L^{-1}v_i^(k) − µ_i v_i^(k)‖_2); (c) log10((µ_i − µ_i^(k))/µ_i); (d) log10((λ_i − λ_i^(k))/λ_i).

Both the proposed algorithm and the IRLM are implemented using C++ with the Eigen Library for fairness. In particular, the built-in (preconditioned) CG solvers are used in the IRLM implementation instead of our own implementation.

Table 4.8 shows the overall computation time for computing the leftmost (i) 300, (ii) 200, and (iii) 100 eigenpairs using (i) our proposed algorithm, (ii) the IRLM with incomplete Cholesky PCG (IRLM-ICCG), and (iii) the IRLM with the classical CG method (IRLM-CG). In this numerical example, the error tolerance for the eigenvalues in all three cases is set to 10^-5. Since the error for the IRLM cannot be obtained a priori, we fine-tune the relative error tolerance of the (preconditioned) CG solver such that the eigenvalue errors are of order O(10^-6). For the proposed algorithm, the time required for the level-wise eigenpair computation is recorded. On the bottom level (level 4 or level 3 in these cases), we have used the built-in eigensolver function in the Eigen Library to obtain the full set of eigenpairs (corresponding to Line 1 in Algorithm 6). As the problem size there is small, this cost is insignificant in all three examples.




Figure 4.8: The completion and convergence process of the target spectrum, shown (a) after level 3 refinement, (b) after level 3 extension, (c) after level 2 refinement, (d) after level 2 extension, (e) after level 1 refinement, and (f) after level 1 extension. The refinement process retains part of the spectrum subject to the threshold µ_re^(k) with improved accuracy, and the extension process extends the spectrum subject to the threshold µ_ex^(k). The whole process is an iterative procedure that aims at improving the accuracy of the eigenvalue solver.

Figure 4.9: (a) #(A^(k)) versus α and (b) #(A^(k))/log(ε_pcg^{-1}) versus α in the 4-level SwissRoll example.

The total runtime of our proposed algorithm in each example is computed by summing up the computation times of all levels, plus the MMD time (the second row in Table 4.8). For all these examples, our proposed algorithm outperforms the IRLM. Although both the size of the matrices and their corresponding condition numbers are not extremely large, the numerical experiments already show an observable improvement. From the theoretical analysis discussed in the previous sections, this improvement will be magnified further if the PD matrices are of larger scale and more ill-conditioned.



# Eigenpairs   Method                4-level Brain   4-level SwissRoll   3-level SwissRoll
               Decomposition*        34.589          8.124               9.430

300            Proposed** Level-4    0.010           0.011               -
                          Level-3    0.841           0.560               0.083
                          Level-2    29.122          40.796              18.729
                          Level-1    61.286          18.846              22.440
               Total (* + **)        125.848         68.337              50.682
               IRLM-CG               174.028         81.005
               IRLM-ICCG             525.73          289.385

200            Proposed** Level-4    0.010           0.011               -
                          Level-3    0.826           0.526               0.083
                          Level-2    25.560          28.094              11.517
                          Level-1    54.951          12.107              18.378
               Total (* + **)        115.936         48.862              39.408
               IRLM-CG               124.871         61.479
               IRLM-ICCG             417.632         196.217

100            Proposed** Level-4    0.010           0.011               -
                          Level-3    0.831           0.531               0.083
                          Level-2    25.056          22.062              9.883
                          Level-1    31.882          8.066               12.029
               Total (* + **)        92.368          38.794              31.425
               IRLM-CG               115.676         48.713
               IRLM-ICCG             324.648         90.175

Table 4.8: Computation time (in seconds) for the 4-level Brain, 4-level SwissRoll, and 3-level SwissRoll examples using the proposed hierarchical multi-level eigensolver, the IRLM with CG solver, and the IRLM with incomplete Cholesky PCG solver. The IRLM timings are reported once per data set, as the IRLM does not use the decomposition.

Indeed, the full advantage of our proposed algorithm is not yet exhibited in these illustrations. Therefore, one of the main future works is to perform detailed numerical experiments in such cases. For instance, comparing the 3-level and 4-level SwissRoll examples, we observe that a 3-level decomposition is indeed sufficient for the SwissRoll graph Laplacian, where we recall that the corresponding condition number is ‖A‖_2 = 1.15 × 10^6. Using the 3-level decomposition, the overall runtime reduction relative to the IRLM goes up to approximately 37% when 300 eigenpairs are required.

Notice that the time required by the IRLM-ICCG is notably greater than that of the IRLM-CG, which contradicts our usual experience with preconditioning. In fact, this phenomenon can be explained as follows: in the early stage of the IRLM, preconditioning with the incomplete Cholesky factorization helps reduce the



number of CG iterations. However, as the computed eigen-subspace is gradually projected away throughout the IRLM process, the spectrum of the remaining subspace shrinks and therefore the CG iteration count also drops significantly. In contrast, the incomplete Cholesky preconditioner ignores this update of the spectrum, and therefore the corresponding CG iteration count stays roughly uniform throughout the whole Lanczos iteration. Hence, the classical CG method is preferred if a large number of leftmost eigenpairs are required. Figure 4.10a shows the CG iteration counts of the IRLM-ICCG, the IRLM-CG, and our proposed hierarchical eigensolver versus the Lanczos iteration. More precisely, if we call V_k in (4.1) the Lanczos vectors, the x-axis in the figure corresponds to the first time the i-th column of the Lanczos vectors is generated. For the IRLM methods, this is equivalent to the extension procedure for the i-th column of the Lanczos vectors, which corresponds to Lines 6–8 in Algorithm 8. In particular, the CG iteration count recorded in this figure corresponds to the operation op in Line 8 of Algorithm 8. For our proposed algorithm, there are three separate sections; each section's CG iteration counts correspond to the formation of the Lanczos vectors on the 3rd, 2nd, and 1st level, respectively. Since we may also update some of these Lanczos vectors during the refinement process, some overlaps in the recorded CG iteration counts corresponding to those Lanczos vectors are observed. With the spectrum-preserving hierarchical preconditioner M introduced in our algorithm, the number of CG iterations for computing A^{-1}b is tremendously reduced. In contrast, the CG iteration count of the IRLM-CG is the largest at the beginning, but decreases exponentially and asymptotically approaches our proposed result. For the IRLM-ICCG, the incomplete Cholesky factorization does not capture the spectrum update, and therefore the iteration count is uniform throughout the computation. This observation is also consistent with the time complexity shown in Table 4.8. Figure 4.10b shows the corresponding normalized plot, where the iteration count is normalized by log(1/ε).

Similar results can also be plotted for the 4-level Brain and the 3-level SwissRollexamples. We therefore skip those plots to avoid repetition.

Figure 4.10: (a) The PCG iteration counts in the 4-level SwissRoll example. The IRLM-ICCG method exhibits a uniform iteration count, while the IRLM-CG has an exponentially decaying iteration count. For our proposed algorithm, since the spectrum-preserving hierarchical preconditioner M is employed, the CG iteration count is minimal. This is also consistent with the time complexity shown in Table 4.8. (b) The corresponding normalized plot, where the iteration count is normalized by log(1/ε).

4.8 On Compressed Eigenproblems

In this section, we compare our method for the compressed eigenproblem with the method proposed by Ozolinš et al. [90]. We start with the straightforward compression that directly uses the eigenvectors corresponding to the smallest eigenvalues,

which can be obtained by solving the following optimization problem:

\[
\Psi = \operatorname*{argmin}_{\Psi} \ \sum_{i=1}^N \psi_i^T A \psi_i, \qquad \text{s.t.}\ \psi_i^T \psi_j = \delta_{ij}, \quad i, j = 1, 2, \cdots, N. \tag{4.16}
\]

The compression via eigenvectors, well known as the PCA method, is optimal in the 2-norm sense for a fixed compressed dimension N. However, computing a large number of eigenvectors is a hard problem in itself, not to mention that we actually intend to approximate eigenpairs using the compressed operator. Also, the spatially extended profiles of exact eigenvectors make them less favorable in many fields of research. As a modification, Ozolinš et al. [90] added an L1 regularization term to impose the desired locality on Ψ. They modified the optimization problem (4.16) as

\[
\Psi = \operatorname*{argmin}_{\Psi} \ \sum_{i=1}^N \Big( \psi_i^T A \psi_i + \frac{1}{\mu} \|\psi_i\|_1 \Big), \qquad \text{s.t.}\ \psi_i^T \psi_j = \delta_{ij}, \quad i, j = 1, 2, \cdots, N. \tag{4.17}
\]

The L1 regularization, widely used in many optimization problems for sparsity pursuit, effectively ensures that each output ψ_i has spatially compact support, at the cost of compromising the approximation accuracy compared to PCA. The factor µ controls the locality of Ψ: a smaller µ gives more localized profiles of Ψ, which, however, results in a larger compression error for a fixed N.



The loss of approximation accuracy can be compensated by increasing, though not significantly, the number of basis functions N. An algorithm based on the split Bregman iteration was also proposed in [90] to solve problem (4.17) effectively. In summary, their work provides an effective method to find a set of localized basis functions that approximately span the eigenspace of the smallest eigenvalues of A.

Although our approach to operator compression was originally developed from a different perspective based on the FEM, it can be reformulated as an optimization problem similar to (4.16). In fact, to obtain the basis Ψ used in our method, we can simply replace the nonlinear constraints ψ_i^T ψ_j = δ_ij, i, j = 1, 2, · · · , N, by the linear constraints ψ_i^T φ_j = δ_ij, i, j = 1, 2, · · · , N, to get

\[
\Psi = \operatorname*{argmin}_{\Psi} \ \sum_{i=1}^N \psi_i^T A \psi_i, \qquad \text{s.t.}\ \psi_i^T \phi_j = \delta_{ij}, \quad i, j = 1, 2, \cdots, N. \tag{4.18}
\]

Here Φ = [φ_1, φ_2, · · · , φ_N] is a dual basis that we construct ahead of Ψ to provide an a priori compression error estimate, as stated in Theorem 2.2.1. As the constraints become linear, problem (4.18) can be solved explicitly by Ψ = A^{-1}Φ(Φ^T A^{-1}Φ)^{-1}, as mentioned in (2.24). Instead of imposing locality by adding L1 regularization as in (4.17), we obtain the exponentially decaying feature of Ψ by constructing each dual basis function φ_i locally. That is, the locality of Φ and the strong correlation Ψ^TΦ = I automatically give us the locality of Ψ through the energy-minimizing property. The optimization form (4.18) was derived by Owhadi in [88], where Ψ was used as an FEM basis to solve second-order elliptic equations with rough coefficients. This methodology was then generalized to higher-order elliptic equations [53], general Banach spaces [89], and general sparse PD matrices [52]. In all previous works, the nice spectral property of Ψ has been observed; in particular, the eigenspace corresponding to the smallest M eigenvalues of A can be approximately well spanned by a Ψ of relatively larger dimension N = O(M).

To further compare problems (4.17) and (4.18), we test both of them on the one-dimensional Kronig–Penney (KP) model studied in [90], with the rectangular potential wells replaced by inverted Gaussian potentials. In this example, the matrix A comes from the discretization of the PDE operator −(1/2)∆ + V(x) defined on the domain Ω with periodic boundary conditions. In particular, Ω = [0, 50], and
\[
V(x) = -V_0 \sum_{j=1}^{N_{el}} \exp\Big(-\frac{(x - x_j)^2}{2\delta^2}\Big).
\]
As in [90], we discretize Ω with 512 equally spaced nodes, and we choose N_el = 5, V_0 = 1, δ = 3, and x_j = 10j − 5 (instead of x_j = 10j as in [90], which essentially changes nothing).



For problem (4.18), we divide Ω into N equal-length intervals {Ω_i}_{i=1}^N, and choose the dual basis Φ = [φ_1, φ_2, · · · , φ_N] such that φ_i is the discretization of the indicator function 1(Ω_i) (that is, 1(Ω_i)(x) = 1 for x ∈ Ω_i, and 1(Ω_i)(x) = 0 otherwise). We use Ψ_o to denote the exact result of problem (4.18), namely Ψ_o = A^{-1}Φ(Φ^T A^{-1}Φ)^{-1}. Since Ψ_o is not orthogonal, we should compute the eigenvalues from the generalized eigenvalue problem Ψ_o^T A Ψ_o v = λ Ψ_o^T Ψ_o v (Lemma 4.2.2) as approximations of the eigenvalues of A. We use λ_o to denote these approximate eigenvalues.
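This construction can be reproduced in a few lines. The sketch below is an illustration only, not the thesis code: it assumes a standard second-order periodic finite-difference discretization of −(1/2)∆ + V on 512 nodes, an indicator dual basis Φ, and the explicit formula Ψ_o = A^{-1}Φ(Φ^T A^{-1}Φ)^{-1}; the small diagonal shift used to keep the discrete matrix positive definite is an added assumption for illustration.

```python
import numpy as np
from scipy.linalg import eigh

# Discretize -(1/2) d^2/dx^2 + V(x) on [0, 50] with 512 nodes, periodic BCs.
n, L = 512, 50.0
h = L / n
x = h * np.arange(n)
V0, delta, Nel = 1.0, 3.0, 5
xj = 10.0 * np.arange(1, Nel + 1) - 5.0
V = -V0 * sum(np.exp(-(x - c) ** 2 / (2 * delta ** 2)) for c in xj)
lap = (-2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
       + np.eye(n, k=n - 1) + np.eye(n, k=-(n - 1))) / h ** 2
A = -0.5 * lap + np.diag(V)
A += (1e-8 - min(0.0, np.linalg.eigvalsh(A)[0])) * np.eye(n)  # shift so that A is PD (assumption)

# Dual basis Phi: indicators of N (nearly) equal-length intervals.
N = 50
Phi = np.zeros((n, N))
for i, idx in enumerate(np.array_split(np.arange(n), N)):
    Phi[idx, i] = 1.0

# Psi_o = A^{-1} Phi (Phi^T A^{-1} Phi)^{-1}, then the generalized eigenproblem.
AinvPhi = np.linalg.solve(A, Phi)
Psi = AinvPhi @ np.linalg.inv(Phi.T @ AinvPhi)
lam_o = eigh(Psi.T @ A @ Psi, Psi.T @ Psi, eigvals_only=True)
lam_true = np.linalg.eigvalsh(A)
print(np.abs(lam_o[:10] - lam_true[:10]))  # errors of the 10 smallest eigenvalues
```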

For problem (4.17), we use Algorithm 1 of [90] with exactly the same parameters provided there, which means we are simply reproducing their results, except that we use a finer discretization (512 nodes rather than 128) and we shift the potential V(x). We have used the normalized Φ as the initial guess for Algorithm 1 in [90], and choose µ = 10. We use Ψ_cm to denote the result of problem (4.17), and λ_cm to denote the eigenvalues of Ψ_cm^T A Ψ_cm.

Figure 4.11: Results of problems (4.17) and (4.18) for N = 50 (first column), N = 75 (second column), and N = 100 (third column). First row: the first 50 eigenvalues of A and those of the compressed problems. Second row: examples of local basis functions. Third row: examples of local basis functions in log scale.



We compare the approximate eigenvalues to the first 50 eigenvalues of A. The first row of Figure 4.11 shows that both methods give very good approximations of λ(A), and as N increases, the approximations become better. Relatively, however, the results λ_cm are closer to the ground truth than our results λ_o from (4.18). To improve our results, we simply solve problem (4.18) again, but this time using the previous result Ψ_o as the dual basis. That is, we compute Ψ_o2 = A^{-1}Ψ_o(Ψ_o^T A^{-1}Ψ_o)^{-1}, and compute the eigenvalues λ_o2 from the generalized eigenvalue problem Ψ_o2^T A Ψ_o2 v = λ Ψ_o2^T Ψ_o2 v. We can see that the approximate eigenvalues λ_o2 are even closer to the ground truth. An interpretation of this improvement is the following: if we view Ψ_o = A^{-1}Φ(Φ^T A^{-1}Φ)^{-1} as a transformation from Φ to Ψ_o, then the factor A^{-1}Φ is equivalent to applying the inverse power method, making Ψ_o more aligned with the eigenspace of the smallest eigenvalues, while the factor (Φ^T A^{-1}Φ)^{-1} enforces Ψ_o^TΦ = I so that Ψ_o inherits some weakened locality from Φ. So if we apply this transformation to Ψ_o again to obtain Ψ_o2, then Ψ_o2 will approximate the eigenspace of the smallest eigenvalues better, but with more loss of locality.

In the second and third rows of Figure 4.11, we show some examples of the local basis functions ψ_cm, ψ_o, and ψ_o2 (all normalized to have unit l2 norm). Interestingly, these basis functions are not just localized as expected; they indeed have very similar profiles. One can see that for N = 75, the basis functions ψ_cm and ψ_o are almost identical. So it seems that regardless of how we impose locality (either the L1 minimization approach or the construction of the dual basis Φ), the local behaviors of the basis functions are determined by the operator A itself. We believe that this "coincidence" is governed by some intrinsic property of A, which may be worth further exploring and studying. If we can understand a higher-level, unified mechanism that results in the locality of the basis, we may be able to extend these methods to a more general class of operators. We also observed that as N grows, ψ_o and ψ_o2 become more and more localized, since the supports of the dual basis functions become smaller and smaller. However, the locality of ψ_cm does not change much as N increases, since we use the same penalty parameter µ = 10 for (4.17) in this experiment.

We would like to remark that, though these two problems result in local basis functions with similar profiles, problem (4.17) requires the split Bregman iteration to obtain the N basis functions simultaneously. In our problem (4.18), since the constraints are linear and separable, the basis functions can be obtained separately and directly without iteration.



Furthermore, thanks to the exponential decay of the basis functions, each subproblem for obtaining one basis function can be restricted to a local domain without significant loss of accuracy, and the resulting local problem can be solved very efficiently. For the definitions and detailed properties of these local problems for obtaining the localized basis, recall Section 2.2.2. Therefore, the algorithm for solving problem (4.18) can be implemented locally and in parallel.



Chapter 5

CONCENTRATION OF EIGENVALUE SUMS AND GENERALIZED LIEB'S CONCAVITY THEOREM

We introduce in this chapter our concentration inequalities for partial sums of eigenvalues of random Hermitian matrices. They are extensions of the existing analogous concentration results on extreme eigenvalues. The establishment of our new concentration results is an immediate application of a generalized Lieb's concavity theorem, which is also one of our main contributions.

In Section 5.1, we present our main concentration results after briefly reviewing the corresponding existing results established by Tropp. The important concept of the k-trace function and its properties are introduced in Section 5.2. Our generalized Lieb's concavity results are presented and discussed in Section 5.3, followed by their proofs in Section 5.4. Section 5.5 is dedicated to the proofs of our concentration inequalities. We provide in Section 5.6 some fundamental supporting materials, and in Section 5.7 some other interesting theoretical results on the k-trace.

Notation

Throughout this chapter, we will use some notations independent of the previous chapters, as we are focusing more on theoretical derivations.

For any positive integers n, m, we write C^n for the n-dimensional complex vector space equipped with the standard l2 inner product, and C^{n×m} for the space of all complex matrices of size n × m. Let H_n be the space of all n × n Hermitian matrices, H_n^+ be the convex cone of all n × n Hermitian, positive semidefinite matrices, and H_n^{++} be the convex cone of all n × n Hermitian, positive definite matrices. For any matrix A ∈ H_n, we denote by λ_i(A) the ith largest eigenvalue of A. We use the Loewner partial orders on H_n: for any A, B ∈ H_n, we write A ⪰ B or B ⪯ A if A − B ∈ H_n^+, and write A ≻ B or B ≺ A if A − B ∈ H_n^{++}. We write 0 for square zero matrices of suitable size according to the context, and I_n for the identity matrix of size n × n.

For any function f : R → R, the extension of f to a function from H_n to H_n is given



by
\[
f(A) = \sum_{i=1}^n f(\lambda_i)\, u_i u_i^*, \qquad A \in H_n,
\]
where λ_1, λ_2, · · · , λ_n are the eigenvalues of A, and u_1, u_2, · · · , u_n ∈ C^n are the corresponding normalized eigenvectors. A function f is said to be operator monotone increasing (or decreasing) if A ⪰ B implies f(A) ⪰ f(B) (or f(A) ⪯ f(B)); f is said to be operator convex (or concave) on some convex set S if
\[
\tau f(A) + (1-\tau) f(B) \succeq f(\tau A + (1-\tau)B) \quad \big(\text{or } \preceq f(\tau A + (1-\tau)B)\big)
\]
for any A, B ∈ S and any τ ∈ [0, 1]. For example, the function A ↦ A^r is both operator monotone increasing and operator concave on H_n^+ for r ∈ [0, 1] (the Loewner–Heinz theorem [72], [46], [60]; see also [24]). One can find more details and properties of matrix functions in [24, 126]. For any A ∈ C^{n×m}, we denote by ‖A‖_p the standard Schatten p-norm,
\[
\|A\|_p = \mathrm{Tr}\big[|A|^p\big]^{\frac{1}{p}}, \tag{5.1}
\]
where |A| = (A^*A)^{1/2}. In particular, we write ‖A‖ = ‖A‖_∞ for the largest singular value of A.
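As a small numerical illustration of these definitions (not part of the thesis), a matrix function f(A) can be formed directly from the spectral decomposition, and the Schatten p-norm from the singular values. The helper names below are ours, and only numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

def matrix_function(f, A):
    """Extend a scalar function f to a Hermitian A via f(A) = sum_i f(lam_i) u_i u_i^*."""
    lam, U = np.linalg.eigh(A)
    return (U * f(lam)) @ U.conj().T

def schatten_norm(A, p):
    """Schatten p-norm ||A||_p = Tr[|A|^p]^(1/p), computed from the singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p) if np.isfinite(p) else s.max()

# A random Hermitian positive definite matrix.
X = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = X @ X.conj().T + np.eye(5)

sqrtA = matrix_function(np.sqrt, A)
print(np.allclose(sqrtA @ sqrtA, A))                    # f(A) built from the spectrum
print(schatten_norm(A, 1), np.trace(A).real)            # Schatten-1 norm equals Tr[A] for A PSD
print(schatten_norm(A, np.inf), np.linalg.norm(A, 2))   # Schatten-inf norm = largest singular value
```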

5.1 Concentration of Eigenvalue Sums

In many problems, a large complicated matrix is assembled by sampling independent random matrices with simpler structures. To study the spectrum of the expected matrix by only evaluating the spectrum of the sample mean, we need to know how the latter deviates from the former. Therefore, we often need to estimate the spectrum of random matrices of the form Y = Σ_{i=1}^m X^{(i)}, where {X^{(i)}}_{1≤i≤m} is a finite sequence of independent, random matrices of the same size. In particular, we consider the Hermitian case where X^{(i)} ∈ H_n. An important tool for studying the extreme eigenvalues of a sum of random matrices is the following set of master bounds by Tropp ([122, Theorem 3.6.1]).

Proposition 5.1.1. For any finite sequence of independent, random matrices {X^{(i)}}_{i=1}^m ⊂ H_n,
\[
\mathbb{E}\,\lambda_{\max}\Big(\sum_{i=1}^m X^{(i)}\Big) \le \inf_{\theta>0} \frac{1}{\theta}\log \mathrm{Tr}\Big[\exp\Big(\sum_{i=1}^m \log \mathbb{E}\exp\big(\theta X^{(i)}\big)\Big)\Big], \tag{5.2a}
\]
\[
\mathbb{E}\,\lambda_{\min}\Big(\sum_{i=1}^m X^{(i)}\Big) \ge \sup_{\theta<0} \frac{1}{\theta}\log \mathrm{Tr}\Big[\exp\Big(\sum_{i=1}^m \log \mathbb{E}\exp\big(\theta X^{(i)}\big)\Big)\Big]. \tag{5.2b}
\]



Furthermore, for all t ∈ R,
\[
\mathbb{P}\Big\{\lambda_{\max}\Big(\sum_{i=1}^m X^{(i)}\Big) \ge t\Big\} \le \inf_{\theta>0} e^{-\theta t}\,\mathrm{Tr}\Big[\exp\Big(\sum_{i=1}^m \log \mathbb{E}\exp\big(\theta X^{(i)}\big)\Big)\Big], \tag{5.3a}
\]
\[
\mathbb{P}\Big\{\lambda_{\min}\Big(\sum_{i=1}^m X^{(i)}\Big) \le t\Big\} \le \inf_{\theta<0} e^{-\theta t}\,\mathrm{Tr}\Big[\exp\Big(\sum_{i=1}^m \log \mathbb{E}\exp\big(\theta X^{(i)}\big)\Big)\Big]. \tag{5.3b}
\]

Tropp's proof of Proposition 5.1.1 is based on the method of matrix Laplace transform and a theorem of Lieb (the concavity of the function (1.2)). The essential use of Lieb's theorem in Tropp's argument is to establish the Jensen-type inequality
\[
\mathbb{E}\,\mathrm{Tr}\big[\exp(H + \log A)\big] \le \mathrm{Tr}\big[\exp(H + \log \mathbb{E}A)\big]
\]
for any random matrix A ∈ H_n^{++} and any fixed H ∈ H_n. Using a generalized Lieb's theorem (Theorem 5.3.4), we will extend the above inequality to
\[
\mathbb{E}\Big(\mathrm{Tr}_k\big[\exp(H + \log A)\big]\Big)^{\frac{1}{k}} \le \Big(\mathrm{Tr}_k\big[\exp(H + \log \mathbb{E}A)\big]\Big)^{\frac{1}{k}}
\]
and
\[
\mathbb{E}\,\log \mathrm{Tr}_k\big[\exp(H + \log A)\big] \le \log \mathrm{Tr}_k\big[\exp(H + \log \mathbb{E}A)\big]
\]
for any 1 ≤ k ≤ n. The k-trace function will be introduced in detail in the next section. These more general inequalities, along with the k-trace inequality (5.15), will help us extend Tropp's master bounds on the largest (or smallest) eigenvalue to their counterparts on the sum of the k largest (or smallest) eigenvalues, as follows. Recall that we denote by λ_i(A) the ith largest eigenvalue of any matrix A ∈ H_n.

Theorem 5.1.2. Given any finite sequence of independent, random matrices {X^{(i)}}_{i=1}^m ⊂ H_n, let Y = Σ_{i=1}^m X^{(i)}. Then for any 1 ≤ k ≤ n,
\[
\sum_{i=1}^k \lambda_i(\mathbb{E}Y) \le \mathbb{E}\sum_{i=1}^k \lambda_i(Y) \le \inf_{\theta>0} \frac{1}{\theta}\log \mathrm{Tr}_k\Big[\exp\Big(\sum_{i=1}^m \log \mathbb{E}\exp\big(\theta X^{(i)}\big)\Big)\Big], \tag{5.4a}
\]
\[
\sum_{i=1}^k \lambda_{n-i+1}(\mathbb{E}Y) \ge \mathbb{E}\sum_{i=1}^k \lambda_{n-i+1}(Y) \ge \sup_{\theta<0} \frac{1}{\theta}\log \mathrm{Tr}_k\Big[\exp\Big(\sum_{i=1}^m \log \mathbb{E}\exp\big(\theta X^{(i)}\big)\Big)\Big]. \tag{5.4b}
\]



Furthermore, for all t ∈ R,
\[
\mathbb{P}\Big\{\sum_{i=1}^k \lambda_i(Y) \ge t\Big\} \le \inf_{\theta>0} e^{-\frac{\theta t}{k}}\Big(\mathrm{Tr}_k\Big[\exp\Big(\sum_{i=1}^m \log \mathbb{E}\exp\big(\theta X^{(i)}\big)\Big)\Big]\Big)^{\frac{1}{k}}, \tag{5.5a}
\]
\[
\mathbb{P}\Big\{\sum_{i=1}^k \lambda_{n-i+1}(Y) \le t\Big\} \le \inf_{\theta<0} e^{-\frac{\theta t}{k}}\Big(\mathrm{Tr}_k\Big[\exp\Big(\sum_{i=1}^m \log \mathbb{E}\exp\big(\theta X^{(i)}\big)\Big)\Big]\Big)^{\frac{1}{k}}. \tag{5.5b}
\]

The generic estimates in Theorem 5.1.2 are not user-friendly in practice, as the k-trace is hardly computable in general. However, we can further establish more concrete estimates for particular random matrices in this class. For example, we consider the scenario where each X^{(i)} in the sum Y = Σ_{i=1}^m X^{(i)} also satisfies 0 ≤ λ_n(X^{(i)}) ≤ λ_1(X^{(i)}) ≤ c for some uniform constant c > 0. One of the most studied scenarios in this setting arises with an undirected, randomly weighted graph G = (V, E, W) on n vertices with no self-loops. The weights w_ij for all edges e_ij, i < j, are uniformly bounded and follow independent distributions. Then the Laplacian of such a random graph is given by L = Σ_{1≤i<j≤n} w_ij L^{(i,j)}, where L^{(i,j)} is the sub-Laplacian corresponding to the edge e_ij with unit weight, i.e., the n × n matrix whose only nonzero entries are
\[
L^{(i,j)}_{ii} = L^{(i,j)}_{jj} = 1, \qquad L^{(i,j)}_{ij} = L^{(i,j)}_{ji} = -1, \qquad i < j.
\]
In particular, if each weight follows a Bernoulli distribution B(1, p) for some uniform constant p ∈ [0, 1], the random graph is the famous Erdős–Rényi model.
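As a quick sanity check of this setting (an illustration, not part of the thesis), one can sample an Erdős–Rényi Laplacian L = Σ_{i<j} w_ij L^{(i,j)} with Bernoulli weights and compare the partial eigenvalue sums of a sample with those of EL, which is available in closed form here. The helper names and the particular parameters are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 60, 0.3, 5

def sub_laplacian(i, j, n):
    """Sub-Laplacian L^(i,j) of the single edge (i, j) with unit weight."""
    L = np.zeros((n, n))
    L[i, i] = L[j, j] = 1.0
    L[i, j] = L[j, i] = -1.0
    return L

# One sample of the random Laplacian Y = sum_{i<j} w_ij L^(i,j), w_ij ~ Bernoulli(p).
Y = sum(rng.binomial(1, p) * sub_laplacian(i, j, n)
        for i in range(n) for j in range(i + 1, n))

# Expected Laplacian EY: every edge has expected weight p (complete-graph Laplacian scaled by p).
EY = p * (n * np.eye(n) - np.ones((n, n)))

eig_Y = np.sort(np.linalg.eigvalsh(Y))[::-1]     # descending order
eig_EY = np.sort(np.linalg.eigvalsh(EY))[::-1]
print("sum of", k, "largest eigenvalues, sample:", eig_Y[:k].sum())
print("sum of", k, "largest eigenvalues, mean:  ", eig_EY[:k].sum())
```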

For these kinds of problems, one may want to study the extreme eigenvalues of EY but can only afford to compute the eigenvalues of random samples of Y, as samples of Y are in general much sparser than EY. It is then crucial to know how the eigenvalues of Y = Σ_{i=1}^m X^{(i)} deviate from the corresponding eigenvalues of EY. For such purposes, Tropp used the master bounds in Proposition 5.1.1 and delicate bounds on the matrix moment generating function ([122, Lemma 5.4.1]) to prove the following Chernoff-type inequalities ([122, Theorem 5.1.1]).

Proposition 5.1.3. Given any finite sequence of independent, random matrices {X^{(i)}}_{i=1}^m ⊂ H_n, let Y = Σ_{i=1}^m X^{(i)}. Assume that for each i, 0 ≤ λ_n(X^{(i)}) ≤ λ_1(X^{(i)}) ≤ c for some uniform constant c ≥ 0. Then



\[
\mathbb{E}\,\lambda_{\max}(Y) \le \inf_{\theta>0}\ \Big[\frac{e^{\theta}-1}{\theta}\,\lambda_{\max}(\mathbb{E}Y) + \frac{c}{\theta}\log n\Big], \tag{5.6a}
\]
\[
\mathbb{E}\,\lambda_{\min}(Y) \ge \sup_{\theta>0}\ \Big[\frac{1-e^{-\theta}}{\theta}\,\lambda_{\min}(\mathbb{E}Y) - \frac{c}{\theta}\log n\Big]. \tag{5.6b}
\]

Furthermore,
\[
\mathbb{P}\big\{\lambda_{\max}(Y) \ge (1+\varepsilon)\,\lambda_{\max}(\mathbb{E}Y)\big\} \le n\,\Big(\frac{e^{\varepsilon}}{(1+\varepsilon)^{1+\varepsilon}}\Big)^{\lambda_{\max}(\mathbb{E}Y)/c}, \qquad \varepsilon \ge 0, \tag{5.7a}
\]
\[
\mathbb{P}\big\{\lambda_{\min}(Y) \le (1-\varepsilon)\,\lambda_{\min}(\mathbb{E}Y)\big\} \le n\,\Big(\frac{e^{-\varepsilon}}{(1-\varepsilon)^{1-\varepsilon}}\Big)^{\lambda_{\min}(\mathbb{E}Y)/c}, \qquad \varepsilon \in [0,1). \tag{5.7b}
\]

In this positive semidefinite case, we can again extend the above results of Tropp to their analogs for partial sums of eigenvalues. In particular, we will use Theorem 5.1.2, the second inequality in (5.15), and bounds on matrix moment generating functions ([122, Lemma 5.4.1]) to prove the following Chernoff-type bounds.

Theorem 5.1.4. Given any finite sequence of independent, random matrices {X^{(i)}}_{i=1}^m ⊂ H_n, let Y = Σ_{i=1}^m X^{(i)}. Assume that for each i, 0 ≤ λ_n(X^{(i)}) ≤ λ_1(X^{(i)}) ≤ c for some uniform constant c ≥ 0. Then for any 1 ≤ k ≤ n, we have the expectation estimates
\[
\mathbb{E}\sum_{i=1}^k \lambda_i(Y) \le \inf_{\theta>0}\ \Big[\frac{e^{\theta}-1}{\theta}\sum_{i=1}^k \lambda_i(\mathbb{E}Y) + \frac{c}{\theta}\log\binom{n}{k}\Big], \tag{5.8a}
\]
\[
\mathbb{E}\sum_{i=1}^k \lambda_{n-i+1}(Y) \ge \sup_{\theta>0}\ \Big[\frac{1-e^{-\theta}}{\theta}\sum_{i=1}^k \lambda_{n-i+1}(\mathbb{E}Y) - \frac{c}{\theta}\log\binom{n}{k}\Big], \tag{5.8b}
\]

and the tail bounds
\[
\mathbb{P}\Big\{\sum_{i=1}^k \lambda_i(Y) \ge (1+\varepsilon)\sum_{i=1}^k \lambda_i(\mathbb{E}Y)\Big\} \le \binom{n}{k}^{\frac{1}{k}}\Big(\frac{e^{\varepsilon}}{(1+\varepsilon)^{1+\varepsilon}}\Big)^{\frac{1}{ck}\sum_{i=1}^k \lambda_i(\mathbb{E}Y)}, \qquad \varepsilon \ge 0, \tag{5.9a}
\]
\[
\mathbb{P}\Big\{\sum_{i=1}^k \lambda_{n-i+1}(Y) \le (1-\varepsilon)\sum_{i=1}^k \lambda_{n-i+1}(\mathbb{E}Y)\Big\} \le \binom{n}{k}^{\frac{1}{k}}\Big(\frac{e^{-\varepsilon}}{(1-\varepsilon)^{1-\varepsilon}}\Big)^{\frac{1}{ck}\sum_{i=1}^k \lambda_{n-i+1}(\mathbb{E}Y)}, \qquad \varepsilon \in [0,1). \tag{5.9b}
\]



In Tropp's results (5.6), namely the case k = 1, the cost of "switching" λ and E is of scale log n. In our estimates (5.8), the gap factor becomes $\log\binom{n}{k} \le k\log n$, which grows only sub-linearly in k; this is reasonable as we are estimating the sum of the k largest (or smallest) eigenvalues. We shall further compare our estimates to another related work. Tropp et al. [40] introduced a subspace argument based on the Courant–Fischer characterization of eigenvalues to prove tail bounds for all eigenvalues of Y = Σ_{i=1}^m X^{(i)}. Though not stated in [40], the following expectation estimates for all eigenvalues can also be established using the subspace argument. Given any finite sequence of independent, random matrices {X^{(i)}}_{i=1}^m under the same assumption as in Theorem 5.1.4, and Y = Σ_{i=1}^m X^{(i)}, we have, for any 1 ≤ k ≤ n,
\[
\mathbb{E}\,\lambda_k(Y) \le \inf_{\theta>0}\ \Big[\frac{e^{\theta}-1}{\theta}\,\lambda_k(\mathbb{E}Y) + \frac{c}{\theta}\log(n-k+1)\Big], \tag{5.10a}
\]
\[
\mathbb{E}\,\lambda_k(Y) \ge \sup_{\theta>0}\ \Big[\frac{1-e^{-\theta}}{\theta}\,\lambda_k(\mathbb{E}Y) - \frac{c}{\theta}\log k\Big]. \tag{5.10b}
\]

Summing (5.10a) (or (5.10b)) over the k largest (or smallest) eigenvalues, we immediately obtain
\[
\mathbb{E}\sum_{i=1}^k \lambda_i(Y) \le \inf_{\theta>0}\ \Big[\frac{e^{\theta}-1}{\theta}\sum_{i=1}^k \lambda_i(\mathbb{E}Y) + \frac{c}{\theta}\log\prod_{i=1}^k (n-i+1)\Big], \tag{5.11a}
\]
\[
\mathbb{E}\sum_{i=1}^k \lambda_{n-i+1}(Y) \ge \sup_{\theta>0}\ \Big[\frac{1-e^{-\theta}}{\theta}\sum_{i=1}^k \lambda_{n-i+1}(\mathbb{E}Y) - \frac{c}{\theta}\log\prod_{i=1}^k (n-i+1)\Big]. \tag{5.11b}
\]

Therefore, our expectation estimates (5.8a) and (5.8b) are sharper for partial sums of eigenvalues, as $\log\binom{n}{k} < \log\prod_{i=1}^k (n-i+1)$ for k > 1. In particular, if one chooses k to be a fixed proportion of n, then $\log\binom{n}{k} = O(k)$, while $\log\prod_{i=1}^k (n-i+1) = O(k\log n)$. Our results are then better by a factor of log n.

Lastly, we remark that if we combine Theorem 5.3.1 and the subspace argument in [40], we should be able to derive similar expectation estimates and tail bounds for the sum of arbitrary successive eigenvalues of Y = Σ_{i=1}^m X^{(i)}. We leave this potential extension to future work.

5.2 K-trace

For any matrix A ∈ C^{n×n} with eigenvalues λ_1, λ_2, · · · , λ_n, we define the k-trace of A to be
\[
\mathrm{Tr}_k[A] = \sum_{1\le i_1 < i_2 < \cdots < i_k \le n} \lambda_{i_1}\lambda_{i_2}\cdots\lambda_{i_k}, \qquad 1 \le k \le n. \tag{5.12}
\]



In particular, Tr_1[A] = Tr[A] is the normal trace of A, and Tr_n[A] = det[A] is the determinant of A. If we write A_{(i_1···i_k, i_1···i_k)} for the k × k principal submatrix of A corresponding to the indices i_1, i_2, · · · , i_k, then an equivalent definition of the k-trace of A is given by
\[
\mathrm{Tr}_k[A] = \sum_{1\le i_1 < i_2 < \cdots < i_k \le n} \det\big[A_{(i_1\cdots i_k,\, i_1\cdots i_k)}\big], \qquad 1 \le k \le n. \tag{5.13}
\]
Using the second definition (5.13), one can check that for any 1 ≤ k ≤ n, the k-trace enjoys the same cyclic invariance property as the normal trace and the determinant: for any A, B ∈ C^{n×n}, Tr_k[AB] = Tr_k[BA].
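Numerically, Tr_k[A] is the k-th elementary symmetric polynomial of the eigenvalues of A, so it can be read off from the characteristic polynomial. The sketch below (an illustration only, not thesis code; the helper names are ours) computes it both ways and checks the cyclicity property.

```python
import numpy as np
from itertools import combinations

def k_trace(A, k):
    """Tr_k[A]: the k-th elementary symmetric polynomial of the eigenvalues of A."""
    lam = np.linalg.eigvals(A)
    coeffs = np.poly(lam)            # coefficients of prod_i (x - lam_i), highest degree first
    return (-1) ** k * coeffs[k]     # e_k = (-1)^k * (coefficient of x^(n-k))

def k_trace_minors(A, k):
    """Equivalent definition (5.13): the sum of all k x k principal minors."""
    n = A.shape[0]
    return sum(np.linalg.det(A[np.ix_(idx, idx)])
               for idx in combinations(range(n), k))

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
k = 3
print(np.isclose(k_trace(A, k), k_trace_minors(A, k)))        # (5.12) agrees with (5.13)
print(np.isclose(k_trace(A @ B, k), k_trace(B @ A, k)))        # cyclicity Tr_k[AB] = Tr_k[BA]
print(np.isclose(k_trace(A, 1), np.trace(A)),
      np.isclose(k_trace(A, 6), np.linalg.det(A)))             # Tr_1 = Tr and Tr_n = det
```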

The motivation for studying the k-trace is to provide effective estimates on the sum of the k largest (or smallest) eigenvalues of, in particular, random Hermitian matrices. As we know, the sum of the k largest eigenvalues of a Hermitian matrix A (as a variable) is a convex function of A. So if A is random, we have, for example, the expectation estimate
\[
\mathbb{E}\sum_{i=1}^k \lambda_i(A) \ge \sum_{i=1}^k \lambda_i(\mathbb{E}A)
\]
by Jensen's inequality. Recall that λ_i(A) denotes the ith largest eigenvalue of A. This provides a lower bound for the left-hand side if we know EA, or a way to bound the right-hand side from above if we can sample A. However, an estimate between these two quantities in the inverse direction is more interesting and challenging. For the k = 1 case, Tropp [122] related the largest eigenvalue to the trace of the exponential using the observation
\[
\lambda_1(A) \le \log \mathrm{Tr}\big[\exp(A)\big] \le \lambda_1(A) + \log n, \qquad A \in H_n, \tag{5.14}
\]
which introduces only a gap of logarithmic scale in the dimension. In particular, the first inequality in (5.14) was applied to the random matrix A, and the second was applied to EA. Tropp then applied Lieb's theorem to the intermediate quantity Tr[exp(A)] (more precisely, with A = H + log Y for some fixed matrix H and some random matrix Y) to derive inverse expectation estimates and a series of matrix concentration inequalities. Inspired by Tropp's work, we will develop expectation estimates and tail bounds on the sum of the k largest eigenvalues based on the following analog of (5.14):
\[
\sum_{i=1}^k \lambda_i(A) \le \log \mathrm{Tr}_k\big[\exp(A)\big] \le \sum_{i=1}^k \lambda_i(A) + \log\binom{n}{k}, \qquad A \in H_n. \tag{5.15}
\]



This is actually the starting point of this work. Naturally, manipulating the intermediate quantity Tr_k[exp(A)] in our estimates requires extending Lieb's theorem to a general k-trace version. Note that the sum of the k smallest eigenvalues can be handled in a similar spirit.
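Inequality (5.15) is easy to test numerically. The following sketch (illustration only; the helper name k_trace is ours) draws a random Hermitian A and checks both sides of (5.15).

```python
import numpy as np
from math import comb, log
from scipy.linalg import expm

def k_trace(M, k):
    """Tr_k[M] as the k-th elementary symmetric polynomial of the eigenvalues of M."""
    coeffs = np.poly(np.linalg.eigvalsh(M))
    return (-1) ** k * coeffs[k]

rng = np.random.default_rng(2)
n, k = 8, 3
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                      # a random Hermitian (real symmetric) matrix

lam = np.sort(np.linalg.eigvalsh(A))[::-1]
lhs = lam[:k].sum()                    # sum of the k largest eigenvalues
mid = log(k_trace(expm(A), k))         # log Tr_k[exp(A)]
rhs = lhs + log(comb(n, k))            # upper bound with the log-binomial gap
print(lhs <= mid <= rhs)               # (5.15) holds
```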

Apart from its particular use discussed above, the k-trace is of theoretical interest in itself, as it has many interpretations corresponding to different aspects of matrix theory. Writing D(A^{(1)}, A^{(2)}, · · · , A^{(n)}) for the mixed discriminant of any n matrices A^{(1)}, A^{(2)}, · · · , A^{(n)} ∈ C^{n×n}, we have the identity
\[
\mathrm{Tr}_k[A] = \binom{n}{k}\cdot D(\underbrace{A,\cdots,A}_{k},\underbrace{I_n,\cdots,I_n}_{n-k}).
\]
Also, if we consider the kth exterior algebra ∧^k(C^n), we can interpret the k-trace of A as
\[
\mathrm{Tr}_k[A] = \mathrm{Tr}_{\mathcal{L}(\wedge^k(\mathbb{C}^n))}\big[\mathcal{M}^{(k)}_0(A)\big],
\]
where Tr_{L(∧^k(C^n))} is the normal trace on the operator space L(∧^k(C^n)), and M^{(k)}_0(A) ∈ L(∧^k(C^n)) is defined by M^{(k)}_0(A)(v_1 ∧ v_2 ∧ · · · ∧ v_k) = Av_1 ∧ Av_2 ∧ · · · ∧ Av_k for any v_1 ∧ v_2 ∧ · · · ∧ v_k ∈ ∧^k(C^n). These two interpretations will in fact provide us with important tools for studying the k-trace and proving our generalized Lieb's concavity theorems. We will discuss more on this in Sections 5.6.1 and 5.6.2.

Throughout this work, we will be using the following nice properties of the k-trace.

Proposition 5.2.1. For any positive integers n, k with 1 ≤ k ≤ n, the k-trace function Tr_k[·] satisfies the following:

(i) Cyclicity: Tr_k[AB] = Tr_k[BA] for all A, B ∈ C^{n×n}.

(ii) Homogeneity: Tr_k[αA] = α^k Tr_k[A] for all A ∈ C^{n×n}, α ∈ C.

(iii) Monotonicity: For any A, B ∈ H_n^+, Tr_k[A] ≥ Tr_k[B] if A ⪰ B, and Tr_k[A] > Tr_k[B] if A ≻ B. In particular, Tr_k[A] ≥ 0 for A ∈ H_n^+.

(iv) Concavity: The function A ↦ (Tr_k[A])^{1/k} is concave on H_n^+.

(v) Hölder's inequality: Tr_k[|AB|^r]^{1/r} ≤ Tr_k[|A|^p]^{1/p} Tr_k[|B|^q]^{1/q} for any r, p, q ∈ (0, +∞] with 1/p + 1/q = 1/r, and any A, B ∈ C^{n×n}.



(vi) Consistency: For any ñ with k ≤ n ≤ ñ and any A ∈ C^{n×n},
\[
\mathrm{Tr}_k\begin{bmatrix} A & 0 \\ 0 & 0 \end{bmatrix}_{\tilde n\times\tilde n} = \mathrm{Tr}_k[A].
\]

Proof. (i), (ii), (iii), and (vi) can be easily verified from the definitions (5.12) and (5.13). (iv) is a consequence of the general Brunn–Minkowski theorem (Corollary 5.6.3) introduced in Section 5.6.1. (v) is a direct result of expression (5.51) in Section 5.6.2. In fact, since the normal trace enjoys Hölder's inequality, we have
\[
\mathrm{Tr}_k\big[|AB|^r\big]^{\frac{1}{r}} = \mathrm{Tr}\big[|\mathcal{M}^{(k)}_0(A)\mathcal{M}^{(k)}_0(B)|^r\big]^{\frac{1}{r}} \le \mathrm{Tr}\big[|\mathcal{M}^{(k)}_0(A)|^p\big]^{\frac{1}{p}}\,\mathrm{Tr}\big[|\mathcal{M}^{(k)}_0(B)|^q\big]^{\frac{1}{q}} = \mathrm{Tr}_k\big[|A|^p\big]^{\frac{1}{p}}\,\mathrm{Tr}_k\big[|B|^q\big]^{\frac{1}{q}}.
\]
We have used several properties of the operator M^{(k)}_0(A) introduced in Section 5.6.2.

A Simplified Notation

As we will be working in the general setting of the k-trace, there is no need to specify a particular value of k. Therefore, we will sometimes write
\[
\phi(A) = \big(\mathrm{Tr}_k[A]\big)^{\frac{1}{k}}
\]
for notational simplicity. Note that the function φ also satisfies (i) cyclicity, (iii) monotonicity, (v) Hölder's inequality, and (vi) consistency as in Proposition 5.2.1, but now the map A ↦ φ(A) is homogeneous of order 1 and is concave on H_n^+. Abusing notation, we will also refer to the function φ as the k-trace.

5.3 Generalized Lieb's Concavity Theorems

We present in this section the main theoretical results of this part of the work.

Theorem 5.3.1 (Generalized Lieb's Theorem). For any 1 ≤ k ≤ n and any H ∈ H_n, the function
\[
A \longmapsto \Big(\mathrm{Tr}_k\big[\exp(H + \log A)\big]\Big)^{\frac{1}{k}} \tag{5.16}
\]
is concave on H_n^{++}. Equivalently, for any 1 ≤ k ≤ n, the function
\[
A \longmapsto \log \mathrm{Tr}_k\big[\exp(H + \log A)\big] \tag{5.17}
\]
is concave on H_n^{++}.



This theorem extends Lieb's theorem ([67, Theorem 6]) from the normal trace to the k-traces, and hence connects it to the theory of multilinear, symmetric forms of matrices. We will give a proof of this theorem in Section 5.4.1. As we will see in the proof, Theorem 5.3.1 is a joint result of the original Lieb's theorem and the Alexandrov–Fenchel inequality for mixed discriminants (Theorem 5.6.1). One can get a first idea by looking at three extreme cases that relate to some well-known results.

• k = 1: The concavity of A ↦ Tr[exp(H + log A)] is the original Lieb's theorem.

• k = n: We have (Tr_n[exp(H + log A)])^{1/n} = det[A]^{1/n} · exp((1/n) Tr[H]) and log Tr_n[exp(H + log A)] = log det[A] + Tr[H]. The concavity of det[A]^{1/n} or of log det[A] is known as the Brunn–Minkowski theorem [105].

• H = 0: The concavity of Tr_k[A]^{1/k}, also known as the general Brunn–Minkowski theorem, is a consequence of the Alexandrov–Fenchel inequality for mixed discriminants. We will review this in Section 5.6.1.
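Though the proof is delicate, the concavity in Theorem 5.3.1 is easy to probe numerically. The sketch below (an illustration, not part of the proof; the helper names and the random test instances are ours) samples H, A_1, A_2 and checks the defining inequality of concavity for A ↦ (Tr_k[exp(H + log A)])^{1/k} along the segment between A_1 and A_2.

```python
import numpy as np
from scipy.linalg import expm

def k_trace(M, k):
    """Tr_k[M] via the k-th elementary symmetric polynomial of the eigenvalues."""
    coeffs = np.poly(np.linalg.eigvalsh(M))
    return (-1) ** k * coeffs[k]

def sym_log(A):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    lam, U = np.linalg.eigh(A)
    return (U * np.log(lam)) @ U.T

def f(A, H, k):
    """The function of Theorem 5.3.1: (Tr_k[exp(H + log A)])^(1/k)."""
    return k_trace(expm(H + sym_log(A)), k) ** (1.0 / k)

rng = np.random.default_rng(3)
n, k, tau = 6, 2, 0.4
H = rng.standard_normal((n, n))
H = (H + H.T) / 2                                 # a fixed Hermitian H

def random_pd():
    X = rng.standard_normal((n, n))
    return X @ X.T + np.eye(n)                    # a random positive definite matrix

A1, A2 = random_pd(), random_pd()
lhs = tau * f(A1, H, k) + (1 - tau) * f(A2, H, k)
rhs = f(tau * A1 + (1 - tau) * A2, H, k)
print(lhs <= rhs + 1e-10)   # concavity: f(tau*A1 + (1-tau)*A2) >= tau*f(A1) + (1-tau)*f(A2)
```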

Theorem 5.3.4 is already sufficient to yield our concentration results in Section 5.1.However, with more advanced techniques of matrix analysis, we can actually provestronger results in the k-trace setting, completing the whole story of generalizingLieb’s concavity theorem.

Lemma 5.3.2. For any r ∈ [0, 1], s ∈ [0, 1/r], and any K ∈ C^{n×n}, the function
\[
A \longmapsto \mathrm{Tr}_k\big[(K^* A^r K)^s\big]^{\frac{1}{k}} \tag{5.18}
\]
is concave on H_n^+.

Theorem 5.3.3 (Generalized Lieb's Concavity Theorem). For any p, q ∈ [0, 1], s ∈ [0, 1/(p+q)], and any K ∈ C^{n×m}, the function
\[
(A, B) \longmapsto \mathrm{Tr}_k\big[(B^{\frac{q}{2}} K^* A^p K B^{\frac{q}{2}})^s\big]^{\frac{1}{k}} \tag{5.19}
\]
is jointly concave on H_n^+ × H_m^+.

Theorem 5.3.4 (Generalized Lieb's Theorem). For any H ∈ H_n and any {p_j}_{j=1}^m ⊂ [0, 1] such that Σ_{j=1}^m p_j ≤ 1, the function
\[
(A^{(1)}, A^{(2)}, \ldots, A^{(m)}) \longmapsto \mathrm{Tr}_k\Big[\exp\Big(H + \sum_{j=1}^m p_j \log A^{(j)}\Big)\Big]^{\frac{1}{k}} \tag{5.20}
\]



is jointly concave on (H_n^{++})^{×m}.

Lemma 5.3.2 is a k-trace extension of the concave part of Lemma 2.8 in [27] (see also [47, Theorem 4.1]); the latter is a consequence of Lieb's original concavity theorem. However, we will first apply the technique of operator interpolation to prove Lemma 5.3.2 independently, and then use it to derive Theorem 5.3.3 and the other results. In fact, the convexity/concavity of the function (5.18) in the trace case, for various ranges of p and s, has been used as the first step towards many consequential results on more complicated trace functions. Theorem 5.3.3 is our generalized Lieb's concavity theorem, which extends Hiai's Theorem 2.1 in [48] (see also [25, Theorem 4.4]) from the trace to the k-trace. Note that, as stated in Hiai's theorem, the concavity also holds for −1 ≤ p, q ≤ 0 and 1/(p+q) ≤ s ≤ 0. In fact, by the consistency and continuity of φ, it suffices to consider the case where n = m and A, B, K are invertible. Then, following the discussion in [26], we have

\[
\mathrm{Tr}_k\big[(B^{\frac{q}{2}} K^* A^p K B^{\frac{q}{2}})^s\big]^{\frac{1}{k}} = \mathrm{Tr}_k\big[(B^{-\frac{q}{2}} K^{-1} A^{-p} (K^{-1})^* B^{-\frac{q}{2}})^{-s}\big]^{\frac{1}{k}},
\]
which yields the concavity for the mirrored range of p, q, s. Our derivation of Theorem 5.3.3 from Lemma 5.3.2 will be a counterpart of Zhang's simple and useful variational argument in [136] for the trace case.

Theorem 5.3.4 is a generalization of Corollary 6.1 in [67] (from the trace to the k-trace). Lieb [67] proved the original trace version by checking the non-positivity of the second-order directional derivatives (or Hessians). We will first prove Theorem 5.3.4 for m = 1 by applying the Lie product formula to Lemma 5.3.2 (taking p, q → 0, s → +∞), hence providing an alternative proof. We then improve the result from m = 1 to m ≥ 1 using a k-trace version of the Araki–Lieb–Thirring inequality (Lemma 5.4.3).

The proofs of Lemma 5.3.2, Theorem 5.3.3, and Theorem 5.3.1 will be given in Section 5.4.2.

5.4 Proof of Concavity Theorems

We provide in this section detailed proofs of our main results from Section 5.3. We will rely heavily on a variety of supporting materials presented in Section 5.6.



5.4.1 A First Proof by Matrix Derivative

As Theorem 5.3.4 is sufficient to lead to our concentration results, we provide in this subsection an independent proof of it. As mentioned before, this generalized Lieb's theorem is a joint result of the original Lieb's theorem and the Alexandrov–Fenchel inequality. But we will not use Lieb's theorem directly. Instead, we will use the following lemma, also due to Lieb [67], which is equivalent to Lieb's theorem. We provide the proof here only to show its connection to Lieb's theorem.

Lemma 5.4.1. Given any A ∈ H_n^{++} and C ∈ H_n, define
\[
T = \int_0^{\infty} (A + \tau I)^{-1} C (A + \tau I)^{-1}\, d\tau, \qquad
R = 2\int_0^{\infty} (A + \tau I)^{-1} C (A + \tau I)^{-1} C (A + \tau I)^{-1}\, d\tau.
\]
Then for any B ∈ H_n^+, we have
\[
\int_0^1 \mathrm{Tr}\big[T B^s T B^{1-s}\big]\, ds - \mathrm{Tr}\big[R B\big] \le 0. \tag{5.21}
\]

Proof. By Lieb's theorem ([67, Theorem 6]), for any H ∈ H_n, the function g(t) = Tr[exp(H + log(A + tC))] is concave. Moreover, this function is smooth in t for t small enough that A + tC ∈ H_n^{++}. Thus we have g''(0) ≤ 0. Write B(t) = exp(H + log(A + tC)), and
\[
T(t) = \int_0^{\infty} (A + tC + \tau I)^{-1} C (A + tC + \tau I)^{-1}\, d\tau,
\]
\[
R(t) = 2\int_0^{\infty} (A + tC + \tau I)^{-1} C (A + tC + \tau I)^{-1} C (A + tC + \tau I)^{-1}\, d\tau.
\]
It is easy to check that ∂/∂t log(A + tC) = T(t) and T'(t) = −R(t) by formulas (5.58) and (5.59). Then using the derivative formulas (5.57), (5.58), and (5.59), we have
\[
g'(t) = \int_0^1 \mathrm{Tr}\big[(B(t))^s\, T(t)\, (B(t))^{1-s}\big]\, ds = \mathrm{Tr}\big[T(t) B(t)\big],
\]
and
\[
g''(t) = \mathrm{Tr}\big[T'(t) B(t)\big] + \int_0^1 \mathrm{Tr}\big[T(t) (B(t))^s\, T(t)\, (B(t))^{1-s}\big]\, ds.
\]
For any B ∈ H_n^{++}, we may choose H = log B − log A, so that B(0) = exp(H + log A) = B. Noticing that T(0) = T and R(0) = R, we thus have
\[
-\mathrm{Tr}\big[R B\big] + \int_0^1 \mathrm{Tr}\big[T B^s T B^{1-s}\big]\, ds = g''(0) \le 0.
\]
The extension to B ∈ H_n^+ follows by continuity.



We use this variant of Lieb's theorem since it is more convenient for us to choose an arbitrary B ∈ H_n^{++} in inequality (5.21). In particular, if we choose B to be diagonal with diagonal entries b_1, b_2, · · · , b_n, then Lemma 5.4.1 implies that
\[
\sum_{i=1}^n R_{ii}\, b_i \ \ge\ \int_0^1 \sum_{i=1}^n \sum_{j=1}^n T_{ij}\, b_j^s\, T_{ji}\, b_i^{1-s}\, ds, \tag{5.22}
\]
which is a critical estimate that we will be using.

We now prove a trace inequality using Lemma 5.4.1 and the Alexandrov–Fenchel inequality (Theorem 5.6.1). This inequality can be seen as a generalization of Lemma 5.4.1 from k = 1 to all 1 ≤ k ≤ n.

Lemma 5.4.2. For arbitrary A ∈ H_n^{++}, B ∈ H_n^+, C ∈ H_n, let
\[
T = \int_0^{\infty} (A + \tau I)^{-1} C (A + \tau I)^{-1}\, d\tau, \qquad
R = 2\int_0^{\infty} (A + \tau I)^{-1} C (A + \tau I)^{-1} C (A + \tau I)^{-1}\, d\tau.
\]
Then we have, for all 1 ≤ k ≤ n,
\[
\int_0^1 \mathrm{Tr}\big[\mathcal{M}^{(k)}_1(T B^s; B^s)\,\mathcal{M}^{(k)}_1(T B^{1-s}; B^{1-s})\big]\, ds - \mathrm{Tr}\big[\mathcal{M}^{(k)}_1(R B; B)\big] \le \mathrm{Tr}\big[\mathcal{M}^{(k)}_2(T B, T B, B)\big]. \tag{5.23}
\]

Proof. We first claim that we only need to consider the case where B = Λ is a diagonal matrix with diagonal entries λ_1, λ_2, · · · , λ_n ≥ 0. Indeed, if B is not diagonal, we consider its eigenvalue decomposition B = UΛU^T, where U ∈ C^{n×n} is unitary and Λ is the diagonal matrix whose diagonal entries λ_1, λ_2, · · · , λ_n are the eigenvalues of B. Since B ∈ H_n^+, we have λ_1, λ_2, · · · , λ_n ≥ 0. If we introduce Ã = U^T A U, C̃ = U^T C U, T̃ = U^T T U, R̃ = U^T R U, then
\[
\tilde T = \int_0^{\infty} (\tilde A + \tau I)^{-1} \tilde C (\tilde A + \tau I)^{-1}\, d\tau, \qquad
\tilde R = 2\int_0^{\infty} (\tilde A + \tau I)^{-1} \tilde C (\tilde A + \tau I)^{-1} \tilde C (\tilde A + \tau I)^{-1}\, d\tau.
\]



Then using the cyclic invariance of trace and the product properties (5.47), we have,for example,

Tr[M

(k)1 (T Bs; Bs)M (k)

1 (T B1−s; B1−s)]

= Tr[M

(k)1 (UUTTUΛsUT ;UΛsUT )M (k)

1 (UUTTUΛ1−sUT ;UΛ1−sUT )]

= Tr[M

(k)0 (U)M (k)

1 (TΛs;Λs)M (k)0 (UT )M (k)

0 (U)M (k)1 (TΛ1−s;Λ1−s)M (k)

0 (UT )]

= Tr[M

(k)1 (TΛs;Λs)M (k)

0 (UT )M (k)0 (U)M (k)

1 (TΛ1−s;Λ1−s)M (k)0 (UT )M (k)

0 (U)]

= Tr[M

(k)1 (TΛs;Λs)M (k)

1 (TΛ1−s;Λ1−s)].

Using the same trick to the other terms in the inequalities (5.23), one can show that(5.23) is equivalent to∫ 1

0Tr

[M

(k)1 (TΛs;Λs)M (k)

1 (TΛ1−s;Λ1−s)]ds − Tr

[M

(k)1 (RΛ;Λ)

]≤ Tr

[M

(k)2 (TΛ, TΛ,Λ)

].

which justifies our claim. In what follows, we will still use A,C,T, R for A, C, T, R.

We now prove (5.23) with B = Λ being diagonal whose diagonal entries areλ1, λ2, · · · , λn ≥ 0. Using product properties (5.47) and identities in Lemma 5.6.4,we rewrite the quantity

I ,

∫ 1

0dsTr

[M

(k)1 (TΛs;Λs)M (k)

1 (TΛ1−s;Λ1−s)]− Tr

[M

(k)1 (RΛ;Λ)

]=

∫ 1

0ds

Tr

[M

(k)1 (TΛsTΛ1−s;Λ)

]+ Tr

[M

(k)2 (TΛs

Λ1−s,ΛsTΛ1−s;Λ)

]

− Tr[M

(k)1 (RΛ;Λ)

]=

∫ 1

0ds

n∑i=1

( n∑j=1

Ti jλsjTjiλ

1−si

)d (n,k)

i

+∑

1≤i, j≤n

(TiiλiλsjTj jλ

1−sj − Tjiλiλ

si Ti jλ

1−sj )g(n,k)

i j

n∑i=1

Riiλid(n,k)i .

Then replacing bi by λid(n,k)i in (5.22), we have by Lemma 5.4.1

n∑i=1

Riiλid(n,k)i ≥

∫ 1

0ds

n∑i=1

n∑j=1

Ti j (λ j d(n,k)j )sTji (λid

(n,k)i )1−s .

Page 161: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

151

Therefore we have

I ≤

∫ 1

0ds

∑1≤i, j≤n

Ti jTjiλsjλ

1−si d (n,k)

i +∑

1≤i, j≤n

(TiiTj jλiλ j − TjiTi jλ1+si λ1−s

j )g(n,k)i j

−∑

1≤i, j≤n

Ti jTji (λ j d(n,k)j )s (λid

(n,k)i )1−s

.

We now investigate the integrand for any s ∈ [0, 1]. We have∑1≤i, j≤n

Ti jTjiλsjλ

1−si d (n,k)

i +∑

1≤i, j≤n

(TiiTj jλiλ j − TjiTi jλ1+si λ1−s

j )g(n,k)i j

−∑

1≤i, j≤n

Ti jTji (λ j d(n,k)j )s (λid

(n,k)i )1−s

=

n∑i=1

T2ii λid

(n,k)i +

∑1≤i< j≤n

|Ti j |2(λs

jλ1−si d (n,k)

i + λsi λ

1−sj d (n,k)

j )

+∑

1≤i, j≤n

TiiTj jλiλ jg(n,k)i j −

∑1≤i< j≤n

|Ti j |2(λ1+s

i λ1−sj + λ1+s

j λ1−si )g(n,k)

i j

n∑i=1

T2ii λid

(n,k)i

−∑

1≤i< j≤n

|Ti j |2(λs

jλ1−si (d (n,k)

j )s (d (n,k)i )1−s + λs

i λ1−sj (d (n,k)

i )s (d (n,k)j )1−s)

=∑

1≤i, j≤n

TiiTj jλiλ jg(n,k)i j

+∑

1≤i< j≤n

|Ti j |2

λs

jλ1−si d (n,k)

i + λsi λ

1−sj d (n,k)

j − (λ1+si λ1−s

j + λ1+sj λ1−s

i )g(n,k)i j

− λsjλ

1−si (d (n,k)

j )s (d (n,k)i )1−s − λs

i λ1−sj (d (n,k)

i )s (d (n,k)j )1−s

≤∑

1≤i, j≤n

TiiTj jλiλ jg(n,k)i j − 2

∑1≤i< j≤n

|Ti j |2λiλ jg

(n,k)i j

=∑

1≤i, j≤n

(TiiTj jλiλ j − Ti jTjiλiλ j )g(n,k)i j .

We have used g(n,k)i j = g(n,k)

ji and g(n,k)ii = 0. The proof of the last inequality above is

as follows. For any s ∈ [0, 1], we have a Hölder-type inequality for scalars:

(a + b)s (c + d)1−s ≥ asc1−s + bsd1−s, a, b, c, d ≥ 0.

Then using the expansion relations (5.53),

d (n,k)i = λ jg

(n,k)i j + g(n,k+1)

i j , d (n,k)j = λig

(n,k)i j + g(n,k+1)

i j ,

Page 162: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

152

we have

λsjλ

1−si d (n,k)

i + λsi λ

1−sj d (n,k)

j − (λ1+si λ1−s

j + λ1+sj λ1−s

i )g(n,k)i j

− λsjλ

1−si (d (n,k)

j )s (d (n,k)i )1−s − λs

i λ1−sj (d (n,k)

i )s (d (n,k)j )1−s

≤ λsjλ

1−si (λ jg

(n,k)i j + g(n,k+1)

i j ) + λsi λ

1−sj (λig

(n,k)i j + g(n,k+1)

i j )

− (λ1+si λ1−s

j + λ1+sj λ1−s

i )g(n,k)i j

− λsjλ

1−si (λs

i λ1−sj g(n,k)

i j + g(n,k+1)i j ) − λs

i λ1−sj (λs

jλ1−si g(n,k)

i j + g(n,k+1)i j )

= − 2λiλ jg(n,k)i j .

Finally using Lemma 5.6.4 again, we have

I ≤

∫ 1

0ds

∑1≤i, j≤n

(TiiTj jλiλ j − Ti jTjiλiλ j )g(n,k)i j

= Tr[M

(k)2 (TΛ,TΛ,Λ)

].

This completes the proof of Lemma 5.4.2.

We are now ready to prove Theorem 5.3.1 with all established results.

Proof of Theorem 5.3.1. We first prove the concavity of the function

fH,k (A) =(Trk

[exp

(H + log A

)] ) 1k .

Notice that given any A ∈ H++n and any C ∈ Hn, there exist some ε such thatA + tC ∈ H++n for t ∈ (−ε, ε ), and fH,k (A + tC) is continuously differentiable withrespect to t on (−ε, ε ). In what follows, any function of t is always assumed to bedefined on a reasonable neighborhood of 0 (so that A + tC ∈ H++n ).

Then the concavity of fH,k (A) onH++n is equivalently to the statement that ∂2∂t2 fH,k (A+

tC) ≤ 0|t=0 for all choices of A ∈ H++n ,C ∈ Hn. Now fix a pair A,C, defineB(t) = exp

(H + log(A + tC)

)∈ H++n and g(t) = Trk

[exp

(H + log(A + tC)

)]=

Tr[M

(k)0 (B(t))

]> 0. Since fH,k (A + tC) = g(t)

1k , and

∂2

∂t2fH,k (A + tC) =

1kg(t)

1k −2

(g′′(t)g(t) −

k − 1k

(g′(t))2),

we then need to show that g(0)g′′(0) ≤ k−1k (g′(0))2. Using the derivative formulas

(5.58) and (5.59), we have

∂tlog(A + tC) =

∫ ∞

0(A + tC + xIn)−1C(A + tC + xIn)−1 , T (t),

Page 163: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

153∂

∂tT (t) = −2

∫ ∞

0(A + tC + xIn)−1C(A + tC + xIn)−1C(A + tC + xIn)−1 , −R(t).

Then using formula (5.57), we can compute the first derivative

g′(t) =∂

∂tTr

[M

(k)0 (B(t))

]= Tr

[M

(k)1 (B′(t); B(t))

]= Tr

[M

(k)1

( ∫ 1

0dsB(t)sT (t)B(t)1−s; B(t)

)]=

∫ 1

0dsTr

[M

(k)1

(B(t)sT (t)B(t)1−s; B(t)s B(t)1−s)]

=

∫ 1

0dsTr

[M

(k)0 (B(t)s)M (k)

1 (T (t); In)M (k)0 (B(t)1−s)

]=

∫ 1

0dsTr

[M

(k)1 (T (t); In)M (k)

0 (B(t)1−s)M (k)0 (B(t)s)

]= Tr

[M

(k)1 (T (t); In)M (k)

0 (B(t))].

We have used the fact thatM (k)1 (X ;Y ) is linear in X , and so we can pull out the

integral symbol. Then the second derivative is

g′′(t) =∂

∂tTr

[M

(k)1 (T (t); In)M (k)

0 (B(t))]

= Tr[M

(k)1 (T (t); In)M (k)

1 (B′(t); B(t))]+ Tr

[M

(k)1 (T ′(t); In)M (k)

0 (B(t))]

=

∫ 1

0dsTr

[M

(k)1 (T (t); In)M (k)

0 (B(t)s)M (k)1 (T (t); In)M (k)

0 (B(t)1−s)]

− Tr[M

(k)1 (R(t); In)M (k)

0 (B(t))].

Write T = T (0), R = R(0) and B = B(0). We then apply Lemma 5.4.2 to reach

g(0)g′′(0)

= Tr[M

(k)0 (B)

] ∫ 1

0dsTr

[M

(k)1 (T ; In)M (k)

0 (Bs)M (k)1 (T ; In)M (k)

0 (B1−s)]

− Tr[M

(k)1 (R; In)M (k)

0 (B)]

= Tr[M

(k)0 (B)

] ∫ 1

0dsTr

[M

(k)1 (T Bs; Bs)M (k)

1 (T B1−s; B1−s)]

− Tr[M

(k)1 (RB; B)

]

≤ Tr[M

(k)0 (B)

]Tr

[M

(k)2 (T B,T B, B)].

Page 164: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

154

To continue, we use definitions (5.46), identity (5.50) and the Alexandrov–Fenchelinequality (Theorem 5.6.1) to obtain

Tr[M

(k)0 (B)

]Tr

[M

(k)2 (T B,T B, B)]

=n!

k!(n − k)!D(B, · · · , B︸ ︷︷ ︸

k

, In, · · · , In︸ ︷︷ ︸n−k

)

×n!

(k − 2)!(n − k)!D(T B,T B, B · · · , B︸ ︷︷ ︸

k−2

, In, · · · , In︸ ︷︷ ︸n−k

)

≤k − 1

k

( n!(k − 1)!(n − k)!

D(T B, B · · · , B︸ ︷︷ ︸k−1

, In, · · · , In︸ ︷︷ ︸n−k

))2

=k − 1

kTr

[M

(k)1 (T B, B)

]2=

k − 1k

(g′(0))2.

We therefore have proved that

g(0)g′′(0) ≤k − 1

k(g′(0))2.

The concavity of fH,k (A) on H++n then follows.

Next we prove the equivalence of (i) the concavity of the functions fH,k (A) on H++n

and (ii) the concavity of the functions fH,k = log Trk[exp

(H + log A

)]onH++n . (i)

⇒ (ii) is trivial. To prove (ii)⇒ (i), we need the following lemma.

Let x = (x1, x2) ∈ (0,+∞)2. Define f (x) = Trk[exp

(H + log(x1A1 + x2A2)

)].

One can easily verify that fH,k (A) being concave on H++n is equivalent to f (x)1k

being concave on (0,+∞)2 for arbitrary but fixed choice of A1, A2 ∈ H++n , H ∈ Hn.Similarly, fH,k (A) being concave on H++n is equivalent to log f (x) being concaveon (0,+∞)2 for arbitrary but fixed choice of A1, A2 ∈ H++n , H ∈ Hn. Using thedefinition of the k-trace Trk , it is easy to check that f (x) is homogeneous of orderk. By Lemma 5.6.9, we know f (x)

1k is concave if and only if log f (x) is concave.

Therefore we have (i)⇔ (ii).

5.4.2 Proofs of Stronger ResultsWe provide in this subsection the proofs of our stronger results, Theorem 5.3.3 andTheorem5.3.1. The first step is to prove Lemma 5.3.2 using the technique of operatorinterpolation as in Lemma 5.6.8. The key of applying Lemma 5.6.8 is to choosesome proper holomorphic function G(z) and then interpolating on some power in[0, 1]. In particular, we will perform interpolation on w = 1

s to prove Lemma 5.3.2.

Page 165: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

155

Our choice of the holomorphic functions G(z) in the following proof is inspired byLieb’s constructions in [67] for the use of maximum modulus principle. Recall thatwe will write φ(·) = Trk[]

1k for notational simplicity.

Proof of Lemma 5.3.2. Note that for s ∈ [0, 1], the concavity of (5.18) is a directconsequence of the facts that (i) φ is monotone increasing and concave on H+n , and(ii) X 7→ X r and X 7→ X s are operator monotone increasing and operator concaveon H+n . So in what follows we may assume that 1 ≤ s ≤ 1

r . We need to show that,for any A, B ∈ H+n and any τ ∈ [0, 1],

τφ((K∗Ar K )s) + (1 − τ)φ

((K∗Br K )s) ≤ φ(

(K∗Cr K )s),where C = τA + (1 − τ)B. We may assume that A, B ∈ H++n and K is invertible.Once this is done, the general result for A, B ∈ H+n and K ∈ Cn×n can be obtained bycontinuity. Let w = 1

s ∈ [r, 1] and r = rs ∈ [0, 1], so r = rw. Let M = Cr2 K , and

let M = Q |M | be the polar decomposition of M for some unitary matrix Q. SinceC, K are both invertible, |M | ∈ H++n . We then define two functions from S to Cn×n:

GA(z) = Ar z2 C−

r z2 Q |M |

zw , GB (z) = B

r z2 C−

r z2 Q |M |

zw , z ∈ S,

where S is given by (5.61). In what follows we will use X for A or B. We then have

φ((K∗X r K )s) = φ(

(M∗C−r2 X rC−

r2 M)s)

= φ((|M |Q∗C−

rw2 X

rw2 X

rw2 C−

rw2 Q |M |)

1w)

= φ(|GX (w) |

2w).

Since A, B,C, M are now fixed matrices in H++n , GA(z) and GB (z) are apparentlyholomorphic in the interior of S and continuous on the boundary. Also, it is easy tocheck that ‖GA(z)‖ and ‖GB (z)‖ are uniformly bounded on S, since Re(z) ∈ [0, 1].Therefore we can use inequality (5.65) with θ = w, pθ = 2

w to obtain

φ(|GX (w) |2w )

∫ +∞

−∞

dt(2(1 − w)

wp0β1−w (t)φ

(|GX (it) |p0

)+

2p1βw (t)φ

(|GX (1 + it) |p1

)).

We still need to choose some p0, p1 ≥ 1 satisfying 1−wp0+ w

p1= 1

pw= w

2 to proceed.Note that GX (it) = X

ir t2 C−

ir t2 Q |M |

itw are now unitary matrices for all t ∈ R since

X,C, |M | ∈ H++n , and thus |GX (it) |p0 = In for all p0. Therefore we can takep0 → +∞, p1 = 2 to obtain

φ( |GX (w) |2w ) ≤

∫ +∞

−∞

dt βw (t)φ(|GX (1 + it) |2

).

Page 166: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

156

Further, for each t ∈ R, we have

φ(|GX (1 + it) |2

)= φ

(GX (1 + it)∗GX (1 + it)

)= φ

(|M |

(1−it )w Q∗C−

r (1−it )2 X rC−

r (1+it )2 Q |M |

(1+it )w

)= φ

(|M |

1w Q∗C−

r (1−it )2 X rC−

r (1+it )2 Q |M |

1w),

where we have used the cyclicity of φ (|M | itw is unitary) for the last equality.Therefore we have

τφ(|GA(1 + it) |2

)+ (1 − τ)φ

(|GB (1 + it) |2

)= τφ

(|M |

1w Q∗C−

r (1−it )2 ArC−

r (1+it )2 Q |M |

1w)

+ (1 − τ)φ(|M |

1w Q∗C−

r (1−it )2 BrC−

r (1+it )2 Q |M |

1w)

≤ φ(|M |

1w Q∗C−

r (1−it )2 (τAr + (1 − τ)Br )C−

r (1+it )2 Q |M |

1w)

≤ φ(|M |

1w Q∗C−

r (1−it )2 CrC−

r (1+it )2 Q |M |

1w)

= φ(|M |

2w)

= φ((M∗M)

1w).

The first inequality above is due to the concavity of φ, the second inequality is dueto (i) that φ is monotone increasing onH+n and (ii) that X 7→ X r is operator concaveon H+n for r ∈ (0, 1]. Finally, since φ

((M∗M)

1w)is independent of t, and βw (t) is a

density on R, we obtain that

τφ((K∗Ar K )s) + (1 − τ)φ

((K∗Br K )s)

= τφ(|GA(w) |

2w)+ (1 − τ)φ

(|GB (w) |

2w)

≤ φ((M∗M)

1w)

= φ((K∗Cr K )s) .

So we have proved the concavity of (5.18) on H+n .

Our next proof, using essentially Hölder’s inequalities for the k-trace, is adaptedfrom Zhang’s proofs of Theorem 1.1 and Theorem 3.3 in [136].

Proof of Theorem 5.3.3. Without loss of generality, we may assume that m = n.

Otherwise we can replace A by *,

A 00 0

+-and K by *

,

K

0+-if n < m; or replace B

Page 167: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

157

by *,

B 00 0

+-and K by

(K 0

)is n > m. By the consistency of φ, these changes

of variables will not affect whether the function (5.19) is jointly concave in (A, B)or not. We write X = A

p2 and Y = K B

q2 . Let s1 =

p+qp s, s2 =

p+qq s, so 1

s =1s1+ 1

s2.

Then for any Z ∈ Cn×n that is invertible, we have by Hölder’s inequality ((v) inProposition 5.2.1) that

φ((B

q2 K∗ApK B

q2 )s) = φ(

|X Z Z−1Y |2s)≤ φ

(|X Z |2s1 ) s

s1 φ(|Z−1Y |2s2 ) s

s2

≤ss1φ((Z∗X∗X Z )s1 ) + s

s2φ((Y ∗(Z−1)∗Z−1Y )s2 )

=ss1φ((Z∗X∗X Z )s1 ) + s

s2φ((Z−1YY ∗(Z−1)∗)s2 ) .

We have used the fact that φ(

f (|M |))= φ

(f ( |M∗ |)

)for any matrix M ∈ Cn×n and

any function f , since φ is only a function of eigenvalues and the spectrums of f (|M |)and f (|M∗ |) are the same. Let (XY )∗ = Q |(XY )∗ | be the polar decomposition of(XY )∗, where Q ∈ Cn×n is unitary. So we have XYQ = |(XY )∗ |. If X and Y areinvertible, we can particularly choose Z = YQ |(XY )∗ |−

s1s1+s2 to have

X Z = XYQ |(XY )∗ |−s1

s1+s2 = |(XY )∗ |s2

s1+s2 ,

and Z−1Y = |(XY )∗ |s1

s1+s2 Q∗,

which yields the equality

ss1φ((Z∗X∗X Z )s1 ) + s

s2φ((Z−1YY ∗(Z−1)∗)s2 ) = φ(

|(XY )∗ |2s1s2s1+s2

)= φ

(|XY |2s) .

Now for general X,Y that are not necessarily invertible, we can always find twosequences of invertible matrices X j

+∞j=1, Yj

+∞j=1 such that (i) X j → X , Yj → Y

and (ii) X∗j X j X∗X , YjY ∗j YY ∗. Such sequences can be easily obtained byperturbing the singular values of X and Y . For each pair of (X j,Yj ), we can findsome invertible Z j so that the above equality holds. Also, for any invertible Z , wehave Z∗X∗j X j Z Z∗X∗X Z , Z−1YjY ∗j (Z−1)∗ Z−1YY ∗(Z−1)∗, and thus

φ((Z∗X∗X Z )s1 ) ≤ φ(

(Z∗X∗j X j Z )s1 ),φ((Z−1YY ∗(Z−1)∗)s2 ) ≤ φ(

(Z−1YjY ∗j (Z−1)∗)s2 )

Page 168: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

158

by Theorem 5.7.6, which we will prove in Section 5.7. Then we obtain a sequenceof inequalities,

φ(|XY |2s) ≤ infss1φ((Z∗X∗X Z )s1 ) + s

s2φ((Z−1YY ∗(Z−1)∗)s2 ) : Z invertible

≤ss1φ((Z∗j X∗X Z j )s1 ) + s

s2φ((Z−1j YY ∗(Z−1j )∗)s2 )

≤ss1φ((Z∗j X∗j X j Z j )s1 ) + s

s2φ((Z−1j YjY ∗j (Z−1j )∗)s2 )

= φ(|X jYj |2s).

But since φ( |XY |2s) = lim j→+∞ φ( |X jYj |2s) by continuity, the first inequality above

must be an equality. Therefore, by substituting X = Ap2 ,Y = K B

q2 , we obtain that

φ((B

q2 K∗ApK B

q2 )s)

= infss1φ((Z∗ApZ )s1 ) + s

s2φ((Z−1K BqK∗(Z−1)∗)s2 ) : Z invertible.

Note that s ∈ [0, 1p+q ] implies s1 ∈ [0, 1p ], s2 ∈ [0,

1q ]. By Lemma 5.3.2, the map

(A, B) 7−→ss1φ((Z∗ApZ )s1 ) + s

s2φ((Z−1K BqK∗(Z−1)∗)s2 )

is jointly concave in (A, B) for every invertible Z , which then implies the jointconcavity of the infimum over all invertible Z .

Proof of Theorem 5.3.4 (Part I). We first prove the theorem for m = 1. Let r =

p1 ∈ [0, 1], and K (N ) = (K (N ))∗ = exp( 12N H

), N ≥ 1. Then using the Lie product

formula

limN→+∞

(exp

( 12N

Y)exp

( 1N

X)exp

( 12N

Y))N

= exp(X + Y ), X,Y ∈ Hn,

we have

limN→+∞

φ((

(K (N ))∗ArN K (N ))N

)= lim

N→+∞φ

((exp

( 12N

H)exp

( rN

log A)exp

( 12N

H))N

)= φ

(exp(H + r log A)

).

By Theorem 5.3.3, for each N ≥ 1, φ((

(K (N ))∗ArN K (N ))N

)is concave in A, thus

the limit function φ(exp(H + r log A)

)is also concave in A.

Page 169: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

159

To go from m = 1 to m > 1 in Theorem 5.3.4, we need to use the convexity ofthe map A 7→ φ(exp(A)), which we will prove via the following lemmas. Theyare the k-trace extensions of the Araki–Lieb–Thirring inequality [4], the Golden–Thompson inequality and a variant of the Peierls–Bogoliubov inequality (see, e.g.,[24, Theorem 2.12]).

Lemma 5.4.3 (k-trace Araki–Lieb–Thirring Inequality). For any A, B ∈ H+n , thefunction

t 7→ Trk[(B

t2 At B

t2 )

1t]

is monotone increasing on (0,+∞), that is

Trk[(B

t2 At B

t2 )

1t]≤ Trk

[(B

s2 As B

s2 )

1s], 0 < t ≤ s. (5.24)

Proof. Using the definition and properties of the operator M (k)0 in Section 5.6.2,

we have that

Trk[(B

t2 At B

t2 )

1t]= Tr

[M

(k)0

((B

t2 At B

t2 )

1t)]

= Tr[(

(M (k)0 (B))

t2 (M (k)

0 (A))t (M (k)0 (B))

t2) 1

t].

Since A, B ∈ H+n ,M(k)0 (A) andM (k)

0 (B) are both Hermitian and positive semidef-inite. Then inequality (5.24) follows immediately from the original Araki–Lieb–Thirring inequality [4] for normal trace.

Lemma 5.4.4 (k-trace Golden–Thompson Inequality). For any A, B ∈ Hn,

Trk[exp(A + B)

]≤ Trk

[exp(A) exp(B)

], (5.25)

with equality holds if and only if AB = BA.

Proof. We here only prove the inequality. The condition for equality will be justifiedin an alternative proof of this lemma in Section 5.6.2. For any A, B ∈ Hn, we have

Trk[exp(A + B)

]= lim

m→+∞Trk

[(exp

( 12m

B)exp

( 1m

A)exp

( 12m

B))m]

≤ Trk[exp

(12

B)exp

(A)exp

(12

B)]

= Trk[exp

(A)exp

(B)].

The first equality above is the Lie product formula, and the inequality is due toLemma 5.4.3.

Page 170: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

160

Lemma 5.4.5 (k-trace Peierls–Bogoliubov Inequality). The function

A 7−→ log Trk[exp(A)

](5.26)

is convex on Hn.

Proof. For any A, B ∈ Hn, τ ∈ (0, 1), by Lemma 5.4.4 we have

Trk[exp(τA + (1 − τ)B)

]≤ Trk

[exp

(τA

)exp

((1 − τ)B

)]≤ Trk

[exp(A)

] τTrk[exp(B)

]1−τ .The second inequality above is Hölder’s. Therefore

log Trk[exp(τA + (1 − τ)B)

]≤ τ log Trk

[exp(A)

]+ (1 − τ) log Trk

[exp(B)

].

We remark that Lemma 5.4.5 can also be proved using the operator interpolationin Lemma 5.6.8. Lemma 5.4.5 immediately implies that A 7→ log φ

(exp(A)

)=

1k log Trk

[exp(A)

]is convex, and thus A 7→ φ

(exp(A)

)is convex. This will help

us prove improve from m = 1 to m ≥ 1 in Theorem 5.3.4.

Proof of Theorem 5.3.4 (Part II). Given any A( j)mj=1, B( j)mj=1 ⊂ H++n , and any

τ ∈ [0, 1], letC ( j) = τA( j)+ (1−τ)B( j), 1 ≤ j ≤ m. Since the map X 7→ φ(exp(X ))is convex on Hn, the map X 7→ φ(exp(L + X )) is also convex on Hn for arbitraryL ∈ Hn. Now define

L = H +m∑

j=1p j logC ( j), r =

m∑j=1

p j ≤ 1.

If r = 0, the result is trivial; so we may assume that r > 0. We then have that

φ(exp(H +

m∑j=1

p j log X ( j)))

= φ(exp

(H + r

m∑j=1

p j

r(log X ( j) − logC ( j)) +

m∑j=1

p j logC ( j)))= φ

(exp

(L + r

m∑j=1

p j

r(log X ( j) − logC ( j))

))≤

m∑j=1

p j

rφ(exp(L + r log X ( j) − r logC ( j))

), X ( j) = A( j), B( j) .

Page 171: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

161

For each j, by the concavity of (5.20) for m = 1, we have

τφ(exp(L + r log A( j) − r logC ( j))

)+ (1 − τ)φ

(exp(L + r log B( j) − r logC ( j))

)≤ φ

(exp(L + r log(τA( j) + (1 − τ)B( j)) − r logC ( j))

)= φ

(exp(L)

).

Therefore we obtain that

τφ(exp(H +

m∑j=1

p j log A( j)))+ (1 − τ)φ

(exp(H +

m∑j=1

p j log B( j)))

m∑j=1

p j

rφ(exp(L)

)= φ

(exp(H +

m∑j=1

p j logC ( j))),

that is, (5.20) is jointly concave on (H++n )×m for all m ≥ 1.

5.4.3 Revisiting Previous Proofs in the Trace CaseIn this section, we will review some previous works on concavity results of tracefunctions. The purpose is to compare by example the spirits of methods from dif-ferent perspectives, so as to explain why we have chosen the interpolation techniqueby Stein and the variational method by Zhang to prove our main results. For awhole story of known results on both convexity and concavity, one may refer to[3, 24, 26, 67, 136]. As mentioned in the introduction, many alternative proofs ofLieb’s concavity theorem (the concavity of function (1.3)) have been found sinceits original establishment by Lieb in 1973. A proof using matrix tensors was givenby Ando [3] in 1979 (see also Carlen [24]). Ando interpreted Tr[K∗ApK Bq] asan inner product on the tensor space Cn ⊗ Cm and translated the Lieb’s concavitytheorem to the statement that the map (A, B) 7→ Ap ⊗ Bq is operator concave. Andothen proved the latter using the integral representation of Ap (see below). Here ⊗is the Kronecker product. Later, Nikoufar et al. [85] provided a simpler proof forthe concavity of (A, B) 7→ Ap ⊗ Bq using the concept of matrix perspectives (see,e.g., [36]). We summarize the ideas of their proofs as follows. For simplicity, weassume that p + q = 1. The result for p + q = r < 1 can be further obtained byusing the fact that A 7→ Ar is operator monotone increasing and operator concavefor r ∈ [0, 1]. For p ∈ [0, 1], the map A 7→ (A ⊗ Im)p = Ap ⊗ Im from H+n to H+nm is

Page 172: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

162

operator concave, and thus its perspective from H+n ×H+m to H+nm,

(A, B) 7→ (In ⊗ B)12((In ⊗ B)−

12 (A ⊗ Im)(In ⊗ B)−

12) p(In ⊗ B)

12 = Ap ⊗ B1−p,

is jointly operator concave in (A, B). The simplified expression above results fromthe fact that A ⊗ Im commutes with In ⊗ B. For any K ∈ Cn×m, we have the identity(a variant of Ando’s interpretation)

Tr[K∗ApK B1−p] =⟨ m∑

j=1(Ke(m)

j ) ⊗ e(m)j , Ap ⊗ (BT )1−p

m∑j=1

(Ke(m)j ) ⊗ e(m)

j

⟩Cn⊗Cm

,

(5.27)

where BT is the transpose of B, and e(m)j = (0, . . . ,

jth1, . . . , 0) ∈ Cm. Note that since

B ∈ H+n , BT is also in H+n . Since B 7→ BT is linear, the joint operator concavity of(A, B) 7→ Ap ⊗ B1−p then implies the joint concavity of (A, B) 7→ Tr[K∗ApK B1−p].

As an application, Carlen and Lieb [27] applied the Lieb’s concavity theorem toprove the concavity of A 7→ Tr[(K∗Ar K )s] for r ∈ [0, 1], s ∈ [1, 1r ] (they used aslightly different but equivalent expression) based on a variational characterizationof this function (the supremum part of [27, Lemma 2.2]). We here provide asimplified proof that captures the main spirit. For any A, B ∈ H+n , K ∈ Cn×n, let

X = (K∗Ar K )s, Y = (K∗Br K )s .

Then for any τ ∈ [0, 1], note that 1s ≤ 1, r + (1 − 1

s ) ≤ 1, we have

τTr[X] + (1 − τ)Tr[Y ]

= τTr[K∗Ar K X1− 1

s]+ (1 − τ)Tr

[K∗Br KY 1− 1

s]

≤ Tr[K∗(τA + (1 − τ)B)r K (τX + (1 − τ)Y )1−

1s]

≤ Tr[(

K∗(τA + (1 − τ)B)r K) s] 1

sTr[τX + (1 − τ)Y

]1− 1s

= Tr[(

K∗(τA + (1 − τ)B)r K) s] 1

s(τTr[X] + (1 − τ)Tr[Y ]

)1− 1s ,

(5.28)

where the first inequality is due to Lieb’s concavity theorem with p = r, q =

1 − 1s , p + q = r + 1 − 1

s ≤ 1, and the second inequality is Hölder’s. The above thensimplifies to

τTr[(K∗Ar K )s] + (1 − τ)Tr[(K∗Br K )s] ≤ Tr

[(K∗(τA + (1 − τ)B)r K

) s],which concludes the concavity of A 7→ Tr[(K∗Ar K )s].

Page 173: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

163

The variational characterizations of Tr[(K∗Ar K )s] in [27] can be abstracted to thefollowing two formulas ([26, Lemma 12]): for any X ∈ H+n ,

Tr[X s] = supsTr[XY ] − (s − 1)Tr[Y

ss−1 ] : Y ∈ H+n

, if s > 1 or s < 0, (5.29)

Tr[X s] = infsTr[XY ] + (1 − s)Tr[Y−

s1−s ] : Y ∈ H++n

, if 0 < s < 1. (5.30)

Further, these variational formulas were used to derive the convexity/concavity ofthe function (A, B) 7→ Tr[(B

q2 K∗ApK B

q2 )s] for a partial range of p, q, s (partial

to the necessary conditions on p, q, s for the corresponding convexity/concavity tohold). For example, formula (5.30) was used by Carlen et al. to prove the concavityfor 0 ≤ p, q ≤ 1, 0 < s ≤ 1

1+q [25, Theorem 4.4]. Recently, the above formulas weremodified by Zhang to the following [136, Theorem 3.3]: for any X,Y ∈ Cn×n andany r0, r1, r2 > 0 such that 1

r0= 1

r1+ 1

r2,

Tr[|XY |r1] = sup

r1r0Tr[|X Z |r0] −

r1r2Tr[|Y−1Z |r2] : Z ∈ Cn×n

, (5.31)

Tr[|XY |r0] = inf

r0r1Tr[|X Z |r1] +

r0r2Tr[|Z−1Y |r2] : Z ∈ Cn×n invertible

. (5.32)

Zhang then used them to provide a unified variational proof of the joint convex-ity/concavity of (A, B) 7→ Tr[(B

q2 K∗ApK B

q2 )s] for the full range of p, q, s, finally

confirming that the sufficient conditions on p, q, s coincide with the necessary con-ditions.

These arguments using matrix tensors and variational forms were also adopted byTropp [122] to provide an alternative proof of the concavity of A 7→ Tr[exp(H +

log A)]. Tropp’s proof is based on his variational formula for trace,

Tr[M] = supT∈H++n

Tr[T log M − T logT + T

], M ∈ H++n , (5.33)

which relies on the non-negativeness of the matrix relative entropy

D(T ; M) = Tr[T (logT − log M) − (T − M)

], T, M ∈ H++n .

The non-negativeness of D(T ; M) is a classical result of Klein’s inequality (see Petz[93, Proposition 3], Carlen [24, Theorem 2.11] or Tropp [122, Proposition 8.3.5]).Tropp substituted M = exp(H + log A) in (5.33) to obtain

Tr[exp(H + log A)] = supT∈H++n

(Tr[T H] + Tr[A] − D(T ; A)

).

The concavity of A 7→ Tr[exp(H + log A)] then follows from this variational expres-sion, the joint convexity of D(T ; A) in (T, A), and the fact that g(x) = supy∈Ω f (x, y)

Page 174: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

164

is concave in x if f (x, y) is jointly concave in (x, y) and Ω is convex (see, e.g., [27,Lemma 2.3]). The joint convexity of the relative entropy D(T ; A) was first dueto Lindblad [70]. One can also see Ando [3], Carlen [24] and Tropp [122] foralternative proofs.

A Methodology of another flavor arose from the use of complex analysis. In thesame year of Lieb’s original paper on his concavity theorem, Epstein [37] provideda unified way of proving the concavity of A 7→ Tr[K∗ApK Aq], A 7→ Tr[(K∗AsK )

1s ]

and A 7→ Tr[exp(H + log A)], using a derivative argument based on the theory ofHerglotz functions (functions that are analytic in the open upper half plane C+ andhave a positive imaginary part). Epstein’s method relies on integral representationsof matrix powers, which has a deep connection to a profound theorem of Loewner’s: a real-valued function f on (0,+∞) is operator monotone if and only if it admitsan analytic continuation to a Herglotz function. Loewner’s theorem provides aconvenient tool for understanding trace functions by passing the study of a desiredproperty from the integral to the integrand that has a relatively simpler form. Onemay see the book of Donoghue [35] for a full account of this theory. Epstein’sapproach was further developed by Hiai [47, 48] and, for example, adopted in hisfirst proof of the concavity of (A, B) 7→ Tr[(B

q2 K∗ApK B

q2 )s] for the full range

0 ≤ p, q ≤ 1, 0 ≤ s ≤ 1p+q [48, Theorem 2.1]. Specifically, by substituting

σ = s(p + q) < 1, X = (Bq2 K∗ApK B

q2 )

1p+q in the integral formula,

Xσ =sin(πσ)

π

∫ ∞

0t−1+σ (In + tX−1)−1dt, 0 < σ < 1, X ∈ H++n , (5.34)

Hiai passed the joint concavity of (A, B) 7→ Tr[(Bq2 K∗ApK B

q2 )s] to the joint con-

cavity of (A, B) 7→ Tr[1 + t(B

q2 K∗ApK B

q2 )−

1p+q

]−1(the case s(p + q) = 1 was

handled differently by directly taking t → +∞). He then proved the latter using thederivative argument introduced by Epstein.

The introduction of complex analysis into the field has also led to another branchof methods based on interpolation theories. In his original proof of the concavityof (A, B) 7→ Tr[K∗ApK Bq] for 0 ≤ p, q ≤ 1, p + q ≤ 1 [67, Theorem 1], Liebmade use of the maximum modulus principle to concentrate the powers of Ap, Bq

to the only power p + q on A or B, and then proceeded with the operator concavityof X 7→ X p+q. This technique, relying on the holomorphicity of X z (X ∈ H+n ) asa function of z, already shed some light on the use of complex interpolation theo-ries. Later, Uhlmann [124] applied interpolation theories explicitly to again proveLieb’s concavity theorem, by interpreting Tr[K∗ApK B1−p] as an interpolation be-

Page 175: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

165

tween Tr[K∗AK] and Tr[K∗K B]. Uhlmann’s quadratic interpolation of seminormsextended the relevant works of Lieb to positive linear forms of arbitrary *-algebras.Kosaki [58] further explored the idea of quadratic interpolation of seminorms andcaptured Lieb’s concavity theorem in the frame of general interpolation theories.

The abovemethodologies all have a unique perspective in understanding complicatedtrace functions. However, some of them are found hardly generalizable to k-trace,as they more or less rely on the linearity of the normal trace. For example, Ando’sidentity (5.27) and Tropp’s variational formula (5.33). Our k-trace function φ fork > 1 is at best sub-additive since it is concave and homogeneous of order 1. Hiai’suse of the integral formula also lives on linearity in that the trace operation canbe pulled into the integral. Though we can interpret the k-trace as the trace ofan anti-symmetric tensor, the power of 1

k will have to stay out of the integral, andthe anti-symmetric tensor in the integrand will also bring huge difficulties to thederivative argument that comes after.

The variational method introduced by Carlen and Lieb needs the linearity of traceas well in proving the concavity of A 7→ Tr[(K∗Ar K )s]. One can see this in the lastequality of (5.28), which will become an inequality in the undesirable direction ifTr[·] is replaced by Trk[·]

1k , since Trk[X] 1k is concave in X . In fact, the argument

in (5.28) that reduces the concavity of A 7→ Tr[(K∗Ar K )s] for r ∈ [0, 1], s ∈ [1, 1r ]to the joint concavity of (A, B) 7→ Tr[K∗ApK Bq] for p, q ∈ [0, 1], p + q ≤ 1 wasperformed in a variational manner in [27] based on the following formula (which isequivalent to the expression of the supremum case in [27, Lemma 2.2]):

Tr[(K∗Ar K )s] = supX∈H++n

sTr[K∗Ar K X1− 1

s ] − (s − 1)Tr[X]= sup

X∈H++nΨ(A, X ),

(5.35)which can be derived using Hölder’s inequality for trace. Recall that g(x) =supy∈Ω f (x, y) is concave in x if f (x, y) is jointly concave in (x, y) andΩ is convex.Since Tr[K∗Ar K X1− 1

s ] is jointly concave in (A, X ) (as r + (1 − 1s ) ≤ 1) and Tr[X]

is linear in X , the function Ψ(A, X ) is jointly concave in (A, X ). The concavity ofA 7→ Tr[(K∗Ar K )s] then follows from (5.35). It is then natural to consider a similarvariational formula for the k-trace that can also be shown by Hölder’s inequality:

Trk[(K∗Ar K )s]1k = sup

X∈H++n

sTrk[K∗Ar K X1− 1

s ]1k − (s − 1)Trk[X]

1k

= sup

X∈H++nΨk (A, X ).

(5.36)

Page 176: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

166

Note that for s > 1, we have −(s − 1) < 0 and thus −(s − 1)Trk[X] 1k is convexin X since Trk[·]

1k is concave (this sign of −(s − 1) does not give trouble in the

trace case due to the linearity of trace). Therefore, even provided that (A, X ) 7→Trk[K∗Ar K X1− 1

s ] 1k is jointly concave, the function Ψk (A, X ) is not guaranteed to bejointly concave in (A, X ), and the variational formula in (5.36) fails to conclude theconcavity of A 7→ Trk[(K∗Ar K )s] 1k . As a consequence, we have not found a way toadapt this particular argument into a proof of Lemma 5.3.2.

However, these variational approaches, especially Zhang’s variational characteriza-tions, are conveniently applicable to the derivation of the more general cases givenLemma 5.3.2, as we have seen in the proof of Theorem 5.3.3. One can see thatthe k-trace version of Hölder’s inequality plays an essential role in the process,which has suggested us to employ complex interpolation theories in the first place asinterpolation of operators is based essentially on Hölder’s inequality. In particular,we found the operator interpolation technique (Theorem 5.6.7) developed by Stein[116](1956) nicely compatible to our problem. One can derive a variety of interpola-tion inequalities systematically by choosing G(z) properly in inequality (5.62). Thechoice of G(z) we make, inspired by Lieb’s original construction, gives the proofof our Lemma 5.3.2. We remark that, Lemma 5.6.8 does not only apply to trace ork-trace, but also to any continuous matrix function φ : H+n → [0,+∞) that satisfiesHölder’s inequality and is unitary invariant in the sense that φ(U∗XU) = φ(X ) forarbitrary X ∈ H+n and U ∈ Cn×n unitary. In fact, these two properties suffice toimply that log φ exp is convex onHn, which, along with a canonical majorizationargument, will yield the inequality (5.63) (see, e.g., [50]).

5.5 From Lieb’s Theorem to ConcentrationTropp’s proof of the master bounds relies on a critical use of the Lieb’s theorem.To be specific, Tropp used the concavity of A 7→ Tr

[exp(H + log A)

]to prove the

subadditivity of matrix cumulant generating function. For the sequence X (i)mi=1 ⊂

Hn under the same setting,

ETr[exp

( m∑i=1

X (i))] ≤ Tr[exp

( m∑i=1

logE exp X (i))] . (5.37)

Similarly, to prove Theorem 5.1.2, we need to extend (5.37) to the following lemmausing our generalized Lieb’s theorem.

Lemma 5.5.1. Let A(1), A(2), · · · , A(m) ∈ H++n be m independent, random, positive

Page 177: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

167

definite matrices. Then we have for any 1 ≤ k ≤ n,

E(Trk

[exp(

m∑i=1

log A(i))] ) 1

k ≤(Trk

[exp(

m∑i=1

logEA(i))] ) 1

k , (5.38a)

E log Trk[exp(

m∑i=1

log A(i))]≤ log Trk

[exp(

m∑i=1

logEA(i))]. (5.38b)

Proof. We here only prove (5.38a). The proof for (5.38b) is similar. Since eachrandommatrix A(i) always lies inH++n , wemay apply Theorem 5.3.1 to get a Jensen’sinequality

E(Trk

[exp(H + log A(i))

] ) 1k ≤

(Trk

[exp(H + logEA(i))

] ) 1k ,

for arbitrary H ∈ Hn. And since A(1), A(2), · · · , A(m) are independent, we can splitthe expectation E into E1E2 · · ·Em, where Ei is the expectation operator with respectto the random matrix A(i). So then we may apply Theorem 5.3.1 repeatedly to get

E(Trk

[exp(

m∑i=1

log A(i))] ) 1

k

= E1 · · ·Em(Trk

[exp(

m−1∑i=1

log A(i) + log A(m))] ) 1

k

≤ E1 · · ·Em−1(Trk

[exp(

m−1∑i=1

log A(i) + logEm A(m))] ) 1

k

· · ·

≤(Trk

[exp(

m∑i=1

logEA(i))] ) 1

k .

Our proof of Theorem 5.1.2 will basically follow Tropp’s proof of Theorem 3.6.1 in[122], but with the normal trace Tr replaced by the general k-trace Trk for 1 ≤ k ≤ n.

Proof of Theorem 5.1.2. We first prove (5.4a) and (5.5a). The first inequality in(5.4a) is trivial, because the sum of a Hermitian matrix’s k largest eigenvalues is aconvex function of the matrix itself. Indeed we have

Ek∑

i=1λi (Y ) = E sup

Q∈Cn×kQ∗Q=Ik

Tr[Q∗YQ] ≥ supQ∈Cn×kQ∗Q=Ik

Tr[Q∗(EY )Q] =k∑

i=1λi (EY ).

Page 178: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

168

For the second inequality in (5.4a), we apply a similar technique, the matrix Laplacetransform, as in [122]. The difference is that in the first step, we do not switch theexpectation operator and the logarithm. For any θ > 0, we have

Ek∑

i=1λi (Y ) =

1θE log exp

( k∑i=1

λi (θY ))

=1θE log

( k∏i=1

λi(exp(θY )

))≤

1θE log Trk

[exp(θY )

].

We have used the fact that∏k

i=1 λi (A) ≤ Trk[A]for any A ∈ H+n . Next we define

the random matrices A(i) = exp(θX (i)) ∈ H++n , 1 ≤ i ≤ m. Since X (i), 1 ≤ i ≤ m,are independent, A(i), 1 ≤ i ≤ m, are also independent. Therefore we may applyinequality (5.38b) in Lemma 5.5.1 to get

E log Trk[exp(θY )

]= E log Trk

[exp

( m∑i=1

θX (i))]= E log Trk

[exp

( m∑i=1

log A(i))]≤ log Trk

[exp

( m∑i=1

logEA(i))]= log Trk

[exp

( m∑i=1

logE exp(θX (i)))].

Since θ > 0 is arbitrary, we thus have

Ek∑

i=1λi (Y ) ≤ inf

θ>0

1θlog Trk

[exp

( m∑i=1

logE exp(θX (i)))].

The proof of (5.5a) shares a similar spirit, except that we use (5.38a) instead of(5.38b). For any t ∈ R, θ > 0, we use the Markov’s inequality to obtain

P

k∑i=1

λi (Y ) ≥ t= P

exp

( θk

k∑i=1

λi (Y ))≥ e

θtk

≤ e−θtk E exp

(1k

k∑i=1

λi (θY ))≤ e−

θtk E

[(Trk exp(θY )

) 1k

].

Then again by defining A(i) = exp(θX (i)) ∈ H++n , 1 ≤ i ≤ m, we may applyinequality (5.38a) in Lemma 5.5.1 to obtain

E[(Trk exp(θY )

) 1k

]≤

(Trk exp

( m∑i=1

logE exp(θX (i)))) 1

k

Page 179: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

169

Since θ > 0 is arbitrary, we thus have

P

k∑i=1

λi (Y ) ≥ t≤ inf

θ>0e−

θtk

(Trk exp

( m∑i=1

logE exp(θX (i)))) 1

k .

We proceed to (5.4b) and (5.5b). The first inequality (5.4b) can be similarly verifiedby noticing that the sum of a Hermitian matrix’s k smallest eigenvalues is a concavefunction of thematrix itself. For the second inequality in (5.4b) and inequality (5.5b),we only need to consider arbitrary θ < 0, and use the fact that θλn−i+1(A) = λi (θA)for any A ∈ Hn. Then repeating the arguments for (5.4a) and (5.5a), we can similarlyshow that

Ek∑

i=1λn−i+1(Y ) ≥ sup

θ<0

1θlog Trk

[exp

( m∑i=1

logE exp(θX (i)))],

and

P

k∑i=1

λn−i+1(Y ) ≤ t≤ inf

θ<0e−

θtk

(Trk exp

( m∑i=1

logE exp(θX (i)))) 1

k .

With all the preceding results, we are now ready to prove our Chernoff-type boundson the sum of the k largest (or smallest) eigenvalues. The following proof is animitation of Tropp’s proof of Theorem 5.1.1 in [122].

Proof of Theorem 5.1.4. Since 0 ≤ λn(X (i)) ≤ λ1(X (i)) ≤ c, we can use lemma5.4.1 in [122] to obtain the estimate

logE exp(θX (i)) ≤eθc − 1

cEX (i) = g(θ)EX (i), θ ∈ R,

where g(θ) = eθc−1c . So we have the following

Trk[exp

( m∑i=1

logE exp(θX (i)))]≤ Trk

[exp

(g(θ)

m∑i=1EX (i))]

= Trk[exp

(g(θ)EY

)]≤

(nk

) k∏i=1

λi(exp

(g(θ)EY

))=

(nk

)exp

( k∑i=1

λi(g(θ)EY

)).

(5.39)

Page 180: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

170

The second inequality above is due to the fact that Trk[A]≤

(nk

) ∏ki=1 λi (A) for any

A ∈ H+n . Notice that for θ > 0, g(θ) = eθc−1c > 0. We then apply (5.39) to (5.4a) in

Theorem 5.1.2 to get

Ek∑

i=1λi (Y ) ≤ inf

θ>0

1θlog *

,

(nk

)exp

( k∑i=1

λi(g(θ)EY

))+-

= infθ>0

g(θ)θ

k∑i=1

λi (EY ) +1θlog

(nk

).

As mentioned in [122], this infimum does not admit a closed form. By makingchange of variable θ → θ/c, we obtain (5.8a)

Ek∑

i=1λi (Y ) ≤ inf

θ>0

eθ − 1θ

k∑i=1

λi (EY ) +cθlog

(nk

).

Similarly, we apply (5.39) to (5.5a) in Theorem 5.1.2 to get

P

k∑i=1

λi (Y ) ≥ t≤ inf

θ>0e−

θtk *

,

(nk

)exp

( k∑i=1

λi(g(θ)EY

))+-

1k

= infθ>0

e−θtk

(nk

) 1k

exp(g(θ)

k

k∑i=1

λi (EY )).

If we choose t = (1 + ε)∑k

i=1 λi (EY ) for ε ≥ 0, we have

P

k∑i=1

λi (Y ) ≥ (1 + ε)k∑

i=1λi (EY )

≤ infθ>0

(nk

) 1k

exp((g(θ) − (1 + ε)θ

) 1k

k∑i=1

λi (EY )).

Minimizing the right hand side with θ = log(1+ε)c gives (5.9a).

Now consider θ < 0, we have g(θ) = eθc−1c < 0. We then apply (5.39) to (5.4b) in

Theorem 5.1.2 to get

Ek∑

i=1λn−i+1(Y ) ≥ sup

θ<0

1θlog *

,

(nk

)exp

( k∑i=1

λi(g(θ)EY

))+-

= supθ<0

g(θ)θ

k∑i=1

λn−i+1(EY ) +1θlog

(nk

).

Page 181: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

171

We have used λi (g(θ)EY ) = g(θ)λn−i+1(EY ) when g(θ) < 0. By making changeof variable θ → −θ/c, we obtain (5.8b)

Ek∑

i=1λn−i+1(Y ) ≥ sup

θ>0

1 − e−θ

θ

k∑i=1

λn−i+1(EY ) −cθlog

(nk

).

Again, we apply (5.39) to (5.5b) in Theorem 5.1.2 to get

P

k∑i=1

λn−i+1(Y ) ≤ t≤ inf

θ<0e−

θtk *

,

(nk

)exp

( k∑i=1

λi(g(θ)EY

))+-

1k

= infθ<0

e−θtk

(nk

) 1k

exp(g(θ)

k

k∑i=1

λn−i+1(EY )).

If we choose t = (1 − ε)∑k

i=1 λn−i+1(EY ) for ε ∈ [0, 1), we have

P

k∑i=1

λn−i+1(Y ) ≤ (1 − ε)k∑

i=1λn−i+1(EY )

≤ infθ<0

(nk

) 1k

exp((g(θ) − (1 − ε)θ

) 1k

k∑i=1

λn−i+1(EY )).

Minimizing the right hand sides with θ = log(1−ε)c gives (5.9b).

5.6 Supporting Materials5.6.1 Mixed DiscriminantThe mixed discriminant D(A(1), A(2), · · · , A(n)) of n matrices A(1), A(2), · · · , A(n) ∈

Cn×n is defined as

D(A(1), A(2), · · · , A(n)) =1n!

∑σ∈Sn

det

A(σ(1))11 A(σ(2))

12 · · · A(σ(n))1n

A(σ(1))21 A(σ(2))

22 · · · A(σ(n))1n

......

. . ....

A(σ(1))n1 A(σ(2))

n2 · · · A(σ(n))nn

, (5.40)

where Sn denotes the symmetric group of order n. We here list some basic factsabout mixed discriminants. For more properties of mixed discriminants, one mayrefer to [6, 91].

• Symmetry: D(A(1), A(2), · · · , A(n)) is symmetric in A(1), A(2), · · · , A(n), i.e.

D(A(1), A(2), · · · , A(n)) = D(Aσ(1), Aσ(2), · · · , Aσ(n)), σ ∈ Sn.

Page 182: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

172

• Multilinearity: for any α, β ∈ R,

D(αA+βB, A(2), · · · , A(n)) = αD(A, A(2), · · · , A(n))+βD(B, A(2), · · · , A(n)).

• Positiveness [6]: If A(1), A(2), · · · , A(n) ∈ H+n , then D(A(1), A(2), · · · , A(n)) ≥0; if A(1), A(2), · · · , A(n) ∈ H++n , then D(A(1), A(2), · · · , A(n)) > 0.

The relation between the mixed discriminant and Trk is obvious. If we calculate themixed discriminant for k copies of A ∈ Cn×n and n− k copies of In, we can find that

D(A, · · · , A︸ ︷︷ ︸k

, In, · · · , In︸ ︷︷ ︸n−k

) =(nk

)−1 ∑1≤i1<i2<···<ik≤n

det[A(i1···ik,i1···ik )] =(nk

)−1Trk[A].

(5.41)

This is why the mixed discriminant plays an important role in the proof of our maintheorem. In particular, we will need the following inequality on mixed discriminantby Alexandrov [2].

Theorem 5.6.1 (Alexandrov–Fenchel Inequality for Mixed Discriminants). For anyB ∈ Hn and any A, A(3), · · · , A(n)︸ ︷︷ ︸

n−2

∈ H++n , we have

D(A, B, A(3), · · · , A(n))2 ≥ D(A, A, A(3), · · · , A(n))D(B, B, A(3), · · · , A(n)), (5.42)

with equality if and only if B = λA for some λ ∈ R.

This theorem originally applied to real symmetric matrices when established. Aproof of its extension to Hermitian matrices can be found in [66]. By continuity,inequality (5.42) can extend to the case that A, A(3), · · · , A(n) ∈ H+n , but the necessityof the condition for equality is no longer valid.

Repeatedly applying the Alexandrov–Fenchel inequality (5.42) grants us the follow-ing corollary.

Corollary 5.6.2. For any 0 ≤ l ≤ k ≤ n, and any A, B, A(k+1), · · · , A(n)︸ ︷︷ ︸n−k

∈ H+n , we

have

D(A, · · · , A︸ ︷︷ ︸l

, B, · · · , B︸ ︷︷ ︸k−l

, A(k+1), · · · , A(n)︸ ︷︷ ︸n−k

)k (5.43)

≥ D(A, · · · , A︸ ︷︷ ︸k

, A(k+1), · · · , A(n)︸ ︷︷ ︸n−k

)l · D(B, · · · , B︸ ︷︷ ︸k

, A(k+1), · · · , A(n)︸ ︷︷ ︸n−k

)k−l .

Page 183: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

173

A direct result of Corollary 5.6.2 is the following general Brunn–Minkowski theoremfor mixed discriminants.

Corollary 5.6.3. (General Brunn–Minkowski Theorem for Mixed Discriminants)For any 1 ≤ k ≤ n, and any fixed A(k+1), · · · , A(n)︸ ︷︷ ︸

n−k

∈ H+n , the function

H+n −→ R

A 7−→ D(A, · · · , A︸ ︷︷ ︸k

, A(k+1), · · · , A(n)︸ ︷︷ ︸n−k

)1k (5.44)

is concave.

Proof. Fixing A(k+1), · · · , A(n), we will use D(A[k]) and D(A[l], B[k− l]) to denote

D(A, · · · , A︸ ︷︷ ︸k

, A(k+1), · · · , A(n)︸ ︷︷ ︸n−k

)1k and D(A, · · · , A︸ ︷︷ ︸

l

, B, · · · , B︸ ︷︷ ︸k−l

, A(k+1), · · · , A(n)︸ ︷︷ ︸n−k

)

respectively. For any A, B ∈ H+n , and any τ ∈ [0, 1], using the multilinearity ofmixed discriminants and Corollary 5.6.2, we have

D((τA + (1 − τ)B)[k]) =k∑

l=0

(kl

)τl (1 − τ)k−l D(A[l], B[k − l])

k∑l=0

(kl

)τl (1 − τ)k−l D(A[k])

lk D(B[k])

k−lk

=(τD(A[k])

1k + (1 − τ)D(B[k])

1k) k,

that is D((τA + (1 − τ)B)[k]) 1k ≥ τD(A[k]) 1

k + (1 − τ)D(B[k]) 1k .

If we choose A(k+1), · · · , A(n) to be n − k copies of In, Corollary 5.6.3 immediatelyimplies that the function A 7→

(Trk

[A] ) 1

k is concave on H+n , which is a special caseof Theorem 5.3.1 with H = 0. So we see the connection between the Alexandrov–Fenchel inequality and our generalized Lieb’s theorem. However, the arguments inthe proof of Corollary 5.6.3 do not seem to work with H , 0. We hence need moretools to handle the more general case.

5.6.2 Exterior AlgebraHere we give a brief review of exterior algebras on the vector space Cn. For moredetails, one may refer to [14, 100]. For the convenience of our use, the notations inour paper might be different from those in other materials. For any 1 ≤ k ≤ n, let

Page 184: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

174

∧k (Cn) denote the vector space of the kth exterior algebra of Cn, equipped with theinner product

〈·, ·〉∧k : ∧k (Cn) × ∧k (Cn) −→ C

〈u1 ∧ · · · ∧ uk, v1 ∧ · · · ∧ vk〉∧k = det

〈u1, v1〉 〈u1, v2〉 · · · 〈u1, vk〉

〈u2, v1〉 〈u2, v2〉 · · · 〈u2, vk〉...

.... . .

...

〈uk, v1〉 〈uk, v2〉 · · · 〈uk, vk〉

,

where 〈u, v〉 = u∗v is the standard l2 inner product on Cn.

Let L(∧k (Cn)) denote the space of all linear operators from ∧k (Cn) to itself. Forany matrices A(1), A(2), · · · , A(k) ∈ Cn×n, we can define an element in L(∧k (Cn)):

M (k) (A(1), A(2), · · · , A(k)) :

∧k (Cn) −→ ∧k (Cn)

v1 ∧ v2 ∧ · · · ∧ vk 7−→∑σ∈Sk

A(σ(1))v1 ∧ A(σ(2))v2 ∧ · · · ∧ A(σ(k))vk,

(5.45)

where Sk is the symmetric group of order k. Apparently, the map

(A(1), A(2), · · · , A(k)) 7−→ M (k) (A(1), A(2), · · · , A(k))

is symmetric in A(1), A(2), · · · , A(k) and is linear in each single A(i). For simplicity,we will use the following notations for any matrices A, B,C ∈ Cn×n:

M(k)0 (A) =

1k!M (k) (A, · · · , A), (5.46a)

M(k)1 (A; B) =

1(k − 1)!

M (k) (A, B, · · · , B), (5.46b)

M(k)2 (A, B;C) =

1(k − 2)!

M (k) (A, B,C, · · · ,C). (5.46c)

To avoid confusion, we define M (1)1 (A; B) = M1

0 (A), M (1)2 (A, B;C) = 0, and

M(2)2 (A, B;C) = M (2)

1 (A; B). Obviously the identity operator in L(∧k (Cn)) isM0(In). We will be using the following properties:

• Invertibility: if A ∈ Cn×n is invertible, then (M (k)0 (A))−1 =M (k)

0 (A−1).

• Adjoint: for any A ∈ Cn×n, (M (k)0 (A))∗ =M (k)

0 (A∗), with respect to the innerproduct 〈·, ·〉∧k .

Page 185: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

175

• Positiveness: If A ∈ Hn, then M (k)0 (A) is Hermitian; if A ∈ H+n , then

M(k)0 (A) 0; if A ∈ H++n , thenM (k)

0 (A) 0.

• Product properties: for any A, B,C, D ∈ Cn×n, we have

M(k)0 (AB) =M (k)

0 (A)M (k)0 (B), (5.47a)

M(k)1 (A; B)M (k)

0 (C) =M (k)1 (AC; BC), (5.47b)

M(k)0 (C)M (k)

1 (A; B) =M (k)1 (C A;CB), (5.47c)

M(k)1 (A;C)M (k)

1 (B; D) =M (k)2 (AD,CB;CD) +M (k)

1 (AB;CD). (5.47d)

• Derivative properties: for any differentiable functions A(t), B(t) : R −→Cn×n, we have

∂tM

(k)0 (A(t)) =M (k)

1 (A′(t); A(t)) (5.48a)

∂tM

(k)1 (A(t); B(t)) =M (k)

1 (A′(t); B(t)) +M (k)2 (A(t), B′(t); B(t)).

(5.48b)

Next we consider the natural basis of ∧k (Cn),

ei1 ∧ ei2 ∧ · · · ∧ eik 1≤i1<i2<···<ik≤n,

which is orthogonal under the inner product 〈·, ·〉∧k . Then the trace function onL(∧k (Cn)) is defined as

Tr : L(∧k (Cn)) −→ C

Tr[F

]=

∑1≤i1<i2<···<ik≤n

〈ei1 ∧ ei2 ∧ · · · ∧ eik , F (ei1 ∧ ei2 ∧ · · · ∧ eik )〉∧k .

(5.49)

It is not hard to check that this trace function is also invariant under cyclic per-mutation, i.e. Tr

[F G

]= Tr

[GF

]for any F ,G ∈ L(∧k (Cn)). Then for any

A(1), · · · , A(k) ∈ Cn×n, the trace Tr[M (k) (A(1), · · · , A(k))] coincides with the defini-tion of the mixed discriminant, as one can check that

Tr[M (k) (A(1), · · · , A(k))

]=

∑σ∈Sk

∑1≤i1<···<ik≤n

〈ei1 ∧ · · · ∧ eik , A(σ(1))ei1 ∧ · · · ∧ A(σ(k))eik 〉∧k

=n!

(n − k)!D(A(1), · · · , A(k), In, · · · , In︸ ︷︷ ︸

n−k

).

(5.50)

Page 186: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

176

From this observation, we can now express the k-trace of a matrix A ∈ Cn×n as

Trk[A] = Tr[M

(k)0 (A)

]. (5.51)

For those who are familiar with exterior algebra, it is clear that the spectrumof M (k)

0 is just λi1λi2 · · · λik 1≤i1<i2<···<ik≤n, where λ1, λ2, · · · , λn are the eigen-values of A. So in this way it is more convenient to see that Tr

[M

(k)0 (A)

]=

sum(spectrum ofM (k)0 (A)) =

∑1≤i1<···<ik≤n λi1λi2 · · · λik = Trk[A]. Our proof of

Theorem 5.3.1 will base on the expression (5.51).

In fact, our proof the main theorem can be done without introducing the exterioralgebra. We can instead go through the whole proof only using notations of mixeddiscriminant. The advantage of using exterior algebra is that it interprets the k-traceas the normal trace of operators in a space of higher dimension, so our k-tracefunctions have a nicer form that imitates the trace function in the original Lieb’stheorem. Also for the same reason, we are able to construct our proof by followingthe arguments of Lieb’s original proof in [67].

We next introduce some notations to simplify the expressions in what follows. Forany n real numbers λ1, λ2, · · · , λn ∈ R, we define the three symmetric forms

p(n,k) =∑

1≤i1<i2<···<ik≤n

λi1λi2 · · · λik , 1 ≤ k ≤ n, (5.52a)

d (n,k)i =

∑1≤ j1< j2<···< jk−1≤n

i< j1, j2,··· , jk−1

λ j1λ j2 · · · λ jk−1, 2 ≤ k ≤ n, 1 ≤ i ≤ n, (5.52b)

g(n,k)i j =

∑1≤l1<l2<···<lk−2≤n

i, j<l1,l2,··· ,lk−2

λl1λl2 · · · λlk−2, 3 ≤ k ≤ n, 1 ≤ i, j ≤ n, i , j . (5.52c)

For consistency, we define d (n,k)i = 1 if k = 1; g(n,k)

i j = 1 if k = 2 and i , j; g(n,k)i j = 0

if k = 1 or i = j. Also we define p(n,k) = d (n,k)i = g(n,k)

i j = 0 if k > n. Throughoutthis paper, whenever we are given some real numbers λ1, λ2, · · · , λn, the quantitiesp(n,k), d (n,k)

i , g(n,k)i j are always defined correspondingly with respect to λi1≤i≤n. The

following relations are easy to verify with the definitions above, and will be usefulin our proofs of lemmas and theorems. For any n, k, and any 1 ≤ i, j ≤ n such thati , j, we have the expansion relations

p(n,k) = λid(n,k)i + d (n,k+1)

i , d (n,k)i = λ jg

(n,k)i j + g(n,k+1)

i j . (5.53)

With the notations defined above, we give the following lemma. The proof isstraightforward by definition, so we omit it here.

Page 187: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

177

Lemma 5.6.4. For any A, B ∈ Cn×n, and any diagonal matrix Λ ∈ Cn×n withdiagonal entries λ1, λ2, · · · , λn, we have the following identities

Tr[M

(k)0 (Λ)

]= p(n,k), (5.54a)

Tr[M

(k)1 (A;Λ)

]=

n∑i=1

Aiid(n,k)i , (5.54b)

Tr[M

(k)2 (A, B;Λ)

]=

∑1≤i, j≤n

(Aii B j j − A ji Bi j )g(n,k)i j , (5.54c)

for all 1 ≤ k ≤ n, where p(n,k), d (n,k)i , g(n,k)

i j are definedwith respect to λ1, λ2, · · · , λn.

We here provide an alternative of Lemma 5.4.4 using the following lemma.

Lemma 5.6.5. For any A ∈ Hn, we have

M(k)0

(exp(A)

)= exp

(M

(k)1 (A; In)

).

Proof. We need to show that for any v1 ∧ v2 ∧ · · · ∧ vk ∈ ∧k (Cn),

M(k)0

(exp(A)

)(v1∧ v2∧ · · · ∧ vk ) = exp

(M

(k)1 (A; In)

)(v1∧ v2∧ · · · ∧ vk ). (5.55)

We use Taylor expansion of ex to expand

M(k)0

(exp(A)

)=M

(k)0

( +∞∑j=0

1j!

A j), exp

(M

(k)1 (A; In)

)=

+∞∑j=0

1j!

(M

(k)1 (A; In)

) j .

Then for any integers j1, j2, . . . , jk ≥ 0, the coefficient of the term A j1v1 ∧ A j2v2 ∧

· · · ∧ A jk vk in the left hand side of (5.55) is

1j1! j2! · · · jk!

,

and the coefficient of the same term in the right hand side of (5.55) is also

1J!

(Jj1

) (J − j1

j2

)· · ·

(J − j1 − j2 − · · · − jk−1

jk

)=

1j1! j2! · · · jk!

,

where (J = j1 + j2 + · · · + jk ).

Page 188: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

178

An alternative proof of Lemma 5.4.4. Using Lemma 5.6.5 and the original GT in-equality for normal trace, we have

Trk[exp(A + B)] = Tr[M

(k)0

(exp(A + B)

)]= Tr

[exp

(M

(k)1 (A + B; In)

)]= Tr

[exp

(M

(k)1 (A; In) +M (k)

1 (B; In))]

≤ Tr[exp

(M

(k)1 (A; In)

)exp

(M

(k)1 (B; In)

)]= Tr

[M

(k)0

(exp(A)

)M

(k)0

(exp(B)

)]= Trk

[exp(A) exp(B)

],

where we have used thatM (k)1 (X ; In) is linear in X . As shown by Petz [93], in the

original GT inequality, the equality Tr[exp(A + B)] = Tr[exp(A) exp(B)] holds forA, B ∈ Hn if and only if AB = BA. Therefore, according to our calculation above,the equality Trk[exp(A + B)] = Trk[exp(A) exp(B)] holds if and only if

M(k)1 (A; In)M (k)

1 (B; In) =M (k)1 (B; In)M (k)

1 (A; In). (5.56)

However, one can check by definition that (5.56) is true if and only if AB = BA.

5.6.3 Derivatives of Some Matrix FunctionsLet us remind ourselves that a basic but important way to prove concavity of adifferentiable function f (t) is by showing that f ′′(t) ≤ 0. Similarly, one way toprove concavity of a differentiable multivariate function f (x) is by showing that thesecond directional derivative ∂2

∂t2 f (x + ty) |t=0 ≤ 0 for all allowed direction y. Wewill use this idea to prove the concavity of the k-trace functions (5.18) and (5.19).For this purpose, we would need the following matrix derivative formulas.

• Consider a function A(t) : (a, b) −→ Hn, such that A(t) is differentiable on(a, b), then we have[132]

∂texp

(A(t)

)=

∫ 1

0exp

(sA(t)

)A′(t) exp

((1 − s)A(t)

)ds. (5.57)

A′(t) denotes the derivative of A(t) with respect to t.

• Consider a function A(t) : (a, b) −→ H++n , such that A(t) is differentiable on(a, b), then we have[67]

∂t(A(t)

)−1= −

(A(t)

)−1A′(t)(A(t)

)−1, (5.58)

and∂

∂tlog

(A(t)

)=

∫ ∞

0

(A(t) + τIn

)−1A′(t)(A(t) + τIn

)−1dτ. (5.59)

Page 189: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

179

5.6.4 Operator InterpolationOne of our main tools is Stein’s interpolation of linear operators [116], that wasdeveloped from Hirschman’s stronger version of the Hadamard three-line theorem[51]. This technique was recently adopted by Sutter et al. [119] to establish amultivariate extension of the Golden–Thompson inequality, which inspired our useof interpolation in proving the generalized Lieb’s concavity theorem. Wewill followthe notations in [119]. For any θ ∈ (0, 1), we define a density βθ (t) on R by

βθ (t) =sin(πθ)

2θ(cosh(πt) + cos(πθ)

) , t ∈ R. (5.60)

Specially, we define

β0(t) = limθ0

βθ (t) =π

2(cosh(πt) + 1), and β1(t) = lim

θ1βθ (t) = δ(t).

βθ (t) is a density since βθ (t) ≥ 0, t ∈ R and∫ +∞−∞

βθ (t)dt = 1. We will always useS to denote a vertical strip on the complex plane C:

S = z ∈ C : 0 ≤ Re(z) ≤ 1. (5.61)

The idea of complex interpolation originates from an important result in harmonicanalysis, the Hadamard three-lines theorem [45], that if f (z) is uniformly boundedon S = z ∈ C : 0 ≤ Re(z) ≤ 1, holomorphic in the interior and continuous on theboundary, then g(x) = log supy | f (x+iy) | is a convex function on [0, 1]. Hirschman[51] improved this theorem to the following.

Theorem 5.6.6 (Hirschman). Let f (z) be uniformly bounded on S, holomorphic inthe interior and continuous on the boundary. Then for θ ∈ (0, 1), we have

log | f (θ) | ≤∫ +∞

−∞

dt(β1−θ (t) log | f (it) |1−θ + βθ (t) log | f (1 + it) |θ

).

Moreover, the assumption that f (z) is uniformly bounded can be relaxed to

log | f (z) | ≤ CeaIm(z), ∀z ∈ S, for some constants C < +∞, a < π.

Stein [116] further generalized this complex interpolation theory to interpolation oflinear operators.

Theorem 5.6.7 (Stein-Hirschman). Let G(z) be a map from S to bounded linearoperators on a separable Hilbert space that is holomorphic in the interior of S andcontinuous on the boundary. Let p0, p1 ∈ [1,+∞], θ ∈ [0, 1], and define pθ by

1pθ=

1 − θp0+θ

p1.

Page 190: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

180

Then if ‖G(z)‖pRe(z) is uniformly bounded on S, the following inequality holds:

log ‖G(θ)‖pθ ≤∫ +∞

−∞

dt(β1−θ (t) log ‖G(it)‖1−θp0 + βθ (t) log ‖G(1+it)‖θp1

). (5.62)

A k-trace analog of the above theorem, that is used in the proof of Lemma 5.3.2, isas follows (recall that we write φ(·) = Trk[·]

1k ):

Lemma 5.6.8. Let G(z) : S → Cn×n be holomorphic in the interior of S andcontinuous on the boundary. Let p0, p1 ∈ [1,+∞], θ ∈ [0, 1], and define pθ by

1pθ=

1 − θp0+θ

p1.

Then if ‖G(z)‖ is uniformly bounded on S, the following inequality holds:

log[φ( |G(θ) |pθ )

1pθ

]≤

∫ +∞

−∞

dt(β1−θ (t) log

[φ(|G(it) |p0

) 1−θp0

]+ βθ (t) log

[φ(|G(1 + it) |p1

) θp1

] ).

(5.63)

Proof. For any X ∈ Cn×n and p ∈ [1,+∞), we have that

M(k)0

(|X |p

)=M

(k)0

((X∗X )

p2)=

(M

(k)0 (X∗X )

) p2

=((M (k)

0 (X ))∗M (k)0 (X )

) p2 = M (k)

0 (X )p,

and thus

Trk[|X |p

] 1p = Tr

[M

(k)0

(|X |p

)] 1p = Tr

[M (k)0 (X )p

] 1p = M (k)

0 (X ) p.

The above equality also holds for p → +∞ since we are dealing with finite-dimensional operators. If G(z) is holomorphic in the interior of S and continuouson the boundary, then so isM (k)

0(G(z)

). And if ‖G(z)‖ is uniformly bounded on

S, then ‖M (k)0

(G(z)

)‖pRe(z) is also uniformly bounded on S, since all norms are

equivalent for finite-dimensional operators. Therefore we can use Theorem 5.6.7with G(z) replaced byM (k)

0(G(z)

)to get

log(Trk

[|G(θ) |pθ

] 1pθ

)≤

∫ +∞

−∞

dt(β1−θ (t) log

(Trk

[|G(it) |p0

] 1−θp0

)+ βθ (t) log

(Trk

[|G(1 + it) |p1

] θp1

)).

We then multiply both sides by 1k to obtain (5.63).

Page 191: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

181

For p0, p1 ∈ [1,+∞), we can rewrite inequality (5.63) as

log φ(|G(θ) |pθ )

∫ +∞

−∞

dt( (1 − θ)pθ

p0β1−θ (t) log φ

(|G(it) |p0

)+θpθp1

βθ (t) log φ(|G(1 + it) |p1

))(5.64)

Notice that ∫ +∞

−∞

( (1 − θ)pθp0

β1−θ (t) +θpθp1

βθ (t))dt = 1.

Then using Jensen’s inequality on the concavity of logarithm, we can immediatelyconclude from (5.64) that for p0, p1 ∈ [1,+∞),

φ(|G(θ) |pθ ) ≤∫ +∞

−∞

dt( (1 − θ)pθ

p0β1−θ (t)φ

(|G(it) |p0

)+θpθp1

βθ (t)φ(|G(1+it) |p1

)),

(5.65)under the same setting as in Lemma 5.6.8.

We remark that the operator interpolation inequality in Theorem5.6.7 is interestinglypowerful and user-friendly for proving matrix inequalities, as we have seen in theproofs of Lemma 5.3.2 and Theorem 5.3.3. In fact this technique can also provemany fundamental results in matrix theories. We here show one more example tosee how we can use the trick of interpolation to prove that the map A 7→ Ar isoperator concave on H+n for r ∈ (0, 1]. Note that, in general, this result is proved byusing an integral expression for Ar (e.g. see [24]).

We need to show that for any A, B ∈ H+n , τ ∈ [0, 1],

τAr + (1 − τ)Br ≤ Cr, (5.66)

where C = τA + (1 − τ)B. We may assume that C is invertible. The case whenC is not invertible can be handled by continuity. Notice that X In if and only ifTr[K∗X K] ≤ Tr[K∗K] for all K ∈ Cn×n. (5.66) is then equivalent to the statementthat

τTr[K∗C−r2 ArC−

r2 K] + (1 − τ)Tr[K∗C−

r2 BrC−

r2 K] ≤ Tr[K∗K], ∀K ∈ Cn×n.

Now we fix K and define

GX (z) = Xz2 C−

z2 K, X = A, B,

so we haveTr[K∗C−

r2 X rC−

r2 K] = ‖GX (r)‖22 .

Page 192: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

182

GX (z) is holomorphic in the interior of S and continuous on the boundary, and‖GX (z)‖ is uniformly bounded onS. We then use inequality (5.62) in Theorem5.6.7with θ = r, pθ = p0 = p1 = 2 to obtain

‖GX (r)‖22 ≤∫ +∞

−∞

dt((1 − r) β1−r (t)‖GX (it)‖22 + r βr (t)‖GX (1 + it)‖22

).

We have again used Jensen’s inequality to get rid of the logarithms. For each t ∈ R,we have that

‖GX (it)‖22 = Tr[K∗Cit2 X−

it2 X

it2 C−

it2 K] = Tr[K∗K],

and

‖GX (1 + it)‖22 = Tr[K∗C−1−it2 X

1−it2 X

1+it2 C−

1+it2 K] = Tr[K∗C−

1−it2 XC−

1+it2 K].

We then have

τ‖GA(1+it)‖22+(1−τ)‖GB (1+it)‖22 = Tr[K∗C−1−it2 (τA+(1−τ)B)C−

1+it2 K] = Tr[K∗K].

Finally we obtain

τ‖GA(r)‖22 + (1 − τ)‖GB (r)‖22 ≤ Tr[K∗K].

5.6.5 Homogeneous Convex/Concave FunctionsLemma 5.6.9. Let C be a convex cone in some linear space, i.e., C = conv(C) andC = λC for any λ > 0. Let function f : C → [0,+∞) be positively homogeneousof order 1, i.e., f (λx) = λ f (x), for any x ∈ C and λ > 0. Then for any s ∈ (0, 1),f (x) is concave if and only if f (x)s is concave; for any s ∈ (1,+∞), f (x) is convexif and only if f (x)s is convex.

In general this lemma is proved via an argument of level sets. Here we provide amore direct proof.

Proof. One direction is trivial. If f (x) is concave, then f (x)s is concave fors ∈ (0, 1), since (·)s is concave and monotone increasing. Conversely, if f (x)s isconcave for some s ∈ (0, 1), then f (τx + (1 − τ)y)s ≥ τ f (x)s + (1 − τ) f (y)s, forany x, y ∈ C, τ ∈ [0, 1]. Now given any fixed x, y ∈ C, τ ∈ [0, 1], we need to showthat τ f (x)+ (1−τ) f (y) ≤ f (τx+ (1−τ)y). If f (x) = f (y) = 0, then we are done.

Page 193: PositiveDefiniteMatrices:Compression, Decomposition ...PositiveDefiniteMatrices:Compression, Decomposition,Eigensolver,andConcentration Thesisby DeHuang InPartialFulfillmentoftheRequirementsforthe

183

Otherwise, wemay assume that f (x) > 0, and define M = τ f (x)+ (1−τ)( f (y)+ε )for some ε > 0(this ε is not necessary if f (y) > 0). We then have

f (τx + (1 − τ)y) = f(τ f (x)

MM xf (x)

+(1 − τ)( f (y) + ε )

MMy

f (y) + ε

) s· 1s

(τ f (x)

Mf(

M xf (x)

) s

+(1 − τ)( f (y) + ε )

Mf(

My

f (y) + ε

) s) 1s

=

(τ f (x)

MM s +

(1 − τ)( f (y) + ε )1−s f (y)s

MM s

) 1s

= M(τ f (x) + (1 − τ)( f (y) + ε )1−s f (y)s

τ f (x) + (1 − τ)( f (y) + ε )

) 1s

.

We then take ε → 0 to obtain f (τx + (1 − τ)y) ≥ τ f (x) + (1 − τ) f (y). Thereforef (x) is concave. The convexity part can be proved similarly.

Lemma 5.6.10. Let C be a convex cone in some linear space, i.e., C = conv(C) andC = λC for any λ > 0. Let function f : C → [0,+∞) be positively homogeneous oforder 1, i.e., f (λx) = λ f (x), for any x ∈ C and λ > 0. Then f (x)

1s is concave if

and only if log f (x) is concave.

Proof. One direction is trivial. If $f(x)^{\frac{1}{s}}$ is concave, then $\log f(x) = s \log\bigl( f(x)^{\frac{1}{s}} \bigr)$ is concave since $\log(\cdot)$ is monotone and concave on $(0,+\infty)$. Conversely, if $\log f(x)$ is concave, then $f(\tau x + (1-\tau)y) \ge f(x)^{\tau} f(y)^{1-\tau}$ for any $x, y \in C$, $\tau \in [0,1]$. Now for any fixed $x, y \in C$, $\tau \in [0,1]$, we define $M = \tau f(x)^{\frac{1}{s}} + (1-\tau) f(y)^{\frac{1}{s}}$. We then have
$$
\begin{aligned}
f(\tau x + (1-\tau)y)^{\frac{1}{s}}
&= f\Bigl( \frac{\tau f(x)^{\frac{1}{s}}}{M}\,\frac{M x}{f(x)^{\frac{1}{s}}} + \frac{(1-\tau) f(y)^{\frac{1}{s}}}{M}\,\frac{M y}{f(y)^{\frac{1}{s}}} \Bigr)^{\frac{1}{s}} \\
&\ge f\Bigl( \frac{M x}{f(x)^{\frac{1}{s}}} \Bigr)^{\frac{\tau f(x)^{1/s}}{M}\cdot\frac{1}{s}}\, f\Bigl( \frac{M y}{f(y)^{\frac{1}{s}}} \Bigr)^{\frac{(1-\tau) f(y)^{1/s}}{M}\cdot\frac{1}{s}} \\
&= \bigl( M^{s} \bigr)^{\frac{\tau f(x)^{1/s}}{M}\cdot\frac{1}{s} + \frac{(1-\tau) f(y)^{1/s}}{M}\cdot\frac{1}{s}} = M.
\end{aligned}
$$
Therefore $f(x)^{\frac{1}{s}}$ is concave.
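A familiar instance of Lemma 5.6.10 (for illustration only): $f(A) = \det(A)$ on $\mathcal{H}_n^+$ is positively homogeneous of order $n$ and $\log\det$ is concave, so the lemma recovers the concavity of $\det(\cdot)^{1/n}$, i.e., Minkowski's determinant inequality. A quick numerical spot check, with an illustrative helper `rand_spd`:

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_spd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T + 0.1 * np.eye(n)

n, tau = 4, 0.35
for _ in range(500):
    A, B = rand_spd(n), rand_spd(n)
    C = tau * A + (1 - tau) * B
    lhs = np.linalg.det(C) ** (1 / n)
    rhs = tau * np.linalg.det(A) ** (1 / n) + (1 - tau) * np.linalg.det(B) ** (1 / n)
    assert lhs >= rhs - 1e-9
print("det(.)^(1/n) is midpoint concave on these samples")
```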


5.7 Other Results on K-trace

5.7.1 Some Corollaries

The following corollary follows from standard arguments on homogeneous, concave functions.

Corollary 5.7.1. For any $p, q \in [0,1]$, $p + q > 0$, $s \in (0, \frac{1}{p+q}]$, and any $K \in \mathbb{C}^{n\times m}$, the function
$$(A, B) \longmapsto \phi\bigl( (B^{\frac{q}{2}} K^* A^{p} K B^{\frac{q}{2}})^{s} \bigr)^{\frac{1}{s(p+q)}} \tag{5.67}$$
is jointly concave on $\mathcal{H}_n^+ \times \mathcal{H}_m^+$. For any $H \in \mathcal{H}_n$ and any $\{p_j\}_{j=1}^m \subset [0,1]$ such that $0 < \sum_{j=1}^m p_j \le 1$, the function
$$(A^{(1)}, A^{(2)}, \dots, A^{(m)}) \longmapsto \phi\Bigl( \exp\Bigl( H + \sum_{j=1}^m p_j \log A^{(j)} \Bigr) \Bigr)^{\frac{1}{\sum_{j=1}^m p_j}} \tag{5.68}$$
is jointly concave on $(\mathcal{H}_n^{++})^{\times m}$.

Proof. Consider any matrix function $F: \mathcal{H}_n^+ \to [0,+\infty)$ (or $\mathcal{H}_n^{++} \to [0,+\infty)$) that is positively homogeneous of order 1, i.e., $F(\lambda A) = \lambda F(A)$ for all $\lambda \ge 0$. By Lemma 5.6.9, we have that $F$ is concave $\Longleftrightarrow$ $F^{r}$ is concave for some $r \in (0,1]$.

One can easily check that the functions (5.67) and (5.68) are positively homogeneous of order 1. Then this corollary follows from Theorem 5.3.3, Theorem 5.3.4 and Lemma 5.6.9.
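For readers who want to experiment with the corollary, the following sketch tests the joint midpoint concavity of (5.67) numerically for $\phi = \mathrm{Tr}_k[\cdot]^{1/k}$ on random SPD inputs. The helpers `mat_power`, `k_trace`, and `rand_spd` are illustrative implementations (computing $\mathrm{Tr}_k$ as the $k$-th elementary symmetric polynomial of the eigenvalues), not code from this thesis.

```python
import numpy as np
from itertools import combinations
from math import prod

rng = np.random.default_rng(3)

def mat_power(M, p):
    # fractional power of a Hermitian PSD matrix via its spectral decomposition
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 1e-15, None) ** p) @ V.T

def k_trace(M, k):
    # Tr_k[M] = k-th elementary symmetric polynomial of the eigenvalues of M
    w = np.linalg.eigvalsh(M)
    return sum(prod(c) for c in combinations(w, k))

def rand_spd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T + 0.1 * np.eye(n)

n, k = 4, 2
p, q = 0.6, 0.3
s = 1.0 / (p + q)                 # endpoint of the admissible range s in (0, 1/(p+q)]
K = rng.standard_normal((n, n))

def F(A, B):
    # phi((B^{q/2} K* A^p K B^{q/2})^s)^{1/(s(p+q))} with phi = Tr_k[.]^{1/k}
    Bq = mat_power(B, q / 2)
    inner = Bq @ K.T @ mat_power(A, p) @ K @ Bq
    return k_trace(mat_power(inner, s), k) ** (1.0 / (k * s * (p + q)))

tau = 0.45
for _ in range(200):
    A1, B1, A2, B2 = (rand_spd(n) for _ in range(4))
    lhs = F(tau * A1 + (1 - tau) * A2, tau * B1 + (1 - tau) * B2)
    rhs = tau * F(A1, B1) + (1 - tau) * F(A2, B2)
    assert lhs >= rhs - 1e-8 * max(1.0, abs(rhs))
print("joint midpoint concavity of (5.67) holds on these samples")
```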

The following corollary is an analog of the concave part of Lemma 3.1 in [27].

Corollary 5.7.2. For any $r \in [0,1]$, $s \in [0, \frac{1}{r}]$ and any $\{K^{(j)}\}_{j=1}^m \subset \mathbb{C}^{n\times n}$, the function
$$(A^{(1)}, A^{(2)}, \dots, A^{(m)}) \longmapsto \phi\Bigl( \Bigl( \sum_{j=1}^m (K^{(j)})^* (A^{(j)})^{r} K^{(j)} \Bigr)^{s} \Bigr) \tag{5.69}$$
is jointly concave on $(\mathcal{H}_n^+)^{\times m}$.

Proof. Define
$$\mathcal{A} = \begin{pmatrix} A^{(1)} & 0 & \dots & 0 \\ 0 & A^{(2)} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & A^{(m)} \end{pmatrix} \in \mathcal{H}_{mn}^+, \qquad \mathcal{K} = \begin{pmatrix} K^{(1)} & 0 & \dots & 0 \\ K^{(2)} & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ K^{(m)} & 0 & \dots & 0 \end{pmatrix} \in \mathbb{C}^{mn \times mn}.$$


Then we have

$$(\mathcal{K}^* \mathcal{A}^{r} \mathcal{K})^{s} = \begin{pmatrix} \Bigl(\sum_{j=1}^m (K^{(j)})^* (A^{(j)})^{r} K^{(j)}\Bigr)^{s} & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & 0 \end{pmatrix},$$

and thus

$$\phi\bigl( (\mathcal{K}^* \mathcal{A}^{r} \mathcal{K})^{s} \bigr) = \phi\Bigl( \Bigl( \sum_{j=1}^m (K^{(j)})^* (A^{(j)})^{r} K^{(j)} \Bigr)^{s} \Bigr).$$

By Lemma 5.3.2, the left hand side above is concave in $\mathcal{A}$; therefore the right hand side is jointly concave in $(A^{(1)}, A^{(2)}, \dots, A^{(m)})$.
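The block-matrix reduction in the proof can be verified directly: building the block-diagonal $\mathcal{A}$ and the first-block-column $\mathcal{K}$ numerically, $\phi\bigl((\mathcal{K}^*\mathcal{A}^r\mathcal{K})^s\bigr)$ agrees with $\phi\bigl((\sum_j (K^{(j)})^*(A^{(j)})^r K^{(j)})^s\bigr)$, since the extra zero eigenvalues do not contribute to $\mathrm{Tr}_k$. A sketch with illustrative helper names:

```python
import numpy as np
from itertools import combinations
from math import prod

rng = np.random.default_rng(4)

def mat_power(M, p):
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None) ** p) @ V.T

def k_trace(M, k):
    w = np.linalg.eigvalsh(M)
    return sum(prod(c) for c in combinations(w, k))

def rand_spd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T + 0.1 * np.eye(n)

n, m, k, r, s = 3, 3, 2, 0.8, 1.2          # s <= 1/r is allowed here
A_list = [rand_spd(n) for _ in range(m)]
K_list = [rng.standard_normal((n, n)) for _ in range(m)]

# block-diagonal \mathcal{A} and first-block-column \mathcal{K}
A_big = np.zeros((m * n, m * n))
K_big = np.zeros((m * n, m * n))
for j in range(m):
    A_big[j * n:(j + 1) * n, j * n:(j + 1) * n] = A_list[j]
    K_big[j * n:(j + 1) * n, 0:n] = K_list[j]

S = sum(Kj.T @ mat_power(Aj, r) @ Kj for Aj, Kj in zip(A_list, K_list))
lhs = k_trace(mat_power(K_big.T @ mat_power(A_big, r) @ K_big, s), k)
rhs = k_trace(mat_power(S, s), k)
print(np.isclose(lhs, rhs))                 # expected: True
```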

5.7.2 Multivariate Golden–Thompson Inequality

Sutter et al. [119] recently applied the operator interpolation in Theorem 5.6.7 to derive a multivariate extension of the Golden–Thompson (GT) inequality, which covers the original GT inequality and its three-matrix extension by Lieb [67].

Following the ideas in [119], we may also use Lemma 5.6.8 to further extend the multivariate GT inequality to a k-trace form. In what follows, we write $\prod_{j=1}^m X^{(j)}$ for the matrix multiplication in the index order, i.e., $\prod_{j=1}^m X^{(j)} = X^{(1)} X^{(2)} \cdots X^{(m)}$. We first present an analog of Theorem 3.2 in [119].

Lemma 5.7.3. For any $A^{(1)}, A^{(2)}, \dots, A^{(m)} \in \mathcal{H}_n^+$, $p \in [1,+\infty)$, $r \in (0,1]$, the following inequality holds:
$$\log \phi\Bigl( \Bigl| \prod_{j=1}^m (A^{(j)})^{r} \Bigr|^{\frac{p}{r}} \Bigr) \le \int_{-\infty}^{+\infty} \mathrm{d}t\, \beta_r(t) \log \phi\Bigl( \Bigl| \prod_{j=1}^m (A^{(j)})^{1+it} \Bigr|^{p} \Bigr). \tag{5.70}$$

Proof. Define
$$G(z) = \prod_{j=1}^m (A^{(j)})^{z}, \quad z \in S,$$
where $S$ is defined as in Lemma 5.6.8. One can check that $G(z)$ is holomorphic in the interior of $S$ and continuous on the boundary, and $\|G(z)\|$ is uniformly bounded on $S$. We may first assume that each $A^{(j)} \in \mathcal{H}_n^{++}$ so that $(A^{(j)})^{it}$, $t \in \mathbb{R}$, is unitary; the result for $A^{(j)} \in \mathcal{H}_n^+$ can then be obtained by continuity. Thus $G(it)$ is unitary for all $t \in \mathbb{R}$, and $|G(it)|^{p_0} = I_n$ for all $p_0$. We can therefore apply inequality (5.64) with $\theta = r$, $p_0 \to +\infty$, $p_1 = p$, $p_\theta = \frac{p}{r}$ to obtain
$$\log \phi\bigl( |G(r)|^{\frac{p}{r}} \bigr) \le \int_{-\infty}^{+\infty} \mathrm{d}t\, \beta_r(t) \log \phi\bigl( |G(1+it)|^{p} \bigr),$$
which is exactly (5.70).


Using a multivariate version of the Lie product formula, we immediately obtain the following from Lemma 5.7.3.

Theorem 5.7.4 (Multivariate Golden–Thompson Inequality for k-trace). For any $A^{(1)}, A^{(2)}, \dots, A^{(m)} \in \mathcal{H}_n$ and any $p \in [1,+\infty)$, the following inequality holds:
$$\log \phi\Bigl( \Bigl( \exp\Bigl( \sum_{j=1}^m A^{(j)} \Bigr) \Bigr)^{p} \Bigr) \le \int_{-\infty}^{+\infty} \mathrm{d}t\, \beta_0(t) \log \phi\Bigl( \Bigl| \prod_{j=1}^m \exp\bigl( (1+it) A^{(j)} \bigr) \Bigr|^{p} \Bigr). \tag{5.71}$$

Proof. We only need to replace $A^{(j)}$ in inequality (5.70) by $\exp(A^{(j)})$ and take $r \to 0$. Since each $\|\exp((1+it)A^{(j)})\| = \|\exp(A^{(j)})\exp(itA^{(j)})\| = \|\exp(A^{(j)})\|$ is uniformly bounded for all $t \in \mathbb{R}$, the right hand side of (5.70) then becomes the right hand side of (5.71). By a multivariate Lie product formula (see, e.g., [119]),
$$\lim_{r \searrow 0} \bigl( \exp(rX^{(1)}) \exp(rX^{(2)}) \cdots \exp(rX^{(m)}) \bigr)^{\frac{1}{r}} = \exp\Bigl( \sum_{j=1}^m X^{(j)} \Bigr),$$
the left hand side of (5.70) then becomes
$$
\begin{aligned}
\lim_{r \searrow 0} \log \phi\Bigl( \Bigl| \prod_{j=1}^m \exp(rA^{(j)}) \Bigr|^{\frac{p}{r}} \Bigr)
&= \lim_{r \searrow 0} \log \phi\Bigl( \Bigl( \prod_{j=1}^m \exp\bigl(rA^{(m-j+1)}\bigr) \prod_{j=1}^m \exp\bigl(rA^{(j)}\bigr) \Bigr)^{\frac{p}{2r}} \Bigr) \\
&= \log \phi\Bigl( \Bigl( \exp\Bigl( \sum_{j=1}^m 2A^{(j)} \Bigr) \Bigr)^{\frac{p}{2}} \Bigr) \\
&= \log \phi\Bigl( \Bigl( \exp\Bigl( \sum_{j=1}^m A^{(j)} \Bigr) \Bigr)^{p} \Bigr).
\end{aligned}
$$

If we choose $m = 2$, $p = 2$ in Theorem 5.7.4 and replace $A^{(j)}$ by $\frac{1}{2}A^{(j)}$, the right hand side of inequality (5.71) is independent of $t$ due to the cyclicity of $\phi$. We then recover the k-trace GT inequality
$$\phi\bigl( \exp(A^{(1)} + A^{(2)}) \bigr) \le \phi\bigl( \exp(A^{(1)}) \exp(A^{(2)}) \bigr),$$


that we have obtained in Lemma 5.4.4. If we choose $m = 3$, $p = 2$ in Theorem 5.7.4 and again replace $A^{(j)}$ by $\frac{1}{2}A^{(j)}$, we have
$$
\begin{aligned}
&\log \phi\bigl( \exp(A^{(1)} + A^{(2)} + A^{(3)}) \bigr) \\
&\quad \le \int_{-\infty}^{+\infty} \mathrm{d}t\, \beta_0(t) \log \phi\Bigl( \exp(A^{(1)}) \exp\Bigl( \tfrac{1+it}{2} A^{(2)} \Bigr) \exp(A^{(3)}) \exp\Bigl( \tfrac{1-it}{2} A^{(2)} \Bigr) \Bigr) \\
&\quad \le \log \phi\Bigl( \int_{-\infty}^{+\infty} \mathrm{d}t\, \beta_0(t)\, \exp(A^{(1)}) \exp\Bigl( \tfrac{1+it}{2} A^{(2)} \Bigr) \exp(A^{(3)}) \exp\Bigl( \tfrac{1-it}{2} A^{(2)} \Bigr) \Bigr).
\end{aligned}
$$
The second inequality above is due to the concavity of the logarithm and of $\phi$. If we define
$$T_A[B] = \int_0^{+\infty} \mathrm{d}t\, (A + tI_n)^{-1} B (A + tI_n)^{-1}, \qquad A, B \in \mathcal{H}_n^{++},$$
and use Lemma 3.4 in [119], which states that
$$\int_0^{+\infty} \mathrm{d}t\, (A^{-1} + tI_n)^{-1} B (A^{-1} + tI_n)^{-1} = \int_{-\infty}^{+\infty} \mathrm{d}t\, \beta_0(t)\, A^{\frac{1+it}{2}} B A^{\frac{1-it}{2}}, \qquad A, B \in \mathcal{H}_n^{++},$$
we then further obtain
$$\phi\bigl( \exp(A^{(1)} + A^{(2)} + A^{(3)}) \bigr) \le \phi\bigl( \exp(A^{(1)})\, T_{\exp(-A^{(2)})}\bigl[ \exp(A^{(3)}) \bigr] \bigr).$$
This can be seen as a k-trace generalization of Lieb's three-matrix extension [67] of the GT inequality, namely $\operatorname{Tr}[\exp(A + B + C)] \le \operatorname{Tr}\bigl[\exp(A)\, T_{\exp(-B)}[\exp(C)]\bigr]$.
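The two-matrix k-trace GT inequality recovered above is easy to test numerically. Since $\exp(A^{(1)})\exp(A^{(2)})$ is similar to the positive definite matrix $\exp(A^{(1)}/2)\exp(A^{(2)})\exp(A^{(1)}/2)$, its $\mathrm{Tr}_k$ can be computed from the eigenvalues of the latter. The following spot check (illustrative helper names, not thesis code) exercises $\mathrm{Tr}_k[\exp(A+B)] \le \mathrm{Tr}_k[\exp(A)\exp(B)]$ on random Hermitian matrices:

```python
import numpy as np
from scipy.linalg import expm
from itertools import combinations
from math import prod

rng = np.random.default_rng(5)

def k_trace_from_eigs(w, k):
    # Tr_k = k-th elementary symmetric polynomial of the eigenvalues
    return sum(prod(c) for c in combinations(w, k))

def rand_herm(n):
    X = rng.standard_normal((n, n))
    return (X + X.T) / 2

n, k = 5, 3
for _ in range(200):
    A, B = rand_herm(n), rand_herm(n)
    lhs = k_trace_from_eigs(np.linalg.eigvalsh(expm(A + B)), k)
    # exp(A)exp(B) is similar to the PD matrix exp(A/2)exp(B)exp(A/2),
    # so both share the same positive eigenvalues
    M = expm(A / 2) @ expm(B) @ expm(A / 2)
    rhs = k_trace_from_eigs(np.linalg.eigvalsh(M), k)
    assert lhs <= rhs * (1 + 1e-10)
print("k-trace Golden-Thompson inequality holds on these samples")
```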

5.7.3 Monotonicity Preserving and Concavity Preserving

As mentioned in Section 5.4.3, Loewner's theorem says that a real-valued function $f$ on $(0,+\infty)$ is operator monotone if and only if it admits an analytic continuation to a Herglotz function. Therefore, the extension of a monotone scalar function to Hermitian matrices is not necessarily operator monotone. For instance, let $f(x) = x^3$, and
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix};$$
then $f$ is monotone increasing and $A \preceq B$, but neither $A^3 \preceq B^3$ nor $A^3 \succeq B^3$ is true. However, a composition with the trace will preserve the monotonicity. That is, $\operatorname{Tr}[f(A)]$ is monotone increasing (or decreasing) in $A$ with respect to the Loewner partial order if $f$ is monotone increasing (or decreasing). Likewise, if $f$ is concave (or convex), its extension to Hermitian matrices is not necessarily operator concave (or convex), but $A \mapsto \operatorname{Tr}[f(A)]$ is still concave (or convex); see Theorem 2.10 in [24]. This means that some partial information, such as the trace, may preserve


monotonicity and concavity. In fact, we will show that for any integer $k$ the partial information $\phi(\cdot) = \mathrm{Tr}_k[\cdot]^{\frac{1}{k}}$ also preserves monotonicity and concavity of scalar functions. But we need to restrict to $f$ that only takes values in $[0,+\infty)$. We need the following lemma for proving the concavity-preserving property.

Lemma 5.7.5. For any $A \in \mathcal{H}_n^+$, let $\mathrm{diag}(A)$ denote the diagonal part of $A$. Then
$$\phi(A) \le \phi(\mathrm{diag}(A)).$$

Proof. Let $D$ be an $n \times n$ diagonal matrix whose diagonal entries follow independent Rademacher distributions, i.e.,
$$D_{ii} = \begin{cases} 1, & \text{with probability } 0.5, \\ -1, & \text{with probability } 0.5. \end{cases}$$
Since $D^2 \equiv I_n$, we have $\phi(A) = \phi(DAD)$. Also notice that $\mathbb{E}[DAD] = \mathrm{diag}(A)$, since $\mathbb{E}[D_{ii} D_{jj}] = \delta_{ij}$. Then by concavity of $\phi$, we have
$$\phi(A) = \mathbb{E}\,\phi(DAD) \le \phi(\mathbb{E}[DAD]) = \phi(\mathrm{diag}(A)).$$
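The two facts used in this proof can both be checked by brute force on a small example: averaging $DAD$ over all $2^n$ sign patterns reproduces $\mathrm{diag}(A)$, and $\phi(A) \le \phi(\mathrm{diag}(A))$ indeed holds for $\phi = \mathrm{Tr}_k[\cdot]^{1/k}$. A small illustrative script (helper names are hypothetical):

```python
import numpy as np
from itertools import product, combinations
from math import prod

rng = np.random.default_rng(6)

def k_trace(M, k):
    w = np.linalg.eigvalsh(M)
    return sum(prod(c) for c in combinations(w, k))

n, k = 4, 2
X = rng.standard_normal((n, n))
A = X @ X.T                                   # a random PSD matrix

# averaging DAD over all 2^n Rademacher sign patterns kills the off-diagonal part
avg = np.zeros((n, n))
for signs in product([-1.0, 1.0], repeat=n):
    D = np.diag(signs)
    avg += D @ A @ D
avg /= 2 ** n
print(np.allclose(avg, np.diag(np.diag(A))))                                   # True
print(k_trace(A, k) ** (1 / k) <= k_trace(np.diag(np.diag(A)), k) ** (1 / k))  # True
```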

Lemma 5.7.5 can be proved, instead, using the concept of majorization. Let $a = \{a_i\}_{i=1}^n$ and $b = \{b_i\}_{i=1}^n$ be two sequences, both in descending order, i.e., $a_1 \ge a_2 \ge \dots \ge a_n$ and $b_1 \ge b_2 \ge \dots \ge b_n$. We say $b$ majorizes $a$, denoted by $b \succ a$, if
$$\sum_{i=1}^k b_i \ge \sum_{i=1}^k a_i, \quad 1 \le k \le n,$$
and the equality holds for $k = n$. It is not hard to show that if $b \succ a$ and $a, b \subset [0,+\infty)$, then $\prod_{i=1}^n b_i \le \prod_{i=1}^n a_i$. Now for any $A \in \mathcal{H}_n^+$, let $\lambda = \{\lambda_i\}_{i=1}^n$ and $a = \{a_i\}_{i=1}^n$ be the eigenvalues and the diagonal entries of $A$ respectively, both in descending order. Then since
$$\sum_{i=1}^k \lambda_i = \max_{\substack{\{v_i\}_{i=1}^k \subset \mathbb{C}^n \\ v_i^* v_j = \delta_{ij}}} \sum_{i=1}^k v_i^* A v_i \ge \sum_{i=1}^k e_i^* A e_i = \sum_{i=1}^k a_i, \quad 1 \le k < n,$$
and $\sum_{i=1}^n \lambda_i = \operatorname{Tr}[A] = \sum_{i=1}^n a_i$, we have $\lambda \succ a$. Therefore we know that
$$\det[A] = \prod_{i=1}^n \lambda_i \le \prod_{i=1}^n a_i = \det[\mathrm{diag}(A)].$$


Then using the equivalent definition (5.13) of the k-trace, we have that, for any $1 \le k \le n$,
$$
\begin{aligned}
\operatorname{Tr}_k[A] &= \sum_{1 \le i_1 < i_2 < \dots < i_k \le n} \det\bigl[ A_{(i_1 \cdots i_k,\, i_1 \cdots i_k)} \bigr] \\
&\le \sum_{1 \le i_1 < i_2 < \dots < i_k \le n} A_{i_1 i_1} A_{i_2 i_2} \cdots A_{i_k i_k} = \operatorname{Tr}_k[\mathrm{diag}(A)].
\end{aligned}
$$
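The majorization argument above is also easy to confirm numerically: for a random PSD matrix, the sorted eigenvalues majorize the sorted diagonal entries (Schur–Horn), and consequently $\det[A] \le \det[\mathrm{diag}(A)]$. A minimal check:

```python
import numpy as np

rng = np.random.default_rng(7)

n = 6
X = rng.standard_normal((n, n))
A = X @ X.T                                    # random PSD matrix

lam = np.sort(np.linalg.eigvalsh(A))[::-1]     # eigenvalues, descending
a = np.sort(np.diag(A))[::-1]                  # diagonal entries, descending

# Schur-Horn: the eigenvalues majorize the diagonal entries
partial_sums_ok = all(lam[:j].sum() >= a[:j].sum() - 1e-12 for j in range(1, n))
print(partial_sums_ok, np.isclose(lam.sum(), a.sum()))
# hence det[A] = prod(lam) <= prod(a) = det[diag(A)]
print(np.prod(lam) <= np.prod(a) + 1e-12)
```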

Theorem 5.7.6. Given a function $f: [0,+\infty) \to [0,+\infty)$, if $f$ is monotone increasing (or decreasing) as a scalar function, then $\phi(f(\cdot))$ is monotone increasing (or decreasing) on $\mathcal{H}_n^+$, in the sense that $\phi(f(A)) \ge \phi(f(B))$ (or $\phi(f(A)) \le \phi(f(B))$) if $A, B \in \mathcal{H}_n^+$, $A \succeq B$; if $f$ is concave as a scalar function, then $\phi(f(\cdot))$ is concave on $\mathcal{H}_n^+$.

Proof. We first prove the monotonicity preserving property of $\phi$. For any matrix $A \in \mathcal{H}_n$, we denote by $\lambda_i(A)$ the $i$th largest eigenvalue of $A$. For any $A, B \in \mathcal{H}_n^+$, if $A \succeq B$, then $\lambda_i(A) \ge \lambda_i(B)$, $1 \le i \le n$. Therefore if $f$ is monotone increasing, we immediately have
$$\lambda_i(f(A)) = f(\lambda_i(A)) \ge f(\lambda_i(B)) = \lambda_i(f(B)), \quad 1 \le i \le n,$$
and thus $\phi(f(A)) \ge \phi(f(B))$ by the definition of $\phi$. Similarly, if $f$ is monotone decreasing, we have
$$\lambda_i(f(A)) = f(\lambda_{n-i+1}(A)) \le f(\lambda_{n-i+1}(B)) = \lambda_i(f(B)), \quad 1 \le i \le n,$$
and thus $\phi(f(A)) \le \phi(f(B))$.

Next we prove the concavity preserving property of $\phi$. Given any $A, B \in \mathcal{H}_n^+$ and any $\tau \in [0,1]$, we define $C = \tau A + (1-\tau)B$. Let $U = [u_1, u_2, \cdots, u_n]$ be a unitary matrix whose columns are eigenvectors of $C$; then
$$U^* f(C) U = f(U^* C U) = f(\mathrm{diag}(U^* C U)).$$
If $f$ is concave, then
$$f(u_i^* C u_i) = f\bigl(\tau u_i^* A u_i + (1-\tau) u_i^* B u_i\bigr) \ge \tau f(u_i^* A u_i) + (1-\tau) f(u_i^* B u_i), \quad 1 \le i \le n,$$


and thus $f(\mathrm{diag}(U^* C U)) \succeq \tau f(\mathrm{diag}(U^* A U)) + (1-\tau) f(\mathrm{diag}(U^* B U))$. Further, for any unit vector $u \in \mathbb{C}^n$, we have
$$f(u^* A u) = f\Bigl( \sum_{i=1}^n \lambda_i(A)\, u^* v_i v_i^* u \Bigr) = f\Bigl( \sum_{i=1}^n \lambda_i(A) |v_i^* u|^2 \Bigr) \ge \sum_{i=1}^n |v_i^* u|^2 f\bigl(\lambda_i(A)\bigr) = u^* \Bigl( \sum_{i=1}^n f\bigl(\lambda_i(A)\bigr) v_i v_i^* \Bigr) u = u^* f(A) u,$$
where $v_1, v_2, \cdots, v_n$ are the eigenvectors of $A$, and we have used that $\sum_{i=1}^n |v_i^* u|^2 = \|u\|_2^2 = 1$. Then we have $f(\mathrm{diag}(U^* A U)) \succeq \mathrm{diag}(U^* f(A) U)$. Similarly, we have $f(\mathrm{diag}(U^* B U)) \succeq \mathrm{diag}(U^* f(B) U)$. Finally, we can compute

$$
\begin{aligned}
\phi\bigl( f(C) \bigr) &= \phi\bigl( U^* f(C) U \bigr) = \phi\bigl( f(\mathrm{diag}(U^* C U)) \bigr) \\
&\ge \phi\bigl( \tau f(\mathrm{diag}(U^* A U)) + (1-\tau) f(\mathrm{diag}(U^* B U)) \bigr) \\
&\ge \tau\, \phi\bigl( f(\mathrm{diag}(U^* A U)) \bigr) + (1-\tau)\, \phi\bigl( f(\mathrm{diag}(U^* B U)) \bigr) \\
&\ge \tau\, \phi\bigl( \mathrm{diag}(f(U^* A U)) \bigr) + (1-\tau)\, \phi\bigl( \mathrm{diag}(f(U^* B U)) \bigr) \\
&\ge \tau\, \phi\bigl( f(U^* A U) \bigr) + (1-\tau)\, \phi\bigl( f(U^* B U) \bigr) \\
&= \tau\, \phi\bigl( U^* f(A) U \bigr) + (1-\tau)\, \phi\bigl( U^* f(B) U \bigr) \\
&= \tau\, \phi\bigl( f(A) \bigr) + (1-\tau)\, \phi\bigl( f(B) \bigr).
\end{aligned}
$$
We have used Lemma 5.7.5 for the last inequality above. Therefore $\phi(f(\cdot))$ is concave on $\mathcal{H}_n^+$.
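As a numerical illustration of Theorem 5.7.6 (not a proof), one can test both preservation properties for a concrete concave increasing $f$, say $f(t) = \sqrt{t}$, with $\phi = \mathrm{Tr}_k[\cdot]^{1/k}$. The helper names below are hypothetical:

```python
import numpy as np
from itertools import combinations
from math import prod

rng = np.random.default_rng(8)

def k_trace(M, k):
    w = np.linalg.eigvalsh(M)
    return sum(prod(c) for c in combinations(w, k))

def mat_func(M, f):
    # apply a scalar function to a Hermitian PSD matrix through its spectrum
    w, V = np.linalg.eigh(M)
    return (V * f(np.clip(w, 0.0, None))) @ V.T

def rand_psd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T

n, k, tau = 5, 3, 0.4
phi_f = lambda M: k_trace(mat_func(M, np.sqrt), k) ** (1 / k)   # f(t) = sqrt(t)

for _ in range(200):
    A, B = rand_psd(n), rand_psd(n)
    # concavity preserving
    assert phi_f(tau * A + (1 - tau) * B) >= tau * phi_f(A) + (1 - tau) * phi_f(B) - 1e-9
    # monotonicity preserving: A + B >= A in the Loewner order since B is PSD
    assert phi_f(A + B) >= phi_f(A) - 1e-9
print("phi(f(.)) is monotone and concave on these samples")
```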


Chapter 6

CONCLUDING DISCUSSIONS

Problems related to PD matrices have aroused great interest in many fields of science. In this thesis, we developed and discussed theories and algorithms for a general class of PD matrices on three main problems: solving a PD linear system, computing the eigenpairs of a PD matrix, and studying the concentration of random PD matrices. Our methods integrate ideas from a variety of existing methodologies and prove superior in many particular examples.

The problem of approximating the inverse of a PD operator with localized basis functions is important in both the physical and data sciences. To pursue this matter, we propose in the first part of the thesis an operator compression framework based on the notion of energy decomposition. The energy decomposition $A = \sum_{k=1}^{m} E_k$ extracts the hidden topological and geometric information of a PD matrix $A$, which serves the purpose of finding an adaptive partitioning P of the finest underlying structure. Specifically, we introduce two important local measurements, the error factor ε(P) and the condition factor δ(P), to provide rigorous a priori estimates for our partitioning technique. These two factors can be computed efficiently by solving local and partial eigenproblems, and they are therefore practically useful in designing our algorithms. Once an appropriate partitioning is established, we follow the ideas of the modified coarse space by Målqvist and Peterseim [75] and the gamblet transform by Owhadi [88] to construct an effective coarse-level basis Ψ consisting of basis functions with exponentially decaying profiles. The exponential decay of Ψ enables us to replace it with a localized basis without compromising the expected compression accuracy; as a consequence, the compressed operator preserves the intrinsic sparsity of the original operator A, ensuring the low complexity of our approach in practice. With all the preceding ingredients, we propose a nearly-linear time algorithm to obtain an appropriate partitioning and a localized basis for compressing the operator with prescribed accuracy and bounded condition number.

Having a generic operator compression framework, we follow the idea in [88] to extend the compression scheme hierarchically to form an MMD algorithm. The main


idea is to decompose the operator into multiple scales of resolution,
$$A^{-1} = \sum_{k=1}^{K} U^{(k)} \bigl( (U^{(k)})^T A\, U^{(k)} \bigr)^{-1} (U^{(k)})^T + \Psi^{(K)} \bigl( (\Psi^{(K)})^T A\, \Psi^{(K)} \bigr)^{-1} (\Psi^{(K)})^T,$$

such that the relative condition number in each scale can be bounded. By passing the energy decomposition from the finest level to the coarsest level, we perform our partitioning and basis construction techniques level by level in a recursive manner. This provides flexibility and convenience in dealing with various, and even unknown, multiresolution behaviors appearing in the matrix. Our MMD method further leads us to develop a nearly-linear time solver for large, sparse PD systems up to an error ε with time complexity
$$O\!\left(m \log n \cdot \bigl(\log \varepsilon^{-1} + \log n\bigr)^{c} \log \varepsilon^{-1}\right),$$
where m is the number of nonzero entries of A and c is some absolute constant depending only on the geometric property of A.

Our groundwork introduces the idea of energy decomposition and its applications in operator compression and solving PD linear systems. We believe that the energy framework may prompt further research. In particular, we identified several possible improvements of our algorithms during the development stage. For instance, due to the pairing characteristic of Algorithm 4, we are fairly certain that our partitioning algorithm is not optimal. Instead, the clustering problem could be reformulated as a local optimization problem so that the construction of the partition P becomes more robust. Second, our current implementation is a combination of MATLAB and C++ code, and no parallel computing is included. Therefore, one of our future works is to develop an optimized implementation so that more comparison experiments with state-of-the-art algorithms can be conducted.

On the application side, we believe that our energy decomposition framework can be specifically modified to suit the purpose of solving elliptic PDEs with high contrast coefficients, as demonstrated in the high-contrast problem in Section 2.4.2. Based on our energy decomposition, more in-depth analysis and improvement could be made to show that this framework is a promising candidate for solving elliptic problems with highly varying coefficients. Moreover, regarding the partitioning procedure and the locality of the basis in our algorithms, the localized MMD solver can be further improved to accommodate frequent updates of the solver. For example, in a graph Laplacian system, our MMD solver can be updated


dynamically if new vertices or edges are added to the given graph. This dynamic update greatly reduces the time for regenerating the solver, especially when the update size is small.

Following the preceding MMD framework, we propose a hierarchically preconditioned eigensolver to compute a relatively large number of leftmost eigenpairs of a sparse PD matrix. This eigensolver exploits the well-conditioned property of the decomposition components obtained through the MMD, the favorable convergence properties of the IRLM [64], and the preconditioning characteristics of the CG method. In particular, we develop an extension-refinement iterative scheme, in which eigenpairs are hierarchically extended and refined from the ones obtained at the previous level up to a desired number and a prescribed accuracy. To find new eigenvector candidates in the complement space, we need to solve linear systems with respect to a compressed operator Ast at each level. For this purpose, we introduce a specially designed spectrum-preserving preconditioner M that corrects the orthogonality of the eigenvectors of the compressed operator, yielding a small restricted condition number for the PCG method. Based on the design of our method, we present theoretical analysis of its runtime complexity and asymptotic behavior. We have also conducted quantitative numerical experiments and performance comparisons with the IRLM, which demonstrate the efficiency and effectiveness of our proposed algorithm. The results show that our preconditioning technique remarkably reduces the number of iterations in each call of the PCG method.

We remark that the proposed algorithm and its implementation are still at an early stage, as the main purpose of this work is to explore the possibility of integrating the multiresolution operator compression framework with a Krylov-type iterative eigensolver. Therefore, one future topic is to conduct comprehensive numerical studies of our algorithm on various large-scale, real data, such as graph Laplacians of real network data, or stiffness matrices stemming from the discretization of high-contrast elliptic PDEs. These studies will help numerically confirm the asymptotic behavior of the relative condition numbers of M and Ast, especially when we need to compute a large number of leftmost eigenpairs of large-scale operators.

Another possible research direction is to investigate the parallelization of our algorithm. This is important when we solve a large-scale eigenvalue problem. One way to implement parallelization is to modify the underlying Lanczos method into a block version. Based on the conventional Lanczos iteration, many researchers


have proposed block Lanczos algorithms [33, 42] for computing the leading eigenpairs of symmetric matrices. Instead of computing only one candidate vector, a block Lanczos method applies the target operator to a block of vectors in each iteration and hence expands the Krylov subspace much faster; a minimal sketch is given below. We may also incorporate this block technique into our eigensolver and hence reduce its time complexity if parallel computation is available.
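The following is a minimal block-Krylov (block Lanczos-type) sketch with full reorthogonalization and a Rayleigh–Ritz extraction, included only to illustrate the idea of applying the operator to a block of vectors per iteration; it is not the eigensolver developed in this thesis, and the function name `block_krylov_eigs` is hypothetical.

```python
import numpy as np

def block_krylov_eigs(A, num_eigs, block_size=4, num_iters=10, seed=0):
    """Minimal block-Krylov (block Lanczos-type) sketch with full
    reorthogonalization: apply A to a block of vectors per iteration,
    then extract Ritz pairs from the assembled subspace."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    V, _ = np.linalg.qr(rng.standard_normal((n, block_size)))
    basis = [V]
    for _ in range(num_iters):
        W = A @ V                              # one block of matrix-vector products
        Q = np.hstack(basis)
        W -= Q @ (Q.T @ W)                     # full reorthogonalization against the basis
        V, _ = np.linalg.qr(W)
        basis.append(V)
    Q = np.hstack(basis)
    T = Q.T @ A @ Q                            # Rayleigh-Ritz projection
    theta, Y = np.linalg.eigh(T)
    idx = np.argsort(theta)[::-1][:num_eigs]   # keep the largest Ritz values
    return theta[idx], Q @ Y[:, idx]

# quick demo on a symmetric matrix with well-separated leading eigenvalues
rng = np.random.default_rng(1)
Qr, _ = np.linalg.qr(rng.standard_normal((300, 300)))
spectrum = np.concatenate([np.array([100.0, 90.0, 80.0, 70.0, 60.0]),
                           rng.uniform(0.0, 1.0, 295)])
A = (Qr * spectrum) @ Qr.T
vals, _ = block_krylov_eigs(A, num_eigs=5)
print(np.allclose(vals, spectrum[:5], atol=1e-8))
```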

In the last part of the thesis, we turn to a more theoretical project on developing concentration inequalities for eigenvalue sums. To obtain bounds on partial sums of eigenvalues, we introduce the notion of k-trace and discuss its properties. With the help of the k-trace, we show that the sum of the k largest (or smallest) eigenvalues of a general class of random PD matrices also obeys a Chernoff-type tail bound, generalizing the existing estimates on the largest (or smallest) eigenvalue [122]. These new estimates provide theoretical guarantees for randomized algorithms in eigenvalue-related problems, in particular for random spectral sparsification of PD matrices. Our concentration results are consequences of a generalized Lieb's concavity theorem stating that the map
$$(A, B) \longmapsto \operatorname{Tr}_k\bigl[ (B^{\frac{q}{2}} K^* A^{p} K B^{\frac{q}{2}})^{s} \bigr]^{\frac{1}{k}}$$
is jointly concave on $\mathcal{H}_n^+ \times \mathcal{H}_m^+$ for any $p, q \in [0,1]$, $s \in [0, \frac{1}{p+q}]$ and any $K \in \mathbb{C}^{n\times m}$, which extends the classic Lieb's concavity theorem [67] from the ordinary trace to k-trace functions. Our proof of the generalized Lieb's concavity theorem relies on the properties of k-trace functions and the use of an operator interpolation technique due to Stein [116].

We would like to remark that our generalized Lieb's concavity theorem (Theorem 5.3.3) can be further extended to a more general class of functions. We say a function $\phi: \mathcal{H}_n^+ \to \mathbb{R}$ is a monotone concave symmetric form if

(i) $\phi(U^* A U) = \phi(A)$ for any $A \in \mathcal{H}_n^+$ and unitary matrix $U \in \mathbb{C}^{n\times n}$,

(ii) $A \preceq B$ implies $\phi(A) \le \phi(B)$ for any $A, B \in \mathcal{H}_n^+$, and

(iii) $\phi(\tau A + (1-\tau)B) \ge \tau\phi(A) + (1-\tau)\phi(B)$ for any $A, B \in \mathcal{H}_n^+$ and any $\tau \in [0,1]$.

It has been proved by the author of this thesis [54] that for any monotone concave symmetric form $\phi: \mathcal{H}_n^+ \to \mathbb{R}$, the map
$$(A, B) \longmapsto \phi\bigl( (B^{\frac{q}{2}} K^* A^{p} K B^{\frac{q}{2}})^{s} \bigr)$$


is jointly concave on $\mathcal{H}_n^+ \times \mathcal{H}_m^+$ for all $p, q \in [0,1]$, $s \in [0, \frac{1}{p+q}]$ and all $K \in \mathbb{C}^{n\times m}$. Notice that the k-trace function $\operatorname{Tr}_k[\cdot]^{\frac{1}{k}}$ is a monotone concave symmetric form for all $1 \le k \le n$. Therefore the more general result in [54] covers our Theorem 5.3.3.

As we have mentioned earlier, it is possible to extend our concentration inequalities on the sum of the largest (or smallest) eigenvalues to analogous results on the sum of arbitrary successive eigenvalues. In fact, Gittens and Tropp [40] have established concentration inequalities for arbitrary single eigenvalues by projecting the random matrix onto a subspace. We may therefore combine our k-trace methods with their subspace argument to obtain concentration results on the sum of arbitrary successive eigenvalues.

Apart from its particular use in proving our concentration inequalities, the k-trace is also of theoretical interest in itself, as it has many interpretations corresponding to different aspects of matrix theory. For instance, the k-trace has a direct connection to the theories of mixed discriminants and antisymmetric matrix tensors. However, so far we have not found another specific example in which an explicit use of the k-trace is crucial. As for future projects, we want to explore more theoretical applications of the k-trace, which may extend the corresponding theories on matrix trace functions to an even broader scope.


BIBLIOGRAPHY

[1] W. Aiello, F. Chung, and L. Lu. A random graph model for massive graphs. In Proceedings of the Thirty-second Annual ACM Symposium on Theory of Computing, STOC '00, pages 171–180, New York, NY, USA, 2000. ACM.

[2] A. Aleksandrov. On the theory of mixed volumes of convex bodies. IV. Mixed discriminants and mixed volumes. Mat. Sb. (NS), 3(1):938, 1938.

[3] T. Ando. Concavity of certain maps on positive definite matrices and applications to Hadamard products. Linear Algebra and its Applications, 26:203–241, 1979.

[4] H. Araki. On an inequality of Lieb and Thirring. Letters in Mathematical Physics, 19(2):167–170, 1990.

[5] J. Ashburner. A fast diffeomorphic image registration algorithm. Neuroimage, 38(1):95–113, 2007.

[6] R. Bapat. Mixed discriminants of positive semidefinite matrices. Linear Algebra and its Applications, 126:107–124, 1989.

[7] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst. Templates for the solution of linear systems: Building blocks for iterative methods. SIAM, 1994.

[8] K.-J. Bathe and E. L. Wilson. Numerical methods in finite element analysis, volume 197. Prentice-Hall, Englewood Cliffs, NJ, 1976.

[9] M. Belkin, I. Matveeva, and P. Niyogi. Regularization and semi-supervised learning on large graphs. In International Conference on Computational Learning Theory, pages 624–638. Springer, 2004.

[10] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, volume 14, pages 585–591, 2001.

[11] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of machine learning research, 7(Nov):2399–2434, 2006.

[12] L. Bergamaschi and E. Bozzo. Computing the smallest eigenpairs of the graph Laplacian. SeMA Journal, pages 1–16, 2015.

[13] L. Bergamaschi, G. Gambolati, and G. Pini. Asymptotic convergence of conjugate gradient methods for the partial symmetric eigenproblem. Numerical linear algebra with applications, 4(2):69–84, 1997.


[14] R. Bishop and S. Goldberg. Tensor Analysis on Manifolds. Dover Books on Mathematics. Dover Publications, 1980.

[15] F. L. Bookstein. Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on pattern analysis and machine intelligence, 11(6):567–585, 1989.

[16] Y. Boykov and V. Kolmogorov. Computing geodesics and minimal surfaces via graph cuts. In ICCV, volume 3, pages 26–33, 2003.

[17] E. Bozzo and M. Franceschet. Effective and efficient approximations of the generalized inverse of the graph Laplacian matrix with an application to current-flow betweenness centrality. arXiv preprint arXiv:1205.4894, 2012.

[18] E. Bozzo and M. Franceschet. Resistance distance, closeness, and betweenness. Social Networks, 35(3):460–469, 2013.

[19] S. C. Brenner and C. Carstensen. Finite element methods. Encyclopedia of computational mechanics, 2004.

[20] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.

[21] D. Cai, X. He, J. Han, and T. S. Huang. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1548–1560, 2011.

[22] D. Calvetti, L. Reichel, and D. C. Sorensen. An implicitly restarted Lanczos method for large symmetric eigenvalue problems. Electronic Transactions on Numerical Analysis, 2(1):21, 1994.

[23] Y. Cao, M. I. Miller, R. L. Winslow, and L. Younes. Large deformation diffeomorphic metric mapping of vector fields. IEEE transactions on medical imaging, 24(9):1216–1230, 2005.

[24] E. Carlen. Trace inequalities and quantum entropy: an introductory course. Entropy and the quantum, 529:73–140, 2010.

[25] E. A. Carlen, R. L. Frank, and E. H. Lieb. Some operator and trace function convexity theorems. Linear Algebra and its Applications, 490:174–185, 2016.

[26] E. A. Carlen, R. L. Frank, and E. H. Lieb. Inequalities for quantum divergences and the Audenaert–Datta conjecture. Journal of Physics A: Mathematical and Theoretical, 51(48):483001, 2018.

[27] E. A. Carlen and E. H. Lieb. A Minkowski type trace inequality and strong subadditivity of quantum entropy II: Convexity and concavity. Letters in Mathematical Physics, 83(2):107–126, Feb 2008.


[28] T. F. Chan and J. Zou. Additive Schwarz domain decomposition methods for elliptic problems on unstructured meshes. Numerical Algorithms, 8(2):329–346, 1994.

[29] K. Chaudhuri, F. Chung, and A. Tsiatas. Spectral clustering of graphs with general degrees in the extended planted partition model. In Conference on Learning Theory, pages 35–1, 2012.

[30] F. R. Chung. Spectral graph theory, volume 92. American Mathematical Soc., 1997.

[31] S. Cocco, R. Monasson, and M. Weigt. From principal component to direct coupling analysis of coevolution in proteins: Low-eigenvalue modes are needed for structure prediction. PLoS Computational Biology, 9(8):e1003176, 2013.

[32] C. Colovos and T. O. Yeates. Verification of protein structures: Patterns of nonbonded atomic interactions. Protein science, 2(9):1511–1519, 1993.

[33] J. Cullum and W. E. Donath. A block Lanczos algorithm for computing the q algebraically largest eigenvalues and a corresponding eigenspace of large, sparse, real symmetric matrices. In 1974 IEEE Conference on Decision and Control including the 13th Symposium on Adaptive Processes, pages 505–509. IEEE, 1974.

[34] A. d'Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. SIAM review, 49(3):434–448, 2007.

[35] W. F. Donoghue. Monotone matrix functions and analytic continuation. 1974.

[36] A. Ebadian, I. Nikoufar, and M. E. Gordji. Perspectives of matrix convex functions. Proceedings of the National Academy of Sciences, 108(18):7313–7314, 2011.

[37] H. Epstein. Remarks on two theorems of E. Lieb. Communications in Mathematical Physics, 31(4):317–325, 1973.

[38] P. Erdős and A. Rényi. On random graphs I. Publicationes Mathematicae (Debrecen), 6:290–297, 1959.

[39] J. Francis. The QR transformation: a unitary analogue to the LR transformation. I. Comput. J., 4:265–271, 1961.

[40] A. Gittens and J. A. Tropp. Tail bounds for all eigenvalues of a sum of random matrices. arXiv preprint arXiv:1104.4513, 2011.

[41] S. Goedecker. Low complexity algorithms for electronic structure calculations. Journal of Computational Physics, 118(2):261–268, 1995.


[42] R. G. Grimes, J. G. Lewis, and H. D. Simon. A shifted block Lanczos algorithm for solving sparse symmetric generalized eigenproblems. SIAM Journal on Matrix Analysis and Applications, 15(1):228–272, 1994.

[43] T. Guhr, A. Müller-Groeling, and H. A. Weidenmüller. Random-matrix theories in quantum physics: common concepts. Physics Reports, 299(4-6):189–425, 1998.

[44] W. Hackbusch. Multi-grid methods and applications, volume 4. Springer Science & Business Media, 2013.

[45] J. Hadamard et al. Théorème sur les séries entières. Acta Mathematica, 22:55–63, 1899.

[46] E. Heinz. Beiträge zur störungstheorie der spektralzerlegung. Mathematische Annalen, 123(1):415–438, 1951.

[47] F. Hiai. Concavity of certain matrix trace and norm functions. Linear Algebra and its Applications, 439(5):1568–1589, 2013.

[48] F. Hiai. Concavity of certain matrix trace and norm functions. II. Linear Algebra and its Applications, 496:193–220, 2016.

[49] F. Hiai et al. Concavity of certain matrix trace functions. Taiwanese Journal of Mathematics, 5(3):535–554, 2001.

[50] F. Hiai, R. König, and M. Tomamichel. Generalized log-majorization and multivariate trace inequalities. In Annales Henri Poincaré, volume 18, pages 2499–2521. Springer, 2017.

[51] I. I. Hirschman. A convexity theorem for certain groups of transformations. Journal d'Analyse Mathématique, 2(2):209–218, 1952.

[52] T. Y. Hou, D. Huang, K. C. Lam, and P. Zhang. An adaptive fast solver for a general class of positive definite matrices via energy decomposition. Multiscale Modeling & Simulation, 16(2):615–678, 2018.

[53] Y. T. Hou and P. Zhang. Sparse operator compression of higher-order elliptic operators with rough coefficients. Research in Mathematical Sciences, in press, 2017.

[54] D. Huang. Generality of Lieb's concavity theorem. arXiv preprint arXiv:1906.00002, 2019.

[55] I. Jolliffe. Principal component analysis. Wiley Online Library, 2002.

[56] I. T. Jolliffe and J. Cadima. Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065):20150202, 2016.


[57] R. Kolluri, J. R. Shewchuk, and J. F. O'Brien. Spectral surface reconstruction from noisy point clouds. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH symposium on Geometry processing, pages 11–21. ACM, 2004.

[58] H. Kosaki. Interpolation theory and the Wigner–Yanase–Dyson–Lieb concavity. Communications in Mathematical Physics, 87(3):315–329, 1982.

[59] I. Koutis, G. L. Miller, and R. Peng. A nearly-m log n time solver for SDD linear systems. In Foundations of Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pages 590–598. IEEE, 2011.

[60] F. Kraus. Über konvexe matrixfunktionen. Mathematische Zeitschrift, 41(1):18–42, 1936.

[61] R. d. L. Kronig and W. G. Penney. Quantum mechanics of electrons in crystal lattices. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character, 130(814):499–513, 1931.

[62] L. Laloux, P. Cizeau, M. Potters, and J.-P. Bouchaud. Random matrix theory and financial correlations. International Journal of Theoretical and Applied Finance, 3(03):391–397, 2000.

[63] M. G. Larson and A. Målqvist. Adaptive variational multiscale methods based on a posteriori error estimation: energy norm estimates for elliptic problems. Computer methods in applied mechanics and engineering, 196(21):2313–2324, 2007.

[64] R. B. Lehoucq and D. C. Sorensen. Deflation techniques for an implicitly restarted Arnoldi iteration. SIAM Journal on Matrix Analysis and Applications, 17(4):789–821, 1996.

[65] R. B. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK users' guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods. SIAM, 1998.

[66] P. Li. The Alexandrov–Fenchel type inequalities, revisited. arXiv preprint arXiv:1710.00520, 2017.

[67] E. H. Lieb. Convex trace functions and the Wigner–Yanase–Dyson conjecture. Advances in Mathematics, 11(3):267–288, 1973.

[68] E. H. Lieb and M. B. Ruskai. Proof of the strong subadditivity of quantum-mechanical entropy. Journal of Mathematical Physics, 14(12):1938–1941, 1973.

[69] Q. Lin and H. Xie. A multi-level correction scheme for eigenvalue problems. Mathematics of Computation, 84(291):71–88, 2015.

[70] G. Lindblad. Entropy, information and quantum measurements. Communications in Mathematical Physics, 33(4):305–322, 1973.


[71] O. E. Livne and A. Brandt. Lean algebraic multigrid (LAMG): Fast graph Laplacian linear solver. SIAM Journal on Scientific Computing, 34(4):B499–B522, 2012.

[72] K. Löwner. Über monotone matrixfunktionen. Mathematische Zeitschrift, 38(1):177–216, 1934.

[73] F. Lust-Piquard. Inégalités de Khintchine dans Cp (1 < p < ∞). CR Acad. Sci. Paris, 303:289–292, 1986.

[74] L. Mackey, M. I. Jordan, R. Y. Chen, B. Farrell, J. A. Tropp, et al. Matrix concentration inequalities via the method of exchangeable pairs. The Annals of Probability, 42(3):906–945, 2014.

[75] A. Målqvist and D. Peterseim. Localization of elliptic multiscale problems. Mathematics of Computation, 83(290):2583–2603, 2014.

[76] Á. Martínez. Tuned preconditioners for the eigensolution of large SPD matrices arising in engineering problems. Numerical Linear Algebra with Applications, 23(3):427–443, 2016.

[77] M. L. Mehta. Random matrices, volume 142. Elsevier, 2004.

[78] L. Meirovitch. Elements of vibration analysis. McGraw-Hill, 1975.

[79] N. Mishra, R. Schreiber, I. Stanton, and R. E. Tarjan. Clustering social networks. In A. Bonato and F. R. K. Chung, editors, Algorithms and Models for the Web-Graph, pages 56–67, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.

[80] B. Moghaddam, Y. Weiss, and S. Avidan. Spectral bounds for sparse PCA: Exact and greedy algorithms. Advances in neural information processing systems, 18:915, 2006.

[81] A. Myronenko and X. Song. Point set registration: Coherent point drift. IEEE transactions on pattern analysis and machine intelligence, 32(12):2262–2275, 2010.

[82] M. Newman. Networks: an introduction. Oxford university press, 2010.

[83] M. E. Newman. Modularity and community structure in networks. Proceedings of the national academy of sciences, 103(23):8577–8582, 2006.

[84] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in neural information processing systems, pages 849–856, 2002.

[85] I. Nikoufar, A. Ebadian, and M. E. Gordji. The simplest proof of Lieb concavity theorem. Advances in Mathematics, 248:531–533, 2013.


[86] R. I. Oliveira. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges, 2009.

[87] R. I. Oliveira. Sums of random Hermitian matrices and an inequality by Rudelson. Electronic Communications in Probability, 15:203–212, 2010.

[88] H. Owhadi. Multigrid with rough coefficients and multiresolution operator decomposition from hierarchical information games. SIAM Review, 59(1):99–149, 2017.

[89] H. Owhadi and C. Scovel. Universal scalable robust solvers from computational information games and fast eigenspace adapted multiresolution analysis. arXiv preprint arXiv:1703.10761, 2017.

[90] V. Ozolinš, R. Lai, R. Caflisch, and S. Osher. Compressed modes for variational problems in mathematics and physics. Proceedings of the National Academy of Sciences, 110(46):18368–18373, 2013.

[91] A. A. Panov. On some properties of mixed discriminants. Sbornik: Mathematics, 56(2):279–293, 1987.

[92] J. Peetre. A theory of interpolation spaces. Notes, Universidade de Brasilia, 1963.

[93] D. Petz. A survey of certain trace inequalities. Banach Center Publications, 30(1):287–298, 1994.

[94] G. Pisier and Q. Xu. Non-commutative martingale inequalities. Communications in mathematical physics, 189(3):667–698, 1997.

[95] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral, T. Guhr, and H. E. Stanley. Random matrix approach to cross correlations in financial data. Physical Review E, 65(6):066126, 2002.

[96] T. Qin and K. Rohe. Regularized spectral clustering under the degree-corrected stochastic blockmodel. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 3120–3128. Curran Associates, Inc., 2013.

[97] M. Riesz et al. Sur les maxima des formes bilinéaires et sur les fonctionnelles linéaires. Acta mathematica, 49(3–4):465–497, 1926.

[98] K. Rohe, S. Chatterjee, B. Yu, et al. Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4):1878–1915, 2011.

[99] E. Romero, M. B. Cruz, J. E. Roman, and P. B. Vasconcelos. A parallel implementation of the Jacobi–Davidson eigensolver for unsymmetric matrices. In VECPAR, pages 380–393. Springer, 2010.


[100] J. J. Rotman. Advanced modern algebra; 2nd ed. Graduate studies in mathematics. American Mathematical Society, Providence, RI, 2010.

[101] M. Rudelson. Random vectors in the isotropic position. Journal of Functional Analysis, 164(1):60–72, 1999.

[102] Y. Saad. Numerical Methods for Large Eigenvalue Problems: Revised Edition. SIAM, 2011.

[103] R. S. Sampath and G. Biros. A parallel geometric multigrid method for finite elements on octree meshes. SIAM Journal on Scientific Computing, 32(3):1361–1392, 2010.

[104] F. Schäfer, T. Sullivan, and H. Owhadi. Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity. arXiv preprint arXiv:1706.02205, 2017.

[105] R. Schneider. Convex bodies: The Brunn–Minkowski theory, second expanded edition. Encyclopedia of Mathematics and its Applications, volume 151, 2014.

[106] B. Schölkopf and A. J. Smola. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press, 2002.

[107] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on pattern analysis and machine intelligence, 22(8):888–905, 2000.

[108] G. L. Sleijpen and H. A. Van der Vorst. A Jacobi–Davidson iteration method for linear eigenvalue problems. SIAM review, 42(2):267–293, 2000.

[109] D. C. Sorensen. Implicit application of polynomial filters in a k-step Arnoldi method. SIAM Journal on Matrix Analysis and Applications, 13(1):357–385, 1992.

[110] D. C. Sorensen. Implicitly restarted Arnoldi/Lanczos methods for large scale eigenvalue calculations. In Parallel Numerical Algorithms, pages 119–165. Springer, 1997.

[111] D. A. Spielman and N. Srivastava. Graph sparsification by effective resistances. SIAM Journal on Computing, 40(6):1913–1926, 2011.

[112] D. A. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pages 81–90. ACM, 2004.

[113] D. A. Spielman and S.-H. Teng. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981–1025, 2011.


[114] D. A. Spielman and S.-H. Teng. A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM Journal on Computing, 42(1):1–26, 2013.

[115] D. A. Spielman and S.-H. Teng. Nearly linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems. SIAM Journal on Matrix Analysis and Applications, 35(3):835–885, 2014.

[116] E. M. Stein. Interpolation of linear operators. Transactions of the American Mathematical Society, 83(2):482–492, 1956.

[117] G. Stewart. Accelerating the orthogonal iteration for the eigenvectors of a Hermitian matrix. Numerische Mathematik, 13(4):362–376, 1969.

[118] K. Stüben. A review of algebraic multigrid. Journal of Computational and Applied Mathematics, 128(1):281–309, 2001.

[119] D. Sutter, M. Berta, and M. Tomamichel. Multivariate trace inequalities. Communications in Mathematical Physics, 352(1):37–58, 2017.

[120] J. A. Tropp. Freedman's inequality for matrix martingales. Electronic Communications in Probability, 16:262–270, 2011.

[121] J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, Aug 2012.

[122] J. A. Tropp. An introduction to matrix concentration inequalities. Foundations and Trends® in Machine Learning, 8(1–2):1–230, 2015.

[123] U. Trottenberg, C. W. Oosterlee, and A. Schuller. Multigrid. Academic press, 2000.

[124] A. Uhlmann. Relative entropy and the Wigner–Yanase–Dyson–Lieb concavity in an interpolation theory. Communications in Mathematical Physics, 54(1):21–32, 1977.

[125] P. Vaněk, J. Mandel, and M. Brezina. Algebraic multigrid by smoothed aggregation for second and fourth order elliptic problems. Computing, 56(3):179–196, 1996.

[126] A. Vershynina, E. A. Carlen, and E. H. Lieb. Matrix and operator trace inequalities. Scholarpedia, 8(4):30919, 2013.

[127] U. Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395–416, 2007.

[128] G. H. Wannier. The structure of electronic excitation levels in insulating crystals. Physical Review, 52(3):191, 1937.


[129] P. Wesseling and C. W. Oosterlee. Geometric multigrid with applications to computational fluid dynamics. Journal of Computational and Applied Mathematics, 128(1):311–334, 2001.

[130] E. P. Wigner. Characteristic vectors of bordered matrices with infinite dimensions I. In The Collected Works of Eugene Paul Wigner, pages 524–540. Springer, 1993.

[131] E. P. Wigner and M. M. Yanase. Information contents of distributions. Proceedings of the National Academy of Sciences of the United States of America, 49(6):910, 1963.

[132] R. M. Wilcox. Exponential operators and parameter differentiation in quantum physics. Journal of Mathematical Physics, 8(4):962–982, 1967.

[133] H. Xie. A multigrid method for eigenvalue problem. Journal of Computational Physics, 274:550–561, 2014.

[134] H. Xie, L. Zhang, and H. Owhadi. Fast eigenpairs computation with operator adapted wavelets and hierarchical subspace correction. SIAM Journal on Numerical Analysis, 57(6):2519–2550, 2019.

[135] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin. Graph embedding and extensions: A general framework for dimensionality reduction. IEEE transactions on pattern analysis and machine intelligence, 29(1), 2007.

[136] H. Zhang. From Wigner–Yanase–Dyson conjecture to Carlen–Frank–Lieb conjecture. Advances in Mathematics, 365:107053, 2020.

[137] H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of computational and graphical statistics, 15(2):265–286, 2006.