Advances in Structured Operator Theory and Related Areas: The Leonid Lerer Anniversary Volume
Operator Theory
Advances and Applications
237
Marinus A. Kaashoek
Leiba Rodman
Hugo J. Woerdeman
Editors
Advances in Structured Operator Theory and Related Areas
The Leonid Lerer Anniversary Volume
Operator Theory: Advances and Applications
Volume 237

Founded in 1979 by Israel Gohberg

Editors:
Joseph A. Ball (Blacksburg, VA, USA)
Harry Dym (Rehovot, Israel)
Marinus A. Kaashoek (Amsterdam, The Netherlands)
Heinz Langer (Vienna, Austria)
Christiane Tretter (Bern, Switzerland)

Associate Editors:
Vadim Adamyan (Odessa, Ukraine)
Wolfgang Arendt (Ulm, Germany)
Albrecht Böttcher (Chemnitz, Germany)
B. Malcolm Brown (Cardiff, UK)
Raul Curto (Iowa City, IA, USA)
Fritz Gesztesy (Columbia, MO, USA)
Pavel Kurasov (Stockholm, Sweden)
Leonid E. Lerer (Haifa, Israel)
Vern Paulsen (Houston, TX, USA)
Mihai Putinar (Santa Barbara, CA, USA)
Leiba Rodman (Williamsburg, VA, USA)
Ilya M. Spitkovsky (Williamsburg, VA, USA)

Honorary and Advisory Editorial Board:
Lewis A. Coburn (Buffalo, NY, USA)
Ciprian Foias (College Station, TX, USA)
J. William Helton (San Diego, CA, USA)
Thomas Kailath (Stanford, CA, USA)
Peter Lancaster (Calgary, Canada)
Peter D. Lax (New York, NY, USA)
Donald Sarason (Berkeley, CA, USA)
Bernd Silbermann (Chemnitz, Germany)
Harold Widom (Santa Cruz, CA, USA)

Subseries Linear Operators and Linear Systems
Subseries editors:
Daniel Alpay (Beer Sheva, Israel)
Birgit Jacob (Wuppertal, Germany)
André C.M. Ran (Amsterdam, The Netherlands)

Subseries Advances in Partial Differential Equations
Subseries editors:
Bert-Wolfgang Schulze (Potsdam, Germany)
Michael Demuth (Clausthal, Germany)
Jerome A. Goldstein (Memphis, TN, USA)
Nobuyuki Tose (Yokohama, Japan)
Ingo Witt (Göttingen, Germany)
Marinus A. Kaashoek
Leiba Rodman
Hugo J. Woerdeman
Editors

Advances in Structured Operator Theory and Related Areas
The Leonid Lerer Anniversary Volume
© Springer Basel 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper
ISBN 978-3-0348-0638-1
ISBN 978-3-0348-0639-8 (eBook)
DOI 10.1007/978-3-0348-0639-8
Springer Basel Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013948131

Mathematics Subject Classification (2010): 47A68, 47B35, 39B42, 93B28, 15A15; 11C20, 47A13, 47A48

Springer is part of Springer Science+Business Media (www.birkhauser-science.com)
Editors

Marinus A. Kaashoek
Department of Mathematics, FEW
VU University
Amsterdam, The Netherlands

Leiba Rodman
Department of Mathematics
College of William and Mary
Williamsburg, VA, USA

Hugo J. Woerdeman
Department of Mathematics
Drexel University
Philadelphia, PA, USA

ISSN 0255-0156
ISSN 2296-4878 (electronic)
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

Portrait of Leonid Lerer . . . . . . . . . . . . . . . . . . . . . . . . . . viii

Leonid Lerer's Curriculum Vitae . . . . . . . . . . . . . . . . . . . . . . ix

Leonid Lerer's List of Publications . . . . . . . . . . . . . . . . . . . . xiii

M.A. Kaashoek
Leonia Lerer's Mathematical Work and Amsterdam Visits . . . . . . . . . . 1

H. Bart
Leonia Lerer and My First Visit to Israel . . . . . . . . . . . . . . . . . 9

I. Karelin
Through the Eyes of a Student . . . . . . . . . . . . . . . . . . . . . . . 11

A.C.M. Ran
Reminiscences on Visits to Haifa . . . . . . . . . . . . . . . . . . . . . . 13

H.J. Woerdeman
My First Research Experience . . . . . . . . . . . . . . . . . . . . . . . . 15

J.A. Ball and V. Bolotnikov
Interpolation in Sub-Bergman Spaces . . . . . . . . . . . . . . . . . . . . 17

H. Bart, T. Ehrhardt and B. Silbermann
Zero Sums of Idempotents and Banach Algebras Failing
to be Spectrally Regular . . . . . . . . . . . . . . . . . . . . . . . . . . 41

T. Bella, Y. Eidelman, I. Gohberg, V. Olshevsky and E. Tyrtyshnikov
Fast Inversion of Polynomial-Vandermonde Matrices for
Polynomial Systems Related to Order One
Quasiseparable Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 79
H. Dym and M. Porat
Long Proofs of Two Carlson–Schneider Type Inertia Theorems . . . . . . . . 107

T. Ehrhardt and I.M. Spitkovsky
On the Kernel and Cokernel of Some Toeplitz Operators . . . . . . . . . . . 127

A.E. Frazho, M.A. Kaashoek and A.C.M. Ran
Rational Matrix Solutions of a Bezout Type Equation
on the Half-plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

M.A. Kaashoek and F. van Schagen
Inverting Structured Operators Related to Toeplitz Plus
Hankel Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

P. Lancaster and I. Zaballa
On the Sign Characteristics of Selfadjoint Matrix Polynomials . . . . . . . 189

Yu.I. Lyubich
Quadratic Operators in Banach Spaces and Nonassociative
Banach Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197

L. Rodman
Strong Stability of Invariant Subspaces of Quaternion Matrices . . . . . . 221

H.J. Woerdeman
Determinantal Representations of Stable Polynomials . . . . . . . . . . . . 241
Preface
This volume is dedicated to Leonid Arie Lerer on the occasion of his seventieth birthday (April 19, 2013). Leonia, as he is known to his friends, is an expert in the theory of structured matrices and operators and related matrix-valued functions. He has been a great inspiration to many.
Leonid Lerer started his mathematical career in Kishinev, Moldova, with Alek Markus and Israel Gohberg as research supervisors. He defended his Ph.D. thesis in 1969 in Kharkov, Ukraine. In December 1973 he immigrated to Israel. Since 1981 he has been a professor at the Technion in Haifa, where at present he has the status of emeritus. He has educated six Ph.D. students and five master's students. His more than 80 papers cover a wide spectrum of topics, ranging from functional analysis and operator theory, linear and multilinear algebra, and ordinary differential equations, to systems and control theory.
This anniversary volume begins with a picture of Leonid Lerer, his Curriculum Vitae and List of Publications, and personal notes written by former students, mathematical friends and colleagues.
The main part of the book consists of a selection of peer-reviewed research articles presenting recent results in areas that are close to Lerer's mathematical interests. This includes articles on Toeplitz, Wiener–Hopf, and Toeplitz plus Hankel operators, Bezout equations, inertia type results, matrix polynomials in one and several variables, and related areas in matrix and operator theory.
We present this book to Leonid Lerer on behalf of all the authors as a token of our respect and gratitude. We wish him many more years of good health and happiness.
March 2013

Rien Kaashoek
Leiba Rodman
Hugo Woerdeman
Leonid Arie Lerer in an undated photograph
Operator Theory: Advances and Applications, Vol. 237, ix–xii
© 2013 Springer Basel
Leonid Lerer's Curriculum Vitae
Date and place of birth: April 19, 1943; USSR
Date of immigration: December 1973
Marital status: Married, 2 children
Academic degrees
1965 M.Sc. Mathematics, Magna Cum Laude, Kishinev State University, Kishinev

1969 Ph.D. Mathematics, Physico-Technical Institute of Low Temperatures of the Academy of Sciences of the Ukrainian SSR, Kharkov
Academic appointments
1969–73 Lecturer, Dept. of Physics and Mathematics, Kishinev State University

1974–75 Senior Researcher, Technion Research & Development Foundation, Technion, Haifa

1975–80 Senior Lecturer, Dept. of Mathematics, Technion, Haifa

1981–88 Assoc. Professor, Dept. of Mathematics, Technion, Haifa

1988 (Feb.)– Full Professor, Dept. of Mathematics, Technion, Haifa
Visiting academic appointments
Visiting Professor, Dept. of Mathematics and Computer Science at Vrije Universiteit, Amsterdam:
Feb. 1984–Feb. 1985, Mar.–Sept. 1990, Apr.–Nov. 2002, Feb.–Apr. 2003, Jul.–Aug. 2003

Invited Guest, Dept. of Mathematics and Computer Science at Vrije Universiteit, Amsterdam:
Jul.–Aug. 1978, 1980, 1982, 1983, 1985–93, 1995, 1996–97, 2000, 2004, Sept. 2005 (3–6 weeks each time)
Invited Guest, Institute for Mathematics and its Applications, University of Minnesota, Minneapolis:
June 1992 (2 weeks)

Invited Guest, The Thomas Stieltjes Institute of Mathematics, Vrije Universiteit, Amsterdam:
Jul.–Aug. 1994 (5 weeks)
Research field
Operator Theory, Systems Theory, Linear Algebra, Integral Equations
Public professional activities
– Editorial board of the international journal "Integral Equations and Operator Theory" (since its foundation in 1977).
– Editorial board of the book series "Operator Theory: Advances and Applications", Birkhäuser Verlag, Basel–Boston–Berlin.
– Special editor of "Linear Algebra and its Applications", 1988 and 2005.
– Editor of "Gohberg Memorial Volumes", Birkhäuser, Basel, 2009–.
– Editor of "Convolution equations and singular integral operators. Selected papers", Birkhäuser, Basel, 2009–2010.
– Organizing committee for:
  1. The third, fourth, fifth, sixth, seventh, ninth, tenth, eleventh, thirteenth, fourteenth and fifteenth Haifa Matrix Theory Conferences, 1985–2007.
  2. International Conference "Operator Theory: Advances and Applications", Calgary, Canada, 1988.
  3. Conference of the International Linear Algebra Society (ILAS), 2001.
  4. Workshops on Operator Theory and Applications, Amsterdam, 2002 and 2003.
– Organizer and Chairman of Invited Special Sessions at:
  1. International Symposium on the Mathematical Theory of Networks and Systems, Amsterdam, The Netherlands, 1989.
  2. Second SIAM Conference on Linear Algebra in Signals, Systems and Control, San Francisco, USA, 1991.
  3. International Symposium on Mathematical Theory of Networks and Systems, St. Louis, Missouri, USA, 1991.
  4. The 14th Matrix Theory Conference, Haifa, 2007.
  5. The 15th Matrix Theory Conference, Haifa, 2009.
  6. International Workshop on Operator Theory and Applications, Williamsburg, USA, 2008.
– Program Committees for the International Symposia of the Mathematical Theory of Networks and Systems:
  1. MTNS '89, Amsterdam, The Netherlands;
  2. MTNS '93, Regensburg, Germany.
Grants and awards
– Israel-U.S. Binational Science Foundation (BSF) Grant, 1988–1992.
– Fellowship of the Netherlands Organization for Scientific Research (NWO), March–June 1990.
– Grants from the Technion V.P.R. Fund, annually since 1979.
– Grants from the Fund for Promotion of Research at the Technion, annually since 1981.
– Israel-U.S. Binational Science Foundation (BSF) Grant, 1995–1999.
– Fellowship of the Netherlands Organization for Scientific Research (NWO), April–September 2002.
– Grant for Promotion of Funded Research ($2,500) (for the proposal submitted to BSF, graded "very good" but not funded by this agency), 2005.
– ISF grant 121/09 for the period 2009–2012, $40,000 yearly (still in progress).
Graduate students
B.A. Kon, M.Sc. 1976
Thesis title: "Asymptotic distribution of the spectra of multidimensional Wiener–Hopf operators and multidimensional singular integral operators".

M. Tismenetsky, M.Sc. 1977
Thesis title: "Spectral analysis of polynomial pencils via the roots of the corresponding matrix equations".

B.A. Kon, Ph.D. 1981
Thesis title: "Operators of Wiener–Hopf types and resultants of analytic functions".

M. Tismenetsky, Ph.D. 1981
Thesis title: "Bezoutians, Toeplitz and Hankel matrices in the spectral theory of matrix polynomials".

H.J. Woerdeman, M.Sc. 1985 (Vrije Univ., Amsterdam)
Thesis title: "Resultant operators for analytic matrix functions in a finitely connected domain".

J. Haimovici, Ph.D. 1991
Thesis title: "Operator equations and Bezout operators for analytic operator functions".

I. Karelin, M.Sc. 1993
Thesis title: "The algebraic Riccati equation and the spectral theory of matrix polynomials".

G. Gomez, Ph.D. 1996
Thesis title: "Bezout operators for analytic operator functions and inversion of structured operators".
I. Karelin, Ph.D. 2000
Thesis title: "Factorization of rational matrix functions, generalized Bezoutians and matrix quadratic equations".

I. Margulis, M.Sc. 2008
Thesis title: "Inertia theorems based on operator equations of Lyapunov type and their applications".

I. Margulis, Ph.D. in progress.
Memberships
– Israel Mathematical Society
– Society for Industrial and Applied Mathematics
– The New York Academy of Sciences
– European Mathematical Society
Invited talks at conferences
– Invited participation at 80 conferences.
– Plenary speaker at:
1. SIAM Conference on Linear Algebra in Signals, Systems and Control, Boston, Mass., USA, 1986.
2. Workshop on Matrix and Operator Theory, Rotterdam, The Netherlands, 1989.
3. Workshop on Linear Algebra for Control Theory, Institute for Mathematics and its Applications, University of Minnesota, USA, June 1992.
4. Colloquium of the Royal Dutch Academy of Arts and Sciences on Challenges of a Generalized System Theory, June 1992.
5. The Second Conference of ILAS, Lisbon, Portugal, August 1992.
6. The Third Conference of ILAS, Rotterdam, The Netherlands, August 1994.
7. International Workshop on Operator Theory and Applications (IWOTA-95), Regensburg, Germany, August 1995.
8. Workshop on Operator Theory and Analysis on the occasion of the 60th birthday of M.A. Kaashoek, Vrije Universiteit, Amsterdam, November 1997.
9. AMS-IMS-SIAM Summer Research Conference on Structured Matrices in Operator Theory, Numerical Analysis, Control, Signal and Image Processing, July 1999.
10. Workshop on Operator Theory, System Theory and Scattering Theory, Beer-Sheba, June 2005.
Leonid (Arie) Lerer
Department of Mathematics
Technion-Israel Institute of Technology
Haifa 32000, Israel
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, xiii–xix
© 2013 Springer Basel
Leonid Lerer's List of Publications
Theses
1. M.Sc. "Localization of zeroes of polynomials and some properties of normal matrices", Kishinev State University, 1965.
2. Ph.D. "Some problems in the theory of linear operators and in the theory of bases in locally convex spaces", Physico-Technical Institute of Low Temperatures of the Academy of Sciences of the Ukrainian SSR, Kharkov, 1969.
Original papers in professional journals
1. L. Lerer, On the diagonal elements of normal matrices, Mat. Issled. 2 (1967), 153–163.
2. L. Lerer, About the spectral theory of bounded operators in a locally convex space, Mat. Issled. 2 (1967), 206–214.
3. L. Lerer, On completeness of the system of root vectors of a Fredholm operator in a locally convex space, Mat. Issled. 3 (1968), 31–60.
4. L. Lerer, Basic sequences in a Montel space, Mat. Zametki 6 (1969), 329–334. Announcement of results: Mat. Issled. 3 (1968), 235–236.
5. L. Lerer, Certain criteria for stability of bases in locally convex spaces, I, Mat. Issled. 4 (1969), 35–55.
6. L. Lerer, Certain criteria for stability of bases in locally convex spaces, II, Mat. Issled. 4 (1969), 42–57.
7. L. Lerer, The stability of bases in locally convex spaces, Dokl. Akad. Nauk U.S.S.R. 184 (1969), 30–33; Soviet Math. Dokl. 10 (1969), 24–28.
8. L. Lerer, On a class of perturbation for operators that admit reduction, Mat. Issled. 6 (1971), 168–173.
9. S. Buzurniuk, L. Lerer and E. Shevchik, An operator modification of Seidel Method, Tezisi Dokl. Nauchn. Konf. Kishinev (1972), 22–23.
10. L. Lerer, The asymptotic distribution of the spectra of finite truncations of Wiener–Hopf operators, Dokl. Akad. Nauk U.S.S.R. 207 (1972), 1035–1038; Soviet Math. Dokl. 13 (1972), 1651–1655.
11. L. Lerer, The asymptotic distribution of the spectra. I. General theorems and the distribution of the spectrum of truncated Wiener–Hopf operators, Mat. Issled. 8 (1972), 141–164.
12. L. Lerer, On the asymptotic distribution of the spectra. II. The distribution of the spectrum of truncated dual operators, Mat. Issled. 8 (1973), 84–95.
13. I. Gohberg and L. Lerer, Resultants of matrix polynomials, Bull. Amer. Math. Soc. 82 (1976), 565–567.
14. I. Gohberg and L. Lerer, Singular integral operators as a generalization of the resultant matrix, Applicable Anal. 7 (1977/78), 191–205.
15. I. Gohberg, L. Lerer and L. Rodman, Factorization indices for matrix polynomials, Bull. Amer. Math. Soc. 84 (1978), 275–277.
16. L. Lerer, On approximating the spectrum of convolution type operators. I. Wiener–Hopf matricial integral operator, Israel J. Math. 30 (1978), 339–362.
17. I. Gohberg and L. Lerer, On resultant operators of a pair of analytic functions, Proc. Amer. Math. Soc. 72 (1978), 65–73.
18. I. Gohberg, L. Lerer and L. Rodman, On canonical factorization of operator polynomials, spectral divisors and Toeplitz matrices, Integral Equations Operator Theory 1 (1978), 176–214.
19. I. Gohberg and L. Lerer, Factorization indices and Kronecker indices, Integral Equations Operator Theory 2 (1979), 199–243. Erratum: Integral Equations Operator Theory 2 (1979), 600–601.
20. I. Gohberg, L. Lerer and L. Rodman, Stable factorization of operator polynomials, I. Spectral divisors simply behaved at infinity, J. Math. Anal. Appl. 74 (1980), 401–431.
21. I. Gohberg, L. Lerer and L. Rodman, Stable factorization of operator polynomials, II. Main results and applications to Toeplitz operators, J. Math. Anal. Appl. 75 (1980), 1–40.
22. I. Gohberg, M.A. Kaashoek, L. Lerer and L. Rodman, Common multiples and common divisors of matrix polynomials, I. Spectral method, Indiana Univ. Math. J. 30 (1981), 321–356.
23. L. Lerer and M. Tismenetsky, The Bezoutian and the eigenvalue separation problem, Integral Equations Operator Theory 5 (1982), 386–445.
24. I. Gohberg, M.A. Kaashoek, L. Lerer and L. Rodman, Common multiples and common divisors of matrix polynomials, II. Vandermonde and resultant, Linear and Multilinear Algebra 12 (1982/83), 159–203.
25. I. Gohberg and L. Lerer, On non-square sections of Wiener–Hopf operators, Integral Equations Operator Theory 5 (1982), 518–532. Errata: Integral Equations Operator Theory 6 (1983), 904.
26. I. Gohberg, L. Lerer and L. Rodman, Wiener–Hopf factorization of piecewise matrix polynomials, Linear Algebra Appl. 52/53 (1983), 315–350.
27. L. Lerer, L. Rodman and M. Tismenetsky, Bezoutian and Schur–Cohn problem for operator polynomials, J. Math. Anal. Appl. 103 (1984), 83–102.
28. P. Lancaster, L. Lerer and M. Tismenetsky, Factored forms for solutions of AX − XB = C and X − AXB = C in companion matrices, Linear Algebra Appl. 62 (1984), 19–49.
29. I. Gohberg, M.A. Kaashoek, L. Lerer and L. Rodman, Minimal divisors of rational matrix functions with prescribed zero and pole structure, in: Topics in operator theory systems and networks (Rehovot, 1983), Oper. Theory Adv. Appl. 12, Birkhäuser Verlag, Basel, 1984, pp. 241–275.
30. L. Lerer and M. Tismenetsky, On the location of spectrum of matrix polynomials, Contemp. Math. 47 (1985), 287–297.
31. B. Kon and L. Lerer, Resultant operators for analytic functions in a simple connected domain, Integral Equations Operator Theory 9 (1986), 106–120.
32. I. Gohberg, M.A. Kaashoek and L. Lerer, Minimality and irreducibility of the time invariant linear boundary value systems, Internat. J. Control 44 (1986), 363–379.
33. I. Gohberg, M.A. Kaashoek, L. Lerer and L. Rodman, On Toeplitz and Wiener–Hopf operators with contourwise rational matrix and operator symbols, in: Constructive methods of Wiener–Hopf factorization, Oper. Theory Adv. Appl. 21, Birkhäuser Verlag, Basel, 1986, pp. 75–126.
34. L. Lerer and M. Tismenetsky, Generalized Bezoutian and the inversion problem for block matrices, I. General scheme, Integral Equations Operator Theory 9 No. 6 (1986), 790–819.
35. L. Lerer and H.J. Woerdeman, Resultant operators and the Bezout equation for analytic matrix functions, I. J. Math. Anal. Appl. 125 (1987), 531–552.
36. L. Lerer and H.J. Woerdeman, Resultant operators and the Bezout equation for analytic matrix functions, II. J. Math. Anal. Appl. 125 (1987), 553–567.
37. I. Gohberg, M.A. Kaashoek and L. Lerer, On minimality in the partial realization problem, Systems Control Lett. 9 (1987), 97–104.
38. I. Gohberg and L. Lerer, Matrix generalizations of M.G. Kreĭn theorems on orthogonal polynomials, in: Orthogonal matrix-valued polynomials and applications (Tel Aviv, 1987/88), Oper. Theory Adv. Appl. 34, Birkhäuser Verlag, Basel, 1987/88, pp. 137–202.
39. L. Lerer and M. Tismenetsky, Generalized Bezoutian and matrix equations, Linear Algebra Appl. 99 (1988), 123–160.
40. I. Gohberg, M.A. Kaashoek and L. Lerer, Nodes and realizations of rational matrix functions: minimality theory and applications, in: Topics in operator theory and interpolation, Oper. Theory Adv. Appl. 29, Birkhäuser Verlag, Basel, 1988, pp. 181–232.
41. L. Lerer, The matrix quadratic equation and factorization of matrix polynomials, in: The Gohberg anniversary collection, Vol. I (Calgary, AB, 1988), Oper. Theory Adv. Appl. 40, Birkhäuser Verlag, Basel, 1989, pp. 279–324.
42. L. Lerer, L. Rodman and M. Tismenetsky, Inertia theorems for matrix polynomials, Linear and Multilinear Algebra 30 (1991), 157–182.
43. I. Gohberg, M.A. Kaashoek and L. Lerer, A directional partial realization problem, Systems Control Lett. 17 (1991), 305–314.
44. I. Gohberg, M.A. Kaashoek and L. Lerer, Minimality and realization of discrete time-varying systems, in: Time-variant systems and interpolation, Oper. Theory Adv. Appl. 56, Birkhäuser Verlag, Basel, 1992, pp. 261–296.
45. L. Lerer and L. Rodman, Spectrum separation and inertia for operator polynomials, J. Math. Anal. Appl. 169 (1992), 260–282.
46. I. Gohberg, M.A. Kaashoek and L. Lerer, Minimal rank completion problems and partial realization, in: Recent Advances in Math. Theory of Systems, Control, Networks and Signal Processing, I, Mita Press, Tokyo, 1992, 65–70.
47. L. Lerer and L. Rodman, Sylvester and Lyapunov equations and some interpolation problems for rational matrix functions, Linear Algebra Appl. 185 (1993), 83–117.
48. I. Koltracht, B.A. Kon and L. Lerer, Inversion of structured operators, Integral Equations Operator Theory 20 (1994), 410–448.
49. G. Gomez and L. Lerer, Generalized Bezoutian for analytic operator functions and inversion of structured operators, in: Systems and Networks: Mathematical Theory and Applications (U. Helmke, R. Mennicken, J. Saures, eds.), Akademie Verlag, 1994, pp. 691–696.
50. L. Lerer and L. Rodman, Bezoutians and factorizations of rational matrix functions and matrix equations, in: Systems and Networks: Mathematical Theory and Applications (U. Helmke, R. Mennicken, J. Saures, eds.), Akademie Verlag, 1994, pp. 761–766.
51. I. Haimovici and L. Lerer, Bezout operators for analytic operator functions. I. A general concept of Bezout operator, Integral Equations Operator Theory 21 (1995), 33–70.
52. L. Lerer and L. Rodman, Inertia of operator polynomials and stability of differential equations, J. Math. Anal. Appl. 192 (1995), 579–606.
53. L. Lerer and L. Rodman, Common zero structure of rational matrix functions, J. Funct. Anal. 136 (1996), 1–38.
54. I. Gohberg, M.A. Kaashoek and L. Lerer, Factorization of banded lower triangular infinite matrices, Linear Algebra Appl. 247 (1996), 347–357.
55. L. Lerer and L. Rodman, Symmetric factorization and localization of zeroes of rational matrix functions, Linear and Multilinear Algebra 40 (1996), 259–281.
56. L. Lerer and L. Rodman, Bezoutian of rational matrix functions, J. Funct. Anal. 141 (1996), 1–36.
57. L. Lerer and A.C.M. Ran, J-pseudo-spectral and J-inner-pseudo-outer factorization for matrix polynomials, Integral Equations Operator Theory 29 (1997), 23–51.
58. L. Lerer and L. Rodman, Inertia theorems for Hilbert space operators based on Lyapunov and Stein equations, Math. Nachr. 198 (1999), 131–148.
59. L. Lerer and L. Rodman, Bezoutian of rational matrix functions, matrix equations and factorizations of rational matrix functions, Linear Algebra Appl. 302-303 (1999), 105–133.
60. I. Karelin and L. Lerer, Generalized Bezoutian, matrix quadratic equations and factorization of rational matrix functions, in: Recent advances in operator theory (Groningen, 1998), Oper. Theory Adv. Appl. 122, Birkhäuser Verlag, Basel, 2001, pp. 303–321.
61. I. Karelin, L. Lerer and A.C.M. Ran, J-symmetric factorization and the algebraic Riccati equation, in: Recent advances in operator theory (Groningen, 1998), Oper. Theory Adv. Appl. 124, Birkhäuser Verlag, Basel, 2001, pp. 319–360.
62. I. Karelin and L. Lerer, Matrix quadratic equations and column/row factorization of matrix polynomials, Int. J. Appl. Math. Comput. Sci. 11 (2001), 1285–1310.
63. L. Lerer and L. Rodman, Inertia bounds for matrix polynomials and applications, in: Linear operators and matrices, Oper. Theory Adv. Appl. 130, Birkhäuser Verlag, Basel, 2002, pp. 255–276.
64. L. Lerer and A. Ran, A new inertia theorem for Stein equations, inertia of invertible block Toeplitz matrices and matrix orthogonal polynomials, Integral Equations Operator Theory 47 (2003), 339–360.
65. L. Lerer, M.A. Petersen and A.C.M. Ran, Existence of minimal nonsquare J-symmetric factorizations for self-adjoint rational matrix functions, Linear Algebra Appl. 379 (2004), 159–178.
66. I. Gohberg, I. Haimovici, M.A. Kaashoek and L. Lerer, The Bezout integral operator: main property and underlying abstract scheme, in: The state space method generalizations and applications, Oper. Theory Adv. Appl. 161, Birkhäuser Verlag, Basel, 2005, pp. 225–270.
67. I. Gohberg, M.A. Kaashoek and L. Lerer, Quasi-commutativity of entire matrix functions and the continuous analogue of the resultant, in: Modern operator theory and applications, Oper. Theory Adv. Appl. 170, Birkhäuser Verlag, Basel, 2007, pp. 101–106.
68. I. Gohberg, M.A. Kaashoek and L. Lerer, The continuous analogue of the resultant and related convolution operators, in: The Extended Field of Operator Theory, Oper. Theory Adv. Appl. 171, Birkhäuser, Basel, 2007, 107–127.
69. I. Gohberg, M.A. Kaashoek and L. Lerer, On a class of entire matrix function equations, Linear Algebra Appl. 425 (2007), 434–442.
70. I. Gohberg, M.A. Kaashoek and L. Lerer, The inverse problem for Kreĭn orthogonal matrix functions, (Russian) Funktsional. Anal. i Prilozhen. 41 (2007), 44–57; translation in Funct. Anal. Appl. 41 (2007), 115–125.
71. L. Lerer, I. Margulis and A.C.M. Ran, Inertia theorems based on operator Lyapunov equations, Oper. Matrices 2 (2008), 153–166.
72. I. Gohberg, M.A. Kaashoek and L. Lerer, The resultant for matrix polynomials and quasi commutativity, Indiana Univ. Math. J. 57 (2008), 2793–2813.
73. D. Alpay, I. Gohberg, M.A. Kaashoek, L. Lerer and A. Sakhnovich, Kreĭn systems, in: Modern analysis and applications. The Mark Kreĭn Centenary Conference. Vol. 2: Differential operators and mechanics, Oper. Theory Adv. Appl. 191, Birkhäuser Verlag, Basel, 2009, pp. 19–36.
74. M.A. Kaashoek and L. Lerer, Quasi-commutativity of regular matrix polynomials: resultant and Bezoutian, in: Topics in operator theory. Volume 1. Operators, matrices and analytic functions, Oper. Theory Adv. Appl. 202, Birkhäuser Verlag, Basel, 2010, pp. 297–314.
75. M.A. Kaashoek, L. Lerer and I. Margulis, Kreĭn orthogonal entire matrix functions and related Lyapunov equations: a state space approach, Integral Equations Operator Theory 65 (2009), 223–242.
76. D. Alpay, I. Gohberg, M.A. Kaashoek, L. Lerer and A.L. Sakhnovich, Kreĭn systems and canonical systems on a finite interval: accelerants with a jump discontinuity at the origin and continuous potentials, Integral Equations Operator Theory 68 (2010), 115–150.
77. L. Lerer and A.C.M. Ran, The discrete algebraic Riccati equation and Hermitian block-Toeplitz matrices, in: A panorama of modern operator theory and related topics, Oper. Theory Adv. Appl. 218, Birkhäuser/Springer Basel AG, Basel, 2012, pp. 495–512.
78. M.A. Kaashoek and L. Lerer, The band method and inverse problems for orthogonal functions of Szegő–Kreĭn type, Indag. Math. (N.S.) 23 (2012), 900–920.
79. M.A. Kaashoek and L. Lerer, On a class of matrix polynomial equations, Linear Algebra Appl. 439 (2013), 613–620.
Edited books
1. Convolution equations and singular integral operators. Edited by L. Lerer, V. Olshevsky and I. Spitkovsky, Oper. Theory Adv. Appl. 206, Birkhäuser Verlag, Basel, 2010.
2. A panorama of modern operator theory and related topics. The Israel Gohberg memorial volume. Edited by Harry Dym, Marinus A. Kaashoek, Peter Lancaster, Heinz Langer and Leonid Lerer. Oper. Theory Adv. Appl. 218, Birkhäuser/Springer Basel AG, Basel, 2012.
Other publications
(Reports listed were not fully incorporated in papers published elsewhere)
1. I. Gohberg, L. Lerer and L. Rodman, On factorization, indices and completely decomposable matrix polynomials, Technical report 80-47, Tel-Aviv University, 1980, 72 pages.
2. I.A. Feldman, Wiener–Hopf operator equations and their application to the transport equation. Translated from the Russian by C.G. Lekkerker, L. Lerer and R. Troelstra. Integral Equations Operator Theory 3 (1980), 43–61.
3. I. Gohberg, M.A. Kaashoek, L. Lerer and L. Rodman, Common multiples and common divisors of matrix polynomials, II. Vandermonde and resultant matrices, Technical report 80-53, Tel-Aviv University, 1981, 122 pages.
4. L. Lerer and H.J. Woerdeman, Resultant operators and the Bezout equation for analytic matrix functions, Rapport No. 299, Vrije Universiteit, Amsterdam, 1985, 54 pages.
5. H. Bart, M.A. Kaashoek and L. Lerer, Review of "Matrix Polynomials" by I. Gohberg, P. Lancaster and L. Rodman, Linear Algebra Appl. 64 (1985), 167–172.
6. L. Lerer and M. Tismenetsky, Toeplitz classification of matrices and inversion formulas, II. Block-Toeplitz and perturbed block-Toeplitz matrices, Technical Report, IBM SC, Haifa, 1986, 38 pages.
7. L. Lerer and A.C.M. Ran, On a new inertia theorem and orthogonal polynomials, Proceedings of the Sixteenth International Symposium on Mathematical Theory of Networks and Systems (MTNS), Leuven, Belgium, 2004.
8. A. Berman, L. Lerer and R. Loewy, Preface to the 2005 Haifa Matrix Theory Conference at the Technion, Linear Algebra Appl. 416 (2006), 15–16.
9. M.A. Kaashoek and L. Lerer, Gohberg's mathematical work in the period 1998–2008, in: Israel Gohberg and his Friends, eds. H. Bart, T. Hempfling, M.A. Kaashoek, Birkhäuser Verlag, Basel, 2008, pp. 111–115.
10. L. Lerer, V. Olshevsky and I. Spitkovsky, Introduction, in: Convolution equations and singular integral operators, Oper. Theory Adv. Appl. 206, Birkhäuser Verlag, Basel, 2010, pp. ix–xxii.
Operator Theory: Advances and Applications, Vol. 237, 1–7. © 2013 Springer Basel
Leonia Lerer's Mathematical Work and Amsterdam Visits
M.A. Kaashoek
It is a great pleasure to congratulate Professor Lerer on the occasion of his 70th birthday and to wish him many happy returns.

I will refer to Professor Lerer as Leonia, the name used by his Dutch friends. Arie would have been an alternative. The latter is a very common Dutch name, but few know about its Hebrew meaning. Leonid is out of the question; too many political recollections.
Leonia Lerer, Bill Helton, Harm Bart, Israel Gohberg, Joe Pincus. VU University campus, 1979
Leonia and I met for the first time in October 1976 at the Mathematisches Forschungsinstitut Oberwolfach, a wonderful conference resort in the southwestern part of Germany (at that time the Bundesrepublik Deutschland), in the Black Forest near Freiburg. It was a very special meeting, organized by Gohberg, Gramsch and Neubauer, with a select group of 39 participants, including a strong US delegation consisting of Kevin Clancey, Lewis Coburn, Chandler Davis, Ron Douglas, Bill
Helton, and Joe Pincus. From Israel, besides Gohberg, three other participants: Harry Dym, Paul Fuhrmann, and Leonia Lerer. Directly after this meeting Ron Douglas and Leonia came to Amsterdam. It was Leonia's first visit. Three years later, again directly after an Oberwolfach meeting, we had a mini-conference in Amsterdam with the five persons on the picture on the previous page as the main lecturers. If I remember it correctly, this was Leonia's third visit to Amsterdam.
Many other visits were to follow. There were short visits of about 3 to 5 weeks, in total more than 20, often supported by the Dutch National Science Foundation. Apart from that, having a sabbatical at the Technion, Leonia held visiting professorships at the VU University, for four periods:
– February 1984 – February 1985,
– March – September 1990,
– April – November 2002,
– February – April and July – August 2003.
In his first period as visiting professor, Leonia supervised the master's thesis of Hugo Woerdeman, now professor at Drexel University and co-editor of this volume. The joint work of Leonia and Hugo resulted in two papers on "Resultant operators and the Bezout equation for analytic matrix functions," both of which appeared in 1987 in the Journal of Mathematical Analysis and Applications. Later, in an acknowledgment in his PhD thesis, Hugo wrote: I am indebted to professor Lerer, who introduced me in a very stimulating way to mathematical research.

When one lives in the Netherlands for so many years as Leonia did, one learns the local customs and uses the local means of transportation:
On the bicycle with Israel Gohberg.
Leonia and I have 24 joint papers; of these 24 papers, 19 were written jointly with Israel Gohberg. Our many meetings, in the Netherlands as well as in Israel, and our joint work belong to the gratifying experiences of my life. Andre Ran is Leonia's second co-author at the VU Amsterdam; together Leonia and Andre wrote 7 joint papers on topics involving $J$-spectral factorization and inertia theorems.
What did Leonia talk about at Oberwolfach in 1976? Here I present the abstract of his talk as it appears in the Tagungsbericht:

The abstract has three elements. First element: from polynomials to analytic functions; second element: from resultant matrices to resultant operators; and third element: singular integral operators as generalized resultant matrices. It was a beautiful lecture.

Leonia's work has a wonderful mix of matrix and operator theory on the one hand and matrix function theory on the other hand. It reminds me of a statement Paul Halmos once made in an interview. He said: I still have this religion that if you know the answer to every matrix question, somehow you answer every operator question. I do not believe in this statement, and I think Leonia does not either; there is much more two-way traffic between the two fields. But Leonia's talk at Oberwolfach certainly provided some support for the Halmos religion.

Resultants and Bezout operators form a main theme in Leonia's work. About one third of his papers after 1976 have the word resultant or Bezout in the title. His talk in Oberwolfach was based on his first papers after immigration to Israel. They appeared in the Bulletin and Proceedings of the AMS, both in 1976, and both co-authored by Gohberg.

His work in this area is partially motivated by mathematical system and control theory, with location of zeros and problems of stability as main themes. The famous Anderson–Jury paper in IEEE Transactions on Automatic Control from 1976 served as a source of inspiration for his later work on generalized Bezout operators. As a further illustration of Leonia's work I will discuss in more detail
his 1994 paper, with Israel Koltracht and Ben Kon as co-authors. I consider this article one of Leonia's top papers. It has a short title:
The KKL paper deals with bounded linear integro-differential operators $A$ on $L^2[0,\pi]$ of which the action is given by
\[
(Af)(x) = \frac{d}{dx}\int_0^{\pi}\Big(\frac{\partial}{\partial t}\,\Phi(x,t)\Big)f(t)\,dt,
\qquad
\Phi(x,t) = \frac{1}{2}\sum_{j=1}^{n}\int_{x+t}^{2\pi-|x-t|} a_j\Big(\frac{s+x-t}{2}\Big)\,b_j\Big(\frac{s-x+t}{2}\Big)\,ds.
\]
Here $a_j, b_j \in L^2[0,\pi]$, $j = 1, 2, \dots, n$. If $A$ is as above, we say that $A$ belongs to the KKL class, and we shall refer to $a_j, b_j \in L^2[0,\pi]$, $j = 1, 2, \dots, n$, as the defining data.
The paper has two beautiful theorems. To state the first theorem we need the Volterra integral operator on $L^2[0,\pi]$ which is defined by
\[
(Vf)(x) = -i\int_0^{x} f(t)\,dt, \qquad 0 \le x \le \pi.
\]
Furthermore, with the defining data $a_j, b_j \in L^2[0,\pi]$, $j = 1, 2, \dots, n$, and the Volterra integral operator we associate the Lyapunov equation
\[
VR - RV^* = \sum_{j=1}^{n} a_j\langle\,\cdot\,, b_j\rangle. \tag{1}
\]
Note that the right-hand side of the Lyapunov equation (1) is an operator of finite rank. If this operator is zero, then $R$ is also equal to zero. Thus, using terminology common for structured matrices, an operator $R$ satisfying (1) is an operator with a relatively small displacement. The authors proved that these operators $R$ with a relatively small displacement are precisely the integro-differential operators $A$ introduced above. This is the first beautiful theorem.
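The matrix counterpart of this small-displacement idea is easy to check numerically: a Toeplitz matrix has displacement rank at most 2 with respect to the lower shift. The following sketch (a finite-dimensional illustration only, not the KKL construction itself) verifies this with NumPy.

```python
import numpy as np

n = 8
rng = np.random.default_rng(1)
t = rng.standard_normal(2 * n - 1)
# Toeplitz matrix: entry (i, j) depends only on the difference i - j
T = np.array([[t[i - j + n - 1] for j in range(n)] for i in range(n)])

Z = np.diag(np.ones(n - 1), -1)   # lower shift matrix
D = T - Z @ T @ Z.T               # Sylvester-type displacement

# Only the first row and first column of D can be nonzero,
# so the displacement has rank at most 2.
assert np.allclose(D[1:, 1:], 0)
assert np.linalg.matrix_rank(D) <= 2
```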
Theorem 1. [KKL-1994] An operator $A$ on $L^2[0,\pi]$ belongs to the KKL class if and only if for some $a_j, b_j \in L^2[0,\pi]$, $j = 1, 2, \dots, n$, the operator $R = A$ satisfies the identity (1).
Now assume that $A$ satisfies the Lyapunov equation (1), and let $A$ be invertible. Multiplying (1) from the left and from the right by the inverse of $A$ yields
another Lyapunov operator identity, which is analogous to (1):
\[
VA^{-1} - A^{-1}V^* = \sum_{j=1}^{n} \gamma_j\langle\,\cdot\,, \beta_j\rangle, \quad\text{where } A\beta_j = b_j \text{ and } A\gamma_j = a_j.
\]
A variant of the first theorem now yields the second, which can be viewed as an operator analogue of the finite-dimensional inversion theorems for Toeplitz matrices due to Gohberg–Semencul and Gohberg–Heinig.

Theorem 2. [KKL-1994] Assume $A$ belongs to the KKL class with defining data $a_j, b_j \in L^2[0,\pi]$, $j = 1, 2, \dots, n$, and let $A$ be a Fredholm operator. If the equations
\[
A\beta_j = b_j \quad\text{and}\quad A\gamma_j = a_j \qquad (j = 1, \dots, n)
\]
are solvable, then $A$ has a bounded inverse and $A^{-1}$ is given by
\[
(A^{-1}g)(x) = \frac{d}{dx}\int_0^{\pi}\Big(\frac{\partial}{\partial t}\,\psi(x,t)\Big)g(t)\,dt,
\]
where
\[
\psi(x,t) = \sum_{j=1}^{n}\int_0^{\min(x,t)} \gamma_j(x-s)\,\beta_j(t-s)\,ds.
\]
The KKL paper presents lots of examples, another characteristic of Leonia's work. Here we mention the following two, both taken from the KKL paper.
Example 1. Let $k \in L^1[-\pi,\pi]$, and let $A$ be the operator from the KKL class defined by the following data:
\[
a_1(x) = 1, \qquad a_2(x) = 1 + \int_0^{\pi-x} k(s)\,ds,
\]
\[
b_1(x) = \int_{x-\pi}^{0} k(s)\,ds, \qquad b_2(x) = 1, \qquad 0 \le x \le \pi.
\]
Then $A$ is the convolution operator given by
\[
(Af)(t) = f(t) + \int_0^{\pi} k(t-s)\,f(s)\,ds.
\]
Example 2. Let $f, g \in L^2[0,\pi]$, and let $A$ be the operator from the KKL class defined by the following data:
\[
a_1(x) = g(x), \qquad a_2(x) = f(x),
\]
\[
b_1(x) = 1 - f(\pi-x), \qquad b_2(x) = 1 - g(\pi-x), \qquad 0 \le x \le \pi.
\]
Then $A$ is the Bezout operator defined by the entire functions
\[
F(z) = 1 + iz\int_0^{\pi} e^{izt} f(t)\,dt \quad\text{and}\quad G(z) = 1 + iz\int_0^{\pi} e^{izt} g(t)\,dt.
\]
Other themes in the work of Leonia are:
– Operators in locally convex spaces
– Asymptotic distribution of spectra and related limit theorems
– Rational matrix functions
– Spectral theory of matrix and operator polynomials
– Inverse problems for Szegő–Krein orthogonal polynomials and their continuous analogs
– Partial realization problems
– Minimality of partial realization problems in discrete time, including multivariable systems

The majority of my joint papers with Leonia belong to the areas described by the last four bullets.
Leonia and Israel Gohberg
Gohberg had a profound influence on the work of Leonia. He was Leonia's mathematical (grand-)father, and according to MathSciNet Gohberg is Leonia's top co-author. Gohberg's passing away in October 2009 was a great personal loss for Leonia, as for many of us. For Leonia it meant a setback for a long period.
Leonia and Alek Markus, Amsterdam (November 29, 2002)
Alek Markus is Leonia's doctor-father. Of course, the mathematical son followed his mathematical fathers in many ways, but not always. On one particular non-mathematical point, it was the other way around: the mathematical son was leading and his mathematical fathers were following. Of the three, Leonia was the first to immigrate from Kishinev to Israel. Gohberg followed later and Markus many years later.

I conclude with best wishes: for Leonia personally, for Bertha his wife, for his two daughters Hannah and Safira, and for the new family member Gal, the husband of Safira. It is my sincere hope that both of us will have the time and energy to continue our joint work.
M.A. Kaashoek
Department of Mathematics, Faculty of Sciences
VU University
De Boelelaan 1081a
NL-1081 HV Amsterdam, The Netherlands
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 9–10. © 2013 Springer Basel
Leonia Lerer and My First Visit to Israel
H. Bart
In 1981, my wife Greetje and I went to Israel for the first time. I was to attend the Toeplitz Memorial Conference in Tel-Aviv organized by Israel Gohberg. But before going there we went to Haifa. There, at the Technion, I gave my very first lecture in Israel. The topic was "New methods for solving integral equations". Leonia Lerer was our host. We had already met him before, during one of his visits to Amsterdam.

For both Greetje and me, coming to Israel meant something special. Being children of traditionally protestant parents, we were raised with the stories from the Bible. So we were excited to be at the places we had heard about so much. We made several trips, some on our own, but at least one with Leonia. He took us to Rosh HaNikra, a white chalk cliff face located on the coast of the Mediterranean Sea, which opens up into spectacular interconnected grottos. Leonia's daughter Hanna went with us, doing the driving. Bertha, Leonia's wife, could not join us because she was pregnant. A little later Saffira was born.
We had good contacts as families. Several times Leonia was our guest in The Netherlands, sometimes on his own, sometimes accompanied by other family members. In preparing this note, my wife and I went through our old pictures in order to find traces of such get-togethers. The photographs we found brought back good memories. The picture above, showing Leonia in our garden talking with Roland Duduchava, was taken in 2003 during a celebration of Israel Gohberg's 75th birthday.

Leonia: congratulations on your seventieth birthday and best wishes to you and your family for the future. Also thanks for everything you did in connection with the many Haifa matrix theory conferences.
H. Bart
Econometric Institute
Erasmus University Rotterdam
P.O. Box 1738
NL-3000 DR Rotterdam, The Netherlands
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 11–12. © 2013 Springer Basel
Through the Eyes of a Student
Irina Karelin
I began my graduate studies in 1989 with Professor Lerer. I made my aliyah in March 1989. The main reason to come to Haifa was to study at the Technion – Israel Institute of Technology. I was a graduate of Kharkov University (Ukraine) in Mathematics and wanted to continue my studies. My degree thesis was in the area of operator theory, and I of course expected to find somebody who was working in this area and would agree to be my supervisor for master's studies. I came to the Department of Mathematics at the Technion to Professor Abraham Berman. He asked me about my studies in Kharkov and referred me to Professor Lerer.

We talked about my previous studies and my final student work. Leonid told me that the subject of my work was rather close to one of his areas of interest, that he had some problems to be solved, and he proposed that I do my master's degree under his supervision. I was happy to accept his offer and we started to work together. To approach the subject of your future research, Leonid said, you have to read a lot. He gave me a long list of articles and some books for reading. Among these works were the books Matrix Polynomials by Gohberg, Lancaster and Rodman, The Theory of Matrices by Lancaster and Tismenetsky, articles by Lerer and his joint works with Rodman, Tismenetsky and some other authors. In the beginning everything seemed to me unknown and new, but I could always turn to Leonid with any question. I understood later that all that Leonid advised me to do was well thought out and methodically correct. All the chapters and articles he chose contributed much to my entering into the subject.

At the same time, during my first semester at the Technion, I attended Leonid's lectures for graduate students on realizations and factorizations of rational matrix functions. I liked him as a lecturer. It was not only my impression; I heard this opinion from other students. His teaching manner was calm and pleasing. His explanations were always very clear and accurate. The atmosphere he created was always good and friendly.

The subject of my research was chosen. It was related to factorizations of column and row reduced matrix polynomials. At the beginning our meetings were not so frequent; later on we met much more often. But every meeting was very useful and productive for me. Sometimes my research progress was good and smooth; sometimes I felt frustrated by unsuccessful attempts to find a solution of some
problem. Even in such crisis moments of my work I knew that a great mathematician and very good man, Professor Lerer, would always encourage and support me, turn me in the right direction, and give useful advice.

Leonid himself worked very intensively and also had many unsolved problems for his students. When I decided to study for a PhD degree and Leonid agreed to continue with me, there was no question what to do. He proposed a number of opportunities for my research. The theme of my work was chosen. It was connected to factorization of rational matrix functions and Riccati equations. Leonid's style of supervision was as it had to be. He didn't generally tell me, and I suppose other students as well, what to do. He encouraged our independent research and didn't restrict our freedom, but he would warn us of some things and tried to steer us towards others. So when I prepared my conference report his assistance was very important and fruitful. He helped me to choose the most profitable results for the presentation, to build the report in the right form, and to anticipate possible questions.

Many times I ran tutorials in the courses where Leonid taught. Students appreciated Leonid as a lecturer very much. I heard from students that they liked his explanations, liked his logical and consistent presentation of material, illustrated with many examples and solved problems. Working with Leonid as his teaching assistant, I acquired wide experience for my future professional life.

Papers of Leonid always had sharp form and profound content. Statements and proofs of results were exact; explanations at the same time were sufficient and never redundant. Leonid also asked of his students exact statements and logical, full explanations. When I prepared my PhD thesis, Leonid read the written text and returned it to me many times until the version of the thesis was good.

Today I work in industry as an algorithm developer. While working with Professor Lerer I acquired many skills very useful in my professional activities. Leonid taught me to do a thorough literature review, helped me to develop my mathematical thinking and my writing skills, and taught me to be an independent researcher.
Dear Professor Lerer! I want to express my deep gratitude and admiration!Happy birthday!
Irina Karelin
CmosaiX, Ltd.
Kibutz Yagur 30065, Israel
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 13. © 2013 Springer Basel
Reminiscences on Visits to Haifa
Andre C.M. Ran
My first contacts with Leonid Lerer date from the middle of the eighties, when I was still a PhD student at the VU University. Leonia visited Amsterdam regularly, and we maintained warm contacts without actually working together. This changed when I visited Israel for a somewhat longer time in 1996. We started a collaboration which produced a first paper in 1997, and I became a frequent visitor to Haifa after this. All in all, we wrote six papers together, several with students of Leonia.

To visit the Technion and work with Leonia was always a great pleasure. The mathematics was wonderful, and the warm personality of Leonia made visiting there a joy. Leonia was aware of the sensitivities of my family regarding travel to Israel, and so he made sure I was met at the airport by a trusted driver. I was also not allowed to leave campus without a local to guide me.

In early 1998 I visited Haifa for the Tenth Haifa Matrix Meeting. Before leaving I talked to my father, who was terminally ill. A few days into my visit in Haifa my sister called me to inform me that my father had passed away quietly on January 7. I will always be grateful for the support and warmth that I received from Leonia on that day and the following one. Obviously, I had to arrange a very quick return to the Netherlands, and Leonia helped and comforted me as much as possible.

Fortunately, I could return to Haifa for several more Haifa Matrix Meetings and an ILAS Meeting over the course of the years, most of them combined with some extra time to work together with Leonia.

I would like to close with a heartfelt thanks to Leonia for his inspiring lead in our joint work and for his warm friendship.
Andre C.M. Ran
Afdeling wiskunde, FEW
Vrije Universiteit
De Boelelaan 1081a
NL-1081 HV Amsterdam, The Netherlands
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 15. © 2013 Springer Basel
My First Research Experience
Hugo J. Woerdeman
My first exposure to mathematical research was under the supervision of Professor Leonid ("Leonia") Lerer during the academic year 1984–1985. From the very beginning we had a very amicable relationship, and what will always stand out in my memory is his consistently calm and friendly demeanor. Leonia was always encouraging, and he never lost his patience, even during the slow periods. He also managed to enjoy life to the fullest – among other things, he savored the fresh Dutch raw herring during lunch. It was a great environment for me to enter this completely new endeavor. I learned about resultants of analytic functions, singular integral operators, and many other things I had never heard of, and was expected to generalize the scalar case to the matrix case. So I learned about realizations, Jordan triples, matrix factorizations, and a lot of other good stuff. So, how did we manage to make a breakthrough? At some point Leonia encouraged me to work through a specific example, and I remember generating pages and pages of computations with matrices that very quickly became quite large. And then at some point things clicked: I could see the big picture and focus in all those pages of scribbles on the steps where something real had happened. And the rest is history, as they say, as after a year of hard work we were able to put two papers together, resulting in my first two publications.

So on this occasion of your 70th birthday, Leonia, I would like to thank you very much for the excellent guidance you gave me during my initial research experience, and for the many years of friendship that followed. I will never forget the way my first mathematical breakthrough came about, and it has been a guide for me ever since. Thank you!
Hugo J. Woerdeman
Department of Mathematics
Drexel University
Philadelphia, PA 19104, USA
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 17–39. © 2013 Springer Basel
Interpolation in Sub-Bergman Spaces
Joseph A. Ball and Vladimir Bolotnikov
Dedicated to Leonia Lerer, a long-time friend and colleague
Abstract. A general interpolation problem with operator argument is studied for functions $f$ from sub-Bergman spaces associated with an analytic function $S$ mapping the open unit disk $\mathbb{D}$ into the closed unit disk.
Mathematics Subject Classification (2010). 30E05, 47A57, 46E22.
Keywords. De Branges–Rovnyak space, Schur-class function.
1. Introduction
Given a Hilbert space $\mathcal{Y}$ and a positive integer $n$, we denote by $\mathcal{A}_n(\mathcal{Y})$ the standard weighted Bergman space of $\mathcal{Y}$-valued functions $f$ analytic on the open unit disk $\mathbb{D}$ and with finite norm $\|f\|_{\mathcal{A}_n(\mathcal{Y})}$:
\[
\mathcal{A}_n(\mathcal{Y}) = \Big\{ f(z) = \sum_{j\ge 0} f_j z^j : \|f\|^2_{\mathcal{A}_n(\mathcal{Y})} := \sum_{j\ge 0} \omega_{n,j}\,\|f_j\|^2_{\mathcal{Y}} < \infty \Big\} \tag{1.1}
\]
where the weights $\omega_{n,j}$ are defined by
\[
\omega_{n,j} := \binom{j+n-1}{j}^{-1} = \frac{j!\,(n-1)!}{(j+n-1)!}. \tag{1.2}
\]
It is clear from (1.1) that the spaces $\mathcal{A}_1(\mathcal{Y})$ and $\mathcal{A}_2(\mathcal{Y})$ are respectively the standard vector-valued Hardy space $H^2(\mathcal{Y})$ and the unweighted Bergman space $\mathcal{A}_2(\mathcal{Y})$ of the unit disk. It then follows that $\mathcal{A}_n(\mathcal{Y})$ is the reproducing kernel Hilbert space with reproducing kernel
\[
k_n(z,\zeta) = \frac{1}{(1 - z\bar\zeta)^n}\, I_{\mathcal{Y}}
\]
and that for $n > 1$, the $\mathcal{A}_n(\mathcal{Y})$-norm equals
\[
\|f\|^2_{\mathcal{A}_n(\mathcal{Y})} = \int_{\mathbb{D}} (n-1)\,\|f(z)\|^2_{\mathcal{Y}}\,(1-|z|^2)^{n-2}\,dA(z) < \infty
\]
where $dA$ is the planar Lebesgue measure normalized so that $A(\mathbb{D}) = 1$.
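As a quick sanity check on (1.2), the weights can be computed directly: for $n = 1$ they are identically $1$ (the Hardy space $H^2$), and for $n = 2$ they reduce to $1/(j+1)$ (the unweighted Bergman space). A small Python sketch:

```python
from math import comb

def omega(n, j):
    # omega_{n,j} = 1 / C(j+n-1, j) = j!(n-1)!/(j+n-1)!
    return 1 / comb(j + n - 1, j)

# n = 1: all weights equal 1, recovering the Hardy space H^2
assert all(omega(1, j) == 1 for j in range(20))

# n = 2: omega_{2,j} = 1/(j+1), recovering the unweighted Bergman space
assert all(abs(omega(2, j) - 1 / (j + 1)) < 1e-12 for j in range(20))
```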
For Hilbert spaces $\mathcal{U}$ and $\mathcal{Y}$, we denote by $\mathcal{L}(\mathcal{U},\mathcal{Y})$ the space of bounded linear operators mapping $\mathcal{U}$ into $\mathcal{Y}$ (abbreviated to $\mathcal{L}(\mathcal{U})$ in case $\mathcal{U} = \mathcal{Y}$) and we define the operator-valued Schur class $\mathcal{S}(\mathcal{U},\mathcal{Y})$ to be the class of analytic functions $S$ on $\mathbb{D}$ whose values $S(z)$ are contraction operators in $\mathcal{L}(\mathcal{U},\mathcal{Y})$. Each Schur-class function $S \in \mathcal{S}(\mathcal{U},\mathcal{Y})$ induces a contractive multiplication operator $M_S : f \mapsto Sf$ from $\mathcal{A}_n(\mathcal{U})$ into $\mathcal{A}_n(\mathcal{Y})$ for every $n \ge 1$; in fact the Schur class is exactly the class of contractive multipliers on $\mathcal{A}_n(\mathcal{Y})$ for each $n = 1, 2, \dots$ (see, e.g., [14, Proposition 4.1]). The latter property is equivalent to the positivity of the kernels
\[
K_{S,n}(z,\zeta) = \frac{I_{\mathcal{Y}} - S(z)S(\zeta)^*}{(1 - z\bar\zeta)^n} \qquad (n \ge 1) \tag{1.3}
\]
on $\mathbb{D}\times\mathbb{D}$ for all $n \ge 1$. Thus the positivity of $K_{S,n}$ for some $n \ge 1$ already guarantees the membership $S \in \mathcal{S}(\mathcal{U},\mathcal{Y})$, so that $K_{S,n}$ is positive for all $n \ge 1$.

Thus, with any function $S \in \mathcal{S}(\mathcal{U},\mathcal{Y})$, one can associate a family of positive kernels (1.3) and thereby a family of reproducing kernel Hilbert spaces $\mathcal{H}(K_{S,n})$. The spaces $\mathcal{H}(K_{S,1})$ were introduced by de Branges and Rovnyak [11, 12] as a convenient and natural setting for canonical operator models. Further developments on de Branges–Rovnyak spaces can be found in [18]. The study of $\mathcal{H}(K_{S,2})$ was initiated in [20, 21] and there are very few publications concerning the case where $n > 2$ (see, e.g., [19]).
The general complementation theory applied to the contractive operator $M_S : \mathcal{A}_n(\mathcal{U}) \to \mathcal{A}_n(\mathcal{Y})$ (see, e.g., [18]) provides the characterization of $\mathcal{H}(K_{S,n})$ as the operator range
\[
\mathcal{H}(K_{S,n}) = \operatorname{Ran}(I - M_S M_S^*)^{\frac12} \subset \mathcal{A}_n(\mathcal{Y})
\]
with the lifted norm
\[
\|(I - M_S M_S^*)^{\frac12} f\|^2_{\mathcal{H}(K_{S,n})} = \|(I - P)f\|^2_{\mathcal{A}_n(\mathcal{Y})} \tag{1.4}
\]
where $P$ is the orthogonal projection onto $\operatorname{Ker}(I - M_S M_S^*)^{\frac12}$. It follows that $\|f\|_{\mathcal{H}(K_{S,n})} \ge \|f\|_{\mathcal{A}_n(\mathcal{Y})}$ for every $f \in \mathcal{H}(K_{S,n})$ and thus the spaces $\mathcal{H}(K_{S,n})$ are contractively included in $\mathcal{A}_n(\mathcal{Y})$. For this reason, the spaces $\mathcal{H}(K_{S,n})$ are termed sub-Hardy (in [18] for $n = 1$) and sub-Bergman (in [20], [21] for $n = 2$). Upon setting $f = (I - M_S M_S^*)^{\frac12} h$ in (1.4) we get
\[
\|(I - M_S M_S^*)h\|^2_{\mathcal{H}(K_{S,n})} = \langle (I - M_S M_S^*)h, h\rangle_{\mathcal{A}_n(\mathcal{Y})}. \tag{1.5}
\]
The purpose of this paper is to study an interpolation problem of Nevanlinna–Pick type in the space $\mathcal{H}(K_{S,n})$. To formulate the problem we need several definitions.

A pair $(E, T)$ consisting of operators $T \in \mathcal{L}(\mathcal{X})$ and $E \in \mathcal{L}(\mathcal{X},\mathcal{Y})$ is called an output pair. An output pair $(E, T)$ is called $n$-output-stable if the associated $n$-observability operator
\[
\mathcal{O}_{n,E,T} : x \mapsto E(I - zT)^{-n}x = \sum_{j=0}^{\infty} \big(\omega_{n,j}^{-1}\, E T^j x\big)\, z^j \tag{1.6}
\]
maps $\mathcal{X}$ into $\mathcal{A}_n(\mathcal{Y})$ and is bounded. For an $n$-output stable pair $(E, T)$, we define the tangential functional calculus $f \mapsto (E^* f)^{\wedge L}(T^*)$ on $\mathcal{A}_n(\mathcal{Y})$ by
\[
(E^* f)^{\wedge L}(T^*) = \sum_{j=0}^{\infty} T^{*j} E^* f_j \quad\text{if}\quad f(z) = \sum_{j=0}^{\infty} f_j z^j \in \mathcal{A}_n(\mathcal{Y}). \tag{1.7}
\]
The computation
\[
\Big\langle \sum_{j=0}^{\infty} T^{*j} E^* f_j,\; x \Big\rangle_{\mathcal{X}}
= \sum_{j=0}^{\infty} \langle f_j,\, E T^j x\rangle_{\mathcal{Y}}
= \sum_{j=0}^{\infty} \omega_{n,j}\,\langle f_j,\, \omega_{n,j}^{-1} E T^j x\rangle_{\mathcal{Y}}
= \langle f,\, \mathcal{O}_{n,E,T}\, x\rangle_{\mathcal{A}_n(\mathcal{Y})}
\]
shows that the $n$-output stability of $(E, T)$ is exactly what is needed to verify that the infinite series in the definition (1.7) of $(E^* f)^{\wedge L}(T^*)$ converges in the weak topology on $\mathcal{X}$. The same computation shows that tangential evaluation with operator argument amounts to the adjoint of $\mathcal{O}_{n,E,T}$ (in the metric of $\mathcal{A}_n(\mathcal{Y})$):
\[
(E^* f)^{\wedge L}(T^*) = \mathcal{O}^*_{n,E,T}\, f \quad\text{for } f \in \mathcal{A}_n(\mathcal{Y}). \tag{1.8}
\]
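In finite dimensions the chain of equalities above can be verified directly. The sketch below (random data with hypothetical dimensions, truncating all Taylor series at $N$ terms) checks that $\langle f, \mathcal{O}_{n,E,T}x\rangle$ in the weighted $\mathcal{A}_n$ inner product agrees with $\langle (E^*f)^{\wedge L}(T^*), x\rangle$:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(2)
dx, dy, N, n = 3, 2, 50, 2
T = rng.standard_normal((dx, dx))
T *= 0.5 / np.linalg.norm(T, 2)            # ||T|| = 1/2, so the series converges
E = rng.standard_normal((dy, dx))
w = np.array([1 / comb(j + n - 1, j) for j in range(N)])   # omega_{n,j}

f = rng.standard_normal((N, dy))           # Taylor coefficients f_j of f
x = rng.standard_normal(dx)

Ox = np.empty((N, dy))                     # coefficients of O_{n,E,T} x
lhs = np.zeros(dx)                         # partial sum of T*^j E* f_j
Tj = np.eye(dx)
for j in range(N):
    Ox[j] = (E @ Tj @ x) / w[j]            # omega^{-1} E T^j x
    lhs += Tj.T @ E.T @ f[j]               # real case: adjoint = transpose
    Tj = Tj @ T

# <f, Ox>_{A_n} = sum_j omega_{n,j} <f_j, (Ox)_j> equals <(E*f)^{wedge L}(T*), x>
inner_An = np.sum(w * np.sum(f * Ox, axis=1))
assert np.allclose(inner_An, lhs @ x)
```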
Since $\mathcal{H}(K_{S,n})$ is included in $\mathcal{A}_n(\mathcal{Y})$, evaluation (1.7) applies to functions in $\mathcal{H}(K_{S,n})$. In this paper we study the following interpolation problem.

Problem 1.1. Given a Schur-class function $S \in \mathcal{S}(\mathcal{U},\mathcal{Y})$, given an $n$-output stable pair $(E, T) \in \mathcal{L}(\mathcal{X},\mathcal{Y}) \times \mathcal{L}(\mathcal{X})$ and given $\mathbf{x} \in \mathcal{X}$,
(i) Find all $f \in \mathcal{H}(K_{S,n})$ such that
\[
(E^* f)^{\wedge L}(T^*) := \mathcal{O}^*_{n,E,T}\, f = \mathbf{x}. \tag{1.9}
\]
(ii) (norm-constrained version): Find all $f \in \mathcal{H}(K_{S,n})$ satisfying (1.9) with $\|f\|_{\mathcal{H}(K_{S,n})} \le 1$.

The Hardy-space special case of this problem (where $n = 1$ and $\mathcal{H}(K_S) := \mathcal{H}(K_{S,1})$ is the classical de Branges–Rovnyak space) has been studied by the authors and S. ter Horst in [5, 6]. The set of all functions in $H^2(\mathcal{Y})$ satisfying a condition of the form $\mathcal{O}^*_{n,E,T} f = 0$ (i.e., condition (1.9) with $S = 0$ and $\mathbf{x} = 0$) is one way to describe a generic shift-invariant subspace of $H^2(\mathcal{Y})$, and the description of the set of all solutions amounts to a calculation of the Beurling–Lax representer $\Theta(z)$ for the space $\mathcal{M}$ in terms of the interpolation data $\{E, T, S = 0, \mathbf{x} = 0\}$. Another special case is to allow a general $n$ but still insist that $S = 0$ and $\mathbf{x} = 0$; this special case recovers the Beurling–Lax representation theorem for shift-invariant subspaces $\mathcal{M}$ in the weighted Bergman space obtained by the authors in [4].
2. Interpolation in reproducing kernel Hilbert spaces: A brief survey

The following operator interpolation problem with norm constraint is well known in the literature: Given Hilbert space operators $A \in \mathcal{L}(\mathcal{Y},\mathcal{X})$ and $B \in \mathcal{L}(\mathcal{U},\mathcal{X})$, describe all $X \in \mathcal{L}(\mathcal{U},\mathcal{Y})$ that satisfy the conditions
\[
AX = B \quad\text{and}\quad \|X\| \le 1. \tag{2.1}
\]
The solvability criterion is known as the Douglas factorization lemma [13]: There is an $X \in \mathcal{L}(\mathcal{U},\mathcal{Y})$ satisfying (2.1) if and only if $AA^* \ge BB^*$. If this is the case, then (see, e.g., [6]) $X \in \mathcal{L}(\mathcal{U},\mathcal{Y})$ satisfies conditions (2.1) if and only if the operator
\[
\begin{bmatrix} I_{\mathcal{U}} & B^* & X^* \\ B & AA^* & A \\ X & A^* & I_{\mathcal{Y}} \end{bmatrix} :
\begin{bmatrix} \mathcal{U} \\ \mathcal{X} \\ \mathcal{Y} \end{bmatrix} \to
\begin{bmatrix} \mathcal{U} \\ \mathcal{X} \\ \mathcal{Y} \end{bmatrix}
\quad\text{is positive semidefinite.} \tag{2.2}
\]
On the other hand, if $AA^* \ge BB^*$, then there exist (unique) contractions $X_1 \in \mathcal{L}(\mathcal{U}, \operatorname{Ran} A)$ and $X_2 \in \mathcal{L}(\mathcal{Y}, \operatorname{Ran} A)$ such that
\[
(AA^*)^{\frac12} X_1 = B, \quad (AA^*)^{\frac12} X_2 = A, \quad \operatorname{Ker} X_1 = \operatorname{Ker} B, \quad \operatorname{Ker} X_2 = \operatorname{Ker} A. \tag{2.3}
\]
Applying Schur complement arguments to the positive semidefinite operator in (2.2) leads us to the following more explicit description of the set of all solutions to the problem (2.1) (see, e.g., [6] for the proof).

Lemma 2.1. Let $AA^* \ge BB^*$. Then an operator $X$ satisfies condition (2.2) (and therefore, also conditions (2.1)) if and only if it is of the form
\[
X = X_2^* X_1 + (I - X_2^* X_2)^{\frac12}\, Q\, (I - X_1^* X_1)^{\frac12} \tag{2.4}
\]
where $X_1$ and $X_2$ are defined in (2.3) and where the parameter $Q$ is an arbitrary contraction from $\operatorname{Ran}(I - X_1^* X_1)$ into $\operatorname{Ran}(I - X_2^* X_2)$.
Remark 2.2. It follows from (2.4) that there is a unique $X$ subject to conditions (2.1) if and only if $X_1$ is isometric on $\mathcal{U}$ or $X_2$ is isometric on $\mathcal{Y}$. Furthermore, for each $X$ in (2.4) and each $u \in \mathcal{U}$, we have
\[
\|Xu\|^2 = \|X_2^* X_1 u\|^2 + \|(I - X_2^* X_2)^{\frac12}\, Q\, (I - X_1^* X_1)^{\frac12} u\|^2,
\]
so that $X_2^* X_1$ is the minimal norm solution to the problem (2.1) (see [6, Section 2]).
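For matrices, the construction (2.3)–(2.4) can be carried out numerically with a pseudoinverse of $(AA^*)^{1/2}$. The sketch below (random data chosen so that $AA^* \ge BB^*$ holds by construction) checks that the central solution $X = X_2^* X_1$ indeed satisfies $AX = B$ with $\|X\| \le 1$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A in L(Y, X), B in L(U, X); choosing B = A C with ||C|| <= 1
# guarantees the Douglas condition AA* >= BB*.
A = rng.standard_normal((4, 6))
C = rng.standard_normal((6, 3))
C /= np.linalg.norm(C, 2)                 # make C a contraction
B = A @ C

# R = (AA*)^{1/2} via the spectral decomposition
w, U = np.linalg.eigh(A @ A.T)
R = U @ np.diag(np.sqrt(np.clip(w, 0, None))) @ U.T
Rp = np.linalg.pinv(R)

X1 = Rp @ B                               # (AA*)^{1/2} X1 = B
X2 = Rp @ A                               # (AA*)^{1/2} X2 = A
X = X2.T @ X1                             # central (minimal-norm) solution

assert np.allclose(A @ X, B)              # interpolation condition AX = B
assert np.linalg.norm(X, 2) <= 1 + 1e-10  # contraction
```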
The left tangential Nevanlinna–Pick interpolation problem for the reproducing kernel Hilbert space $\mathcal{H}(K)$ can be formulated as follows. We are given vectors $y_1, \dots, y_k \in \mathcal{Y}$ and points $z_1, \dots, z_k \in \mathbb{D}$ along with numbers $x_1, \dots, x_k \in \mathbb{C}$ and seek $f \in \mathcal{H}(K)$ (possibly also with $\|f\|_{\mathcal{H}(K)} \le 1$) satisfying the left tangential Nevanlinna–Pick interpolation conditions
\[
\langle f(z_i), y_i\rangle = x_i \quad\text{for } i = 1, \dots, k. \tag{2.5}
\]
This problem can be reformulated more abstractly as follows. We introduce the $\mathcal{L}(\mathbb{C}^k,\mathcal{Y})$-valued function
\[
z \mapsto F(z) := \begin{bmatrix} K(z,z_1)y_1 & \cdots & K(z,z_k)y_k \end{bmatrix}. \tag{2.6}
\]
Then $F$ induces a multiplication operator $M_F : \mathbb{C}^k \to \mathcal{H}(K)$,
\[
M_F : \begin{bmatrix} c_1 \\ \vdots \\ c_k \end{bmatrix} \mapsto F(z)\begin{bmatrix} c_1 \\ \vdots \\ c_k \end{bmatrix} = c_1 K(z,z_1)y_1 + \cdots + c_k K(z,z_k)y_k.
\]
Then a standard reproducing-kernel computation gives us that $M_F^* : \mathcal{H}(K) \to \mathbb{C}^k$ is given by
\[
M_F^* : f \mapsto \begin{bmatrix} \langle f(z_1), y_1\rangle_{\mathcal{Y}} \\ \vdots \\ \langle f(z_k), y_k\rangle_{\mathcal{Y}} \end{bmatrix}
\]
and the Nevanlinna–Pick problem with interpolation conditions (2.5) can be reformulated as follows: for given $F$ as in (2.6) and $\mathbf{x} \in \mathbb{C}^k$, find $f \in \mathcal{H}(K)$ (possibly also with $\|f\|_{\mathcal{H}(K)} \le 1$) so that $M_F^* f = \mathbf{x}$.
We now formulate our abstract left tangential Nevanlinna–Pick interpolation problem as follows. Let $K(z,\zeta)$ be an $\mathcal{L}(\mathcal{Y})$-valued positive kernel on a Cartesian product set $\Omega\times\Omega$ and let $\mathcal{H}(K)$ be the associated reproducing kernel Hilbert space, that is, the unique inner product space of $\mathcal{Y}$-valued functions on $\Omega$ that contains the functions $K_\zeta y : z \mapsto K(z,\zeta)y$ for all fixed $\zeta \in \Omega$ and $y \in \mathcal{Y}$, which in turn have the reproducing property for $\mathcal{H}(K)$:
\[
\langle f, K_\zeta y\rangle_{\mathcal{H}(K)} = \langle f(\zeta), y\rangle_{\mathcal{Y}} \quad\text{for all } f \in \mathcal{H}(K).
\]
For $\mathcal{X}$ an auxiliary Hilbert space, we let $\mathcal{M}(\mathcal{X},\mathcal{H}(K))$ (the space of multipliers from $\mathcal{X}$ into $\mathcal{H}(K)$) denote the space of $\mathcal{L}(\mathcal{X},\mathcal{Y})$-valued functions $F$ such that the function $z \mapsto F(z)x$ is in $\mathcal{H}(K)$ for each $x \in \mathcal{X}$. A consequence of the closed-graph theorem is that the multiplication operator $M_F : x \mapsto F(\cdot)x$ is then bounded as an operator from $\mathcal{X}$ into $\mathcal{H}(K)$. With this notation in hand we can pose the following interpolation problem:

Problem 2.3. Given a positive kernel $K$ along with $F \in \mathcal{M}(\mathcal{X},\mathcal{H}(K))$ and $\mathbf{x} \in \mathcal{X}$,
(i) Find all functions $f \in \mathcal{H}(K)$ satisfying
\[
M_F^* f = \mathbf{x}. \tag{2.7}
\]
(ii) Find all functions $f \in \mathcal{H}(K)$ satisfying (2.7) with $\|f\|_{\mathcal{H}(K)} \le 1$.
As a straightforward application of the general Hilbert space results in Lemma 2.1, Remark 2.2 and the preceding discussion, we have the following solution of Problem 2.3.

Proposition 2.4. Problem 2.3 (ii) has a solution if and only if
\[
P \ge \mathbf{x}\mathbf{x}^*, \quad\text{where } P := M_F^* M_F. \tag{2.8}
\]
Problem 2.3 (i) has a solution if and only if $\mathbf{x} \in \operatorname{Ran} P^{\frac12}$.
Proof. By specializing the Douglas lemma to the case where
\[
A = M_F^* : \mathcal{H}(K) \to \mathcal{X} \quad\text{and}\quad B = \mathbf{x} \in \mathcal{X} \cong \mathcal{L}(\mathbb{C},\mathcal{X}), \tag{2.9}
\]
we see that solutions $X : \mathbb{C} \to \mathcal{H}(K)$ to problem (2.1) necessarily have the form of a multiplication operator $M_f$ for some function $f \in \mathcal{H}(K)$. This observation leads us to (2.8). On the other hand, by the second equality in (2.8), $\operatorname{Ran} P^{\frac12} = \operatorname{Ran} M_F^*$. Thus, $\mathbf{x}$ belongs to $\operatorname{Ran} P^{\frac12}$ if and only if it belongs to $\operatorname{Ran} M_F^*$, that is, if and only if the equality $M_F^* f = \mathbf{x}$ holds for some $f \in \mathcal{H}(K)$, which means that this $f$ solves Problem 2.3. $\square$
Let us assume that the operator $P=M_F^{*}M_F$ is strictly positive definite. Then the operator $M_FP^{-\frac12}$ is an isometry and the space
$$\mathcal{N}=\{F(z)x:\ x\in\mathcal{X}\}\quad\text{with norm }\ \|Fx\|_{\mathcal{H}(K)}=\|P^{\frac12}x\|_{\mathcal{X}} \tag{2.10}$$
is isometrically included in $\mathcal{H}(K)$. Moreover, the orthogonal complement of $\mathcal{N}$ in $\mathcal{H}(K)$ is the reproducing kernel Hilbert space $\mathcal{H}(\widetilde K)$ with reproducing kernel
$$\widetilde K(z,\zeta)=K(z,\zeta)-F(z)P^{-1}F(\zeta)^{*}. \tag{2.11}$$
The following theorem is an adaptation of Lemma 2.1 to the special case (2.9).

Theorem 2.5. Assume that condition (2.8) holds and that $P$ is strictly positive definite. Let $\widetilde K$ be the kernel defined in (2.11).

1. A function $f\colon\Omega\to\mathcal{Y}$ solves Problem 2.3 (ii) if and only if it is of the form
$$f(z)=F(z)P^{-1}\mathbf{x}+h(z) \tag{2.12}$$
for some function $h\in\mathcal{H}(\widetilde K)$ subject to $\|h\|_{\mathcal{H}(\widetilde K)}\le\sqrt{1-\|P^{-\frac12}\mathbf{x}\|^{2}}$.
2. The representation (2.12) is orthogonal (in the metric of $\mathcal{H}(K)$), so that $F(z)P^{-1}\mathbf{x}$ is the minimal-norm solution of Problem 2.3.
3. Problem 2.3 (ii) has a unique solution if and only if
$$\|P^{-\frac12}\mathbf{x}\|=1\quad\text{or}\quad\widetilde K(z,\zeta)\equiv 0.$$
Proof. It is readily seen that
$$T_1=P^{-\frac12}\mathbf{x}\in\mathcal{X}\cong\mathcal{L}(\mathbb{C},\mathcal{X})\quad\text{and}\quad T_2=P^{-\frac12}M_F^{*}\in\mathcal{L}(\mathcal{H}(K),\mathcal{X})$$
are the operators $T_1$ and $T_2$ from (2.4) after specialization to the case (2.9). Statements (2) and (3) now follow from Remark 2.2, since $P^{-\frac12}\mathbf{x}\in\mathcal{L}(\mathbb{C},\mathcal{X})$ being isometric means that $\|P^{-\frac12}\mathbf{x}\|=1$ and, on the other hand, the isometric property for the operator $M_FP^{-\frac12}$ means that the space $\mathcal{N}$ defined in (2.10) is equal to the whole space $\mathcal{H}(K)$. Thus $\mathcal{H}(\widetilde K)=\mathcal{H}(K)\ominus\mathcal{N}=\{0\}$, or $\widetilde K\equiv 0$.

In the present framework, the parametrization formula (2.4) takes the form
$$M_f=M_FP^{-1}\mathbf{x}+\sqrt{1-\|P^{-\frac12}\mathbf{x}\|^{2}}\,\bigl(I-M_FP^{-1}M_F^{*}\bigr)^{\frac12}Q, \tag{2.13}$$
Interpolation in Sub-Bergman Spaces 23
where $Q$ is equal to the operator $M_q\colon\mathbb{C}\to\mathcal{H}(\widetilde K)$ of multiplication by a function $q\in\mathcal{H}(\widetilde K)$ with $\|q\|\le 1$. Since $M_FP^{-\frac12}$ is an isometry, the second term on the right-hand side of (2.13) is equal to the operator $M_h$ of multiplication by a function $h\in\mathcal{H}(\widetilde K)$ such that $\|h\|_{\mathcal{H}(\widetilde K)}=\|h\|_{\mathcal{H}(K)}\le\sqrt{1-\|P^{-\frac12}\mathbf{x}\|^{2}}$. □
Remark 2.6. The parametrization formula (2.12) can be obtained in a more analytic way (still originating from (2.2)) as follows. Specializing (2.2) to the case (2.9), we conclude that a function $f$ solves Problem 2.3 (ii) if and only if
$$\mathbf{P}=\begin{bmatrix}1&\mathbf{x}^{*}&M_f^{*}\\ \mathbf{x}&P&M_F^{*}\\ M_f&M_F&I_{\mathcal{H}(K)}\end{bmatrix}\ \ge\ 0.$$
The latter condition is equivalent to the positivity on $\Omega\times\Omega$ of the following kernel:
$$\mathbf{K}(z,\zeta)=\begin{bmatrix}1&\mathbf{x}^{*}&f(\zeta)^{*}\\ \mathbf{x}&P&F(\zeta)^{*}\\ f(z)&F(z)&K(z,\zeta)\end{bmatrix}\ \succeq\ 0. \tag{2.14}$$
This equivalence is justified by the fact that the set of all vectors of the form
$$g(z)=\sum_{j=1}^{N}\begin{bmatrix}c_j\\ x_j\\ K(\cdot,z_j)y_j\end{bmatrix}\qquad(c_j\in\mathbb{C},\ y_j\in\mathcal{Y},\ x_j\in\mathcal{X},\ z_j\in\Omega)$$
is dense in $\mathbb{C}\oplus\mathcal{X}\oplus\mathcal{H}(K)$ and since for every such vector,
$$\langle\mathbf{P}g,g\rangle_{\mathbb{C}\oplus\mathcal{X}\oplus\mathcal{H}(K)}=\sum_{j,\ell=1}^{N}\Bigl\langle\mathbf{K}(z_j,z_\ell)\begin{bmatrix}c_\ell\\ x_\ell\\ y_\ell\end{bmatrix},\begin{bmatrix}c_j\\ x_j\\ y_j\end{bmatrix}\Bigr\rangle_{\mathbb{C}\oplus\mathcal{X}\oplus\mathcal{Y}}.$$
If in addition $P$ is strictly positive definite, one can take its Schur complement in $\mathbf{K}$ to get the inequality
$$\begin{bmatrix}1-\|P^{-\frac12}\mathbf{x}\|^{2}&f(\zeta)^{*}-\mathbf{x}^{*}P^{-1}F(\zeta)^{*}\\ f(z)-F(z)P^{-1}\mathbf{x}&K(z,\zeta)-F(z)P^{-1}F(\zeta)^{*}\end{bmatrix}\ \succeq\ 0,$$
which is equivalent to (2.14). By [9, Theorem 2.2], the latter positivity is equivalent to the membership of the function $h(z):=f(z)-F(z)P^{-1}\mathbf{x}$ in the space $\mathcal{H}(\widetilde K)$ together with the norm constraint $\|h\|^{2}_{\mathcal{H}(\widetilde K)}\le 1-\|P^{-\frac12}\mathbf{x}\|^{2}$.
Remark 2.7. The function $h$ on the right-hand side of (2.12) is in fact the general solution of the homogeneous interpolation problem $M_F^{*}h=0$. Thus the first part of Theorem 2.5 states that the solution set of the corresponding homogeneous interpolation problem coincides with the reproducing kernel Hilbert space $\mathcal{H}(\widetilde K)$. Results of this sort hold true even in the more general setting of Hilbert modules [1, 10]. The most interesting part, however, is to get a more detailed parametrization of $\mathcal{H}(\widetilde K)$. Although Problem 2.3 is too general to admit such a parametrization, in the context of the Nevanlinna–Pick type Problem 1.1 in sub-Bergman spaces $\mathcal{H}(K_{S,n})$ we shall see that something can be done in this direction (see Theorem 3.8 below).
3. The main result

To apply the general results from Section 2 to sub-Bergman spaces $\mathcal{H}(K_{S,n})$, we must show that Problem 1.1 is a particular case of Problem 2.3. In other words (see formulas (1.9) and (2.7)), we need to find an $\mathcal{L}(\mathcal{X},\mathcal{Y})$-valued function $F(z)$ in the multiplier space $\mathcal{M}(\mathcal{X},\mathcal{H}(K))$ such that the adjoint of the multiplication operator $M_F$ (in the metric of $\mathcal{H}(K_{S,n})$) is equal to the adjoint of the observability operator $\mathcal{O}_{n,E,T}$ (in the metric of $\mathcal{A}_n(\mathcal{Y})$). This function $F$ will be constructed in Section 3.2 below. We will then immediately get the parametrization formula (2.12) for the solution set of Problem 1.1, and Theorem 3.8 (the main result of the paper) will give a more detailed description of the solution set of the associated homogeneous problem; the result involves the construction of multiplier functions $\Phi_n\in\mathcal{M}(\mathcal{A}_n(\mathcal{Y}))$ and $\Phi_k\in\mathcal{M}(\mathcal{A}_k(\mathcal{X}),\mathcal{A}_n(\mathcal{Y}))$ for $k=1,2,\dots,n-1$. The additional ingredients needed to arrive at this result are introduced in Sections 3.1–3.4 below.

Before starting this program, we make one last observation. For an operator $A\colon\mathcal{X}\to\mathcal{H}(K_{S,k})\subset\mathcal{A}_k(\mathcal{Y})$, the adjoint operator can be taken in the metric of $\mathcal{A}_k(\mathcal{Y})$ as well as in the metric of $\mathcal{H}(K_{S,k})$, and these two operations in general are not the same. To avoid confusion, in what follows we use the notation $A^{*}$ for the adjoint of $A$ in the metric of $\mathcal{A}_k(\mathcal{Y})$ and $A^{[*]}$ for the adjoint of $A$ in the metric of $\mathcal{H}(K_{S,k})$. The precise value of $k$ ($1\le k\le n$) occurring here will be clear from the context.
3.1. Operators $N$ and $P_k$

Recall that if the pair $(E,T)$ is $n$-output stable, then the $n$-observability gramian
$$\mathcal{G}_{n,E,T}:=(\mathcal{O}_{n,E,T})^{*}\mathcal{O}_{n,E,T}=\sum_{j=0}^{\infty}\mu_{j,n}^{-1}\,T^{*j}E^{*}E\,T^{j} \tag{3.1}$$
is bounded on $\mathcal{X}$, and the strong convergence of the power series in its representation (3.1) follows from the definition of the inner product in $\mathcal{A}_n(\mathcal{Y})$. One may conclude that $(E,T)$ is $n$-output stable if and only if the power series in (3.1) converges weakly (and therefore strongly, since all terms are positive semidefinite). The power series representation (3.1) suggests that $\mathcal{G}_{n,E,T}$ be defined for $n=0$ by simply letting $\mathcal{G}_{0,E,T}:=E^{*}E$.

Lemma 3.1. If $(E,T)$ is $n$-output stable, then it is $k$-output stable for all $k=0,\dots,n-1$ and the observability gramians satisfy the Stein identity
$$\mathcal{G}_{k,E,T}-T^{*}\mathcal{G}_{k,E,T}T=\mathcal{G}_{k-1,E,T}\quad\text{for all }k=1,\dots,n. \tag{3.2}$$

Proof. Since $\mu_{j,k}\le\mu_{j,k-1}$ (see (1.2)), we conclude that if the power series in (3.1) converges for some integer $n$, it also converges for any positive integer $k<n$. Identity (3.2) follows from the power series representations for $\mathcal{G}_{k,E,T}$ and $\mathcal{G}_{k-1,E,T}$, due to the binomial-coefficient identity $\binom{j}{k}=\binom{j-1}{k}+\binom{j-1}{k-1}$. □
The evaluation map (1.7) extends to Schur-class functions $S\in\mathcal{S}(\mathcal{U},\mathcal{Y})$ by $(E^{*}S)^{\wedge L}(T^{*})=\mathcal{O}^{*}_{n,E,T}M_S|_{\mathcal{U}}$.

Lemma 3.2. Let $(E,T)$ be $n$-output stable, let $S\in\mathcal{S}(\mathcal{U},\mathcal{Y})$ be a Schur-class function and let $N^{*}\in\mathcal{L}(\mathcal{U},\mathcal{X})$ be defined by
$$N^{*}=(E^{*}S)^{\wedge L}(T^{*}):=\mathcal{O}^{*}_{n,E,T}M_S|_{\mathcal{U}}. \tag{3.3}$$
Then the pair $(N,T)$ is $n$-output stable and the following equality holds:
$$\mathcal{O}^{*}_{n,E,T}M_S=\mathcal{O}^{*}_{n,N,T}\colon\ \mathcal{A}_n(\mathcal{U})\to\mathcal{X}. \tag{3.4}$$
Furthermore, the operators
$$P_k:=\mathcal{G}_{k,E,T}-\mathcal{G}_{k,N,T}\colon\ \mathcal{X}\to\mathcal{X} \tag{3.5}$$
satisfy the Stein identities
$$P_k-T^{*}P_kT=P_{k-1}\quad\text{for }k=1,\dots,n, \tag{3.6}$$
as well as the inequalities
$$P_k\ge P_{k-1}\ge 0\quad\text{for }k=2,\dots,n. \tag{3.7}$$
Proof. Making use of the power series representations
$$S(z)=\sum_{j\ge 0}S_jz^{j}\quad\text{and}\quad h(z)=\sum_{j\ge 0}h_jz^{j}$$
of the given $S\in\mathcal{S}(\mathcal{U},\mathcal{Y})$ and of an arbitrary fixed function $h\in\mathcal{A}_n(\mathcal{U})$, we have
$$(M_Sh)(z)=\sum_{\ell=0}^{\infty}\Bigl(\sum_{j=0}^{\ell}S_jh_{\ell-j}\Bigr)z^{\ell},$$
which together with (1.8) implies
$$\mathcal{O}^{*}_{n,E,T}M_Sh=(E^{*}(Sh))^{\wedge L}(T^{*})=\sum_{\ell=0}^{\infty}T^{*\ell}E^{*}\Bigl(\sum_{j=0}^{\ell}S_jh_{\ell-j}\Bigr). \tag{3.8}$$
Note that the latter series converges weakly, since the pair $(E,T)$ is $n$-output stable and $Sh\in\mathcal{A}_n(\mathcal{Y})$. If we regularize the series by replacing $S_j$ by $\rho^{j}S_j$ and by replacing $h_j$ by $\rho^{j}h_j$, we even get that the double series in (3.8), after taking the inner product against a fixed vector $x\in\mathcal{X}$, converges absolutely. We may then rearrange the series to have the form
$$\langle\mathcal{O}^{*}_{n,E,T}M_{S_\rho}h_\rho,\,x\rangle=\sum_{i,j=0}^{\infty}\bigl\langle\rho^{i+j}(T^{*})^{i+j}E^{*}S_jh_i,\,x\bigr\rangle.$$
We may then invoke Abel's theorem to take the limit as $\rho\to 1$ (justified by the facts that $(E,T)$ is $n$-output stable and that $Sh\in\mathcal{A}_n(\mathcal{Y})$) to get
$$\mathcal{O}^{*}_{n,E,T}M_Sh=(E^{*}Sh)^{\wedge L}(T^{*})=\sum_{i,j=0}^{\infty}(T^{*})^{i+j}E^{*}S_jh_i. \tag{3.9}$$
On the other hand, due to (3.3),
$$\mathcal{O}^{*}_{n,N,T}h=(N^{*}h)^{\wedge L}(T^{*})=\sum_{i=0}^{\infty}T^{*i}N^{*}h_i=\sum_{i=0}^{\infty}T^{*i}\Bigl(\sum_{j=0}^{\infty}T^{*j}E^{*}S_j\Bigr)h_i=\sum_{i,j=0}^{\infty}(T^{*})^{i+j}E^{*}S_jh_i,$$
where all the series converge weakly, since the one in (3.9) does. Since $h$ was picked arbitrarily in $\mathcal{A}_n(\mathcal{U})$, the last equality and (3.9) imply (3.4). Therefore, the operator $\mathcal{O}^{*}_{n,N,T}\colon\mathcal{A}_n(\mathcal{U})\to\mathcal{X}$ is bounded and hence the pair $(N,T)$ is $n$-output stable.

By Lemma 3.1, the pairs $(E,T)$ and $(N,T)$ are $k$-output stable for all $k=1,\dots,n$. By using the definition (3.5) of the operators $P_k$ along with the identities (3.2), we get
$$P_k-T^{*}P_kT=\mathcal{G}_{k,E,T}-\mathcal{G}_{k,N,T}-T^{*}(\mathcal{G}_{k,E,T}-\mathcal{G}_{k,N,T})T$$
$$=(\mathcal{G}_{k,E,T}-T^{*}\mathcal{G}_{k,E,T}T)-(\mathcal{G}_{k,N,T}-T^{*}\mathcal{G}_{k,N,T}T)=\mathcal{G}_{k-1,E,T}-\mathcal{G}_{k-1,N,T}=:P_{k-1},$$
and we arrive at (3.6). Since $M_S$ is a contraction from $\mathcal{A}_k(\mathcal{U})$ to $\mathcal{A}_k(\mathcal{Y})$, we may replace $n$ by $k$ in the preceding proof to conclude that
$$\mathcal{O}^{*}_{k,E,T}M_S=\mathcal{O}^{*}_{k,N,T}\colon\ \mathcal{A}_k(\mathcal{U})\to\mathcal{X}\quad\text{for }k=1,\dots,n. \tag{3.10}$$
Therefore, from (3.5) and (3.10) we see that
$$P_k=\mathcal{O}^{*}_{k,E,T}\mathcal{O}_{k,E,T}-\mathcal{O}^{*}_{k,N,T}\mathcal{O}_{k,N,T}=\mathcal{O}^{*}_{k,E,T}(I-M_SM_S^{*})\mathcal{O}_{k,E,T}\ge 0.$$
Finally, we invoke (3.6) to conclude that $P_k=T^{*}P_kT+P_{k-1}\ge P_{k-1}$, which completes the proof of (3.7). □
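In the same scalar single-node setting used above, the operators $P_k$ of (3.5) reduce to classical higher-order Schwarz–Pick quotients, and (3.6)–(3.7) can be checked directly (sample data; the Schur function $S(z)=z/2$ is an arbitrary choice):

```python
# Minimal sketch (illustrative data) of (3.5)-(3.7) for a 1-dimensional state
# space: one node z1 with E = 1, T = conj(z1) and N* = S(z1), so that
# P_k = (1 - |S(z1)|**2)/(1 - |z1|**2)**k.
z1 = 0.4 + 0.2j
S = lambda z: 0.5 * z
r = abs(z1) ** 2

def P(k):
    return (1.0 - abs(S(z1)) ** 2) / (1.0 - r) ** k

for k in range(2, 6):
    assert abs(P(k) - r * P(k) - P(k - 1)) < 1e-12   # Stein identity (3.6)
    assert P(k) >= P(k - 1) >= 0.0                   # inequalities (3.7)
```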
3.2. Functions $F^S_k$

We now assume that we are given the data set $(E,T,S(z),\mathbf{x})$ for an interpolation problem as in Problem 1.1. By making use of the operator $N$ defined by (3.3), we now introduce the $\mathcal{L}(\mathcal{X},\mathcal{Y})$-valued functions
$$F^S_k(z)=(E-S(z)N)(I-zT)^{-k},\qquad k=1,\dots,n, \tag{3.11}$$
which are therefore completely specified by the data set $(E,N,T,S(z),\mathbf{x})$. For the multiplication operator $M_{F^S_n}$ we have
$$M_{F^S_n}=\mathcal{O}_{n,E,T}-M_S\mathcal{O}_{n,N,T}=(I-M_SM_S^{*})\mathcal{O}_{n,E,T}, \tag{3.12}$$
where the first equality follows from (3.11) and (1.6), while the second equality is a consequence of (3.10).
Lemma 3.3. Let $(E,T)$ be an $n$-output stable pair, let $S\in\mathcal{S}(\mathcal{U},\mathcal{Y})$ be a Schur-class function and let $N$ and $P_n$ be defined as in (3.3) and (3.5) respectively. Then:

1. The function $F^S_n$ given by (3.11) is in the space of multipliers $\mathcal{M}(\mathcal{X},\mathcal{H}(K_{S,n}))$, and moreover
$$M^{[*]}_{F^S_n}M_{F^S_n}=P_n\quad\text{and}\quad M^{[*]}_{F^S_n}=\mathcal{O}^{*}_{n,E,T}. \tag{3.13}$$
2. The kernel $\mathbf{K}_n(z,\zeta)=\begin{bmatrix}P_n&F^S_n(\zeta)^{*}\\ F^S_n(z)&K_{S,n}(z,\zeta)\end{bmatrix}$ is positive on $\mathbb{D}\times\mathbb{D}$.

Proof. Formula (3.12) and the range characterization of $\mathcal{H}(K_{S,n})$ imply that $M_{F^S_n}$ maps $\mathcal{X}$ into $\mathcal{H}(K_{S,n})$. Furthermore, it follows from (3.11), (1.5), (3.4), and (3.5) that
$$\|F^S_nx\|^{2}_{\mathcal{H}(K_{S,n})}=\langle(I-M_SM_S^{*})\mathcal{O}_{n,E,T}x,\,\mathcal{O}_{n,E,T}x\rangle_{\mathcal{A}_n(\mathcal{Y})}=\langle(\mathcal{O}^{*}_{n,E,T}\mathcal{O}_{n,E,T}-\mathcal{O}^{*}_{n,N,T}\mathcal{O}_{n,N,T})x,\,x\rangle_{\mathcal{X}}=\langle P_nx,x\rangle_{\mathcal{X}}$$
for every $x\in\mathcal{X}$, which implies the first relation in (3.13). On the other hand, upon making subsequent use of (3.12) and (1.5), we see that
$$\langle x,\,M^{[*]}_{F^S_n}f\rangle_{\mathcal{X}}=\langle F^S_nx,\,f\rangle_{\mathcal{H}(K_{S,n})}=\langle(I-M_SM_S^{*})\mathcal{O}_{n,E,T}x,\,f\rangle_{\mathcal{H}(K_{S,n})}=\langle\mathcal{O}_{n,E,T}x,\,f\rangle_{\mathcal{A}_n(\mathcal{Y})}=\langle x,\,\mathcal{O}^{*}_{n,E,T}f\rangle_{\mathcal{X}}$$
for every $f\in\mathcal{H}(K_{S,n})$ and $x\in\mathcal{X}$, which implies the second equality in (3.13). By the first equality in (3.13), the operator
$$\begin{bmatrix}P_n&M^{[*]}_{F^S_n}\\ M_{F^S_n}&I\end{bmatrix}=\begin{bmatrix}M^{[*]}_{F^S_n}M_{F^S_n}&M^{[*]}_{F^S_n}\\ M_{F^S_n}&I\end{bmatrix}=\begin{bmatrix}M^{[*]}_{F^S_n}\\ I\end{bmatrix}\begin{bmatrix}M_{F^S_n}&I\end{bmatrix}\in\mathcal{L}\!\left(\begin{bmatrix}\mathcal{X}\\ \mathcal{H}(K_{S,n})\end{bmatrix}\right)$$
is positive semidefinite; as in Remark 2.6, this condition in turn is equivalent to the positivity of the kernel $\mathbf{K}_n(z,\zeta)$ on $\mathbb{D}\times\mathbb{D}$. □
We now conclude from (3.13) that the interpolation condition (1.9) in Problem 1.1 can be written in the form $M^{[*]}_{F^S_n}f=\mathbf{x}$. Therefore Proposition 2.4 and Theorem 2.5 apply, leading us to the following result.

Lemma 3.4. Let $N$ and $P_n$ be defined as in (3.3) and (3.5) respectively.

1. Problem 1.1 (i) has a solution if and only if $\mathbf{x}\in\operatorname{Ran}P_n^{\frac12}$.
2. If $P_n$ is strictly positive definite, the function $f_{\min}(z)=F^S_n(z)P_n^{-1}\mathbf{x}$ solves Problem 1.1 (i) and has the minimal possible norm $\|f_{\min}\|_{\mathcal{H}(K_{S,n})}=\|P_n^{-\frac12}\mathbf{x}\|$.
3. A function $h\in\mathcal{H}(K_{S,n})$ satisfies $(E^{*}h)^{\wedge L}(T^{*})=0$ if and only if $h$ belongs to the reproducing kernel Hilbert space $\mathcal{H}(\widetilde K_{S,n})$ with reproducing kernel
$$\widetilde K_{S,n}(z,\zeta)=\frac{I_{\mathcal{Y}}-S(z)S(\zeta)^{*}}{(1-z\bar\zeta)^{n}}-F^S_n(z)P_n^{-1}F^S_n(\zeta)^{*}. \tag{3.14}$$
3.3. $J$-inner function $\Theta$ and the Schur-class function $\mathcal{E}$

The assumption $P_n>0$ allowed us to get a simple explicit formula for the minimal-norm solution in Lemma 3.4. The parametrization of the space $\mathcal{H}(\widetilde K_{S,n})$ will be established in Theorem 3.8 below under the stronger assumption $P_1>0$. By (3.7) this condition implies that $P_k>0$ for all $k=1,\dots,n$.

Let us observe that if $P_1$ is boundedly invertible, then so is the observability gramian $\mathcal{G}_{1,E,T}$ (see formula (3.5)), which in turn implies (see, e.g., [8]) that the operator $T$ is strongly stable in the sense that the powers $T^{j}$ converge to zero in the strong operator topology. Let $J$ be the signature operator and $\Theta$ the block matrix function given by
$$J=\begin{bmatrix}I_{\mathcal{Y}}&0\\ 0&-I_{\mathcal{U}}\end{bmatrix}\quad\text{and}\quad\Theta(z)=\begin{bmatrix}\Theta_{11}(z)&\Theta_{12}(z)\\ \Theta_{21}(z)&\Theta_{22}(z)\end{bmatrix}, \tag{3.15}$$
an $\mathcal{L}(\mathcal{Y}\oplus\mathcal{U})$-valued function such that for all $z,\zeta\in\mathbb{D}$,
$$\frac{J-\Theta(z)J\Theta(\zeta)^{*}}{1-z\bar\zeta}=\begin{bmatrix}E\\ N\end{bmatrix}(I-zT)^{-1}P_1^{-1}(I-\bar\zeta T^{*})^{-1}\begin{bmatrix}E^{*}&N^{*}\end{bmatrix}. \tag{3.16}$$
The function $\Theta$ is determined by the equality (3.16) uniquely up to a constant $J$-unitary factor on the right. One possible choice of $\Theta$ satisfying (3.16) is
$$\Theta(z)=D+z\begin{bmatrix}E\\ N\end{bmatrix}(I-zT)^{-1}B,$$
where the operator $\begin{bmatrix}B\\ D\end{bmatrix}\colon\mathcal{Y}\oplus\mathcal{U}\to\begin{bmatrix}\mathcal{X}\\ \mathcal{Y}\oplus\mathcal{U}\end{bmatrix}$ is an injective solution of the $J$-Cholesky factorization problem
$$\begin{bmatrix}B\\ D\end{bmatrix}J\begin{bmatrix}B^{*}&D^{*}\end{bmatrix}=\begin{bmatrix}P_1^{-1}&0\\ 0&J\end{bmatrix}-\begin{bmatrix}T\\ E\\ N\end{bmatrix}P_1^{-1}\begin{bmatrix}T^{*}&E^{*}&N^{*}\end{bmatrix}.$$
Such a solution exists due to the identity (3.6) for $k=1$, that is,
$$P_1-T^{*}P_1T=E^{*}E-N^{*}N. \tag{3.17}$$
If $\operatorname{spec}(T)\cap\mathbb{T}\ne\mathbb{T}$ (which is the case if, e.g., $\dim\mathcal{X}<\infty$), then a function $\Theta$ satisfying (3.16) can be taken in the form
$$\Theta(z)=I+(z-\mu)\begin{bmatrix}E\\ N\end{bmatrix}(I_{\mathcal{X}}-zT)^{-1}P_1^{-1}(\bar\mu I_{\mathcal{X}}-T^{*})^{-1}\begin{bmatrix}E^{*}&-N^{*}\end{bmatrix}, \tag{3.18}$$
where $\mu$ is an arbitrary point in $\mathbb{T}\setminus\operatorname{spec}(T)$ (see [7]). For $\Theta$ of the form (3.18), the verification of identity (3.16) is straightforward and relies on the Stein identity (3.17) only. It follows from (3.16) that $\Theta$ is $J$-contractive on $\mathbb{D}$, i.e., that $\Theta(z)J\Theta(z)^{*}\le J$ for all $z\in\mathbb{D}$. A less trivial fact is that, due to the strong stability of $T$, the function $\Theta$ is $J$-inner; that is, the nontangential boundary values $\Theta(t)$ exist for almost all $t\in\mathbb{T}$ and are $J$-unitary: $\Theta(t)J\Theta(t)^{*}=J$. Every $J$-inner function $\Theta=\begin{bmatrix}\Theta_{11}&\Theta_{12}\\ \Theta_{21}&\Theta_{22}\end{bmatrix}$ gives rise to the one-to-one linear fractional transform
$$\mathcal{E}\ \mapsto\ \mathbf{T}_{\Theta}[\mathcal{E}]:=(\Theta_{11}\mathcal{E}+\Theta_{12})(\Theta_{21}\mathcal{E}+\Theta_{22})^{-1} \tag{3.19}$$
mapping the Schur class $\mathcal{S}(\mathcal{U},\mathcal{Y})$ into itself. In case $\Theta$ is a $J$-inner function satisfying identity (3.16), the transform (3.19) establishes a one-to-one correspondence between $\mathcal{S}(\mathcal{U},\mathcal{Y})$ and the set of all Schur-class functions $G$ such that $(E^{*}G)^{\wedge L}(T^{*})=N^{*}$. Since the given function $S$ satisfies the latter condition by the definition (3.3) of $N$, it follows that $S=\mathbf{T}_{\Theta}[\mathcal{E}]$ for some (uniquely determined) function $\mathcal{E}\in\mathcal{S}(\mathcal{U},\mathcal{Y})$, which is recovered from $S$ by
$$\mathcal{E}=(\Theta_{11}-S\Theta_{21})^{-1}(S\Theta_{22}-\Theta_{12}). \tag{3.20}$$
Since $\Theta$ is $J$-inner, it follows that $\Theta_{22}(z)$ is boundedly invertible for all $z\in\mathbb{D}$ and also $\|\Theta_{22}^{-1}(z)\Theta_{21}(z)\|<1$ for all $z\in\mathbb{D}$.
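Identity (3.16) for the choice (3.18) can be tested numerically in the simplest nontrivial case. The sketch below (illustrative data; one node, $S(z)=z/2$, $\mu=1$) builds the $2\times 2$ matrix $\Theta$ by hand and compares both sides of (3.16) at sample points:

```python
# Minimal sketch (illustrative data) verifying (3.16) for the choice (3.18)
# with a 1-dimensional state space: T = conj(z1), E = 1, N* = S(z1), P1 the
# Schwarz-Pick quotient, and mu = 1 on the unit circle.
z1, mu = 0.4 + 0.2j, 1.0
S = lambda z: 0.5 * z
T, E = z1.conjugate(), 1.0
Nstar = S(z1)
P1 = (1 - abs(Nstar) ** 2) / (1 - abs(z1) ** 2)

Jsig = [[1, 0], [0, -1]]
col = [E, Nstar.conjugate()]             # the column [E; N]
row = [E, -Nstar]                        # the row [E*  -N*]

def theta(z):
    c = (z - mu) / ((1 - z * T) * P1 * (mu - z1))   # scalar resolvents of (3.18)
    return [[(1 if i == j else 0) + c * col[i] * row[j] for j in range(2)]
            for i in range(2)]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

for z, zeta in ((0.2 + 0.1j, -0.3 + 0.4j), (0.5j, 0.25 + 0j)):
    Tz, Tw = theta(z), theta(zeta)
    TwH = [[Tw[j][i].conjugate() for j in range(2)] for i in range(2)]
    M = mul(mul(Tz, Jsig), TwH)
    scale = 1 / ((1 - z * T) * P1 * (1 - zeta.conjugate() * z1))
    for i in range(2):
        for j in range(2):
            lhs = (Jsig[i][j] - M[i][j]) / (1 - z * zeta.conjugate())
            rhs = col[i] * col[j].conjugate() * scale
            assert abs(lhs - rhs) < 1e-9
```

Both sides are rank one, so the check reduces (as in the text) to a scalar identity equivalent to the Stein identity (3.17).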
The following result gives the construction of the multiplier $\Phi_n$ needed for the main result (Theorem 3.8 below); the remaining multipliers $\Phi_k$ ($k=1,\dots,n-1$) are obtained in Lemma 3.7 below (formula (3.30)).

Lemma 3.5. Let us assume that the Schur-class functions $S$ and $\mathcal{E}$ are related as in (3.20). Then the following identity holds for all $z,\zeta\in\mathbb{D}$:
$$\frac{I_{\mathcal{Y}}-S(z)S(\zeta)^{*}}{1-z\bar\zeta}-F^S_1(z)P_1^{-1}F^S_1(\zeta)^{*}=\Phi_n(z)\,\frac{I_{\mathcal{Y}}-\mathcal{E}(z)\mathcal{E}(\zeta)^{*}}{1-z\bar\zeta}\,\Phi_n(\zeta)^{*}, \tag{3.21}$$
where $F^S_1(z)=(E-S(z)N)(I-zT)^{-1}$ (according to formula (3.11)) and where
$$\Phi_n=(\Theta_{11}-\Theta_{12}\Theta_{22}^{-1}\Theta_{21})(I+\mathcal{E}\Theta_{22}^{-1}\Theta_{21})^{-1}. \tag{3.22}$$
Proof. Substituting (3.20) into (3.22) gives
$$\Phi_n=(\Theta_{11}-\Theta_{12}\Theta_{22}^{-1}\Theta_{21})\bigl(I+(\Theta_{11}-S\Theta_{21})^{-1}(S\Theta_{22}-\Theta_{12})\Theta_{22}^{-1}\Theta_{21}\bigr)^{-1}$$
$$=(\Theta_{11}-\Theta_{12}\Theta_{22}^{-1}\Theta_{21})\bigl(\Theta_{11}-S\Theta_{21}+(S\Theta_{22}-\Theta_{12})\Theta_{22}^{-1}\Theta_{21}\bigr)^{-1}(\Theta_{11}-S\Theta_{21})=\Theta_{11}-S\Theta_{21}.$$
Recalling the matrix representation for $\Theta$ in (3.15) and the expression (3.19) for $S=\mathbf{T}_{\Theta}[\mathcal{E}]$, we therefore have
$$\begin{bmatrix}I_{\mathcal{Y}}&-S\end{bmatrix}\Theta=\begin{bmatrix}\Theta_{11}-S\Theta_{21}&\Theta_{12}-S\Theta_{22}\end{bmatrix}=\Phi_n\begin{bmatrix}I_{\mathcal{Y}}&-\mathcal{E}\end{bmatrix}. \tag{3.23}$$
Multiplying both sides of (3.16) by $\begin{bmatrix}I_{\mathcal{Y}}&-S(z)\end{bmatrix}$ on the left and by $\begin{bmatrix}I_{\mathcal{Y}}\\ -S(\zeta)^{*}\end{bmatrix}$ on the right, we get
$$\begin{bmatrix}I_{\mathcal{Y}}&-S(z)\end{bmatrix}\frac{J-\Theta(z)J\Theta(\zeta)^{*}}{1-z\bar\zeta}\begin{bmatrix}I_{\mathcal{Y}}\\ -S(\zeta)^{*}\end{bmatrix}=F^S_1(z)P_1^{-1}F^S_1(\zeta)^{*},$$
which can be rearranged (see the formula (3.15) for $J$) to
$$\frac{I_{\mathcal{Y}}-S(z)S(\zeta)^{*}}{1-z\bar\zeta}-F^S_1(z)P_1^{-1}F^S_1(\zeta)^{*}=\begin{bmatrix}I_{\mathcal{Y}}&-S(z)\end{bmatrix}\frac{\Theta(z)J\Theta(\zeta)^{*}}{1-z\bar\zeta}\begin{bmatrix}I_{\mathcal{Y}}\\ -S(\zeta)^{*}\end{bmatrix},$$
which together with (3.23) implies (3.21). □
In case $\Theta=\begin{bmatrix}\Theta_{11}&\Theta_{12}\\ \Theta_{21}&\Theta_{22}\end{bmatrix}$ is taken in the form (3.18), the formula (3.22) takes the form
$$\Phi_n(z)=\bigl(I+(z-\mu)E\Lambda(z)^{-1}E^{*}\bigr)\bigl(I+(z-\mu)\mathcal{E}(z)N\Lambda(z)^{-1}E^{*}\bigr)^{-1}, \tag{3.24}$$
where
$$\Lambda(z)=(\bar\mu I-T^{*})P_1(I-zT)-(z-\mu)N^{*}N.$$
Indeed, let us first note that
$$\bigl(I-(z-\mu)N(I-zT)^{-1}P_1^{-1}(\bar\mu I-T^{*})^{-1}N^{*}\bigr)^{-1}$$
$$=I+(z-\mu)N\bigl(I-(z-\mu)(I-zT)^{-1}P_1^{-1}(\bar\mu I-T^{*})^{-1}N^{*}N\bigr)^{-1}(I-zT)^{-1}P_1^{-1}(\bar\mu I-T^{*})^{-1}N^{*}$$
$$=I+(z-\mu)N\Lambda(z)^{-1}N^{*}.$$
With this result in hand we see next that
$$\Theta_{22}(z)^{-1}\Theta_{21}(z)=(z-\mu)\bigl(I+(z-\mu)N\Lambda(z)^{-1}N^{*}\bigr)N(I-zT)^{-1}P_1^{-1}(\bar\mu I-T^{*})^{-1}E^{*}$$
$$=(z-\mu)N\Lambda(z)^{-1}\bigl(\Lambda(z)+(z-\mu)N^{*}N\bigr)(I-zT)^{-1}P_1^{-1}(\bar\mu I-T^{*})^{-1}E^{*}=(z-\mu)N\Lambda(z)^{-1}E^{*}.$$
Therefore we get
$$\Theta_{11}(z)-\Theta_{12}(z)\Theta_{22}(z)^{-1}\Theta_{21}(z)=I+(z-\mu)E(I-zT)^{-1}P_1^{-1}(\bar\mu I-T^{*})^{-1}\bigl(E^{*}+(z-\mu)N^{*}N\Lambda(z)^{-1}E^{*}\bigr)$$
$$=I+(z-\mu)E(I-zT)^{-1}P_1^{-1}(\bar\mu I-T^{*})^{-1}\bigl(\Lambda(z)+(z-\mu)N^{*}N\bigr)\Lambda(z)^{-1}E^{*}=I+(z-\mu)E\Lambda(z)^{-1}E^{*},$$
and (3.24) follows from the two latter equalities and (3.22).
3.4. Inner functions $\Psi_k$

Our next construction is similar to that in the previous section. Due to the identities (3.6) and the (strict) inequalities (3.7), we can find operators $\mathbf{B}_k,\mathbf{D}_k\colon\mathcal{X}\to\mathcal{X}$ so that
$$\begin{bmatrix}T&\mathbf{B}_k\\ P_k^{\frac12}&\mathbf{D}_k\end{bmatrix}\begin{bmatrix}P_{k+1}^{-1}&0\\ 0&I_{\mathcal{X}}\end{bmatrix}\begin{bmatrix}T^{*}&P_k^{\frac12}\\ \mathbf{B}_k^{*}&\mathbf{D}_k^{*}\end{bmatrix}=\begin{bmatrix}P_{k+1}^{-1}&0\\ 0&I_{\mathcal{X}}\end{bmatrix}$$
and
$$\begin{bmatrix}T^{*}&P_k^{\frac12}\\ \mathbf{B}_k^{*}&\mathbf{D}_k^{*}\end{bmatrix}\begin{bmatrix}P_{k+1}&0\\ 0&I_{\mathcal{X}}\end{bmatrix}\begin{bmatrix}T&\mathbf{B}_k\\ P_k^{\frac12}&\mathbf{D}_k\end{bmatrix}=\begin{bmatrix}P_{k+1}&0\\ 0&I_{\mathcal{X}}\end{bmatrix}.$$
In fact, the latter equalities determine $\mathbf{B}_k$ and $\mathbf{D}_k$ uniquely up to a common unitary factor on the right:
$$\mathbf{B}_k=\bigl(P_{k+1}^{-1}-TP_{k+1}^{-1}T^{*}\bigr)^{\frac12},\qquad\mathbf{D}_k=-P_k^{-\frac12}T^{*}P_{k+1}\mathbf{B}_k.$$
We now define the functions
$$\Psi_k(z)=\mathbf{D}_k+zP_k^{\frac12}(I-zT)^{-1}\mathbf{B}_k \tag{3.25}$$
for $k=1,\dots,n-1$, which are inner and satisfy the identities
$$\frac{I_{\mathcal{X}}-\Psi_k(z)\Psi_k(\zeta)^{*}}{1-z\bar\zeta}=P_k^{\frac12}(I-zT)^{-1}P_{k+1}^{-1}(I-\bar\zeta T^{*})^{-1}P_k^{\frac12}. \tag{3.26}$$
As was shown in [4, Proposition 7.2], the function $\Psi_k$ in (3.25) for $k=1,\dots,n-1$ can alternatively be given by
$$\Psi_k(z)=P_k^{\frac12}(I-zT)^{-1}P_{k+1}^{-1}(zI-T^{*})\mathbf{B}_k^{-1}.$$
If $\operatorname{spec}(T)\cap\mathbb{T}\ne\mathbb{T}$, then a function $\Psi_k$ satisfying (3.26) can be taken in the form
$$\Psi_k(z)=I+(z-\mu)P_k^{\frac12}(I_{\mathcal{X}}-zT)^{-1}P_{k+1}^{-1}(\bar\mu I_{\mathcal{X}}-T^{*})^{-1}P_k^{\frac12}, \tag{3.27}$$
where $\mu$ is an arbitrary point in $\mathbb{T}\setminus\operatorname{spec}(T)$. For this choice, an alternative formula for $\Psi_k$ is the following:
$$\Psi_k(z)=P_k^{\frac12}(I-zT)^{-1}P_{k+1}^{-1}(zI-T^{*})(\mu I-T^{*})^{-1}P_{k+1}(I-\bar\mu T)^{-1}P_k^{-\frac12}.$$
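In the scalar single-node case the operators $\mathbf{B}_k,\mathbf{D}_k$ and the functions $\Psi_k$ are explicit, and identity (3.26) can be checked directly (illustrative data only):

```python
from math import sqrt

# Minimal sketch (illustrative data) of (3.25)-(3.26) for a 1-dimensional
# state space: T = conj(z1), P_k = (1 - |S(z1)|**2)/(1 - |z1|**2)**k with the
# sample Schur function S(z) = z/2, B_k = (P_{k+1}**(-1)*(1 - |z1|**2))**0.5,
# and D_k = -P_k**(-0.5) * z1 * P_{k+1} * B_k.
z1 = 0.4 + 0.2j
r = abs(z1) ** 2
P = lambda k: (1 - abs(0.5 * z1) ** 2) / (1 - r) ** k

for k in (1, 2, 3):
    p, q = P(k), P(k + 1)
    B = sqrt((1 - r) / q)
    D = -z1 * q * B / sqrt(p)
    Psi = lambda z: D + z * sqrt(p) * B / (1 - z * z1.conjugate())
    for z, zeta in ((0.2 + 0.1j, -0.3 + 0.4j), (0.5j, 0.25 + 0j)):
        lhs = (1 - Psi(z) * Psi(zeta).conjugate()) / (1 - z * zeta.conjugate())
        rhs = p / ((1 - z * z1.conjugate()) * q * (1 - zeta.conjugate() * z1))
        assert abs(lhs - rhs) < 1e-12     # identity (3.26)
```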
Lemma 3.6. Let $\Psi_1,\dots,\Psi_{n-1}$ be the inner functions defined in (3.25). Then
$$\sum_{k=1}^{n-1}(I-zT)^{-k+1}P_k^{-\frac12}\,\frac{\Psi_k(z)\Psi_k(\zeta)^{*}}{(1-z\bar\zeta)^{n-k}}\,P_k^{-\frac12}(I-\bar\zeta T^{*})^{-k+1}=\frac{P_1^{-1}}{(1-z\bar\zeta)^{n-1}}-(I-zT)^{-n+1}P_n^{-1}(I-\bar\zeta T^{*})^{-n+1}. \tag{3.28}$$

Proof. The equality
$$(I-zT)^{-k+1}P_k^{-\frac12}\,\frac{I_{\mathcal{X}}-\Psi_k(z)\Psi_k(\zeta)^{*}}{1-z\bar\zeta}\,P_k^{-\frac12}(I-\bar\zeta T^{*})^{-k+1}=(I-zT)^{-k}P_{k+1}^{-1}(I-\bar\zeta T^{*})^{-k}$$
follows immediately from (3.26) and can be written equivalently as
$$(I-zT)^{-k+1}P_k^{-\frac12}\,\frac{\Psi_k(z)\Psi_k(\zeta)^{*}}{(1-z\bar\zeta)^{n-k}}\,P_k^{-\frac12}(I-\bar\zeta T^{*})^{-k+1}$$
$$=(I-zT)^{-k+1}\frac{P_k^{-1}}{(1-z\bar\zeta)^{n-k}}(I-\bar\zeta T^{*})^{-k+1}-(I-zT)^{-k}\frac{P_{k+1}^{-1}}{(1-z\bar\zeta)^{n-k-1}}(I-\bar\zeta T^{*})^{-k}.$$
Summing up the latter equalities for $k=1,\dots,n-1$ we get (3.28). □
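The telescoping in the proof can likewise be confirmed numerically; the sketch below (illustrative scalar data, $n=4$) evaluates both sides of (3.28) at a sample pair $(z,\zeta)$:

```python
from math import sqrt

# Minimal sketch (illustrative data) of the telescoping identity (3.28) with a
# 1-dimensional state space (T = conj(z1)) and the functions Psi_k of (3.25).
n = 4
z1 = 0.4 + 0.2j
r = abs(z1) ** 2
P = lambda k: (1 - abs(0.5 * z1) ** 2) / (1 - r) ** k    # S(z) = z/2

def Psi(k, z):
    B = sqrt((1 - r) / P(k + 1))
    D = -z1 * P(k + 1) * B / sqrt(P(k))
    return D + z * sqrt(P(k)) * B / (1 - z * z1.conjugate())

z, zeta = 0.2 + 0.1j, -0.3 + 0.4j
a = 1 - z * z1.conjugate()               # the scalar (I - zT)
b = 1 - zeta.conjugate() * z1            # the scalar (I - conj(zeta) T*)
w = 1 - z * zeta.conjugate()

lhs = sum(a ** (1 - k) / sqrt(P(k)) * Psi(k, z) * Psi(k, zeta).conjugate()
          / w ** (n - k) / sqrt(P(k)) * b ** (1 - k) for k in range(1, n))
rhs = 1 / (P(1) * w ** (n - 1)) - a ** (1 - n) / P(n) * b ** (1 - n)
assert abs(lhs - rhs) < 1e-12
```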
3.5. The main result

Now we are in a position to represent the kernel (3.14) as the sum of $n$ positive kernels.

Lemma 3.7. The kernel $\widetilde K_{S,n}(z,\zeta)$ defined in (3.14) can be represented as
$$\widetilde K_{S,n}(z,\zeta)=\Phi_n(z)K_{\mathcal{E},n}(z,\zeta)\Phi_n(\zeta)^{*}+\sum_{k=1}^{n-1}\frac{\Phi_k(z)\Phi_k(\zeta)^{*}}{(1-z\bar\zeta)^{k}}, \tag{3.29}$$
where $\Phi_n$ is defined in (3.24) and where
$$\Phi_k(z)=F^S_{n-k}(z)P_{n-k}^{-\frac12}\Psi_{n-k}(z)\qquad(k=1,\dots,n-1). \tag{3.30}$$

Proof. We first divide both sides in (3.21) by $(1-z\bar\zeta)^{n-1}$ and combine the obtained identity with (3.14) to get
$$\widetilde K_{S,n}(z,\zeta)=\Phi_n(z)\,\frac{I_{\mathcal{Y}}-\mathcal{E}(z)\mathcal{E}(\zeta)^{*}}{(1-z\bar\zeta)^{n}}\,\Phi_n(\zeta)^{*}+\frac{F^S_1(z)P_1^{-1}F^S_1(\zeta)^{*}}{(1-z\bar\zeta)^{n-1}}-F^S_n(z)P_n^{-1}F^S_n(\zeta)^{*}. \tag{3.31}$$
Multiplying both sides in (3.28) by $F^S_1(z)=(E-S(z)N)(I-zT)^{-1}$ on the left and by $F^S_1(\zeta)^{*}$ on the right, we get
$$\sum_{k=1}^{n-1}F^S_k(z)P_k^{-\frac12}\,\frac{\Psi_k(z)\Psi_k(\zeta)^{*}}{(1-z\bar\zeta)^{n-k}}\,P_k^{-\frac12}F^S_k(\zeta)^{*}=\frac{F^S_1(z)P_1^{-1}F^S_1(\zeta)^{*}}{(1-z\bar\zeta)^{n-1}}-F^S_n(z)P_n^{-1}F^S_n(\zeta)^{*},$$
which being plugged into (3.31) gives the desired representation (3.29). □

Identity (3.29) means that the map
$$U\colon\ \widetilde K_{S,n}(\cdot,\zeta)y\ \mapsto\ \begin{bmatrix}K_{\mathcal{E},n}(\cdot,\zeta)\Phi_n(\zeta)^{*}y\\ (1-\cdot\,\bar\zeta)^{-(n-1)}\Phi_{n-1}(\zeta)^{*}y\\ \vdots\\ (1-\cdot\,\bar\zeta)^{-1}\Phi_1(\zeta)^{*}y\end{bmatrix}$$
extends by linearity and continuity to an isometry from $\mathcal{H}(\widetilde K_{S,n})$ to $\mathcal{H}(K_{\mathcal{E},n})\oplus\mathcal{A}_{n-1}(\mathcal{X})\oplus\dots\oplus\mathcal{A}_1(\mathcal{X})$. Furthermore, standard reproducing-kernel arguments show that $U^{*}$ is given by multiplication by the function $\Phi:=\begin{bmatrix}\Phi_n&\Phi_{n-1}&\dots&\Phi_1\end{bmatrix}$. We conclude that the meaning of the identity (3.29) is that the function $\Phi$ is a coisometric multiplier from $\mathcal{H}(K_{\mathcal{E},n})\oplus\mathcal{A}_{n-1}(\mathcal{X})\oplus\dots\oplus\mathcal{A}_1(\mathcal{X})$ onto $\mathcal{H}(\widetilde K_{S,n})$. Combining this observation with statement (3) in Lemma 3.4, we arrive at the following result.
Theorem 3.8. Let us assume that the data set of Problem 1.1 is such that the operator $P_1$ defined in (3.5) is strictly positive definite. Let $\Phi_n$ and $\Phi_k$ be defined as in (3.22) and (3.30). Then:

1. A function $f$ is a solution of Problem 1.1 (i) if and only if it is of the form
$$f=f_{\min}+\Phi_nh_n+\sum_{k=1}^{n-1}\Phi_kh_k,\qquad f_{\min}(z)=F^S_n(z)P_n^{-1}\mathbf{x}, \tag{3.32}$$
for some choice of $h_n\in\mathcal{H}(K_{\mathcal{E},n})$ and $h_k\in\mathcal{A}_k(\mathcal{X})$ for $k=1,\dots,n-1$.
2. A function $f$ is a solution of Problem 1.1 (ii) if and only if it is of the form (3.32) for some $h_n\in\mathcal{H}(K_{\mathcal{E},n})$ and $h_k\in\mathcal{A}_k(\mathcal{X})$ for $k=1,\dots,n-1$ such that
$$\|h_n\|^{2}_{\mathcal{H}(K_{\mathcal{E},n})}+\sum_{k=1}^{n-1}\|h_k\|^{2}_{\mathcal{A}_k(\mathcal{X})}\ \le\ 1-\|P_n^{-\frac12}\mathbf{x}\|^{2}.$$
In particular, such solutions exist if and only if $\|P_n^{-\frac12}\mathbf{x}\|^{2}\le 1$.

If $S\equiv 0$, then the space $\mathcal{H}(K_{S,n})$ amounts to the Bergman space $\mathcal{A}_n(\mathcal{Y})$. In this case we get from (3.3) that $N=0$ and subsequently $P_k=\mathcal{G}_{k,E,T}$ for $k=1,\dots,n$. We then have equality (3.23) with $\mathcal{E}\equiv 0$ and $\Phi_n$ an inner $\mathcal{L}(\mathcal{Y})$-valued function subject to
$$\frac{I_{\mathcal{Y}}-\Phi_n(z)\Phi_n(\zeta)^{*}}{1-z\bar\zeta}=E(I-zT)^{-1}\mathcal{G}_{1,E,T}^{-1}(I-\bar\zeta T^{*})^{-1}E^{*}.$$
In the present context, Theorem 3.8 parametrizes the solution set of the interpolation problem with operator argument in $\mathcal{A}_n(\mathcal{Y})$. The homogeneous version of this problem can be interpreted as a Beurling-type theorem for $\mathcal{A}_n(\mathcal{Y})$.
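In the simplest instance of the $S\equiv 0$ case (one simple node), an inner function satisfying the displayed identity can be taken to be the elementary Blaschke factor for $z_1$; this choice is an assumption of the sketch below, which then reduces the identity to the classical Schwarz–Pick equality (illustrative data):

```python
# Minimal sketch (illustrative data) of the S = 0 case with one simple node:
# E = 1, T = conj(z1), G_{1,E,T} = 1/(1 - |z1|**2), and we take the inner
# function to be the elementary Blaschke factor for z1 (a choice, assumed here).
z1 = 0.4 + 0.2j
r = abs(z1) ** 2
blaschke = lambda z: (z - z1) / (1 - z * z1.conjugate())

for z, zeta in ((0.2 + 0.1j, -0.3 + 0.4j), (0.5j, 0.25 + 0j)):
    lhs = (1 - blaschke(z) * blaschke(zeta).conjugate()) / (1 - z * zeta.conjugate())
    rhs = (1 - r) / ((1 - z * z1.conjugate()) * (1 - zeta.conjugate() * z1))
    assert abs(lhs - rhs) < 1e-12
```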
Another particular case, where $n=1$, partly recovers results on interpolation in de Branges–Rovnyak spaces presented in [6].
4. Examples
Parametrization (3.32) is especially transparent in case $\dim\mathcal{U}=\dim\mathcal{Y}=1$ and $\dim\mathcal{X}<\infty$. Besides, in this case (as we will see below), the matrices $P_k$ are all positive definite for $k=2,\dots,n$. The matrix $P_1$ may be singular, but in the scalar-valued case we are able to handle this option as well.

If $\dim\mathcal{X}<\infty$ and $\dim\mathcal{U}=\dim\mathcal{Y}=1$, then with respect to an appropriate basis of $\mathcal{X}$ the output stable observable pair $(E,T)$ has the following form: $T$ is a block diagonal matrix $T=\operatorname{diag}\{T_1,\dots,T_m\}$ with the diagonal block $T_i$ equal to the upper triangular $n_i\times n_i$ Jordan block with the number $\bar z_i$ (for points $z_i\in\mathbb{D}$) on the main diagonal, and $E$ is the row vector
$$E=\begin{bmatrix}E_1&\dots&E_m\end{bmatrix},\quad\text{where } E_i=\begin{bmatrix}1&0&\dots&0\end{bmatrix}\in\mathbb{C}^{1\times n_i}.$$
It is not hard to show that for $(E,T)$ as above and for every function $f$ analytic at $z_1,\dots,z_m$, the evaluation (1.7) amounts to
$$(E^{*}f)^{\wedge L}(T^{*})=\operatorname{Col}_{1\le i\le m}\operatorname{Col}_{0\le j<n_i}\frac{f^{(j)}(z_i)}{j!}. \tag{4.1}$$
If we specify the entries of the column $\mathbf{x}$ by letting
$$\mathbf{x}=\operatorname{Col}_{1\le i\le m}\operatorname{Col}_{0\le j<n_i}\,x_{ij},$$
then it is readily seen that Problem 1.1 amounts to the following Lagrange–Sylvester interpolation problem:

LSP: Given a scalar Schur-class function $S\in\mathcal{S}$, distinct points $z_1,\dots,z_m\in\mathbb{D}$ and a collection $\{x_{ij}\}$ of complex numbers, find all functions $f\in\mathcal{H}(K_{S,n})$ such that
$$f^{(j)}(z_i)/j!=x_{ij}\quad\text{for }j=0,\dots,n_i-1;\ i=1,\dots,m.$$
The auxiliary column $N^{*}$ is now defined from the derivatives of the given function $S$ via formula (4.1), and we define the matrix $P_1$ as the unique solution of the Stein equation (3.17). This matrix $P_1$ turns out to be equal to the Schwarz–Pick matrix
$$P_1=\left[\left[\frac{1}{l!\,j!}\,\frac{\partial^{\,l+j}}{\partial z^{l}\,\partial\bar\zeta^{j}}\,\frac{1-S(z)\overline{S(\zeta)}}{1-z\bar\zeta}\,\bigg|_{\substack{z=z_i\\ \zeta=z_{i'}}}\right]_{\substack{l=0,\dots,n_i-1\\ j=0,\dots,n_{i'}-1}}\right]_{i,i'=1}^{m},$$
which in turn is known to be positive definite unless $S$ is a Blaschke product of degree $d<\mathbf{n}:=n_1+\dots+n_m$, in which case $P_1$ is positive semidefinite and $\operatorname{rank}P_1=d$. It is not hard to show that the matrices $P_k$ defined in (3.5) are equal to the higher-order Schwarz–Pick matrices
$$P_k=\left[\left[\frac{1}{l!\,j!}\,\frac{\partial^{\,l+j}}{\partial z^{l}\,\partial\bar\zeta^{j}}\,\frac{1-S(z)\overline{S(\zeta)}}{(1-z\bar\zeta)^{k}}\,\bigg|_{\substack{z=z_i\\ \zeta=z_{i'}}}\right]_{\substack{l=0,\dots,n_i-1\\ j=0,\dots,n_{i'}-1}}\right]_{i,i'=1}^{m}.$$
Lemma 4.1. If a Schur-class function $S$ is not a unimodular constant, then the matrices $P_k$ are positive definite for $k\ge 2$.

Proof. By (3.7), it suffices to prove the statement for $k=2$. Observe that if $C_i\in\mathbb{C}^{1\times n_i}$ is a row vector with a nonzero left-most entry and $T_i\in\mathbb{C}^{n_i\times n_i}$ is the upper triangular Jordan block as above, then the pair $(C_i,T_i)$ is observable, in the sense that the gramian $\mathcal{G}_{1,C_i,T_i}$ is positive definite. Consequently, if $C=\begin{bmatrix}C_1&\dots&C_m\end{bmatrix}$ is a block-row vector with all blocks $C_i$ having nonzero left-most entries, and if $T=\operatorname{diag}\{T_1,\dots,T_m\}$ is the conformally decomposed block-diagonal matrix as above, then the pair $(C,T)$ is observable. In particular, we may take $C$ equal to the top row of the matrix $P_1$. The left-most entries in its blocks are equal to
$$\frac{1-S(z_1)\overline{S(z_i)}}{1-z_1\bar z_i}$$
and are nonzero unless $S$ is a unimodular constant. Thus, for this choice of $C$ we have
$$0<\mathcal{G}_{1,C,T}=\sum_{j=0}^{\infty}T^{*j}C^{*}CT^{j}\le\sum_{j=0}^{\infty}T^{*j}P_1T^{j}. \tag{4.2}$$
Since the spectral radius of $T$ is strictly less than one, it follows from the Stein identity $P_2-T^{*}P_2T=P_1$ that $P_2$ can be represented via the converging series
$$P_2=\sum_{j=0}^{\infty}T^{*j}P_1T^{j},$$
and now we conclude from (4.2) that $P_2>0$ regardless of whether $P_1$ is invertible or not. □
We now proceed to three different cases.

Case 1: $S$ is not a finite Blaschke product, or it is a finite Blaschke product of degree $\deg S>\mathbf{n}$. In this case $P_1>0$ and all the solutions $f$ of the problem LSP are given by formula (3.32), where now all the ingredients are not only explicit but also computable.

Case 2: $S$ is a Blaschke product of degree $\deg S=\mathbf{n}$. Then the matrix $P_1$ is invertible, but the associated function $\mathcal{E}$ defined by (3.20) is a unimodular constant, so that the corresponding sub-Bergman space $\mathcal{H}(K_{\mathcal{E},n})$ is trivial. Observe that in this case the formula (3.21) takes the form
$$\frac{1-S(z)\overline{S(\zeta)}}{1-z\bar\zeta}=F^S_1(z)P_1^{-1}F^S_1(\zeta)^{*}. \tag{4.3}$$
Furthermore, the parametrizing formula (3.32) does not contain the second term on the right. In particular, if $n=1$, then Problem 1.1 (i) has a unique solution.
Case 3: $\deg S=d<\mathbf{n}$. Since the matrices $P_k$ are invertible for $k=2,\dots,n$, the formula (3.32) for $f_{\min}$ as well as the formulas (3.30) for $\Phi_k$ for $k=1,\dots,n-2$ still make sense, and by the preceding analysis,
$$\frac{F^S_2(z)P_2^{-1}F^S_2(\zeta)^{*}}{(1-z\bar\zeta)^{n-2}}-F^S_n(z)P_n^{-1}F^S_n(\zeta)^{*}=\sum_{k=1}^{n-2}\frac{\Phi_k(z)\Phi_k(\zeta)^{*}}{(1-z\bar\zeta)^{k}}. \tag{4.4}$$
The only remaining question is how to modify appropriately the function $\Phi_{n-1}$.
Let us consider the conformal block decompositions
$$P_1=\begin{bmatrix}\widehat P_1&Y\\ Y^{*}&Z\end{bmatrix},\quad T=\begin{bmatrix}\widehat T&T_1\\ 0&T_2\end{bmatrix},\quad E=\begin{bmatrix}\widehat E&E_1\end{bmatrix},\quad N=\begin{bmatrix}\widehat N&N_1\end{bmatrix} \tag{4.5}$$
with $\widehat P_1,\widehat T\in\mathbb{C}^{d\times d}$ and $\widehat E,\widehat N\in\mathbb{C}^{1\times d}$, where $d:=\deg S$. Making use of (4.5) and taking advantage of the upper triangular structure of $T$, we also decompose the function $F^S_1$ as
$$F^S_1(z)=\begin{bmatrix}\widehat F^S_1(z)&\widetilde F^S_1(z)\end{bmatrix},\quad\text{where }\widehat F^S_1(z)=(\widehat E-S(z)\widehat N)(I-z\widehat T)^{-1}. \tag{4.6}$$
The block $\widehat P_1$ is the Schwarz–Pick matrix of a Blaschke product of degree $d$ based on $d$ points (counted with multiplicities), and therefore $\widehat P_1$ is invertible. On the other hand, the subproblem of LSP based on these points is of the type considered in Case 2 and thus, by virtue of (4.3), we have
$$\frac{1-S(z)\overline{S(\zeta)}}{1-z\bar\zeta}=\widehat F^S_1(z)\widehat P_1^{-1}\widehat F^S_1(\zeta)^{*}. \tag{4.7}$$
Since $\operatorname{rank}P_1=\operatorname{rank}\widehat P_1=d$, it follows that $Z=Y^{*}\widehat P_1^{-1}Y$, so that $P_1$ can be represented as
$$P_1=\begin{bmatrix}\widehat P_1^{\frac12}\\ Y^{*}\widehat P_1^{-\frac12}\end{bmatrix}\begin{bmatrix}\widehat P_1^{\frac12}&\widehat P_1^{-\frac12}Y\end{bmatrix}.$$
From this representation and from the fact that the kernel $\mathbf{K}_1(z,\zeta)$ is positive on $\mathbb{D}\times\mathbb{D}$ (see statement (2) in Lemma 3.3), we conclude that
$$F^S_1(z)\begin{bmatrix}\widehat P_1^{-1}Y\\ -I\end{bmatrix}\equiv 0$$
(indeed, the displayed column is annihilated by $P_1$, so positivity of $\mathbf{K}_1$ forces $F^S_1(z)$ to annihilate it as well), which implies that the entries $\widehat F^S_1$ and $\widetilde F^S_1$ in (4.6) are related by $\widetilde F^S_1=\widehat F^S_1\widehat P_1^{-1}Y$, so that $F^S_1(z)$ can be written as
$$F^S_1(z)=\widehat F^S_1(z)\begin{bmatrix}I&\widehat P_1^{-1}Y\end{bmatrix}. \tag{4.8}$$
We now define the function
$$\Phi_{n-1}(z):=\widehat F^S_1(z)\widehat P_1^{-\frac12}\widehat\Psi_1(z), \tag{4.9}$$
where $\widehat\Psi_1$ is the inner $\mathbb{C}^{d\times d}$-valued function given by
$$\widehat\Psi_1(z)=I+(z-\mu)\widehat P_1^{\frac12}\begin{bmatrix}I&\widehat P_1^{-1}Y\end{bmatrix}(I-zT)^{-1}P_2^{-1}(\bar\mu I-T^{*})^{-1}\begin{bmatrix}I\\ Y^{*}\widehat P_1^{-1}\end{bmatrix}\widehat P_1^{\frac12}$$
(compare with (3.27)) and satisfying the identity
$$\frac{I_d-\widehat\Psi_1(z)\widehat\Psi_1(\zeta)^{*}}{1-z\bar\zeta}=\widehat P_1^{\frac12}\begin{bmatrix}I&\widehat P_1^{-1}Y\end{bmatrix}(I-zT)^{-1}P_2^{-1}(I-\bar\zeta T^{*})^{-1}\begin{bmatrix}I\\ Y^{*}\widehat P_1^{-1}\end{bmatrix}\widehat P_1^{\frac12}$$
similar to that in (3.26). By (4.8),
$$F^S_2(z)=F^S_1(z)(I-zT)^{-1}=\widehat F^S_1(z)\begin{bmatrix}I&\widehat P_1^{-1}Y\end{bmatrix}(I-zT)^{-1},$$
which together with (4.7), (4.9) and the previous identity implies
$$F^S_2(z)P_2^{-1}F^S_2(\zeta)^{*}=\widehat F^S_1(z)\widehat P_1^{-\frac12}\,\frac{I_d-\widehat\Psi_1(z)\widehat\Psi_1(\zeta)^{*}}{1-z\bar\zeta}\,\widehat P_1^{-\frac12}\widehat F^S_1(\zeta)^{*}=\frac{1-S(z)\overline{S(\zeta)}}{(1-z\bar\zeta)^{2}}-\frac{\Phi_{n-1}(z)\Phi_{n-1}(\zeta)^{*}}{1-z\bar\zeta}.$$
Therefore the kernel (3.14) can be written as
$$\widetilde K_{S,n}(z,\zeta)=\frac{\Phi_{n-1}(z)\Phi_{n-1}(\zeta)^{*}}{(1-z\bar\zeta)^{n-1}}+\frac{F^S_2(z)P_2^{-1}F^S_2(\zeta)^{*}}{(1-z\bar\zeta)^{n-2}}-F^S_n(z)P_n^{-1}F^S_n(\zeta)^{*},$$
which together with (4.4) implies
$$\widetilde K_{S,n}(z,\zeta)=\sum_{k=1}^{n-1}\frac{\Phi_k(z)\Phi_k(\zeta)^{*}}{(1-z\bar\zeta)^{k}}.$$
Thus in the present case we have the same parametrization of the solution set, but the parameter $h_{n-1}$ is taken in $\mathcal{A}_{n-1}(\mathbb{C}^{d})$ rather than in $\mathcal{A}_{n-1}(\mathbb{C}^{\mathbf{n}})$.
5. Some open questions and directions for future work

5.1. Stein equations, inertia theorems, and associated orthogonal polynomials

In the classical setting, Stein equations and the associated inertia theorems (where the solution of the Stein equation may be indefinite) are closely connected with orthogonal polynomials with respect to an appropriate weight on the unit circle and with the location of the zeros of these polynomials (inside or outside the unit circle). Here, instead of a single Stein equation, we have a nested family of Stein equations (3.6), (3.7), conceivably associated with a family of orthogonal polynomials with respect to a weight on the unit disk rather than on the unit circle. It would be of interest to extend the classical theory to this nested/Bergman-space setting. This would follow up on one of the interests of Leonia Lerer (see, e.g., [15, 16]), to whom this paper is dedicated.
5.2. Overlapping spaces

In the classical theory of de Branges–Rovnyak spaces (see in particular Sarason's book [18]), a prominent role is played by the overlapping space
$$\mathcal{H}\left(\frac{1-S(z)S(\zeta)^{*}}{1-z\bar\zeta}\right)\cap\mathcal{H}\left(\frac{S(z)S(\zeta)^{*}}{1-z\bar\zeta}\right).$$
It would be of interest to determine whether something significant can be said about the analogous overlapping spaces
$$\mathcal{H}\left(\frac{1-S(z)S(\zeta)^{*}}{(1-z\bar\zeta)^{n}}\right)\cap\mathcal{H}\left(\frac{S(z)S(\zeta)^{*}}{(1-z\bar\zeta)^{n}}\right)$$
for $n>1$; some results in this direction appear in [19].
5.3. Characterization of de Branges–Rovnyak spaces via backward-shift operator identities

In the classical setting ($n=1$) the de Branges–Rovnyak reproducing kernel Hilbert spaces $\mathcal{H}(K_{S,1})$, and more generally reproducing kernel Hilbert spaces with kernel of the form $\frac{J-\Theta(z)J\Theta(w)^{*}}{1-z\bar w}$, can be characterized via invariance under the backward-shift operator combined with an appropriate functional identity (see Section 5.3 of [2] for a very general formulation as well as additional references and history). A natural question for future investigation is whether such a characterization can be given for reproducing kernel Hilbert spaces with kernel of the form $\frac{J-\Theta(z)J\Theta(w)^{*}}{(1-z\bar w)^{n}}$.
Acknowledgement
We would like to thank our colleague Sanne ter Horst for useful comments on anearly draft of this manuscript.
References
[1] D. Alpay and V. Bolotnikov, On tangential interpolation in reproducing kernel Hilbert modules and applications, in: Topics in Interpolation Theory (eds. H. Dym, B. Fritzsche, V. Katsnelson and B. Kirstein), pp. 37–68, OT 95, Birkhäuser, Basel, 1997.
[2] D. Arov and H. Dym, $J$-Contractive Matrix Valued Functions and Related Topics, Encyclopedia of Mathematics and its Applications 116, Cambridge University Press, 2008.
[3] J.A. Ball and V. Bolotnikov, Contractive multipliers from Hardy space to weighted Hardy space, Preprint, arXiv:1209.3690.
[4] J.A. Ball and V. Bolotnikov, Weighted Bergman spaces: shift-invariant subspaces and input/state/output linear systems, Integral Equations Operator Theory 76 (2013), no. 3, 301–356.
[5] J.A. Ball, V. Bolotnikov and S. ter Horst, Interpolation in de Branges–Rovnyak spaces, Proc. Amer. Math. Soc. 139 (2011), no. 2, 609–618.
[6] J.A. Ball, V. Bolotnikov and S. ter Horst, Abstract interpolation in vector-valued de Branges–Rovnyak spaces, Integral Equations Operator Theory 70 (2011), no. 2, 227–268.
[7] J.A. Ball, I. Gohberg, and L. Rodman, Interpolation of Rational Matrix Functions, OT 45, Birkhäuser Verlag, 1990.
[8] J.A. Ball and M.W. Raney, Discrete-time dichotomous well-posed linear systems and generalized Schur–Nevanlinna–Pick interpolation, Complex Anal. Oper. Theory 1 (2007), 1–54.
[9] F. Beatrous and J. Burbea, Positive-definiteness and its applications to interpolation problems for holomorphic functions, Trans. Amer. Math. Soc. 284 (1984), no. 1, 247–270.
[10] V. Bolotnikov and L. Rodman, Remarks on interpolation in reproducing kernel Hilbert spaces, Houston J. Math. 30 (2004), no. 2, 559–576.
[11] L. de Branges and J. Rovnyak, Canonical models in quantum scattering theory, in: Perturbation Theory and its Applications in Quantum Mechanics (ed. C. Wilcox), pp. 295–392, Wiley, New York, 1966.
[12] L. de Branges and J. Rovnyak, Square Summable Power Series, Holt, Rinehart and Winston, New York, 1966.
[13] R.G. Douglas, On majorization, factorization, and range inclusion of operators on Hilbert space, Proc. Amer. Math. Soc. 17 (1966), 413–415.
[14] O. Giselsson and A. Olofsson, On some Bergman shift operators, Complex Anal. Oper. Theory 6 (2012), 829–842.
[15] L. Lerer and A.C.M. Ran, A new inertia theorem for Stein equations, inertia of invertible Hermitian block Toeplitz matrices and matrix orthogonal polynomials, Integral Equations Operator Theory 47 (2003), no. 3, 339–360.
Interpolation in Sub-Bergman Spaces 39
[16] L. Lerer, I. Margulis, and A.C.M. Ran, Inertia theorems based on operator Lyapunov equations, Oper. Matrices 2 (2008), no. 2, 153–166.
[17] M. Rosenblum and J. Rovnyak, Hardy Classes and Operator Theory, Oxford University Press, 1985.
[18] D.E. Sarason, Sub-Hardy Hilbert Spaces in the Unit Disk, John Wiley & Sons, Inc., New York, 1994.
[19] S. Sultanic, Sub-Bergman Hilbert spaces, J. Math. Anal. Appl. 324 (2006), no. 1, 639–649.
[20] K. Zhu, Sub-Bergman Hilbert spaces on the unit disk, Indiana Univ. Math. J. 45 (1996), no. 1, 165–176.
[21] K. Zhu, Sub-Bergman Hilbert spaces on the unit disk. II, J. Funct. Anal. 202 (2003), no. 2, 327–341.
Joseph A. Ball
Department of Mathematics
Virginia Tech
Blacksburg, VA 24061-0123, USA
e-mail: [email protected]
Vladimir Bolotnikov
Department of Mathematics
The College of William and Mary
Williamsburg, VA 23187-8795, USA
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 41–78
© 2013 Springer Basel
Zero Sums of Idempotents and Banach Algebras Failing to be Spectrally Regular
H. Bart, T. Ehrhardt and B. Silbermann
Dedicated to Leonia Lerer, in celebration of his seventieth birthday
Abstract. A large class of Banach algebras is identified allowing for non-trivial zero sums of idempotents, hence failing to be spectrally regular. Belonging to it are the $C^*$-algebras known under the name Cuntz algebras. Other Banach algebras lying in the class are those of the form $\mathcal{L}(X)$ with $X$ a (non-trivial) Banach space isomorphic to a (finite) direct sum of at least two copies of $X$. There do exist (somewhat exotic) Banach spaces for which $\mathcal{L}(X)$ is spectrally regular.
Mathematics Subject Classification (2010). Primary 46H99, 47C15; Secondary30G30, 46E15.
Keywords. Logarithmic residue, spectral regularity, (zero) sum of idempotents,Cuntz algebra, space of (bounded) continuous functions, Cantor type set.
1. Introduction
All algebras considered in this paper are assumed to be associative. A logarithmic residue is a contour integral of the type
$$\frac{1}{2\pi i}\int_{\partial\Delta} f'(\lambda)f(\lambda)^{-1}\,d\lambda, \qquad (1)$$
where the analytic function $f$ has its values in a unital complex Banach algebra $\mathcal{B}$ and $\partial\Delta$ is a suitable contour in the complex plane $\mathbb{C}$, in fact the positively oriented boundary of a Cauchy domain $\Delta$. In the scalar case $\mathcal{B} = \mathbb{C}$, the expression (1) is equal to the number of zeros of $f$ in $\Delta$, multiplicities of course taken into account. Thus, in that situation, the integral (1) vanishes if and only if $f$ takes non-zero values, not only on $\partial\Delta$ (which has been implicitly assumed in order to let (1) make sense) but on all of $\Delta$. This state of affairs leads to the following question: if for a Banach algebra-valued analytic function $f$ the integral (1) vanishes, can one conclude that $f$ takes invertible values on all of $\Delta$?
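The zero-counting interpretation of (1) in the scalar case can be checked numerically. The sketch below (Python with NumPy; the zeros $0.3$, $-0.5i$ inside and $2.0$ outside the unit circle are purely illustrative choices of ours) approximates the integral over the positively oriented unit circle by a uniform Riemann sum and recovers the number of zeros inside.

```python
import numpy as np

# Scalar case B = C: the logarithmic residue over the positively oriented
# unit circle counts the zeros of f inside, with multiplicity.
a1, a2, a3 = 0.3, -0.5j, 2.0            # illustrative zeros; a1, a2 lie inside
f = lambda z: (z - a1) * (z - a2) * (z - a3)
df = lambda z: ((z - a2) * (z - a3) + (z - a1) * (z - a3)
                + (z - a1) * (z - a2))  # product rule

N = 2000                                 # uniform grid on the contour
t = 2 * np.pi * np.arange(N) / N
lam = np.exp(1j * t)                     # lambda on the unit circle
integrand = df(lam) / f(lam) * 1j * lam  # f'(lam) f(lam)^{-1} dlam/dt
count = integrand.sum() * (2 * np.pi / N) / (2j * np.pi)
assert abs(count - 2) < 1e-8             # exactly the two zeros inside
```

For a periodic analytic integrand the uniform Riemann sum converges spectrally fast, so already a modest grid reproduces the integer value to machine accuracy.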
There are many situations where the answer to this question is positive (see [6], [8], [9], [11], [12], [13] and [15]); in general it is negative, however. The Banach algebra $\mathcal{L}(\ell^2)$ of all bounded linear operators on $\ell^2$ is a counterexample (see [5] and [6]). This comes about from the fact that $\mathcal{L}(\ell^2)$ allows for non-trivial zero sums of idempotents, i.e., finite collections of non-zero idempotents adding up to zero. Indeed, for Banach algebras featuring that phenomenon the answer to the above question is always negative.
For a long time, $\mathcal{L}(\ell^2)$ was basically the only known counterexample in connection with the issue stated above. In this paper, the existence of non-trivial zero sums of idempotents will be established for a large class of Banach algebras. Belonging to it are the $C^*$-algebras known under the name Cuntz algebras. Other Banach algebras lying in the class are those of the form $\mathcal{L}(X)$ with $X$ a non-trivial Banach space isomorphic to a (finite) direct sum of at least two copies of $X$. Here $\mathcal{L}(X)$ stands for the Banach algebra of bounded linear operators on $X$.
This brings us to a description of the contents of the different sections to be found below. Apart from the introduction contained in Section 1 and the list of references, the paper consists of seven sections. Section 2 is of a preliminary nature and introduces the main concepts. Among them is that of spectral regularity of a Banach algebra, meaning basically that for the algebra in question the answer to the question formulated above is always positive. It is also recalled from [6] that a spectrally regular Banach algebra does not allow for non-trivial zero sums of idempotents. Section 3 contains information about zero sums of idempotents in general algebras. Special attention is paid to sums involving a small number of terms. Section 4 gives a criterion for a Banach algebra to have a non-trivial zero sum of idempotents, hence failing to be spectrally regular. The pertinent condition is phrased in terms reminiscent of the defining characteristic for the $C^*$-algebras called Cuntz algebras, but there is no restriction here to the $C^*$-context (as is the case in [15]). Sections 5 and 6 deal with the situation where the Banach algebra under consideration is of the form $\mathcal{L}(X)$ with $X$ an infinite-dimensional Banach space. The idempotents in $\mathcal{L}(X)$ then are the projections in $X$. There do exist infinite-dimensional Banach spaces $X$ for which $\mathcal{L}(X)$ is spectrally regular so that non-trivial zero sums of projections in $X$ do not exist. This follows by combining results from [11] with the quite remarkable examples given in [2], [22] and [1]. Section 5 is concerned with the situation where a finite number of projections add up to an operator which is quasinilpotent or compact. For non-trivial sums of that type, the conclusion is drawn that they involve at least five projections with infinite rank and co-rank. At the end of the section, the following embedding issue comes up: for Banach spaces $Y$ and $X$, when can $\mathcal{L}(Y)$ be viewed as a continuously embedded subalgebra of $\mathcal{L}(X)$? One simple case in which there is a positive answer is identified. Generally speaking, however, the issue seems to be rather non-trivial. In Section 6, the criterion exhibited in Section 3 leads to the conclusion that $\mathcal{L}(X)$ allows for non-trivial zero sums of projections, hence is not spectrally regular, whenever $X$ is non-trivial and isomorphic to a (finite) direct sum of at least two copies of itself. In Sections 7 and 8, this is used to identify new
examples of Banach algebras lacking the property of being spectrally regular. Some of these relate to deep problems in general topology and the geometry of Banach spaces. This is especially true for Banach algebras of the form $\mathcal{L}(X)$ with $X$ taken to be a Banach space of bounded continuous functions. For instance, using a truly remarkable theorem of A.A. Miljutin [28], the following result is obtained: if $K$ is an uncountable compact metrizable topological space and $C(K;\mathbb{C})$ stands for the Banach space of all complex continuous functions on $K$ (endowed with the max-norm), then the operator algebra $\mathcal{L}(C(K;\mathbb{C}))$ allows for non-trivial zero sums of projections, hence is not spectrally regular. In two examples given towards the end of the paper, a (generalizing) modification of the well-known Cantor construction plays an important role.
One final remark to close the introduction. The expression (1) defines the left logarithmic residue of $f$. There is also a right version, obtained by replacing the left logarithmic derivative $f'(\lambda)f(\lambda)^{-1}$ by the right logarithmic derivative $f(\lambda)^{-1}f'(\lambda)$. For some special cases, the relationship between left logarithmic residues and right logarithmic residues has been investigated: see [7], [8], [9], and [11]. As far as the issues considered in the present paper are concerned, the results that can be obtained for the left and the right version of the logarithmic residue are analogous to each other. Therefore the qualifiers left and right will be suppressed in what follows.
2. Preliminaries
A spectral configuration is a triple $(\mathcal{B}, \Delta, f)$ where $\mathcal{B}$ is a non-trivial unital complex Banach algebra, $\Delta$ is a bounded Cauchy domain in $\mathbb{C}$ (see [34] or [20]) and $f$ is a $\mathcal{B}$-valued analytic function defined on an open neighborhood of the closure of $\Delta$ and having invertible values on all of the boundary $\partial\Delta$ of $\Delta$. With such a spectral configuration, taking $\partial\Delta$ to be positively oriented, one can associate the contour integral
$$LR(f;\Delta) = \frac{1}{2\pi i}\int_{\partial\Delta} f'(\lambda)f(\lambda)^{-1}\,d\lambda.$$
We call it the logarithmic residue associated with $(\mathcal{B}, \Delta, f)$; sometimes the term logarithmic residue of $f$ with respect to $\Delta$ is used as well.
The spectral configuration $(\mathcal{B}, \Delta, f)$ is called winding free when the logarithmic residue $LR(f;\Delta) = 0$, spectrally winding free if $LR(f;\Delta)$ is quasinilpotent, and spectrally trivial in case $f$ takes invertible values on all of $\Delta$. This terminology is taken from [13].
In [13] a unital Banach algebra $\mathcal{B}$ is said to be spectrally regular if a spectral configuration having $\mathcal{B}$ as the underlying Banach algebra is spectrally trivial whenever it is spectrally winding free. Here we shall work with a (possibly) somewhat weaker notion. We shall call $\mathcal{B}$ spectrally regular if a spectral configuration having $\mathcal{B}$ as the underlying Banach algebra is spectrally trivial whenever it is winding free. Whether this notion is strictly weaker than the one employed (under the same name) in [13] is not known. It makes sense to adopt the weaker form of spectral regularity here because in this paper we are (mainly) interested in "negative results", i.e., in results having as conclusion that the Banach algebra under consideration fails to be spectrally regular. In that case we call it spectrally irregular. So the unital Banach algebra $\mathcal{B}$ is spectrally irregular if and only if there exists a spectral configuration having $\mathcal{B}$ as the underlying Banach algebra which is winding free but not spectrally trivial.
Closely connected to the issue of spectral (ir)regularity is that of zero sums of idempotents. Let $\mathcal{A}$ be an algebra (possibly without norm). A (finite) number of idempotents $p_1, \ldots, p_k$ in $\mathcal{A}$ are said to form a zero sum if they add up to the zero element in $\mathcal{A}$, i.e., if $p_1 + \cdots + p_k = 0$. The zero sum is called trivial when $p_j = 0$, $j = 1, \ldots, k$. Non-triviality of a zero sum of idempotents then means that at least one among the idempotents involved does not vanish. By leaving out the zero terms, such a non-trivial zero sum can be transformed into a genuine zero sum, that is one where all terms are non-zero.

In line with what has been the case up to now, here too spectral irregularity will be brought to light via the construction of non-trivial zero sums of idempotents. The background for this is the following basic result taken from [5].
Theorem 2.1. Let $\mathcal{B}$ be a unital Banach algebra and let $p_1, \ldots, p_k$ be idempotents in $\mathcal{B}$. If $\mathcal{B}$ is spectrally regular and $p_1 + \cdots + p_k = 0$, then $p_j = 0$, $j = 1, \ldots, k$.
We say that a Banach algebra $\mathcal{B}$ has the non-trivial zero sum property if there exist a positive integer $k$ and non-zero idempotents $p_1, \ldots, p_k$ in $\mathcal{B}$ such that $p_1 + \cdots + p_k = 0$. The above theorem can then be read as follows: if $\mathcal{B}$ has the non-trivial zero sum property, then $\mathcal{B}$ is spectrally irregular; schematically:
$$\text{non-trivial zero sum property} \;\Rightarrow\; \text{spectral irregularity}.$$
It is an open problem whether or not the reverse implication is valid too. No example is known of a spectrally irregular Banach algebra which fails to have the non-trivial zero sum property, so of a spectrally irregular Banach algebra allowing for trivial zero sums of idempotents only.
We close this section by mentioning that the issue of spectral (ir)regularity is closely related to non-commutative Gelfand theory (cf. [31]). In this connection families of matrix representations play an important role: the existence of certain specific families for a Banach algebra $\mathcal{B}$ implies spectral regularity of $\mathcal{B}$ (see [13] and [14]). The strongest result of this type is that $\mathcal{B}$ is spectrally regular when $\mathcal{B}$ allows for a radical-separating family of matrix representations. Here a family $\{\phi_\omega : \mathcal{B} \to \mathbb{C}^{m_\omega \times m_\omega}\}_{\omega \in \Omega}$ is called radical-separating if it separates the points of $\mathcal{B}$ modulo the radical of $\mathcal{B}$. Thus spectral irregularity brings with it that $\mathcal{B}$ does not allow for a radical-separating family of matrix representations; schematically:
$$\text{non-trivial zero sum property} \;\Rightarrow\; \text{spectral irregularity} \;\Rightarrow\; \text{absence of radical-separating families}.$$
The absence of radical-separating families (so a fortiori having the non-trivial zero sum property) implies the non-existence of the families of other types considered in [13] and [14].
3. Sums of small numbers of idempotents
In this section we focus on (zero) sums of a small number of idempotents (not counting the possibly repeated occurrence of the unit element). To put the (new) material to be presented into proper perspective, we first recall some known facts (see [5]).
If $\mathcal{A}$ is any algebra (normable or not), then a zero sum of three idempotents in $\mathcal{A}$ is necessarily trivial. There is an example of an algebra which allows for a genuine zero sum of four idempotents. The algebra in question is, however, non-normable. It must be because, as Theorem 4.3 from [5] asserts, zero sums of four idempotents in a normed algebra are always trivial. We shall return to this (non-trivial) result in a moment. In [30] it is shown that every bounded linear operator on the separable Hilbert space $\ell^2$ can be written as the sum of five idempotents in the Banach algebra $\mathcal{L}(\ell^2)$ of all bounded linear operators on $\ell^2$. An immediate consequence of this is that $\mathcal{L}(\ell^2)$ allows for non-trivial zero sums of six idempotents. In fact one can make do with one less: the five idempotents constructed in [5], Example 3.1 yield a genuine zero sum in $\mathcal{L}(\ell^2)$.
We now return to Theorem 4.3 from [5]. As was already indicated, the theorem says that a zero sum of four idempotents in a Banach algebra (or, more generally, a normed algebra) is always trivial. Here is an extension of this result.
Theorem 3.1. Let $p_1, p_2, p_3$ and $p_4$ be idempotents in a non-trivial Banach algebra $\mathcal{B}$ with unit element $e_{\mathcal{B}}$, and let $m$ be a non-negative integer. If
$$p_1 + p_2 + p_3 + p_4 + m e_{\mathcal{B}} = 0,$$
then $m = 0$ and $p_1 = p_2 = p_3 = p_4 = 0$.
Theorem 4.3 in [5] corresponds to the case where the integer $m$ is a priori assumed to be zero. As to the proof of Theorem 3.1, we shall first show that $m = 0$, which brings us in the situation considered in [5]. Then we shall extend the argument to cover the case $m = 0$ too. In this way, a new proof of Theorem 4.3 in [5] is obtained which is more transparent than the original one. The argument given in [5], although conceptually elementary, is technically quite complicated. The reasoning presented below is suggested by the material in [19], Section 3. The spectrum of an element $x \in \mathcal{B}$ is denoted by $\sigma(x)$.
Proof. As already indicated we first show that $m = 0$. Put $x_1 = p_1 + p_2 - e_{\mathcal{B}}$ and $x_2 = p_3 + p_4 - e_{\mathcal{B}}$. Then, by Lemma 3 in [19],
$$\lambda \in \sigma(x_j) \setminus \{-1, +1\} \;\Longrightarrow\; -\lambda \in \sigma(x_j) \setminus \{-1, +1\}, \qquad j = 1, 2.$$
46 H. Bart, T. Ehrhardt and B. Silbermann
As $x_1 + x_2 + (m+2)e_{\mathcal{B}} = 0$, we also have
$$\lambda \in \sigma(x_1) \;\Longrightarrow\; -(\lambda + m + 2) \in \sigma(x_2).$$
Introduce the set of (negative) integers
$$M = \{-m-1,\ -2m-3,\ -3m-5,\ \ldots\} \cup \{-m-3,\ -2m-5,\ -3m-7,\ \ldots\},$$
and assume $\sigma(x_1)$ contains an element $\lambda$ which is not in $M$. Then $-\lambda - m - 2$ belongs to $\sigma(x_2)$ and $\lambda + m + 2 \in \sigma(x_2)$ provided that $-\lambda - m - 2 \notin \{-1, +1\}$. The latter comes down to $\lambda \neq -m-1, -m-3$ which certainly holds because $\lambda$ is not in $M$. So $\lambda + m + 2 \in \sigma(x_2)$. But then $-\lambda - 2m - 4 \in \sigma(x_1)$, and we get $\lambda + 2m + 4 \in \sigma(x_1)$ because $-\lambda - 2m - 4 \notin \{-1, +1\}$. Proceeding in this way (formally by induction of course), we obtain $\lambda + 2k(m+2) \in \sigma(x_1)$, $k = 0, 1, 2, \ldots$. As $m + 2$ is positive, this conflicts with the boundedness of $\sigma(x_1)$ implied by the non-triviality of $\mathcal{B}$. The conclusion is that $\sigma(x_1) \subseteq M$.

Next assume that there is an element $\lambda \in \sigma(x_1)$ which does not belong to $-M \cup \{-1, +1\}$. As $\lambda$ differs from $-1$ and $+1$, we have $-\lambda \in \sigma(x_1)$, and this implies $-\lambda \in M$. But then $\lambda \in -M$ and we have a contradiction. The conclusion is that $\sigma(x_1) \subseteq -M \cup \{-1, +1\}$.

The upshot of these arguments is that $\sigma(x_1) \subseteq M \cap [-M \cup \{-1, +1\}]$. Now $M$ consists of negative integers, so $M$ and $-M$ are disjoint. Hence $\sigma(x_1)$ is contained in $M \cap \{-1, +1\}$, and it follows that the latter is non-empty. Thus either $-1$ or $+1$ must belong to $M$, and this is the case only when $m = 0$, as desired.
In this way we have arrived at the situation considered in [5], Theorem 4.3: we have four idempotents $p_1, p_2, p_3$ and $p_4$ in a Banach algebra adding up to the zero element. The following argument provides a new (and more transparent) proof for the conclusion of [5], namely that $p_1 = p_2 = p_3 = p_4 = 0$.
With $x_1 = p_1 + p_2 - e_{\mathcal{B}}$ and $x_2 = p_3 + p_4 - e_{\mathcal{B}}$ as before, we have $x_1 + x_2 + 2e_{\mathcal{B}} = 0$, and so $\lambda \in \sigma(x_1) \Rightarrow -(\lambda + 2) \in \sigma(x_2)$. Also, with $m = 0$, the set $M$ introduced above becomes $M = \{-1, -3, -5, \ldots\}$. As we have seen above, $\sigma(x_1) \subseteq M \cap \{-1, +1\}$, and it follows that $\sigma(x_1) = \{-1\}$. We shall now investigate the behavior of the resolvent $(\lambda e_{\mathcal{B}} - x_1)^{-1}$ which is defined and analytic on $\mathbb{C} \setminus \{-1\}$.
Take $\lambda \neq -1, +1$, and put $y = p_1 - p_2$. As has been observed in [19], and can be easily verified, $x_1^2 + y^2 = e_{\mathcal{B}}$ and $yx_1 = -x_1y$. With the help of these identities, one easily obtains the identities $y = (\lambda e_{\mathcal{B}} - x_1)y(\lambda e_{\mathcal{B}} + x_1)^{-1}$ and
$$\begin{aligned}
(\lambda e_{\mathcal{B}} - x_1)\bigl(\lambda e_{\mathcal{B}} + x_1 - y(\lambda e_{\mathcal{B}} + x_1)^{-1}y\bigr)
&= \lambda^2 e_{\mathcal{B}} - x_1^2 - (\lambda e_{\mathcal{B}} - x_1)y(\lambda e_{\mathcal{B}} + x_1)^{-1}y \\
&= \lambda^2 e_{\mathcal{B}} - x_1^2 - y^2 \\
&= (\lambda^2 - 1)e_{\mathcal{B}}.
\end{aligned}$$
Dividing by $\lambda^2 - 1$ we get
$$(\lambda e_{\mathcal{B}} - x_1)\left(\frac{1}{\lambda^2 - 1}\bigl(\lambda e_{\mathcal{B}} + x_1 - y(\lambda e_{\mathcal{B}} + x_1)^{-1}y\bigr)\right) = e_{\mathcal{B}}. \qquad (2)$$
In the same way one proves that, again for $\lambda \neq -1, +1$,
$$\left(\frac{1}{\lambda^2 - 1}\bigl(\lambda e_{\mathcal{B}} + x_1 - y(\lambda e_{\mathcal{B}} + x_1)^{-1}y\bigr)\right)(\lambda e_{\mathcal{B}} - x_1) = e_{\mathcal{B}},$$
and, in combination with (2), this leads to
$$(\lambda e_{\mathcal{B}} - x_1)^{-1} = \frac{1}{\lambda^2 - 1}\bigl(\lambda e_{\mathcal{B}} + x_1 - y(\lambda e_{\mathcal{B}} + x_1)^{-1}y\bigr).$$
Now $1 \notin \sigma(x_1)$, so $(\lambda e_{\mathcal{B}} + x_1)^{-1}$ is analytic in a neighborhood of $-1$, and it follows that $(\lambda e_{\mathcal{B}} - x_1)^{-1}$ has a simple pole at $-1$. But then $\bigl(\lambda e_{\mathcal{B}} - (e_{\mathcal{B}} + x_1)\bigr)^{-1}$ has a simple pole at the origin. From $\sigma(e_{\mathcal{B}} + x_1) = \{0\}$, we see that $e_{\mathcal{B}} + x_1$ is quasinilpotent. Hence, by standard spectral theory,
$$\bigl(\lambda e_{\mathcal{B}} - (e_{\mathcal{B}} + x_1)\bigr)^{-1} = \frac{1}{\lambda}\,e_{\mathcal{B}}, \qquad \lambda \neq 0.$$
Thus $\lambda e_{\mathcal{B}} - (e_{\mathcal{B}} + x_1) = \lambda e_{\mathcal{B}}$ for all complex $\lambda$, and so $e_{\mathcal{B}} + x_1 = 0$. In other words $p_1 + p_2 = 0$. From $p_1 = p_1^2 = (-p_2)^2 = p_2^2 = p_2 = -p_1$ it is now clear that $p_1 = p_2 = 0$. In the same way one shows that $p_3$ and $p_4$ vanish too. □
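The algebraic identities driving this proof can be probed numerically. In the NumPy sketch below, the two concrete $2 \times 2$ idempotents $p_1$, $p_2$ are an arbitrary illustrative choice of ours (not taken from the paper); the code checks $x_1^2 + y^2 = e$, $yx_1 = -x_1y$, and the resolvent formula at a sample point $\lambda = 2$.

```python
import numpy as np

# Illustrative idempotents: p @ p = p, but neither is an orthogonal projection.
p1 = np.array([[1.0, 1.0], [0.0, 0.0]])
p2 = np.array([[0.0, 0.0], [1.0, 1.0]])
I = np.eye(2)
x1, y = p1 + p2 - I, p1 - p2           # as in the proof

assert np.allclose(x1 @ x1 + y @ y, I)  # x1^2 + y^2 = e
assert np.allclose(y @ x1, -x1 @ y)     # y x1 = -x1 y  (anticommutation)

# the resolvent formula, checked at a point lam different from -1, +1
lam = 2.0
lhs = np.linalg.inv(lam * I - x1)
rhs = (lam * I + x1 - y @ np.linalg.inv(lam * I + x1) @ y) / (lam**2 - 1)
assert np.allclose(lhs, rhs)
```

Since the derivation of (2) uses only the two identities above, the resolvent formula holds at every $\lambda$ for which $\lambda e_{\mathcal{B}} + x_1$ is invertible, which is what the numerical check exploits.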
Corollary 3.2. Let $p_1, p_2, p_3$ and $p_4$ be idempotents in a Banach algebra $\mathcal{B}$ with unit element $e_{\mathcal{B}}$, let $m$ be a non-negative integer, and let $\mathcal{J}$ be a proper closed two-sided ideal in $\mathcal{B}$. If
$$p_1 + p_2 + p_3 + p_4 + m e_{\mathcal{B}} \in \mathcal{J},$$
then $m = 0$ and $p_1, p_2, p_3, p_4 \in \mathcal{J}$.
Proof. Pass to the quotient algebra $\mathcal{B}/\mathcal{J}$ (which is non-trivial because $\mathcal{J}$ is proper), and apply Theorem 3.1 with $\mathcal{B}$ replaced by $\mathcal{B}/\mathcal{J}$. □
4. Banach algebras of Cuntz type
The material in the next paragraph is presented by way of motivation for a definition that will be given below. The symbol $\delta_{j,k}$ stands for the Kronecker delta.
The Cuntz algebra $\mathcal{O}_n$ is the universal unital $C^*$-algebra generated by $n$ isometries $v_1, \ldots, v_n \in \mathcal{O}_n$ satisfying the identities
$$\sum_{j=1}^{n} v_j v_j^* = e, \qquad v_j^* v_k = \delta_{j,k}\, e, \quad j, k = 1, \ldots, n, \qquad (3)$$
where $e$ is the unit element in $\mathcal{O}_n$. Here $n$ is an integer larger than or equal to 2. The first to consider this algebra was J. Cuntz [17]. The Cuntz algebras are universal in the sense that for fixed $n$, any two concrete realizations generated by isometries $v_1, \ldots, v_n$ and $\tilde{v}_1, \ldots, \tilde{v}_n$, respectively, are $*$-isomorphic to each other (cf. [17], [18]). For completeness, and to make the proper connection with [17], we note that the relations (3) come down to the same as
$$\sum_{j=1}^{n} v_j v_j^* = e, \qquad v_j^* v_j = e, \quad j = 1, \ldots, n.$$
To see this, multiply the first part of (3) from the left with $v_k^*$ and from the right with $v_k$, and recall that a sum of nonnegative elements in a $C^*$-algebra can only vanish when so do all its terms.
Returning to general Banach algebras (not necessarily $C^*$), we stipulate that a non-trivial unital Banach algebra $\mathcal{B}$ will be said to have the Cuntz $n$-property if $n$ is an integer larger than one and there exist elements $v_1, \ldots, v_n, w_1, \ldots, w_n$ in $\mathcal{B}$ such that
$$\sum_{j=1}^{n} v_j w_j = e_{\mathcal{B}}, \qquad w_j v_k = \delta_{j,k}\, e_{\mathcal{B}}, \quad j, k = 1, \ldots, n. \qquad (4)$$
We emphasize that this definition does not imply that the algebra is generated by the elements $v_1, \ldots, v_n, w_1, \ldots, w_n$. The following statement can be easily verified. If, for $s = 1, 2$, the Banach algebra $\mathcal{B}$ has the Cuntz $n_s$-property, then $\mathcal{B}$ has the Cuntz $(n_1 + n_2 - 1)$-property; hence, if $\mathcal{B}$ has the Cuntz $n$-property, then $\mathcal{B}$ has the Cuntz $(kn - k + 1)$-property for each positive integer $k$. The argument is as follows. Suppose $v_{s,1}, \ldots, v_{s,n_s}, w_{s,1}, \ldots, w_{s,n_s} \in \mathcal{B}$ satisfy the identities
$$\sum_{j=1}^{n_s} v_{s,j} w_{s,j} = e_{\mathcal{B}}, \qquad w_{s,j} v_{s,k} = \delta_{j,k}\, e_{\mathcal{B}}, \quad j, k = 1, \ldots, n_s.$$
For $j = 1, \ldots, n_1$, write $v_j = v_{2,1} v_{1,j}$, $w_j = w_{1,j} w_{2,1}$. Also, for the values of $j$ ranging from $n_1 + 1$ up to $n_1 + n_2 - 1$, put $v_j = v_{2,j-n_1+1}$ and $w_j = w_{2,j-n_1+1}$. Then (4) holds with $n = n_1 + n_2 - 1$.
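A standard concrete realization of relations of the form (4) lives on $\ell^2$: take $v_j$ to be the isometry sending the basis vector with index $k$ to the one with index $nk + j - 1$, and $w_j = v_j^*$. The Python sketch below (the sparse-vector encoding is an implementation choice of ours, not from the paper) verifies the relations for $n = 2$ and then verifies the composed family of $n_1 + n_2 - 1 = 3$ elements built exactly as in the argument above.

```python
from typing import Dict

Vec = Dict[int, float]  # sparse vector in l^2: index -> coefficient

def v(j: int, n: int, x: Vec) -> Vec:
    """The isometry v_j (1 <= j <= n): basis index k goes to n*k + j - 1."""
    return {n * k + j - 1: c for k, c in x.items()}

def w(j: int, n: int, x: Vec) -> Vec:
    """w_j = v_j^*: index m goes to (m-j+1)/n when m = j-1 (mod n), else 0."""
    return {(m - j + 1) // n: c for m, c in x.items() if m % n == j - 1}

e = {k: 1.0 for k in range(12)}  # a finite chunk of a test vector

# relations (4) for n = 2:  w_j v_k = delta_{jk} e  and  sum_j v_j w_j = e
assert w(1, 2, v(1, 2, e)) == e and w(2, 2, v(2, 2, e)) == e
assert w(1, 2, v(2, 2, e)) == {} and w(2, 2, v(1, 2, e)) == {}
total: Vec = {}
for j in (1, 2):
    total.update(v(j, 2, w(j, 2, e)))
assert total == e

# composed family with n1 = n2 = 2, so n = 3, built as in the text:
# u_1 = v_{2,1} v_{1,1},  u_2 = v_{2,1} v_{1,2},  u_3 = v_{2,2}
U = [lambda x, i=i: v(1, 2, v(i, 2, x)) for i in (1, 2)] + [lambda x: v(2, 2, x)]
W = [lambda x, i=i: w(i, 2, w(1, 2, x)) for i in (1, 2)] + [lambda x: w(2, 2, x)]

for a in range(3):
    for b in range(3):
        assert W[a](U[b](e)) == (e if a == b else {})
total = {}
for j in range(3):
    total.update(U[j](W[j](e)))
assert total == e
```

The first relation in (4) amounts to the images of the $v_j$ partitioning the index set, which is exactly what the shift maps $k \mapsto nk + j - 1$ achieve.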
A non-trivial unital Banach algebra $\mathcal{B}$ is said to be of Cuntz type if $\mathcal{B}$ has the Cuntz $n$-property for some integer $n$ larger than one. Such an algebra necessarily is non-commutative.
Theorem 4.1. Let $m$ be an integer larger than or equal to five, let $\mathcal{B}$ be a non-trivial unital Banach algebra, and suppose $\mathcal{B}$ is of Cuntz type. Then $\mathcal{B}$ allows for a zero sum of $m$ non-zero idempotents.
Thus Banach algebras of Cuntz type have the non-trivial zero sum property introduced in Section 2 and are (therefore) spectrally irregular. As has been mentioned in Section 3, in a Banach algebra non-trivial zero sums of idempotents involving fewer than five idempotents cannot exist.
Proof. We shall break up the reasoning into seven steps. In the first, some preparatory action is taken.

Step 1. If $A_1, \ldots, A_s$ are matrices, their direct sum will be denoted by $A_1 \oplus \cdots \oplus A_s$ or $\bigoplus_{j=1}^{s} A_j$. So $\bigoplus_{j=1}^{s} A_j$ is the block diagonal matrix with $A_1, \ldots, A_s$ on the diagonal (in this order). In case all the matrices $A_j$ coincide with a single matrix $A$, we use $A^{\oplus s}$ for $\bigoplus_{j=1}^{s} A_j = A \oplus \cdots \oplus A$ (with $s$ terms in the direct sum). If $A$ is the sum of $m$ non-zero idempotents, then so is $A^{\oplus s}$. Indeed, if $A = \sum_{i=1}^{m} p_i$, then $A^{\oplus s} = \sum_{i=1}^{m} p_i^{\oplus s}$.
Whenever convenient, complex matrices will be identified with matrices having entries in $\mathcal{B}$. The identification goes via "tensorizing" with $e_{\mathcal{B}}$, i.e., via replacing each scalar entry by the corresponding multiple of the unit element in $\mathcal{B}$.

Step 2. Consider the matrices
$$Q_1 = \frac{1}{2}\begin{bmatrix} -5 & 35 \\ -1 & 7 \end{bmatrix}, \qquad Q_2 = \frac{1}{2}\begin{bmatrix} -5 & -35 \\ 1 & 7 \end{bmatrix},$$
$$P_1 = \frac{1}{6}\begin{bmatrix} 5 & 15 & -1 \\ 5 & 15 & -1 \\ 70 & 210 & -14 \end{bmatrix}, \qquad P_2 = \frac{1}{6}\begin{bmatrix} 5 & -15 & -1 \\ -5 & 15 & 1 \\ 70 & -210 & -14 \end{bmatrix}, \qquad P_3 = \frac{1}{3}\begin{bmatrix} 10 & 0 & 1 \\ 0 & 0 & 0 \\ -70 & 0 & -7 \end{bmatrix}.$$
These are idempotents, regardless of whether they are considered as complex matrices (in which case they have rational entries and rank 1), or as matrices with entries in $\mathcal{B}$. Also
$$Q_1 + Q_2 = \begin{bmatrix} -5 & 0 \\ 0 & 7 \end{bmatrix}, \qquad P_1 + P_2 + P_3 = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & -7 \end{bmatrix}.$$
Thus $[-5] \oplus [7]$ is the sum of the two non-zero idempotents $Q_1$ and $Q_2$, and $[5] \oplus [5] \oplus [-7]$ is the sum of the three non-zero idempotents $P_1, P_2$ and $P_3$.
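The rational matrices of Step 2 lend themselves to mechanical verification; the NumPy sketch below recomputes the claims (idempotency, rank one, and the two diagonal sums).

```python
import numpy as np

# The five rational matrices from Step 2.
Q1 = np.array([[-5, 35], [-1, 7]]) / 2
Q2 = np.array([[-5, -35], [1, 7]]) / 2
P1 = np.array([[5, 15, -1], [5, 15, -1], [70, 210, -14]]) / 6
P2 = np.array([[5, -15, -1], [-5, 15, 1], [70, -210, -14]]) / 6
P3 = np.array([[10, 0, 1], [0, 0, 0], [-70, 0, -7]]) / 3

for M in (Q1, Q2, P1, P2, P3):
    assert np.allclose(M @ M, M)            # each matrix is idempotent
    assert np.linalg.matrix_rank(M) == 1    # and has rank one

# The two diagonal sums claimed in Step 2.
assert np.allclose(Q1 + Q2, np.diag([-5.0, 7.0]))
assert np.allclose(P1 + P2 + P3, np.diag([5.0, 5.0, -7.0]))
```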
Step 3. For simplification of notation, put $k = n - 1$. Then $\bigl[[-5] \oplus [7]\bigr]^{\oplus k}$ is a sum of two non-zero idempotents, and $\bigl[[5] \oplus [5] \oplus [-7]\bigr]^{\oplus k}$ is a sum of three non-zero idempotents. In fact
$$\bigl[[-5] \oplus [7]\bigr]^{\oplus k} = Q_1^{\oplus k} + Q_2^{\oplus k}, \qquad \bigl[[5] \oplus [5] \oplus [-7]\bigr]^{\oplus k} = P_1^{\oplus k} + P_2^{\oplus k} + P_3^{\oplus k}.$$
Now choose permutation similarities $\Pi_Q$ and $\Pi_P$ (having the effect of interchanging rows and columns) such that $\Pi_Q^{-1} = \Pi_Q$ and $\Pi_P^{-1} = \Pi_P$ while, moreover,
$$\Pi_Q\Bigl(\bigl[[-5] \oplus [7]\bigr]^{\oplus k}\Bigr)\Pi_Q = [-5]^{\oplus k} \oplus [7]^{\oplus k},$$
$$\Pi_P\Bigl(\bigl[[5] \oplus [5] \oplus [-7]\bigr]^{\oplus k}\Bigr)\Pi_P = [5]^{\oplus(k+1)} \oplus [5]^{\oplus(k-1)} \oplus [-7]^{\oplus k}.$$
50 H. Bart, T. Ehrhardt and B. Silbermann
It follows that the right-hand sides of these expressions are the sum of two and three idempotents, respectively:
$$[-5]^{\oplus k} \oplus [7]^{\oplus k} = \sum_{i=1}^{2} \Pi_Q\bigl(Q_i^{\oplus k}\bigr)\Pi_Q,$$
$$[5]^{\oplus(k+1)} \oplus [5]^{\oplus(k-1)} \oplus [-7]^{\oplus k} = \sum_{i=1}^{3} \Pi_P\bigl(P_i^{\oplus k}\bigr)\Pi_P.$$
Step 4. Before proceeding with the main line of the argument, we make an auxiliary remark. Let $v \in \mathcal{B}^{p \times q}$, $w \in \mathcal{B}^{q \times p}$ and suppose $wv$ is the identity element in $\mathcal{B}^{q \times q}$. Further assume $P \in \mathcal{B}^{q \times q}$ is a sum of $m$ non-zero idempotents in $\mathcal{B}^{q \times q}$. Then the product $vPw \in \mathcal{B}^{p \times p}$ is a sum of $m$ non-zero idempotents in $\mathcal{B}^{p \times p}$. To see this, write $P = p_1 + \cdots + p_m$ with $p_1, \ldots, p_m$ non-zero idempotents in $\mathcal{B}^{q \times q}$. Then we have $vPw = vp_1w + \cdots + vp_mw$ with $(vp_jw)^2 = vp_jwvp_jw = vp_j^2w = vp_jw$. From $wvp_jwv = p_j$ one sees that $vp_jw$ cannot be zero.
Step 5. From now on $v_1, \ldots, v_n, w_1, \ldots, w_n$ will be elements in $\mathcal{B}$ satisfying (4). Put $\Xi = [v_1\ \ldots\ v_{k+1}]$ and $\Omega = [w_1\ \ldots\ w_{k+1}]^\top$, where $\top$ signals the operation of taking the transpose. Then $\Xi \in \mathcal{B}^{1 \times (k+1)}$, $\Omega \in \mathcal{B}^{(k+1) \times 1}$, and (4) can be rephrased by saying that $\Xi\Omega$ is the identity element in $\mathcal{B}$ and $\Omega\Xi$ is the identity element in $\mathcal{B}^{(k+1) \times (k+1)}$. Write
$$\Xi_1 = \Xi \oplus [e_{\mathcal{B}}]^{\oplus(2k-1)} = \Xi \oplus [e_{\mathcal{B}}]^{\oplus(k-1)} \oplus [e_{\mathcal{B}}]^{\oplus k},$$
$$\Omega_1 = \Omega \oplus [e_{\mathcal{B}}]^{\oplus(2k-1)} = \Omega \oplus [e_{\mathcal{B}}]^{\oplus(k-1)} \oplus [e_{\mathcal{B}}]^{\oplus k}.$$
These matrices belong to $\mathcal{B}^{2k \times 3k}$ and $\mathcal{B}^{3k \times 2k}$, respectively. Also $\Omega_1\Xi_1$ is the identity element in $\mathcal{B}^{3k \times 3k}$. In combination with what we obtained in Step 4, this gives that $\Xi_1\bigl([5]^{\oplus(k+1)} \oplus [5]^{\oplus(k-1)} \oplus [-7]^{\oplus k}\bigr)\Omega_1$ is a sum of three non-zero idempotents in $\mathcal{B}^{2k \times 2k}$:
$$\Xi_1\bigl([5]^{\oplus(k+1)} \oplus [5]^{\oplus(k-1)} \oplus [-7]^{\oplus k}\bigr)\Omega_1 = \sum_{i=1}^{3} \Xi_1\Pi_P\bigl(P_i^{\oplus k}\bigr)\Pi_P\Omega_1.$$
Using the defining expressions for $\Xi_1$ and $\Omega_1$, we can rewrite the left-hand side of this identity as $5\,\Xi\Omega \oplus [5]^{\oplus(k-1)} \oplus [-7]^{\oplus k}$ and this, in view of the fact that $\Xi\Omega$ is the identity element in $\mathcal{B}$, is equal to $[5]^{\oplus k} \oplus [-7]^{\oplus k}$. So the latter is the sum of three non-zero idempotents in $\mathcal{B}^{2k \times 2k}$:
$$[5]^{\oplus k} \oplus [-7]^{\oplus k} = \sum_{i=1}^{3} \Xi_1\Pi_P\bigl(P_i^{\oplus k}\bigr)\Pi_P\Omega_1.$$
Again referring to Step 2, we recall that $[-5]^{\oplus k} \oplus [7]^{\oplus k}$ is the sum of the two non-zero idempotents in $\mathcal{B}^{2k \times 2k}$, namely $\Pi_Q\bigl(Q_1^{\oplus k}\bigr)\Pi_Q$ and $\Pi_Q\bigl(Q_2^{\oplus k}\bigr)\Pi_Q$. Thus the zero element in $\mathcal{B}^{2k \times 2k}$ appears as the sum of five non-zero idempotents:
$$\sum_{i=1}^{2} \Pi_Q\bigl(Q_i^{\oplus k}\bigr)\Pi_Q + \sum_{i=1}^{3} \Xi_1\Pi_P\bigl(P_i^{\oplus k}\bigr)\Pi_P\Omega_1 = 0.$$
Step 6. Next we make a reduction from $\mathcal{B}^{2k \times 2k}$ to $\mathcal{B}$. Put
$$\Xi_2 = \bigl[v_1\ \cdots\ v_k\ \ v_{k+1}v_1\ \cdots\ v_{k+1}v_k\bigr],$$
$$\Omega_2 = \bigl[w_1\ \cdots\ w_k\ \ w_1w_{k+1}\ \cdots\ w_kw_{k+1}\bigr]^\top.$$
These matrices belong to $\mathcal{B}^{1 \times 2k}$ and $\mathcal{B}^{2k \times 1}$, respectively. Also $\Omega_2\Xi_2$ is the identity element in $\mathcal{B}^{2k \times 2k}$, and so (again on account of Step 4) the zero element in $\mathcal{B}$ appears as the sum of five non-zero idempotents in $\mathcal{B}$:
$$\sum_{i=1}^{2} \Xi_2\Pi_Q\bigl(Q_i^{\oplus k}\bigr)\Pi_Q\Omega_2 + \sum_{i=1}^{3} \Xi_2\Xi_1\Pi_P\bigl(P_i^{\oplus k}\bigr)\Pi_P\Omega_1\Omega_2 = 0. \qquad (5)$$
Step 7. We have proved the statement in the theorem now for $m = 5$: there exist five non-zero idempotents $q(5)_1, \ldots, q(5)_5 \in \mathcal{B}$ with $q(5)_1 + \cdots + q(5)_5 = 0$. To get the result for arbitrary $m$ larger than or equal to five, it suffices to deal with the cases $m = 6$ up to and including $m = 9$. Indeed, we then have that for $m = 5, 6, 7, 8, 9$ there are non-zero idempotents $q(m)_1, \ldots, q(m)_m \in \mathcal{B}$ such that $q(m)_1 + \cdots + q(m)_m = 0$. Given any integer $m$ larger than or equal to five, we write $m$ in the form $5s + r$ with $s$ a non-negative integer and $r \in \{5, 6, 7, 8, 9\}$. Then
$$\bigl(q(5)_1 + \cdots + q(5)_5\bigr) + \cdots + \bigl(q(5)_1 + \cdots + q(5)_5\bigr) + q(r)_1 + \cdots + q(r)_r = 0,$$
with the five-term zero sum repeated $s$ times, is a zero sum involving $m$ non-zero idempotents in $\mathcal{B}$.

We finish the proof by establishing the existence of the idempotents $q(m)_1, \ldots, q(m)_m$ featuring in the previous paragraph. Take $m \in \{6, 7, 8, 9\}$, and put
$$R_j = \begin{cases} \begin{bmatrix} q(5)_j & 0 \\ 0 & 0 \end{bmatrix}, & j = 1, \ldots, (m-5), \\[2mm] \begin{bmatrix} q(5)_j & 0 \\ 0 & q(5)_{5+j-m} \end{bmatrix}, & j = (m-4), \ldots, 5, \\[2mm] \begin{bmatrix} 0 & 0 \\ 0 & q(5)_{5+j-m} \end{bmatrix}, & j = 6, \ldots, m. \end{cases}$$
Then $R_1, \ldots, R_m$ are $m$ non-zero idempotents in $\mathcal{B}^{2 \times 2}$ adding up to the zero element in $\mathcal{B}^{2 \times 2}$. For $q(m)_j$ we can now take $[v_1\ v_2]R_j[w_1\ w_2]^\top$, $j = 1, \ldots, m$. Indeed, $[w_1\ w_2]^\top[v_1\ v_2]$ is the unit element in $\mathcal{B}^{2 \times 2}$. □

Elaborating on the above proof, we rewrite the identity (5) as
$$\sum_{i=1}^{2} \Xi_Q\bigl(Q_i^{\oplus(n-1)}\bigr)\Omega_Q + \sum_{i=1}^{3} \Xi_P\bigl(P_i^{\oplus(n-1)}\bigr)\Omega_P = 0,$$
where $\Xi_Q \in \mathcal{B}^{1 \times 2(n-1)}$ and $\Omega_Q \in \mathcal{B}^{2(n-1) \times 1}$ are given by
$$\Xi_Q = \bigl[v_1\ \cdots\ v_{n-1}\ \ v_nv_1\ \cdots\ v_nv_{n-1}\bigr]\Pi_Q,$$
$$\Omega_Q = \Pi_Q\bigl[w_1\ \cdots\ w_{n-1}\ \ w_1w_n\ \cdots\ w_{n-1}w_n\bigr]^\top,$$
and $\Xi_P \in \mathcal{B}^{1 \times 3(n-1)}$ and $\Omega_P \in \mathcal{B}^{3(n-1) \times 1}$ by
$$\Xi_P = \bigl[v_1^2\ \ v_1v_2\ \cdots\ v_1v_n\ \ v_2\ \cdots\ v_{n-1}\ \ v_nv_1\ \cdots\ v_nv_{n-1}\bigr]\Pi_P,$$
$$\Omega_P = \Pi_P\bigl[w_1^2\ \ w_2w_1\ \cdots\ w_nw_1\ \ w_2\ \cdots\ w_{n-1}\ \ w_1w_n\ \cdots\ w_{n-1}w_n\bigr]^\top.$$
Anticipating what we shall need in the proof of Theorem 6.3 below, we note that $\Omega_Q\Xi_Q$ and $\Omega_P\Xi_P$ are the identity elements in $\mathcal{B}^{2(n-1) \times 2(n-1)}$ and $\mathcal{B}^{3(n-1) \times 3(n-1)}$, respectively.
In our reasoning, the idempotents $p_1, \ldots, p_m$ meant in Theorem 4.1 come up as linear combinations, involving rational coefficients, of monomials in the elements $v_1, \ldots, v_n, w_1, \ldots, w_n$ satisfying (4). It is possible to give additional information, for instance on the number of the monomials involved (maximally $9(n-1)^2$) and their degree (at most 4), but we refrain from giving further details here. Instead we say something more about the cases $n = 2$, $m = 5$ and $n = 3$, $m = 5$.
When $n = 2$, the matrix $\Pi_Q$ can be chosen to be the $2 \times 2$ identity matrix, and one obtains
$$\Xi_Q = \bigl[v_1\ \ v_2v_1\bigr], \qquad \Omega_Q = \bigl[w_1\ \ w_1w_2\bigr]^\top.$$
Also, for $\Pi_P$ one can take the $3 \times 3$ identity matrix, which leads to
$$\Xi_P = \bigl[v_1^2\ \ v_1v_2\ \ v_2v_1\bigr], \qquad \Omega_P = \bigl[w_1^2\ \ w_2w_1\ \ w_1w_2\bigr]^\top.$$
The idempotents in $\mathcal{B}$ associated with the $2 \times 2$ matrices $Q_i$, $i = 1, 2$, from Step 2 in the proof of Theorem 4.1 are now $\bigl[v_1\ \ v_2v_1\bigr]Q_i\bigl[w_1\ \ w_1w_2\bigr]^\top$. Similarly, those corresponding to the $3 \times 3$ matrices $P_i$, $i = 1, 2, 3$ (see again Step 2), can be written as $\bigl[v_1^2\ \ v_1v_2\ \ v_2v_1\bigr]P_i\bigl[w_1^2\ \ w_2w_1\ \ w_1w_2\bigr]^\top$. These five non-zero idempotents, involving degree four polynomials in the elements $v_1, v_2, w_1$ and $w_2$, add up to the zero element in $\mathcal{B}$.
In case $n = 3$, the matrix $\Pi_Q$ can be chosen to be the $4 \times 4$ permutation similarity corresponding to the exchange of the second and the third row (column), and one gets
$$\Xi_Q = \bigl[v_1\ \ v_3v_1\ \ v_2\ \ v_3v_2\bigr], \qquad \Omega_Q = \bigl[w_1\ \ w_1w_3\ \ w_2\ \ w_2w_3\bigr]^\top.$$
Also, for $\Pi_P$ one can take the $6 \times 6$ permutation similarity corresponding to the exchange of the third and the fifth row (column), which leads to
$$\Xi_P = \bigl[v_1^2\ \ v_1v_2\ \ v_3v_1\ \ v_2\ \ v_1v_3\ \ v_3v_2\bigr], \qquad \Omega_P = \bigl[w_1^2\ \ w_2w_1\ \ w_1w_3\ \ w_2\ \ w_3w_1\ \ w_2w_3\bigr]^\top.$$
The idempotents in $\mathcal{B}$ associated with the $2 \times 2$ matrices $Q_i$, $i = 1, 2$, from Step 2 in the proof of Theorem 4.1 are now
$$\bigl[v_1\ \ v_3v_1\ \ v_2\ \ v_3v_2\bigr]\begin{bmatrix} Q_i & 0 \\ 0 & Q_i \end{bmatrix}\bigl[w_1\ \ w_1w_3\ \ w_2\ \ w_2w_3\bigr]^\top.$$
Similarly, those corresponding to the $3 \times 3$ matrices $P_i$, $i = 1, 2, 3$ (see once more Step 2), can be written as
$$\bigl[v_1^2\ \ v_1v_2\ \ v_3v_1\ \ v_2\ \ v_1v_3\ \ v_3v_2\bigr]\begin{bmatrix} P_i & 0 \\ 0 & P_i \end{bmatrix}\bigl[w_1^2\ \ w_2w_1\ \ w_1w_3\ \ w_2\ \ w_3w_1\ \ w_2w_3\bigr]^\top.$$
These five non-zero idempotents, involving degree four polynomials in the elements $v_1, v_2, v_3, w_1, w_2$ and $w_3$, add up to the zero element in $\mathcal{B}$.
For these small values of n (2 and 3) and m (= 5), the five polynomials that came up above can of course be computed explicitly. Once this is done, it is also possible to prove directly that they constitute non-zero idempotents in ℬ which add up to the zero element of ℬ. For other low values of n and m such an approach might work too; for higher values it becomes practically unmanageable though.
A Cuntz algebra obviously is a Banach algebra (actually a C*-algebra) of Cuntz type. Thus we have the following direct consequence of Theorem 4.1.

Corollary 4.2. Given an integer m larger than or equal to five, a Cuntz algebra allows for a zero sum of m non-zero idempotents.

In particular, Cuntz algebras have the non-trivial zero sum property and are (consequently) spectrally irregular.
5. Sums of idempotents in Banach algebras of the form ℒ(X)
Throughout this section, X will denote a (non-trivial) Banach space. Adopting standard terminology, by a projection in X we mean an idempotent in the Banach algebra ℒ(X) of bounded linear operators on X. If P is a projection in X, the (possibly infinite) dimension of the range of P will be called the rank of P, written rank P. The following result generalizes Proposition 2.1 in [5], which states that non-trivial zero sums of finite rank projections cannot exist.
Proposition 5.1. Let P_1, . . . , P_m be finite rank projections in X and assume their sum P_1 + · · · + P_m is quasinilpotent. Then P_j = 0, j = 1, . . . , m.

So, in fact, the sum P_1 + · · · + P_m vanishes.
54 H. Bart, T. Ehrhardt and B. Silbermann

Proof. Put P = P_1 + · · · + P_m. Then P is quasinilpotent and has finite rank. Hence P is nilpotent and the trace of P, written trace P, vanishes. As P_1, . . . , P_m are projections, we have trace P_j = rank P_j, j = 1, . . . , m. It follows that

$$\sum_{j=1}^{m}\operatorname{rank}P_j = \sum_{j=1}^{m}\operatorname{trace}P_j = \operatorname{trace}\Bigl(\sum_{j=1}^{m}P_j\Bigr) = \operatorname{trace}P = 0,$$

and we get P_j = 0, j = 1, . . . , m, as desired. □
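The trace argument can be illustrated in finite dimensions. The following Python sketch (our own toy example; the matrices P1, P2, P3 are ad hoc and not from the text) checks that trace P = rank P for idempotent matrices, which is exactly why a trace-zero, in particular nilpotent, sum of finite rank projections forces every summand to vanish.

```python
import numpy as np

# For an idempotent matrix P (P @ P == P), the trace equals the rank.
P1 = np.diag([1.0, 0.0, 0.0, 0.0])                       # orthogonal projection, rank 1
P2 = np.zeros((4, 4)); P2[0, 0] = 1.0; P2[0, 1] = 1.0    # oblique idempotent, rank 1
P3 = np.diag([0.0, 1.0, 1.0, 0.0])                       # orthogonal projection, rank 2

for P in (P1, P2, P3):
    assert np.allclose(P @ P, P)                           # idempotent
    assert round(np.trace(P)) == np.linalg.matrix_rank(P)  # trace = rank

S = P1 + P2 + P3
# trace S is the sum of the ranks; a nilpotent S would have trace 0,
# which is only possible when every rank (hence every P_j) is zero.
print(round(np.trace(S)))   # 4
```

Since all ranks here are positive, this particular S cannot be nilpotent, mirroring Proposition 5.1.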
If P is a projection in X, the (possibly infinite) dimension of the null space of P is called the co-rank of P, written co-rank P. Note that co-rank P coincides with rank(I_X − P), where I_X is the identity operator on X.
Proposition 5.2. Let m be a positive integer and let P_1, . . . , P_m be projections in X. Assume these projections all have finite co-rank and P_1 + · · · + P_m is quasinilpotent. Then X is finite dimensional and P_j = 0, j = 1, . . . , m.

So, actually, the sum P_1 + · · · + P_m vanishes.
Proof. Taking into account Proposition 5.1, it suffices to show that X is finite dimensional. Put P = P_1 + · · · + P_m. Then P is quasinilpotent and, consequently, mI_X − P is invertible. On the other hand,

$$mI_X - P = \sum_{j=1}^{m}(I_X - P_j)$$

is of finite rank. But then so is I_X = (mI_X − P)^{-1}(mI_X − P), and the finite dimensionality of X follows. □
Theorem 5.3. Let m be a positive integer, let P_1, . . . , P_m be projections in X, and suppose the sum P_1 + · · · + P_m is compact. Then (precisely) one of the following statements holds:
(a) P_1, . . . , P_m are all of finite rank (hence so is their sum),
(b) m ≥ 5 and at least five among the projections P_1, . . . , P_m have both infinite rank and co-rank.
It is worthwhile to say a few words about the situation when all the projections P_1, . . . , P_m have finite co-rank. If that is the case and, in addition, P_1 + · · · + P_m is compact, then (a) holds, i.e., P_1, . . . , P_m are all of finite rank as well. It follows that I_X = (I_X − P_1) + P_1 has finite rank, and we arrive at one of the conclusions also appearing in Proposition 5.2, namely that X is finite dimensional. So a compact sum of finite co-rank projections can only occur when the underlying space is finite dimensional.
Proof. First assume that each of the projections P_1, . . . , P_m is either of finite rank or of finite co-rank. Write k for the number of projections among P_1, . . . , P_m that are of finite co-rank. If k = 0, we have (a). So suppose k is at least one. Renumbering (if necessary), we can achieve the situation where P_1, . . . , P_k have finite co-rank and P_{k+1}, . . . , P_m are of finite rank. Now

$$\sum_{j=1}^{k}P_j = \sum_{j=1}^{m}P_j - \sum_{j=k+1}^{m}P_j,$$

where the first sum in the right-hand side is compact (by hypothesis) and the second of finite rank. So P_1 + · · · + P_k is compact. The projections (I_X − P_1), . . . , (I_X − P_k) are of finite rank. Further,

$$I_X = \frac{1}{k}\Bigl(\sum_{j=1}^{k}P_j + \sum_{j=1}^{k}(I_X - P_j)\Bigr).$$

It follows that I_X is compact and, consequently, X is finite dimensional. Under these circumstances, the validity of (a) is a triviality.
Next consider the case when among P_1, . . . , P_m there are idempotents which are neither of finite rank nor of finite co-rank. Let there be l of those. We may assume (renumbering if necessary) that P_1, . . . , P_l are of this type and (hence) P_{l+1}, . . . , P_m are not, i.e., they are of finite rank or finite co-rank. Let s be the number of idempotents among P_{l+1}, . . . , P_m that have finite co-rank, and suppose (without loss of generality) that P_{l+1}, . . . , P_{l+s} are of that kind. Then P_{l+s+1}, . . . , P_m have finite rank.

The same is true for the projections (I_X − P_{l+1}), . . . , (I_X − P_{l+s}), and it follows that P_1 + · · · + P_l + sI_X is compact. Now apply Corollary 3.2 with ℬ = ℒ(X) and taking for 𝒥 the ideal of compact operators on X. This gives that all the projections P_1, . . . , P_l are compact. But then they are of finite rank, which is impossible in view of how the number l has been introduced. So l (and a fortiori m) must be at least five, as claimed in (b). □
In the situation where the sum of idempotents in Theorem 5.3 is both compact and quasinilpotent (for instance because it vanishes), the conclusion of the theorem can be sharpened.
Theorem 5.4. Let m be a positive integer and let P_1, . . . , P_m be projections in X. Suppose the sum P_1 + · · · + P_m is compact and quasinilpotent. Then (precisely) one of the following statements holds:
(a) P_j = 0, j = 1, . . . , m (so, in fact, the sum P_1 + · · · + P_m vanishes),
(b) m ≥ 5 and at least five among the idempotents P_1, . . . , P_m have both infinite rank and co-rank.
Proof. Combine Theorem 5.3 and Proposition 5.1. □

As we have seen, in dealing with non-trivial zero sums of idempotents, the number five plays a special role. This fact is underlined by the following result on zero sums of five projections.
Corollary 5.5. Let P_1, P_2, P_3, P_4 and P_5 be projections in X, not all equal to the zero operator on X, and assume P_1 + P_2 + P_3 + P_4 + P_5 = 0. Then all five projections P_1, P_2, P_3, P_4 and P_5 have both infinite rank and co-rank.
Before we proceed with some additional observations, we make a connection with [15]. Notions like finite rank and compactness can be introduced in a meaningful way for elements in C*-algebras (see [4] and [24]). It turns out that the results obtained so far in this section have analogues in this C*-context. For details we refer to [15], Section 3 in particular. The proofs presented there (of Propositions 3.9 and 3.10, of Theorems 3.12 and 3.13, and of Corollary 3.14) are modifications of those given here.
Now let us return to zero sums of projections in the given Banach space X. Suppose we have such a sum:

$$P_1 + \cdots + P_m = 0, \tag{6}$$
with m a positive integer and P_1, . . . , P_m non-zero projections in X. Then necessarily m ≥ 5, and at least five among the idempotents P_1, . . . , P_m have both infinite rank and co-rank (Theorem 5.4). Without loss of generality we may assume that P_m is of this type. Thus both the image and the null space of P_m have infinite dimension. Let M be any positive integer larger than m, and let r_1, . . . , r_{M-m} be positive integers too. Then a routine argument shows that P_m can be written as P_m = Q_m + Q_{m+1} + · · · + Q_M with Q_m a projection of both infinite rank and co-rank, and Q_j a projection of finite rank r_{j-m}, j = m + 1, . . . , M. In this way we arrive at the zero sum

$$P_1 + \cdots + P_{m-1} + Q_m + Q_{m+1} + \cdots + Q_M = 0, \tag{7}$$

involving a total of M non-zero idempotents, featuring just as many projections of both infinite rank and co-rank as there are in the original zero sum (6), and compared to that one having M − m additional projections of prescribed finite rank. One may ask whether it is also possible to transform (6) into a zero sum (7) with M terms by writing P_m as a sum P_m = Q_m + Q_{m+1} + · · · + Q_M involving only projections of both infinite rank and co-rank. This is problematic because, as has been shown in [22], there are Banach spaces lacking complemented subspaces with both infinite dimension and codimension; see however Theorem 6.3 below.
Suppose that besides X another (non-trivial) Banach space Y is given. If there exists an injective continuous Banach algebra homomorphism Φ : ℒ(X) → ℒ(Y) (possibly non-unital) and if ℒ(X) has the non-trivial zero sum property, then obviously so does ℒ(Y). Indeed, if P_1 + · · · + P_m = 0 is a non-trivial zero sum of projections in X, then Φ(P_1) + · · · + Φ(P_m) = 0 is a non-trivial zero sum of projections in Y. This straightforward observation leads to the following embedding issue: when can ℒ(X) be viewed as a continuously embedded subalgebra of ℒ(Y)? This is a non-trivial question indeed. Even first attempts to deal with it touch on deep problems in the geometry of Banach spaces. Here are some observations.
There is one very simple case in which the answer is positive.
Proposition 5.6. Assume X is isomorphic to a complemented subspace of Y. Then ℒ(X) can be continuously embedded into ℒ(Y), i.e., there exists an injective continuous Banach algebra homomorphism from ℒ(X) into ℒ(Y).
Proof. We may assume that X is a complemented subspace of Y. Let J be the natural embedding of the (closed) subspace X into Y and let Q be the projection of Y onto X, viewed as an operator from Y into X. Then J : X → Y and Q : Y → X are bounded linear operators, injective and surjective, respectively. Also, QJ is the identity operator on X. Define Φ : ℒ(X) → ℒ(Y) by Φ(T) = JTQ, T ∈ ℒ(X). Then Φ is an injective continuous Banach algebra homomorphism. □
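A finite-dimensional toy model (ours, not from the text) makes the mechanism visible: with X = R^2 sitting complemented in Y = R^4 as the first two coordinates, QJ = I_X forces Phi(T) = JTQ to be multiplicative, while Phi is non-unital because Phi(I_X) = JQ differs from I_Y.

```python
import numpy as np

J = np.vstack([np.eye(2), np.zeros((2, 2))])   # J : X -> Y, natural embedding
Q = np.hstack([np.eye(2), np.zeros((2, 2))])   # Q : Y -> X, projection onto X

def Phi(T):
    """The embedding of Proposition 5.6, written in coordinates."""
    return J @ T @ Q

assert np.allclose(Q @ J, np.eye(2))           # QJ = I_X

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 1.0]])

# multiplicative: the inner QJ in Phi(A) @ Phi(B) collapses to I_X
assert np.allclose(Phi(A @ B), Phi(A) @ Phi(B))
# non-unital: the identity of L(X) goes to the idempotent JQ, not to I_Y
assert np.allclose(Phi(np.eye(2)), J @ Q)
assert not np.allclose(Phi(np.eye(2)), np.eye(4))
```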
The complementedness assumption in Proposition 5.6 is essential: when X is a non-complemented closed subspace of Y, it may happen that an injective continuous Banach algebra homomorphism from ℒ(X) into ℒ(Y) does not exist. For an example we need to delve into the geometry of Banach spaces.
For quite some time it was an open problem whether there exists an infinite-dimensional Banach space X that solves the so-called scalar-plus-compact problem. This means that X has only very few operators in the sense that each bounded linear operator on X is the sum of a scalar multiple of I_X and a compact operator on X. It was in [2] that a first example was given. In the recent paper [1], another space E with the property in question has been produced, this time having the additional feature that E contains a closed subspace F which is isomorphic to the separable Hilbert space ℓ^2. As is clear from Corollary 2.2 in [10], a Banach space that solves the scalar-plus-compact problem does not allow for non-trivial zero sums of projections (see also [11], Corollary 3.4). So this is the case for E. However, as has been established in [5], Example 3.1, such a non-trivial zero sum of projections does exist for the Banach space ℓ^2. By isomorphy, this carries over to F. Hence there is no injective continuous Banach algebra homomorphism (possibly non-unital) of ℒ(F) into ℒ(E). This conclusion is corroborated by the fact that ℒ(E) is spectrally regular (see Corollary 4.3 in [11]) and ℒ(F) is not (cf. [13], Section 4 in particular).
6. Cuntz type Banach algebras of the form ℒ(X)
As was mentioned in Section 1 (Introduction), the first example found of a unital Banach algebra failing to be spectrally regular was the operator algebra ℒ(ℓ^2). Actually, this algebra allows for non-trivial zero sums of idempotents, which implies that it lacks the property of being spectrally regular. Example 3.1 in [5], exhibiting this, fits in the framework of Cuntz type algebras developed in Section 4. In the present section, the set-up in question will be considered for Banach algebras of the form ℒ(X). But first we make some introductory remarks.
As we have seen already at the end of the previous section, it can happen that ℒ(X) is spectrally regular even when X has infinite dimension. The examples from [2] and [1] mentioned there are rather spectacular and connected to deep problems from Banach space geometry. Another similarly remarkable instance where ℒ(X) is spectrally regular can be found in [22]. Indeed, an example is given there of an infinite-dimensional Banach space X such that each bounded linear operator on X is the sum of a scalar multiple of I_X and a strictly singular operator on X. In that case the arguments given in the first part of [11], Section 4 apply upon slight modification.
Next we turn to the investigation of the situation where ℒ(X) is of Cuntz type (hence spectrally irregular because of the occurrence of non-trivial zero sums of idempotents). The following observation is straightforward.
Proposition 6.1. If the Banach algebra ℬ has the Cuntz n-property, then so has the operator algebra ℒ(ℬ).

Proof. Let v_1, . . . , v_n, w_1, . . . , w_n be elements in ℬ satisfying (4). Replacing these elements by their left regular representations on ℬ (considered as a Banach space), we obtain operators V_1, . . . , V_n, W_1, . . . , W_n in ℒ(ℬ) such that (4) holds with the lower case letters replaced by the corresponding upper case ones, and with the identity element of ℬ replaced by the identity operator I_ℬ on ℬ. □
From Proposition 6.1 we see that with every Banach algebra of Cuntz type there comes one of the form ℒ(X) with X a Banach space. This in itself is already reason enough to pay special attention to the case of such operator algebras. Also, it is an elementary fact that a unital Banach algebra ℬ can be identified (for instance via the use of left regular representations) with a Banach subalgebra of ℒ(ℬ). Proposition 6.1 and its proof now tell us that each Banach algebra having the Cuntz n-property can be viewed as a Banach subalgebra of a Banach algebra having the Cuntz n-property too and being of the type ℒ(X) for some Banach space X. This is an additional reason for now looking at Banach algebras of the form ℒ(X).

Theorem 6.2. Let X be a non-trivial Banach space, and let n be an integer larger than one. Then the operator algebra ℒ(X) has the Cuntz n-property if and only if X is isomorphic to X^n, where X^n denotes the direct sum of n copies of X.
For the arguments below to work, the norm on X^n needs to have an appropriate relationship with the given norm on X. What matters is that the following linear operators are continuous: the embeddings

$$X \ni x \mapsto (0, \ldots, 0, x, 0, \ldots, 0)^\top \in X^n,$$

with x in the jth position, and the projections

$$X^n \ni (x_1, \ldots, x_n)^\top \mapsto x_j \in X.$$

Here j is allowed to take the values 1, . . . , n. Such norms are mutually equivalent, and we will settle here on the norm |||·||| : X^n → [0, ∞) given by

$$|||(x_1, \ldots, x_n)^\top||| = \max_{j=1,\ldots,n}\|x_j\|, \qquad (x_1, \ldots, x_n)^\top \in X^n,$$

where ‖·‖ is the norm on X. One could also take any norm on X^n induced by a monotone norm on ℂ^n (see [16] for the definition). For N such a monotone norm on ℂ^n, the norm |||·|||_N on X^n induced by N has the form

$$|||(x_1, \ldots, x_n)^\top|||_N = N(\|x_1\|, \ldots, \|x_n\|), \qquad (x_1, \ldots, x_n)^\top \in X^n.$$
Sums of Idempotents and Spectral Regularity of Banach Algebras 59
Proof. First let us deal with the "only if" part of the theorem. So assume the existence of V_1, . . . , V_n, W_1, . . . , W_n in ℒ(X) satisfying

$$\sum_{j=1}^{n}V_jW_j = I_X, \qquad W_jV_k = \delta_{j,k}I_X, \quad j, k = 1, \ldots, n, \tag{8}$$

where I_X stands for the identity operator on X. Now introduce the bounded linear operators V : X^n → X and W : X → X^n by

$$V(x_1, \ldots, x_n)^\top = V_1x_1 + \cdots + V_nx_n, \qquad (x_1, \ldots, x_n)^\top \in X^n, \tag{9}$$

$$Wx = (W_1x, \ldots, W_nx)^\top, \qquad x \in X. \tag{10}$$

By (8) these are each other's inverse. Thus X is isomorphic to X^n.

Next we turn to the "if" part of the theorem. Let the bounded linear operators V : X^n → X and W : X → X^n be each other's inverse. The choice made for the norm on X^n implies that the expressions (9) and (10) determine bounded linear operators V_1, . . . , V_n, W_1, . . . , W_n on X. The identities (8) follow from VW = I_X and WV = I_{X^n}. □
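For X the space of finitely supported scalar sequences (so that X ≅ X^2 via even/odd interleaving), the operators behind the "if" part can be written out concretely. The sketch below is ours: dicts {index: value} model finitely supported sequences, and the maps V_1, V_2, W_1, W_2 satisfy the relations (8) with n = 2 exactly.

```python
def V(j, x):
    """V_j : place the sequence x on the even (j = 1) or odd (j = 2) indices."""
    return {2 * k + (j - 1): v for k, v in x.items()}

def W(j, x):
    """W_j : read off the even (j = 1) or odd (j = 2) part of x."""
    return {k // 2: v for k, v in x.items() if k % 2 == j - 1}

x = {0: 1.0, 3: -2.5, 7: 4.0}

# W_j V_k = delta_{jk} I  (the second family of relations in (8))
assert W(1, V(1, x)) == x and W(2, V(2, x)) == x
assert W(1, V(2, x)) == {} and W(2, V(1, x)) == {}

# V_1 W_1 + V_2 W_2 = I : the even and odd parts reassemble x
# (the dict merge acts as a sum because the two supports are disjoint)
reassembled = {**V(1, W(1, x)), **V(2, W(2, x))}
assert reassembled == x
```

On ℓ^p the same index maps define bounded operators, in line with the Cuntz 2-property of ℒ(ℓ^p) obtained in Section 7.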
Combining Proposition 6.1 and Theorem 6.2, one immediately gets that the following is true. If the Banach algebra ℬ has the Cuntz n-property, then ℬ, viewed as a Banach space, is isomorphic to ℬ^n. The converse is not true. For an example, take ℬ = ℓ^∞ with the coordinatewise product. Clearly ℓ^∞ and (ℓ^∞)^2 are isomorphic. However ℓ^∞, being commutative, is not of Cuntz type. Note also that ℓ^∞ provides an example of a unital Banach algebra ℬ which (being commutative) is spectrally regular while ℒ(ℬ), being of Cuntz type, lacks this property. Contrasting with this we have that ℬ is spectrally regular whenever ℒ(ℬ) is. Indeed, ℬ can be viewed as a Banach subalgebra of ℒ(ℬ) (for instance via left regular representations) and Corollary 4.1 in [13] applies.
As an immediate consequence of Theorems 6.2 and 4.1 we have the following result. If X is a non-trivial Banach space, n is an integer larger than one, and the Banach spaces X^n and X are isomorphic, then the operator algebra ℒ(X) allows for non-trivial zero sums of idempotents (hence it is spectrally irregular). In fact we can say a bit more (cf. the second paragraph after Corollary 5.5).
Theorem 6.3. Let X be a non-trivial Banach space, let n be an integer larger than one, and assume X is isomorphic to X^n. Then, given an integer m larger than or equal to five, there exist m projections P_1, . . . , P_m in X, all of infinite rank and infinite co-rank, and such that P_1 + · · · + P_m = 0.
Proof. For this we return to the proof of Theorem 4.1 and the remark following it. The Banach algebra ℬ featuring there is now taken to be ℒ(X) and, whenever convenient, complex matrices will be identified with operator matrices having entries in ℒ(X), in this case via "tensorizing" with I_X.
With the 2×2 matrices R_1 and R_2 and the 3×3 matrices S_1, S_2 and S_3 as in Step 2 of the proof of Theorem 4.1, we have a non-trivial zero sum of five projections in X, namely

$$\sum_{j=1}^{2}V_R\bigl(R_j \otimes I_{X^{n-1}}\bigr)W_R + \sum_{j=1}^{3}V_S\bigl(S_j \otimes I_{X^{n-1}}\bigr)W_S = 0,$$

where the bounded linear operators V_R, W_R, V_S and W_S act as follows:

$$V_R : X^{2(n-1)} \to X, \qquad W_R : X \to X^{2(n-1)},$$
$$V_S : X^{3(n-1)} \to X, \qquad W_S : X \to X^{3(n-1)},$$

and the products W_R V_R and W_S V_S yield the identity operators on X^{2(n-1)} and X^{3(n-1)}, respectively. Recall that, as scalar matrices, R_1, R_2, S_1, S_2 and S_3 are rank one idempotents. Also note that the operators V_R and V_S are injective, and that W_R and W_S are surjective. It now suffices to establish the following auxiliary result (valid because X is infinite dimensional): if R is a rank one k×k idempotent matrix, V : X^k → X is an injective linear operator, and W : X → X^k is a surjective linear operator, then both the dimension of the range space of the operator VRW : X → X and that of its null space are infinite.
To see this, we reason as follows. Modulo similarity it may be assumed that R has a one in the (1, 1)th position and zeros everywhere else. Viewed as an operator matrix, R then has I_X in the (1, 1)th position and the zero operator on X everywhere else. From this (using the infinite dimensionality of X) it follows that the range of R, written Im R, and the null space of R, denoted by Ker R, both have infinite dimension. Of course R is viewed here as an operator on X^k. As W is surjective, Im VRW = V[Im R], and the injectivity of V gives that Im VRW and Im R have the same dimension. Again using the injectivity of V, we get Ker VRW = W^{-1}[Ker R]. The surjectivity of W now implies that W[Ker VRW] = Ker R, and we see that the dimension of Ker R does not exceed that of Ker VRW. Thus both the dimension of Im VRW and that of Ker VRW are infinite.
This covers the case m = 5. For m > 5, use a similar argument and the construction described in Step 7 of the proof of Theorem 4.1. □
The following corollary will be used at several places in the next section (for instance in the proofs of Theorems 7.2 and 7.4).
Corollary 6.4. Let X and Y be non-trivial Banach spaces, let X be isomorphic to a complemented subspace of Y, and suppose X is isomorphic to X^n where n is an integer larger than one. Then, given an integer m larger than or equal to five, there exist m projections Q_1, . . . , Q_m in Y, all of infinite rank and infinite co-rank, and such that Q_1 + · · · + Q_m = 0.
In particular, ℒ(Y) is spectrally irregular.

Proof. We may assume that X is a complemented subspace of Y so that the situation is as in the proof of Proposition 5.6. Let Φ : ℒ(X) → ℒ(Y) be the injective continuous Banach algebra homomorphism constructed there. Clearly Φ maps projections in X of infinite rank into projections of infinite rank in Y, and the analogue of this for co-ranks is valid too. By Theorem 6.3 there exist m projections P_1, . . . , P_m in X, all of infinite rank and infinite co-rank, and such that P_1 + · · · + P_m = 0. For j = 1, . . . , m, put Q_j = Φ(P_j). Then the projections Q_1, . . . , Q_m have the desired properties. □
7. Applications to specific Banach spaces
We will now use the material presented in the previous section to identify certain Banach spaces X for which the operator algebra ℒ(X) allows for non-trivial zero sums, hence is spectrally irregular. In most cases this will be done by showing that ℒ(X) is of Cuntz type so that Theorem 4.1 applies; occasionally we will need to refer to Corollary 6.4.
For Σ a non-empty set, 1 ≤ p ≤ ∞, and B a Banach space, let ℓ^p(Σ; B) denote the Banach space of all ℓ^p-functions from Σ into B, i.e., the functions f : Σ → B for which the following quantities are finite: in case p = ∞,

$$\|f\|_\infty = \sup_{\sigma\in\Sigma}\|f(\sigma)\|_B,$$

in case 1 ≤ p < ∞,

$$\|f\|_p = \Bigl(\;\sup_{F\ \text{finite subset of}\ \Sigma}\ \sum_{\sigma\in F}\|f(\sigma)\|_B^p\Bigr)^{\frac{1}{p}}.$$

Here ‖·‖_B stands for the (given) norm on B. With the usual algebraic operations and the norm given by the expressions above, ℓ^p(Σ; B) is a Banach space.
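To make the sup-over-finite-subsets formula concrete, here is a small Python check (our own illustration; Σ is thought of as infinite but f is supported on three points, with B the scalars and ‖·‖_B the absolute value) that the supremum is attained at the support and reproduces the usual ℓ^p sum.

```python
from itertools import combinations

f = {0: 3.0, 5: -4.0, 12: 1.0}   # f vanishes off these three points of Sigma
p = 2

support = list(f)
# sup over finite subsets F of sum_{sigma in F} |f(sigma)|^p; enlarging F can
# only increase the sum, so the sup is attained at the full support
best = max(
    sum(abs(f[s]) ** p for s in F)
    for r in range(len(support) + 1)
    for F in combinations(support, r)
)
norm_p = best ** (1 / p)
assert abs(norm_p - (3.0**2 + 4.0**2 + 1.0**2) ** 0.5) < 1e-12   # sqrt(26)
```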
Theorem 7.1. Let Σ be an infinite set, let 1 ≤ p ≤ ∞, and let B be a Banach space. Then the Banach space ℓ^p(Σ; B) is isomorphic to its square ℓ^p(Σ; B)^2. Also, when B is non-trivial, the operator algebra ℒ(ℓ^p(Σ; B)) has the Cuntz 2-property.
Proof. By Theorem 6.2 it is sufficient to prove the first part of the theorem. Write Σ as the disjoint union of two sets Σ_1 and Σ_2, both having the same cardinality as Σ. This is possible by virtue of the basic set theoretical result saying that the sum of two infinite cardinalities is that same cardinality again. Let φ_1 : Σ → Σ_1 and φ_2 : Σ → Σ_2 be bijections. Now define W : ℓ^p(Σ; B) → (ℓ^p(Σ; B))^2 by stipulating that Wf = (W_1f, W_2f)^⊤, where W_j : ℓ^p(Σ; B) → ℓ^p(Σ; B) is given by

$$W_jf = f|_{\Sigma_j} \circ \varphi_j : \Sigma \to B, \qquad j = 1, 2.$$

Also introduce V : (ℓ^p(Σ; B))^2 → ℓ^p(Σ; B) by V(f, g)^⊤ = V_1f + V_2g with

$$V_1f|_{\Sigma_1} = f \circ \varphi_1^{-1}, \quad V_1f|_{\Sigma_2} = 0, \quad V_2g|_{\Sigma_2} = g \circ \varphi_2^{-1}, \quad V_2g|_{\Sigma_1} = 0.$$

Then V and W are each other's inverse. □
As a particular case of Theorem 7.1 we have that, for 1 ≤ p ≤ ∞, the operator algebras ℒ(ℓ^p) are of Cuntz type. Hence they allow for non-trivial zero sums of idempotents and are spectrally irregular. The case p = 2 was already covered in [6]; see also [5].
In the same vein, by pasting together two copies of the real line ℝ, it becomes clear that L^p(ℝ) is isomorphic to a direct sum of two copies of itself. Thus ℒ(L^p(ℝ)) has the Cuntz 2-property. Possible generalizations of this result involve measure spaces (Σ, μ) such that two or more copies of (Σ, μ) can be combined into a measure space which is equivalent to one copy of (Σ, μ), in the sense that there exists a bijective measurable function whose inverse is measurable too.
As is usual, the subspaces of ℓ^∞ consisting of those complex sequences having a limit, respectively limit zero, are denoted by c, respectively c_0. Clearly c_0 is isomorphic to the direct sum c_0^2 of two copies of itself. Thus Theorem 6.2 guarantees that ℒ(c_0) has the Cuntz 2-property. But then ℒ(c_0) allows for non-trivial zero sums, hence is spectrally irregular. The same conclusion holds for ℒ(c). This can be derived from Corollary 6.4 upon noting that c_0 is a complemented subspace of c (having codimension one).
The above observations concerning c_0 and ℒ(c_0) can be brought into a more general context. Let c_0(Σ; B) be the (closed) subspace of ℓ^∞(Σ; B) consisting of the functions f from Σ into B having the following property: for each ε > 0 there exists a finite subset F ⊂ Σ (depending on ε) such that ‖f(t)‖_B < ε for all t ∈ Σ ∖ F. Here, as before, ‖·‖_B denotes the norm on the Banach space B.
Theorem 7.2. Let Σ be an infinite set and let B be a Banach space. Then the Banach space c_0(Σ; B) is isomorphic to its square c_0(Σ; B)^2. Also, when B is non-trivial, the operator algebra ℒ(c_0(Σ; B)) has the Cuntz 2-property.
The proof of Theorem 7.2 follows the same line of thought as the argument given for Theorem 7.1.
We can also obtain an analogue of c by stipulating that c(Σ; B) is the (closed) subspace of ℓ^∞(Σ; B) consisting of the functions f : Σ → B having the following property: there exists z ∈ B (depending on f) such that for each ε > 0 there exists a finite subset F ⊂ Σ (depending on ε) with ‖f(σ) − z‖_B < ε for all σ ∈ Σ ∖ F. Clearly c_0(Σ; B) is a closed subspace of c(Σ; B). Another closed subspace of c(Σ; B) is the set of all constant functions on Σ (which can be viewed as a copy of B). As c(Σ; B) is the direct sum of c_0(Σ; B) and this subspace of constants, c_0(Σ; B) is complemented in c(Σ; B). Thus by Corollary 6.4, for each integer m larger than or equal to five, the operator algebra ℒ(c(Σ; B)) allows for zero sums involving m non-zero idempotents, hence is spectrally irregular.
To further demonstrate the applicability of Theorem 4.1 and Corollary 6.4, we present a couple of results involving functions that are continuous on the real line ℝ with the possible exception of the points in a certain infinite subset of ℝ where jumps occur.
Let Δ be an infinite subset of ℝ having no accumulation point in ℝ. By C_Δ(ℝ; B) we denote the subspace of ℓ^∞(ℝ; B) consisting of all functions f : ℝ → B that are continuous on ℝ except possibly in the points of Δ where jumps may occur. The latter means that for each δ ∈ Δ both lim_{t↑δ} f(t) and lim_{t↓δ} f(t) exist, one or both of them possibly different from the value of f at δ. It is not hard to see that C_Δ(ℝ; B) is closed in ℓ^∞(ℝ; B).
A few preliminary remarks are in order. As Δ has no accumulation point in ℝ, every compact subset of ℝ contains only a finite number of points of Δ. Suppose Δ is bounded below. Then we can find a monotonically increasing sequence δ_1, δ_2, δ_3, . . . of real numbers such that lim_{k→∞} δ_k = ∞ and Δ = {δ_1, δ_2, δ_3, . . .}. It is now easy to see that C_Δ(ℝ; B) is isomorphic to C_ℕ(ℝ; B), where ℕ stands for the set of non-negative integers.

Next assume that Δ is bounded above. Then there exists a monotonically decreasing sequence δ_1, δ_2, δ_3, . . . of real numbers with lim_{k→∞} δ_k = −∞ and Δ = {. . . , δ_3, δ_2, δ_1}, and again it follows that C_Δ(ℝ; B) is isomorphic to C_ℕ(ℝ; B).

Finally, in case Δ is neither bounded below nor bounded above, there are a monotonically increasing sequence δ_1, δ_2, δ_3, . . . and a monotonically decreasing sequence η_1, η_2, η_3, . . . of real numbers for which η_1 < δ_1,

$$\lim_{k\to\infty}\delta_k = \infty, \qquad \lim_{k\to\infty}\eta_k = -\infty,$$

and Δ = {. . . , η_3, η_2, η_1, δ_1, δ_2, δ_3, . . .}. The conclusion is now that C_Δ(ℝ; B) is isomorphic to C_ℤ(ℝ; B), where ℤ denotes the set of all integers.
Theorem 7.3. Let Δ be an infinite subset of the real line ℝ having no accumulation point in ℝ. Suppose Δ is neither bounded below nor above. Then the Banach space C_Δ(ℝ; B) is isomorphic to its square C_Δ(ℝ; B)^2. Also, when the Banach space B is non-trivial, the operator algebra ℒ(C_Δ(ℝ; B)) has the Cuntz 2-property.
Proof. It is enough to show an isomorphy between C_Δ(ℝ; B) and C_Δ(ℝ; B)^2. As C_Δ(ℝ; B) is isomorphic to C_ℤ(ℝ; B), it suffices to consider the case Δ = ℤ. Write ℝ as the disjoint union of Σ_1 and Σ_2:

$$\Sigma_1 = \bigcup_{k\in\mathbb{Z}}(2k, 2k+1], \qquad \Sigma_2 = \bigcup_{k\in\mathbb{Z}}(2k+1, 2k+2].$$

Also define φ_1 and φ_2 on ℝ by

$$\varphi_1(t) = k + t, \qquad \varphi_2(t) = k + 1 + t, \qquad t \in (k, k+1], \quad k \in \mathbb{Z}.$$

Then φ_1 : ℝ → Σ_1 and φ_2 : ℝ → Σ_2 are bijective with inverses given by

$$\varphi_1^{-1}(s) = -k + s, \qquad s \in (2k, 2k+1], \quad k \in \mathbb{Z},$$
$$\varphi_2^{-1}(s) = -k - 1 + s, \qquad s \in (2k+1, 2k+2], \quad k \in \mathbb{Z}.$$

Take f in C_ℤ(ℝ; B). Then, along with f, the function f|_{Σ_1} ∘ φ_1 : ℝ → B is continuous on ℝ except maybe in the points of ℤ where jumps occur. Thus it belongs to C_ℤ(ℝ; B), and the same is true for f|_{Σ_2} ∘ φ_2 : ℝ → B. Now introduce W : C_ℤ(ℝ; B) → (C_ℤ(ℝ; B))^2 by stipulating that Wf = (W_1f, W_2f)^⊤ with W_jf = f|_{Σ_j} ∘ φ_j. Then W is bijective. For its inverse V from (C_ℤ(ℝ; B))^2 into C_ℤ(ℝ; B) we have V(f, g)^⊤ = V_1f + V_2g with

$$V_1f|_{\Sigma_1} = f \circ \varphi_1^{-1}, \quad V_1f|_{\Sigma_2} = 0, \quad V_2g|_{\Sigma_2} = g \circ \varphi_2^{-1}, \quad V_2g|_{\Sigma_1} = 0,$$

and with this the argument is complete. □
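The bijections φ_1, φ_2 and the inverse formulas in the proof above can be sanity-checked numerically; the snippet below (ours) verifies that φ_j sends each t into Σ_j and that the stated inverses recover t.

```python
import math

def phi(j, t):
    """phi_j(t) = k + (j - 1) + t for t in (k, k+1], k an integer."""
    k = math.ceil(t) - 1
    return k + (j - 1) + t

def phi_inv(j, s):
    """Inverse: s lies in (m, m+1] with m = 2k + (j - 1)."""
    m = math.ceil(s) - 1
    k = (m - (j - 1)) // 2
    return s - k - (j - 1)

for t in (-2.25, -1.0, -0.5, 0.75, 1.0, 3.5):
    for j in (1, 2):
        s = phi(j, t)
        # Sigma_1 consists of intervals (2k, 2k+1], Sigma_2 of (2k+1, 2k+2]
        assert math.ceil(s) % 2 == j % 2
        assert abs(phi_inv(j, s) - t) < 1e-12
```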
Theorem 7.4. Let Δ be an infinite subset of the real line ℝ having no accumulation point in ℝ. Suppose Δ is bounded below or above. Also assume B to be non-trivial. Then, given an integer m larger than or equal to five, there exist m projections P_1, . . . , P_m in C_Δ(ℝ; B), all of infinite rank and co-rank, and such that P_1 + · · · + P_m = 0.

In particular, ℒ(C_Δ(ℝ; B)) is spectrally irregular.
Proof. As C_Δ(ℝ; B) is isomorphic to C_ℕ(ℝ; B), it suffices to consider the case Δ = ℕ. Write C_− for the (closed) subspace of C_ℕ(ℝ; B) consisting of all functions f ∈ C_ℕ(ℝ; B) for which f vanishes on (0, ∞). Further let C_+ be the (closed) subspace of C_ℕ(ℝ; B) having as elements the functions f ∈ C_ℕ(ℝ; B) such that f vanishes on (−∞, 0]. Then C_ℕ(ℝ; B) = C_− ⊕ C_+. Also C_+ is isomorphic to C_+^2, as can be seen via an argument analogous to the proof of Theorem 7.3. Now apply Corollary 6.4. □
In the case where Δ is bounded below, we have the possibility to introduce the subalgebra C_{Δ,−}(ℝ; B) of C_Δ(ℝ; B) consisting of all f ∈ C_Δ(ℝ; B) for which lim_{t→−∞} f(t) exists in B. This subalgebra is again closed, so we have a Banach subalgebra of C_Δ(ℝ; B) here. We do not know whether for Δ bounded below the Banach space C_Δ(ℝ; B) is isomorphic to C_ℤ(ℝ; B); the space C_{Δ,−}(ℝ; B), however, is.
Lemma 7.5. Let Δ be an infinite subset of the real line ℝ having no accumulation point in ℝ. Suppose Δ is bounded below. Then C_{Δ,−}(ℝ; B) is isomorphic to C_ℤ(ℝ; B).
Proof. As in the proof of Theorem 7.4 it suffices to consider the case Δ = ℕ. Define the function φ on ℝ by

$$\varphi(t) = \begin{cases} \log t, & t \in (0, 1],\\ 3k + 1 + t, & t \in (-k-1, -k], \quad k = 0, 1, 2, \ldots,\\ k + t, & t \in (k+1, k+2], \quad k = 0, 1, 2, \ldots. \end{cases}$$

Then φ : ℝ → ℝ is bijective with inverse φ^{-1} : ℝ → ℝ given by

$$\varphi^{-1}(s) = \begin{cases} e^s, & s \in (-\infty, 0],\\ -3k - 1 + s, & s \in (2k, 2k+1], \quad k = 0, 1, 2, \ldots,\\ -k + s, & s \in (2k+1, 2k+2], \quad k = 0, 1, 2, \ldots. \end{cases}$$
Take f in C_{ℕ,−}(ℝ; B). Then the function f ∘ φ : ℝ → B belongs to C_ℤ(ℝ; B). Introduce W : C_{ℕ,−}(ℝ; B) → C_ℤ(ℝ; B) by stipulating that Wf = f ∘ φ. Then W is bijective. For its inverse V : C_ℤ(ℝ; B) → C_{ℕ,−}(ℝ; B) we have Vg = g ∘ φ^{-1}. □
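The piecewise map φ in this proof is easy to mis-transcribe, so the following numerical check (ours) confirms that φ^{-1}(φ(t)) = t on all three branches.

```python
import math

def phi(t):
    if 0 < t <= 1:
        return math.log(t)            # (0, 1] -> (-infinity, 0]
    if t <= 0:
        k = math.floor(-t)            # t in (-k-1, -k]
        return 3 * k + 1 + t          # -> (2k, 2k+1]
    k = math.ceil(t) - 2              # t in (k+1, k+2]
    return k + t                      # -> (2k+1, 2k+2]

def phi_inv(s):
    if s <= 0:
        return math.exp(s)
    m = math.ceil(s) - 1              # s in (m, m+1]
    if m % 2 == 0:                    # s in (2k, 2k+1], m = 2k
        return s - 3 * (m // 2) - 1
    return s - (m - 1) // 2           # s in (2k+1, 2k+2], m = 2k+1

for t in (-3.0, -1.5, -0.25, 0.0, 0.5, 1.0, 1.75, 4.0):
    assert abs(phi_inv(phi(t)) - t) < 1e-12
```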
The next result is now immediate from combining Lemma 7.5 and Theorem 7.3.
Theorem 7.6. Let Δ be an infinite subset of the real line ℝ having no accumulation point in ℝ. Suppose Δ is bounded below. Then the Banach space C_{Δ,−}(ℝ; B) is isomorphic to its square C_{Δ,−}(ℝ; B)^2. Also, when the Banach space B is non-trivial, the operator algebra ℒ(C_{Δ,−}(ℝ; B)) has the Cuntz 2-property.
Under the assumption that $\Lambda$ is bounded above, Lemma 7.5 and Theorem 7.6 hold with $C_{\Lambda,-}(\mathbb{R};X)$ replaced by the Banach subalgebra $C_{\Lambda,+}(\mathbb{R};X)$ of $C_\Lambda(\mathbb{R};X)$ consisting of all $f \in C_\Lambda(\mathbb{R};X)$ for which $\lim_{t\to+\infty} f(t)$ exists in $X$.
Theorems 7.3, 7.4 and 7.6 were concerned with functions on $\mathbb{R}$ that are continuous except possibly for jump discontinuities. Now we turn to Banach spaces consisting of continuous functions. For $S$ a topological space and $X$ a (complex) Banach space, the expression $C^b(S;X)$ will denote the Banach space of all bounded continuous functions from $S$ to $X$, endowed with the usual algebraic operations (defined pointwise) and the sup-norm. Clearly $C^b(S;X)$ is a subspace of $\ell^\infty(S;X)$. In case $S$ is compact, $C^b(S;X)$ coincides with the Banach space $C(S;X)$ of all continuous functions from $S$ to $X$, again endowed with the usual algebraic operations (defined pointwise) and the max-norm.
We begin with some auxiliary observations.
Proposition 7.7. Let $S$ be a topological space, let $X$ be a Banach space, and let $n$ be a positive integer larger than or equal to two. Suppose $S$ is homeomorphic to the topological direct sum of $n$ copies of itself. Then the Banach space $C^b(S;X)$ is isomorphic to $C^b(S;X)^n$. Also, when $X$ is non-trivial, the operator algebra $\mathcal{B}(C^b(S;X))$ has the Cuntz $n$-property.
The condition on $S$ is met if and only if $S$ can be written as the disjoint union of $n$ clopen (which by definition means: open and closed) sets $S_1,\dots,S_n$, each of which, as a topological subspace of $S$ (i.e., provided with the relative topology with respect to $S$), is homeomorphic to $S$. Note that we have a fractal type structure here. Each $S_j$ is again the disjoint union of $n$ clopen sets homeomorphic to $S$, and for these this is true again. And so on, indefinitely. We shall come back to this point after the proof. See also Examples A, B and C below.
Proof. Let $S_1,\dots,S_n$ be clopen sets as above. For $j = 1,\dots,n$, let $\alpha_j$ be a homeomorphism of $S$ onto $S_j$. Define $T: C^b(S;X)\to C^b(S;X)^n$ by
$$Tf = (T_1 f,\dots,T_n f)^\top, \qquad T_j f = f|_{S_j}\circ\alpha_j, \quad j = 1,\dots,n.$$
Then $T$ is linear and bounded. In fact, with the choice made for the norm on $C^b(S;X)^n$ in the paragraph directly following Theorem 6.2, the operator $T$ is norm preserving. It is also bijective. This can be seen as follows. For $j = 1,\dots,n$, let $R_j: C^b(S;X)\to C^b(S;X)$ be given by
$$(R_j g)|_{S_j} = g\circ\alpha_j^{-1}, \qquad (R_j g)|_{S_k} = 0, \quad k = 1,\dots,n,\ k\neq j.$$
Note that $R_j g \in C^b(S;X)$ because the sets $S_1,\dots,S_n$ are clopen in $S$. The inverse $R: C^b(S;X)^n\to C^b(S;X)$ of the operator $T$ is now given by $R(g_1,\dots,g_n)^\top = R_1 g_1 + \cdots + R_n g_n$. This proves the first statement in the proposition; the second is immediate from Theorem 6.2. □
Returning to the fractal type structure mentioned prior to the above proof, we recall that $S_j$ is again the disjoint union of $n$ clopen sets homeomorphic to $S$, and for these clopen sets this is true again. By induction, it follows that for any positive integer $k$, the space $S$ is the disjoint union of $n^k$ clopen subsets of $S$, each homeomorphic to $S$. In particular, there exists a (countably) infinite collection of clopen subsets of $S$, each homeomorphic to $S$, possibly not mutually disjoint however.
In general it is not clear whether, under the conditions of Proposition 7.7, there is an infinite collection of mutually disjoint clopen subsets of $S$. A fortiori, it is not clear whether $S$ can be written as the disjoint union of an infinite collection of clopen sets homeomorphic to $S$. When this is possible, we have the following result.
Proposition 7.8. Let $S$ be a topological space, let $X$ be a Banach space, and suppose $S$ is the disjoint union of an infinite collection of clopen sets homeomorphic to $S$. Then the Banach space $C^b(S;X)$ is isomorphic to its square $C^b(S;X)^2$. Also, when $X$ is non-trivial, the operator algebra $\mathcal{B}(C^b(S;X))$ has the Cuntz 2-property.
Proof. Let $\Sigma$ be an infinite index set, and assume $S$ is the disjoint union of the family $\{S_\sigma\}_{\sigma\in\Sigma}$ of clopen subsets of $S$, each homeomorphic to $S$. In other words, $S$ is the topological direct sum of the family $\{S_\sigma\}_{\sigma\in\Sigma}$. Using the type of argument featuring in the proof of Proposition 7.7, one finds that $C^b(S;X)$ is isomorphic to $\ell^\infty\bigl(\Sigma; C^b(S;X)\bigr)$. By Theorem 7.1, the latter space is isomorphic to its square, but then so is $C^b(S;X)$. This establishes the first statement in the proposition; the second now comes from Theorem 6.2. □
By $K$ we denote the familiar ternary Cantor set, also called the Cantor middle-third set. Recall that $K$ is compact, so that $C^b(K;X)$ coincides with $C(K;X)$.

Corollary 7.9. Let $X$ be a Banach space. Then the Banach space $C(K;X)$ is isomorphic to its square $C(K;X)^2$. Also, when $X$ is non-trivial, the operator algebra $\mathcal{B}(C(K;X))$ has the Cuntz 2-property.
Proof. Let $K_-$ and $K_+$ be the intersections of $K$ with the closed intervals $[0,1/3]$ and $[2/3,1]$, respectively. Then $K_-$ and $K_+$ are open sets in $K$. Also $K$ is the disjoint union of $K_-$ and $K_+$. So $K$ is the topological direct sum of $K_-$ and $K_+$. Now note that $K = 3K_-$ and $K = -2 + 3K_+$. It follows that both $K_-$ and $K_+$ are homeomorphic to $K$. Now apply Proposition 7.7 to get the first part of the corollary, and Theorem 6.2 to get the second. □
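The self-similarity $K = 3K_-$ and $K = -2 + 3K_+$ used in the proof can be illustrated on the finite-stage approximations of the Cantor set. The sketch below (exact rational arithmetic; helper names are ours) builds the stage-$n$ intervals and checks that rescaling the left and right halves reproduces stage $n-1$:

```python
from fractions import Fraction

def cantor_stage(n):
    """Closed intervals of the n-th stage of the Cantor construction on [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        # each interval is replaced by its outer thirds
        intervals = [iv for (a, b) in intervals
                     for iv in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return intervals

stage = cantor_stage(5)
prev = cantor_stage(4)
left = [(a, b) for (a, b) in stage if b <= Fraction(1, 3)]     # pieces of K_-
right = [(a, b) for (a, b) in stage if a >= Fraction(2, 3)]    # pieces of K_+
# K = 3 K_-  and  K = -2 + 3 K_+ , stage by stage
assert [(3 * a, 3 * b) for (a, b) in left] == prev
assert [(3 * a - 2, 3 * b - 2) for (a, b) in right] == prev
```

The maps $t \mapsto 3t$ and $t \mapsto 3t - 2$ are exactly the homeomorphisms $K_- \to K$ and $K_+ \to K$ implicit in the proof.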
The next result is a mild generalization of a truly remarkable result of A.A. Miljutin [28].
Theorem 7.10. Let $S$ be an uncountable compact metrizable topological space and suppose $X$ is a finite-dimensional Banach space. Then the Banach spaces $C(S;X)$ and $C(K;X)$ are isomorphic.
Proof. Consider $C_r(S;\mathbb{R})$, the Banach space of real-valued continuous functions on $S$, which is the real analogue of the complex Banach space $C(S;\mathbb{C})$. By a celebrated result of A.A. Miljutin [28], the real Banach spaces $C_r(S;\mathbb{R})$ and $C_r(K;\mathbb{R})$ are isomorphic (see also [29], the remark below Theorem 6.2.5). A simple complexification argument now shows that $C(S;\mathbb{C})$ and $C(K;\mathbb{C})$ are isomorphic too. To make the step from complex-valued functions to those having values in the (complex) finite-dimensional Banach space $X$, we argue as follows. Write $m$ for the dimension of $X$. As each $m$-dimensional complex Banach space is isomorphic to $\mathbb{C}^m$, we may assume that $X$ actually coincides with $\mathbb{C}^m$. For $j = 1,\dots,m$, let $\pi_j:\mathbb{C}^m\to\mathbb{C}$ be the $j$th coordinate function. Take $f$ in $C(S;\mathbb{C}^m)$ and put $f_j = \pi_j\circ f$. Then $f_j \in C(S;\mathbb{C})$. As we have already seen, there exists a bijective bounded linear operator from $C(S;\mathbb{C})$ onto $C(K;\mathbb{C})$. Let us denote it by $B$. For $k\in K$, now write
$$(Nf)(k) = \bigl((Bf_1)(k),\dots,(Bf_m)(k)\bigr)^\top.$$
Then $Nf$ belongs to $C(K;\mathbb{C}^m)$ and we have a mapping $N: C(S;\mathbb{C}^m)\to C(K;\mathbb{C}^m)$. This mapping is easily seen to be a bijective bounded linear operator. □
Theorem 7.11. Let $S$ be an uncountable compact metrizable topological space and suppose $X$ is a non-trivial finite-dimensional Banach space. Then the operator algebra $\mathcal{B}(C(S;X))$ has the Cuntz 2-property.
Proof. The Banach spaces $C(S;X)$ and $C(K;X)$ are isomorphic. Hence the same is true for $\mathcal{B}(C(S;X))$ and $\mathcal{B}(C(K;X))$. The desired result is now immediate from Corollary 7.9. □
In the next result, certain notions from general topology play a role. Here are the pertinent definitions taken from [29]. A non-empty topological space $S$ is called topologically complete if there exists a complete metric on $S$ which generates the topology of $S$. The space is said to be nowhere locally compact if no point of $S$ has a neighborhood with compact closure. Finally, $S$ is called zero-dimensional if $S$ has a base consisting of clopen sets.
Theorem 7.12. Let $S$ be a non-empty topological space which is topologically complete, nowhere locally compact and zero-dimensional. Further let $X$ be a Banach space. Then the Banach space $C^b(S;X)$ is isomorphic to its square $C^b(S;X)^2$. Also, in case $X$ is non-trivial, the operator algebra $\mathcal{B}(C^b(S;X))$ has the Cuntz 2-property.
Proof. By a result of P. Alexandroff and P. Urysohn [3], cited as Theorem 1.9.8 in [29], the space $S$ is homeomorphic to the space $\mathbb{P}$ of all irrational numbers. Let $\mathbb{P}_1$ be the set of negative irrational numbers, and let $\mathbb{P}_2$ be the set of the positive ones. Then $\mathbb{P}_1$ and $\mathbb{P}_2$ are clopen sets in $\mathbb{P}$ and $\mathbb{P}$ is the disjoint union of $\mathbb{P}_1$ and $\mathbb{P}_2$. Also both $\mathbb{P}_1$ and $\mathbb{P}_2$ are homeomorphic to $\mathbb{P}$. This is clear from the fact that these spaces satisfy the conditions mentioned in the theorem, but it can also be seen in a more direct way by constructing concrete homeomorphisms $\psi_1$ from $\mathbb{P}$ onto $\mathbb{P}_1$ and $\psi_2$ from $\mathbb{P}$ onto $\mathbb{P}_2$. For the latter one can take for instance $\psi_2:\mathbb{P}\to\mathbb{P}_2$ given by
$$\psi_2(t) = \begin{cases} \dfrac{1}{1-t}, & t\in\mathbb{P},\ t<0,\\[1ex] t+1, & t\in\mathbb{P},\ t>0. \end{cases}$$
Now apply Proposition 7.7. □

Theorem 7.13. Let $S$ be a non-empty topological space with a countable number of points and no isolated points. Further let $X$ be a Banach space. Then the Banach spaces $C^b(S;X)$ and $C^b(S;X)^2$ are isomorphic. Also, when $X$ is non-trivial, the operator algebra $\mathcal{B}(C^b(S;X))$ has the Cuntz 2-property.
Proof. By a result of W. Sierpiński [33] (see Theorem 1.9.6 in [29]), the space $S$ is homeomorphic to $\mathbb{Q}$, the space of rational numbers. The argument for dealing with $\mathbb{Q}$ is similar to that given in the proof of the preceding result for $\mathbb{P}$; use, for instance, $\sqrt{2}$ as a "division point" instead of 0. □
In each of the above specializations to concrete spaces, we have isomorphy of a Banach space with its square (hence with all its powers). In order to prove this, we needed some rather non-trivial results from general topology. In some cases one can avoid this by settling for something less, for instance isomorphy with the cube instead of the square. An example is given below. More on Banach spaces isomorphic to their cubes can be found in the next section.
Example A. Let $P$ be the subset of the open interval $(-1,1)$ consisting of the rational numbers
$$\sum_{n=1}^{\infty} 3\varepsilon_n\Bigl(\frac{1}{4}\Bigr)^{n}, \qquad (11)$$
where the $\varepsilon_n$ are allowed to take the numerical values $-1$, $0$, $1$ (and no others), while only a finite number among $\varepsilon_1,\varepsilon_2,\varepsilon_3,\dots$ may differ from zero, so that (11) is actually a finite sum. To get an idea of what is going on, let us look at a few cases.

The first is where all $\varepsilon_n$ vanish. Then (11) only gives the number 0. Next consider the situation where $\varepsilon_n = 0$ for all $n\geq 2$. This leads to the three numbers $-\frac34$, $0$ and $\frac34$. When $\varepsilon_n = 0$ for $n\geq 3$, the sum (11) reduces to $\frac34\varepsilon_1 + \frac{3}{16}\varepsilon_2$ with $\varepsilon_1,\varepsilon_2\in\{-1,0,1\}$, and so we arrive at the nine numbers
$$-\frac{15}{16},\ -\frac34,\ -\frac{9}{16},\ -\frac{3}{16},\ 0,\ \frac{3}{16},\ \frac{9}{16},\ \frac34,\ \frac{15}{16}.$$
These include the three outcomes we already had in the previous stage. In case $\varepsilon_n$ vanishes for all $n\geq 4$, the sum (11) becomes $\frac34\varepsilon_1 + \frac{3}{16}\varepsilon_2 + \frac{3}{64}\varepsilon_3$ with the restrictions stipulated above on $\varepsilon_1$, $\varepsilon_2$ and $\varepsilon_3$. Thus, besides the nine numbers indicated above, we get eighteen additional ones, making a total of twenty-seven:
$$-\frac{63}{64},\ -\frac{15}{16},\ -\frac{57}{64},\ -\frac{51}{64},\ -\frac34,\ -\frac{45}{64},\ -\frac{39}{64},\ -\frac{9}{16},\ -\frac{33}{64},\ -\frac{15}{64},\ -\frac{3}{16},\ -\frac{9}{64},\ -\frac{3}{64},\ 0,$$
$$\frac{3}{64},\ \frac{9}{64},\ \frac{3}{16},\ \frac{15}{64},\ \frac{33}{64},\ \frac{9}{16},\ \frac{39}{64},\ \frac{45}{64},\ \frac34,\ \frac{51}{64},\ \frac{57}{64},\ \frac{15}{16},\ \frac{63}{64}.$$
And so on.
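The enumeration above is easy to reproduce mechanically. The sketch below (our helper names) generates the truncated sums (11) with exact rational arithmetic and confirms the counts 3, 9, 27 and a few of the listed values:

```python
from fractions import Fraction
from itertools import product

def truncated_sums(N):
    """All values of sum_{n=1}^N 3 eps_n (1/4)^n with eps_n in {-1, 0, 1}."""
    return {sum(3 * e * Fraction(1, 4) ** n for n, e in enumerate(eps, start=1))
            for eps in product((-1, 0, 1), repeat=N)}

assert truncated_sums(1) == {Fraction(-3, 4), Fraction(0), Fraction(3, 4)}
assert len(truncated_sums(2)) == 9 and Fraction(15, 16) in truncated_sums(2)
s3 = truncated_sums(3)
assert len(s3) == 27
assert {Fraction(-63, 64), Fraction(-33, 64), Fraction(9, 16), Fraction(57, 64)} <= s3
# each stage contains the previous one, as noted in the text
assert truncated_sums(2) <= s3
```

That the count is exactly $3^N$ reflects the fact that different choices of the $\varepsilon_n$ always give different sums.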
The number of points in $P$ is countably infinite and $P$, being a subspace of the real line, has a countable base. Also it is a straightforward matter to prove that $P$ has no isolated points. For $X$ a non-trivial Banach space, Theorem 7.13 now gives that the operator algebra $\mathcal{B}(C^b(P;X))$ has the Cuntz 2-property. The proof given above employs Sierpiński's (rather non-trivial) characterization of $\mathbb{Q}$ as the unique non-empty countable space without isolated points. Being content with the Cuntz 3-property, we can avoid the use of heavy machinery from general topology. Here is the argument.
Split $P$ in three parts $P_-$, $P_0$ and $P_+$:
$$P_- = P\cap\Bigl(-1,-\frac12\Bigr), \qquad P_0 = P\cap\Bigl(-\frac14,\frac14\Bigr), \qquad P_+ = P\cap\Bigl(\frac12,1\Bigr).$$
These parts correspond to $\varepsilon_1 = -1$, $\varepsilon_1 = 0$ and $\varepsilon_1 = 1$, respectively. Clearly $P_-$, $P_0$ and $P_+$ are open subsets of $P$, which itself is the disjoint union of these sets. But then $P_-$, $P_0$ and $P_+$ are closed in $P$ too. Now note that $P = 3 + 4P_- = 4P_0 = -3 + 4P_+$. Hence $P$ is homeomorphic to the topological direct sum of three copies of itself. Applying Proposition 7.7 we get that the operator algebra $\mathcal{B}(C^b(P;X))$ has the Cuntz 3-property. □
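The identities $P = 3 + 4P_-$, $P = 4P_0$ and $P = -3 + 4P_+$ can likewise be verified on the truncations: stripping the first digit $\varepsilon_1$ and rescaling reproduces the shorter sums. A sketch (helper names are ours):

```python
from fractions import Fraction
from itertools import product

def sums_by_first_digit(N):
    """Map eps_1 -> set of truncated sums sum_{n<=N} 3 eps_n (1/4)^n."""
    out = {-1: set(), 0: set(), 1: set()}
    for eps in product((-1, 0, 1), repeat=N):
        out[eps[0]].add(sum(3 * e * Fraction(1, 4) ** n
                            for n, e in enumerate(eps, start=1)))
    return out

N = 4
parts = sums_by_first_digit(N)
shorter = set().union(*sums_by_first_digit(N - 1).values())
assert {3 + 4 * x for x in parts[-1]} == shorter    # P = 3 + 4 P_-
assert {4 * x for x in parts[0]} == shorter         # P = 4 P_0
assert {-3 + 4 * x for x in parts[1]} == shorter    # P = -3 + 4 P_+
# the three parts lie in (-1, -1/2), (-1/4, 1/4) and (1/2, 1) respectively
assert all(-1 < x < Fraction(-1, 2) for x in parts[-1])
assert all(Fraction(-1, 4) < x < Fraction(1, 4) for x in parts[0])
assert all(Fraction(1, 2) < x < 1 for x in parts[1])
```

The last three assertions check numerically that the parts $P_-$, $P_0$, $P_+$ do correspond to the value of $\varepsilon_1$, as claimed in the text.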
The material presented above is concerned with non-trivial Banach spaces that are isomorphic to their squares and, consequently, allow for non-trivial zero sums of projections. For such a Banach space $X$ there exist "Cuntz operators" $U_1, U_2, V_1$ and $V_2$ in $\mathcal{B}(X)$ satisfying the identities
$$V_1 U_1 + V_2 U_2 = I_X, \qquad U_j V_k = \delta_{j,k} I_X, \quad j,k = 1,2.$$
For several of the above concrete instances of Banach spaces isomorphic to their square or a higher power we gave (or would be able to give) explicit descriptions of these Cuntz operators (cf. Theorems 7.1 and 7.6; see also Proposition 7.7 and Example A). In combination with the material presented in Section 4, such descriptions, when available, can be used to obtain explicit expressions for the projections forming a non-trivial zero sum. The expressions in question are complicated and not very illuminating; we refrain from giving further details here.
8. Banach spaces isomorphic to their cubes
The applications we gave in the previous section were concerned with Banach spaces having the Cuntz 2-property. However, this does not cover all possible cases. Indeed, Theorem 10 in [21] provides an example of a Banach space which is isomorphic to its cube while it is not isomorphic to its square. More generally, it is shown in [23] that for every integer $k\geq 2$, there is a Banach space $E$ such that $E^m$ is isomorphic to $E^n$ if and only if $m = n \pmod{k}$. In particular the space $E$ is then not isomorphic to $E^k$, but it is isomorphic to $E^{k+1}$, and the latter implies that $\mathcal{B}(E)$ is of Cuntz type, hence spectrally irregular because of the occurrence of non-trivial zero sums of idempotents. The conclusion is that one needs the full force of Theorem 6.2.
The examples of the type meant above are complicated. In this section we embark on a somewhat less ambitious endeavor: to construct Banach spaces, evidently isomorphic to their cubes, but for which it is not clear whether or not they are isomorphic to their squares. For a given integer $k$ larger than 2, the construction can be modified so as to result in Banach spaces $F$ with $F^m$ isomorphic to $F^n$ if $m = n \pmod{k}$, while it is unclear whether or not $F^m$ is isomorphic to $F^n$ in case $m \neq n \pmod{k}$. We refrain from giving the details concerning this refinement.
We now begin with the construction which, as one will realize, is inspired by that of the familiar Cantor set. The starting point is a non-empty topological space $S$, not necessarily compact or metrizable. Suppose $S_-$, $S_0$ and $S_+$ are mutually disjoint subspaces of $S$, all three homeomorphic to $S$, hence non-empty. Let $\sigma_i: S\to S_i$ be a homeomorphism from $S$ onto $S_i$. Here $i\in\{-,0,+\}$. For $n = 1,2,3,\dots$ and $i_1,i_2,\dots,i_n\in\{-,0,+\}$, write
$$S_{i_1,i_2,\dots,i_n} = \sigma_{i_1}\sigma_{i_2}\cdots\sigma_{i_n}[S]. \qquad (12)$$
Note that, as far as the expressions $S_-$, $S_0$ and $S_+$ are concerned, no confusion is possible. Indeed, the sets $S_- = \sigma_-[S]$, $S_0 = \sigma_0[S]$ and $S_+ = \sigma_+[S]$ coming from (12) coincide with the originally given $S_-$, $S_0$ and $S_+$. Clearly, all the subspaces $S_{i_1,i_2,\dots,i_n}$ are non-empty,
$$S_{i_1,i_2,\dots,i_n} \subset S_{i_1,i_2,\dots,i_{n-1}} \subset \cdots \subset S_{i_1,i_2} \subset S_{i_1} \subset \bigcup_{i\in\{-,0,+\}} S_i,$$
and $S_{i_1,i_2,\dots,i_n} = \sigma_{i_1}[S_{i_2,\dots,i_n}]$. We also have
$$S_{i_1,i_2,\dots,i_n}\cap S_j = \begin{cases} S_{j,i_2,\dots,i_n}, & j = i_1,\\ \emptyset, & j \neq i_1. \end{cases}$$
Note further that
$$S_{i_1,i_2,\dots,i_n}\cap S_{j_1,j_2,\dots,j_n} \neq \emptyset \ \Longleftrightarrow\ i_k = j_k,\ k = 1,\dots,n,$$
and so $S_{i_1,i_2,\dots,i_n}$ and $S_{j_1,j_2,\dots,j_n}$ coincide if and only if they are not disjoint.
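For a concrete feel for the sets $S_{i_1,\dots,i_n}$ and the intersection rules above, one can take $S = [0,1]$ with the (hypothetical, affine) choices $\sigma_-(t) = t/5$, $\sigma_0(t) = (t+2)/5$, $\sigma_+(t) = (t+4)/5$, so that each $S_{i_1,\dots,i_n}$ is a closed interval computable exactly:

```python
from fractions import Fraction
from itertools import product

SIGMA = {'-': lambda t: t / 5,
         '0': lambda t: (t + 2) / 5,
         '+': lambda t: (t + 4) / 5}

def word_interval(word):
    """S_{i_1,...,i_n} = sigma_{i_1} ... sigma_{i_n}[S] as an interval (a, b)."""
    a, b = Fraction(0), Fraction(1)
    for i in reversed(word):          # the innermost map is applied first
        a, b = SIGMA[i](a), SIGMA[i](b)
    return a, b

def disjoint(u, v):
    return u[1] < v[0] or v[1] < u[0]

words = [''.join(w) for w in product('-0+', repeat=3)]
for w in words:
    # S_{i_1,i_2,i_3} is contained in S_{i_1,i_2}, and meets S_j only for j = i_1
    a, b = word_interval(w)
    a2, b2 = word_interval(w[:2])
    assert a2 <= a and b <= b2
    for j in '-0+':
        assert disjoint(word_interval(w), word_interval(j)) == (j != w[0])
# two sets of the same length coincide iff they are not disjoint
for w, v in product(words, repeat=2):
    assert disjoint(word_interval(w), word_interval(v)) == (w != v)
```

The choice of maps mirrors the $x$-coordinates of Example C below; any three homeomorphisms with mutually disjoint images would do.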
For $n = 1,2,3,\dots$, introduce
$$U_n = \bigcup_{i_1,i_2,\dots,i_n\in\{-,0,+\}} S_{i_1,i_2,\dots,i_n}.$$
Then $U_n$ is non-empty and $U_1 = S_-\cup S_0\cup S_+$. Regardless of whether $i_{n+1}$ is $-$, $0$ or $+$, the inclusion $S_{i_1,i_2,\dots,i_n,i_{n+1}}\subset S_{i_1,i_2,\dots,i_n}$ holds. Hence
$$U_{n+1}\subset \bigcup_{i_1,i_2,\dots,i_n\in\{-,0,+\}} S_{i_1,i_2,\dots,i_n} = U_n,$$
so $U_1\supset U_2\supset U_3\supset\cdots$. We also note that $U_{n+1} = \sigma_-[U_n]\cup\sigma_0[U_n]\cup\sigma_+[U_n]$, and this is true for $n = 0$ too when we interpret $U_0$ as $S$. The identity
$$\sigma_i[U_n] = U_{n+1}\cap S_i, \qquad i\in\{-,0,+\},\ n = 1,2,3,\dots,$$
needed later, is now immediate.

Let $D = \bigcap_{n=1}^\infty U_n$ be the intersection of the descending chain of sets $U_1, U_2, U_3,\dots$. In order to directly relate $D$ to the sets $S_{i_1,i_2,\dots,i_n}$ we do the following. Write $W$ for the collection of (infinite) sequences with entries from $\{-,0,+\}$. With an element $\{i_1,i_2,i_3,\dots\}$ from $W$, we associate the intersection of the descending sequence $S_{i_1}\supset S_{i_1,i_2}\supset S_{i_1,i_2,i_3}\supset\cdots$, i.e., the set
$$S_{i_1,i_2,i_3,\dots} = \bigcap_{n=1}^\infty S_{i_1,i_2,\dots,i_n}.$$
As $S_{i_1}\subset U_1$, $S_{i_1,i_2}\subset U_2$, $S_{i_1,i_2,i_3}\subset U_3$ and so on, we have that $S_{i_1,i_2,i_3,\dots}\subset D$. So the union of all the sets $S_{i_1,i_2,i_3,\dots}$ is contained in $D$. In fact there is equality:
$$D = \bigcup_{\{i_1,i_2,i_3,\dots\}\in W} S_{i_1,i_2,i_3,\dots} = \bigcup_{\{i_1,i_2,i_3,\dots\}\in W}\ \bigcap_{n=1}^\infty S_{i_1,i_2,\dots,i_n}.$$
For completeness we mention that, given $d\in D$, there is precisely one sequence $\{i_1,i_2,i_3,\dots\}\in W$ such that $d\in S_{i_1,i_2,i_3,\dots}$. As we shall see below in an example, it may happen that the set $S_{i_1,i_2,i_3,\dots}$ contains more than one point. It can be empty too.
Next introduce $D_- = D\cap S_-$, $D_0 = D\cap S_0$ and $D_+ = D\cap S_+$. Then $D$, being a subset of the disjoint union $S_-\cup S_0\cup S_+$, is the disjoint union of $D_-$, $D_0$ and $D_+$. Also, for $i\in\{-,0,+\}$ we have (using the injectivity of $\sigma_i$ in the second equality below)
$$\sigma_i[D] = \sigma_i\Bigl[\,\bigcap_{n=1}^\infty U_n\Bigr] = \bigcap_{n=1}^\infty \sigma_i[U_n] = \bigcap_{n=1}^\infty\bigl[U_{n+1}\cap S_i\bigr] = \Bigl[\,\bigcap_{n=1}^\infty U_{n+1}\Bigr]\cap S_i = D\cap S_i = D_i.$$
Hence the restriction $\tau_i$ of $\sigma_i$ to $D$, viewed as a mapping $\tau_i: D\to D_i$, is a homeomorphism from $D$ onto $D_i$. Consequently, for $i,j\in\{-,0,+\}$, the mapping $\tau_j\tau_i^{-1}: D_i\to D_j$ is a homeomorphism from $D_i$ onto $D_j$. For completeness we mention that $\tau_i^{-1}: D_i\to D$ is the restriction of $\sigma_i^{-1}$ to $D_i$, considered as a mapping onto $D$.
The statements in the previous paragraph are only of interest when the set $D$ is non-empty. A relevant special case in which this necessarily holds is when the underlying topological space $S$ is compact and the sets $S_-$, $S_0$ and $S_+$ are closed in $S$. In that case we can even deduce the non-emptiness of all sets $S_{i_1,i_2,i_3,\dots}$ with the sequence $\{i_1,i_2,i_3,\dots\}$ taken from $W$. Closedness of $S_-$, $S_0$ and $S_+$ is guaranteed when $S$, in addition to being compact, is also Hausdorff.
In order to conclude that $D$ is non-empty, sometimes fixed point theorems can be employed too. Here is such a case, which applies, for instance, to non-empty closed subsets $S$ of real or complex Banach spaces. Suppose $S$ is a complete metric space (possibly non-compact) and let $d$ denote the metric on $S$. Further assume that the homeomorphisms $\sigma_-: S\to S_-$, $\sigma_0: S\to S_0$ and $\sigma_+: S\to S_+$ are contractions and there exists a constant $c\in(0,1)$ such that
$$d\bigl(\sigma_i(x),\sigma_i(y)\bigr)\leq c\,d(x,y), \qquad x,y\in S.$$
Take $i\in\{-,0,+\}$. Then by the Banach fixed point theorem, $\sigma_i$ has a (unique) fixed point. Now let $u_{-,\infty}$, $u_{0,\infty}$ and $u_{+,\infty}$ be the fixed points of $\sigma_-$, $\sigma_0$ and $\sigma_+$, respectively. Clearly these belong to $S_- = \sigma_-[S]$, $S_0 = \sigma_0[S]$ and $S_+ = \sigma_+[S]$, respectively. Note now that $u_{-,\infty}\in S_{-,-,-,\dots}\subset D$, $u_{0,\infty}\in S_{0,0,0,\dots}\subset D$ and $u_{+,\infty}\in S_{+,+,+,\dots}\subset D$. In particular $D$ is non-empty.
In what follows we shall assume that $D$ is non-empty. Along with $D$, the homeomorphic images $D_-$, $D_0$ and $D_+$ of $D$ under, respectively, the homeomorphisms $\tau_- = \sigma_-|_D: D\to D_-$, $\tau_0 = \sigma_0|_D: D\to D_0$ and $\tau_+ = \sigma_+|_D: D\to D_+$, are then non-empty as well. Recall now that $D$ is the disjoint union of $D_-$, $D_0$ and $D_+$. Thus, if the latter three sets happen to be clopen in $D$, we have that $D$ is homeomorphic to the topological direct sum of three copies of itself. But then, given a Banach space $X$, we can conclude from Proposition 7.7 that the Banach space $C^b(D;X)$ is isomorphic to its cube $C^b(D;X)^3$, hence the operator algebra $\mathcal{B}(C^b(D;X))$ has the Cuntz 3-property and is (therefore) spectrally irregular. It is not clear whether or not $C^b(D;X)$ is isomorphic to its square $C^b(D;X)^2$. This might, among other things, depend on the choice of the Banach space $X$.
The requirement, featuring in the above paragraph, that $D_-$, $D_0$ and $D_+$ be clopen in $D$, is met when the three sets $S_-$, $S_0$ and $S_+$ are open in the underlying space $S$. Here is the argument. Clearly under this assumption $D_- = D\cap S_-$, $D_0 = D\cap S_0$ and $D_+ = D\cap S_+$ are open in $D$. But then $D_- = D\setminus[D_0\cup D_+]$ is closed in $D$. Similarly $D_0$ and $D_+$ are closed in $D$ too. The same type of reasoning shows that $D_-$, $D_0$ and $D_+$ are clopen in $D$ whenever $S_-$, $S_0$ and $S_+$ are closed in $S$.
By way of illustration, we now present an example in which the underlying topological space $S$ is not compact.
Example B. Let $S$ be the open interval $(0,1)$ and take for $S_-$, $S_0$ and $S_+$ the open intervals
$$S_- = \Bigl(0,\frac13\Bigr), \qquad S_0 = \Bigl(\frac13,\frac23\Bigr), \qquad S_+ = \Bigl(\frac23,1\Bigr).$$
Further, define $\sigma_-: S\to S_-$, $\sigma_0: S\to S_0$ and $\sigma_+: S\to S_+$ by
$$\sigma_-(t) = \frac13 t, \qquad \sigma_+(t) = \frac23 + \frac13 t, \qquad 0 < t < 1,$$
$$\sigma_0(t) = \begin{cases} \dfrac13 + \dfrac15 t, & 0 < t < \dfrac{5}{12},\\[1ex] t, & \dfrac{5}{12}\leq t\leq\dfrac{7}{12},\\[1ex] \dfrac{7}{15} + \dfrac15 t, & \dfrac{7}{12} < t < 1. \end{cases}$$
Obviously these mappings are homeomorphisms. Observe that $\sigma_0$ acts as the identity mapping on the closed interval $\bigl[\frac{5}{12},\frac{7}{12}\bigr]$. Hence this interval is contained in (and is actually equal to) the set $S_{0,0,0,\dots}$ (notation as above). But then the closed interval $\bigl[\frac{5}{12},\frac{7}{12}\bigr]$ is a subset of every set $U_n$ and so $\bigl[\frac{5}{12},\frac{7}{12}\bigr]\subset D$. In particular $D$ is non-empty, in fact even uncountable. As the sets $S_-$, $S_0$ and $S_+$ are open in $S$, we may conclude that, given a Banach space $X$, the Banach space $C^b(D;X)$ is isomorphic to its cube $C^b(D;X)^3$, hence the operator algebra $\mathcal{B}(C^b(D;X))$ has the Cuntz 3-property and is (therefore) spectrally irregular. It is not clear whether or not $C^b(D;X)$ is isomorphic to its square $C^b(D;X)^2$. This is an open question, even for the case $X = \mathbb{C}$, so for the Banach space $C^b(D;\mathbb{C})$. Note here that Theorem 7.13 does not apply because $D$ is uncountable. Further, Theorem 7.12 cannot be used, for the space $D$, containing the closed interval $\bigl[\frac{5}{12},\frac{7}{12}\bigr]$, is not nowhere locally compact. For that matter, it is not zero-dimensional either. Finally, Theorem 7.11 cannot be employed to show that $\mathcal{B}(C^b(D;\mathbb{C}))$ has the Cuntz 2-property. The reason is that $D$ is not compact. Indeed, $D$ is not a closed subset of the real line; zero is an accumulation point of $D$ which does not belong to $D$.
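A numerical sketch of the map $\sigma_0$ from Example B (helper names ours): it is continuous at the break points $5/12$ and $7/12$, fixes $\bigl[\frac{5}{12},\frac{7}{12}\bigr]$ pointwise, and maps $(0,1)$ into $\bigl(\frac13,\frac23\bigr)$:

```python
from fractions import Fraction

def sigma0(t):
    # piecewise definition from Example B, in exact arithmetic
    if t < Fraction(5, 12):
        return Fraction(1, 3) + t / 5
    if t <= Fraction(7, 12):
        return t
    return Fraction(7, 15) + t / 5

# continuity at the break points: the adjacent formulas agree there
assert Fraction(1, 3) + Fraction(5, 12) / 5 == Fraction(5, 12)
assert Fraction(7, 15) + Fraction(7, 12) / 5 == Fraction(7, 12)
# sigma0 is the identity on [5/12, 7/12] ...
assert sigma0(Fraction(1, 2)) == Fraction(1, 2)
# ... and maps (0, 1) into (1/3, 2/3)
samples = [Fraction(k, 1000) for k in range(1, 1000)]
assert all(Fraction(1, 3) < sigma0(t) < Fraction(2, 3) for t in samples)
assert sigma0(Fraction(1, 1000)) - Fraction(1, 3) == Fraction(1, 5000)
```

The fixed middle interval is precisely why $D$ here contains a whole closed interval, in contrast to the Cantor set.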
As was said before, the construction presented above is inspired by that of the familiar Cantor set, earlier denoted by $K$. It is illuminating to observe that, in fact, a slight modification of $K$ is a subset of $D$. The set in question, here denoted by $K^*$, is obtained as follows. Start with the open interval $(0,1)$ and leave out the closed middle third interval $\bigl[\frac13,\frac23\bigr]$. What remains is the union of the two open intervals $\bigl(0,\frac13\bigr)$ and $\bigl(\frac23,1\bigr)$. Next in each of them cut out the closed intervals $\bigl[\frac19,\frac29\bigr]$ and $\bigl[\frac79,\frac89\bigr]$, which leaves us with the four open intervals $\bigl(0,\frac19\bigr)$, $\bigl(\frac29,\frac13\bigr)$, $\bigl(\frac23,\frac79\bigr)$ and $\bigl(\frac89,1\bigr)$. And so on (formal definition by induction), resulting in
$$K^* = (0,1)\setminus\bigcup_{n=1}^\infty\ \bigcup_{k=1}^{3^{n-1}}\Bigl[\frac{3k-2}{3^n},\frac{3k-1}{3^n}\Bigr].$$
Clearly $K^*$ is contained in the usual Cantor middle-third set $K$ which admits the representation
$$K = [0,1]\setminus\bigcup_{n=1}^\infty\ \bigcup_{k=1}^{3^{n-1}}\Bigl(\frac{3k-2}{3^n},\frac{3k-1}{3^n}\Bigr).$$
Along with $K$, the set $K^*$ is uncountable. Indeed,
$$K\setminus K^* \subset \{0,1\}\cup\bigcup_{n=1}^\infty\ \bigcup_{k=1}^{3^{n-1}}\Bigl\{\frac{3k-2}{3^n},\frac{3k-1}{3^n}\Bigr\},$$
and the set in the right-hand side of this inclusion is countable. (Actually, $K^*$ coincides with the complement in $K$ of the countable subset of $K$ consisting of the end points of the intervals left out.) One sees that $K^*\subset D$ by ignoring the presence of $S_0$ and $\sigma_0$. More precisely, by looking at the sets $S_{i_1,i_2,\dots,i_n}$ with $i_1,\dots,i_n$ taken from $\{-,+\}$, so avoiding the use of the index 0. In sharp contrast to $K$ and $K^*$, the Cantor type set $D$ contains (countably many) closed intervals.
The topological space $K^*$ is neither countable nor compact. So neither Theorem 7.13 nor Theorem 7.11 applies. However, as $K^*$ is homeomorphic to the topological direct sum of two copies of itself, we can conclude from Proposition 7.7 that for every Banach space $X$, the Banach space $C^b(K^*;X)$ is isomorphic to its square $C^b(K^*;X)^2$. Hence, when $X$ is non-trivial, the operator algebra $\mathcal{B}(C^b(K^*;X))$ has the Cuntz 2-property and is (therefore) spectrally irregular. Theorem 7.12 does apply, and its proof indicates that $K^*$ is homeomorphic to $\mathbb{P}$, the space of irrational real numbers. □
We give another example, this one along lines suggested by Van Mill.
Example C. Start with the half-closed, half-open square $S = [0,1]\times(0,1)$. Then for $S_-$, $S_0$ and $S_+$, take
$$S_- = \Bigl[0,\frac15\Bigr]\times(0,1), \qquad S_0 = \Bigl[\frac25,\frac35\Bigr]\times(0,1), \qquad S_+ = \Bigl[\frac45,1\Bigr]\times\Bigl(\frac25,\frac35\Bigr).$$
Further, for $0\leq x\leq 1$, $0 < y < 1$, put
$$\sigma_-(x,y) = \Bigl(\frac15\sqrt{x},\,y\Bigr), \qquad \sigma_0(x,y) = \Bigl(\frac25+\frac15 x,\,y\Bigr), \qquad \sigma_+(x,y) = \Bigl(\frac45+\frac15 x,\,\frac25+\frac15 y\Bigr).$$
Then $\sigma_-: S\to S_-$, $\sigma_0: S\to S_0$ and $\sigma_+: S\to S_+$ are homeomorphisms.
[Figure: the square $S$ and its images $S_-$, $S_0$ and $S_+$ under the homeomorphisms $\sigma_-$, $\sigma_0$ and $\sigma_+$.]
By repeated application of these homeomorphisms, the sets $S_-$, $S_0$, $S_+$ are "squeezed" into, respectively,

the half-closed, half-open rectangle $\Bigl[0,\dfrac{1}{25}\Bigr]\times(0,1)$ $\bigl(= S_{-,-,-,\dots}\bigr)$,

the open line segment $\Bigl\{\dfrac12\Bigr\}\times(0,1)$ $\bigl(= S_{0,0,0,\dots}\bigr)$,

the singleton set $\Bigl\{\Bigl(1,\dfrac12\Bigr)\Bigr\}$ $\bigl(= S_{+,+,+,\dots}\bigr)$.
The topological space $D$ resulting from the construction described above is uncountable but not compact, so Theorems 7.13 and 7.11 do not apply. Note that Theorem 7.12 does not apply either. As $S_-$, $S_0$ and $S_+$ are closed in $S$, the space $D$ is homeomorphic to the topological direct sum of three of its copies. So, if $X$ is a Banach space, then $C^b(D;X)$ is isomorphic to $C^b(D;X)^3$. It is not clear whether or not $C^b(D;X)$ is isomorphic to $C^b(D;X)^2$. So on this basis, and assuming $X$ to be non-trivial, we can conclude that $\mathcal{B}(C^b(D;X))$ has the Cuntz 3-property, but not (yet) that it has the Cuntz 2-property. □
We close with one more remark. Both Examples B and C feature a topological space $D$ such that (evidently!) $D$ is homeomorphic to the topological direct sum of three copies of $D$ while it is not clear whether or not $D$ is homeomorphic to the topological direct sum of two copies of $D$. There do exist topological spaces that are homeomorphic to the direct sum of three copies of themselves but not to that of two. One such example is given in [26]. However, that example is not of primary interest to us because the conditions of Theorem 7.10 are satisfied. This is different for a space constructed by W. Hanf, for whose description (modulo Stone duality, i.e., in the language of Boolean algebras) we refer to [27], Section 6.2 (see also [25], [32], [35] and [36]). That space, here for the moment denoted by $H$, although uncountable, compact and Hausdorff, is not metrizable; hence Theorem 7.10 does not apply. The question arises whether or not $C^b(H;\mathbb{C})$ or, more generally, $C^b(H;X)$ with $X$ a Banach space, is isomorphic to its square. Evidently it is isomorphic to its cube. Recall here (cf. the first paragraph of this section) that a very sophisticated example of a Banach space isomorphic to its cube but not to its square has been given by W.T. Gowers [21].
Acknowledgement
The authors gratefully acknowledge stimulating contacts with Jan van Mill from the Free University in Amsterdam about the subject matter of Sections 7 and 8 of the present paper.
References
[1] S.A. Argyros, D. Freeman, R.G. Haydon, E. Odell, Th. Raikoftsalis, Th. Schlumprecht, D. Zisimopoulou, Embedding uniformly convex spaces into spaces with very few operators, J. Funct. Anal. 262 (2012), 825–849.
[2] S.A. Argyros, R.G. Haydon, A hereditarily indecomposable $\mathcal{L}_\infty$-space that solves the scalar-plus-compact problem, Acta Math. 206 (2011), 1–54.
[3] P. Alexandroff and P. Urysohn, Über nulldimensionale Punktmengen, Math. Ann. 98 (1928), 89–106.
[4] B.A. Barnes, G.J. Murphy, M.R.F. Smyth, T.T. West, Riesz and Fredholm Theory in Banach Algebras, Research Notes in Mathematics, Vol. 67, Pitman (Advanced Publishing Program), Boston, London, Melbourne 1982.
[5] H. Bart, T. Ehrhardt, B. Silbermann, Zero sums of idempotents in Banach algebras, Integral Equations and Operator Theory 19 (1994), 125–134.
[6] H. Bart, T. Ehrhardt, B. Silbermann, Logarithmic residues in Banach algebras, Integral Equations and Operator Theory 19 (1994), 135–152.
[7] H. Bart, T. Ehrhardt, B. Silbermann, Sums of idempotents and logarithmic residues in matrix algebras, In: Operator Theory: Advances and Applications, Vol. 122, Birkhäuser, Basel 2001, 139–168.
[8] H. Bart, T. Ehrhardt, B. Silbermann, Logarithmic residues of Fredholm operator valued functions and sums of finite rank projections, In: Operator Theory: Advances and Applications, Vol. 130, Birkhäuser, Basel 2001, 83–106.
[9] H. Bart, T. Ehrhardt, B. Silbermann, Logarithmic residues of analytic Banach algebra valued functions possessing a simply meromorphic inverse, Linear Algebra Appl. 341 (2002), 327–344.
[10] H. Bart, T. Ehrhardt, B. Silbermann, Sums of idempotents in the Banach algebra generated by the compact operators and the identity, In: Operator Theory: Advances and Applications, Vol. 135, Birkhäuser, Basel 2002, 39–60.
[11] H. Bart, T. Ehrhardt, B. Silbermann, Logarithmic residues in the Banach algebra generated by the compact operators and the identity, Mathematische Nachrichten 268 (2004), 3–30.
[12] H. Bart, T. Ehrhardt, B. Silbermann, Trace conditions for regular spectral behavior of vector-valued analytic functions, Linear Algebra Appl. 430 (2009), 1945–1965.
[13] H. Bart, T. Ehrhardt, B. Silbermann, Spectral regularity of Banach algebras and non-commutative Gelfand theory, In: H. Dym et al. (eds.), Operator Theory: Advances and Applications. The Israel Gohberg Memorial Volume, Vol. 218, Birkhäuser, Basel 2012, 123–153.
[14] H. Bart, T. Ehrhardt, B. Silbermann, Families of homomorphisms in non-commutative Gelfand theory: comparisons and counterexamples, In: W. Arendt et al. (eds.), Spectral Theory, Mathematical System Theory, Evolution Equations, Differential and Difference Equations, Operator Theory: Advances and Applications, OT 221, Birkhäuser, Springer Basel AG, 2012, 131–160.
[15] H. Bart, T. Ehrhardt, B. Silbermann, Logarithmic residues, Rouché's Theorem, spectral regularity, and zero sums of idempotents: the $C^*$-algebra case, Indag. Math. 23 (2012), 816–847.
[16] D.S. Bernstein, Matrix Mathematics, Second Edition, Princeton University Press, Princeton and Oxford, 2009.
[17] J. Cuntz, Simple $C^*$-algebras generated by isometries, Commun. Math. Physics 57 (1977), 173–185.
[18] K.R. Davidson, $C^*$-algebras by Example, Fields Institute Monographs, 6, American Mathematical Society, Providence, Rhode Island, 1996.
[19] T. Ehrhardt, V. Rabanovich, Yu. Samoĭlenko, B. Silbermann, On the decomposition of the identity into a sum of idempotents, Methods Funct. Anal. Topology 7 (2001), 1–6.
[20] I. Gohberg, S. Goldberg, M.A. Kaashoek, Classes of Linear Operators, Vol. I, Operator Theory: Advances and Applications, Vol. 49, Birkhäuser, Basel 1990.
[21] W.T. Gowers, A solution to the Schroeder–Bernstein problem for Banach spaces, Bull. London Math. Soc. 28, No. 3 (1996), 297–304.
[22] W.T. Gowers, B. Maurey, The unconditional basic sequence problem, Journal A.M.S. 6 (1993), 851–874.
[23] W.T. Gowers, B. Maurey, Banach spaces with small spaces of operators, Math. Ann. 307 (1997), 543–568.
[24] R. Hagen, S. Roch, B. Silbermann, $C^*$-algebras and Numerical Analysis, Marcel Dekker, New York, 2001.
[25] W. Hanf, On some fundamental problems concerning isomorphism of Boolean algebras, Math. Scand. 5 (1957), 205–217.
[26] J. Ketonen, The structure of countable Boolean algebras, Ann. of Math. (2) 108, No. 1 (1978), 41–89.
[27] S. Koppelberg, Handbook of Boolean algebras, Vol. 1 (J.D. Monk and R. Bonnet, eds.), North-Holland Publishing Co., Amsterdam, 1989.
[28] A.A. Miljutin, Isomorphisms of the spaces of continuous functions over compact sets of the cardinality of the continuum, Teor. Funkciĭ Funkcional. Anal. i Priložen. Vyp. 2 (1966), 150–156 (Russian).
[29] J. van Mill, The infinite-dimensional topology of function spaces, North-Holland Mathematical Library, 64, North-Holland Publishing Co., Amsterdam, 2001.
[30] C. Pearcy, D. Topping, Sums of small numbers of idempotents, Michigan Math. J. 14 (1967), 453–465.
[31] S. Roch, P.A. Santos, B. Silbermann, Non-commutative Gelfand Theories, Springer Verlag, London, Dordrecht, Heidelberg, New York 2011.
[32] B.M. Scott, On an example of Sundaresan, The Proceedings of the 1980 Topology Conference (Univ. Alabama, Birmingham, Ala., 1980), Topology Proc. 5 (1980), 185–186.
[33] W. Sierpiński, Sur une propriété topologique des ensembles dénombrables denses en soi, Fund. Math. 1 (1920), 11–16.
[34] A.E. Taylor, D.C. Lay, Introduction to Functional Analysis, Second Edition, John Wiley and Sons, New York 1980.
[35] V. Trnková, Isomorphisms of sums of countable Boolean algebras, Proc. Amer. Math. Soc. 80, No. 3 (1980), 389–392.
[36] V. Trnková, V. Koubek, Isomorphisms of sums of Boolean algebras, Proc. Amer. Math. Soc. 66, No. 2 (1977), 231–236.
H. Bart
Econometric Institute
Erasmus University Rotterdam
P.O. Box 1738
NL-3000 DR Rotterdam, The Netherlands
e-mail: [email protected]

T. Ehrhardt
Mathematics Department
University of California
Santa Cruz, CA 95064, USA
e-mail: [email protected]

B. Silbermann
Fakultät für Mathematik
Technische Universität Chemnitz
D-09107 Chemnitz, Germany
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 79–106
© 2013 Springer Basel
Fast Inversion of Polynomial-Vandermonde Matrices for Polynomial Systems Related to Order One Quasiseparable Matrices
T. Bella, Y. Eidelman, I. Gohberg (Z"L), V. Olshevsky and E. Tyrtyshnikov
Dedicated to Leonia Lerer on the occasion of his seventieth birthday
Abstract. While Gaussian elimination is well known to require $O(n^3)$ operations to invert an arbitrary matrix, Vandermonde matrices may be inverted using $O(n^2)$ operations by a method of Traub [24]. Although this original version of the Traub algorithm was observed to be unstable, it was shown in [12] that a minor modification of the algorithm typically yields very high accuracy. This approach has been extended from classical Vandermonde matrices to polynomial-Vandermonde matrices involving real orthogonal polynomials [3], [10], and Szegő polynomials [19]. In this paper we present an algorithm for inversion of a class of polynomial-Vandermonde matrices with special structure related to order one quasiseparable matrices, generalizing monomials, real orthogonal polynomials, and Szegő polynomials. We derive a fast $O(n^2)$ inversion algorithm applicable in this general setting, and examine its reduction in each of the previous special cases. Some preliminary numerical experiments are presented, demonstrating that, as in previous work of this type, good forward accuracy is possible in some circumstances.
Mathematics Subject Classification (2010). 15A09; 65F05.
Keywords. Inversion of Vandermonde matrices; polynomial-Vandermonde matrices; quasiseparable matrices.
1. Introduction
Let $P = \{r_0(x), r_1(x), \ldots, r_{n-1}(x)\}$ be a sequence of polynomials satisfying $\deg(r_k) = k$, and let $x_1, \ldots, x_n$ be a set of pairwise distinct values. Then the corresponding polynomial-Vandermonde matrix is given by
$$V_P(x) = \begin{bmatrix} r_0(x_1) & r_1(x_1) & \cdots & r_{n-1}(x_1) \\ r_0(x_2) & r_1(x_2) & \cdots & r_{n-1}(x_2) \\ \vdots & \vdots & & \vdots \\ r_0(x_n) & r_1(x_n) & \cdots & r_{n-1}(x_n) \end{bmatrix}. \tag{1.1}$$
In this paper we consider the problem of inverting the matrix $V_P(x)$ for a given system of polynomials $P$ satisfying some special recurrence relations. While the structure-ignoring approach of Gaussian elimination for inversion of $V_P(x)$ requires $O(n^3)$ operations, the special structure can be exploited to derive fast algorithms that compute the $n^2$ entries of the inverse in only $O(n^2)$ operations.
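As a concrete illustration of definition (1.1), the following Python sketch (ours, with illustrative function names; not part of the original text) assembles $V_P(x)$ from callables for $r_0, \ldots, r_{n-1}$ and a list of nodes:

```python
import numpy as np

def polynomial_vandermonde(polys, nodes):
    """Build V_P(x) of (1.1): entry (i, k) is r_k(x_i).

    `polys` is a list of n callables r_0, ..., r_{n-1} with deg(r_k) = k,
    and `nodes` holds the pairwise distinct points x_1, ..., x_n.
    """
    return np.array([[r(x) for r in polys] for x in nodes], dtype=float)

# With the monomial system P = {1, x, x^2, ...}, V_P(x) is the
# classical Vandermonde matrix.
n = 4
monomials = [lambda x, k=k: x**k for k in range(n)]
nodes = np.array([0.5, 1.0, 2.0, 3.0])
V = polynomial_vandermonde(monomials, nodes)
```

In the monomial case this reproduces the classical Vandermonde matrix, the special case treated next.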
In the simplest case, where $P = \{1, x, x^2, \ldots, x^{n-1}\}$, $V_P(x)$ reduces to a classical Vandermonde matrix, and the inversion algorithm is due to Traub [24]. In addition to the order of magnitude decrease in complexity, it was observed in [12] that a minor modification of the original Traub algorithm results in very good accuracy. The derivation of this fast and accurate algorithm attracted attention in the community, and several results were published giving fast algorithms for inversion of $V_P(x)$ for various special cases of the polynomial system $P$. This previous work is listed in Table 1.
Matrix $V_P(x)$ | Polynomials $P$ | Fast inversion algorithm
Classical Vandermonde | monomials | Traub [24]
Chebyshev–Vandermonde | Chebyshev | Gohberg–Olshevsky [10]
Three-Term Vandermonde | real orthogonal | Calvetti–Reichel [3]
Szegő–Vandermonde | Szegő | Olshevsky [19]

Table 1. Fast $O(n^2)$ inversion algorithms for polynomial-Vandermonde matrices.
In this paper, we consider a more general class of polynomials that contains all of those listed in Table 1 as special cases. This more general class of polynomials is related to a class of rank structured matrices called quasiseparable matrices, and hence we refer to them as quasiseparable polynomials. As quasiseparable polynomials generalize monomials, real orthogonal polynomials, and Szegő polynomials, the resulting inversion algorithm for polynomial-Vandermonde matrices $V_P(x)$ whose defining polynomials $P$ are quasiseparable polynomials generalizes the previous work in Table 1 in addition to providing new results.
In addition to generalizing these results, the algorithm is also applicable to some interesting new classes of polynomials for which no fast inversion algorithm of this type was previously available. The algorithm derived in this paper relies on perturbed recurrence relations for the associated polynomials for its computational speedup, and thus the classes of polynomials for which it may be used are best described in terms of the recurrence relations that they satisfy. One such class consists of the polynomials satisfying the three-term recurrence relations
$$r_k(x) = (\alpha_k x - \delta_k)\, r_{k-1}(x) - (\beta_k x + \gamma_k)\, r_{k-2}(x). \tag{1.2}$$
As collected in Table 2, special cases of these recurrence relations are satisfied by monomials, real orthogonal polynomials (including Chebyshev polynomials), and Szegő polynomials. It is of interest that although three-term recurrence relations for Szegő polynomials (shown in Table 2) do exist in most cases [9], two-term recurrence relations are far more often used for computations with Szegő polynomials.
Two-term recurrence relations for the Szegő polynomials $\{\phi^\#_k\}$ in terms of the reflection coefficients $\rho_k$ and complementary parameters $\mu_k$ are
$$\begin{bmatrix} \phi_0(x) \\ \phi^\#_0(x) \end{bmatrix} = \frac{1}{\mu_0}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \begin{bmatrix} \phi_k(x) \\ \phi^\#_k(x) \end{bmatrix} = \frac{1}{\mu_k}\begin{bmatrix} 1 & -\rho_k^* \\ -\rho_k & 1 \end{bmatrix}\begin{bmatrix} \phi_{k-1}(x) \\ x\,\phi^\#_{k-1}(x) \end{bmatrix}, \tag{1.3}$$
which involve an auxiliary sequence of polynomials $\{\phi_k\}$. In this paper, generalizations of these two-term recurrence relations of the form
$$\begin{bmatrix} G_k(x) \\ r_k(x) \end{bmatrix} = \begin{bmatrix} \alpha_k & \beta_k \\ \gamma_k & 1 \end{bmatrix}\begin{bmatrix} G_{k-1}(x) \\ (\delta_k x + \theta_k)\, r_{k-1}(x) \end{bmatrix}, \tag{1.4}$$
which we will refer to as Szegő-type recurrence relations, are also considered. Finally, motivated by the most generally applicable recurrence relations available for the class of quasiseparable polynomials that we will consider [7], the [EGO05]-type recurrence relations
$$\begin{bmatrix} G_k(x) \\ r_k(x) \end{bmatrix} = \begin{bmatrix} \alpha_k & \beta_k \\ \gamma_k & \delta_k x + \theta_k \end{bmatrix}\begin{bmatrix} G_{k-1}(x) \\ r_{k-1}(x) \end{bmatrix}, \tag{1.5}$$
are considered as well. Details about these classes and their corresponding recurrence relations will be given later.
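The two-term recurrences (1.3) translate directly into an evaluation scheme for the coefficient vectors of $\{\phi_k, \phi^\#_k\}$. The following sketch is ours and assumes real reflection coefficients, $\mu_k = \sqrt{1 - \rho_k^2}$, and the normalization $\mu_0 = 1$:

```python
import numpy as np

def szego_polynomials(rho, mu0=1.0):
    """Run the two-term recurrences (1.3) on coefficient vectors.

    rho[k-1] holds the reflection coefficient rho_k (real here, for
    simplicity); mu_k = sqrt(1 - rho_k^2) are the complementary parameters.
    mu_0 = 1 is a normalization assumed in this sketch.  Polynomials are
    stored as ascending coefficient arrays.
    """
    phi = np.array([1.0 / mu0])        # phi_0
    phi_sharp = np.array([1.0 / mu0])  # phi#_0
    for r in rho:
        mu = np.sqrt(1.0 - r * r)
        x_phi_sharp = np.concatenate(([0.0], phi_sharp))  # multiply by x
        m = max(len(phi), len(x_phi_sharp))
        a = np.pad(phi, (0, m - len(phi)))
        b = np.pad(x_phi_sharp, (0, m - len(x_phi_sharp)))
        phi, phi_sharp = (a - r * b) / mu, (-r * a + b) / mu
    return phi, phi_sharp

phi, phi_sharp = szego_polynomials([0.5, -0.25])
```

For real data, the well-known identity $\phi_k(x) = x^k\left[\phi^\#_k(1/x^*)\right]^*$ (recalled later in the paper) says that the coefficient vector of $\phi_k$ is the reversal of that of $\phi^\#_k$, which gives a convenient consistency check.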
Polynomial system $P$ | Recurrence relations
monomials | $r_k(x) = x \cdot r_{k-1}(x)$
Chebyshev polynomials | $r_k(x) = 2x \cdot r_{k-1}(x) - r_{k-2}(x)$
Real orthogonal polynomials | $r_k(x) = (\alpha_k x - \delta_k)\, r_{k-1}(x) - \gamma_k \cdot r_{k-2}(x)$
Szegő polynomials | $r_k(x) = \left(\dfrac{1}{\mu_k}x + \dfrac{\rho_k}{\rho_{k-1}}\dfrac{1}{\mu_k}\right) r_{k-1}(x) - \left(\dfrac{\rho_k}{\rho_{k-1}}\dfrac{\mu_{k-1}}{\mu_k}\, x\right) r_{k-2}(x)$

Table 2. Systems of polynomials and corresponding recurrence relations.
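A minimal sketch (ours, not from the paper) of evaluating a polynomial system through the three-term recurrences (1.2), with the normalization $r_0 = 1$, $r_{-1} = 0$ assumed; the Chebyshev row of Table 2 serves as a check:

```python
import numpy as np

def eval_three_term(alpha, delta, beta, gamma, x):
    """Evaluate r_0, ..., r_n at the points x via (1.2):
    r_k = (alpha_k x - delta_k) r_{k-1} - (beta_k x + gamma_k) r_{k-2},
    starting from r_0 = 1 and r_{-1} = 0 (entry k-1 of each coefficient
    list holds the k-th coefficient).
    """
    r_prev = np.zeros_like(x, dtype=float)
    r = np.ones_like(x, dtype=float)
    out = [r.copy()]
    for k in range(len(alpha)):
        r_prev, r = r, (alpha[k] * x - delta[k]) * r - (beta[k] * x + gamma[k]) * r_prev
        out.append(r.copy())
    return out

# Chebyshev polynomials T_k: alpha_1 = 1, alpha_k = 2 for k >= 2,
# beta_k = delta_k = 0, gamma_k = 1, recovering T_k = 2x T_{k-1} - T_{k-2}.
n = 5
alpha = [1.0] + [2.0] * (n - 1)
delta = [0.0] * n
beta = [0.0] * n
gamma = [1.0] * n
x = np.linspace(-1.0, 1.0, 7)
T = eval_three_term(alpha, delta, beta, gamma, x)
```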
1.1. Structure of the paper
In Section 2 an inversion formula valid for a general system of polynomials (although expensive in general) is presented. The formula presented there reduces the problem of inversion of $V_P(x)$ to that of evaluating the so-called associated polynomials $\widetilde{P}$ corresponding to the polynomial system $P$. A relation between the polynomial systems $P$ and $\widetilde{P}$ is presented in terms of their confederate matrices. This relation suggests a procedure for evaluating the associated polynomials $\widetilde{P}$. In Section 3 quasiseparable matrices and polynomials are defined and shown to generalize the confederate matrices of the motivating special cases. Conversions are given between the polynomial language (i.e., polynomials satisfying recurrence relations) and the matrix language (i.e., generators of a quasiseparable matrix), and the motivating recurrence relations are identified in terms of the generators of their quasiseparable confederate matrices. In Section 4, perturbed recurrence relations are presented for the associated polynomials $\widetilde{P}$. Three different sets of recurrence relations are given, two generalizing known formulas for real orthogonal polynomials and Szegő polynomials, and a third that produces new formulas for these cases. We briefly describe in Section 5 a fast algorithm for computing the coefficients of the master polynomial, which are required for computing the perturbations of the recurrence relations. In Section 6 the reductions of the described algorithms in the special cases of monomials, real orthogonal polynomials, and Szegő polynomials are examined in detail. Section 7 presents some results of preliminary numerical experiments with the proposed algorithm, and conclusions are offered in the final section.
2. Confederate matrices and associated polynomials
In this section we present the formula that will be used to invert a polynomial-Vandermonde matrix. Such a matrix is completely determined by $n$ polynomials $P = \{r_0(x), \ldots, r_{n-1}(x)\}$ and $n$ nodes $x = (x_1, \ldots, x_n)$. The desired inverse $V_P(x)^{-1}$ is given by the formula
$$V_P(x)^{-1} = I \cdot V_{\widetilde{P}}^T(x) \cdot \operatorname{diag}(c_1, \ldots, c_n), \tag{2.1}$$
with
$$c_i = \prod_{\substack{j=1 \\ j \neq i}}^{n} (x_i - x_j)^{-1}$$
(see [18], [19]), where $I$ is the antidiagonal matrix (with ones on the antidiagonal and zeros elsewhere), and $\widetilde{P}$ is the system of associated (generalized Horner) polynomials, defined as follows: if we define the master polynomial $P(x)$ by $P(x) = (x - x_1) \cdots (x - x_n)$, then for the polynomial sequence $P = \{r_0(x), \ldots, r_{n-1}(x), P(x)\}$, the associated polynomials $\widetilde{P} = \{\widetilde{r}_0(x), \ldots, \widetilde{r}_{n-1}(x), P(x)\}$ are those satisfying the relations
$$\frac{P(x) - P(y)}{x - y} = \sum_{k=0}^{n-1} r_k(x)\, \widetilde{r}_{n-k-1}(y), \tag{2.2}$$
see [16]. It can be shown that for any polynomials $P$, a corresponding sequence of polynomials $\widetilde{P}$ satisfying (2.2) exists, and it can be understood as a generalization of the Horner polynomials associated with the monomials; see, for instance, [2].
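The diagonal weights $c_i$ of (2.1) can be formed in $O(n^2)$ operations directly from the nodes; a small sketch (ours), which also exploits the standard identity $c_i = 1/P'(x_i)$ for the master polynomial $P$ as a check:

```python
import numpy as np

def inversion_weights(nodes):
    """The diagonal weights of (2.1): c_i = prod_{j != i} (x_i - x_j)^{-1}."""
    x = np.asarray(nodes, dtype=float)
    diff = x[:, None] - x[None, :]
    np.fill_diagonal(diff, 1.0)   # skip the j = i factor
    return 1.0 / diff.prod(axis=1)

nodes = np.array([0.0, 1.0, 3.0, 4.0])
c = inversion_weights(nodes)
```

Since $P(x) = \prod_j (x - x_j)$ gives $P'(x_i) = \prod_{j \neq i}(x_i - x_j)$, the weights equal the reciprocals of the derivative of the master polynomial at the nodes.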
This discussion gives a relation between the inverse $V_P(x)^{-1}$ and the polynomial-Vandermonde matrix $V_{\widetilde{P}}(x)$, where $\widetilde{P}$ is the system of polynomials associated with $P$. The next definition, from [17], provides a connection between recurrence relations for $P$ and those for $\widetilde{P}$.
Definition 2.1. Let the sequence of polynomials $R = \{r_0(x), r_1(x), \ldots, r_n(x)\}$ with $\deg(r_k) = k$ satisfy the $n$-term recurrence relations
$$r_k(x) = (\alpha_k x - a_{k-1,k}) \cdot r_{k-1}(x) - a_{k-2,k} \cdot r_{k-2}(x) - \cdots - a_{0,k} \cdot r_0(x) \tag{2.3}$$
for $k = 1, \ldots, n$, and let
$$P(x) = P_0 \cdot r_0(x) + P_1 \cdot r_1(x) + \cdots + P_{n-1} \cdot r_{n-1}(x) + P_n \cdot r_n(x) \tag{2.4}$$
with $P_n \neq 0$. Then the confederate matrix of $P(x)$ with respect to $R$ is given by
$$C_R(P) = \begin{bmatrix}
\frac{a_{01}}{\alpha_1} & \frac{a_{02}}{\alpha_2} & \frac{a_{03}}{\alpha_3} & \cdots & \frac{a_{0,n-1}}{\alpha_{n-1}} & \frac{a_{0n}}{\alpha_n} - \frac{P_0}{\alpha_n P_n} \\[4pt]
\frac{1}{\alpha_1} & \frac{a_{12}}{\alpha_2} & \frac{a_{13}}{\alpha_3} & \cdots & \frac{a_{1,n-1}}{\alpha_{n-1}} & \frac{a_{1n}}{\alpha_n} - \frac{P_1}{\alpha_n P_n} \\[4pt]
0 & \frac{1}{\alpha_2} & \frac{a_{23}}{\alpha_3} & \cdots & \frac{a_{2,n-1}}{\alpha_{n-1}} & \frac{a_{2n}}{\alpha_n} - \frac{P_2}{\alpha_n P_n} \\
\vdots & \ddots & \ddots & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & \frac{1}{\alpha_{n-2}} & \frac{a_{n-2,n-1}}{\alpha_{n-1}} & \frac{a_{n-2,n}}{\alpha_n} - \frac{P_{n-2}}{\alpha_n P_n} \\[4pt]
0 & \cdots & \cdots & 0 & \frac{1}{\alpha_{n-1}} & \frac{a_{n-1,n}}{\alpha_n} - \frac{P_{n-1}}{\alpha_n P_n}
\end{bmatrix}. \tag{2.5}$$
We refer to [17] for many useful properties of the confederate matrix, and only recall here that $\det(xI - C_R(P)) = P(x)/(P_n \cdot \alpha_0 \cdot \alpha_1 \cdots \alpha_n)$, and that similarly, the characteristic polynomial of the $k \times k$ leading submatrix of $C_R(P)$ is equal to $r_k(x)/(\alpha_0 \cdot \alpha_1 \cdots \alpha_k)$.
The motivation for considering confederate matrices is that they allow the computation of the polynomials associated with the given system of polynomials. The confederate matrices of $P$ and $\widetilde{P}$ are related by
$$C_{\widetilde{P}}(P) = I \cdot C_P(P)^T \cdot I. \tag{2.6}$$
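The operation on the right-hand side of (2.6), transposition across the antidiagonal, is easy to realize; the following sketch (ours) implements it and checks that it is an involution:

```python
import numpy as np

def pertranspose(A):
    """Transpose across the antidiagonal, I A^T I as in (2.6), where I is
    the antidiagonal (flip) matrix."""
    J = np.fliplr(np.eye(A.shape[0]))
    return J @ A.T @ J

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
B = pertranspose(A)
```

Entrywise, $B_{ij} = A_{n+1-j,\, n+1-i}$, so upper Hessenberg structure is preserved, which is why the confederate matrix of the associated system is again upper Hessenberg.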
Recurrence relations | Confederate matrix $C_R(r_n)$

Monomials: $r_k(x) = x\, r_{k-1}(x)$, with the companion matrix
$$\begin{bmatrix}
0 & 0 & \cdots & \cdots & 0 \\
1 & 0 & & & 0 \\
0 & 1 & \ddots & & \vdots \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & 1 & 0
\end{bmatrix}$$

Real orthogonal polynomials: $r_k(x) = (\alpha_k x - \delta_k)\, r_{k-1}(x) - \gamma_k\, r_{k-2}(x)$, with the tridiagonal matrix
$$\begin{bmatrix}
\frac{\delta_1}{\alpha_1} & \frac{\gamma_2}{\alpha_2} & 0 & \cdots & 0 \\
\frac{1}{\alpha_1} & \frac{\delta_2}{\alpha_2} & \ddots & \ddots & \vdots \\
0 & \frac{1}{\alpha_2} & \ddots & \frac{\gamma_{n-1}}{\alpha_{n-1}} & 0 \\
\vdots & & \ddots & \frac{\delta_{n-1}}{\alpha_{n-1}} & \frac{\gamma_n}{\alpha_n} \\
0 & \cdots & 0 & \frac{1}{\alpha_{n-1}} & \frac{\delta_n}{\alpha_n}
\end{bmatrix}$$

Szegő polynomials: $r_k(x) = \left(\frac{1}{\mu_k}x + \frac{\rho_k}{\rho_{k-1}}\frac{1}{\mu_k}\right) r_{k-1}(x) - \left(\frac{\rho_k}{\rho_{k-1}}\frac{\mu_{k-1}}{\mu_k}\, x\right) r_{k-2}(x)$, with the unitary Hessenberg matrix
$$\begin{bmatrix}
-\rho_0^*\rho_1 & -\rho_0^*\mu_1\rho_2 & \cdots & -\rho_0^*\mu_1\cdots\mu_{n-1}\rho_n \\
\mu_1 & -\rho_1^*\rho_2 & \cdots & -\rho_1^*\mu_2\cdots\mu_{n-1}\rho_n \\
 & \mu_2 & \ddots & \vdots \\
 & & \ddots & -\rho_{n-2}^*\mu_{n-1}\rho_n \\
 & & \mu_{n-1} & -\rho_{n-1}^*\rho_n
\end{bmatrix}$$

Table 3. Polynomial systems and corresponding confederate matrices.
(see [18], [19]). The passage from $C_P(P)$ to $C_{\widetilde{P}}(P)$ in (2.6) can be seen as a transposition across the antidiagonal, or a pertransposition.
In Table 3, the confederate matrices corresponding to the polynomials considered in previous work are given, including monomials, real orthogonal polynomials, and Szegő polynomials. These equivalences are all well known. We will show later that all of the confederate matrices in Table 3 are special cases of order one quasiseparable matrices, and we will use properties of these confederate matrices to derive the fast algorithm.
In accordance with (2.1), the main computational burden in inversion is to compute $V_{\widetilde{P}}$, which requires evaluating the $n$ associated polynomials $\{\widetilde{r}_k(x)\}_{k=0}^{n-1}$ at the $n$ points $\{x_i\}_{i=1}^{n}$. Using (2.6) directly to accomplish this is expensive for arbitrary $P$, since it leads to the full $n$-term recurrence relations. However, in special cases where sparse recurrence relations may be found, this leads to a fast algorithm. For instance, in the monomial case, $P = \{1, x, x^2, \ldots, x^{n-1}\}$ satisfies the obvious recurrence relations $x^k = x \cdot x^{k-1}$, and hence the confederate matrix (2.5) becomes
$$C_P(P) = \begin{bmatrix}
0 & 0 & \cdots & 0 & -P_0 \\
1 & 0 & \cdots & 0 & -P_1 \\
0 & 1 & \ddots & \vdots & \vdots \\
\vdots & & \ddots & 0 & \vdots \\
0 & \cdots & 0 & 1 & -P_{n-1}
\end{bmatrix}, \tag{2.7}$$
which is the well-known companion matrix. Using the pertransposition rule (2.6), we obtain the confederate matrix
$$C_{\widetilde{P}}(P) = \begin{bmatrix}
-P_{n-1} & -P_{n-2} & \cdots & -P_1 & -P_0 \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \ddots & \vdots & \vdots \\
\vdots & & \ddots & 0 & \vdots \\
0 & \cdots & 0 & 1 & 0
\end{bmatrix} \tag{2.8}$$
for the associated polynomials $\widetilde{r}_k(x)$. Using the formula (2.3), we read from the matrix (2.8) the familiar Horner recurrence relations
$$\widetilde{r}_0(x) = 1, \qquad \widetilde{r}_k(x) = x \cdot \widetilde{r}_{k-1}(x) + P_{n-k}. \tag{2.9}$$
Thus the use of the Horner recurrence relations provides the computational speedup in the original Traub algorithm. In the next sections we use the quasiseparability of the confederate matrices to derive corresponding recurrence relations that accomplish the same speedup in a more general setting.
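Putting (2.1), (2.9), and the weights $c_i$ together for the monomial case gives a compact Traub-style inversion of a classical Vandermonde matrix. The following sketch is our illustration of the structure only; the stabilized algorithm of [12] adds further refinements:

```python
import numpy as np

def traub_inverse(nodes):
    """Invert the classical Vandermonde matrix V with V[i, k] = x_i^k using
    (2.1): V^{-1} = I . V_Ptilde^T . diag(c_1, ..., c_n), where the
    associated polynomials are the Horner polynomials of (2.9).
    """
    x = np.asarray(nodes, dtype=float)
    n = len(x)
    P = np.poly(x)[::-1]          # master polynomial, ascending: P[k] = P_k
    # Horner polynomials via (2.9), evaluated columnwise at the nodes.
    Vt = np.empty((n, n))
    Vt[:, 0] = 1.0
    for k in range(1, n):
        Vt[:, k] = x * Vt[:, k - 1] + P[n - k]
    diff = x[:, None] - x[None, :]
    np.fill_diagonal(diff, 1.0)
    c = 1.0 / diff.prod(axis=1)   # weights of (2.1)
    J = np.fliplr(np.eye(n))      # antidiagonal matrix
    return J @ Vt.T @ np.diag(c)

nodes = np.array([0.5, 1.0, 2.0, 3.5])
V = np.vander(nodes, increasing=True)
Vinv = traub_inverse(nodes)
```

Only $O(n^2)$ operations are performed: one Horner sweep per node plus the weight products, in contrast to $O(n^3)$ for Gaussian elimination.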
3. Quasiseparable matrices and polynomials
Definition 3.1 (Quasiseparable matrices and polynomials).

• A matrix $A$ is called $(H, m)$-quasiseparable if
(i) it is strongly upper Hessenberg (upper Hessenberg with a nonzero first subdiagonal), and
(ii) $\max(\operatorname{rank} A_{12}) = m$, where the maximum is taken over all symmetric partitions of the form
$$A = \begin{bmatrix} * & A_{12} \\ * & * \end{bmatrix}; \tag{3.1}$$
for instance, for a $5 \times 5$ $(H, m)$-quasiseparable matrix, the low-rank blocks are the four blocks $A(1{:}1, 2{:}5)$, $A(1{:}2, 3{:}5)$, $A(1{:}3, 4{:}5)$, and $A(1{:}4, 5{:}5)$ cut out by the symmetric partitions.

• Let $A = [a_{ij}]$ be an $(H, m)$-quasiseparable matrix. Then the system of polynomials $\{r_k(x)\}$ related to $A$ via
$$r_k(x) = \alpha_1 \cdots \alpha_k \det\left(xI - A\right)_{(k \times k)} \qquad (\text{where } \alpha_k = 1/a_{k+1,k})$$
is called a system of $(H, m)$-quasiseparable polynomials. That is, $(H, m)$-quasiseparable polynomials are those polynomials with an $(H, m)$-quasiseparable confederate matrix.
The low-rank property described in this definition means there is redundancy in the definition of the $n^2$ entries of an $(H, m)$-quasiseparable matrix, and these entries may be described by a smaller number $O(nm)$ of parameters. If $m$ is sufficiently small and independent of $n$, then this provides a significant reduction. The following well-known result, which may be found, for instance, in [4], provides this smaller set of parameters, called the generators of the $(H, m)$-quasiseparable matrix. An $n \times n$ matrix $A$ is $(H, m)$-quasiseparable if and only if it may be written in the form
$$A = \begin{bmatrix}
d_1 & g_1 b_{12}^{\times} h_2 & \cdots & g_1 b_{1n}^{\times} h_n \\
p_2 q_1 & d_2 & \ddots & \vdots \\
 & \ddots & \ddots & g_{n-1} b_{n-1,n}^{\times} h_n \\
0 & & p_n q_{n-1} & d_n
\end{bmatrix},$$
with
$$b_{ij}^{\times} = b_{i+1} \cdots b_{j-1}, \qquad b_{i,i+1}^{\times} = I. \tag{3.2}$$
Here $p_i$, $q_j$, $d_k$ are scalars, the elements $g_i$ are row vectors of maximal size $m$, the $h_j$ are column vectors of maximal size $m$, and the $b_k$ are matrices of maximal size $m \times m$, such that all products are defined. The elements $\{p_k, q_k, d_k, g_k, b_k, h_k\}$ are called the generators of the matrix $A$.
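For order one ($m = 1$) all generators are scalars, and the representation above can be assembled directly; the following sketch (ours, with the index conventions of the text stated in the docstring) makes the structure concrete:

```python
import numpy as np

def quasiseparable_dense(p, q, d, g, b, h):
    """Assemble a dense (H, 1)-quasiseparable matrix from scalar generators.

    The lists hold p_2..p_n, q_1..q_{n-1}, d_1..d_n, g_1..g_{n-1},
    b_2..b_{n-1}, h_2..h_n, in that order, stored 0-based.
    """
    n = len(d)
    A = np.diag(np.asarray(d, dtype=float))
    for k in range(1, n):                  # subdiagonal p_{k+1} q_k
        A[k, k - 1] = p[k - 1] * q[k - 1]
    for i in range(1, n):                  # 1-based row i
        for j in range(i + 1, n + 1):      # 1-based column j > i
            prod = g[i - 1]
            for t in range(i + 1, j):      # b_{i+1} ... b_{j-1}
                prod *= b[t - 2]
            A[i - 1, j - 1] = prod * h[j - 2]
    return A

# With b_k = 0, everything above the first superdiagonal vanishes and the
# matrix is tridiagonal, as for the real orthogonal polynomials of Table 3.
A = quasiseparable_dense(p=[1, 1, 1], q=[0.5, 0.5, 0.5], d=[0, 0, 0, 0],
                         g=[0.5, 0.5, 0.5], b=[0, 0], h=[1, 1, 1])
```

Any off-diagonal block above the first subdiagonal of the assembled matrix has rank at most one, in line with Definition 3.1.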
The elements in the upper part of the matrix, $g_i b_{ij}^{\times} h_j$, are products of a row vector, a (possibly empty) sequence of matrices possibly of different sizes, and finally a column vector, as depicted here:
$$g_i b_{ij}^{\times} h_j = \underbrace{g_i}_{1 \times u_i}\; \underbrace{b_{i+1}}_{u_i \times u_{i+1}}\; \underbrace{b_{i+2}}_{u_{i+1} \times u_{i+2}} \cdots \underbrace{b_{j-1}}_{u_{j-2} \times u_{j-1}}\; \underbrace{h_j}_{u_{j-1} \times 1} \tag{3.3}$$
with $u_t \leqslant m$ for each $t = 1, \ldots, n-1$.

The generator definition and formula (2.3) together give the following $n$-term recurrence relations for $(H, 1)$-quasiseparable polynomials. These $n$-term recurrence relations are not useful for computations, as they do not provide fast algorithms, but rather are used theoretically in the derivations.
Lemma 3.2. Let $R$ be a sequence of $(H, 1)$-quasiseparable polynomials specified by the generators of the corresponding $(H, 1)$-quasiseparable confederate matrix. Then $R$ satisfies the $n$-term recurrence relations
$$r_k(x) = \frac{1}{p_{k+1} q_k}\left[(x - d_k)\, r_{k-1}(x) - \sum_{j=0}^{k-2}\left(g_{j+1}\, b_{j+1,k}^{\times}\, h_k\, r_j(x)\right)\right], \tag{3.4}$$
for $k = 1, \ldots, n-1$.

It is easily verified that each of the confederate matrices in Table 3 is $(H, 1)$-quasiseparable. Therefore, the class of $(H, 1)$-quasiseparable polynomials includes as special cases the important classical polynomial classes of real orthogonal polynomials and Szegő polynomials. The next theorems show that, like these motivating examples, the confederate matrices corresponding to the polynomials satisfying the recurrence relations (1.2), (1.4), and (1.5) are also $(H, 1)$-quasiseparable. Furthermore, explicit expressions for the generators of the confederate matrices are given in terms of the recurrence relation coefficients. This is useful in case the input to the algorithms is to be these recurrence relation coefficients, as the algorithms are given in terms of the quasiseparable generators. We omit the proofs, which follow from repeated use of the appropriate recurrence relations and given generators to produce the $n$-term recurrence relations of Lemma 3.2.
Theorem 3.3. Let $R = \{r_0(x), \ldots, r_n(x)\}$ be a sequence of polynomials satisfying $\deg(r_k) = k$ and the recurrence relations (1.2). Then the confederate matrix of $r_n(x)$ with respect to $R$ is given by
$$C_R(r_n) = \begin{bmatrix}
\frac{\delta_1}{\alpha_1} & \frac{\frac{\delta_1}{\alpha_1}\beta_2 + \gamma_2}{\alpha_2} & \cdots & \frac{\frac{\delta_1}{\alpha_1}\beta_2 + \gamma_2}{\alpha_2}\left(\frac{\beta_3}{\alpha_3}\right)\left(\frac{\beta_4}{\alpha_4}\right)\cdots\left(\frac{\beta_n}{\alpha_n}\right) \\[6pt]
\frac{1}{\alpha_1} & \frac{\delta_2}{\alpha_2} + \frac{\beta_2}{\alpha_1\alpha_2} & \cdots & \frac{\left(\frac{\delta_2}{\alpha_2} + \frac{\beta_2}{\alpha_1\alpha_2}\right)\beta_3 + \gamma_3}{\alpha_3}\left(\frac{\beta_4}{\alpha_4}\right)\cdots\left(\frac{\beta_n}{\alpha_n}\right) \\
 & \ddots & \ddots & \vdots \\[2pt]
 & & \ddots & \frac{\left(\frac{\delta_{n-1}}{\alpha_{n-1}} + \frac{\beta_{n-1}}{\alpha_{n-2}\alpha_{n-1}}\right)\beta_n + \gamma_n}{\alpha_n} \\[6pt]
 & & \frac{1}{\alpha_{n-1}} & \frac{\delta_n}{\alpha_n} + \frac{\beta_n}{\alpha_{n-1}\alpha_n}
\end{bmatrix}.$$
Furthermore, $C_R(r_n)$ is an $(H, 1)$-quasiseparable matrix with generators
$$\begin{aligned}
d_1 &= \frac{\delta_1}{\alpha_1}, & d_k &= \frac{\delta_k}{\alpha_k} + \frac{\beta_k}{\alpha_{k-1}\alpha_k}, \quad k = 2, \ldots, n, \\
p_{k+1} q_k &= \frac{1}{\alpha_k}, & g_k &= \frac{d_k \beta_{k+1} + \gamma_{k+1}}{\alpha_{k+1}}, \quad k = 1, \ldots, n-1, \\
b_k &= \frac{\beta_{k+1}}{\alpha_{k+1}}, \quad k = 2, \ldots, n-1, & h_k &= 1, \quad k = 2, \ldots, n.
\end{aligned} \tag{3.5}$$
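As a quick numerical check of Theorem 3.3 (our sketch, not from the paper): for the Chebyshev recurrences, $\alpha_1 = 1$, $\alpha_k = 2$ for $k \geq 2$, $\beta_k = \delta_k = 0$, $\gamma_k = 1$, so (3.5) gives $d_k = 0$, subdiagonal $1/\alpha_k$, $g_k = \gamma_{k+1}/\alpha_{k+1}$, $b_k = 0$, $h_k = 1$, i.e., the tridiagonal confederate matrix of Table 3, whose leading characteristic polynomials should be $T_k(x)/(\alpha_1 \cdots \alpha_k)$:

```python
import numpy as np

n = 3
alpha = [1.0, 2.0, 2.0]          # alpha_1, ..., alpha_n for Chebyshev
C = np.zeros((n, n))
for k in range(1, n):
    C[k, k - 1] = 1.0 / alpha[k - 1]   # subdiagonal 1/alpha_k
    C[k - 1, k] = 1.0 / alpha[k]       # superdiagonal gamma_{k+1}/alpha_{k+1}

# Characteristic polynomial of the full matrix (descending coefficients);
# it should equal T_3(x)/4 = x^3 - (3/4)x.
char3 = np.poly(C)
```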
Theorem 3.4. Let $R = \{r_0(x), \ldots, r_n(x)\}$ be a sequence of polynomials satisfying $\deg(r_k) = k$ and the recurrence relations (1.4). Then the confederate matrix of $r_n(x)$ with respect to $R$ is given by
$$\begin{bmatrix}
-\frac{\theta_1 + \gamma_1}{\delta_1} & -\frac{(\alpha_1 - \beta_1\gamma_1)\gamma_2}{\delta_2} & \cdots & -\frac{(\alpha_1 - \beta_1\gamma_1)\cdots(\alpha_{n-1} - \beta_{n-1}\gamma_{n-1})\gamma_n}{\delta_n} \\[4pt]
\frac{1}{\delta_1} & -\frac{\theta_2 + \gamma_2\beta_1}{\delta_2} & \cdots & -\frac{\beta_1(\alpha_2 - \beta_2\gamma_2)\cdots(\alpha_{n-1} - \beta_{n-1}\gamma_{n-1})\gamma_n}{\delta_n} \\
 & \frac{1}{\delta_2} & \ddots & \vdots \\
 & & \frac{1}{\delta_{n-1}} & -\frac{\theta_n + \gamma_n\beta_{n-1}}{\delta_n}
\end{bmatrix}. \tag{3.6}$$
Furthermore, $C_R(r_n)$ is an $(H, 1)$-quasiseparable matrix with generators
$$\begin{aligned}
d_1 &= -\frac{\theta_1 + \gamma_1}{\delta_1}, & d_k &= -\frac{\theta_k + \gamma_k\beta_{k-1}}{\delta_k}, \quad k = 2, \ldots, n, \\
p_k &= 1, \quad k = 2, \ldots, n, & q_k &= \frac{1}{\delta_k}, \quad k = 1, \ldots, n-1, \\
g_1 &= 1, \quad g_k = \beta_{k-1}, \quad k = 2, \ldots, n-1, & b_k &= \alpha_{k-1} - \beta_{k-1}\gamma_{k-1}, \quad k = 2, \ldots, n-1, \\
h_k &= -\frac{\gamma_k}{\delta_k}\left(\alpha_{k-1} - \beta_{k-1}\gamma_{k-1}\right), \quad k = 2, \ldots, n. &&
\end{aligned}$$

We next give a detailed example of the specialization of this result to the classical Szegő case.
Example 3.5 (Classical Szegő polynomials). With the choices
$$\alpha_k = \frac{1}{\mu_k}, \quad \beta_k = -\rho_k^*, \quad \gamma_k = -\frac{\rho_k}{\mu_k}, \quad \delta_k = \frac{1}{\mu_k}, \quad \theta_k = 0,$$
from which it follows that
$$\alpha_k - \beta_k\gamma_k = \frac{1 - |\rho_k|^2}{\mu_k} = \mu_k, \qquad \frac{\gamma_k}{\delta_k} = -\rho_k,$$
the two-term recurrence relations (1.4) become
$$\begin{bmatrix} \phi_k(x) \\ \phi^\#_k(x) \end{bmatrix} = \frac{1}{\mu_k}\begin{bmatrix} 1 & -\rho_k^* \\ -\rho_k & 1 \end{bmatrix}\begin{bmatrix} \phi_{k-1}(x) \\ x\,\phi^\#_{k-1}(x) \end{bmatrix} \tag{3.7}$$
and the matrix (3.6) reduces to the matrix
$$\begin{bmatrix}
\rho_1 & \mu_1\rho_2 & \mu_1\mu_2\rho_3 & \cdots & \mu_1\cdots\mu_{n-1}\rho_n \\
\mu_1 & -\rho_1^*\rho_2 & -\rho_1^*\mu_2\rho_3 & \cdots & -\rho_1^*\mu_2\cdots\mu_{n-1}\rho_n \\
0 & \mu_2 & -\rho_2^*\rho_3 & \cdots & -\rho_2^*\mu_3\cdots\mu_{n-1}\rho_n \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & \mu_{n-1} & -\rho_{n-1}^*\rho_n
\end{bmatrix}.$$
Using the convention $\rho_0 := -1$ to insert $1 = -\rho_0^*$ throughout the first row, this matrix becomes exactly the unitary Hessenberg matrix displayed in Table 3. This demonstrates that the Szegő polynomials are a special case of polynomials satisfying (1.4), and likewise the unitary Hessenberg matrix is a special case of those of the form (3.6).
We also note that the condition $b_k \neq 0$ is not satisfied by the real orthogonal polynomials, and hence the form (3.6) cannot be used for them.
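The unitary Hessenberg matrix of Table 3 can be assembled directly from reflection coefficients; the following sketch (ours) uses real coefficients with the convention $\rho_0 = -1$ and $\mu_k = \sqrt{1 - \rho_k^2}$ (the complex case replaces $\rho_{j-1}$ by its conjugate in the upper part):

```python
import numpy as np

def unitary_hessenberg(rho):
    """Build the n x n Hessenberg matrix of Table 3 from real reflection
    coefficients rho_1, ..., rho_n, with rho_0 = -1 and
    mu_k = sqrt(1 - rho_k^2).  Entry (j, k) for j <= k is
    -rho_{j-1} mu_j ... mu_{k-1} rho_k; the subdiagonal holds mu_k.
    """
    rho = np.concatenate(([-1.0], np.asarray(rho, dtype=float)))
    n = len(rho) - 1
    mu = np.sqrt(1.0 - rho[1:] ** 2)   # mu_1, ..., mu_n
    H = np.zeros((n, n))
    for j in range(1, n + 1):
        for k in range(j, n + 1):
            H[j - 1, k - 1] = -rho[j - 1] * np.prod(mu[j - 1:k - 1]) * rho[k]
        if j < n:
            H[j, j - 1] = mu[j - 1]
    return H

# When |rho_n| = 1 the matrix is unitary (here: real orthogonal).
H = unitary_hessenberg([0.5, -0.3, 1.0])
```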
Theorem 3.6. Let $R = \{r_0(x), \ldots, r_n(x)\}$ be a sequence of polynomials satisfying $\deg(r_k) = k$ and the recurrence relations (1.5). Then the confederate matrix of $r_n(x)$ with respect to $R$ is given by
$$\begin{bmatrix}
-\frac{\theta_1}{\delta_1} & -\beta_1\left(\frac{\gamma_2}{\delta_2}\right) & -\beta_1\alpha_2\left(\frac{\gamma_3}{\delta_3}\right) & -\beta_1\alpha_2\alpha_3\left(\frac{\gamma_4}{\delta_4}\right) & \cdots & -\beta_1\alpha_2\alpha_3\alpha_4\cdots\alpha_{n-1}\left(\frac{\gamma_n}{\delta_n}\right) \\[4pt]
\frac{1}{\delta_1} & -\frac{\theta_2}{\delta_2} & -\beta_2\left(\frac{\gamma_3}{\delta_3}\right) & -\beta_2\alpha_3\left(\frac{\gamma_4}{\delta_4}\right) & \cdots & -\beta_2\alpha_3\alpha_4\cdots\alpha_{n-1}\left(\frac{\gamma_n}{\delta_n}\right) \\[4pt]
0 & \frac{1}{\delta_2} & -\frac{\theta_3}{\delta_3} & -\beta_3\left(\frac{\gamma_4}{\delta_4}\right) & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & -\beta_{n-1}\left(\frac{\gamma_n}{\delta_n}\right) \\[4pt]
0 & \cdots & 0 & 0 & \frac{1}{\delta_{n-1}} & -\frac{\theta_n}{\delta_n}
\end{bmatrix} \tag{3.8}$$
Furthermore, $C_R(r_n)$ is an $(H, 1)$-quasiseparable matrix with generators
$$\begin{aligned}
d_k &= -\frac{\theta_k}{\delta_k}, \quad k = 1, \ldots, n, & p_k &= 1, \quad k = 2, \ldots, n, \\
q_k &= \frac{1}{\delta_k}, \quad k = 1, \ldots, n-1, & g_k &= \beta_k, \quad k = 1, \ldots, n-1, \\
b_k &= \alpha_k, \quad k = 2, \ldots, n-1, & h_k &= -\frac{\gamma_k}{\delta_k}, \quad k = 2, \ldots, n.
\end{aligned}$$
Example 3.7 (Szegő polynomials). If we choose
$$\alpha_k = \mu_k, \quad \beta_k = \rho_{k-1}^*\mu_k, \quad \gamma_k = \frac{\rho_k}{\mu_k}, \quad \delta_k = \frac{1}{\mu_k}, \quad \theta_k = \frac{\rho_{k-1}^*\rho_k}{\mu_k},$$
the two-term recurrence relations (1.5) do not reduce to the known two-term recurrence relations (1.3) for the Szegő polynomials, but become instead the new relations
$$\begin{bmatrix} G_0(x) \\ \phi^\#_0(x) \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad \begin{bmatrix} G_k(x) \\ \phi^\#_k(x) \end{bmatrix} = \begin{bmatrix} \mu_k & \rho_{k-1}^*\mu_k \\[2pt] \dfrac{\rho_k}{\mu_k} & \dfrac{x + \rho_{k-1}^*\rho_k}{\mu_k} \end{bmatrix}\begin{bmatrix} G_{k-1}(x) \\ \phi^\#_{k-1}(x) \end{bmatrix}. \tag{3.9}$$
The matrix (3.8) does in fact reduce to the classical unitary Hessenberg matrix displayed in (6.8).

Both the classical Szegő formula (3.7) and the new formula (3.9) describe, of course, the same Szegő polynomials $\{\phi^\#_k(x)\}$. However, the auxiliary polynomials $\{G_k(x)\}$ differ from the $\{\phi_k(x)\}$ used in (3.7). Indeed, it is well known that the auxiliary polynomials involved in (3.7) satisfy
$$\phi_k(x) = x^k\left[\phi^\#_k\left(\frac{1}{x^*}\right)\right]^*,$$
and in particular $\deg \phi_k(x) = \deg \phi^\#_k(x)$. At the same time, it is easy to see that the auxiliary polynomials $\{G_k(x)\}$ of the new formula (3.9) are different; in particular, $\deg G_k(x) = \deg \phi^\#_k(x) - 1$.

Example 3.8 (Real orthogonal polynomials). For systems with $\alpha_k = 0$, the matrix (3.8) becomes tridiagonal, and the corresponding system of polynomials is orthogonal on a real interval. Indeed, $\alpha_k = 0$ implies $G_{k-1} = \beta_{k-1}\, r_{k-2}(x)$, and hence the relations (1.5) become just the familiar three-term recurrence relations
$$r_k(x) = (\delta_k x + \theta_k)\, r_{k-1}(x) + \gamma_k\beta_{k-1}\, r_{k-2}(x).$$
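A minimal sketch (ours) of evaluating a polynomial system through the [EGO05]-type relations (1.5), assuming the start $G_0 = 0$, $r_0 = 1$; the choice $\alpha_k = 0$ reproduces the three-term relations of Example 3.8, checked here against the Chebyshev polynomials of the second kind:

```python
import numpy as np

def eval_ego05(alpha, beta, gamma, delta, theta, x):
    """Evaluate r_0, ..., r_n at x via (1.5):
    [G_k; r_k] = [[alpha_k, beta_k], [gamma_k, delta_k x + theta_k]]
                 [G_{k-1}; r_{k-1}],
    with G_0 = 0 and r_0 = 1 assumed for this sketch.
    """
    G = np.zeros_like(x, dtype=float)
    r = np.ones_like(x, dtype=float)
    out = [r.copy()]
    for k in range(len(alpha)):
        G, r = (alpha[k] * G + beta[k] * r,
                gamma[k] * G + (delta[k] * x + theta[k]) * r)
        out.append(r.copy())
    return out

# alpha_k = 0, beta_k = 1, gamma_k = -1, delta_k = 2, theta_k = 0 gives
# r_k = 2x r_{k-1} - r_{k-2}, with r_1 = 2x: the Chebyshev U polynomials.
x = np.linspace(-1, 1, 5)
n = 4
alpha = [0.0] * n; beta = [1.0] * n; gamma = [-1.0] * n
delta = [2.0] * n; theta = [0.0] * n
r = eval_ego05(alpha, beta, gamma, delta, theta, x)
```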
4. Sparse recurrence relations for associated polynomials
At this point we can give an overview of the procedure for inversion of a polynomial-Vandermonde matrix whose polynomials are quasiseparable polynomials. Given the generators of an $(H, 1)$-quasiseparable matrix (or using the theorems of the previous section to obtain these generators from the recurrence relation coefficients) which is the confederate matrix with respect to the master polynomial $P(x) = \prod (x - x_i)$ defined by the nodes $x_i$, $i = 1, \ldots, n$, with (2.4), we have
$$C_P(P) = \begin{bmatrix}
d_1 & g_1 b_{12}^{\times} h_2 & \cdots & g_1 b_{1n}^{\times} h_n \\
p_2 q_1 & d_2 & \ddots & \vdots \\
 & \ddots & \ddots & g_{n-1} b_{n-1,n}^{\times} h_n \\
0 & & p_n q_{n-1} & d_n
\end{bmatrix} - \frac{1}{P_n}\begin{bmatrix}
0 & \cdots & 0 & P_0 \\
\vdots & & \vdots & \vdots \\
0 & \cdots & 0 & P_{n-1}
\end{bmatrix}. \tag{4.1}$$
Applying (2.6) gives us the confederate matrix for the associated polynomials as
$$C_{\widetilde{P}}(P) = \begin{bmatrix}
\widetilde{d}_1 & \widetilde{g}_1 \widetilde{b}_{12}^{\times} \widetilde{h}_2 & \cdots & \widetilde{g}_1 \widetilde{b}_{1n}^{\times} \widetilde{h}_n \\
\widetilde{p}_2 \widetilde{q}_1 & \widetilde{d}_2 & \ddots & \vdots \\
 & \ddots & \ddots & \widetilde{g}_{n-1} \widetilde{b}_{n-1,n}^{\times} \widetilde{h}_n \\
0 & & \widetilde{p}_n \widetilde{q}_{n-1} & \widetilde{d}_n
\end{bmatrix} - \frac{1}{P_n}\begin{bmatrix}
P_{n-1} & \cdots & P_1 & P_0 \\
0 & \cdots & 0 & 0 \\
\vdots & & \vdots & \vdots \\
0 & \cdots & 0 & 0
\end{bmatrix}. \tag{4.2}$$
From this last equation we can see that the $n$-term recurrence relations satisfied by the associated polynomials $\widetilde{P}$ are given by
$$\widetilde{r}_k(x) = \frac{1}{\widetilde{p}_{k+1}\widetilde{q}_k}\Bigg[\underbrace{(x - \widetilde{d}_k)\,\widetilde{r}_{k-1}(x) - \sum_{j=0}^{k-2}\left(\widetilde{g}_{j+1}\,\widetilde{b}_{j+1,k}^{\times}\,\widetilde{h}_k\,\widetilde{r}_j(x)\right)}_{\text{typical terms as in (3.4)}} + \underbrace{\frac{P_{n-k}}{P_n}\,\widetilde{r}_0(x)}_{\text{perturbation}}\Bigg] \tag{4.3}$$
where, in order to simplify the formulas, we have introduced the notation
$$\begin{aligned}
\widetilde{p}_k &= q_{n-k+1}, & \widetilde{q}_k &= p_{n-k+1}, & \widetilde{d}_k &= d_{n-k+1}, \\
\widetilde{g}_k &= h_{n-k+1}, & \widetilde{b}_k &= b_{n-k+1}, & \widetilde{h}_k &= g_{n-k+1}.
\end{aligned} \tag{4.4}$$
Notice that the nonzero top row of the second matrix in (4.2) introduces perturbation terms into the recurrence relations for all of the associated polynomials.
Having found explicit $n$-term recurrence relations for the sequence of polynomials associated with the given polynomials, the next goal is to find sparse recurrence relations. The motivation is that the $n$-term recurrence relations are slow; they lead to $O(n^3)$ algorithms, while two- and three-term recurrence relations lead to $O(n^2)$ algorithms.

Sparse recurrence relations are not, of course, available for all polynomial sequences $P$; this is a special property. In this section we consider the case where $P$ is a system of $(H, 1)$-quasiseparable polynomials, and we derive sparse recurrence relations for the associated system of polynomials.
For certain polynomial systems whose confederate matrix is not Hessenberg, such recurrence relations are derived in [7]. Obtaining similar formulas for the leading minors of $C_{\widetilde{P}}(P)$ of the form shown in (4.2) is not simple, as the second term in (4.2) affects each column in such a way that the resulting leading submatrices are only $(H, 2)$-quasiseparable, as opposed to the submatrices of (4.1), which are $(H, 1)$-quasiseparable.

A summary of the results obtained in this section is presented in Table 4. It is worth noting that in this paper we generalize all of the previous algorithms, and not just the most widely applicable one. As a result, some of the derived results carry restrictions on their use. This leads to generalizations of the classical algorithms as well as some new ones below.

Generators of $P$ → perturbed recurrence relations for $\widetilde{P}$ | Type of recurrence relations derived | Restrictions on applicability
Theorem 4.1 | Perturbed 3-term | $g_k \neq 0$
Theorem 4.2 | Perturbed Szegő-type | $b_k \neq 0$
Theorem 4.3 | Perturbed [EGO05]-type | none

Table 4. Perturbed recurrence relations for the system of associated polynomials $\widetilde{P}$.
Theorem 4.1 (Perturbed three-term recurrence relations). Let $P = \{r_0(x), \ldots, r_{n-1}(x), P(x)\}$ be a system of $(H, 1)$-quasiseparable polynomials corresponding to an $(H, 1)$-quasiseparable matrix of size $n \times n$ with generators $\{p_k, q_k, d_k, g_k, b_k, h_k\}$, with the convention that $g_n = 1$, $b_n = 0$. Suppose further that $g_k \neq 0$ for $k = 1, \ldots, n-1$. Then the system of polynomials $\widetilde{P}$ associated with $P$ satisfies the recurrence relations
$$\widetilde{r}_0(x) = P_n, \qquad \widetilde{r}_1(x) = \frac{1}{\widetilde{p}_2\widetilde{q}_1}\left(x - \widetilde{d}_1\right)\widetilde{r}_0(x) + \frac{1}{\widetilde{p}_2\widetilde{q}_1}P_{n-1},$$
$$\widetilde{r}_k(x) = \underbrace{(\widetilde{\alpha}_k x - \delta_k) \cdot \widetilde{r}_{k-1}(x) - (\beta_k x + \gamma_k) \cdot \widetilde{r}_{k-2}(x)}_{\text{typical three-term recurrence relation terms}} + \underbrace{\widetilde{\alpha}_k P_{n-k} - \beta_k P_{n-k+1}}_{\text{perturbation term}}, \tag{4.5}$$
for $k = 2, \ldots, n-1$, where
$$\widetilde{\alpha}_k = \frac{1}{\widetilde{p}_{k+1}\widetilde{q}_k}, \qquad \delta_k = \frac{1}{\widetilde{p}_{k+1}\widetilde{q}_k}\left(\widetilde{d}_k - \widetilde{p}_k\widetilde{q}_{k-1}\widetilde{h}_k\,\frac{\widetilde{b}_{k-1}}{\widetilde{h}_{k-1}}\right), \tag{4.6}$$
$$\beta_k = \frac{1}{\widetilde{p}_{k+1}\widetilde{q}_k}\,\frac{\widetilde{h}_k\widetilde{b}_{k-1}}{\widetilde{h}_{k-1}}, \qquad \gamma_k = \frac{1}{\widetilde{p}_{k+1}\widetilde{q}_k}\,\frac{\widetilde{h}_k}{\widetilde{h}_{k-1}}\left(\widetilde{h}_{k-1}\widetilde{g}_{k-1} - \widetilde{d}_{k-1}\widetilde{b}_{k-1}\right), \tag{4.7}$$
and the coefficients $P_j$, $j = 0, \ldots, n$, are as defined in (2.4).

Proof. Let $W = \{W_0(x), W_1(x), \ldots, W_{n-1}(x)\}$ be the system of polynomials corresponding to the $(H, 2)$-quasiseparable matrix $C_{\widetilde{P}}(P)$ of the form in (4.2). Then from (2.3) and (4.2), we have for $k = 1, 2, \ldots, n-1$
$$W_k(x) = \frac{1}{\widetilde{p}_{k+1}\widetilde{q}_k}\Big[(x - \widetilde{d}_k)W_{k-1}(x) - \widetilde{g}_{k-1}\widetilde{h}_k W_{k-2}(x) - \widetilde{g}_{k-2}\widetilde{b}_{k-1}\widetilde{h}_k W_{k-3}(x) - \cdots - \widetilde{g}_2\widetilde{b}_3\cdots\widetilde{b}_{k-1}\widetilde{h}_k W_1(x) - \widetilde{g}_1\widetilde{b}_2\cdots\widetilde{b}_{k-1}\widetilde{h}_k W_0(x) + P_{n-k}\Big]. \tag{4.8}$$
It suffices to show that the system of polynomials $\{\widetilde{r}_0(x), \widetilde{r}_1(x), \ldots, \widetilde{r}_{n-1}(x)\}$ defined by the recurrence relations in (4.5)–(4.7) coincides with the system $W$; that is, that $\widetilde{r}_k(x) = W_k(x)$ for each $k$. We present this proof by induction. By direct confirmation, it is seen that $\widetilde{r}_0(x) = W_0(x)$ and $\widetilde{r}_1(x) = W_1(x)$.

Next suppose that the conclusion is true for each index less than or equal to $k - 1$ for some $2 \leqslant k \leqslant n-1$. Then (4.8) for $k - 1$ yields
$$x\,\widetilde{r}_{k-2}(x) = \widetilde{p}_k\widetilde{q}_{k-1}\,\widetilde{r}_{k-1}(x) + \widetilde{d}_{k-1}\,\widetilde{r}_{k-2}(x) + \widetilde{g}_{k-2}\widetilde{h}_{k-1}\,\widetilde{r}_{k-3}(x) + \widetilde{g}_{k-3}\widetilde{b}_{k-2}\widetilde{h}_{k-1}\,\widetilde{r}_{k-4}(x) + \cdots + \widetilde{g}_2\widetilde{b}_3\cdots\widetilde{b}_{k-2}\widetilde{h}_{k-1}\,\widetilde{r}_1(x) + \widetilde{g}_1\widetilde{b}_2\cdots\widetilde{b}_{k-2}\widetilde{h}_{k-1}\,\widetilde{r}_0(x) - P_{n-k+1}. \tag{4.9}$$
Next, the polynomial $\widetilde{r}_k(x)$ satisfies the recurrence relations (4.5); noting that by hypothesis $\widetilde{h}_k = g_{n-k+1} \neq 0$ for each $k$, inserting (4.9) into (4.5) and using the inductive hypothesis, we arrive at exactly (4.8) for $\widetilde{r}_k(x)$, which completes the proof. $\square$
Theorem 4.2 (Perturbed Szegő-type recurrence relations). Let $P = \{r_0(x), \ldots, r_{n-1}(x), P(x)\}$ be a system of $(H, 1)$-quasiseparable polynomials corresponding to an $(H, 1)$-quasiseparable matrix of size $n \times n$ with generators $\{p_k, q_k, d_k, g_k, b_k, h_k\}$, with the convention that $g_n = 0$, $b_n = 1$. Suppose further that $b_k \neq 0$ for $k = 2, \ldots, n-1$. Then the system of polynomials $\widetilde{P}$ associated with $P$ satisfies the recurrence relations
$$\begin{bmatrix} G_0(x) \\ \widetilde{r}_0(x) \end{bmatrix} = \begin{bmatrix} -\widetilde{g}_1 P_n \\ P_n \end{bmatrix}, \tag{4.10}$$
$$\begin{bmatrix} G_k(x) \\ \widetilde{r}_k(x) \end{bmatrix} = \frac{1}{\widetilde{p}_{k+1}\widetilde{q}_k}\begin{bmatrix} v_k & -\widetilde{g}_{k+1} \\ \widetilde{h}_k/\widetilde{b}_k & 1 \end{bmatrix}\begin{bmatrix} G_{k-1}(x) \\ u_k(x)\,\widetilde{r}_{k-1}(x) + \underbrace{P_{n-k}}_{\text{perturbation term}} \end{bmatrix} \tag{4.11}$$
for $k = 1, \ldots, n-1$, with auxiliary polynomials $G_k(x)$, and the coefficients $P_j$, $j = 0, \ldots, n$, as defined in (2.4), with the notations
$$u_k(x) = (x - \widetilde{d}_k) + \frac{\widetilde{g}_k\widetilde{h}_k}{\widetilde{b}_k}, \qquad v_k = \widetilde{p}_{k+1}\widetilde{q}_k\,\widetilde{b}_{k+1} - \frac{\widetilde{g}_{k+1}\widetilde{h}_k}{\widetilde{b}_k}. \tag{4.12}$$

Proof. Suppose first that the generators are such that $g_k \neq 0$ for each $k$. The proof in this case will be given by showing that the system of polynomials generated by the perturbed two-term recurrence relations (4.10)–(4.11) coincides with the one given by Theorem 4.1. From (4.11),
$$\left(v_k + \frac{\widetilde{g}_{k+1}\widetilde{h}_k}{\widetilde{b}_k}\right)\begin{bmatrix} G_{k-1}(x) \\ u_k(x)\,\widetilde{r}_{k-1}(x) + P_{n-k} \end{bmatrix} = \widetilde{p}_{k+1}\widetilde{q}_k\begin{bmatrix} 1 & \widetilde{g}_{k+1} \\ -\widetilde{h}_k/\widetilde{b}_k & v_k \end{bmatrix}\begin{bmatrix} G_k(x) \\ \widetilde{r}_k(x) \end{bmatrix}, \tag{4.13}$$
and using (4.11) for $k + 1$, we have
$$G_k(x) = \left(\frac{\widetilde{b}_{k+1}}{\widetilde{h}_{k+1}}\right)\left(\widetilde{p}_{k+2}\widetilde{q}_{k+1}\,\widetilde{r}_{k+1}(x) - u_{k+1}(x)\,\widetilde{r}_k(x) - P_{n-k-1}\right).$$
Together with this, (4.13) produces (4.5) as desired. As by assumption $g_k \neq 0$ for each $k$, Theorem 4.1 implies the result.

For the case of a polynomial system $P$ where $g_k = 0$ for some $k$, note that the coefficients of the polynomials generated by the two-term recurrence relations depend continuously on the entries of the $2 \times 2$ transfer matrix. Let $\{\epsilon_j\}$ be a sequence tending to zero with $\epsilon_j \neq 0$ for each $j$, and consider a sequence of systems $P_j$ obtained by setting $g_k = \epsilon_j$ for each $k$ for which $g_k = 0$ in the original polynomial system $P$, with all other generators the same as in $P$. Then the result of the theorem holds for each system $P_j$ by the above, and $P_j \to P$, so by continuity the result must hold for $P$ as well. This completes the proof. $\square$
The formulas of the previous two theorems generalize the classical formulas for monomials, real orthogonal polynomials, and Szegő polynomials (demonstrated below). We emphasize at this point that these formulas have limitations in the general case: Theorem 4.1 requires nonzero g_k for each k, and Theorem 4.2 requires nonzero b_k for each k. The next theorem is more general and does not have any such limitations.
Theorem 4.3 (Perturbed [EGO05]-type recurrence relations). Let R = {r_0(x), ..., r_{n-1}(x), P(x)} be a system of (H,1)-quasiseparable polynomials corresponding to an (H,1)-quasiseparable matrix of size n x n with generators {p_k, q_k, d_k, g_k, b_k, h_k},
Fast Inversion of (ð», 1)-quasiseparable-Vandermonde Matrices 95
with the convention that h_0 = 0 and b_1 = 0. Then the system of polynomials R-hat associated with R satisfies the recurrence relations
\[
\begin{bmatrix} F_0(x) \\ \hat r_0(x) \end{bmatrix} = \begin{bmatrix} 0 \\ P_n \end{bmatrix}, \tag{4.14}
\]
\[
\begin{bmatrix} F_k(x) \\ \hat r_k(x) \end{bmatrix}
= \frac{1}{p_{k+1} q_k}
\underbrace{\begin{bmatrix} p_{k+1} q_k b_k & -p_{k+1} q_k g_k \\ h_k & x - d_k \end{bmatrix}
\begin{bmatrix} F_{k-1}(x) \\ \hat r_{k-1}(x) \end{bmatrix}}_{\text{typical terms}}
+ \frac{1}{p_{k+1} q_k}
\underbrace{\begin{bmatrix} 0 \\ P_{n-k} \end{bmatrix}}_{\text{perturbation term}} \tag{4.15}
\]
with auxiliary polynomials F_k(x), and the coefficients P_k, k = 0, ..., n, as defined in (2.4).
Proof. The recurrence relations (4.15) define a system of polynomials which satisfies the n-term recurrence relations
\[
\hat r_k(x) = (\alpha_k x - a_{k-1,k}) \cdot \hat r_{k-1}(x) - a_{k-2,k} \cdot \hat r_{k-2}(x) - \cdots - a_{0,k} \cdot \hat r_0(x) \tag{4.16}
\]
for some coefficients \alpha_k, a_{k-1,k}, ..., a_{0,k}. The proof is presented by showing that these n-term recurrence relations coincide exactly with (4.3). Using the relations for \hat r_k(x) and F_{k-1}(x) from (4.15), we have
\[
\hat r_k(x) = \frac{1}{p_{k+1} q_k}
\Bigl[ (x - d_k)\,\hat r_{k-1}(x) - g_{k-1} h_k\, \hat r_{k-2}(x)
+ b_{k-1} h_k\, F_{k-2}(x) + \frac{P_{n-k}}{P_n}\, \hat r_0(x) \Bigr]. \tag{4.17}
\]
Notice that again using (4.15) to eliminate F_{k-2}(x) from equation (4.17) will result in an expression for \hat r_k(x) in terms of \hat r_{k-1}(x), \hat r_{k-2}(x), \hat r_{k-3}(x), F_{k-3}(x), and \hat r_0(x) without modifying the coefficients of \hat r_{k-1}(x), \hat r_{k-2}(x), or \hat r_0(x). Again applying (4.15) to eliminate F_{k-3}(x) results in an expression in terms of \hat r_{k-1}(x), \hat r_{k-2}(x), \hat r_{k-3}(x), \hat r_{k-4}(x), F_{k-4}(x), and \hat r_0(x) without modifying the coefficients of \hat r_{k-1}(x), \hat r_{k-2}(x), \hat r_{k-3}(x), or \hat r_0(x). Continuing in this way, the n-term recurrence relations of the form (4.16) are obtained without modifying the coefficients of the previous ones. Suppose that for some 0 < j < k - 1 the expression for \hat r_k(x) is of the form
\[
\hat r_k(x) = \frac{1}{p_{k+1} q_k}
\Bigl[ (x - d_k)\,\hat r_{k-1}(x) - g_{k-1} h_k\, \hat r_{k-2}(x) - \cdots
- g_{j+1} b^{\times}_{j+1,k} h_k\, \hat r_j(x)
+ b^{\times}_{j,k} h_k\, F_j(x) + \frac{P_{n-k}}{P_n}\, \hat r_0(x) \Bigr]. \tag{4.18}
\]
Using (4.15) for F_j(x) gives the relation
\[
\hat r_k(x) = \frac{1}{p_{k+1} q_k}
\Bigl[ (x - d_k)\,\hat r_{k-1}(x) - g_{k-1} h_k\, \hat r_{k-2}(x) - \cdots
- g_j b^{\times}_{j,k} h_k\, \hat r_{j-1}(x)
+ b^{\times}_{j-1,k} h_k\, F_{j-1}(x) + \frac{P_{n-k}}{P_n}\, \hat r_0(x) \Bigr]. \tag{4.19}
\]
Therefore, since (4.17) is the case of (4.18) for j = k - 2, (4.18) is true for each j = k - 2, k - 3, ..., 0, and for j = 0, using the fact that F_0 = 0, we have exactly (4.3) as desired. □
5. Computing the coefficients P_k of the master polynomial P(x)

Note that in order to use the recurrence relations of the previous section it is necessary to decompose the master polynomial P(x) into the R basis; that is, the coefficients P_k as in (2.4) must be computed. To this end, an efficient method of calculating these coefficients follows.

It is easily seen that the last polynomial r_n(x) in the system R does not affect the resulting confederate matrix C_R(P). Thus, if
\[
\tilde R = \{ r_0(x), \ldots, r_{n-1}(x),\; x \cdot r_{n-1}(x) \},
\]
we have C_R(P) = C_{\tilde R}(P). Decomposing the polynomial P(x) into the \tilde R basis can be done recursively by setting P^{(0)}(x) = 1 and then for k = 0, ..., n - 1 updating
\[
P^{(k+1)}(x) = (x - x_{k+1}) \cdot P^{(k)}(x).
\]

Lemma 5.1. Let R = {r_0(x), ..., r_n(x)} be given by (2.3), and let p(x) = \sum_{k=0}^{m} c_k \cdot r_k(x), where m < n - 1. Then the coefficients of x \cdot p(x) = \sum_{k=0}^{m+1} \tilde c_k \cdot r_k(x) can be computed by
\[
\begin{bmatrix} \tilde c_0 \\ \vdots \\ \tilde c_m \\ \tilde c_{m+1} \\ 0 \\ \vdots \\ 0 \end{bmatrix}
=
\begin{bmatrix} C_R(r_n) & 0 \\ \begin{matrix} 0 & \cdots & 0 & 1/\alpha_n \end{matrix} & 0 \end{bmatrix}
\begin{bmatrix} c_0 \\ \vdots \\ c_m \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{5.1}
\]

Proof. It can be easily checked that
\[
x \cdot \begin{bmatrix} r_0(x) & \cdots & r_n(x) \end{bmatrix}
- \begin{bmatrix} r_0(x) & \cdots & r_n(x) \end{bmatrix}
\begin{bmatrix} C_R(r_n) & 0 \\ \begin{matrix} 0 & \cdots & 0 & 1/\alpha_n \end{matrix} & 0 \end{bmatrix}
= \begin{bmatrix} 0 & \cdots & 0 & x \cdot r_n(x) \end{bmatrix}.
\]
Multiplying the latter equation by the column of the coefficients, we obtain (5.1). □

This lemma suggests the following algorithm for computing the coefficients {P_0, P_1, ..., P_{n-1}, P_n} in (2.4) of the master polynomial.

Algorithm 5.2 (Coefficients of the master polynomial in the R basis). Cost: O(n x M(n)), where M(n) is the cost of multiplication of an n x n quasiseparable matrix by a vector.
Input: A quasiseparable confederate matrix C_R(r_n) and n nodes x = (x_1, x_2, ..., x_n).

1. Set
\[
\begin{bmatrix} P_0^{(0)} & \cdots & P_{n-1}^{(0)} & P_n^{(0)} \end{bmatrix}
= \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}.
\]
2. For k = 1 : n,
\[
\begin{bmatrix} P_0^{(k)} \\ \vdots \\ P_{n-1}^{(k)} \\ P_n^{(k)} \end{bmatrix}
= \left(
\begin{bmatrix} C_{\tilde R}(x\, r_{n-1}(x)) & 0 \\ \begin{matrix} 0 & \cdots & 0 & 1 \end{matrix} & 0 \end{bmatrix}
- x_k I \right)
\begin{bmatrix} P_0^{(k-1)} \\ \vdots \\ P_{n-1}^{(k-1)} \\ P_n^{(k-1)} \end{bmatrix}
\]
where \tilde R = \{ r_0(x), \ldots, r_{n-1}(x),\; x \cdot r_{n-1}(x) \}.
3. Take
\[
\begin{bmatrix} P_0 & \cdots & P_{n-1} & P_n \end{bmatrix}
= \begin{bmatrix} P_0^{(n)} & \cdots & P_{n-1}^{(n)} & P_n^{(n)} \end{bmatrix}.
\]
Output: Coefficients {P_0, P_1, ..., P_{n-1}, P_n} such that (2.4) is satisfied.
It is clear that the computational burden in implementing this algorithm lies in the multiplication of the matrix C_{\tilde R}(x r_{n-1}(x)) by the vector of coefficients. The cost of each such step is O(M(n)), where M(n) is the cost of multiplication of an n x n quasiseparable matrix by a vector; thus the cost of computing the n coefficients is O(n x M(n)). Using a fast O(n) algorithm for multiplication of a quasiseparable matrix by a vector from [5], the cost of this algorithm is O(n^2).
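In the simplest instance of this procedure, the monomial basis, the confederate matrix is the companion matrix and the update in step 2 of Algorithm 5.2 reduces to the classical coefficient recursion for the product of the factors (x - x_i). A minimal Python sketch of that special case follows; the function name and the dense-list representation are ours, and the general algorithm would replace the shift below by a fast quasiseparable matrix-vector product:

```python
def master_coefficients(nodes):
    """Coefficients P_0, ..., P_n of P(x) = prod_i (x - x_i), built by the
    update P^(k)(x) = (x - x_k) * P^(k-1)(x) as in step 2 of Algorithm 5.2.

    In the monomial basis, multiplying by the confederate (companion)
    matrix amounts to a shift of the coefficient vector; the general case
    uses an O(n) quasiseparable matrix-vector product instead.
    """
    coeffs = [1.0]  # P^(0)(x) = 1
    for x_k in nodes:
        shifted = [0.0] + coeffs                    # coefficients of x * P^(k-1)(x)
        scaled = [x_k * c for c in coeffs] + [0.0]  # coefficients of x_k * P^(k-1)(x)
        coeffs = [s - t for s, t in zip(shifted, scaled)]
    return coeffs  # ascending order: [P_0, ..., P_n]
```

For the nodes 1 and 2 this returns [2.0, -3.0, 1.0], the coefficients of x^2 - 3x + 2 in the monomial basis.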
6. Special cases of these new inversion algorithms

In what follows we show how these algorithms (the previous section offers a choice of three perturbed recurrence relations, each leading to an inversion algorithm for the corresponding polynomial-Vandermonde matrix) generalize the previous work in the important special cases of monomials, real orthogonal polynomials, and Szegő polynomials. The reductions in all three special cases are summarized in Table 5.

6.1. First special case. Monomials and the classical Traub algorithm

As shown earlier, the well-known companion matrix (2.7) results when the polynomial system R is simply a system of monomials. By choosing the generators p_k = 1, q_k = 1, d_k = 0, g_k = 1, b_k = 1, and h_k = 0, the matrix (4.1) reduces to (2.7), and (4.2) reduces to the confederate matrix for the Horner polynomials (2.8). In this special case, the perturbed three-term recurrence relations of Theorem 4.1 become
\[
\hat r_0(x) = P_n, \qquad \hat r_k(x) = x\,\hat r_{k-1}(x) + P_{n-k}, \tag{6.1}
\]
coinciding with the known recurrence relations for the Horner polynomials, used in the evaluation of the polynomial
\[
P(x) = P_0 + P_1 x + \cdots + P_{n-1} x^{n-1} + P_n x^n. \tag{6.2}
\]
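The recurrence (6.1) is exactly Horner's rule read in this notation. A minimal Python sketch (ours; the paper's own experiments use MATLAB):

```python
def horner(P, x):
    """Evaluate P(x) = P_0 + P_1*x + ... + P_n*x**n through the Horner
    polynomials of (6.1): r_0 = P_n, r_k = x*r_{k-1} + P_{n-k}, so r_n = P(x)."""
    r = P[-1]                   # r_0(x) = P_n
    for c in reversed(P[:-1]):  # P_{n-1}, ..., P_0
        r = x * r + c           # r_k(x) = x * r_{k-1}(x) + P_{n-k}
    return r
```

For example, horner([1, 2, 3], 2) evaluates 1 + 2x + 3x^2 at x = 2 and returns 17.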
Special Case          R.R. Type                          Resulting R.R.

Monomials             Theorem 4.1 -> 3-term r.r.         (6.1)
                      Theorem 4.2 -> Szegő-type r.r.     (6.1)
                      Theorem 4.3 -> [EGO05]-type r.r.   (6.1)

Real orthogonal       Theorem 4.1 -> 3-term r.r.         (6.6)
                      Theorem 4.2 -> Szegő-type r.r.     N/A, b_k = 0
                      Theorem 4.3 -> [EGO05]-type r.r.   (6.6)

Szegő polynomials     Theorem 4.1 -> 3-term r.r.         (6.13)
                      Theorem 4.2 -> Szegő-type r.r.     (6.11)
                      Theorem 4.3 -> [EGO05]-type r.r.   (6.12)

Table 5. Reduction of derived recurrence relations in special cases.

In fact, after eliminating the auxiliary polynomials present in Theorems 4.2 and 4.3, these recurrence relations also reduce to (6.1). Thus all of the presented recurrence relations generalize those used in the classical Traub algorithm.
6.2. Second special case. Real orthogonal polynomials and the Calvetti–Reichel algorithm

Consider the almost tridiagonal confederate matrix
\[
C_R(P) =
\begin{bmatrix}
d_1 & h_2 & 0 & \cdots & 0 & -P_0/P_n \\
q_1 & d_2 & h_3 & \ddots & \vdots & -P_1/P_n \\
0 & q_2 & d_3 & h_4 & 0 & \vdots \\
0 & 0 & q_3 & d_4 & \ddots & -P_{n-3}/P_n \\
\vdots & \ddots & \ddots & \ddots & \ddots & h_n - P_{n-2}/P_n \\
0 & \cdots & 0 & 0 & q_{n-1} & d_n - P_{n-1}/P_n
\end{bmatrix}. \tag{6.3}
\]
The corresponding system of polynomials R satisfies three-term recurrence relations. Such confederate matrices can be seen as special cases of our general class by choosing the generators p_k = 1, g_k = 1, and b_k = 0, and in this case the matrix (4.1) reduces to (6.3).

To invert the corresponding polynomial-Vandermonde matrix by our algorithm, we first find the confederate matrix C_{\hat R}(P) of the polynomial system \hat R associated with R. That is, we must evaluate the polynomials corresponding to
the confederate matrix C_{\hat R}(P) given by
\[
\begin{bmatrix}
d_n - P_{n-1}/P_n & h_n - P_{n-2}/P_n & -P_{n-3}/P_n & \cdots & -P_1/P_n & -P_0/P_n \\
q_{n-1} & d_{n-1} & h_{n-1} & \ddots & & \vdots \\
0 & q_{n-2} & d_{n-2} & h_{n-2} & 0 & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & h_2 \\
0 & \cdots & 0 & 0 & q_1 & d_1
\end{bmatrix}. \tag{6.4}
\]
Note that the third column corresponds to the full recurrence relation
\[
\hat r_3(x) = \frac{1}{q_{n-3}} (x - d_{n-2})\,\hat r_2(x)
- \frac{h_{n-1}}{q_{n-3}}\,\hat r_1(x)
+ \frac{1}{q_{n-3}}\,\frac{P_{n-3}}{P_n}\,\hat r_0(x). \tag{6.5}
\]
In this case the perturbed three-term recurrence relations from Theorem 4.1 as well as the two-term recurrence relations from Theorem 4.3 both become
\[
\hat r_k(x) = \frac{1}{q_{n-k}} (x - d_{n-k+1})\,\hat r_{k-1}(x)
- \frac{h_{n-k+2}}{q_{n-k}}\,\hat r_{k-2}(x)
+ \frac{1}{q_{n-k}}\, P_{n-k}, \tag{6.6}
\]
which coincides with the Clenshaw rule for evaluating
\[
P(x) = P_0 r_0(x) + P_1 r_1(x) + \cdots + P_{n-1} r_{n-1}(x) + P_n r_n(x). \tag{6.7}
\]
Thus our formula generalizes both the Clenshaw rule and the algorithms designed for inversion of three-term-Vandermonde matrices in [3] and [10].

Notice that the Szegő-like two-term recurrence relations of Theorem 4.2 are inapplicable here, as b_k = 0 is a necessary choice of generators.
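For a concrete three-term instance, the Clenshaw rule for a Chebyshev expansion (recurrence T_k = 2x T_{k-1} - T_{k-2}, a special case of the three-term class above) can be sketched in Python as follows; the function is our illustration and is not taken from [3] or [10]:

```python
def clenshaw_chebyshev(P, x):
    """Evaluate P(x) = sum_k P_k * T_k(x) for Chebyshev polynomials T_k
    by running the three-term recurrence backwards through the
    coefficients (the Clenshaw rule), never forming the T_k explicitly."""
    b1 = b2 = 0.0
    for c in reversed(P[1:]):                 # P_n, ..., P_1
        b1, b2 = c + 2.0 * x * b1 - b2, b1    # backward recurrence step
    return P[0] + x * b1 - b2
```

With P = [0, 0, 1] this evaluates T_2(x) = 2x^2 - 1, so at x = 0.5 it returns -0.5.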
6.3. Third special case. Szegő polynomials and the algorithm of [18]

Next consider the important special case of the almost unitary Hessenberg matrix of Table 2,
\[
C_R(P) =
\begin{bmatrix}
-\rho_0^* \rho_1 & -\rho_0^* \mu_1 \rho_2 & -\rho_0^* \mu_1 \mu_2 \rho_3 & \cdots & -\rho_0^* \mu_1 \cdots \mu_{n-1} \rho_n \\
\mu_1 & -\rho_1^* \rho_2 & -\rho_1^* \mu_2 \rho_3 & \cdots & -\rho_1^* \mu_2 \cdots \mu_{n-1} \rho_n \\
0 & \mu_2 & -\rho_2^* \rho_3 & \cdots & -\rho_2^* \mu_3 \cdots \mu_{n-1} \rho_n \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & \mu_{n-1} & -\rho_{n-1}^* \rho_n
\end{bmatrix} \tag{6.8}
\]
that corresponds to the Szegő polynomials (represented by the reflection coefficients \rho_k and complementary parameters \mu_k) and the polynomial P(x). The Szegő polynomials are known to satisfy the two-term recurrence relations (1.3) as well as the three-term recurrence relations
\[
\phi_0^\#(x) = 1, \qquad
\phi_1^\#(x) = \frac{1}{\mu_1}\, x\, \phi_0^\#(x) - \frac{\rho_1}{\mu_1}\, \phi_0^\#(x),
\]
\[
\phi_k^\#(x) =
\Bigl[ \frac{1}{\mu_k}\, x + \frac{\rho_k}{\rho_{k-1}} \frac{1}{\mu_k} \Bigr] \phi_{k-1}^\#(x)
- \frac{\rho_k \mu_{k-1}}{\rho_{k-1} \mu_k}\, x\, \phi_{k-2}^\#(x) \tag{6.9}
\]
(see [13], [9]). As above, the polynomials associated with the system of Szegő polynomials are determined by the confederate matrix C_{\hat R}(P) given by
\[
\begin{bmatrix}
-\rho_n \rho_{n-1}^* - \dfrac{P_{n-1}}{P_n} & -\rho_n \mu_{n-1} \rho_{n-2}^* - \dfrac{P_{n-2}}{P_n} & \cdots & -\rho_n \mu_{n-1} \cdots \mu_1 \rho_0^* - \dfrac{P_0}{P_n} \\
\mu_{n-1} & -\rho_{n-1} \rho_{n-2}^* & \cdots & -\rho_{n-1} \mu_{n-2} \cdots \mu_1 \rho_0^* \\
0 & \ddots & \ddots & \vdots \\
0 & \cdots & \mu_1 & -\rho_1 \rho_0^*
\end{bmatrix}. \tag{6.10}
\]
For this special case, let p_k = 1, q_k = \mu_k, d_k = -\rho_k \rho_{k-1}^*, g_k = \rho_{k-1}^*, b_k = \mu_{k-1}, and h_k = -\mu_{k-1} \rho_k (alternatively g_k = \rho_{k-1}^* \mu_k, b_k = \mu_k, h_k = -\rho_k). This choice of generators reduces (4.1) to the matrix (6.8) as well as (4.2) to (6.10), and in this case the perturbed two-term recurrence relations of Theorem 4.2 become
\[
\begin{bmatrix} G_0(x) \\ \hat r_0(x) \end{bmatrix}
= \frac{1}{\mu_n} \begin{bmatrix} -\rho_n \\ 1 \end{bmatrix}, \qquad
\begin{bmatrix} G_k(x) \\ \hat r_k(x) \end{bmatrix}
= \frac{1}{\mu_{n-k}}
\begin{bmatrix} 1 & -\rho_{n-k}^* \\ -\rho_{n-k} & 1 \end{bmatrix}
\begin{bmatrix} G_{k-1}(x) \\ x\, \hat r_{k-1}(x) + P_{n-k} \end{bmatrix}, \tag{6.11}
\]
coinciding with the recurrence relations derived in [18]. The recurrence relations from Theorem 4.3 reduce to new two-term recurrence relations; that is, relations that do not generalize those derived in [18]. They become
\[
\begin{bmatrix} F_k(x) \\ \hat r_k(x) \end{bmatrix}
= \frac{1}{\mu_{n-k}}
\begin{bmatrix} \mu_{n-k} \mu_{n-k+1} & -\mu_{n-k} \rho_{n-k} \rho_{n-k+1}^* \\ -\rho_{n-k+1} \mu_{n-k} & x + \rho_{n-k} \rho_{n-k+1}^* \end{bmatrix}
\begin{bmatrix} F_{k-1}(x) \\ \hat r_{k-1}(x) \end{bmatrix}
+ \begin{bmatrix} 0 \\ P_{n-k} \end{bmatrix}. \tag{6.12}
\]
Also, the perturbed three-term recurrence relations of Theorem 4.1 reduce to
\[
\hat r_0(x) = \frac{1}{\mu_n}, \qquad
\hat r_1(x) = \Bigl\{ \frac{1}{\mu_{n-1}}\, x\, \hat r_0(x) - \frac{\rho_{n-1} \rho_n^*}{\mu_{n-1}}\, \hat r_0(x) \Bigr\} + \frac{P_{n-1}}{\mu_{n-1}},
\]
\[
\hat r_k(x) =
\Bigl[ \frac{1}{\mu_{n-k}}\, x + \frac{\rho_{n-k}}{\rho_{n-k+1}} \frac{1}{\mu_{n-k}} \Bigr] \hat r_{k-1}(x)
- \frac{\rho_{n-k} \mu_{n-k+1}}{\rho_{n-k+1} \mu_{n-k}}\, x\, \hat r_{k-2}(x)
+ \frac{1}{\mu_{n-k}} \Bigl( P_{n-k} - P_{n-k+1}\, \frac{\rho_{n-k} \mu_{n-k+1}}{\rho_{n-k+1}} \Bigr) \tag{6.13}
\]
in this case, also coinciding with the perturbed three-term recurrence relations in [18]. Thus both of these theorems generalize the recurrence relations derived in [18] as well.
7. Numerical experiments
The numerical properties of the Traub algorithm and its generalizations (which are special cases of our general algorithm) were studied by many different authors. It was noticed in [12] that a version of the Traub algorithm can yield high accuracy in certain cases if the algorithm is preceded by the Leja ordering of the nodes; that is, an ordering such that
\[
|x_1| = \max_{1 \le i \le n} |x_i|, \qquad
\prod_{i=1}^{k-1} |x_k - x_i| = \max_{k \le j \le n} \prod_{i=1}^{k-1} |x_j - x_i|,
\quad k = 2, \ldots, n-1
\]
(see [22], [15], [20]). It was noticed in [12] that the same is true for Chebyshev–Vandermonde matrices.
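The Leja ordering itself is a simple greedy procedure; a Python sketch under the definition above (ours, not from the papers cited):

```python
from math import prod

def leja_order(nodes):
    """Greedy Leja ordering: start with a node of maximal modulus, then
    repeatedly append the remaining node that maximizes the product of
    distances to the nodes already chosen."""
    remaining = list(nodes)
    ordered = [max(remaining, key=abs)]
    remaining.remove(ordered[0])
    while remaining:
        nxt = max(remaining, key=lambda z: prod(abs(z - y) for y in ordered))
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered
```

For the nodes {1, 3, -2} this selects 3 first (maximal modulus), then -2 (distance 5 to 3 beats distance 2), then 1.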
No error analysis was done, but the conclusion of the above authors was that in many cases the Traub algorithm and its extensions can yield much better accuracy than Gaussian elimination, even for very ill-conditioned matrices.

We made our preliminary experiments with the general algorithm, and our conclusions are consistent with the experience of our colleagues. In all cases we studied, the proposed algorithm yields better accuracy than Gaussian elimination, e.g., in the new special cases of Szegő–Vandermonde and (H,1)-quasiseparable-Vandermonde matrices. However, experiments remain to be done for further special cases, and the numerical properties of the different recurrence relations are also worth analyzing. This is a topic for future study.

The algorithm has been implemented in MATLAB version 7. The results of the algorithm using standard MATLAB code, and hence double precision arithmetic, were compared with exact solutions calculated using the MATLAB Symbolic Toolbox command vpa(), which allows software-implemented precision of arbitrary numbers of digits. The number of digits was set to 64; however, in cases where the condition number of the coefficient matrix exceeded 10^30, this was raised to 100 digits to maintain accuracy.

We compare the forward accuracy of the inverse computed by the algorithm with respect to the inverse computed in high precision, defined by
\[
e = \frac{\| \hat V_R(x)^{-1} - V_R(x)^{-1} \|_2}{\| V_R(x)^{-1} \|_2}, \tag{7.1}
\]
where \hat V_R(x)^{-1} is the solution computed by each algorithm in MATLAB in double precision, and V_R(x)^{-1} is the exact solution. In the tables, TraubQS denotes the proposed Traub-like algorithm, and inv() indicates MATLAB's inversion command. Finally, cond(V) denotes the condition number of the matrix V computed via the MATLAB command cond().
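The error measure (7.1) translates directly into code; a NumPy sketch (ours, in Python rather than the MATLAB used for the experiments):

```python
import numpy as np

def forward_error(Vinv_computed, Vinv_exact):
    """Relative forward error (7.1): the spectral-norm distance between
    the computed inverse and the high-precision reference inverse,
    normalized by the spectral norm of the reference."""
    return (np.linalg.norm(Vinv_computed - Vinv_exact, 2)
            / np.linalg.norm(Vinv_exact, 2))
```

An exactly reproduced inverse gives error 0, and an inverse off by a factor of two gives error 1.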
Experiment 1. In this experiment, the problem was constructed by choosing the generators that define the recurrence relations of the polynomial system randomly
in (-1, 1), and the nodes x_k were selected equidistant on (-1, 1) via the formula
\[
x_k = -1 + 2 \Bigl( \frac{k}{n-1} \Bigr), \qquad k = 0, 1, \ldots, n-1.
\]
We test the accuracy of the inversion algorithm for various sizes n of matrices generated in this way. Some results are tabulated in Table 6 and shown graphically in Figure 1.

n    cond(V)   inv()     TraubQS
10   4.2e04    4.1e-14   3.4e-15
     2.2e05    2.5e-14   6.3e-15
     3.7e08    1.0e-13   8.9e-14
15   1.1e10    3.5e-11   3.5e-11
     1.1e11    1.5e-12   4.8e-13
     4.7e11    1.3e-13   7.7e-14
20   7.6e14    1.1e-10   3.4e-12
     1.2e15    4.2e-11   1.1e-11
     7.8e17    1.2e-09   1.7e-15
25   4.8e19    1.2e-09   1.7e-13
     1.1e24    5.9e-07   1.3e-11
     1.5e27    8.4e-08   2.4e-09
30   3.3e24    7.2e-07   1.1e-13
     5.0e27    2.8e-06   1.7e-11
     1.8e30    1.3e-03   9.5e-10
35   7.3e23    2.4e-04   6.9e-10
     8.3e26    2.6e-03   1.2e-06
     1.4e27    2.9e-05   1.4e-08
40   1.1e31    8.2e-02   2.4e-13
     2.4e32    3.4e+00   9.9e-12
     1.7e33    1.2e-01   1.0e-08
45   4.3e30    1.7e-01   1.7e-05
     1.7e31    5.9e-01   1.0e-08
     3.9e35    1.0e-02   2.4e-08
50   2.1e42    1.0e+00   4.7e-06
     3.9e44    1.0e+00   7.0e-06
     6.6e45    1.0e+00   6.3e-06

Table 6. Equidistant nodes on (-1, 1).
Figure 1. Equidistant nodes on (-1, 1): forward error versus cond(V) for inv() and TraubQS.
Notice that the performance of the proposed inversion algorithm is an improvement over that of MATLAB's standard inversion command inv() in this specific case.

Experiment 2. Next, the values for the generators and the nodes were chosen randomly on the unit disc. We test the accuracy for various 30 x 30 matrices generated in this way, and present some results in Table 7 and Figure 2.
Figure 2. Random parameters on the unit disc: forward error versus cond(V) for inv() and TraubQS.
cond(V)   inv()     TraubQS
1.7e21    1.3e-07   3.5e-14
3.9e23    1.2e-05   9.6e-15
4.3e23    2.7e-03   1.2e-14
2.8e24    2.4e-05   8.4e-13
2.9e24    3.9e-03   4.3e-12
1.8e25    6.8e-07   2.6e-12
2.2e25    8.9e-03   3.4e-14
3.1e25    1.3e-03   3.6e-14
3.5e25    2.9e-03   7.9e-14
6.8e25    1.0e+00   2.2e-11
2.2e27    1.0e-02   2.9e-11
4.9e27    3.6e+00   2.3e-13
6.6e27    9.9e+00   7.6e-13
7.6e27    4.6e-04   2.0e-12
2.0e28    1.9e-03   5.7e-14
2.4e28    6.9e-04   9.6e-15
2.6e28    2.5e-02   1.2e-13
5.2e28    2.4e-05   1.7e-12
6.9e30    1.2e-03   2.5e-14
1.4e33    1.0e+00   2.9e-13

Table 7. Random parameters on the unit disc.
8. Conclusions
In this paper we used properties of confederate matrices to extend the classical Traub algorithm for inversion of Vandermonde matrices to the general polynomial-Vandermonde case. The relation between polynomial systems satisfying recurrence relations and quasiseparable matrices allowed an order-of-magnitude computational saving in this case, resulting in an O(n^2) algorithm, as opposed to Gaussian elimination, which requires O(n^3) operations. Finally, some numerical experiments were presented that indicate that, under some circumstances, the resulting algorithm can give better accuracy than Gaussian elimination.
References
[1] M. Bakonyi and T. Constantinescu, Schur's algorithm and several applications, Pitman Research Notes in Mathematics Series, vol. 61, Longman Scientific and Technical, Harlow, 1992.
[2] T. Bella, Y. Eidelman, I. Gohberg, I. Koltracht and V. Olshevsky, A Björck–Pereyra-type algorithm for Szegő–Vandermonde matrices based on properties of unitary Hessenberg matrices, Linear Algebra Appl., 420 (2007), no. 2–3, 634–647.
[3] D. Calvetti and L. Reichel, Fast inversion of Vandermonde-like matrices involving orthogonal polynomials, BIT, 1993.
[4] Y. Eidelman and I. Gohberg, On a new class of structured matrices, Integral Equations and Operator Theory, 34 (1999), 293–324.
[5] Y. Eidelman and I. Gohberg, Linear complexity inversion algorithms for a class of structured matrices, Integral Equations and Operator Theory, 35 (1999), 28–52.
[6] Y. Eidelman and I. Gohberg, A modification of the Dewilde–van der Veen method for inversion of finite structured matrices, Linear Algebra Appl., 343–344 (2002), 419–450.
[7] Y. Eidelman, I. Gohberg and V. Olshevsky, Eigenstructure of order-one-quasiseparable matrices. Three-term and two-term recurrence relations, Linear Algebra Appl., 405 (2005), 1–40.
[8] G. Forney, Concatenated Codes, The M.I.T. Press, Cambridge, 1966.
[9] L.Y. Geronimus, Polynomials orthogonal on a circle and their applications, Amer. Math. Soc. Translations, 3 (1954), 1–78 (Russian original 1948).
[10] I. Gohberg and V. Olshevsky, Fast inversion of Chebyshev–Vandermonde matrices, Numerische Mathematik, 67 (1994), no. 1, 71–92.
[11] I. Gohberg and V. Olshevsky, A fast generalized Parker–Traub algorithm for inversion of Vandermonde and related matrices, Journal of Complexity, 13(2) (1997), 208–234. A short version in Communications, Computation, Control and Signal Processing: A Tribute to Thomas Kailath (A. Paulraj, V. Roychowdhury and C. Schaper, eds.), Kluwer Academic Publishing, 1996, 205–221.
[12] I. Gohberg and V. Olshevsky, The fast generalized Parker–Traub algorithm for inversion of Vandermonde and related matrices, J. of Complexity, 13(2) (1997), 208–234.
[13] U. Grenander and G. Szegő, Toeplitz Forms and Their Applications, University of California Press, 1958.
[14] W.G. Horner, A new method of solving numerical equations of all orders by continuous approximation, Philos. Trans. Roy. Soc. London, (1819), 308–335.
[15] N. Higham, Stability analysis of algorithms for solving confluent Vandermonde-like systems, SIAM J. Matrix Anal. Appl., 11(1) (1990), 23–41.
[16] T. Kailath and V. Olshevsky, Displacement structure approach to polynomial Vandermonde and related matrices, Linear Algebra Appl., 261 (1997), 49–90.
[17] J. Maroulas and S. Barnett, Polynomials with respect to a general basis. I. Theory, J. of Math. Analysis and Appl., 72 (1979), 177–194.
[18] V. Olshevsky, Eigenvector computation for almost unitary Hessenberg matrices and inversion of Szegő–Vandermonde matrices via discrete transmission lines, Linear Algebra Appl., 285 (1998), 37–67.
[19] V. Olshevsky, Associated polynomials, unitary Hessenberg matrices and fast generalized Parker–Traub and Björck–Pereyra algorithms for Szegő–Vandermonde matrices, invited chapter in Structured Matrices: Recent Developments in Theory and Computation (D. Bini, E. Tyrtyshnikov, P. Yalamov, eds.), NOVA Science Publ., USA, 2001, 67–78.
[20] V. Olshevsky, Pivoting for structured matrices and rational tangential interpolation, in Fast Algorithms for Structured Matrices: Theory and Applications, CONM/323, AMS publications, May 2003, 1–75.
[21] F. Parker, Inverses of Vandermonde matrices, Amer. Math. Monthly, 71 (1964), 410–411.
[22] L. Reichel and G. Opfer, Chebyshev–Vandermonde systems, Math. of Comp., 57 (1991), 703–721.
[23] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag, 1992.
[24] J. Traub, Associated polynomials and uniform methods for the solution of linear problems, SIAM Review, 8 (1966), no. 3, 277–301.

T. Bella
Department of Mathematics
University of Rhode Island
Kingston, RI 02881, USA
e-mail: [email protected]

Y. Eidelman and I. Gohberg
School of Mathematical Sciences
Raymond and Beverly Sackler Faculty of Exact Sciences
Tel Aviv University
Ramat-Aviv 69978, Israel
e-mail: [email protected]

V. Olshevsky
Department of Mathematics
University of Connecticut
Storrs, CT 06269, USA
e-mail: [email protected]

E. Tyrtyshnikov
Institute of Numerical Mathematics
Russian Academy of Sciences
Gubkina Street, 8
Moscow, 119991, Russia
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 107–125
© 2013 Springer Basel

Long Proofs of Two Carlson–Schneider Type Inertia Theorems

Harry Dym and Motke Porat

To Leonid Lerer on the occasion of his retirement from the faculty of the Department of Mathematics at the Technion, with affection and respect.

Abstract. This expository note is devoted to a discussion of the equivalence of inertia theorems of the Carlson–Schneider type with the existence of finite-dimensional reproducing kernel Krein spaces of the de Branges type. The first five sections focus on an inertia theorem connected with a Lyapunov equation. A sixth supplementary section sketches an analogous treatment of the Stein equation. The topic was motivated by a question raised by Leonid Lerer.

Mathematics Subject Classification (2010). 46C20, 46E22, 47B32, 47B50, 93B20.

Keywords. Inertia theorems, realization theory, reproducing kernel spaces, finite-dimensional de Branges–Krein spaces, factorization of rational matrix-valued functions, Lyapunov–Stein equations.

1. Introduction

The papers [9] by Lerer and Tismenetsky and [5] by Dym and Volok both study the zero distribution of matrix polynomials, but by very different methods. In particular, [9] rests heavily on the spectral theory of matrix polynomials that is conveniently summarized in the monograph [6] and on the Carlson–Schneider inertia theorem, whereas [5] uses reproducing kernel space methods. Some years ago, in an exchange of e-mails with the first listed author of this note, Leonid wondered how the analysis in [5] managed to avoid invoking the Carlson–Schneider inertia theorem. The purpose of the first five sections is to suggest an answer to this question by showing how to deduce Carlson–Schneider type theorems from reproducing kernel formulas. To be more precise, the theorem we consider is weaker than the Carlson–Schneider theorem because of the observability assumption; it corresponds to Corollary 1 on p. 449 of [8], which is credited to C.T. Chen [2]
108 H. Dym and M. Porat
and H.K. Wimmer [10]. The proof is a bit on the long side, but is completely self-contained and uses only elementary ideas from linear system theory. A sixth section deals with an analogous inertia theorem for the disc.
The notation \Pi_+ (resp., \Pi_-) for the open right (resp., left) half-plane, and E_-(A), E_0(A) and E_+(A) for the sum of the algebraic multiplicities of the eigenvalues of a matrix A \in \mathbb{C}^{n \times n} in \Pi_-, i\mathbb{R} and \Pi_+, respectively, will be needed to formulate the version of the Carlson–Schneider inertia theorem under consideration:

Theorem 1.1. If A \in \mathbb{C}^{n \times n}, C \in \mathbb{C}^{p \times n}, (C, A) is an observable pair and M = M^* \in \mathbb{C}^{n \times n} is a solution of the Lyapunov equation
\[
A^* M + M A + C^* C = 0, \tag{1.1}
\]
then
(1) \sigma(A) \cap i\mathbb{R} = \emptyset and M is invertible, i.e., E_0(A) = E_0(M) = 0;
(2) E_+(A) = E_-(M) and E_-(A) = E_+(M).
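The statement can be checked numerically. The following NumPy sketch (ours, not from the note) solves (1.1) by vectorization with Kronecker products and compares the eigenvalue sign counts of A and of the Hermitian solution M:

```python
import numpy as np

def lyapunov_inertia(A, C):
    """Solve A^*M + MA + C^*C = 0 for Hermitian M via vectorization and
    return (E_plus(A), E_minus(A), E_plus(M), E_minus(M)).

    Uses column-major vec: vec(A^*M + MA) = (I kron A^* + A^T kron I) vec(M).
    """
    n = A.shape[0]
    K = np.kron(np.eye(n), A.conj().T) + np.kron(A.T, np.eye(n))
    rhs = -(C.conj().T @ C).flatten(order="F")
    M = np.linalg.solve(K, rhs).reshape((n, n), order="F")
    M = (M + M.conj().T) / 2  # symmetrize away roundoff; M is Hermitian
    ev_A = np.linalg.eigvals(A)
    ev_M = np.linalg.eigvalsh(M)
    return (int((ev_A.real > 0).sum()), int((ev_A.real < 0).sum()),
            int((ev_M > 0).sum()), int((ev_M < 0).sum()))
```

For the (hypothetical) data A = diag(1, 2, -3) and C = [1 1 1], an observable pair with no purely imaginary eigenvalues, the theorem predicts E_+(A) = E_-(M) = 2 and E_-(A) = E_+(M) = 1.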
This note is organized as follows: In Section 2 we present some preliminary facts and notation for subsequent use. A short well-known proof of Theorem 1.1 based on elementary tools of linear system theory is presented in Section 3. In Section 4 Theorem 1.1 is used to establish a finite-dimensional reproducing kernel Krein space (RKKS) with a reproducing kernel (RK) of the special form
\[
K_\omega(\lambda) = \frac{I_p - \Theta(\lambda)\,\Theta(\omega)^*}{\rho_\omega(\lambda)} \tag{1.2}
\]
where \rho_\omega(\lambda) = \lambda + \overline{\omega} and \Theta admits a factorization \Theta = \Theta_1^{-1}\Theta_2, with \Theta_1 and \Theta_2 both inner rmvf's (rational matrix-valued functions) with respect to \Pi_+. A proof of Theorem 1.1 that is based on methods and formulas of reproducing kernel spaces is presented in Section 5. An analogous treatment of an inertia theorem for the Stein equation is sketched in the sixth section.
2. Preliminaries
We begin with a lemma that verifies assertion (1) of Theorem 1.1 and reduces theproblem of establishing (2) to the case where the matrix ðŽ is of particular form.Subsequently, we present some notation and formulas that will be used frequentlyin the later sections.
Lemma 2.1. If the assumptions in Theorem 1.1 are in force, then \sigma(A) \cap i\mathbb{R} = \emptyset, M is invertible and, without loss of generality, A may be assumed to be of the form
\[
A = A_+, \quad \text{or} \quad A = A_-, \quad \text{or} \quad
A = \begin{bmatrix} A_+ & 0 \\ 0 & A_- \end{bmatrix},
\qquad \text{with } \sigma(A_\pm) \subset \Pi_\pm. \tag{2.1}
\]
Long Proofs of Two CarlsonâSchneider Type Inertia Theorems 109
Proof. The proof is separated into parts:

1. If Mu = 0 for some u \in \mathbb{C}^n, then, by (1.1),
\[
0 = u^* A^* M u + u^* M A u + u^* C^* C u = u^* C^* C u \;\Longrightarrow\; C u = 0.
\]
Therefore, (1.1) implies that MAu = 0 and hence, by repeating the previous argument, that CAu = 0 and MA^2 u = 0. Thus, CA^k u = 0 for every nonnegative integer k and hence, as (C, A) is observable, u = 0, i.e., M is invertible.

2. If Au = \lambda u for some u \in \mathbb{C}^n and \lambda \in \mathbb{C}, then u^* A^* = \overline{\lambda} u^* and hence (1.1) implies that
\[
0 = u^* A^* M u + u^* M A u + u^* C^* C u
= (\lambda + \overline{\lambda})\, u^* M u + u^* C^* C u
= u^* C^* C u \quad \text{if } \lambda \in i\mathbb{R}.
\]
Thus, CA^k u = \lambda^k C u = 0 in this case, which implies that u = 0, since the pair (C, A) is observable. Therefore, \sigma(A) \cap i\mathbb{R} = \emptyset.

3. As \sigma(A) \cap i\mathbb{R} = \emptyset, A = U J U^{-1}, where J is a Jordan matrix with \sigma(J) \subset \Pi_+, or \sigma(J) \subset \Pi_-, or J is of the form
\[
J = \begin{bmatrix} A_+ & 0 \\ 0 & A_- \end{bmatrix} \qquad \text{where } \sigma(A_\pm) \subset \Pi_\pm.
\]
Then M is an invertible Hermitian solution of equation (1.1) if and only if U^* M U is an invertible Hermitian solution of the equation
\[
J^* (U^* M U) + (U^* M U) J + (CU)^* CU = 0.
\]
Moreover, \sigma(J) \cap i\mathbb{R} = \emptyset, E_\pm(J) = E_\pm(A), E_\pm(U^* M U) = E_\pm(M) and the pair (CU, J) is observable, since
\[
\operatorname{rank} \begin{bmatrix} \lambda I_n - A \\ C \end{bmatrix}
= \operatorname{rank} \left( \begin{bmatrix} U & 0 \\ 0 & I_p \end{bmatrix}
\begin{bmatrix} \lambda I_n - J \\ CU \end{bmatrix} U^{-1} \right)
= \operatorname{rank} \begin{bmatrix} \lambda I_n - J \\ CU \end{bmatrix}. \qquad \Box
\]

In view of the preceding lemma, we can without loss of generality assume that A is of the form (2.1) and set k := E_+(A). In this note we shall always assume that 1 \le k \le n - 1 and shall leave the cases k = 0 and k = n to the reader. Correspondingly, let C = [C_1 \; C_2] with C_1 \in \mathbb{C}^{p \times k}, C_2 \in \mathbb{C}^{p \times (n-k)}, and let
\[
M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix},
\qquad M_{11} \in \mathbb{C}^{k \times k}, \; M_{22} \in \mathbb{C}^{(n-k) \times (n-k)},
\]
be the decompositions of the matrices C and M that are conformal with that of A. Then the Lyapunov equation (1.1) is equivalent to the first, second and fourth of the following four equations:
\[
A_+^* M_{11} + M_{11} A_+ + C_1^* C_1 = 0, \tag{2.2}
\]
\[
A_-^* M_{21} + M_{21} A_+ + C_2^* C_1 = 0, \tag{2.3}
\]
\[
A_+^* M_{12} + M_{12} A_- + C_1^* C_2 = 0, \tag{2.4}
\]
and
\[
A_-^* M_{22} + M_{22} A_- + C_2^* C_2 = 0. \tag{2.5}
\]
Finally, let S^{p \times p}_\kappa(\Pi_+) denote the generalized Schur class of p x p mvf's (matrix-valued functions) s(\lambda) that are meromorphic in \Pi_+ and for which the kernel
\[
\Lambda^s_\omega(\lambda) = \frac{I_p - s(\lambda)\, s(\omega)^*}{\rho_\omega(\lambda)}
\]
has \kappa negative squares on \mathfrak{h}^+_s \times \mathfrak{h}^+_s, where \mathfrak{h}^+_s denotes the domain of analyticity of s in \Pi_+.
3. A quick proof of Theorem 1.1
In this section we present a variant of the well-known short proof of Theorem 1.1(see, e.g., [8] or [3]).
Proof. Assertion (1) of the theorem is verified in Lemma 2.1. Next, as \sigma(A_\pm) \subset \Pi_\pm, (2.2) and (2.5) both have unique solutions:
\[
M_{11} = -\int_0^\infty e^{-tA_+^*} C_1^* C_1\, e^{-tA_+}\, dt \le 0
\quad \text{and} \quad
M_{22} = \int_0^\infty e^{tA_-^*} C_2^* C_2\, e^{tA_-}\, dt \ge 0,
\]
respectively. The observability of the pair (C, A) implies that the pairs (C_1, A_+) and (C_2, A_-) are observable, as is perhaps verified most easily by the Popov–Belevitch–Hautus test. Then Lemma 2.1 implies that M_{11} and M_{22} are invertible matrices and hence, by Schur complements, that
\[
M = \begin{bmatrix} I_k & M_{11}^{-1} M_{12} \\ 0 & I_{n-k} \end{bmatrix}^*
\begin{bmatrix} M_{11} & 0 \\ 0 & M_{22} - M_{21} M_{11}^{-1} M_{12} \end{bmatrix}
\begin{bmatrix} I_k & M_{11}^{-1} M_{12} \\ 0 & I_{n-k} \end{bmatrix}.
\]
Thus, by the Sylvester law of inertia,
\[
E_+(M) = E_+(M_{11}) + E_+(M_{22} - M_{21} M_{11}^{-1} M_{12})
= E_+(M_{22} - M_{21} M_{11}^{-1} M_{12}) = n - k = E_-(A),
\]
as M_{22} > 0, M_{11} < 0 and M_{12} = M_{21}^*, and
\[
E_-(M) = E_-(M_{11}) + E_-(M_{22} - M_{21} M_{11}^{-1} M_{12})
= E_-(M_{11}) = k = E_+(A). \qquad \Box
\]
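The Sylvester law of inertia invoked above (a congruence H -> E^* H E with invertible E preserves the signs of the eigenvalues) is easy to illustrate numerically; the helper below is our sketch, not part of the note:

```python
import numpy as np

def inertia(H):
    """Return (E_plus, E_minus, E_zero) of a Hermitian matrix H by
    counting eigenvalue signs with a small relative tolerance."""
    w = np.linalg.eigvalsh((H + H.conj().T) / 2)
    tol = 1e-10 * max(1.0, float(np.abs(w).max()))
    return (int((w > tol).sum()), int((w < -tol).sum()),
            int((np.abs(w) <= tol).sum()))
```

Congruence by any invertible factor leaves these counts unchanged, which is exactly what the block-diagonalization of M by Schur complements exploits in the proof.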
4. From the inertia theorem to a RKKS

In this section we will establish a finite-dimensional reproducing kernel Krein space \mathcal{M} with a RK of the form (1.2) and then use the inertia theorem to obtain a coprime factorization formula for the rmvf \Theta \in S^{p \times p}_{E_-(M)}(\Pi_+) of the form
\[
\Theta = \Theta_1^{-1} \Theta_2,
\]
where \Theta_1 and \Theta_2 are both inner rational matrix-valued functions (rmvf's) with respect to \Pi_+, and hence Blaschke–Potapov products. Moreover, we will give explicit realization formulas for \Theta_1 and \Theta_2 that are minimal.
Theorem 4.1. If A \in \mathbb{C}^{n \times n}, C \in \mathbb{C}^{p \times n}, the pair (C, A) is observable and M = M^* \in \mathbb{C}^{n \times n} is a solution of the Lyapunov equation (1.1), then:

(1) \sigma(A) \cap i\mathbb{R} = \emptyset and M is invertible, i.e., E_0(A) = E_0(M) = 0.

(2) The space
\[
\mathcal{M} = \{ F(\lambda) u : u \in \mathbb{C}^n \} \tag{4.1}
\]
with
\[
F(\lambda) = C (\lambda I_n - A)^{-1}, \qquad \lambda \in \mathbb{C} \setminus \sigma(A), \tag{4.2}
\]
and indefinite inner product
\[
\langle F(\lambda) u, F(\lambda) v \rangle_{\mathcal{M}} = v^* M u, \tag{4.3}
\]
is an n-dimensional RKKS with RK
\[
K_\omega(\lambda) = F(\lambda) M^{-1} F(\omega)^*, \qquad \lambda, \omega \in \mathbb{C} \setminus \sigma(A). \tag{4.4}
\]

(3) The RK may be expressed in the form
\[
K_\omega(\lambda) = \frac{I_p - \Theta(\lambda)\, \Theta(\omega)^*}{\rho_\omega(\lambda)}, \qquad \lambda, \omega \in \mathbb{C} \setminus \sigma(A), \tag{4.5}
\]
where the p x p rmvf
\[
\Theta(\lambda) = I_p - C (\lambda I_n - A)^{-1} M^{-1} C^* \tag{4.6}
\]
admits the factorization
\[
\Theta(\lambda) = \Theta_1(\lambda)^{-1} \Theta_2(\lambda), \tag{4.7}
\]
with
\[
\Theta_1(\lambda) = I_p + C_1 (\lambda I_k - A_1)^{-1} M_{11}^{-1} C_1^*, \tag{4.8}
\]
\[
\Theta_2(\lambda) = I_p + \widetilde{C} (\lambda I_{n-k} - A_-)^{-1} P\, \widetilde{C}^*, \tag{4.9}
\]
\[
A_1 = -M_{11}^{-1} A_+^* M_{11}, \qquad \widetilde{C} = C_2 - C_1 M_{11}^{-1} M_{12}
\]
and
\[
P = -(M_{22} - M_{21} M_{11}^{-1} M_{12})^{-1}.
\]

(4) The rmvf's \Theta_1 and \Theta_2 are finite Blaschke–Potapov products that are both inner with respect to \Pi_+ and are left coprime.

(5) The realizations (4.8) and (4.9) are minimal.
Proof. Assertion (1) is verified in Lemma 2.1. Next, since \mathcal{M} is a RKKS with RK that is given by formula (4.4) and (C, A) is an observable pair, the columns of the rmvf F are linearly independent. Therefore, (2) holds.

The formula (4.5) in (3) may be verified by a straightforward calculation that uses the Lyapunov equation (1.1). Another direct calculation serves to verify
the factorization formula, and it is easily seen that $P_{11}$ is invertible (just as in the proof in Section 3) and hence that $P_{22} - P_{21}P_{11}^{-1}P_{12}$ is also invertible (by the Schur complements formula for $P$), so $N$ is well defined. Thus, (3) holds.

The observability of the pairs $(C_1, A_+)$ and $(C_2, A_-)$ is inherited from the observability of the pair $(C,A)$ (as is most easily seen by the Popov–Belevitch–Hautus test). Therefore, successive applications of Theorem 1.1 to equation (2.2) for the $k\times k$ matrix $P_{11}$ and to equation (2.5) for the $(n-k)\times(n-k)$ matrix $P_{22}$ yield the implications
$$\mathcal{E}_\pm(A_+) = \mathcal{E}_\mp(P_{11}) \implies \mathcal{E}_-(P_{11}) = k \implies P_{11} < 0$$
and
$$\mathcal{E}_\pm(A_-) = \mathcal{E}_\mp(P_{22}) \implies \mathcal{E}_+(P_{22}) = n-k \implies P_{22} > 0.$$
Thus, as $P_{11} < 0$, $P_{22} > 0$ and $P_{12} = P_{21}^*$,
$$N = -(P_{22} - P_{21}P_{11}^{-1}P_{12})^{-1} < 0.$$
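The Popov–Belevitch–Hautus test invoked in this step reduces observability of $(C,A)$ to a rank condition at the eigenvalues of $A$. A minimal numerical sketch (the helper name `pbh_observable` is ours, not the paper's):

```python
import numpy as np

def pbh_observable(C, A, tol=1e-9):
    # PBH test: (C, A) is observable iff rank([[lam*I - A], [C]]) = n
    # for every eigenvalue lam of A.
    n = A.shape[0]
    return all(
        np.linalg.matrix_rank(np.vstack([lam * np.eye(n) - A, C]), tol) == n
        for lam in np.linalg.eigvals(A)
    )

A = np.diag([1.0, 2.0, -3.0])
C_good = np.array([[1.0, 1.0, 1.0]])  # annihilates no eigenvector of A
C_bad = np.array([[1.0, 0.0, 1.0]])   # kills the eigenvector for lam = 2
```

Here `C_good` yields an observable pair, while `C_bad` does not, since the second standard basis vector is an eigenvector of $A$ lying in the kernel of `C_bad`.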
A direct calculation shows that
$$\Theta_1(\lambda)^*\Theta_1(\lambda) - I_p = (\lambda + \bar\lambda)\,C_1(\bar\lambda I_k + A_+)^{-1}P_{11}^{-1}(\lambda I_k + A_+^*)^{-1}C_1^*$$
and hence
$$\Theta_1(\lambda)^*\Theta_1(\lambda) = I_p \ \text{on}\ i\mathbb{R} \quad\text{and}\quad \Theta_1(\lambda)^*\Theta_1(\lambda) \le I_p \ \text{in}\ \Pi_+.$$
A similar calculation shows that, if $Q = -N^{-1}$, then
$$\Theta_2(\lambda)^*\Theta_2(\lambda) - I_p = -(\lambda + \bar\lambda)\,\mathring{C}Q^{-1}(\bar\lambda I_{n-k} - A_-^*)^{-1}Q(\lambda I_{n-k} - A_-)^{-1}Q^{-1}\mathring{C}^*$$
$$\qquad\qquad +\ \mathring{C}Q^{-1}(\bar\lambda I_{n-k} - A_-^*)^{-1}M(\lambda I_{n-k} - A_-)^{-1}Q^{-1}\mathring{C}^*$$
with $M = A_-^*Q + QA_- + \mathring{C}^*\mathring{C}$. Now, multiplying equation (2.3) on the right by $P_{11}^{-1}P_{12}$ and subtracting it from equation (2.5) yields the equation
$$0 = A_-^*(P_{22} - P_{21}P_{11}^{-1}P_{12}) + P_{22}A_- - P_{21}A_+P_{11}^{-1}P_{12} + C_2^*C_2 - C_2^*C_1P_{11}^{-1}P_{12}$$
$$= A_-^*Q + QA_- + P_{21}P_{11}^{-1}P_{12}A_- - P_{21}A_+P_{11}^{-1}P_{12} + C_2^*\mathring{C}.$$
Next, subtracting from this equation (2.4) multiplied on the left by $P_{21}P_{11}^{-1}$,
$$0 = A_-^*Q + QA_- - P_{21}A_+P_{11}^{-1}P_{12} - P_{21}P_{11}^{-1}A_+^*P_{12} + C_2^*\mathring{C} - P_{21}P_{11}^{-1}C_1^*C_2$$
$$= A_-^*Q + QA_- - P_{21}P_{11}^{-1}(P_{11}A_+ + A_+^*P_{11})P_{11}^{-1}P_{12} + C_2^*\mathring{C} + (\mathring{C}^* - C_2^*)C_2$$
$$= A_-^*Q + QA_- + P_{21}P_{11}^{-1}C_1^*C_1P_{11}^{-1}P_{12} - (\mathring{C} - C_2)^*(\mathring{C} - C_2) + \mathring{C}^*\mathring{C}$$
$$= A_-^*Q + QA_- + \mathring{C}^*\mathring{C} + (C_1P_{11}^{-1}P_{12})^*(C_1P_{11}^{-1}P_{12}) - (C_1P_{11}^{-1}P_{12})^*(C_1P_{11}^{-1}P_{12})$$
$$= A_-^*Q + QA_- + \mathring{C}^*\mathring{C} = M.$$
Therefore,
$$\Theta_2(\lambda)^*\Theta_2(\lambda) - I_p = -(\lambda + \bar\lambda)\,\mathring{C}Q^{-1}(\bar\lambda I_{n-k} - A_-^*)^{-1}Q(\lambda I_{n-k} - A_-)^{-1}Q^{-1}\mathring{C}^*$$
and hence
$$\Theta_2(\lambda)^*\Theta_2(\lambda) = I_p \ \text{on}\ i\mathbb{R} \quad\text{and}\quad \Theta_2(\lambda)^*\Theta_2(\lambda) \le I_p \ \text{in}\ \Pi_+.$$
As $\sigma(A_1), \sigma(A_-) \subset \Pi_-$, the rmvf's $\Theta_1$ and $\Theta_2$ are holomorphic in $\Pi_+$ and hence $\Theta_1$ and $\Theta_2$ are inner with respect to $\Pi_+$. Therefore, as all inner rmvf's are finite Blaschke–Potapov products, it follows that $\Theta_1$ and $\Theta_2$ are finite Blaschke–Potapov products.
Next, (5) is verified by establishing the observability and controllability of four pairs of matrices. This will be done in four steps.
5.1. The pair $(C_1, A_1)$ is observable.
If there exist $\lambda \in \mathbb{C}$ and $u \in \mathbb{C}^k$ such that $C_1u = 0$ and $A_1u = \lambda u$, then $-P_{11}^{-1}A_+^*P_{11}u = \lambda u$ and equation (2.2) implies that
$$0 = A_+^*P_{11}u + P_{11}A_+u + C_1^*C_1u = -\lambda P_{11}u + P_{11}A_+u = P_{11}(A_+u - \lambda u).$$
Thus, $A_+u = \lambda u$, as $P_{11}$ is invertible, and hence the observability of the pair $(C_1, A_+)$ implies $u = 0$.
5.2. The pair $(A_1, P_{11}^{-1}C_1^*)$ is controllable.
If there exist $\lambda \in \mathbb{C}$ and $0 \ne u \in \mathbb{C}^k$ such that $C_1P_{11}^{-1}u = 0$ and $A_1^*u = \lambda u$, i.e., $-P_{11}A_+P_{11}^{-1}u = \lambda u$, then equation (2.2) implies that
$$0 = A_+^*u + P_{11}A_+P_{11}^{-1}u + C_1^*C_1P_{11}^{-1}u = A_+^*u - \lambda u.$$
Thus, $\lambda \in \sigma(A_+^*) \cap \sigma(-P_{11}A_+P_{11}^{-1}) = \sigma(A_+^*) \cap \sigma(-A_+) = \emptyset$, as $\sigma(A_+) \subset \Pi_+$, a contradiction. Thus $u = 0$, i.e., the pair $(C_1P_{11}^{-1}, A_1^*)$ is observable.
5.3. The pair $(\mathring{C}, A_-)$ is observable.
If there exist $\lambda \in \mathbb{C}$ and a vector $u \in \mathbb{C}^{n-k}$ such that $\mathring{C}u = 0$ and $A_-u = \lambda u$, then $C_2u = C_1P_{11}^{-1}P_{12}u$ and equations (2.2) and (2.4) imply that
$$0 = \lambda P_{12}u + A_+^*P_{12}u + C_1^*C_1P_{11}^{-1}P_{12}u$$
$$= \big(\lambda I_k + A_+^* - (A_+^*P_{11} + P_{11}A_+)P_{11}^{-1}\big)P_{12}u$$
$$= (\lambda I_k - P_{11}A_+P_{11}^{-1})P_{12}u.$$
If $P_{12}u = 0$, then $C_2u = 0$ and thus the observability of the pair $(C_2, A_-)$ implies $u = 0$. If $P_{12}u \ne 0$, then $\lambda \in \sigma(P_{11}A_+P_{11}^{-1}) = \sigma(A_+)$. Then $\lambda \in \sigma(A_+) \cap \sigma(A_-) \subseteq \Pi_+ \cap \Pi_- = \emptyset$, a contradiction, and so $u = 0$.
5.4. The pair $(A_-, N\mathring{C}^*)$ is controllable.
The equation $A_-^*Q + QA_- + \mathring{C}^*\mathring{C} = 0$, with $N = -Q^{-1}$, implies
$$-NA_-^* - A_-N + N\mathring{C}^*\mathring{C}N = 0. \tag{4.10}$$
If there exist $\lambda \in \mathbb{C}$ and $u \in \mathbb{C}^{n-k}$ such that $\mathring{C}Nu = 0$ and $A_-^*u = \lambda u$, then, as follows from (4.10),
$$0 = -NA_-^*u - A_-Nu + N\mathring{C}^*\mathring{C}Nu = -\lambda Nu - A_-Nu = -(A_- + \lambda I_{n-k})Nu.$$
Thus, if $Nu \ne 0$, then $\lambda \in \sigma(-A_-)$. Therefore $\lambda \in \sigma(-A_-) \cap \sigma(A_-^*) = \emptyset$, as $\sigma(A_-) \subset \Pi_-$, and so $u = 0$ and the pair $(\mathring{C}N, A_-^*)$ is observable.
Finally, since the realization of $\Theta$ is also minimal and
$$\deg\Theta = \deg\Theta_1^{-1} + \deg\Theta_2,$$
the factorization (4.7) is left coprime. $\square$
Remark 4.2. The factorization (4.7) is a special case of a general theorem by Krein and Langer (see, e.g., [7]) which states that every $\Theta \in \mathcal{S}_\kappa^{p\times p}(\Pi_+)$ admits a factorization
$$\Theta = \Theta_1^{-1}\Theta_2,$$
where $\Theta_1$ is a Blaschke–Potapov product of degree $\kappa$, $\Theta_2$ is in the Schur class $\mathcal{S}^{p\times p}(\Pi_+)$ and $\ker\Theta_2(\lambda)^* \cap \ker\Theta_1(\lambda)^* = \{0\}$.
5. From the RKKS to the inertia theorem

Theorem 5.1. Let $A \in \mathbb{C}^{n\times n}$ and $C \in \mathbb{C}^{p\times n}$. If $\sigma(A) \cap i\mathbb{R} = \emptyset$ (i.e., $\mathcal{E}_0(A) = 0$), the pair $(C,A)$ is observable, $F(\lambda)$ is the $p\times n$ rmvf defined by the realization formula
$$F(\lambda) = C(\lambda I_n - A)^{-1}, \tag{5.1}$$
$P = P^* \in \mathbb{C}^{n\times n}$ is invertible (i.e., $\mathcal{E}_0(P) = 0$) and the space
$$\mathcal{M} = \{F(\lambda)u : u \in \mathbb{C}^n\}$$
endowed with the indefinite inner product
$$\langle F(\lambda)u,\ F(\lambda)v\rangle_{\mathcal{M}} = v^*Pu \tag{5.2}$$
is an $n$-dimensional RKKS, with RK
$$K_\omega(\lambda) = \frac{I_p - \Theta(\lambda)\Theta(\omega)^*}{\rho_\omega(\lambda)}, \quad \lambda, \omega \in \mathbb{C}\setminus\sigma(A),$$
where
$$\Theta(\lambda) = I_p - C(\lambda I_n - A)^{-1}P^{-1}C^*, \tag{5.3}$$
then $P$ is a solution of the Lyapunov equation (1.1) and $\mathcal{E}_\pm(A) = \mathcal{E}_\mp(P)$.
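The inertia correspondence of Theorem 5.1 is easy to sanity-check numerically. The sketch below is an illustration only, not part of the proof; the sign convention assumes the Lyapunov equation (1.1) in the form $A^*P + PA + C^*C = 0$:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def inertia(H, tol=1e-9):
    # (E_plus, E_minus, E_zero) of a Hermitian matrix
    w = np.linalg.eigvalsh(H)
    return int((w > tol).sum()), int((w < -tol).sum()), int((np.abs(w) <= tol).sum())

# A with E_+(A) = 2, E_-(A) = 1, and an observable pair (C, A)
A = np.diag([1.0, 2.0, -3.0])
C = np.array([[1.0, 1.0, 1.0]])

# solve A^* P + P A = -C^* C  (scipy solves a X + X a^H = q)
P = solve_continuous_lyapunov(A.conj().T, -C.conj().T @ C)
```

For this data the theorem predicts $\mathcal{E}_+(P) = \mathcal{E}_-(A) = 1$ and $\mathcal{E}_-(P) = \mathcal{E}_+(A) = 2$.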
Proof. It is readily checked that
$$K_\omega(\lambda) = F(\lambda)P^{-1}F(\omega)^*, \quad \lambda, \omega \in \mathbb{C}\setminus\sigma(A)$$
is a RK for the space $\mathcal{M}$. Therefore, by the uniqueness of the RK,
$$F(\lambda)P^{-1}F(\omega)^* = \frac{I_p - \Theta(\lambda)\Theta(\omega)^*}{\rho_\omega(\lambda)}, \quad \lambda, \omega \in \mathbb{C}\setminus\sigma(A), \tag{5.4}$$
which implies, by a straightforward calculation, that the matrix $P$ is a solution of the Lyapunov equation (1.1).

Let $C = \begin{bmatrix} C_1 & C_2 \end{bmatrix}$ with $C_1 \in \mathbb{C}^{p\times k}$ and $P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}$ with $P_{11} \in \mathbb{C}^{k\times k}$ be conformal with the decomposition (2.1) of $A$ with $A_+ \in \mathbb{C}^{k\times k}$ and $\sigma(A_\pm) \subset \Pi_\pm$. The rest of the proof is divided into steps:
1. $P_{11}$ and $P_{22} - P_{21}P_{11}^{-1}P_{12}$ are invertible matrices.
The verification is the same as the proof in Section 3. ∎
2. If $\Theta_1$ and $\Theta_2$ are given by formulas (4.8) and (4.9), then
$$\Theta(\lambda) = \Theta_1(\lambda)^{-1}\Theta_2(\lambda).$$
Moreover, the realization formulas (4.8) and (4.9) are minimal.
The verification is the same as in steps 3 and 5 in the proof of Theorem 4.1. ∎
3. There exist $\lambda_1, \dots, \lambda_n \in \mathbb{C}\setminus\sigma(A)$ and $v_1, \dots, v_n \in \mathbb{C}^p$ such that
$$V = \begin{bmatrix} F(\lambda_1)^*\Theta_1(\lambda_1)^*v_1 & \cdots & F(\lambda_n)^*\Theta_1(\lambda_n)^*v_n \end{bmatrix}$$
is invertible.
This follows from the fact that
$$\mathcal{M}_1 = \operatorname{span}\{F(\lambda)^*u : \lambda \in \Pi_+\setminus\sigma(A),\ u \in \mathbb{C}^p\} = \mathbb{C}^n. \tag{5.5}$$
To verify the equality in (5.5), note that if there is a vector $v \in \mathbb{C}^n$ such that $v \perp \mathcal{M}_1$, then $\langle F(\lambda)^*u, v\rangle = \langle u, F(\lambda)v\rangle = 0$ for every $u \in \mathbb{C}^p$ and $\lambda \in \Pi_+\setminus\sigma(A)$, and hence $F(\lambda)v = 0$ for every $\lambda \in \Pi_+\setminus\sigma(A)$. But this implies that $v = 0$, since the pair $(C,A)$ is observable. Therefore, $v \perp \mathcal{M}_1$ if and only if $v = 0$, and thus equality holds in (5.5).

Therefore, as the matrices $\Theta_1(\lambda)$ and $\Theta_2(\lambda)$ are invertible except for a finite number of points $\lambda$ in $\mathbb{C}$, there exist $\lambda_1, \dots, \lambda_n \in \Pi_+\setminus\sigma(A)$ and $v_1, \dots, v_n \in \mathbb{C}^p$ such that the vectors
$$w_1 = F(\lambda_1)^*\Theta_1(\lambda_1)^*v_1, \ \dots, \ w_n = F(\lambda_n)^*\Theta_1(\lambda_n)^*v_n$$
are linearly independent and so $V$ is invertible. ∎
4. Let
$$K_\omega^{\Theta_j}(\lambda) = \frac{I_p - \Theta_j(\lambda)\Theta_j(\omega)^*}{\rho_\omega(\lambda)}$$
be the RK of the finite-dimensional RKKS $\mathcal{H}(\Theta_j) = H_2^p \ominus \Theta_j H_2^p$, for $j = 1, 2$. If $G_1$ and $G_2$ are the $n\times n$ matrices defined by the formulas
$$(G_1)_{ij} = v_i^*K_{\lambda_j}^{\Theta_1}(\lambda_i)v_j \quad\text{and}\quad (G_2)_{ij} = v_i^*K_{\lambda_j}^{\Theta_2}(\lambda_i)v_j, \tag{5.6}$$
then $G_1 \ge 0$, $G_2 \ge 0$ and
$$V^*P^{-1}V = G_2 - G_1. \tag{5.7}$$
For every $x \in \mathbb{C}^n$ and $j = 1, 2$,
$$\langle G_jx, x\rangle = \sum_{i,\ell=1}^n \bar{x}_i\,(G_j)_{i\ell}\,x_\ell = \sum_{i,\ell=1}^n \bar{x}_i\,v_i^*K_{\lambda_\ell}^{\Theta_j}(\lambda_i)v_\ell\,x_\ell = \Big\langle \sum_{\ell=1}^n K_{\lambda_\ell}^{\Theta_j}v_\ell x_\ell,\ \sum_{i=1}^n K_{\lambda_i}^{\Theta_j}v_i x_i \Big\rangle_{\mathcal{H}(\Theta_j)} \ge 0,$$
i.e., $G_j \ge 0$ for $j = 1, 2$. Next, equation (5.4) implies, for every $\lambda, \omega \in \mathbb{C}\setminus\sigma(A)$,
$$F(\lambda)P^{-1}F(\omega)^* = \frac{I_p - \Theta_1(\lambda)^{-1}\Theta_2(\lambda)\Theta_2(\omega)^*\Theta_1(\omega)^{-*}}{\rho_\omega(\lambda)},$$
i.e.,
$$\Theta_1(\lambda)F(\lambda)P^{-1}F(\omega)^*\Theta_1(\omega)^* = \frac{\Theta_1(\lambda)\Theta_1(\omega)^* - \Theta_2(\lambda)\Theta_2(\omega)^*}{\rho_\omega(\lambda)} \tag{5.8}$$
for every $\lambda, \omega \in \mathbb{C}\setminus\sigma(A)$. Therefore,
$$(V^*P^{-1}V)_{ij} = v_i^*\Theta_1(\lambda_i)F(\lambda_i)P^{-1}F(\lambda_j)^*\Theta_1(\lambda_j)^*v_j$$
$$= v_i^*\left[\frac{\Theta_1(\lambda_i)\Theta_1(\lambda_j)^* - \Theta_2(\lambda_i)\Theta_2(\lambda_j)^*}{\rho_{\lambda_j}(\lambda_i)}\right]v_j$$
$$= v_i^*\left[\frac{I_p - \Theta_2(\lambda_i)\Theta_2(\lambda_j)^*}{\rho_{\lambda_j}(\lambda_i)}\right]v_j - v_i^*\left[\frac{I_p - \Theta_1(\lambda_i)\Theta_1(\lambda_j)^*}{\rho_{\lambda_j}(\lambda_i)}\right]v_j$$
$$= v_i^*K_{\lambda_j}^{\Theta_2}(\lambda_i)v_j - v_i^*K_{\lambda_j}^{\Theta_1}(\lambda_i)v_j = (G_2)_{ij} - (G_1)_{ij},$$
and thus (5.7) holds. ∎
5. $\mathcal{E}_+(P) \le \operatorname{rank} G_2$ and $\mathcal{E}_-(P) \le \operatorname{rank} G_1$.
Formula (5.7) can be re-expressed as
$$V^*P^{-1}V = \begin{bmatrix} I_n & I_n \end{bmatrix}\begin{bmatrix} -G_1 & 0 \\ 0 & G_2 \end{bmatrix}\begin{bmatrix} I_n \\ I_n \end{bmatrix}$$
and since $V$ is invertible, the Sylvester law of inertia implies
$$\mathcal{E}_\pm(P) = \mathcal{E}_\pm(P^{-1}) = \mathcal{E}_\pm(V^*P^{-1}V) \le \mathcal{E}_\pm\left(\begin{bmatrix} -G_1 & 0 \\ 0 & G_2 \end{bmatrix}\right).$$
Therefore, since $G_1 \ge 0$ and $G_2 \ge 0$,
$$\mathcal{E}_+\left(\begin{bmatrix} -G_1 & 0 \\ 0 & G_2 \end{bmatrix}\right) = \operatorname{rank} G_2 \quad\text{and}\quad \mathcal{E}_-\left(\begin{bmatrix} -G_1 & 0 \\ 0 & G_2 \end{bmatrix}\right) = \operatorname{rank} G_1,$$
as claimed. ∎
6. $\operatorname{rank} G_1 \le \deg\Theta_1$ and $\operatorname{rank} G_2 \le \deg\Theta_2$.
Since $K_\omega^{\Theta_j}(\lambda) = F_j(\lambda)Q_jF_j(\omega)^*$ for $j = 1, 2$, with
$$F_1(\lambda) = C_1(\lambda I_k - A_1)^{-1}, \quad F_2(\lambda) = \mathring{C}(\lambda I_{n-k} - A_-)^{-1} \quad\text{and}\quad Q_j > 0,$$
and the exhibited realizations are minimal, the matrices
$$(G_j)_{i\ell} = v_i^*K_{\lambda_\ell}^{\Theta_j}(\lambda_i)v_\ell = v_i^*F_j(\lambda_i)Q_jF_j(\lambda_\ell)^*v_\ell$$
are of the form $G_j = W_j^*Q_jW_j$, where $W_j = \begin{bmatrix} F_j(\lambda_1)^*v_1 & \cdots & F_j(\lambda_n)^*v_n \end{bmatrix}$. Therefore,
$$\operatorname{rank} G_j = \operatorname{rank}(W_j^*Q_jW_j) \le \operatorname{rank} Q_j = \deg\Theta_j. \ ∎$$
7. $\mathcal{E}_\pm(P) = \mathcal{E}_\mp(A)$.
It is well known (see, e.g., [3]) that if the realization
$$\Theta(\lambda) = D + C(\lambda I_m - A)^{-1}B$$
is minimal, then $\deg\Theta$ is equal to the size of the matrix $A$. Thus, Step 2 guarantees that $\deg\Theta_1 = \mathcal{E}_+(A)$ and $\deg\Theta_2 = \mathcal{E}_-(A)$, and hence Steps 5 and 6 imply that $\mathcal{E}_+(P) \le \mathcal{E}_-(A)$ and $\mathcal{E}_-(P) \le \mathcal{E}_+(A)$. Therefore, as $\mathcal{E}_0(P) = \mathcal{E}_0(A) = 0$,
$$n = \mathcal{E}_+(P) + \mathcal{E}_-(P) \le \mathcal{E}_-(A) + \mathcal{E}_+(A) = n,$$
which means that $\mathcal{E}_+(P) = \mathcal{E}_-(A)$ and $\mathcal{E}_-(P) = \mathcal{E}_+(A)$. $\square$
6. A disc analog

In this section we shall sketch an analog of the preceding analysis for the Stein equation
$$P - A^*PA = C^*C. \tag{6.1}$$
The notations $\mathbb{D}$ and $\mathbb{E}$ for the open unit disc and the exterior of the closed unit disc, and $\pi_-(A)$, $\pi_0(A)$ and $\pi_+(A)$ for the sum of the algebraic multiplicities of the eigenvalues of a matrix $A \in \mathbb{C}^{n\times n}$ in $\mathbb{E}$, $\mathbb{T}$ and $\mathbb{D}$, respectively, will be needed to formulate the version of the Carlson–Schneider inertia theorem under consideration. It is equivalent to Theorem 4 on p. 453 of [8].

Theorem 6.1. If $A \in \mathbb{C}^{n\times n}$, $C \in \mathbb{C}^{p\times n}$, $(C,A)$ is an observable pair and $P = P^* \in \mathbb{C}^{n\times n}$ is a solution of the Stein equation (6.1), then
(1) $\sigma(A) \cap \mathbb{T} = \emptyset$ and $P$ is invertible, i.e., $\pi_0(A) = \mathcal{E}_0(P) = 0$,
(2) $\pi_+(A) = \mathcal{E}_+(P)$ and $\pi_-(A) = \mathcal{E}_-(P)$.
This section is organized as follows: In Subsection 6.1 we present some preliminary facts and notation for subsequent use. A short well-known proof of Theorem 6.1 based on elementary tools of linear system theory is presented in Subsection 6.2. In Subsection 6.3, Theorem 6.1 is used to establish a finite-dimensional reproducing kernel Krein space with a reproducing kernel of the special form
$$K_\omega(\lambda) = \frac{I_p - \Theta(\lambda)\Theta(\omega)^*}{\rho_\omega(\lambda)}, \tag{6.2}$$
where $\rho_\omega(\lambda) = 1 - \lambda\bar\omega$ and $\Theta$ admits a factorization $\Theta = \Theta_1^{-1}\Theta_2$, with $\Theta_1$ and $\Theta_2$ both inner rmvf's with respect to $\mathbb{D}$. Finally, a proof of Theorem 6.1 that is based on methods and formulas of reproducing kernel spaces is presented in Subsection 6.4.
6.1. Preliminaries

Lemma 6.2. If the assumptions in Theorem 6.1 are in force, then $\sigma(A) \cap \mathbb{T} = \emptyset$ and without loss of generality $A$ may be assumed to be of the form $A = A_\pm$ or
$$A = \begin{bmatrix} A_- & 0 \\ 0 & A_+ \end{bmatrix}, \quad\text{with}\ \sigma(A_-) \subset \mathbb{E}\ \text{and}\ \sigma(A_+) \subset \mathbb{D}. \tag{6.3}$$
Proof. The proof is separated into parts:
1. If $Au = \mu u$ for some $u \in \mathbb{C}^n$ and $\mu \in \mathbb{T}$, then $u^*A^* = \bar\mu u^*$ and hence (6.1) implies that
$$0 = u^*Pu - u^*A^*PAu - u^*C^*Cu = -u^*C^*Cu.$$
Thus, $CA^ju = \mu^jCu = 0$, which implies that $u = 0$, since the pair $(C,A)$ is observable. Therefore, $\sigma(A) \cap \mathbb{T} = \emptyset$.
2. As $\sigma(A) \cap \mathbb{T} = \emptyset$, $A = UJU^{-1}$, where $J$ is a Jordan matrix of the form $J = A_\pm$ or
$$J = \begin{bmatrix} A_- & 0 \\ 0 & A_+ \end{bmatrix} \quad\text{with}\ \sigma(A_-) \subset \mathbb{E}\ \text{and}\ \sigma(A_+) \subset \mathbb{D}.$$
Then $P$ is an invertible Hermitian solution of equation (6.1) if and only if $U^*PU$ is an invertible Hermitian solution of the equation
$$U^*PU - J^*(U^*PU)J = (CU)^*CU.$$
Moreover, $\sigma(J) \cap \mathbb{T} = \emptyset$, $\pi_\pm(J) = \pi_\pm(A)$, $\mathcal{E}_\pm(U^*PU) = \mathcal{E}_\pm(P)$ and the pair $(CU, J)$ is observable, since
$$\operatorname{rank}\begin{bmatrix} \lambda I_n - A \\ C \end{bmatrix} = \operatorname{rank}\left(\begin{bmatrix} U & 0 \\ 0 & I_p \end{bmatrix}\begin{bmatrix} \lambda I_n - J \\ CU \end{bmatrix}U^{-1}\right) = \operatorname{rank}\begin{bmatrix} \lambda I_n - J \\ CU \end{bmatrix}. \ \square$$
In view of the preceding lemma, we can without loss of generality assume that $A$ is of the form (6.3) and let $k := \pi_-(A)$. Just as in the half-plane case, we shall assume that $1 \le k \le n-1$. Correspondingly, let $C = \begin{bmatrix} C_1 & C_2 \end{bmatrix}$ with $C_1 \in \mathbb{C}^{p\times k}$, $C_2 \in \mathbb{C}^{p\times(n-k)}$ and $P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}$ with $P_{11} \in \mathbb{C}^{k\times k}$, $P_{22} \in \mathbb{C}^{(n-k)\times(n-k)}$ be the decompositions of the matrices $C$ and $P$ that are conformal with that of $A$. Then the Stein equation (6.1) is equivalent to the first, second and fourth of the following four equations:
$$P_{11} - A_-^*P_{11}A_- = C_1^*C_1, \tag{6.4}$$
$$P_{21} - A_+^*P_{21}A_- = C_2^*C_1, \tag{6.5}$$
$$P_{12} - A_-^*P_{12}A_+ = C_1^*C_2 \tag{6.6}$$
and
$$P_{22} - A_+^*P_{22}A_+ = C_2^*C_2. \tag{6.7}$$
Finally, let $\mathcal{S}_\kappa^{p\times p}(\mathbb{D})$ denote the generalized Schur class of $p\times p$ mvf's $S(\lambda)$ that are meromorphic in $\mathbb{D}$ and for which the kernel
$$\Lambda_\omega^S(\lambda) = \frac{I_p - S(\lambda)S(\omega)^*}{\rho_\omega(\lambda)}$$
has $\kappa$ negative squares on $\mathfrak{h}_S^+ \times \mathfrak{h}_S^+$, where $\mathfrak{h}_S^+$ denotes the domain of analyticity of $S$ in $\mathbb{D}$.
6.2. A quick proof of Theorem 6.1

In this section we present a variant of the well-known short proof of Theorem 6.1.
Proof. It is verified in Lemma 6.2 that $\sigma(A) \cap \mathbb{T} = \emptyset$. Next, as $\sigma(A_-) \subset \mathbb{E}$ and $\sigma(A_+) \subset \mathbb{D}$, (6.4) and (6.7) both have unique solutions:
$$P_{11} = -\sum_{j=0}^{\infty}(A_-^*)^{-j-1}C_1^*C_1A_-^{-j-1} \le 0 \quad\text{and}\quad P_{22} = \sum_{j=0}^{\infty}(A_+^*)^jC_2^*C_2A_+^j \ge 0,$$
respectively. The observability of the pair $(C,A)$ implies that the pairs $(C_1, A_-)$ and $(C_2, A_+)$ are observable, as is perhaps verified most easily by the Popov–Belevitch–Hautus test. Thus,
$$P_{11}u = 0 \implies \sum_{j=0}^{\infty}(C_1A_-^{-j-1}u)^*(C_1A_-^{-j-1}u) = 0 \implies C_1A_-^{-j-1}u = 0,\ j = 0, 1, \dots \implies u = 0,$$
i.e., $P_{11}$ is invertible, and by similar calculations it can be proved that $P_{22}$ is also invertible. Therefore, $P_{11} < 0$, $P_{21} = P_{12}^*$ and $P_{22} - P_{12}^*P_{11}^{-1}P_{12} \ge P_{22} > 0$. By Schur complements,
$$P = \begin{bmatrix} I_k & P_{11}^{-1}P_{12} \\ 0 & I_{n-k} \end{bmatrix}^*\begin{bmatrix} P_{11} & 0 \\ 0 & P_{22} - P_{12}^*P_{11}^{-1}P_{12} \end{bmatrix}\begin{bmatrix} I_k & P_{11}^{-1}P_{12} \\ 0 & I_{n-k} \end{bmatrix}$$
is invertible and thus, by the Sylvester law of inertia,
$$\mathcal{E}_-(P) = \mathcal{E}_-(P_{11}) + \mathcal{E}_-(P_{22} - P_{21}P_{11}^{-1}P_{12}) = \mathcal{E}_-(P_{11}) = k = \pi_-(A)$$
and
$$\mathcal{E}_+(P) = \mathcal{E}_+(P_{11}) + \mathcal{E}_+(P_{22} - P_{21}P_{11}^{-1}P_{12}) = \mathcal{E}_+(P_{22} - P_{21}P_{11}^{-1}P_{12}) = n - k = \pi_+(A). \ \square$$
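The convergent-series formula for $P_{11}$ used in the quick proof is easy to check directly; a small sketch that truncates the series (which converges geometrically since $\sigma(A_-) \subset \mathbb{E}$):

```python
import numpy as np

# sigma(A_minus) lies outside the closed unit disc
A_minus = np.diag([2.0, -1.5])
C1 = np.array([[1.0, 1.0]])

Ainv = np.linalg.inv(A_minus)
P11 = np.zeros((2, 2))
for j in range(200):  # ||A_minus^{-1}|| < 1, so 200 terms are plenty here
    M = np.linalg.matrix_power(Ainv, j + 1)
    P11 -= M.conj().T @ C1.conj().T @ C1 @ M
```

The truncated sum satisfies the Stein equation (6.4) to machine precision and is negative definite, consistent with $P_{11} \le 0$ and the invertibility argument above.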
6.3. From the inertia theorem to a RKKS

In this section we will establish a finite-dimensional reproducing kernel Krein space $\mathcal{M}$ with a RK of the form (6.2) and then use the inertia theorem to obtain a coprime factorization formula for the rmvf $\Theta \in \mathcal{S}_{\mathcal{E}_-(P)}^{p\times p}(\mathbb{D})$ of the form
$$\Theta = \Theta_1^{-1}\Theta_2,$$
with $\Theta_1$ and $\Theta_2$ both inner rmvf's with respect to $\mathbb{D}$ and hence Blaschke–Potapov products. Moreover, we will give explicit realization formulas for $\Theta_1$ and $\Theta_2$ and prove that they are minimal.
Theorem 6.3. If $A \in \mathbb{C}^{n\times n}$, $C \in \mathbb{C}^{p\times n}$, the pair $(C,A)$ is observable and $P = P^* \in \mathbb{C}^{n\times n}$ is a solution of the Stein equation (6.1), then:

(1) $\sigma(A) \cap \mathbb{T} = \emptyset$ and $P$ is invertible, i.e., $\pi_0(A) = \mathcal{E}_0(P) = 0$.

(2) The space
$$\mathcal{M} = \{F(\lambda)u : u \in \mathbb{C}^n\} \tag{6.8}$$
with
$$F(\lambda) = C(I_n - \lambda A)^{-1}, \quad \lambda^{-1} \in \mathbb{C}\setminus\sigma(A), \tag{6.9}$$
and indefinite inner product
$$\langle F(\lambda)u,\ F(\lambda)v\rangle_{\mathcal{M}} = v^*Pu, \tag{6.10}$$
is an $n$-dimensional RKKS with RK
$$K_\omega(\lambda) = F(\lambda)P^{-1}F(\omega)^*, \quad \lambda^{-1}, \omega^{-1} \in \mathbb{C}\setminus\sigma(A). \tag{6.11}$$

(3) The RK may be expressed in the form
$$K_\omega(\lambda) = \frac{I_p - \Theta(\lambda)\Theta(\omega)^*}{\rho_\omega(\lambda)}, \quad \lambda^{-1}, \omega^{-1} \in \mathbb{C}\setminus\sigma(A), \tag{6.12}$$
where $\mu \in \mathbb{T}$ and
$$\Theta(\lambda) = I_p - \rho_\mu(\lambda)C(I_n - \lambda A)^{-1}P^{-1}(I_n - \bar\mu A^*)^{-1}C^*. \tag{6.13}$$
The matrices $P_{11}$ and $P_{22} - P_{21}P_{11}^{-1}P_{12}$ are invertible and the rmvf $\Theta(\lambda)$ admits the factorization
$$\Theta(\lambda) = \Theta_1(\lambda)^{-1}\Theta_2(\lambda), \tag{6.14}$$
with $A$ of the form (6.3),
$$\Theta_1(\lambda) = I_p - (\mu - \lambda)C_1(I_k - \lambda A_-)^{-1}P_{11}^{-1}(\mu I_k - A_-^*)^{-1}C_1^*, \tag{6.15}$$
$$\Theta_2(\lambda) = I_p - \rho_\mu(\lambda)F(\lambda)V_1(I_{n-k} - \mu A_+)(I_{n-k} - \lambda A_+)^{-1}V_2V_1^*F(\mu)^*, \tag{6.16}$$
$$V_1 = \begin{bmatrix} -P_{11}^{-1}P_{12} \\ I_{n-k} \end{bmatrix} \quad\text{and}\quad V_2 = (P_{22} - P_{21}P_{11}^{-1}P_{12})^{-1}.$$

(4) The rmvf's $\Theta_1$ and $\Theta_2$ are finite Blaschke–Potapov products that are both inner with respect to $\mathbb{D}$ and are left coprime.

(5) The realizations (6.15) and (6.16) are minimal.
Proof. Assertion (1) is verified in Lemma 6.2. Next, since $\mathcal{M}$ is a RKKS with RK that is given by formula (6.11) and $(C,A)$ is an observable pair, the columns of the rmvf $F$ are linearly independent. Therefore, (2) holds.

Formula (6.12) in (3) may be verified by a straightforward calculation that uses the Stein equation (6.1). Moreover, it is easily seen that $P_{11}$ is invertible (just as in the proof in Section 6.2) and hence that $P_{22} - P_{21}P_{11}^{-1}P_{12}$ is also invertible (by the Schur complements formula for $P$), so $V_1$ and $V_2$ are well defined. Next, a lengthy calculation that takes advantage of the formulas (6.13), (6.15) and a resolvent identity for $(\mu I_n - A^*)^{-1}C^*C(I_n - \lambda A)^{-1}$ that follows from the Stein equation (6.1) serves to verify that $\Theta_1(\lambda)\Theta(\lambda) = \Theta_2(\lambda)$ and hence that (6.14) holds.

The observability of the pairs $(C_1, A_-)$ and $(C_2, A_+)$ is inherited from the observability of the pair $(C,A)$ (as is most easily seen by the Popov–Belevitch–Hautus test). Therefore, successive applications of Theorem 6.1 to equation (6.4) for the $k\times k$ matrix $P_{11}$ and to equation (6.7) for the $(n-k)\times(n-k)$ matrix $P_{22}$ yield the implications
$$\pi_\pm(A_-) = \mathcal{E}_\pm(P_{11}) \implies \mathcal{E}_-(P_{11}) = k \implies P_{11} < 0$$
and
$$\pi_\pm(A_+) = \mathcal{E}_\pm(P_{22}) \implies \mathcal{E}_+(P_{22}) = n - k \implies P_{22} > 0.$$
Thus, as $P_{11} < 0$, $P_{22} > 0$ and $P_{12} = P_{21}^*$,
$$V_2 = (P_{22} - P_{21}P_{11}^{-1}P_{12})^{-1} > 0.$$
A direct calculation shows that
$$I_p - \Theta_1(\lambda)\Theta_1(\lambda)^* = -(1 - |\lambda|^2)\,\Phi_1P_{11}\Phi_1^*, \tag{6.17}$$
where $\Phi_1 = C_1(I_k - \lambda A_-)^{-1}P_{11}^{-1}(\mu I_k - A_-^*)^{-1}(I_k - \bar\mu A_-^*)$. A similar calculation, based on the fact that
$$K := P - P\begin{bmatrix} P_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}P = \begin{bmatrix} 0 & 0 \\ 0 & V_2^{-1} \end{bmatrix} \tag{6.18}$$
is a solution of the Riccati equation
$$K - A^*KA = (I_n - \bar\mu A^*)KP^{-1}F(\mu)^*F(\mu)P^{-1}K(I_n - \mu A), \tag{6.19}$$
shows that
$$I_p - \Theta_2(\lambda)^*\Theta_2(\lambda) = (1 - |\lambda|^2)\,\Phi_2K\Phi_2^*, \tag{6.20}$$
where $\Phi_2 = F(\lambda)P^{-1}(I_n - \bar\mu A^*)^{-1}(I_n - \bar\lambda A^*)$. Since $\sigma(A_+) \subset \mathbb{D}$ and $\sigma(A_-^*) \subset \mathbb{E}$, the rmvf's $\Theta_1$ and $\Theta_2$ are holomorphic in $\mathbb{D}$, and as $P_{11} < 0$ and $K \ge 0$, it follows from formulas (6.17) and (6.20) that the rmvf's $\Theta_1$ and $\Theta_2$ are inner with respect to $\mathbb{D}$. Therefore, as all inner rmvf's are finite Blaschke–Potapov products, $\Theta_1$ and $\Theta_2$ are finite Blaschke–Potapov products.
Next, since observability and controllability for realizations of the form (6.13), and in particular for the realizations (6.15) and (6.16), may also be verified by applying the Popov–Belevitch–Hautus test to the pairs $(C,A)$ and $(A,B)$, with $B = P^{-1}(I_n - \bar\mu A^*)^{-1}C^*$, it suffices to check that
$$\operatorname{rank}\begin{bmatrix} \lambda I_n - A \\ C \end{bmatrix} = n \quad\text{and}\quad \operatorname{rank}\begin{bmatrix} \lambda I_n - A & B \end{bmatrix} = n$$
for all points $\lambda \in \mathbb{C}$ (see, e.g., Theorem 3.5 in [1], where realizations of this form are discussed). This will be done in four steps.
6.1. The pair $(C_1(I_k - \mu A_-)^{-1}P_{11}^{-1}, A_-^*)$ is observable.
If there exist $u \ne 0$ in $\mathbb{C}^k$ and $\lambda \in \mathbb{C}$ such that $C_1(I_k - \mu A_-)^{-1}P_{11}^{-1}u = 0$ and $A_-^*u = \lambda u$, then
$$0 = C_1^*C_1(I_k - \mu A_-)^{-1}P_{11}^{-1}u$$
$$= [P_{11}(I_k - \mu A_-) + (\mu I_k - A_-^*)P_{11}A_-](I_k - \mu A_-)^{-1}P_{11}^{-1}u$$
$$= (\mu I_k - A_-^*)[(\mu - \lambda)^{-1}I_k + P_{11}A_-(I_k - \mu A_-)^{-1}P_{11}^{-1}]u$$
$$= (\mu - \lambda)^{-1}(\mu I_k - A_-^*)[P_{11}(I_k - \mu A_-) + (\mu - \lambda)P_{11}A_-](I_k - \mu A_-)^{-1}P_{11}^{-1}u$$
$$= (\mu - \lambda)^{-1}(\mu I_k - A_-^*)P_{11}(I_k - \lambda A_-)(I_k - \mu A_-)^{-1}P_{11}^{-1}u,$$
which, as $(\mu I_k - A_-^*)P_{11}$ is invertible and $u \ne 0$, implies that $\lambda \in \mathbb{D}$. On the other hand, $\lambda \in \sigma(A_-^*) \implies \lambda \in \mathbb{E}$, which is a contradiction.
6.2. The pair $(A_-^*, C_1^*)$ is controllable.
This follows immediately from the observability of the pair $(C_1, A_-)$.
6.3. The pair $(F(\mu)V_1(I_{n-k} - \mu A_+), A_+)$ is observable.
If there exist $q \ne 0$ in $\mathbb{C}^{n-k}$ and $\lambda \in \mathbb{C}$ such that $F(\mu)V_1(I_{n-k} - \mu A_+)q = 0$ and $A_+q = \lambda q$, then
$$0 = F(\mu)\begin{bmatrix} -P_{11}^{-1}P_{12} \\ I_{n-k} \end{bmatrix}(1 - \mu\lambda)q$$
$$= (1 - \mu\lambda)[-C_1(I_k - \mu A_-)^{-1}P_{11}^{-1}P_{12} + C_2(I_{n-k} - \mu A_+)^{-1}]q.$$
Multiplying through by $C_1^*$ gives
$$0 = (1 - \mu\lambda)[-C_1^*C_1(I_k - \mu A_-)^{-1}P_{11}^{-1}P_{12} + C_1^*C_2(1 - \mu\lambda)^{-1}]q$$
$$= (\mu\lambda - 1)C_1^*C_1(I_k - \mu A_-)^{-1}P_{11}^{-1}P_{12}q + C_1^*C_2q.$$
The equations (6.4) and (6.6) imply that
$$C_1^*C_1 = (I_k - \bar\mu A_-^*)P_{11} + \bar\mu A_-^*P_{11}(I_k - \mu A_-)$$
and
$$C_1^*C_2q = (I_k - \lambda A_-^*)P_{12}q,$$
and hence
$$0 = [(\mu\lambda - 1)(I_k - \bar\mu A_-^*)P_{11}(I_k - \mu A_-)^{-1}P_{11}^{-1} + (\mu\lambda - 1)\bar\mu A_-^* + (I_k - \lambda A_-^*)]P_{12}q$$
$$= [(\mu\lambda - 1)(I_k - \bar\mu A_-^*)P_{11}(I_k - \mu A_-)^{-1}P_{11}^{-1} + (I_k - \bar\mu A_-^*)]P_{12}q$$
$$= (I_k - \bar\mu A_-^*)P_{11}[(\mu\lambda - 1)I_k + I_k - \mu A_-](I_k - \mu A_-)^{-1}P_{11}^{-1}P_{12}q$$
$$= \mu(I_k - \bar\mu A_-^*)P_{11}(\lambda I_k - A_-)(I_k - \mu A_-)^{-1}P_{11}^{-1}P_{12}q.$$
Therefore, since $\mu \in \mathbb{T}$, $\lambda \in \mathbb{D}$ and $P_{11}$ is invertible, $P_{12}q = 0$, and then by the preceding calculation
$$C_2(I_{n-k} - \mu A_+)^{-1}q = 0 \implies C_2(1 - \mu\lambda)^{-1}q = 0 \implies C_2q = 0.$$
Finally, $A_+q = \lambda q$ and $C_2q = 0$, together with the observability of $(C_2, A_+)$, lead to a contradiction.
6.4. The pair $(A_+, V_2V_1^*F(\mu)^*)$ is controllable.
We shall prove that the pair $(F(\mu)V_1V_2, A_+^*)$ is observable: If there exist nonzero $q \in \mathbb{C}^{n-k}$ and $\lambda \in \mathbb{C}$ such that $A_+^*q = \lambda q$ and $F(\mu)V_1V_2q = 0$, then, by (6.19),
$$V_1^*F(\mu)^*F(\mu)V_1 = \begin{bmatrix} 0 & I_{n-k} \end{bmatrix}KP^{-1}F(\mu)^*F(\mu)P^{-1}K\begin{bmatrix} 0 \\ I_{n-k} \end{bmatrix}$$
$$= \begin{bmatrix} 0 & I_{n-k} \end{bmatrix}(I_n - \bar\mu A^*)^{-1}(K - A^*KA)(I_n - \mu A)^{-1}\begin{bmatrix} 0 \\ I_{n-k} \end{bmatrix}$$
$$= (I_{n-k} - \bar\mu A_+^*)^{-1}(V_2^{-1} - A_+^*V_2^{-1}A_+)(I_{n-k} - \mu A_+)^{-1}$$
$$= V_2^{-1}(I_{n-k} - \mu A_+)^{-1} + (I_{n-k} - \bar\mu A_+^*)^{-1}\bar\mu A_+^*V_2^{-1},$$
and hence
$$0 = q^*V_2V_1^*F(\mu)^*F(\mu)V_1V_2q$$
$$= q^*(I_{n-k} - \mu A_+)^{-1}V_2q + q^*V_2(I_{n-k} - \bar\mu A_+^*)^{-1}\bar\mu A_+^*q$$
$$= q^*(1 - \mu\bar\lambda)^{-1}V_2q + q^*V_2(1 - \bar\mu\lambda)^{-1}\bar\mu\lambda q$$
$$= \left(\frac{(1 - \bar\mu\lambda) + \bar\mu\lambda(1 - \mu\bar\lambda)}{|1 - \mu\bar\lambda|^2}\right)q^*V_2q = \left(\frac{1 - |\lambda|^2}{|1 - \mu\bar\lambda|^2}\right)q^*V_2q.$$
Therefore, as $\lambda \notin \mathbb{T}$ and $V_2 > 0$, we have $q = 0$, which is a contradiction.
Finally, since the realization of $\Theta$ is also minimal and
$$\deg\Theta = \deg\Theta_1^{-1} + \deg\Theta_2,$$
the factorization (6.14) is left coprime. $\square$
6.4. From the RKKS to the inertia theorem

Theorem 6.4. Let $A \in \mathbb{C}^{n\times n}$ and $C \in \mathbb{C}^{p\times n}$. If $\sigma(A) \cap \mathbb{T} = \emptyset$ (i.e., $\pi_0(A) = 0$), the pair $(C,A)$ is observable, $F(\lambda)$ is the $p\times n$ rmvf defined by the realization formula
$$F(\lambda) = C(I_n - \lambda A)^{-1}, \tag{6.21}$$
$P = P^* \in \mathbb{C}^{n\times n}$ is invertible (i.e., $\mathcal{E}_0(P) = 0$) and the space
$$\mathcal{M} = \{F(\lambda)u : u \in \mathbb{C}^n\}$$
endowed with the indefinite inner product
$$\langle F(\lambda)u,\ F(\lambda)v\rangle_{\mathcal{M}} = v^*Pu \tag{6.22}$$
is an $n$-dimensional RKKS, with RK
$$K_\omega(\lambda) = \frac{I_p - \Theta(\lambda)\Theta(\omega)^*}{\rho_\omega(\lambda)}, \quad \lambda^{-1}, \omega^{-1} \in \mathbb{C}\setminus\sigma(A),$$
where
$$\Theta(\lambda) = I_p - \rho_\mu(\lambda)C(I_n - \lambda A)^{-1}P^{-1}(I_n - \bar\mu A^*)^{-1}C^*, \tag{6.23}$$
then $P$ is a solution of the Stein equation (6.1) and $\pi_\pm(A) = \mathcal{E}_\pm(P)$.

Proof. It is readily checked that
$$K_\omega(\lambda) = F(\lambda)P^{-1}F(\omega)^*, \quad \lambda^{-1}, \omega^{-1} \in \mathbb{C}\setminus\sigma(A)$$
is a RK for the space $\mathcal{M}$. Therefore, by uniqueness of the RK,
$$F(\lambda)P^{-1}F(\omega)^* = \frac{I_p - \Theta(\lambda)\Theta(\omega)^*}{\rho_\omega(\lambda)}, \quad \lambda^{-1}, \omega^{-1} \in \mathbb{C}\setminus\sigma(A), \tag{6.24}$$
which implies, by a straightforward calculation, that the matrix $P$ is a solution of the Stein equation (6.1).

Let $C = \begin{bmatrix} C_1 & C_2 \end{bmatrix}$ with $C_1 \in \mathbb{C}^{p\times k}$ and $P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}$ with $P_{11} \in \mathbb{C}^{k\times k}$ be conformal with the decomposition (6.3) of $A$ with $A_- \in \mathbb{C}^{k\times k}$.

The rest of the proof is divided into steps:
1. $P_{11}$ and $P_{22} - P_{21}P_{11}^{-1}P_{12}$ are invertible matrices.
The verification is the same as the proof in Section 6.2. ∎
2. If $\Theta_1$ and $\Theta_2$ are given by formulas (6.15) and (6.16), then
$$\Theta(\lambda) = \Theta_1(\lambda)^{-1}\Theta_2(\lambda).$$
Moreover, the realization formulas (6.15) and (6.16) are minimal.
The verification is the same as in steps 3 and 5 in the proof of Theorem 6.3. ∎
3. Repeat steps 3 to 6 of Theorem 5.1. They are applicable to this setting, with only minor changes in the proof. ∎
4. $\mathcal{E}_\pm(P) = \pi_\pm(A)$.
It is well known (see, e.g., [1]) that if the realization
$$\Theta(\lambda) = D + \rho_\mu(\lambda)C(\lambda I - A)^{-1}B \quad \big(\text{or } \Theta(\lambda) = D + \rho_\mu(\lambda)C(I - \lambda A)^{-1}B\big)$$
is minimal, then $\deg\Theta$ is equal to the size of the matrix $A$. Thus, Step 2 guarantees that $\deg\Theta_1 = \pi_-(A)$ and $\deg\Theta_2 = \pi_+(A)$, and hence Steps 5 and 6 of Theorem 5.1 imply that $\mathcal{E}_+(P) \le \pi_+(A)$ and $\mathcal{E}_-(P) \le \pi_-(A)$. Therefore, as $\mathcal{E}_0(P) = \pi_0(A) = 0$,
$$n = \mathcal{E}_+(P) + \mathcal{E}_-(P) \le \pi_+(A) + \pi_-(A) = n,$$
which means that $\mathcal{E}_+(P) = \pi_+(A)$ and $\mathcal{E}_-(P) = \pi_-(A)$. $\square$
References

[1] D. Alpay and H. Dym, On a new class of realization formulas and their application, Linear Algebra and its Applications 241–243 (1996), 3–84.
[2] C.T. Chen, A generalization of the inertia theorem, SIAM J. Appl. Math. 25 (1973), 158–161.
[3] H. Dym, Linear Algebra in Action, Graduate Studies in Mathematics 78, American Mathematical Society, Providence, 2007.
[4] V. Derkach and H. Dym, On linear fractional transformations associated with generalized J-inner matrix functions, Integral Equations and Operator Theory 65 (2009), 1–50.
[5] H. Dym and D. Volok, Zero distribution of matrix polynomials, Linear Algebra and its Applications 425 (2007), 714–738.
[6] I. Gohberg, P. Lancaster and L. Rodman, Matrix Polynomials, Academic Press, New York–London, 1982.
[7] M.G. Krein and H. Langer, Über die verallgemeinerten Resolventen und die charakteristische Funktion eines isometrischen Operators im Raume $\Pi_\kappa$, Hilbert Space Operators and Operator Algebras (Proc. Intern. Conf., Tihany, 1970), Colloq. Math. Soc. János Bolyai 5 (1972), 353–399.
[8] P. Lancaster and M. Tismenetsky, The Theory of Matrices, second edition with applications, Computer Science and Applied Mathematics, Academic Press, New York, 1985.
[9] L. Lerer and M. Tismenetsky, The Bezoutian and the eigenvalue-separation problem for matrix polynomials, Integral Equations and Operator Theory 5 (1982), 386–445.
[10] H.K. Wimmer, Inertia theorems for matrices, controllability, and linear vibrations, Linear Algebra and its Applications 8 (1974), 337–343.
Harry Dym and Motke Porat
Department of Mathematics
The Weizmann Institute of Science
Rehovot 76100, Israel
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 127–144
© 2013 Springer Basel

On the Kernel and Cokernel of Some Toeplitz Operators

Torsten Ehrhardt and Ilya M. Spitkovsky

To Professor Leonia Lerer, in celebration of his seventieth birthday.

Abstract. We show that the kernel and/or cokernel of a block Toeplitz operator $T(G)$ are trivial if its matrix-valued symbol $G$ satisfies the condition $G(t^{-1})G(t)^* = I_n$. As a consequence, the Wiener–Hopf factorization of $G$ (provided it exists) must be canonical. Our setting is that of weighted Hardy spaces on the unit circle. We extend our result to Toeplitz operators on weighted Hardy spaces on the real line, and also Toeplitz operators on weighted sequence spaces.

Mathematics Subject Classification (2010). Primary 47B35. Secondary 47A68, 47B30.

Keywords. Toeplitz operators, Wiener–Hopf factorization, partial indices, discrete convolution operators.
1. Introduction

The Wiener algebra $W$ by definition consists of functions $f$ defined on the unit circle $\mathbb{T} = \{z \in \mathbb{C} : |z| = 1\}$ and having absolutely convergent Fourier series:
$$f(t) = \sum_{n=-\infty}^{\infty} f_nt^n, \quad\text{where}\quad \sum_{n=-\infty}^{\infty} |f_n| < \infty. \tag{1.1}$$
The functions $f \in W$ with $f_n = 0$ for all $n < 0$ (resp., $n > 0$) form the subalgebra $W_\pm$ of $W$ the elements of which admit analytic continuations to the interior (resp., exterior, including the point at infinity) of $\mathbb{T}$. It is a classical result by Gohberg and Krein [9] (see also the monographs [7, 12] and a survey [8] for more detailed bibliographical information and far-reaching generalizations) that any invertible matrix function¹ $G \in W^{n\times n}$ admits a representation
$$G(t) = G_-(t)\Lambda(t)G_+(t), \quad t \in \mathbb{T}, \tag{1.2}$$
where $G_+^{\pm1} \in W_+^{n\times n}$, $G_-^{\pm1} \in W_-^{n\times n}$,
$$\Lambda(t) = \operatorname{diag}[t^{\varkappa_1}, \dots, t^{\varkappa_n}] \tag{1.3}$$
and $\varkappa_1, \dots, \varkappa_n \in \mathbb{Z}$. Representation (1.2) plays a central role in a variety of applications, including systems of convolution type equations on the half-line and Toeplitz operators $T(G)$ with matrix symbols $G$. In particular, the defect numbers (i.e., the dimensions of the kernel and cokernel) of these operators are expressed in terms of the partial indices $\varkappa_j$. Namely,
$$\dim\ker T(G) = -\sum_{\varkappa_j \le 0} \varkappa_j, \qquad \dim\ker T(G)^* = \sum_{\varkappa_j \ge 0} \varkappa_j. \tag{1.4}$$
Note that the partial indices are defined by $G$ uniquely, up to their order. However, for $n > 1$ the partial indices, and the factorization itself, are generically not stable: a necessary and sufficient stability criterion (also going back to [9]) reads
$$\max\{\varkappa_j - \varkappa_k : j, k = 1, \dots, n\} \le 1, \tag{1.5}$$
and thus requires a priori knowledge of the partial indices. The latter is available in some particular cases, e.g., for rational, triangular, or sectorial matrix functions, see, e.g., [8], but in general the problem remains open.
One recent result in this direction, obtained by Voronin² [13], claims that all the partial indices are equal to zero (and thus (1.5) holds) for matrix functions $G$ satisfying
$$G(t^{-1})G(t)^* = I_n, \quad t \in \mathbb{T}. \tag{1.6}$$
The proof, published in [14], makes use of the description of all factorizations (1.2) of a given matrix function $G \in W^{n\times n}$. In this paper, we propose a different approach, which provides information about the defect numbers of $T(G)$ with $G$ satisfying (1.6) but not necessarily lying in $W^{n\times n}$ and, moreover, not necessarily factorable. Namely, we will show that (under certain mild additional conditions on the spaces where the operators act) at least one of the defect numbers is zero. Note that according to Coburn's lemma this property holds for general Toeplitz operators with scalar non-zero symbols, but fails starting with $n = 2$. Additional conditions on the matrix symbol $G$ under which the property persists are of great interest. Some such conditions, analytic in nature and thus very different from (1.6), were established in [6].
In Section 3 we consider $T(G)$ with measurable bounded symbols $G$ acting on weighted Hardy spaces on the unit circle or the real line. Section 4 deals with operators acting on weighted discrete $\ell^p$ spaces. Dealing with weighted spaces presents an

¹ Here and below we are using the standard notational convention: given any set $X$, $X^{m\times n}$ stands for the set of all $m\times n$ matrices with the entries in $X$, and $X^{n\times1}$ is abbreviated to $X^n$.
² Voronin works with matrix functions defined on the real line $\mathbb{R}$, but the transition between $\mathbb{R}$ and $\mathbb{T}$ is obvious via an appropriate linear fractional transformation.
additional difficulty, and certain nested properties had to be established in order to overcome it. These properties are tackled in Section 2, along with the related results on the kernels of homogeneous Riemann–Hilbert problems.

We end this introduction with some basic observations about condition (1.6). Firstly, the set of matrix functions satisfying (1.6) forms a group under multiplication. Furthermore, such matrix functions can be defined arbitrarily on a half-circle (without loss of generality, say for $\operatorname{Im}(t) > 0$); the values on the complementary half-circle are then determined by (1.6) uniquely. Note also that (1.6) holds if $G$ is even (that is, $G(t^{-1}) = G(t)$) and unitary valued.
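The last observation is easy to test numerically: take a unitary-valued symbol that depends on $t$ only through $t + t^{-1}$ (hence is even) and evaluate $G(t^{-1})G(t)^*$ on sample points of $\mathbb{T}$. The specific symbol below is our own toy example, not one from the paper:

```python
import numpy as np

def G(t):
    # even (G(1/t) = G(t)) and unitary valued: a rotation whose angle
    # depends on t only through t + 1/t (which is real on the unit circle)
    theta = (t + 1 / t).real
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# largest deviation of G(1/t) G(t)^* from I_2 over sample points of T
ts = np.exp(1j * np.linspace(0.05, 2 * np.pi - 0.05, 50))
deviation = max(
    np.abs(G(1 / t) @ G(t).conj().T - np.eye(2)).max() for t in ts
)
```

Condition (1.6) holds here exactly, so `deviation` is zero up to floating-point roundoff.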
2. Homogeneous Riemann–Hilbert problems on the unit circle

For $1 \le p < \infty$ and a given positive weight $\rho$ on the unit circle $\mathbb{T}$, let $L^p(\mathbb{T};\rho)$ denote the space of all measurable functions $f$ defined on $\mathbb{T}$ and such that
$$\|f\|_{p,\rho} := \left(\int_0^{2\pi} |f(e^{ix})\rho(e^{ix})|^p\,dx\right)^{1/p} < \infty. \tag{2.1}$$
In case $\rho \equiv 1$ we simply write $L^p(\mathbb{T})$. Throughout the paper we will assume that
$$1 < p < \infty, \qquad \frac{1}{p} + \frac{1}{q} = 1, \tag{2.2}$$
and require that the weight $\rho$ satisfies
$$\rho \in L^p(\mathbb{T}) \quad\text{and}\quad \rho^{-1} \in L^q(\mathbb{T}). \tag{2.3}$$
The conditions (2.3) imply that $L^\infty(\mathbb{T}) \subseteq L^p(\mathbb{T};\rho) \subseteq L^1(\mathbb{T})$. Therefore we can define the Fourier coefficients of a function $f \in L^p(\mathbb{T};\rho)$,
$$f_n = \frac{1}{2\pi}\int_0^{2\pi} f(e^{ix})\,e^{-inx}\,dx. \tag{2.4}$$
We also introduce the weighted Hardy spaces
$$L_+^p(\mathbb{T};\rho) = \{f \in L^p(\mathbb{T};\rho) : f_n = 0 \text{ for all } n < 0\}, \tag{2.5}$$
$$L_-^p(\mathbb{T};\rho) = \{f \in L^p(\mathbb{T};\rho) : f_n = 0 \text{ for all } n > 0\}, \tag{2.6}$$
as well as
$$L_{-,0}^p(\mathbb{T};\rho) = \{f \in L^p(\mathbb{T};\rho) : f_n = 0 \text{ for all } n \ge 0\}. \tag{2.7}$$
In order to present our result, the following notation will be handy. For a matrix or vector function $f$ defined on $\mathbb{T}$, we introduce the "tilde" operation,
$$\tilde{f}(t) = f(t^{-1}), \quad t \in \mathbb{T},$$
as well as the complex adjoint function $f^*$,
$$f^*(t) = \overline{f(t)}^T, \quad t \in \mathbb{T},$$
which is the function obtained by taking the transpose and the complex conjugate pointwise on $\mathbb{T}$. The complex adjoint and the tilde operation commute with each other. Therefore, the notation $\tilde{f}^*$ is unambiguous. We also note that
$$f \in L_\pm^p(\mathbb{T};\rho) \implies \tilde{f} \in L_\mp^p(\mathbb{T};\tilde\rho)$$
and
$$f \in L_\pm^p(\mathbb{T};\rho) \implies f^* \in L_\mp^p(\mathbb{T};\rho).$$
The main results of this section are based on sufficient conditions which state that certain spaces are nested. These conditions will be analyzed first in the following lemma. Notice that we only need the much simpler "if" parts. The proof of the "only if" parts is provided for completeness' sake.

Given (2.2), we also introduce $r \in (1, \infty]$ by
$$\frac{1}{r} + \frac{1}{\max\{p, q\}} = \frac{1}{\min\{p, q\}}.$$
Equivalently,
$$r = \begin{cases} \dfrac{pq}{|p - q|} & \text{if } p \ne q, \\[1ex] +\infty & \text{if } p = q \ (= 2). \end{cases} \tag{2.8}$$
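The two descriptions of $r$ in (2.8) agree, as a quick exact-arithmetic check confirms:

```python
from fractions import Fraction

def conjugate(p):
    # the Hoelder-conjugate exponent: 1/p + 1/q = 1
    return p / (p - 1)

checks = []
for p in [Fraction(3, 2), Fraction(4, 3), Fraction(5, 2), Fraction(7, 1)]:
    q = conjugate(p)
    r = p * q / abs(p - q)  # the p != q branch of (2.8)
    # defining relation: 1/r + 1/max{p, q} = 1/min{p, q}
    checks.append(1 / r + 1 / max(p, q) == 1 / min(p, q))
```

For example, $p = 3/2$ gives $q = 3$ and $r = 3$, and indeed $1/3 + 1/3 = 2/3 = 1/(3/2)$.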
Lemma 2.1. Let (2.2) and (2.8) hold. Then:
(a) ð¿ð(ð, ð) â ð¿ð(ð; ðâ1) if and only if ð ⥠ð and ðâ1ðâ1 â ð¿ð(ð).
(b) ð¿ð(ð; ðâ1) â ð¿ð(ð; ð) if and only if ð ⥠ð and ðð â ð¿ð(ð).
Proof. The "if" parts follow easily from Hölder's inequality using (2.8). Therefore, we restrict ourselves to the "only if" parts.

(a): The inclusion means that
$$|fw|^p \in L^1(\mathbb{T}) \;\Longrightarrow\; |f\widetilde{w}^{-1}|^q \in L^1(\mathbb{T}).$$
By the substitution $g = |fw|$ this is equivalent to
$$g^p \in L^1(\mathbb{T}) \;\Longrightarrow\; (g\, w^{-1}\widetilde{w}^{-1})^q \in L^1(\mathbb{T}).$$
In case $p \ge q$ make the substitution $h = g^q$ to conclude that
$$h \in L^{p/q}(\mathbb{T}) \;\Longrightarrow\; h\, w^{-q}\widetilde{w}^{-q} \in L^1(\mathbb{T}).$$
By the closed graph theorem, the corresponding linear operator is bounded, and this map gives rise to a bounded linear functional on $L^{p/q}(\mathbb{T})$. It follows that $w^{-q}\widetilde{w}^{-q} \in L^{p/q}(\mathbb{T})' = L^{p/(p-q)}(\mathbb{T})$, which implies the above.

In case $p < q$, we make the substitution $h = g^p$ to conclude that
$$h \in L^1(\mathbb{T}) \;\Longrightarrow\; h\, w^{-p}\widetilde{w}^{-p} \in L^{q/p}(\mathbb{T}).$$
Again by the closed graph theorem, the corresponding linear operator must be bounded. There exists $\varepsilon > 0$ such that $E_\varepsilon = \{\, t \in \mathbb{T} : w(t)^{-p}\widetilde{w}(t)^{-p} \ge \varepsilon \,\}$ has positive measure. Then for each measurable subset $E \subseteq E_\varepsilon$, when taking the characteristic function $h = \chi_E$,
$$\varepsilon\,\mu(E)^{p/q} = \|\varepsilon\chi_E\|_{q/p} \le \|\chi_E\, w^{-p}\widetilde{w}^{-p}\|_{q/p} \le C\,\|\chi_E\|_1 = C\,\mu(E).$$
Toeplitz Operators 131
Thus, $0 < \varepsilon/C \le \mu(E)^{1-p/q}$. Since we can find a sequence of measurable sets $E^{(n)} \subseteq E_\varepsilon$ with $\mu(E^{(n)}) > 0$ but $\mu(E^{(n)}) \to 0$, a contradiction follows.

Part (b) can be proved similarly, by replacing $p$ with $q$ and $w$ with $\widetilde{w}^{-1}$. $\Box$
Based on Lemma 2.1 we will now show that a certain homogeneous Riemann–Hilbert problem has only a trivial solution. A corresponding result holds also for the "adjoint" Riemann–Hilbert problem.
Proposition 2.2. Let $G$ be an $N \times N$ matrix-valued measurable function on $\mathbb{T}$ satisfying $\widetilde{G}^*(t)G(t) = I_N$, and let $w$ satisfy (2.3). Then:
(a) If $p \ge q$ and $w^{-1}\widetilde{w}^{-1} \in L^r(\mathbb{T})$, then the equation
$$G(t)\phi_+(t) = \phi_-(t) \qquad(2.9)$$
with $\phi_+ \in L^p_+(\mathbb{T};w)^N$ and $\phi_- \in L^p_{-,0}(\mathbb{T};w)^N$ has only the trivial solution $\phi_+ = \phi_- = 0$.
(b) If $q \ge p$ and $w\widetilde{w} \in L^r(\mathbb{T})$, then the equation
$$G^*(t)h_+(t) = h_-(t) \qquad(2.10)$$
with $h_+ \in L^q_+(\mathbb{T};w^{-1})^N$ and $h_- \in L^q_{-,0}(\mathbb{T};w^{-1})^N$ has only the trivial solution $h_+ = h_- = 0$.
Proof. (a): First, notice that the conditions of Lemma 2.1(a) hold. Thus,
$$L^p(\mathbb{T};w) \subseteq L^q(\mathbb{T};\widetilde{w}^{-1}), \qquad(2.11)$$
which is what we are going to use below. Now assume that (2.9) holds. Passing to the complex adjoint and applying the tilde operation yields
$$\widetilde{\phi}^*_+(t)\,\widetilde{G}^*(t) = \widetilde{\phi}^*_-(t).$$
Multiplying the equations together and using $\widetilde{G}^*(t)G(t) = I_N$ we obtain
$$\widetilde{\phi}^*_+(t)\,\phi_+(t) = \widetilde{\phi}^*_-(t)\,\phi_-(t),$$
which is a scalar function since we are multiplying a row with a column vector function. Indeed, using the components of
$$\phi_\pm(t) = \big(\phi_{\pm,1}(t), \ldots, \phi_{\pm,N}(t)\big)^T$$
the previous equation reads
$$\sum_{j=1}^N \widetilde{\phi}^*_{+,j}(t)\,\phi_{+,j}(t) = \sum_{j=1}^N \widetilde{\phi}^*_{-,j}(t)\,\phi_{-,j}(t). \qquad(2.12)$$
Because of (2.11) and Hölder's inequality, each of the occurring products is in $L^1(\mathbb{T})$. Furthermore, since $\widetilde{\phi}^*_{+,j} \in L^p_+(\mathbb{T};\widetilde{w}) \subseteq L^q_+(\mathbb{T};w^{-1})$, again by (2.11), it follows that each product $\widetilde{\phi}^*_{+,j}\phi_{+,j}$ and thus the left-hand side of (2.12) is in $L^1_+(\mathbb{T})$. For similar
reasons the right-hand side belongs to $L^1_{-,0}(\mathbb{T})$. Since $L^1_+(\mathbb{T})$ and $L^1_{-,0}(\mathbb{T})$ have a trivial intersection, we conclude that
$$\widetilde{\phi}^*_+(t)\,\phi_+(t) = \sum_{j=1}^N \widetilde{\phi}^*_{+,j}(t)\,\phi_{+,j}(t) = 0. \qquad(2.13)$$
There are now two possibilities to finish the proof, both instructive in their own ways. Firstly, one could use the Fourier coefficients $[\phi_+]_n \in \mathbb{C}^N$ of $\phi_+(t)$,
$$\phi_+(t) = \sum_{n=0}^{\infty} [\phi_+]_n\, t^n, \qquad |t| = 1,$$
noting that
$$\widetilde{\phi}^*_+(t) = \sum_{n=0}^{\infty} \overline{[\phi_+]_n}^{\,T}\, t^n, \qquad |t| = 1, \qquad(2.14)$$
involves the complex conjugates and the transpose. Thus
$$0 = (\widetilde{\phi}^*_+\phi_+)(t) = \sum_{n=0}^{\infty} t^n \sum_{k=0}^{n} \overline{[\phi_+]_k}^{\,T}\, [\phi_+]_{n-k},$$
and it follows that for each $n \ge 0$,
$$\sum_{k=0}^{n} \overline{[\phi_+]_k}^{\,T}\, [\phi_+]_{n-k} = 0.$$
Consider $n = 0$ to obtain $\overline{[\phi_+]_0}^{\,T}[\phi_+]_0 = 0$ and thus $[\phi_+]_0 = 0$. Next consider $n = 2$ to obtain $[\phi_+]_1 = 0$, then $n = 4$ to get $[\phi_+]_2 = 0$, and so on. Hence all Fourier coefficients are zero. This implies $\phi_+(t) = 0$ and thus $\phi_-(t) = 0$.
An alternative way of reasoning is to realize that both $\phi_+(z)$ and $\widetilde{\phi}^*_+(z) = \overline{\phi_+(\bar z)}^{\,T}$ admit analytic continuations into the unit disk $\{\, z \in \mathbb{C} : |z| < 1 \,\}$. In the special case of $z$ being real (and $|z| < 1$) we have $\widetilde{\phi}^*_+(z) = \overline{\phi_+(z)}^{\,T}$, and (2.13) becomes
$$0 = \overline{\phi_+(z)}^{\,T}\phi_+(z), \qquad \text{for } z \text{ real},\ |z| < 1.$$
Since the values of $\phi_+$ are vectors, this is a zero sum of non-negative real numbers. Therefore, $\phi_+(z) = 0$ for $z$ real, and by analytic continuation we obtain the same for all $|z| < 1$. As for a function in $L^1_+(\mathbb{T})$ there is a one-to-one correspondence between its analytic continuation into the unit disk and the boundary values (a.e.) on $\mathbb{T}$, we obtain that $\phi_+(t) = 0$ for $|t| = 1$ and consequently $\phi_-(t) = 0$.
(b): We remark that the assumptions and Lemma 2.1 imply that
$$L^q(\mathbb{T};w^{-1}) \subseteq L^p(\mathbb{T};\widetilde{w}). \qquad(2.15)$$
Thus the result can be obtained from part (a) by interchanging $p$ with $q$ and $w$ with $w^{-1}$. For $G^*$ the corresponding condition $\widetilde{G}(t)\,G^*(t) = I_N$ holds. $\Box$
3. Toeplitz operators on the unit circle and real line

Given a Banach space $X$, we denote by $\mathcal{B}(X)$ the space of all bounded linear operators acting on $X$. Also, we use the notation
$$\alpha(A) = \dim\operatorname{Ker}A \quad\text{and}\quad \beta(A) = \dim X/\operatorname{Im}A \ (= \dim\operatorname{Ker}A^*)$$
for $A \in \mathcal{B}(X)$. Therein $A^* \in \mathcal{B}(X')$ stands for the adjoint of $A$.

Toeplitz operators on $L^p_+(\mathbb{T};w)$. If we assume, in addition to (2.3), that the weight $w$ satisfies the Hunt–Muckenhoupt–Wheeden (or $A_p$) condition,
$$\sup_I \left(\frac{1}{|I|}\int_I w(t)^p\,|dt|\right)^{1/p} \left(\frac{1}{|I|}\int_I w(t)^{-q}\,|dt|\right)^{1/q} < \infty, \qquad(3.1)$$
where the supremum is taken over all subarcs $I$ of $\mathbb{T}$, then it is possible to consider Toeplitz operators on $L^p_+(\mathbb{T};w)$. In fact, the Hunt–Muckenhoupt–Wheeden condition is necessary and sufficient for the boundedness of the Riesz projection
$$P : \sum_{n=-\infty}^{\infty} f_n t^n \;\mapsto\; \sum_{n=0}^{\infty} f_n t^n \qquad(3.2)$$
on the space $L^p(\mathbb{T};w)$. Under these conditions the image of $P$ equals the Hardy space $L^p_+(\mathbb{T};w)$, and the kernel of $P$ equals $L^p_{-,0}(\mathbb{T};w)$; see [11] or [1] and references therein.

For $G \in L^\infty(\mathbb{T})^{N\times N}$ the block Toeplitz operator acting on $L^p_+(\mathbb{T};w)^N$ is defined by
$$T(G) : f \mapsto P(Gf), \qquad(3.3)$$
where $Gf$ stands for the pointwise product on $\mathbb{T}$ of the matrix-valued function $G$ with the vector-valued function $f$. Under the above assumptions $T(G)$ is a bounded linear operator with norm
$$\|T(G)\|_{\mathcal{B}(L^p_+(\mathbb{T};w)^N)} \le C_{p,w}\,\|G\|_{L^\infty(\mathbb{T})^{N\times N}}, \qquad(3.4)$$
where $C_{p,w} = \|P\|_{\mathcal{B}(L^p(\mathbb{T};w))}$.
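The projection (3.2) and the operator (3.3) can be made concrete, in the unweighted scalar case, with a small numeric sketch (ours, not the authors'): the Riesz projection discards the negative Fourier coefficients, and a finite section of $T(G)$ is the Toeplitz matrix built from the Fourier coefficients $G_{j-k}$.

```python
import numpy as np

def fourier_coeffs(f, n_max, n_grid=4096):
    # approximate Fourier coefficients G_n of a symbol given on |t| = 1
    x = 2 * np.pi * np.arange(n_grid) / n_grid
    c = np.fft.fft(f(np.exp(1j * x))) / n_grid
    return {n: c[n % n_grid] for n in range(-n_max, n_max + 1)}

def riesz(coeffs):
    # the Riesz projection (3.2): discard the coefficients with n < 0
    return {n: v for n, v in coeffs.items() if n >= 0}

def toeplitz_section(f, m):
    # m x m finite section of T(G); its entries are the coefficients G_{j-k}
    c = fourier_coeffs(f, m)
    return np.array([[c[j - k] for k in range(m)] for j in range(m)])

g = lambda t: 2 * t + 1 + 0.5 / t          # a trigonometric polynomial symbol
T = toeplitz_section(g, 4)
assert np.allclose(np.diag(T), 1)          # G_0 = 1 on the main diagonal
assert np.allclose(np.diag(T, -1), 2)      # G_1 = 2 below it
assert np.allclose(np.diag(T, 1), 0.5)     # G_{-1} = 0.5 above it
assert riesz({-1: 1.0, 2: 3.0}) == {2: 3.0}
```

For the symbol $g(t) = 2t + 1 + 0.5\,t^{-1}$ the recovered section is constant along diagonals, which is exactly the Toeplitz structure of $P(Gf)$ in the Fourier basis.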
Theorem 3.1. Let $G \in L^\infty(\mathbb{T})^{N\times N}$, and assume that $\widetilde{G}^*(t)G(t) = I_N$. Consider $T(G)$ on $L^p_+(\mathbb{T};w)^N$, with $w$ satisfying (3.1).
(a) If $p \ge q$ and $w^{-1}\widetilde{w}^{-1} \in L^r(\mathbb{T})$, then $\alpha(T(G)) = 0$.
(b) If $q \ge p$ and $w\widetilde{w} \in L^r(\mathbb{T})$, then $\beta(T(G)) = 0$.
Proof. (a): If the Toeplitz operator $T(G)$ has a non-trivial kernel, then there exists a nonzero $\phi_+ \in L^p_+(\mathbb{T};w)^N$ such that
$$0 = T(G)\phi_+ = P(G\phi_+),$$
which is equivalent to
$$\phi_- = G\phi_+ \in L^p_{-,0}(\mathbb{T};w)^N.$$
Now Proposition 2.2 implies $\phi_+ = 0$, which is a contradiction.
(b): The dual space of $L^p_+(\mathbb{T};w)^N$ can be identified with $L^q_+(\mathbb{T};w^{-1})^N$ via
$$\Lambda : L^q_+(\mathbb{T};w^{-1})^N \to \big(L^p_+(\mathbb{T};w)^N\big)', \qquad (\Lambda g)(f) = \int_0^{2\pi} g(e^{ix})^T f(e^{ix})\,dx.$$
Under this identification it is easy to see that the adjoint operator of $T(G)$ (acting on $L^p_+(\mathbb{T};w)^N$) is the operator $T(G^*)$ acting on $L^q_+(\mathbb{T};w^{-1})^N$. Therefore, one can rely on part (a) and just interchange $p$ with $q$ and $w$ with $w^{-1}$ to obtain the desired result. $\Box$
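The symmetry condition $\widetilde{G}^*(t)G(t) = I_N$ underlying Theorem 3.1 is easy to probe numerically in the scalar case. The check below is an illustrative sketch of ours; it uses the piecewise constant function that reappears in Example 3.3, and the fact that $t^{-1} = \bar{t}$ on the unit circle.

```python
import numpy as np

beta = 0.3                                   # an arbitrary sample value
def f(t):
    # e^{i pi beta} on the right half of the circle, e^{-i pi beta} on the left
    return np.where(t.real > 0,
                    np.exp(1j * np.pi * beta),
                    np.exp(-1j * np.pi * beta))

t = np.exp(1j * np.random.default_rng(1).uniform(0, 2 * np.pi, 200))
tilde_f_star = np.conj(f(np.conj(t)))        # (f~)*(t) = conj(f(1/t)); 1/t = conj(t) when |t| = 1
assert np.allclose(tilde_f_star * f(t), 1.0)
```

Note that the condition is not plain unimodularity: it couples the value of the function at $t$ with its value at $t^{-1}$, which is why conjugation-symmetric piecewise constants satisfy it.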
Factorization. An $N \times N$ matrix-valued measurable function $G$ defined on $\mathbb{T}$ possesses a factorization in $L^p(\mathbb{T};w)$ if it can be written in the form
$$G(t) = G_-(t)\,\Lambda(t)\,G_+(t), \qquad t \in \mathbb{T},$$
such that
$$G_+ \in L^q_+(\mathbb{T};w^{-1})^{N\times N}, \qquad G_- \in L^p_-(\mathbb{T};w)^{N\times N},$$
$$G_+^{-1} \in L^p_+(\mathbb{T};w)^{N\times N}, \qquad G_-^{-1} \in L^q_-(\mathbb{T};w^{-1})^{N\times N}, \qquad(3.5)$$
where
$$\Lambda(t) = \operatorname{diag}[\,t^{\varkappa_1}, \ldots, t^{\varkappa_N}\,],$$
and $\varkappa_1, \ldots, \varkappa_N \in \mathbb{Z}$ are called the partial indices. Notice that if $G$ possesses a factorization, then $G$ and $G^{-1}$ belong to $L^1(\mathbb{T})$.
This is a weighted version of the $L^p$-factorization as defined in [12]. Its existence guarantees that the respective homogeneous Riemann–Hilbert problem has finitely many linearly independent solutions, while the closure $M$ of the range $M_G = \{\phi_- + G\phi_+\}$ has finite codimension and, moreover, $M_G$ is "rationally closed", that is, contains all rational vector functions belonging to $M$. If $G \in L^\infty(\mathbb{T})^{N\times N}$, then the Toeplitz operator $T(G)$ is Fredholm on $L^p_+(\mathbb{T};w)^N$ if and only if $G$ is $L^p(\mathbb{T};w)$ factorable and in addition the operator $G_- P G_-^{-1}$ is bounded in the metric of $L^p(\mathbb{T};w)^N$.

In the following theorem we assume (2.2), (2.3) and (2.8); however, (3.1) need not hold.
Theorem 3.2. Let $G \in L^1(\mathbb{T})^{N\times N}$ satisfy $\widetilde{G}^*(t)G(t) = I_N$, and assume in addition that $G$ possesses a factorization in $L^p(\mathbb{T};w)$.
(a) If $p \ge q$ and $w^{-1}\widetilde{w}^{-1} \in L^r(\mathbb{T})$, then the partial indices $\varkappa_j \ge 0$.
(b) If $q \ge p$ and $w\widetilde{w} \in L^r(\mathbb{T})$, then the partial indices $\varkappa_j \le 0$.
(c) If $p = 2$ and both $w\widetilde{w}$ and $w^{-1}\widetilde{w}^{-1}$ belong to $L^\infty(\mathbb{T})$, then the factorization is canonical, i.e., $\varkappa_j = 0$ and
$$G(t) = G_-(t)\,G_+(t). \qquad(3.6)$$
(d) If the factorization is canonical and if the assumptions of (a) or (b) hold, then one can choose the factors to satisfy $\widetilde{G}^*_- G_- = \widetilde{G}^*_+ G_+ = I_N$.
Proof. (a): Assume that for some $j$, $\varkappa_j < 0$. Then define
$$\phi_+(t) = G_+^{-1}(t)\, e_j,$$
where $e_j$ is the $j$th unit vector in $\mathbb{C}^N$. Applying $G(t) = G_-(t)\Lambda(t)G_+(t)$ we obtain
$$G(t)\phi_+(t) = G_-(t)\cdot t^{\varkappa_j} e_j =: \phi_-(t).$$
It is now easy to see that $0 \not\equiv \phi_+ \in L^p_+(\mathbb{T};w)^N$ and $\phi_- \in L^p_{-,0}(\mathbb{T};w)^N$, whence we get a contradiction to Proposition 2.2(a).

(b): Passing to the complex conjugates in the factorization we see that $G^*$ possesses a factorization in $L^q(\mathbb{T};w^{-1})$,
$$G^*(t) = G_+^*(t)\,\operatorname{diag}[\,t^{-\varkappa_1}, \ldots, t^{-\varkappa_N}\,]\,G_-^*(t),$$
with partial indices $-\varkappa_1, \ldots, -\varkappa_N$. Now we can apply the results of part (a), interchanging $p$ with $q$ and replacing $w$ by $w^{-1}$, or argue similarly as in (a) and apply Proposition 2.2(b).

(c): This follows directly from (a) and (b).
(d): From the factorization $G = G_-G_+$ we obtain $\widetilde{G}^* = \widetilde{G}^*_+\widetilde{G}^*_-$ and, through inversion and using $G = (\widetilde{G}^*)^{-1}$,
$$G_-G_+ = G = (\widetilde{G}^*)^{-1} = (\widetilde{G}^*_-)^{-1}(\widetilde{G}^*_+)^{-1}.$$
Under the assumptions of (a) we have $L^p(\mathbb{T};w) \subseteq L^q(\mathbb{T};\widetilde{w}^{-1})$. Consequently,
$$\widetilde{G}^*_- G_- = (\widetilde{G}^*_+)^{-1}G_+^{-1},$$
with the left- and right-hand sides lying in $L^1_-(\mathbb{T})^{N\times N}$ and $L^1_+(\mathbb{T})^{N\times N}$, respectively (compare (3.5)). Thus, each of the products $(\widetilde{G}^*_+)^{-1}G_+^{-1}$ and $\widetilde{G}^*_- G_-$ is identically equal to some $C \in \mathbb{C}^{N\times N}$, which must be nonsingular. In particular, using the analyticity of $G_+(z)$ and $\widetilde{G}^*_+(z) = \overline{G_+(\bar z)}^{\,T}$ for $|z| < 1$, we get
$$C = (G_+(0)^*)^{-1}G_+(0)^{-1},$$
which is therefore positive definite. Letting
$$H_-(t) = G_-(t)\,G_+(0), \qquad H_+(t) = G_+(0)^{-1}G_+(t),$$
we see that $G = H_-H_+$ is also a factorization, while $\widetilde{H}^*_- H_- = \widetilde{H}^*_+ H_+ = I_N$ holds. If the assumptions stated in (b) hold, then we have $L^q(\mathbb{T};w^{-1}) \subseteq L^p(\mathbb{T};\widetilde{w})$ and we consider
$$G_-^{-1}(\widetilde{G}^*_-)^{-1} = G_+\widetilde{G}^*_+,$$
and proceed analogously. $\Box$

Examples. We now provide some simple examples illustrating Theorems 3.2 and 3.1, as well as the essentiality of the nesting conditions in order for these theorems to hold.

Both examples feature a scalar piecewise continuous function. For such functions a Fredholmness criterion and index formula were established by Gohberg and Krupnik, see [10] or the more recent [4, 5]. Moreover, it is shown in [12] that piecewise continuous functions not satisfying the Fredholmness criterion do not
admit a factorization in the above sense. With these considerations in mind, the statements below are easily established.
Example 3.3. We consider $1 < p < \infty$, $w \equiv 1$, and the following function, which has two jump discontinuities at $t = \pm i$. Let $\beta \in \mathbb{R}$ and define
$$f(e^{ix}) = \begin{cases} e^{i\pi\beta} & \text{if } -\pi/2 < x < \pi/2, \\ e^{-i\pi\beta} & \text{if } \pi/2 < x < 3\pi/2. \end{cases}$$
This (scalar) function satisfies the condition $\widetilde{f}^*(t)f(t) = 1$. The "size" of the jump discontinuities is given by
$$\frac{1}{2\pi}\arg\frac{f(i-0)}{f(i+0)} = \beta \quad\text{and}\quad \frac{1}{2\pi}\arg\frac{f(-i-0)}{f(-i+0)} = -\beta.$$
The function can be written as
$$f(t) = (-t/i)^{\beta}\,(t/i)^{-\beta},$$
and its factorization in $L^p(\mathbb{T})$ can be easily obtained from there. Its specific form will depend on the relation between $\beta$ and $p$.
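The closed-form expression $(-t/i)^{\beta}(t/i)^{-\beta}$ with principal branches does reproduce the piecewise constants $e^{\pm i\pi\beta}$; the snippet below is a numerical confirmation of ours, away from the jump points $t = \pm i$.

```python
import numpy as np

beta = 0.3
x = np.linspace(-np.pi / 2 + 0.05, 3 * np.pi / 2 - 0.05, 400)
x = x[np.abs(np.cos(x)) > 1e-3]        # stay away from the jumps at t = +/- i
t = np.exp(1j * x)

closed_form = (-t / 1j) ** beta * (t / 1j) ** (-beta)   # principal powers
piecewise = np.where(np.cos(x) > 0,
                     np.exp(1j * np.pi * beta),
                     np.exp(-1j * np.pi * beta))
assert np.allclose(closed_form, piecewise)
```

Indeed, writing $-t/i = e^{i(x+\pi/2)}$ and $t/i = e^{i(x-\pi/2)}$, the principal arguments combine to $e^{i\pi\beta}$ on the arc $\operatorname{Re} t > 0$ and to $e^{-i\pi\beta}$ on the opposite arc.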
Case 1: If $|\beta| < \min\{1/p, 1/q\}$, then a canonical factorization is given by
$$f(t) = \big[(1 - i/t)^{-\beta}(1 + i/t)^{\beta}\big]\cdot\big[(1 - t/i)^{\beta}(1 + t/i)^{-\beta}\big].$$
Moreover, $T(f)$ is invertible on $L^p_+(\mathbb{T})$.
Case 2: If $1/p < \beta < 1/q$, then a factorization with $\varkappa_1 = 1$ is given by
$$f(t) = \big[i\,(1 - i/t)^{-\beta+1}(1 + i/t)^{\beta}\big]\cdot t\cdot\big[(1 - t/i)^{\beta-1}(1 + t/i)^{-\beta}\big].$$
The operator $T(f)$ is Fredholm on $L^p_+(\mathbb{T})$ and has a trivial kernel and a cokernel of dimension one.
Case 3: If $1/q < \beta < 1/p$, then a factorization with $\varkappa_1 = -1$ is given by
$$f(t) = \big[i\,(1 - i/t)^{-\beta}(1 + i/t)^{\beta-1}\big]\cdot t^{-1}\cdot\big[(1 - t/i)^{\beta}(1 + t/i)^{-\beta+1}\big].$$
The operator $T(f)$ is Fredholm on $L^p_+(\mathbb{T})$ and has kernel dimension one and a trivial cokernel.
These results are in agreement with Theorems 3.1 and 3.2.
Case 4: If $\beta \in \mathbb{Z} + \{1/p, 1/q\}$, then $f$ possesses no factorization and $T(f)$ is not Fredholm. However, by Theorem 3.1, $T(f)$ has a trivial kernel in case $p \ge q$ and a trivial cokernel in case $q \ge p$.

These four cases essentially cover all values for $\beta$ since adding an integer to $\beta$ changes the function $f$ by at most a sign.
Example 3.4. Let $p = 2$, fix two distinct points $t_1, t_2 \in \mathbb{T}$ with $\operatorname{Im}(t_j) > 0$, and consider the weight
$$w(t) = |1 - t/t_1|^{\alpha_1}\,|1 - t/\bar{t}_1|^{\alpha_1}\,|1 - t/t_2|^{\alpha_2}\,|1 - t/\bar{t}_2|^{\alpha_2}$$
with $|\alpha_j| < 1/2$, which guarantees that $w$ satisfies (2.3) and in fact the $A_2$ condition. Introduce the function
$$f(t) = (-t/t_1)^{\beta_1}(-t/\bar{t}_1)^{-\beta_1}(-t/t_2)^{\beta_2}(-t/\bar{t}_2)^{-\beta_2}$$
with $\beta_1, \beta_2 \in \mathbb{R}$. This function satisfies $\widetilde{f}^*f = 1$. We can factor $f = f_-f_+$ with
$$f_-(t) = (1 - t_1/t)^{-\beta_1+1}(1 - \bar{t}_1/t)^{\beta_1}(1 - t_2/t)^{-\beta_2}(1 - \bar{t}_2/t)^{\beta_2-1}\,t_1/t_2,$$
$$f_+(t) = (1 - t/t_1)^{\beta_1-1}(1 - t/\bar{t}_1)^{-\beta_1}(1 - t/t_2)^{\beta_2}(1 - t/\bar{t}_2)^{-\beta_2+1}.$$
This is a canonical factorization in $L^2(\mathbb{T};w)$ if and only if
$$-1/2 + \alpha_1 < \beta_1 - 1 < 1/2 + \alpha_1, \qquad -1/2 + \alpha_2 < \beta_2 < 1/2 + \alpha_2,$$
$$-1/2 + \alpha_1 < -\beta_1 < 1/2 + \alpha_1, \qquad -1/2 + \alpha_2 < -\beta_2 + 1 < 1/2 + \alpha_2.$$
For instance, we can choose the values $\alpha_1 = -1/4$, $\alpha_2 = 1/4$, and $\beta_j \in (1/4, 3/4)$. It is easy to verify that $\widetilde{f}^*_\pm f_\pm \ne 1$, and that this also holds if we modify the factors by some constant. This contrasts with the statement of Theorem 3.2(d). Clearly, neither the assumptions of (a) nor (b) hold.
Toeplitz operators on $L^p_+(\mathbb{R};\varpi)$. For a weight $\varpi$ on the real line $\mathbb{R}$, let the weighted space $L^p(\mathbb{R};\varpi)$ consist of all measurable functions $f$ defined on $\mathbb{R}$ for which
$$\|f\|_{p,\varpi} := \left(\int_{-\infty}^{\infty} |f(x)\varpi(x)|^p\,dx\right)^{1/p} < \infty.$$
We will assume $\varpi \in L^p(\mathbb{R};(1+|x|)^{-1})$ and $\varpi^{-1} \in L^q(\mathbb{R};(1+|x|)^{-1})$, and furthermore the Hunt–Muckenhoupt–Wheeden (or $A_p$) condition on $\mathbb{R}$,
$$\sup_I \left(\frac{1}{|I|}\int_I \varpi(x)^p\,dx\right)^{1/p} \left(\frac{1}{|I|}\int_I \varpi(x)^{-q}\,dx\right)^{1/q} < \infty$$
with the supremum taken over all finite intervals. This condition is equivalent to the boundedness of the projection $P = (I + S)/2$ on $L^p(\mathbb{R};\varpi)$, where $S$ is the singular integral operator on $\mathbb{R}$. In fact, the image and the kernel of $P$ equal $L^p_+(\mathbb{R};\varpi)$ and $L^p_-(\mathbb{R};\varpi)$, respectively, the Hardy spaces of all functions in $L^p(\mathbb{R};\varpi)$ which admit an analytic continuation into the upper/lower complex half-plane (for details, see, e.g., [2, p. 302]).
For $H \in L^\infty(\mathbb{R})^{N\times N}$ the Toeplitz operator on $L^p_+(\mathbb{R};\varpi)^N$ is defined by
$$T(H)f = P(Hf).$$
We now have the following result. Therein the notation $\widetilde{\varpi}(x) = \varpi(-x)$, $\widetilde{H}(x) = H(-x)$, and $H^*(x) = \overline{H(x)}^{\,T}$ is used, along with the by now standard (2.2), (2.8).

Theorem 3.5. Let $H \in L^\infty(\mathbb{R})^{N\times N}$, and assume that $\widetilde{H}^*(x)H(x) = I_N$. Consider $T(H)$ on $L^p_+(\mathbb{R};\varpi)^N$.
(a) If $p \ge q$ and $\varpi^{-1}\widetilde{\varpi}^{-1} \in L^r(\mathbb{R})$, then $\alpha(T(H)) = 0$.
(b) If $q \ge p$ and $\varpi\widetilde{\varpi} \in L^r(\mathbb{R})$, then $\beta(T(H)) = 0$.
Proof. (a): The result can be proved either in analogy to Theorem 3.1, or by noting that the map $B$ defined by
$$(Bf)(x) = \frac{2}{i+x}\, f\!\left(\frac{i-x}{i+x}\right)$$
is an isometric isomorphism from $L^p(\mathbb{T};w)$ onto $L^p(\mathbb{R};\varpi)$, where
$$\varpi(x) = \frac{2^{1/p-1}}{(1+x^2)^{1/p-1/2}}\, w\!\left(\frac{i-x}{i+x}\right).$$
Furthermore, $B$ maps $L^p_+(\mathbb{T};w)$ and $L^p_{-,0}(\mathbb{T};w)$ onto $L^p_+(\mathbb{R};\varpi)$ and $L^p_-(\mathbb{R};\varpi)$, respectively. Moreover, $B\,T(G)\,B^{-1} = T(H)$ with $H(x) = G((i-x)/(i+x))$. Hence the statements about defect numbers of $T(H)$ can be reduced to the corresponding statements for $T(G)$ on the space $L^p_+(\mathbb{T};w)$. We also remark that $w$ satisfies the $A_p$ condition on $\mathbb{T}$ if and only if $\varpi$ satisfies the $A_p$ condition on $\mathbb{R}$. Finally, $\widetilde{H}^*H = I_N$ implies that $\widetilde{G}^*G = I_N$. We now compute that (since $1/p - 1/2 = \frac{1}{2}(1/p - 1/q)$)
$$\varpi(x)^{-1}\,\widetilde{\varpi}(x)^{-1} = 2^{2-2/p}\,(1+x^2)^{1/p-1/q}\, w\!\left(\tfrac{i-x}{i+x}\right)^{-1} w\!\left(\tfrac{i+x}{i-x}\right)^{-1}.$$
In view of (2.8), it follows that $\varpi^{-1}\widetilde{\varpi}^{-1} \in L^r(\mathbb{R})$ if and only if $w^{-1}\widetilde{w}^{-1} \in L^r(\mathbb{T})$. This proves part (a).
(b): This can be proved analogously by passing to the adjoint of $T(H)$, which can be identified with $T(H^*)$ acting on $L^q_+(\mathbb{R};\varpi^{-1})$. $\Box$
4. Toeplitz operators on $\ell^p$ spaces

In this section, we are going to establish analogous statements for Toeplitz operators on weighted $\ell^p$ spaces. Toeplitz operators on such spaces have been analyzed, e.g., in [3], although the focus there was to develop a Fredholm theory for piecewise continuous symbols. We also refer to [5] and the references therein.

For $1 < p < \infty$ and a weight function $\varrho : \mathbb{Z} \to \mathbb{R}_+$ define $\ell^p(\varrho)$ as the space of all sequences $x = \{x_n\}_{n=-\infty}^{\infty}$ such that
$$\|x\|_{p,\varrho} := \left(\sum_{n=-\infty}^{\infty} |x_n\varrho(n)|^p\right)^{1/p} < \infty.$$
By $c_{00}$ we denote the set of all sequences with finite support, i.e., sequences $\{x_n\}$ for which only finitely many of the $x_n$'s are nonzero. The set $c_{00}$ is dense in the spaces $\ell^p(\varrho)$.
Given two sequences, $x = \{x_n\}_{n=-\infty}^{\infty}$ and $y = \{y_n\}_{n=-\infty}^{\infty}$, one can define the convolution
$$z = x * y, \qquad z_n = \sum_{k=-\infty}^{\infty} x_{n-k}\,y_k,$$
provided that for each $n$ the series defining $z_n$ converges.
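For finitely supported sequences the convolution is a finite sum, and a direct implementation is a handful of lines. The sketch below (ours, for illustration only) stores sequences as `{index: value}` dictionaries so that negative indices are natural.

```python
def convolve(x, y):
    # z_n = sum_k x_{n-k} y_k for finitely supported sequences
    z = {}
    for n, xv in x.items():
        for k, yv in y.items():
            z[n + k] = z.get(n + k, 0) + xv * yv
    return z

x = {-1: 2.0, 0: 1.0, 3: -1.0}
y = {0: 1.0, 1: 4.0}
z = convolve(x, y)
assert z == {-1: 2.0, 0: 9.0, 1: 4.0, 3: -1.0, 4: -4.0}
```

The support of $x * y$ is contained in the sumset of the supports, which is why everything here stays finite; for general $x \in \ell^p(\varrho)$ and multipliers the convergence issues are exactly what the rest of this section is about.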
Multiplier algebras. Let $\mathfrak{M}_{p,\varrho}$ stand for the set of all sequences $a = \{a_n\}_{n=-\infty}^{\infty}$ such that
(i) $a * x \in \ell^p(\varrho)$ for each $x \in c_{00}$, and
(ii) $\|a\|_{\mathfrak{M}_{p,\varrho}} := \sup_{x \in c_{00}} \dfrac{\|a * x\|_{p,\varrho}}{\|x\|_{p,\varrho}} < \infty$.
In this case, the map
$$L(a) : x \mapsto a * x$$
extends via continuity to a bounded linear operator acting on $\ell^p(\varrho)$, called the convolution operator with symbol $a$. Obviously, $\|L(a)\|_{\mathcal{B}(\ell^p(\varrho))} = \|a\|_{\mathfrak{M}_{p,\varrho}}$.
It will be convenient to use the following notation. First, let
$$V_n : \{x_k\}_{k=-\infty}^{\infty} \mapsto \{x_{k-n}\}_{k=-\infty}^{\infty} \qquad(4.1)$$
be the shift operators acting on appropriate spaces of sequences. Denote by $e_n$ the sequence
$$e_n = \{\delta_{n,k}\}_{k=-\infty}^{\infty}, \qquad(4.2)$$
where $\delta_{n,k}$ stands for the Kronecker symbol. Note that $V_n a = a * e_n$. Given a weight $\varrho$, define the weight $\widetilde{\varrho}$ by
$$\widetilde{\varrho}(n) = \varrho(-n). \qquad(4.3)$$
We are now able to state the following basic properties concerning convolutions and convolution operators. Notice in particular that (c) implies that $\mathfrak{M}_{p,\varrho}$ is indeed an algebra, and thus a Banach algebra with an appropriate norm. The unit element in $\mathfrak{M}_{p,\varrho}$ is $e_0$.
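The identity $V_n a = a * e_n$ from the notation above is easy to verify on finite sequences; the following is a small self-contained sketch of ours in the same dictionary representation as before.

```python
def e(n):
    # the unit sequence e_n = {delta_{n,k}}
    return {n: 1.0}

def shift(a, n):
    # (V_n a)_k = a_{k-n}
    return {k + n: v for k, v in a.items()}

a = {-2: 1.0, 0: 3.0, 1: -2.0}

# inline convolution of a with e_5
conv = {}
for i, av in a.items():
    for j, ev in e(5).items():
        conv[i + j] = conv.get(i + j, 0) + av * ev

assert conv == shift(a, 5)          # V_5 a = a * e_5
```

Convolving with $e_n$ simply relabels the indices, which is the content of $V_n a = a * e_n$ and, entrywise, of $e_m * (a * e_n) = V_{m+n}a$ used later in the proof of Proposition 4.1(d).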
Proposition 4.1. Let (2.2) hold. Then:
(a) We have
$$\mathfrak{M}_{p,\varrho} = \mathfrak{M}_{q,\widetilde{\varrho}^{-1}} \subseteq \bigcap_{n\in\mathbb{Z}} V_n\big(\ell^p(\varrho) \cap \ell^q(\widetilde{\varrho}^{-1})\big).$$
(b) If $a \in \mathfrak{M}_{p,\varrho}$ and $x \in \ell^p(\varrho)$, then $L(a)x = a * x$ and the series defining the convolution converges absolutely.
(c) If $a, b \in \mathfrak{M}_{p,\varrho}$, then $a * b \in \mathfrak{M}_{p,\varrho}$ and $L(a * b) = L(a)L(b)$. In particular,
$$(a * b) * x = a * (b * x), \qquad x \in \ell^p(\varrho).$$
(d) Suppose that
$$\sup_{n\in\mathbb{Z}} \left(\frac{\varrho(n+1)}{\varrho(n)} + \frac{\varrho(n-1)}{\varrho(n)}\right) < \infty. \qquad(4.4)$$
Then
$$(y * a) * x = y * (a * x)$$
whenever $x \in \ell^p(\varrho)$, $y \in \ell^q(\widetilde{\varrho}^{-1})$, and $a \in \mathfrak{M}_{p,\varrho} = \mathfrak{M}_{q,\widetilde{\varrho}^{-1}}$. In particular, $y * x$ is well defined.

We will need statements (c) and (d) in the proof of Theorem 4.3 below. Notice that condition (4.4) is equivalent to the boundedness of the shift operators $V_n$ on $\ell^p(\varrho)$. In other words, it means that $c_{00} \subseteq \mathfrak{M}_{p,\varrho}$.
Proof. (a): From duality, i.e., the identification of the dual space of $\ell^p(\varrho)$ with $\ell^q(\varrho^{-1})$, it follows that $a \in \mathfrak{M}_{p,\varrho}$ if and only if
$$\sup_{y\in c_{00}}\,\sup_{x\in c_{00}}\, \frac{1}{\|y\|_{q,\varrho^{-1}}\|x\|_{p,\varrho}} \left|\sum_n y_n \sum_k a_{n-k}x_k\right| < \infty,$$
which involves only finite sums and is thus equal to
$$\sup_{x\in c_{00}}\,\sup_{y\in c_{00}}\, \frac{1}{\|y\|_{q,\widetilde{\varrho}^{-1}}\|x\|_{p,\widetilde{\varrho}}} \left|\sum_n x_n \sum_k a_{n-k}y_k\right| < \infty.$$
The latter is equivalent to $a \in \mathfrak{M}_{q,\widetilde{\varrho}^{-1}}$. Hence $\mathfrak{M}_{p,\varrho} = \mathfrak{M}_{q,\widetilde{\varrho}^{-1}}$.

If $a \in \mathfrak{M}_{p,\varrho}$, then by definition $a * e_n = V_n a$ belongs to $\ell^p(\varrho)$. Since also $a \in \mathfrak{M}_{q,\widetilde{\varrho}^{-1}}$, it follows that $a * e_n = V_n a$ belongs to $\ell^q(\widetilde{\varrho}^{-1})$ as well. This concludes the proof of (a).

(b): Let $a \in \mathfrak{M}_{p,\varrho}$ and $x \in \ell^p(\varrho)$. Since $V_n\widetilde{a} \in \ell^q(\varrho^{-1})$, the series
$$\sum_{k=-\infty}^{\infty} |a_{n-k}x_k| \le \left(\sum_{k=-\infty}^{\infty} |a_{n-k}\,\varrho^{-1}(k)|^q\right)^{1/q} \left(\sum_{k=-\infty}^{\infty} |x_k\,\varrho(k)|^p\right)^{1/p}$$
is finite. Consequently, the convolution product $a * x$ is well defined and its defining series are absolutely convergent. Moreover,
$$|[a * x]_n| \le \|V_n\widetilde{a}\|_{q,\varrho^{-1}}\,\|x\|_{p,\varrho}. \qquad(4.5)$$
Since $L(a)x = a * x$ for $x \in c_{00}$, the estimate (4.5) implies that the equality holds for all $x \in \ell^p(\varrho)$.
(c): Notice first that $a * (b * x)$ is well defined. Also, $a * b$ is well defined due to (b) since $a \in \mathfrak{M}_{p,\varrho}$ and $b \in \mathfrak{M}_{p,\varrho} \subseteq \ell^p(\varrho)$. It is easy to see that $(a * b) * x = a * (b * x)$ holds for $x = e_n$ and thus for all $x \in c_{00}$. This is enough to conclude that $a * b$ is a multiplier. Indeed,
$$\|(a * b) * x\|_{p,\varrho} = \|a * (b * x)\|_{p,\varrho} \le \|a\|_{\mathfrak{M}_{p,\varrho}}\|b * x\|_{p,\varrho} \le \|a\|_{\mathfrak{M}_{p,\varrho}}\|b\|_{\mathfrak{M}_{p,\varrho}}\|x\|_{p,\varrho}$$
whenever $x \in c_{00}$.

It follows that $a * b \in \mathfrak{M}_{p,\varrho}$ and $L(a * b)x = L(a)L(b)x$ for $x \in c_{00}$. Due to the density of $c_{00}$ this holds for all $x \in \ell^p(\varrho)$. Finally, from this and (b) we conclude that $(a * b) * x = a * (b * x)$ for all $x \in \ell^p(\varrho)$.

(d): It is easily seen that
$$e_m * (a * e_n) = (e_m * a) * e_n = V_{m+n}a.$$
Thus $y * (a * x) = (y * a) * x$ for $x, y \in c_{00}$. Due to assumption (4.4), the shift operator $V_n$ is bounded on both $\ell^p(\varrho)$ and $\ell^q(\varrho^{-1})$ for each $n$. If we consider the $n$th component in the two products under consideration, we can estimate in analogy to (4.5) as follows:
$$|[y * (a * x)]_n| \le \|V_n\widetilde{y}\|_{q,\varrho^{-1}}\,\|a\|_{\mathfrak{M}_{p,\varrho}}\,\|x\|_{p,\varrho}$$
and
$$|[(y * a) * x]_n| \le \|y\|_{q,\widetilde{\varrho}^{-1}}\,\|a\|_{\mathfrak{M}_{q,\widetilde{\varrho}^{-1}}}\,\|V_n\widetilde{x}\|_{p,\widetilde{\varrho}}$$
for all $x \in \ell^p(\varrho)$ and $y \in \ell^q(\widetilde{\varrho}^{-1})$. Using these estimates and density, the desired equality follows. Finally, if we take $a = e_0 \in \mathfrak{M}_{p,\varrho}$, we obtain that $y * x$ is well defined. $\Box$
The discrete analogue of Lemma 2.1 is the following. Notice the slight difference in the conditions.
Lemma 4.2. Let (2.2) and (2.8) hold. Then:
(a) $\ell^p(\varrho) \subseteq \ell^q(\widetilde{\varrho}^{-1})$ if and only if
$$(p \ge q \ \text{and} \ \varrho^{-1}\widetilde{\varrho}^{-1} \in \ell^r) \quad\text{or}\quad (p < q \ \text{and} \ \varrho^{-1}\widetilde{\varrho}^{-1} \in \ell^\infty). \qquad(4.6)$$
(b) $\ell^q(\widetilde{\varrho}^{-1}) \subseteq \ell^p(\varrho)$ if and only if
$$(q \ge p \ \text{and} \ \varrho\widetilde{\varrho} \in \ell^r) \quad\text{or}\quad (q < p \ \text{and} \ \varrho\widetilde{\varrho} \in \ell^\infty). \qquad(4.7)$$
Proof. (a) – "if" part: The case $p \ge q$ follows from Hölder's inequality. The case $p < q$ can be reduced to the case $p = q = 2$ due to the inclusions $\ell^p(\varrho) \subseteq \ell^2(\varrho)$ and $\ell^2(\widetilde{\varrho}^{-1}) \subseteq \ell^q(\widetilde{\varrho}^{-1})$.

(a) – "only if" part: The case $p \ge q$ can be settled as in Lemma 2.1. Assume $p < q$. The assumption implies, after making the same kind of substitutions as in Lemma 2.1, that
$$h = \{h_n\} \in \ell^1 \;\Longrightarrow\; \{h_n\,\varrho(n)^{-p}\,\widetilde{\varrho}(n)^{-p}\} \in \ell^{q/p}.$$
By the closed graph theorem, this must be a bounded linear map. Now consider $h = e_n$ in order to conclude that
$$\varrho(n)^{-p}\,\widetilde{\varrho}(n)^{-p} = \|\{h_k\,\varrho(k)^{-p}\,\widetilde{\varrho}(k)^{-p}\}\|_{q/p} \le C\,\|\{h_k\}\|_1 = C.$$
This implies that $\varrho(n)^{-1}\widetilde{\varrho}(n)^{-1}$ is bounded and thus a sequence in $\ell^\infty$.

(b): This can be proved by interchanging $p$ with $q$ and $\varrho$ with $\widetilde{\varrho}^{-1}$. $\Box$
Toeplitz operators. Let us first introduce the projection
$$P : \{x_n\}_{n=-\infty}^{\infty} \in \ell^p(\varrho) \;\mapsto\; \{y_n\}_{n=-\infty}^{\infty} \in \ell^p(\varrho), \qquad y_n = \begin{cases} x_n & \text{if } n \ge 0, \\ 0 & \text{if } n < 0, \end{cases}$$
and the spaces
$$\ell^p_+(\varrho) = \{ x \in \ell^p(\varrho) : x_n = 0 \ \text{for all} \ n < 0 \}$$
and
$$\ell^p_-(\varrho) = \{ x \in \ell^p(\varrho) : x_n = 0 \ \text{for all} \ n > 0 \}.$$
The image of $P$ equals $\ell^p_+(\varrho)$, and the kernel of $P$ equals $V_{-1}(\ell^p_-(\varrho))$.
For $a \in \mathfrak{M}^{N\times N}_{p,\varrho}$ we define the Toeplitz operator acting on $\ell^p_+(\varrho)^N$ by
$$T(a)x = P(a * x),$$
i.e., $T(a)x = PL(a)x$. Given $x = \{x_n\}$ we define
$$x^T = \{x_n^T\}, \qquad \bar{x} = \{\bar{x}_n\}.$$
The unit element in the matrix version of the multiplier algebra is $e_0 \otimes I_N$.

Theorem 4.3. Let $a \in \mathfrak{M}^{N\times N}_{p,\varrho}$, let condition (4.4) on $\varrho$ hold, and assume that
$$\bar{a}^T * a = e_0 \otimes I_N.$$
Consider $T(a)$ on $\ell^p_+(\varrho)^N$.
(a) If condition (4.6) holds, then $\alpha(T(a)) = 0$.
(b) If condition (4.7) holds, then $\beta(T(a)) = 0$.
Remark. Due to the assumption $a \in \mathfrak{M}^{N\times N}_{p,\varrho}$, the convolution product $\bar{a}^T * a$ is always well defined. The relation $\bar{a}^T * a = e_0 \otimes I_N$ can be rewritten as
$$\sum_{k=-\infty}^{\infty} \bar{a}_k^T\, a_{n-k} = \delta_{0,n} I_N \quad\text{for each } n \in \mathbb{Z}.$$
Hence, at least formally, this corresponds to
$$\left(\sum_n \bar{a}_n^T\, t^n\right)\left(\sum_n a_n\, t^n\right) = I_N, \qquad t \in \mathbb{T},$$
and thus to the condition $\widetilde{G}^*(t)G(t) = I_N$ of the previous section.
Proof. Assume that the kernel of $T(a)$ is non-trivial. Then there exist $x \in \ell^p_+(\varrho)^N$, $x \ne 0$, and $y \in V_{-1}(\ell^p_-(\varrho))^N$ such that
$$a * x = y.$$
Since $a \in \mathfrak{M}^{N\times N}_{p,\varrho}$ we obviously have $\bar{a}^T \in \mathfrak{M}^{N\times N}_{p,\varrho}$ as well. Multiplying with $\bar{a}^T$ we obtain
$$\bar{a}^T * y = \bar{a}^T * (a * x) = (\bar{a}^T * a) * x = x$$
due to Proposition 4.1(c), noting that $x \in \ell^p(\varrho)^N$ and $\bar{a}^T * a = e_0 \otimes I_N$. Now multiply with $\bar{x}^T$ to obtain
$$\bar{x}^T * (\bar{a}^T * y) = \bar{x}^T * x.$$
By Proposition 4.1(d), this convolution product is well defined and we have
$$\bar{x}^T * (\bar{a}^T * y) = (\bar{x}^T * \bar{a}^T) * y$$
since $\bar{a}^T \in \mathfrak{M}^{N\times N}_{p,\varrho} = \mathfrak{M}^{N\times N}_{q,\widetilde{\varrho}^{-1}}$ and $y, \bar{x} \in \ell^p(\varrho)^N \subseteq \ell^q(\widetilde{\varrho}^{-1})^N$, making now use of the assumption (4.6). From $\bar{x}^T * \bar{a}^T = \bar{y}^T$, which follows from $a * x = y$, we obtain
$$z := \bar{x}^T * x = \bar{y}^T * y.$$
The sequence $z = 0$ because $z_n = 0$ for $n \ge 0$ due to the right-hand side and $z_n = 0$ for $n < 0$ due to the left-hand side. In particular, for $n \ge 0$ we obtain
$$\sum_{k=0}^{n} \bar{x}_k^T\, x_{n-k} = 0.$$
We recursively consider $n = 0, 2, \ldots$ to conclude $x_n = 0$ for all $n$. Thus $x = 0$ and $y = 0$.

The proof of part (b) is analogous, by interchanging $p$ with $q$ and $\varrho$ with $\varrho^{-1}$. Notice that the adjoint of $T(a)$ on $\ell^p_+(\varrho)^N$ can be identified with the operator $T(b)$ acting on $\ell^q_+(\varrho^{-1})^N$ with $b_n = a_{-n}^T$. Clearly, $\bar{b}^T * b = e_0 \otimes I_N$ holds as well. Furthermore, $b \in \mathfrak{M}^{N\times N}_{q,\varrho^{-1}} = \mathfrak{M}^{N\times N}_{p,\widetilde{\varrho}}$. $\Box$
References
[1] A. Böttcher and Yu.I. Karlovich, Carleson curves, Muckenhoupt weights, and Toeplitz operators, Progress in Mathematics, vol. 154, Birkhäuser Verlag, Basel and Boston, 1997.
[2] A. Böttcher, Yu.I. Karlovich, and I.M. Spitkovsky, Convolution operators and factorization of almost periodic matrix functions, Operator Theory: Advances and Applications, vol. 131, Birkhäuser Verlag, Basel and Boston, 2002.
[3] A. Böttcher and M. Seybold, Discrete Wiener–Hopf operators on spaces with Muckenhoupt weight, Studia Math. 143 (2000), no. 2, 121–144.
[4] A. Böttcher and B. Silbermann, Introduction to large truncated Toeplitz matrices, Springer-Verlag, New York, 1999.
[5] A. Böttcher and B. Silbermann, Analysis of Toeplitz operators, second ed., Springer Monographs in Mathematics, Springer-Verlag, Berlin, 2006, prepared jointly with A. Karlovich.
[6] M.C. Câmara, L. Rodman, and I.M. Spitkovsky, One sided invertibility of matrices over commutative rings, corona problems, and Toeplitz operators with matrix symbols, submitted.
[7] K.F. Clancey and I. Gohberg, Factorization of matrix functions and singular integral operators, Operator Theory: Advances and Applications, vol. 3, Birkhäuser, Basel and Boston, 1981.
[8] I. Gohberg, M.A. Kaashoek, and I.M. Spitkovsky, An overview of matrix factorization theory and operator applications, in: Factorization and integrable systems (Faro, 2000), Operator Theory: Advances and Applications, vol. 141, Birkhäuser Verlag, Basel and Boston, 2003, pp. 1–102.
[9] I. Gohberg and M.G. Krein, Systems of integral equations on a half-line with kernel depending upon the difference of the arguments, Uspekhi Mat. Nauk 13 (1958), no. 2, 3–72 (in Russian); English translation: Amer. Math. Soc. Transl. 14 (1960), no. 2, 217–287.
[10] I. Gohberg and N. Krupnik, One-dimensional linear singular integral equations. I. Introduction, Operator Theory: Advances and Applications, vol. 53, Birkhäuser Verlag, Basel, 1992; translated from the 1979 German translation by B. Luderer and S. Roch and revised by the authors.
[11] R. Hunt, B. Muckenhoupt, and R. Wheeden, Weighted norm inequalities for the conjugate function and Hilbert transform, Trans. Amer. Math. Soc. 176 (1973), 227–251.
[12] G.S. Litvinchuk and I.M. Spitkovskii, Factorization of measurable matrix functions, Operator Theory: Advances and Applications, vol. 25, Birkhäuser Verlag, Basel, 1987; translated from the Russian by B. Luderer, with a foreword by B. Silbermann.
[13] A.F. Voronin, On the well-posedness of the Riemann boundary value problem with a matrix coefficient, Dokl. Akad. Nauk 414 (2007), no. 2, 156–158 (in Russian); English translation: Dokl. Math. 75 (2007), no. 3, 358–360.
[14] A.F. Voronin, Partial indices of unitary and Hermitian matrix functions, Sibirsk. Mat. Zh. 51 (2010), no. 5, 1010–1016 (in Russian); English translation: Sib. Math. J. 51 (2010), no. 5, 805–809.
Torsten Ehrhardt
Mathematics Department
University of California
Santa Cruz, CA 95064, USA
e-mail: [email protected]
Ilya M. Spitkovsky
Mathematics Department
The College of William and Mary
Williamsburg, VA 23187-8795, USA
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 145–160
© 2013 Springer Basel
Rational Matrix Solutions of a Bezout Type Equation on the Half-plane

A.E. Frazho, M.A. Kaashoek and A.C.M. Ran
Dedicated to Leonia Lerer on the occasion of his 70th birthday, in friendship
Abstract. A state space description is given of all stable rational matrix solutions of a general rational Bezout type equation on the right half-plane. Included are a state space formula for a particular solution satisfying a certain $H^2$ minimality condition, a state space formula for the inner function describing the null space of the multiplication operator corresponding to the Bezout equation, and a parameterization of all solutions using the particular solution and this inner function. A state space version of the related Tolokonnikov lemma is also presented.
Mathematics Subject Classification (2010). Primary 47B35, 39B42; Secondary 47A68, 93B28.

Keywords. Bezout equation; stable rational matrix functions; state space representation; algebraic Riccati equation; stabilizing solution; right invertible multiplication operator; Wiener–Hopf operators.
1. Introduction

In this paper $G$ is a stable rational $m \times p$ matrix function. Here stable means that $G$ is proper, that is, the limit of $G(s)$ as $s \to \infty$ exists, and $G$ has all its poles in the open left half-plane $\{ s \in \mathbb{C} \mid \Re(s) < 0 \}$. In other words, $G$ is a rational matrix-valued $H^\infty$ function, where the latter means that $G$ is analytic and bounded on the open right half-plane. In this paper $p$ will be larger than $m$, and thus $G$ will be a "fat" non-square matrix function. We shall be interested in stable rational $p \times m$ matrix-valued solutions $X$ of the Bezout type equation
$$G(s)X(s) = I_m, \qquad \Re s \ge 0. \qquad(1.1)$$
The symbol $I_m$ on the right-hand side denotes the $m \times m$ identity matrix.
146 A.E. Frazho, M.A. Kaashoek and A.C.M. Ran
Throughout we shall assume that $G$ admits a state space realization of the form
$$G(s) = C(sI_n - A)^{-1}B + D. \qquad(1.2)$$
Here $A$ is an $n \times n$ matrix which is assumed to be stable, that is, all the eigenvalues of $A$ are contained in the open left half-plane. Moreover, $B$, $C$ and $D$ are matrices of appropriate sizes. Our aim is to give necessary and sufficient conditions for the solvability of (1.1), and to give a full description of all stable rational matrix-valued solutions, in terms of the matrices appearing in the realization (1.2). The results we present are the continuous analogs of the main theorems in [6] and [8].

To state the main results we need some additional notation. By $P$ we denote the controllability Gramian associated with the realization (1.2), that is, $P$ is the (unique) solution of the Lyapunov equation
$$AP + PA^* + BB^* = 0. \qquad(1.3)$$
Consider the algebraic Riccati equation
$$A^*Q + QA + (C - \Gamma^*Q)^*(DD^*)^{-1}(C - \Gamma^*Q) = 0, \qquad(1.4)$$
where $\Gamma$ is defined by
$$\Gamma = BD^* + PC^*. \qquad(1.5)$$
Here it is assumed that $D$ is right invertible, which is a natural condition. Indeed, if (1.1) has a stable rational matrix solution $X$, then using (1.2) and the fact that $X$ is proper, we see that $DX(\infty) = \lim_{s\to\infty} G(s)X(s) = I_m$. Hence $X(\infty)$ is a right inverse of $D$, and thus $D$ is right invertible. A solution $Q$ of (1.4) is called the stabilizing solution of the algebraic Riccati equation (1.4) if $Q$ is Hermitian and the $n \times n$ matrix $A_0$ given by
$$A_0 = A - \Gamma(DD^*)^{-1}(C - \Gamma^*Q) \qquad(1.6)$$
is stable. If it exists, a stabilizing solution is unique (cf. formula (2.11)). The following is the first main result of this paper.
Theorem 1.1. There is a stable rational $p \times m$ matrix function $X$ satisfying the equation $G(s)X(s) = I_m$ if and only if the following three conditions hold:
1. the matrix $D$ is right invertible,
2. there exists a stabilizing solution $Q$ of the Riccati equation (1.4), and
3. the matrix $I_n - PQ$ is invertible.
In that case a particular solution of (1.1) is given by
$$\Xi(s) = \big(I_p - C_1(sI_n - A_0)^{-1}(I_n - PQ)^{-1}B\big)\,D^*(DD^*)^{-1}, \qquad(1.7)$$
where $A_0$ is the stable $n \times n$ matrix given by (1.6) and
$$C_1 = D^*(DD^*)^{-1}(C - \Gamma^*Q) + B^*Q. \qquad(1.8)$$
Matrix Solutions of a Bezout Type Equation on the Half-plane 147
The matrix $D^*(DD^*)^{-1}$ appearing in (1.7) is the Moore–Penrose right inverse of $D$. In what follows we shall often denote $D^*(DD^*)^{-1}$ by $D^+$. Note that $\dim\operatorname{Ker}D = p - m$.
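To fix ideas on what equation (1.1) asks for, here is a toy verification of ours. The pair $(G, X)$ below is hand-picked, not produced by the state-space recipe of Theorem 1.1: $G$ is stable, proper and "fat" with $m = 1$, $p = 2$, and $X$ is a stable, proper right inverse.

```python
import numpy as np

# G(s) = [ (s+2)/(s+1)   1/(s+1) ]   (poles at s = -1, i.e. stable)
G = lambda s: np.array([[(s + 2) / (s + 1), 1 / (s + 1)]])
# X(s) = [ (s+1)/(s+2) ; 0 ]   (pole at s = -2, stable and proper)
X = lambda s: np.array([[(s + 1) / (s + 2)], [0.0]])

for s in [0.0, 1.0, 2.5 + 3.0j, 10.0]:     # sample points with Re s >= 0
    assert np.allclose(G(s) @ X(s), np.eye(1))
```

Any other solution differs from this one by a term annihilated by $G$, which is exactly the structure that Theorem 1.2(iii) parameterizes.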
The rational $p \times p$ matrix function appearing in the right-hand side of (1.7) between the brackets will be denoted by $Y$, that is,
$$Y(s) = I_p - C_1(sI_n - A_0)^{-1}(I_n - PQ)^{-1}B. \qquad(1.9)$$
Note that the value of $Y$ at infinity is invertible. Hence $Y(s)^{-1}$ is a well-defined rational matrix function. We shall see that $Y(s)^{-1}$ is again stable. Thus both $Y(s)$ and $Y(s)^{-1}$ are stable rational matrix functions. In other words, the entries of both $Y(s)$ and $Y(s)^{-1}$ are $H^\infty$ functions. In this case we say that $Y$ is invertible outer.
Among other things the following theorem describes the set of all stable rational solutions to $G(s)X(s) = I_m$.

Theorem 1.2. There exists a stable rational $p \times m$ matrix function $X$ satisfying $G(s)X(s) = I_m$ if and only if $D$ is right invertible and there exists a stable rational $p \times p$ matrix function $Y$ which is invertible outer and satisfies the equation $G(s)Y(s) = D$. In this case one such $Y$ is given by (1.9) and the inverse of this $Y$ is given by
$$Y(s)^{-1} = I_p + C_1(I_n - PQ)^{-1}(sI_n - A)^{-1}B. \qquad(1.10)$$
Moreover, using this function ð the following holds.
(i) Let ðž be any isometry mapping âðâð into âð such that Imðž = Kerð·. Thenthe function
Î(ð ) = ð (ð )ðž =(ðŒð â ð¶1(ð ðŒð âðŽ0)
â1(ðŒð â ðð)â1ðµ)ðž (1.11)
is a stable rational ðà (ðâð) matrix function satisfying ðº(ð )Î(ð ) = 0, andÎ is inner, that is, Î(âð )âÎ(ð ) = ðŒðâð.
(ii) If â is any âð-valued ð»2 function satisfying ðº(ð )â(ð ) = 0, then there existsa unique â(ðâð)-valued ð»2 function ð such that â(ð ) = Î(ð )ð(ð ). In fact,ð(ð ) = Î(âð )ââ(ð ).
(iii) The set of all stable rational ðÃð matrix functions ð satisfying ðº(ð )ð(ð ) =ðŒð is given by
ð(ð ) =(ðŒð â ð¶1(ð ðŒð âðŽ0)
â1(ðŒð â ðð)â1ðµ)Ã
Ã(ð·â(ð·ð·â)â1 + ðžð(ð )
), (1.12)
where ð is an arbitrary stable rational (ð âð) Ãð matrix function. More-over, if ð satisfies ðº(ð )ð(ð ) = ðŒð, then ð in (1.12) is given by ð(ð ) =ðžâð (ð )â1ð(ð ).
(iv) The rational ðà ð matrix function
ðºðð¥ð¡(ð ) =
[ðº(ð )
ðžâð (ð )â1
], âð ⥠0, (1.13)
is invertible outer and its inverse is given by
$$G_{ext}(s)^{-1} = \begin{bmatrix} \Xi(s) & \Theta(s) \end{bmatrix}, \qquad \Re s \ge 0. \tag{1.14}$$

Note that item (ii) tells us that the null space $\operatorname{Ker} M_G$ of the multiplication operator $M_G$ defined by $G$, mapping $H^2_p$ into $H^2_m$, is given by $\operatorname{Ker} M_G = \Theta H^2_{p-m}$. Thus $\Theta$ plays the role of the inner function in the Beurling–Lax theorem specified for $\operatorname{Ker} M_G$. Furthermore, (1.12) in item (iii) can be rewritten in the following equivalent form: $X(s) = \Xi(s) + \Theta(s)Z(s)$. Using this form of (1.12) we expect our state space formulas also to be useful in deriving rational $H^\infty$ solutions of (1.1) that satisfy an additional $H^\infty$ norm constraint, by reducing the norm constraint problem to a generalized Sarason problem (cf. Section I.7 in [7]). Finally, item (iv) is inspired by Tolokonnikov's lemma (see [18] and [16, Appendix 3]).
The formulas in Theorems 1.1 and 1.2 can easily be converted into a Matlab program to compute $\Xi$ in (1.7), the function $Y$ in (1.9), and $\Theta$ in (1.11).
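The same computation can be sketched in Python with NumPy and SciPy instead of Matlab. The toy realization below (the matrices $A$, $B$, $C$, $D$, with $n = m = 1$ and $p = 2$) is our own choice for illustration, and the Hamiltonian invariant-subspace method used to obtain the stabilizing solution $Q$ of the Riccati equation (1.4) is a standard numerical technique, not taken from the paper.

```python
import numpy as np
from scipy.linalg import schur, solve_continuous_lyapunov

# Toy stable realization G(s) = D + C (sI_n - A)^{-1} B with n = m = 1, p = 2.
A = np.array([[-1.0]])            # stable
B = np.array([[1.0, 0.0]])        # n x p
C = np.array([[1.0]])             # m x n
D = np.array([[1.0, 0.0]])        # m x p, right invertible

n = A.shape[0]
P = solve_continuous_lyapunov(A, -B @ B.T)   # A P + P A* + B B* = 0, eq. (1.3)
Gamma = B @ D.T + P @ C.T                     # eq. (1.5)
R = np.linalg.inv(D @ D.T)

# Stabilizing solution Q of A*Q + QA + (C - Gamma* Q)* R (C - Gamma* Q) = 0:
# take the stable invariant subspace of the Hamiltonian-type matrix below.
A1 = A - Gamma @ R @ C
Hmat = np.block([[A1, Gamma @ R @ Gamma.T], [-C.T @ R @ C, -A1.T]])
_, U, _ = schur(Hmat, output='complex', sort='lhp')
Q = np.real(U[n:, :n] @ np.linalg.inv(U[:n, :n]))

C0 = R @ (C - Gamma.T @ Q)
A0 = A - Gamma @ C0                           # eq. (1.6), must be stable
C1 = D.T @ C0 + B.T @ Q                       # eq. (1.8)
M = np.linalg.inv(np.eye(n) - P @ Q)

def G(s):
    return D + C @ np.linalg.inv(s * np.eye(n) - A) @ B

def Xi(s):                                    # particular solution, eq. (1.7)
    Y = np.eye(2) - C1 @ np.linalg.inv(s * np.eye(n) - A0) @ M @ B
    return Y @ D.T @ np.linalg.inv(D @ D.T)

err = abs(G(2.0) @ Xi(2.0) - np.eye(1)).max()
```

For this data one finds $P = 1/2$, $Q = 2/9$, $A_0 = -2$, and the residual `err` vanishes up to rounding.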
We see Theorems 1.1 and 1.2 as the closed right half-plane analogues of Theorem 1.1 in [6] and Theorem 1.1 in [8], which deal with equation (1.1) in the setting of rational matrix functions analytic in the closed unit disc. Obviously, a way to obtain the set of all stable rational matrix solutions to equation (1.1) is to use the Cayley transform to derive the right half-plane solutions from their analogues in the disc case as given in [8]. However, note that in the present half-plane case there is an additional difficulty: the constant functions are not in $L^2$, whereas in the disc case they are in $L^2$. Furthermore, the particular solution $\Xi$ in Theorem 1.1 is not the analogue of the least squares solution in [6]. On the other hand, as we shall show in Section 4, the function $\Xi$ has an interpretation in terms of solutions to a somewhat different minimization problem (see Theorem 4.1).
We take this occasion to mention that Theorem 1.1 in [6] and Theorem 1.1 in [8] have predecessors in the papers [14] and [13]. In particular, see Lemma 4.1 and Theorem 4.2 in [13]. We are grateful to Dr. Sander Wahls for bringing these and several other related references to our attention. It is interesting to see the role the Bezout equation plays in solving the engineering problems considered in [14] and [13]. The proofs in [6] and [8] are quite different from those in [14] and [13]; also, different Riccati equations are used and different state space formulas are obtained.
There is an extensive literature on the Bezout equation and the related corona equation; see, e.g., the classical papers [4], [9], [18], the books [16], [15], [1], the more recent papers [19], [20], [21], [22], and the references therein. Also, finding rational matrix solutions in state space form for Bezout equations is a classical topic in mathematical system theory; see, e.g., the book [23] and the papers [11], [10]. However, as far as we know the formulas we present here are new and cannot easily be obtained using the methods presented in the classical sources. The interpretation of the special solution (1.7) as a limit of solutions of minimization problems also seems to be new. Moreover, the approach we follow in the present paper and the earlier papers [6, 8] can be extended to a Wiener space setting. In fact, in a Wiener space setting the function $Y$ given by (1.9) appears in a very
natural way; see also the comment at the end of Section 2. We plan to return to this in a future paper, also for the discrete case.

The paper consists of four sections, including this introduction. In the second section we present the preliminaries from operator theory used in the proofs, explain the role of the Riccati equation (1.4), and prove the necessity of conditions 1, 2, 3 in Theorem 1.1. The third section contains the proofs of Theorems 1.1 and 1.2. In the final section we consider an optimization problem, which helps in identifying $\Xi$ as a solution with a special minimality property.
2. Operator theory and Riccati equation
In this section we prove the necessity of conditions 1, 2, 3 in Theorem 1.1. Our proof requires some preliminaries from operator theory and uses the Riccati equation (1.4).
Let $\Omega$ be any proper rational $m\times p$ matrix function with no pole on the imaginary axis $i\mathbb{R}$. With $\Omega$ we associate the Wiener–Hopf operator $T_\Omega$ and the Hankel operator $H_\Omega$, both mapping $L^2_p(\mathbb{R}^+)$ into $L^2_m(\mathbb{R}^+)$. These operators are the integral operators defined by
$$(T_\Omega f)(t) = \Omega(\infty)f(t) + \int_0^t \omega(t-\tau)f(\tau)\,d\tau, \quad t \ge 0, \ f \in L^2_p(\mathbb{R}^+), \tag{2.1}$$
$$(H_\Omega f)(t) = \int_0^\infty \omega(t+\tau)f(\tau)\,d\tau, \quad t \ge 0, \ f \in L^2_p(\mathbb{R}^+). \tag{2.2}$$
Here $\omega$ is the Lebesgue integrable (continuous) matrix function on the imaginary axis determined by $\Omega$ via the Fourier transform:
$$\Omega(s) = \Omega(\infty) + \int_0^\infty e^{-st}\omega(t)\,dt, \quad s \in i\mathbb{R}.$$
In the sequel we shall freely use the basic theory of Wiener–Hopf and Hankel operators, which can be found in Chapters XII and XIII of [12]. Note that in [12] the Fourier transform is taken with respect to the real line instead of the imaginary axis as is done here.
Now let $G$ be the stable rational $m\times p$ matrix function given by (1.2). Then
$$G(s) = D + \int_0^\infty e^{-st}\,Ce^{tA}B\,dt, \quad s \in i\mathbb{R}.$$
Hence the Wiener–Hopf operator $T_G$ and the Hankel operator $H_G$ are given by
$$(T_G f)(t) = Df(t) + \int_0^t Ce^{(t-\tau)A}Bf(\tau)\,d\tau, \quad t \ge 0, \tag{2.3}$$
$$(H_G f)(t) = \int_0^\infty Ce^{(t+\tau)A}Bf(\tau)\,d\tau, \quad t \ge 0. \tag{2.4}$$
With $G$ we also associate the rational $m\times m$ matrix function $R$ given by $R(s) = G(s)G(-\bar{s})^*$. Note that $R$ is a proper rational $m\times m$ matrix function with no pole
on the imaginary axis. By $T_R$ we denote the corresponding Wiener–Hopf operator acting on $L^2_m(\mathbb{R}^+)$. It is well known (see, e.g., formula (24) in Section XII.2 of [12]) that
$$T_R = T_G T_G^* + H_G H_G^*. \tag{2.5}$$

Next assume that the equation $G(s)X(s) = I_m$ has a stable rational matrix solution $X$. The fact that $X$ is stable implies that $X$ is proper and has no poles on the imaginary axis, and thus $T_X$ is well defined. Furthermore, $T_G T_X = T_{GX}$; see [12, Proposition XIII.1.2]. Since $GX$ is identically equal to the $m\times m$ identity matrix, $T_{GX}$ is the identity operator on $L^2_m(\mathbb{R}^+)$, and hence $T_X$ is a right inverse of $T_G$. The fact that $T_G$ is right invertible implies that $T_G T_G^*$ is invertible and hence strictly positive. The identity (2.5) then shows that $T_R$ is strictly positive too, and hence is invertible.
In the following proposition we use the algebraic Riccati equation (1.4) to obtain necessary and sufficient conditions for $T_R$ to be invertible in terms of the matrices $A$, $B$, and $C$ appearing in the realization (1.2). As in Section 1, we denote by $P$ the controllability Gramian associated with the realization (1.2), that is, $P$ is the solution of the Lyapunov equation (1.3). Finally, $\Gamma$ is the $n\times m$ matrix defined by (1.5).
Proposition 2.1. Let $R(s) = G(s)G(-\bar{s})^*$. Then the operator $T_R$ is invertible if and only if the algebraic Riccati equation
$$A^*Q + QA + (C - \Gamma^*Q)^*(DD^*)^{-1}(C - \Gamma^*Q) = 0 \tag{2.6}$$
has a stabilizing solution $Q$, that is, $Q$ is a Hermitian solution of (2.6) and the operator $A_0$, defined by
$$A_0 = A - \Gamma C_0, \quad \text{where } C_0 = (DD^*)^{-1}(C - \Gamma^*Q), \tag{2.7}$$
is stable.
Proof. The proposition is an immediate consequence of Theorem 14.8 in [3]. To see this, we first show that
$$R(s) = DD^* + C(sI_n - A)^{-1}\Gamma - \Gamma^*(sI_n + A^*)^{-1}C^*. \tag{2.8}$$
This partial fraction expansion for $R$ follows from the Lyapunov equation (1.3) and its immediate consequence
$$-(sI_n - A)^{-1}BB^*(sI_n + A^*)^{-1} = (sI_n - A)^{-1}P - P(sI_n + A^*)^{-1}.$$
By employing $G(s) = D + C(sI_n - A)^{-1}B$, the identity (2.8) then follows from
$$\begin{aligned}
R(s) = G(s)G(-\bar{s})^* &= \bigl(D + C(sI_n - A)^{-1}B\bigr)\bigl(D^* - B^*(sI_n + A^*)^{-1}C^*\bigr)\\
&= DD^* + C(sI_n - A)^{-1}BD^* - DB^*(sI_n + A^*)^{-1}C^*\\
&\quad + C(sI_n - A)^{-1}PC^* - CP(sI_n + A^*)^{-1}C^*.
\end{aligned}$$
Using $\Gamma = BD^* + PC^*$, this yields (2.8). Given (2.8) we can apply Theorem 14.8 in [3], with $DD^*$ in place of the constant term and $\Gamma$ in place of $B$, and rewrite the corresponding algebraic Riccati equation in the form (2.6). □
From the partial fraction expansion (2.8) it follows that the action of the Wiener–Hopf operator $T_R$ on $L^2_m(\mathbb{R}^+)$ is given by
$$(T_R f)(t) = DD^*f(t) + \int_0^t Ce^{(t-\tau)A}\Gamma f(\tau)\,d\tau + \int_t^\infty \Gamma^*e^{-(t-\tau)A^*}C^*f(\tau)\,d\tau, \quad t \ge 0. \tag{2.9}$$
By $W_{\mathrm{obs}}$ and $W_{0,\mathrm{obs}}$ we denote the observability operators mapping the state space $\mathbb{C}^n$ into $L^2_m(\mathbb{R}^+)$ defined by
$$(W_{\mathrm{obs}}x)(t) = Ce^{tA}x \quad \text{and} \quad (W_{0,\mathrm{obs}}x)(t) = C_0e^{tA_0}x, \quad \text{where } x \in \mathbb{C}^n. \tag{2.10}$$
Proposition 2.2. Assume that $T_R$ is invertible, or equivalently, that there exists a stabilizing solution $Q$ to the algebraic Riccati equation (2.6). Then this stabilizing solution is uniquely determined by
$$Q = W_{\mathrm{obs}}^*T_R^{-1}W_{\mathrm{obs}}. \tag{2.11}$$
Proof. To establish this, let us first show that $Q$ satisfies the following Lyapunov equation:
$$A^*Q + QA_0 + C^*C_0 = 0. \tag{2.12}$$
Recall that $A_0 = A - \Gamma C_0$. Then (2.12) follows from the Riccati equation:
$$\begin{aligned}
0 &= A^*Q + QA + (C - \Gamma^*Q)^*(DD^*)^{-1}(C - \Gamma^*Q)\\
&= A^*Q + Q(A_0 + \Gamma C_0) + (C - \Gamma^*Q)^*C_0\\
&= A^*Q + QA_0 + C^*C_0.
\end{aligned}$$
Thus (2.12) holds. Because $A$ and $A_0$ are both stable, the stabilizing solution $Q$ can also be written as
$$Q = \int_0^\infty e^{tA^*}C^*C_0e^{tA_0}\,dt = W_{\mathrm{obs}}^*W_{0,\mathrm{obs}}. \tag{2.13}$$
Next we prove that
$$T_R^{-1}W_{\mathrm{obs}} = W_{0,\mathrm{obs}}. \tag{2.14}$$
This essentially follows from [2], Corollary 6.3. For completeness we provide a proof. It suffices to compute $T_R W_{0,\mathrm{obs}}$. To do this, we use (2.9). Fix $x \in \mathbb{C}^n$. From the second identity in (2.10) and the first identity in (2.7) it follows that
$$\begin{aligned}
\int_0^t Ce^{(t-\tau)A}\Gamma(W_{0,\mathrm{obs}}x)(\tau)\,d\tau &= \int_0^t Ce^{(t-\tau)A}\Gamma C_0e^{\tau A_0}x\,d\tau\\
&= \int_0^t Ce^{(t-\tau)A}(A - A_0)e^{\tau A_0}x\,d\tau = Ce^{tA}\Bigl(\int_0^t e^{-\tau A}(A - A_0)e^{\tau A_0}\,d\tau\Bigr)x\\
&= -Ce^{tA}\Bigl(\int_0^t \frac{d}{d\tau}\bigl(e^{-\tau A}e^{\tau A_0}\bigr)\,d\tau\Bigr)x = -Ce^{tA_0}x + Ce^{tA}x.
\end{aligned}$$
Furthermore, using the Lyapunov identity (2.12) we obtain
$$\begin{aligned}
\int_t^\infty \Gamma^*e^{-(t-\tau)A^*}C^*(W_{0,\mathrm{obs}}x)(\tau)\,d\tau &= \int_t^\infty \Gamma^*e^{-(t-\tau)A^*}C^*C_0e^{\tau A_0}x\,d\tau\\
&= -\int_t^\infty \Gamma^*e^{-(t-\tau)A^*}(A^*Q + QA_0)e^{\tau A_0}x\,d\tau\\
&= -\Gamma^*e^{-tA^*}\Bigl(\int_t^\infty e^{\tau A^*}(A^*Q + QA_0)e^{\tau A_0}\,d\tau\Bigr)x\\
&= -\Gamma^*e^{-tA^*}\Bigl(\int_t^\infty \frac{d}{d\tau}\bigl(e^{\tau A^*}Qe^{\tau A_0}\bigr)\,d\tau\Bigr)x\\
&= -\Gamma^*e^{-tA^*}\bigl(-e^{tA^*}Qe^{tA_0}\bigr)x = \Gamma^*Qe^{tA_0}x.
\end{aligned}$$
Using (2.9) and the second identity in (2.7) we conclude that
$$\begin{aligned}
(T_R W_{0,\mathrm{obs}}x)(t) &= DD^*C_0e^{tA_0}x + \bigl(-Ce^{tA_0}x + Ce^{tA}x\bigr) + \Gamma^*Qe^{tA_0}x\\
&= (DD^*C_0 + \Gamma^*Q)e^{tA_0}x - Ce^{tA_0}x + Ce^{tA}x = Ce^{tA}x.
\end{aligned}$$
This proves $T_R W_{0,\mathrm{obs}} = W_{\mathrm{obs}}$, and hence (2.14) holds. Together (2.13) and (2.14) show that $W_{\mathrm{obs}}^*T_R^{-1}W_{\mathrm{obs}} = W_{\mathrm{obs}}^*W_{0,\mathrm{obs}} = Q$. In particular, the stabilizing solution is uniquely determined by (2.11). □
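As a quick sanity check on the identities above, the scalar toy data below (entirely our own choice, with $n = m = p = 1$) satisfy the Riccati equation (2.6) and the Lyapunov identity (2.12) exactly:

```python
# Scalar illustration of (2.6) and (2.12); all numbers are our own toy choice.
A, B, C, D = -1.0, 1.0, 1.0, 1.0
P = 0.5                          # A P + P A* + B B* = 0, eq. (1.3)
Gamma = B * D + P * C            # = 1.5, eq. (1.5)
# Riccati (2.6) reduces to 2.25 Q^2 - 5 Q + 1 = 0; the stabilizing root is:
Q = 2.0 / 9.0
C0 = (C - Gamma * Q) / (D * D)   # = 2/3, eq. (2.7)
A0 = A - Gamma * C0              # = -2, stable, so Q is indeed stabilizing
lyap = A * Q + Q * A0 + C * C0   # left-hand side of (2.12)
```

For these values `lyap` is exactly zero, confirming that (2.12) is a consequence of (2.6) and (2.7).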
Lemma 2.3. Assume $T_R$ is invertible. Then $I - H_G^*T_R^{-1}H_G$ is positive. Furthermore, the following are equivalent:
(i) $T_G$ is right invertible,
(ii) $I - H_G^*T_R^{-1}H_G$ is strictly positive,
(iii) $I - H_G^*T_R^{-1}H_G$ is invertible.
Proof. Rewriting (2.5) as $T_G T_G^* = T_R - H_G H_G^*$, and multiplying the latter identity from the left and from the right by $T_R^{-1/2}$, shows that
$$T_R^{-1/2}T_G T_G^*T_R^{-1/2} = I - T_R^{-1/2}H_G H_G^*T_R^{-1/2}. \tag{2.15}$$
Hence $I - T_R^{-1/2}H_G H_G^*T_R^{-1/2}$ is positive, which shows that $H_G^*T_R^{-1/2}$ is a contraction. But then $H_G^*T_R^{-1}H_G = \bigl(H_G^*T_R^{-1/2}\bigr)\bigl(H_G^*T_R^{-1/2}\bigr)^*$ is also a contraction, and thus the operator $I - H_G^*T_R^{-1}H_G$ is positive.

Since $I - H_G^*T_R^{-1}H_G$ is positive, the equivalence of items (ii) and (iii) is trivial. Assume (ii) holds. Then $T_R^{-1/2}H_G$ is a strict contraction, and hence the same holds true for $T_R^{-1/2}H_G H_G^*T_R^{-1/2}$. But then $I - T_R^{-1/2}H_G H_G^*T_R^{-1/2}$ is strictly positive, and (2.15) shows that $T_G$ is right invertible. The converse implication is proved in a similar way. □
Corollary 2.4. Assume that $T_R$ is invertible, or equivalently, that there exists a stabilizing solution $Q$ to the algebraic Riccati equation (2.6). Then the spectral radius of $PQ$ is at most one.
Furthermore, the following are equivalent:
(i) $T_G$ is right invertible,
(ii) $r_{\mathrm{spec}}(PQ) < 1$,
(iii) $I_n - PQ$ is invertible.
Proof. Let $W_{\mathrm{con}}$ be the controllability operator mapping $L^2_p(\mathbb{R}^+)$ into $\mathbb{C}^n$ defined by
$$W_{\mathrm{con}}h = \int_0^\infty e^{tA}Bh(t)\,dt, \quad h \in L^2_p(\mathbb{R}^+).$$
Then $P = W_{\mathrm{con}}W_{\mathrm{con}}^*$ and $H_G = W_{\mathrm{obs}}W_{\mathrm{con}}$. Using these two identities and (2.11), we obtain for the spectral radius of $H_G^*T_R^{-1}H_G$ that
$$\begin{aligned}
r_{\mathrm{spec}}(H_G^*T_R^{-1}H_G) &= r_{\mathrm{spec}}(W_{\mathrm{con}}^*W_{\mathrm{obs}}^*T_R^{-1}W_{\mathrm{obs}}W_{\mathrm{con}})\\
&= r_{\mathrm{spec}}(W_{\mathrm{con}}^*QW_{\mathrm{con}})\\
&= r_{\mathrm{spec}}(W_{\mathrm{con}}W_{\mathrm{con}}^*Q) = r_{\mathrm{spec}}(PQ).
\end{aligned} \tag{2.16}$$
By Lemma 2.3 the operator $I - H_G^*T_R^{-1}H_G$ is positive. Hence the spectral radius of $H_G^*T_R^{-1}H_G$ is at most one, and the preceding calculation shows that $r_{\mathrm{spec}}(PQ) \le 1$.

Since $r_{\mathrm{spec}}(PQ) \le 1$, the equivalence of items (ii) and (iii) is trivial. Assume $r_{\mathrm{spec}}(PQ) < 1$. Then (2.16) shows that $I - H_G^*T_R^{-1}H_G$ is invertible, and Lemma 2.3 tells us that $T_G$ is right invertible. To prove the converse implication, assume that $T_G$ is right invertible. Then, by Lemma 2.3, the operator $I - H_G^*T_R^{-1}H_G$ is strictly positive. Hence $r_{\mathrm{spec}}(H_G^*T_R^{-1}H_G) < 1$, and (2.16) shows that $r_{\mathrm{spec}}(PQ) < 1$. □
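The chain of equalities in (2.16) rests on the general fact that $r_{\mathrm{spec}}(XY) = r_{\mathrm{spec}}(YX)$. A minimal numerical illustration with random matrices (our own example, not from the paper):

```python
import numpy as np

# XY and YX always have the same nonzero spectrum, hence the same spectral radius.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
Y = rng.standard_normal((4, 4))
r_xy = max(abs(np.linalg.eigvals(X @ Y)))  # spectral radius of XY
r_yx = max(abs(np.linalg.eigvals(Y @ X)))  # spectral radius of YX
```

The two radii agree up to rounding, which is the fact used to pass from $W_{\mathrm{con}}^*QW_{\mathrm{con}}$ to $PQ$.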
Necessity of the conditions 1, 2, 3 in Theorem 1.1. Assume that the equation $G(s)X(s) = I_m$ has a stable rational matrix solution $X$. As was shown in the paragraph preceding Theorem 1.1, this implies that $D$ is right invertible. Thus condition 1 is necessary. Furthermore, in the paragraph directly after (2.5) it was shown that if $G(s)X(s) = I_m$ has a stable rational matrix solution, then $T_G$ is right invertible and $T_R$ is invertible. Given the latter we can apply Proposition 2.1 to show that condition 2 is necessary. Finally, using Corollary 2.4, we see that right invertibility of $T_G$ and invertibility of $T_R$ imply that $I_n - PQ$ is invertible, which shows that condition 3 is necessary. □
Comment. The identities appearing in this section can also be used to give an alternative formula for the function $Y$ in (1.9), namely
$$Y(s) = I_p - \int_0^\infty e^{-st}y(t)\,dt, \quad \Re s \ge 0, \quad \text{where } y = T_G^*(T_G T_G^*)^{-1}g. \tag{2.17}$$
This formula also makes sense in a Wiener space setting. From formula (2.17) for $Y$ it follows that $T_G y = g$, which immediately implies that $G(s)Y(s) = D$. The latter identity will be derived in the next section (see the second paragraph of the proof of Theorem 1.1) using state space computations. We plan to prove (2.17) in a future paper.
3. Proof of the two main theorems
It will be convenient first to prove the two identities given in the following lemma.
Lemma 3.1. Assume conditions 1, 2, 3 in Theorem 1.1 are satisfied. Then
$$BC_1 = A(I_n - PQ) - (I_n - PQ)A_0, \tag{3.1}$$
$$DC_1 = C(I_n - PQ). \tag{3.2}$$

Proof. Recall that $C_1$ and $C_0$ are respectively defined in (1.8) and (2.7). This implies that
$$C_1 = D^*C_0 + B^*Q. \tag{3.3}$$
To prove the first identity, we use the Lyapunov equation (2.12) with $\Gamma$ defined in (1.5) to compute
$$\begin{aligned}
BC_1 &= BD^*C_0 + BB^*Q = (\Gamma - PC^*)C_0 - (AP + PA^*)Q\\
&= \Gamma C_0 + PA^*Q + PQA_0 - APQ - PA^*Q\\
&= \Gamma C_0 + PQA_0 - APQ = A - A_0 + PQA_0 - APQ\\
&= A(I_n - PQ) - (I_n - PQ)A_0.
\end{aligned}$$
The second identity follows from
$$DC_1 = C - \Gamma^*Q + DB^*Q = C - DB^*Q - CPQ + DB^*Q = C(I_n - PQ).$$
Thus both identities are proved. □

Proof of Theorem 1.1. In the previous section we have seen that the conditions 1, 2, 3 in Theorem 1.1 are necessary. Therefore in what follows we assume these three conditions are fulfilled. The latter allows us to introduce the $p\times p$ rational matrix function
$$Y(s) = I_p - C_1(sI_n - A_0)^{-1}(I_n - PQ)^{-1}B. \tag{3.4}$$
Note that $Y$ is stable, because the matrix $A_0$ given by (1.6) is stable. The latter follows from the fact that condition 2 is satisfied. We claim that
$$Y(s)^{-1} = I_p + C_1(I_n - PQ)^{-1}(sI_n - A)^{-1}B. \tag{3.5}$$
Since $A$ is stable, we see that $Y$ is invertible outer. To prove (3.5), we use (3.1). Indeed, using (3.1), we obtain
$$A_0 + (I_n - PQ)^{-1}BC_1 = A_0 + (I_n - PQ)^{-1}A(I_n - PQ) - A_0 = (I_n - PQ)^{-1}A(I_n - PQ). \tag{3.6}$$
Recall that the inverse of $I_p - K(sI_n - \alpha)^{-1}\beta$ is given by the state space realization $I_p + K\bigl(sI_n - (\alpha + \beta K)\bigr)^{-1}\beta$. Using this for the state space realization of $Y$ in (3.4), together with (3.6), we obtain
$$\begin{aligned}
Y(s)^{-1} &= I_p + C_1\bigl(sI_n - (I_n - PQ)^{-1}A(I_n - PQ)\bigr)^{-1}(I_n - PQ)^{-1}B\\
&= I_p + C_1(I_n - PQ)^{-1}(sI_n - A)^{-1}B.
\end{aligned}$$
Hence the inverse of $Y(s)$ is given by (3.5). In particular, $Y$ is an invertible outer function.

Next we show that $G(s)Y(s) = D$. To do this we use (3.2) together with the state space formula for $Y(s)^{-1}$ in (3.5), to obtain
$$DY(s)^{-1} = D + DC_1(I_n - PQ)^{-1}(sI_n - A)^{-1}B = D + C(sI_n - A)^{-1}B = G(s).$$
In other words, $G(s) = DY(s)^{-1}$. Multiplying the latter identity from the right by $Y(s)$ we obtain $G(s)Y(s) = D$.

Finally, by comparing (1.7) and (3.4), we see that $\Xi(s) = Y(s)D^*(DD^*)^{-1}$. It follows that
$$G(s)\Xi(s) = G(s)Y(s)D^*(DD^*)^{-1} = DD^*(DD^*)^{-1} = I_m.$$
This completes the proof of Theorem 1.1. □
Proof of Theorem 1.2. Given the above proof of Theorem 1.1, it remains to prove items (i)–(iv) in Theorem 1.2. We do this in four steps.
Step 1. First we show that $\Theta$ is inner. To do this, recall that
$$\Theta(s) = E + C_1(sI_n - A_0)^{-1}B_E, \quad \text{where } B_E = -(I_n - PQ)^{-1}BE. \tag{3.7}$$
We shall make use of the following Lyapunov equation:
$$A_0^*(Q - QPQ) + (Q - QPQ)A_0 + C_1^*C_1 = 0. \tag{3.8}$$
To see this, notice that (3.1), (3.2) and (3.3) with (2.7) and (2.12) yield
$$\begin{aligned}
C_1^*C_1 &= (C_0^*D + QB)C_1\\
&= C_0^*C(I_n - PQ) + QA(I_n - PQ) - Q(I_n - PQ)A_0\\
&= -(QA + A_0^*Q)(I_n - PQ) + QA(I_n - PQ) - Q(I_n - PQ)A_0\\
&= -A_0^*(Q - QPQ) - (Q - QPQ)A_0.
\end{aligned}$$
Therefore (3.8) holds. The Lyapunov equation in (3.8) also yields
$$-(sI_n + A_0^*)^{-1}C_1^*C_1(sI_n - A_0)^{-1} = (Q - QPQ)(sI_n - A_0)^{-1} - (sI_n + A_0^*)^{-1}(Q - QPQ). \tag{3.9}$$
To see this, simply multiply the previous equation by $sI_n + A_0^*$ on the left and by $sI_n - A_0$ on the right.
To show that $\Theta$ is an inner function, notice that (3.9) gives
$$\begin{aligned}
\Theta(-\bar{s})^*\Theta(s) &= \bigl(E^* - B_E^*(sI_n + A_0^*)^{-1}C_1^*\bigr)\bigl(E + C_1(sI_n - A_0)^{-1}B_E\bigr)\\
&= I_{p-m} + E^*C_1(sI_n - A_0)^{-1}B_E - B_E^*(sI_n + A_0^*)^{-1}C_1^*E\\
&\quad - B_E^*(sI_n + A_0^*)^{-1}C_1^*C_1(sI_n - A_0)^{-1}B_E\\
&= I_{p-m} + E^*C_1(sI_n - A_0)^{-1}B_E - B_E^*(sI_n + A_0^*)^{-1}C_1^*E\\
&\quad + B_E^*(Q - QPQ)(sI_n - A_0)^{-1}B_E - B_E^*(sI_n + A_0^*)^{-1}(Q - QPQ)B_E\\
&= I_{p-m} + \bigl(B_E^*(Q - QPQ) + E^*C_1\bigr)(sI_n - A_0)^{-1}B_E\\
&\quad - B_E^*(sI_n + A_0^*)^{-1}\bigl(C_1^*E + (Q - QPQ)B_E\bigr) = I_{p-m}.
\end{aligned}$$
The last equality follows from the fact that
$$B_E^*(Q - QPQ) + E^*C_1 = 0. \tag{3.10}$$
To verify this, observe that
$$B_E^*(Q - QPQ) = -E^*B^*(I_n - QP)^{-1}(Q - QPQ) = -E^*B^*Q,$$
$$E^*C_1 = E^*(D^*C_0 + B^*Q) = E^*B^*Q.$$
Hence $B_E^*(Q - QPQ) + E^*C_1 = 0$. Therefore $\Theta(s)$ is an inner function.
Step 2. It will be convenient first to prove item (iv). Take $s$ in the right half-plane, i.e., $\Re s \ge 0$. Using the definition of $Y$ in (1.9) and the identities (1.7) and (1.11), we see that $\Xi(s) = Y(s)D^+$ and $\Theta(s) = Y(s)E$, and hence
$$\begin{bmatrix} \Xi(s) & \Theta(s) \end{bmatrix} = Y(s)\begin{bmatrix} D^+ & E \end{bmatrix}.$$
Next observe that the $p\times p$ matrix $\begin{bmatrix} D^+ & E \end{bmatrix}$ is invertible, and
$$\begin{bmatrix} D^+ & E \end{bmatrix}\begin{bmatrix} D \\ E^* \end{bmatrix} = I_p.$$
Thus $\begin{bmatrix} \Xi(s) & \Theta(s) \end{bmatrix}$ is invertible, and
$$\begin{bmatrix} \Xi(s) & \Theta(s) \end{bmatrix}^{-1} = \begin{bmatrix} D \\ E^* \end{bmatrix}Y(s)^{-1} = \begin{bmatrix} G(s) \\ E^*Y(s)^{-1} \end{bmatrix}.$$
This proves (1.14). Since the function $\begin{bmatrix} \Xi & \Theta \end{bmatrix}$ is a stable rational $p\times p$ matrix function, we see that the function defined by (1.13) is invertible outer.
Step 3. In this part we prove item (iii). Let $X$ be given by (1.12), in other words $X(s) = Y(s)\bigl(D^*(DD^*)^{-1} + EZ(s)\bigr)$, where $Z$ is an arbitrary stable rational matrix function of size $(p-m)\times m$. Since $G(s)Y(s) = D$, we see that $G(s)X(s) = DD^*(DD^*)^{-1} + DEZ(s)$. But $DD^*(DD^*)^{-1} = I_m$ and $DE = 0$. We conclude that $G(s)X(s) = I_m$, as desired.

Next we deal with the reverse implication. Let $X$ be any stable rational $p\times m$ matrix function satisfying the equation $G(s)X(s) = I_m$. Put $H = X - \Xi$. Then $H$ is a rational matrix-valued function, and $G(s)H(s) = 0$. Notice that
$E^*Y(s)^{-1}\Xi(s) = 0$. Using item (iv) we obtain
$$\begin{aligned}
H(s) &= \begin{bmatrix} \Xi(s) & \Theta(s) \end{bmatrix}\begin{bmatrix} G(s) \\ E^*Y(s)^{-1} \end{bmatrix}H(s)\\
&= \begin{bmatrix} \Xi(s) & \Theta(s) \end{bmatrix}\begin{bmatrix} 0 \\ E^*Y(s)^{-1}H(s) \end{bmatrix} = \Theta(s)E^*Y(s)^{-1}H(s) = \Theta(s)E^*Y(s)^{-1}X(s).
\end{aligned} \tag{3.11}$$
Thus $H(s) = \Theta(s)Z(s)$, where $Z(s) = E^*Y(s)^{-1}X(s)$. Since $Y$ is invertible outer, the inverse $Y(\cdot)^{-1}$ is a stable rational $p\times p$ matrix function. Thus $Z$ is a stable rational matrix function of size $(p-m)\times m$, and $X$ has the desired representation (1.12).
Step 4. We prove item (ii). Let $h$ be any $\mathbb{C}^p$-valued $H^2$ function such that $G(s)h(s) = 0$ for $\Re s > 0$. Repeating the first three identities in (3.11) with $h$ in place of $H$, we see that $h(s) = \Theta(s)u(s)$, where $u(s) = E^*Y(s)^{-1}h(s)$. Since $Y$ is invertible outer, the entries of $Y(\cdot)^{-1}$ are $H^\infty$ functions. Hence the entries of $u$ are $H^2$ functions. Furthermore, using the fact that $\Theta$ is inner, we see that $u(s) = \Theta(-\bar{s})^*h(s)$ for $\Re s > 0$. □
To complete this section, let us establish the following identity, which will be useful in the next section:
$$\Theta(-\bar{s})^*\Xi(s) = -B_E^*(sI_n + A_0^*)^{-1}C_0^*. \tag{3.12}$$
For convenience, let us set $B_1 = -(I_n - PQ)^{-1}BD^+$. Then (3.12) follows from (3.3), (3.9) and (3.10), that is,
$$\begin{aligned}
\Theta(-\bar{s})^*\Xi(s) &= \bigl(E^* - B_E^*(sI_n + A_0^*)^{-1}C_1^*\bigr)\bigl(D^+ + C_1(sI_n - A_0)^{-1}B_1\bigr)\\
&= E^*C_1(sI_n - A_0)^{-1}B_1 - B_E^*(sI_n + A_0^*)^{-1}C_1^*D^+\\
&\quad - B_E^*(sI_n + A_0^*)^{-1}C_1^*C_1(sI_n - A_0)^{-1}B_1\\
&= \bigl(E^*C_1 + B_E^*(Q - QPQ)\bigr)(sI_n - A_0)^{-1}B_1\\
&\quad - B_E^*(sI_n + A_0^*)^{-1}\bigl(C_1^*D^+ + (Q - QPQ)B_1\bigr)\\
&= -B_E^*(sI_n + A_0^*)^{-1}\bigl(C_1^*D^+ + (Q - QPQ)B_1\bigr)\\
&= -B_E^*(sI_n + A_0^*)^{-1}(C_1^* - QB)D^*(DD^*)^{-1} = -B_E^*(sI_n + A_0^*)^{-1}C_0^*.
\end{aligned}$$
This establishes (3.12).
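The identity (3.12) just established, together with the inner property of $\Theta$ and $G(s)\Theta(s) = 0$, can be verified numerically. The sketch below uses toy data of our own choosing ($n = m = 1$, $p = 2$), for which the stabilizing solution of (2.6) is $Q = (3 - \sqrt{5})/4$:

```python
import numpy as np

# Toy data, our own choice: n = m = 1, p = 2, so dim Ker D = 1.
A = np.array([[-1.0]]); B = np.array([[1.0, 1.0]])
C = np.array([[1.0]]);  D = np.array([[1.0, 0.0]])
E = np.array([[0.0], [1.0]])                  # isometry with Im E = Ker D

P = np.array([[1.0]])                         # A P + P A* + B B* = 0
Gamma = B @ D.T + P @ C.T                     # = 2
Q = np.array([[(3.0 - np.sqrt(5.0)) / 4.0]])  # stabilizing root of 4Q^2 - 6Q + 1 = 0
C0 = C - Gamma.T @ Q                          # here (DD*)^{-1} = 1
A0 = A - Gamma @ C0
C1 = D.T @ C0 + B.T @ Q                       # eq. (1.8)
M = np.linalg.inv(np.eye(1) - P @ Q)
BE = -M @ B @ E                               # B_E from (3.7)
B1 = -M @ B @ D.T                             # B_1 (here D^+ = D^T)

def Theta(s):
    return E + C1 @ np.linalg.inv(s * np.eye(1) - A0) @ BE

def Xi(s):
    return D.T + C1 @ np.linalg.inv(s * np.eye(1) - A0) @ B1

def G(s):
    return D + C @ np.linalg.inv(s * np.eye(1) - A) @ B

s = 0.7 + 0.3j
lhs = Theta(-np.conj(s)).conj().T @ Xi(s)     # left-hand side of (3.12)
rhs = -BE.conj().T @ np.linalg.inv(s * np.eye(1) + A0.conj().T) @ C0.conj().T
inner = Theta(-np.conj(s)).conj().T @ Theta(s)   # should equal I_{p-m} = [[1]]
kernel = G(s) @ Theta(s)                         # should vanish
```

All three checks (the identity (3.12), the inner property, and $G\Theta = 0$) hold up to rounding for this data.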
4. The minimization problem
Throughout this section $G$ is a stable rational $m\times p$ matrix function, and we assume that $G$ is given by the stable state space representation (1.2). We also assume that $T_G$ is right invertible.

For each $\gamma > 0$ let $w_\gamma$ be the scalar weight function given by $w_\gamma(s) = (s+\gamma)^{-1}$. Note that for each $X \in H^\infty_{p\times m}$ the function $w_\gamma X$ belongs to $H^2_{p\times m}$. With $G$ and
the weight function $w_\gamma$ we associate the following minimization problem:
$$\inf\bigl\{\|w_\gamma X\|_2 \;\big|\; G(s)X(s) = I_m \ (\Re s > 0) \text{ and } X \in H^\infty_{p\times m}\bigr\}. \tag{4.1}$$
The problem is to check whether or not the infimum is a minimum, and if so, to find a minimizing function. We shall show in this section that such a minimizing $X$ exists and is unique. In what follows this minimizing function will be denoted by $\Xi_\gamma$. The next theorem shows that $\Xi_\gamma$ is a stable rational matrix function and provides an explicit formula for $\Xi_\gamma$.
Theorem 4.1. For each $\gamma > 0$ there is a unique solution to the optimization problem (4.1), and this solution is given by
$$\Xi_\gamma(s) = \Xi(s) - \Theta(s)B_E^*(\gamma I_n - A_0^*)^{-1}C_0^*, \quad \Re s > 0. \tag{4.2}$$
Here $\Xi$ and $\Theta$ are the rational matrix functions given by (1.7) and (1.11), respectively, the matrix $A_0$ is defined by (1.6) and $C_0$ by (2.7). In particular, we have $\Xi(s) = \lim_{\gamma\to\infty}\Xi_\gamma(s)$.
Proof. Fix $\gamma > 0$. Since for each $X \in H^\infty_{p\times m}$ the function $w_\gamma X$ belongs to $H^2_{p\times m}$, we have
$$\|w_\gamma\Xi_\gamma\|_2 = \inf\bigl\{\|w_\gamma X\|_2 \;\big|\; w_\gamma GX = w_\gamma I_m \text{ and } X \in H^\infty_{p\times m}\bigr\} \tag{4.3}$$
$$\ge \inf\bigl\{\|V\|_2 \;\big|\; GV = w_\gamma I_m \text{ and } V \in H^2_{p\times m}\bigr\} = \|V_\gamma\|_2. \tag{4.4}$$
The last optimization problem is a least squares optimization problem. So the optimal solution $V_\gamma$ for the problem (4.4) is unique. We first derive a formula for $V_\gamma$.

From item (ii) in Theorem 1.2 we know that $\operatorname{Ker} M_G = \operatorname{Im} M_\Theta$. By taking the Fourier transform, we see that $V_\gamma$ is the unique matrix function in $H^2_{p\times m}$ such that $GV_\gamma = w_\gamma I_m$ and $V_\gamma$ is orthogonal to $\Theta H^2_{(p-m)\times m}$. Using $G\Xi = I_m$, we obtain that all $H^2$ solutions to $GV = w_\gamma I_m$ are given by
$$V = w_\gamma\Xi + \Theta H^2_{(p-m)\times m}.$$
So we are looking for an $H^2$ function $V_\gamma$ such that
$$V_\gamma = w_\gamma\Xi + \Theta F \quad \text{and} \quad V_\gamma \perp \Theta H^2_{(p-m)\times m},$$
where $F$ is a matrix function in $H^2_{(p-m)\times m}$. By exploiting that $\Theta$ is inner, we obtain
$$w_\gamma\Theta^*\Xi + F \perp H^2_{(p-m)\times m}.$$
But then (3.12) tells us that the latter is equivalent to
$$-w_\gamma B_E^*(sI_n + A_0^*)^{-1}C_0^* + F \perp H^2_{(p-m)\times m}. \tag{4.5}$$
However, $-w_\gamma(s)B_E^*(sI_n + A_0^*)^{-1}C_0^*$ admits a partial fraction expansion of the form
$$-w_\gamma(s)B_E^*(sI_n + A_0^*)^{-1}C_0^* = (s+\gamma)^{-1}B_E^*(\gamma I_n - A_0^*)^{-1}C_0^* + \Omega(s),$$
where $\Omega$ is a rational function with all its poles in the open right half-plane, and hence orthogonal to $H^2_{(p-m)\times m}$. Using this in the orthogonality relation (4.5), we see that $F(s) = -w_\gamma(s)B_E^*(\gamma I_n - A_0^*)^{-1}C_0^*$. In other words,
$$V_\gamma(s) = w_\gamma(s)\Xi(s) - w_\gamma(s)\Theta(s)B_E^*(\gamma I_n - A_0^*)^{-1}C_0^*. \tag{4.6}$$
Next put $\Xi_\gamma(s) = (s+\gamma)V_\gamma(s)$. Then (4.6) implies that $\Xi_\gamma$ is given by (4.2). Hence $\Xi_\gamma$ is a stable rational matrix function. In particular, $\Xi_\gamma$ belongs to $H^\infty_{p\times m}$. Furthermore, it follows that the inequality in (4.4) is an equality. We conclude that $\Xi_\gamma$ given by (4.2) is the unique solution to the minimization problem (4.1). □
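Theorem 4.1 can be illustrated numerically: since $G\Theta = 0$, each $\Xi_\gamma$ in (4.2) is again a solution of the Bezout equation, and $\Xi_\gamma(s) \to \Xi(s)$ as $\gamma \to \infty$. The toy data below are our own choice ($n = m = 1$, $p = 2$), not taken from the paper:

```python
import numpy as np

# Toy data (our own): n = m = 1, p = 2.
A = np.array([[-1.0]]); B = np.array([[1.0, 1.0]])
C = np.array([[1.0]]);  D = np.array([[1.0, 0.0]])
E = np.array([[0.0], [1.0]])
P = np.array([[1.0]])
Q = np.array([[(3.0 - np.sqrt(5.0)) / 4.0]])  # stabilizing Riccati solution
C0 = C - (B @ D.T + P @ C.T).T @ Q
A0 = A - (B @ D.T + P @ C.T) @ C0
C1 = D.T @ C0 + B.T @ Q
M = np.linalg.inv(np.eye(1) - P @ Q)
BE = -M @ B @ E
B1 = -M @ B @ D.T

def G(s):     return D + C @ np.linalg.inv(s * np.eye(1) - A) @ B
def Xi(s):    return D.T + C1 @ np.linalg.inv(s * np.eye(1) - A0) @ B1
def Theta(s): return E + C1 @ np.linalg.inv(s * np.eye(1) - A0) @ BE

def Xi_gamma(s, gamma):                        # eq. (4.2)
    corr = BE.conj().T @ np.linalg.inv(gamma * np.eye(1) - A0.conj().T) @ C0.conj().T
    return Xi(s) - Theta(s) @ corr

s = 1.0
res_bezout = abs(G(s) @ Xi_gamma(s, 3.0) - np.eye(1)).max()  # still a solution
gap = abs(Xi_gamma(s, 1e8) - Xi(s)).max()                    # -> Xi as gamma -> oo
```

Both the Bezout residual and the gap to $\Xi$ are negligible, in line with the theorem.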
References

[1] M. Bakonyi and H.J. Woerdeman, Matrix completions, moments, and sums of Hermitian squares, Princeton University Press, Princeton, 2010.
[2] H. Bart, I. Gohberg, M.A. Kaashoek, and A.C.M. Ran, Factorization of matrix and operator functions: the state space method, OT 178, Birkhäuser Verlag, Basel, 2008.
[3] H. Bart, I. Gohberg, M.A. Kaashoek, and A.C.M. Ran, A state space approach to canonical factorization: convolution equations and mathematical systems, OT 200, Birkhäuser Verlag, Basel, 2010.
[4] L. Carleson, Interpolation by bounded analytic functions and the corona problem, Ann. of Math. 76 (1962), 547–559.
[5] A.E. Frazho, M.A. Kaashoek, and A.C.M. Ran, The non-symmetric discrete algebraic Riccati equation and canonical factorization of rational matrix functions on the unit circle, Integral Equations and Operator Theory 66 (2010), 215–229.
[6] A.E. Frazho, M.A. Kaashoek, and A.C.M. Ran, Right invertible multiplication operators and stable rational matrix solutions to an associate Bezout equation, I: The least squares solution, Integral Equations and Operator Theory 70 (2011), 395–418.
[7] C. Foias, A. Frazho, I. Gohberg, and M.A. Kaashoek, Metric constrained interpolation, commutant lifting and systems, Birkhäuser Verlag, Basel, 1998.
[8] A.E. Frazho, M.A. Kaashoek, and A.C.M. Ran, Right invertible multiplication operators and stable rational matrix solutions to an associate Bezout equation, II: Description of all solutions, Operators and Matrices 6 (2012), 833–857.
[9] P. Fuhrmann, On the corona theorem and its applications to spectral problems in Hilbert space, Trans. Amer. Math. Soc. 132 (1968), 55–66.
[10] P.A. Fuhrmann and R. Ober, On coprime factorizations, in: Operator Theory and its Applications. The Tsuyoshi Ando Anniversary Volume, OT 62, Birkhäuser Verlag, Basel, 1993, pp. 39–75.
[11] T. Georgiou and M.C. Smith, Optimal robustness in the gap metric, IEEE Trans. Automatic Control 35 (1990), 673–686.
[12] I. Gohberg, S. Goldberg, and M.A. Kaashoek, Classes of Linear Operators, Volume I, OT 49, Birkhäuser Verlag, Basel, 1990.
[13] G. Gu and E.F. Badran, Optimal design for channel equalization via the filterbank approach, IEEE Trans. Signal Proc. 52 (2004), 536–545.
[14] G. Gu and L. Li, Worst-case design for optimal channel equalization in filterbank transceivers, IEEE Trans. Signal Proc. 51 (2003), 2424–2435.
[15] J.W. Helton, Operator theory, analytic functions, matrices and electrical engineering, Regional Conference Series in Mathematics 68, Amer. Math. Soc., Providence, RI, 1987.
[16] N.K. Nikol'skii, Treatise on the shift operator, Grundlehren 273, Springer Verlag, Berlin, 1986.
[17] V.V. Peller, Hankel Operators and their Applications, Springer Monographs in Mathematics, Springer, 2003.
[18] V.A. Tolokonnikov, Estimates in Carleson's corona theorem. Ideals of the algebra H^∞, the problem of Szőkefalvi-Nagy, Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI) 113 (1981), 178–198 (Russian).
[19] S. Treil, Lower bounds in the matrix corona theorem and the codimension one conjecture, GAFA 14 (2004), 1118–1133.
[20] S. Treil and B.D. Wick, The matrix-valued H^p corona problem in the disk and polydisk, J. Funct. Anal. 226 (2005), 138–172.
[21] T.T. Trent and X. Zhang, A matricial corona theorem, Proc. Amer. Math. Soc. 134 (2006), 2549–2558.
[22] T.T. Trent, An algorithm for the corona solutions on H^∞(D), Integral Equations and Operator Theory 59 (2007), 421–435.
[23] M. Vidyasagar, Control system synthesis: a factorization approach, The MIT Press, Cambridge, MA, 1985.
A.E. Frazho
Department of Aeronautics and Astronautics
Purdue University
West Lafayette, IN 47907, USA
e-mail: [email protected]

M.A. Kaashoek
Department of Mathematics, Faculty of Sciences
VU University
De Boelelaan 1081a
NL-1081 HV Amsterdam, The Netherlands
e-mail: [email protected]

A.C.M. Ran
Department of Mathematics, Faculty of Sciences
VU University
De Boelelaan 1081a
NL-1081 HV Amsterdam, The Netherlands

and

Unit for BMI, North-West University
Potchefstroom, South Africa
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 161–187
© 2013 Springer Basel

Inverting Structured Operators Related to Toeplitz Plus Hankel Operators
M.A. Kaashoek and F. van Schagen
Dedicated to our dear friend Leonia Lerer, on the occasion of his 70th birthday.
Abstract. In this paper the Ellis–Gohberg–Lay theorem on inversion of certain Toeplitz plus Hankel operators is derived as a corollary of an abstract inversion theorem for a certain class of structured operators. The main results also cover the inversion theorems considered in [6].
Mathematics Subject Classification (2010). Primary 47A62, 47B35; Secondary 47A50, 15A09, 65F05.
Keywords. Inversion, structured operators, structured matrices, Toeplitz operators, Hankel operators, Stein equation.
1. Introduction
To introduce the main results of this paper we first present an example. Let $g$ and $h$ be matrix functions of sizes $p\times q$ and $q\times p$, respectively, with entries in the Wiener algebra on the unit circle (see Section XXIX.2 in [7]). With $g$ and $h$ we associate Hankel operators $G$ and $H$, where $G : \ell^2_+(\mathbb{C}^q) \to \ell^2_+(\mathbb{C}^p)$ and $H : \ell^2_+(\mathbb{C}^p) \to \ell^2_+(\mathbb{C}^q)$, as follows:
$$G = \begin{bmatrix} g_0 & g_1 & g_2 & \cdots \\ g_1 & g_2 & g_3 & \cdots \\ g_2 & g_3 & g_4 \\ \vdots & \vdots & & \ddots \end{bmatrix} \quad \text{and} \quad H = \begin{bmatrix} h_0 & h_1 & h_2 & \cdots \\ h_1 & h_2 & h_3 & \cdots \\ h_2 & h_3 & h_4 \\ \vdots & \vdots & & \ddots \end{bmatrix}. \tag{1.1}$$
Here $g_j$ and $h_j$, $j = 0, 1, 2, \ldots$, are the Fourier coefficients corresponding to the analytic parts of $g$ and $h$, respectively. Furthermore, we have two invertible block Toeplitz operators $T$ and $S$ acting on $\ell^2_+(\mathbb{C}^p)$ and $\ell^2_+(\mathbb{C}^q)$, respectively. As for $G$ and $H$, the entries of the matrix functions defining $T$ and $S$ belong to the Wiener
algebra on the unit circle. We are interested in inverting the operator $T - GS^{-1}H$. For this purpose we need linear maps
$$X_1, X_2 : \mathbb{C}^p \to \ell^1_+(\mathbb{C}^p), \qquad Y_1, Y_2 : \mathbb{C}^q \to \ell^1_+(\mathbb{C}^q) \tag{1.2}$$
satisfying the equations
$$(T - GS^{-1}H)X_1 = E_p, \qquad (S - HT^{-1}G)Y_1 = E_q, \tag{1.3}$$
$$X_2^*(T - GS^{-1}H) = E_p^*, \qquad Y_2^*(S - HT^{-1}G) = E_q^*. \tag{1.4}$$
Here for each positive integer $n$ the symbol $E_n$ denotes the canonical embedding of $\mathbb{C}^n$ into $\ell^2_+(\mathbb{C}^n)$, that is, $E_nu = \operatorname{col}[\delta_{j,0}u]_{j=0}^\infty$, where $u \in \mathbb{C}^n$ and $\delta_{j,k}$ is the Kronecker delta. Note that $E_nE_n^* = I - S_nS_n^*$, where $S_n$ is the forward shift on $\ell^2_+(\mathbb{C}^n)$. Since $\ell^1_+(\mathbb{C}^p)$ and $\ell^1_+(\mathbb{C}^q)$ are contained in $\ell^2_+(\mathbb{C}^p)$ and $\ell^2_+(\mathbb{C}^q)$, respectively, the products $x_{j0} = E_p^*X_j$ and $y_{j0} = E_q^*Y_j$, $j = 1, 2$, are well defined.

For any linear map $x$ from $\mathbb{C}^n$ into $\ell^1_+(\mathbb{C}^n)$ we denote by $T_x$ the block lower triangular Toeplitz operator acting on $\ell^2_+(\mathbb{C}^n)$ whose first column is equal to $x$, and for an $n\times n$ matrix $u$ the symbol $\Delta_u$ denotes the block diagonal operator on $\ell^2_+(\mathbb{C}^n)$ with all diagonal entries equal to $u$.
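The identity $E_nE_n^* = I - S_nS_n^*$ can be made concrete on a finite truncation of $\ell^2_+(\mathbb{C}^n)$. In the sketch below (block size and truncation length are our own choice) $E_n$ and the forward block shift $S_n$ are represented as matrices, and the identity holds exactly on the truncation:

```python
import numpy as np

n, N = 2, 5                        # block size n, truncation to N block coordinates
# Canonical embedding E_n: C^n -> first block coordinate of the truncation.
E = np.zeros((n * N, n))
E[:n, :] = np.eye(n)
# Forward block shift S_n on the truncation: block j -> block j + 1.
S = np.zeros((n * N, n * N))
for j in range(N - 1):
    S[(j + 1) * n:(j + 2) * n, j * n:(j + 1) * n] = np.eye(n)

lhs = E @ E.T                      # orthogonal projection onto the first block
rhs = np.eye(n * N) - S @ S.T      # I - S_n S_n^* on the truncation
```

Both sides equal the orthogonal projection onto the zeroth block coordinate.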
Theorem 1.1. Assume there exist linear maps ð1, ð2 and ð1, ð2 as in (1.2) satisfy-ing equations (1.3) and (1.4). Then ð10 = ð
â20 and ð10 = ð
â20. Furthermore, assume
that at least one of the matrices ð10 and ð10 is invertible. Then both the matricesð10 and ð10 are invertible, and the operators ð â ðºð â1ð» and ð â ð»ð â1ðº areinvertible. Moreover,
(ð âðºð â1ð»)â1 = ðð1Îðâ110ð âð2
â ðððð1Îðâ110ð âð2ð
âð , (1.5)
(ð âð»ð â1ðº)â1 = ðð1Îðâ110ð âð2
â ðððð1Îðâ110ð âð2ð
âð . (1.6)
Here
ð1 = âð â1ðºð1, ð1 = âð â1ð»ð1, ðâ2 = âðâ2ð»ð â1, ðâ2 = âðâ2ðºð â1. (1.7)
Note that invertibility of $T-GR^{-1}H$ (or of $R-HT^{-1}G$) is equivalent to invertibility of the operator $\mathbf{T}$ given by
\[ \mathbf{T}=\begin{bmatrix} T & G\\ H & R \end{bmatrix}:
\begin{bmatrix}\ell^2_+(\mathbb{C}^p)\\ \ell^2_+(\mathbb{C}^q)\end{bmatrix}\to\begin{bmatrix}\ell^2_+(\mathbb{C}^p)\\ \ell^2_+(\mathbb{C}^q)\end{bmatrix}. \tag{1.8} \]
Moreover, in that case
\[ \mathbf{T}^{-1}=\begin{bmatrix} (T-GR^{-1}H)^{-1} & -T^{-1}G(R-HT^{-1}G)^{-1}\\ -R^{-1}H(T-GR^{-1}H)^{-1} & (R-HT^{-1}G)^{-1}\end{bmatrix}. \]
Using this connection one sees that for the selfadjoint case the above theorem is equivalent to Theorem 3.1 in [5], which is an infinite-dimensional generalization of the Gohberg--Heinig inversion theorem from [8]. In a somewhat less explicit form the above theorem also appears in Section 5 of [6].

With the term "Toeplitz plus Hankel operator" used in the title we have in mind operators of the type (1.8).
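The $2\times 2$ block inversion formula above is a standard Schur-complement identity and can be checked numerically. The sketch below uses random matrices standing in for $T$, $G$, $H$, $R$; it is an illustration only, not part of the paper's argument.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Random blocks standing in for T, G, H, R; T and R are made
# diagonally dominant so that all the inverses below exist.
T = rng.standard_normal((n, n)) + 10 * np.eye(n)
R = rng.standard_normal((n, n)) + 10 * np.eye(n)
G = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

Ti, Ri = np.linalg.inv(T), np.linalg.inv(R)
X = np.linalg.inv(T - G @ Ri @ H)   # (T - G R^{-1} H)^{-1}
Y = np.linalg.inv(R - H @ Ti @ G)   # (R - H T^{-1} G)^{-1}

# The claimed inverse of the 2x2 block operator
TT = np.block([[T, G], [H, R]])
TTinv = np.block([[X, -Ti @ G @ Y], [-Ri @ H @ X, Y]])
assert np.allclose(np.linalg.inv(TT), TTinv)
print("2x2 block inversion formula verified")
```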
Inverting Structured Operators 163
In the present paper we put Theorem 1.1 in a more general setting. More precisely, we derive Theorem 1.1 as a corollary of the two abstract inversion theorems presented below, the alternative versions of these theorems arising from Remark 1.4, and the auxiliary result Lemma 2.2.

To state our main results we need some additional notation. Consider the Hilbert space operators
\[ A_1:\mathcal{X}_1\to\mathcal{X}_1,\quad B_1:\mathcal{U}\to\mathcal{X}_1,\quad C_1:\mathcal{X}_1\to\mathcal{Y}, \tag{1.9} \]
\[ A_2:\mathcal{X}_2\to\mathcal{X}_2,\quad B_2:\mathcal{Y}\to\mathcal{X}_2,\quad C_2:\mathcal{X}_2\to\mathcal{U}. \tag{1.10} \]
Throughout we assume that $Z:\mathcal{X}_2\to\mathcal{X}_1$ and $W:\mathcal{X}_1\to\mathcal{X}_2$ are operators satisfying the following Stein equations:
\[ Z-A_1ZA_2=B_1C_2, \qquad W-A_2WA_1=B_2C_1. \tag{1.11} \]
If the identities in (1.11) are satisfied, we shall refer to the set of operators $(A_1,B_1,C_1;A_2,B_2,C_2)$ as a data set associated with the pair $(Z,W)$. We summarize the structure of the data set associated with $(Z,W)$ in the following diagram:
\[
\begin{array}{ccccccc}
\mathcal{U} & \xrightarrow{\;B_1\;} & \mathcal{X}_1 & \xrightarrow{\;A_1\;} & \mathcal{X}_1 & \xrightarrow{\;C_1\;} & \mathcal{Y}\\[2pt]
 & & \;Z\uparrow\ \downarrow W & & & & \\[2pt]
\mathcal{U} & \xleftarrow{\;C_2\;} & \mathcal{X}_2 & \xleftarrow{\;A_2\;} & \mathcal{X}_2 & \xleftarrow{\;B_2\;} & \mathcal{Y}.
\end{array} \tag{1.12}
\]
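When $A_1$ and $A_2$ are stable (spectral radius less than one), the first Stein equation in (1.11) is solved by the convergent series $Z=\sum_{\nu\geq 0}A_1^\nu B_1C_2A_2^\nu$. A small numerical sketch with made-up random data (not part of the paper) illustrating this:

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, q = 4, 3, 2

def contraction(n):
    # random matrix rescaled to spectral norm 1/2, hence stable
    M = rng.standard_normal((n, n))
    return 0.5 * M / np.linalg.norm(M, 2)

A1, A2 = contraction(n1), contraction(n2)
B1 = rng.standard_normal((n1, q))
C2 = rng.standard_normal((q, n2))

# Z = sum_{nu >= 0} A1^nu B1 C2 A2^nu solves Z - A1 Z A2 = B1 C2.
Z, term = np.zeros((n1, n2)), B1 @ C2
for _ in range(200):          # terms decay geometrically
    Z, term = Z + term, A1 @ term @ A2

assert np.allclose(Z - A1 @ Z @ A2, B1 @ C2)
print("Stein equation verified")
```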
We shall be interested in inverting the operator $I_{\mathcal{X}_1}-ZW$ using solutions of the following four equations:
\[ (I_{\mathcal{X}_1}-ZW)P=B_1, \qquad (I_{\mathcal{X}_2}-WZ)Q=B_2, \tag{1.13} \]
\[ R(I_{\mathcal{X}_1}-ZW)=C_1, \qquad S(I_{\mathcal{X}_2}-WZ)=C_2. \tag{1.14} \]
Here the unknowns are operators
\[ P:\mathcal{U}\to\mathcal{X}_1,\quad R:\mathcal{X}_1\to\mathcal{Y},\quad Q:\mathcal{Y}\to\mathcal{X}_2,\quad S:\mathcal{X}_2\to\mathcal{U}. \tag{1.15} \]

Theorem 1.2. Let the operators $P$, $R$, $Q$, and $S$ in (1.15) be solutions of the equations (1.13) and (1.14). Then
\[ I_{\mathcal{Y}}+RZB_2=I_{\mathcal{Y}}+C_1ZQ, \qquad I_{\mathcal{U}}+C_2WP=I_{\mathcal{U}}+SWB_1. \tag{1.16} \]
Assume in addition that $I_{\mathcal{X}_1}-ZW$ is invertible. Then $I_{\mathcal{X}_1}-A_1ZA_2W$ is invertible if and only if at least one of the two operators $I_{\mathcal{U}}+C_2WP$ and $I_{\mathcal{Y}}+RZB_2$ is invertible. In that case both $I_{\mathcal{U}}+C_2WP$ and $I_{\mathcal{Y}}+RZB_2$ are invertible and
\[ W(I_{\mathcal{X}_1}-ZW)^{-1}-A_2W(I_{\mathcal{X}_1}-ZW)^{-1}A_1 = Q(I_{\mathcal{Y}}+RZB_2)^{-1}R - A_2WP(I_{\mathcal{U}}+C_2WP)^{-1}SWA_1. \tag{1.17} \]

Theorem 1.3. Assume that there exist operators $R$ and $S$ as in (1.15) such that the identities in (1.14) are satisfied. If the associate operator $I_{\mathcal{Y}}+RZB_2$ is invertible, then $\operatorname{Ker}(I_{\mathcal{X}_1}-ZW)\subset\bigcap_{\nu=0}^\infty\operatorname{Ker}C_2A_2^\nu W$. Moreover, if the operator $I_{\mathcal{X}_1}-ZW$ is Fredholm of index zero and $\bigcap_{\nu=0}^\infty\operatorname{Ker}C_2A_2^\nu=\{0\}$, then $I_{\mathcal{X}_1}-ZW$ is invertible.
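The identities (1.16) and (1.17) of Theorem 1.2 can be illustrated numerically: generate a data set at random, build $Z$ and $W$ from the convergent Stein series, solve the equations (1.13) and (1.14), and compare both sides. The sketch below uses hypothetical random data and is an illustration, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, p, q = 4, 3, 2, 2

def contraction(n):
    M = rng.standard_normal((n, n))
    return 0.5 * M / np.linalg.norm(M, 2)

A1, A2 = contraction(n1), contraction(n2)
B1, C1 = 0.5 * rng.standard_normal((n1, q)), 0.5 * rng.standard_normal((p, n1))
B2, C2 = 0.5 * rng.standard_normal((n2, p)), 0.5 * rng.standard_normal((q, n2))

def stein(A, B, C, Aother):
    # series solution of X - A X Aother = B C (A, Aother stable)
    X, term = np.zeros((A.shape[0], Aother.shape[0])), B @ C
    for _ in range(200):
        X, term = X + term, A @ term @ Aother
    return X

Z = stein(A1, B1, C2, A2)      # Z - A1 Z A2 = B1 C2
W = stein(A2, B2, C1, A1)      # W - A2 W A1 = B2 C1

I1, I2 = np.eye(n1), np.eye(n2)
P = np.linalg.solve(I1 - Z @ W, B1)              # equations (1.13)
Q = np.linalg.solve(I2 - W @ Z, B2)
R = C1 @ np.linalg.inv(I1 - Z @ W)               # equations (1.14)
S = C2 @ np.linalg.inv(I2 - W @ Z)

# identities (1.16)
assert np.allclose(np.eye(p) + R @ Z @ B2, np.eye(p) + C1 @ Z @ Q)
assert np.allclose(np.eye(q) + C2 @ W @ P, np.eye(q) + S @ W @ B1)

# identity (1.17)
lhs = W @ np.linalg.inv(I1 - Z @ W) - A2 @ W @ np.linalg.inv(I1 - Z @ W) @ A1
rhs = (Q @ np.linalg.inv(np.eye(p) + R @ Z @ B2) @ R
       - A2 @ W @ P @ np.linalg.inv(np.eye(q) + C2 @ W @ P) @ S @ W @ A1)
assert np.allclose(lhs, rhs)
print("identities (1.16) and (1.17) verified")
```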
Remark 1.4. Notice that there is a lot of symmetry in the diagram (1.12) with respect to the roles of $Z$ and $W$, the indices 1 and 2, and the spaces $\mathcal{U}$ and $\mathcal{Y}$. Therefore, in Theorems 1.2 and 1.3 one may simultaneously interchange these pairs and obtain analogous results. Also one can use duality which, for instance, interchanges the roles of $B_j$ and $C_j$, $j=1,2$. To be more precise, if $(A_1,B_1,C_1;A_2,B_2,C_2)$ is a data set associated with $(Z,W)$, then $(A_2,B_2,C_2;A_1,B_1,C_1)$ is a data set associated with $(W,Z)$ and $(A_2^*,C_2^*,B_2^*;A_1^*,C_1^*,B_1^*)$ is a data set associated with $(Z^*,W^*)$. These invariances under symmetry and duality yield alternative versions of Theorem 1.2 and Theorem 1.3; see Theorem 2.3 and Theorems 3.6, 3.7 and 3.8. The latter three theorems also play a role in the proof of Theorem 1.1.
To illustrate the preceding Theorems 1.2 and 1.3 we briefly sketch how Theorem 1.1 can be obtained as a corollary of these two theorems. For simplicity we take here $T=I$ and $R=I$. First we make a special choice of the data in (1.9)--(1.11), as follows:
\[ Z=GH \text{ on } \mathcal{X}_1=\ell^2_+(\mathbb{C}^p), \quad\text{and } W \text{ is the identity operator on } \mathcal{X}_2=\ell^2_+(\mathbb{C}^p), \]
\[ A_1=S_p^\top,\quad B_1=GE_q:\mathbb{C}^q\to\ell^2_+(\mathbb{C}^p),\quad C_1=E_p^\top:\ell^2_+(\mathbb{C}^p)\to\mathbb{C}^p, \]
\[ A_2=S_p,\quad B_2=E_p:\mathbb{C}^p\to\ell^2_+(\mathbb{C}^p),\quad C_2=E_q^\top H:\ell^2_+(\mathbb{C}^p)\to\mathbb{C}^q. \]
Using that $G$ and $H$ are Hankel operators, so that $S_p^\top G=GS_q$ and $HS_p=S_q^\top H$, we see that
\[ Z-A_1ZA_2 = GH-S_p^\top GHS_p = GH-GS_qS_q^\top H = G(I-S_qS_q^\top)H = GE_qE_q^\top H = B_1C_2. \]
Furthermore, $W-A_2WA_1=I-S_pS_p^\top=E_pE_p^\top=B_2C_1$. Thus the corresponding Stein equations (1.11) are satisfied. Furthermore, since $Z=GH$ and $W=I$, we have $I-ZW=I-GH$ and the left-hand side of (1.17) becomes
\[ (I-GH)^{-1}-S_p(I-GH)^{-1}S_p^\top. \]
Next one shows that in this case the equations (1.13) and (1.14) are equivalent to the equations (1.3) and (1.4). Hence for this case (1.16) yields $a_{10}=a_{20}^\top$ and $b_{10}=b_{20}^\top$. It is then a matter of direct checking to prove that the invertibility of $I-GH$ and $I-HG$ follows from Theorem 1.3, and that formulas (1.5) and (1.6) can be obtained from (1.17). For further details see the full proof of Theorem 1.1 in Section 4.
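In the scalar case ($p=q=1$) the Hankel intertwining relations and the resulting Stein identity used above can be checked on finite sections. The sketch below uses finitely supported made-up sequences $g$, $h$, so that the truncation at order $N$ is exact; it is an illustration, not part of the proof.

```python
import numpy as np

N = 12
g = np.array([1.0, -0.5, 0.25, 0.125])      # finitely supported symbols
h = np.array([2.0, 0.3, -0.4])

def hankel(c):
    # N x N section of the Hankel operator [c_{i+j}] (zero beyond support)
    cc = np.concatenate([c, np.zeros(2 * N)])
    return np.array([[cc[i + j] for j in range(N)] for i in range(N)])

G, H = hankel(g), hankel(h)
S = np.eye(N, k=-1)                          # forward shift section
E = np.zeros((N, 1)); E[0, 0] = 1.0         # canonical embedding

# Hankel operators intertwine the shift: S^T G = G S and H S = S^T H.
assert np.allclose(S.T @ G, G @ S)
assert np.allclose(H @ S, S.T @ H)

# Stein equation Z - A1 Z A2 = B1 C2 with Z = GH, A1 = S^T, A2 = S.
Z = G @ H
assert np.allclose(Z - S.T @ Z @ S, (G @ E) @ (E.T @ H))
print("Hankel shift relations and Stein equation verified")
```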
The setting described by the formulas (1.9)--(1.14) is of particular interest when the data are matrices. To illustrate this, assume that $G$ and $H$ in (1.1) are of finite rank and $T=I$ and $R=I$. In that case we know from mathematical systems theory (see, e.g., [3, Chapter 7]) that the defining functions $g$ and $h$ are rational matrix functions and admit stable finite-dimensional realizations:
\[ g(z)=C_1(I_{n_1}-zA_1)^{-1}B_1 \quad\text{and}\quad h(z)=C_2(I_{n_2}-zA_2)^{-1}B_2. \tag{1.18} \]
Here stable means that $A_1$ and $A_2$ have their eigenvalues in the open unit disc. From (1.18) it follows that the entries $g_j$ and $h_j$ in $G$ and $H$ are given by $g_j=C_1A_1^jB_1$ and $h_j=C_2A_2^jB_2$ for $j=0,1,2,\ldots$. Hence $G=\Gamma_1\Omega_1$ and $H=\Gamma_2\Omega_2$ with
\[ \Omega_j=\begin{bmatrix} B_j & A_jB_j & A_j^2B_j & \cdots\end{bmatrix}, \qquad \Gamma_j=\begin{bmatrix} C_j\\ C_jA_j\\ C_jA_j^2\\ \vdots\end{bmatrix}. \tag{1.19} \]
Furthermore, (1.11) holds with
\[ Z=\sum_{\nu=0}^\infty A_1^\nu B_1C_2A_2^\nu=\Omega_1\Gamma_2 \quad\text{and}\quad W=\sum_{\nu=0}^\infty A_2^\nu B_2C_1A_1^\nu=\Omega_2\Gamma_1. \tag{1.20} \]
In particular, one sees that $I-GH=I-\Gamma_1\Omega_1\Gamma_2\Omega_2$ is invertible if and only if $I-ZW=I-\Omega_1\Gamma_2\Omega_2\Gamma_1$ is invertible. Moreover, the inverse of $I-GH$ can be expressed in terms of the inverse of $I-ZW$ and vice versa (cf. the remark preceding Lemma 2.2).
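The factorizations $G=\Gamma_1\Omega_1$, $Z=\Omega_1\Gamma_2$ and the equivalence between invertibility of $I-GH$ and of $I-ZW$ can be illustrated numerically. The sketch below (made-up scalar realization data) truncates the infinite objects at order $N$; since $A_1$ and $A_2$ are stable, the truncation error is negligible, and the determinant identity is the finite-dimensional Sylvester identity.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n1, n2 = 60, 3, 4

def stable(n):
    M = rng.standard_normal((n, n))
    return 0.5 * M / np.linalg.norm(M, 2)

A1, A2 = stable(n1), stable(n2)
B1, C1 = rng.standard_normal((n1, 1)), rng.standard_normal((1, n1))
B2, C2 = rng.standard_normal((n2, 1)), rng.standard_normal((1, n2))

# Gamma_j = col(C_j A_j^i), Omega_j = [B_j, A_j B_j, ...], truncated at N.
Gam1 = np.vstack([C1 @ np.linalg.matrix_power(A1, i) for i in range(N)])
Om1 = np.hstack([np.linalg.matrix_power(A1, j) @ B1 for j in range(N)])
Gam2 = np.vstack([C2 @ np.linalg.matrix_power(A2, i) for i in range(N)])
Om2 = np.hstack([np.linalg.matrix_power(A2, j) @ B2 for j in range(N)])

G, H = Gam1 @ Om1, Gam2 @ Om2          # Hankel sections [g_{i+j}], [h_{i+j}]
Z, W = Om1 @ Gam2, Om2 @ Gam1          # solutions of the Stein equations

# Z satisfies Z - A1 Z A2 = B1 C2 (up to a negligible truncation tail)
assert np.allclose(Z - A1 @ Z @ A2, B1 @ C2)
# GH = Gam1 Z Om2, hence det(I - GH) = det(I - ZW) (Sylvester identity)
assert np.allclose(G @ H, Gam1 @ Z @ Om2)
d1 = np.linalg.det(np.eye(N) - G @ H)
d2 = np.linalg.det(np.eye(n1) - Z @ W)
assert np.isclose(d1, d2)
print("factorizations and determinant identity verified")
```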
Now assume, as in Theorem 1.1, that there exist linear maps $a_1,a_2,b_1,b_2$ as in (1.2) satisfying equations (1.3) and (1.4) (with $T$ and $R$ identity operators) such that $a_{10}$ or $b_{10}$ is invertible. As we have seen, this implies that $I-GH$ is invertible. To compute $(I-GH)^{-1}$ we apply Theorem 1.2 with $Z$, $W$, and the associate data set $\{A_1,B_1,C_1;A_2,B_2,C_2\}$ as in (1.20). In this case equations (1.13) and (1.14) are just matrix equations, and identity (1.17) leads to the following inversion formula:
\[ (I-GH)^{-1} = I + H_{c_1}\Delta_{b_{10}^{-1}}H_{c_2}^\top - S_p^\top H_{a_1}\Delta_{a_{10}^{-1}}H_{a_2}^\top S_p. \tag{1.21} \]
Here $c_1$ and $c_2$ are defined by the first and third identity in (1.7), and $H_x$ denotes the Hankel operator with first column equal to $x$. Moreover, the Hankel operators appearing in (1.21) are all of finite rank. The inversion formula (1.21) is the discrete analogue of the inversion formula in [10, Theorem 0.1]. The full proof of (1.21) will be presented in Subsection 4.2.

We shall also show that the inversion formulas for $T-GR^{-1}H$ and $R-HT^{-1}G$ in Theorem 1.1 can be replaced by formulas analogous to (1.21), that is, with the Toeplitz operators in (1.5) and (1.6) being replaced by Hankel operators. See Theorem 4.3 for the precise formulation.

Theorems 1.2 and 1.3 also apply to inversion problems other than the ones related to Theorem 1.1. As an illustration we derive Theorem 2.1 in [6] as a corollary of Theorems 1.2 and 1.3. This implies that all examples in [6] are also covered by Theorems 1.2 and 1.3 above. Whether or not the main inversion theorems in Section 2 of [6] also imply Theorems 1.2 and 1.3 remains an open question.

The operators considered in this paper belong to the area of structured matrices and operators, which includes Toeplitz and Hankel operators, Vandermonde, Cauchy and Pick matrices, resultants and Bezoutians, controllability and observability operators, and many other classes of matrices and operators. The literature on the subject is vast. Here we only mention [14], the review paper [12], and the
books [11], [13], [15], [16]. We see our Theorems 1.2 and 1.3 as a contribution to this rich field of research.

This paper consists of six sections, including the present introduction. The proofs of Theorems 1.2 and 1.3 are given in Sections 2 and 3, respectively. In Section 4 we return to Theorem 1.1 and complete the sketch of the proof given above. In this section we also prove (1.21) and derive formulas for $(T-GR^{-1}H)^{-1}$ and $(R-HT^{-1}G)^{-1}$ analogous to (1.21). In Section 5 we derive Theorem 2.1 in [6] as a corollary of Theorems 1.2 and 1.3 above. In general, with $Z$ and $W$ given, one can find different sets of operators $(A_1,B_1,C_1;A_2,B_2,C_2)$ such that the equations (1.9), (1.10), and (1.11) are satisfied. These different choices of the data set $(A_1,B_1,C_1;A_2,B_2,C_2)$ often lead to different versions of formula (1.17), and it can happen that for some choice of the data set formula (1.17) leads to a formula for $W(I-ZW)^{-1}$ while for another choice it does not. This phenomenon is illustrated on finite block Toeplitz matrices in the final section.
2. Proof of Theorem 1.2

In this section we prove Theorem 1.2. For simplicity, in this section any identity operator will be denoted by $I$; that is, in what follows we shall omit the subscript indicating on which space the identity operator acts. Furthermore, we shall freely use the operators appearing in (1.9), (1.10), and (1.11). We begin with two lemmas.

Lemma 2.1. If the second identity of (1.13) and the first of (1.14) are satisfied, then $I+RZB_2=I+C_1ZQ$. Analogously, if the first identity of (1.13) and the second of (1.14) are satisfied, then $I+C_2WP=I+SWB_1$.

Proof. From the second identity of (1.13) and the first of (1.14) we get
\[ I+RZB_2 = I+RZ(I-WZ)Q = I+R(I-ZW)ZQ = I+C_1ZQ. \]
The other two identities give
\[ I+C_2WP = I+S(I-WZ)WP = I+SW(I-ZW)P = I+SWB_1. \qquad\Box \]

In the proof of the following lemma we use a few times the classical result that, given two operators $F_1$ and $F_2$, the invertibility of $I+F_1F_2$ is equivalent to the invertibility of $I+F_2F_1$. Moreover, in that case the inverse of $I+F_1F_2$ is given by
\[ (I+F_1F_2)^{-1} = I-F_1(I+F_2F_1)^{-1}F_2 \]
(see [1], first paragraph on page 30).

Lemma 2.2. Assume that $I-ZW$ is invertible, and let $P$ and $R$ be the operators defined by the first equations of (1.13) and (1.14). Then the following are equivalent:
(i) $I+C_2WP$ is invertible,
(ii) $I+RZB_2$ is invertible,
(iii) $I-A_1ZA_2W$ is invertible.
Furthermore, in that case $I-ZA_2WA_1$ is also invertible and
\[ (I+C_2WP)^{-1} = I-C_2W(I-A_1ZA_2W)^{-1}B_1, \tag{2.1} \]
\[ (I+RZB_2)^{-1} = I-C_1(I-ZA_2WA_1)^{-1}ZB_2. \tag{2.2} \]
Proof. Since $P=(I-ZW)^{-1}B_1$, we have
\[ I+C_2WP = I+C_2W(I-ZW)^{-1}B_1. \]
It follows that $I+C_2WP$ is invertible if and only if
\[ I+B_1C_2W(I-ZW)^{-1} = \bigl(I-(ZW-B_1C_2W)\bigr)(I-ZW)^{-1} \]
is invertible. By the first identity in (1.11), we have $ZW-B_1C_2W=A_1ZA_2W$. This proves the equivalence of (i) and (iii). Moreover, in that case
\[ \begin{aligned}
(I+C_2WP)^{-1} &= \bigl(I+C_2W(I-ZW)^{-1}B_1\bigr)^{-1}\\
&= I-C_2W(I-ZW)^{-1}\bigl(I+B_1C_2W(I-ZW)^{-1}\bigr)^{-1}B_1\\
&= I-C_2W\bigl(I-A_1ZA_2W\bigr)^{-1}B_1.
\end{aligned} \]
This proves identity (2.1).

Since $I-A_1ZA_2W$ is invertible if and only if $I-ZA_2WA_1$ is invertible, the equivalence of (ii) and (iii) and the identity (2.2) can be proved using an appropriate modification of the arguments employed in the previous paragraph. $\Box$
Proof of Theorem 1.2. Assuming that $I-ZW$ is invertible, and given Lemmas 2.1 and 2.2, it remains to prove the identity (1.17). We will show that
\[ Q(I+RZB_2)^{-1}R = W(I-ZW)^{-1}-(I-A_2WA_1Z)^{-1}A_2WA_1, \tag{2.3} \]
\[ A_2WP(I+C_2WP)^{-1}SWA_1 = A_2W(I-ZW)^{-1}A_1-(I-A_2WA_1Z)^{-1}A_2WA_1. \tag{2.4} \]
Subtracting (2.4) from (2.3), the identity (1.17) appears. In deriving (2.3) and (2.4) we shall use a few times that
\[ B_1C_2W = (I-A_1ZA_2W)-(I-ZW), \tag{2.5} \]
\[ B_2C_1Z = (I-A_2WA_1Z)-(I-WZ). \tag{2.6} \]
These identities follow from the ones in (1.11).

Let us now prove (2.3). Using (2.2) and (2.6), a standard computation (cf. the state space formulas in Theorem 2.4 in [2]) yields
\[ \begin{aligned}
Q(I+RZB_2)^{-1} &= (I-WZ)^{-1}B_2\bigl(I-C_1(I-ZA_2WA_1)^{-1}ZB_2\bigr)\\
&= (I-WZ)^{-1}B_2\bigl(I-C_1Z(I-A_2WA_1Z)^{-1}B_2\bigr)\\
&= (I-A_2WA_1Z)^{-1}B_2.
\end{aligned} \]
We proceed with
\[ \begin{aligned}
Q(I+RZB_2)^{-1}R &= (I-A_2WA_1Z)^{-1}B_2C_1(I-ZW)^{-1}\\
&= (I-A_2WA_1Z)^{-1}B_2C_1\bigl(I+ZW(I-ZW)^{-1}\bigr)\\
&= (I-A_2WA_1Z)^{-1}B_2C_1 + (I-A_2WA_1Z)^{-1}(B_2C_1Z)(I-WZ)^{-1}W.
\end{aligned} \]
Again using (2.6), it follows that
\[ Q(I+RZB_2)^{-1}R = (I-A_2WA_1Z)^{-1}B_2C_1 + (I-WZ)^{-1}W - (I-A_2WA_1Z)^{-1}W. \]
According to the second identity in (1.11) we have $B_2C_1-W=-A_2WA_1$, and since $(I-WZ)^{-1}W=W(I-ZW)^{-1}$, the above calculations yield (2.3).

Next we prove (2.4). Using (2.1) and (2.5) (cf. the state space formulas in Theorem 2.4 in [2]) we have
\[ A_2WP(I+C_2WP)^{-1} = A_2W(I-ZW)^{-1}B_1\bigl(I-C_2W(I-A_1ZA_2W)^{-1}B_1\bigr) = A_2W(I-A_1ZA_2W)^{-1}B_1. \]
Hence
\[ \begin{aligned}
A_2WP(I+C_2WP)^{-1}S &= A_2W(I-A_1ZA_2W)^{-1}B_1C_2(I-WZ)^{-1}\\
&= A_2W(I-A_1ZA_2W)^{-1}B_1C_2 + A_2W(I-A_1ZA_2W)^{-1}B_1C_2WZ(I-WZ)^{-1}\\
&= A_2W(I-A_1ZA_2W)^{-1}B_1C_2 + A_2W(I-A_1ZA_2W)^{-1}B_1C_2W(I-ZW)^{-1}Z.
\end{aligned} \]
Using (2.5) we see that
\[ \begin{aligned}
A_2WP(I+C_2WP)^{-1}S &= A_2W(I-A_1ZA_2W)^{-1}B_1C_2 - A_2W(I-A_1ZA_2W)^{-1}Z + A_2W(I-ZW)^{-1}Z\\
&= -A_2W(I-A_1ZA_2W)^{-1}A_1ZA_2 + A_2WZ(I-WZ)^{-1},
\end{aligned} \]
where the last step uses the first identity in (1.11) in the form $B_1C_2-Z=-A_1ZA_2$. Now rewrite
\[ \begin{aligned}
A_2W(I-A_1ZA_2W)^{-1}A_1ZA_2 &= A_2WA_1Z(I-A_2WA_1Z)^{-1}A_2\\
&= \bigl(I-(I-A_2WA_1Z)\bigr)(I-A_2WA_1Z)^{-1}A_2\\
&= (I-A_2WA_1Z)^{-1}A_2 - A_2.
\end{aligned} \]
Similarly
\[ A_2W(I-ZW)^{-1}Z = A_2WZ(I-WZ)^{-1} = A_2\bigl(I-(I-WZ)\bigr)(I-WZ)^{-1} = A_2(I-WZ)^{-1}-A_2. \]
It follows that
\[ A_2WP(I+C_2WP)^{-1}S = -(I-A_2WA_1Z)^{-1}A_2 + A_2(I-WZ)^{-1}. \]
Hence
\[ \begin{aligned}
A_2WP(I+C_2WP)^{-1}SWA_1 &= -(I-A_2WA_1Z)^{-1}A_2WA_1 + A_2(I-WZ)^{-1}WA_1\\
&= -(I-A_2WA_1Z)^{-1}A_2WA_1 + A_2W(I-ZW)^{-1}A_1.
\end{aligned} \]
Thus (2.4) holds, and the proof is complete. $\Box$
Using Remark 1.4 about interchanging the roles of $Z$ and $W$ one obtains the following alternative version of Theorem 1.2.

Theorem 2.3. Let the operators $P$, $R$, $Q$, and $S$ in (1.15) be solutions of the equations (1.13) and (1.14). Then
\[ I_{\mathcal{Y}}+RZB_2=I_{\mathcal{Y}}+C_1ZQ, \qquad I_{\mathcal{U}}+C_2WP=I_{\mathcal{U}}+SWB_1. \]
Assume in addition that $I_{\mathcal{X}_2}-WZ$ is invertible. Then $I_{\mathcal{X}_2}-A_2WA_1Z$ is invertible if and only if at least one of the two operators $I_{\mathcal{Y}}+C_1ZQ$ and $I_{\mathcal{U}}+SWB_1$ is invertible. In that case both $I_{\mathcal{Y}}+C_1ZQ$ and $I_{\mathcal{U}}+SWB_1$ are invertible and
\[ Z(I_{\mathcal{X}_2}-WZ)^{-1}-A_1Z(I_{\mathcal{X}_2}-WZ)^{-1}A_2 = P(I_{\mathcal{U}}+SWB_1)^{-1}S - A_1ZQ(I_{\mathcal{Y}}+C_1ZQ)^{-1}RZA_2. \]
3. Conditions of invertibility, proof of Theorem 1.3

The next four results extend and sharpen Theorem 1.3. Hence in order to prove Theorem 1.3 it suffices to prove the four results presented below.

Proposition 3.1. Assume that there exist operators $R$ and $S$ as in (1.15) such that (1.14) is satisfied, and let the associated operator $I+RZB_2$ be invertible. Then
\[ (I_{\mathcal{X}_1}-ZW)x=0 \;\Longrightarrow\; C_2A_2^\nu Wx=0 \quad (\nu=0,1,2,\ldots). \]

Proposition 3.2. Assume that there exist operators $P$ and $Q$ as in (1.15) such that (1.13) is satisfied, and let the associated operator $I+C_2WP$ be invertible. Then
\[ y(I_{\mathcal{X}_1}-ZW)=0 \;\Longrightarrow\; yZA_2^\nu B_2=0 \quad (\nu=0,1,2,\ldots). \]

Corollary 3.3. Assume that the operator $I_{\mathcal{X}_1}-ZW$ is Fredholm of index zero and that $\bigcap_{\nu=0}^\infty\operatorname{Ker}C_2A_2^\nu=\{0\}$. Furthermore, assume that there exist operators $R$ and $S$ as in (1.15) such that (1.14) is satisfied. Then invertibility of $I+RZB_2$ implies that $I_{\mathcal{X}_1}-ZW$ is invertible.

Corollary 3.4. Assume that the operator $I_{\mathcal{X}_1}-ZW$ is Fredholm of index zero and that $\bigcap_{\nu=0}^\infty\operatorname{Ker}B_2^*(A_2^*)^\nu=\{0\}$. Furthermore, assume that there exist operators $P$ and $Q$ as in (1.15) such that (1.13) is satisfied. Then invertibility of $I+C_2WP$ implies that $I_{\mathcal{X}_1}-ZW$ is invertible.
Using Remark 1.4 we see that it suffices to prove Proposition 3.1 and Corollary 3.3.

Proof of Proposition 3.1. Assume that the operator $I+RZB_2$ is invertible, and let $(I-ZW)x=0$. Then $C_1x=R(I-ZW)x=0$ and
\[ C_2Wx = S(I-WZ)Wx = SW(I-ZW)x = 0. \]
Also
\[ (I-A_1ZA_2W)x = \bigl(I-(Z-B_1C_2)W\bigr)x = (I-ZW)x+B_1C_2Wx = 0, \]
and
\[ (I-ZA_2WA_1)x = \bigl(I-Z(W-B_2C_1)\bigr)x = (I-ZW)x+ZB_2C_1x = 0. \]
Next observe that
\[ (I+RZB_2)C_1 = C_1+RZB_2C_1 = C_1+RZ(W-A_2WA_1) = C_1+RZW-RZA_2WA_1 = R-RZA_2WA_1. \]
We see that
\[ (I+RZB_2)C_1ZA_2Wx = RZA_2Wx - RZA_2WA_1ZA_2Wx = RZA_2W(I-A_1ZA_2W)x = 0. \]
By assumption $I+RZB_2$ is invertible. Hence $C_1ZA_2Wx=0$. Therefore
\[ 0 = B_2C_1ZA_2Wx = (W-A_2WA_1)ZA_2Wx = WZA_2Wx-A_2WA_1ZA_2Wx = WZA_2Wx-A_2Wx = -(I_{\mathcal{X}_2}-WZ)A_2Wx. \]
We conclude that
\[ (I-ZW)x=0 \;\Longrightarrow\; (I-WZ)A_2Wx=0. \tag{3.1} \]
Next we prove that
\[ (I-ZW)x=0 \;\Longrightarrow\; (I-WZ)A_2^\nu Wx=0, \quad \nu=0,1,2,\ldots. \tag{3.2} \]
For $\nu=0$ we have $(I-WZ)Wx=W(I-ZW)x=0$. We proceed by induction. Assume that the right-hand side of (3.2) holds for $\nu=k\geq 0$. Then
\[ (I-ZW)ZA_2^kWx = Z(I-WZ)A_2^kWx = 0. \]
Thus $(I-ZW)\tilde{x}=0$ where $\tilde{x}=ZA_2^kWx$. Now apply (3.1) with $\tilde{x}$ replacing $x$. It follows that
\[ 0 = (I-WZ)A_2W\tilde{x} = (I-WZ)A_2WZA_2^kWx = -(I-WZ)A_2(I-WZ)A_2^kWx + (I-WZ)A_2A_2^kWx = (I-WZ)A_2^{k+1}Wx. \]
Thus by induction (3.2) holds. Using (3.2) we see that
\[ C_2A_2^\nu Wx = S(I-WZ)A_2^\nu Wx = 0, \quad \nu=0,1,2,\ldots. \]
We proved the proposition. $\Box$

Proof of Corollary 3.3. Let the operator $I+RZB_2$ be invertible, and assume that $(I-ZW)x=0$. According to Proposition 3.1 this implies that the vector $Wx$ belongs to $\bigcap_{\nu=0}^\infty\operatorname{Ker}C_2A_2^\nu$. By our hypotheses the latter space consists of the zero vector only. So $Wx=0$. But then $x=ZWx=0$. Since $I_{\mathcal{X}_1}-ZW$ is Fredholm of index zero and $\operatorname{Ker}(I_{\mathcal{X}_1}-ZW)=\{0\}$, it follows that the operator $I_{\mathcal{X}_1}-ZW$ is invertible. $\Box$
Remark 3.5. Assume that $\bigcap_{\nu=0}^\infty\operatorname{Ker}C_2A_2^\nu=\{0\}$ and $I_{\mathcal{X}_1}-ZW$ is Fredholm of index zero. Assume also that $R$ and $S$ exist satisfying (1.14). Then Proposition 3.1 and Lemma 2.2 imply that the operator $I+RZB_2$ is invertible if and only if the operators $I_{\mathcal{X}_1}-ZW$ and $I_{\mathcal{X}_1}-A_1ZA_2W$ are both invertible.

We conclude this section with three alternatives of Theorem 1.3. They follow directly from Theorem 1.3 by using the symmetry and duality arguments mentioned in Remark 1.4.

Theorem 3.6. Assume that there exist operators $R$ and $S$ as in (1.15) such that the identities in (1.14) are satisfied. If the associate operator $I_{\mathcal{U}}+SWB_1$ is invertible, then $\operatorname{Ker}(I_{\mathcal{X}_2}-WZ)\subset\bigcap_{\nu=0}^\infty\operatorname{Ker}C_1A_1^\nu Z$. Moreover, if the operator $I_{\mathcal{X}_2}-WZ$ is Fredholm of index zero and $\bigcap_{\nu=0}^\infty\operatorname{Ker}C_1A_1^\nu=\{0\}$, then $I_{\mathcal{X}_2}-WZ$ is invertible.

Theorem 3.7. Assume that there exist operators $P$ and $Q$ as in (1.15) such that the identities in (1.13) are satisfied. If the associate operator $I_{\mathcal{U}}+C_2WP$ is invertible, then $\operatorname{Im}ZA_2^\nu B_2\subset\operatorname{Im}(I_{\mathcal{X}_1}-ZW)$ for $\nu=0,1,2,\ldots$. Moreover, if $I_{\mathcal{X}_1}-ZW$ is Fredholm of index zero and $\operatorname{span}\{\operatorname{Im}A_2^\nu B_2 \mid \nu=0,1,2,\ldots\}$ is dense in $\mathcal{X}_2$, then $I_{\mathcal{X}_1}-ZW$ is invertible.

Theorem 3.8. Assume that there exist operators $P$ and $Q$ as in (1.15) such that the identities in (1.13) are satisfied. If the associate operator $I_{\mathcal{Y}}+C_1ZQ$ is invertible, then $\operatorname{Im}WA_1^\nu B_1\subset\operatorname{Im}(I_{\mathcal{X}_2}-WZ)$ for $\nu=0,1,2,\ldots$. Moreover, if $I_{\mathcal{X}_2}-WZ$ is Fredholm of index zero and $\operatorname{span}\{\operatorname{Im}A_1^\nu B_1 \mid \nu=0,1,2,\ldots\}$ is dense in $\mathcal{X}_1$, then $I_{\mathcal{X}_2}-WZ$ is invertible.
4. Toeplitz plus Hankel operators

This section consists of three subsections. In the first we prove Theorem 1.1. In the second subsection we derive the identity (1.21), and in the third we derive formulas for $(T-GR^{-1}H)^{-1}$ and $(R-HT^{-1}G)^{-1}$ analogous to the one in (1.21). We begin with a general remark.

Remark. Since the entries of the matrix functions defining the operators $G$, $H$, $T$, $R$ all belong to the Wiener algebra on the unit circle, it follows that for arbitrary linear maps $x:\mathbb{C}^p\to\ell^1_+(\mathbb{C}^p)$ and $y:\mathbb{C}^q\to\ell^1_+(\mathbb{C}^q)$ one has
\[ Hx:\mathbb{C}^p\to\ell^1_+(\mathbb{C}^q), \quad Tx:\mathbb{C}^p\to\ell^1_+(\mathbb{C}^p), \quad T^{-1}x:\mathbb{C}^p\to\ell^1_+(\mathbb{C}^p), \]
\[ Gy:\mathbb{C}^q\to\ell^1_+(\mathbb{C}^p), \quad Ry:\mathbb{C}^q\to\ell^1_+(\mathbb{C}^q), \quad R^{-1}y:\mathbb{C}^q\to\ell^1_+(\mathbb{C}^q). \]
In particular, the linear maps $c_1,c_2$ and $d_1,d_2$ defined by (1.7) have their values in $\ell^1_+(\mathbb{C}^p)$ and $\ell^1_+(\mathbb{C}^q)$, respectively.
4.1. Proof of Theorem 1.1

In this subsection we prove Theorem 1.1. In order to do this we apply Theorems 1.2 and 1.3 with a special choice of $Z$ and $W$ and the associated data set, namely
\[ Z = I_{\mathcal{X}_1}-(T-GR^{-1}H), \qquad W = I_{\mathcal{X}_2}, \tag{4.1} \]
\[ A_1=S_p^\top,\quad B_1=GR^{-1}E_q(E_q^\top R^{-1}E_q)^{-1},\quad C_1=E_p^\top, \tag{4.2} \]
\[ A_2=S_p,\quad B_2=E_p,\quad C_2=E_q^\top R^{-1}H. \tag{4.3} \]
Here $\mathcal{X}_1=\mathcal{X}_2=\ell^2_+(\mathbb{C}^p)$, and $S_p$ is the forward shift on $\ell^2_+(\mathbb{C}^p)$. Note that
\[ I-ZW = T-GR^{-1}H. \]
Since $T$ is invertible and $G$ and $H$ are compact operators, we see that $I-ZW$ is a Fredholm operator of index zero.

To see that for the above operators the identities in (1.11) are valid we first recall a useful equality. Note that $E_q^\top R^{-1}E_q$ is the entry in the left upper corner of the block matrix representing $R^{-1}$. Since $R$ is an invertible Toeplitz operator, the matrix $E_q^\top R^{-1}E_q$ is invertible and
\[ R^{-1}-S_qR^{-1}S_q^\top = (R^{-1}E_q)(E_q^\top R^{-1}E_q)^{-1}(E_q^\top R^{-1}). \tag{4.4} \]
This result is well known and follows using a standard Schur complement argument (see, e.g., [6, Section 4]).

Next we deal with the Stein equations (1.11). Using that $G$ and $H$ are Hankel operators, that $T$ is a Toeplitz operator, and that $R^{-1}$ satisfies (4.4), we see that
\[ \begin{aligned}
Z-A_1ZA_2 &= I-(T-GR^{-1}H)-S_p^\top S_p+S_p^\top(T-GR^{-1}H)S_p\\
&= -T+S_p^\top TS_p+GR^{-1}H-S_p^\top GR^{-1}HS_p\\
&= G(R^{-1}-S_qR^{-1}S_q^\top)H\\
&= G(R^{-1}E_q)(E_q^\top R^{-1}E_q)^{-1}(E_q^\top R^{-1})H = B_1C_2.
\end{aligned} \]
Also
\[ W-A_2WA_1 = I-S_pS_p^\top = E_pE_p^\top = B_2C_1. \]
Thus equations (1.11) are satisfied. Furthermore, since $I-ZW=T-GR^{-1}H$, the left-hand side of (1.17) becomes
\[ (T-GR^{-1}H)^{-1}-S_p(T-GR^{-1}H)^{-1}S_p^\top. \]
Finally we see that
\[ I-A_1ZA_2W = I-S_p^\top\bigl(I-(T-GR^{-1}H)\bigr)S_p = I-S_p^\top S_p+S_p^\top TS_p-S_p^\top GR^{-1}HS_p = T-G_1R^{-1}H_1, \tag{4.5} \]
where $G_1=S_p^\top G$ and $H_1=S_q^\top H$.
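Identity (4.4) can also be checked numerically. For a banded Toeplitz operator that factors as an upper-triangular times a lower-triangular Toeplitz operator, the finite section of $R^{-1}$ can be computed exactly (triangular Toeplitz sections invert to sections of the inverses), so the rank-one displacement identity holds on the section to machine precision. The sketch below uses a made-up scalar symbol and is an illustration only.

```python
import numpy as np

N = 30
S = np.eye(N, k=-1)                       # forward shift section
E = np.zeros((N, 1)); E[0, 0] = 1.0       # canonical embedding
# R = R_- R_+ with R_- upper and R_+ lower triangular Toeplitz
# (made-up scalar symbol); the section of R^{-1} = R_+^{-1} R_-^{-1}
# is then exactly the product of the inverted triangular sections.
Rm = np.eye(N) - 0.25 * S.T
Rp = np.eye(N) + 0.5 * S
Rinv = np.linalg.inv(Rp) @ np.linalg.inv(Rm)

corner = E.T @ Rinv @ E                   # upper-left entry of R^{-1}
lhs = Rinv - S @ Rinv @ S.T
rhs = (Rinv @ E) @ np.linalg.inv(corner) @ (E.T @ Rinv)
assert np.allclose(lhs, rhs)
print("displacement identity (4.4) verified")
```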
As a next step towards the proof of Theorem 1.1 it will be convenient first to prove the following proposition. In what follows we freely use the terminology and notation introduced in Theorem 1.1 and in the paragraphs preceding Theorem 1.1.

Proposition 4.1. The following five conditions are equivalent.
(1) Equation (1.3) has solutions $a_1$ and $b_1$ and at least one of the matrices $a_{10}$ and $b_{10}$ is invertible.
(2) Equation (1.3) has solutions $a_1$ and $b_1$ and both matrices $a_{10}$ and $b_{10}$ are invertible.
(3) Equation (1.4) has solutions $a_2^\top$ and $b_2^\top$ and at least one of the matrices $a_{20}^\top$ and $b_{20}^\top$ is invertible.
(4) Equation (1.4) has solutions $a_2^\top$ and $b_2^\top$ and both matrices $a_{20}^\top$ and $b_{20}^\top$ are invertible.
(5) The operators $T-GR^{-1}H$ and $T-G_1R^{-1}H_1$, where $G_1=S_p^\top G$ and $H_1=S_q^\top H$, are invertible.
Moreover, in that case
\[ (T-GR^{-1}H)^{-1}-S_p(T-GR^{-1}H)^{-1}S_p^\top = a_1a_{10}^{-1}a_2^\top - S_pT^{-1}Gb_1b_{10}^{-1}b_2^\top HT^{-1}S_p^\top, \tag{4.6} \]
\[ (R-HT^{-1}G)^{-1}-S_q(R-HT^{-1}G)^{-1}S_q^\top = b_1b_{10}^{-1}b_2^\top - S_qR^{-1}Ha_1a_{10}^{-1}a_2^\top GR^{-1}S_q^\top. \tag{4.7} \]

Proof. We split the proof into seven parts. The first part has an auxiliary character. The equivalence of the five conditions is proved in Parts 2--6. In the final part we prove formulas (4.6) and (4.7). Throughout we use the operators defined by (4.1), (4.2), and (4.3).

Part 1. We shall present operators satisfying (1.13) and (1.14). When $a_1$ and $b_1$ are linear maps satisfying equation (1.3), we put
\[ P = T^{-1}Gb_1(E_q^\top R^{-1}E_q)^{-1}, \qquad Q = a_1. \tag{4.8} \]
We claim that with $Z$ and $W$ defined as in the first paragraph of this subsection, the operators $P$ and $Q$ in (4.8) satisfy (1.13). Indeed, using (1.3), we have
\[ \begin{aligned}
(I-ZW)P &= (T-GR^{-1}H)T^{-1}Gb_1(E_q^\top R^{-1}E_q)^{-1}\\
&= GR^{-1}(R-HT^{-1}G)b_1(E_q^\top R^{-1}E_q)^{-1}\\
&= GR^{-1}E_q(E_q^\top R^{-1}E_q)^{-1} = B_1,
\end{aligned} \]
\[ (I-WZ)Q = (T-GR^{-1}H)a_1 = E_p = B_2. \]
Hence (1.13) is satisfied. Furthermore, (1.3) gives
\[ I+C_1ZQ = I+E_p^\top Za_1 = I+E_p^\top\bigl(I-(T-GR^{-1}H)\bigr)a_1 = E_p^\top a_1+I-E_p^\top E_p = a_{10}, \]
\[ \begin{aligned}
I+C_2WP &= I+E_q^\top R^{-1}HT^{-1}Gb_1(E_q^\top R^{-1}E_q)^{-1}\\
&= I-E_q^\top R^{-1}(R-HT^{-1}G)b_1(E_q^\top R^{-1}E_q)^{-1}+E_q^\top b_1(E_q^\top R^{-1}E_q)^{-1}\\
&= I-E_q^\top R^{-1}E_q(E_q^\top R^{-1}E_q)^{-1}+E_q^\top b_1(E_q^\top R^{-1}E_q)^{-1}\\
&= b_{10}(E_q^\top R^{-1}E_q)^{-1}.
\end{aligned} \]
Similar results hold for $a_2^\top$ in place of $a_1$ and $b_2^\top$ in place of $b_1$. Indeed, when $a_2^\top$ and $b_2^\top$ are linear maps satisfying equation (1.4), we put
\[ R = a_2^\top, \qquad S = b_2^\top HT^{-1}. \tag{4.9} \]
These operators satisfy (1.14). Indeed, using (1.4), we have
\[ S(I-WZ) = b_2^\top HT^{-1}(T-GR^{-1}H) = b_2^\top(R-HT^{-1}G)R^{-1}H = E_q^\top R^{-1}H = C_2, \]
\[ R(I-ZW) = a_2^\top(T-GR^{-1}H) = E_p^\top = C_1. \]
Hence (1.14) is satisfied. Furthermore, (1.4) gives
\[ I+RZB_2 = I+a_2^\top\bigl(I-(T-GR^{-1}H)\bigr)E_p = I-a_2^\top(T-GR^{-1}H)E_p+a_2^\top E_p = a_{20}^\top, \]
\[ \begin{aligned}
I+SWB_1 &= I+b_2^\top HT^{-1}GR^{-1}E_q(E_q^\top R^{-1}E_q)^{-1}\\
&= I+b_2^\top E_q(E_q^\top R^{-1}E_q)^{-1}-b_2^\top(R-HT^{-1}G)R^{-1}E_q(E_q^\top R^{-1}E_q)^{-1}\\
&= I+b_{20}^\top(E_q^\top R^{-1}E_q)^{-1}-E_q^\top R^{-1}E_q(E_q^\top R^{-1}E_q)^{-1} = b_{20}^\top(E_q^\top R^{-1}E_q)^{-1}.
\end{aligned} \]

Part 2. In this part we show that condition (5) implies conditions (1)--(4). So assume (5) is satisfied. Then there exist linear maps as in (1.2) satisfying equations (1.3) and (1.4). Also $I-ZW$ is invertible, and using (4.5) we see that the same holds true for $I-A_1ZA_2W$. From Lemma 2.2 we conclude that $I+C_2WP$ and $I+RZB_2$ are invertible. Hence $b_{10}$ and $a_{20}^\top$ are invertible. Now use Theorem 1.2 and notice that the equalities in (1.16) imply that $I+SWB_1$ and $I+C_1ZQ$ are also invertible. We conclude that $b_{20}^\top$ and $a_{10}$ are invertible, $a_{10}=a_{20}^\top$ and $b_{10}=b_{20}^\top$. So indeed condition (5) implies conditions (1)--(4).

Next we make a remark that will help to simplify the remaining parts of the proof. Assume that $I-ZW$ is invertible. Then the equations (1.13) and (1.14) are uniquely solvable, and with these solutions the equalities in (1.16) hold true. Moreover, Lemma 2.2 shows that if one of the four operators in (1.16) is invertible, then $I-A_1ZA_2W$ is invertible and hence condition (5) is satisfied. Conclusion: in order to finish the proof of the equivalence of the five conditions (1)--(5) we only have to show that (1) and (3) each separately imply that $I-ZW$ is invertible. (Trivially, condition (2) implies (1) and (4) implies (3).)
Part 3. In this part we show that condition (1) with $a_{10}$ invertible implies that $I-WZ$ is invertible. Define $P$ and $Q$ by (4.8). As we have seen in the first part of the proof, $I+C_1ZQ=a_{10}$, and hence $I+C_1ZQ$ is invertible. Assume that $y(I-WZ)=0$. According to Theorem 3.8 it follows that $yWA_1^\nu B_1=0$ for $\nu=0,1,2,\ldots$. So for all nonnegative integers $\nu$ we obtain that $y(S_p^\top)^\nu GR^{-1}E_q=0$. Here we use that $E_q^\top R^{-1}E_q$ is invertible. Since $R$ is invertible, $R$ factors as $R=R_-R_+$, where $R_+$ and $R_-$ are invertible Toeplitz operators, $R_+$ and $R_+^{-1}$ are both lower triangular, and $R_-$ and $R_-^{-1}$ are both upper triangular (see [9] or [2, Theorem 1.2]). Then
\[ 0 = y(S_p^\top)^\nu GR^{-1}E_q = (yG)S_q^\nu R^{-1}E_q = (yG)S_q^\nu R_+^{-1}R_-^{-1}E_q = (yG)R_+^{-1}S_q^\nu E_qv_{-0}, \]
where $v_{-0}$ is the invertible $(1,1)$-entry of $R_-^{-1}$. Hence $(yGR_+^{-1})S_q^\nu E_q=0$ for $\nu=0,1,2,\ldots$. So we obtain $yGR_+^{-1}=0$, and $yG=0$. Since $I-WZ=T-GR^{-1}H$ we have $y(T-GR^{-1}H)=0$, and $yG=0$ implies that $yT=0$. But $T$ is invertible, and therefore $y=0$. So $I-WZ$ has a dense range. Since $I-WZ$ is a Fredholm operator of index zero, it follows that $I-WZ$ is invertible.

Part 4. In this part we show that condition (1) with $b_{10}$ invertible implies that $I-ZW$ is invertible. Define $P$ and $Q$ by (4.8). As we have seen in the first part of the proof, $I+C_2WP=b_{10}(E_q^\top R^{-1}E_q)^{-1}$, and hence the operator $I+C_2WP$ is invertible. Assume that $y(I-ZW)=0$. From Theorem 3.7 we see that $0=yZA_2^\nu B_2=yZS_p^\nu E_p$ for $\nu=0,1,2,\ldots$. It follows that $yZ=0$, and therefore $y=y(I-ZW)+yZW=0$. So $I-ZW$ has a dense range. Since $I-ZW$ is a Fredholm operator of index zero, it follows that $I-ZW$ is invertible.

Part 5. In this part we show that condition (3) with $a_{20}^\top$ invertible implies that $I-ZW$ is invertible. Define $R$ and $S$ by (4.9). As we have seen in the first part of the proof, $I+RZB_2=a_{20}^\top$, and hence $I+RZB_2$ is invertible. From Theorem 1.3 we conclude that
\[ \operatorname{Ker}(I-ZW)\subset\bigcap_{\nu=0}^\infty\operatorname{Ker}C_2A_2^\nu W. \]
Assume that $(I-ZW)x=0$. Then, by the previous inclusion, $C_2A_2^\nu Wx=C_2A_2^\nu x=0$. Using the definitions of $A_2$ and $C_2$, we obtain $E_q^\top R^{-1}HS_p^\nu x=0$ for $\nu=0,1,2,\ldots$. As above in Part 3 write $R=R_-R_+$. Then
\[ 0 = E_q^\top R_+^{-1}R_-^{-1}HS_p^\nu x = E_q^\top R_+^{-1}R_-^{-1}(S_q^\top)^\nu Hx = v_{+0}E_q^\top(S_q^\top)^\nu R_-^{-1}Hx, \]
where $v_{+0}$ is the invertible $(1,1)$-entry of $R_+^{-1}$. We see that $Hx=0$. But then $Tx=0$ and $x=0$. As above we conclude that $T-GR^{-1}H$ is invertible. So $I-ZW$ is invertible.

Part 6. In this part we show that condition (3) with $b_{20}^\top$ invertible implies that $I-WZ$ is invertible. Define $R$ and $S$ by (4.9). As we have seen in the first part of the proof, $I+SWB_1=b_{20}^\top(E_q^\top R^{-1}E_q)^{-1}$, and hence $I+SWB_1$ is invertible. Assume that $(I-WZ)x=0$. According to Theorem 3.6 we have that $C_1A_1^\nu Zx=0$ for $\nu=0,1,2,\ldots$, that is, $E_p^\top(S_p^\top)^\nu Zx=0$ for all $\nu$. But then $Zx=0$, and we conclude that $x=0$. Hence $I-WZ$ is invertible.
Part 7. Finally we apply Theorem 1.2 to show that the inverse of $T-GR^{-1}H$ is given by (1.17) whenever one of the conditions (1)--(5) is satisfied. Formula (1.17) translates into (4.6), and hence
\[ (T-GR^{-1}H)^{-1}-S_p(T-GR^{-1}H)^{-1}S_p^\top = a_1a_{10}^{-1}a_2^\top - S_pT^{-1}Gb_1b_{10}^{-1}b_2^\top HT^{-1}S_p^\top. \]
The identity (4.7) one obtains by just switching the roles of $T$ and $R$, $G$ and $H$, $a_j$ and $b_j$, $a_{j0}$ and $b_{j0}$. $\Box$
We are now ready to prove Theorem 1.1. Recall that for a linear map $x:\mathbb{C}^p\to\ell^2_+(\mathbb{C}^p)$ we denote by $T_x$ the block lower triangular Toeplitz operator with first column equal to $x$, and for a $p\times p$ matrix $u$ the symbol $\Delta_u$ denotes the block diagonal operator on $\ell^2_+(\mathbb{C}^p)$ with all diagonal entries equal to $u$.

Proof of Theorem 1.1. First we prove (1.5). Use Proposition 4.1 to derive the identity (4.6). By multiplying this identity $\nu$ times from the left by $S_p$ and $\nu$ times from the right by $S_p^\top$, for $\nu=0,1,\ldots,n-1$, and adding the resulting identities, one gets (also using (1.7))
\[ (T-GR^{-1}H)^{-1}-S_p^n(T-GR^{-1}H)^{-1}(S_p^\top)^n = \sum_{\nu=0}^{n-1}S_p^\nu a_1a_{10}^{-1}a_2^\top(S_p^\top)^\nu - S_p\Bigl(\sum_{\nu=0}^{n-1}S_p^\nu c_1b_{10}^{-1}c_2^\top(S_p^\top)^\nu\Bigr)S_p^\top. \]
Since for any $h$ we have $\lim_{n\to\infty}(S_p^\top)^nh=0$, the left-hand side converges pointwise to $(T-GR^{-1}H)^{-1}$. Notice that
\[ \sum_{\nu=0}^{n-1}S_p^\nu a_1a_{10}^{-1}a_2^\top(S_p^\top)^\nu = T_{a_1}\Delta_{a_{10}^{-1}}\Pi_nT_{a_2}^\top, \]
where $\Pi_n=I-S_p^n(S_p^\top)^n$, and $\lim_{n\to\infty}T_{a_1}\Delta_{a_{10}^{-1}}\Pi_nT_{a_2}^\top = T_{a_1}\Delta_{a_{10}^{-1}}T_{a_2}^\top$. Similarly, using that $c_1,c_2:\mathbb{C}^q\to\ell^1_+(\mathbb{C}^p)$, one finds
\[ \sum_{\nu=0}^{\infty}S_p^\nu c_1b_{10}^{-1}c_2^\top(S_p^\top)^\nu = T_{c_1}\Delta_{b_{10}^{-1}}T_{c_2}^\top. \]
So we get
\[ (T-GR^{-1}H)^{-1} = T_{a_1}\Delta_{a_{10}^{-1}}T_{a_2}^\top - S_pT_{c_1}\Delta_{b_{10}^{-1}}T_{c_2}^\top S_p^\top. \]
This proves (1.5). Formula (1.6) one obtains in a similar way from (4.7). $\Box$

It is interesting to specify Theorem 1.1 for the case when $G=0$, $H=0$ and $R=I$. Then $T-GR^{-1}H=T$. Recall that $T$ is assumed to be invertible. The hypotheses $G=0$, $H=0$ and $R=I$ imply that (1.3) and (1.4) reduce to
\[ a_1=T^{-1}E_p,\qquad b_1=E_q,\qquad a_2^\top=E_p^\top T^{-1},\qquad b_2^\top=E_q^\top. \]
In particular, $a_1$ is the first column of $T^{-1}$, $a_2^\top$ is the first row of $T^{-1}$, and $a_{10}=a_{20}^\top$ is the $(1,1)$ entry of $T^{-1}$. Since we assume that the entries of the matrix function
defining $T$ belong to the Wiener algebra on the circle, the classical result from [9] then tells us that $a_{10}$ is invertible and
\[ T^{-1} = T_{a_1}\Delta_{a_{10}^{-1}}T_{a_2}^\top. \tag{4.10} \]
The above formula for $T^{-1}$ is precisely (1.5) for the case when $G$ and $H$ are zero. Indeed, when $G$ and $H$ are zero, then (1.7) tells us that the maps $c_1$ and $c_2$ are zero. But in that case (1.5) reduces to (4.10).
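For a Toeplitz operator whose symbol factors as an upper-triangular times a lower-triangular banded Toeplitz operator, finite sections interact exactly with formula (4.10): the section of $T^{-1}$ equals the product of the inverted triangular sections, and the lower-diagonal-upper product on the right-hand side only involves entries inside the section. The sketch below (scalar case, made-up symbol) verifies this; it is an illustration, not part of the paper's argument.

```python
import numpy as np

N = 40
S = np.eye(N, k=-1)                    # forward shift section S_p
# T = T_- T_+ with T_- upper and T_+ lower triangular Toeplitz
# (made-up scalar symbol t(z) = (1 - 0.3/z)(1 + 0.4 z)).
Tm = np.eye(N) - 0.3 * S.T
Tp = np.eye(N) + 0.4 * S
# By triangularity, the N x N section of T^{-1} = T_+^{-1} T_-^{-1}
# is exactly the product of the inverted sections.
Tinv = np.linalg.inv(Tp) @ np.linalg.inv(Tm)

a1 = Tinv[:, 0]                        # first column of T^{-1}
a2 = Tinv[0, :]                        # first row of T^{-1}
a10 = Tinv[0, 0]                       # (1,1) entry of T^{-1}

# T_{a1}: lower triangular Toeplitz with first column a1;
# T_{a2}^T: upper triangular Toeplitz with first row a2.
Ta1 = sum(a1[k] * np.linalg.matrix_power(S, k) for k in range(N))
Ta2t = sum(a2[k] * np.linalg.matrix_power(S.T, k) for k in range(N))
assert np.allclose(Tinv, Ta1 @ Ta2t / a10)
print("formula (4.10) verified on a finite section")
```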
4.2. Finite rank Hankel operators
In this subsection we return to the case when ðº and ð» are of finite rank andð = ðŒð and ð = ðŒð. We shall derive the identity (1.21). To do this we use the dataappearing in (1.18), (1.19) and (1.20).
Proof of (1.21). Assume there exist linear maps ð1, ð2, ð1, ð2 as in (1.2) satisfyingequations (1.3) and (1.4) (with ð and ð identity operators) such that ð10 or ð10is invertible. As Theorem 1.1 tells us, this implies that ðŒ â ðºð» is invertible. Weintend to apply Theorem 1.2 with the data as in (1.18), (1.19) and (1.20). Firstone checks that the operators
ð = Î1ð1, ð = ðâ2Î1, ð = Î2ð1, ð = ðâ2Î2 (4.11)
satisfy the identities (1.13) and (1.14). This allows us to prove the following iden-tities:
ðŒ + ð¶2ðð = ð10, ðŒ + ð¶1ðð = ð10, (4.12)
ðŒ + ðððµ1 = ðâ20, ðŒ + ð ððµ2 = ð
â20. (4.13)
To see this let us establish the first identity in (4.12):
ðŒ + ð¶2ðð = ðŒ + ðâðÎ2Î2Î1Î1ð1 = ðŒ + ðâððºð»ð1
= ðŒ + ðâð(ð1 â (ðŒ âðºð»)ð1) = ð10.The other identities in (4.12) and (4.13) are proved in a similar way. It follows(see Lemma 2.1) that ð10 = ð
â20 and ð10 = ð
â20. Since we assume that one of the
matrices ð10 and ð10 is invertible, both are invertible, and we get from (1.17),(4.11), (4.12), and (4.13) that
ð(ðŒ â ðð)â1 âðŽ2ð(ðŒ â ðð)â1ðŽ1
= Î2ð1ðâ110 ð
â2Î1 âðŽ2Î2Î1Î1ð1ð
â110 ð
â2Î2Î2Î1ðŽ1
= Î2ð1ðâ110 ð
â2Î1 âðŽ2Î2ð»ð1ð
â110 ð
â2ðºÎ1ðŽ1.
The fact that both ðŽ1 and ðŽ2 are stable then yields:
ð(ðŒ â ðð)â1 =ââð=0
ðŽð2Î2ð1ð
â110 ð
â2Î1ðŽ
ð1 â
ââð=0
ðŽð+12 Î2ð»ð1ð
â110 ð
â2ðºÎ1ðŽ
ð+11 .
178 M.A. Kaashoek and F. van Schagen
Using (ðŒ âðºð»)â1 = ðŒ + Î2ð(ðŒ â ðð)â1Î1, we find
(ðŒ âðºð»)â1 = ðŒ +ââð=0
Î2ðŽð2Î2ð1ð
â110 ð
â2Î1ðŽ
ð1Î1
âââð=0
Î2ðŽð+12 Î2ð»ð1ð
â110 ð
â2ðºÎ1ðŽ
ð+11 Î1.
Next put ð1 = âðºð1 and ðâ2 = âðâ2ð» ; cf. the identities in (1.7). ThenÎ2ðŽ
ð2Î2ð1 = (ðâð)
ð(âð1), ðâ2Î1ðŽð1Î1 = âðâ2ðð
ð (ð ⥠0).
Also use
Î2ðŽð+12 Î2ð»ð1 = (ðâð)
ð+1ðºð»ð1 = (ðâð)ð+1(ð1 â ðð) = (ðâð)
ð+1ð1,
ðâ2ðºÎ1ðŽð+11 Î1 = ð
â2ðºð»ð
ð+1ð = (ðâ2 â ðâð)ðð+1
ð = ðâ2ðð+1ð .
In this way we obtain
(ðŒ âðºð»)â1 = ðŒ +ââð=0
(ðâð)ðð1ð
â110 ð
â2ð
ðð â
ââð=0
(ðâð)ð+1ð1ð
â110 ð
â2ð
ð+1ð
= ðŒ +ð»ð1Îðâ110ð»âð2 â ðâðð»ð1Îðâ1
10ð»âð2
ðð.
(4.14)
This proves (1.21). â¡
Remark. Notice that all the Hankel operators in (4.14) are of finite rank. Forexample, the identity ð»ð1 = âÎ2
[ð ðŽ2ð ðŽ2
2ð â â â ] implies that the rankof ð»ð1 is at most the order of ðŽ2.
Formula (1.21) remains true in the more general case when ðº and ð» are offinite rank. To see this, assume there exist linear maps ð1, ð2, ð1, ð2 as in (1.2)satisfying equations (1.3) and (1.4) (with ð and ð identity operators) such thatð10 or ð10 is invertible. Hence, by Theorem 1.1, the operator ðŒ âðºð» is invertibleTo derive the analogue of the formula (1.21) for this case we apply Theorem 1.2.Put
ð = ð», ð = ðº,
ðŽ1 = ðâð , ðµ1 = ð»ðð, ð¶1 = ð
âð,
ðŽ2 = ðâð , ðµ2 = ðºðð, ð¶2 = ð
âð .
The corresponding Stein equations (1.11) are satisfied, and
ð = ð»ð1, ð = ðºð1, ð = ðâ2, and ð = ðâ2solve the equations (1.13) and (1.14). Notice that the equalities (4.12), (4.13) holdtrue. Furthermore, ð10 = ð
â20 and ð10 = ð
â20. Theorem 1.2 yields
ðº(ðŒ âð»ðº)â1 â ðâððº(ðŒ âð»ðº)â1ðâð = ðºð1ðâ110 ðâ2 â ðâððºð»ð1ðâ110 ð
â2ðºð
âð .
Since ðâððºð»ð1 = ðâð(ð1 â ðð) = ðâðð1 and ðºð1 = âð1 and ðâ2ðº = âðâ2, we get
ðº(ðŒ âð»ðº)â1 â ðâððº(ðŒ âð»ðº)â1ðâð = âð1ðâ110 ðâ2 + ð
âðð1ð
â110 ð
â2ðâð .
Inverting Structured Operators 179
Using the same reasoning as in the proof of Theorem 1.1 we obtain
ðº(ðŒ âð»ðº)â1 = âð»ð1Îâ1ð10ð âð2
+ ðâðð»ð1Îâ1ð10ð âð2ð
âð . (4.15)
To derive from (4.15) a formula for (ðŒ âðºð»)â1, we use the identity(ðŒ âðºð»)â1 = ðŒ +ðº(ðŒ âð»ðº)â1ð». (4.16)
Thus we have to multiply (4.15) from the right by ð» . For this purpose we use thefollowing lemma.
Lemma 4.2. Let ð¥ : âð â â1+(âð), and let ðŸ : â1+(â
ð) â â1+(âð ) be a Hankel
operator of which the defining matrix function has entries in the Wiener algebra.Then
ðŸðð¥ = ð»ðŸð¥ : â1+(â
ð)â â1+(âð ).
Here ðð¥ is the lower triangular Toeplitz operator with first column ð¥ and ð»ðŸð¥ isthe Hankel operator with with first column ðŸð¥.
Proof. Note that ðŸð¥ : âð â â1+(âð ). The ðth column of ðŸðð¥ is ðŸð
ððð¥ and the
ðth column of ð»ðŸð¥ is (ðâð )
ððŸð¥. These are equal since ðŸððð = (ðâð )
ððŸ. â¡
From this lemma we see that ð âð2ð» = ð»â
ð»âð2. With as before ðâ2 = âðâ2ð» it
follows that ð âð2ð» = âð»âð2 . Next we consider ð âð2ðâðð» . The dual of this operator
is ð»âðððð2 = ð»âðððð2 = ð»ð»âððð2 . Note that
ðâ2ðâðð» = âðâ2ðºð»ðð = (âðâ2 + ðâð)ðð = âðâ2ðð,
to see that ð»âðððð2 = âð»ðâðð2 = âðâðð»ð2 . We get ð
âð2ð
âðð» = âð»âð2
ðð, and
therefore (4.15) and (4.16) together yield
(ðŒ âðºð»)â1 = ðŒ +ð»ð1Îâ1ð10ð»âð2 â ðâðð»ð1Î
â1ð10ð»âð2
ðð, (4.17)
which is (1.21) for the present case.
Remark. Using (4.16) the identity (4.17) also follows from (1.6) with ð = ðŒ andð = ðŒ . Indeed, using (4.16) and (1.6) we obtain
(ðŒ âðºð»)â1 = ðŒ +ðºðð1Îâ1ð10ð âð2ð» âðºðððð1Îâ1ð10
ð âð2ðâðð».
Applying Lemma 4.2 yields the identities:
ðºðð1 = âð»ð1 , ðºðððð1 = âðâðð»ð1 , ðâð2ð» = âð»âð2 , ð âð2ðâðð» = âð»âð2
ðð.
Using these identities one obtains (4.17).
In the next section we shall prove the analogue of formula (4.17) in the generalsetting of Theorem 1.1.
180 M.A. Kaashoek and F. van Schagen
4.3. Hankel type formulas for the inverse of Toeplitz plus Hankel
In this section we derive formulas for the inverses of ð âðºð â1ð» and ð âð»ð â1ðºthat generalize formula (1.21) (and (4.17)). We begin with some notation.
Let ð be the defining function of ð , and ð the one of ð . Since R and V areinvertible, detð and detð have no zero on the unit circle, and hence ðâ1 and ðâ1
are well defined matrix functions. Moreover, the entries of ðâ1 and ðâ1 belong tothe Wiener algebra on the unit circle. We denote by ð Ã and ð Ã the block Toeplitzoperators defined by ðâ1 and ðâ1, respectively.
Theorem 4.3. Assume there exist linear maps ð1, ð2 and ð1, ð2 as in (1.2) satisfy-ing equations (1.3) and (1.4). Then ð10 = ð
â20 and ð10 = ð
â20. Furthermore, assume
that at least one of the matrices ð10 and ð10 is invertible. Then both the matricesð10 and ð10 are invertible, and the operators ð â ðºð â1ð» and ð â ð»ð â1ðº areinvertible. Moreover,
(ð âðºð â1ð»)â1 = ð à +ð»ð1Îâ1ð10ð»âð2 â ðâðð»ð1Î
â1ð10ð»âð2
ðð, (4.18)
(ð âð»ð â1ðº)â1 = ð à +ð»ð1Îâ1ð10ð»âð2 â ðâðð»ð1Î
â1ð10ð»âð2
ðð. (4.19)
Here
ð1 = âð â1ðºð1, ð1 = âð â1ð»ð1, ðâ2 = âðâ2ð»ð â1, ðâ2 = âðâ2ðºð â1.Proof. As in the proof of Theorem 1.1, we conclude from Proposition 4.1 that theidentity (4.6) holds. Multiply this identity from the left by ðâð and from the rightby âðð. This yields (cf. the first identity in (5.16) of [6]):
(ð âðºð â1ð»)â1 â ðâð(ð âðºð â1ð»)â1ðð = ð1ðâ110 ðâ2 â ðâðð1ðâ110 ð
â2ðð. (4.20)
Next, by ð â 1 times repeatedly multiplying (4.20) from the left by ðâð and fromthe right by ðð, and adding the resulting identities, we get
(ð âðºð â1ð»)â1 â (ðâð)ð(ð âðºð â1ð»)â1ðð
ð
=
ðâ1âð=0
(ðâð)ðð1ð
â110 ð
â2ð
ðð â
ðâ1âð=0
(ðâð)ððâðð1ð
â110 ð
â2ððð
ðð .
(4.21)
In order to determine for the terms in (4.21) the limits for ð going to infinity, wewill deal with each of these terms separately.
We begin with the first term on the right-hand side. Let Î ð be the projectionof â2+(â
ð) mapping (ð¥0, ð¥1, ð¥2, . . .) onto (ð¥0, . . . , ð¥ðâ1, 0, 0, . . .). Then
ðâ1âð=0
(ðâð)ðð1ð
â110 ð
â2ð
ðð = ð»ð1Î
â1ð10Î ðð»
âð2 .
Since ð»âð2 is compact, limðââÎ ðð»âð2= ð»âð2 , with convergence in the operator
norm, and henceââð=0
(ðâð)ðð1ð
â110 ð
â2ð
ðð = ð»ð1Îðâ1
10ð»âð2 . (4.22)
Inverting Structured Operators 181
Similarly one derives that
ââð=0
(ðâð)ððâðð1ð
â110 ð
â2ððð
ðð = ð
âðð»ð1Î
â1ð10ð»âð2
ðð. (4.23)
Next we proceed with the left-hand side of the identity (4.21). First re-mark that the convergence of the right-hand side yields the existence of the limitlimðââ(ðâð)ð(ð âðºð â1ð»)â1ðð
ð . We will prove
limðââ(ð
âð)
ð(ð âðºð â1ð»)â1ððð = ð
Ã. (4.24)
The proof of (4.24) will be based on the following two observations: (a) the operator(ð âðºð â1ð»)â1 â ð â1 is compact, and (b) the operator ð â1 â ð à is compact.Note that (a) follows from
(ð âðºð â1ð»)â1 = ð â1 +ð â1ðºð â1ð»(ð âðºð â1ð»)â1.Indeed, using the latter identity and ðº (or ð») is compact, we get item (a). Toget item (b), we use that ð is invertible, and hence the defining function ð of ð admits a canonical factorizations, ð = ðâð+. Recall that ð à is the block Toeplitzoperator defined by ðâ1 = ðâ1+ ð
â1â and ð â1 = ð Ã+ð
Ãâ, where ð
Ã+ and ð Ãâ are the
block Toeplitz operators defined by ðâ1+ and ðâ1â , respectively. But then, using astandard identity (see formula (4) in [7, Section XXIII]), we see that ð Ãâð â1 isthe product of two compact Hankel operators, which proves item (b).
Given items (a) and (b) we see that (ð â ðºð â1ð»)â1 = ð à +ðŸ, where ðŸis a compact operator. Since (ðâð)
ð â 0 pointwise, the compactness of ðŸ impliesthat (ðâð)ððŸ â 0 in operator norm, and thus (ðâð)ððŸðð
ð â 0 in operator norm as
ðââ. The fact that ð Ã is a block Toeplitz operator is equivalent to ðâðð Ãðð =
ð Ã, and hence (ðâð)ðð Ãðð
ð = ð Ã for each ð. We conclude that the left-hand side
of (4.24) is equal to
limðââ(ð
âð)
ðð Ãððð + lim
ðââ(ðâð )
ððŸððð = ð
Ã.
This proves (4.24). Combining the limits (4.22), (4.23), and (4.24) we obtain (4.18).
The proof of (4.19) can be done in exactly the same manner. â¡
5. New proof of the main inversion theorem in [6]
In this section we show how Theorem 2.1 in [6] can be obtained from Theorem 1.2.
We begin with some preliminaries. Let ð³ be a Hilbert space with two or-thogonal direct sum decompositions:
ð³ = ð°1 â ðŽ1 = ð°2 â ðŽ2. (5.1)
182 M.A. Kaashoek and F. van Schagen
On ð³ we have two operators ðŽ and ðŸ such that relative to these decompositions
ðŽ =
[ðŒ1 00 ðŒ2
]:
[ð°1ðŽ1
]â[ð°2ðŽ2
], where ðŒ2 is invertible; (5.2)
ðŸ =
[ð 1 00 ð 2
]:
[ð°2ðŽ2
]â[ð°1ðŽ1
], where ð 2 is invertible and ð 2 = ðŒ
â12 . (5.3)
In what follows we denote by ðâ the orthogonal projection of ð³ onto the subspaceâ, viewed as an operator from ð³ to â. Furthermore, ðâ denotes the canonicalembedding of â into ð³ , that is, ðâ = ðââ. The following result is Theorem 2.1in [6].
Theorem 5.1. Let ð be an invertible operator on ð³ and let ðŽ and ðŸ be as in (5.2)and (5.3), respectively. Assume that
ððŽ1(ð âðŸððŽ)ððŽ1 = 0. (5.4)
Consider the operators defined by
Î = ðâ1ðð°2 : ð°2 â ð³ , Κ = ðâ1ðð°1 : ð°1 â ð³ ,Î¥ = ðð°2ð
â1 : ð³ â ð°2, Ω = ðð°1ðâ1 : ð³ â ð°1.
Furthermore, put ð0 = ðð°2Î and ð0 = ðð°1Κ. Then ð0 is invertible if and only ifð0 is invertible, and in this case the inverse of ð satisfies the identity
ðâ1 âðŽðâ1ðŸ = Îðâ10 Î¥âðŽÎšðâ10 ΩðŸ. (5.5)
First we state and prove a preliminary lemma.
Lemma 5.2. Let ð be an invertible operator on ð³ and let ðŽ and ðŸ be as in (5.2)and (5.3), respectively. Let
ðŽ0 =
[0 00 ðŒ2
]:
[ð°1ðŽ1
]â[ð°2ðŽ2
], ðŸ0 =
[0 00 ð 2
]:
[ð°2ðŽ2
]â[ð°1ðŽ1
].
Then (5.5) holds true if and only if
ðâ1 âðŽ0ðâ1ðŸ0 = Îðâ10 Î¥âðŽ0Κð
â10 ΩðŸ0.
Proof. It is sufficient to prove that
ðŽðâ1ðŸ â ðŽ0ðâ1ðŸ0 = ðŽÎšð
â10 ΩðŸ âðŽ0Κð
â10 ΩðŸ0.
Write
ðŽðâ1ðŸ âðŽ0ðâ1ðŸ0 = (ðŽâðŽ0)ð
â1ðŸ +ðŽ0ðâ1(ðŸ âðŸ0),
ðŽÎšðâ10 ΩðŸ âðŽ0Κðâ10 ΩðŸ0 = (ðŽâðŽ0)Κð
â10 ΩðŸ +ðŽ0Κð
â10 Ω(ðŸ âðŸ0).
So it suffices to prove
(ðŽâðŽ0)ðâ1ðŸ = (ðŽâðŽ0)Κð
â10 ΩðŸ (5.6)
and
ðŽ0ðâ1(ðŸ âðŸ0) = ðŽ0Κð
â10 Ω(ðŸ âðŸ0). (5.7)
Inverting Structured Operators 183
Write
ðâ1 =[ð0 ð12ð21 ð22
]:
[ð°1ðŽ1]â[ð°1ðŽ1].
Then
Κ = ðâ1ðð°1 =
[ð0ð21
], and Ω = ðð°1ð
â1 =[ð0 ð12
].
A straightforward computation with these matrix representations reveals that (5.6)and (5.7) hold true. â¡
Proof of Theorem 5.1. In view of Lemma 5.2 we may assume that ðŒ1 = 0 andð 1 = 0. Let ð³1 = ð³2 = ð³ , ðŽ = ð°2, ð° = ð°1 â ð°1 and
ð = ðŒð³ â ð, ð = ðŒð³ ,
ðŽ1 = ðŸ, ðµ1 =[ðð°1 âððð°1 + ðð°1(ðŒð°1 + ðð°1ððð°1)
], ð¶1 = ðð°2 ,
ðŽ2 = ðŽ, ðµ2 = ðð°2 , ð¶2 =
[âðð°1ððð°1
].
First we present a few simple auxiliary identities that will play a role in thesequel:
ðð°1ðŽ1 = ðð°1ðŸ = 0, ðŽ2ðð°1 = ðŽðð°1 = 0,
ðŒð³ âðŽ2ðŽ1 = ðŒð³ âðŽðŸ = ðð°2ðð°2 , ðŒð³ = ððŽ1ððŽ1 + ðð°1ðð°1 .
Using these equalities and (5.4), we obtain
ð âðŸððŽ = (ððŽ1ððŽ1 + ðð°1ðð°1)(ð âðŸððŽ)(ððŽ1ððŽ1 + ðð°1ðð°1)
= ðð°1ðð°1ð + ððð°1ðð°1 â ðð°1ðð°1ððð°1ðð°1 .
So it follows that
ð âðŽ1ððŽ2 = ðŒð³ âðŸðŽâ (ð âðŸððŽ)= ðð°1ðð°1 â ðð°1ðð°1ð â ððð°1ðð°1 + ðð°1ðð°1ððð°1ðð°1 = ðµ1ð¶2.
Furthermore,
ðâðŽ2ððŽ1 = ðŒð³ âðŽðŸ = ðð°2ðð°2 = ðµ2ð¶1.
Next let us define
ð =[Κ âðð°1 +Κ(ðŒð°1 + ðð°1ððð°1)
], ð = Î¥
ð = Î, ð =
[âðð°1
Ω
].
Then
ðð =[ðð°1 âððð°1 + ðð°1(ðŒð°1 + ðð°1ððð°1)
]= ðµ1,
ð ð = ð¶1, ðð = ðµ2, ðð =
[âðð°1ððð°1
]= ð¶2.
184 M.A. Kaashoek and F. van Schagen
We proceed with ðŒð° + ð¶2ðð and ðŒðŽ + ð ððµ2. First
ðŒðŽ + ð ððµ2 = ðŒð°2 + ð (ðŒð³ â ð )ðð°2 = ðŒð°2 + ð ðð°2 â ð ððð°2
= ðŒð°2 + ðð°2ðâ1ðð°2 â ðð°2ðð°2 = ð0.
Next
ðŒð° + ð¶2ðð =
[ðŒð°1 00 ðŒð°1
]+
[âðð°1ððð°1
] [Κ âðð°1 +Κ(ðŒð°1 + ðð°1ððð°1)
].
We compute the four entries of this 2 à 2 matrix separately. The (1, 1)-entry isðŒð°1 â ðð°1ðΚ = ðŒð°1 â ðð°1ðð°1 = 0. The (1, 2)-entry is
ðð°1ððð°1 â ðð°1ðΚ(ðŒð°1 + ðð°1ððð°1) = âðŒð°1 .
For the (2, 1)-entry ðð°1Κ = ðð°1ðâ1ðð°1 = ð0. Finally the (2, 2)-entry is given by
ðŒð°1 â ðð°1ðð°1 + ðð°1Κ(ðŒð°1 + ðð°1ððð°1) = ðð°1Κ(ðŒð°1 + ðð°1ððð°1).
We have
ðŒð° + ð¶2ðð =
[0 ðŒð°1
ð0 ð0(ðŒð°1 + ðð°1ððð°1)
],
which is invertible if and only if ð0 is invertible. Our assumption was that ð0 orð0 is invertible. So we have that ðŒð° + ð¶2ðð or ðŒð°2 + ð ððµ2 is invertible andhence, according to Theorem 1.2 both are invertible and formula (1.17) holdstrue. The left-hand side of this formula is ðâ1 â ðŽðâ1ðŸ since we have thatð(ðŒð³ â ðð)â1 = ðâ1. To finish the proof we have to check that the right-handside of (1.17) gives the right-hand side of (5.5). For the first term this is immediatefrom the above-established equalities ð = Î, ðŒð°2 + ð ððµ2 = ð0 and ð = Î¥. Forthe second term first notice that
ðŽ2ðð(ðŒð° + ð¶2ðð)â1ðððŽ1
= ðŽ[Κ âðð°1 +Κ(ðŒð°1 + ðð°1ððð°1)
] [â(ðŒð°1 + ðð°1ððð°1) ðâ10
ðŒð°1 0
] [âðð°1
Ω
]ðŸ.
Use ðŽðð°1 = 0 and ðð°1ðŸ = 0 to see that this is equal to ðŽÎšðâ10 ΩðŸ. We provedformula (5.5). â¡
6. Examples
In this section ð is an invertible ð à ð block Toeplitz matrix with blocks of sizeðà ð, and we take ð = ðŒð³ â ð and ð = ðŒð³ , where ð³ = (âð)ð. With these ð andð we shall associate two different sets of matrices {ðŽ1, ðµ1, ð¶1;ðŽ2, ðµ2, ð¶2} suchthe equations (1.9), (1.10), and (1.11) are satisfied. For both choices we compute(1.17). We shall see that for the first choice (1.17) cannot be used to obtain aformula for ðâ1, while for the second choice (1.17) leads to the GohbergâHeinigformula [8] for ðâ1.First example. To introduce our first choice for the set (ðŽ1, ðµ1, ð¶1;ðŽ2, ðµ2, ð¶2)associated with ð and ð we need some auxiliary operators. We define the forward
Inverting Structured Operators 185
block shift ð : (âð)ð â (âð)ð and the two embedding operators ð : âð â (âð)ð
and ð : âð â (âð)ð by
ð
â¡â¢â¢â¢â£ð¥0ð¥1...
ð¥ðâ1
â€â¥â¥â¥âŠ =â¡â¢â¢â¢â£0 0 â â â 0ðŒð 0 0
. . .. . .
...ðŒð 0
â€â¥â¥â¥âŠâ¡â¢â¢â¢â£ð¥0ð¥1...
ð¥ðâ1
â€â¥â¥â¥âŠ , ðð¥ =
â¡â¢â¢â¢â£ð¥0...0
â€â¥â¥â¥âŠ , ðð¥ =
â¡â¢â¢â¢â£0...0ð¥
â€â¥â¥â¥âŠ ,for each ð¥ â âð. Let ð³ = ð³1 = ð³2 = (âð)ð, ð° = (âð)2 and ðŽ = âð. PutðŽ = ð + ððâ and
ðŽ1 = ðŽ, ðµ1 =[âðð+ ðŽ1ðð âð] , ð¶1 = 0,
ðŽ2 = ðŽâ = ðŽâ1, ðµ2 = 0, ð¶2 =
[ðâ
ðâð â ðâððŽ2
].
Then ð â ðŽ2ððŽ1 = ðµ2ð¶1 and ð â ðŽ1ððŽ2 = ðµ1ð¶2. The latter equality one cancheck as follows. First note that ð âðððâ = ðððâ + ððâð â ððâðððâ. It followsthat
â(ð âðŽ1ððŽ2) = ð âðŽ1ððŽ2 = ð â (ð + ððâ)ð (ðâ + ððâ)
= ðððâ + ððâð â ððâðððâ â ððâððŽ2 âððððâ= ðððâ + ððâð â ððâðððâ â ððâððŽ2 âððððâ= ðððâ + ððâð âðŽ1ððð
â â ððâððŽ2
= âðµ1ð¶2.
Recall that ð is assumed to be invertible. Define ð¥, ð€, ð§â and ðŠâ by
ðð¥ = ð, ðð€ = ðŽ1ðð, ð§âð = ðâ, ðŠâð = ðâððŽ2.
Then
ð =[âð+ ð€ âð¥] , ð = 0, ð = 0, ð =
[ð§â
ðâ â ðŠâ]
solve (1.13) and (1.14) and
ðŒð° + ð¶2ð =
[ðâð€ âðâð¥
(ðâ â ðŠâ)(ðð€ â ðð) ðŠâð
]Next notice that ðŒðŽ + ð ðð = ðŒðŽ is invertible. Thus Theorem 1.2 tells us thatðŒð° + ð¶2ð is invertible and
ðâ1 âðŽ2ðâ1ðŽ1 = ðŽ2ð(ðŒ + ð¶2ð)
â1ððŽ1. (6.8)
Note that in this case ðŽ1 and ðŽ2 are not stable, and it is not clear how one canuse (6.8) to derive a formula for ðâ1.
186 M.A. Kaashoek and F. van Schagen
Second example. The spaces ð³ , ð³1, ð³2, ð° , and ðŽ are as in the previous example.We choose
ðŽ1 = ðâ, ðµ1 =
[ð âðð + ðð¡0
], ð¶1 = ð
â,
ðŽ2 = ð, ðµ2 = ð, ð¶2 =
[ðâ â ðâððâ
],
where ð¡0 is the left upper entry of ð . Then the equations in (1.11) are satisfied.Using ð is invertible, define ð¥, ðŠ, ð€ and ð§ by
ðð¥ = ð, ð ð§ = ð, ðŠâð = ðâ, ð€âð = ðâ.
To satisfy (1.13) and (1.14) put
ð = ð¥, ð = ðŠâ, ð =[ð§ âð + ð§ð¡0
], ð =
[ð€â â ðâð€â
].
Then ðŒðŽ + ð ððµ2 = ðŠâðð and
ðŒð° + ð¶2ðð =
[ðâð§ âðŒ + ðâð§ð¡0ðâð§ ðâð§ð¡0
].
Now write for short ðŠ0 = ðŠâðð and ð§ð = ðâð§. So ðŒð° + ð¶2ðð is invertible if and
only if ð§ð is. A simple computation gives in this case that
ðâ1 âððâ1ðâ = ð¥ðŠâ10 ðŠâ âðð§ð§â1ð ð€âðâ. (6.9)
This is a well-known formula for the inverse of ð from [8]. In this case ðð = 0 and(ðâ)ð = 0. Hence one can easily derive from (6.9) a closed expression for ðâ1.
References
[1] H. Bart, I. Gohberg, M.A. Kaashoek, and A.C.M. Ran, Factorization of matrix andoperator functions: the state space approach, OT 178, Birkhauser Verlag, Basel, 2008.
[2] H. Bart, I. Gohberg, M.A. Kaashoek, and A.C.M. Ran, A state space approach tocanonical factorization with applications, OT 200, Birkhauser Verlag, Basel, 2010.
[3] M.J. Corless and A.E. Frazho, Linear systems and control, Marcel Dekker, Inc., NewYork, 2003.
[4] R.L. Ellis and I. Gohberg, Orthogonal systems and convolution operators, OT 140,Birkhauser Verlag, Basel, 2003.
[5] R.L. Ellis, I. Gohberg, and D.C. Lay, Infinite analogues of block Toeplitz matricesand related orthogonal functions, Integral Equations and Operator Theory 22 (1995),375â419.
[6] A.E. Frazho and M.A. Kaashoek, A contractive operator view on an inversion for-mula of GohbergâHeinig, in: Topics in Operator Theory I. Operators, matrices andanalytic functions, OT 202, Birkhauser Verlag, Basel, 2010, pp. 223â252.
[7] I. Gohberg, S. Goldberg, and M.A. Kaashoek, Classes of Linear Operators, VolumeII, OT 63, Birkhauser Verlag, Basel, 1993.
Inverting Structured Operators 187
[8] I. Gohberg, G. Heinig, The inversion of finite Toeplitz matrices consisting of elementsof a non-commutative algebra, Rev. Roum. Math. Pures et Appl. 20 (1974), 623â663 (in Russian); English transl. in: Convolution Equations and Singular IntegralOperators, (eds. L. Lerer, V. Olshevsky, I.M. Spitkovsky), OT 206, Birkhauser Verlag,Basel, 2010, pp. 7â46.
[9] I.C. Gohberg and M.G. Krein, Systems of integral equations with kernels dependingon the difference of arguments, Uspekhi Math. Nauk 13 2(80) (1958), 3â72 (Russian);English Transl., Amer. Math. Soc. Transl. (Series 2) 14 (1960), 217â287.
[10] G.J. Groenewald and M.A. Kaashoek, A GohbergâHeinig type inversion formulainvolving Hankel operators,in: Interpolation, Schur functions and moment problems,OT 165, Birkhauser Verlag, Basel, 2005, pp. 291â302.
[11] G. Heinig and K. Rost, Algebraic methods for Toeplitz-like matrices and operators,Akademie-Verlag, Berlin, 1984.
[12] T. Kailath and A.H. Sayed, Displacement structure: Theory and applications, SIAMRev. 37 (1995), 297â386.
[13] T. Kailath and A.H. Sayed (editors), Fast Reliable Algorithms for Matrices withStructure, SIAM, Philadelphia, 1999.
[14] I. Koltracht, B.A. Kon, and L. Lerer, Inversion of structured operators, Integralequations and Operator Theory 20 (1994), 410â448.
[15] V. Olshevsky (editor), Structured matrices in mathematics, Computer Science, andEngineering, Contempary Math. Series 280, 281, Amer. Math. Soc. 2001.
[16] V.Y. Pan, Structured matrices and polynomials, Birkhauser Boston, 2001.
M.A. Kaashoek and F. van SchagenDepartment of MathematicsFaculty of SciencesVU UniversityDe Boelelaan 1081aNL-1081 HV Amsterdam, The Netherlandse-mail: [email protected]
Operator Theory:Advances and Applications, Vol. 237, 189â196câ 2013 Springer Basel
On the Sign Characteristics ofSelfadjoint Matrix Polynomials
Peter Lancaster and Ion Zaballa
Dedicated to Leonid Lerer on the occasion of his seventieth birthday.
Abstract. An important role is played in the spectral analysis of selfadjointmatrix polynomials by the so-called âsign characteristicsâ associated with realeigenvalues. In this paper the ordering of the real eigenvalues by their signcharacteristics is clarified. In particular, the roles played by the signature ofthe leading and trailing polynomial coefficients are discussed.
Mathematics Subject Classification (2010). 15A21, 47B15.
Keywords. Matrix polynomial. Sign characteristics.
1. Introduction
Let ð¿0, ð¿1, . . . , ð¿â â âðÃð. We consider matrix polynomials:
ð¿(ð) := ð¿âðâ + ð¿ââ1ðââ1 + â â â + ð¿0, ð â â, detð¿â â= 0. (1)
Such a polynomial is said to be selfadjoint if the coefficients are either complexHermitian or, in particular, real and symmetric. The eigenvalues of ð¿ are the zerosof detð¿(ð), and the eigenfunctions are the real analytic functions on â formed bythe zeros of detð¿(ð); say ð1(ð), ð2(ð), . . . , ðð(ð) (in some order to be decided). Inthis way the eigenvalues can also be characterized as the zeros of the eigenfunctionsand, if ð0 is an eigenvalue of ð¿(ð), then dimKerð¿(ð0) is exactly the number ofeigenfunctions that annihilate at ð0.
The notion of âsign characteristicâ associated with a real eigenvalue playsan important role in the spectral analysis and perturbation theory of selfadjointmatrix polynomials; see [1], [2], and [3], for example. In particular, it should be
The first author was supported in part by the Natural Sciences and Engineering Research Councilof Canada.
The second author was supported in part by MICINN MTM2010-19356-C02-01, EJ GIC10/169-IT-361-10 and UPV/EHU UFI11/52.
190 P. Lancaster and I. Zaballa
noted that, for convenience, and because of many applications, it was assumedin [3] that ð¿â > 0. Here, the more general case of nonsingular ð¿â prevails as in(the more comprehensive) references [1] and [2]. In particular, we will need thefollowing fundamental result (Theorem 3.7 of [2] and Theorem 6.10 of [1]).
Theorem 1.1. Let ð¿(ð) be an ðÃð selfadjoint matrix polynomial with nonsingularleading coefficient and let ð1(ð), . . . , ðð(ð) be real analytic functions of real ð forwhich det(ðððŒð â ð¿(ð)) = 0, ð = 1, . . . , ð. Let ð1 < â â â < ðð be the different realeigenvalues of ð¿(ð). For each ð write
ðð(ð) = (ðâ ðð)ðððððð(ð), ð = 1, . . . , ð,
where ððð(ðð) â= 0 is real. Then the non-zero numbers among ðð1,. . . , ððð are thepartial multiplicities of ð¿(ð) associated with ðð.
The sign of ððð(ðð) (for ððð â= 0) is the sign characteristic attached to theelementary divisors (ðâ ðð)ððð of ð¿(ð).
Note that this statement provides definitions of partial multiplicities and signcharacteristics and these are associated with each elementary divisor. (The readeris referred to [1] and [2] for more comprehensive dicussion.) In particular, if ð¿(ð)is semisimple (that is, ððð = 1 for all ð and ð) then ðð is said to be of positive ornegative type according as the sign characteristic attached to the correspondingelementary divisor (ðâ ðð) is positive or negative, respectively.
2. Admissible sign characteristics
Our first objectives are to provide characterizations of admissible sign character-istics for polynomials ð¿(ð) with either positive definite leading coefficient ð¿â, orpositive definite trailing coefficient ð¿0.
The first result relates the inertia of the leading coefficient ð¿â to the asymp-totic behaviour of the eigenfunctions.
Theorem 2.1. Let (ð, ðâ ð, 0) be the inertia of ð¿â, and let ðmax be the largest realeigenvalue of ð¿(ð). Then there are ð indices {ð1, . . . , ðð} â {1, . . . , ð} such thatfor all ð > ðmax,
ðð(ð) > 0 ðð ð â {ð1, . . . , ðð} ððð ðð(ð) < 0 ðð ð /â {ð1, . . . , ðð}.Proof. For ð = 1, . . . , ð the zeros of ðð(ð) are real eigenvalues of the polynomialmatrix ð¿(ð) and, since this matrix has at most ðâ real eigenvalues (counting withmultiplicities), the number of real zeros of ðð(ð) is finite. Then for ð > ðmax eitherðð(ð) > 0 or ðð(ð) < 0 for ð = 1, . . . , ð.
On the other hand, for any fixed real ð, the real number ð(ð) is an eigenvalueof the selfadjoint (real or complex) matrix ð¿(ð). Let ð1(ð¿â) ⥠â â â ⥠ðð(ð¿â) denotethe eigenvalues of ð¿â with
ð1(ð¿â) ⥠â â â ⥠ðð(ð¿â) > 0 > ðð+1(ð¿â) ⥠â â â ⥠ðð(ð¿â)
On the Sign Characteristics of Selfadjoint Matrix Polynomials 191
We will use the âWeyl inequalitiesâ for the eigenvalues of the sum of two sym-metric or Hermitian matrices (see [4, Th. 4.3.1], for example). If ð»1, ð»2 are ðà ðHermitian or symmetric matrices then
ðð(ð»1) + ðð(ð»2) †ðð(ð»1 +ð»2) †ðð(ð»1) + ð1(ð»2)
where the eigenvalues of ð»1, ð»2 and ð»1+ð»2 are arranged in non-increasing order.
Let ð0 be a real number. Applying the left-hand Weyl inequality repeatedlyto the symmetric matrix ð¿(ð0) = (ð¿âð
â0 + â â â + ð¿1ð0) + (ð¿0) we have
ðð(ð¿(ð0)) ⥠ðð(ð¿âðâ0 + â â â + ð¿1ð0) + ðð(ð¿0)
⥠ðð(ð¿âðâ0 + â â â + ð¿2ð20) + ð0ðð(ð¿1) + ðð(ð¿0)
⥠â â â ⥠ðâ0ðð(ð¿â) + ð
ââ10 ðð(ð¿ðâ1) + â â â + ðð(ð¿0)
Thus, if ð = min{ðð(ð¿ðâ1), . . . , ðð(ð¿0)}, then for ð0 > 1,
ðð(ð¿(ð0)) ⥠ðâ0ðð(ð¿â) + (ðââ10 + â â â + ð0 + 1)ð
⥠ðâ0ðð(ð¿â) +ðâ0 â 1ð0 â 1ð.
(2)
Assume now that ðð(ð¿â) > 0. If ð ⥠0 then ðð(ð¿(ð0)) > 0 for ð0 > 1. Also,if ð < 0 then for ð0 > 1â ð
ðð(ð¿â)we have
ðâ0(ð0 â 1) > âð
ðð(ð¿â)ðâ0 > â
ð
ðð(ð¿â)ðâ0 +
ð
ðð(ð¿â),
whence
ðâ0(ð0 â 1) > âð
ðð(ð¿â)(ðâ0 â 1),
and so
ðð(ð¿â)ðâ0 > âð
ðâ0 â 1ð0 â 1 .
Using this in (2) we find that ðð(ð¿(ð0)) > 0 for ð0 > 1 + â£ðâ£ðð(ð¿â)
.
Next, we use the right-handWeyl inequality (ðð(ð»1+ð»2) †ðð(ð»1)+ð1(ð»2))to show in a similar way that
ðð(ð¿(ð0)) †ðâ0ðð(ð¿â) + ðââ10 ð1(ð¿ðâ1) + â â â + ð1(ð¿0),
†ðâ0ðð(ð¿â) +ðâ0 â 1ð0 â 1ð,
with ð = max{ð1(ð¿0), . . . , ð1(ð¿ââ1)}.If we assume that ðð(ð¿â) < 0 then, as above, ð < 0 implies ðð(ð¿(ð0)) < 0
for ð0 > 1. Similarly,ð > 0 implies that ðð(ð¿(ð0)) < 0 for ð0 > 1â ððð(ð¿â)
. Hence,
for ð = ð + 1, . . . , ð, ðð(ð¿(ð0)) < 0 for ð0 > 1â â£ðâ£ðð(ð¿â)
.
192 P. Lancaster and I. Zaballa
Bearing in mind that ðð(ð¿â) ⥠ðð(ð¿â), ð = 1, . . . , ð and ðð(ð¿â) †ðð+1(ð¿â) forð = ð+1, . . . , ð, we conclude that ðð(ð¿(ð0)) > 0 for ð = 1, . . . , ð and ðð(ð¿(ð0)) < 0for ð = ð + 1, . . . , ð whenever
ð0 > max
{1 +
â£ðâ£ðð(ð¿â)
, 1â â£ð â£ðð+1(ð¿â)
}. (3)
But the eigenvalues of ð¿(ð0) are ð1(ð0),. . . , ðð(ð0). Then, for ð0 satisfying (3),there are ð indices {ð1, . . . , ðð} â {1, . . . , ð} such that, if
{ð1, . . . , ððâð} = {1, . . . , ð}â{ð1, . . . , ðð},
then ððð(ð0) > 0, ð = 1, . . . , ð, and ððð(ð0) < 0, ð = 1, . . . , ð â ð. The theoremfollows using the fact that ðð(ð) is either positive or negative for ð > ðmax. â¡
In a similar way, the behaviour of the eigenfunctions of ð¿(ð) near zero isclosely related to the inertia of the trailing coefficient, ð¿0.
Theorem 2.2. Let ð¿(ð) = ð¿âðâ+ð¿ââ1ðââ1+ â â â +ð¿0 be an ðÃð selfadjoint matrix
polynomial with detð¿â â= 0. Let ð1(ð), . . . , ðð(ð) be the eigenfunctions of ð¿(ð)and let (ð, ð, ð¿) be the inertia of ð¿0. Let ðð§ be the positive real eigenvalue of ð¿(ð)closest to zero. Then there are ð indices {ð1, . . . , ðð} â {1, . . . , ð} and ð indices{ð1, . . . ðð} â {1, . . . , ð} â {{ð1, . . . , ðð} such that for 0 < ð < ðð§, ðð(ð) > 0 ifð â {ð1, . . . , ðð} and ðð(ð) < 0 if ð â {ð1, . . . , ðð}.
Proof. The proof follows the same lines as that of Theorem 2.1. First, for ð =1, . . . , ð, ðð(ð) is either positive or negative for ð between any two consecutivereal eigenvalues of ð¿(ð). In particular, each eigenfunction has constant sign in(0, ðð§). Let the eigenvalues of ð¿0 be
ð1(ð¿0) ⥠â â â ⥠ðð(ð¿0) > 0 > ðð+ð¿+1(ð¿0) ⥠â â â ⥠ðð(ð¿0),
and let ð0 be a positive real number. Then ðð(ð¿0) > 0 for ð = 1, . . . , ð and,applying successively the left-hand Weyl inequalities to ð¿(ð0), we obtain
ðð(ð¿(ð0)) ⥠ðð(ð¿âðâ0 + â â â + ð¿1ð0) + ðð(ð¿0),
⥠â â â ⥠(ðâ0ðð(ð¿â) + â â â + ð0ðð(ð¿1)) + ðð(ð¿0),⥠ð(ð0 + â â â + ðâ0) + ðð(ð¿0),
= ðð01â ðâ01â ð0 + ðð(ð¿0),
(4)
where ð = min{ðð(ð¿1), . . . , ðð(ð¿â)}.
On the Sign Characteristics of Selfadjoint Matrix Polynomials 193
Now, if 0 < ð0 < 1 â â£ðâ£ðð(ð¿0)+â£ð⣠then ðð(ð¿(ð0)) > 0. In fact, if ð ⥠0 then
0 < ð0 < 1 and it is plain that ðð(ð¿(ð0)) ⥠ðð(ð¿0) > 0. And if ð < 0 then
1 +ð
ðð(ð¿0)âð > ð0 â ðð(ð¿0)
ðð(ð¿0)âð > ð0 â âðð(ð¿0)/ð1â ðð(ð¿0)/ð > ð0
â âðð(ð¿0)ð
> ð0
(1â ðð(ð¿0)
ð
)â (1â ð0)
(âðð(ð¿0)
ð
)> ð0
â âðð(ð¿0)ð
> ð01
1â ð0 > ð01â ðâ01â ð0 â ðð(ð¿0) > âðð0 1â ð
â0
1â ð0 .
It follows from (4) that ðð(ð¿(ð0)) ⥠ðð0 1âðâ0
1âð0+ ðð(ð¿0) > 0.
Similarly, if ð = ð + ð¿ + 1, . . . , ð then ðð(ð¿0) < 0 and we can apply theright-hand Weyl inequalities to show that
ðð(ð¿(ð0)) â€ðð0 1â ðâ0
1â ð0 + ðð(ð¿0),where ð = max{ð1(ð¿1), . . . , ð1(ð¿ð)}. As in the previous case, if 0 †ð0 †1 +
â£ðâ£ðð(ð¿0)ââ£ð⣠then ðð(ð¿(ð0)) < 0.
Therefore, for ð0 > 0 close enough to zero, ðð(ð¿(ð0)) > 0 or ðð(ð¿(ð0)) < 0according as ðð(ð¿0) > 0 or ðð(ð¿0) < 0, respectively. Since ðð(ð0) is an eigenvalueof ð¿(ð0) and ðð(ð) does not change sign in (0, ðð§) there must be ð eigenfunctionstaking positive values in (0, ðð§) and ð eigenfunctions taking negative values in thesame open interval. â¡
3. The semisimple case
With the help of Theorems 2.1 and 2.2 we can establish a necessary conditionon the sign characteristics of the real eigenvalues of semisimple selfadjoint matrixpolynomials with positive definite leading and/or trailing coefficient. This leads toa proof of the following result which was stated in [5] â without proof.
Theorem 3.1. Let ð¿(ð) be an ð à ð semisimple selfadjoint matrix polynomialwith ð¿â > 0 and maximal and minimal real eigenvalues ðmax and ðmin, respec-tively. For any ðŒ †ðmax, let ð(ðŒ) denote the number of real eigenvalues (countingmultiplicities) of ð¿(ð) of positive type in (ðŒ,+â) and ð(ðŒ) denote the numberof real eigenvalues (counting multiplicites) of ð¿(ð) of negative type in [ðŒ,+â).Then
ð(ðŒ) †ð(ðŒ) for all ðŒ â [ðmin, ðmax]. (5)
In particular, this theorem says that, in the semisimple case, if ð¿â > 0 thenfor each real eigenvalue of negative type there is at least one larger real eigenvalueof positive type.
Proof. Let ðŒ â â be such that ðmin †ðŒ †ðmax and let ð0 â [ðŒ, ðmax] bean eigenvalue of ð¿(ð). Since ð¿(ð) is semisimple the algebraic multiplicity of ð0
194 P. Lancaster and I. Zaballa
coincides with its geometric multiplicity. Bearing in mind that dimKerð¿(ð0) is thenumber of eigenfunctions that have ð0 as a zero, we can associate an eigenfunction(perhaps in more than one way) with each eigenvalue.
Let ð0(ð) be the eigenfunction associated with ð0. Then, according to The-orem 1.1, we can write ð0(ð) = (ð â ð0)ð0(ð) with ð0(ð0) â= 0 and the sign to beassociated with ð0 is positive or negative according as ð0(ð0) > 0 or ð0(ð0) < 0. Ifthe negative sign applies, then ð0(ð) is decreasing at ð0 and so ð0(ð0+) < 0. Butð¿â > 0 and by Theorem 2.1 ð0(ð) > 0 for ð > ðmax. It follows then that there
is ᅵᅵ0 > ð0 such that ð0(ᅵᅵ0) = 0 and ðâ²0(ᅵᅵ0) > 0. Thus, the eigenvalue ᅵᅵ0 has anassociated positive sign.
This means first that ð0 < ðmax (i.e., ð(ðmax) = 0 and so ðmax is of positivetype) and, also, that for each eigenvalue of ð¿(ð) in the interval [ðŒ, ðmax) of negativetype there is a larger eigenvalue in (ðŒ, ðmax] of positive type. Noting that ðmax
is necessarily of positive type, and ð¿(ð) has no eigenvalues in (ðmax,+â), weconclude that ð(ðŒ) †ð(ðŒ) for all ðŒ â [ðmin, ðmax], as desired. â¡
When ð¿(ð) is semisimple and of even degree, the number of eigenvalues ofpositive type equals the number of negative type ([1, Prop. 4.2]). In this caseð(ðmin) = ð(ðmin) implying that ðmin is of negative type. On the other hand, wehave seen in the proof of the theorem that ðmax is of positive type.
Theorem 3.1 should be compared with (the more general) Theorem 1.11 andExample 1.5 of [2].
Theorem 3.2. Let ð¿(ð) be an ðÃð semisimple selfadjoint matrix polynomial withð¿0 > 0 and maximal and minimal real eigenvalues ðmax and ðmin, respectively.
â For ðŒ < 0 let ðâ(ðŒ) denote the number of real eigenvalues (counting multi-plicities) of ð¿(ð) of positive type in (ðŒ, 0] and ðâ(ðŒ) the number of realeigenvalues (counting multiplicites) of ð¿(ð) of negative type in [ðŒ, 0).
• For $\alpha > 0$ let $p_+(\alpha)$ denote the number of real eigenvalues (counting multiplicities) of $L(\lambda)$ of positive type in $(0, \alpha]$ and $n_+(\alpha)$ the number of real eigenvalues (counting multiplicities) of $L(\lambda)$ of negative type in $[0, \alpha)$.
Then $n_-(\alpha) \le p_-(\alpha)$ for all $\alpha \in [\lambda_{\min}, 0)$ and $n_+(\alpha) \ge p_+(\alpha)$ for all $\alpha \in (0, \lambda_{\max})$.
Proof. The line of proof is similar to that of the previous theorem, but using Theorem 2.2. Let $\lambda_z$ be the eigenvalue of $L(\lambda)$ closest to zero. If $\alpha \in [\lambda_{\min}, 0)$ and $\lambda_0 \in [\alpha, 0)$ is an eigenvalue of $L(\lambda)$ of negative type, the corresponding eigenfunction is negative to the right of $\lambda_0$ (but close enough to $\lambda_0$). By Theorem 2.2, that eigenfunction is positive in $(0, \lambda_z)$, so that there must be an eigenvalue $\tilde\lambda_0$ of $L(\lambda)$ of positive type with $\lambda_0 < \tilde\lambda_0 < 0$. Hence $n_-(\alpha) \le p_-(\alpha)$ for $\alpha \in [\lambda_{\min}, 0)$.
Similarly, if $\alpha \in (0, \lambda_{\max}]$ and $\lambda_0 \in (0, \alpha]$ is an eigenvalue of $L(\lambda)$ of positive type, the corresponding eigenfunction is negative to the left of (but near) $\lambda_0$. By Theorem 2.2, that eigenfunction is positive in $(0, \lambda_z)$, so that $\lambda_0 > \lambda_z$, $\lambda_z$ is of negative type, and there must be an eigenvalue $0 < \lambda_z \le \tilde\lambda_0 < \lambda_0$ of $L(\lambda)$ of negative type. □
On the Sign Characteristics of Selfadjoint Matrix Polynomials 195
We remark again that, for the matrix polynomials of Theorem 3.2, $\lambda_z$ is necessarily of negative type.
Putting together the previous results, we can provide an additional necessary condition that the sign characteristics of all semisimple selfadjoint matrix polynomials with positive definite leading and trailing coefficients must satisfy.
Theorem 3.3. Let $L(\lambda)$ be an $n \times n$ semisimple selfadjoint matrix polynomial with $L_\ell > 0$ and $L_0 > 0$. With the notation of the previous theorem, the following condition holds:
$$p_+(\lambda_{\max}) = n_+(\lambda_{\max}). \qquad (6)$$
If, moreover, $\ell$ is even, then
$$p_-(\lambda_{\min}) = n_-(\lambda_{\min}). \qquad (7)$$
Proof. If $L(\lambda)$ has positive definite leading coefficient and $\lambda_z$ is the real positive eigenvalue of $L(\lambda)$ closest to zero then, by (5), $p(\lambda_z) \ge n(\lambda_z)$. That is to say, the number of eigenvalues of $L(\lambda)$ of positive type in $(\lambda_z, +\infty)$ is not smaller than the number of eigenvalues of negative type in $[\lambda_z, +\infty)$.

Now, $\lambda_z$ is of negative type because the trailing coefficient is positive definite, and $\lambda_{\max}$ is of positive type because the leading coefficient is positive definite. Thus, $p(\lambda_z)$ is also the number of eigenvalues of positive type in $(0, \lambda_{\max}]$ and $n(\lambda_z)$ is the number of eigenvalues of negative type in $[0, \lambda_{\max})$. Hence $p_+(\lambda_{\max}) = p(\lambda_z)$ and $n_+(\lambda_{\max}) = n(\lambda_z)$. Since $p(\lambda_z) \ge n(\lambda_z)$ and, by Theorem 3.2, $n_+(\lambda_{\max}) \ge p_+(\lambda_{\max})$, we conclude that this is indeed an equality.
As mentioned above, if $L(\lambda)$ is of even degree and semisimple then the number of eigenvalues of positive type and of negative type is the same. It follows from $p_+(\lambda_{\max}) = n_+(\lambda_{\max})$ that the number of eigenvalues of $L(\lambda)$ of positive type in $[\lambda_{\min}, 0]$ equals the number of eigenvalues of negative type in that interval. Taking into account that $0$ is not an eigenvalue of $L(\lambda)$ and that $\lambda_{\min}$ is of negative type, the number of eigenvalues of $L(\lambda)$ of negative type in $[\lambda_{\min}, 0)$ is $n_-(\lambda_{\min})$ and the number of eigenvalues of positive type in $[\lambda_{\min}, 0]$ is $p_-(\lambda_{\min})$. In conclusion, $p_-(\lambda_{\min}) = n_-(\lambda_{\min})$, as claimed. □
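A scalar sanity check of (6), with an illustrative polynomial of our own choosing that satisfies the hypotheses $L_\ell > 0$ and $L_0 > 0$ (in the scalar case the eigenfunction is $L(\lambda)$ itself, so the sign of a simple eigenvalue $\lambda_0$ is the sign of $L'(\lambda_0)$):

```python
# Scalar illustration of (6): L(lam) = lam^2 - 3*lam + 2 = (lam - 1)(lam - 2),
# so L_ell = 1 > 0 and L_0 = L(0) = 2 > 0.  In the scalar case the eigenfunction
# is L itself, so the sign of a simple eigenvalue lam0 is sign(L'(lam0)).
def dL(lam):
    return 2 * lam - 3

signs = {lam0: (1 if dL(lam0) > 0 else -1) for lam0 in (1.0, 2.0)}
# lam_z = 1 is of negative type, lam_max = 2 of positive type, so
# p_plus(lam_max) = n_plus(lam_max) = 1, matching (6).
print(signs)   # {1.0: -1, 2.0: 1}
```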
4. Conclusions
Using the notion of the "sign characteristic" of real eigenvalues of selfadjoint matrix polynomials, Theorems 2 and 3 establish new results on the ordering of real eigenvalues with respect to these signs. The results take into account the inertia of the (invertible) leading coefficient and the trailing coefficient, respectively.
In Section 3, these results have been applied to semisimple matrix polynomials with positive definite leading coefficient (Theorem 4) and with positive definite trailing coefficient (Theorem 5).
These results will be used in [6] to provide solutions of the inverse real symmetric quadratic eigenvalue problem, in the semisimple case, when the leading and trailing coefficients are prescribed to satisfy certain definiteness constraints.
References
[1] Gohberg I., Lancaster P., and Rodman L., Spectral analysis of self-adjoint matrix polynomials. Research paper 419 (1979), Dept. of Mathematics and Statistics, University of Calgary, Canada.
[2] Gohberg I., Lancaster P., and Rodman L., Spectral analysis of self-adjoint matrix polynomials. Annals of Math., 112 (1980), 33–71.
[3] Gohberg I., Lancaster P., and Rodman L., Matrix Polynomials, Academic Press, New York, 1982 and SIAM, Philadelphia, 2009.
[4] Horn R.A., Johnson Ch.R., Matrix Analysis, Cambridge University Press, New York, 1985.
[5] Lancaster P., Prells U., Zaballa I., An orthogonality property for real symmetric matrix polynomials, Operators and Matrices, to appear (available online at http://files.ele-math.com/preprints/oam-07-21-pre.pdf).
[6] Lancaster P., Zaballa I., On the Inverse Symmetric Quadratic Eigenvalue Problem. In preparation.
Peter Lancaster
Dept. of Mathematics and Statistics
University of Calgary
Calgary, Alberta T2N 1N4, Canada
e-mail: [email protected]
Ion Zaballa
Departamento de Matemática Aplicada y EIO
Euskal Herriko Unibertsitatea (UPV/EHU)
Apdo 644
E-48080 Bilbao, Spain
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 197–219
© 2013 Springer Basel
Quadratic Operators in Banach Spaces and Nonassociative Banach Algebras
Yu.I. Lyubich
Dedicated to Leonia Lerer on the occasion of his 70th birthday
Abstract. A survey of a general theory of quadratic operators in Banach spaces, with close relations to nonassociative Banach algebras, is presented. Some applications to matrix and integral quadratic operators in classical Banach spaces are given.
Mathematics Subject Classification (2010). 46H70, 45G10.
Keywords. Cubic matrices, integral operators, Bernstein algebras.
1. Introduction
Let $\Phi$ be a field, and let $X$ be a linear space over $\Phi$, finite- or infinite-dimensional. If, in addition, a bilinear mapping $m$ from the Cartesian square $X \times X$ into $X$ is given, then $X$ is called an algebra over $\Phi$. In this setting the vector $m(x, y)$ is called the product of $x$ and $y$ and is usually denoted by $xy$. Accordingly, the mapping $m$ is called a multiplication. Its bilinearity means the distributive laws
$$(x + y)z = xz + yz, \quad z(x + y) = zx + zy \quad (x, y, z \in X)$$
and the homogeneity of degree 1, i.e.,
$$(\alpha x)y = x(\alpha y) = \alpha(xy) \equiv \alpha xy \quad (x, y \in X,\ \alpha \in \Phi).$$
Neither associativity nor unitality is assumed in this definition. For the general theory of nonassociative algebras the basic reference is the book [20].
Given an algebra $X$, the diagonal restriction of the multiplication, i.e., the mapping $Q: X \to X$ defined as $Qx = xx \equiv x^2$, is called a quadratic mapping or quadratic operator. Its simplest properties are
$$Q(\alpha x) = \alpha^2 Qx, \quad Q(-x) = Qx, \quad Q(0) = 0. \qquad (1.1)$$
If the algebra is commutative then we have the elementary identities
$$x^2 - y^2 = (x - y)(x + y) \qquad (1.2)$$
and
$$(x \pm y)^2 = x^2 + y^2 \pm 2xy. \qquad (1.3)$$
Later on, $\operatorname{char}(\Phi) \ne 2$. From (1.3) it follows that
$$xy = \frac{1}{2}\left(Q(x + y) - Qx - Qy\right), \qquad (1.4)$$
or in a more elegant form
$$xy = \frac{1}{4}\left(Q(x + y) - Q(x - y)\right). \qquad (1.5)$$
As a result, with a fixed underlying linear space $X$, the correspondence between commutative algebras and quadratic operators is one-to-one. Furthermore, a subspace $Y \subset X$ is invariant for $Q$ if and only if it is a subalgebra of the corresponding commutative algebra. The quadratic operator corresponding to this subalgebra is the restriction $Q|Y$. In particular, a one-dimensional subspace is a subalgebra if and only if its basis vector is an eigenvector of $Q$. Note that if $Qx = \lambda x$, $\lambda \in \Phi$, and $\lambda \ne 0$, then $z = x/\lambda$ is a fixed point of $Q$, which is the same as an idempotent of the corresponding algebra: $z^2 = z$.
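The identities (1.4) and (1.5) can be checked mechanically. In the sketch below, the commutative multiplication (the componentwise product on $\mathbb{R}^2$, chosen only for illustration) is recovered from its diagonal restriction:

```python
import numpy as np

# Polarization sketch: a commutative multiplication m (here the componentwise
# product on R^2, an illustrative choice) is recovered from Qx = m(x, x)
# via the identities (1.4) and (1.5).
def m(x, y):
    return x * y

def Q(x):
    return m(x, x)

x = np.array([1.0, 2.0]); y = np.array([3.0, -1.0])
via_1_4 = (Q(x + y) - Q(x) - Q(y)) / 2
via_1_5 = (Q(x + y) - Q(x - y)) / 4
assert np.allclose(via_1_4, m(x, y)) and np.allclose(via_1_5, m(x, y))
```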
Even without commutativity the mapping $x \mapsto x^2$ is quadratic by definition. However, $x^2 = x \circ x$, where the new multiplication
$$x \circ y = \frac{xy + yx}{2} \qquad (1.6)$$
is commutative. But for a given quadratic mapping a noncommutative multiplication is not unique. The general form of such multiplications is
$$xy = x \circ y + [x, y] \qquad (1.7)$$
where the second summand is anticommutative, i.e., $[x, y] = -[y, x]$ for all $x, y$. Indeed, the anticommutativity is equivalent to the identity $[z, z] = 0$. Note that (1.7) is a unique decomposition of a given algebra into the sum of a commutative algebra and an anticommutative one. If the algebra is associative then the commutative summand is a Jordan algebra, while the anticommutative one is a Lie algebra, and both are not associative, in general.
The quadratic operators come from classical analysis, the calculus of variations, differential geometry, etc., as the second differentials of mappings of class $C^2$. On the other hand, they appear as the generators of nonlinear dynamical systems in the form of cubic matrices (finite or infinite), integral operators, etc. In this setting the corresponding algebras are very useful for studying the dynamics. A bright example is the Mendelian algebra in genetics connected with the Hardy–Weinberg quadratic mapping, see, e.g., [10, Sections 3 and 8]. This algebra is three-dimensional but, in general, the genetic situations are multidimensional. In this way many interesting classes of nonassociative algebras appeared after the
pioneering works of Serebrovskii [19], Glivenko [7] and Etherington [6]. Much earlier, Bernstein [1] suggested the quadratic operators in $\mathbb{R}^n$ as an adequate language for a fundamental problem in population dynamics. A synthesis of the operator and algebra approaches turned out to be extraordinarily fruitful, see the book [14] and the references therein, especially [18]. For purely algebraic aspects we refer to the book [21].
In [10] the author investigated the Bernstein quadratic operators, and after that the corresponding algebras were introduced in [11] as a powerful tool for the Bernstein problem. Eventually, in this way the problem has been completely solved [10, 11, 12, 8]. Note that the name Bernstein algebras appeared in [13] for the first time. The main structure theorem for these algebras was first obtained in operator form [10, Theorem 4.1]. In [9] this result was reproduced in an explicit algebraic form as a base for further investigation. Nowadays the Bernstein algebra theory is a well-developed area of modern algebra.
In the present paper we consider the quadratic operators in infinite-dimensional spaces with special attention to the operators of finite rank, i.e., with finite-dimensional images (Section 2). If a quadratic operator in a Banach space is continuous then the corresponding algebra is a Banach one, and vice versa. We study this relation in Section 3 and then proceed to a compact situation (Section 6), focusing on integral quadratic operators. These considerations culminate in Section 7, devoted mainly to the Bernstein quadratic operators and algebras. In the compact case they are of finite rank [15]. Moreover, the Bernstein integral operators in $C[0, 1]$ with positive kernels are of rank 1. This remarkable result, announced by Bernstein [1], was never proved until [16], where a proof was given with the help of the corresponding Bernstein algebra introduced in [15].
2. Algebraic preliminaries
Obviously, the set $QO$ of all quadratic operators in a linear space $X$ is a linear space isomorphic to the space of all commutative multiplications in $X$. Some quadratic operators come naturally from the linear ones. For example, if $A$ is a linear operator and $f$ is a linear functional then
$$Qx = f[x]Ax \qquad (2.1)$$
is a quadratic operator corresponding to the multiplication
$$xy = \frac{1}{2}\left(f[x]Ay + f[y]Ax\right). \qquad (2.2)$$
Also note that if $T$ is a linear operator then the superpositions $TQ$ and $QT$ are quadratic operators for every quadratic $Q$. Indeed, $T(xy)$ and $(Tx)(Ty)$ are bilinear for any algebra. A quadratic operator $Q$ is called elementary if
$$Qx = v[x]b, \quad x \in X, \qquad (2.3)$$
where $b \in X$ and $v[x]$ is a quadratic functional, i.e., $v[x] = w[x, x]$ where $w[x, y]$ is a (unique) symmetric bilinear functional, the polar of $v$.
Accordingly,
$$xy = w[x, y]b, \quad x, y \in X. \qquad (2.4)$$
The simplest example is
$$xy = f[x]f[y]b \qquad (2.5)$$
where $f$ is a linear functional. This algebra is associative.

For any quadratic operator $Q$ we consider the invariant subspace $N(Q) = \operatorname{Span}(\operatorname{Im} Q)$ of the space $X$. In more detail,
$$N(Q) = \left\{ \sum_{i=1}^{m} \alpha_i Q x_i : x_i \in X,\ \alpha_i \in \Phi,\ m \ge 1 \right\}.$$
The corresponding subalgebra (even an ideal) of the algebra $X$ is
$$X' = \left\{ \sum_{i=1}^{m} \alpha_i x_i y_i : x_i, y_i \in X,\ \alpha_i \in \Phi,\ m \ge 1 \right\}.$$
Indeed, $N(Q) \subset X'$, obviously. On the other hand, $X' \subset N(Q)$ by polarization (1.4) or (1.5). Hence $N(Q) = X'$. Up to the end of this section we keep everything in terms of $Q$ but, of course, everything can be immediately polarized.
The dimension $r(Q)$, $0 \le r(Q) \le \infty$, of the space $N(Q)$ is called the rank of the quadratic operator $Q$. Obviously, $r(Q) = 0$ if and only if $Q = 0$. Denote by $FQO$ the set of all quadratic operators of finite rank.
Proposition 2.1. The set $FQO$ is a subspace of $QO$.
Proof. For every $\alpha \in \Phi \setminus \{0\}$ we have $N(\alpha Q) = N(Q)$. Furthermore, $N(Q_1 + Q_2) \subset N(Q_1) + N(Q_2)$. Thus,
$$r(\alpha Q) = r(Q) \ (\alpha \ne 0), \quad r(Q_1 + Q_2) \le r(Q_1) + r(Q_2). \qquad (2.6)$$
□
In addition, $r(TQ) \le r(Q)$ and $r(QT) \le r(Q)$ for any linear operator $T$. Hence, the subspace $FQO$ is invariant for the mappings $Q \mapsto TQ$ and $Q \mapsto QT$.
Proposition 2.2. Each $Q \in FQO$ is the sum of $r(Q)$ elementary operators.
Proof. Let $(b_i)_{i=1}^{r}$ be a basis of $N(Q)$, so $r = r(Q)$, and let $(b_i^*)_{i=1}^{r}$ be the dual basis in $N(Q)^*$. Since $Qx \in N(Q)$ for every $x \in X$, we have
$$Qx = \sum_{i=1}^{r} v_i[x]\, b_i, \quad x \in X, \qquad (2.7)$$
where $v_i[x] = b_i^*[Qx]$ are quadratic functionals. □
Corollary 2.3. An operator $Q \in QO \setminus \{0\}$ is of rank 1 if and only if it is elementary.
Corollary 2.4. For $Q \in FQO$ the rank $r(Q)$ is the minimal number of elementary quadratic operators whose sum is equal to $Q$.
Proof. This follows from Proposition 2.2 because of the inequality (2.6). □
We call minimal any decomposition of $Q \in QO \setminus \{0\}$ into a sum of $r(Q)$ elementary quadratic operators. To investigate this case we use the following general
Lemma 2.5. If $(\varphi_i(t))_{i=1}^{m}$ is a linearly independent system of scalar functions on a set $S$ then there are $m$ points $t_1, \ldots, t_m$ in $S$ such that
$$\det(\varphi_i(t_j))_{i,j=1}^{m} \ne 0. \qquad (2.8)$$
Proof. Let us consider the mapping $\Omega: S \to \Phi^m$ defined as $\Omega(t) = (\varphi_i(t))_{i=1}^{m}$, $t \in S$. The subspace $L = \operatorname{Span}(\operatorname{Im} \Omega)$ coincides with $\Phi^m$. Indeed, otherwise $L$ would lie in a hyperplane, so there is $(\alpha_i)_{i=1}^{m} \in \Phi^m \setminus \{0\}$ such that
$$\sum_{i=1}^{m} \alpha_i \varphi_i(t) = 0, \quad t \in S,$$
which contradicts the linear independence of the system $(\varphi_i(t))_{i=1}^{m}$. As a result, there are $m$ linearly independent vectors in $\operatorname{Im} \Omega$. They are $\Omega(t_j)$ with some $t_j \in S$. Thus, the columns of the matrix $(\varphi_i(t_j))_{i,j=1}^{m}$ are linearly independent. □
Now let
$$Qx = \sum_{i=1}^{s} v_i[x]\, b_i, \quad x \in X, \qquad (2.9)$$
where the $b_i$ are vectors and the $v_i[x]$ are quadratic functionals.
Lemma 2.6. If in (2.9) the $v_i$'s are linearly independent then all $b_i \in N(Q)$.

Proof. By Lemma 2.5 there are $s$ vectors $x_1, \ldots, x_s$ such that
$$\det(v_i[x_j])_{i,j=1}^{s} \ne 0.$$
Therefore, the system of linear equations
$$\sum_{i=1}^{s} v_i[x_j]\, b_i = Q x_j, \quad 1 \le j \le s,$$
with unknown $b_i$'s is solvable. By Cramer's rule all $b_i \in N(Q)$. □

Now we are able to characterize the minimal decompositions.
Theorem 2.7. The following statements are equivalent:
1) The decomposition (2.9) is minimal, i.e., $s = r(Q)$.
2) The systems $(b_i)_{i=1}^{s}$ and $(v_i)_{i=1}^{s}$ are both linearly independent.
3) The system $(b_i)_{i=1}^{s}$ is a basis in $N(Q)$.
Proof. 1)⇒2). Suppose the contrary. For definiteness, let the system $(b_i)_{i=1}^{s}$ be linearly dependent. Then one of the $b_i$'s is a linear combination of the others. The substitution of this expression into (2.9) reduces the number of summands to $s - 1$, which contradicts the minimality.

2)⇒3) since all $b_i \in N(Q)$ by Lemma 2.6 and $\operatorname{Span}(b_i)_{i=1}^{s} = N(Q)$ by (2.9).

3)⇒1). $s = r(Q)$ since $r(Q) = \dim N(Q)$ and $(b_i)_{i=1}^{s}$ is a basis in $N(Q)$. □
Corollary 2.8. If the decomposition (2.9) is minimal then the system $(b_i)_{i=1}^{s}$ is a basis in $N(Q)$ and $v_i[x] = b_i^*[Qx]$, $1 \le i \le s$, where $(b_i^*)_{i=1}^{s}$ is the dual basis in $N(Q)^*$.
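As a numerical illustration of the rank $r(Q) = \dim N(Q)$ and of Corollary 2.4, one can sample $Q$ at random points and compute the dimension of the span of the images; the operator below is a hypothetical example of our own with $r(Q) = 2$:

```python
import numpy as np

# Numerical sketch: the rank r(Q) = dim Span(Im Q) of a quadratic operator can
# be read off from the matrix rank of sampled images.  The hypothetical
# operator below is a sum of two elementary operators, so r(Q) = 2.
rng = np.random.default_rng(0)

def Q(x):                                    # Qx = x1^2 * e1 + (x1*x2) * e2 in R^3
    return np.array([x[0]**2, x[0] * x[1], 0.0])

samples = np.array([Q(rng.standard_normal(3)) for _ in range(20)])
print(np.linalg.matrix_rank(samples))        # 2
```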
3. Continuous quadratic operators and Banach algebras
From now on the ground field $\Phi$ is $\mathbb{R}$ or $\mathbb{C}$, unless otherwise stated. Recall that a real or complex Banach space $X$ is said to be a Banach algebra if $X$ is an algebra with continuous multiplication, i.e., the product $xy$ is a continuous function of $(x, y) \in X \times X$.

Proposition 3.1. Let $X$ be an algebra and a Banach space with a norm $\|\cdot\|$. Then the following statements are equivalent:
(1) $X$ is a Banach algebra.
(2) The product $xy$ is continuous at the point $(0, 0)$.
(3) The inequality
$$\|xy\| \le C\, \|x\|\, \|y\| \qquad (3.1)$$
holds with a constant $C > 0$.
Proof. (1)⇒(2) trivially.

(2)⇒(3). Suppose the contrary. Then there is a sequence $(x_n, y_n)_{n=1}^{\infty} \subset X \times X$ such that $\|x_n y_n\| > n \|x_n\| \|y_n\|$. Obviously, $x_n \ne 0$ and $y_n \ne 0$. By letting
$$u_n = \frac{x_n}{\sqrt{n}\,\|x_n\|}, \qquad v_n = \frac{y_n}{\sqrt{n}\,\|y_n\|},$$
we obtain $\|u_n v_n\| > 1$, while $\|u_n\| = \|v_n\| = 1/\sqrt{n} \to 0$, a contradiction.

(3)⇒(1). From (3.1) it follows that
$$\|(x + u)(y + v) - xy\| = \|xv + uy + uv\| \le C(\|x\|\,\|v\| + \|u\|\,\|y\| + \|u\|\,\|v\|).$$
Hence $(x + u)(y + v) \to xy$ when $(u, v) \to 0$ in $X \times X$. □
Sometimes it can be reasonable to change the norm in a Banach algebra $X$ to an equivalent one. By definition, the ratio of two equivalent norms lies in a segment $[a, b]$ of the semiaxis $(0, \infty)$. When a norm runs over an equivalence class, the topology remains the same, while the constant $C$ in (3.1) takes all positive values. This is true even if we only consider the norms proportional to a fixed one: $\{\|\cdot\|' = k\|\cdot\| : k > 0\}$. Indeed, from (3.1) it follows that
$$\|xy\|' \le C k^{-1} \|x\|'\, \|y\|'. \qquad (3.2)$$
By the way, with $k = C$ we have
$$\|xy\|' \le \|x\|'\, \|y\|'. \qquad (3.3)$$
It is useful to add a geometrical criterion to Proposition 3.1. Let us consider the balls $B_r = \{z \in X : \|z\| \le r\}$, $r > 0$, in a Banach space $X$. Let $X$ be an algebra, as before.
Proposition 3.2. In order for $X$ to be a Banach algebra it is necessary that the product $xy$ be bounded on every $B_r \times B_r$ and sufficient that this product be bounded on some $B_r \times B_r$.
Proof. Let $X$ be a Banach algebra. Then for $(x, y) \in B_r \times B_r$ the inequality (3.1) yields $\|xy\| \le C r^2$. Conversely, let $\|xy\| \le c$ for a $c > 0$ and $(x, y) \in B_r \times B_r$ with an $r$. For any $(x, y) \in X \times X$ the vectors $rx/\|x\|$ and $ry/\|y\|$ belong to $B_r$. Hence, $\|xy\| \le c r^{-2} \|x\|\, \|y\|$. □
Now we proceed to the quadratic operators in a Banach space $X$ and establish some criteria for their continuity.
Proposition 3.3. Let $Q: X \to X$ be a quadratic operator. The following statements are equivalent:
(1) $Q$ is continuous.
(2) The corresponding commutative algebra is a Banach algebra.
(3) $Q$ is bounded in the sense that the inequality
$$\|Qx\| \le C \|x\|^2 \qquad (3.4)$$
holds with a constant $C > 0$.
(4) The image of every ball $B_r$ is contained in a ball $B_{R(r)}$.
(5) The image of some ball $B_r$ is contained in a ball $B_R$.
(6) $Q$ is continuous at the point $x = 0$.
Proof. (1)⇒(2) by polarization.
(2)⇒(3) by taking $y = x$ in the inequality (3.1).
(3)⇒(4) since if $\|x\| \le r$ then $\|Qx\| \le C r^2$ by (3.4).
(4)⇒(5) trivially.
(5)⇒(6) since if $QB_r \subset B_R$ with some $r$ and $R$ then by homogeneity we have $QB_\delta \subset B_\varepsilon$ where $\delta = r\sqrt{\varepsilon/R}$.
(6)⇒(2) since $xy$ is continuous at $(0, 0)$ by polarization, and then one can refer to (2)⇒(1) from Proposition 3.1.
(2)⇒(1) trivially. □

We denote the linear space of continuous quadratic operators (a subspace of $QO$) by $BQO$, in accordance with the equivalence (1) ⇔ (3). The latter allows us to introduce the norm
$$\|Q\| = \sup_{x \ne 0} \frac{\|Qx\|}{\|x\|^2} = \sup_{\|x\| = 1} \|Qx\|, \quad Q \in BQO. \qquad (3.5)$$
This definition is a counterpart of the standard one for a linear continuous (≡ bounded) operator $T$. Obviously,
$$\|TQ\| \le \|T\|\, \|Q\|, \qquad \|QT\| \le \|T\|^2 \|Q\|.$$
At the end of Section 4 we show that the normed space $BQO$ is Banach. Its (nonclosed, in general) subspace $BQO \cap FQO$ we denote by $BFQO$.
Theorem 3.4. An operator $Q \in FQO$ belongs to $BFQO$ if in a decomposition (2.9) the quadratic functionals $v_i[x]$ are continuous. This condition is also necessary if the decomposition is minimal.
Proof. The sufficiency is obvious. The necessity follows from Corollary 2.8 since all linear functionals on the finite-dimensional space $N(Q)$ are continuous. □
In conclusion of this section we prove the following
Proposition 3.5. A quadratic operator $Q \ne 0$ of form (2.1) is continuous if and only if the linear functional $f$ and the linear operator $A$ are both continuous.
Proof. The "if" part is obvious. By the implication 1) ⇒ 2) in Proposition 3.3 it suffices to prove the "only if" part in terms of the multiplication (2.2). Let the latter be continuous, and let $A \mid \ker f \ne 0$. Then there is $y$ such that $f[y] = 0$, $Ay \ne 0$, and then there is a continuous linear functional $g$ such that $g[Ay] = 2$. From (2.2) it follows that $f[x] = g[xy]$, hence $f$ is continuous.

Now let $A \mid \ker f = 0$. Since $Q \ne 0$, we have $f \ne 0$ and $A \ne 0$. Then there is a vector $e$ such that $f[e] = 1$. This yields $x - f[x]e \in \ker f$ for all $x \in X$. Hence $Ax = f[x]Ae$, so $Ae \ne 0$. Taking a continuous linear functional $g$ such that $g[Ae] = 1$ we get $f[x] = g[xe]$, hence $f$ is continuous again.

Now to complete the proof we return to (2.2), taking any $y$ such that $f[y] = 1$. Then we obtain $Ax = 2xy - f[x]Ay$. Thus, the operator $A$ is continuous. □
4. Intrinsic characterization of quadratic operators
Here we prove the following
Theorem 4.1. Let $X$ be a Banach space. A continuous mapping $Q: X \to X$ is a quadratic operator if and only if for every two vectors $x, y \in X$ and every continuous linear functional $f$ on $X$ the function $f[Q(\alpha x + \beta y)]$ is a quadratic form in the scalar variables $\alpha, \beta$.
Proof. "Only if." In the corresponding algebra we have
$$f[Q(\alpha x + \beta y)] = f[(\alpha x + \beta y)^2] = \alpha^2 f[x^2] + \beta^2 f[y^2] + 2\alpha\beta f[xy].$$
"If." Now we have
$$f[Q(\alpha x + \beta y)] = \alpha^2 a(x, y; f) + \beta^2 b(x, y; f) + 2\alpha\beta c(x, y; f) \qquad (4.1)$$
where $a, b, c$ are some scalar functions of the triple $(x, y; f)$. By setting $\alpha = 1, \beta = 0$ and $\alpha = 0, \beta = 1$ we obtain
$$a(x, y; f) = f[Qx], \qquad b(x, y; f) = f[Qy], \qquad (4.2)$$
and then with $\alpha = 1, \beta = \pm 1$ we get
$$f[Q(x \pm y)] = f[Qx] + f[Qy] \pm 2c(x, y; f).$$
Hence,
$$c(x, y; f) = f[B(x, y)] \qquad (4.3)$$
where
$$B(x, y) = \frac{Q(x + y) - Q(x - y)}{4}. \qquad (4.4)$$
By substitution from (4.3) and (4.2) into (4.1) and linearity of the functional $f$ we obtain
$$f[Q(\alpha x + \beta y)] = f[\alpha^2 Qx + \beta^2 Qy + 2\alpha\beta B(x, y)]. \qquad (4.5)$$
By the Hahn–Banach theorem the continuous linear functionals on the Banach space $X$ separate the vectors. Therefore, from (4.5) it follows that
$$Q(\alpha x + \beta y) = \alpha^2 Qx + \beta^2 Qy + 2\alpha\beta B(x, y). \qquad (4.6)$$
In particular,
$$Q(\alpha x) = \alpha^2 Qx, \qquad Q(0) = 0.$$
Now from (4.4) it follows that $B(x, y)$ is continuous and $B(x, x) = Qx$. It remains to prove that $B(x, y)$ is a multiplication. This multiplication will turn out to be commutative automatically, since $B(x, y) = B(y, x)$ by (4.4). Also, due to this symmetry, the bilinearity of $B(x, y)$ reduces to the linearity of $B(x, \cdot)$.
According to (4.4) we have
$$4B(\alpha x_1 + \beta x_2, y) = Q(\alpha x_1 + \beta x_2 + y) - Q(\alpha x_1 + \beta x_2 - y). \qquad (4.7)$$
In particular,
$$4B(\alpha x, y) = Q(\alpha x + y) - Q(\alpha x - y) = 4\alpha B(x, y)$$
by (4.6) with $\beta = \pm 1$. Thus $B(x, \cdot)$ is homogeneous of degree 1.
To prove the additivity of $B(x, \cdot)$ we note that for every triple $x, y, z \in X$ the formula (4.6) implies
$$Q(x + z \pm y) = Q(x + z) + Q(y) \pm 2B(x + z, y),$$
whence
$$Q(x + z + y) + Q(x + z - y) = 2\{Q(x + z) + Q(y)\}$$
and then
$$Q(x - z + y) + Q(x - z - y) = 2\{Q(x - z) + Q(y)\}.$$
By subtraction we get
$$\{Q(x+z+y) - Q(x-z+y)\} + \{Q(x+z-y) - Q(x-z-y)\} = 2\{Q(x+z) - Q(x-z)\},$$
which can be rewritten as
$$B(x + y, z) + B(x - y, z) = 2B(x, z) = B(2x, z).$$
By the substitution $x = (u + v)/2$, $y = (u - v)/2$ we finally obtain
$$B(u, z) + B(v, z) = B(u + v, z). \qquad \Box$$
In essence, the proof of the additivity above is a version of an argument which shows that in a normed space the parallelogram identity implies a Euclidean structure, see [4, Ch. 7, Section 3]. On the other hand, the topological aspect of the proof can be ignored, which yields the following general result, cf. [2], no. 3, Ex. 8.
Theorem 4.2. Let $X$ and $Y$ be linear spaces over a field $\Phi$, $\operatorname{char}(\Phi) \ne 2$. A mapping $Q: X \to Y$ is quadratic (i.e., generated by a bilinear mapping $X \times X \to Y$) if and only if so are the restrictions of $Q$ to all two-dimensional subspaces of $X$.
Taking $Y = \Phi$ we obtain
Corollary 4.3. A scalar function on a linear space $X$ over $\Phi$ is a quadratic functional if and only if so are its restrictions to all two-dimensional subspaces.
As an application of Theorem 4.1 (with some elements of its proof) we prove
Proposition 4.4. With the norm (3.5) the space $BQO$ of continuous quadratic operators in a Banach space $X$ is a Banach space.
Proof. Let $(Q_n)_{n=1}^{\infty}$ be a Cauchy sequence in $BQO$. Then $(Q_n x)_{n=1}^{\infty}$ is a Cauchy sequence in $X$ for every $x \in X$. Since $X$ is a Banach space, the self-mapping $Qx = \lim_{n\to\infty} Q_n x$ is well defined on $X$. It is continuous since the convergence is uniform on every ball. It is quadratic since one can pass to the limit in
$$Q_n(\alpha x + \beta y) = \alpha^2 Q_n x + \beta^2 Q_n y + 2\alpha\beta B_n(x, y)$$
where
$$B_n(x, y) = \frac{1}{4}\left(Q_n(x + y) - Q_n(x - y)\right). \qquad \Box$$
5. Cubic matrices and quadratic integral operators
Assume that a Banach space $X$ has a Schauder basis $(e_i)_{i=1}^{n}$, $1 \le n \le \infty$. By definition, every vector $x \in X$ can be uniquely represented as the sum of a convergent series (or as a finite sum if $n < \infty$) of the form
$$x = \sum_{i=1}^{n} \xi_i e_i \qquad (5.1)$$
with scalar coefficients $\xi_i$. The latter are the coordinates of $x$ in the given basis. The functionals $\xi_i = e_i^*[x]$ are linear and continuous.
If $X$ is an algebra then we have the multiplication table
$$e_i e_j = \sum_{k=1}^{n} p_{ij}^k e_k. \qquad (5.2)$$
The scalar coefficients $p_{ij}^k$ are called the structural constants of the algebra with respect to the basis $(e_i)_{i=1}^{n}$. The structural constants constitute an $(n \times n \times n)$-cubic matrix $p$, which can also be treated as a sequence of length $n$ of square $(n \times n)$-matrices $p^k = [p_{ij}^k]_{i,j=1}^{n}$. The cubic matrix $p$ is called the structural matrix of the algebra in the given basis.
Example 5.1. The structural constants of the algebra (2.2) are
$$p_{ij}^k = \frac{1}{2}\left(f[e_i]\, a_{kj} + f[e_j]\, a_{ki}\right) \qquad (5.3)$$
where $[a_{ki}]_{k,i=1}^{n}$ is the matrix of the operator $A$ in the same basis.
Theorem 5.2. Let $X$ be a Banach algebra with a Schauder basis $(e_i)_{i=1}^{n}$, and let $\xi_i$ and $\eta_j$ be the coordinates of vectors $x$ and $y$, respectively. Then the coordinates of the product $xy$ are the bilinear forms
$$\zeta_k = \sum_{i,j=1}^{n} p_{ij}^k\, \xi_i \eta_j \qquad (5.4)$$
if $n < \infty$ and
$$\zeta_k = \lim_{m\to\infty} \sum_{i,j=1}^{m} p_{ij}^k\, \xi_i \eta_j \qquad (5.5)$$
if $n = \infty$.
Proof. For $n < \infty$ the formula (5.4) follows immediately from the bilinearity of the multiplication and the definition of the structural constants. Now let $n = \infty$, and let
$$x_m = \sum_{i=1}^{m} \xi_i e_i, \qquad y_m = \sum_{j=1}^{m} \eta_j e_j.$$
Then $x_m \to x$ and $y_m \to y$ as $m \to \infty$. Since the multiplication is continuous, we have $x_m y_m \to xy$. Since the coordinate functionals $e_k^*$ are continuous, we get
$$\zeta_k = e_k^*[xy] = \lim_{m\to\infty} e_k^*[x_m y_m],$$
and the case reduces to the previous one. □
As an immediate consequence we get
Corollary 5.3. A Banach algebra with a Schauder basis is commutative if and only if the corresponding structural constants $p_{ij}^k$ are symmetric with respect to $i, j$, i.e., all matrices $p^k = [p_{ij}^k]_{i,j=1}^{n}$ are symmetric.
Note that the transformation (1.6) of an algebra into a commutative one corresponds to the standard symmetrization $[p_{ij}^k] \mapsto \frac{1}{2}\left([p_{ij}^k] + [p_{ji}^k]\right)$.
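In the finite-dimensional case the coordinate formula (5.4) and the symmetry criterion of Corollary 5.3 are easy to exercise numerically; the random structural constants below are purely illustrative:

```python
import numpy as np

# Finite-dimensional formula (5.4): with structural constants p[k, i, j] the
# product of coordinate vectors xi, eta has coordinates
# zeta_k = sum_{i,j} p_kij * xi_i * eta_j.  The constants are random and
# purely illustrative.
n = 3
rng = np.random.default_rng(1)
p = rng.standard_normal((n, n, n))
p = (p + p.transpose(0, 2, 1)) / 2      # symmetrize in (i, j): commutative, cf. Corollary 5.3

def mult(xi, eta):
    return np.einsum('kij,i,j->k', p, xi, eta)

xi, eta = rng.standard_normal(n), rng.standard_normal(n)
assert np.allclose(mult(xi, eta), mult(eta, xi))            # commutativity
assert np.allclose(mult(2 * xi, eta), 2 * mult(xi, eta))    # homogeneity in each slot
```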
Corollary 5.4. Let $Q$ be a continuous quadratic operator in a Banach space $X$ with a Schauder basis $(e_i)_{i=1}^{n}$, $1 \le n \le \infty$, and let $p_{ij}^k$ be the structural constants of the corresponding commutative Banach algebra. If $\xi_i$ are the coordinates of a vector $x \in X$ then the $k$th coordinate of the vector $Qx$ is the quadratic form
$$\zeta_k = \sum_{i,j=1}^{n} p_{ij}^k\, \xi_i \xi_j \qquad (5.6)$$
if $X$ is finite dimensional, or
$$\zeta_k = \lim_{m\to\infty} \sum_{i,j=1}^{m} p_{ij}^k\, \xi_i \xi_j \qquad (5.7)$$
if $X$ is infinite dimensional.
In a finite-dimensional space $X$ the formulas (5.4) are obviously valid for any multiplication, irrespective of topology. On the other hand, a linear topology in $X$ is unique and can be defined by any norm. Using the $\ell^1$-norm
$$\|x\| = \sum_{i=1}^{n} |\xi_i| \qquad (5.8)$$
we see that (5.4) implies the inequality (3.1) with
$$C = \sum_{k=1}^{n} \max_{i,j} \left| p_{ij}^k \right|.$$
This results in the following
Proposition 5.5. With respect to the linear topology, all finite-dimensional algebras are Banach, and all quadratic operators in finite-dimensional spaces are continuous.
This fails in any infinite-dimensional Banach space $X$.
Example 5.6. The multiplication (2.5) is not continuous if $f$ is discontinuous and $b \ne 0$. Such an $f$ can be obtained as the linear extension of any unbounded scalar function on an algebraic basis (a Hamel basis) $\Gamma$ such that $\|e\| = 1$ for $e \in \Gamma$.
With finite $n$, any cubic matrix $[p_{ij}^k]_{i,j,k=1}^{n}$ is the structural matrix of the multiplication defined by (5.4) in an arbitrary basis $(e_i)_{i=1}^{n}$. Indeed, in this setting
$$e_s e_t = \sum_{k=1}^{n} \left( \sum_{i,j=1}^{n} p_{ij}^k\, \delta_{is} \delta_{jt} \right) e_k = \sum_{k=1}^{n} p_{st}^k e_k \qquad (5.9)$$
where $\delta$ is the Kronecker delta.

In the infinite-dimensional case a cubic matrix must satisfy some conditions in order to be the structural matrix of a continuous multiplication. Of course, these conditions depend on the underlying Banach space. Let us consider the multiplications in the classical spaces $\ell^p$, $1 \le p \le \infty$. As usual, we set $q = p/(p-1)$ for $1 < p < \infty$, $q = \infty$ for $p = 1$ and $q = 1$ for $p = \infty$. We denote the norm of $x \in \ell^p$ by $\|x\|_p$ and say that a square matrix $a = [a_{ij}]_{i,j=1}^{\infty}$ is of class $\ell^q$ if
$$\|a\|_q \equiv \left( \sum_{i,j=1}^{\infty} |a_{ij}|^q \right)^{1/q} < \infty$$
for $q < \infty$ and
$$\|a\|_\infty \equiv \sup_{i,j \ge 1} |a_{ij}| < \infty.$$
Now, given a cubic matrix $p = [p_{ij}^k]_{i,j,k=1}^{\infty}$, we say that it is of class $\ell^{p,q}$ if all square matrices $p^k = [p_{ij}^k]_{i,j=1}^{\infty}$ are of class $\ell^q$ and the sequence $(\|p^k\|_q)_{k=1}^{\infty}$ belongs to $\ell^p$. In this case we set
$$\|p\|_{p,q} = \left\| \left( \|p^k\|_q \right)_{k=1}^{\infty} \right\|_p. \qquad (5.10)$$
Theorem 5.7. Let $1 \le p < \infty$. In the space $\ell^p$ every cubic matrix $p = [p_{ij}^k]_{i,j,k=1}^{\infty}$ of class $\ell^{p,q}$, assigned to the Schauder basis $\delta_i = (\delta_{ij})_{j=1}^{\infty}$, $1 \le i < \infty$, is the structural matrix of a continuous multiplication.
Proof. For every pair $x = (\xi_i)_{i=1}^{\infty}$, $y = (\eta_j)_{j=1}^{\infty} \in \ell^p$ the values
$$\zeta_k = \sum_{i,j=1}^{\infty} p_{ij}^k\, \xi_i \eta_j, \quad k \ge 1, \qquad (5.11)$$
are well defined since these series converge, even absolutely. Indeed,
$$\sum_{i,j=1}^{\infty} \left| p_{ij}^k\, \xi_i \eta_j \right| \le \|p^k\|_q\, \|x\|_p\, \|y\|_p$$
by the Hölder inequality for $p > 1$, and trivially for $p = 1$. Moreover, the sequence $z = (\zeta_k)_{k=1}^{\infty}$ belongs to $\ell^p$ since
$$\sum_{k=1}^{\infty} |\zeta_k|^p \le \sum_{k=1}^{\infty} \left( \sum_{i,j=1}^{\infty} \left| p_{ij}^k\, \xi_i \eta_j \right| \right)^p \le \left( \|p\|_{p,q}\, \|x\|_p\, \|y\|_p \right)^p$$
by (5.10). The relation $z = xy$ defines a multiplication in $\ell^p$, and the last inequality can be rewritten as
$$\|xy\|_p \le \|p\|_{p,q}\, \|x\|_p\, \|y\|_p. \qquad (5.12)$$
By Proposition 3.1 the multiplication is continuous. It remains to note that, according to (5.11), we have
$$\delta_i \delta_j = \sum_{k=1}^{\infty} p_{ij}^k\, \delta_k \qquad (5.13)$$
similarly to (5.9). □
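The inequality (5.12) can be verified numerically on a truncated cubic matrix; the sketch below takes $p = q = 2$ (Hölder-conjugate exponents) with an arbitrary random truncation of our own choosing:

```python
import numpy as np

# Check of (5.12), ||xy||_p <= ||p||_{p,q} ||x||_p ||y||_p, for p = q = 2 on a
# truncated cubic matrix a[k, i, j].  The size N and the entries are arbitrary.
N = 30
rng = np.random.default_rng(2)
a = rng.standard_normal((N, N, N)) / N

norm_k2 = np.sqrt((a**2).sum(axis=(1, 2)))       # ||a^k||_2 for each k
norm_22 = np.sqrt((norm_k2**2).sum())            # ||a||_{2,2} as in (5.10)

x, y = rng.standard_normal(N), rng.standard_normal(N)
z = np.einsum('kij,i,j->k', a, x, y)             # coordinates zeta_k from (5.11)
assert np.linalg.norm(z) <= norm_22 * np.linalg.norm(x) * np.linalg.norm(y)
```

Note that for $p = q = 2$ the norm $\|a\|_{2,2}$ is just the $\ell^2$ norm of the whole cubic array, which is the condition (5.15) below.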
Remark 5.8. In fact, the proof of Theorem 5.7 does not refer to the Schauder basis in $\ell^p$. The only important thing is that this is a sequence space, so the multiplication can just be introduced by formula (5.11), so that
$$\delta_i \delta_j = (p_{ij}^k)_{k=1}^{\infty}$$
instead of (5.13). For this reason Theorem 5.7 extends to $p = \infty$ in the same way as before. In this case we have the inequality (5.12) with
$$\|p\|_{\infty,1} = \sup_{k \ge 1} \sum_{i,j=1}^{\infty} \left| p_{ij}^k \right|. \qquad (5.14)$$
By the way, there is no Schauder basis in $\ell^\infty$ since this Banach space is not separable.
Corollary 5.9. Let $H$ be a separable Hilbert space. Under the condition
$$\sum_{i,j,k=1}^{\infty} \left| p_{ij}^k \right|^2 < \infty \qquad (5.15)$$
a cubic matrix $p = [p_{ij}^k]_{i,j,k=1}^{\infty}$ assigned to any orthonormal basis $(e_i)_{i=1}^{\infty}$ in $H$ is the structural matrix of a continuous multiplication.
Proof. One can assume that $H = \ell^2$ and $e_i = \delta_i$. Then the sum in (5.15) turns into the square of $\|p\|_{2,2}$. □
Remark 5.10. By the Parseval equality we have
$$\|e_i e_j\|^2 = \sum_{k=1}^{\infty} \left| p_{ij}^k \right|^2.$$
Hence, (5.15) can be rewritten as
$$\sum_{i,j=1}^{\infty} \|e_i e_j\|^2 < \infty. \qquad (5.16)$$
We omit the obvious reformulations of the last series of statements in the setting of quadratic operators, cf. Corollary 5.4.
The condition $p \in \ell^{p,q}$ is not necessary for the structural matrix $p$ of a continuous multiplication in $\ell^p$. We show this for $p = q = 2$, i.e., in the case of Corollary 5.9. To this end it suffices to consider the algebras of form (2.2) with the structural constants (5.3). Recall that a linear operator $A$ in a Hilbert space is said to be a Hilbert–Schmidt operator if
$$\|A\|_{HS} \equiv \left( \sum_{i=1}^{\infty} \|Ae_i\|^2 \right)^{1/2} < \infty$$
for one (and then for every) orthonormal basis $(e_i)_{i=1}^{\infty}$. All Hilbert–Schmidt operators are continuous since $\|A\| \le \|A\|_{HS}$. However, for example, the unit operator is not Hilbert–Schmidt. In addition to Proposition 3.5 we have
Proposition 5.11. Let ð be a linear operator in a Hilbert space ð» with an orthonor-mal basis (ðð)
âð=1. For the algebra (2.2) the condition (5.15) is fulfilled for all linear
continuous functionals ð if and only if ð is a HilbertâSchmidt operator.
Proof. Let us deal with the equivalent condition (5.16). The functional $f[x]$ can be represented as the inner product $\langle x,h\rangle$ with an $h\in H$. Accordingly,

$4\|e_ie_j\|^2 = \|\langle e_i,h\rangle Te_j + \langle e_j,h\rangle Te_i\|^2 \le |\langle e_i,h\rangle|^2\|Te_j\|^2 + |\langle e_j,h\rangle|^2\|Te_i\|^2 + 2|\langle e_i,h\rangle\langle h,e_j\rangle|\,\|Te_i\|\,\|Te_j\|,$
Quadratic Operators 211
whence

$2\sum_{i,j=1}^{n}\|e_ie_j\|^2 \le \sum_{i=1}^{n}|\langle e_i,h\rangle|^2\sum_{j=1}^{n}\|Te_j\|^2 + \Big(\sum_{i=1}^{n}|\langle e_i,h\rangle|\,\|Te_i\|\Big)^2.$ (5.17)

Applying the Cauchy–Bunyakovski inequality to the last term in (5.17) we obtain

$\sum_{i,j=1}^{n}\|e_ie_j\|^2 \le \sum_{i=1}^{n}|\langle e_i,h\rangle|^2\sum_{j=1}^{n}\|Te_j\|^2 \le \|h\|^2\sum_{j=1}^{n}\|Te_j\|^2.$

Therefore,

$\sum_{i,j=1}^{\infty}\|e_ie_j\|^2 \le \|T\|_{HS}^2\,\|h\|^2 < \infty$

if $T$ is a Hilbert–Schmidt operator. Conversely, if $T$ is not a Hilbert–Schmidt operator then, starting with $h=e_1$, we get

$2\sum_{i,j=1}^{n}\|e_ie_j\|^2 = \sum_{j=1}^{n}\|Te_j\|^2 + \|Te_1\|^2 \to \infty$

as $n\to\infty$. □
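The inequality $\|T\|\le\|T\|_{HS}$ used above is easy to check numerically: in a finite orthonormal basis the Hilbert–Schmidt norm is just the Frobenius norm. A minimal sketch (Python/NumPy, with illustrative data of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(8, 8))  # finite section of a linear operator

# Hilbert-Schmidt norm: (sum_k ||T e_k||^2)^{1/2} equals the Frobenius norm
# when (e_k) is the standard basis.
hs = np.sqrt(sum(np.linalg.norm(T @ e) ** 2 for e in np.eye(8)))
assert np.isclose(hs, np.linalg.norm(T, 'fro'))

# The operator norm never exceeds the Hilbert-Schmidt norm.
op = np.linalg.norm(T, 2)
assert op <= hs + 1e-12

# The identity on an n-dimensional space has ||I||_HS = sqrt(n), which grows
# with n, matching the remark that the unit operator is not Hilbert-Schmidt.
assert np.isclose(np.linalg.norm(np.eye(8), 'fro'), np.sqrt(8))
```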
Now we consider an integral counterpart of the structural matrix. This is the kernel $K(s,t;u)$ of the multiplication

$(f\star g)(u) = \int_{\mathbb R}\int_{\mathbb R} K(s,t;u)\,f(s)g(t)\,ds\,dt, \quad u\in\mathbb R.$ (5.18)

In order to realize this formal construction we assume that, at least, the kernel is measurable and that the measurable functions $f,g$ run over a linear space $E$ such that in (5.18) the integrand belongs to $L^1(\mathbb R^2)$ and the integral belongs to $E$. If $K(s,t;u)$ is symmetric in $s,t$ ("symmetric" for brevity) then the multiplication (5.18) is commutative. For $g=f$ we get the quadratic integral operator $B_K : E\to E$:

$(B_Kf)(u) = \int_{\mathbb R}\int_{\mathbb R} K(s,t;u)\,f(s)f(t)\,ds\,dt, \quad u\in\mathbb R,$ (5.19)

and here the kernel $K$ can be changed for a symmetric one: $B_K = B_{\widetilde K}$ where

$\widetilde K(s,t;u) = \frac12\big(K(s,t;u) + K(t,s;u)\big).$

In the role of $E$ one can consider $L^p(\mathbb R)$, $1\le p\le\infty$.

Theorem 5.12. If the function $K_u(s,t) = K(s,t;u)$ belongs to $L^q(\mathbb R^2)$ for almost every $u$ and the function $\rho_u = \|K_u\|_q$ of $u$ belongs to $L^p(\mathbb R)$, then the multiplication (5.18) is defined and continuous in $L^p(\mathbb R)$. Moreover,

$\|f\star g\|_p \le \rho_{p,q}\,\|f\|_p\,\|g\|_p$ (5.20)

where $\rho_{p,q}$ is the $L^p$-norm of $\rho_u$.

Proof. The same as for Theorem 5.7, but with integrals instead of sums. □

Note that the conditions of Theorem 5.12 are symmetric since so is $K_u(s,t)$.

Corollary 5.13. Under the conditions of Theorem 5.12 the quadratic operator (5.19) is defined and continuous in $L^p(\mathbb R)$. Moreover,

$\|B_Kf\|_p \le \rho_{p,q}\,\|f\|_p^2.$ (5.21)

Corollary 5.14. If $K(s,t;u)$ is a Hilbert–Schmidt kernel, i.e.,

$\rho_{2,2}^2 \equiv \int_{\mathbb R}\int_{\mathbb R}\int_{\mathbb R} |K(s,t;u)|^2\,ds\,dt\,du < \infty,$

then in $L^2(\mathbb R)$ the quadratic operator (5.19) is defined and continuous. Moreover,

$\|B_Kf\|_2 \le \rho_{2,2}\,\|f\|_2^2.$ (5.22)
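A discretized sanity check of (5.22): replacing the integrals by midpoint-rule sums, the same Cauchy–Schwarz argument gives the discrete bound exactly, so the assertion below holds up to rounding (the grid size and the random kernel are our own assumptions):

```python
import numpy as np

n = 40
h = 1.0 / n  # midpoint grid spacing on [0,1], standing in for the support of K

rng = np.random.default_rng(2)
K = rng.normal(size=(n, n, n))      # K[i, j, l] ~ K(s_i, t_j; u_l)
f = rng.normal(size=n)

# (5.19): (B_K f)(u_l) = integral integral K(s,t;u_l) f(s) f(t) ds dt.
BKf = h * h * np.einsum('ijl,i,j->l', K, f, f)

rho22 = np.sqrt(h ** 3 * np.sum(K ** 2))      # discrete rho_{2,2}
norm_f = np.sqrt(h * np.sum(f ** 2))          # discrete L2 norm of f
norm_BKf = np.sqrt(h * np.sum(BKf ** 2))

# Discrete analogue of (5.22): ||B_K f||_2 <= rho_{2,2} ||f||_2^2.
assert norm_BKf <= rho22 * norm_f ** 2 + 1e-10
```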
It is interesting to compare this result to Corollary 5.9. Let $(e_k(u))_{k=1}^{\infty}$ be an orthonormal basis in $L^2(\mathbb R)$. Then $(e_i(s)e_j(t)e_k(u))_{i,j,k=1}^{\infty}$ is an orthonormal basis in $L^2(\mathbb R^3)$. The Hilbert–Schmidt kernel $K(s,t;u)$ belongs to $L^2(\mathbb R^3)$ by assumption. If its coordinates are $\mu_{ijk}$, then for all $f,g\in L^2(\mathbb R)$ the coordinates of $f\star g$ are

$\zeta_k = \sum_{i,j=1}^{\infty}\mu_{ijk}\,\xi_i\gamma_j$

where $\xi_i$ and $\gamma_j$ are the coordinates of $f$ and $g$. Hence, $[\mu_{ijk}]_{i,j,k=1}^{\infty}$ is the structural matrix of this algebra. Moreover, $\|M\|_{2,2} = \rho_{2,2}$ by Parseval's equality. Thus, (5.15) is equivalent to $K(s,t;u)$ being a Hilbert–Schmidt kernel.

An important case is a kernel $K(s,t;u)$ with finite support, say, $K(s,t;u)=0$ outside the cube $0\le s,t,u\le1$, so that, accordingly,

$(f\star g)(u) = \int_0^1\int_0^1 K(s,t;u)\,f(s)g(t)\,ds\,dt, \quad 0\le u\le1.$ (5.23)

This yields the following important Banach algebra [15].
Proposition 5.15. With a continuous kernel $K(s,t;u)$, $0\le s,t,u\le1$, the multiplication (5.23) is defined and continuous in $C[0,1]$. Moreover,

$\|f\star g\| \le \nu(K)\,\|f\|\,\|g\|, \qquad \nu(K) = \max_{u\in[0,1]}\int_0^1\int_0^1 |K(s,t;u)|\,ds\,dt.$ (5.24)

Proof. For $f,g\in C[0,1]$ we have $f\star g\in C[0,1]$ since $K(s,t;u)$ is uniformly continuous. The rest is obvious. □
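Proposition 5.15 can likewise be illustrated on a grid. The concrete kernel and functions below are arbitrary choices for the sketch; the discrete sup-norm bound mirrors (5.24) exactly:

```python
import numpy as np

n = 50
h = 1.0 / n
s = (np.arange(n) + 0.5) * h   # midpoint grid on [0, 1]

# A concrete continuous symmetric kernel on [0,1]^3 (illustrative choice).
S, T, U = np.meshgrid(s, s, s, indexing='ij')
K = np.cos(np.pi * (S + T)) * (1.0 + U)

f = np.sin(2 * np.pi * s)
g = np.exp(-s)

# (5.23): (f * g)(u) = integral integral K(s,t;u) f(s) g(t) ds dt.
fg = h * h * np.einsum('ijl,i,j->l', K, f, g)

# nu(K) = max_u integral integral |K(s,t;u)| ds dt, cf. (5.24).
nu = (h * h * np.abs(K).sum(axis=(0, 1))).max()

assert np.abs(fg).max() <= nu * np.abs(f).max() * np.abs(g).max() + 1e-10
```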
The symmetric continuous kernels $K(s,t;u)$ form a subspace $L\subset C([0,1]^3)$. We endow it with the norm $\nu(K)$, which is weaker than the standard one.

Corollary 5.16. The linear mapping $J : K\mapsto B_K$ is a continuous embedding of the $\nu$-normed space $L$ into the space of continuous quadratic operators in $C[0,1]$.

Proof. $J$ is injective: if $B_K=0$ then $f\star g=0$ for all $f,g\in C[0,1]$, and then $K=0$ because of the completeness of the set of products $fg$ in $C([0,1]^2)$. Furthermore, $\|B_K\|\le\nu(K)$ because of (3.5) and (5.24), so $J$ is continuous. □
6. Compact quadratic operators and algebras
The compactness of some linear and nonlinear operators is a powerful tool in functional analysis, especially in the theory of integral and differential equations; see, e.g., [17] and the references therein. According to [15], a commutative Banach algebra is called compact if so is the corresponding quadratic operator. In turn, a quadratic operator $V$ in a Banach space $X$ is called compact if the image of a ball $B_r$ is relatively compact. In this case the image of every bounded set is relatively compact. From Proposition 3.3 it follows that every compact quadratic operator is continuous. Every continuous quadratic operator $V$ of finite rank is compact. Therefore, every commutative Banach algebra $X$ with finite-dimensional $X'$ is compact.

We denote by $CQ_X$ the linear space of all compact quadratic operators, so $BFQ_X\subset CQ_X\subset BQ_X$. The subspace $CQ_X$ is closed. Furthermore, we have
Theorem 6.1. Let a Banach space $X$ have a Schauder basis $(e_k)_{k=1}^{n}$, $n\le\infty$. Then the subspace $BFQ_X$ is dense in $CQ_X$.

Proof. If $n<\infty$ then $BFQ_X = CQ_X$. Let $n=\infty$, let $V\in CQ_X$, and let

$P_mx = \sum_{k=1}^{m} e_k^*[x]\,e_k, \quad x\in X,\ m\ge1.$

Then

$(P_mV)x = \sum_{k=1}^{m} e_k^*[Vx]\,e_k,$

so $P_mV$ is a continuous quadratic operator of rank at most $m$. For $m\to\infty$ we have $P_mx\to x$ for all $x\in X$. Hence, $(P_mV)x\to Vx$ for all $x$. This convergence is uniform on the ball $B_1$ since $VB_1$ is relatively compact. Thus, $\|P_mV - V\|\to0$. □
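The truncation argument can be sketched for a toy compact quadratic operator $(Vx)_k = 2^{-k}\sum_i x_i^2$ on truncated $l_2$ (a hypothetical example of ours; the tail of its coefficient sequence controls $\|P_mV - V\|$):

```python
import numpy as np

# A compact quadratic operator on (truncated) l2:
# (Vx)_k = 2^{-k} * sum_i x_i^2, k = 1..N.  Its coefficient sequence decays,
# so the truncations P_m V converge to V in norm.
N = 30
c = 2.0 ** -np.arange(1, N + 1)

def V(x):
    return c * np.dot(x, x)

def PmV(x, m):
    y = V(x)
    y[m:] = 0.0          # P_m keeps only the first m basis coordinates
    return y

x = np.ones(N) / np.sqrt(N)   # a unit vector
err = [np.linalg.norm(V(x) - PmV(x, m)) for m in (2, 5, 10)]
assert err[0] > err[1] > err[2]

# On the unit ball, ||P_m V - V|| equals the l2 tail ||(c_k)_{k>m}|| here.
tail = np.linalg.norm(c[10:])
assert err[2] <= tail + 1e-12
```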
Remark 6.2. A similar theorem for compact linear operators is well known, and the proof is the same. However, there exists a Banach space in which not every compact linear operator can be approximated by continuous linear operators of finite rank [5]. The question arises: are the approximation properties for the linear and for the quadratic operators equivalent?

Now let us turn to the quadratic integral operators. For such an operator of rank $r$ its symmetric kernel is called of rank $r$ as well.

Theorem 6.3. In the space $C[0,1]$ the general form of a symmetric continuous kernel of finite rank $r$ is

$K(s,t;u) = \sum_{i=1}^{r} K_i(s,t)\,e_i(u)$ (6.1)

where all functions $e_i(u)$ are continuous, all partial kernels $K_i(s,t)$ are symmetric and continuous, and the systems $(e_i)_{i=1}^{r}$ and $(K_i)_{i=1}^{r}$ are both linearly independent.
Proof. Every function $K(s,t;u)$ of form (6.1) with continuous $e_i(u)$ and symmetric continuous $K_i(s,t)$ is the symmetric continuous kernel of the operator

$(B_Kf)(u) = \sum_{i=1}^{r} e_i(u)\int_0^1\int_0^1 K_i(s,t)\,f(s)f(t)\,ds\,dt$ (6.2)

in the space $X=C[0,1]$. By Theorem 2.7 this is a minimal decomposition of $B_K$ into a sum of elementary quadratic operators. Thus, $r(B_K)=r$.

Conversely, let a symmetric continuous kernel $K(s,t;u)$ be such that

$(B_Kf)(u) = \sum_{i=1}^{r} v_i[f]\,e_i(u)$

where $r=r(B_K)$, $e_i\in C[0,1]$ and the $v_i$ are quadratic functionals on $C[0,1]$. By Corollary 2.8 and the general form of continuous linear functionals on $C[0,1]$,

$v_i[f] = e_i^*[B_Kf] = \int_0^1 d\sigma_i(u)\int_0^1\int_0^1 K(s,t;u)\,f(s)f(t)\,ds\,dt$

where the $\sigma_i(u)$ are some functions of bounded variation. This yields (6.1) with

$K_i(s,t) = \int_0^1 K(s,t;u)\,d\sigma_i(u), \quad 1\le i\le r.$

The partial kernels $K_i(s,t)$ are symmetric and continuous. They are linearly independent, as well as the $e_i(u)$, by Theorem 2.7. □
Theorem 6.4. In the space $C[0,1]$ any quadratic integral operator

$(B_Kf)(u) = \int_0^1\int_0^1 K(s,t;u)\,f(s)f(t)\,ds\,dt, \quad 0\le u\le1,$ (6.3)

with continuous kernel $K(s,t;u)$ is compact.

Proof. Let $f\in C[0,1]$, $\|f\|\le1$, and let $g=B_Kf$. Then

$|g(u+h)-g(u)| \le \int_0^1\int_0^1 |K(s,t;u+h)-K(s,t;u)|\,ds\,dt,$

hence the set $B_KB_1$ is equicontinuous. Furthermore, $B_KB_1$ is bounded since

$|g(u)| \le \int_0^1\int_0^1 |K(s,t;u)|\,ds\,dt.$

Thus, the set $B_KB_1$ is relatively compact by the Arzelà–Ascoli theorem. □
From now on all kernels of integral operators in $C[0,1]$ are assumed to be continuous.

Theorem 6.5. If $K(s,t;u)$ is a Hilbert–Schmidt kernel, then in $L^2(\mathbb R)$ the quadratic operator (5.19) is compact.

Proof. According to Theorem 20 from [3, Chapter 4], we have to check the following three properties of the set $G=\{B_Kf : \|f\|\le1\}$.

(1) $G$ is bounded. This is obvious since $B_K$ is bounded.

(2) If $c>0$, $h_c(u)=0$ for $|u|<c$ and $h_c(u)=1$ for $|u|\ge c$, then $\sup_{g\in G}\|gh_c\|$ tends to zero as $c\to\infty$. This follows from the inequality

$\|gh_c\|^2 \le \int_{\mathbb R}\int_{\mathbb R}\int_{\mathbb R} |K(s,t;u)|^2\,h_c(u)\,ds\,dt\,du,$

since the integrand tends pointwise to zero under the $L^1$-majorant $|K(s,t;u)|^2$.

(3) If $h\in\mathbb R$ and $g_h(u)=g(u+h)$, then $\sup_{g\in G}\|g_h-g\|$ tends to zero as $h\to0$. This follows from the inequality

$\|g_h-g\|^2 \le \int_{\mathbb R}\int_{\mathbb R}\int_{\mathbb R} |K(s,t;u+h)-K(s,t;u)|^2\,ds\,dt\,du,$

the right-hand side of which can be rewritten as

$\int_{\mathbb R}\int_{\mathbb R}\int_{\mathbb R} \big|\widehat K(\lambda,\mu,\nu)\,(e^{\mathrm ih\nu}-1)\big|^2\,d\lambda\,d\mu\,d\nu$

by Parseval's equality for the Fourier–Plancherel transform $K\mapsto\widehat K$. □
Remark 6.6. Theorems 6.4 and 6.5 can be extended to operators of the form

$(B_Kf)(u) = \int_S\int_S K(s,t;u)\,f(s)f(t)\,ds\,dt, \quad u\in S,$ (6.4)

in the space $C(S)$ on a compact topological space $S$ with a Radon measure $ds$ or, respectively, in $L^2(S)$ on a locally compact Abelian group $S$ with a Haar measure.
7. Baric and Bernstein algebras and quadratic operators
We start with a purely algebraic theory originating from population genetics. A scalar function $\omega\ne0$ on an algebra $X$ is called a weight if it is a linear multiplicative functional or, what is the same, a homomorphism of $X$ into the ground field. If a weight exists and is fixed, then $X$ is called a baric algebra [6]. More rigorously, a baric algebra is a pair $(X,\omega)$ where $X$ is an algebra and $\omega$ is a weight on $X$. The hyperplane $H^0_\omega = \ker\omega = \{x\in X : \omega[x]=0\}$ is an ideal (the barideal) of $X$.

With $\omega[e]=1$ the algebra (2.5) is baric with the weight $\omega$. With $f\ne0$ the algebra (2.2) is baric with the weight $f$ if and only if $f[Tx]=f[x]$ for all $x\in X$.

A linear functional $\omega\ne0$ is called a weight of a quadratic operator $V$ if

$\omega[Vx] = \omega^2[x], \quad x\in X.$ (7.1)

Lemma 7.1. For a quadratic operator $V$ and for the corresponding commutative algebra $X$ the sets $\Omega_V$ and $\Omega_X$ of weights coincide.

Proof. Obviously, $\Omega_X\subset\Omega_V$. The converse inclusion follows by polarization. □
The following is a refinement of Proposition 2.2.
Proposition 7.2. If $r=r(V)<\infty$ and $\omega\in\Omega_V$, then

$Vx = \omega^2[x]\,e_1 + \sum_{i=2}^{r} v_i[x]\,e_i, \quad x\in X,$ (7.2)

where $e_1\in\operatorname{Im}V$, $\omega[e_1]=1$, $(e_i)_{i=2}^{r}$ is a basis in $E(V)\cap H^0_\omega$, and the $v_i[x]$ are quadratic functionals linearly independent along with $\omega^2[x]$.

Proof. In the proof of Proposition 2.2 one can take any $e_1=Vx_0$ with $\omega[x_0]=1$. Then $\omega[e_1]=\omega[Vx_0]=\omega^2[x_0]=1$. Let $(e_i)_{i=2}^{r}$ be a basis in $E(V)\cap H^0_\omega$, and let $(e_i^*)_{i=2}^{r}$ be its dual basis, extended to a basis in $E(V)^*$ by setting $e_i^*[e_1]=0$, $2\le i\le r$, and by joining $e_1^*=\omega$. In the corresponding decomposition of $V$ we have $v_1[x]=e_1^*[Vx]=\omega[Vx]=\omega^2[x]$. □
An important example is the operator (6.3) in the space $C[0,1]$ under the condition

$\int_0^1 K(s,t;u)\,du = 1, \quad 0\le s,t\le1.$ (7.3)

In this case the functional

$\omega[f] = \int_0^1 f(u)\,du$ (7.4)

is a weight. Then

$K(s,t;u) = e_1(u) + \sum_{i=2}^{r} K_i(s,t)\,e_i(u)$ (7.5)

where

$e_1(u) = (B_K1)(u) = \int_0^1\int_0^1 K(s,t;u)\,ds\,dt, \qquad \int_0^1 e_i(u)\,du = 0, \quad 2\le i\le r,$ (7.6)

and the sets $\{e_i(u)\}_{i=2}^{r}$ and $\{K_i(s,t)\}_{i=2}^{r}\cup\{1(s,t)\}$ are both linearly independent.

A commutative baric algebra $(X,\omega)$ is called a Bernstein algebra if
$(x^2)^2 = \omega^2[x]\,x^2, \quad x\in X.$ (7.7)

For example, the baric algebra (2.5) is Bernstein. The baric algebra (2.2) is Bernstein if and only if $T^2=T$, i.e., $T$ is a projection. (In both cases $\omega=f$.)

In terms of Bernstein quadratic operators the identity (7.7) reads

$V^2x = \omega^2[x]\,Vx, \quad x\in X,$ (7.8)

or, equivalently, if $V_1=V|_{H^1_\omega}$, where $H^1_\omega=\{x\in X : \omega[x]=1\}$, then

$V_1^2 = V_1.$ (7.9)

Lemma 7.3. Every Bernstein quadratic operator (or Bernstein algebra) with a weight $\omega$ has a fixed point (an idempotent, respectively) $e$ such that $\omega[e]=1$.

Proof. Take $e=Vx$ with $\omega[x]=1$. Then $\omega[e]=1$ and $Ve=V^2x=Vx=e$. □
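A minimal sketch of a Bernstein quadratic operator, assuming that the algebra (2.2) corresponds to the quadratic operator $Vx = f[x]\,Tx$ (so that the criterion $T^2=T$ above applies): with $T=I$ and the weight $\omega[x]=\sum_i x_i$ this gives $Vx=\omega[x]\,x$, and both the identity (7.8) and Lemma 7.3 can be checked directly:

```python
import numpy as np

def omega(x):
    """A linear functional serving as the weight: sum of coordinates."""
    return x.sum()

def V(x):
    """Quadratic operator of the (assumed) form V x = omega[x] T x with T = I.
    Since T = I is a projection, V is Bernstein."""
    return omega(x) * x

x = np.array([0.3, -0.1, 0.5, 0.2, 1.1])   # omega[x] = 2

# Bernstein identity (7.8): V^2 x = omega^2[x] V x.
assert np.allclose(V(V(x)), omega(x) ** 2 * V(x))

# Lemma 7.3: e = V x1 with omega[x1] = 1 is a fixed point with omega[e] = 1.
x1 = x / omega(x)
e = V(x1)
assert np.isclose(omega(e), 1.0)
assert np.allclose(V(e), e)
```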
Below $e$ is an idempotent, $\omega[e]=1$. Every vector $x\in X$ can be uniquely represented as

$x = \omega[x]\,e + z, \quad z\in H^0_\omega.$ (7.10)

In the barideal $H^0_\omega$ we have the linear operator $P_ex = 2ex$.

Lemma 7.4. The operator $P_e$ is a projection.

Proof. For $\omega[x]=0$ the line $e+tx$, $t\in\mathbb R$, belongs to $H^1_\omega$. Hence, $((e+tx)^2)^2 = (e+tx)^2$, i.e., $e + 4te(ex) + \cdots = e + 2tex + \cdots$. Thus, $4e(ex) = 2ex$, i.e., $P_e^2=P_e$. □

As a consequence, we have the direct decomposition

$H^0_\omega = U\oplus W \quad (U=\operatorname{Im}P_e,\ W=\ker P_e).$ (7.11)

As a result,

$X = \operatorname{Span}\{e\}\oplus U\oplus W.$ (7.12)

The following lemma is an immediate corollary of Theorem 3.4.8 from [14].

Lemma 7.5. For every $x\in X$ the $W$-component of $x^2$ is $u^2$ with some $u\in U$.
Now let $X$ be a Banach space, and let $\omega\ne0$ be a continuous linear functional. With the weight $\omega$ we consider a continuous Bernstein quadratic operator and the corresponding Banach–Bernstein algebra [15].

Theorem 7.6. A Banach–Bernstein algebra $X$ is compact if and only if the subalgebra $X'$ is finite dimensional.

Proof. "If" is obvious. For the "only if" part note that the linear operator $P_e$ is compact, since so is $V$ and

$P_ex = \frac12\big(V(x+e) - V(x-e)\big).$

Since $P_e$ is a projection and $P_e|_U = 1$, the subspace $U$ is finite dimensional. Let $(u_i)_{i=1}^{m}$ be its basis. By the decomposition (7.12) and Lemma 7.5 we have

$X' \subset \operatorname{Span}\{e, u_1,\dots,u_m, u_1^2,\dots,u_m^2, u_1u_2,\dots,u_{m-1}u_m\}.$ □
The operator form of Theorem 7.6 is

Theorem 7.7. A continuous Bernstein quadratic operator is compact if and only if it is of finite rank.

Combining this result with Lemma 7.3 and Proposition 7.2 we obtain

Corollary 7.8. Every compact Bernstein quadratic operator is of the form

$Vx = \omega^2[x]\,e_1 + \sum_{i=2}^{r} v_i[x]\,e_i, \quad x\in X,$ (7.13)

where $r=r(V)<\infty$, $Ve_1=e_1$, $\omega[e_1]=1$, $(e_i)_{i=2}^{r}$ is a basis in $E(V)\cap H^0_\omega$, and the $v_i[x]$ are continuous quadratic functionals linearly independent along with $\omega^2[x]$.

In addition,

$v_i[Vx] = \omega^2[x]\,v_i[x], \quad 2\le i\le r.$ (7.14)

This follows by the substitution $x\mapsto Vx$ in (7.13) and by applying (7.8) to the result.

Theorem 7.7 is applicable to the Bernstein quadratic integral operators in $C[0,1]$ since they are compact by Theorem 6.4. The standard weight in this case is (7.4). Thus, all Bernstein kernels are of form (7.5), where the functions $e_i(u)$ and $K_i(s,t)$ are as described above.
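Assuming the operator form (6.3), the weight property (7.1) of the functional (7.4) under condition (7.3) can be verified numerically. The kernel $K(s,t;u)=2u$ below is our own illustrative choice; it satisfies (7.3) since $\int_0^1 2u\,du=1$:

```python
import numpy as np

n = 200
h = 1.0 / n
u = (np.arange(n) + 0.5) * h   # midpoint grid on [0, 1]

# Kernel K(s,t;u) = 2u: independent of s,t, with integral over u equal to 1,
# so condition (7.3) holds (midpoint quadrature is exact for linear functions).
assert np.isclose(h * np.sum(2 * u), 1.0)

f = np.cos(3 * u) + 0.5        # an arbitrary continuous function

# (B_K f)(u) = 2u * (integral of f)^2 for this kernel.
If = h * np.sum(f)
BKf = 2 * u * If ** 2

# Weight property (7.1) with the weight (7.4): omega[B_K f] = omega[f]^2.
omega_BKf = h * np.sum(BKf)
assert np.isclose(omega_BKf, If ** 2)
```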
The following theorem, announced in [1], was first proved in [16] by applying Theorem 7.7. The latter (as well as Theorem 7.6) was proved in [15].

Theorem 7.9. If $K(s,t;u)>0$ is a Bernstein kernel, then $K$ depends only on $u$.

In other words, $K$ is of rank 1. Actually, in [16] the assumption is $K(s,t;u)\ge0$ with $K(s,s;u)>0$. Also note that all our results on continuous kernels remain true for $C(S,\sigma)$, where $S$ is a compact space and $\sigma$ is a Radon measure with $\operatorname{supp}\sigma=S$.
References
[1] S.N. Bernstein. Solution of a mathematical problem related to the theory of inheritance, Uchenye Zapiski n.-i. kafedr Ukrainy 1 (1924), 83–115 (in Russian).

[2] N. Bourbaki. Éléments de mathématique, Algèbre, Ch. 9: Formes sesquilinéaires et formes quadratiques, Hermann, Paris, 1959 (in French).

[3] N. Dunford and J.T. Schwartz. Linear operators, Part 1: General theory, Intersci. Publ., 1958.

[4] M. Day. Normed linear spaces, Springer-Verlag, 1958.

[5] P. Enflo. A counterexample to the approximation problem in Banach spaces, Acta Math. 130 (1973), 309–317.

[6] I.M.H. Etherington. Genetic algebras, Proc. Roy. Soc. Edinburgh A 59 (1939), 242–258.

[7] V.I. Glivenko. Mendelian algebra, Doklady Akad. Nauk SSSR 8, No. 4 (1936), 371–372 (in Russian).

[8] J.C. Gutierrez. Solution of the Bernstein problem in the non-regular case, J. Algebra 223, No. 1 (2000), 226–132.

[9] P. Holgate. Genetic algebras satisfying Bernstein's stationarity principle, J. London Math. Soc. (2) 9 (1975), 612–624.

[10] Yu.I. Lyubich. Basic concepts and theorems of evolutionary genetics for free populations, Russian Math. Surveys 26, No. 5 (1971), 51–123.

[11] Yu.I. Lyubich. Two-level Bernstein populations, Math. USSR Sb. 24, No. 1 (1974), 593–615.

[12] Yu.I. Lyubich. Proper Bernstein populations, Probl. Inform. Transmiss. (1978), 228–235.

[13] Yu.I. Lyubich. Bernstein algebras, Uspekhi Mat. Nauk 32, No. 6 (1977), 261–263 (in Russian).

[14] Yu.I. Lyubich. Mathematical structures in population genetics, Springer-Verlag, 1992. (Translated from the Russian original, Naukova Dumka, 1983.)

[15] Yu.I. Lyubich. Banach–Bernstein algebras and their applications. In: Nonassociative algebra and its applications, Lecture Notes in Pure and Appl. Math., 211, Dekker, 2000, pp. 205–210.

[16] Yu.I. Lyubich. A theorem on Bernstein quadratic integral operators. In: Nonassociative algebra and its applications, Lect. Notes Pure and Appl. Math., 246, Chapman–Hall, 2006, pp. 245–251.

[17] L. Nirenberg. Topics in nonlinear functional analysis, Courant Inst., 1974.

[18] O. Reiersol. Genetic algebras studied recursively and by means of differential operators, Math. Scand. 10 (1962), 25–44.

[19] A.S. Serebrovskii. Properties of Mendelian equalities, Doklady Akad. Nauk SSSR 2, No. 1 (1934), 33–36.

[20] R.D. Schafer. An introduction to nonassociative algebras, Acad. Press, 1966.

[21] A. Wörz-Busekros. Algebras in Genetics, Springer-Verlag, 1980.
Yu.I. Lyubich
Department of Mathematics
Technion, 32000 Haifa, Israel
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 221–239
© 2013 Springer Basel
Strong Stability of Invariant Subspaces of Quaternion Matrices
Leiba Rodman
Dedicated to Leonid Lerer on the occasion of his 70th birthday
Abstract. Classes of matrices and their invariant subspaces with various robustness properties are described, in the context of matrices over real quaternions and quaternionic subspaces. Robustness is understood in the sense of being close to the original invariant subspace under small perturbations of the matrix.
Mathematics Subject Classification (2010). 15B33.
Keywords. Real quaternions, matrices over quaternions, invariant subspaces, robustness, stability.
1. Introduction
In the present paper we study certain classes of invariant subspaces of matrices that change little (in various senses) after small perturbations of the matrix. Such classes have been studied extensively in the last 35 years, for complex and real matrices, starting with the groundbreaking papers [3, 5]. The literature on the subject is extensive, and we mention here only chapters and parts of books [2, Chapter 8], [6, Chapter I.5], [7, Chapter 15], [8, Chapter II.4], [4, Part 4].
In [13], the results on stable and $\alpha$-stable invariant subspaces for complex matrices were extended to matrices over the real quaternions and quaternionic invariant subspaces. An $A$-invariant subspace $\mathcal M$ is said to be stable if, loosely speaking, every matrix $B$ close to $A$ has an invariant subspace $\mathcal N$ close to $\mathcal M$; $\alpha$-stability is characterized by the stronger property that the gap between $\mathcal N$ and $\mathcal M$ is a Hölder function of $\|B-A\|$ of exponent $\alpha$. (See the precise definitions in Section 2.)

Here we continue this line of investigation. We extend to quaternionic matrices the characterizations of strongly stable and strongly $\alpha$-stable invariant subspaces of complex matrices obtained in [12]. Strong stability means that stability
holds for every $B$-invariant subspace $\mathcal N$ (in the above notation), provided the obvious dimensionality conditions are satisfied, and similarly for strong $\alpha$-stability. From the standpoint of basic backward error analysis, the notion of strong stability is perhaps more appropriate than the notion of stability as a measure of robustness of invariant subspaces of matrices.

Besides the introduction, the paper consists of five sections. In Section 2 we present some known results on matrix analysis for quaternionic matrices, for the readers' benefit. Our main Theorems 3.3 and 6.1 are stated in Sections 3 and 6, respectively. We prove Theorem 3.3 in Sections 4 and 5.
2. Quaternionic linear algebra
In this section we briefly review some known facts of linear algebra over the skew field of real quaternions $\mathbb H$, and introduce notation to be used throughout the paper.
Denote by i, j, k the standard quaternion imaginary units. For $x\in\mathbb H$, $x = a_0 + a_1\mathrm i + a_2\mathrm j + a_3\mathrm k$ with $a_m\in\mathbb R$ (the real field), let $\Re(x)=a_0$ and $\mathfrak V(x) = a_1\mathrm i + a_2\mathrm j + a_3\mathrm k$ be the real part and the vector part of $x$, respectively, let $x^* = a_0 - a_1\mathrm i - a_2\mathrm j - a_3\mathrm k$ be the conjugate quaternion, and let $|x| = \sqrt{a_0^2+a_1^2+a_2^2+a_3^2}$ be the length of $x$.

Let $\mathbb H^{m\times n}$ (abbreviated to $\mathbb H^m$ if $n=1$) be the set of $m\times n$ matrices with entries in $\mathbb H$. The quaternionic conjugation naturally leads to the concept of conjugate transposed matrices; we denote by $A^*\in\mathbb H^{n\times m}$ the conjugate transpose of a matrix $A\in\mathbb H^{m\times n}$. $\mathbb H^n$ is considered as a right quaternionic vector space, endowed with the quaternion-valued inner product $\langle u,v\rangle = v^*u$, $u,v\in\mathbb H^n$, and the norm $\|v\| = \sqrt{\langle v,v\rangle}$, $v\in\mathbb H^n$. Matrices $A\in\mathbb H^{m\times n}$ will also be considered as linear transformations $\mathbb H^n\to\mathbb H^m$ by way of the matrix-vector multiplication $x\mapsto Ax$. We use the operator norm for matrices:

$\|A\| = \max\{\|Au\| : u\in\mathbb H^n,\ \|u\|=1\},$

where $A\in\mathbb H^{m\times n}$. Note that $\|A\|$ coincides with the largest singular value of $A$.

Consider now the complex matrix representation of quaternions. For $x = a_0 + \mathrm ia_1 + \mathrm ja_2 + \mathrm ka_3\in\mathbb H$, $a_0,a_1,a_2,a_3\in\mathbb R$, define

$\omega(x) = \begin{bmatrix} a_0+\mathrm ia_1 & a_2+\mathrm ia_3 \\ -a_2+\mathrm ia_3 & a_0-\mathrm ia_1 \end{bmatrix} \in\mathbb C^{2\times2}$

($\mathbb C^{m\times n}$ stands for the set of $m\times n$ complex matrices) and extend $\omega$ entrywise to a map

$\omega_{m,n} : \mathbb H^{m\times n}\to\mathbb C^{2m\times2n}, \qquad \omega_{m,n}\big([x_{i,j}]_{i,j=1}^{m,n}\big) = [\omega(x_{i,j})]_{i,j=1}^{m,n}, \quad x_{i,j}\in\mathbb H.$
We have:

(i) $\omega_{n,n}$ is a unital homomorphism of real algebras;
(ii) if $S\in\mathbb H^{m\times n}$ and $T\in\mathbb H^{n\times p}$, then $\omega_{m,p}(ST) = \omega_{m,n}(S)\,\omega_{n,p}(T)$;
(iii) $\omega_{n,m}(S^*) = (\omega_{m,n}(S))^*$ for all $S\in\mathbb H^{m\times n}$;
(iv) there exist positive constants $c_{m,n}$, $C_{m,n}$ such that

$c_{m,n}\|\omega_{m,n}(S)\| \le \|S\| \le C_{m,n}\|\omega_{m,n}(S)\|$ (2.1)

for every $S\in\mathbb H^{m\times n}$.
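The representation $\omega$ can be reproduced directly from the displayed $2\times2$ matrix; the quaternion product below is the standard Hamilton product (all function names are ours). Properties (i)-(iii) then check out numerically:

```python
import numpy as np

def omega(x):
    """Complex 2x2 representation of a quaternion x = (a0, a1, a2, a3)."""
    a0, a1, a2, a3 = x
    return np.array([[a0 + 1j * a1, a2 + 1j * a3],
                     [-a2 + 1j * a3, a0 - 1j * a1]])

def qmul(x, y):
    """Hamilton product (a0 + a1 i + a2 j + a3 k)(b0 + b1 i + b2 j + b3 k)."""
    a0, a1, a2, a3 = x
    b0, b1, b2, b3 = y
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def qconj(x):
    a0, a1, a2, a3 = x
    return (a0, -a1, -a2, -a3)

x, y = (1.0, 2.0, -0.5, 3.0), (0.5, -1.0, 2.0, 1.5)

# omega is multiplicative and respects conjugation.
assert np.allclose(omega(qmul(x, y)), omega(x) @ omega(y))
assert np.allclose(omega(qconj(x)), omega(x).conj().T)

# |x|^2 = det(omega(x)), so the representation also encodes the length.
assert np.isclose(np.linalg.det(omega(x)).real, sum(a * a for a in x))
```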
Often we will abbreviate $\omega_{m,n}$ to $\omega$ (with $m,n$ understood from the context). Let $A\in\mathbb H^{n\times n}$. A vector $v\in\mathbb H^n\setminus\{0\}$ is said to be an eigenvector of $A$ corresponding to the eigenvalue $\alpha\in\mathbb H$ if

$Av = v\alpha.$ (2.2)

The set of all eigenvalues of $A$ is denoted $\sigma(A)$, the spectrum of $A$. Note that $\sigma(A)$ is closed under similarity of quaternions: If (2.2) holds, then $A(v\beta) = (v\beta)(\beta^{-1}\alpha\beta)$, $\beta\in\mathbb H\setminus\{0\}$, so $v\beta$ is an eigenvector of $A$ corresponding to the eigenvalue $\beta^{-1}\alpha\beta$. Note also that the similarity orbit $\{\beta^{-1}\alpha\beta : \beta\in\mathbb H\setminus\{0\}\}$ of $\alpha$ consists exactly of those quaternions $\lambda$ for which $\Re(\lambda)=\Re(\alpha)$ and $|\mathfrak V(\lambda)|=|\mathfrak V(\alpha)|$. In particular, if $\alpha\in\sigma(A)$, then also $\alpha^*\in\sigma(A)$.
The Jordan form is valid for quaternionic matrices:
Theorem 2.1. Let ðŽ â HðÃð. Then there exists an invertible ð â HðÃð such thatðâ1ðŽð has the form
ðâ1ðŽð = ðœð1(ð1)â â â â â ðœðð(ðð), ð1, . . . , ðð â H, (2.3)
where ðœð(ð) is the ðÃð (upper triangular) Jordan block having eigenvalue ð. Theform (2.3) is uniquely determined by ðŽ up to an arbitrary permutation of blocksand up to a replacement of ð1, . . . , ðð with ðŒâ11 ð1ðŒ1, . . . , ðŒ
â1ð ðððŒð, respectively,
where ðŒð â H â {0}, ð = 1, 2, . . . , ð.
A proof is given in [16]; the result goes back to [15].We will need the following transformation properties of bases and Jordan
forms under ð:
Proposition 2.2. Let $u_1,\dots,u_p$ be a basis (resp., orthogonal basis, orthonormal basis, spanning set) of a subspace $\mathcal U\subset\mathbb H^n$. Then:

(1) The columns of $\omega(u_1),\dots,\omega(u_p)$ form a basis (resp., orthogonal basis, orthonormal basis) of the subspace $\omega(\mathcal U)\subset\mathbb C^{2n}$.
(2) The subspace $\omega(\mathcal U)$ is independent of the choice of the basis (resp., orthogonal basis, orthonormal basis) of $\mathcal U$.

Proof. Part (1) is [13, Proposition 2.1]. If $u_1,\dots,u_p$ and $u_1',\dots,u_p'$ are two bases for $\mathcal U$, then $[u_1\ \dots\ u_p]\,S = [u_1'\ \dots\ u_p']$ for some invertible matrix $S$. Applying $\omega$ to this equality, (2) follows. □
Proposition 2.3. Let $A\in\mathbb H^{n\times n}$, and let $\mathcal N\subset\mathbb H^n$ be an $A$-invariant subspace. Denote by $J$ the Jordan form of $A|_{\mathcal N}$, specialized so that $J\in\mathbb C^{p\times p}$. Then the Jordan form (over the complexes) of $\omega(A|_{\mathcal N})$ is $J\oplus\overline J$, where the overline stands for the complex conjugation.

Proof. Let $u_1,\dots,u_p$ be a Jordan basis for $A|_{\mathcal N}$ in $\mathcal N$, such that

$A\,[u_1\ u_2\ \dots\ u_p] = [u_1\ u_2\ \dots\ u_p]\,J.$ (2.4)

By Proposition 2.2, the columns of $[\omega(u_1),\omega(u_2),\dots,\omega(u_p)]$ form a basis for $\omega(\mathcal N)$ (cf. [13, Proposition 2.4]). It remains to apply the map $\omega$ to (2.4). □
The distance between two subspaces in $\mathbb H^n$ will be measured by the gap. Define the gap between two subspaces $\mathcal M$ and $\mathcal N$ of $\mathbb H^n$ by

$\theta(\mathcal M,\mathcal N) = \|P_{\mathcal M} - P_{\mathcal N}\|,$

where $P_{\mathcal X}$ is the orthogonal projection on a subspace $\mathcal X$. If $\{u_1,\dots,u_p\}$ is an orthonormal basis for $\mathcal X$, then

$P_{\mathcal X} = [u_1\ u_2\ \dots\ u_p]\,[u_1\ u_2\ \dots\ u_p]^*.$ (2.5)

The gap is a metric on the set of all subspaces of $\mathbb H^n$ that turns this set into a compact complete metric space; this is well known in the context of complex subspaces, and can be proved for quaternion subspaces in the same way (see, e.g., the proof of [7, Theorem 13.4.1]). Many basic properties of subspaces in $\mathbb H^n$, as they relate to the gap metric and are familiar in the setting of subspaces of the real vector space $\mathbb R^n$ or of $\mathbb C^n$, remain valid in the setting of quaternion subspaces, with essentially the same proofs; for example, [7, Theorems 13.1.1, 13.1.2, 13.1.3, 13.4.1, 13.4.2, 13.4.3]. Some are proved in complete detail in [13, Theorem 2.11]. We will use these properties in the sequel as necessary, and present here only a few of them (Theorem 2.4 below; parts (1) and (3) are standard, (2) is proved in [13], and a short proof of (4) is supplied). We denote by $d(x,S) = \inf_{t\in S}\|x-t\|$ the distance from $x\in\mathbb H^n$ to a set $S\subset\mathbb H^n$.
Theorem 2.4.

(1) If $\theta(\mathcal M,\mathcal N) < 1$, then $\dim\mathcal M = \dim\mathcal N$. (The dimensions of subspaces in $\mathbb H^n$ are understood in the quaternionic sense.)

(2) Assume $\mathcal M',\mathcal M$ are subspaces of $\mathbb H^n$ such that the sum $\mathcal M' + \mathcal M$ is direct. Then there exists $\delta>0$ (which depends on $\mathcal M$ and $\mathcal M'$ only) such that, if $\mathcal N,\mathcal N'$ are subspaces of $\mathbb H^n$ and

$\max\{\theta(\mathcal M,\mathcal N),\,\theta(\mathcal M',\mathcal N')\} \le \delta,$ (2.6)

then the sum $\mathcal N' + \mathcal N$ is also direct.

Assume in addition that $\mathcal M' + \mathcal M = \mathbb H^n$. Then there exists $\delta_1>0$ such that if (2.6) holds (with $\delta$ replaced by $\delta_1$) for subspaces $\mathcal N,\mathcal N'$ of $\mathbb H^n$, then $\mathcal N' + \mathcal N = \mathbb H^n$ and

$\max\{\theta(\mathcal M,\mathcal N),\,\theta(\mathcal M',\mathcal N')\} \le \|P_{\mathcal M,\mathcal M'} - P_{\mathcal N,\mathcal N'}\|$ (2.7)

$\le \Big(4\,(1+2\|P_{\mathcal M,\mathcal M'}\|)\max_{x\in\mathcal M',\,\|x\|=1}\{d(x,\mathcal M)^{-1}\}\Big)\times\big(\theta(\mathcal M,\mathcal N) + \theta(\mathcal M',\mathcal N')\big),$ (2.8)

where the matrix $P_{\mathcal M,\mathcal M'}$ projects $\mathbb H^n$ onto $\mathcal M$ along $\mathcal M'$, whereas $P_{\mathcal N,\mathcal N'}$ projects $\mathbb H^n$ onto $\mathcal N$ along $\mathcal N'$.

(3) For all subspaces $\mathcal N_1,\mathcal N_2\subset\mathbb H^n$ and all invertible matrices $S\in\mathbb H^{n\times n}$, the inequalities

$(\|S\|\,\|S^{-1}\|)^{-1}\,\theta(\mathcal N_1,\mathcal N_2) \le \theta(S\mathcal N_1,S\mathcal N_2) \le \|S\|\,\|S^{-1}\|\,\theta(\mathcal N_1,\mathcal N_2)$

hold.

(4) Let $\mathcal E_1 + \mathcal E_2 = \mathbb H^n$ be a direct sum of subspaces. Then there exists a constant $C>0$ that depends on $\mathcal E_1,\mathcal E_2$ only such that

$\theta(\mathcal M_1,\mathcal M_2) \le C\,\theta(\mathcal M_1+\mathcal E_2,\,\mathcal M_2+\mathcal E_2)$

for every pair of subspaces $\mathcal M_1,\mathcal M_2\subset\mathcal E_1$.

Proof of (4). Let $S\in\mathbb H^{n\times n}$ be an invertible matrix such that $S\mathcal E_1$ and $S\mathcal E_2$ are orthogonal. Then for subspaces $\mathcal M_1,\mathcal M_2\subset\mathcal E_1$ the equality

$\theta(S\mathcal M_1,S\mathcal M_2) = \theta(S\mathcal M_1+S\mathcal E_2,\,S\mathcal M_2+S\mathcal E_2)$ (2.9)

is obvious. Now (3) gives

$\theta(\mathcal M_1,\mathcal M_2) \le \|S\|\,\|S^{-1}\|\,\theta(S\mathcal M_1,S\mathcal M_2) \le (\|S\|\,\|S^{-1}\|)^2\,\theta(\mathcal M_1+\mathcal E_2,\,\mathcal M_2+\mathcal E_2),$

using (2.9) in the middle step. □
The complex representation $\omega$ keeps the gap between subspaces within universal bounds (for fixed $n$):

$c_{n,n}\,\theta(\omega(\mathcal U),\omega(\mathcal V)) \le \theta(\mathcal U,\mathcal V) \le C_{n,n}\,\theta(\omega(\mathcal U),\omega(\mathcal V))$ (2.10)

for all subspaces $\mathcal U,\mathcal V\subset\mathbb H^n$, where the positive constants $c_{n,n}, C_{n,n}$ are taken from (2.1). Indeed, letting $u_1,\dots,u_p$ and $v_1,\dots,v_\ell$ be orthonormal bases for $\mathcal U$ and $\mathcal V$, respectively, we have (see Proposition 2.2)

$\theta(\mathcal U,\mathcal V) = \big\|[u_1\ \dots\ u_p][u_1\ \dots\ u_p]^* - [v_1\ \dots\ v_\ell][v_1\ \dots\ v_\ell]^*\big\|,$

$\theta(\omega(\mathcal U),\omega(\mathcal V)) = \big\|[\omega(u_1)\ \dots\ \omega(u_p)][\omega(u_1)\ \dots\ \omega(u_p)]^* - [\omega(v_1)\ \dots\ \omega(v_\ell)][\omega(v_1)\ \dots\ \omega(v_\ell)]^*\big\|,$

and so (2.10) follows from (2.1).
The rank of $A\in\mathbb H^{m\times n}$ is, by definition, the dimension of the range of $A$ or, equivalently, the dimension of the column space (understood as a right quaternion vector space) of $A$.

For $A\in\mathbb H^{m\times n}$, the pseudoinverse, or Moore–Penrose inverse, is defined as the matrix $A^+\in\mathbb H^{n\times m}$ that is the unique solution of the following system of equations:

$AA^+A = A, \quad A^+AA^+ = A^+, \quad (AA^+)^* = AA^+, \quad (A^+A)^* = A^+A.$

Let

$\mathcal T_{m,n,p} := \{A\in\mathbb H^{m\times n} : \operatorname{rank}A = p\}, \quad p = 1,2,\dots,\min\{m,n\},$

be the sets of quaternion matrices of fixed rank.

Theorem 2.5. The pseudoinverse is a (local) Lipschitz function on each of the sets $\mathcal T_{m,n,p}$; namely, given $A\in\mathbb H^{m\times n}$, there exist positive constants $\delta, K$ depending on $A$ only such that

$\|B^+ - A^+\| \le K\,\|B-A\|$

for all $B\in\mathcal T_{m,n,p}$ with $\|B-A\|\le\delta$.

For a proof see [16, 14] for the complex case; it can be extended easily to quaternion matrices using the complex representation $\omega$.

Formulas for orthogonal projections onto intersections and sums of subspaces can be given in terms of the relevant pseudoinverses:

Proposition 2.6. Let $\mathcal M,\mathcal N$ be subspaces in $\mathbb H^n$. Then

$P_{\mathcal M\cap\mathcal N} = 2P_{\mathcal N}(P_{\mathcal N}+P_{\mathcal M})^+P_{\mathcal M} \quad\text{and}\quad P_{\mathcal M+\mathcal N} = (P_{\mathcal M}+P_{\mathcal N})(P_{\mathcal M}+P_{\mathcal N})^+.$ (2.11)

This result is proved in [9] for complex matrices; the extension to quaternion matrices is immediate (the same proofs apply). For the second formula in (2.11) see also [1].
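Both formulas in (2.11) can be verified numerically with the Moore-Penrose inverse from `numpy.linalg.pinv` (real case; the subspaces below are our own construction):

```python
import numpy as np

rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # a random orthonormal frame

# M = span{q1, q2}, N = span{q2, q3}: intersection span{q2}, sum span{q1,q2,q3}.
PM = Q[:, :2] @ Q[:, :2].T
PN = Q[:, 1:3] @ Q[:, 1:3].T
P_int = np.outer(Q[:, 1], Q[:, 1])
P_sum = Q[:, :3] @ Q[:, :3].T

pinv = np.linalg.pinv
# Formulas (2.11) for the projections onto intersection and sum.
assert np.allclose(2 * PN @ pinv(PN + PM) @ PM, P_int)
assert np.allclose((PM + PN) @ pinv(PM + PN), P_sum)
```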
Next, we discuss briefly root subspaces of quaternionic matrices. Let $A\in\mathbb H^{n\times n}$, and let $p^{(A)}(x)$ be the minimal polynomial with real coefficients and leading coefficient 1 for $A$; in other words, $p^{(A)}(x)$ is the monic real polynomial of minimal degree such that $p^{(A)}(A)=0$. One easily verifies that the (real and complex) roots of $p^{(A)}(x)$ are exactly the eigenvalues of $A$ that belong to $\mathbb C$. Write

$p^{(A)}(x) = p_1(x)^{m_1}\cdots p_k(x)^{m_k},$

where the $p_j(x)$'s are distinct monic irreducible real polynomials (i.e., of the form $x-a$, $a$ real, or of the form $x^2+px+q$, $p,q\in\mathbb R$, with no real roots), and the $m_j$'s are positive integers. The subspace

$\mathcal M_j := \{u\in\mathbb H^n : p_j(A)^{m_j}u = 0\}, \quad j = 1,2,\dots,k,$

is called the root subspace of $A$ corresponding to the roots of $p_j(x)$. Obviously, the root subspaces of $A$ are $A$-invariant. We refer the reader to [13] for some elementary properties of minimal polynomials and root subspaces. In particular, $\mathcal M = \sum_{j=1}^{k}(\mathcal M\cap\mathcal M_j)$ for every $A$-invariant subspace $\mathcal M$. Also, root subspaces, and more generally their sums, are Lipschitz functions of a matrix:

Proposition 2.7.

(a) The roots of $p^{(A)}(x)$ depend continuously on $A$: Fix $A\in\mathbb H^{n\times n}$, and let $\lambda_1,\dots,\lambda_s$ be all the distinct roots of $p^{(A)}(x)$ in the closed upper complex half-plane $\mathbb C_+$. Then for every $\epsilon>0$ there exists $\delta>0$ such that if $B\in\mathbb H^{n\times n}$ satisfies $\|B-A\|<\delta$, then the roots of $p^{(B)}(x)$ in $\mathbb C_+$ are contained in the union $\bigcup_{j=1}^{s}\{z\in\mathbb C_+ : |z-\lambda_j|<\epsilon\}$.

(b) Sums of root subspaces of $A$ are Lipschitz functions of $A$: Given $A$ and $\lambda_1,\dots,\lambda_s$ as in part (a), there exist $\delta_0, K_0 > 0$ such that for every $B\in\mathbb H^{n\times n}$ satisfying $\|B-A\|<\delta_0$ the following holds: if $\Lambda$ is any nonempty subset of $\{\lambda_1,\dots,\lambda_s\}$ and $\Lambda'$ is the set of all eigenvalues of $B$ contained in $\bigcup_{\lambda\in\Lambda}\{z\in\mathbb C_+ : |z-\lambda|<\delta_0\}$, then the sum of root subspaces $\mathcal M'$ of $B$ corresponding to $\Lambda'\cup\overline{\Lambda'}$ and the sum of root subspaces $\mathcal M$ of $A$ corresponding to $\Lambda\cup\overline{\Lambda}$ satisfy the inequality

$\theta(\mathcal M,\mathcal M') \le K_0\,\|B-A\|.$ (2.12)
For the reader's convenience, we quote the main result (Theorem 2.8 below) on stability of invariant subspaces from [13], where a complete and detailed proof is given.

Let A ∈ H^{n×n}. An A-invariant subspace M ⊆ H^n is called stable if for every ε > 0 there exists δ > 0 such that every B ∈ H^{n×n} satisfying ‖B − A‖ < δ has a B-invariant subspace N ⊆ H^n with the property that θ(M, N) < ε. For a fixed α ≥ 1, an A-invariant subspace M is called α-stable if there exist δ_0, K_0 > 0 such that for every B ∈ H^{n×n} satisfying ‖B − A‖ ≤ δ_0 there exists a B-invariant subspace N with the property that

θ(M, N) ≤ K_0 ‖B − A‖^{1/α}.   (2.13)

Noting that θ(M, N) ≤ 1 for all subspaces M, N ⊆ H^n, an equivalent definition of an α-stable A-invariant subspace M is obtained by requiring that there exists K′_0 > 0 such that for every B ∈ H^{n×n} there is a B-invariant subspace N satisfying the inequality

θ(M, N) ≤ K′_0 ‖B − A‖^{1/α}.   (2.14)

Indeed, if (2.13) holds for all B ∈ H^{n×n} with ‖B − A‖ ≤ δ_0, then (2.14) holds with K′_0 = max{K_0, δ_0^{−1/α}} for all B ∈ H^{n×n}, where in the case ‖B − A‖ > δ_0 we take any B-invariant subspace for N. 1-stable A-invariant subspaces are called Lipschitz stable.

For two positive integers p < q, define
f(p, q) = { q,      if no p distinct q-th roots of 1 sum up to zero;
          { q − 1,  if there are p distinct q-th roots of 1 that sum up to zero.   (2.15)

In the following theorem, as well as in later statements, the following property of a matrix A ∈ H^{n×n} and its invariant subspace M will be used:

(ℵ(A, M))  the intersection of M with any root subspace M_0 of A corresponding to a real eigenvalue satisfies

dim(M ∩ M_0) ≤ 1   or   dim(M ∩ M_0) ≥ dim M_0 − 1.   (2.16)
A root subspace M_0 of A ∈ H^{n×n} is said to be of geometric multiplicity one if there is only one eigenvector (up to scaling) of A in M_0.

Theorem 2.8. Let A ∈ H^{n×n}, and let M ≠ {0} be an A-invariant subspace. Then:

(a) M is Lipschitz stable if and only if M is a sum of root subspaces.
(b) M is stable if and only if for every root subspace M_0 of A that contains at least two linearly independent eigenvectors of A, we have M ∩ M_0 = {0} or M_0 ⊆ M.
228 L. Rodman
(c) If M is α-stable, then M is stable, and for every root subspace M_0 of A of geometric multiplicity one and such that {0} ≠ M ∩ M_0 ≠ M_0, we have α ≥ f(dim(M ∩ M_0), dim M_0).

Conversely, assume (ℵ(A, M)) holds. Then, if M is stable, and if for every root subspace M_0 of A of geometric multiplicity one and such that {0} ≠ M ∩ M_0 ≠ M_0 we have α ≥ f(dim(M ∩ M_0), dim M_0), then M is α-stable.
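The lower bounds on α in part (c) reflect how eigenvalues of a perturbed Jordan block scatter: adding ε in the corner of the nilpotent block J_n(0) produces eigenvalues that are exactly the n-th roots of ε, of modulus ε^{1/n}, so invariant subspaces can move at the rate ‖B − A‖^{1/n}. A numerical check of this standard fact (our own illustration, not part of the proof):

```python
import numpy as np

n, eps = 4, 1e-8
J = np.diag(np.ones(n - 1), 1)    # nilpotent Jordan block J_4(0)
B = J.copy()
B[-1, 0] = eps                    # perturbation of norm eps in the corner

lams = np.linalg.eigvals(B)       # the eigenvalues are the 4th roots of eps
print(sorted(abs(lams)))          # each has modulus eps**(1/4) = 0.01
```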
Remark 2.9. The result of Theorem 2.8 holds for complex matrices and complex invariant subspaces without the hypothesis (ℵ(A, M)); see [3, 5, 12].

Open Problem 2.10. Is the converse statement in Theorem 2.8(c) valid under the following hypothesis (ℵ_0(A, M)), which is weaker than (ℵ(A, M))?

(ℵ_0(A, M))  the intersection of M with any root subspace M_0 of geometric multiplicity one of A corresponding to a real eigenvalue satisfies

dim(M ∩ M_0) ≤ 1   or   dim(M ∩ M_0) ≥ dim M_0 − 1.

The analogous question arises with respect to Theorem 3.3(a) and Theorem 6.1.
3. Main results
Let A ∈ H^{n×n}, and let λ_1, …, λ_s be all the distinct roots of p^{(A)}(x) in the closed upper complex half-plane C_+. Select ε_0 > 0 so that the disks

D_{ε_0}(λ_j) := {z ∈ C : |z − λ_j| < ε_0},   j = 1, 2, …, s,

do not intersect. By Proposition 2.7, there exists δ_0 > 0 such that if B ∈ H^{n×n} and ‖B − A‖ < δ_0, then all roots of p^{(B)}(x) in the closed upper half-plane are contained in ∪_{j=1}^{s} D_{ε_0}(λ_j).

Fix α ≥ 1. An A-invariant subspace M ⊆ H^n is said to be strongly α-stable if there exist ε ≤ δ_0 and K > 0 (that depend on A and M only) with the following property: for every B ∈ H^{n×n} such that ‖B − A‖ < ε, and every B-invariant subspace N such that

dim(N ∩ R_j(B)) = dim(M ∩ R_j(A)),   j = 1, 2, …, s,   (3.1)

the inequality

θ(N, M) ≤ K ‖B − A‖^{1/α}

holds; here we denote by R_j(X) the sum of the root subspaces of the matrix X corresponding to the roots of its minimal polynomial in the set D_{ε_0}(λ_j) ∪ D_{ε_0}(λ̄_j), j = 1, 2, …, s. Note that B-invariant subspaces N such that (3.1) holds do exist; indeed, it follows from the condition ‖B − A‖ < ε ≤ δ_0 and the properties of δ_0 (see the first paragraph of this section) that

dim R_j(B) = dim R_j(A),   j = 1, 2, …, s;

the existence of such N is then obvious.
When α = 1, we say that a strongly 1-stable A-invariant subspace is strongly Lipschitz stable. Finally, an A-invariant subspace M ⊆ H^n is said to be strongly stable if for every ε > 0 there exists δ > 0 (which can be taken ≤ δ_0) such that for every B ∈ H^{n×n} satisfying ‖B − A‖ < δ and every B-invariant subspace N satisfying (3.1), the inequality θ(M, N) < ε holds. Again, B-invariant subspaces N satisfying (3.1) do exist (if δ is taken sufficiently small).

Clearly, strong α-stability (for any α ≥ 1) implies strong stability; the converse is generally false (see Theorem 3.3 below). The concepts of strong α-stability and strong stability for invariant subspaces of complex matrices were introduced and studied in [12], and for a particular situation in [10].

Proposition 3.1. If an A-invariant subspace is strongly α-stable, then it is also α-stable. If an A-invariant subspace is strongly stable, then it is stable.

Proof. The result follows from the already mentioned fact that the set of B-invariant subspaces N for which (3.1) holds is nonempty provided B is sufficiently close to A (cf. Proposition 2.7(b) and Theorem 2.4(1)). □
The following characterization of strong α-stability is proved in [12] for complex matrices. The definition of strongly α-stable invariant subspaces in the context of complex matrices is analogous to the above definition, with all considerations restricted to complex matrices and invariant subspaces in C^n.

Theorem 3.2. Let A ∈ C^{n×n}. An A-invariant subspace M ⊆ C^n is strongly α-stable in the context of complex matrices if and only if for every (complex) eigenvalue λ of A that has at least two linearly independent associated (complex) eigenvectors, we have

M ∩ {v ∈ C^n : (A − λI)^n v = 0} = {0}

or

M ⊇ {v ∈ C^n : (A − λI)^n v = 0},

and for every (complex) eigenvalue λ of A that has a unique (up to scaling) (complex) eigenvector and such that

{0} ≠ M ∩ {v ∈ C^n : (A − λI)^n v = 0} ≠ {v ∈ C^n : (A − λI)^n v = 0}

we have

α ≥ dim {v ∈ C^n : (A − λI)^n v = 0}.

We prove an analogous result in the context of quaternion matrices, imposing the additional condition (ℵ(A, M)) as necessary:

Theorem 3.3. Let A ∈ H^{n×n}, and let M be an A-invariant subspace. Then:

(a) If M is strongly α-stable, then M is strongly stable, and for every root subspace M_0 of A of geometric multiplicity one and such that {0} ≠ M ∩ M_0 ≠ M_0, we have α ≥ dim M_0.
Conversely, assume property (ℵ(A, M)) holds. If M is strongly stable, and for every root subspace M_0 of A of geometric multiplicity one and such that {0} ≠ M ∩ M_0 ≠ M_0 we have α ≥ dim M_0, then M is strongly α-stable.

(b) M is strongly Lipschitz stable if and only if M is a sum of root subspaces of A.
(c) M is strongly stable if and only if M is stable.

Remark 3.4. We do not know whether or not the property (ℵ(A, M)) is essential in the converse part of Theorem 3.3(a).

The proof of Theorem 3.3 will be given in the next two sections.
4. Preliminaries for the proof of Theorem 3.3
The following fact is the key; it allows us to reduce the proof to the case of just one root subspace.

Fact 4.1. Let A ∈ H^{n×n}, and let M be an A-invariant subspace. Fix α ≥ 1. Then M is strongly stable or strongly α-stable if and only if for every sum of root subspaces R ⊆ H^n for A the intersection M ∩ R is strongly stable or strongly α-stable, respectively, as an A|_R-invariant subspace.

We provide details of the proof for the case of strong α-stability only (the case of strong stability can be proved analogously). The proof will be accomplished by proving Steps 1, 2, 3, 4 below (often parallel to the proof of [13, Fact 4.2]).

Let λ_1, …, λ_s be all the distinct roots of p^{(A)}(x) in C_+, the closed upper complex half-plane. By Proposition 2.7, there exists δ_0 > 0 such that if B ∈ H^{n×n} and ‖B − A‖ < δ_0, then all roots of p^{(B)}(x) in the closed upper half-plane are contained in ∪_{j=1}^{s} D_{ε_0}(λ_j), where ε_0 > 0 is selected so that the disks D_{ε_0}(λ_j), j = 1, 2, …, s, do not intersect.

Assuming X ∈ H^{n×n} satisfies ‖X − A‖ < δ_0, we let R_j(X) be the sum of root subspaces of X corresponding to the roots of the minimal polynomial of X in D_{ε_0}(λ_j) ∪ D_{ε_0}(λ̄_j), j = 1, 2, …, s.

Fix R, a sum of root subspaces for A, and let R_1, …, R_k be all the root subspaces of A contained in R. We denote by K_0, K_1, … positive constants that depend on A and N only.

Step 1. If an A-invariant subspace N ⊆ R is strongly α-stable as an A|_R-invariant subspace, then N is also strongly α-stable as an A-invariant subspace.

Proof of Step 1. Suppose not. Then there exists a sequence {B_m}_{m=1}^{∞}, B_m ∈ H^{n×n}, such that ‖B_m − A‖ < m^{−1}, m = 1, 2, …, and for some B_m-invariant subspace N_m such that

dim(N_m ∩ R_j(B_m)) = dim(N ∩ R_j(A)),   j = 1, 2, …, s,   (4.1)

we have

θ(N_m, N) ≥ m ‖B_m − A‖^{1/α},   m = 1, 2, … .   (4.2)
Denote

R_m = Σ_{j=1}^{k} R_j(B_m),   m = 1, 2, … .

For sufficiently large m, the subspace R_m is a direct complement of R^⊥, the orthogonal complement of R in H^n (by (2.12) and Theorem 2.4(2)). For such m, we define the linear transformation T_m : H^n → H^n by

T_m = [ I   (−P_{R_m, R^⊥} + P_R)|_R ]
      [ 0   I                        ]

with respect to the orthogonal decomposition H^n = R^⊥ ⊕ R, where P_{R_m, R^⊥} is the projection on R_m along R^⊥. (The operator (−P_{R_m, R^⊥} + P_R)|_R is known as the angular operator; cf. [2].) Clearly, T_m R^⊥ = R^⊥. Also, T_m R_m = R. Now

‖T_m − I‖ = ‖(−P_{R_m, R^⊥} + P_R)|_R‖ ≤ ‖−P_{R_m, R^⊥} + P_R‖
          ≤ K_0 θ(R_m, R)        (by Theorem 2.4(2))
          ≤ K_1 ‖B_m − A‖        (by Proposition 2.7(b)).   (4.3)

Next, let C_m = T_m B_m T_m^{−1}; then T_m N_m and R are C_m-invariant. It is easy to see (in view of (4.3)) that

‖C_m − A‖ ≤ K_2 ‖B_m − A‖.   (4.4)

Now

θ(T_m N_m, N_m) = max{ sup_{x ∈ N_m, ‖x‖=1} d(x, T_m N_m),  sup_{y ∈ T_m N_m, ‖y‖=1} d(y, N_m) }
               ≤ max{ ‖T_m − I‖, ‖T_m^{−1} − I‖ } ≤ K_3 ‖B_m − A‖,

and

θ(T_m N_m, N) ≥ θ(N_m, N) − θ(T_m N_m, N_m)
             ≥ m ‖B_m − A‖^{1/α} − K_3 ‖B_m − A‖
             ≥ (m − K_3) ‖B_m − A‖^{1/α}
             ≥ (m − K_3) K_2^{−1/α} ‖C_m − A‖^{1/α}.   (4.5)

Restricting C_m to R (recall that R is C_m-invariant), in view of (4.1) and (4.5), a contradiction with the strong α-stability of N as an A|_R-invariant subspace is obtained. □
Step 2. If an A-invariant subspace N ⊆ R is strongly α-stable as an A-invariant subspace, then N is also strongly α-stable as an A|_R-invariant subspace.

Proof of Step 2. Suppose not. Then there exists a sequence {B_{m,R}}_{m=1}^{∞}, with B_{m,R} a linear transformation on R, and there exists a B_{m,R}-invariant subspace N_m, with the following properties:

(1) ‖B_{m,R} − A|_R‖ < m^{−1}, m = 1, 2, …;
(2) dim(N_m ∩ R_j(B_{m,R})) = dim(N ∩ R_j(A)) for j = 1, 2, …, k;
(3) θ(N_m, N) ≥ m ‖B_{m,R} − A|_R‖^{1/α} for m = 1, 2, … .
As in the proof of Step 2 of [13, Fact 4.2], let

B_m = [ A|_{R^c}   0       ]
      [ 0          B_{m,R} ],   m = 1, 2, …,

with respect to the decomposition H^n = R^c ∔ R, where R^c is the sum of the root subspaces of A not contained in R. Then ‖B_m − A‖ = K_4 ‖B_{m,R} − A|_R‖, and B_m → A as m → ∞. By the strong α-stability of N we have, for sufficiently large m and every B_m-invariant subspace N′_m:

dim(N′_m ∩ R_j(B_m)) = dim(N ∩ R_j(A)),   j = 1, 2, …, s
   ⟹   θ(N′_m, N) ≤ K_5 ‖B_m − A‖^{1/α} ≤ K_5 K_4^{1/α} ‖B_{m,R} − A|_R‖^{1/α}.   (4.6)

Applying (4.6) with N′_m = N_m, we obtain a contradiction with item (3). □
Step 3. If an A-invariant subspace N is strongly α-stable, then N ∩ R is strongly α-stable as an A|_R-invariant subspace.

Proof of Step 3. Let Y : R → R be a linear transformation sufficiently close to A|_R, and let N′ be a Y-invariant subspace such that

dim(N′ ∩ R_j(Y)) = dim((N ∩ R) ∩ R_j(A)),   j = 1, 2, …, k.   (4.7)

Define Zx = Yx if x ∈ R, and Zx = Ax if x ∈ R^c. By the strong α-stability of N we have

θ(N, N′ + (N ∩ R^c)) ≤ K_6 ‖Z − A‖^{1/α}.

It is easy to see that

‖Z − A‖ ≤ K_7 ‖Y − A|_R‖.

Now

θ(N ∩ R, N′) ≤ K_8 θ(N, N′ + (N ∩ R^c))     (by Theorem 2.4(4), with E_1 = R, E_2 = N ∩ R^c)
            ≤ K_8 K_6 K_7^{1/α} ‖Y − A|_R‖^{1/α},

and the strong α-stability of N ∩ R follows. □

Step 4. Let R_1, …, R_r be all the distinct root subspaces of A, and assume that an A-invariant subspace N is such that every intersection N ∩ R_j is strongly α-stable as an A|_{R_j}-invariant subspace, j = 1, 2, …, r. Then N is strongly α-stable as an A-invariant subspace.

Proof of Step 4. By Step 1, N ∩ R_j is strongly α-stable as an A-invariant subspace, for j = 1, 2, …, r. Now repeat the arguments in the proof of [13, Step 4 of Fact 4.2], using [13, Theorem 2.12]. □
5. Proof of Theorem 3.3
Part (b) of Theorem 3.3 follows easily from (a). Indeed, if M is strongly Lipschitz stable, then by part (a) we have that M is a sum of root subspaces. Conversely, assume M is a sum of root subspaces. By Fact 4.1 it follows that M is strongly Lipschitz stable.

Consider now part (a). In view of Fact 4.1 we may (and do) assume that H^n is a root subspace for A. If H^n contains two linearly independent eigenvectors of A, then by Theorem 2.8 a nontrivial (i.e., not equal to {0} or to H^n) A-invariant subspace cannot be stable; hence it cannot be strongly stable by Proposition 3.1, and we are done in this case: there are no nontrivial strongly α-stable A-invariant subspaces.

Thus, suppose that A has only one eigenvector (up to scaling). Using Theorem 2.1, we may assume without loss of generality that A = J_n(λ), where λ ∈ C has nonnegative imaginary part. Then part (a) amounts to the following two statements; here and in the sequel we denote by e_i the vector having 1 in the i-th position and zeros elsewhere:

Statement 5.1. Assume (ℵ(A, M)) holds. Then there exist δ, K > 0 such that for every B ∈ H^{n×n} with ‖B − A‖ < δ and every p-dimensional B-invariant subspace N ⊆ H^n we have

θ(N, Span{e_1, …, e_p}) ≤ K ‖B − A‖^{1/n}.   (5.1)

Statement 5.2. For every p = 1, 2, …, n − 1 and for every α < n, there exists a sequence {B_m}_{m=1}^{∞}, B_m ∈ H^{n×n}, and there exists a p-dimensional B_m-invariant subspace M_m ⊆ H^n, m = 1, 2, …, such that lim_{m→∞} B_m = A and
θ(M_m, Span{e_1, …, e_p}) ≥ m ‖B_m − A‖^{1/α}.

Proof of Statement 5.1. Consider first the case when λ is nonreal. We use the complex representation ω of quaternion matrices; note that

P ω(A) P^{−1} = [ J_n(λ)   0       ]
                [ 0        J_n(λ̄) ]  ∈ C^{2n×2n}

for a suitable permutation matrix P.

By Theorem 3.2, the following claim holds true: there exist δ′, K′ > 0 such that for every B′ ∈ C^{2n×2n} with ‖B′ − Pω(A)P^{−1}‖ < δ′ and every 2p-dimensional B′-invariant subspace M′ ⊆ C^{2n} we have

θ(M′, Span{e_1, …, e_ℓ, e_{n+1}, …, e_{n+2p−ℓ}}) ≤ K′ ‖B′ − Pω(A)P^{−1}‖^{1/n}   (5.2)

for some ℓ, max{2p − n, 0} ≤ ℓ ≤ min{n, 2p}, which may depend on M′. Note the fact that, for B ∈ H^{n×n} and its invariant subspace N, the subspace ω(N) ⊆ C^{2n} (see Proposition 2.2 for the definition) is ω(B)-invariant. We apply the claim to matrices B′ of the form B′ = Pω(B)P^{−1} for some B ∈ H^{n×n} and to their invariant subspaces of the form

M′ = P (Col(ω(v_1), …, ω(v_p))),
where Col(ω(v_1), …, ω(v_p)) is the subspace spanned by the columns of ω(v_1), …, ω(v_p), and {v_1, …, v_p} is a basis of a B-invariant subspace N. In view of Proposition 2.3, the Jordan form of such a matrix B′ restricted to M′ is symmetric with respect to the real axis, so in fact we must have ℓ = p in (5.2). We now have, for a B-invariant p-dimensional subspace N ⊆ H^n:

θ(N, Span{e_1, …, e_p}) ≤ C_{n,p} θ(ω(N), ω(Span{e_1, …, e_p}))     (by Propositions 2.2 and 2.1)
   = C_{n,p} θ(P(ω(N)), P(ω(Span{e_1, …, e_p})))
   ≤ C_{n,p} K′ ‖ω(B) − ω(A)‖^{1/n}     (by (5.2))
   ≤ c_{n,p}^{−1} C_{n,p} K′ ‖B − A‖^{1/n}     (by (2.1)),

and Statement 5.1 follows in the case of a nonreal λ. In the case when λ is real, the result follows from Theorem 2.8. □

Proof of Statement 5.2. In view of Theorem 3.2, there are sequences of matrices {B_m}_{m=1}^{∞}, B_m ∈ C^{n×n}, and of complex subspaces {M_m}_{m=1}^{∞}, M_m ⊆ C^n, with the required properties. □
Finally, consider (c). Assume that M is stable. We need to prove that M is strongly stable. By Fact 4.1, we may assume that H^n is the (sole) root subspace of A. Ignoring the trivial cases M = {0} and M = H^n, in view of the description of nontrivial stable A-invariant subspaces (Theorem 2.8(b)) we may further assume that A = J_n(λ) for some λ ∈ C. Arguing by contradiction, assume M is not strongly stable. Letting p = dim M, there exists ε_0 > 0 such that for some sequence {B_m}_{m=1}^{∞}, B_m ∈ H^{n×n}, we have ‖B_m − A‖ < m^{−1} and θ(M_m, M) ≥ ε_0 for some B_m-invariant subspace M_m of dimension p. Passing to a subsequence, we may assume that {M_m} converges in the gap norm: lim_{m→∞} M_m = N for some subspace N. We then have θ(M, N) ≥ ε_0 and dim N = p.

Moreover, N is A-invariant. Indeed, let u_1, …, u_p be a basis for N. Then there exist u_{m,1}, …, u_{m,p} such that u_{m,ℓ} ∈ M_m, m = 1, 2, …, and

lim_{m→∞} u_{m,ℓ} = u_ℓ,   ℓ = 1, 2, …, p.

Clearly, for sufficiently large m (which will be assumed), the vectors u_{m,1}, …, u_{m,p} form a basis for M_m. Since M_m is B_m-invariant, we have

B_m [u_{m,1} … u_{m,p}] = [u_{m,1} … u_{m,p}] S_m   (5.3)

for some matrix S_m. In fact,

S_m = (U_m^* U_m)^{−1} U_m^* B_m U_m,

where U_m = [u_{m,1} … u_{m,p}], and the invertibility of U_m^* U_m follows from the linear independence of the columns of U_m. Passing to the limit as m → ∞ in (5.3), the A-invariance of N follows.

But now we obtain a contradiction, because A has only one invariant subspace of the fixed dimension p (cf. [13, Proposition 2.10]).
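The formula for S_m used with (5.3) is simply the matrix of B_m restricted to the invariant subspace, computed through the basis matrix U_m. A minimal numerical sanity check on an example of our own:

```python
import numpy as np

# For a B-invariant subspace with basis matrix U (full column rank),
# B U = U S holds with S = (U* U)^{-1} U* B U.
B = np.array([[2.0, 1.0], [0.0, 2.0]])
U = np.array([[1.0], [0.0]])          # span{e1} is B-invariant
S = np.linalg.inv(U.T @ U) @ U.T @ B @ U
print(S)                              # the restriction of B to span{e1}
```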
6. Strong α-stability: alternative formulation

In this section we recast the strong α-stability property of invariant subspaces in a different form that does not involve the equalities (3.1) and is more in the spirit of the definition of α-stability. The next theorem provides the alternative formulation of the strong stability property.
Theorem 6.1. Fix α ≥ 1. In the following statements, (1) implies (2) for A ∈ H^{n×n} and an A-invariant subspace M ⊆ H^n:

(1) M is strongly α-stable;
(2) there are positive constants δ_1, δ′_1, K_1 that depend on A and M only, such that the set of all B-invariant subspaces N for which the inequality θ(N, M) ≤ δ′_1 holds is non-empty for every B ∈ H^{n×n} satisfying ‖B − A‖ ≤ δ_1, and

θ(N, M) ≤ K_1 ‖B − A‖^{1/α}   (6.1)

holds for every such N.

Assume in addition that the property (ℵ(A, M)) is satisfied. Then the conditions (1) and (2) are equivalent.
We do not state a parallel version for strong stability because, by Theorem 3.3, strong stability is equivalent to stability, and therefore the definition of stability can be thought of as an alternative version of strong stability.

The rest of this section is devoted to the proof of Theorem 6.1. We need two lemmas.

Lemma 6.2. Let E_1, E_2, B_1, B_2 be subspaces of H^n such that

E_1 ∔ E_2 = H^n,   B_1 ⊆ E_1,   B_2 ⊆ E_2.

Then there exists δ_3 > 0, depending on E_1, E_2, B_1, B_2 only, with the following property: if A, E′_1, E′_2 ⊆ H^n are subspaces such that

θ(A, B_1 ∔ B_2) < δ_3,   θ(E′_1, E_1) < δ_3,   θ(E′_2, E_2) < δ_3,

and

A = (A ∩ E′_1) + (A ∩ E′_2),   (6.2)

then

dim(A ∩ E′_j) = dim B_j,   j = 1, 2.

Proof. By Theorem 2.4(2) we have E′_1 ∔ E′_2 = H^n (if δ_3 is taken sufficiently small); hence (A ∩ E′_1) + (A ∩ E′_2) is a direct sum, and

dim B_1 + dim B_2 = dim A = dim(A ∩ E′_1) + dim(A ∩ E′_2).

Thus, it suffices to prove that

dim(A ∩ E′_j) ≤ dim B_j,   j = 1, 2.
We prove this for j = 1; the case j = 2 is analogous. Assume not. Then there exist sequences of subspaces {A_m, E′_{1,m}, E′_{2,m}}_{m=1}^{∞} such that

A_m → B_1 ∔ B_2,   E′_{j,m} → E_j,   j = 1, 2,   (6.3)

as m → ∞,

A_m = (A_m ∩ E′_{1,m}) + (A_m ∩ E′_{2,m}),   m = 1, 2, …,

but

dim(A_m ∩ E′_{1,m}) > dim B_1,   m = 1, 2, … .   (6.4)

Using the compactness of the metric space of subspaces of H^n, we may assume that the sequence {A_m ∩ E′_{1,m}}_{m=1}^{∞} converges to a subspace W. Take x ∈ W. Then there is a sequence {x_m}_{m=1}^{∞}, x_m ∈ A_m ∩ E′_{1,m}, such that lim_{m→∞} x_m = x. By (6.3) we also have x ∈ E_1 and x ∈ B_1 ∔ B_2; hence (because B_1 ⊆ E_1) x ∈ B_1. Thus W ⊆ B_1, a contradiction with our assumption (6.4) (note that dim(A_m ∩ E′_{1,m}) = dim W for large m by Theorem 2.4(1)). □
Lemma 6.3. Given A and M as in Theorem 6.1, for every ε_4 > 0 there exists δ_4 > 0 with the property that for each B ∈ H^{n×n} and for each B-invariant subspace N such that

‖B − A‖ < δ_4,   θ(N, M) < δ_4,   (6.5)

the inequality

max_{j=1,2,…,s} {θ(N ∩ R_j(B), M ∩ R_j(A))} ≤ ε_4

holds, where the maximum is taken over all sums of root subspaces R_1(A), …, R_s(A) for A, and where the sums of root subspaces R_1(B), …, R_s(B) for B are such that the eigenvalues of B to which R_j(B) corresponds are in a neighborhood of the eigenvalues of A to which R_j(A) corresponds, for j = 1, 2, …, s.
Proof. Denote by R_j(A)^c, resp. R_j(B)^c, the sum of root subspaces for A, resp. B, which is a direct complement to R_j(A), resp. R_j(B), in H^n, and let P_{R_j(A),R_j(A)^c}, resp. P_{R_j(B),R_j(B)^c}, be the projection on R_j(A) along R_j(A)^c, resp. on R_j(B) along R_j(B)^c. In what follows, we denote by K_0, K_1, … positive constants that depend only on A and M. By Theorem 2.4(2),

‖P_{R_j(B),R_j(B)^c} − P_{R_j(A),R_j(A)^c}‖ ≤ K_0 (θ(R_j(B), R_j(A)) + θ(R_j(B)^c, R_j(A)^c)) ≤ K_0 K_1 ‖B − A‖,   (6.6)

where the second inequality follows from Proposition 2.7(b). Proposition 2.6 gives

θ(N ∩ R_j(B), M ∩ R_j(A)) = 2 ‖P_N (P_N + P_{R_j(B)})^+ P_{R_j(B)} − P_M (P_M + P_{R_j(A)})^+ P_{R_j(A)}‖;   (6.7)

here N is a B-invariant subspace. On the other hand, taking δ_4 sufficiently small and assuming (6.5) holds, Lemma 6.2 (applied with E_1 = R_j(A), E_2 = R_j(A)^c,
B_1 = M ∩ R_j(A), B_2 = M ∩ R_j(A)^c, E′_1 = R_j(B), E′_2 = R_j(B)^c, A = N), together with (6.6) and Theorem 2.4(1), yields that

dim(N ∩ R_j(B)) = dim(M ∩ R_j(A)),   j = 1, 2, …, s,

and hence

dim(N + R_j(B)) = dim(M + R_j(A)),   j = 1, 2, …, s.

Since the range of P_N + P_{R_j(B)} is equal to N + R_j(B) (see [9, Corollary 2], for example), and similarly for A, we now have

rank(P_N + P_{R_j(B)}) = rank(P_M + P_{R_j(A)}),   j = 1, 2, …, s.

Using Theorem 2.5, formula (6.7) gives the result of Lemma 6.3. □
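The intersection-projection formula behind (6.7) is the one constructed in [9]: for orthogonal projections P_M and P_N, the orthogonal projection onto M ∩ N equals 2 P_M (P_M + P_N)^+ P_N, where ^+ denotes the Moore–Penrose inverse. A numerical check on a simple example of our own:

```python
import numpy as np

def proj(U):
    """Orthogonal projector onto the column span of U."""
    return U @ np.linalg.pinv(U)

PM = proj(np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]))  # xy-plane in R^3
PN = proj(np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]))  # xz-plane in R^3

P_int = 2 * PM @ np.linalg.pinv(PM + PN) @ PN   # projector onto the intersection
P_line = proj(np.array([[1.0], [0.0], [0.0]]))  # the x-axis = xy-plane ∩ xz-plane
print(np.allclose(P_int, P_line))               # True
```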
Proof of Theorem 6.1, (1) ⟹ (2). Assume (1) holds. Then by Lemma 6.3 (taking ε_4 < 1 and using Theorem 2.4(1)) the equalities (3.1) are guaranteed, and we have

θ(N, M) ≤ K_1 ‖B − A‖^{1/α}

for every B-invariant subspace N provided the inequality

max{θ(N, M), ‖B − A‖} ≤ δ_4/2

holds. It remains to prove that for some δ_1, 0 < δ_1 ≤ δ_4/2, the set of all B-invariant subspaces N for which the inequality θ(N, M) ≤ δ_4/2 holds is non-empty for every B ∈ H^{n×n} satisfying ‖B − A‖ ≤ δ_1. To this end we take advantage of the fact that under (1), M is α-stable. Thus, there exist δ′, K′ > 0 such that

‖B − A‖ ≤ δ′   ⟹   there is a B-invariant N such that θ(N, M) ≤ K′ ‖B − A‖^{1/α}.

Now take

δ_1 = min{δ′, δ_4/2, (δ_4/(2K′))^α}. □
Proof of Theorem 6.1 in the complex case. The implication (1) ⟹ (2) is verified as in the quaternion case. Assume now that (2) holds. Then M is in particular α-stable, and using the description of α-stability in the complex case (Theorem 2.8, Remark 2.9), we easily reduce the proof to the case when A = J_n(0). Theorem 3.2 shows that M is strongly n-stable. Thus, if α ≥ n, we are done. However, if α < n, then (6.1) cannot hold (unless M is trivial: M = {0} or M = H^n).

Indeed, let p, 1 ≤ p ≤ n − 1, be the (complex) dimension of

M := Range [ I_p       ]
           [ 0_{n−p,p} ],

and let μ_1, …, μ_p be a set of distinct n-th roots of unity that do not sum up to zero. For ε > 0, let B_ε ∈ C^{n×n} be the matrix obtained from A = J_n(0) by adding ε in the lower left corner. Clearly, ‖B_ε − A‖ = ε. Letting N_ε be the B_ε-invariant subspace spanned by the eigenvectors of B_ε corresponding to the eigenvalues μ_1 ε^{1/n}, …, μ_p ε^{1/n}, one verifies that

N_ε = Range [ I_p ]
            [ X_ε ],
where X_ε ∈ C^{(n−p)×p} has μ_1 ε^{1/n} + ⋯ + μ_p ε^{1/n} in its top left corner (see the proof of [7, Lemma 16.5.2]; also [13, Lemma 4.16]). Now [13, Lemma 5.2(a3)] guarantees the existence of a constant K_{0,1} > 0, which depends on n only, such that

θ(M, N_ε) ≥ K_{0,1} |μ_1 + ⋯ + μ_p| ε^{1/n}   (6.8)

as long as ε ≤ 1. Letting ε → 0, a contradiction with (6.1) is obtained. □

Proof of Theorem 6.1, (2) ⟹ (1). Under the additional hypothesis that (ℵ(A, M)) holds, the proof of (2) ⟹ (1) is essentially the same as in the complex case (use Theorem 3.3(a) instead of Theorem 3.2). □
References
[1] W.N. Anderson, Jr., and R.J. Duffin. Series and parallel addition of matrices, J. Math. Anal. Appl., 26 (1969), 576–594.
[2] H. Bart, I. Gohberg, and M.A. Kaashoek. Minimal factorization of matrix and operator functions, Operator Theory: Advances and Applications, Vol. 1, Birkhäuser, 1979.
[3] H. Bart, I. Gohberg, and M.A. Kaashoek. Stable factorizations of monic matrix polynomials and stable invariant subspaces, Integral Equations Operator Theory, 1 (1978), no. 4, 496–517.
[4] H. Bart, I. Gohberg, M.A. Kaashoek, and A.C.M. Ran. Factorization of matrix and operator functions: the state space method, Operator Theory: Advances and Applications, Vol. 178, Birkhäuser, 2008.
[5] S. Campbell and J. Daughtry. The stable solutions of quadratic matrix equations, Proc. Amer. Math. Soc., 74 (1979), no. 1, 19–23.
[6] I. Gohberg, P. Lancaster, and L. Rodman. Matrix Polynomials, Academic Press, 1982; republication, SIAM, 2009.
[7] I. Gohberg, P. Lancaster, and L. Rodman. Invariant Subspaces of Matrices with Applications, John Wiley, 1986; republication, SIAM, 2006.
[8] I. Gohberg, P. Lancaster, and L. Rodman. Matrices and Indefinite Scalar Products, Operator Theory: Advances and Applications, Vol. 8, Birkhäuser, Basel and Boston, 1983.
[9] R. Piziak, P.L. Odell, and R. Hahn. Constructing projections for sums and intersections, Computers and Mathematics with Applications, 17 (1999), 67–74.
[10] A.C.M. Ran and L. Rodman. Stability of invariant Lagrangian subspaces I, Operator Theory: Advances and Applications, 32 (1988), 181–218.
[11] A.C.M. Ran and L. Rodman. The rate of convergence of real invariant subspaces, Linear Algebra Appl., 207 (1994), 197–224.
[12] A.C.M. Ran and L. Roozemond. On strong α-stability of invariant subspaces of matrices, Operator Theory: Advances and Applications, 40 (1989), 427–435.
[13] L. Rodman. Stability of invariant subspaces of quaternion matrices, Complex Analysis and Operator Theory, 6 (2012), 1069–1119.
[14] G.W. Stewart. On the continuity of the generalized inverse, SIAM J. Appl. Math., 17 (1969), 33–45.
[15] N.A. Wiegmann. Some theorems on matrices with real quaternion entries, Canadian J. of Math., 7 (1955), 191–201.
[16] F. Zhang. Quaternions and matrices of quaternions, Linear Algebra Appl., 251 (1997), 21–57.
Leiba Rodman
Department of Mathematics
College of William and Mary
Williamsburg, VA 23187-8795, USA
e-mail: [email protected]
Operator Theory: Advances and Applications, Vol. 237, 241–246
© 2013 Springer Basel
Determinantal Representations of Stable Polynomials
Hugo J. Woerdeman
Dedicated to Leonia Lerer on the occasion of his seventieth birthday
Abstract. For every stable multivariable polynomial p, with p(0) = 1, we construct a determinantal representation

p(z) = det(I − W(z)),

where W(z) is a matrix-valued rational function with ‖W(z)‖ ≤ 1 and ‖W(z)^n‖ < 1 for z ∈ T^d, and W(cz) = cW(z) for all c ∈ R ∖ {0}.

Mathematics Subject Classification (2010). 15A15; 11C20, 47A13, 47A48.

Keywords. Determinantal representation; multivariable polynomial; stable polynomial.
1. Introduction
A polynomial p(z) = p(z_1, …, z_d) is called stable if p(z) ≠ 0 for z ∈ D̄^d, where D is the open unit disk in C and D̄ is its closure. We shall also use the notation T for the unit circle. The polynomial p(z) is called semi-stable when p(z) ≠ 0 for z ∈ D^d. It is an open question whether every multivariable stable polynomial p(z) with p(0) = 1 can be written as

p(z) = det(I − K Z(z)),

where Z(z) is a diagonal matrix with coordinate variables on the diagonal and K is a strict contraction. For one and two variable polynomials such a representation always exists; in one variable it is an easy consequence of the fundamental theorem of algebra, while in two variables the result follows from [8, Theorem 1] and [7]. The question of the existence of representations of the form det(I − K Z(z)) was the topic of the paper [6], where such representations were shown to lead to rational
The author was partially supported by NSF grant DMS-0901628.
inner functions in the Schur–Agler class. As a consequence of these results, it can be seen that for 5/6 < c < 1, the stable polynomial

p(z_1, z_2, z_3) = 1 + (c/5) z_1 z_2 z_3 (z_1²z_2² + z_2²z_3² + z_3²z_1² − 2z_1z_2z_3² − 2z_1z_2²z_3 − 2z_1²z_2z_3)

cannot be represented as det(I − K Z(z)) where K is a 9 × 9 (or smaller size) contraction. It is an open question whether such a representation exists with a larger size contraction K. Our main result shows, however, that we can find a representation p(z) = det(I − W(z)), with W(z) a rational matrix function satisfying ‖W(z)‖ ≤ 1 and ‖W(z)^7‖ < 1, z ∈ T³, and W(cz) = cW(z). In fact, for this particular polynomial p, one may choose W(z) to be the 7 × 7 rational matrix function
W(z) = [ 0            z_1  0    0    0    0    0   ]
       [ 0            0    z_1  0    0    0    0   ]
       [ 0            0    0    z_1  0    0    0   ]
       [ 0            0    0    0    z_1  0    0   ]
       [ 0            0    0    0    0    z_1  0   ]
       [ 0            0    0    0    0    0    z_1 ]
       [ q(z)z_1^{−6} 0    0    0    0    0    0   ],

where q(z) := 1 − p(z); expanding the determinant along the first column gives det(I − W(z)) = 1 − q(z) = p(z).
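This representation can be verified symbolically. The sketch below assumes the corner entry of W(z) is q(z)z_1^{−6} with q(z) = 1 − p(z) (the choice consistent with det(I − W(z)) = p(z)), and fixes the sample value c = 9/10 in (5/6, 1):

```python
from sympy import Rational, eye, simplify, symbols, zeros

z1, z2, z3 = symbols('z1 z2 z3')
c = Rational(9, 10)               # sample value with 5/6 < c < 1

p = 1 + (c/5)*z1*z2*z3*(z1**2*z2**2 + z2**2*z3**2 + z3**2*z1**2
                        - 2*z1*z2*z3**2 - 2*z1*z2**2*z3 - 2*z1**2*z2*z3)
q = 1 - p                         # assumed corner numerator

W = zeros(7, 7)
for i in range(6):
    W[i, i + 1] = z1              # superdiagonal of z1's
W[6, 0] = q / z1**6               # degree-1 homogeneous corner entry

d = (eye(7) - W).det()
print(simplify(d - p))            # 0, i.e. det(I - W(z)) = p(z)
```

Note also that W(z)^7 = q(z)·I, which is why ‖W(z)^7‖ < 1 on the torus.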
Our main result is the following determinantal characterization of stable polynomials. Recall that the total degree tdeg p of a polynomial p is the maximum among the total degrees of all its terms, where the total degree of the monomial z_1^{n_1} ⋯ z_d^{n_d} is n_1 + ⋯ + n_d.

Theorem 1.1. Let p be a polynomial in d variables. Put n = tdeg p. Then p is stable with p(0) = 1 if and only if for some q ∈ N there exists a q × q rational matrix-valued function W(z), which has no singularities on the set ∪_{r>0} T_r^d, so that

(i) p(z) = det(I_q − W(z));
(ii) W(z) is contractive and W(z)^n is strictly contractive for all z ∈ T^d;
(iii) W(cz) = cW(z) for all c ∈ R ∖ {0} and z ∈ T^d.

In [13] real zero polynomials were studied, for which a desirable determinantal representation is det(I + x_1 A_1 + ⋯ + x_d A_d) with A_1, …, A_d symmetric. Not all real zero polynomials have such a representation; see [12]. In [13, Theorem 3.1] it was shown, however, that every square-free real zero polynomial can be written as det(I − W(x)), where W(x) is a symmetric rational matrix function and W(cx) = cW(x), c ∈ R, x ∈ R^d. Our Theorem 1.1 can be seen as an analog of [13, Theorem 3.1] in the setting of stable polynomials.
2. Proof of main result
We first need a couple of lemmas.
Lemma 2.1. Let F(z) be a positive definite k × k matrix-valued trigonometric polynomial on T^d, such that in the Laurent expansion the (i, j)-th entry is homogeneous of degree i − j. Then there exists a factorization F(z) = V(z)V(z)^*, with V(z) a
rational matrix function of size k × m, say, such that in the Laurent expansion the i-th row of V is homogeneous of degree i − 1. The rational matrix function V may be chosen to be polynomial in at least d − 1 variables.
Proof. By [3, Corollary 5.2] a polynomial matrix function P exists so that F = P P^*. Write now P = Σ_{j=0}^{N} P_j, where P_j is homogeneous of degree j, and let P_{ij} denote the i-th row of P_j. Observe that

F_{il} = (Σ_{j=0}^{N} P_{ij})(Σ_{r=0}^{N} P_{lr})^* = Σ_{j−r=i−l} P_{ij} P_{lr}^* + Σ_{j−r≠i−l} P_{ij} P_{lr}^*,

but as the last term equals 0, due to F_{il} being homogeneous of degree i − l, we actually have

F_{il} = Σ_{j−r=i−l} P_{ij} P_{lr}^*.

Define now

V = [ P_{1,N} z_1^{−N}   ⋯   P_{1,0}                 0                              ]
    [ 0                  P_{2,N} z_1^{−N+1}   ⋯      P_{2,0} z_1                    ]
    [                         ⋱                           ⋱                         ]
    [ 0          ⋯       P_{k,N} z_1^{−N+k−1}   ⋯    P_{k,0} z_1^{k−1}              ],

where row i carries the blocks P_{i,N} z_1^{−N+i−1}, …, P_{i,0} z_1^{i−1}, shifted one block column to the right of row i − 1. Then the i-th row of V is homogeneous of degree i − 1 and F = V V^*. □
Lemma 2.2. Let $q(z) = q_0 + q_1 z + \cdots + q_n z^n$ with $q_0 = 1$ be a one variable stable polynomial, and let $T := A A^* - B^* B$, where
$$A = \begin{pmatrix} q_0 & & \\ \vdots & \ddots & \\ q_{n-1} & \cdots & q_0 \end{pmatrix}, \qquad B = \begin{pmatrix} q_n & \cdots & q_1 \\ & \ddots & \vdots \\ & & q_n \end{pmatrix}.$$
In addition, let
$$C = \begin{pmatrix} -q_1 & 1 & & 0 \\ \vdots & & \ddots & \\ -q_{n-1} & 0 & & 1 \\ -q_n & 0 & \cdots & 0 \end{pmatrix}.$$
Then $T > 0$,
$$\begin{pmatrix} T^{-1} & C^* T^{-1} \\ T^{-1} C & T^{-1} \end{pmatrix} \geq 0, \qquad \begin{pmatrix} T^{-1} & (C^n)^* T^{-1} \\ T^{-1} C^n & T^{-1} \end{pmatrix} > 0. \tag{2.1}$$
Proof. Let $f(z) = \frac{1}{|q(z)|^2}$, and write $f(z) = \sum_{u=-\infty}^{\infty} f_u z^u$, $|z| = 1$. Introduce
$$F = (f_{i-j})_{i,j=0}^{n-1}, \qquad G = (f_{i-j+1})_{i,j=0}^{n-1}, \qquad H = (f_{i-j+n})_{i,j=0}^{n-1}.$$
By the Schur–Cohn criterion [9, Section 13.5] we have that $T > 0$, and by the Gohberg–Semencul formula [5] we have that $T^{-1} = F$. In addition, it is easy to check that $FC = G$. Next, observe that
$$\begin{pmatrix} T^{-1} & C^* T^{-1} \\ T^{-1} C & T^{-1} \end{pmatrix} = \begin{pmatrix} F & G^* \\ G & F \end{pmatrix}$$
has many identical rows and columns; indeed, columns $j$ and $j + n - 1$ are equal for $j = 2, \ldots, n$, and by selfadjointness the same holds for the rows. Removing columns and rows $n+1, \ldots, 2n-1$, we remain with the matrix $(f_{i-j})_{i,j=0}^{n}$, which is positive definite as a Toeplitz matrix built from the Fourier coefficients of the positive continuous function $f$. But then the first inequality in (2.1) follows.

In addition, one may check that $FC^n = H$ (this was observed in [4, Proof of Theorem 2.1]; see also [1, Equation (2.3.14)]). It remains to observe that
$$\begin{pmatrix} T^{-1} & (C^n)^* T^{-1} \\ T^{-1} C^n & T^{-1} \end{pmatrix} = \begin{pmatrix} F & H^* \\ H & F \end{pmatrix} = (f_{i-j})_{i,j=0}^{2n-1} > 0. \qquad \square$$
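Lemma 2.2 is easy to test numerically. The sketch below is our own example, not from the paper: for the stable polynomial $q(z) = 1 + 0.5z + 0.2z^2$ it forms $T = AA^* - B^*B$ and the companion-type matrix $C$, checks $T > 0$ and both inequalities in (2.1), and checks the identity $T^{-1} = F$ by approximating the Fourier coefficients of $f = 1/|q|^2$ with an FFT.

```python
import numpy as np

q = np.array([1.0, 0.5, 0.2])   # q(z) = 1 + 0.5 z + 0.2 z^2, zeros outside |z| <= 1
n = len(q) - 1

A = np.array([[q[0], 0.0], [q[1], q[0]]])   # lower triangular Toeplitz of q_0, q_1
B = np.array([[q[2], q[1]], [0.0, q[2]]])   # upper triangular Toeplitz of q_2, q_1
T = A @ A.T - B.T @ B
C = np.array([[-q[1], 1.0], [-q[2], 0.0]])  # companion-type matrix of the lemma

Ti = np.linalg.inv(T)
assert np.all(np.linalg.eigvalsh(T) > 0)                 # T > 0

M1 = np.block([[Ti, C.T @ Ti], [Ti @ C, Ti]])
M2 = np.block([[Ti, (C @ C).T @ Ti], [Ti @ C @ C, Ti]])
assert np.linalg.eigvalsh(M1).min() > -1e-9              # first inequality in (2.1)
assert np.linalg.eigvalsh(M2).min() > 0                  # second inequality in (2.1)

# Gohberg-Semencul: T^{-1} = F = (f_{i-j}), f_u the Fourier coefficients of 1/|q|^2
N = 1 << 12
theta = 2 * np.pi * np.arange(N) / N
qv = np.polyval(q[::-1], np.exp(1j * theta))
fhat = np.fft.fft(1.0 / np.abs(qv) ** 2) / N             # fhat[u mod N] ~ f_u
F = np.array([[fhat[(i - j) % N].real for j in range(n)] for i in range(n)])
assert np.allclose(Ti, F, atol=1e-8)
print("Lemma 2.2 verified for q(z) = 1 + 0.5 z + 0.2 z^2")
```

Note that the first block matrix comes out only semidefinite (it has a zero eigenvalue), exactly as the duplicated-column argument in the proof predicts.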
Proof of Theorem 1.1. "If": Suppose that $K(z)$ as described exists, and that $p(z) = 0$ for some $z \in \mathbb{T}^d$. Then $1$ is an eigenvalue of $K(z)$. But as $\|K(z)^n\| < 1$, this cannot happen. Thus $p(z) \neq 0$ for $z \in \mathbb{T}^d$. Now using that $K(\lambda z) = \lambda K(z)$, we get that $\|K(z)\| < 1$ for any $z \in r\mathbb{T}^d$ where $0 < r < 1$. Thus $p(z) \neq 0$ for any $z \in r\mathbb{T}^d$ where $0 < r \leq 1$. In addition, for $z \in \mathbb{T}^d$ one has
$$p(0) = \lim_{r \to 0^+} p(rz) = \lim_{r \to 0^+} \det(I - K(rz)) = \lim_{r \to 0^+} \det(I - rK(z)) = 1.$$
But now the stability of $p$ follows from Theorem 1′ in [2].
"Only if": Let $p(z) = p(z_1, \ldots, z_d)$ be a stable multivariable polynomial with $p(0) = 1$. As $n = \operatorname{tdeg} p$, we may write $p(z) = 1 + p_1(z) + \cdots + p_n(z)$, where $p_j$ is a homogeneous multivariable polynomial of degree $j$; i.e., $p_j(\lambda z) = \lambda^j p_j(z)$, where $\lambda \in \mathbb{C}$. Introduce now
$$C(z) = \begin{pmatrix} -p_1(z) & 1 & & 0 \\ \vdots & & \ddots & \\ -p_{n-1}(z) & 0 & & 1 \\ -p_n(z) & 0 & \cdots & 0 \end{pmatrix}.$$
Then $\det(I_n - C(z)) = p(z)$. In addition, if we let $D_\lambda = \operatorname{diag}(\lambda^i)_{i=0}^{n-1}$, then
$$C(\lambda z) = \lambda D_\lambda C(z) D_\lambda^{-1}, \qquad \lambda \in \mathbb{C} \setminus \{0\},\ z \in \mathbb{C}^d. \tag{2.2}$$
For fixed $z \in \overline{\mathbb{D}}^d$ put
$$q_z(\lambda) = p(\lambda z) = 1 + \lambda p_1(z) + \cdots + \lambda^n p_n(z).$$
Then $q_z$ is stable, so by the Schur–Cohn criterion (see, e.g., [11], [9, Section 13.5]) we have that
$$T(z) := A(z) A(1/z)^* - B(1/z)^* B(z)$$
is positive definite for $z \in \mathbb{T}^d$. Here
$$A(z) = \begin{pmatrix} p_0(z) & & \\ \vdots & \ddots & \\ p_{n-1}(z) & \cdots & p_0(z) \end{pmatrix}, \qquad B(z) = \begin{pmatrix} p_n(z) & \cdots & p_1(z) \\ & \ddots & \vdots \\ & & p_n(z) \end{pmatrix},$$
and $p_0(z) = 1$. The matrix $T$ is also called the Bezoutian corresponding to $q_z$ and its reverse $\overleftarrow{q_z}(\lambda) = \lambda^n q_z(1/\lambda)$; see, for instance, [10] and [11]. It is easy to see that if we write $T(z) = (T_{ij}(z))_{i,j=1}^{n}$, then $T_{ij}(\lambda z) = \lambda^{i-j} T_{ij}(z)$. But then $T(\lambda z) = D_\lambda T(z) D_\lambda^{-1}$ follows. Next, by Lemma 2.2, we have that
$$\begin{pmatrix} T(z)^{-1} & C(z)^* T(z)^{-1} \\ T(z)^{-1} C(z) & T(z)^{-1} \end{pmatrix} \geq 0, \qquad z \in \mathbb{T}^d. \tag{2.3}$$
Multiplying all rows and columns on both sides with $T(z)$ we obtain
$$\begin{pmatrix} T(z) & T(z) C(z)^* \\ C(z) T(z) & T(z) \end{pmatrix} \geq 0, \qquad z \in \mathbb{T}^d. \tag{2.4}$$
As $T(z)$ satisfies the conditions of Lemma 2.1 we may write $T(z) = P(z) P(z)^*$, $z \in \mathbb{T}^d$, with the $i$th row of $P$ homogeneous of degree $i - 1$. Thus $P(\lambda z) = D_\lambda P(z)$. Let now
$$K(z) = P(1/z)^* T(z)^{-1} C(z) P(z).$$
Then $\det(I - K(z)) = \det(I - P(z) P(1/z)^* T(z)^{-1} C(z)) = \det(I - C(z)) = p(z)$. Next,
$$K(\lambda z) = P(1/(\lambda z))^* T(\lambda z)^{-1} C(\lambda z) P(\lambda z) = P(1/z)^* D_\lambda^{-1} \cdot D_\lambda T(z)^{-1} D_\lambda^{-1} \cdot \lambda D_\lambda C(z) D_\lambda^{-1} \cdot D_\lambda P(z) = \lambda K(z).$$
Finally, for $z \in \mathbb{T}^d$, we have that
$$P(z) (I - K(z)^* K(z)) P(z)^* = T(z) - T(z) C(z)^* T(z)^{-1} C(z) T(z) \geq 0,$$
which follows from (2.4). Thus, as $\operatorname{Ran} K(z)^* \subseteq \operatorname{Ran} P(z)^*$, it follows that $\|K(z)\| \leq 1$. Using the second inequality in (2.1) one can show in a similar way that $\|K(z)^n\| < 1$. □
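The companion construction in the "only if" part can be exercised numerically. The sketch below is our own example, not from the paper: for the stable polynomial $p(z_1, z_2) = 1 + (z_1 + z_2)/4 + z_1 z_2/8$ it checks $\det(I_n - C(z)) = p(z)$, the scaling relation (2.2), and the positive definiteness of the Schur-Cohn matrix $T(z)$ on sampled points of $\mathbb{T}^2$.

```python
import numpy as np

# stable: |p(z) - 1| <= 1/4 + 1/4 + 1/8 < 1 on the closed bidisk
p1 = lambda z: (z[0] + z[1]) / 4      # homogeneous of degree 1
p2 = lambda z: z[0] * z[1] / 8        # homogeneous of degree 2
p = lambda z: 1 + p1(z) + p2(z)
n = 2

def C(z):
    return np.array([[-p1(z), 1.0], [-p2(z), 0.0]])

rng = np.random.default_rng(2)
for _ in range(100):
    z = np.exp(2j * np.pi * rng.uniform(size=2))      # point of T^2
    lam = (0.5 + rng.uniform()) * np.exp(2j * np.pi * rng.uniform())
    D = np.diag([1.0, lam])                           # D_lambda = diag(1, lambda)
    assert np.isclose(np.linalg.det(np.eye(n) - C(z)), p(z))           # det(I - C(z)) = p(z)
    assert np.allclose(C(lam * z), lam * D @ C(z) @ np.linalg.inv(D))  # relation (2.2)
    # Schur-Cohn matrix of q_z(lambda) = 1 + lambda p1(z) + lambda^2 p2(z)
    a = np.array([1.0, p1(z), p2(z)])
    A = np.array([[a[0], 0.0], [a[1], a[0]]])
    B = np.array([[a[2], a[1]], [0.0, a[2]]])
    T = A @ A.conj().T - B.conj().T @ B
    assert np.allclose(T, T.conj().T)                 # T(z) is Hermitian on T^2
    assert np.linalg.eigvalsh(T).min() > 0            # T(z) > 0 on T^2
print("companion construction and T(z) > 0 verified on sampled T^2")
```

Completing the construction of $K(z)$ would additionally require the factorization $T(z) = P(z)P(z)^*$ of Lemma 2.1, which we do not attempt here.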
Remark 2.3. If $K(z)$ in Theorem 1.1 can be chosen to be analytic in $\overline{\mathbb{D}}^d$, then one easily sees from (iii) that $K(z) = \sum_{i=1}^{d} z_i K_i$ for some constant matrices $K_1, \ldots, K_d$. When, in addition, $K(z)$ satisfies $\|K_1 \otimes C_1 + \cdots + K_d \otimes C_d\| \leq 1$ for all contractions $C_1, \ldots, C_d$ (i.e., $K(z)$ is in the Schur–Agler class), then it follows from [6, Corollary 3.3] that $\det(I + \sum_{i=1}^{d} z_i K_i)$ may be written as $\det(I - \mathcal{K} Z(z))$ with $\mathcal{K}$ a contraction and $Z(z)$ a diagonal matrix carrying the coordinates $z_1, \ldots, z_d$ on its diagonal, each with some multiplicity. It is an open problem what happens when $K(z)$ is not in the Schur–Agler class.
Acknowledgments
The author wishes to thank Anatolii Grinshpan and Dmitry S. Kaliuzhnyi-Verbovetskyi for useful discussions and their input on earlier drafts of this paper.
References
[1] M. Bakonyi and H.J. Woerdeman. Matrix completions, moments, and sums of Hermitian squares. Princeton University Press, Princeton, NJ, 2011. xii+518 pp.
[2] Ph. Delsarte, Y.V. Genin, and Y.G. Kamp, A simple proof of Rudin's multivariable stability theorem. IEEE Trans. Acoust. Speech Signal Process. 28 (1980), no. 6, 701–705.
[3] M.A. Dritschel. On factorization of trigonometric polynomials. Integral Equations Operator Theory 49 (2004), no. 1, 11–42.
[4] I. Gohberg and L. Lerer, Matrix generalizations of M.G. Krein theorems on orthogonal polynomials. Orthogonal matrix-valued polynomials and applications, 137–202, Oper. Theory Adv. Appl., 34, Birkhäuser, Basel, 1988.
[5] I.C. Gohberg and A.A. Semencul, The inversion of finite Toeplitz matrices and their continual analogues. (Russian) Mat. Issled. 7 (1972), no. 2(24), 201–223, 290.
[6] A. Grinshpan, D.S. Kaliuzhnyi-Verbovetskyi, and H.J. Woerdeman. Norm-constrained determinantal representations of multivariable polynomials. Complex Anal. Oper. Theory 7 (2013), no. 3, 635–654.
[7] A. Grinshpan, D.S. Kaliuzhnyi-Verbovetskyi, V. Vinnikov and H.J. Woerdeman. Stable polynomials and real zero polynomials in two variables. Preprint.
[8] A. Kummert. 2-D stable polynomials with parameter-dependent coefficients: generalizations and new results. IEEE Trans. Circuits Systems I: Fund. Theory Appl. 49 (2002), 725–731.
[9] P. Lancaster and M. Tismenetsky. The theory of matrices. Second edition. Computer Science and Applied Mathematics. Academic Press, Inc., Orlando, FL, 1985.
[10] P. Lancaster, L. Lerer, and M. Tismenetsky, Factored forms for solutions of $AX - XB = C$ and $X - AXB = C$ in companion matrices. Linear Algebra Appl. 62 (1984), 19–49.
[11] L. Lerer, L. Rodman, and M. Tismenetsky, Bezoutian and Schur–Cohn problem for operator polynomials. J. Math. Anal. Appl. 103 (1984), no. 1, 83–102.
[12] T. Netzer and A. Thom. Polynomials with and without determinantal representations. Linear Algebra Appl. 437 (2012), 1579–1595.
[13] T. Netzer, D. Plaumann and A. Thom. Determinantal representations and the Hermite matrix. Michigan Math. J. 62 (2013), 407–420.
Hugo J. Woerdeman
Department of Mathematics
Drexel University
3141 Chestnut St.
Philadelphia, PA 19104, USA
e-mail: [email protected]