principles of quantum communication theory: a modern approach

971
Principles of Quantum Communication Theory: A Modern Approach Sumeet Khatri and Mark M. Wilde November 11, 2020 arXiv:2011.04672v1 [quant-ph] 9 Nov 2020

Upload: others

Post on 27-Jan-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Principles of Quantum Communication Theory: A Modern Approach

Principles ofQuantum Communication Theory:

A Modern Approach

Sumeet Khatri and Mark M. Wilde

November 11, 2020

arX

iv:2

011.

0467

2v1

[qu

ant-

ph]

9 N

ov 2

020

Page 2: Principles of Quantum Communication Theory: A Modern Approach

Preface

[IN PROGRESS]

ii

Page 3: Principles of Quantum Communication Theory: A Modern Approach

Acknowledgements

[IN PROGRESS]

We dedicate this book to the memory of Jonathan P. Dowling. Jon wasgenerous and kind-hearted, and he always gave all of his students his full,unwavering support. His tremendous impact on the lives of everyone whomet him will ensure that his memory lives on and that he will not be forgotten.We will especially remember Jon’s humour and his sharp wit. We are surethat, as he had promised, this book would have made the perfect doorstopfor his office.

Sumeet Khatri acknowledges support from the National Science Founda-tion under Grant No. 1714215 and the Natural Sciences and Engineering Re-search Council of Canada postgraduate scholarship. Mark M. Wilde acknowl-edges support from the National Science Foundation over the past decade(specifically from Grant Nos. 1350397, 1714215, 1907615, 2014010), and is in-debted and grateful to Patrick Hayden for hosting him for a sabbatical atStanford University during calendar year 2020, with support from StanfordQFARM and AFOSR (FA9550-19-1-0369).

iii

Page 4: Principles of Quantum Communication Theory: A Modern Approach

Table of Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

I Preliminaries 2

2 Mathematical Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32.1 Finite-Dimensional Hilbert Spaces . . . . . . . . . . . . . . . . . . 42.2 Linear Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2.1 The Spectral Theorem . . . . . . . . . . . . . . . . . . . . . 172.2.2 Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232.2.3 Operator Inequalities . . . . . . . . . . . . . . . . . . . . . 322.2.4 Superoperators . . . . . . . . . . . . . . . . . . . . . . . . . 37

2.3 Analysis and Probability . . . . . . . . . . . . . . . . . . . . . . . 392.3.1 Limits, Infimum, Supremum, and Continuity . . . . . . . 392.3.2 Compact Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 422.3.3 Convex Sets and Functions . . . . . . . . . . . . . . . . . . 422.3.4 Fenchel–Eggleston–Carathéodory Theorem . . . . . . . . . 442.3.5 Minimax Theorems . . . . . . . . . . . . . . . . . . . . . . . 442.3.6 Probability Distributions . . . . . . . . . . . . . . . . . . . 46

2.4 Semi-Definite Programming . . . . . . . . . . . . . . . . . . . . . 472.4.1 Example of an SDP . . . . . . . . . . . . . . . . . . . . . . . 52

2.5 The Symmetric Subspace . . . . . . . . . . . . . . . . . . . . . . . 552.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 59

3 Overview of Quantum Mechanics . . . . . . . . . . . . . . . . . . . . 613.1 Quantum Systems, States, and Measurements . . . . . . . . . . . 62

3.1.1 Bipartite States . . . . . . . . . . . . . . . . . . . . . . . . . 663.1.2 Measurements . . . . . . . . . . . . . . . . . . . . . . . . . 843.1.3 Quantum State Discrimination . . . . . . . . . . . . . . . . 88

3.2 Quantum Channels . . . . . . . . . . . . . . . . . . . . . . . . . . 993.2.1 Choi Representation . . . . . . . . . . . . . . . . . . . . . . 101

iv

Page 5: Principles of Quantum Communication Theory: A Modern Approach

3.2.2 Characterizations of Quantum Channels: Choi, Kraus,Stinespring . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

3.2.3 Unital Maps and Adjoint Maps . . . . . . . . . . . . . . . . 1113.2.4 Appending, Replacement, Partial Trace, Isometric, Com-

plementary, and Degradable Channels . . . . . . . . . . . 1123.2.5 Amplitude Damping, Erasure, Pauli, and Depolarizing

Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1173.2.6 Classical–Quantum and Quantum–Classical Channels . . 1283.2.7 Quantum Instruments . . . . . . . . . . . . . . . . . . . . . 1323.2.8 Entanglement-Breaking Channels . . . . . . . . . . . . . . 1343.2.9 Hadamard Channels . . . . . . . . . . . . . . . . . . . . . . 1393.2.10 Petz Recovery Map . . . . . . . . . . . . . . . . . . . . . . . 1423.2.11 LOCC Channels . . . . . . . . . . . . . . . . . . . . . . . . 1463.2.12 Completely PPT-Preserving Channels . . . . . . . . . . . . 152

3.3 Quantum Teleportation . . . . . . . . . . . . . . . . . . . . . . . . 1553.3.1 Qubit Teleportation Protocol . . . . . . . . . . . . . . . . . 1563.3.2 Qudit Teleportation Protocol . . . . . . . . . . . . . . . . . 1583.3.3 Qudit Teleportation Protocol With Respect to a Finite Group1623.3.4 Post-Selected Teleportation . . . . . . . . . . . . . . . . . . 1653.3.5 Teleportation-Simulable Channels . . . . . . . . . . . . . . 166

3.4 Quantum Super-Dense Coding . . . . . . . . . . . . . . . . . . . . 1693.5 Distinguishability Measures for States and Channels . . . . . . . 171

3.5.1 Trace Distance . . . . . . . . . . . . . . . . . . . . . . . . . . 1713.5.2 Fidelity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1763.5.3 Diamond Norm and Diamond Distance . . . . . . . . . . . 1923.5.4 Fidelity Measures for Channels . . . . . . . . . . . . . . . . 196

3.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 201Appendix 3.A The Quantum Chernoff Bound . . . . . . . . . . . . . . 205

3.A.1 Proof of Theorem 3.14 . . . . . . . . . . . . . . . . . . . . . 205Appendix 3.B Support Containment Lemmas . . . . . . . . . . . . . 209Appendix 3.C SDPs for Normalized Trace and Diamond Distance . . 211

3.C.1 Proof of Proposition 3.51 . . . . . . . . . . . . . . . . . . . . 2113.C.2 Proof of Proposition 3.70 . . . . . . . . . . . . . . . . . . . . 211

Appendix 3.D SDPs for Fidelity of States and Channels . . . . . . . . 2143.D.1 Proof of Proposition 3.56 . . . . . . . . . . . . . . . . . . . . 2143.D.2 Proof of Proposition 3.74 . . . . . . . . . . . . . . . . . . . . 218

4 Quantum Entropies and Information . . . . . . . . . . . . . . . . . . 2234.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2244.2 Quantum Relative Entropy . . . . . . . . . . . . . . . . . . . . . . 226

v

Page 6: Principles of Quantum Communication Theory: A Modern Approach

4.2.1 Information Measures from Quantum Relative Entropy . . 2404.2.2 Quantum Conditional Mutual Information . . . . . . . . . 2434.2.3 Quantum Mutual Information . . . . . . . . . . . . . . . . 252

4.3 Generalized Divergences . . . . . . . . . . . . . . . . . . . . . . . 2554.4 Petz–Rényi Relative Entropy . . . . . . . . . . . . . . . . . . . . . 2604.5 Sandwiched Rényi Relative Entropy . . . . . . . . . . . . . . . . . 2774.6 Geometric Rényi Relative Entropy . . . . . . . . . . . . . . . . . . 301

4.6.1 Proof of Proposition 4.39 . . . . . . . . . . . . . . . . . . . . 3274.6.2 Proof of Proposition 4.38 . . . . . . . . . . . . . . . . . . . . 328

4.7 Belavkin–Staszewski Relative Entropy . . . . . . . . . . . . . . . 3384.8 Max-Relative Entropy . . . . . . . . . . . . . . . . . . . . . . . . . 348

4.8.1 Smooth Max-Relative Entropy . . . . . . . . . . . . . . . . 3524.9 Hypothesis Testing Relative Entropy . . . . . . . . . . . . . . . . 357

4.9.1 Connection to Quantum Relative Entropy . . . . . . . . . 3654.9.2 Connections to Quantum Rényi Relative Entropies . . . . 366

4.10 Quantum Stein’s Lemma . . . . . . . . . . . . . . . . . . . . . . . 3704.10.1 Error and Strong Converse Exponents . . . . . . . . . . . . 379

4.11 Information Measures for Quantum Channels . . . . . . . . . . . 3834.11.1 Simplified Formulas for Rényi Information Measures . . . 3994.11.2 Remarks on Defining Channel Quantities from State Quan-

tities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4064.12 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4084.13 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 409

5 Entanglement Measures . . . . . . . . . . . . . . . . . . . . . . . . . . 4145.1 Definition and Basic Properties . . . . . . . . . . . . . . . . . . . . 416

5.1.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4245.2 Generalized Divergence of Entanglement . . . . . . . . . . . . . . 448

5.2.1 Cone Program Formulations . . . . . . . . . . . . . . . . . 4605.3 Generalized Rains Divergence . . . . . . . . . . . . . . . . . . . . 462

5.3.1 Semi-Definite Program Formulations . . . . . . . . . . . . 4685.4 Squashed Entanglement . . . . . . . . . . . . . . . . . . . . . . . . 4755.5 Entanglement Measures for Channels . . . . . . . . . . . . . . . . 484

5.5.1 Amortized Entanglement . . . . . . . . . . . . . . . . . . . 4885.5.2 Generalized Divergence of Entanglement . . . . . . . . . . 4965.5.3 Generalized Rains Divergence . . . . . . . . . . . . . . . . 5045.5.4 Squashed Entanglement . . . . . . . . . . . . . . . . . . . . 511

5.6 Amortization Collapses . . . . . . . . . . . . . . . . . . . . . . . . 5145.6.1 Max-Relative Entropy of Entanglement . . . . . . . . . . . 5145.6.2 Max-Rains Information . . . . . . . . . . . . . . . . . . . . 518

vi

Page 7: Principles of Quantum Communication Theory: A Modern Approach

5.6.3 Squashed Entanglement . . . . . . . . . . . . . . . . . . . . 5225.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5265.8 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 527Appendix 5.A Semi-Definite Programs for Negativity . . . . . . . . . 530Appendix 5.B The α → 1 and α → ∞ Limits of the Sandwiched

Rényi Entanglement Measures . . . . . . . . . . . . . . . . . . . . 532

II Quantum Communication Protocols 538

6 Entanglement-Assisted Classical Communication . . . . . . . . . . . 5396.1 One-Shot Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540

6.1.1 Protocol Over a Useless Channel . . . . . . . . . . . . . . . 5476.1.2 Upper Bound on the Number of Transmitted Bits . . . . . 5526.1.3 Lower Bound on the Number of Transmitted Bits via Position-

Based Coding and Sequential Decoding . . . . . . . . . . . 5566.2 Entanglement-Assisted Classical Capacity of a

Quantum Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . 5646.2.1 Proof of Achievability . . . . . . . . . . . . . . . . . . . . . 5706.2.2 Additivity of the Sandwiched Rényi Mutual Information

of a Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . 5736.2.3 Proof of the Strong Converse . . . . . . . . . . . . . . . . . 5826.2.4 Proof of the Weak Converse . . . . . . . . . . . . . . . . . . 583

6.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5846.3.1 Covariant Channels . . . . . . . . . . . . . . . . . . . . . . 5846.3.2 Generalized Amplitude Damping Channel . . . . . . . . . 590

6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5926.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 593Appendix 6.A Proof of Theorem 6.7 . . . . . . . . . . . . . . . . . . . 594Appendix 6.B The α → 1 Limit of the Sandwiched Rényi Mutual

Information of a Channel . . . . . . . . . . . . . . . . . . . . . . . 601Appendix 6.C Achievability from a Different Point of View . . . . . . 602Appendix 6.D Proof of Lemma 6.20 . . . . . . . . . . . . . . . . . . . 604Appendix 6.E Alternate Expression for the 1→ α CB Norm . . . . . 608Appendix 6.F Proof of the Multiplicativity of the 1→ α CB Norm . . 609Appendix 6.G The Strong Converse from a Different Point of View . 618

7 Classical Communication . . . . . . . . . . . . . . . . . . . . . . . . . 6217.1 One-Shot Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624

7.1.1 Protocol Over a Useless Channel . . . . . . . . . . . . . . . 628

vii

Page 8: Principles of Quantum Communication Theory: A Modern Approach

7.1.2 Upper Bound on the Number of Transmitted Bits . . . . . 6307.1.3 Lower Bound on the Number of Transmitted Bits . . . . . 633

7.2 Classical Capacity of a Quantum Channel . . . . . . . . . . . . . 6437.2.1 Proof of Achievability . . . . . . . . . . . . . . . . . . . . . 6497.2.2 Proof of the Weak Converse . . . . . . . . . . . . . . . . . . 6527.2.3 The Additivity Question . . . . . . . . . . . . . . . . . . . . 6557.2.4 Proof of the Strong Converse for Entanglement-Breaking

Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6657.2.5 General Upper Bounds on the Strong Converse Classical

Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6677.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 684

7.3.1 Covariant Channels . . . . . . . . . . . . . . . . . . . . . . 6867.3.2 Amplitude Damping Channel . . . . . . . . . . . . . . . . 6947.3.3 Hadamard Channels . . . . . . . . . . . . . . . . . . . . . . 696

7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6997.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 700Appendix 7.A The α→ 1 Limit of the Sandwiched Rényi Υ-Information

of a Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 701Appendix 7.B Proof of the Additivity of Cβ(N) . . . . . . . . . . . . . 702

8 Entanglement Distillation . . . . . . . . . . . . . . . . . . . . . . . . . 7098.1 One-Shot Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711

8.1.1 Upper Bounds on the Number of Ebits . . . . . . . . . . . 7138.1.2 Lower Bound on the Number of Ebits via Decoupling . . 722

8.2 Distillable Entanglement of a Quantum State . . . . . . . . . . . . 7358.2.1 Proof of Achievability . . . . . . . . . . . . . . . . . . . . . 7418.2.2 Proof of the Weak Converse . . . . . . . . . . . . . . . . . . 7458.2.3 Rains Relative Entropy Strong Converse Upper Bound . . 7478.2.4 Squashed Entanglement Weak Converse Upper Bound . . 7518.2.5 One-Way Entanglement Distillation . . . . . . . . . . . . . 752

8.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7588.3.1 Pure States . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7588.3.2 Degradable and Anti-Degradable States . . . . . . . . . . . 758

8.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7628.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 763Appendix 8.A One-Shot Decoupling and Proof of Theorem 8.11 . . . 765

9 Quantum Communication . . . . . . . . . . . . . . . . . . . . . . . . . 7739.1 One-Shot Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 776

9.1.1 Protocol for a Useless Channel . . . . . . . . . . . . . . . . 781

viii

Page 9: Principles of Quantum Communication Theory: A Modern Approach

9.1.2 Upper Bound on the Number of Transmitted Qubits . . . 7829.1.3 Lower Bound on the Number of Transmitted Qubits via

Entanglement Distillation . . . . . . . . . . . . . . . . . . . 7849.2 Quantum Capacity of a Quantum Channel . . . . . . . . . . . . . 795

9.2.1 Proof of Achievability . . . . . . . . . . . . . . . . . . . . . 8019.2.2 Proof of the Weak Converse . . . . . . . . . . . . . . . . . . 8059.2.3 The Additivity Question . . . . . . . . . . . . . . . . . . . . 8089.2.4 Rains Information Strong Converse Upper Bound . . . . . 8099.2.5 Squashed Entanglement Weak Converse Bound . . . . . . 818

9.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8199.3.1 Degradable Channels . . . . . . . . . . . . . . . . . . . . . 8209.3.2 Anti-Degradable Channels . . . . . . . . . . . . . . . . . . 8269.3.3 Generalized Amplitude Damping Channel . . . . . . . . . 827

9.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8329.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 833Appendix 9.A Alternative Notions of Quantum Communication . . 834

10 Secret Key Distillation . . . . . . . . . . . . . . . . . . . . . . . . . . . 83810.1 Tripartite key states and bipartite private states . . . . . . . . . . 83810.2 Privacy test and the ε-relative entropy of entanglement of bipar-

tite private states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843

11 Private Communication . . . . . . . . . . . . . . . . . . . . . . . . . . 848

III Feedback-Assisted Quantum Communication Proto-cols 849

12 Quantum-Feedback-Assisted Communication . . . . . . . . . . . . . 85012.1 n-Shot Quantum Feedback-Assisted Communication Protocols . 851

12.1.1 Protocol over a Useless Channel . . . . . . . . . . . . . . . 85412.1.2 Upper Bound on the Number of Transmitted Bits . . . . . 85712.1.3 The Amortized Perspective . . . . . . . . . . . . . . . . . . 864

12.2 Quantum Feedback-Assisted Classical Capacity of a QuantumChannel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 871

12.3 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 872

13 Classical-Feedback-Assisted Communication . . . . . . . . . . . . . 874

14 LOCC-Assisted Quantum Communication . . . . . . . . . . . . . . . 87514.1 n-Shot LOCC-Assisted Quantum Communication Protocol . . . 878

ix

Page 10: Principles of Quantum Communication Theory: A Modern Approach

14.1.1 Lower Bound on the Number of Transmitted Qubits . . . 88114.1.2 Amortized Entanglement as a General Upper Bound for

LOCC-Assisted Quantum Communication Protocols . . . 88114.1.3 Squashed Entanglement Upper Bound on the Number of

Transmitted Qubits . . . . . . . . . . . . . . . . . . . . . . . 88314.2 n-Shot PPT-Assisted Quantum Communication Protocol . . . . . 884

14.2.1 Rényi–Rains Information Upper Bounds on the Numberof Transmitted Qubits . . . . . . . . . . . . . . . . . . . . . 886

14.3 LOCC- and PPT-Assisted Quantum Capacities of Quantum Chan-nels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888

14.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89114.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 891

15 LOPC-Assisted Private Communication . . . . . . . . . . . . . . . . . 89315.1 n-Shot Secret-Key-Agreement Protocol . . . . . . . . . . . . . . . 895

15.1.1 Equivalence between Secret Key Agreement and LOPC-Assisted Private Communication . . . . . . . . . . . . . . . 899

15.2 Equivalence between Secret Key Agreement and LOCC-AssistedPrivate-State Distillation . . . . . . . . . . . . . . . . . . . . . . . 90115.2.1 The Purified Protocol . . . . . . . . . . . . . . . . . . . . . 90215.2.2 LOCC-Assisted Bipartite Private-State Distillation . . . . . 90415.2.3 Relation between Secret Key Agreement and LOCC-Assisted

Quantum Communication . . . . . . . . . . . . . . . . . . 90715.2.4 n-Shot Secret-Key-Agreement Protocol Assisted by Pub-

lic Separable Channels . . . . . . . . . . . . . . . . . . . . . 90815.2.5 Amortized Entanglement Bound for Secret-Key-Agreement

Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90915.3 Squashed Entanglement Upper Bound on the Number of Trans-

mitted Private Bits . . . . . . . . . . . . . . . . . . . . . . . . . . . 91215.3.1 Squashed Entanglement and Approximate Private States . 91215.3.2 Squashed Entanglement Upper Bound . . . . . . . . . . . 916

15.4 Relative Entropy of Entanglement Upper Bounds on the Num-ber of Transmitted Private Bits . . . . . . . . . . . . . . . . . . . . 917

15.5 Secret-Key-Agreement Capacities of Quantum Channels . . . . . 91915.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92115.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 921

Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 924

Appendix A Analyzing General Communication Scenarios . . . . . . 925

x

Page 11: Principles of Quantum Communication Theory: A Modern Approach

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 926

xi

Page 12: Principles of Quantum Communication Theory: A Modern Approach

Chapter 1

Introduction

[IN PROGRESS]

1

Page 13: Principles of Quantum Communication Theory: A Modern Approach

Part I

Preliminaries

Page 14: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2

Mathematical Tools

Before we attempt to build up the theory of quantum information, it is nec-essary for us to learn about and sharpen the tools that we will need for itsconstruction. Towards this goal, this chapter reviews many of the mathemat-ical notions required for the analysis of quantum communication protocols.We mostly provide a summary of the main definitions and results needed inlater chapters, and we omit several of the proofs. For further details on theconcepts presented here, as well as for proofs of the results presented, pleaseconsult the Bibliographic Notes (Section 2.6) at the end of the chapter.

Linear algebra forms the core mathematical foundation of quantum infor-mation theory for finite-dimensional quantum systems, and thus it is worth-while for us to start by reviewing the basics of linear algebra, with an em-phasis on linear operators. We then proceed to give a summary of severalrelevant definitions and results in real and convex analysis, probability the-ory, and semi-definite programming. In addition to linear algebra, these latterconcepts are foundational as well, and it is our opinion that they should beat the fingertips of all practicing quantum information theorists. Conceptsfrom real analysis play an important role in quantum information theory: in-deed, the capacity of a quantum channel itself is defined as a limit, i.e., thebest rate that is achievable if an arbitrarily large number of channel uses isavailable. Convexity plays a prominent role as well: not only is the set ofquantum states a convex set, but also the operator Jensen inequality, a foun-dational statement about operator convex functions, is the fundamental in-equality that leads to various quantum data-processing inequalities. The lat-ter data-processing principle is one of the central tenets of quantum informa-tion that allows for placing limitations on the communication capacities of

3

Page 15: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

quantum channels. Probability theory is essential as well, due to the prob-abilistic nature of quantum mechanics and the inevitable and unpredictableerrors that occur when communicating information over quantum channels.Finally, semi-definite programming is a remarkably useful tool, not only as ananalytical tool but also for calculating relevant quantities of interest numer-ically. Semi-definite programming has also played a pivotal role in many ofthe substantive advances that have taken place in quantum information the-ory during the past several decades, and so it has become one of the standardtools in the quantum information theorist’s toolkit.

2.1 Finite-Dimensional Hilbert Spaces

The primary mathematical object in quantum theory is the Hilbert space. Weconsider only finite-dimensional Hilbert spaces, denoted by H, throughoutthis book, and we use dim(H) to denote their dimension. Although we con-sider finite-dimensional spaces exclusively in this book, we note here thatmany of the statements and claims extend directly to the case of separable,infinite-dimensional Hilbert spaces, especially for operationally-defined tasksand information quantities. However, we do not delve into these details.

A d-dimensional Hilbert space (1 ≤ d < ∞) is defined to be a complexvector space equipped with an inner product1. We use the notation |ψ〉 todenote a vector in H. An inner product is a function 〈·|·〉 : H×H → C thatsatisfies the following properties:

• Non-negativity: 〈ψ|ψ〉 ≥ 0 for all |ψ〉 ∈ H, and 〈ψ|ψ〉 = 0 if and only if|ψ〉 = 0.

• Conjugate bilinearity: For all |ψ1〉, |ψ2〉, |φ1〉, |φ2〉 ∈ H and α1, β1, α2, β2 ∈C,

〈α1ψ1 + β1φ1|α2ψ2 + β2φ2〉 = α1α2 〈ψ1|ψ2〉+ α1β2 〈ψ1|φ2〉+ β1α2 〈φ1|ψ2〉+ β1β2 〈φ1|φ2〉 . (2.1.1)

• Conjugate symmetry: 〈ψ|φ〉 = 〈φ|ψ〉 for all |ψ〉, |φ〉 ∈ H.

1This definition suffices in the finite-dimensional case. More generally, a Hilbert space isa complete inner product space; please consult the Bibliographic Notes (Section 2.6).

4

Page 16: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

In the above, α denotes the complex conjugate of α ∈ C. Throughout thisbook, the term “Hilbert space” always refers to a finite-dimensional Hilbertspace.

All d-dimensional Hilbert spaces are isomorphic to the vector space Cd

equipped with the Euclidean inner product. Note that Cd is the vector spaceof d-dimensional column vectors with elements in C. We let |i〉d−1

i=0 denotean orthonormal basis, called the standard basis or computational basis, for theHilbert space with respect to the Euclidean inner product. The vector |i〉 isdefined to be a column vector with its (i + 1)th entry equal to one and allothers equal to zero, so that

|0〉 =

100...0

, |1〉 =

010...0

, . . . , |d− 1〉 =

000...1

. (2.1.2)

The inner product 〈i|j〉 evaluates to 〈i|j〉 = δi,j for all i, j ∈ 0, . . . , d− 1,where the Kronecker delta function is defined as

δi,j :=

0 if i 6= j1 if i = j

. (2.1.3)

More generally, for two vectors |ψ〉 = ∑d−1i=0 αi|i〉 and |φ〉 = ∑d−1

i=0 βi|i〉, withαi, βi ∈ C being the respective components of |ψ〉 and |φ〉 in the standardbasis, the inner product 〈ψ|φ〉 is defined as

〈ψ|φ〉 :=d−1

∑i=0

αiβi. (2.1.4)

The Euclidean norm, denoted by ‖|ψ〉‖2, of a vector |ψ〉 ∈ H is the norminduced by the inner product, i.e.,

‖|ψ〉‖2 :=√〈ψ|ψ〉. (2.1.5)

The Cauchy–Schwarz inequality is the following statement: for two vectors|ψ〉, |φ〉 ∈ H, the following inequality holds

|〈ψ|φ〉|2 ≤ 〈ψ|ψ〉 · 〈φ|φ〉 = ‖|ψ〉‖22 · ‖|φ〉‖2

2 , (2.1.6)

5

Page 17: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

with equality if and only if |φ〉 = α|ψ〉 for some α ∈ C.

Given a vector |ψ〉 ∈ H, its dual vector, denoted by 〈ψ|, is defined to be alinear functional from H to C such that 〈ψ|(|φ〉) = 〈ψ|φ〉 for all |φ〉 ∈ H. If|ψ〉 = ∑d−1

i=0 αi|i〉, then 〈ψ| = ∑d−1i=0 αi〈i|, where 〈i| can be interpreted, based on

(2.1.2), as a row vector with its (i+ 1)th entry equal to one and all other entriesequal to zero; i.e., 〈i| = (|i〉)T, where (·)T denotes the matrix transpose.

The tensor product of vectors, operators, and Hilbert spaces plays an im-portant role in quantum theory. For two Hilbert spaces HA and HB withdimensions dA and dB, respectively, along with associated orthonormal bases|i〉AdA−1

i=0 and |j〉BdB−1j=0 , the tensor product vector |i〉A ⊗ |j〉B is a vector in

a dA × dB-dimensional Hilbert space with a one in its (i · dB + j + 1)th entryand zeros elsewhere. Notice here that we have employed the labels A and Bin order to keep track of the vectors in the tensor product. Later on, when wemove to the study of quantum information, we will see that the label A can beassociated to a quantum system in possession of “Alice” and the label B canbe associated to a quantum system in possession of “Bob.” As an example ofthe tensor-product vector |i〉A ⊗ |j〉B, if dA = 2, dB = 3, i = 0, and j = 2, then

|i〉 ⊗ |j〉 = |0〉 ⊗ |2〉 =(

10

)⊗

001

=

1 ·

001

0 ·

001

=

001000

. (2.1.7)

More generally, for vectors |ψ〉A = ∑dA−1i=0 αi|i〉A and |φ〉B = ∑dB−1

j=0 β j|j〉B, thetensor-product vector |ψ〉A ⊗ |φ〉B is given by

|ψ〉A ⊗ |φ〉B =dA−1

∑i=0

dB−1

∑j=0

αiβ j|i〉A ⊗ |j〉B. (2.1.8)

As an example with dA = 2 and dB = 3, we find that |ψ〉A ⊗ |φ〉B can becalculated by a generalization of the “stack-and-multiply” procedure used

6

Page 18: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

in (2.1.7):

|ψ〉A ⊗ |φ〉B =

(α0α1

)⊗

β0β1β2

=

α0 ·

β0β1β2

α1 ·

β0β1β2

=

α0β0α0β1α0β2α1β0α1β1α1β2

. (2.1.9)

The tensor-product Hilbert space HA ⊗HB is defined to be the Hilbertspace spanned by the vectors |i〉A ⊗ |j〉B defined above:

HA ⊗HB := span|i〉A ⊗ |j〉B : 0 ≤ i ≤ dA − 1, 0 ≤ j ≤ dB − 1. (2.1.10)

The inner product on HA ⊗HB is given by

(〈i|A ⊗ 〈j|B)(|i′〉A ⊗ |j′〉B) = 〈i|i′〉〈j|j′〉 = δi,i′δj,j′ (2.1.11)

for all i, i′, j, j′ satisfying 0 ≤ i, i′ ≤ dA − 1 and 0 ≤ j, j′ ≤ dB − 1. Thespace HA ⊗HB consequently has dimension dAdB. We often use the nota-tion HAB ≡ HA ⊗HB, as well as the abbreviation |i, j〉AB ≡ |i〉A ⊗ |j〉B. Weoften also use the notation HAn ≡ H⊗n

A to refer to the n-fold tensor productof HA.

The direct sum of HA and HB, denoted by HA ⊕HB, is defined to be theHilbert space of vectors |ψ〉A ⊕ |φ〉B, with |ψ〉A ∈ HA and |φ〉B ∈ HB, where

|ψ〉A ⊕ |φ〉B :=(|ψ〉A|φ〉B

). (2.1.12)

In other words, HA ⊕HB can be viewed as the Hilbert space of column vec-tors formed by stacking elements of the Hilbert spaces. Connecting to whatwe developed earlier with the tensor product, if HA has the same dimensionas HB, we can then write

|ψ〉A ⊕ |φ〉B = |0〉 ⊗ |ψ〉+ |1〉 ⊗ |φ〉, (2.1.13)

where |0〉, |1〉 is the standard basis for a two-dimensional Hilbert space. If|i〉AdA−1

i=0 and |j〉BdB−1j=0 are orthonormal bases for HA and HB, respectively,

then(|i〉A

0

): 0 ≤ i ≤ dA − 1

∪(

0|j〉B

): 0 ≤ j ≤ dB − 1

(2.1.14)

7

Page 19: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

is an orthonormal basis for HA ⊕HB under the inner product

(〈i|A ⊕ 〈j|B)(|i′〉A ⊕ |j′〉B) =⟨i|i′⟩+⟨

j|j′⟩= δi,i′ + δj,j′ . (2.1.15)

for all 0 ≤ i, i′ ≤ dA − 1 and 0 ≤ j, j′ ≤ dB − 1. Consequently, HA ⊕HBhas dimension dA + dB. One of the simplest examples of a direct-sum Hilbertspace is C⊕C = C2. More generally, the d-fold direct sum C⊕d is equal to Cd.

If H is a d-dimensional Hilbert space, then the k-fold direct sum H⊕k isa kd-dimensional Hilbert space. Consequently, it is isomorphic to Ck ⊗H,generalizing the simple example presented in (2.1.13). Indeed, let HX ≡ Ck,with orthonormal basis |i〉Xk−1

i=0 , and let HA ≡ H, with orthonormal basis|j〉Ad−1

j=0 . We then have the correspondence

|i〉X ⊗ |j〉A ↔

0...0|j〉A

0...0

, (2.1.16)

holding for all 0 ≤ i ≤ k− 1 and all 0 ≤ j ≤ d− 1, where on the right-handside there is a one in the (i · d + j + 1)th entry of the column vector and zeroselsewhere. Then, for an element |ψ0〉A ⊕ |ψ1〉A ⊕ · · · ⊕ |ψk−1〉A ∈ H⊕k

A , wehave

|ψ1〉A|ψ2〉A

...|ψk−1〉A

k−1

∑i=0|i〉X ⊗ |ψi〉A. (2.1.17)

The isomorphism between H⊕k and Ck ⊗H given by (2.1.16) is relevant inthe context of superpositions of quantum states and entanglement.

2.2 Linear Operators

Given a Hilbert space HA with dimension dA and a Hilbert space HB withdimension dB, a linear operator X : HA → HB is defined to be a function suchthat

X(α|ψ〉A + β|φ〉A) = αX|ψ〉A + βX|φ〉A (2.2.1)

8

Page 20: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

for all α, β ∈ C and |ψ〉A, |φ〉A ∈ HA. For clarity, we sometimes write XA→B toexplicitly indicate the input and output Hilbert spaces of the linear operatorX. We use 1 to denote the identity operator, which is defined as the uniquelinear operator such that 1|ψ〉 = |ψ〉 for all vectors |ψ〉.

We denote the set of all linear operators from HA to HB by L(HA,HB). IfHA = HB, then L(HA) := L(HA,HA), and we sometimes indicate the inputHilbert space HA of X ∈ L(HA) by writing XA. In particular, we often writeXAB when referring to linear operators in L(HA⊗HB), i.e., when referring tolinear operators acting on a tensor-product Hilbert space.

The set L(HA,HB) is itself a dAdB-dimensional vector space. The standardbasis for L(H) is defined to be

|i〉B〈j|A : 0 ≤ i ≤ dB − 1, 0 ≤ j ≤ dA − 1. (2.2.2)

By applying (2.1.2), we see that the operator |i〉B〈j|A has a matrix represen-tation as a dB × dA matrix with the (i + 1, j + 1)th entry equal to one and allother entries equal to zero, i.e.,

|0〉B〈0|A =

1 0 · · · 00 0 · · · 0...

... . . . 00 0 · · · 0

, |0〉B〈1|A =

0 1 · · · 00 0 · · · 0...

... . . . 00 0 · · · 0

, . . . ,

|dB − 1〉B〈dA − 1|A =

0 0 · · · 00 0 · · · 0...

... . . . 00 0 · · · 1

.

(2.2.3)

Using this basis, we can write a linear operator X ∈ L(HA,HB) as

XA→B =dB−1

∑i=0

dA−1

∑j=0

Xi,j|i〉B〈j|A, (2.2.4)

where Xi,j := 〈i|BX|j〉A. We can thus interpret a linear operator X ∈ L(HA,HB) as a dB × dA matrix with the elements Xi,j = 〈i|BX|j〉A, where 0 ≤ i ≤dB − 1 and 0 ≤ j ≤ dA − 1.

Given two linear operators X ∈ L(HA,HB) and Y ∈ L(HA′ ,HB′), theirtensor product X⊗Y is a linear operator in L(HA⊗HA′ ,HB⊗HB′) such that

(X⊗Y)(|ψ〉A ⊗ |φ〉A′) = X|ψ〉A ⊗Y|φ〉A′ (2.2.5)

9

Page 21: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

for all |ψ〉A ∈ HA and |φ〉A′ ∈ HA′ . The matrix representation of X ⊗ Y isthe Kronecker product of the matrix representations of X and Y, which is amatrix generalization of the “stack-and-multiply” procedure from (2.1.9). Forexample, if dA = dB = 2 and dA′ = dB′ = 3, then

X⊗Y =

(X0,0 X0,1X1,0 X1,1

)⊗

Y0,0 Y0,1 Y0,2Y1,0 Y1,1 Y1,2Y2,0 Y2,1 Y2,2

(2.2.6)

=

X0,0 ·

Y0,0 Y0,1 Y0,2Y1,0 Y1,1 Y1,2Y2,0 Y2,1 Y2,2

X0,1 ·

Y0,0 Y0,1 Y0,2Y1,0 Y1,1 Y1,2Y2,0 Y2,1 Y2,2

X1,0 ·

Y0,0 Y0,1 Y0,2Y1,0 Y1,1 Y1,2Y2,0 Y2,1 Y2,2

X1,1 ·

Y0,0 Y0,1 Y0,2Y1,0 Y1,1 Y1,2Y2,0 Y2,1 Y2,2

(2.2.7)

=

X0,0Y0,0 X0,0Y0,1 X0,0Y0,2 X0,1Y0,0 X0,1Y0,1 X0,1Y0,2X0,0Y1,0 X0,0Y1,1 X0,0Y1,2 X0,1Y1,0 X0,1Y1,1 X0,1Y1,2X0,0Y2,0 X0,0Y2,1 X0,0Y2,2 X0,1Y2,0 X0,1Y2,1 X0,1Y2,2X1,0Y0,0 X1,0Y0,1 X1,0Y0,2 X1,1Y0,0 X1,1Y0,1 X1,1Y0,2X1,0Y1,0 X1,0Y1,1 X1,0Y1,2 X1,1Y1,0 X1,1Y1,1 X1,1Y1,2X1,0Y2,0 X1,0Y2,1 X1,0Y2,2 X1,1Y2,0 X1,1Y2,1 X1,1Y2,2

. (2.2.8)

The image of a linear operator X ∈ L(HA,HB), denoted by im(X), is theset defined as

im(X) := |φ〉B ∈ HB : |φ〉B = X|ψ〉A, |ψ〉A ∈ HA. (2.2.9)

It is also known as the column space or range of X. The image of X is asubspace of HB. The rank of X, denoted by rank(X), is defined2 to be the di-mension of im(X). Note that rank(X) ≤ mindA, dB for all X ∈ L(HA,HB).

The kernel of a linear operator X ∈ L(HA,HB), denoted by ker(X), isdefined to be the set of vectors in the input space HA of X for which theoutput is the zero vector; i.e.,

ker(X) := |ψ〉A ∈ HA : X|ψ〉A = 0. (2.2.10)

It is also known as the null space of X. The following dimension formulaholds

dA = rank(X) + dim(ker(X)), (2.2.11)

2The rank of a linear operator can also be equivalently defined as the number of its singu-lar values; please see Theorem 2.1.

10

Page 22: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

and it is known as the rank-nullity theorem (dim(ker(X)) is called the nullityof X).

The support of a linear operator X ∈ L(HA,HB), denoted by supp(X), isdefined to be the orthogonal complement of the kernel:

supp(X) := ker(X)⊥ := |ψ〉 ∈ HA : 〈ψ|φ〉 = 0 ∀|φ〉 ∈ ker(HA). (2.2.12)

It is also known as the row space or coimage of X.

The trace of a linear operator X ∈ L(H) acting on a d-dimensional Hilbertspace H is defined as

Tr[X] :=d−1

∑i=0〈i|X|i〉, (2.2.13)

which can be interpreted as the sum of the diagonal elements of the matrixcorresponding to X. The trace satisfies the cyclicity property, which is thefollowing: for X, Y, Z ∈ L(H), the following equalities hold

Tr[XYZ] = Tr[YZX] = Tr[ZXY]. (2.2.14)

More generally, if ZA→B ∈ L(HA,HB), YB→C ∈ L(HB,HC), and XC→A ∈L(HC,HA), the following cyclicity of trace equalities hold

Tr[XC→AYB→CZA→B] = Tr[YB→CZA→BXC→A] (2.2.15)= Tr[ZA→BXC→AYB→C]. (2.2.16)

Consider X ∈ L(HA,HB) as written in (2.2.4). The transpose of X is de-noted by XT or alternatively by T(X), and it is defined as

XT ≡ T(X) =dB−1

∑i=0

dA−1

∑j=0

Xi,j|j〉A〈i|B. (2.2.17)

Note that it is basis dependent, in the sense that it is defined with respect to aparticular basis (in the case above, we have defined it with respect to the stan-dard bases of HA and HB). Furthermore, taking the transpose with respect toone orthonormal basis can lead to an operator different from that found bytaking the transpose with respect to a different orthonormal basis. In thissense, we could more precisely refer to the operation in (2.2.17) as the “stan-dard transpose.” The standard transpose can also be understood as a linearsuperoperator (an operator on operators) with the following representation:

T(X) =dB−1

∑i=0

dA−1

∑j=0

(|j〉A〈i|B) X (|j〉A〈i|B) =dB−1

∑i=0

dA−1

∑j=0

Xi,j|j〉A〈i|B. (2.2.18)

11

Page 23: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

Superoperators are discussed in more detail in Section 2.2.4.

The conjugate transpose of X, also known as the Hermitian conjugate or theadjoint of X, is the linear operator X† ∈ L(HB,HA) defined as

X† :=dB−1

∑i=0

dA−1

∑j=0

Xi,j|j〉A〈i|B. (2.2.19)

The adjoint of X is the unique linear operator that satisfies

〈φ|Xψ〉 = 〈X†φ|ψ〉 (2.2.20)

for all |ψ〉 ∈ HA and |φ〉 ∈ HB. Although we have defined the conjugatetranspose with respect to the standard basis, the identity in (2.2.20) indicatesthat the conjugate transpose of X is in fact basis independent. That is, onealways arrives at the same operator even when taking the conjugate transposewith respect to different orthonormal bases.

On the vector space L(HA,HB), we define the Hilbert–Schmidt inner prod-uct as follows:

〈X, Y〉 := Tr[X†Y], X, Y ∈ L(HA,HB). (2.2.21)

With this inner product, the vector space L(HA,HB) becomes an inner prod-uct space, and thus a Hilbert space. The basis defined in (2.2.3) is an or-thonormal basis for L(HA,HB) under this inner product. The Hilbert–Schmidtnorm (or Schatten 2-norm) of an operator X ∈ L(HA,HB) is defined from theHilbert–Schmidt inner product as follows:

‖X‖2 :=√〈X, X〉, (2.2.22)

as a generalization of the Euclidean norm in (2.1.5). The Cauchy–Schwarzinequality for the Hilbert–Schmidt inner product is as follows:

|〈X, Y〉|2 ≤ 〈X, X〉 · 〈Y, Y〉 = ‖X‖22 · ‖Y‖2

2 , (2.2.23)

for all X, Y ∈ L(HA,HB). Observe that the inequality in (2.2.23) is a general-ization of that in (2.1.6).

The Hilbert space L(HA,HB) is isomorphic to the tensor-product Hilbertspace HA ⊗HB, where the isomorphism is defined by

|i〉B〈j|A ↔ |j〉A ⊗ |i〉B =: vec(|i〉B〈j|A) (2.2.24)

12

Page 24: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

for all i ∈ 0, . . . , dB − 1 and j ∈ 0, . . . , dA − 1. The operation vec on theright in (2.2.24) is the “vectorize” operation, which transposes the rows of adB× dA matrix with respect to the standard basis and then stacks the resultingcolumns in order one on top of the next in order to construct a dAdB vector.This fact leads to the following useful identities that we call upon repeatedlythroughout this book.

1. For a linear operator X ∈ L(HA,HB),

vec(X) = (1A ⊗ XA→B)|Γ〉AA, (2.2.25)

where

|Γ〉AA :=dA−1

∑i=0|i, i〉AA. (2.2.26)

For reasons that become clear later, we refer to |Γ〉AA as the “maximallyentangled vector.” Note that |Γ〉AA = vec(1A). We also note that, for twovectors |ψ〉B ∈ HB and |φ〉A ∈ HA,

vec(|ψ〉B〈φ|A) = |φ〉A ⊗ |ψ〉B. (2.2.27)

This follows by writing both |ψ〉B and |φ〉A using the orthonormal bases|i〉BdB−1

i=0 and |j〉AdA−1j=0 and using (2.2.24).

2. For every vector |ψ〉AB ∈ HA⊗HB, there exists a linear operator XψA→B ∈

L(HA,HB) such that

(1A ⊗ XψA→B)|Γ〉AA = |ψ〉AB. (2.2.28)

In particular, if |ψ〉AB = ∑dA−1i=0 ∑dB−1

j=0 αi,j|i, j〉AB, then we can take Xψ =

∑dB−1j=0 ∑dA−1

i=0 αi,j|j〉B〈i|A. Alternatively, there exists a linear operator YψB→A

∈ L(HB,HA) such that

(YψB→A ⊗ 1B)|Γ〉BB = |ψ〉AB. (2.2.29)

In particular, we can set Yψ = ∑dB−1j=0 ∑dA−1

i=0 αi,j|i〉A〈j|B.

3. Transpose trick: For a linear operator X ∈ L(Cd), the following equalityholds

(1⊗ X)|Γ〉 = (XT ⊗ 1)|Γ〉, (2.2.30)

13

Page 25: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

By setting X = U, where U is a unitary operator, left-multiplying by U,and noticing that UUT = 1, the transpose trick identity in (2.2.30) impliesthe following identity:

|Γ〉 = (U ⊗U)|Γ〉. (2.2.31)

This latter identity can be understood in an alternate way as follows.Let |φ〉kd

k=1 be an orthonormal basis for Cd. Then by setting U =

∑dk=1 |φ〉k〈k − 1|, so that U = ∑d

k=1 |φk〉〈k − 1|, we find that the follow-ing identity holds for every orthonormal basis |φ〉kd

k=1:

|Γ〉 =d

∑k=1|φk〉 ⊗ |φk〉. (2.2.32)

4. For every linear operator X ∈ L(Cd),

Tr[X] = 〈Γ|(X⊗ 1)|Γ〉. (2.2.33)

The following classes of linear operators in L(H) are of particular impor-tance.

• Hermitian operators: Also known as self-adjoint operators, they are linearoperators X such that X† = X. The set of all Hermitian operators is a realvector space with dimension d2, where d = dim(H). This means thatevery Hermitian operator X can be expanded in an orthonormal basis

σk,` : 0 ≤ k, ` ≤ d− 1

of Hermitian operators such that

X =d−1

∑k,`=0

xk,`σk,`, (2.2.34)

where xk,` are real numbers.

An example of an orthonormal basis σk,` : 0 ≤ k, ` ≤ d− 1 of Hermitianoperators is the following:

σ0,0 :=1√d

, (2.2.35)

σ(+)k,` :=

√12(|k〉〈`|+ |`〉〈k|) , 0 ≤ k < ` ≤ d− 1, (2.2.36)

σ(i)k,` :=

√12(−i|k〉〈`|+ i|`〉〈k|) , 0 ≤ k < ` ≤ d− 1, (2.2.37)

14

Page 26: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

σk,k :=

√1

k(k + 1)

((k−1

∑j=0|j〉〈j|

)− k|k〉〈k|

),

1 ≤ k ≤ d− 1, (2.2.38)

Observe that there are d(d−1)2 operators labeled as σ

(+)k,` and d(d−1)

2 opera-

tors labeled as σ(i)k,` . If we scale each of the above operators by

√d, then

they are called the generalized Gell-Mann matrices. When d = 2, thegeneralized Gell-Mann matrices reduce to the Pauli matrices:

σI :=(

1 00 1

), σX :=

(0 11 0

), (2.2.39)

σY :=(

0 −ii 0

), σZ :=

(1 00 −1

). (2.2.40)

• Positive semi-definite operators: These are Hermitian operators X such that〈ψ|X|ψ〉 ≥ 0 for all |ψ〉 ∈ H. For all positive semi-definite operators X,there exists a linear operator Y such that X = Y†Y. X is called positivedefinite if 〈ψ|X|ψ〉 > 0 for all |ψ〉 ∈ H. We write X ≥ 0 if X is positivesemi-definite, and X > 0 if X is positive definite. If X − Z is positivesemi-definite for Hermitian operators X and Z, then we write X ≥ Z,and if X − Z is positive definite, then we write X > Z. The orderingX ≥ Z on Hermitian operators is a partial order called the Löwner orderand is discussed more in Definition 2.3.

• Density operators: These are Hermitian operators that are positive semi-definite and have unit trace. Density operators are generalizations ofprobability distributions from classical information theory and describethe states of a quantum system, as detailed in Chapter 3.

• Unitary operators: These are linear operators U whose inverses are equalto their adjoints, meaning that U†U = UU† = 1. Unitary operators gen-eralize invertible maps or permutations from classical information theoryand describe the noiseless evolution of the state of a quantum system, asdetailed in Chapter 3.

• Isometric operators or isometries: These are linear operators V such thatV†V = 1. Isometries also describe the noiseless evolution of the state ofa quantum system, as detailed in Chapter 3.

15

Page 27: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

Note that every linear operator X can be decomposed as

X = Re[X] + i Im[X], (2.2.41)

where Re[X] := 12(X + X†) and Im[X] := 1

2i(X − X†) are both Hermitianoperators, generalizing how complex numbers can be decomposed into realand imaginary parts.

An important fact that we make use of throughout this book is the singularvalue decomposition theorem.

Theorem 2.1 Singular Value Decomposition

For every linear operator X ∈ L(HA,HB) with r := rank(X), there existstrictly positive real numbers sk > 0 : 1 ≤ k ≤ r, called the singularvalues of X, and orthonormal vectors |ek〉B : 1 ≤ k ≤ r and | fk〉A :1 ≤ k ≤ r, such that

X =r

∑k=1

sk|ek〉B〈 fk|A. (2.2.42)

This is called the singular value decomposition of X.

REMARK: From (2.2.42), we see that the rank of a linear operator X, which we definedearlier as the dimension of the image of X, is equal to the number of singular values of X.

The singular value decomposition theorem can be written in the followingmatrix form that is familiar from elementary linear algebra. We first extendthe orthonormal vectors |ek〉B : 1 ≤ k ≤ r to an orthonormal basis |ek〉B :1 ≤ k ≤ dB for HB, and similarly, we extend the set | fk〉A : 1 ≤ k ≤ r oforthonormal vectors on HA to an orthonormal basis | fk〉A : 1 ≤ k ≤ dA forHA. Then, by defining the unitaries

W :=dB

∑k=1|ek〉B〈k− 1|B, V :=

dA

∑k=1|gk〉A〈k− 1|A, (2.2.43)

Eq. (2.2.42) can be written as

X = WSV†, (2.2.44)

where

S :=r

∑k=1

sk|k− 1〉B〈k− 1|A (2.2.45)

16

Page 28: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

is the matrix of singular values. Note that the matrix S is a dB × dA rectangu-lar matrix and (2.2.45) is only specifying the non-zero entries of S along thediagonal.

Another important decomposition of a linear operator is the polar decom-position.

Theorem 2.2 Polar Decomposition

Any linear operator X ∈ L(H) can be written as X = UP, where U is aunitary operator and P is a positive semi-definite operator. In particular,P = |X| :=

√X†X.

PROOF: If X = WSV† is the singular value decomposition of X, then we cantake P = VSV† and U = WV†.

Throughout this book, we make use of the Löwner partial order for Hermi-tian operators.

Definition 2.3 Löwner Partial Order for Hermitian Operators

For two Hermitian operators X and Y, the expression X ≥ Y means thatX−Y ≥ 0, i.e., that X−Y is positive semi-definite. We also write X ≤ Yto mean Y− X ≥ 0. The expressions X > Y and Y < X mean that X−Yis positive definite.

The relations “≥” and “≤” satisfy the following expected properties: X ≤Y and X ≥ Y imply that X = Y, and X ≤ Y and Y ≤ Z imply that X ≤ Z.The term “partial order” is used because not every pair (X, Y) of Hermitianoperators satisfies either X ≥ Y or X ≤ Y.

2.2.1 The Spectral Theorem

Given a linear operator X ∈ L(H) acting on some Hilbert space H, if thereexists a vector |ψ〉 ∈ H such that

X|ψ〉 = λ|ψ〉, (2.2.46)

then |ψ〉 is said to be an eigenvector of X with associated eigenvalue λ. Theset of all eigenvectors associated with an eigenvalue λ is a subspace of H

17

Page 29: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

called the eigenspace of X associated with λ, and the multiplicity of λ is the num-ber of linearly independent eigenvectors of X that are associated with λ (inother words, the dimension of the eigenspace of X associated with λ). Theeigenspace of X associated with λ is equal to ker(X− λI).

The spectral theorem, which we state below, allows us to decompose ev-ery normal operator X, i.e., an operator that commutes with its adjoint, so thatXX† = X†X, in terms of its eigenvalues and projections onto its correspond-ing eigenspaces.

Theorem 2.4 Spectral Theorem

For every normal operator X ∈ L(H), there exists n ∈N such that

X =n

∑j=1

λjΠj, (2.2.47)

where λ1, λ2, . . . , λn ∈ C are the distinct eigenvalues of X andΠ1, Π2, . . . , Πn are the spectral projections onto the correspondingeigenspaces, which satisfy Π1 + Π2 + · · ·+ Πn = 1H and ΠiΠj = δi,jΠi.The decomposition in (2.2.47) is unique and is called the spectral decompo-sition of X. The spectrum of X is denoted by spec(X) := λ1, λ2, . . . , λn.

REMARK: The multiplicity of an eigenvalue λi is equal to the rank of the correspondingprojection Πi.

The property ΠiΠj = δi,jΠi of the spectral projections indicates that the eigenspacesof a linear operator are orthogonal to each other. In other words, for every eigenvector|ψλ1〉 associated with the eigenvalue λ1 and every eigenvector |ψλ2〉 associated with theeigenvector λ2, where λ1 6= λ2, we have that 〈ψλ1 |ψλ2〉 = 0.

If we take the spectral decomposition of a normal operator X ∈ L(H) in(2.2.47) and find orthonormal bases for the corresponding eigenspaces, thenX can be written as

X =d

∑k=1

λk|ψk〉〈ψk|, (2.2.48)

where d = dim(H) and |ψk〉dk=1 is a set of orthonormal vectors such that

|ψ1〉〈ψ1|+ |ψ2〉〈ψ2|+ · · ·+ |ψd〉〈ψd| = 1H. (2.2.49)

From this decomposition, we see clearly that |ψk〉 is an eigenvector of X withassociated eigenvalue λk.

18

Page 30: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

The spectral theorem is a statement of the fact familiar from elementarylinear algebra that every normal operator can be unitarily diagonalized. In-deed, if we consider the spectral decomposition in (2.2.48), and we definethe unitary operator U := ∑d

k=1 |ψk〉〈k − 1|, then (2.2.48) can be written asX = UDU†, where D := ∑d

k=1 λk|k− 1〉〈k− 1| is a diagonal operator.

Note that in the decomposition (2.2.48) the numbers λk ∈ C are not alldistinct because the eigenspace associated with each eigenvalue can have di-mension greater than one. Also, note that the decomposition in (2.2.48) isgenerally not unique because the decomposition of the spectral projectionsinto orthonormal vectors is not unique.

From (2.2.48), it is evident that the rank of a normal operator X (recallthe discussion around (2.2.9)) is equal to the number of non-zero eigenvaluesof X (including their multiplicities). Furthermore, the support of X (recall(2.2.12)) is equal to the span of all eigenvectors of X associated with the non-zero eigenvalues of X. In particular,

ΠX := ∑k:λk 6=0

|ψk〉〈ψk| (2.2.50)

is the projection onto the support of X. It is also evident that the trace of X isequal to the sum of its eigenvalues, i.e., Tr[X] = ∑d

k=1 λk = ∑k:λk 6=0 λk.

The singular values of a linear operator X (not necessarily normal) are re-lated to the eigenvalues of X†X and XX† in the following way. Let skrank(X)

k=1

be the set of singular values of X, and let λkrank(X)k=1 be the non-zero eigen-

values of X†X, which are the same as the eigenvalues of XX†. (Note thatboth X†X and XX† are normal operators, so that the spectral theorem appliesto them.) Then, sk =

√λk for all 1 ≤ k ≤ rank(X). In particular, if X is a

Hermitian operator, then sk =√

λ2k = |λk| for all 1 ≤ k ≤ rank(X).

An important and easily-verifiable fact is that every Hermitian operatorX has real eigenvalues. We can thus split every Hermitian operator X into itspositive part, denoted by X+ and its negative part, denoted by X−, so that

X = X+ − X−. (2.2.51)

In particular, if X = ∑dk=1 λk|ψk〉〈ψk| is a spectral decomposition of X, then

X+ := ∑k:λk≥0

λk|ψk〉〈ψk|, X− := ∑k:λk<0

|λk| |ψk〉〈ψk|. (2.2.52)

19

Page 31: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

Note that both X+ and X− are positive semi-definite operators, and they aresupported on orthogonal subspaces, meaning that X+X− = 0. Such a decom-position of a Hermitian operator X into positive and negative parts is calledthe Jordan–Hahn decomposition of X.

Another important fact is that a Hermitian operator X is positive semi-definite if and only if all of its eigenvalues are non-negative, and positivedefinite if and only if all of its eigenvalues are strictly positive. The latterimplies that every positive definite operator X ∈ L(H) has full support, i.e.,supp(X) = H and rank(X) = dim(H). For every positive semi-definite op-erator X, the operator X + ε1 is positive definite for all ε > 0.

2.2.1.1 Functions of Hermitian Operators

Using the spectral decomposition as in (2.2.48), we can define the notion of afunction of a Hermitian operator.

Let f : R → R be a real-valued function with domain dom( f ). For everyHermitian operator X ∈ L(H) with spectral decomposition

X =d

∑k=1

λk|ψk〉〈ψk|, (2.2.53)

where d = dim(H), we define f (X) as the following operator:

f (X) := ∑k:λk∈dom( f )

f (λk)|ψk〉〈ψk|. (2.2.54)

Functions that arise frequently throughout this book are as follows:

• Power functions: for every α ∈ N, the function f (x) = xα, with x ∈ R,extends to Hermitian operators via (2.2.54) as

Xα :=d

∑k=1

λαk |ψk〉〈ψk|, α ∈N. (2.2.55)

The function f (x) = x−α = 1xα , for α ∈ N, has domain x ∈ R : x 6= 0,

so thatX−α := ∑

k:λk 6=0

1λα

k|ψk〉〈ψk|, α ∈N. (2.2.56)

20

Page 32: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

For α ∈ (0, ∞), the function f (x) = xα has domain x ∈ R : x ≥ 0, sothat

Xα := ∑k:λk≥0

λαk |ψk〉〈ψk|, α ∈ (0, ∞). (2.2.57)

If X is a positive semi-definite operator (so that λk ≥ 0 for all k), thenin the case α = 1

2 we typically use the notation√

X to refer to X12 . In

particular,√

X is the unique positive semi-definite operator such that√X√

X = X.For α = 0, because f (x) = x0 has domain x ∈ R : x 6= 0, the followingequality holds

X0 := ∑k:λk 6=0

|ψk〉〈ψk| = ΠX, (2.2.58)

where ΠX is the projection into the support of X; see (2.2.50).

The function f (x) = x−α = 1xα , for α ∈ (0, ∞) has domain x ∈ R : x > 0,

so thatX−α := ∑

k:λk>0

1λα

k|ψk〉〈ψk|, α ∈ (0, ∞). (2.2.59)

• Logarithm functions: For the function logb : (0, ∞) → R with base b > 0,we define

logb(X) := ∑k:λk>0

logb(λk)|ψk〉〈ψk|. (2.2.60)

We deal throughout this book exclusively with the base-2 logarithm log2and the base-e logarithm loge ≡ ln.

Definition 2.5 Operator Convex, Concave, Monotone Functions

Let X and Y be Hermitian operators, and let f : R → R be a functionextended to Hermitian operators as in (2.2.54).

1. The function f is called operator convex if for all λ ∈ [0, 1] and Her-mitian operators X, Y, the following inequality holds

f (λX + (1− λ)Y) ≤ λ f (X) + (1− λ) f (Y). (2.2.61)

We call f operator concave if − f is operator convex.

2. The function f is called operator monotone if X ≤ Y implies f (X) ≤f (Y) for all Hermitian operators X, Y. We call f operator anti-monotone if − f is operator monotone.

21

Page 33: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

The functions considered above have the following properties with re-spect to Definition 2.5:

• The function x 7→ xα is operator monotone for α ∈ [0, 1], operator anti-monotone for α ∈ [−1, 0), operator convex for α ∈ [−1, 0) ∪ [1, 2], andoperator concave for α ∈ (0, 1]. Note that this function is neither operatormonotone, operator convex, nor operator concave for α < −1 and α > 2.

• The function x 7→ logb(x), for every base b > 0 and x ∈ (0, ∞), is operatormonotone and operator concave.

• The function x 7→ x logb(x), for every base b > 0 and x ∈ [0, ∞), isoperator convex3.

For proofs of these properties, please consult the Bibliographic Notes (Sec-tion 2.6). We note here that these properties are critical for understandingquantum entropies, as detailed in Chapter 4.

We end this section with the following lemma:

Lemma 2.6

Let L be a square operator and f a function such that the squares of thesingular values of L are in the domain of f . Then

L f (L†L) = f (LL†)L. (2.2.62)

PROOF: This is a direct consequence of the singular value decomposition the-orem (Theorem 2.1). Let L = UDV be a singular value decomposition ofL, where U and V are unitary operators and D is a diagonal, positive semi-definite operator. Then

L f (L†L) = UDV f ((UDV)† UDV) (2.2.63)

= UDV f (V†DU†UDV) (2.2.64)

= UDVV† f (D2)V (2.2.65)

= UD f (D2)V (2.2.66)

= U f (D2)DV (2.2.67)

= U f (DVV†D)U†UDV (2.2.68)

3Note that, since limx→0 x logb(x) = 0, we take the convention that 0 logb(0) = 0 through-out this book.

22

Page 34: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

= f (UDVV†DU†)UDV (2.2.69)

= f (LL†)L. (2.2.70)

This concludes the proof.

2.2.2 Norms

A norm on a Hilbert space H (more generally a vector space) is a function‖·‖ : H→ R that satisfies the following properties:

• Non-negativity: ‖|ψ〉‖ ≥ 0 for all |ψ〉 ∈ H, and ‖|ψ〉‖ = 0 if and only if|ψ〉 = 0.

• Scaling: For every α ∈ C and |ψ〉 ∈ H, ‖α|ψ〉‖ = |α| ‖|ψ〉‖.• Triangle inequality: For all |ψ〉, |φ〉 ∈ H, ‖|ψ〉+ |φ〉‖ ≤ ‖|ψ〉‖+ ‖|φ〉‖.

An immediate consequence of the scaling property and the triangle inequalityis that a norm is convex (i.e., ‖λ|ψ〉+ (1− λ)|φ〉‖ ≤ λ ‖|ψ〉‖+ (1− λ) ‖|φ〉‖for all vectors |ψ〉 and |φ〉 and λ ∈ [0, 1].

In this section, we are primarily interested in the Hilbert space L(H) oflinear operators X : H → H for some Hilbert space H. The following normfor linear operators is used extensively in this book.

Definition 2.7 Schatten Norm

For every linear operator X ∈ L(H) acting on a Hilbert space H, wedefine its Schatten α-norm as

‖X‖α := (Tr[|X|α]) 1α (2.2.71)

for all α ∈ [1, ∞), where |X| :=√

X†X.

Throughout this book, we extend the function ‖·‖α to include α ∈ (0, 1)(with the definition exactly as in (2.2.71)), although in this case it is not a normbecause it does not satisfy the triangle inequality.

It follows directly from Theorem 2.1, (2.2.57), and Definition 2.7 that, if

23

Page 35: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

skrk=1 is the set of singular values of X, where r := rank(X), then

‖X‖α =

(r

∑k=1

sαk

) 1α

(2.2.72)

for all α > 0. If X is a Hermitian operator with non-zero eigenvalues λkrk=1,

then its singular values are sk = |λk|, where k ∈ 1, . . . , r. Therefore, forall α > 0,

‖X‖α =

(r

∑k=1|λk|α

) 1α

(2.2.73)

whenever X is Hermitian.

Using Definition 2.7, it is straightfoward to show that for all α > 0,∥∥∥X†X

∥∥∥α=∥∥∥XX†

∥∥∥α= ‖X‖2

2α . (2.2.74)

An important case of the Schatten norms is the Schatten ∞-norm, whichis defined as

‖X‖∞ := limα→∞‖X‖α , (2.2.75)

Proposition 2.8 Schatten ∞-Norm is Largest Singular Value

For a linear operator X ∈ L(HA,HB), the following equality holds

‖X‖∞ = smax, (2.2.76)

where smax is the largest singular value of X.

PROOF: Let ~s := (sk)rk=1 denote the vector of singular values of X, where

r := rank(X). Then for all α ≥ 1 and k ∈ 1, . . . , r, the inequality sαk ≤

∑rk=1 sα

k holds, which implies that sk ≤ ‖~s‖α where ‖~s‖α :=(∑r

k=1 sαk) 1

α . Notethat ‖~s‖α = ‖X‖α. So we conclude that the following inequality holds for allα ≥ 1

smax ≤ ‖X‖α . (2.2.77)

Now consider for α > β > 1 that

‖X‖α = ‖~s‖α =

(r

∑k=1

sα−βk sβ

k

) 1α

≤(

r

∑k=1

sα−βmax sβ

k

) 1α

(2.2.78)

24

Page 36: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

= s1− βα

max

(r

∑k=1

sβk

) 1α

= s1− βα

max ‖~s‖βαβ = s1− β

αmax ‖X‖

βαβ . (2.2.79)

Combining (2.2.77) and (2.2.79), we find for α > β > 1 that

smax ≤ ‖X‖α ≤ s1− βα

max ‖X‖βαβ . (2.2.80)

By fixing β and letting α become large, we find that the limit α → ∞ of theright-hand side is smax. Since the limits of the lower and upper bounds asα → ∞ are the same, it follows that limα→∞ ‖X‖α = smax. Now applying thedefinition in (2.2.75), we conclude the statement of the proposition.

Due to Proposition 2.8, the term spectral norm is often used to refer to thisnorm. The Schatten ∞-norm is also referred to as the operator norm, because itis the norm induced by the Euclidean norm on the underlying Hilbert spaceon which the operator X acts, i.e.,

‖X‖∞ = sup|ψ〉6=0

‖X|ψ〉‖2‖|ψ〉‖2

= sup|ψ〉:‖|ψ〉‖2=1

‖X|ψ〉‖2 . (2.2.81)

In the equation above, we have employed the shorthand sup, which standsfor supremum. We also often employ inf for infimum. These concepts arereviewed in Section 2.3.1. The fact that ‖X‖∞ is equal to the largest singularvalue means that

‖X‖∞ = sup|〈ψ|X|φ〉| : ‖|ψ〉‖2 = ‖|φ〉‖2 = 1, (2.2.82)

which is true because X has a singular value decomposition of the form X =

∑rank(X)k=1 sk|ψk〉〈φk| (see Theorem 2.1). If X is Hermitian and positive semi-

definite, then ‖X‖∞ is equal to the largest eigenvalue of X, and we can write

‖X‖∞ = sup|ψ〉:‖|ψ〉‖2=1

〈ψ|X|ψ〉 (2.2.83)

= supρ≥0Tr[Xρ] : Tr[ρ] = 1 (X positive semi-definite). (2.2.84)

More generally, if λkrank(X)k=1 are the eigenvalues of X, then

‖X‖∞ = sup|ψ〉:‖|ψ〉‖2=1

|〈ψ|X|ψ〉| (2.2.85)

25

Page 37: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

= max1≤k≤rank(X)

|λk| (X Hermitian). (2.2.86)

Another important special case is α = 1. In this case, we refer to ‖·‖1 asthe trace norm, and by applying (2.2.72), it is equal to the sum of the singularvalues of X:

‖X‖1 =r

∑k=1

sk. (2.2.87)

If X is Hermitian and positive semi-definite, then ‖X‖1 is equal to the sum ofthe eigenvalues of X.

Proposition 2.9 Variational Characterization of Trace Norm

For X ∈ L(HA,HB) with dA ≥ dB, the trace norm has the followingvariational characterization:

‖X‖1 = supU|Tr[UB→AXA→B]|, (2.2.88)

where the optimization is with respect to all isometries UB→A ∈L(HB,HA). If dA ≤ dB, then we instead have

‖X‖1 = supV|Tr[VA→B(XA→B)

†]|, (2.2.89)

with the isometry VA→B ∈ L(HA,HB).

PROOF: To see this, let X = ∑rk=1 sk|ek〉B〈 fk|A be the singular value decom-

position of X, where r := rank(X). Then we have that

|Tr[UX]| =∣∣∣∣∣Tr

[U

(r

∑k=1

sk|ek〉B〈 fk|A)]∣∣∣∣∣ =

∣∣∣∣∣r

∑k=1

sk〈 fk|AU|ek〉B∣∣∣∣∣ (2.2.90)

≤r

∑k=1

sk |〈 fk|AU|ek〉B| ≤r

∑k=1

sk = ‖X‖1 . (2.2.91)

The first inequality follows from the triangle inequality and the second from(2.2.82) and because ‖U‖∞ = 1 for all isometries. So the inequality |Tr[UX]| ≤‖X‖1 holds for all isometries UB→A ∈ L(HB,HA). Furthermore, the op-timal isometry that saturates the inequality above can be determined fromthe singular value decomposition of X; i.e., U = ∑dB

k=1 | fk〉A〈ek|B is opti-mal for the optimization in (2.2.88), where |ek〉B : r + 1 ≤ k ≤ dB and

26

Page 38: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

| fk〉B : r + 1 ≤ k ≤ dB are some other orthonormal vectors chosen to com-plete the isometry U, so that U†U = 1B. Then we find the following for thischoice:

|Tr[UX]| =∣∣∣∣∣Tr

[(dB

∑k′=1| fk′〉A〈ek′ |B

)(r

∑k=1

sk|ek〉B〈 fk|A)]∣∣∣∣∣ (2.2.92)

=

∣∣∣∣∣r

∑k=1

sk

∣∣∣∣∣ = ‖X‖1 . (2.2.93)

The proof of (2.2.89) is essentially identical to the proof already given.

The following inequality holds

‖X‖∞ ≤ ‖X‖1 (2.2.94)

for all linear operators X, because ‖X‖∞ = maxk∈1,...,r sk and ‖X‖1 = ∑rk=1 sk,

where skrk=1 are the singular values of X and r := rank(X). The following

proposition gives a tighter bound in the special case when X is a tracelessHermitian operator.

Lemma 2.10

Let X be a Hermitian operator satisfying Tr[X] = 0. Then

‖X‖∞ ≤12‖X‖1 . (2.2.95)

PROOF: Let the Jordan–Hahn decomposition of X be given by

X = X+ − X−, (2.2.96)

where X+, X− ≥ 0 and X+X− = 0. Then

‖X‖1 = Tr[X+] + Tr[X−]. (2.2.97)

Since Tr[X] = 0, we have that Tr[X+] = Tr[X−], which means that

‖X‖1 = 2Tr[X+]. (2.2.98)

We also have that‖X‖∞ = max‖X+‖∞ , ‖X−‖∞ (2.2.99)

27

Page 39: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

because X+X− = 0. Then

‖X‖∞ = max ‖X+‖∞ , ‖X−‖∞ ≤ Tr[X+] = ‖X‖1 /2, (2.2.100)

with the inequality following from (2.2.94) and the fact that Tr[X+] = Tr[X−] =‖X+‖1.

In the following proposition, we state several important properties of theSchatten norm.

Proposition 2.11 Properties of Schatten Norm

For all α ∈ [1, ∞], the Schatten norm ‖·‖α has the following properties.

1. Monotonicity: The Schatten norm is monotonically non-increasingwith α: for α ≥ β and a linear operator X, the following holds

‖X‖α ≤ ‖X‖β . (2.2.101)

In particular, we have then that ‖X‖∞ ≤ ‖X‖α ≤ ‖X‖1 for all 1 ≤α ≤ ∞ and linear operators X.

2. Isometric invariance: For every two isometries U and V,

‖X‖α =∥∥∥UXV†

∥∥∥α

. (2.2.102)

3. Submultiplicativity: For every two linear operators X and Y,

‖XY‖α ≤ ‖X‖α ‖Y‖α . (2.2.103)

4. Multiplicativity with respect to tensor product: For every two linearoperators X and Y,

‖X⊗Y‖α = ‖X‖α ‖Y‖α . (2.2.104)

5. Direct-sum property: Given a collection Xxx∈X of linear operatorsindexed by a finite alphabet X, the following equality holds for ev-ery orthonormal basis |x〉x∈X,

∥∥∥∥∥∑x∈X|x〉〈x| ⊗ Xx

∥∥∥∥∥

α

α

= ∑x∈X‖Xx‖α

α . (2.2.105)

28

Page 40: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

6. Duality: For all linear operators X,

‖X‖α = supY 6=0

∣∣∣Tr[Y†X]∣∣∣ : ‖Y‖β ≤ 1,

1α+

1β= 1

. (2.2.106)

The equality above implies Hölder’s inequality∣∣∣Tr[Z†X]

∣∣∣ ≤ ‖X‖α ‖Z‖β (2.2.107)

which holds for all linear operators X and Z, where α, β ∈ [1, ∞]satisfy 1

α + 1β = 1. In this sense, the norms ‖·‖α and ‖·‖β, with

1α + 1

β = 1, are said to be dual to each other.

PROOF: 1. In the case that X = 0, the statement is trivial. So we focus onthe case X 6= 0. Let skr

k=1 denote the singular values of X, where r :=rank(X). If we can show that d

dα ‖X‖α ≤ 0 for all α ≥ 1, then it followsthat ‖X‖α is monotone non-increasing with α. To this end, consider that

ddα‖X‖α =

ddα

(r

∑k=1

sαk

) 1α

=d

dαe

1α ln ∑r

k=1 sαk (2.2.108)

= e1α ln ∑r

k=1 sαk

ddα

(1α

lnr

∑k=1

sαk

)(2.2.109)

=

(r

∑k=1

sαk

) 1α(− 1

α2 lnr

∑k=1

sαk +

[d

dαln

r

∑k=1

sαk

])(2.2.110)

=

(r

∑k=1

sαk

) 1α(− 1

α2 lnr

∑k=1

sαk +

1α ∑r

k=1 sαk

[d

r

∑k=1

sαk

])(2.2.111)

=

(r

∑k=1

sαk

) 1α−1(

− 1α2

[r

∑k=1

sαk

]ln

[r

∑k=1

sαk

]+

[d

r

∑k=1

sαk

])(2.2.112)

=

(r

∑k=1

sαk

) 1α−1

α(

ddα ∑r

k=1 sαk

)−(∑r

k=1 sαk)

ln(∑r

k=1 sαk)

α2

(2.2.113)

29

Page 41: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

=

(r

∑k=1

sαk

) 1α−1(

α ∑rk=1 sα

k ln sk −(∑r

k=1 sαk)

ln(∑r

k=1 sαk)

α2

)(2.2.114)

=

(r

∑k=1

sαk

) 1α−1(

∑rk=1 sα

k ln sαk −

(∑r

k=1 sαk)

ln(∑r

k=1 sαk)

α2

). (2.2.115)

The first equality follows from (2.2.72), and the others are standard cal-culations. The term on the very left in the last line is non-negative and sois the denominator with α2. The inequality

r

∑k=1

sαk ln sα

k −(

r

∑k=1

sαk

)ln

(r

∑k=1

sαk

)≤ 0 (2.2.116)

then follows because it is equivalent to

r

∑k=1

sαk

∑rk=1 sα

kln sα

k ≤ ln

(r

∑k=1

sαk

). (2.2.117)

This latter inequality follows by defining the probabilities

pk :=r

∑k=1

sαk

∑rk=1 sα

k(2.2.118)

so that

r

∑k=1

sαk

∑rk=1 sα

kln sα

k =r

∑k=1

pk ln sαk ≤ ln

(r

∑k=1

pksαk

)≤ ln

(r

∑k=1

sαk

), (2.2.119)

where we applied concavity of the logarithm function, as well as themonotonicity of the logarithm and the fact that pk ≤ 1 for all k. So weconclude that d

dα ‖X‖α ≤ 0 for all α ≥ 1, which implies that ‖X‖α ismonotone non-increasing. In fact, observe that the proof given holds forα > 0, which implies that ‖X‖α is monotone non-increasing for all α > 0.

2. Isometric invariance holds because the singular values of a linear opera-tor X do not change under the action UXV† for isometries U and V.

3. First, let us establish the following slight strengthening of the Hölder in-equality in (2.2.107) for β ∈ [1, ∞] satisfying 1

α + 1β = 1:

∥∥∥Z†X∥∥∥

1≤ ‖X‖α ‖Z‖β . (2.2.120)

30

Page 42: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

This actually follows by a direct application of the Hölder inequality it-self: ∣∣∣Tr[UZ†X]

∣∣∣ ≤ ‖X‖α

∥∥∥ZU†∥∥∥

β= ‖X‖α ‖Z‖β , (2.2.121)

where U is an isometry. Since the inequality holds for all isometries, itfollows from (2.2.88) that

∥∥∥Z†X∥∥∥

1= sup

U

∣∣∣Tr[UZ†X]∣∣∣ ≤ ‖X‖α ‖Z‖β . (2.2.122)

Now let X and Y be linear operators and suppose that W 6= 0 satisfies‖W‖β ≤ 1 for β ∈ [1, ∞] satisfying 1

α + 1β = 1. Then consider that

∣∣∣Tr[W†XY]∣∣∣ =

∥∥∥W†X∥∥∥

1·∣∣∣∣Tr[

W†X‖W†X‖1

Y]∣∣∣∣ ≤ ‖X‖α ‖Y‖α . (2.2.123)

The inequality follows by applying (2.2.120) and the Hölder inequality:∥∥∥W†X

∥∥∥1≤ ‖X‖α ‖W‖β ≤ ‖X‖α , (2.2.124)

∣∣∣∣Tr[

W†X‖W†X‖1

Y]∣∣∣∣ ≤ ‖Y‖α

∥∥∥∥W†X‖W†X‖1

∥∥∥∥β

≤ ‖Y‖α . (2.2.125)

In the above, we applied the following inequality∥∥∥∥

W†X‖W†X‖1

∥∥∥∥β

≤ 1, (2.2.126)

which follows from the monotonicity of α-norms from (2.2.101):∥∥∥W†X

∥∥∥β≤∥∥∥W†X

∥∥∥1

. (2.2.127)

Since the inequality in (2.2.123) holds for all W 6= 0 satisfying ‖W‖β ≤ 1,we conclude (2.2.103) after applying (2.2.106).

4.-5. Multiplicativity with respect to the tensor product and the direct sumproperty follow immediately from the definition of ‖·‖α.

6. For a proof of (2.2.106), please consult the Bibliographic Notes (Section 2.6).Given (2.2.106), for every two linear operators X and Z, let Y = Z

‖Z‖β.

Then, ‖Y‖β ≤ 1, which means that

1‖Z‖β

∣∣∣Tr[Z†X]∣∣∣ =

∣∣∣Tr[Y†X]∣∣∣ ≤ ‖X‖α ⇒

∣∣∣Tr[Z†X]∣∣∣ ≤ ‖X‖α ‖Z‖β ,

(2.2.128)

31

Page 43: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

which is the inequality in (2.2.107).

This concludes the proof.

As indicated in the proof above, we actually have the following slightstrengthening of the Hölder inequality in (2.2.107):

∥∥∥Z†X∥∥∥

1≤ ‖X‖α ‖Z‖β (2.2.129)

holding for all linear operators X and Z and α, β ∈ [1, ∞] satisfying 1α +

1β = 1.

We remark that the monotonicity inequality ‖X‖∞ ≤ ‖X‖1 from the propo-sition above can be reversed to give

‖X‖1 ≤ d ‖X‖∞ (2.2.130)

for every linear operator X acting on a d-dimensional Hilbert space. Thisfollows because

‖X‖1 =r

∑k=1

sk ≤r

∑k=1

maxk∈1,...,r

sk = r maxk∈1,...,r

sk = r ‖X‖∞ ≤ d ‖X‖∞ ,

(2.2.131)where skr

k=1 is the set of singular values of X and r := rank(X).

In addition to the variational characterization of the Schatten norm ‖·‖αgiven in (2.2.106), we have the following variational characterization, whichextends to α ∈ (0, 1).

Proposition 2.12

Let α ∈ (0, 1) ∪ (1, ∞]. Then, for every β such that 1α + 1

β = 1, and everypositive semi-definite operator X,

‖X‖α =

infTr[XY

1β ] : Y > 0, Tr[Y] = 1 if α ∈ [0, 1),

supTr[XY1β ] : Y ≥ 0, Tr[Y] = 1 if α ∈ [1, ∞).

(2.2.132)

PROOF: Please consult the Bibliographic Notes (Section 2.6).

2.2.3 Operator Inequalities

We now state some basic operator inequalities that we use repeatedly through-out the book.

32

Page 44: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

Lemma 2.13 Basic Operator Inequalities

Let X, Y ∈ L(H) be Hermitian operators acting on a Hilbert space H,and let Z ∈ L(H,H′).

1. X ≥ 0 ⇒ ZXZ† ≥ 0 for all Z. In particular, X ≥ Y ⇒ ZXZ† ≥ZYZ† for all Z.

2. X ≥ Y ⇒ Tr[X] ≥ Tr[Y]. More generally, X ≥ Y ⇒ Tr[WX] ≥Tr[WY] for all W ∈ L(H) satisfying W ≥ 0.

3. For every Hermitian operator X with maximum and minimumeigenvalues λmax and λmin, respectively, λmin1 ≤ X ≤ λmax1.

4. Let X and Y be Hermitian operators with their spectrum in someinterval I ⊂ R, and let f : I → R be a monotone increasing function.If X ≤ Y, then Tr[ f (X)] ≤ Tr[ f (Y)]. In particular, if X and Y arepositive semi-definite, then

0 ≤ X ≤ Y ⇒ Tr[Xα] ≤ Tr[Yα] ∀ α > 0. (2.2.133)

PROOF:

1. X ≥ 0 implies that 〈ψ|X|ψ〉 ≥ 0 for all |ψ〉 ∈ H. Then, for every vector|φ〉 ∈ H′, we have 〈φ|ZXZ†|φ〉 ≥ 0 because Z†|φ〉 ≡ |ψ〉 is some vectorin H. Therefore, ZXZ† ≥ 0.Now, X ≥ Y is equivalent to X − Y ≥ 0. Let W = X − Y. Then, fromthe arguments in the previous paragraph, we have ZWZ† ≥ 0 for all Z,which implies that ZXZ† − ZYZ† ≥ 0, i.e., ZXZ† ≥ ZYZ†, as required.

2. X ≥ Y implies that X − Y ≥ 0. The trace of a positive semi-definite op-erator is non-negative, since positive semi-definite operators have non-negative eigenvalues and the trace of every normal operator is equal tothe sum of its eigenvalues. Thus, X−Y ≥ 0 implies Tr[X−Y] ≥ 0, whichimplies that Tr[X] ≥ Tr[Y], as required.Next, let W be a positive semi-definite operator. Using 1. above, X ≥ Yimplies that

√WX√

W ≥√

WY√

W. Then, using the result of the previ-ous paragraph, we obtain Tr[

√WX√

W] ≥ Tr[√

WY√

W]. Finally, by thecyclicity of the trace (recall (2.2.14)), we find that Tr[WX] ≥ Tr[WY], asrequired.

3. This result follows from the fact that, for every Hermitian operator X ∈33

Page 45: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

L(H) with eigenvalues λkdim(H)k=1 , the eigenvalues of X + t1 are equal

to λk + tdim(H)k=1 for every t ∈ R. In particular, then, by definition of

the minimum eigenvalue, X − λmin1 ≥ 0 since all of the eigenvalues ofX − λmin1 are non-negative. Similarly, by definition of the maximumeigenvalue, X − λmax1 ≤ 0 since all of the eigenvalues of X − λmax1 arenon-positive.

4. Let λ↓i (X) denote the sequence of decreasingly ordered eigenvalues of X.Then the inequalities λ↓i (X) ≤ λ↓i (Y) hold for all i ∈ 1, . . . , dim(H).These inequalities are a consequence of the Courant–Fischer–Weyl mini-max principle (please consult the Bibliographic Notes in Section 2.6 for areference to this principle). Then the desired inequality follows directlyfrom the fact that Tr[ f (X)] = ∑

dim(H)i=1 f (λ↓i (X)), as well as the mono-

tonicity of f . The inequality in (2.2.133) follows because the function xα

with domain x ≥ 0 is monotone for all α > 0.

This concludes the proof.

Lemma 2.14 Araki–Lieb–Thirring Inequality

For X and Y positive semi-definite operators acting on a finite-dimens-ional Hilbert space, and for q ≥ 0, the following inequalities hold:

1. Tr[(

Y12 XY

12

)rq]≥ Tr

[(Y

r2 XrY

r2

)q]for all 0 ≤ r < 1.

2. Tr[(

Y12 XY

12

)rq]≤ Tr

[(Y

r2 XrY

r2

)q]for all r ≥ 1.

PROOF: Please consult the Bibliographic Notes (Section 2.6).

The operator Jensen inequality is the linchpin of several quantum data-processing inequalities presented later on in Chapter 4. These in turn arerepeatedly used in Parts II and III to place fundamental limits on quantumcommunication protocols. As such, the operator Jensen inequality is a signif-icant bridge that connects convexity to information processing.

Theorem 2.15 Operator Jensen Inequality

Let f : R→ R be a continuous function with dom( f ) = I (where I is aninterval). Then, the following are equivalent:

34

Page 46: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

1. f is operator convex.

2. For all n ∈N, the inequality

f

(n

∑k=1

A†k Xk Ak

)≤

n

∑k=1

A†k f (Xk)Ak (2.2.134)

holds for every collection (Xk)nk=1 of Hermitian operators acting on

a Hilbert space H with spectrum contained in I and every collection(Ak)

nk=1 of linear operators in L(H′,H) satisfying ∑n

k=1 A†k Ak = 1H′ .

3. For every Hermitian operator X ∈ L(H) with spectrum in I andevery isometry V ∈ L(H′,H), the following inequality holds

f (V†XV) ≤ V† f (X)V. (2.2.135)

PROOF: We first prove that 2.⇒ 1. Let X and Y be Hermitian operators withtheir eigenvalues in I. Let λ ∈ [0, 1]. We can take n = 2, A1 =

√λI, X1 = X,

A2 =√

1− λI, X2 = Y, and the following operator inequality is an immedi-ate consequence of (2.2.134):

f (λX + (1− λ)Y) ≤ λ f (X) + (1− λ) f (Y). (2.2.136)

Since X, Y, and λ are arbitrary, it follows that f is operator convex.

3. is actually a special case of 2., found by setting n = 1 and taking A1 = Vand Xk = X, with V an isometry and X Hermitian with eigenvalues in I.

Now we prove that 3.⇒ 2. Fix n ∈ N and the sets (Ak)nk=1 and (Xk)

nk=1

of operators such that they satisfy the conditions specified in 2. Define thefollowing Hermitian operator:

X :=n

∑k=1

Xk ⊗ |k〉〈k|, (2.2.137)

as well as the isometry

V :=n

∑k=1

Ak ⊗ |k〉, (2.2.138)

where |k〉nk=1 is an orthonormal basis. The condition ∑n

k=1 A†k Ak = 1 and a

calculation imply that V is an isometry (satisfying V†V = 1). Another calcu-

35

Page 47: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

lation implies that

V†XV =n

∑k=1

A†k Xk Ak. (2.2.139)

Since

f (X) = f

(n

∑k=1

Xk ⊗ |k〉〈k|)

=n

∑k=1

f (Xk)⊗ |k〉〈k|, (2.2.140)

which follows as a consequence of (2.2.54), a similar calculation implies that

V† f (X)V =n

∑k=1

A†k f (Xk)Ak. (2.2.141)

Then the desired inequality in (2.2.134) follows from (2.2.139), (2.2.141), and(2.2.135).

We finally prove that 1.⇒ 3. Fix the operator X and isometry V as spec-ified in 3. Let M be a Hermitian operator in L(H′) with spectrum in I. LetP := 1H −VV†, and observe that P is a projection, V†P = 0, and PV = 0. Set

Z :=(

X 00 M

), U :=

(V P0 −V†

), W :=

(V −P0 V†

). (2.2.142)

Observe that U and W are unitary operators (these are called unitary dilationsof the isometry V). By direct calculation, we then find that

U†ZU =

(V†XV V†XPPXV PXP + VMV†

), (2.2.143)

W†ZW =

(V†XV −V†XP−PXV PXP + VMV†

), (2.2.144)

so that

12

(U†ZU + W†ZW

)=

(V†XV 0

0 PXP + VMV†

). (2.2.145)

We then find that(

f(V†XV

)0

0 f(

PXP + VBV†))

= f((

V†XV 00 PXP + VBV†

))(2.2.146)

36

Page 48: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

= f(

12

(U†ZU + W†ZW

))(2.2.147)

≤ 12

f(

U†ZU)+

12

f(

W†ZW)

(2.2.148)

=12

U† f (Z)U +12

W† f (Z)W (2.2.149)

=

(V† f (X)V 0

0 P f (X)P + V f (B)V†

). (2.2.150)

The first equality follows from the same reasoning that leads to (2.2.140). Thesecond equality follows from (2.2.145). The inequality follows from the as-sumption that f is operator convex. The third equality follows from (2.2.54).The final equality follows because

f (Z) =(

f (X) 00 f (M)

), (2.2.151)

and by applying (2.2.145) again, with the substitutions Z → f (Z), X → f (X),and M→ f (M). It follows that(

f(V†XV

)0

0 f(

PXP + VBV†))≤(

V† f (X)V 00 P f (X)P + V f (B)V†

),

(2.2.152)and we finally conclude that f (V†XV) ≤ V† f (X)V by examining the upperleft blocks in the operator inequality in (2.2.152).

2.2.4 Superoperators

Just as we have been considering linear operators X : HA → HB with inputand output Hilbert spaces HA and HB, we can consider linear operators withinput Hilbert space L(HA) and output Hilbert space L(HB). We use the termsuperoperator to refer to a linear operator acting on the Hilbert space of linearoperators. Specifically, a superoperator is a function N : L(HA) → L(HB)such that

N(αX + βY) = αN(X) + βN(Y) (2.2.153)

for all α, β ∈ C and X, Y ∈ L(HA). It is often helpful to explicitly indicate theinput and output Hilbert spaces of a superoperator N : L(HA) → L(HB) bywriting NA→B. We make use of this notation throughout the book.

37

Page 49: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

For every superoperator NA→B, there exists n ∈ N, and sets Kini=1 and

Lini=1 of operators in L(HA,HB) and L(HB,HA), respectively, such that

NA→B(XA) =n

∑i=1

KiXALi, (2.2.154)

for all XA ∈ L(HA). This follows as a consequence of the requirement thatNA→B has a linear action on XA and the isomorphism in (2.2.24). The trans-pose operation discussed previously in (2.2.18) is an example of a superop-erator. In Chapter 3, we see that quantum physical evolutions of quantumstates, known as quantum channels, are other examples of superoperatorswith additional constraints on the sets Kin

i=1 and Lini=1.

We denote the identity superator by id : L(H)→ L(H), and by definitionit satisfies id(X) = X for all X ∈ L(H).

Given two superoperators NA→A′ and MB→B′ , their tensor product NA→A′⊗MB→B′ is a superoperator with input Hilbert space L(HA⊗HB) and outputHilbert space L(HA′ ⊗HB′), such that

(NA→A′ ⊗MB→B′)(XA ⊗YB) = NA→A′(XA)⊗MB→B′(YB) (2.2.155)

for all XA ∈ L(HA) and YB ∈ L(HB). We use the abbreviation

NA→A′ ⊗ idB→B′ ≡ NA→A′ , idA→A′ ⊗MB→B′ ≡MB→B′ (2.2.156)

throughout this book whenever a superoperator acts only on one of the tensorfactors of the underlying Hilbert space of linear operators.

A superoperator N is called Hermiticity preserving if N(X) is Hermitianfor all Hermitian inputs X. The condition that N(Y†) = N(Y)† for all linearoperators Y is necessary and sufficient for N to be Hermiticity preserving,which is a straightforward consequence of (2.2.41).

The adjoint of a superoperator N : L(HA) → L(HA′) is the unique super-operator N† : L(HA′)→ L(HA) that satisfies

〈Y,N(X)〉 = 〈N†(Y), X〉 (2.2.157)

for all X ∈ L(HA) and all Y ∈ L(HA′), where we recall that 〈·, ·〉 is theHilbert–Schmidt inner product defined in (2.2.21).

For every superoperator N : L(HA) → L(HB), we define its induced tracenorm ‖N‖1 as

‖N‖1 := sup‖N(X)‖1‖X‖1

: X ∈ L(HA), X 6= 0

(2.2.158)

38

Page 50: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

= sup‖N(X)‖1 : X ∈ L(HA), ‖X‖1 ≤ 1. (2.2.159)

Then, for all X ∈ L(HA), it immediately follows that

‖N(X)‖1 ≤ ‖N‖1 ‖X‖1 . (2.2.160)

2.3 Analysis and Probability

In this section, we briefly review some essential concepts from mathematicalanalysis, in particular the concepts of the limit, supremum, and infimum, aswell as the continuity of real-valued functions. We also discuss compact sets,convex sets, and functions, as well as the basic notions of probability distri-butions.

2.3.1 Limits, Infimum, Supremum, and Continuity

Limit of a Sequence. We start with the definition of the limit of a sequence ofreal numbers. A sequence snn∈N ⊂ R of real numbers is said to have thelimit `, written limn→∞ sn = `, if for all ε > 0, there exists nε ∈ N, such thatfor all n ≥ nε, the inequality |sn − `| < ε holds.

One can think of the concept of a limit intuitively as a game between twoplayers, an antagonist and a protagonist. The antagonist goes first, and getsto pick an arbitrary ε > 0. The protagonist wins if he reports back an entry inthe sequence snn such that |sn − `| < ε; if the protagonist reports back anentry sn such that |sn− `| ≥ ε, then the antagonist wins. If the limit exists andis equal to `, then the protagonist always wins by taking n sufficiently large(i.e., larger than nε) and then reporting back sn. If the limit does not exist orif the limit is not equal to `, then the protagonist cannot necessarily win withthe strategy of taking n sufficiently large; in this case, there exists a choice ofε > 0, such that for all nε ∈ N, there exists n ≥ nε such that |sn − `| ≥ ε. Inthis latter case, the choice of ε > 0 can again be understood as a strategy ofthe antagonist.

Infimum and Supremum. Let us now recall the concepts of the infimumand supremum of subsets of the real numbers. Roughly speaking, they aregeneralizations of the concepts of the minimum and maximum, respectively,of a set. Formally, let E ⊂ R.

39

Page 51: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

• A point x ∈ R is a lower bound of E if y ≥ x for all y ∈ E. If x is thegreatest such lower bound, then x is called the infimum of E, and we writex = inf E.

• A point x ∈ R is an upper bound of E if y ≤ x for all y ∈ E. If x is theleast such upper bound, then x is called the supremum of E, and we writex = sup E.

The supremum and infinum may or may not be contained in the subset E.For example, let E = 1

nn∈N. Then, sup E = 1 ∈ E, but inf E = 0 /∈ E. Asanother example, let E = [0, 1). Then sup E = 1 /∈ E and inf E = 0 ∈ E.

When considering a function F : S → R defined on a subset S of L(H),its infimum and supremum are defined for the set E = F(X) : X ∈ S.Specifically,

infX∈S

F(X) := infF(X) : X ∈ S, (2.3.1)

andsupX∈S

F(X) := supF(X) : X ∈ S. (2.3.2)

Limit Inferior and Limit Superior. Turning back to limits, the limit of asequence need not always exist. A particularly illuminating example is thesequence rnn∈N for r ∈ R. If r > 1, then the sequence never converges to afinite value and so the limit does not exist. We say that the sequence divergesto +∞ in this case. If r < −1, then the sequence oscillates and diverges (butit does not specifically diverge to either +∞ or −∞). If r = −1, then thesequence oscillates back and forth between −1 and +1 and so the limit doesnot exist.

Given that the limit of a sequence need not always exist, it can be help-ful to have a reasonable substitute for this asymptotic concept that does al-ways exist. Such a substitute is provided by two quantities: the limit inferiorand limit superior of a sequence. We now define the limit inferior and limitsuperior, noting that they can be understood as asymptotic versions of theinfimum and supremum just discussed.

• We say that s is an asymptotic lower bound on the sequence snn∈N if forall ε > 0, there exists nε ∈ N, such that for all n ≥ nε, the inequalitysn > s− ε holds. The limit inferior is the greatest asymptotic lower boundand is denoted by

lim infn→∞

sn. (2.3.3)

40

Page 52: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

• The definition of the limit superior is essentially opposite to that of thelimit inferior. We say that s is an asymptotic upper bound on the sequencesnn∈N if for all ε > 0, there exists nε ∈ N, such that for all n ≥ nε,the inequality sn < s + ε holds. The limit superior is the least asymptoticupper bound and is denoted by

lim supn→∞

sn. (2.3.4)

The limit inferior and limit superior always exist by extending the real lineR to include −∞ and +∞. Furthermore, every asymptotic lower bound onthe sequence cannot exceed an asymptotic upper bound, implying that thefollowing inequality holds for every sequence snn∈N:

lim infn→∞

sn ≤ lim supn→∞

sn. (2.3.5)

If the opposite inequality holds for a sequence snn∈N, then the limit of thesequence exists and we can write

lim infn→∞

sn = lim supn→∞

sn = limn→∞

sn. (2.3.6)

This collapse is a direct consequence of the definitions of limit, limit inferior,and limit superior.

Limits and Continuity. We now consider the limit and continuity of afunction. Specifically, we consider real-valued functions F : L(H) → R de-fined on the space of linear operators acting on a Hilbert space H. We viewthis space as a normed vector space with either the trace norm ‖·‖1 or thespectral norm ‖·‖∞. In the definitions that follow, we use ‖·‖ to denote eitherone of these norms.

• Limit: We write limX→X0 F(X) = y to the mean the following: for allε > 0, there exists δε > 0 such that |F(X)− y| < ε for all X ∈ L(H) forwhich ‖X− X0‖ < δε.

• Continuity at a point: We say that F is continuous at X0 if for all ε > 0, thereexists δε > 0 such that |F(X) − F(X0)| < ε for all points X for which‖X− X0‖ < δε. F is said to be continuous if F is continuous at X0 for allX0 ∈ L(H).

• Uniform continuity: We say that f is uniformly continuous if for all ε > 0,there exists δε > 0 such that |F(X)− F(X′)| < ε for all X, X′ ∈ L(H) forwhich ‖X− X′‖ < δε.

41

Page 53: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

• Upper and lower semi-continuity: We say that F is upper semi-continuous atX0 if for all ε > 0, there exists a neighborhood NX0,ε of X0 such that: ifF(X0) > −∞, then F(X) ≤ F(X0) + ε for all X ∈ NX0,ε; if F(X0) = −∞,then limX→X0 F(X) = −∞. F called upper semi-continuous if it is uppersemi-continuous at X0 for all X0 ∈ L(H).

We say that f is lower semi-continuous at X0 if for all ε > 0, there existsa neighborhood NX0,ε of X0 such that: if F(X0) < +∞, then F(X) ≥F(X0)− ε for all X ∈ NX0,ε; if F(X0) = +∞, then limX→X0 F(X) = +∞.

2.3.2 Compact Sets

A subset S of a topological vector space is called compact if every sequence ofelements in S has a subsequence that converges to an element in S. For finite-dimensional vector spaces, a subset S is compact if and only if it is closed andbounded.

An important fact about compact sets is that the infimum and supremumof continuous functions defined on compact sets always exist and are con-tained in the set. Consequently, for optimization problems over compact sets,the infimum can be replaced by a minimum and the supremum can be re-placed by a maximum. In other words, if F : S → R is a continuous functiondefined on a compact subset S of L(H), then

infX∈S

F(X) = minX∈S

F(X) and supX∈S

F(X) = maxX∈S

F(X). (2.3.7)

An important example of a compact set is the set X ∈ L(H) : X ≥0, Tr[X] ≤ 1 of positive semi-definite operators with trace bounded fromabove by one. This set contains the set of density operators acting on H.

2.3.3 Convex Sets and Functions

A subset C of a vector space is called convex if, for all elements u, v ∈ C andfor all λ ∈ [0, 1], we have λu + (1− λ)v ∈ C. We often call λu + (1− λ)v aconvex combination of u and v. More generally, for every set S = vxx∈X ofelements of a real vector space indexed by an alphabet X, and every functionp : X → [0, 1] with p(x) ≥ 0 for all x ∈ X and ∑x∈X p(x) = 1, the sum∑x∈X p(x)vx is called a convex combination of the vectors in S. The convex

42

Page 54: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

hull of S is the convex set of all possible convex combinations of the vectorsin S.

Throughout this book, in the context of convex sets and functions, weconsider the real vector space of Hermitian operators acting on some Hilbertspace. Then, an important example of a convex subset is the set of all positivesemi-definite operators. Indeed, if X and Y are positive semi-definite opera-tors, then λX + (1− λ)Y is a positive semi-definite operator for all λ ∈ [0, 1].From now on, we assume C to be a convex subset of the set of Hermitianoperators, and we use X, Y, and Z to denote arbitrary elements of C.

An element Z ∈ C is called an extreme point of C if Z cannot be writtenas a non-trivial convex combination of other vectors in C. Formally, Z is anextreme point if every decomposition of Z as the convex combination Z =λX + (1− λ)Y, such that λ ∈ (0, 1) (so that the decomposition is non-trivial),implies that X = Y = Z. An important fact is that every convex set is equalto the convex hull of its extreme points.

We now define convex functions.

Definition 2.16 Convex and Concave Functions

A function F : C → R defined on a convex subset C ⊆ L(H) is a convexfunction if, for all X, Y ∈ C and λ ∈ [0, 1], the following inequality holds

F(λX + (1− λ)Y) ≤ λF(X) + (1− λ)F(Y). (2.3.8)

A function F : C → R is a concave function if −F is a convex function.

It follows from an inductive argument using Definition 2.16 that a convexfunction F : C → R on a convex subset C ⊂ L(H) satisfies

F

(∑

x∈Xp(x)Xx

)≤ ∑

x∈Xp(x)F(Xx) (2.3.9)

for all sets Xxx∈X of elements in C and all functions p : X → [0, 1] definedon X such that p(x) ≥ 0 for all x ∈ X and ∑x∈X p(x) = 1.

• A function F : C → R defined on a convex subset C ⊆ L(H) is calledquasi-convex for all X, Y ∈ C and all λ ∈ [0, 1], the following inequalityholds

F(λX + (1− λ)Y) ≤ maxF(X), F(Y). (2.3.10)

43

Page 55: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

• A function F : C → R defined on a convex subset C ⊆ L(H) is calledquasi-concave if−F is quasi-convex. Specifically, F is called quasi-concaveif for all X, Y ∈ C and all λ ∈ [0, 1], the following inequality holds

F(λX + (1− λ)Y) ≥ minF(X), F(Y). (2.3.11)

2.3.4 Fenchel–Eggleston–Carathéodory Theorem

We mentioned above that the convex hull of a subset S of a real vector spaceis the convex set consisting of all convex combinations of the vectors in S.A fundamental result is that if the underlying vector space has dimension d,then, in order to obtain an element in the convex hull of S, it suffices to take aconvex combination of no more than d+ 1 elements of S. If S is connected andcompact, then no more than d elements are required. We state this formallyas follows:

Theorem 2.17 Fenchel–Eggleston–Carathéodory Theorem

Let S be a set of vectors in a real d-dimensional vector space (d < ∞), andlet conv(S) denote the convex hull of S. Then, an element v ∈ conv(S)if and only if v can be expressed as a convex combination of m ≤ d + 1elements in S. If S is connected and compact, then the same statementholds with m ≤ d.

PROOF: Please consult the Bibliographic Notes (Section 2.6).

2.3.5 Minimax Theorems

We often encounter expressions of the following form in quantum informa-tion theory:

infX∈S

supY∈S′

F(X, Y). (2.3.12)

The expression above contains both an infimum and a supremum over sub-sets S, S′ ⊆ L(H) of some real-valued function F : S × S′ → R. It can behelpful to think of the expression in (2.3.12) in terms of a two-player competi-tive game, in which F(X, Y) is the pay-off function. The antagonist can chooseX ∈ S, with the goal of minimizing the pay-off. The protagonist can choose

44

Page 56: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

Y ∈ S′, with the goal of maximizing the pay-off. The ordering in (2.3.12) cor-responds to the antagonist playing first and making the choice X ∈ S, and theprotagonist playing second, being able to choose Y ∈ S′ as a reaction to theantagonist’s choice of X ∈ S.

With the ordering above, it is thus intuitively advantageous for the pro-tagonist to reach a high pay-off when playing second (in reaction to the an-tagonist’s choice). This intuition is captured by the following “max-min in-equality”:

supY∈S′

infX∈S

F(X, Y) ≤ infX∈S

supY∈S′

F(X, Y), (2.3.13)

which always holds. We can prove this formally as follows. Let G(Y) :=infX∈S F(X, Y). Then for all X ∈ S, Y ∈ S′, we have that G(Y) ≤ F(X, Y). Itthen follows that supY∈S′ G(Y) ≤ supY∈S′ F(X, Y) by applying the definitionof supremum. Since this latter inequality holds for all X ∈ S, the definitionof infimum implies that supY∈S′ G(Y) ≤ infX∈S supY∈S′ F(X, Y). This finalinequality is equivalent to (2.3.13).

Many proofs that we present in this book require determining when thereverse inequality of (2.3.13) holds, i.e.,

infX∈S

supY∈S′

F(X, Y)?≤ sup

Y∈S′infX∈S

F(X, Y), (2.3.14)

which is known as the “min-max inequality.” If it holds, then the inequalityin (2.3.13) is saturated and becomes an equality. The game-theoretic interpre-tation of the situation when the reverse inequality holds is that, for the sets Sand S′ and the pay-off function F, there is no advantage to playing first; i.e.,the pay-off is the same regardless of who goes first as long as the protagonistand antagonist play optimal strategies. It is thus important to know underwhat conditions this reverse inequality holds.

We now present theorems for two classes of functions that tell us whenthe reverse inequality holds.

Theorem 2.18 Sion Minimax

Let S be a compact and convex subset of a normed vector space andlet S′ be a convex subset of a normed vector space. Let F : S× S′ → R

be a real-valued function such that1. The function F(X, ·) : S′ → R is upper semi-continuous and quasi-

concave on S′ for each X ∈ S.

45

Page 57: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

2. The function F(·, Y) : S → R is lower semi-continuous and quasi-convex on S for each Y ∈ S′.

Then,infX∈S

supY∈S′

F(X, Y) = supY∈S′

infX∈S

F(X, Y). (2.3.15)

Furthermore, the infimum can be replaced with a minimum.

PROOF: Please consult the Bibliographic Notes (Section 2.6).

Theorem 2.19 Mosonyi–Hiai Minimax

Let S be a compact normed vector space, let S′ ⊆ R, and let F : S× S′ →R∪ +∞,−∞. Suppose that

1. The function F(·, y) : S′ → R ∪ +∞,−∞ is lower semi-continuous for every y ∈ S′.

2. The function F(X, ·) : S → R ∪ +∞,−∞ is either monotonicallyincreasing or monotonically decreasing for every X ∈ S.

Then,infX∈S

supy∈S′

F(X, y) = supy∈S′

infX∈S

F(X, y). (2.3.16)

Furthermore, the infimum can be replaced with a minimum.

PROOF: Please consult the Bibliographic Notes (Section 2.6).

2.3.6 Probability Distributions

Throughout this book, we are concerned with discrete probability distribu-tions, and the following definitions suffice for our needs. A discrete probabil-ity distribution is a function p : X → [0, 1] defined on an finite alphabet X

such that p(x) ≥ 0 for all x ∈ X and ∑x∈X p(x) = 1. Formally, we can con-sider the alphabet X to be the set of realizations of a discrete random variableX : Ω → X from the space Ω of experimental outcomes, called the samplespace, to the set X. We then write pX to denote the probability distribution ofthe random variable X, i.e., pX(x) ≡ Pr[X = x].

The expected value or mean E[X] of a random variable X taking values in

46

Page 58: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

X ⊂ R is defined asE[X] = ∑

x∈Xx pX(x). (2.3.17)

For every function g : X → R, we define g(X) to be the random variableg X : Ω→ R with image g(x) : x ∈ X. Then,

E[g(X)] = ∑x∈X

g(x)pX(x). (2.3.18)

A useful fact is Markov’s inequality: if X is a non-negative random variable,then for all a > 0 we have

Pr[X ≥ a] ≤ E[X]

a. (2.3.19)

Jensen’s inequality is the following: if X is a random variable with finitemean, and f is a real-valued convex function acting on the output of X, then

f (E[X]) ≤ E[ f (X)]. (2.3.20)

This is a very special case of the more elaborate operator Jensen inequalitypresented previously in Theorem 2.15.

As we explain in Chapter 3, observables O in quantum mechanics (whichare merely Hermitian operators) generalize random variables, such that theirexpectation is given by E[O] ≡ 〈O〉ρ ≡ Tr[Oρ] for a given density opera-tor ρ. In this case, an application of the operator Jensen inequality from The-orem 2.15 leads to the following: for a Hermitian operator O, a density oper-ator ρ, and an operator convex function f ,

f (Tr[Oρ]) ≤ Tr[ f (O)ρ], (2.3.21)

which we can alternatively write as

f (〈O〉ρ) ≤ 〈 f (O)〉ρ . (2.3.22)

2.4 Semi-Definite Programming

Semi-definite programs (SDPs) constitute an important class of optimizationproblems that arise frequently in quantum information theory. An SDP isa constrained optimization problem in which the optimization variable is apositive semi-definite operator X, the objective function is linear in X, andthe constraint is an operator inequality featuring a linear function of X.

47

Page 59: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

Definition 2.20 Semi-Definite Program

Given a Hermiticity-preserving superoperator Φ and Hermitian opera-tors A and B, a semi-definite program (SDP) corresponds to two optimiza-tion problems. The first is the primal SDP, which is defined as

S(Φ, A, B) :=

supremum Tr[AX]subject to Φ(X) ≤ B,

X ≥ 0.(2.4.1)

The second optimization problem is the dual SDP, which is defined as

S(Φ, A, B) :=

infimum Tr[BY]subject to Φ†(Y) ≥ A,

Y ≥ 0.(2.4.2)

We often abbreviate S(Φ, A, B) and S(Φ, A, B) more compactly as follows:

S(Φ, A, B) = supX≥0Tr[AX] : Φ(X) ≤ B, (2.4.3)

S(Φ, A, B) = infY≥0Tr[BY] : Φ†(Y) ≥ A. (2.4.4)

A variable X for the primal SDP in (2.4.1) is called a feasible point if it ispositive semi-definite and satisfies the constraint Φ(X) ≤ B, and it is called astrictly feasible point if X is positive definite and the constraint is satisfied witha strict inequality, i.e., if Φ(X) < B. The same definitions apply to the dualSDP in (2.4.2): a variable Y is a feasible point if Y ≥ 0 and Φ†(Y) ≥ A, and it isstrictly feasible if Y > 0 and Φ†(Y) > A. By convention, if there is no primalfeasible operator X, then S(Φ, A, B) = −∞, and if there is no dual feasibleoperator Y, then S(Φ, A, B) = +∞. It is also possible for S(Φ, A, B) = ∞ orS(Φ, A, B) = −∞: a simple example of the first possibility is when A = 1 andΦ is the zero map (so that there are no constraints), and a simple example ofthe second possibility is when B = −1 and Φ is the zero map.

Proposition 2.21 Weak Duality

For every SDP corresponding to Φ, A, and B, the following weak dualityinequality holds

S(Φ, A, B) ≤ S(Φ, A, B). (2.4.5)

48

Page 60: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

PROOF: Let X ≥ 0 be primal feasible, and let Y ≥ 0 be dual feasible. Thenthe following holds

Tr[AX] ≤ Tr[Φ†(Y)X] = Tr[YΦ(X)] ≤ Tr[YB]. (2.4.6)

The first inequality follows from the assumption that Y is dual feasible, sothat it satisfies Φ†(Y) ≥ A, and by applying 2. of Lemma 2.13. The equalityfollows from applying (2.2.157). The last inequality follows from the assump-tion that X is primal feasible, so that it satisfies Φ(X) ≤ B, and by applying2. of Lemma 2.13. Since the inequality holds for all primal feasible X and forall dual feasible Y, we can take a supremum over the left-hand side of (2.4.6)and an infimum over the right-hand side of (2.4.6), and we thus arrive at theweak duality inequality in (2.4.5).

There is a deep connection between the weak duality inequality in Propo-sition 2.21 and the max-min inequality in (2.3.13). This is realized by intro-ducing the Lagrangian L(Φ, A, B, X, Y) for the SDP as follows:

L(Φ, A, B, X, Y) := Tr[AX] + Tr[BY]− Tr[Φ(X)Y]. (2.4.7)

Note that the following equalities hold, which are helpful in the discussionbelow:

L(Φ, A, B, X, Y) = Tr[AX] + Tr[(B−Φ(X))Y] (2.4.8)

= Tr[BY] + Tr[(A−Φ†(Y))X]. (2.4.9)

By first taking an infimum over Y ≥ 0 and then a supremum over X ≥ 0,we find that

supX≥0

infY≥0

L(Φ, A, B, X, Y) = S(Φ, A, B). (2.4.10)

This equality follows because

supX≥0

infY≥0

L(Φ, A, B, X, Y) = supX≥0

Tr[AX] + inf

Y≥0Tr[(B−Φ(X))Y]

. (2.4.11)

The inner infimum over Y ≥ 0 forces the outer search over X ≥ 0 to beover feasible X. If an infeasible X ≥ 0 is chosen, then this means that theconstraint Φ(X) ≤ B is violated, so that there exists a non-trivial negativeeigenspace of B−Φ(X). Let |ϕ〉 be a unit vector in this negative eigenspace.We can then pick Y = c|ϕ〉〈ϕ| for c > 0 and take the limit c → ∞ to setinfY≥0 Tr[(B − Φ(X))Y] to −∞. So a violation of the constraint Φ(X) ≤ B

49

Page 61: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

incurs an infinite cost for the outer search and thus forces the outer search tobe over feasible X. Thus, the equality in (2.4.10) follows.

If we instead first take a supremum over X ≥ 0 and then take an infimumover Y ≥ 0, it follows that

infY≥0

supX≥0

L(Φ, A, B, X, Y) = S(Φ, A, B). (2.4.12)

This time, the equality follows because

infY≥0

supX≥0

L(Φ, A, B, X, Y) = infY≥0

Tr[BY] + sup

X≥0Tr[(A−Φ†(Y))X]

(2.4.13)

Similar to what was argued previously, the inner supremum over X ≥ 0forces the outer search over Y ≥ 0 to be over feasible Y. If an infeasibleY ≥ 0 is chosen, then this means that the constraint Φ†(Y) ≥ A is violated, sothat A−Φ†(Y) has a non-trivial positive eigenspace. Let |ϕ〉 be a unit vectorin this positive eigenspace. We can then pick X = c|ϕ〉〈ϕ| for c > 0 and takethe limit c → ∞ to set supX≥0 Tr[(A − Φ†(Y))X] to +∞. So a violation ofthe constraint Φ†(Y) ≥ A incurs an infinite penalty for the outer search andthus forces the outer search to be over feasible Y. Thus, the equality in (2.4.12)follows.

Now by examining (2.3.13), (2.4.10), and (2.4.12), we see that the weakduality inequality in Proposition 2.21 can be understood as a consequence ofthe max-min inequality in (2.3.13).

The reverse inequality of (2.4.5) does not hold in general; if it does, it im-plies the following equality S(Φ, A, B) = S(Φ, A, B). Then we say that theSDP corresponding to Φ, A, and B has the strong duality property or that itsatisfies strong duality. Considering the above discussion in terms of the La-grangian of the SDP, we also can understand strong duality as being equiva-lent to a minimax theorem holding.

Theorem 2.22 Slater’s Condition

Slater’s condition is a sufficient condition for strong duality to hold, andit is given as follows:

1. If there exists X ≥ 0 such that Φ(X) ≤ B and there exists Y > 0such that Φ†(Y) > A, then S(Φ, A, B) = S(Φ, A, B) and there existsa primal feasible operator X for which Tr[AX] = S(Φ, A, B).

50

Page 62: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

2. If there exists Y ≥ 0 such that Φ†(Y) ≥ A and there exists X > 0such that Φ(X) < B, then S(Φ, A, B) = S(Φ, A, B) and there existsa dual feasible operator Y for which Tr[BY] = S(Φ, A, B).

For many SDPs of interest, it is straightforward to determine if Slater’scondition holds. We provide an example in Section 2.4.1.

Complementary slackness for SDPs is useful for understanding furtherconstraints on an optimal primal operator X and an optimal dual operator Y.

Proposition 2.23 Complementary Slackness of SDPs

For every SDP corresponding to Φ, A, and B, suppose that strong dualityholds. Then the following complementary slackness conditions hold forfeasible X and Y if and only if they are optimal:

YB = YΦ(X), (2.4.14)

Φ†(Y)X = AX. (2.4.15)

PROOF: On the one hand, suppose that X is primal feasible, that Y is dualfeasible, and that they satisfy (2.4.14)–(2.4.15). Then it is clear by inspecting(2.4.6) that the inequalities are saturated, thus implying that X is primal opti-mal and Y is dual optimal.

On the other hand, suppose that X is primal optimal and that Y is dualoptimal. Then, by this assumption, it follows that Tr[AX] = Tr[BY] so thatthe inequalities in (2.4.6) are saturated. This means that

Tr[(Φ†(Y)− A)X] = 0 (2.4.16)Tr[Y(B−Φ(X))] = 0. (2.4.17)

Since Φ†(Y)− A and X are positive semi-definite, the equality in (2.4.16) im-plies that (Φ†(Y)− A)X = 0, which is equivalent to (2.4.14). Similarly, sinceB−Φ(X) and Y are positive semi-definite, the equality in (2.4.17) implies thatY(B−Φ(X)) = 0, which is equivalent to (2.4.15).

If the matrices A and B and the map Φ involved in an SDP are of reason-able size, then the SDP can be computed efficiently using numerical solvers(specifically, the time required is polynomial in the size of these objects andpolynomial in the inverse of the numerical accuracy desired). SDPs arise fre-

51

Page 63: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

quently in quantum information, with some examples appearing in Chap-ter 3. Furthermore, SDPs appear in some of the upper bounds for rates ofquantum communication protocols that we consider in Parts II and III.

2.4.1 Example of an SDP

We now detail a simple example of an SDP that allows us to go through thesteps of proving that an optimization problem is an SDP, finding the dual SDP,and proving strong duality. Let H be a Hermitian operator, and consider thefollowing functions:

f (H) := supX1,X2≥0

Tr[H(X1 − X2)] : Tr[X1 + X2] ≤ 1 , (2.4.18)

f (H) := inft≥0t : −t1 ≤ H ≤ t1 . (2.4.19)

The quantities above can be computed via SDPs, and in fact, the followingequality holds

f (H) = f (H) = ‖H‖∞ . (2.4.20)

That is, f (H) is equal to the largest singular value of the Hermitian opera-tor H.

Let us now prove that f (H) can be computed using an SDP and that theequality in (2.4.20) holds. Given that the optimization in (2.4.18) is a maxi-mization, let us first show that (2.4.18) can be written in the form of S(Φ, A, B)in (2.4.1). Indeed if we let

X =

(X1 Z†

Z X2

), A =

(H 00 −H

), (2.4.21)

Φ(X) = Tr[X1 + X2], B = 1, (2.4.22)

then we have that

f (H) = supX≥0Tr[AX] : Φ(X) ≤ B . (2.4.23)

The constraint X ≥ 0 implies that X1, X2 ≥ 0. Furthermore, notice that theoperator Z appears neither in the objective function Tr[H(X1 − X2)] nor inthe constraint Tr[X1 + X2] ≤ 1. Thus, the operator Z plays no role in theoptimization, and so we can simply set Z = 0, so that

X =

(X1 00 X2

). (2.4.24)

52

Page 64: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

Thus, (2.4.18) is indeed an SDP in primal form.

Now recall from (2.2.85) that the spectral norm of H is given by the maxi-mum of the absolute values of H. In particular, we can write

‖H‖∞ = max |λmax| , |λmin| , (2.4.25)

where λmax and λmin are the maximum and minimum eigenvalues, respec-tively, of H. Note that we always have λmax ≥ λmin. Let |φmax〉 be an eigen-vector of H satisfying H|φmax〉 = λmax|φmax〉, and let |φmin〉 be an eigenvectorof H satisfying H|φmin〉 = λmin|φmin〉. Let us suppose at first that λmax ≥ 0.Then one feasible choice of X1 and X2 in (2.4.18) is X1 = |φmax〉〈φmax| andX2 = 0, and for this choice, we find that f (H) ≥ |λmax|. If λmax ≤ 0, then an-other feasible choice of X1 and X2 in (2.4.18) is X1 = 0 and X2 = |φmax〉〈φmax|,and for this choice, we find that f (H) ≥ |λmax|. Thus, we conclude that

f (H) ≥ |λmax| . (2.4.26)

Now suppose that λmin ≥ 0. Then a feasible choice of X1 and X2 in (2.4.18) isX1 = |φmin〉〈φmin| and X2 = 0, and for this choice, we find that f (H) ≥ |λmin|.If λmin ≤ 0, then another feasible choice of X1 and X2 in (2.4.18) is X1 = 0 andX2 = |φmin〉〈φmin|, and for this choice, we find that f (H) ≥ |λmin|. Thus, weconclude that

f (H) ≥ max |λmax| , |λmin| = ‖H‖∞ . (2.4.27)

It thus remains to prove the reverse inequality, namely, that f (H) ≤ ‖H‖∞.To prove this, let us show that f (H) in (2.4.19) is the dual to f (H). Wethus now have to find the map Φ†, which is dual to Φ. Since B = 1 andΦ(X) = Tr[X1 + X2] are scalars, we take Y = t to be a scalar also. Then wefind that

Tr[YΦ(X)] = t Tr[X1 + X2] (2.4.28)

= Tr[(

t1 00 t1

)(X1 00 X2

)](2.4.29)

= Tr[Φ†(Y)X], (2.4.30)

from which we conclude that

Φ†(Y) = Φ†(t) =(

t1 00 t1

). (2.4.31)

Plugging into the standard form of the dual in (2.4.2), we find that

f (H) = infY≥0

Tr[BY] : Φ†(Y) ≥ A

(2.4.32)

53

Page 65: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

= inft≥0

t :(

t1 00 t1

)≥(

H 00 −H

)(2.4.33)

= inft≥0t : t1 ≥ H, t1 ≥ −H (2.4.34)

= inft≥0t : −t1 ≤ H ≤ t1 . (2.4.35)

Since the constraint −t1 ≤ H ≤ t1 implies that t1 ≥ −t1 and this is in turnequivalent to t1 ≥ 0, it follows that f (H) = inft∈R t : −t1 ≤ H ≤ t1.

Let us recall property 3. of Lemma 2.13, which states that λmin1 ≤ H ≤λmax1. By combining with (2.4.27), we find that λmax1 ≤ ‖H‖∞ and λmin1 ≥−‖H‖∞ 1, which implies that

− ‖H‖∞ 1 ≤ H ≤ ‖H‖∞ 1. (2.4.36)

Thus, we see that ‖H‖∞ is a feasible choice for t in (2.4.19), which implies that

f (H) ≤ ‖H‖∞ . (2.4.37)

Now combining the inequalities in (2.4.27) and (2.4.37) along with theweak duality inequality from Proposition 2.21, which for our case impliesthat f (H) ≤ f (H), we conclude that the primal and dual optimal values areequal to each other and equal to the spectral norm of H:

f (H) = f (H) = ‖H‖∞ . (2.4.38)

We arrived at this conclusion by employing clever guesses for primal fea-sible and dual feasible points. In this case, it is possible because the problemis simple enough to begin with, and we could apply knowledge from linearalgebra to make these clever guesses. Although it is sometimes possible tomake clever guesses and arrive at analytical solutions like we did above, inmany cases it is not possible. In such cases, it can be helpful to check Slater’scondition in Theorem 2.22 explicitly in order to see if strong duality holds.So let us do so for this example. For the primal SDP in (2.4.18), a strictly fea-sible point consists of the choice X1 = α1

d and X2 = β1d such that α, β > 0

and α + β < 1, where d is the dimension of 1. Then we clearly have X1 > 0,X2 > 0, and Tr[X1 + X2] < 1, so that X1 and X2 are strictly feasible as claimed.A feasible point for the dual consists of the choice γ ≥ ‖H‖∞. Thus, strongduality holds, further confirming that f (H) = f (H), as shown above.

54

Page 66: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

We finally remark about the complementary slackness conditions fromProposition 2.23, which apply to optimal primal X and optimal dual Y. Inthis case, the conditions reduce to

t = t Tr[X1 + X2], (2.4.39)(tI 00 tI

)(X1 00 X2

)=

(H 00 −H

)(X1 00 X2

), (2.4.40)

and the latter is the same as the following two separate conditions:

tX1 = HX1, −tX2 = HX2. (2.4.41)

If we have prior knowledge about the operator H, e.g., that one of its eigen-values is non-zero, then we conclude that the optimal t 6= 0 and the conditionin (2.4.39) implies that Tr[X1 + X2] = 1. In this case, we can conclude thatthe inequality constraint in (2.4.18) is loose and it suffices to optimize overX1 and X2 satisfying the constraint with equality. The conditions in (2.4.41)indicate that the image of the optimal X1 should be in the eigenspace of Hwith optimal eigenvalue t, and the image of the optimal X2 should be in theeigenspace of H with optimal eigenvalue−t. Observe that these complemen-tary slackness conditions are consistent with the choices that we made above.

As a final remark, if H is actually positive semi-definite, then the lowerbound constraint in (2.4.19) is unnecessary. Letting P be a positive semi-definite operator, we thus find that

f (P) = ‖P‖∞ = inf t : P ≤ t1 . (2.4.42)

Note that in this case, ‖P‖∞ is the largest eigenvalue of P.

2.5 The Symmetric Subspace

Given a d-dimensional Hilbert space H and an n-fold tensor product H⊗n

of H, it is often important to consider a permutation of the individual Hilbertspaces in the n-fold tensor product. This is especially the case in quantuminformation theory, where we often assume that the resources involved havepermutation symmetry. These permutations can be implemented using a uni-tary representation of the symmetric group on n elements.

The symmetric group on n elements, denoted by Sn, is defined to be the setof permutations of the set 1, 2, . . . , n. A permutation in Sn is an invertible

55

Page 67: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

function π : 1, 2, . . . , n → 1, 2, . . . , n that describes how each elementin the set 1, 2, . . . , n should be rearranged or permuted. An example ofa permutation in S3 is the function π such that π(1) = 3, π(2) = 1, andπ(3) = 2. Since there are n! ways to permute n distinct elements, it followsthat the set Sn contains n! elements.

Given a permutation π ∈ Sn and an orthonormal basis |i〉d−1i=0 for H, we

define the unitary permutation operators Wπ acting on H⊗n by

Wπ|i1, i2, . . . , in〉 = |iπ(1), iπ(2), . . . , iπ(n)〉, 0 ≤ i1, i2, . . . , in ≤ d− 1. (2.5.1)

Since the set |i1, i2, . . . , in〉 : 0 ≤ i1, i2, . . . , in ≤ d− 1 is an orthonormal basisfor H⊗n, the definition in (2.5.1) extends to every vector in H⊗n by linearity.The operators Wππ∈Sn constitute a unitary representation of Sn, in the sensethat

(Wπ)† = Wπ−1, Wπ1Wπ2 = Wπ1π2 (2.5.2)

for all π, π1, π2 ∈ Sn.

Given a d-dimensional Hilbert space H and the unitary representationWππ∈Sn of Sn defined in (2.5.1), we are interested in the subspace of vec-tors |ψ〉 ∈ H⊗n that are invariant under permutations, i.e., Wπ|ψ〉 = |ψ〉 for allπ ∈ Sn. We call this subspace the symmetric subspace of H⊗n, and it is formallydefined as

Symn(H) := span|ψ〉 ∈ H⊗n : Wπ|ψ〉 = |ψ〉 for all π ∈ Sn. (2.5.3)

Vectors |ψ〉 ∈ Symn(H) are sometimes called symmetric. The subspace

ASymn(N) := span|ψ〉 ∈ H⊗n : Wπ|ψ〉 = sgn(π)|ψ〉 for all π ∈ Sn(2.5.4)

is called the anti-symmetric subspace of H⊗n, where sgn(π) is the sign of thepermutation π, defined as sgn(π) = (−1)T(π) where T(π) is the number oftranspositions into which π can be decomposed4.

The operator

ΠSymn(H) :=1n! ∑

π∈Sn

Wπ (2.5.5)

4A transposition is a permutation that permutes only two elements of the set 1, 2, . . . , n.Any permutation π ∈ Sn can be decomposed into a product of transpositions. Although thisdecomposition is in general not unique, the parity of the number T(π) of transpositions intowhich π can be decomposed is unique, so that sgn(π) is well defined.

56

Page 68: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

is the orthogonal projection onto the symmetric subspace of H⊗n, while

ΠASymn(H) :=

1n! ∑

π∈Sn

sgn(π)Wπ (2.5.6)

is the orthogonal projection onto the anti-symmetric subspace of H⊗n. Notethat Symn(H) and ASymn(H) are orthogonal subspaces of H⊗n, because

ΠSymn(H)ΠASymn(H) = 0. (2.5.7)

This implies that 〈ψs|ψa〉 = 0 for all |ψs〉 ∈ Symn(H) and |ψa〉 ∈ ASymn(H).

We focus primarily on the symmetric subspace of H⊗n in this book, andso we now provide some additional facts about it.

The following set of vectors constitutes an orthonormal basis for the sym-metric subspace Symn(H) corresponding to the d-dimensional Hilbert space H:

|n1, n2, . . . , nd〉:=

1√n!(

∏dj=1 nj!

) ∑π∈Sn

Wπ(|0〉⊗n1 ⊗ |1〉⊗n2 ⊗ · · · ⊗ |d− 1〉⊗nd), (2.5.8)

where n1, n2, . . . , nd ≥ 0 are such that ∑dj=1 nj = n. We often call this the

occupation number basis for Symn(H). The number of elements in this basis isequal to the number of ways of selecting n elements, with repetition, from aset of d distinct elements. This number is equal to (d+n−1

n ). Consequently, thedimension of Symn(H) is

dim(Symn(H)) =

(d + n− 1

n

)=

(d + n− 1

d− 1

). (2.5.9)

REMARK: The direct sum vector space

FB(H) :=∞⊕

n=0Symn(H) (2.5.10)

is called the bosonic Fock space. (Note that Sym0(H) is the set of complex scalars, i.e.,Sym0(H) = C.) It is an infinite-dimensional Hilbert space that is relevant for the study ofquantum optical and other continuous-variable quantum systems.

An important fact that we state without proof (please consult the Bibilio-graphic Notes in Section 2.6) is that for every d-dimensional Hilbert space H,

ΠSymn(H) =

(d + n− 1

n

) ∫ψ⊗n dψ, (2.5.11)

57

Page 69: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

where the integral on the right-hand side is taken over all unit vectors withrespect to the Haar measure.

REMARK: The measure dψ is also called the Fubini-Study measure. A concrete coordinaterepresentation of the measure can be obtained by using the following parameterization ofunit vectors |ψ〉 in a d-dimensional Hilbert space H:

|ψ〉 =d−1

∑k=0

rkeiϕk |k〉, (2.5.12)

where 0 ≤ ϕk ≤ 2π and rk ≥ 0 for all 0 ≤ k ≤ d − 1. Furthermore, since |ψ〉 is a unitvector, we require that ∑d−1

k=0 r2k = 1. The conditions on the coefficients rk imply that they

parameterize the positive octant of a sphere in d dimensions. As such, each rk can bewritten as

r0 = cosθ1

2sin

θ2

2sin

θ3

2· · · sin

θd−12

, (2.5.13)

r1 = sinθ1

2sin

θ2

2sin

θ3

2· · · sin

θd−12

, (2.5.14)

r2 = cosθ2

2sin

θ3

2· · · sin

θd−12

, (2.5.15)

... (2.5.16)

rd−1 = cosθd−1

2, (2.5.17)

where 0 ≤ θi ≤ π. Similarly, the angles ϕk parameterize a torus in d dimensions. TheFubini-Study measure dψ is then the volume element of the coordinate system formedfrom the rk and the coordinate system formed by the ϕk:

dψ =(d− 1)!(2π)d−1

d−1

∏i=1

cosθi2

sin2i−1 θi2

dθi dϕi. (2.5.18)

(Please consult the Bibliographic Notes in Section 2.6 for details.) In the case d = 2, wehave:

dψ =1

2πcos

θ

2sin

θ

2dθ dϕ (d = 2). (2.5.19)

We thus see that, in the two-dimensional case, integrating with respect to the measure dψ isequivalent to integrating over the unit sphere in three dimensions. In quantum mechanics,this unit sphere is often referred to as the Bloch sphere.

We often consider the case that the Hilbert space H is a tensor product oftwo Hilbert spaces, i.e., H = HA ⊗HB ≡ HAB, with HA a dA-dimensionalHilbert space and HB a dB-dimensional Hilbert space. As we have seen above,if |i〉AdA−1

i=0 is an orthonormal basis for HA and |j〉BdB−1j=0 is an orthonormal

basis for HB, then |i, j〉AB ≡ |i〉A ⊗ |j〉B : 0 ≤ i ≤ dA − 1, 0 ≤ j ≤ dB − 158

Page 70: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

is an orthonormal basis for HAB. In this case, if we consider the n-fold tensorproduct H⊗n

AB, then the unitary representation Wπ(AB)nπ∈Sn defined in (2.5.1)

acts as follows:

Wπ(AB)n(|i1, j1〉A1B1 ⊗ |i2, j2〉A2B2 ⊗ · · · ⊗ |in, jn〉AnBn)

= |iπ(1), jπ(1)〉A1B1 ⊗ |iπ(2), jπ(2)〉A2B2 ⊗ · · · ⊗ |iπ(n), jπ(n)〉AnBn , (2.5.20)

for all 0 ≤ i1, i2, . . . , in ≤ dA− 1 and all 0 ≤ j1, j2, . . . , jn ≤ dB− 1. However, byrearranging the tensor factors, we find that the right-hand side of the aboveequation can be written as

WπAn |i1, i2, . . . , in〉A1···An ⊗Wπ

Bn |j1, j2, . . . , jn〉B1B2···Bn , (2.5.21)

where WπAnπ∈Sn and Wπ

Bnπ∈Sn are the unitary representations of Sn act-ing on H⊗n

A and H⊗nB , respectively. We can thus write the projection onto

Symn(HA ⊗HB) as

ΠSymn(HA⊗HB)=

1n! ∑

π∈Sn

Wπ(AB)n ≡ 1

n! ∑π∈Sn

WπAn ⊗Wπ

Bn . (2.5.22)

2.6 Bibliographic Notes

The study of inner product spaces, including Hilbert spaces, is the primaryfocus of functional analysis, for which we refer to the following books: (Reedand Simon, 1981; Kreyszig, 1989; Hall, 2013). In the case of finite-dimension-al Hilbert spaces, which is what we consider throughout this book, many ofthe concepts studied in functional analysis reduce to those studied in linearalgebra and matrix analysis. For these topics, we refer to Bhatia (1997); Hornand Johnson (2013); Strang (2016).

The generalized Gell-Mann matrices discussed after (2.2.38) were presentedby Hioe and Eberly (1981); Bertlmann and Krammer (2008).

A review of operator monotone, operator concave, and operator convexfunctions is given by Bhatia (1997). The short course of Carlen (2010) is alsohelpful. For proofs of the properties listed immediately after Definition 2.5,see (Bhatia, 1997, Section V). Lemma 2.10 can be found in (Audenaert andEisert, 2005, Lemma 4). A proof of (2.2.106) can be found in (Bhatia, 1997,Section IV & Exercise IV.2.12). A proof of Proposition 2.12 can be found in

59

Page 71: Principles of Quantum Communication Theory: A Modern Approach

Chapter 2: Mathematical Tools

(Müller-Lennert et al., 2013, Lemma 12). Lemma 2.14 was proved by Lieb andThirring (1976); Araki (1990). The Courant–Fischer–Weyl minimax principle,which is invoked in the proof of property 4 of Lemma 2.13, is presented as(Bhatia, 1997, Corollary III.1.2).

A proof of the operator Jensen inequality (Theorem 2.15) was given byHansen and Pedersen (2003). In presenting the implication 1. ⇒ 3. of Theo-rem 2.15, we followed the proof given by (Fujii et al., 2004, Theorem 3).

For an introduction to real analysis, see (Rudin, 1976).

For an introduction to convex analysis, see (Rockafellar, 1970; Boyd andVandenberghe, 2004), and for a proof of the Fenchel–Eggleston–Carathéodorytheorem, see (Eggleston, 1958; Rockafellar, 1970).

Sion’s minimax theorem (Theorem 2.18) is due to Sion (1958), and it is ageneralization of a minimax theorem of von Neumann (1928). A short proofof Sion’s minimax theorem can be found in (Komiya, 1988). The minimaxtheorem in Theorem 2.19 was presented by Mosonyi and Hiai (2011).

For an introduction to probability theory, see (Feller, 1968; Ross, 2019).Proofs of Markov’s inequality (2.3.19) and Jensen’s inequality (2.3.20) can befound in, e.g., (Fristedt and Gray, 1997).

For further details on semi-definite programming, see Vandenberghe andBoyd (1996); Watrous (2018). Various polynomial-time algorithms for semi-definite programming were developed by Khachiyan (1980); Arora et al. (2005);Arora and Kale (2007); Arora et al. (2012); Lee et al. (2015). A proof of Slater’sTheorem (Theorem 2.22) can be found in (Boyd and Vandenberghe, 2004, Sec-tion 5.3.2).

For further details about the symmetric subspace of a tensor product offinite-dimensional Hilbert spaces, as well as for a proof of (2.5.11), see (Har-row, 2013) (see also (Bengtsson and Zyczkowski, 2017, Section 12.7)). Furtherdetails about the Fubini-Study measure dψ introduced in (2.5.11) and elabo-rated upon in the remark immediately below it may be found in (Bengtssonand Zyczkowski, 2017, Chapter 4).

60

Page 72: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3

Overview of QuantumMechanics

This chapter provides an overview of quantum mechanics, placing particularemphasis on those aspects of quantum mechanics that are useful for the com-munication protocols that we discuss in later chapters. Quantum mechanicsis the backbone for quantum information processing, and many aspects of itcannot be explained by classical reasoning. For example, there is no strongclassical analogue for pure quantum states or entanglement, and this leadsto stark differences between what is possible in the classical and quantumworlds. However, at the same time, it is important to emphasize that all ofclassical information theory is subsumed by quantum information theory, sothat whatever is possible with classical information processing is also pos-sible with quantum information processing. As such, quantum informationsubsumes classical information while allowing for richer possibilities.

The mathematical description of quantum systems can be summarized bythe following axioms. Each of these axioms is elaborated upon in the sectionindicated.

1. Quantum systems: A quantum system A is associated with a Hilbert space HA.The state of the system A is described by a density operator, which is aunit-trace, positive semi-definite linear operator acting on HA. (See Sec-tion 3.1.)

2. Bipartite quantum systems: For distinct quantum systems A and B withassociated Hilbert spaces HA and HB, the composite system AB is asso-

61

Page 73: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

ciated with the Hilbert space HA ⊗HB. (See Section 3.1.1.)

3. Measurement: The measurement of a quantum system A is described by apositive operator-valued measure (POVM) Mxx∈X, which is defined to be acollection of positive semi-definite operators indexed by a finite alphabetsatisfying ∑x∈X Mx = 1HA .1 X If the system is in the state ρ, then theprobability Pr[x] of obtaining the outcome x is given by the Born rule as

Pr[x] = Tr[Mxρ]. (3.0.1)

Furthermore, a physical observable O corresponds to a Hermitian operatoracting on the underlying Hilbert space. Recall from the spectral theorem(Theorem 2.4) that O has a spectral decomposition as follows:

O = ∑λ∈spec(O)

λ Πλ, (3.0.2)

where spec(O) is the set of distinct eigenvalues of O and Πλ is a spectralprojection. A measurement of O is described by the POVM Πλλ, whichis indexed by the distinct eigenvalues λ of O. The expected value 〈O〉ρ ofthe observable O when the state is ρ is given by

〈O〉ρ := Tr[Oρ]. (3.0.3)

(See Section 3.1.2.)

4. Evolution: The evolution of the state of a quantum system is describedby a quantum channel, which is a linear, completely positive, and trace-preserving map acting on the state of the system. (See Section 3.2.)

Note that the second axiom for the description of bipartite quantum sys-tems is sufficient to conclude that the multipartite quantum system A1A2 · · ·Ak, comprising k distinct quantum systems A1, A2, . . . , Ak, is associated withthe Hilbert space HA1 ⊗HA2 ⊗ · · · ⊗HAk .

3.1 Quantum Systems, States, and Measurements

Each quantum system is associated with a Hilbert space. In this book, weconsider only finite-dimensional quantum systems, that is, quantum systems

1POVMs need not contain a finite number of elements, but we consider POVMs with afinite number of elements exclusively throughout this book.

62

Page 74: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

described by finite-dimensional Hilbert spaces. In the following, we provide amathematical description of all possible finite-dimensional quantum systems,along with examples of how these systems can be physically realized.

1. Qubit systems: The qubit is perhaps the most fundamental quantum sys-tem and is the quantum analogue of the (classical) bit. Every physicalsystem with two distinct degrees of freedom obeying the laws of quan-tum mechanics can be considered a qubit system. The Hilbert space as-sociated with a qubit system is C2, whose standard orthonormal basis isdenoted by |0〉, |1〉. Three common ways of physically realizing qubitsystems are as follows:

(a) The two spin states of a spin-12 particle.

(b) Two distinct energy levels of an atom, such as the ground state andone of the excited states.

(c) Clockwise and counter-clockwise directions of current flow in a su-perconducting electronic circuit.

2. Qutrit systems: A qutrit system is a quantum system consisting of threedistinct physical degrees of freedom. The Hilbert space of a qutrit is C3,with the standard orthonormal basis denoted by |0〉, |1〉, |2〉. Qutritsystems are less commonly considered than qubit systems for implemen-tations, although one important example of an implementation of a qutritsystem occurs in quantum optical systems, which we discuss below. Likequbit systems, qutrit systems can also be physically realized using, for ex-ample, the spin states of a spin-1 atom or three distinct energy levels ofan atom.

3. Qudit systems: A qudit system is a quantum system with d distinct de-grees of freedom and is described by the Hilbert space Cd, with the stan-dard orthonormal basis denoted by |0〉, |1〉, . . . , |d− 1〉. The spin statesof every spin-j atom can be used to realize a qudit system with d = 2j+ 1.Another physical realization of a qudit system is with the d distinct en-ergy levels of an atom.

4. Quantum optical systems: An important quantum system, particularly forthe implementation of many quantum communication protocols, is a qu-antum optical system. By a quantum optical system we mean a physicalsystem, such as an optical cavity or a fiber-optic cable, in which modes oflight, with photons as information carriers, propagate. A mode of light

63

Page 75: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

has a well defined momentum, frequency, polarization, and spatial direc-tion.

Formally, a quantum optical system with d distinct modes is describedby the Fock space FB(C

d), which is a Hilbert space equipped with theorthonormal occupation number basis |n1, . . . , nd〉 : n1, . . . , nd ≥ 0,where nj, for 1 ≤ j ≤ d, indicates the number of photons occupied inmode j. See (2.5.8) and the surrounding discussion for a brief review ofthe occupation number basis and Fock space.

The Fock space is infinite dimensional, but by restricting to particularsubspaces, it is possible use photons to physically realize finite-dimensionalquantum systems. The following two realizations of a qubit system areparticularly important:

(a) A single-mode optical system, with Hilbert space FB(C), restricted tothe subspace spanned by the states |0〉, |1〉, consisting of either zeroor one photon occupied in the mode. The state |0〉 corresponding tono photons is commonly called the vacuum state of the mode.

(b) A two-mode optical system, with Hilbert space FB(C2), restricted to

the subspace spanned by the orthogonal states |0, 1〉, |1, 0〉, con-sisting of only one photon in total occupying either one of the twomodes. This realization of a qubit system is commonly called thedual-rail encoding because it makes use of two modes of light. Twodistinct polarization degrees of freedom of photons, such as horizon-tal and vertical polarizations, are commonly used as the two modesin dual-rail encodings of a qubit. One then usually lets |H〉 ≡ |0, 1〉and |V〉 ≡ |1, 0〉 denote a horizontally- and vertically-polarized pho-ton, respectively.By considering the three-dimensional subspace spanned by the states|0, 0〉, |0, 1〉, |1, 0〉, that is, the dual-rail qubit system with the addi-tional orthogonal vacuum state |0, 0〉 of the two modes, we obtain aphysical realization of a qutrit system. This particular realization of aqutrit system is relevant for communication protocols in the contextof the erasure channel, which is discussed in Section 3.2.5.

Having discussed how a quantum system is mathematically described,let us now move on to the mathematical description of the state of a quantumsystem.

64

Page 76: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Definition 3.1 Quantum State

The state of a quantum system is described by a density operator actingon the underlying Hilbert space of the quantum system. A density op-erator is a unit-trace, positive semi-definite linear operator. Throughoutthe book, we identify a state with its corresponding density operator.We denote the set of density operators on a Hilbert space H as D(H).

We typically use the Greek letters ρ, σ, τ, or ω to denote quantum states.

The set of quantum states is a convex set, which by definition means thatif ρ1 and ρ2 are quantum states, then so is ρ = λρ1 + (1 − λ)ρ2 for everyλ ∈ [0, 1]. Indeed, ρ is positive semi-definite since the sum of positive semi-definite operators is positive semi-definite, and Tr[ρ] = 1. More generally,for every alphabet X and set ρxx∈X of quantum states, along with everyprobability distribution p : X → [0, 1], the following convex combination is aquantum state:

ρ = ∑x∈X

p(x)ρx. (3.1.1)

The extremal points in the convex set of quantum states are called purestates. A pure state is a unit-rank projection onto a vector in the Hilbert space.Concretely, pure states are of the form |ψ〉〈ψ| for every normalized vector|ψ〉 ∈ H. For convenience, we sometimes denote |ψ〉〈ψ| simply as ψ and referto the vector |ψ〉 itself as a pure state. Since every element of a convex set canbe written as a convex combination of the extremal points in the set, everyquantum state ρ that is not a pure state can be written as

ρ = ∑k

pk|ψk〉〈ψk| (3.1.2)

for some set |ψk〉k of pure states, where pk ≥ 0 and ∑k pk = 1.

States ρ that are not pure are called mixed states because they can be thoughtof as arising from the lack of knowledge of which pure state from the set|ψk〉k the system has been prepared. Note that the decomposition in (3.1.2)of a quantum state into pure states is generally not unique.

A state ρ is called maximally mixed if the set |ψk〉k consists of d orthonor-mal states and the probabilities pk are uniform (i.e., pk = 1

d for all k). In thiscase, it follows that

ρ =1dd

=: πd. (3.1.3)

65

Page 77: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

To see this, we can expand every unit vector |φ〉 ∈ span|ψk〉 : 1 ≤ k ≤ das |φ〉 = ∑d

k=1 ck|ψk〉, with ck = 〈ψk|φ〉. Writing ρ = 1d ∑d

k=1 |ψk〉〈ψk|, we thenobtain

ρ|φ〉 = 1d

d

∑k=1|ψk〉〈ψk|φ〉 =

1d

d

∑k=1

ck|ψk〉 =1d|φ〉. (3.1.4)

The state ρ thus acts as the operator 1dd on the d-dimensional Hilbert space

spanned by the vectors |ψk〉 : 1 ≤ k ≤ d. The state πd is called maximallymixed because it corresponds to having no knowledge about which state fromthe set |ψk〉k the system is in. This lack of knowledge can be quantifiedby using quantum entropy, and in Chapter 4, we find that the maximallymixed state πd has the largest entropy among all states of a finite-dimensionalsystem of dimension d, thus justifying the term “maximally mixed.”

3.1.1 Bipartite States

The joint state of two distinct quantum systems A and B is described by abipartite quantum state ρAB ∈ D(HA ⊗HB). For brevity, the joint Hilbertspace HA ⊗HB of the composite system AB is denoted by HAB.

Let |i〉AdA−1i=0 and |j〉BdB−1

j=0 be orthonormal bases for HA and HB, re-spectively. Then,

|i〉A ⊗ |j〉B : 0 ≤ i ≤ dA − 1, 0 ≤ j ≤ dB − 1 (3.1.5)

is an orthonormal basis for HAB. For brevity, we often write |i, j〉AB instead of|i〉A ⊗ |j〉B. Every pure state |ψ〉AB ∈ HAB can thus be written as

|ψ〉AB =dA−1

∑i=0

dB−1

∑j=0

αi,j|i, j〉AB, (3.1.6)

where αi,j = 〈i, j|ψ〉 ∈ C and ∑dA−1i=0 ∑dB−1

j=0

∣∣αi,j∣∣2 = 1. The Schmidt decompo-

sition provides a useful decomposition of every bipartite pure state.

Theorem 3.2 Schmidt Decomposition

Let |ψ〉AB be a pure bipartite state as in (3.1.6). Let XA→B be the oper-ator with matrix elements 〈j|BX|i〉A = αi,j, and let r = rank(X). Then,there exist strictly positive Schmidt coefficients λk > 0r

k=1 satisfying

66

Page 78: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

∑rk=1 λk = 1 and orthonormal vectors |ek〉Ar

k=1 and | fk〉Brk=1 such

that

|ψ〉AB =r

∑k=1

√λk |ek〉A ⊗ | fk〉B. (3.1.7)

The quantity r is called the Schmidt rank, and it holds that r ≤mindA, dB.

PROOF: This is a direct application of the singular value decomposition (The-orem 2.1). Consider the operator X defined in the statement of the theorem,having matrix elements 〈j|BX|i〉A = αi,j, so that

X =dA−1

∑i=0

dB−1

∑j=0

αi,j|j〉B〈i|A. (3.1.8)

Note that the vector |ψ〉AB in (3.1.6) is nothing but vec(X) (see (2.2.24)). Then,by the singular-value decomposition (Theorem 2.1), we can write X as

X =r

∑k=1

sk| fk〉B〈gk|A, (3.1.9)

where sk are strictly positive numbers, r = rank(X) ≤ mindA, dB, and| fk〉B : 1 ≤ k ≤ r and |gk〉A : 1 ≤ k ≤ r are sets of orthonormal vectorson HB and HA, respectively. Letting λk = s2

k, we find that

X =r

∑k=1

√λk| fk〉B〈gk|A. (3.1.10)

Upon vectorizing X as written in this form (see (2.2.24)), we find that

|ψ〉AB = vec(X) =r

∑k=1

√λk |gk〉A ⊗ | fk〉B. (3.1.11)

Letting |ek〉A := |gk〉A, the result follows.

A pure state |ψ〉〈ψ|AB is a product state if it is possible to find unit vec-tors |φ〉A and |ϕ〉B such that |ψ〉〈ψ|AB = |φ〉〈φ|A ⊗ |ϕ〉〈ϕ|B. A pure state isentangled if it is not a product state.

The Schmidt decomposition gives a simple way to determine if a purestate is entangled. If the Schmidt rank is equal to one, then the state is product.

67

Page 79: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

If the Schmidt rank is two or larger, then the state is entangled. Thus, decidingentanglement is easy for pure states, in this sense.

For more general, mixed states of quantum systems A and B, entangle-ment is a more involved notion, which we develop later in Section 3.1.1.4.

3.1.1.1 The Partial Trace

The trace of a linear operator X acting on a d-dimensional Hilbert space canbe written as

Tr[X] =d−1

∑i=0〈i|X|i〉, (3.1.12)

where |i〉d−1i=0 is some orthonormal basis. We interpret this expression as the

sum of the diagonal elements of the matrix representation of X in the basis|i〉d−1

i=0 , although it is straightforward to show that the trace is independentof the choice of basis used to evaluate it.

The trace is known as the partial trace when it acts on one subsystem of abipartite linear operator XAB.

Definition 3.3 Partial Trace

For every bipartite linear operator XAB, the partial trace over B is denotedby TrB ≡ idA ⊗ TrB and is defined as

TrB[XAB] = (idA ⊗ Tr)(XAB)

=dB−1

∑j=0

(1A ⊗ 〈j|B)XAB(1A ⊗ |j〉B).(3.1.13)

Similarly, the partial trace over A is denoted by TrA ≡ TrA ⊗ idB and isdefined as

TrA[XAB] = (Tr⊗ idB)(XAB)

=dA−1

∑i=0

(〈i|A ⊗ 1B)XAB(|i〉A ⊗ 1B).(3.1.14)

68

Page 80: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

REMARK: For every linear operator XAB acting on HAB, we can define the partial traceTrB[XAB] more abstractly as the unique linear operator XA acting on HA such that

Tr[(MA ⊗ 1B)XAB] = Tr[MAXA] (3.1.15)

for every linear operator MA on HA. If we let XAB be the state ρAB and MA be a Hermitianoperator, then we can interpret this equation physically in the following way: in order todetermine the expectation value of an observable MA acting on only one of the subsystems(in this case, the A subsystem), it suffices to know the reduced state ρA of the subsystemA rather than the entire joint state ρAB of the total system.

Given a state ρAB for the composite system AB, we are often interestedin determining the state of only one of its subsystems. The partial traceTrB takes a state ρAB acting on the composite space HAB and returns a stateρA ≡ TrB[ρAB] acting on the space HA. The partial trace is therefore the math-ematical operation used to determine the state of one of the subsystems giventhe state of a composite system comprising two or more subsystems, and itcan be thought of as the action of “discarding” one of the subsystems. Thepartial trace generalizes the notion of marginalizing a joint probability distri-bution. Later, in Section 3.2, we see that partial trace is a particular kind ofquantum channel corresponding to this discarding action.

Now, recall from Chapter 2 that, given orthonormal bases |i〉AdA−1i=0 and

|j〉BdB−1j=0 for HA and HB, respectively, we can define the orthonormal bases

|i〉〈i′|A : 0 ≤ i, i′ ≤ dA − 1, |j〉〈j′|B : 0 ≤ j, j′ ≤ dB − 1, (3.1.16)

for the linear operators on HA and HB, respectively. Then, the set

|i, j〉〈i′, j′|AB ≡ |i〉〈i′|A ⊗ |j〉〈j′|B : 0 ≤ i, i′ ≤ dA − 1, 0 ≤ j, j′ ≤ dB − 1(3.1.17)

forms an orthonormal basis for the linear operators on HAB. Since the actionof every linear operator is fully defined by its action on basis elements, wecan use the definition above to see that the action of the partial trace TrB onthis basis is

TrB[|i〉〈i′|A ⊗ |j〉〈j′|B] = |i〉〈i′|Aδj,j′ (3.1.18)

for all 0 ≤ i, i′ ≤ dA − 1, 0 ≤ j, j′ ≤ dB − 1. Similarly, for TrA, we obtain

TrA[|i〉〈i′|A ⊗ |j〉〈j′|B] = δi,i′ |j〉〈j′|B (3.1.19)

for all 0 ≤ i, i′ ≤ dA − 1, 0 ≤ j, j′ ≤ dB − 1. Then, by decomposing every

69

Page 81: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

linear operator XAB as

XAB =dA−1

∑i,i′=0

dB−1

∑j,j′=0

Xi,j;i′,j′ |i〉〈i′|A ⊗ |j〉〈j′|B

=dA−1

∑i,i′=0

dB−1

∑j,j′=0

Xi,j;i′,j′ |i, j〉〈i′, j′|AB,

(3.1.20)

where Xi,j;i′,j′ := 〈i, j|XAB|i′, j′〉, we find that

TrB[XAB] =dA−1

∑i,i′=0

(dB−1

∑j=0

Xi,j;i′,j

)|i〉〈i′|A, (3.1.21)

TrA[XAB] =dB−1

∑j,j′=0

(dA−1

∑i=0

Xi,j;i,j′

)|j〉〈j′|B. (3.1.22)

For every bipartite linear operator XAB, we let

XA ≡ TrB[XAB] and XB ≡ TrA[XAB] (3.1.23)

denote its partial traces. For states, we also use the terms marginal states orreduced states to refer to their partial traces.

An immediate consequence of the Schmidt decomposition theorem is thatthe marginal states ρA := TrB[|ψ〉〈ψ|AB] and ρB := TrA[|ψ〉〈ψ|AB] of everypure state |ψ〉AB have the same non-zero eigenvalues. Indeed, using (3.1.7),we find that

ρA =r

∑k,k′=1

√λkλk′TrB[|ek〉〈ek′ |A ⊗ | fk〉〈 fk′ |B] (3.1.24)

=r

∑k,k′=1

√λkλk′ |ek〉〈ek′ |Aδk,k′ (3.1.25)

=r

∑k=1

λk|ek〉〈ek|A, (3.1.26)

and ρB =r

∑k,k′=1

√λkλk′TrA[|ek〉〈ek′ |A ⊗ | fk〉〈 fk′ |B] (3.1.27)

=r

∑k,k′=1

√λkλk′δk,k′ | fk〉〈 fk′ |B (3.1.28)

70

Page 82: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

=r

∑k=1

λk| fk〉〈 fk|B, (3.1.29)

in which the equalities in (3.1.26) and (3.1.29) contain spectral decompositionsof ρA and ρB.

3.1.1.2 Purifications and Extensions

One of the most useful concepts in quantum information is the notion of pu-rification. There is no strong classical analogue of this concept.

Definition 3.4 Purification

Let ρA be a state of a system A. A purification of ρA is a pure state|ψ〉〈ψ|RA for a bipartite system RA such that

TrR[|ψ〉〈ψ|RA] = ρA. (3.1.30)

We often call the reference system R the “purifying system.”

The following simple theorem establishes that every state ρA has a purifi-cation.

Theorem 3.5 State Purification

For every state ρA, there exists a purification |ψ〉〈ψ|RA of ρA with dR ≥rank(ρA).

PROOF: Consider a spectral decomposition of ρA

ρA =r

∑k=1

λk|φk〉〈φk|, (3.1.31)

where r = rank(ρA). Consider a reference system R with dR ≥ r and anarbitrary set |ϕk〉R : 1 ≤ k ≤ r of orthonormal states. The unit vector

|ψ〉RA :=r

∑k=1

√λk|ϕk〉R ⊗ |φk〉A (3.1.32)

71

Page 83: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

then satisfies

TrR[|ψ〉〈ψ|RA] =r

∑k,k′=1

√λkλk′ Tr[|ϕk〉〈ϕk′ |R]︸ ︷︷ ︸

δk,k′

|φk〉〈φk′ |A (3.1.33)

=r

∑k=1

λk|φk〉〈φk|A (3.1.34)

= ρA, (3.1.35)

so that |ψ〉〈ψ|RA is a purification of ρA, as required.

REMARK: The theorem above states that the condition dR ≥ rank(ρA) on the dimensionof the purifying system R is sufficient to guarantee the existence of a purification. Thiscondition is also necessary, meaning that it is not possible to have a purifying systemwhose dimension is less than the rank of ρA.

The proof of the theorem above not only tells us that every state has a pu-rification, but it also tells us explicitly how to construct one such purification.We can also construct a purification of every state ρA as follows:

|ψ〉RA = (1R ⊗√

ρA)|Γ〉RA = vec(√

ρA), (3.1.36)

where |Γ〉RA = ∑dA−1i=0 |i, i〉RA and where the operation vec is defined in (2.2.24).

We often call the state |ψ〉〈ψ|RA the canonical purification of ρA. Note that thecanonical purification is very closely related to the purification used in theproof of Theorem 3.5. Indeed, if

ρA =r

∑k=1

λk|φk〉〈φk| (3.1.37)

is a spectral decomposition of ρA, with r = rank(ρA), then

vec(√

ρA) =r

∑k=1

√λk |φk〉R ⊗ |φk〉A, (3.1.38)

where we have made use of (2.2.27).

Physically, the fact that every state ρA has a purification means that everyquantum system A in a mixed state can be viewed as being entangled withsome system R to which we do not have access, such that the global state is apure state |ψ〉〈ψ|RA. Since we do not have access to R, our knowledge of thesystem has to be described by the partial trace of |ψ〉〈ψ|RA over R, i.e., by ρA.

72

Page 84: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Observe that if the state ρA is pure, i.e., if ρA = |φ〉〈φ|A, then the onlypossible purification of it is of the form |ψ〉〈ψ|RA = |ϕ〉〈ϕ|R ⊗ |φ〉〈φ|A, with|ϕ〉〈ϕ|R a pure state of the system R. In other words, purifications of purestates can only be pure product states.

Purifications of states are not unique. In fact, given a state ρA and a pu-rification |ψ〉〈ψ|RA of ρA as in (3.1.32), let VR→R′ be an isometry (i.e., a linearoperator satisfying V†V = 1R) acting on the R system. Defining

|ψ′〉R′A = (VR→R′ ⊗ 1A)|ψ〉RA, (3.1.39)

we find that

TrR′ [|ψ′〉〈ψ′|R′A] =r

∑k,k′=1

√λkλk′Tr[V|ϕk〉〈ϕk′ |RV†]|φk〉〈φk|A (3.1.40)

=r

∑k=1

λk|φk〉〈φk|A (3.1.41)

= ρA, (3.1.42)

where we conclude that Tr[V|ϕk〉〈ϕk′ |V†] = δk,k′ from cyclicity of the traceand V†V = 1R. So |ψ′〉〈ψ′|R′A is also a purification of ρA.

A converse statement holds as well by employing the Schmidt decompo-sition (Theorem 3.2): if |ψ〉〈ψ|RA and |ψ′〉〈ψ′|R′A are two purifications of thestate ρA, then they are related by an isometry taking one reference system tothe other. By combining this statement and the previous one, we can thussay that purifications are unique “up to isometries acting on the referencesystem.”

We now state a useful fact about purification of states that are permutationinvariant. A state ρ ∈ D(H⊗n) is called permutation invariant if

ρ = WπρWπ† (3.1.43)

for all permutations π ∈ Sn, where the unitary permutation operators Wπ aredefined in (2.5.1).

REMARK: Note that the permutation-invariance condition in (3.1.43) does not imply thatthe state ρ is supported on the symmetric subspace Symn(H) of H⊗n. In other words, thecondition in (3.1.43) does not imply that

ΠSymn(H)ρΠSymn(H) = ρ. (3.1.44)

73

Page 85: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

As a simple example, suppose that H = C2, and let ρ = |ψ〉〈ψ| ∈ D((C2)⊗2), where|ψ〉 = 1√

2(|0, 1〉 − |1, 0〉). Then, it is easy to see that WπρWπ† = ρ for all π ∈ S2, while

ΠSym2(H)ρΠSym2(H) = 0. The latter is true because |ψ〉 is an anti-symmetric state, i.e., |ψ〉 ∈ASym2(H). The state ρ is thus supported on the anti-symmetric subspace, even though itis permutation invariant.

Lemma 3.6 Purification of Permutation-Invariant States

Let ρAn ∈ D(H⊗nA ) be a permutation-invariant state, i.e.,

ρAn = WπAnρAnWπ†

An (3.1.45)

for all π ∈ Sn, where the unitary operators WπAnπ∈Sn are de-

fined in (2.5.1). Then, there exists a permutation-invariant purification|ψρ〉〈ψρ|An An of ρAn , such that |ψρ〉An An ∈ Symn(HAA). This means that

|ψρ〉An An = WπAn ⊗Wπ

An |ψρ〉An An (3.1.46)

for all π ∈ Sn, where the dimension of A is equal to the dimension of A.

PROOF: Let us consider the following spectral decomposition of ρAn :

ρAn = ∑λ∈spec(ρAn )

λ ΠλAn , (3.1.47)

where spec(ρAn) is the set of (distinct) eigenvalues of ρAn and ΠλAn is the spec-

tral projection corresponding to λ. Let us write ΠλAn as follows:

ΠλAn =

∑k=1|φλ

k 〉〈φλk |An , (3.1.48)

in terms of an orthonormal basis |φλk 〉Andλ

k=1, where dλ := rank(ΠλAn). Since

ρAn is a density operator, it follows that ∑λ∈spec(ρAn ) λ dλ = 1. Let Eλ denotethe eigenspace corresponding to λ ∈ spec(ρAn). Define the vectors

|Γλ〉An An :=dλ

∑k=1|φλ

k 〉An ⊗ |φλk 〉An , (3.1.49)

|ψρ〉An An := ∑λ∈Λ

√λ |Γλ〉An An , (3.1.50)

74

Page 86: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

where |φλk 〉 is the complex conjugate of |φλ

k 〉with respect to the standard basis.Observe that

TrAn [|ψρ〉〈ψρ|An An ] = ∑λ∈spec(ρAn )

∑k=1

λ|φλk 〉〈φλ

k |An = ρAn , (3.1.51)

so that |ψρ〉〈ψρ|An An is a purification of ρAn .

Let us now show that |ψρ〉An An ∈ Symn(HAA). Since ρAn is permutationinvariant, observe that

Wπ†An ρAnWπ

An |φλk 〉An = ρAn |φλ

k 〉An = λ|φλk 〉An , (3.1.52)

for all λ ∈ spec(ρAn) and k ∈ 1, . . . , dλ. Multiplying both sides on the leftby Wπ

An givesρAnWπ

An |φλk 〉An = λWπ

An |φλk 〉An . (3.1.53)

This means that WAn |φλk 〉An is an eigenvector of ρAn with eigenvalue λ. Since

both |φλk 〉An and WAn |φλ

k 〉An are eigenvectors of ρAn with eigenvalue λ, it fol-lows that the subspaces Eλ are invariant under all permutation operators Wπ

An .This implies that the restriction of Wπ

An to the subspace Eλ is unitary. Ob-serve that the operators Wπ

An are real in the standard basis, meaning thatWπ

An = WπAn for all π ∈ Sn. Then, by applying these two observations, we

find that

WπAn ⊗Wπ

An |Γλ〉An An = WπAn ⊗Wπ

An |Γλ〉An An = |Γλ〉An An (3.1.54)

for all π ∈ Sn, where we used (2.2.31) and (2.2.32) to establish the last equality.Therefore, by linearity,

WπAn ⊗Wπ

An |ψρ〉An An = |ψρ〉An An (3.1.55)

for all π ∈ Sn. By the definition of Symn(HAA) in (2.5.3) and the observationin (2.5.20), we conclude that |ψρ〉An An ∈ Symn(HAA).

Definition 3.7 Extension

An extension of a quantum state ρA is a state ωRA satisfying TrR[ωRA] =ρA, where R is a reference system.

Thus, a purification |ψ〉〈ψ|RA of a quantum state ρA is a particular kind ofextension of ρA. For a purification, it is required that dR ≥ rank(ρA). How-ever, there is no such requirement for an extension.

75

Page 87: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Note that if the state ρA is pure, i.e., if ρA = |φ〉〈φ|A, then every extensionωRA of ρA must be a product state, i.e., we must have ωRA = σR⊗ |φ〉〈φ|A forsome state σR.

3.1.1.3 Ensembles and Classical–Quantum States

For a finite alphabet X, an ensemble is a collection (p(x), ρx)x∈X consistingof a probability distribution p : X → [0, 1] such that each probability p(x)is paired with a quantum state ρx. Ensembles are used to describe quantumsystems that are known to be in one of a given set of states with some proba-bility.

Suppose that Alice is in possession of a quantum system and that she pre-pares the system in a state ρx with probability p(x). The state of the systemcan thus be described by the ensemble (p(x), ρx)x∈X. If she sends the sys-tem to Bob in one of these states without telling him in which of the statesthe system has been prepared, but Bob knows the ensemble (p(x), ρx)x∈Xdescribing the system, then from Bob’s perspective the state of the system isgiven by the expected state ρ of the ensemble, which is given by

ρ = ∑x∈X

p(x)ρx. (3.1.56)

On the other hand, if Alice sends Bob classical information about whichstate she has prepared, then from Bob’s perspective the state of the systemcan be described by the following classical–quantum state:

ρXB = ∑x∈X

p(x)|x〉〈x|X ⊗ ρxB, (3.1.57)

where X is the |X|-dimensional quantum system corresponding to the reg-ister holding the information about which state was sent and |x〉x∈X is anorthonormal basis for HX. Classical–quantum states have a block-diagonalstructure, in the sense that they can equivalently be written as the followingblock-diagonal matrix:

ρXB =

p(x1)ρx1

p(x2)ρx2

. . .p(x|X|)ρ

x|X|

. (3.1.58)

76

Page 88: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

We can represent (3.1.58) more compactly as

ρXB =⊕

x∈Xp(x)ρx. (3.1.59)

3.1.1.4 Separable and Entangled States

The concepts of separable and entangled states are at the heart of virtuallyall of the communication protocols that we consider in this book. Entangle-ment, in particular, is a key element of private communication and secure keydistillation, and the successful distribution of entangled states among severalspatially separated parties is a crucial ingredient in the implementation ofsuch protocols over the future quantum internet. If the parties share onlyseparable, unentangled states, then it is not possible for them to distill a keythat is secure against a general quantum adversary.

We begin this section by defining separable and entangled states as fol-lows:

Definition 3.8 Separable and Entangled States

A bipartite state σAB is called separable if there exists a finite alphabet X,a probability distribution p : X → [0, 1] on X, and sets ωx

Ax∈X andτx

Bx∈X of states for A and B, respectively, such that

σAB = ∑x∈X

p(x)ωxA ⊗ τx

B . (3.1.60)

In other words, a state is called separable if it can be written as a convexcombination of product states ωx

A ⊗ τxBx∈X. The set of separable states

on HAB is denoted by SEP(A : B).

Any state that is not separable is called entangled.

REMARK: Note that a separable state can always be written in the form

σAB = ∑x′∈X′

q(x′)|ψx′〉〈ψx′ |A ⊗ |φx′〉〈φx′ |B (3.1.61)

for some probability distribution q : X′ → [0, 1] on a finite alphabet X′ and sets of purestates |ψx′〉〈ψx′ |A : x′ ∈ X′, |φx′〉〈φx′ |B : x′ ∈ X′. In other words, separable states canalways be written as a convex combination of pure product states. Indeed, from (3.1.60),

77

Page 89: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

we can take spectral decompositions of ωxA and τx

B ,

ωxA =

rxA

∑k=1

txk |ex

k 〉〈exk |A, τx

B =rx

B

∑`=1

sx` | f x

` 〉〈 f x` |B, (3.1.62)

where rxA = rank(ωx

A) and rxB = rank(τx

B), so that

ρAB = ∑x∈X

rxA

∑k=1

rxB

∑`=1

p(x)txk sx

` |exk 〉〈ex

k |A ⊗ | f x` 〉〈 f x

` |B. (3.1.63)

Then, define the alphabet X′ = x′ := (x, k, `) : x ∈ X, 1 ≤ k ≤ rxA, 1 ≤ ` ≤ rx

B, so that x′

is a superindex, and the unit vectors

|ψx′〉A := |exk 〉A, |φx′〉B := | f x

` 〉B. (3.1.64)

Also, define the probability distribution q : X′ → [0, 1] by

q(x, k, `) = p(x)txk sx

` . (3.1.65)

Therefore, (3.1.63) can be written as

σAB = ∑x′∈X′

q(x′)|ψx′〉〈ψx′ |A ⊗ |φx′〉〈φx′ |B. (3.1.66)

From the development above, it follows that the set of separable states is the convex hull ofthe set of pure product states. By an application of the Fenchel–Eggleston–CarathéodoryTheorem (Theorem 2.17), it follows that σAB can be written as a convex combination of nomore than rank(σAB)

2 pure product states.

Bipartite separable states can be thought of as exhibiting purely classicalcorrelations between the two parties, Alice and Bob, in the following sense.Suppose that Alice draws x with probability p(x), prepares her system inthe state ωx

A, sends x to Bob over a classical channel, who then prepares hissystem in the state τx

B , where x ∈ X and X is a finite alphabet. This procedurecorresponds to preparing the ensemble (p(x), ωx

A ⊗ τxB)x∈X, and if Alice

and Bob discard the label x, then their shared joint state is the separable stateσAB = ∑x∈X p(x)ωx

A ⊗ τxB .

On the other hand, no such procedure consisting of only local operationsby Alice and Bob supplemented by classical communication between themcan ever be used to generate an entangled state between Alice and Bob (with-out some prior entanglement shared between them). Section 3.2.11 introduceslocal operations and classical communication (LOCC) channels and explainsthis point in more detail. Essentially, two entangled particles are intrinsicallylinked in such a way that it is insufficient to describe each particle individu-

78

Page 90: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

ally.

Observe that a pure state ψAB is separable if and only if it is a productstate, i.e., if and only if there exist pure states φA and ϕB such that ψAB =φA ⊗ ϕB. Recalling that every pure state has a Schmidt decomposition (seeTheorem 3.2), we now reiterate what we stated earlier after Theorem 3.2:

A pure state is entangled if and only if its Schmidtrank is strictly greater than one.

An important example of an entangled pure state is the state ΦAB on twod-dimensional systems A and B, defined as ΦAB := |Φ〉〈Φ|AB, where

|Φ〉AB :=1√d

d−1

∑i=0|i〉A ⊗ |i〉B =

1√d|Γ〉AB, (3.1.67)

and |Γ〉AB = ∑d−1i=0 |i〉A ⊗ |i〉B is the vector defined in (2.2.26). Note that

TrA[ΦAB] =1Bd

, TrB[ΦAB] =1Ad

. (3.1.68)

The state ΦAB is an example of a maximally entangled state. A pure stateψAB, for two systems A and B of the same dimension d, is called maximallyentangled if the Schmidt coefficients of |ψ〉AB are all equal to 1√

d, with d being

the Schmidt rank of |ψ〉AB. In other words, ψAB is called maximally entangledif |ψ〉AB has the Schmidt decomposition

|ψ〉AB =1√d

d

∑k=1|ek〉A ⊗ | fk〉B (3.1.69)

for some orthonormal sets |ek〉A : 1 ≤ k ≤ d and | fk〉B : 1 ≤ k ≤ d.Observe then that

TrA[|ψ〉〈ψ|AB] = πd = TrB[|ψ〉〈ψ|AB]. (3.1.70)

In other words, like the state ΦAB, the marginal states of every maximallyentangled state are maximally mixed on HA and HB.

Maximally entangled states provide a good example of why entangledparticles are, in a sense, greater than the sum of their parts. Since maximally

79

Page 91: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

entangled states have maximally mixed marginal states, the individual par-ticles in a maximally entangled state can be viewed as being in a completelyrandom state since, as we have seen, maximally mixed states can be writtenas the expected state of every ensemble of orthonormal pure states with uni-form probability distribution. However, intriguingly, the overall compositesystem is in a pure, definite state.

3.1.1.5 The Partial Transpose and PPT States

As indicated previously in (2.2.18), the action of the transpose map T on alinear operator XA→A′ can be written as

T(X) =dA−1

∑i=0

dA′−1

∑j=0|i〉A〈j|A′X|i〉A〈j|A′ , (3.1.71)

where we have defined the transpose with respect to the orthonormal bases|i〉A : 0 ≤ i ≤ dA − 1 and |j〉A′ : 0 ≤ j ≤ dA′ − 1. This is consistentwith the familiar definition of the transpose of a matrix X as being the matrixXT with its rows and columns flipped relative to X. Indeed, if X has the ma-trix representation X = ∑j,i Xj,i|j〉A′〈i|A, with matrix elements Xj,ij,i, then itfollows that the transpose map has action

T(X) = ∑j,i

Xj,i|i〉A〈j|A′ , (3.1.72)

which, as expected, has XT as its matrix representation. Note that, unlikethe trace or the conjugate transpose, the transpose depends on the choice oforthonormal bases used to evaluate it. Throughout the rest of this book, weuse both T(X) and XT to refer to the transpose of X.

The transpose map is known as the partial transpose when it acts on onesubsystem of a bipartite operator XAB.

Definition 3.9 Partial Transpose

For every bipartite linear operator XAB, its partial transpose on B is de-noted by TB ≡ idA ⊗ TB and is defined as

TB(XAB) :=dB−1

∑j,j′=0

(1A ⊗ |j〉〈j′|B)XAB(1A ⊗ |j〉〈j′|B). (3.1.73)

80

Page 92: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Similarly, the partial transpose on A is denoted by TA ≡ TA ⊗ idB and isdefined as

TA(XAB) :=dA−1

∑i,i′=0

(|i〉〈i′|A ⊗ 1B)XAB(|i〉〈i′|A ⊗ 1B). (3.1.74)

Given an expansion of XAB as

XAB =dB−1

∑i,j=0

Xi,jA ⊗ |i〉〈j|B, (3.1.75)

where each Xi,jA is a linear operator acting on system A, the partial transpose

map TB has the action

TB(XAB) =dB−1

∑i,j=0

Xi,jA ⊗ |j〉〈i|B =

dB−1

∑i,j=0

X j,iA ⊗ |i〉〈j|B. (3.1.76)

The partial transpose map is self-inverse, i.e.,

(idA ⊗ TB) (idA ⊗ TB) = idAB, (3.1.77)

and it is self-adjoint with respect to the Hilbert–Schmidt inner product, in thesense that

〈XAB, TB(YAB)〉 = 〈TB(XAB), YAB〉 , (3.1.78)

for all operators XAB and YAB.

Definition 3.10 PPT State

A bipartite state ρAB is called a positive partial transpose (PPT) state if thepartial transpose TB(ρAB) is positive semi-definite.

The set of PPT states is denoted by PPT(A : B), so that

PPT(A : B) := σAB : σAB ≥ 0, TB(σAB) ≥ 0, Tr[σAB] = 1 . (3.1.79)

Note that the property of a state being PPT depends neither on which sys-tem the transpose is taken nor on which orthonormal basis is used to definethe transpose map. Indeed, to see the first statement, suppose that ρAB ∈

81

Page 93: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

PPT(A : B). This means that TB(ρAB) ≥ 0. But since the eigenvalues are in-variant under a full transpose TA⊗TB, this means that (TA⊗TB)(TB(ρAB)) ≥0, the latter being the same as TA(ρAB) ≥ 0 due to the self-inverse property ofthe partial transpose. So TB(ρAB) ≥ 0 implies TA(ρAB) ≥ 0, and vice versa.

To see the second statement, let TB(ρAB) ≥ 0, and let |φ`〉BdB`=1 be some

other orthonormal basis for B. The partial transpose with respect to this basisis given by

dB

∑`,`′=1

(1A ⊗ |φ`〉〈φ`′ |B)ρAB(1A ⊗ |φ`〉〈φ`′ |B). (3.1.80)

Now, consider that ∑dB`=1 |φ`〉〈φ`| = 1B, so that

TB(ρAB)

=dB−1

∑j,j′=0

(1A ⊗ |j〉〈j′|B)ρAB(1A ⊗ |j〉〈j′|B)

=dB

∑j,j′,`,`′=1

(1A ⊗ |j〉⟨

j′|φ`

⟩〈φ`|B)ρAB(1A ⊗ |φ`′〉 〈φ`′ |j〉〈j′|B)

=dB

∑j,j′,`,`′=1

(1A ⊗ 〈φ`′ |j〉 |j〉〈φ`|B)ρAB(1A ⊗ |φ`′〉〈j′|B⟨

j′|φ`

⟩)

=dB

∑`,`′=1

(1A ⊗

(dB−1

∑j=0〈φ`′ |j〉 |j〉

)〈φ`|B

)ρAB

1A ⊗ |φ`′〉

db

∑j′=1〈j′|B

⟨j′|φ`

=dB

∑`,`′=1

(1A ⊗ |φ`′〉〈φ`|B

)ρAB

(1A ⊗ |φ`′〉〈φ`|

), (3.1.81)

where in the last line we defined |φ`〉 := ∑dB−1j=0 〈φ`|j〉 |j〉 for 1 ≤ ` ≤ dB. Note

that the set |φ`〉dB`=1 is orthonormal, so that UB := ∑dB

`=1 |φ`〉〈φ`|B is a unitaryoperator. Then we find that

dB

∑`,`′=1

(1A ⊗ |φ`′〉〈φ`|B) ρAB (1A ⊗ |φ`′〉〈φ`|B) = UBTB(ρAB)U†B ≥ 0, (3.1.82)

where the last inequality follows from the condition TB(ρAB) ≥ 0 and prop-erty 1 of Lemma 2.13. We thus conclude that the PPT property does not de-pend on which orthonormal basis is used to define the transpose map.

82

Page 94: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

The partial transpose map plays an important role in quantum informa-tion theory due to its connection with entanglement. In fact, it leads to asufficient condition for a bipartite state to be entangled. To see this, supposethat a state σAB is a separable, unentangled state of the following form:

σAB = ∑x∈X

p(x)ωxA ⊗ τx

B , (3.1.83)

where p : X → [0, 1] is a probability distribution on a finite alphabet X andωx

Ax∈X and τxBx∈X are sets of quantum states. Then, the action of the

partial transpose map TB on σAB is as follows:

TB(σAB) = ∑x

p(x)ωxA ⊗ T(τx

B), (3.1.84)

which is a separable quantum state, being the expected state of the ensemble(p(x), ωx

A ⊗ T(τxB))x∈X. Each element of the ensemble is indeed a quantum

state because the transpose is a positive map, i.e., T(τxB) ≥ 0 if τx

B ≥ 0. Due tothis fact, we conclude that TB(σAB) ≥ 0, so that σAB is a PPT state. Thus, weconclude the following:

If a state is separable, then it is PPT.

This is called the PPT criterion. Equivalently, by taking the contrapositive ofthis statement, we obtain the following:

If a state is not PPT, then it is entangled.

So the condition of a state being NPT (non-positive partial transpose) is suffi-cient for detecting entanglement.

As an example, let us consider applying the partial transpose map TB tothe maximally entangled state ΦAB, as defined in (3.1.67). We find that

TB(ΦAB) =1d

TB

(d

∑i,i′=1|i〉〈i′|A ⊗ |i〉〈i′|B

)(3.1.85)

=d

∑i,i′=1|i〉〈i′|A ⊗ |i′〉〈i|B (3.1.86)

83

Page 95: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

=:1d

FAB, (3.1.87)

where we have defined the swap operator FAB as

FAB :=d

∑i,i′=1|i〉〈i′|A ⊗ |i′〉〈i|B =

d

∑i,i′=1|i, i′〉〈i′, i|AB (3.1.88)

The swap operator is Hermitian and satisfies F2AB = 1AB, meaning that it is

also unitary and self-inverse. The swap operator has the following spectraldecomposition:

FAB = ΠSAB −ΠA

AB, (3.1.89)

whereΠS

AB := 12 (1AB + FAB) (3.1.90)

is the projection onto the symmetric subspace of Cd ⊗ Cd, with dimensionTr[ΠS

AB] = d(d + 1)/2 (see (2.5.5)), and

ΠAAB := 1

2 (1AB − FAB) (3.1.91)

is the projection onto the anti-symmetric subspace of Cd ⊗ Cd, with dimen-sion Tr[ΠA

AB] = d(d − 1)/2 (see (2.5.6)). Note that ΠSAB + ΠA

AB = 1AB andΠS

ABΠAAB = 0. Thus, the swap operator has negative eigenvalues, which by

the PPT criterion means that ΦAB is an entangled state, as expected.

Although the PPT criterion is generally only a necessary condition for sep-arability of a bipartite state, it is known that the PPT criterion is necessary andsufficient for quantum states ρAB in which both A and B are qubits or A is aqubit and B is a qutrit; please consult the Bibliographic Notes in Section 3.6.In particular, therefore, in higher dimensions there are PPT states that areentangled. These PPT entangled states turn out to be useless for the taskof entanglement distillation (see Chapter 8), and thus they are called boundentangled (although they are entangled, they cannot be used to extract puremaximally entangled states).

3.1.2 Measurements

Measurements in quantum mechanics are described by positive operator-valuedmeasures (POVMs).

84

Page 96: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Definition 3.11 Positive Operator-Valued Measure (POVM)

A POVM is a set Mxx∈X of operators satisfying

Mx ≥ 0 ∀ x ∈ X, ∑x∈X

Mx = 1. (3.1.92)

For our purposes, it suffices to consider finite sets of such operators. Theelements of the finite alphabet X are used to label the outcomes of themeasurement.

The measurement of a quantum system in the state ρ according to thePOVM Mxx∈X induces a probability distribution pX : X → [0, 1]. Thisdistribution corresponds to a random variable X, which takes values in thealphabet X, and is defined by the Born rule:

pX(x) = Pr[X = x] = Tr[Mxρ]. (3.1.93)

If the elements Mx of the POVM are all projections, then the correspond-ing measurement is called a projective measurement. If the POVM of a projec-tive measurement consists solely of rank-one projections, then the measure-ment is sometimes called a von Neumann measurement. Projective measure-ments, in particular von Neumann measurements, are viewed as the simplesttype of measurement that can be performed on a quantum system. However,as it turns out, we can implement every measurement as a von Neumannmeasurement if we have access to an auxiliary quantum system, and this isthe content of Naimark’s theorem.

Theorem 3.12 Naimark’s Theorem

For every POVM Mxx∈X, there exists an isometry V such that

Mx = V†(1⊗ |x〉〈x|)V ∀ x ∈ X. (3.1.94)

PROOF: This follows immediately by defining the isometry V as

V = ∑x∈X

√Mx ⊗ |x〉. (3.1.95)

85

Page 97: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

VA→A′PρA

ρA′

|x〉〈x|x∈X

x

FIGURE 3.1: The system-probe model of measurement. In order to mea-sure the system A of interest, we allow it to first interact with a probesystem P via an isometry VA→A′P. We then measure the probe with a pro-jective measurement consisting of rank-one elements.

This is indeed an isometry because

V†V =

(∑

x∈X

√Mx ⊗ 〈x|

)(∑

x′∈X

√Mx′ ⊗ |x′〉

)(3.1.96)

= ∑x,x′∈X

√Mx√

Mx′⟨

x|x′⟩

(3.1.97)

= ∑x∈X

Mx (3.1.98)

= 1, (3.1.99)

where the last equality holds because Mxx∈X is a POVM. It is straightfor-ward to check that (3.1.94) is satisfied for the choice of V in (3.1.95).

The physical relevance of Naimark’s theorem is illustrated in Figure 3.1.We model the measurement of the system A of interest as an interaction ofthe system with a probe system P followed by a projective measurement ofthe probe system described by the POVM |x〉〈x| : x ∈ X. The interaction isdescribed by an isometry VA→A′P, and if ρAP = VρAV† is the joint state of thesystem and the probe after the interaction, then the measurement outcomeprobabilities are

pX(x) = Tr[(1A′ ⊗ |x〉〈x|)ρA′P] (3.1.100)

= Tr[(1A′ ⊗ |x〉〈x|)VρAV†] (3.1.101)

= Tr[V†(1A′ ⊗ |x〉〈x|)VρA] (3.1.102)= Tr[MxρA], (3.1.103)

where we let Mx := V†(1A′ ⊗ |x〉〈x|)V. This shows us that the system-probemodel of measurement, in which the probe is measured after it interacts withthe system of interest, can effectively be described by a POVM. Naimark’s

86

Page 98: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

theorem, on the other hand, tells us the converse: for every measurementdescribed by a POVM, there exists an isometry such that the measurementcan be described in the system-probe model, as depicted in Figure 3.1.

When performing measurements in quantum mechanics, we are typicallyinterested not only in the measurement outcomes and their probabilities butalso with the so-called post-measurement states of the system being measured.That is, we are interested in knowing the state of the system after we havemeasured it and observed the outcome.

Since the POVM elements Mx are positive semi-definite, there exist op-erators Kx such that Mx = K†

xKx for all x ∈ X. For example, we could letKx be the square root of Mx, so that Kx =

√Mx. Then, the Born rule in

(3.1.93) for the probability of the measurement outcome x ∈ X can be writtenas pX(x) = Tr[KxρK†

x]. In this case, the post-measurement state correspond-ing to the outcome x ∈ X is as follows:

ρx :=KxρK†

xTr[KxρK†

x]. (3.1.104)

The state ρx can be understood to capture the experimenter’s knowledge ofthe state of the system given that the measurement outcome was observed tobe x.

The post-measurement states ρx give rise to the ensemble (pX(x), ρx)x∈Xwhose expected density operator is

ρM := ∑x∈X

pX(x)ρx = ∑x∈X

KxρK†x. (3.1.105)

This expected density operator is the state of the system after measurement ifthe measurement outcome is not available. It can be interpreted as the state ofthe system after measurement if the experimenter has no knowledge of whichmeasurement outcome occurred.

Due to the unitary freedom in the decomposition Mx = K†xKx, other choices

of Kx are given by Kx = Ux√

Mx for some unitary Ux, so that there is not aunique way to determine the post-measurement state when starting from aPOVM.

Suppose now that we perform a measurement on a subsystem of a com-posite system. Specifically, consider measuring a system A that is in a jointstate ρRA with a reference system R, and let the measurement be described bythe POVM Mx

Ax∈X for some finite alphabet X. If we let MxA = Kx†

A KxA, then

87

Page 99: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

according to (3.1.104), the measurement probabilities are given by pX(x) =Tr[(1R ⊗ Kx

A)ρRA(1R ⊗ Kx†A )] and the post-measurement states are as follows:

ρxRA =

(1R ⊗ KxA)ρRA(1R ⊗ Kx†

A )

Tr[(1R ⊗ KxA)ρRA(1R ⊗ Kx†

A )](3.1.106)

=1

pX(x)(1R ⊗ Kx

A)ρRA(1R ⊗ Kx†A ) (3.1.107)

for all x ∈ X. The state of the system R conditioned on the measurementoutcome x is then

ρxR := TrA[ρ

xRA] (3.1.108)

=1

pX(x)TrA[(1R ⊗ Kx

A)ρRA(1R ⊗ Kx†A )] (3.1.109)

=1

pX(x)TrA[(1R ⊗ Kx†

A KA)ρRA] (3.1.110)

=1

pX(x)TrA[(1R ⊗Mx

A)ρRA]. (3.1.111)

We thus see that, although the post-measurement state on the system A beingmeasured is not uniquely defined due to the unitary freedom in the decom-position Mx

A = Kx†A Kx

A, as described earlier, the post-measurement state onthe reference system R not being measured is uniquely defined because itdepends directly on the POVM elements Mx

A. If the system A undergoes ameasurement for which Mx

A = |ψx〉〈ψx|A, then (3.1.111) can be written as

ρxR =

1pX(x)

〈ψx|AρRA|ψx〉A. (3.1.112)

3.1.3 Quantum State Discrimination

Having described measurements in quantum mechanics, let us now discussan important application called quantum state discrimination. It is also knownas symmetric or Bayesian quantum hypothesis testing.

Suppose that Alice prepares a quantum system in one of two states, ρor σ, the first with probablity λ and the second with probability 1− λ, andsends the system to Bob without telling him which state she prepared. Bob’stask is to guess the state of the system by performing only one measurement,knowing in advance both the prior probability λ and that the state is either

88

Page 100: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

ρ

σ

Guess “ρ”

Guess “σ”

Guess “ρ”

Guess “σ”

λ

1− λ

FIGURE 3.2: Decision tree for the state discrimination problem. Given thatBob receives the state ρ with probability λ and the state σ with probability1 − λ, he performs a measurement described by the POVM Mρ, Mσ,where Mσ = 1− Mρ. Conditioned on the outcome of his measurement,he guesses the state of the system to be either ρ or σ.

ρ or σ. What is Bob’s highest probability of correctly guessing the state, andwhich measurement can he use to achieve the maximum success probability?

This scenario is depicted in Figure 3.2 along with the strategy that Bob canemploy, which is the following:

1. He picks a measurement that can be described by the two-element POVMMρ, Mσ. Note that Mρ, Mσ ≥ 0 and Mρ + Mσ = 1, by the definition ofa POVM, which means that Mσ = 1−Mρ.

2. If the outcome corresponding to Mρ occurs, then Bob guesses “ρ”; other-wise, if the outcome corresponding to Mσ occurs, he guesses “σ”.

Bob’s strategy is successful if his guess is correct; i.e., it is correct when thesystem is in the state ρ and he guesses “ρ” or when the system is in the state σand he guesses “σ”. By following the decision tree in Figure 3.2, we find thatthe success probability of his strategy is

psucc(Mρ, Mσ; λ, ρ, σ):= Pr[state is ρ and Bob guesses “ρ”]

+ Pr[state is σ and Bob guesses “σ”]= Pr[state is ρ] · Pr[Bob guesses “ρ” | state is ρ]

89

Page 101: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

+ Pr[state is σ] · Pr[Bob guesses “σ” | state is σ]

= λTr[Mρρ] + (1− λ)Tr[Mσσ]

= λTr[Mρρ] + (1− λ)Tr[(1−Mρ)σ]

= Tr[Mρ(λρ)] + Tr[(1−Mρ)((1− λ)σ)]. (3.1.113)

We can manipulate the last line above to find that

psucc(Mρ, Mσ; λ, ρ, σ) = 1− λ + Tr[Mρ(λρ− (1− λ)σ)], (3.1.114)

where we used the fact that Tr[σ] = 1.

The error probability is then

perr(Mρ, Mσ; λ, ρ, σ) := 1− psucc(Mρ, Mσ; λ, ρ, σ) (3.1.115)= λ− Tr[Mρ(λρ− (1− λ)σ)] (3.1.116)= Tr[(1−Mρ)(λρ)] + Tr[Mρ(1− λ)σ]. (3.1.117)

The optimal success probability is given by maximizing the success prob-ability in (3.1.113) over all two-outcome POVMs. Equivalently, we can opti-mize over all operators M satisfying 0 ≤ M ≤ 1:

p∗succ(λ, ρ, σ) := supM:0≤M≤1

1− λ + Tr[M(λρ− (1− λ)σ)] . (3.1.118)

Similarly, the optimal error probability is given by minimizing the error prob-ability (3.1.117) over all operators M satisfying 0 ≤ M ≤ 1:

p∗err(λ, ρ, σ) := infM:0≤M≤1

Tr[(1−M)(λρ)] + Tr[M(1− λ)σ] . (3.1.119)

The solution to this optimization problem is given by Theorem 3.13.

Theorem 3.13 Helstrom–Holevo Theorem

For every two positive semi-definite operators A and B,

infM:0≤M≤1

Tr[(1−M)A] + Tr[MB]

=12(Tr[A + B]− ‖A− B‖1) . (3.1.120)

A measurement operator M is optimal if and only if M = Π+ + Λ0,where Π+ is the projection onto the strictly positive part of A− B, the

90

Page 102: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

operator Π0 is the projection onto the zero eigenspace of A − B, and0 ≤ Λ0 ≤ Π0. Furthermore,

supM:0≤M≤1

Tr[M(A− B)] =12(Tr[A− B] + ‖A− B‖1) , (3.1.121)

and the conditions for an optimal M are the same as given above.

Letting A = λρ and B = (1 − λ)σ in (3.1.120), we recognize that theobjective function on the left-hand side of (3.1.120) is equal to perr(M, 1−M; λ, ρ, σ) as defined in (3.1.117). We thus obtain

p∗err(λ, ρ, σ) =12(1− ‖λρ− (1− λ)σ‖1) . (3.1.122)

Note that Theorem 3.13 also gives us a measurement that achieves the mini-mal error probability.

PROOF (Proof of Theorem 3.13): Let M satisfying 0 ≤ M ≤ 1 be arbitrary. Let∆ := A− B and let ∆+ and ∆− be the positive and negative parts, respectively,of ∆, so that A− B = ∆+ − ∆− and ∆+∆− = 0 (recall (2.2.51)). We can thenwrite the objective function in (3.1.120) as

Tr[(1−M)A] + Tr[MB] = Tr[A]− (Tr[M∆+]− Tr[M∆−]) . (3.1.123)

Now, since Tr[M∆−] ≥ 0, on account of both M and ∆− being positive semi-definite, we find that

Tr[M∆+]− Tr[M∆−] ≤ Tr[M∆+] ≤ Tr[∆+], (3.1.124)

where to obtain the last inequality we used the fact that M ≤ 1. However,since

‖A− B‖1 = Tr[|A− B|] = Tr[∆+] + Tr[∆−] (3.1.125)

and∆− = ∆+ + B− A, (3.1.126)

we can write Tr[∆+] as

Tr[∆+] =12(‖A− B‖1 − Tr[B− A]) , (3.1.127)

which means that the objective function on the left-hand side of (3.1.123) canbe bounded from below by 1

2 (Tr[A + B]− ‖A− B‖1). We have thus shown

91

Page 103: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

that

Tr[(1−M)A] + Tr[MB] ≥ 12(Tr[A + B]− ‖A− B‖1) . (3.1.128)

for all M such that 0 ≤ M ≤ 1, which implies that

infM:0≤M≤1

Tr[(1−M)A] + Tr[MB] ≥ 12(Tr[A + B]− ‖A− B‖1) . (3.1.129)

To see the reverse inequality, let M = Π++Λ0, where Π+ is the projectiononto the strictly positive part of A− B and Λ0 satisfies 0 ≤ Λ0 ≤ Π0, with Π0the projection onto the zero eigenspace of A− B. Then,

Tr[M(A− B)] = Tr[(Π+ + Λ0)(∆+ − ∆−)] (3.1.130)= Tr[(Π+ + Λ0)∆+]− Tr[(Π+ + Λ0)∆−] (3.1.131)= Tr[∆+], (3.1.132)

where to obtain the last equality we used the facts that Tr[Π+∆+] = Tr[∆+]and Tr[Π+∆−] = 0 since Π+ and ∆− are by definition orthogonal. We alsoused Tr[Λ0∆+] = Tr[Λ0∆−] = 0, with these latter equalities following because0 ≤ Tr[Λ0∆±] ≤ Tr[Π0∆±] = 0. Therefore, using (3.1.127), we find that

Tr[A]− Tr[(Π+ + Λ0)(A− B)] =12(Tr[A + B]− ‖A− B‖1) . (3.1.133)

The operator Π++Λ0 thus achieves the bound in (3.1.128), which means that

infM:0≤M≤1

Tr[(1−M)A] + Tr[MB] =12(Tr[A + B]− ‖A− B‖1) , (3.1.134)

so that (3.1.120) is established.

To see that Π+ + Λ0 is the only form for an optimal measurement opera-tor, suppose that M is optimal, i.e., satisfies 0 ≤ M ≤ 1 and saturates (3.1.120)with equality. Then it follows that the two inequalities in (3.1.124) are satu-rated with equality, so that

Tr[M(∆+ − ∆−)] = Tr[M∆+] = Tr[∆+] = Tr[Π+∆+]. (3.1.135)

This implies that Tr[M∆−] = 0, where ∆− is the strictly negative part of A− B.Since both M and ∆− are positive semi-definite and Π− is the projection onto

92

Page 104: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

the strictly negative part of ∆−, we conclude that MΠ− = Π−M = 0. This inturn implies that

M(Π+ + Π0) = (Π+ + Π0)M = M, (3.1.136)

which, after sandwiching M ≤ 1 on the left and right by Π+ + Π0, impliesthat M ≤ Π+ + Π0. Since

0 = Tr[∆+(Π+ −M)] (3.1.137)= Tr[∆+(Π+ −Π+MΠ+)] (3.1.138)= Tr[∆+Π+(1−M)Π+], (3.1.139)

we find that Π+(1 − M)Π+ = 0. Now consider that Π−(1 − M)Π+ = 0because Π−Π+ = 0 and Π−MΠ+ = Π−(Π+ + Π0)MΠ+ = 0. So thenΠ+(1 − M)Π+ = 0 and Π−(1 − M)Π+ = 0 imply that (1 − M)Π+ = 0.From this equation, we conclude that Π+ = Π+M = MΠ+. By sandwich-ing Π+ ≤ 1 by M and applying operator monotonicity of the square-rootfunction (see Section 2.2.1.1), we conclude that Π+ ≤ M. Combining thisoperator inequality with the previous one, we conclude that an optimal Msatisfies Π+ ≤ M ≤ Π+ + Π0, which is equivalent to M decomposing asM = Π+ + Λ0 for 0 ≤ Λ0 ≤ Π0.

The equality in (3.1.121) follows as a rewrite of (3.1.120):

12‖A− B‖1 =

12

Tr[A + B]− infM:0≤M≤1

Tr[(1−M)A] + Tr[MB] (3.1.140)

= supM:0≤M≤1

12

Tr[A + B]− (Tr[(1−M)A] + Tr[MB]) (3.1.141)

= supM:0≤M≤1

12

Tr[B− A] + Tr[M(A− B)] (3.1.142)

=12

Tr[B− A] + supM:0≤M≤1

Tr[M(A− B)]. (3.1.143)

Rearranging this equality, we arrive at (3.1.121). An optimal M having theform Π+ + Λ0 again follows from (3.1.124), (3.1.130)–(3.1.132), and the rea-soning given above.

The Helstrom–Holevo theorem gives us the lowest possible error proba-bility in distinguishing between two states ρ and σ, given just one copy ofeither state. Suppose now that, instead of just one copy, Alice sends Bob

93

Page 105: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

n copies of either ρ or σ. Bob’s task is then to discriminate between the statesρ⊗n and σ⊗n. The Helstrom–Holevo theorem still applies in this case, so that

p∗err(λρ⊗n, λσ⊗n) =12(1−

∥∥λρ⊗n − (1− λ)σ⊗n∥∥1

)(3.1.144)

is the lowest possible error probability. However, because Bob now has ncopies of either ρ or σ, he can perform a discrimination strategy that involvesa collective measurement acting on the n copies of the state. This means thatthe optimal error exponent can generally be lower with n ≥ 2 copies thanwith just one copy. In other words, it is possible to have p∗err(λρ⊗n, λσ⊗n) <(p∗err(λ, ρ, σ))n. If we allow for an arbitrary number n of copies of eitherstate, then we are interested in the lowest possible error exponent that canbe achieved. Specifically, we are interested in the optimal asymptotic errorexponents, defined as follows:

ξ(ρ, σ) := lim infn→∞

−1n

log2 p∗err(λρ⊗n, λσ⊗n), (3.1.145)

ξ(ρ, σ) := lim supn→∞

−1n

log2 p∗err(λρ⊗n, λσ⊗n). (3.1.146)

In defining these exponents, we have suppressed the dependence on the priorprobability λ because, as we will see, they do not depend on them when tak-ing the limit. We define the two asymptotic quantities separately, mainly be-cause the limit limn→∞ need not exist a priori (see Section 2.3.1). The quantumChernoff divergence of states ρ and σ is defined as follows:

C(ρ‖σ) := sups∈(0,1)

(− log2 Tr[ρsσ1−s]

), (3.1.147)

and finds its operational meaning in the following theorem:

Theorem 3.14 Quantum Chernoff Bound

For every two quantum states ρ and σ, the optimal asymptotic errorexponents are equal to the quantum Chernoff divergence:

ξ(ρ, σ) = ξ(ρ, σ) = C(ρ‖σ). (3.1.148)

PROOF: See Appendix 3.A. An important element of the proof is Lemma 3.15below, which provides us with a lower bound on ξ(ρ, σ).

94

Page 106: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Since Theorem 3.14 establishes that the equality ξ(ρ, σ) = ξ(ρ, σ) holds,this means that the limit exists and we can simply define

ξ(ρ, σ) := limn→∞

−1n

log2 p∗err(λρ⊗n, λσ⊗n). (3.1.149)

REMARK: Theorem 3.14 tells us that, as n becomes large, the following approximationholds

p∗err(λρ⊗n, λσ⊗) ≈ 2−n ξ(ρ,σ). (3.1.150)

We can thus think of ξ(ρ, σ) as the optimal error exponent for the decay of the error prob-ability as the number n of copies increases.

Note that the optimal error exponent in (3.1.148) is independent of the prior probabil-ity distribution. This means that, in the asymptotic limit (i.e., in the limit n→ ∞), the priorprobability distribution is irrelevant for the optimal error exponent.

We call Theorem 3.14 the quantum Chernoff bound because it is analogous to theChernoff bound from (classical) symmetric hypothesis testing, which is the task of dis-criminating between two hypotheses, each of which has a corresponding probability dis-tribution (please consult the Bibliographic Notes in Section 3.6 for details).

Lemma 3.15

Let A and B be positive semi-definite operators. Then, for all s ∈ (0, 1),

12(Tr[A + B]− ‖A− B‖1) ≤ Tr[AsB1−s]. (3.1.151)

PROOF: Let ∆ := A − B, and let ∆+ and ∆− be the positive and negativeparts, respectively, of ∆, so that ∆ = ∆+ − ∆− (recall (2.2.51)). Then,

|∆| = |∆+ − ∆−| = ∆+ + ∆−. (3.1.152)

Therefore,

‖A− B‖1 = ‖∆‖1 = Tr[|∆|] = Tr[∆+] + Tr[∆−]. (3.1.153)

In addition, we write

A + B = A− B + 2B = ∆+ − ∆− + 2B, (3.1.154)

so that

12(Tr[A + B]− ‖A− B‖1)

95

Page 107: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

=12(Tr[∆+]− Tr[∆−] + 2Tr[B]− Tr[∆+]− Tr[∆−]) (3.1.155)

= Tr[B]− Tr[∆−]. (3.1.156)

So it suffices to prove that the following inequality holds for all s ∈ (0, 1):

Tr[B]− Tr[∆−] ≤ Tr[AsB1−s]. (3.1.157)

Using the fact that ∆+ ≥ 0, by definition of the positive part of ∆, we obtain

B + ∆+ ≥ B. (3.1.158)

Similarly, using A− B = ∆+ − ∆−, we obtain

A + ∆− = B + ∆+ ≥ B. (3.1.159)

By the operator monotonicity of x 7→ xs for s ∈ (0, 1) (recall Definition 2.5and thereafter), the inequality in (3.1.159) implies that

Bs ≤ (A + ∆−)s. (3.1.160)

Using this, along with the fact that Tr[B] = Tr[BsB1−s], we find that

Tr[B]− Tr[AsB1−s] = Tr[(Bs − As)B1−s] (3.1.161)

≤ Tr[((A + ∆−)s − As)B1−s] (3.1.162)

≤ Tr[((A + ∆−)s − As)(A + ∆−)1−s] (3.1.163)

= Tr[A] + Tr[∆−]− Tr[As(A + ∆−)1−s], (3.1.164)

Finally, observe that

A ≤ A + ∆− (3.1.165)

⇒ As2 A1−s A

s2 ≤ A

s2 (A + ∆−)1−s A

s2 (3.1.166)

⇒ A ≤ As2 (A + ∆−)1−s A

s2 (3.1.167)

⇒ Tr[A] ≤ Tr[As(A + ∆−)1−s]. (3.1.168)

Therefore, continuing from (3.1.164), we find that

Tr[B]− Tr[AsB1−s] ≤ Tr[A] + Tr[∆−]− Tr[As(A + ∆−)1−s] (3.1.169)≤ Tr[∆−], (3.1.170)

establishing (3.1.157), as required.

96

Page 108: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

3.1.3.1 Multiple State Discrimination

We now briefly consider state discrimination with more than two states. Sup-pose that Alice prepares a quantum system in a state chosen randomly froma set ρxx∈X of states. We assume that X is some finite alphabet with size|X| ≥ 2 and that the state ρx is chosen with probability p(x), where p : X →[0, 1] is a probability distribution. Alice sends her chosen state to Bob, whosetask is to guess the value of x, i.e., which state Alice sent. Bob’s knowledge ofthe system can be described by the ensemble (p(x), ρx

B)x∈X, which has thefollowing associated classical–quantum state:

ρXAB := ∑x∈X

p(x)|x〉〈x|XA ⊗ ρxB, (3.1.171)

where XA is a classical |X|-dimensional register that contains Alice’s choiceof state. Note that while Bob knows both the prior probability distribution pand the association x ↔ ρx between the labels x and the states ρx, he does nothave access to the register XA. (If Bob did have access to the classical registerXA, he could simply measure it in the basis |x〉x∈X and figure out what stateAlice sent.) Therefore, as before, Bob must make a measurement. His strategyis to choose a POVM Mx

Bx∈X with elements indexed by the elements of X. Ifhe obtains the outcome corresponding to x ∈ X, then he guesses that the statesent was ρx. The probability of success, given that the state ρx was selected,is thus Tr[Mx

BρxB], according to the Born rule. It then follows that the expected

success probability is

psucc(Mxx; (p(x), ρx)x) = ∑x∈X

p(x)Tr[Mxρx]. (3.1.172)

and furthermore that the optimal success probability is

p∗succ((p(x), ρx)x)

= supMxx

x∈Xp(x)Tr[Mxρx] : 0 ≤ Mx ≤ 1 ∀x, ∑

x∈XMx = 1

. (3.1.173)

We can also arrive at the conclusion in (3.1.173) by using the notion of ameasurement channel, which is a concept introduced later in Definition 3.28.Let MB→XB(·) := ∑x∈X Tr[Mx

B(·)]|x〉〈x|XB be the measurement channel cor-responding to the POVM Mx

Bx∈X, where XB is a |X|-dimensional classi-cal register. After the measurement, the classical–quantum state in (3.1.171)transforms to

ωXAXB := MB→XB(ρXAB) (3.1.174)

97

Page 109: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

= ∑x,x′∈X

p(x)Tr[Mx′B ρx

B]|x〉〈x|XA ⊗ |x′〉〈x′|XB . (3.1.175)

LetΠsucc

XAXB:= ∑

x∈X|x〉〈x|XA ⊗ |x〉〈x|XB , (3.1.176)

which is a projector corresponding to the registers XA and XB having thesame value; this is what we want for state discrimination to be successful.The success probability of the strategy given by the POVM Mxx∈X is thus

psucc(Mxx; (p(x), ρx)x) = Tr[ΠXAXBωXAXB ] (3.1.177)

= ∑x∈X

p(x)Tr[Mxρx], (3.1.178)

and the optimal success probability is as given in (3.1.173).

The scenario of multiple state discrimination is very similar to the taskof classical communication over a quantum channel N that we consider inChapter 7. The classical messages to be sent correspond to the labels x ∈ X,while the states ρx correspond to an encoding of the messages into quan-tum states, which are then sent through the quantum channel, and p corre-sponds to the prior probability over the set of messages. The measurementperformed in order to guess the state corresponds to a decoding channel thatis applied at the receiving end of the quantum channel in order to determinethe message that was sent. The quantity in (3.1.173), evaluated for the ensem-ble (p(x),N(ρx))x∈X, is then the optimal average probability of correctlyguessing the message sent, where the optimization is over all POVMs indexedby the messages.

The optimal success probability in (3.1.173) can be computed by means ofa semi-definite program (recall Section 2.4), giving us a good opportunity topractice semi-definite programming with a well motivated physical example.The semi-definite program for (3.1.173) is as follows:

p∗succ = supMXB≥0

Tr[MXBρXB] : TrX[MXB] = 1B , (3.1.179)

where we have employed the shorthand p∗succ ≡ p∗succ((p(x), ρx)x) and ρXBis the classical–quantum state in (3.1.171), with the shorthand X ≡ XA. Thisis equal to (3.1.173) because the objective function Tr[MXBρXB] reduces to thatin (3.1.173) with Mx

B = 〈x|X MXB|x〉X, due to the classical–quantum structureof ρXB. The constraint TrX[MXB] = 1B is then the same as the constraint in

98

Page 110: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

(3.1.173). The SDP above is equivalent to the following one with a relaxedconstraint:

p∗succ = supMXB≥0

Tr[MXBρXB] : TrX[MXB] ≤ 1B , (3.1.180)

which follows because we can add in |0〉〈0|X ⊗ (1B − TrX[MXB]) as an addi-tional term to MXB if we do not have equality in the constraint in (3.1.180).Furthermore, the objective function in (3.1.180) only increases under this ad-dition. The SDP in (3.1.180) is now in the standard primal form from Defini-tion 2.20, with A = ρXB, X = MXB, B = 1B, and Φ(X) = TrX[MXB]. Since thedual map Φ† of the partial trace is tensoring with the identity, it follows fromDefinition 2.20 that the dual SDP is given by

p∗succ = infYB≥0Tr[YB] : 1X ⊗YB ≥ ρXB, (3.1.181)

which is equivalent to

p∗succ = infYB≥0Tr[YB] : YB ≥ p(x)ρx

B ∀x ∈ X. (3.1.182)

Strong duality holds because the operator MXB = 1XB/|X| is feasible for theprimal, while YB = ∑x∈X p(x)ρx

B + 1B is strictly feasible for the dual. So weconclude that psucc = p∗succ. The complementary slackness conditions fromProposition 2.23 reduce to the following for this case: YB = TrX[MXB]YB andρXBMXB = (IX ⊗ YB)MXB. Taking a partial trace over the second conditiongives TrX[ρXBMXB] = YBTrX[MXB] = YB, where we used the first condition toarrive at the last equality. This final equality becomes YB = ∑x∈X p(x)ρx

BMxB.

Thus, we find that an optimal dual operator YB is a direct function of an opti-mal measurement Mx

Bx∈X.

3.2 Quantum Channels

The evolution of a quantum system is described mathematically by a quantumchannel, which is a linear, completely positive, and trace-preserving (CPTP) mapacting on the quantum states of the underlying Hilbert space of the system.The main idea behind the formalism of quantum channels is that every ad-missible quantum physical evolution should accept a quantum state as inputand output a legitimate quantum state. Furthermore, if such an evolution actson a mixture of quantum states, then the output state should be equal to the

99

Page 111: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

mixture of the individual outputs. This latter requirement is called convexlinearity on density operators, and for each convex linear map acting on theconvex set of density operators, it is possible to define a unique linear mapacting on the space of all linear operators. The latter is the mathematical objectthat we employ. The fact that this linear map should map quantum states toquantum states means that it should be a trace-preserving and positive map.However, it is furthermore reasonable to demand that if the channel acts onone share A of a bipartite quantum state ρRA, then the output should be a le-gitimate bipartite quantum state. So we demand additionally that a quantumchannel should be not simply positive, but in fact completely positive. Thus,a quantum channel is a linear, completely positive, and trace-preserving map.

In detail, consider a map N : L(H)→ L(H′) (also called “superoperator,”as discussed previously in Section 2.2.4).

1. N is called positive if it maps positive semi-definite operators to positivesemi-definite operators, i.e., N(X) ≥ 0 for all X ≥ 0. A map N is calledcompletely positive if the map idk ⊗ N is positive for all integers k ≥ 1.Note that if N is a map acting on linear operators in L(Cd), then the mapidk ⊗N acts on linear operators in L(Ckd). In other words, for every lin-ear operator X acting on a kd-dimensional Hilbert space, which we candecompose as the block matrix

X =

X0,0 · · · X0,k−1... . . . ...

Xk−1,0 · · · Xk−1,k−1

, (3.2.1)

such that Xi,j is a d × d matrix for all 0 ≤ i, j ≤ k − 1, the action of themap idk ⊗N is defined as

(idk ⊗N)(X) =

N(X0,0) · · · N(X0,k−1)... . . . ...

N(Xk−1,0) . . . N(Xk−1,k−1)

. (3.2.2)

We can write this more compactly as follows. Noting that L(Ckd) is iso-morphic to L(Ck)⊗ L(Cd), we can write X ∈ L(Ckd) as

X =k−1

∑i,j=0|i〉〈j| ⊗ Xi,j. (3.2.3)

100

Page 112: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Then the action of idk ⊗N is defined as

(idk ⊗N)(X) =k−1

∑i,j=0|i〉〈j| ⊗N(Xi,j). (3.2.4)

Physically, the complete positivity of N takes into account the fact thatthe system of interest might be entangled with another system that isoutside of our control, so that simply letting N be positive is not suffi-cient to ensure that all positive semi-definite operators get mapped topositive semi-definite operators. Letting N be completely positive meansthat positive semi-definite operators are mapped to positive semi-definiteoperators even in this more general setting.

2. N is called trace preserving if Tr[N(X)] = Tr[X] for all linear operators X.

The defining properties of complete positivity and trace preservation forquantum channels together ensure that quantum states for the systems of in-terest get mapped to quantum states, even if they happen to be entangledwith other external systems outside of our control.

Thoughout this book, we write NA→B to denote a map N : L(HA) →L(HB) taking a quantum system A to a quantum system B. We sometimeswrite NA if the input and output systems of the channel have the same di-mension. We drop the subscript indicating the input and output systems ifthey are not important in the context being considered.

3.2.1 Choi Representation

The Choi representation of a quantum channel gives a way to represent aquantum channel as a bipartite operator and is an essential concept in quan-tum information theory.

Definition 3.16 Choi Representation

For every quantum channel NA→B, its Choi representation, or Choi opera-tor, is defined as

ΓNAB := NA′→B(|Γ〉〈Γ|AA′) =

dA−1

∑i,j=0|i〉〈j|A ⊗N(|i〉〈j|A′), (3.2.5)

101

Page 113: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

N ΦNAB

ΦAA′ A′

A

B

AliceBob

FIGURE 3.3: The Choi state ΦNAB of a quantum channel NA→B is the state

resulting from sending one share of the maximally entangled state ΦAA′ ,defined in (3.1.67), through N.

where HA′ is isomorphic to the Hilbert space HA corresponding to thechannel input system A. The Choi state ΦN

AB of N is defined as

ΦNAB :=

1dA

ΓNAB = NA′→B(ΦAA′), (3.2.6)

where ΦAA′ = |Φ〉〈Φ|AA′ is the maximally entangled state definedin (3.1.67).

The Choi representation ΓNAB of the channel NA→B is an operator acting

on HAB and uniquely characterizes the channel because it specifies the ac-tion of N on the basis |i〉〈j|A : 0 ≤ i, j ≤ dA − 1 of linear operators actingon HA. The Choi state ΦN

AB is simply the Choi representation normalized bythe dimension dA of the input system A of the channel N, and it is the stateresulting from sending one share of a maximally entangled state through N;see Figure 3.3.

Proposition 3.17 Quantum Dynamics from Choi Operator

Let NA→B be a quantum channel, and let XRA be a bipartite operator,with R an arbitrary reference system. Then the action of the channelidR ⊗NA→B on XRA can be expressed in terms of the Choi operator ΓN

ABas follows:

(idR ⊗NA→B)(XRA) = TrA[(TA(XRA)⊗ 1B)(1R ⊗ ΓNAB)], (3.2.7)

(idR ⊗NA→B)(XRA) = 〈Γ|A′A(XRA′ ⊗ ΓNAB)|Γ〉A′A, (3.2.8)

where TA denotes the partial transpose from Definition 3.9.

102

Page 114: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

PROOF: Observe that (3.2.5) implies that

NA→B(|i〉〈j|A) = (〈i|A ⊗ 1B)ΓNAB(|j〉A ⊗ 1B) (3.2.9)

for all 0 ≤ i, j ≤ dA− 1. We extend this by linearity to apply to every input op-erator XA by the following reasoning. Expanding XA as XA = ∑dA−1

i,j=0 Xi,j|i〉〈j|A,we find that

NA→B(XA) =dA−1

∑i,j=0

Xi,jNA→B(|i〉〈j|A) (3.2.10)

=dA−1

∑i,j=0

Xi,j(〈i|A ⊗ 1B)ΓNAB(|j〉A ⊗ 1B) (3.2.11)

=dA−1

∑i,j=0

Xi,jTrA[(|j〉〈i|A ⊗ 1B)ΓNAB] (3.2.12)

= TrA[(XTA ⊗ 1B)ΓN

AB]. (3.2.13)

So we conclude that the action of NA→B on every linear operator XA can beexpressed using the Choi representation as

NA→B(XA) = TrA[(XTA ⊗ 1B)ΓN

AB]. (3.2.14)

Now employing (2.2.33) and (2.2.30), we conclude that the action of NA→B onevery linear operator XA can be expressed alternatively as

NA→B(XA) = 〈Γ|A′A(XA′ ⊗ ΓNAB)|Γ〉A′A. (3.2.15)

The identities in (3.2.7) and (3.2.15) extend more generally to the case of thechannel idR⊗NA→B acting on a bipartite operator XRA by expanding XRA asXRA = ∑dR−1

i,j=0 |i〉〈j|R ⊗ Xi,jA and using linearity. We thus conclude (3.2.7) and

(3.2.8).

Using the definition of the Choi state and the maximally entangled state ΦA′A,we can write (3.2.8) as

〈Φ|A′A(XRA′ ⊗ΦNAB)|Φ〉A′A =

1d2

ANA′→B(XRA′). (3.2.16)

Comparing this equation with (3.1.112), we see that it has the following phys-ical interpretation: if we start with the systems R, A′, A and B in the state

103

Page 115: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

ρRA′⊗ΦNAB and we measure A′ and A according to the POVM |Φ〉〈Φ|A′A, 1A′A

−|Φ〉〈Φ|A′A, then the outcome corresponding to |Φ〉〈Φ|A′A occurs with prob-ability 1

d2A

and the post-measurement state on systems R and B is NA′→B(ρRA′).

The Choi state ΦNAB can thus be viewed as a resource state for the probabilistic

implementation of the channel N. We return to this point in Section 3.3 whenwe discuss post-selected quantum teleportation.

The concept of the Choi state allows us to associate to each quantum chan-nel NA→B a bipartite quantum state. Conversely, given a bipartite state ρAB,we can associate a map given by

XA 7→ dATrA[(XTA ⊗ 1B)ρAB]. (3.2.17)

It is straightforward to see that this map is completely positive; however, itis trace preserving if and only if TrB[ρAB] = πA. On the other hand, the mapN

ρA→B defined as

NρA→B(XA) := TrA

[(XT

A ⊗ 1B)ρ− 1

2A ρABρ

− 12

A

](3.2.18)

is a quantum channel whenever ρA is positive definite, where ρA = TrB[ρAB].

The operator ρ− 1

2A ρABρ

− 12

A is sometimes called a “conditional state,” motivatedby the fact that it reduces to a conditional probability distribution when ρABis a fully classical state, so that it can be written as ρAB = ∑x,y p(x, y)|x〉〈x|A⊗|y〉〈y|B where p(x, y) is a probability distribution. Note that if ρA is not invert-ible, then the inverse in (3.2.18) should be taken on the support of ρA (as in(2.2.56)), in which case the channel Nρ

A→B is defined as in (3.2.18) only on thesupport of ρA.

Given two quantum channels (N1)A1→B1 and (N2)A2→B2 , the Choi repre-sentation of the tensor product channel N1 ⊗N2 is given by

ΓN1⊗N2A1 A2B1B2

= ΓN1A1B1⊗ ΓN2

A2B2. (3.2.19)

This is due to the fact that the operator ΓA1 A2 A′1 A′2= |Γ〉〈Γ|A1 A2 A′1 A′2

splits intoa tensor product as

ΓA1 A2 A′1 A′2= ΓA1 A′1

⊗ ΓA2 A′2. (3.2.20)

Given two quantum channels NA→B and MB→C, the Choi representationof the composition (M N)A→C is given by

ΓMNAC = MB→C(ΓN

AB) (3.2.21)

104

Page 116: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

= TrB[TB(ΓNAB)Γ

MBC] (3.2.22)

= 〈Γ|BB′ΓNAB ⊗ ΓM

B′C|Γ〉BB′ . (3.2.23)

The first equality follows immediately from the definition of the Choi repre-sentation, the second follows from (3.2.7), and the third from (3.2.8).

3.2.2 Characterizations of Quantum Channels: Choi, Kraus,Stinespring

The following theorem provides three useful ways to characterize quantumchannels, and as such, it is one of the most important theorems in quantuminformation theory.

Theorem 3.18 Quantum Channels

Let NA→B be a linear map from L(HA) to L(HB). Then the following areequivalent:

1. N is a quantum channel.

2. Choi: The Choi representation ΓNAB is positive semi-definite and sat-

isfies TrB[ΓNAB] = 1A.

3. Kraus: There exists a set Kiri=1 of operators such that

N(XA) =r

∑i=1

KiXAK†i (3.2.24)

for all linear operators XA, where Ki ∈ L(HA,HB) for all i ∈1, . . . , r, r ≥ rank(ΓN

AB), and ∑ri=1 K†

i Ki = 1A.

4. Stinespring: There exists an isometry VA→BE, with dE ≥ rank(ΓNAB),

such thatN(XA) = TrE[VXAV†] (3.2.25)

for all linear operators XA.

Please consult the Bibliographic Notes in Section 3.6 for references con-taining a proof of this theorem.

The operators Ki in (3.2.24) are called Kraus operators. They have an inter-

105

Page 117: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

pretation in quantum error correction as “error operators” that characterizevarious errors that a quantum system undergoes. Kraus operators for a givenquantum channel are, however, not unique in general. If Kir

i=1 is a set ofKraus operators for the channel N, then, given an s × r isometric matrix Vwith elements Vi,j : 1 ≤ i ≤ s, 1 ≤ j ≤ r, the operators K′is

i=1 defined as

K′i =r

∑j=1

Vi,jKj (3.2.26)

are also Kraus operators for N. Indeed, for every linear operator X, the fol-lowing equality holds

s

∑i=1

K′iX(K′i)† =

s

∑i=1

r

∑j,j′=1

Vi,jVi,j′KjXK†j′ (3.2.27)

=r

∑j,j′=1

(s

∑i=1

(V†)j′,iVi,j

)

︸ ︷︷ ︸(V†V)j′ ,j=δj′ ,j

KjXK†j′ (3.2.28)

=r

∑j=1

KjXK†j (3.2.29)

= N(X), (3.2.30)

where to obtain the second equality we used the fact that Vi,j′ = (V†)j′,i. Wenote also that a converse statement holds: if Kir

i=1 and K′isi=1 are two

sets of Kraus operators that realize the same quantum channel, then they arerelated by an isometry as in (3.2.26). This is a dynamical version of the state-ment made earlier in Section 3.1.1.2, the statement there being that all purifi-cation of a state are related by an isometry acting on the purifying system.

The isometry VA→BE in (3.2.25) is called an isometric extension of N. Physi-cally, the isometry can be thought of as modeling the interaction of the quan-tum system of interest with its environment, i.e., anything external to the quan-tum system that is not under our control. Stinespring’s theorem then tells usthat the evolution of every quantum system can be thought of as first an in-teraction of the system with its environment, followed by discarding the en-vironment; see Figure 3.4. Given a set Kir

i=1 of Kraus operators for N, wecan let the environment E correspond to a space of dimension r and define

106

Page 118: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

VA→BEρA

N(ρA)B

E

UAE′→BE′′

ρA N(ρA)

|0〉E′E′

A B

E′′

FIGURE 3.4: (Left) According to Stinespring’s theorem, the evolution ofevery quantum system A via a quantum channel NA→B can be describedas an interaction of the system A with its environment E via an isometryVA→BE, followed by discarding the environment. (Right) The isometryVA→BE can be extended to a unitary UAE′→BE′′ using, e.g., the construc-tion in (3.2.32), such that A and E′ are initially in a product state, with E′

starting in a pure state. The two systems then interact accoring to U, andafter discarding the environment E′′ the resulting state is the output of thechannel.

the isometry VA→BE as

VA→BE =r

∑j=1

Kj ⊗ |j− 1〉E. (3.2.31)

It is straightforward to show that this is indeed an isometric extension of Nsince TrE[VXV†] = ∑r

j=1 KjXK†j = N(X).

3.2.2.1 Unitary Extensions of Quantum Channels from Isometric Exten-sions

We can always extend every isometric extension VA→BE of a channel NA→B toa unitary UAE′→BE′′ in a way similar to what we used in (2.2.142) in the proofof the operator Jensen inequality (Theorem 2.15). Let the unitary UAE′→BE′′be defined as the following block matrix:

U =

V 1−VV† 0dBdE×d′

0dA×dA V† 0dA×d′

0d′×dA0d′×dBdE

1d′

, (3.2.32)

where we set d′ := (dB − 1)dA. Without the various dimensions indicated, Uis more simply expressed as

U =

V 1−VV† 00 V† 00 0 1

. (3.2.33)

107

Page 119: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Note that V is a dBdE × dA matrix and VV† is a dBdE × dBdE matrix. Further-more, let us suppose that dE = dAdB. Note that, by the Stinespring theorem, itis always possible to pick this as the dimension of HE since 0 < rank(ΓN

AB) ≤dAdB. Then

dBdE + dA + d′ = dAdB(dB + 1), (3.2.34)

and we conclude that U is a (dAdB(dB + 1)) × (dAdB(dB + 1)) matrix. It isalso indeed a unitary because

UU† =

V 1−VV† 00 V† 00 0 1

V† 0 01−VV† V 0

0 0 1

(3.2.35)

=

VV† + (1−VV†)(1−VV†) (1−VV†)V 0V†(1−VV†) V†V 0

0 0 1

(3.2.36)

=

1 0 00 1 00 0 1

. (3.2.37)

Similarly, it can be shown that U†U = 1. By defining the system E′ withdimension dE′ = dB(dB + 1), we can think of U as acting on the input tensor-product space HE′ ⊗HA. Then, we can embed the state ρA into this largerspace as

|0〉〈0|E′ ⊗ ρA =

ρA 0 00 0 00 0 0

, (3.2.38)

so that

U

ρA 0 00 0 00 0 0

U† =

VρV† 0 00 0 00 0 0

. (3.2.39)

By defining the system E′′ with dimension dE′′ = dA(dB + 1), we can think ofthe output space of U as the tensor-product space HB ⊗HE′′ , so that

N(ρA) = TrE[VρV] = TrE′′ [U(|0〉〈0|E′ ⊗ ρA)U†]. (3.2.40)

The above construction is not necessarily an efficient construction (usingas few extra degrees of freedom as possible), but it illustrates the principlethat every quantum channel can be thought of as arising from

1. adjoining an environment state |0〉〈0|E′ to the input system A,

108

Page 120: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

2. performing a unitary interaction U, and then

3. tracing over an output environment system E′′,

thus assigning a strong physical meaning to the notion of isometric extensionof a quantum channel; see Figure 3.4.

Another construction of a unitary operator that is less explicit but moreefficient is as follows. Let VA→BE be the isometric extension from (3.2.31),and let r = rank(ΓN

AB). Since r ≤ dAdB, without loss of generality, we cantake the output environment system E to have dimension dE = dAdB. Let|k〉A⊗ |`〉E′k,` be an orthonormal basis for the input system A and an inputenvironment system E′, with dE′ = d2

B, where 0 ≤ k ≤ dA − 1 and 0 ≤ ` ≤dE′ − 1. Then define the orthonormal vectors |φk,1〉BE for 0 ≤ k ≤ dA − 1 asfollows:

|φk,0〉BE :=

(dE

∑j=1

Kj ⊗ |j− 1〉E〈0|E′)|k〉A ⊗ |0〉E′ (3.2.41)

= VA→BE|k〉A. (3.2.42)

The fact that these vectors form an orthonormal set is a consequence of thefacts that V†V = 1A and |k〉AdA−1

k=0 is an orthonormal set. We then definethe action of the unitary UAE′→BE on these vectors |k〉A ⊗ |0〉E′ as follows:

UAE′→BE|k〉A ⊗ |0〉E′ = |φk,0〉BE, (3.2.43)

for 0 ≤ k ≤ dA − 1. By construction, we have that

dA−1

∑k=0|φk,0〉〈φk,0|BE = VV†. (3.2.44)

Now let a spectral decomposition of the projection 1BE − VV† of dimensiondBdE − dA = (d2

B − 1)dA be given by

1BE −VV† =dA−1

∑k=0

d2B−1

∑`=1|φk,`〉〈φk,`|BE. (3.2.45)

We can thus complete the action of the unitary UAE′→BE on the remainingvectors as follows:

UAE′→BE|k〉A ⊗ |`〉E′ = |φk,`〉BE, (3.2.46)

109

Page 121: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

for 0 ≤ k ≤ dA − 1 and 1 ≤ ` ≤ d2B − 1. Thus, the full unitary UAE′→BE is

specified as

UAE′→BE :=dA−1

∑k=0

d2B−1

∑`=0|φk,`〉BE〈k|A ⊗ 〈`|E′ . (3.2.47)

By construction, the following identity holds for every input state ρA:

UAE′→BE (ρA ⊗ |0〉〈0|E′) (UAE′→BE)† = VρAV†, (3.2.48)

so that we can realize the isometric channel ρA 7→ VρAV† by tensoring inan environment state |0〉〈0|E′ and applying the unitary UAE′→BE. We thenrealize the original channel NA→B by applying a final partial trace over theenvironment system E:

NA→B(ρA) = TrE[UAE′→BE (ρA ⊗ |0〉〈0|E′) (UAE′→BE)†]. (3.2.49)

3.2.2.2 Relating Extensions and Purifications by Quantum Channels

We can now establish a fundamental relationship between a purification ψRAof a state ρA and an extension ωR′A of ρA.

Proposition 3.19

Let ρA be a quantum state with purification ψRA. For every extensionωR′A of ρA, there exists a quantum channel NR→R′ such that

NR→R′(ψRA) = ωR′A. (3.2.50)

PROOF: Consider a purification φR′′R′A of ωR′A, with purifying system R′′

satisfying dR′′ ≥ rank(ωR′A). Since ωR′A is an extension of ρA, we have that

TrR′′R′ [φR′′R′A] = ρA, (3.2.51)

which means that φR′′R′A is a purification of ρA. The state ψRA is also a purifi-cation of ρA, which means that, by the isometric equivalence of purifications(see (3.1.39)–(3.1.42) and the paragraph thereafter), there exists an isometryVR→R′′R′ such that

VR→R′′R′ |ψ〉RA = |φ〉R′′R′A. (3.2.52)

Now, let us use this isometry to define the channel NR→R′ :

NR→R′(·) = TrR′′ [VR→R′′R′(·)V†R→R′′R′ ]. (3.2.53)

110

Page 122: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

It then follows that

NR→R′(ψRA) = TrR′′ [VR→R′′R′ψRAV†R→R′′R′ ] = TrR′′ [φR′′R′A] = ωR′A, (3.2.54)

as required.

Proposition 3.19 tells us that an extension of a quantum state can be “reached”via a quantum channel acting on a purification of the state. In this sense, apurification can be viewed as the “strongest” extension of a state.

3.2.3 Unital Maps and Adjoint Maps

Definition 3.20 Unital Map

A map NA→B is called unital if it preserves the identity operator 1A, i.e.,

N(1A) = 1B. (3.2.55)

REMARK: Observe that if N is trace preserving and A and B are finite dimensional, thenby taking the trace on both sides of (3.2.55) we find that dA = dB. This means that, infinite dimensions, it is necessary for unital channels to have the same input and outputdimensions.

Unital channels thus have the maximally mixed state πA as a fixed point.

Given that the set L(HA,HB) of linear operators between the Hilbert spac-es HA and HB is itself a Hilbert space under the Hilbert–Schmidt inner prod-uct defined in (2.2.21), it is possible to define the adjoint of a map N : L(HA)→ L(HB) as follows.

Definition 3.21 Adjoint Map

For a map N : L(HA) → L(HB), its adjoint map (N†)B→A is defined asthe unique linear map from L(HB) to L(HA) satisfying

〈YB,N(XA)〉 = 〈N†(YB), XA〉 (3.2.56)

for all X ∈ L(HA) and Y ∈ L(HB).

111

Page 123: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Given a channel NA→B with Kraus representation

N(X) =r

∑i=1

KiXK†i (3.2.57)

and Stinespring representation V satisfying

N(X) = TrE[VXV†], (3.2.58)

it is straightforward to verify using (3.2.56) that the adjoint map N† can berepresented in the following two ways:

N†(Y) =r

∑i=1

K†i YKi (3.2.59)

= V†(Y⊗ 1E)V. (3.2.60)

From this, the following facts are straightforward to verify:

1. The adjoint of a completely positive map is completely positive.

2. The adjoint of a trace preserving map is a unital map.

3. The adjoint of a unital map is trace preserving.

3.2.4 Appending, Replacement, Partial Trace, Isometric, Com-plementary, and Degradable Channels

We now detail a variety of quantum channels that tend to come up often inquantum information theory.

3.2.4.1 Preparation, Appending, and Replacement Channels

The preparation of a quantum system in a given (fixed) state, as well as takingthe tensor product of a state with a given (fixed) state, can both be viewed asquantum channels.

112

Page 124: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Definition 3.22 Preparation and Appending Channels

For a quantum system A and a state ρA, the preparation channel PρA isdefined for α ∈ C as

PρA(α) := αρA. (3.2.61)

When acting in parallel with the identity channel on a linear operator YBof another quantum system B, the preparation channel PρA is called theappending channel PρA and is defined as

PρA(YB) ≡ (PρA ⊗ idB)(YB) = ρA ⊗YB. (3.2.62)

In other words, the appending channel PρA ⊗ idB takes the tensor prod-uct of its argument with ρA.

One way to determine whether a map N on quantum states is completelypositive is to find a Kraus representation for it, i.e., a set Kii of operatorssuch that N(X) = ∑i KiXK†

i for all operators X, for then by Theorem 3.18 weget that N is completely positive. If in addition the Kraus operators satisfy∑i K†

i Ki = 1, then N is also trace preserving and therefore a quantum channel.

Supposing that ρA has a spectral decomposition of the form

ρA = ∑k

λk|φk〉〈φk|A, (3.2.63)

one set of Kraus operators for the preparation channel PρA is √λk|φk〉Ak. Aset of Kraus operators for the appending channel PρA⊗ idB is then √λk|φk〉⊗1Bk.

Definition 3.23 Replacement Channel

For a state σB, the replacement channel RσBA→B is defined as the channel

that traces out its input and replaces it with the state σB, i.e.,

RσBA→B(XA) = Tr[XA]σB. (3.2.64)

for all linear operators XA. When acting on one share of a bipartite stateρRA, the replacement channel RσB

A→B has the following action:

RσBA→B(ρRA) = TrA[ρRA]⊗ σB = ρR ⊗ σB. (3.2.65)

113

Page 125: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Observe that we can write the replacement channel RσBA→B as the com-

position of the partial trace over A followed by the preparation/appendingchannel PσB :

RσBA→B = PσB TrA. (3.2.66)

We often omit the superscript on RσBA→B when it is clear from the context that

the replacement state is σB.

3.2.4.2 Trace and Partial-Trace Channels

Recall the trace and partial trace of a linear operator from Definition 3.3. Asa map acting on linear operators, one can ask whether the partial trace is achannel. The answer, perhaps not surprisingly, is “yes.” In fact, observe thatthe definition in (3.1.13) of the partial trace TrB over B is already in Krausform, with Kraus operators Kj = 1A⊗ 〈j|B. This means that TrB is completelypositive. It is also trace preserving because

dB−1

∑j=0

K†j Kj =

dB−1

∑j=0

(1A ⊗ |j〉B)(1A ⊗ 〈j|B) = 1A ⊗dB−1

∑j=0|j〉〈j|B = 1AB, (3.2.67)

where we used the fact that ∑dB−1j=0 |j〉〈j|B = 1B.

Unlike the trace and partial trace, the transpose and partial transpose aretrace-preserving maps but not completely positive. Indeed, for the latter, re-call from (3.1.87) that TB(ΦAB) = 1

d FAB, so that its Choi representation isΓTB

AB = FAB, which we know has negative eigenvalues, as shown in (3.1.89).So by Theorem 3.18, the transpose map TB is not completely positive.

3.2.4.3 Isometric and Unitary Channels

Two more simple examples of quantum channels are isometric and unitarychannels. An isometric channel conjugates the channel input by an isometry,and a unitary channel conjugates the channel input by a unitary. Specifically,the isometric channel V corresponding to an isometry V is

V(X) := VXV†. (3.2.68)

Similarly, the unitary channel U corresponding to a unitary U is

U(X) := UXU†. (3.2.69)

114

Page 126: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Since every unitary is also an isometry, it follows that every unitary channelis an isometric channel. Isometric channels are completely positive becausethey can be described using only one Kraus operator, the isometry V. In fact,a quantum channel is isometric if and only if it has a single Kraus operator.

Observe that by the unitarity of U, the map

U†(Y) := U†YU, (3.2.70)

i.e., the conjugation by U†, is also a channel. In particular,

U† U = U U† = id, (3.2.71)

so that U† is the inverse channel of U.

On the other hand, for an isometry V, conjugation by V† is not necessarilya channel: although the map V†(Y) := V†YV is completely positive, it isnot necessarily trace preserving because VV† 6= 1 in general. However, bydefining the reversal channel RV as

RV(Y) := V†(Y) + Tr[(1−VV†)Y]σ, (3.2.72)

where σ is an arbitrary (but fixed) state, we find that RV is trace preserving:

Tr[RV(Y)] = Tr[V†(Y) + Tr[(1−VV†)Y]σ

](3.2.73)

= Tr[V†YV] + Tr[Y]− Tr[VV†Y] (3.2.74)= Tr[Y]. (3.2.75)

Since it is also completely positive, being the sum of completely positivemaps, the map RV is indeed a quantum channel. Like U†, the reversal chan-nel RV reverses the action of V:

(RV V)(X) = V†(V(X)) + Tr[(1−VV†)V(X)]σ (3.2.76)

= V†VXV†V + Tr[(1−VV†)VXV†]σ (3.2.77)

= X + (Tr[VXV†]− Tr[VV†VXV†])σ (3.2.78)= X. (3.2.79)

In the above sense, RV is a left inverse of V. Unlike U†, however, RV is not theright inverse of V because the equality (V RV)(Y) = Y need not hold.

115

Page 127: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

3.2.4.4 Complementary Channels

As stated earlier, the Stinespring representation NA→B(XA) = TrE[VXV†] ofa quantum channel NA→B, where VA→BE is an isometry, can be interpretedas an interaction of the quantum system A of interest with its environment Efollowed by discarding E. If instead we discard the system B of the isometricchannel VA→BE(XA) = VXV†, then we obtain the state of the environmentafter the interaction with A. This defines a channel complementary to NA→B.

Definition 3.24 Complementary Channel

Let VA→BE be an isometry and NA→B a quantum channel defined as

NA→B(XA) = TrE[VXV†]. (3.2.80)

The complementary channel for NA→B associated with the isometric exten-sion VA→BE is denoted by Nc

A→E and is defined as

NcA→E(XA) := TrB[VXV†]. (3.2.81)

Related to the above, the channel McA→E is a complementary channel for

the channel MA→B if there exists an isometric channel WA→BE such that

MA→B = TrE WA→BE (3.2.82)Mc

A→E = TrB WA→BE. (3.2.83)

Given a channel NA→B, there is not a unique complementary channel forNA→B, just as there is not a unique purification of a state ρA, nor is therea unique Kraus representation of a quantum channel. Similar to the latterscenarios however, it is possible to show that all complementary channels forNA→B are related by an isometric channel acting on their output. That is,let us suppose that (N1)

cA→E and (N2)

cA→E′ are complementary channels for

NA→B. Then there exists an isometric channel SE→E′ such that

(N2)cA→E′ = SE→E′ (N1)

cA→E. (3.2.84)

The notion of a complementary channel arises in the scenario in whichtwo parties, Alice and Bob, wish to communicate to each other using a chan-nel NA→B in the presence of an eavesdropper Eve. In this scenario, we cannaturally identify Alice and Bob with the quantum systems A and B and Eve

116

Page 128: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

with the system E, where E is as given in Definition 3.24. Any signals sentthrough the quantum channel by Alice are received by Bob via the action ofNA→B, while Eve receives a signal via the action of Nc

A→E. These concepts areimportant for private communication over quantum channels, which is thetopic of Chapter 11. Two important classes of channels are relevant in thiscontext.

Definition 3.25 Degradable and Anti-Degradable Channels

A channel NA→B is called degradable if there exists a channel DB→E,called a degrading channel, such that

DB→E NA→B = NcA→E, (3.2.85)

where NcA→E is a complementary channel of NA→B. The channel

NA→B is called anti-degradable if a complementary channel NcA→E of it is

degradable, i.e., if there exists a channel AE→B, called an anti-degradingchannel, such that

AE→B NcA→E = NA→B. (3.2.86)

See Figure 3.5 for a schematic depiction of degradable and anti-degradablechannels. A degradable channel is one whose complement can be simulated(via D) using the output of N. This means that Bob can simulate Eve’s re-ceived signal. On the other hand, an anti-degradable channel is such that Ncan be simulated (via A) using the output of Nc, which means that Eve cansimulate Bob’s received signal.

3.2.5 Amplitude Damping, Erasure, Pauli, and DepolarizingChannels

Let us now go through some common quantum channels of interest in quan-tum communication.

3.2.5.1 (Generalized) Amplitude Damping Channel

The amplitude damping channel with decay parameter γ ∈ [0, 1] is the channel Aγ

given by Aγ(ρ) = A1ρA†1 + A2ρA†

2, with the two Kraus operators A1 and A2

117

Page 129: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

A

B

E

N

Nc

D

(a) N degradable.

A

B

E

N

Nc

A

(b) N anti-degradable.

FIGURE 3.5: Degradable and anti-degradable channels. In (a), the channelN is degradable, meaning that there exists a channel D that Bob can applyto the output he receives via N that can be used to simulate what Evereceives via Nc. In (b) on the other hand, N is anti-degradable since Evecan simulate, using the channel E, what Bob receives.

defined asA1 =

√γ|0〉〈1|, A2 = |0〉〈0|+

√1− γ|1〉〈1|. (3.2.87)

It is straightforward to verify that A†1 A1 + A†

2 A2 = 1, so that Aγ is indeedtrace preserving.

Let ρ be a state with a matrix representation in the standard basis |0〉, |1〉as

ρ =

(1− λ α

α λ

). (3.2.88)

In order for ρ to be a state (positive semi-definite with unit trace), the condi-tions 0 ≤ λ ≤ 1 and λ(1− λ) ≥ |α|2 should hold, where α ∈ C. The outputstate Aγ(ρ) has the matrix representation

Aγ(ρ) =

(1− (1− γ)λ

√1− γα√

1− γα (1− γ)λ

). (3.2.89)

To obtain a physical interpretation of the amplitude damping channel,consider that it can written in the Stinespring form as

Aγ(ρ) = TrE[Uη(ρ⊗ |0〉〈0|E)(Uη)†], (3.2.90)

118

Page 130: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

where E is a qubit environment system, η := 1− γ, and

Uη =

1 0 0 00

√η i

√1− η 0

0 i√

1− η√

η 00 0 0 1

(3.2.91)

is unitary. Note that the action of Aγ on the pure states |0〉 and |1〉 is, respec-tively,

A1−η(|0〉〈0|A) = |0〉〈0|B,

A1−η(|1〉〈1|A) = (1− η)|0〉〈0|B + η|1〉〈1|B.(3.2.92)

A complementary channel Ac1−η for Aγ is defined from

Ac1−η(ρ) := TrB[Uη(ρ⊗ |0〉〈0|E)(Uη)†]. (3.2.93)

Note that the action of the complementary channel Ac1−η on these states is

Ac1−η(|0〉〈0|A) = |0〉〈0|E,

Ac1−η(|1〉〈1|A) = η|0〉〈0|E + (1− η)|1〉〈1|E.

(3.2.94)

We see that whenever the state |0〉〈0| is input to the channel, the output sys-tems B and E are both in the state |0〉〈0|. On the other hand, if the inputstate is |1〉〈1|, then B receives a mixed state: with probability 1− η, the stateis |0〉〈0|, and with probability η, the state is |1〉〈1|. The situation for E is re-versed, receiving |0〉〈0| with probability η and |1〉〈1| with probability 1− η.The unitary Uη can thus be viewed as a qubit analogue of a beamsplitter, andthe amplitude damping channel A1−η can be viewed as a qubit analogue ofthe pure-loss bosonic channel; see Figure 3.6.

A beamsplitter is an optical device that takes two beams of light as in-put and splits them into two separate output beams, with one of the outputbeams containing a fraction η of the intensity of the incoming beam and theother output beam containing the remaining fraction 1− η of the incoming in-tensity. When one of the input ports of the beamsplitter is empty, i.e., is in thevacuum state, the output to the receiver is by definition the pure-loss bosonicchannel. In the case of a single incoming photon, the pure-loss channel eithertransmits the photon with probability η (allowing it to go to the receiver) orreflects it with probability 1− η (sending it to the environment).

To draw the correspondence between the qubit amplitude damping chan-nel and the pure-loss bosonic channel described above, we can think of the

119

Page 131: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

ρ A1−η(ρ)

|0〉〈0|Uη

FIGURE 3.6: The amplitude damping channel A1−η can be interpreted,using (3.2.90), as an interaction with a qubit analogue of a bosonic beam-splitter unitary Uη, followed by discarding the output state of the envi-ronment. The channel from the sender to the receiver is from the left tothe right, while the input and output environment systems are at the topand the bottom, respectively. In the bosonic case, the state |0〉〈0| of theenvironmental input arm of the beamsplitter corresponds to the vacuumstate, which contains no photons, and the channel to the receiver’s end iscalled the pure-loss bosonic channel.

qubit state |0〉 as the vacuum state and the qubit state |1〉 as the state of asingle photon. It is possible to show that the output states of the amplitudedamping channel in (3.2.92) and (3.2.94) for the receiver and environment, re-spectively, then exactly match the action of the bosonic pure-loss channel onthe subspace spanned by |0〉 and |1〉 (please consult the Bibliographic Notes inSection 3.6 for further references on this connection). The amplitude dampingchannel can indeed, therefore, be viewed as the qubit analogue of the bosonicpure-loss channel.

By replacing the initial state |0〉〈0| of the environment in (3.2.90) with thestate

θNB := (1− NB)|0〉〈0|+ NB|1〉〈1|, NB ∈ [0, 1], (3.2.95)

we define the generalized amplitude damping channel A1−η,NB as

Aγ,NB(ρ) ≡ A1−η,NB(ρ) := TrE[Uη(ρ⊗ θNB)(Uη)†], (3.2.96)

where we again use the relation γ = 1− η. This channel has the followingfour Kraus operators:

A1 =√

1− NB

(1 00√

1− γ

), A2 =

√1− NB

(0√

γ0 0

), (3.2.97)

A3 =√

NB

(√1− γ 00 1

), A4 =

√NB

(0 0√γ 0

), . (3.2.98)

120

Page 132: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Note that the amplitude damping channel is a special case of the general-ized amplitude damping channel in which the thermal noise NB = 0, so thatA1−η = A1−η,0.

The state θNB is a qubit thermal state and can be regarded as a qubit ana-logue of the bosonic thermal state. The latter is a state corresponding to a heatbath with an average number of photons equal to NB. The generalized am-plitude damping channel can then be seen as a qubit analogue of the bosonicthermal noise channel.

3.2.5.2 Erasure Channel

The qubit erasure channel Ep with erasure probability p ∈ [0, 1] has the followingaction on an input qubit state ρ:

Ep(ρ) = (1− p)ρ + pTr[ρ]|e〉〈e|, (3.2.99)

where |e〉 is some unit vector that is orthogonal to all states in the input qubitHilbert space and |e〉〈e| is called the erasure state. For example, if the inputqubit Hilbert space is spanned by the standard basis |0〉, |1〉 and the outputqutrit Hilbert space is spanned by the standard basis |0〉, |1〉, |2〉, then wecan set |e〉 = |2〉.

It is straightforward to verify that a set of Kraus operators for Ep is√

1− p(|0〉〈0|+ |1〉〈1|),√p|e〉〈0|,√p|e〉〈1|

. (3.2.100)

The qudit erasure channel is a simple generalization of the qubit erasurechannel. In fact, it is defined by (3.2.99), with the main exception being thatthe input state ρ is a qudit input state. A set of Kraus operators for the quditerasure channel is√

1− p(|0〉〈0|+ · · ·+ |d− 1〉〈d− 1|),√p|e〉〈0|, . . . ,√

p|e〉〈d− 1|

.(3.2.101)

We now discuss how the pure-loss bosonic channel restricted to acting ondual-rail single-photon inputs, an important model for transmission of singlephotons through an optical fiber, corresponds to a qubit erasure channel. Thecorrespondence is illustrated in Figure 3.7.

Consider a dual-rail optical system, i.e., a quantum optical system withtwo distinct optical modes A1 and A2 representing the input to the channel.

121

Page 133: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

|0〉〈0|Uη

A1 B1

E1

|0〉〈0|Uη

A2 B2

E2

ρA1,A2 E1−η(ρA1,A2)

FIGURE 3.7: The qubit erasure channel E1−η, for η ∈ [0, 1], can be physi-cally realized by using a photonic dual-rail qubit system and passing eachof the two modes through a beamsplitter, modeled by the unitary Uη in(3.2.102), such that the input from the environment is the vacuum state.

As described at the beginning of Section 3.1, the two-dimensional subspacespanned by the states |0, 1〉A1,A2 , |1, 0〉A1,A2, consisting of a total of one pho-ton in either one of the two modes, constitutes a qubit system. We also let E1and E2 be two distinct optical modes constituting a dual-rail qubit systemrepresenting the environment of the channel. Finally, let B1 and B2 be twodistinct optical modes spanned by the states |0, 0〉B1,B2 , |0, 1〉B1,B2 , |1, 0〉B1,B2,so that together B1 and B2 constitute a qutrit system.

When the unitary corresponding to a quantum-optical beamsplitter actson the space spanned by |0, 0〉, |0, 1〉, |1, 0〉, it is equivalent to the upper left3× 3 matrix in (3.2.91). We can thus make use of this fact because the environ-ment state for the bosonic pure-loss channel is prepared in the state |0〉〈0|. LetUη

AiEi→BiEithen denote the beamsplitter unitary, for i ∈ 1, 2, with the fol-

lowing action on the basis |0, 0〉, |0, 1〉, |1, 0〉 of the input and output modes(in that order):

UηAiEi→BiEi

=

1 0 00

√η i

√1− η

0 i√

1− η√

η

. (3.2.102)

Letting ρA1 A2 be an arbitrary qubit state on the two modes A1 and A2 defined

122

Page 134: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

by

ρA1,A2 = (1− λ)|0, 1〉〈0, 1|A1,A2 + α|0, 1〉〈1, 0|A1,A2 + α|1, 0〉〈0, 1|A1,A2

+ λ|1, 0〉〈1, 0|A1,A2 ,(3.2.103)

the pure-loss bosonic channel on the dual-rail qubit system A1A2 is given by

TrE1,E2

[(Uη

A1E1→B1E1⊗Uη

A2E2→B2E2)(ρA1,A2 ⊗ |0, 0〉〈0, 0|E1,E2)

×(UηA1E1→B1E1

⊗UηA2E2→B2E2

)†]

.(3.2.104)

Although this has a similar form to (3.2.90), which defines the amplitudedamping channel, our particular realization of the qubit system in terms ofdual-rail single photons results in a completely different output from thatof the amplitude damping channel. In particular, using (3.2.103) along with(3.2.102), it is straightforward to show that

TrE1,E2

[(Uη

A1E1→B1E1⊗Uη

A2E2→B2E2)(ρA1,A2 ⊗ |0, 0〉〈0, 0|E1,E2)

×(UηA1E1→B1E1

⊗UηA2E2→B2E2

)†]

= ηρB1,B2 + (1− η)|0, 0〉〈0, 0|B1,B2

= E1−η(ρA1,A2). (3.2.105)

That is, the pure-loss bosonic channel on a dual-rail qubit system is simply thequbit erasure channel E1−η with erasure probability 1− η and erasure state|0, 0〉〈0, 0|B1,B2 . This means that a dual-rail single-photonic qubit sent througha pure-loss bosonic channel is transmitted to the receiver unchanged withprobability η, or it is lost, and replaced by the vacuum state |0, 0〉〈0, 0|, withprobability 1− η.

3.2.5.3 Pauli Channels

The Pauli channels constitute an important class of channels on qubit systems.They are based on the qubit Pauli operators X, Y, Z, which have the followingmatrix representation in the standard basis |0〉, |1〉:

X =

(0 11 0

), Y =

(0 −ii 0

), Z =

(1 00 −1

). (3.2.106)

The Pauli operators are Hermitian, satisfy X2 = Y2 = Z2 = 1, and mutuallyanti-commute.

123

Page 135: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

A general Pauli channel is one whose Kraus operators are proportional tothe Pauli operators, i.e.,

ρ 7→ pIρ + pXXρX + pYYρY + pZZρZ, (3.2.107)

where pI , pX, pY, pZ ≥ 0, pI + pX + pY + pZ = 1.

Here we highlight two particular Pauli channels of interest.

1. Dephasing channel: We let pI = 1− p and pZ = p for 0 ≤ p ≤ 1, andpX = pY = 0. The action of the dephasing channel is thus

ρ 7→ (1− p)ρ + pZρZ. (3.2.108)

To see why this is called the dephasing channel, consider again a generalstate ρ of the form (3.2.88) and let p = 1

2 . In this case, we call the channelthe completely dephasing channel, and it is straightforward to see that

ρ =

(1− λ α

α λ

)7→ 1

2ρ +

12

ZρZ =

(1− λ 0

0 λ

). (3.2.109)

In other words, the completely dephasing channel eliminates the off-diagonal elements of the input state (when represented in the standardbasis, which is the same basis in which Z is diagonal), so that the rela-tive phase between the |0〉 state and the |1〉 state vanishes and the statebecomes effectively classical.

In the more general case when p 6= 12 , the effect of the dephasing channel

is to reduce the magnitude of the off-diagonal elements:

ρ =

(1− λ α

α λ

)7→ (1− p)ρ + pZρZ (3.2.110)

=

(1− λ (1− 2p)α

(1− 2p)α λ

). (3.2.111)

2. Depolarizing Channel: For p ∈ [0, 1], the depolarizing channel is definedby letting pI = 1− p and pX = pY = pZ = p

3 , so that

ρ 7→ (1− p)ρ +p3(XρX + YρY + ZρZ). (3.2.112)

By using the identity

14

ρ +14

XρX +14

YρY +14

ZρZ = Tr[ρ]π, (3.2.113)

124

Page 136: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

we can equivalently write the action of the depolarizing channel as

ρ 7→(

1− 4p3

)ρ +

4p3

Tr[ρ]π, (3.2.114)

which has the interpretation that the state of the system is replaced bythe maximally mixed state π with probability 4p

3 . Observe, however, that4p3 can be interpreted as a probability only for 0 ≤ p ≤ 3

4 .

3.2.5.4 Generalized Pauli Channels

The qubit Pauli channels have a generalization to the qudit case also. Beforedefining the generalized Pauli channels, we first generalize the qubit Paulioperators to the Heisenberg–Weyl operators. The latter are defined as

Wz,x = Z(z)X(x), (3.2.115)

Z(z) :=d−1

∑k=0

e2πikz

d |k〉〈k|, (3.2.116)

X(x) :=d−1

∑k=0|k + x〉〈k|, (3.2.117)

for all 0 ≤ z, x ≤ d− 1. When the dimension d = 2, these operators reduceto the Pauli operators. The addition operation in the definition of X(x) isperformed modulo d. The operators X(x) and Z(z) satisfy the commutationrelation

Z(z)X(x) = e2πixz

d X(x)Z(z). (3.2.118)

The Heisenberg–Weyl operators are unitary, just like the Pauli operators; how-ever, unlike the Pauli operators, they are not Hermitian. In particular,

W†z,x = e−

2πixzd W−z,−x. (3.2.119)

It is also straightforward to show that

Wz1,x1Wz2,x2 = e−2πix1z2

d Wz1+z2,x1+x2 . (3.2.120)

Furthermore, the Heisenberg–Weyl operators are orthogonal with respect tothe Hilbert–Schmidt inner product, meaning that

〈Wz1,x1 , Wz2,x2〉 = Tr[W†z1,x1

Wz2,x2 ] = dδz1,z2δx1,x2 (3.2.121)

for all 0 ≤ z1, z2, x1, x2 ≤ d− 1. The following identity, called a “Heisenberg–Weyl twirl,” plays an important role here and throughout this book:

125

Page 137: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Lemma 3.26 Heisenberg–Weyl Twirl

Let M be a linear operator, and set πd := 1d . Then

1d2

d−1

∑z,x=0

Wz,x MW†z,x = Tr[M]πd. (3.2.122)

PROOF: The Heisenberg–Weyl operators form an irreducible projective uni-tary representation of the group Zd × Zd. This fact can be used to prove(3.2.122) (see Bibliographic Notes in Section 3.6).

Here we give a direct proof. To see the identity in (3.2.122), consider that

1d2 ∑

z,xWz,x MW†

z,x = (∆X ∆Z)(M), (3.2.123)

where

∆Z(M) :=1d

d−1

∑z=0

Z(z)MZ(z)†, ∆X(M) :=1d

d−1

∑x=0

X(x)MX(x)†. (3.2.124)

We prove (3.2.122) by showing that (∆X ∆Z)(|`〉〈m|) = δ`,m1/d, from whichthe statement in 3.2.122 follows by linearity because

(∆X ∆Z)(M) = (∆X ∆Z)

(∑`,m

M`,m|`〉〈m|)

(3.2.125)

= ∑`,m

M`,m(∆X ∆Z) (|`〉〈m|) (3.2.126)

= ∑`,m

M`,mδ`,m1

d(3.2.127)

= Tr[M]1

d. (3.2.128)

Consider that

∆Z(|`〉〈m|) =1d

d−1

∑z=0

Z(z)|`〉〈m|Z(z)† (3.2.129)

=1d

d−1

∑z=0

e2πi`z/d|`〉〈m|e−2πimz/d (3.2.130)

126

Page 138: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

= |`〉〈m|(

1d

d−1

∑z=0

e2πi(`−m)z/d

)(3.2.131)

= |`〉〈m|δ`,m. (3.2.132)

Now consider that

∆X(|`〉〈`|) =1d

d−1

∑x=0

X(x)|`〉〈`|X(x)† (3.2.133)

=1d

d−1

∑x=0|`+ x〉〈`+ x| (3.2.134)

=1

d. (3.2.135)

This concludes the proof.

A generalized Pauli channel is then defined as

ρ 7→d−1

∑z,x=0

pz,xWz,xρW†z,x, (3.2.136)

where pz,xz,x is a probability distribution satisfying pz,x ≥ 0 for all 0 ≤z, x ≤ d− 1 and ∑z,x pz,x = 1. Thus, this channel randomly applies one of theHeisenber–Weyl operators in (3.2.115) to the input.

A special case of a generalized Pauli channel is a d-dimensional dephasingchannel

ρ 7→d−1

∑z=0

pzZ(z)ρZ(z)†, (3.2.137)

where pzz is a probability distribution. For this special case, only the phaseoperators Z(z) are applied randomly to the input ρ, with the effect being tomodify only the off-diagonal elements of ρ (with respect to the standard ba-sis).

The depolarizing channel on qubits can be generalized using the Heisen-berg–Weyl operators, as defined in (3.2.115). The qudit depolarizing channel isthen defined as

D(d)p (ρ) := (1− p)ρ +

pd2 − 1 ∑

(z,x) 6=(0,0)Wz,xρW†

z,x, 0 ≤ p ≤ 1. (3.2.138)

127

Page 139: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Setting

q ≡ q(p) :=pd2

d2 − 1(3.2.139)

the depolarizing channel can alternatively be written in terms of this repa-rameterization as

D(d)p (ρ) = (1− q) ρ + q Tr[ρ]

1

d, (3.2.140)

because

D(d)p (ρ) = (1− p) ρ +

pd2 − 1 ∑

(z,x) 6=(0,0)Wz,xρW†

z,x (3.2.141)

=

(1− p− p

d2 − 1

)ρ +

pd2 − 1 ∑

z,xWz,xρW†

z,x (3.2.142)

= (1− q) ρ + q1d2 ∑

z,xWz,xρW†

z,x (3.2.143)

= (1− q) ρ + q Tr[ρ]1

d, (3.2.144)

where the last equality follows from (3.2.122).

3.2.6 Classical–Quantum and Quantum–Classical Channels

Any classical probability distribution p : X → [0, 1] over a finite alphabet Xcan be represented as a quantum state of a |X|-dimensional system X that isdiagonal in an orthonormal basis |x〉x∈X of X. Specifically, the probabilitydistribution can be represented as the state

ρX = ∑x∈X

p(x)|x〉〈x|X. (3.2.145)

States that are diagonal in a preferred basis as in the above equation aretypically called classical states. In addition to being in one-to-one correspon-dence with classical probability distributions, classical states do not exhibitthe quantum properties of coherence and entanglement.

In Chapter 7, however, we are interested in classical communication overquantum channels. We are then interested in so-called classical–quantum chan-nels, which we discuss in this section, that take a classical state as input andoutput a quantum state.

128

Page 140: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Definition 3.27 Classical–Quantum Channel

A classical–quantum channel is a map from a system X with alphabet Xand orthonormal basis |x〉 : x ∈ X to a quantum system A with aspecified set σx

A : x ∈ X of states such that

|x〉〈x′| 7→ δx,x′σxA ∀ x, x′ ∈ X. (3.2.146)

If Ncq is a classical–quantum channel, then for a classical state ρX = ∑x∈X p(x)|x〉〈x|,we have that

Ncq(ρX) = ∑x∈X

p(x)σxA. (3.2.147)

More generally, for every |X|-dimensional system A and every state ρA that isnot necessarily classical and is expressed in the computational basis as ρA =∑x,x′∈X〈x|ρA|x′〉|x〉〈x′|, we find that

Ncq(ρA) = ∑x,x′∈X

〈x|ρA|x′〉δx,x′σxA = ∑

x∈X〈x|ρA|x〉σx

A. (3.2.148)

Therefore, for every quantum state, the classical–quantum channel Ncq takesthe input state, measures it in the computational basis |x〉x∈X, and with thecorresponding outcome probability 〈x|ρA|x〉, outputs the state σx

A.

The Choi state ΦNcqof a classical–quantum channel is

ΦNcq

XA =1|X| ∑

x,x′∈X|x〉〈x′|X ⊗N(|x〉〈x′|X) = ∑

x∈X

1|X| |x〉〈x|X ⊗ σx

A. (3.2.149)

In other words, the Choi state of a classical–quantum channel is a classical–quantum state.

If the states σxA have spectral decomposition

σxA =

rx

∑j=1

λxj |ϕx

j 〉〈ϕxj |, (3.2.150)

where rx = rank(σxA), then Kx

j : x ∈ X, 1 ≤ j ≤ rx is a set of Kraus

operators for Ncq, where Kxj =√

λxj |ϕx

j 〉A〈x|X. Indeed, for all x, x′ ∈ X,

∑x′′∈X

rx′′

∑j=1

Kx′′j |x〉〈x′|(Kx′′

j )†

129

Page 141: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

= ∑x′′∈X

rx′′

∑j=1

√λx′′

j |ϕx′′j 〉⟨

x′′|x⟩ ⟨

x′|x′′⟩〈ϕx′′

j |√

λx′′j (3.2.151)

= δx,x′rx

∑j=1

λxk |ϕx

j 〉〈ϕxj | (3.2.152)

= δx,x′σxA (3.2.153)

= Ncq(|x〉〈x′|), (3.2.154)

and

∑x∈X

rx

∑j=1

(Kxj )

†Kxj = ∑

x∈X

rx

∑j=1

λxj |x〉 〈ϕx

j |ϕxj 〉︸ ︷︷ ︸

=1

〈x| (3.2.155)

= ∑x∈X

rx

∑j=1

λxj

︸ ︷︷ ︸=1 ∀ x

|x〉〈x| (3.2.156)

= 1X. (3.2.157)

Also, observe from the construction above that every classical–quantum chan-nel has a Kraus representation with unit-rank Kraus operators.

Having described classical–quantum channels, let us now describe chan-nels for which the situation is opposite, such that they accept quantum inputsand provide classical outputs.

Definition 3.28 Quantum–Classical Channel

Given a quantum system A and a measurement on A with correspond-ing POVM Mxx∈X indexed by elements of a finite alphabet X, aquantum–classical channel, or measurement channel, is a map from thequantum system A to a classical system X with alphabet X such that

ρA 7→ ∑x∈X

Tr[MxρA]|x〉〈x|X (3.2.158)

for every state ρA on A, where |x〉 : x ∈ X is an orthonormal basis forX.

A measurement channel thus takes the measurement outcome probabili-ties Tr[MxρA] and arranges them into a classical state.

130

Page 142: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Denoting the measurement channel in Definition 3.28 as MA→X, its Choistate ΦM

AX is as follows:

ΦMAX =

1dA

dA−1

∑i,j=0|i〉〈j|A ⊗ ∑

x∈XTr[Mx|i〉〈j|]|x〉〈x|X (3.2.159)

=1

dA∑

x∈X

dA−1

∑i,j=0|i〉〈j|A ⊗ 〈j|Mx|i〉|x〉〈x|X (3.2.160)

=1

dA∑

x∈X

dA−1

∑i,j=0|i〉〈j|Mx|i〉〈j|A ⊗ |x〉〈x|X (3.2.161)

=1

dA∑

x∈X(MT

x)A ⊗ |x〉〈x|X, (3.2.162)

where we applied Definition 3.9 to arrive at the last equality.

By writing a spectral decomposition of Mx as

Mx =rx

∑j=1

µxj |φx

j 〉〈φxj |, (3.2.163)

where rx = rank(Mx), we can write

Tr[Mxρ] =rx

∑j=1

µxj 〈φx

j |ρ|φxj 〉. (3.2.164)

Therefore, the action of a quantum–classical channel Nqc can be written as

Nqc(ρ) = ∑x∈X

Tr[Mxρ]|x〉〈x| (3.2.165)

= ∑x∈X

rx

∑j=1

µxj 〈φx

j |ρ|φxj 〉|x〉〈x| (3.2.166)

= ∑x∈X

rx

∑j=1

õx

j |x〉〈φxj |ρ|φx

j 〉〈x|√

µxj (3.2.167)

= ∑x∈X

rx

∑j=1

Kxj ρ(Kx

j )†, (3.2.168)

whereKx

j :=√

µxj |x〉〈φx

j | ∀ x ∈ X, 1 ≤ j ≤ rx. (3.2.169)

131

Page 143: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Since

∑x∈X

rx

∑j=1

(Kxj )

†Kxj = ∑

x∈X

rx

∑j=1

µxj |φx

j 〉 〈x|x〉〈φxj | (3.2.170)

= ∑x∈X

rx

∑j=1

µxj |φx

j 〉〈φxj |

︸ ︷︷ ︸Mx

(3.2.171)

= ∑x∈X

Mx (3.2.172)

= 1A, (3.2.173)

it holds that Kxj : x ∈ X, 1 ≤ j ≤ rx is a set of Kraus operators for Nqc. This

means that all quantum–classical channels have a Kraus representation withunit-rank Kraus operators.

3.2.7 Quantum Instruments

Recall from Section 3.1.2 that, for an arbitrary POVM Mxx∈X, a set of post-measurement states corresponding to an initial state ρ can be given by (3.1.104),

ρx =KxρK†

xTr[KxρK†

x]∀ x ∈ X, (3.2.174)

where Kxx∈X is a set of operators such that Mx = K†xKx for all x ∈ X.

Also recall that the expected density operator of the measurement is givenby (3.1.105),

∑x∈X

KxρK†x. (3.2.175)

This expected state can be seen as arising from a quantum channel with Krausoperators Kxx∈X. Note that this map is indeed a channel since ∑x∈X K†

xKx =∑x∈X Mx = 1. We can view the channel as being the sum of the completelypositive and trace-non-increasing maps Mx defined as

Mx(ρ) = KxρK†x. (3.2.176)

A quantum instrument generalizes this notion to maps Mx that are com-pletely positive and trace non-increasing with an arbitrary number of Krausoperators, not just one.

132

Page 144: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Definition 3.29 Quantum Instrument

A quantum instrument is a collection Mxx∈X of completely positive andtrace-non-increasing maps indexed by elements of a finite alphabet X,such that the sum map ∑x∈XMx is a quantum channel.

A quantum instrument corresponds to a more general notion of a mea-surement in which, as before, the elements of the alphabet X correspond tothe measurement outcomes. If the initial state of the system is ρ, then thecorresponding measurement outcome probabilities p(x) are given by

p(x) = Tr[Mx(ρ)], (3.2.177)

and the corresponding post-measurement state is

ρx =Mx(ρ)

Tr[Mx(ρ)]. (3.2.178)

The expected state of the ensemble (p(x), ρx)x∈X is then

∑x∈X

Mx(ρ). (3.2.179)

It is customary to define the output of a quantum instrument Mxx∈X asthe channel M defined by

M(ρ) = ∑x∈X|x〉〈x|X ⊗Mx(ρ). (3.2.180)

That is, the output of a quantum instrument is a classical–quantum state suchthat the classical register X stores the outcome of the measurement. This isunlike the expected state in (3.2.179), which represents a lack of knowledge ofwhich measurement outcome occurred.

Note that the channel in (3.2.180) corresponding to a quantum instru-ment reduces to the measurement channel defined in (3.2.158) if we con-sider a measurement with POVM Mxx∈X and we define the maps Mx asMx(ρ) = Tr[Mxρ] for all x ∈ X. In this case, the channel in (3.2.180) becomes,

M(ρ) = ∑x∈X

Tr[Mxρ]|x〉〈x|X, (3.2.181)

which is precisely the measurement channel in (3.2.158).

133

Page 145: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

3.2.8 Entanglement-Breaking Channels

An important class of channels in quantum communication consists of thosethat, when acting on one share of a bipartite state, eliminate any entanglementbetween the two systems. Such channels are called entanglement breaking, andwe define them as follows.

Definition 3.30 Entanglement-Breaking Channel

A channel NA→B is called entanglement breaking if (idR ⊗NA→B)(ρRA) isa separable state for every state ρRA, where R is an arbitrary referencesystem.

Although Definition 3.30 suggests that it is necessary to check an infi-nite number of input states to determine whether a channel is entanglementbreaking, the following proposition states that it is only necessary to checkthe output of the channel on the maximally entangled state.

Proposition 3.31

A channel N is entanglement breaking if and only if its Choi state ΦNAB

is separable.

PROOF: Observe that if NA→B is entanglement breaking, then its Choi stateΦN

AB is separable. On the other hand, if the Choi state ΦNAB of a given channel

N is separable, then it is of the form

ΦNAB = ∑

x∈Xp(x)σx

A ⊗ τxB (3.2.182)

for some probability distribution p : X→ [0, 1] on a finite alphabet X and setsof states σx

A : x ∈ X, τxB : x ∈ X. We note that the property TrB[ΦN

AB] = πAof the Choi state translates to

πA =1AdA

= ∑x∈X

p(x)σxA. (3.2.183)

Then, for every reference R and state ξRA on HRA, we find by using (3.2.7)

(idR ⊗N)(ξRA) = dATrA[(TA(ξRA)⊗ 1B)(1R ⊗ΦNAB)] (3.2.184)

= ∑x∈X

q(x)ωxR ⊗ τx

B , (3.2.185)

134

Page 146: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

where

ωxR :=

TrA[TA(ξRA)(1R ⊗ dA p(x)σxA)]

q(x), (3.2.186)

q(x) := p(x)dATr[ξTAσx

A]. (3.2.187)

Now, the map x 7→ q(x) is a probability distribution on X since q(x) ≥ 0 forall x ∈ X and

∑x∈X

q(x) = dATr

[ξT

A

(∑

x∈Xp(x)σx

A

)](3.2.188)

= dATr[ξTAπA] (3.2.189)

= Tr[ξA] (3.2.190)= 1, (3.2.191)

where we have made use of (3.2.183). So (idR⊗N)(ξRA) is a separable state.

Another useful characterization of entanglement breaking channels is thr-ough their Kraus representations, as shown in the following proposition:

Proposition 3.32

A channel N is entanglement breaking if and only if there exists a set ofKraus operators for N, with each Kraus operator having unit rank.

PROOF: First suppose that the Kraus operators of N have unit rank. Theyare therefore of the form |φj〉B〈ψj|A =: Kj for 1 ≤ j ≤ r. Without loss ofgenerality, we can let each vector in the set |φj〉r

j=1 be normalized. Then,since N is trace preserving, it holds that

1A =r

∑j=1

K†j Kj =

r

∑j=1|ψj〉A

⟨φj|φj

⟩〈ψj|A =

r

∑j=1|ψj〉〈ψj|A. (3.2.192)

Now, for every reference system R of arbitrary dimension and every stateρRA, we find that

(idR ⊗N)(ρRA) =r

∑j=1

(1R ⊗ Kj)ρRA(1R ⊗ K†j ) (3.2.193)

=r

∑j=1

(1R ⊗ |φj〉B〈ψj|A)ρRA(1R ⊗ |ψj〉A〈φj|B) (3.2.194)

135

Page 147: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

=r

∑j=1

(1R ⊗ 〈ψj|A)(ρRA)(1R ⊗ |ψj〉A)⊗ |φj〉〈φj|B (3.2.195)

=r

∑j=1

p(j)σjR ⊗ |φj〉〈φj|B, (3.2.196)

where

σjR :=

(1R ⊗ 〈ψj|A)(ρRA)(1R ⊗ |ψj〉A)p(j)

, p(j) := 〈ψj|AρA|ψj〉A. (3.2.197)

Note that j 7→ p(j) is a probability distribution since p(j) ≥ 0 for all j, and

r

∑j=1

p(j) =r

∑j=1〈ψj|AρA|ψj〉A (3.2.198)

=r

∑j=1

Tr[|ψj〉〈ψj|ρA] (3.2.199)

= Tr

[(r

∑j=1|ψj〉〈ψj|A

)ρA

](3.2.200)

= Tr[ρA] (3.2.201)= 1, (3.2.202)

where we have made use of (3.2.192). Therefore, (idR ⊗N)(ρRA) is a separa-ble state, so that N is entanglement breaking.

Now, suppose that N is entanglement breaking. This means that its Choistate ΦN

AB is a separable state, which means that it can be written as

ΦNAB = ∑

x∈Xp(x)|ψx〉〈ψx|A ⊗ |φx〉〈φx|B (3.2.203)

for some probability distribution p : X→ [0, 1] on a finite alphabet X and setsof pure states |ψx〉A : x ∈ X, |φx〉B : x ∈ X. Now, define the unit-rankoperators

Kx :=√

dA p(x)|φx〉B〈ψx|A, x ∈ X. (3.2.204)

Then, for every orthonormal basis state |i〉A on A, we have

Kx|i〉A =√

dA p(x)|φx〉B⟨ψx|i

⟩(3.2.205)

=√

dA p(x) 〈i|ψx〉 |φx〉B (3.2.206)

136

Page 148: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

=√

dA p(x)(〈i|A ⊗ 1B)(|ψx〉A ⊗ |φx〉B). (3.2.207)

Using this, we find that

N(|i〉〈i′|A) = dA(〈i|A ⊗ 1B)ΦNAB(|i′〉A ⊗ 1B)

= ∑x∈X

√dA p(x)(〈i|A ⊗ 1B)(|ψx〉A ⊗ |φx〉B)

× (〈ψx|A ⊗ 〈φx|B)(|i′〉A ⊗ 1B)√

dA p(x)

= ∑x∈X

Kx|i〉〈i′|AK†x. (3.2.208)

This holds for all 0 ≤ i, i′ ≤ dA − 1, which means that Kxx∈X is a set ofKraus operators for N, each of which has unit rank.

From this proposition, we immediately see that both quantum-classicaland classical–quantum channels are entanglement breaking, since each onehas a Kraus representation with unit-rank Kraus operators. This implies thata composition of a quantum–classical channel followed by a classical-quan-tum channel is also entanglement-breaking, for every such map can be writ-ten as

ρ 7→ ∑x∈X

Tr[Mxρ]σx, (3.2.209)

where X is a finite alphabet, σxx∈X is a set of quantum states, and Mxx∈Xis a POVM. If each POVM element Mx has a spectral decomposition of theform

Mx =rx

∑k=1

λxk |ψx

k 〉〈ψxk |, (3.2.210)

where rx = rank(Mx), and each state σx has a spectral decomposition of theform

σx =sx

∑`=1

αx` |φx

` 〉〈φx` |, (3.2.211)

where sx = rank(σx), then (3.2.209) can be written as

ρ 7→ ∑x∈X

rx

∑k=1

sx

∑`=1

λxk αx

` |φx` 〉〈φx

` |〈ψxk |ρ|ψx

k 〉 (3.2.212)

= ∑x∈X

rx

∑k=1

sx

∑`=1

√λ`

kαx` |φx

` 〉〈ψxk |ρ|ψx

k 〉〈φx` |√

λxk αx

` (3.2.213)

137

Page 149: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

= ∑x∈X

rx

∑k=1

sx

∑`=1

Kxk,`ρ(K

xk,`)

†, (3.2.214)

where Kxk,` :=

√λx

k αx` |φx

` 〉〈ψxk |. Since Kx

k,` : x ∈ X, 1 ≤ k ≤ rx, 1 ≤ ` ≤ sxis a set of Kraus operators for the map, with each Kraus operator having unitrank, it holds by Proposition 3.32 that the map in (3.2.209) is entanglementbreaking.

A channel of the form (3.2.209) is sometimes called a “measure-and-preparechannel” or an “intercept-resend channel” since the input to the channel isfirst measured then replaced by a new state conditioned on the measurementoutcome. Remarkably, any entanglement breaking channel can be written inthe form (3.2.209), as we now show.

Theorem 3.33

For every entanglement-breaking channel N, there exists a finite alpha-bet X, a set σxx∈X of states, and a POVM Mxx∈X such that the actionof N can be written as

N(ρ) = ∑x∈X

Tr[Mxρ]σx (3.2.215)

for every state ρ.

PROOF: By Proposition 3.32, the action of N can be written as

N(ρ) =r

∑j=1

KjρK†j , (3.2.216)

where r = rank(ΓN) and Kjrj=1 is a set of Kraus operators for N, with each

Kraus operator having unit rank. Since each Kraus operator has unit rank, itholds that Kj = |φj〉〈ψj| for all 1 ≤ j ≤ r, where |φj〉r

j=1 and |ψj〉rj=1 are

sets of pure states. Since N is trace preserving, it holds that

r

∑j=1

K†j Kj =

r

∑j=1|ψj〉

⟨φj|φj

⟩〈ψj| =

r

∑j=1|ψj〉〈ψj| = 1. (3.2.217)

This implies that |ψj〉〈ψj|rj=1 is a POVM. Therefore, defining the alphabet

X = 1, 2, . . . , r, the POVM elements Mx := |ψx〉〈ψx| and states σx := |φx〉〈φx|,

138

Page 150: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

we have thatN(ρ) = ∑

x∈XTr[Mxρ]σx, (3.2.218)

as required.

An extreme example of a measure-and-prepare channel, as in (3.2.215), isone in which the output is a fixed state σ for every outcome of the measure-ment described by the POVM Mxx∈X. In this case, the channel in (3.2.215)takes the form

N(ρ) = ∑x∈X

Tr[Mxρ]σ = Tr[ρ]σ = Rσ(ρ), (3.2.219)

where to obtain the second equality we used the fact that ∑x∈X Mx = 1, andfor the last equality we used the definition of the replacement channel for σin Definition 3.23. This means that every replacement channel is a measure-and-prepare channel, and in particular an entanglement-breaking channel.

The development above Proposition 3.32 tells us that to every entangle-ment breaking channel there is associated a separable state, namely, the Choistate. Conversely, given a separable state ρAB = ∑x∈X p(x)ωx

A⊗ τxB , the chan-

nel NρA→B defined in (3.2.18) has the form

NρA→B(XA) = ∑

x∈XTr[XAMx

A]τxB , (3.2.220)

where

MxA = p(x)

(ρ− 1

2A ωx

Aρ− 1

2A

)T

(3.2.221)

for all x ∈ X. In other words, by Theorem 3.33, every separable state can beassociated with an entanglement-breaking channel.

3.2.9 Hadamard Channels

It turns out that every entanglement-breaking channel can be regarded as thecomplement of what is called a Hadamard channel, which we now define.

Definition 3.34 Hadamard Channel

A channel N is called a Hadamard channel or a Schur channel if there existsa positive semi-definite operator N with unit diagonal elements (in the

139

Page 151: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

standard basis) and an isometry V such that

N(X) = N ∗VXV† (3.2.222)

for all linear operators X, where N ∗VXV† is the Hadamard product, alsocalled the Schur product, which is defined as the element-wise product ofthe operators N and VXV† when represented as matrices with respectto the standard basis.

Given linear operators X and Y acting on a d-dimensional Hilbert space,with matrix representations in the standard basis as

X =d−1

∑i,j=0

Xi,j|i〉〈j|, Y =d−1

∑i,j=0

Yi,j|i〉〈j|, (3.2.223)

the Hadamard product X ∗Y is the operator with matrix representation in thestandard basis given by the product of the matrix elements of X and Y:

X ∗Y =d−1

∑i,j=0

Xi,jYi,j|i〉〈j|. (3.2.224)

Note that the Hadamard product can be defined as the element-wise productwith respect to an arbitrary orthonormal basis; however, we only consider theHadamard product only with respect to the standard basis.

The positive semi-definiteness of N in the definition of a Hadamard chan-nel is necessary and sufficient for the map N defined in (3.2.222) to be com-pletely positive, while the fact that N has unit diagonal elements in the stan-dard basis and that V is an isometry ensures that N is trace preserving.

The following fact about complements of Hadamard channels providesan important characterization of Hadamard channels:

Proposition 3.35

Any complement of a Hadamard channel is entanglement breaking.

PROOF: Suppose NA→B is a Hadamard channel between d-dimensional sys-tems A and B with associated positive semi-definite operator N having unitdiagonal elements in the standard basis and with associated isometry V. Since

140

Page 152: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

N is positive semi-definite and it has unit diagonal elements, it can be ex-pressed as the Gram matrix of some set |ψi〉 : 1 ≤ i ≤ d of normalizedvectors, so that

N =d−1

∑i,j=0

⟨ψi|ψj

⟩|i〉〈j|. (3.2.225)

Therefore, using (3.2.224) and Definition 3.34, the action of N can be writtenas

N(X) =d−1

∑i,j=0

⟨ψi|ψj

⟩〈i|VXV†|j〉|i〉〈j|B. (3.2.226)

Now, set|φi〉 := V†|i〉 ⇒ 〈φi| = 〈i|V, (3.2.227)

so that

N(X) =d−1

∑i,j=0

⟨ψi|ψj

⟩〈φi|X|φj〉|i〉〈j|B. (3.2.228)

Consider the operator (UN)A→BE defined as

UN :=d−1

∑i=0|i〉B〈φi|A ⊗ |ψi〉E. (3.2.229)

Since V is an isometry, and the vectors |ψi〉 : 1 ≤ i ≤ d are normalized, itfollows that

(UN)†UN =d−1

∑i=0|φi〉〈φi| =

d−1

∑i=0

V†|i〉〈i|V = V†V = 1. (3.2.230)

This means that UN is an isometry. Furthermore,

UNX(UN)† =d−1

∑i,j=0〈φi|X|φj〉|ψi〉〈ψj|E ⊗ |i〉〈j|B, (3.2.231)

so that TrE[UNX(UN)†] = N(X). The operator UN is therefore an isometricextension of N. A complementary channel then results from tracing out B in(3.2.231), i.e.,

Nc(X) = TrB[UNX(UN)†] (3.2.232)

=d−1

∑i=0〈φi|X|φi〉|ψi〉〈ψi|E (3.2.233)

141

Page 153: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

=d−1

∑i=0|ψi〉〈φi|X|φi〉〈ψi| (3.2.234)

=d−1

∑i=0

KiXK†i , (3.2.235)

where Ki := |ψi〉〈φi|. So Nc has a Kraus representation with unit-rank Krausoperators, which means, by Proposition 3.32, that Nc is entanglement break-ing. Every complement of a channel is related to another complement by anisometric channel acting on the output of the complement, and this does notchange the entanglement-breaking property.

By following the proof above backwards, we find that every entangle-ment-breaking channel is the complement of some Hadamard channel.

3.2.10 Petz Recovery Map

The following channel plays an important role in variations of the data-processinginequality for the quantum relative entropy and other entropic quantities thatwe define in the next chapter.

Definition 3.36 Petz Recovery Map

Let σ ∈ L(H) be a positive semi-definite operator and let N : L(H) →L(H′) be a quantum channel. The Petz recovery map for σ and N isthe completely positive and trace-non-increasing map Pσ,N : L(H′) →L(H′) defined as

Pσ,N(X) := σ12N†

(N(σ)−

12 XN(σ)−

12

12 (3.2.236)

for all X ∈ L(H′).

REMARK: If the operator N(σ) is invertible, then the Petz recovery map Pσ,N is a channel.If the operator N(σ) is not invertible, then the inverse N(σ)−

12 is taken on the support on

N(σ), following the convention from Section 2.2.1.1. In this latter case, the Petz recoverymap Pσ,N is a trace non-increasing map.

The Petz recovery map is indeed completely positive because it is the com-position of the following completely positive maps:

142

Page 154: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

1. Sandwiching by the positive semi-definite operator N(σ)−12 .

2. The adjoint N† of N.

3. Sandwiching by the positive semi-definite operator σ12 .

The Petz recovery map is also trace non-increasing, as we can readily verify.For all positive semi-definite operators X, the following holds

Tr[Pσ,N(X)] = Tr[σ

12N†

(N(σ)−

12 XN(σ)−

12

12

](3.2.237)

= Tr[σN†

(N(σ)−

12 XN(σ)−

12

)](3.2.238)

= Tr[N(σ)N(σ)−

12 XN(σ)−

12

](3.2.239)

= Tr[N(σ)−

12N(σ)N(σ)−

12 X]

(3.2.240)

= Tr[ΠN(σ)X] (3.2.241)

≤ Tr[X], (3.2.242)

where ΠN(σ) is the projection onto the support of N(σ), which arises becauseN(σ) need not be invertible. If X is contained in the support of N(σ), thenTr[ΠN(σ)X] = Tr[X], which means that the Petz recovery channel is trace-preserving for all inputs with support contained in the support of N(σ).

One of the important properties of the Petz recovery channel Pσ,N is thatit reverses the action of N on σ whenever N(σ) is invertible. In particular,

Pσ,N(N(σ)) = σ12N†

(N(σ)−

12N(σ)N(σ)−

12

12 (3.2.243)

= σ12N†(1H′)σ

12 (3.2.244)

= σ, (3.2.245)

where we have used the fact that N(σ) is invertible, so that

N(σ)−12N(σ)N(σ)−

12 = 1H′ . (3.2.246)

Then, since N is a channel, its adjoint is unital, which leads to the final equal-ity.

143

Page 155: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

REMARK: The equality in (3.2.245) holds more generally; i.e., it holds even when N(σ)is not invertible. To see this, we use the fact that the projection ΠN(σ) onto the support ofN(σ) satisfies ΠN(σ) ≤ 1H′ . Then, we find that

Pσ,N(N(σ)) = σ12N†

(N(σ)−

12N(σ)N(σ)−

12

12 (3.2.247)

= σ12N†(ΠN(σ))σ

12 (3.2.248)

≤ σ12N†(1H′)σ

12 (3.2.249)

= σ. (3.2.250)

On the other hand, if we let U : H→ H′ ⊗HE be an isometric extension of N, then we canuse Lemma 3.76 from Appendix 3.B, which implies that supp(UσU†) ⊆ supp(N(σ)⊗ 1E),and in turn implies that ΠUσU† ≤ ΠN(σ)⊗1E

. Then, for every vector |ψ〉 ∈ H, we obtain

〈ψ|Πσ|ψ〉 = 〈ψ|U†ΠUσU†U|ψ〉 (3.2.251)

≤ 〈ψ|U†(ΠN(σ) ⊗ 1E)U|ψ〉 (3.2.252)

= Tr[U|ψ〉〈ψ|U†(ΠN(σ) ⊗ 1E)] (3.2.253)

= Tr[TrE[U|ψ〉〈ψ|U†]ΠN(σ)] (3.2.254)

= Tr[N(|ψ〉〈ψ|)ΠN(σ)] (3.2.255)

= Tr[|ψ〉〈ψ|N†(ΠN(σ))] (3.2.256)

= 〈ψ|N†(ΠN(σ))|ψ〉. (3.2.257)

Since |ψ〉 is arbitrary, it holds that Πσ ≤ N†(ΠN(σ)). Using this, we find that

Pσ,N(N(σ)) = σ12N†(ΠN(σ))σ

12 ≥ σ

12 Πσσ

12 = σ. (3.2.258)

Having shown that Pσ,N(N(σ)) ≤ σ and Pσ,N(N(σ)) ≥ σ, we conclude that

Pσ,N(N(σ)) = σ, (3.2.259)

even when N(σ) is not invertible.

3.2.10.1 Petz Recovery Channel for Partial Trace

Let the input Hilbert space to the channel N in the definition of the Petz re-covery map be H = HAB. Then, let N = TrB be the partial trace over B, andnote that

N†(σB) = σA ⊗ 1B. (3.2.260)

Indeed, using the definition of the adjoint of a superoperator, we have

〈N†(σA), σAB〉 = 〈σA ⊗ 1B, σAB〉 (3.2.261)

144

Page 156: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

= Tr[(σA ⊗ 1B)†σAB] (3.2.262)

= Tr[σ†ATrB(σAB)] (3.2.263)

= 〈σA,N(σAB)〉 . (3.2.264)

Therefore, the Petz recovery map corresponding to the partial trace over B is

PσAB,TrB(XA) = σ12AB

(σ− 1

2A XAσ

− 12

A ⊗ 1B

12AB. (3.2.265)

By writing the identity on B as 1B = ∑dB−1j=0 |j〉〈j|B, we can write the action

PσAB,TrB as

PσAB,TrB(XA) = σ12AB

(σ− 1

2A XAσ

− 12

A ⊗ 1B

12AB (3.2.266)

=dB−1

∑j=0

σ12AB

(σ− 1

2A XAσ

− 12

A ⊗ |j〉〈j|B)

σ12AB (3.2.267)

=dB−1

∑j=0

σ12AB

(σ− 1

2A ⊗ |j〉B

)XA

(σ− 1

2A ⊗ 〈j|B

12AB (3.2.268)

=dB−1

∑j=0

KjXAK†j , (3.2.269)

where

Kj := σ12AB

(σ− 1

2A ⊗ |j〉B

). (3.2.270)

The operators Kj, for 0 ≤ j ≤ dB − 1, are thus Kraus operators for the Petzrecovery map for the partial trace. Using (3.2.31), and letting the environmentE be denoted by B (since the dimension of the environment in the construction(3.2.31) is the same as the number of Kraus operators, which in this case isequal to the dimension of B), we find that

VA→BB =dB−1

∑j=0

σ12AB(σ

− 12

A ⊗ |j〉B)⊗ |j〉B (3.2.271)

=dB−1

∑j=0

σ12AB(σ

− 12

A ⊗ |j〉B ⊗ |j〉B) (3.2.272)

= (σ12AB ⊗ 1B)(σ

− 12

A ⊗ 1B ⊗ 1B)

(1A ⊗

dB−1

∑j=0|j, j〉BB

)(3.2.273)

145

Page 157: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

= (σ12AB ⊗ 1B)(σ

− 12

A ⊗ 1BB)(1A ⊗ |Γ〉BB), (3.2.274)

which is an isometric extension of the Petz recovery map PσAB,TrB . Omittingidentity operators, this can be written more simply as follows:

VA→BB = σ12ABσ

− 12

A |Γ〉BB. (3.2.275)

3.2.11 LOCC Channels

A very common physical scenario encountered in quantum information the-ory is one in which two parties, Alice and Bob, who are distantly separated,perform local quantum operations (consisting of channels and/or measure-ments) at their respective locations and communicate classically with eachother in order to transform some initially shared state to some final desiredstate. This sequence of local operations and classical communication is abbrevi-ated LOCC and is an important element of many quantum communicationprotocols, such as entanglement distillation and secret key distillation.

As with every other transformation in quantum theory, an LOCC opera-tion corresponds mathematically to a channel, which we call an LOCC chan-nel. Formally, an LOCC channel is defined as follows:

Definition 3.37 LOCC Channel

Let X be a finite alphabet, let Mxx∈X be a quantum instrument (a setof completely positive trace-non-increasing maps such that ∑x∈XMx isa channel), and let Nxx∈X be a set of quantum channels. Then, a one-way LOCC channel from Alice to Bob is the channel L→AB→A′B′ from Alice’sinitial and final systems A and A′ to Bob’s initial and final systems Band B′, defined as

L→AB→A′B′ = ∑x∈X

MxA→A′ ⊗Nx

B→B′ . (3.2.276)

A one-way LOCC channel from Bob to Alice is the channel L←AB→A′B′ de-fined as

L←AB→A′B′ = ∑x∈X

NxA→A′ ⊗Mx

B→B′ . (3.2.277)

An LOCC channel is a composition of a finite, but arbitrarily large num-ber of one-way LOCC channels from Alice to Bob and from Bob to Alice

146

Page 158: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

and can be written as

L↔AB→A′B′ = ∑y∈Y

SyA ⊗W

yB (3.2.278)

for some finite alphabet Y and sets Syy∈Y, Wyy∈Y of completely pos-itive trace-non-increasing maps such that ∑y∈Y Sy ⊗Wy is trace preserv-ing.

Consider the LOCC channel L→AB→A′B′ from Alice to Bob defined as in(3.2.276). The values x ∈ X form the possible messages that might be com-municated from Alice to Bob and constitute the “classical communication”part of LOCC. The set Mxx∈X of completely positive trace-non-increasingmaps and the set Nxx∈X of quantum channels specify the actions of Aliceand Bob for each value x ∈ X and constitute the “local operations” part ofLOCC. The set Mxx∈X of completely positive trace-non-increasing mapsthat sum to a channel essentially specifies a quantum instrument. The op-erations corresponding to these maps can thus be considered probabilisiticsince the maps are not trace preserving. In general, the party that performsthe quantum instrument determines the direction of classical communicationand thus the direction of the LOCC channel. In this case, Alice performs theclassical communication since she performs the quantum instrument. Thevalues x specify the outcomes of the instrument, and Alice communicates theoutcome to Bob, who performs the corresponding channel selected from hisset Nxx∈X of channels.

In more detail, let ρAB be the initial state shared by Alice and Bob. Us-ing the definition in (3.2.180) for the channel corresponding to the quantuminstrument Mxx∈X, the state after applying the quantum instrument is

∑x∈X

MxA→A′(ρAB)⊗ |x〉〈x|XA , (3.2.279)

where the system XA stores the classical information corresponding to theoutcome of the instrument. Alice then communicates the outcome of the in-strument to Bob. This classical communication can be understood via thenoiseless classical channel from XA to XB defined by

θXA 7→ ∑x∈X〈x|XAθXA |x〉XA |x〉〈x|XB . (3.2.280)

147

Page 159: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

The state in (3.2.279) thus gets transformed to

∑x∈X

MxA→A′(ρAB)⊗ |x〉〈x|XB . (3.2.281)

Finally, Bob applies the channel specified by

τB ⊗ωXB 7→ ∑x∈X

NxB→B′(τB)⊗ 〈x|XBωXB |x〉XB , (3.2.282)

which corresponds to Bob measuring his system XB in the basis |x〉XBx∈Xand applying a quantum channel from the set Nx

B→B′x∈X to the system Bbased on the outcome. The final state is then

∑x∈X

(MxA→A′ ⊗Nx

B→B′)(ρAB), (3.2.283)

which is precisely the output of the one-way LOCC channel L→AB→A′B′ definedin (3.2.276). We can succinctly write the steps in (3.2.279)–(3.2.282) as

LAB→A′B′(ρAB) = (NXBB→B′ CXA→XB MA→A′XA)(ρAB), (3.2.284)

where the channel MA→A′XAis defined as

MA→A′XA(ξA) := ∑

x∈XMx

A→A′(ξA)⊗ |x〉〈x|XA , (3.2.285)

and the channel NBXB→B′ is defined as

NBXB→B′(τB ⊗ |x〉〈x|XB) := NxB→B′(τB) (3.2.286)

for all x ∈ X. As indicated above, the channel CXA→XB , defined in (3.2.280), isa noiseless classical channel that transforms the classical register XA, held byAlice, to the classical register XB (which is simply a copy of X), held by Bob.

An example of an LOCC channel is illustrated in Figure 3.8. This is anLOCC channel consisting of k rounds of alternating Alice-to-Bob and Bob-to-Alice one-way LOCC channels and is of the form

L↔A0B0→AkBk= Lk,→

Ak−1Bk−1→AkBkLk−1,←

Ak−2Bk−2→Ak−1Bk−1 · · ·

L2,←A1B1→A2B2

L1,→A0B0→A1B1

.(3.2.287)

For each round i, there is a finite alphabet Xi consisting of the messages com-municated in the round, along with a set Mx

i x∈Xi of completely positive

148

Page 160: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

ρA0B0 ρAkBkAlice

Bob

Mx1x∈X1

Nx1x∈X1

x1

Round 1L1,→

A0B0→A1B1

Mx2x∈X2

Nx2x∈X2

x2

Round 2L2,←

A1B1→A2B2

Mxkx∈Xk

Nxkx∈Xk

xk

Round kLk,→

Ak−1Bk−1→AkBk

· · ·

· · ·

· · ·

FIGURE 3.8: Illustration of an LOCC channel with k rounds of alternatingone-way LOCC channels from Alice to Bob and from Bob to Alice. In eachround i, there is a finite alphabet Xi, a set Mx

i x∈Xi of completely positivetrace-non-increasing maps that sum to a channel, and a set Nx

i x∈Xi ofquantum channels.

trace-non-increasing maps that sum to a quantum channel and a set Nxi x∈Xi

of quantum channels. In a multi-round LOCC channel such as this one, itpossible for the operation sets Mx

i x∈Xi and Nxi x∈Xi on the ith round to

depend on the outcomes and actions taken in previous rounds. The quantumteleportation protocol in Section 3.3 below provides a concrete example of aone-way LOCC protocol from Alice to Bob.

Definition 3.38 Separable Channel

A separable channel is a quantum channel SAB→A′B′ such that

SAB→A′B′ = ∑x∈X

CxA→A′ ⊗Dx

B→B′ (3.2.288)

for some finite alphabet X and sets of completely positive and trace-non-increasing maps Cxx∈X and Dxx∈X such that S is trace preserving.

Every separable channel has a set of Kraus operators in product form;i.e., for every separable channel SAB→A′B′ as in (3.2.288) there exists a finitealphabet Y and sets Cy

A→A′y∈Y and DyB→B′y∈Y such that

SAB→A′B′(ρAB) = ∑y∈Y

(CyA→A′ ⊗ Dy

B→B′)ρAB(CyA→A′ ⊗ Dy

B→B′)† (3.2.289)

for all ρAB.

149

Page 161: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

A key property of a separable channel is that it outputs a separable stateif the input state is separable. To see this, consider the separable state σAB =∑z∈Z p(z)τz

A ⊗ωzB. Then the output state SAB→A′B′(σAB) is given by

SAB→A′B′(σAB)

= ∑y∈Y

(CyA→A′ ⊗ Dy

B→B′)

(∑z∈Z

p(z)τzA ⊗ωz

B

)(Cy

A→A′ ⊗ DyB→B′)

† (3.2.290)

= ∑y∈Y,z∈Z

p(z)CyA→A′τ

zA(C

yA→A′)

† ⊗ DyB→B′ω

zB(Dy

B→B′)†, (3.2.291)

and is manifestly separable.

Proposition 3.39 LOCC Channels are Separable Channels

From (3.2.278) and (3.2.288), we conclude that every LOCC channel is aseparable channel.

The converse of Proposition 3.39 is not true. For example, let us define thefollowing operators:

K1 :=1√2|0〉〈0|+ |1〉〈1|, K2 := |0〉〈0|, K3 := |1〉〈1|. (3.2.292)

Then, following the notation in (3.2.288), let

C1A→A′(·) = K1(·)K†

1 , (3.2.293)

C2A→A′(·) = K2(·)K†

2 , (3.2.294)

C3A→A′(·) = K3(·)K†

3 , (3.2.295)

and

D1B→B′(·) = K2(·)K†

2 , (3.2.296)

D2B→B′(·) = K1(·)K†

1 , (3.2.297)

D3B→B′(·) = K3(·)K†

3 . (3.2.298)

Then, the map

SAB→A′B′(·) :=3

∑x=1

(CxA→A′ ⊗Dx

B→B′)(·) (3.2.299)

150

Page 162: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

NA→BρA N(ρA) L↔RAB′→B−→ρA

ωRB′

N(ρA)

NA→B

FIGURE 3.9: Depiction of an LOCC-simulable channel NA→B with associ-ated resource state ωRB′ . The channel NA→B can be realized via the LOCCchannel L↔RAB′→B and the resource state ωRB′ , for some auxiliary systemR. The LOCC channel L↔RAB′→B could in principle consist of a sequence ofseveral rounds of one-way LOCC channels, as depicted in Figure 3.8.

= (K1 ⊗ K2)(·)(K1 ⊗ K2)† + (K2 ⊗ K1)(·)(K2 ⊗ K1)

† (3.2.300)

+ (K3 ⊗ K3)(·)(K3 ⊗ K3)† (3.2.301)

is a separable channel, but it can be shown that it is not an LOCC channel;please consult the Bibliographic Notes in Section 3.6.

3.2.11.1 LOCC and Separable Simulation of Channels

Given a quantum channel NAB→A′B′ , an important question related to thephysical realization of the channel is whether the channel is an LOCC chan-nel. This amounts to determining whether the channel can be decomposed asin (3.2.278). More generally, one can ask whether a given channel NA→B canbe simulated by an LOCC channel acting on the input and a resource state, anotion that is illustrated in Figure 3.9 and defined below.

Definition 3.40 LOCC-Simulable Channel

A channel NA→B is called LOCC-simulable with associated resource stateωRB′ if there exists an auxiliary system R and an LOCC channel L↔RAB′→Bsuch that for every state ρA

N(ρA) = L↔RAB′→B(ρA ⊗ωRB′). (3.2.302)

For the LOCC channel L↔RAB′→B, the input systems RA are Alice’s, theinput system B′ is Bob’s, Alice’s output system is trivial, and Bob’s out-put system is B.

151

Page 163: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

If a channel NA→B is LOCC-simulable with associated resource state ωRB′ ,it means that Alice and Bob, the sender and receiver of the channel, respec-tively, can execute an LOCC channel of the form as depicted in Figure 3.8,with the assistance of the auxiliary system R, such that the output on Bob’ssystem at the end is N(ρA). The resource state ωRB′ is fixed, being such thatthe same resource state can be used for all input states ρA on Alice’s system. Aconcrete example of an LOCC simulation of a channel is shown in the contextof quantum teleportation in Section 3.3 below.

Due to the fact that separable channels strictly contain LOCC channels, itis sensible to generalize the notion of teleportation simulation even further tothis case:

Definition 3.41 Separable-Simulable Channel

A channel NA→B is called separable-simulable with associated resourcestate ωRB′ if there exists an auxiliary system R and a separable channelSRAB′→B such that for every state ρA

N(ρA) = SRAB′→B(ρA ⊗ωRB′). (3.2.303)

For the separable channel S↔RAB′→B, the input systems RA are Alice’s,the input system B′ is Bob’s, Alice’s output system is trivial, and Bob’soutput system is B.

3.2.12 Completely PPT-Preserving Channels

In Definition 3.10, we introduced PPT states as bipartite states ρAB such thatthe partial transpose TB(ρAB) is positive semi-definite. The class of channelsthat preserve PPT states are called completely PPT-preserving channels, abbrevi-ated as C-PPT-P channels, and we define them formally as follows.

Definition 3.42 Completely PPT-Preserving Channel

A channel PAB→A′B′ is called completely PPT-preserving if the map TB′ PAB→A′B′ TB is a channel.

152

Page 164: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Proposition 3.43

Completely PPT-preserving channels preserve the set of PPT states.

PROOF: Suppose that ρAB is a PPT state and PAB→A′B′ is a completely PPT-preserving channel. If we take the partial transpose TB′ on the output stateσA′B′ = PAB→A′B′(ρAB), then we find that

TB′(σA′B′) = (TB′ PAB→A′B′)(ρAB) (3.2.304)= (TB PAB→A′B′ TB)(TB(ρAB)). (3.2.305)

Since PAB→A′B′ is completely PPT-preserving, by definition TB′ PAB→A′B′ TB is completely positive. Since TB(ρAB) is positive, this implies that TB′(σA′B′)is positive, which means that the output state is a PPT state.

Proposition 3.44

Every separable channel is a completely PPT-preserving channel.

PROOF: Let SAB→A′B′ be a separable channel. By definition, it has the form

SAB→A′B′ = ∑x∈X

RxA→A′ ⊗Wx

B→B′ , (3.2.306)

where X is a finite alphabet and Rxx∈X and Wxx∈X are sets of completelypositive trace non-increasing maps such that ∑x∈XRx ⊗Wx is trace preserv-ing. Then,

TB′ SAB→A′B′ TB = ∑x∈X

RxA→A′ ⊗ (TB′ Wx

B→B′ TB). (3.2.307)

By applying Lemma 3.45 below, we conclude that the maps TB′ WxB→B′ TB

are completely positive for all x ∈ X, which means that TB′ SAB→A′B′ TB iscompletely positive. Therefore, SAB→A′B′ is completely PPT-preserving.

As a consequence of Proposition 3.44, it follows that every LOCC channelis a completely PPT-preserving channel, because every LOCC channel is aseparable channel.

The next lemma applies to channels that do not have a bipartite structure,i.e., there is no input or output system for Alice.

153

Page 165: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Lemma 3.45

Let NB→B′ be a completely positive map. Then the map TB′ NB→B′ TBis completely positive, and its Choi operator is given by the full trans-pose of the Choi operator for NB→B′ , i.e.,

ΓTB′NB→B′TBBB′ = T(ΓN

BB′). (3.2.308)

PROOF: Let ΓBB = |Γ〉〈Γ|BB, where |Γ〉 is defined in (2.2.26) and dB = dB.Observe that TB(ΓBB) = TB(ΓBB), since

TB(ΓBB) =dB−1

∑i,j=0|i〉〈j|B ⊗ (|i〉〈j|B)T (3.2.309)

=dB−1

∑i,j=0|i〉〈j|B ⊗ |j〉〈i|B (3.2.310)

=dB−1

∑i,j=0

(|j〉〈i|B)T ⊗ |j〉〈i|B (3.2.311)

= TB(ΓBB). (3.2.312)

Then the following holds for the Choi representation of TB′ NB→B′ TB:

ΓTB′NB→B′TBBB′ = (TB′ NB→B′ TB)(ΓBB) (3.2.313)

= ((TB ⊗ TB′) NB→B′)(ΓBB) (3.2.314)

= T(ΓNBB′). (3.2.315)

Since the map NB→B′ is completely positive, its Choi representation ΓNBB′ is

positive semi-definite. Since positive semi-definiteness is preserved undertransposition, we find that T(ΓN

BB′) is positive semi-definite, which means thatthe map TB′ NB→B′ TB is completely positive.

As a generalization of the lemma above, we have the following:

Proposition 3.46 Choi States of C-PPT-P Channels

Let NAB→A′B′ be a bipartite channel. The channel NAB→A′B′ is com-pletely PPT-preserving if and only if its Choi state ΦN

ABA′B′ is a PPT statewith respect to the bipartite cut AA′|BB′.

154

Page 166: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

PROOF: We begin by proving the only-if part. Suppose that NAB→A′B′ is com-pletely PPT-preserving. By definition, this implies that TB′ NAB→A′B′ TB iscompletely positive. By the definition of complete positivity, we concludethat (TB′ NAB→A′B′ TB)(ΦABAB) is a positive semi-definite operator, wheresystems A and B are isomorphic to the input systems A and B, respectively.By employing a calculation similar to that in (3.2.309)–(3.2.315), we concludethat

(TB′ NAB→A′B′ TB)(ΦABA′B′) = ((TB ⊗ TB′) NAB→A′B′)(ΦABAB),(3.2.316)

which implies that the Choi state NAB→A′B′)(ΦABAB) is a PPT state with re-spect to the bipartite cut AA′|BB′. Running this calculation backwards andmaking use of Theorem 3.18 establishes the if-part of the proposition.

3.2.12.1 PPT Simulation of Channels

Just as we asked whether a given channel NA→B can be simulated by anLOCC channel, we can ask whether the channel NA→B can be simulated by acompletely PPT-preserving channel. This leads to the following definition:

Definition 3.47 PPT-Simulable Channel

A channel NA→B is called PPT-simulable with associated resource stateωRB′ if there exists an auxiliary system R and a completely PPT-preserving channel PRAB′→B such that for every state ρA

N(ρA) = PRAB′→B(ρA ⊗ωRB′). (3.2.317)

For the C-PPT-P channel PRAB′→B, the input systems RA are Alice’s,the input system B′ is Bob’s, Alice’s output system is trivial, and Bob’soutput system is B.

The definition of a PPT-simulable channel is illustrated in Figure 3.10.

3.3 Quantum Teleportation

Quantum teleportation is a remarkable and fundamental protocol in quantuminformation theory. The simplest version of the protocol allows two parties,

155

Page 167: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

NA→BρA N(ρA) PRAB′→B−→ρA

ωRB′

N(ρA)

NA→B

FIGURE 3.10: Depiction of a PPT-simulable channel NA→B with associatedresource state ωRB′ . The channel NA→B can be realized via the completelyPPT-preserving channel PRAB′→B and the resource state ωRB′ , for someauxiliary system R.

say Alice and Bob, to transfer the state of a qubit from Alice to Bob whilemaking use of one shared pair of qubits in a maximally entangled state.

3.3.1 Qubit Teleportation Protocol

Before stating the basic teleportation protocol, let us start by introducing akey element of the protocol, the Bell measurement.

The Bell measurement is a measurement on two qubits defined by thePOVM |Φ+〉〈Φ+|, |Φ−〉〈Φ−|, |Ψ+〉〈Ψ+|, |Ψ−〉〈Ψ−|, where

|Φ+〉 :=1√2(|0, 0〉+ |1, 1〉), (3.3.1)

|Φ−〉 :=1√2(|0, 0〉 − |1, 1〉), (3.3.2)

|Ψ+〉 :=1√2(|0, 1〉+ |1, 0〉), (3.3.3)

|Ψ−〉 :=1√2(|0, 1〉 − |1, 0〉) (3.3.4)

are the two-qubit Bell states. They are all maximally entangled, and they forman orthonormal basis for the Hilbert space C2 ⊗ C2 of two qubits. As such,we have that |Φ+〉〈Φ+|+ |Φ−〉〈Φ−|+ |Ψ+〉〈Ψ+|+ |Ψ−〉〈Ψ−| = 1, so that theset |Φ+〉〈Φ+|, |Φ−〉〈Φ−|, |Ψ+〉〈Ψ+|, |Ψ−〉〈Ψ−| is indeed a POVM.

We can write the usual computational basis states for two qubits in terms

156

Page 168: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

|Φ+〉AB

|ψ〉A′

|ψ〉B

Alice

Bob

BellMeasurement

Xx Zz

z

x

FIGURE 3.11: Circuit diagram for the teleportation protocol. Conditionedon the outcomes (x, z) of Alice’s Bell measurement on her qubits A andA′, Bob performs the operation ZzXx on his qubit to transform it to |ψ〉.

of the Bell states as

|0, 0〉 = 1√2(|Φ+〉+ |Φ−〉), (3.3.5)

|0, 1〉 = 1√2(|Ψ+〉+ |Ψ−〉), (3.3.6)

|1, 0〉 = 1√2(|Ψ+〉 − |Ψ−〉), (3.3.7)

|1, 1〉 = 1√2(|Φ+〉 − |Φ−〉). (3.3.8)

Observe that we can write the Bell states compactly as

|Φx,z〉AB := (1A ⊗ XxBZz

B)|Φ+〉AB = (ZzAXx

A ⊗ 1B)|Φ+〉AB (3.3.9)

for all x, z ∈ 0, 1. In other words, all four two-qubit Bell states can be ob-tained from each other using only local operations on either A or B. Further-more, the Bell measurement POVM can be written as |Φx,z〉〈Φx,z| : x, z ∈0, 1, so that the classical bits x and z can be viewed as being the outcomesof the measurement.

We now detail the teleportation protocol; see Figure 3.11 for a circuit dia-gram depicting the protocol. The protocol starts with Alice and Bob sharingtwo qubits in the state |Φ+〉AB. Alice has an additional qubit, which is in thestate |ψ〉A′ , that she wishes to teleport to Bob, where

|ψ〉A′ = α|0〉A′ + β|1〉A′ , α, β ∈ C, |α|2 + |β|2 = 1. (3.3.10)

157

Page 169: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

The state |ψ〉A′ is arbitrary and need not be known to either Alice or Bob. Thetotal joint state between Alice and Bob at the start of the protocol is therefore

|ψ〉A′ ⊗ |Φ+〉AB =1√2(α|0, 0, 0〉A′AB + α|0, 1, 1〉A′AB

+ β|1, 0, 0〉A′AB + β|1, 1, 1〉A′AB).(3.3.11)

Alice and Bob then proceed as follows.

1. Alice performs a Bell measurement on her two qubits A′ and A. To easilydetermine the measurement outcomes and their probabilities, it is helpfulto write down the initial state (3.3.11) in the Bell basis on Alice’s systems.Using (3.3.5)–(3.3.8), we find that

|ψ〉A′ ⊗ |Φ+〉AB

=12(|Φ+〉A′A ⊗ (α|0〉B + β|1〉B) +|Φ−〉A′A ⊗ (α|0〉B − β|1〉B)+|Ψ+〉A′A ⊗ (α|1〉B + β|0〉B) +|Ψ−〉A′A ⊗ (α|1〉B − β|0〉)

)

=12(|Φ+〉A′A ⊗ |ψ〉B +|Φ−〉A′A ⊗ ZB|ψ〉B

+|Ψ+〉A′A ⊗ XB|ψ〉B +|Ψ−〉A′A ⊗ XBZB|ψ〉B)

. (3.3.12)

From this, it is clear that each outcome (x, z) ∈ 0, 12 of the Bell mea-surement occurs with equal probablity 1

4 and that the state of Bob’s qubitafter the measurement is Xx

BZzB|ψ〉B.

2. Alice communicates to Bob the two classical bits x and z resulting fromthe Bell measurement.

3. Upon receiving the measurement outcomes, Bob performs Xx and then Zz

on his qubit. The resulting state of Bob’s qubit is |ψ〉.Although we have described the teleportation protocol using a pure state |ψ〉A′

as the state being teleported, the protocol applies just as well if the state to beteleported is a mixed state ρA′ .

3.3.2 Qudit Teleportation Protocol

The teleportation protocol for qubits described above can be easily general-ized to qudits using the Heisenberg–Weyl operators Wz,x : 0 ≤ z, x ≤ d− 1,as defined in Section 3.2.5.4.

158

Page 170: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

We begin by defining the qudit Bell states in terms of the Heisenberg–Weyloperators as

|Φz,x〉 := (Wz,x ⊗ 1)|Φ〉, (3.3.13)

which is a direct generalization of (3.3.9), where |Φ〉 = 1√d ∑d−1

i=0 |i, i〉. Just likethe qubit Bell states, the qudit Bell states are orthonormal because

〈Φz1,x1 |Φz2,x2〉 = 〈Φ|(W†z1,x1⊗ 1)(Wz2,x2 ⊗ 1)|Φ〉 (3.3.14)

= 〈Φ|(W†z1,x1

Wz2,x2 ⊗ 1)|Φ〉 (3.3.15)

=1d

Tr[W†z1,x1

Wz2,x2 ] (3.3.16)

= δz1,z2δx1,x2 (3.3.17)

for all 0 ≤ z1, z2, x1, x2 ≤ d− 1. The last equality follows from (3.2.121). Thismeans that the set of operators

|Φz,x〉〈Φz,x| : 0 ≤ z, x ≤ d− 1 (3.3.18)

constitutes a POVM, which is the POVM corresponding to the Bell measure-ment on two qudits.

Now we start, like before, with Alice holding two qudits, one shared withBob and in the joint state |Φ〉AB, and the other in the state |ψ〉A′ , where

|ψ〉A′ =d−1

∑i=0

ci|i〉,d−1

∑i=0|ci|2 = 1. (3.3.19)

The state |ψ〉A′ is the one to be teleported to Bob’s system. The starting jointstate on the three qudits A′, A, and B is

|ψ〉A′ ⊗ |Φ〉AB =1√d

d−1

∑i,j=0

ci|i, j, j〉A′AB. (3.3.20)

Alice then performs a Bell measurement on her two qudits. By writing theBell states |Φz,x〉A′A as

|Φz,x〉A′A =1√d

d−1

∑k=0

e2πi(k+x)z

d |k + x, k〉A′A, (3.3.21)

for each outcome (z, x) ∈ 0, . . . , d − 12, we use (3.1.112) to find that thecorresponding (unnormalized) post-measurement state on Bob’s qudit is

(〈Φz,x|A′A ⊗ 1B)(|ψ〉A′ ⊗ |Φ〉AB)

159

Page 171: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Φ+AB

ρA′

ρB

Alice

Bob

BellMeasurement

Wz,x

z

x

TA′AB→B

−→ TA′AB→B

ρA′

Φ+AB

ρB

FIGURE 3.12: The qudit teleportation protocol, depicted on the left, canbe regarded as an LOCC channel TA′AB→B with the input states ρA′ and|Φ〉〈Φ|AB and the output state ρB, as shown on the right.

=1d

d−1

∑j,k,`=0

c`e−2πi(k+x)z

d 〈k + x|`〉 〈k|j〉 |j〉B (3.3.22)

=1d

d−1

∑k=0

ck+xe−2πi(k+x)z

d |k〉B (3.3.23)

=1d

d−1

∑k′=0

ck′e− 2πik′z

d |k′ − x〉B (3.3.24)

=1d

d−1

∑k=0

ck′X(−x)Z(−z)|k′〉B (3.3.25)

=1d

X(−x)Z(−z)|ψ〉B. (3.3.26)

Therefore, each outcome occurs with probability 1d2 and the corresponding

post-measurement state on Bob’s qudit is X(−x)Z(−z)|ψ〉B. This means thatBob, upon receving the two classical values corresponding to the outcome ofthe Bell measurement, can perform Z(z)X(x) = Wz,x in order to transformthe state of his qudit to |ψ〉B, completing the teleportation protocol.

Of course, the teleportation protocol works just as well if the state to beteleported is mixed. Also, as shown in Figure 3.12, we can write down theentire teleportation protocol as a one-way LOCC channel TA′AB→B from Aliceto Bob, with the state ρA′ to be teleported and the state |Φ〉〈Φ|AB as the inputs.By the analysis above, the output state received by Bob is exactly the same asthe original input state:

TA′AB→B(ρA′ ⊗ |Φ〉〈Φ|AB) = ρB. (3.3.27)

160

Page 172: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

This equation can also be interpreted as follows:

TA′AB→B((·)A′ ⊗ |Φ〉〈Φ|AB) = idA′→B(·), (3.3.28)

i.e., that the teleportation protocol simulates the identity channel.

To see that the teleportation protocol can indeed be viewed as a one-wayLOCC channel from Alice to Bob, let us explicitly write down the channelTA′AB→B defined above in the form of (3.2.284), i.e.,

TA′AB→B = DBY1Y2→B CX1X2→Y1Y2 EA′A→X1X2(3.3.29)

which we recall is equivalent to the form in (3.2.276). We have,

EA′A→X1X2=

d−1

∑z,x=0

Ez,xA′A→∅ ⊗ |z, x〉〈z, x|X1X2 , (3.3.30)

Ez,xA′A→∅(·) := TrA′A[|Φz,x〉〈Φz,x|A′A(·)], (3.3.31)

CX1X2→Y1Y2(|z, x〉〈z, x|X1X2) = |z, x〉〈z, x|Y1Y2 , (3.3.32)DBY1Y2→B((·)B ⊗ |z, x〉〈z, x|Y1Y2) = Dz,x

B→B(·), (3.3.33)

Dz,xB→B(·) := Wz,x(·)W†

z,x. (3.3.34)

We can also connect with the previously defined notion of LOCC simula-tion of a quantum channel (Definition 3.40). That is, we can understand theteleportation protocol and (3.3.27) as demonstrating that the identity channelis LOCC simulable with associated resource state given by the maximally en-tangled state. Now, by using the teleportation protocol in conjunction with aquantum channel NA→B, we find that every quantum channel NA→B is LOCCsimulable with associated resource state given by the maximally entangledstate of an appropriate Schmidt rank. To see this, observe that the channelNA→B can be trivially written as NA→B = NB′→B idA→B′ , where B′ is anauxiliary system with the same dimension as A. Then, by (3.3.28), we cansimulate the identity channel idA→B′ using the usual teleportation protocol,so that the overall LOCC channel L is NB′→B TAA′B′→B′ and the resourcestate is |Φ〉〈Φ|A′B′ , with the dimension of A′ equal to the dimension of A.This is illustrated in Figure 3.13. We can also simulate N via teleportation ina different manner, in which Alice locally applies the channel N to her inputstate ρA, then teleports the resulting state to Bob. Mathematically, we writethis as NA→B = idA→B NA→A = TAAB→B NA→A, where A, A are auxiliarysystems with the same dimension as B. We thus have have the following twoways to represent the action of the channel N using teleportation:

NA→B(ρA) = NB′→B (TAA′B′→B′(ρA ⊗ |Φ〉〈Φ|A′B′)) (3.3.35)

161

Page 173: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

NA→BρA N(ρA) = Φ+AB′

ρA′

N(ρA′)

BellMeasurement

Wz,x N

z

x

TA′AB′→B′

L→A′AB′→B

FIGURE 3.13: Any channel N is teleportation simulable, since Alice andBob can first perform the usual teleportation protocol, after which Bobcan apply the channel N to the state teleported by Alice. The combinationof these two steps can be taken as the LOCC channel L→A′AB′→B.

= TAAB→B(NA→A(ρA)⊗ |Φ〉〈Φ|AB

). (3.3.36)

Depending on whether the input dimension is smaller than the output di-mension of the channel, there can be a more economical way to perform thesimulation. If the channel’s output dimension is smaller than its input di-mension, then the more economical way to simulate the channel is for Aliceto first apply NA→B and then for Alice and Bob to perform the teleportationprotocol. In this way, they exploit a maximally entangled state of Schmidtrank dB in order to accomplish the simulation. If the channel’s input dimen-sion is smaller than its output dimension, then the more economical way tosimulate the channel is for Alice and Bob to first perform the teleportationprotocol, and then for Bob to apply the channel locally. In this way, they ex-ploit a maximally entangled state of Schmidt rank dA in order to accomplishthe simulation. As we see in Section 3.3.5, depending on the channel and itssymmetries, there can be even more economical methods to simulate a quan-tum channel via teleportation.

3.3.3 Qudit Teleportation Protocol With Respect to a FiniteGroup

The qudit teleportation protocol outlined above is based on the Heisenberg–Weyl operators Wz,x : 0 ≤ z, x ≤ d− 1 acting on a d-dimensional system,

162

Page 174: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

which are used to form the qudit Bell states and thus the Bell measurement.

More generally, we can consider an arbitrary finite group G and an ir-reducible unitary representation Ugg∈G of G acting on a d-dimensionalHilbert space. Then, it follows from Schur’s lemma (see Bibliographic Notesin Section 3.6) that

1|G| ∑

g∈GUgρUg† = Tr[ρ]π|G| (3.3.37)

for every state ρ. In particular, for bipartite states ρAB in which the systems Aand B are both d-dimensional, we find that

1|G| ∑

g∈G(Ug

A ⊗ 1B)ρAB(Ug†A ⊗ 1B) = πA ⊗ TrA[ρAB]. (3.3.38)

Now, let us take the maximally entangled state |Φ〉AB and define the states

|Φg〉AB := (UgA ⊗ 1B)|Φ〉AB. (3.3.39)

We call these states the generalized Bell states. We see that these states are adirect generalization of the usual qudit Bell states in (3.3.13). Then, by substi-tuting |Φ〉〈Φ|AB for ρAB in (3.3.38), we find that

1|G| ∑

g∈G|Φg〉〈Φg|AB = πAB =

1ABd2 . (3.3.40)

By defining the operators

ΠgAB :=

d2

|G| |Φg〉〈Φg|AB, (3.3.41)

we find that∑

g∈GΠg

AB = 1AB. (3.3.42)

Since the operators ΠgAB are also positive semi-definite, we conclude that the

set ΠgABg∈G constitutes a POVM. This POVM corresponds, by definition, to

the G-Bell measurement. Note that d2

|G| ≤ 1, which follows as a consequenceof the facts that ∑g∈G Πg = 1 and each |Φg〉 is a unit vector (were it not so,then the sum ∑g∈G Πg would exceed the identity). Indeed, consider that theidentity ∑g∈G Πg = 1 and the fact that Πg ≥ 0 for all g ∈ G implies that

163

Page 175: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Πg ≤ 1 for all g ∈ G. Then decomposing 1AB = |Φg〉〈Φg|+ |Φg〉〈Φg|⊥, where|Φg〉〈Φg|⊥ := 1− |Φg〉〈Φg|, we see that the only way the operator inequalityΠg ≤ 1 can be satisfied is if d2

|G| ≤ 1.

We use the G-Bell measurement given by the POVM Πgg∈G to constructour generalized teleportation protocol. The protocol proceeds as follows. Asbefore, Alice and Bob start by sharing two qudits in the state |Φ〉AB, withAlice holding an extra qudit in the state |ψ〉A′ to be teleported to Bob.

1. Alice performs the generalized Bell measurement given by the POVMΠg

AA′g∈G on her qudits A and A′. For each outcome g ∈ G of the mea-surement, according to (3.1.112) the (unnormalized) post-measurementstate of Bob’s qudit is

(d√|G|〈Φg|A′A ⊗ 1B

)(|ψ〉A′ ⊗ |Φ〉AB)

=d√|G|

(〈Φ|A′A(Ug†

A′ ⊗ 1A)⊗ 1B

)(|ψ〉A′ ⊗ |Φ〉AB) (3.3.43)

=d√|G|

(1√d

d

∑k=1〈k, k|A′A(Ug†

A′ ⊗ 1A)⊗ 1B

)

×(|ψ〉A′ ⊗

1√d

d

∑k′=1|k′, k′〉AB

)(3.3.44)

=1√|G|

d

∑k,k′=1

〈k|A′Ug†A′ |ψ〉A′ |k〉B

⟨k|k′⟩

(3.3.45)

=1√|G|

d

∑k=1〈k|A′Ug†

A′ |ψ〉A′ |k〉B (3.3.46)

=1√|G|

Ug†B |ψ〉B (3.3.47)

Therefore, each outcome occurs with probability 1|G| , and the post-meas-

urement state of Bob’s qudit is Ug†B |ψ〉B.

2. Alice communicates the outcome g resulting from the measurement toBob.

3. Upon receiving the measurement outcome, Bob applies Ug on his qudit.The resulting state of Bob’s qudit is |ψ〉B.

164

Page 176: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Observe that the original qudit teleportation protocol is a special case ofthe generalized teleportation protocol outlined above, in which the group Gis Zd ×Zd and its irreducible projective unitary representation Ugg∈G istaken to be the set of Heisenberg–Weyl operators. Then, the generalized Bellstates |Φg〉 are precisely the qudit Bell states |Φz,x〉 defined in (3.3.13). Fur-thermore, since |G| = d2, the POVM elements Πg = d2

|G| |Φg〉〈Φg| are theprojections on to the qudit Bell states, exactly as in the qudit teleportationprotocol.

3.3.4 Post-Selected Teleportation

Throughout this section, we have described teleportation protocols that in-volve performing a particular kind of Bell measurement between a systemA′, whose state is to be teleported, and a system A that forms half of a bipar-tite system AB that is in the joint resource state |Φ〉〈Φ|AB. Based on the Bellmeasurement outcome on A′A, Bob applies a particular correction operationin order to obtain the initial state of A′ in his system B. Thus, although theindividual measurement outcomes of the Bell measurement occur with someprobability, due to the correction operations the overall teleportation protocolis deterministic, i.e., it suceeds with probability one.

If Bob does not have the ability to apply correction operations to his sys-tem based on the Bell measurement outcomes, then Alice and Bob can per-form what is called post-selected teleportation. Post-selected teleportation isbased on the fact that, in the teleportation protocols that we have consid-ered, Bob does not need to apply a correction operation on his system if theoutcome corresponding to |Φ〉〈Φ|A′A occurs in the Bell measurement on A′A.This is due to the fact that the post-measurement state on Bob’s system, condi-tioned on the |Φ〉A′A outcome, is precisely the initial state ρA′ to be teleported:

〈Φ|A′A(ρA′ ⊗ |Φ〉〈Φ|AB)|Φ〉A′A =1d2 ρB, (3.3.48)

where d = dA = dB. This is a special case of (3.2.16) in which NA′→B = idA′→Band XRA′ = ρA′ . If we thus modify the teleportation protocol such that weconsider the |Φ〉A′A outcome of the Bell measurement as a “success” and therest of the outcomes as a “failure”, then we obtain post-selected teleportation.Post-selected teleportation is probabilistic by definition. In particular, from(3.3.48), we see that it succeeds with probability 1

d2 .

165

Page 177: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

|Φ+〉AB

ρA′

VgN(Ug†ρUg)Vg†

Alice

Bob

BellMeasurement

N Vg

g

FIGURE 3.14: Circuit diagram of a modified teleportation protocol inwhich Bob applies a channel N to his qudit before performing a correc-tion operation Vg, which is a unitary operation from the set Vg : g ∈ Gof unitary operators.

3.3.5 Teleportation-Simulable Channels

Let us now consider the even more general protocol depicted in Figure 3.14.Let G be a finite group. As before, Alice and Bob start with a shared pair ofqudits in the state |Φ〉AB, while Alice holds an extra qudit in the state ρA′ to beteleported to Bob’s qudit. Unlike the teleportation protocol above, however,Bob applies the channel N to his qudit before he receives the results of theBell measurement. Once he receives the measurement results, he applies theunitary operation Vg from the set Vg : g ∈ G of pre-determined unitaryoperators constituting a projective unitary representation of G.

The initial tripartite joint state of the protocol is

ρA′ ⊗ (1A ⊗NB)(|Φ〉〈Φ|AB). (3.3.49)

Alice performs the same generalized Bell measurement as before on A andA′, which we recall has the POVM Πg

AA′g∈G with elements ΠgAA′ defined

in (3.3.41). Recall that this POVM corresponds to an irreducible projectiveunitary representation of G given by Ugg∈G. Since the Bell measurementoperates only on the systems A′ and A, we can bring them inside the action ofN on Bob’s share of the state |Φ〉AB. This means that the analysis for the quditteleportation from above carries over exactly in this case. In other words,each outcome g ∈ G occurs with an equal probability of 1

|G| , and the post-measurement state on Bob’s qudit corresponding to the outcome g is

N(Ug†ρUg). (3.3.50)

166

Page 178: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

ΦNAB

ρA′

VgN(Ug†ρUg)Vg†

Alice

Bob

BellMeasurement

Vg

g

FIGURE 3.15: A mathematically equivalent way of describing the protocolin Figure 3.14. In this case, Alice and Bob start with the Choi state ΦN

AB ofthe channel N instead of the maximally entangled state |Φ〉〈Φ|AB.

After Bob applies the unitary Vg, the state of Bob’s qudit at the end of theprotocol is

VgN(Ug†ρUg)Vg†. (3.3.51)

This occurs with probability 1|G| for all g ∈ G.

Now, observe that the state (3.3.49) can be written as

ρA′ ⊗ΦNAB, (3.3.52)

where we recall that ΦNAB = (idA ⊗ NB)(|Φ〉〈Φ|AB) is the Choi state of the

channel N. In other words, the protocol depicted in Figure 3.14 is mathe-matically equivalent to the teleportation protocol over a group G outlinedabove except that instead of starting with the shared maximally entangledstate |Φ〉AB, Alice and Bob start with the shared state ΦN

AB. This equivalentprotocol is depicted in Figure 3.15.

If at the end of the protocol Bob discards the classical message g, then thestate of his system is given by

N(ρ) :=1|G| ∑

g∈GVgN(Ug†ρUg)Vg†. (3.3.53)

This transformation of the channel N in (3.3.53) is known as a channel twirlwith respect to the unitary representations Ugg∈G and Vgg∈G, becausethe twirled channel N is a symmetrized version of the original channel N.Thus, the generalized teleportation protocol gives an explicit procedure forimplementing a channel twirl by implementing the teleportation protocol us-ing the Choi state of the channel as the resource state.

167

Page 179: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

NA→BρA N(ρA) = ρNAB

ρA′

N(ρA′)

BellMeasurement

Vg

g

L→A′AB→B

FIGURE 3.16: Teleportation simulation of a G-covariant channel N, wherethe operators Vg : g ∈ G form a unitary representation of G on theoutput space of N.

Suppose now that the channel N satisfies the following property calledgroup-covariance:

Definition 3.48 Group-Covariant Channel

Let G be a group. A channel NA→B is called covariant with respect to G,group-covariant, G-covariant, or simply covariant, if there exist (projective)unitary representations Ug

Ag∈G and VgBg∈G of G such that for all ρA,

NA→B(UgAρAUg†

A ) = VgBNA→B(ρA)V

g†B (3.3.54)

for all g ∈ G.

We thus see that if N is G-covariant, with the associated unitary repre-sentation Vgg∈G of G at the output of the channel, then for all outcomes gof Alice’s generalized Bell measurement we have N(Ug†ρUg) = Vg†N(ρ)Vg.Therefore, after Bob applies Vg, the state of his qudit is N(ρ). This general-ized teleportation protocol therefore effectively applies the channel N to thestate ρA′ and transfers the resulting state to Bob’s qudit; see Figure 3.16. Wesay that the teleportation protocol simulates the action of the channel N onthe input state ρA′ . As stated earlier, in this sense, the original teleportationprotocol can be regarded as a way to simulate the identity channel.

The notion of simulation of a channel by a teleportation protocol can beextended to a one-way LOCC channel L→, as introduced in Definition 3.37,to obtain the following definition.

168

Page 180: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

NA→BρA N(ρA) L→RAB′→B=ρA

ωRB′

N(ρA)

NA→B

FIGURE 3.17: Depiction of a teleportation-simulable channel with associ-ated resource state ωRB′ . The teleportation-simulable channel NA→B canbe realized via the interaction LOCC channel L→RAB→B and the resourcestate ωRB′ .

Definition 3.49 Teleportation-Simulable Channel

A channel NA→B is called teleportation-simulable with associated resourcestate ωRB if there exists a one-way LOCC channel L→RAB′→B such that forall input states ρA

N(ρA) = L→RAB′→B(ρA ⊗ωRB′). (3.3.55)

Figure 3.17 illustrates the concept of a teleportation-simulable channel.Note that in (3.3.55) the resource state ωRB′ is fixed, as well as the LOCCchannel L→. Both sides of the equation should thus be regarded as functionsof ρA.

From the discussions above, we conclude that every group-covariant chan-nel is teleportation-simulable, where the one-way LOCC channel L is simplythe teleportation protocol with respect to the group, and the resource state isthe Choi state of the channel.

3.4 Quantum Super-Dense Coding

We now discuss the quantum super-dense coding protocol. This protocol canbe viewed as a “dual” to the quantum teleportation protocol in the followingsense: while in the basic quantum teleportation protocol, Alice and Bob makeuse of two qubits in the entangled state |Φ+〉 = 1√

2(|0, 0〉 + |1, 1〉) and two

bits of classical information to simulate a noiseless qubit channel, in quantumsuper-dense coding they make use of the shared entangled state |Φ+〉 along

169

Page 181: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

|Φ+〉ABAlice

Bob

BellMeasurement

Xx Zz

x z

(z, x)

FIGURE 3.18: Circuit diagram for the quantum super-dense coding proto-col. Using the bits (z, x) that she wishes to send, Alice applies the appro-priate Pauli X and/or Z operators to her share of the maximally entangledqubits that are in the state |Φ+〉AB and sends it through a noiseless qubitchannel to Bob. Bob then performs a Bell measurement on the two qubitsto recover the encoded bits (z, x).

with one use of a noiseless qubit channel to communicate two bits of classicalinformation. This is remarkable because, without the shared entanglementand only one use of the noiseless qubit channel, they can communicate atmost only one bit of classical information. The quantum super-dense codingprotocol thus represents one of the simplest examples in which prior sharedentanglement provides an advantage for classical communication.

Let us now go through the quantum super-dense coding protocol. SeeFigure 3.18 for a depiction of the protocol. Alice wishes to send two classi-cal bits (z, x) ∈ 0, 12 to Bob by making use of a shared pair of qubits inthe maximally entangled state |Φ+〉AB and one use of a noiseless qubit chan-nel. Depending on the bits she wishes to send, she performs the followingoperations on her share of the entangled qubits:

• To send the bits (0, 0), she does nothing.

• To send the bits (0, 1), she applies the Pauli X operator to her qubit, trans-forming the joint state |Φ+〉AB to |Ψ+〉AB.

• To send the bits (1, 0), she applies the Pauli Z operator to her qubit, trans-forming the joint state |Φ+〉AB to |Φ−〉AB.

• To send the bits (1, 1), she applies the X operator followed by the Z op-erator, so that the joint state becomes |Ψ−〉AB.

170

Page 182: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

After applying the appropriate operation, Alice sends her qubit to Bob withthe one allowed use of a noiseless qubit channel.

Bob now holds both qubits, and they are in one of the four Bell states

|Φz,x〉AB = (ZzAXx

A ⊗ 1B)|Φ+〉AB (3.4.1)

depending on the bits (z, x) Alice sent. Bob then performs a Bell measurementon his two qubits, and the outcome of this measurement consists precisely ofthe bits (z, x) that Alice wished to send.

The super-dense coding protocol has a simple generalization to the quditcase. In this case, Alice and Bob share the qudit Bell state |Φ〉AB before com-munication begins, and by applying one of the d2 Heisenberg–Weyl operatorsWz,x from (3.2.115) on her share of the state, Alice can rotate the global stateto one of the d2 qudit Bell states in (3.3.13). After Alice sends her share ofthe encoded state over a noiseless qudit channel to Bob, Bob can then per-form the qudit Bell measurement to decode which of the d2 messages Alicetransmitted.

3.5 Distinguishability Measures for States and Chan-nels

We conclude this chapter with a discussion of distinguishability measures forquantum states and channels. These measures are important for assessing thequality of performance of various quantum communication protocols. Theprimary measures for states on which we focus are the trace distance andfidelity, which have compelling operational interpretations.

3.5.1 Trace Distance

The trace distance is based on the trace norm. For two states ρ and σ, wedefine the trace distance between ρ and σ as 1

2 ‖ρ− σ‖1. Note that the tracedistance between pure states |ψ〉〈ψ| and |φ〉〈φ| is given by

12‖|ψ〉〈ψ| − |φ〉〈φ|‖1 =

√1− |〈ψ|φ〉|2. (3.5.1)

This follows in a straightforward manner by writing |φ〉 as

|φ〉 = α|ψ〉+ β|ψ⊥〉, (3.5.2)

171

Page 183: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

where |ψ⊥〉 is a pure state orthogonal to |ψ〉, α = 〈ψ|φ〉, and |α|2 + |β|2 = 1.Then, as a matrix in the basis |ψ〉, |ψ⊥〉, we have that

|ψ〉〈ψ| − |φ〉〈φ| =(

1− |α|2 −αβ−αβ −|β|2

), (3.5.3)

which has eigenvalues ±|β| = ±√

1− |α|2. Therefore, recalling that the tracenorm of a Hermitian operator is the sum of the absolute value of its eigenval-ues, we conclude (3.5.1).

The trace distance provides a natural way to measure the distance be-tween two quantum states ρ and σ, and to assess the performance of a quan-tum information processing protocol in which the ideal state to be generatedis ρ but the actual state generated is σ. To see that this is the case, supposethat a third party is trying to assess how distinguishable the actual state σ isfrom the ideal state ρ. Such an individual can do so by performing a quantummeasurement described by the POVM Λxx. The probability of obtaining aparticular outcome x is given by the Born rule. In the case that ρ was pre-pared, this probability is Tr[Λxρ], and in the case that σ was prepared, thisprobability is Tr[Λxσ]. What we demand is that the absolute deviation be-tween the actual probability Tr[Λxσ] and the ideal probability Tr[Λxρ] be nolarger than some tolerance ε. Since this should be the case for every possiblemeasurement outcome, what we demand mathematically is that

sup0≤Λ≤1

|Tr[Λρ]− Tr[Λσ]| ≤ ε. (3.5.4)

As stated in Theorem 3.50 below, the following identity holds

sup0≤Λ≤1

|Tr[Λρ]− Tr[Λσ]| = 12‖ρ− σ‖1 , (3.5.5)

indicating that if 12 ‖ρ− σ‖1 ≤ ε, then the deviation between probabilities for

every possible measurement operator never exceeds ε, so that the approxi-mation between states ρ and σ is naturally quantified by the trace distance12 ‖ρ− σ‖1.

Let us now prove the variational characterization of the trace distancestated in (3.5.5).

172

Page 184: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Theorem 3.50 Trace Distance via Measurement

For two states ρ and σ, the following equality holds

12‖ρ− σ‖1 = sup

0≤Λ≤1

|Tr[Λ(ρ− σ)]| . (3.5.6)

An optimal measurement operator Λ is equal to Π+ + Λ0, where Π+ isthe projection onto the strictly positive part of ρ − σ and Λ0 is an op-erator satisfying 0 ≤ Λ0 ≤ Π0, with Π0 the projection onto the zeroeigenspace of ρ− σ.

PROOF: Consider that

sup0≤Λ≤1

|Tr[Λ(ρ− σ)]| = sup0≤Λ≤1

Tr[Λ(ρ− σ)] , (3.5.7)

because there are always choices for Λ such that Tr[Λ(ρ− σ)] ≥ 0. Then theequality follows as a direct application of (3.1.121) and the optimality state-ment following it, with A = ρ and B = σ.

From the expression in (3.5.6), it is clear that the trace distance can be com-puted via a semi-definite program (SDP). The following proposition statesthis fact formally and also states the dual optimization problem. Appendix 3.Cprovides a proof.

Proposition 3.51 SDPs for Normalized Trace Distance

The trace distance between any two quantum states ρ and σ can be writ-ten as the following semi-definite programs:

12‖ρ− σ‖1 = sup

Λ≥0Tr[Λ(ρ− σ)] : Λ ≤ 1 (3.5.8)

= infZ≥0Tr[Z] : Z ≥ ρ− σ. (3.5.9)

The trace distance is non-increasing under the action of a quantum chan-nel (and more generally, positive, trace-preserving maps).

173

Page 185: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Theorem 3.52 Data-Processing Inequality for Trace Distance

Let ρ and σ be states, and let N be a positive, trace-preserving map.Then,

‖ρ− σ‖1 ≥ ‖N(ρ)−N(σ)‖1 . (3.5.10)

PROOF: Using the definition of the adjoint N† of N, for every measurementoperator 0 ≤ Λ′ ≤ 1, we find that

Tr[Λ′(N(ρ)−N(σ))] = Tr[N†(Λ′)(ρ− σ)]. (3.5.11)

Now, since N is a positive map, so is N†, which means that N†(Λ′) ≥ 0.Furthermore, since N is trace preserving, the adjoint N† is unital. This, alongwith Λ′ ≤ 1, means that

N†(1−Λ′) ≥ 0⇒ N†(Λ′) ≤ N†(1) = 1. (3.5.12)

In other words, 0 ≤ N†(Λ′) ≤ 1. Therefore,

Tr[N†(Λ′)(ρ− σ)] ≤ sup0≤Λ≤1

Tr[Λ(ρ− σ)] =12‖ρ− σ‖1 , (3.5.13)

where to obtain the equality we made use of Theorem 3.50. Since Λ′ wasarbitrary, we find that

sup0≤Λ′≤1

Tr[Λ′(N(ρ)−N(σ))] ≤ 12‖ρ− σ‖1 , (3.5.14)

from which the result follows since the left-hand side of the inequality is equalto 1

2 ‖N(ρ)−N(σ)‖1 by Theorem 3.50.

The data-processing inequality for the trace distance tells us that for thetask of state discrimination (see Section 3.1.3), applying a channel to the statesρ and σ before performing the measurement can never exceed the perfor-mance (as quantified by the success probability) of measuring ρ and σ directly.

More general than Proposition 3.51 and Theorem 3.52, the following propo-sition holds, which we state here without proof:

174

Page 186: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Proposition 3.53 Trace Norm for Hermitian Operators

Let A be a Hermitian operator. Then

‖A‖1 = supΛ1,Λ2≥0

Tr[(Λ1 −Λ2)A] : Λ1, Λ2 ≤ 1 (3.5.15)

= infY1,Y2≥0

Tr[Y1 + Y2] : Y1 ≥ A, Y2 ≥ −A. (3.5.16)

Let N be a positive, trace-preserving map. Then

‖A‖1 ≥ ‖N(A)‖1 . (3.5.17)

By combining the results of Theorems 3.50 and 3.52, we find that the tracedistance is achieved by a measurement channel:

Theorem 3.54 Trace Distance Achieved by Measurement Channel

For two states ρ and σ, the following equality holds

‖ρ− σ‖1 = maxΛxx

∑x∈X|Tr[Λxρ]− Tr[Λxσ]| , (3.5.18)

where the optimization is performed over POVMs Λxx∈X definedwith respect to a finite alphabet X, and an optimal POVM is given byΛ∗, 1−Λ∗, where Λ∗ = Π+ + Λ0, the projection Π+ is the projectiononto the strictly positive part of ρ − σ, and Λ0 satisfies 0 ≤ Λ0 ≤ Π0,with Π0 the projection onto the zero eigenspace of ρ− σ.

PROOF: The inequality

‖ρ− σ‖1 ≥ maxΛxx

∑x∈X|Tr[Λxρ]− Tr[Λxσ]| (3.5.19)

follows from Theorem 3.52 by taking the channel N to be the quantum–classicalchannel defined as N(ω) = ∑x∈X Tr[Λxω]|x〉〈x|. Then, from Theorem 3.50,we have the equality

12‖ρ− σ‖1 = |Tr[Λ∗(ρ− σ)]|. (3.5.20)

Since |Tr[Λ∗(ρ− σ)]| = |Tr[(1−Λ∗)(ρ− σ)]|, we conclude that

‖ρ− σ‖1 = |Tr[Λ∗(ρ− σ)]|+ |Tr[(1−Λ∗)(ρ− σ)]|, (3.5.21)

175

Page 187: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

so that an optimal POVM is given by Λ∗, 1−Λ∗.

3.5.2 Fidelity

In addition to the trace distance, another distinguishability measure for statesthat we consider in this book is the fidelity (also called Uhlmann fidelity).

Definition 3.55 Fidelity

For two quantum states ρ and σ, the fidelity between ρ and σ, denoted byF(ρ, σ), is defined as

F(ρ, σ) :=∥∥√ρ√

σ∥∥2

1 =(

Tr[√√

σρ√

σ])2

. (3.5.22)

Observe that the fidelity is symmetric in its arguments. Also note thatF(ρ, σ) ∈ [0, 1] for all states ρ and σ, a fact that we prove below.

For a pure state |ψ〉 and mixed state ρ, the fidelity between them is equalto

F(ρ, |ψ〉〈ψ|) = 〈ψ|ρ|ψ〉 = Tr[|ψ〉〈ψ|ρ]. (3.5.23)

Also, for two pure states |ψ〉 and |φ〉, the fidelity is simply

F(|ψ〉〈ψ|, |φ〉〈φ|) = |〈ψ|φ〉|2 . (3.5.24)

The formula in (3.5.23) gives the fidelity an operational meaning that weemploy in later chapters. Suppose that the goal of a quantum informationprocessing protocol is to produce the pure state |ψ〉〈ψ|, but it instead pro-duces a mixed state ρ. Then the fidelity F(ρ, |ψ〉〈ψ|) is equal to the probabilitythat the actual state ρ passes a test for being the ideal state |ψ〉〈ψ|, with thetest being given by the POVM |ψ〉〈ψ|, 1− |ψ〉〈ψ|. That is, the probabilityof obtaining the first outcome of the measurement (i.e., “success”) is equal toF(ρ, |ψ〉〈ψ|). In this way, the fidelity provides another natural way for assess-ing the performance of quantum information processing protocols.

Like the trace distance, the fidelity can be computed via a semi-definiteprogram, as stated in the following proposition:

176

Page 188: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Proposition 3.56 SDPs for Root Fidelity of States

The root fidelity√

F(ρ, σ) =∥∥√ρ√

σ∥∥

1 of quantum states ρ and σ ischaracterized by the following primal and dual semi-definite programs:

√F(ρ, σ) =

12

supX∈L(H)

Tr[X] + Tr[X†] :

(ρ X

X† σ

)≥ 0

(3.5.25)

=12

infY,Z≥0

Tr[Yρ] + Tr[Zσ] :

(Y 1

1 Z

)≥ 0

(3.5.26)

PROOF: See Appendix 3.D.1.

Theorem 3.57 Basic Properties of Fidelity

1. For two states ρ and σ, the inequalities 0 ≤ F(ρ, σ) ≤ 1 hold. Fur-thermore, F(ρ, σ) = 1 if and only if ρ = σ, and F(ρ, σ) = 0 if andonly if ρ and σ are supported on orthogonal subspaces.

2. Isometric invariance: For all states ρ and σ, and for every isometry V,

F(ρ, σ) = F(VρV†, VσV†). (3.5.27)

3. Multiplicativity: The fidelity is multiplicative with respect to tensor-product states, meaning that for all states ρ1, σ1, ρ2, σ2, we have

F(ρ1 ⊗ ρ2, σ1 ⊗ σ2) = F(ρ1, σ1)F(ρ2, σ2). (3.5.28)

PROOF:

1. The fact that F(ρ, σ) ≥ 0 for all states ρ and σ follows from the defini-tion of the fidelity as the squared trace norm and the fact that the tracenorm is always non-negative. If ρ and σ are supported on orthogonalsubspaces, then

√ρ√

σ = 0, which means that F(ρ, σ) = 0. Conversely, ifF(ρ, σ) = 0, then

∥∥√ρ√

σ∥∥

1 = 0, which implies (by definition of a norm)that

√ρ√

σ = 0, which in turn implies that ρ and σ are supported onorthogonal subspaces.

Now, using (2.2.88), there exists a unitary U such that

F(ρ, σ) =∥∥√ρ√

σ∥∥2

1 =∣∣Tr[U

√ρ√

σ]∣∣2 . (3.5.29)

177

Page 189: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Then, using the Cauchy–Schwarz inequality for the Hilbert–Schmidt in-ner product (see (2.2.23)), we find that

F(ρ, σ) =∣∣Tr[U

√ρ√

σ]∣∣2 (3.5.30)

≤ Tr[U√

ρ√

ρU†]Tr[√

σ√

σ] (3.5.31)

= Tr[√

ρ√

ρ]Tr[√

σ√

σ] (3.5.32)= Tr[ρ]Tr[σ] (3.5.33)= 1. (3.5.34)

If ρ = σ, then F(ρ, σ) = ‖ρ‖21 = Tr[ρ]2 = 1. On the other hand, if

F(ρ, σ) = 1, then the inequality in the Cauchy–Schwarz inquality is sat-urated. The Cauchy–Schwarz inequality is saturated if and only if thetwo operators involved are proportional to each other. This means thatρ = ασ for some α > 0. But since both ρ and σ are states, it must be thecase that α = 1, which means that ρ = σ.

2. Proof of isometric invariance: For every isometry V and every two states ρand σ, since the action of an isometry does not change the eigenvalues,we have that

√VρV† = V

√ρV† and

√VσV† = V

√σV†. Therefore,

F(VρV†, VσV†) =

∥∥∥∥√

VρV†√

VσV†∥∥∥∥

2

1(3.5.35)

=∥∥∥V√

ρV†V√

σV†∥∥∥

2

1(3.5.36)

=∥∥∥V√

ρ√

σV†∥∥∥

2

1(3.5.37)

=∥∥√ρ√

σ∥∥2

1 , (3.5.38)

as required, where the last line is due to the isometric invariance of theSchatten norms, as stated in (2.2.102).

3. Proof of multiplicativity: Using the fact that√

ρ1 ⊗ ρ2 =√

ρ1 ⊗√

ρ2, andsimilarly for

√σ1 ⊗ σ2, and using the multiplicativity of the trace norm

with respect to the tensor product (see (2.2.104)), we find that

F(ρ1 ⊗ ρ2, σ1 ⊗ σ2) =∥∥√ρ1 ⊗ ρ2

√σ1 ⊗ σ2

∥∥21 (3.5.39)

= ‖(√ρ1 ⊗√

ρ2)(√

σ1 ⊗√

σ2)‖21 (3.5.40)

= ‖√ρ1√

σ1 ⊗√

ρ2√

σ2‖21 (3.5.41)

178

Page 190: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

= ‖√ρ1√

σ1‖21 ‖√

ρ2√

σ2‖21 (3.5.42)

= F(ρ1, σ1)F(ρ2, σ2), (3.5.43)

as required.

Theorem 3.58 Uhlmann’s Theorem

For two quantum states ρA and σA, let |ψρ〉 := (1R ⊗√

ρA)|Γ〉RA and|ψσ〉 := (1R ⊗

√σA)|Γ〉RA be purifications of ρ and σ, respectively, with

the dimension of R equal to the dimension of A. Then,

F(ρ, σ) = maxUR|〈ψρ|RA(UR ⊗ 1A)|ψσ〉RA|2 , (3.5.44)

where the optimization is over unitaries on R.

REMARK: Since all purifications are related to each other by isometries on the purifyingsystem (which is the system R as in the statement of the theorem), Uhlmann’s theoremtells us that the fidelity between two quantum states is equal to the maximum overlapbetween their purifications.

Furthermore, it is straightforward to show that it suffices to take the dimension of Rthe same as the dimension of A, as we have done in the statement of the theorem. In otherwords, performing an optimization over the dimension of R leads to the same result as in(3.5.44).

PROOF: Using the definitions of |ψρ〉RA and |ψσ〉RA, we find for every unitaryUR that

|〈ψρ|RA(UR ⊗ 1A)|ψσ〉RA|2

= |〈Γ|RA(1R ⊗√

ρA)(UR ⊗ 1A)(1R ⊗√

σA)|Γ〉RA|2 (3.5.45)

= |〈Γ|RA(UR ⊗√

ρA√

σA)|Γ〉RA|2 (3.5.46)

=∣∣〈Γ|RA(1R ⊗

√ρA√

σAUTA)|ΓRA〉

∣∣2 , (3.5.47)

where to obtain the last line we used the transpose trick in (2.2.30). Then,using (2.2.33), we find that

|〈ψρ|RA(UR ⊗ 1A)|ψσ〉RA|2 =∣∣Tr[√

ρA√

σAUTA]∣∣2 . (3.5.48)

Since UA is arbitrary, and UTA is also a unitary, we use (2.2.88) to obtain

maxU|〈ψρ|RA(UR ⊗ 1A)|ψσ〉RA|2 = max

U

∣∣Tr[√

ρA√

σAUTA]∣∣2 (3.5.49)

179

Page 191: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

= maxU|Tr[√

ρA√

σAUA]|2 (3.5.50)

= ‖√ρA√

σA‖21 (3.5.51)

as required.

Theorem 3.59 Data-Processing Inequality for Fidelity

Let ρ and σ be states, and let N be a quantum channel. Then,

F(ρ, σ) ≤ F(N(ρ),N(σ)). (3.5.52)

PROOF: Recall that every quantum channel NA→B can be written in the Stine-spring form as NA→B(ρA) = TrE[VρAV†], where V ≡ VA→BE is some isomet-ric extension of N and dE ≤ rank(ΓN

AB). Since we have shown that the fidelityis invariant under isometric channels, it remains to show that the fidelity isnon-decreasing under the action of the partial trace. To this end, considerbipartite states ρAB and σAB, and let |ψ〉RAB be an arbitrary purification ofρAB and let |φ〉RAB be an arbitrary purification of σAB, where dR = dAdB.Observe that |ψ〉RAB and |φ〉RAB are also purifications of ρA = TrB[ρAB] andσA = TrB[σAB], respectively. Then, by Uhlmann’s theorem, we have that

F(ρA, σA) = maxURB|〈ψ|RAB(URB ⊗ 1A)|φRAB〉|2 . (3.5.53)

By restricting the maximization above to unitaries of the form UR ⊗ 1B, wehave that

F(ρA, σA) ≥ |〈ψ|RAB(UR ⊗ 1AB)|φ〉RAB|2 (3.5.54)

for all unitaries UR. Therefore,

maxUR|〈ψ|RAB(UR ⊗ 1AB)|φ〉RAB|2

≤ maxURB|〈ψ|RAB(URB ⊗ 1A)|φ〉RAB|2 . (3.5.55)

But, by Uhlmann’s theorem,

maxUR|〈ψ|RAB(UR ⊗ 1AB)|φ〉RAB|2 = F(ρAB, σAB). (3.5.56)

Therefore,

F(ρAB, σAB) ≤ F(ρA, σA) = F(TrB[ρAB], TrB[σAB]). (3.5.57)

180

Page 192: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

The fidelity thus satisfies the data-processing inequality with respect to thepartial trace.

Using the data-processing inequality for the fidelity with respect to thepartial trace, along with its invariance under isometries, we conclude that

F(N(ρ),N(σ)) = F(

TrE[VρV†], TrE[VρV†])

(3.5.58)

≥ F(VρV†, VσV†) (3.5.59)= F(ρ, σ), (3.5.60)

as required.

With Uhlmann’s theorem and the data-processing inequality for the fi-delity in hand, we can now establish two more properties of the fidelity.

Theorem 3.60 Concavity of Fidelity

The fidelity is concave in either one of its arguments:

F

(∑

x∈Xp(x)ρx, σ

)≥ ∑

x∈Xp(x)F(ρx, σ), (3.5.61)

where X is a finite alphabet, p : X → [0, 1] is a probability distribution,and σ and ρxx∈X are states.

PROOF: By Uhlmann’s theorem, we know that the fidelity is given by themaximum overlap between the purifications of the two states under consid-eration. Based on this, let |ψσ〉RA be a purification of σA. Then, for x ∈ X, let|φx〉RA be a purification of ρx

A such that F(ρxA, σ) = |〈φx|ψσ〉|2. Then,

∑x∈X

p(x)F(ρxA, σA) = ∑

x∈Xp(x) |〈φx|ψσ〉|2 (3.5.62)

= 〈ψσ|RA

(∑

x∈Xp(x)|φx〉〈φx|RA

)|ψσ〉RA (3.5.63)

= F

(∑

x∈Xp(x)|φx〉〈φx|RA, |ψσ〉〈ψσ|RA

), (3.5.64)

where to obtain the last line we used the formula in (3.5.23) for the fidelitybetween a pure state and a mixed state. Then, using the data-processing in-

181

Page 193: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

equality for the fidelity with respect to the partial trace, we obtain

∑x∈X

p(x)F(ρxA, σA) = F

(∑

x∈Xp(x)|φx〉〈φx|RA, |ψσ〉〈ψσ|RA

)(3.5.65)

≤ F

(∑

x∈Xp(x)TrR[|φx〉〈φx|RA], TrR[|ψσ〉〈ψσ|RA]

)(3.5.66)

= F

(∑

x∈Xp(x)ρx

A, σA

), (3.5.67)

which is the required result.

A more general result than the concavity result proved above, namely jointconcavity, can be obtained if we consider instead the square root of the fidelity,which we call the “root fidelity” and denote by

√F(ρ, σ) :=

√F(ρ, σ) =

∥∥√ρ√

σ∥∥

1 . (3.5.68)

Theorem 3.61 Joint Concavity of Root Fidelity

The root fidelity is jointly concave:

√F

(∑

x∈Xp(x)ρx, ∑

x∈Xp(x)σx

)≥ ∑

x∈Xp(x)√

F(ρx, σx), (3.5.69)

where X is a finite alphabet, p : X → [0, 1] is a probability distribution,and ρxx∈X and σxx∈X are sets of states.

PROOF: This result follows by defining the classical–quantum states

ρXA := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA, (3.5.70)

σXA := ∑x∈X

p(x)|x〉〈x|X ⊗ σxA, (3.5.71)

and observing that√

F(ρXA, σXA)

= ‖√ρXA√

σXA‖1 (3.5.72)

182

Page 194: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

=

∥∥∥∥∥

(∑

x∈X|x〉〈x|X ⊗

√p(x)ρx

A

)(∑

x′∈X|x′〉〈x′|X ⊗

√p(x′)σx′

A

)∥∥∥∥∥1

(3.5.73)

=

∥∥∥∥∥∑x∈X|x〉〈x|X ⊗ p(x)

√ρx

A

√σx

A

∥∥∥∥∥1

(3.5.74)

= ∑x∈X

∥∥∥p(x)√

ρxA

√σx

A

∥∥∥1

(3.5.75)

= ∑x∈X

p(x)∥∥∥√

ρxA

√σx

A

∥∥∥1

(3.5.76)

= ∑x∈X

p(x)√

F(ρxA, σx

A). (3.5.77)

From the above and the data-processing inequality for the fidelity under par-tial trace (Theorem 3.59), we conclude that

∑x∈X

p(x)√

F(ρxA, σx

A) =√

F(ρXA, σXA) ≤√

F(ρA, σA), (3.5.78)

which is equivalent to (3.5.69).

The steps in (3.5.72)–(3.5.77) demonstrate that the root fidelity satisfiesthe direct-sum property: for every finite alphabet X, probability distributionsp, q : X→ [0, 1], and sets ρx

Ax∈X, σxAx∈X of states, we have

√F

(∑

x∈Xp(x)|x〉〈x|X ⊗ ρx

A, ∑x∈X

q(x)|x〉〈x|X ⊗ σxA

)

= ∑x∈X

√p(x)q(x)

√F(ρx

A, σxA). (3.5.79)

Just as the trace distance can be achieved with a measurement, so it holdsthat the fidelity can also be achieved with a measurement, as we now show.

Theorem 3.62 Fidelity via Measurement

For two states ρ and σ, the following equality holds

F(ρ, σ) = minΛxx

(∑

x∈X

√Tr[Λxρ]

√Tr[Λxσ]

)2

, (3.5.80)

where the optimization is with respect to all POVMs Λxx∈X defined

183

Page 195: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

with respect to a finite alphabet X.

PROOF: Let M be a measurement channel defined as

M(ρ) = ∑x∈X

Tr[Λxρ]|x〉〈x|, (3.5.81)

where Λxx∈X is a POVM and X is a finite alphabet. Then, by the data-processing inequality for the fidelity with respect to the channel M, and sincethe action of M leads to a state that is diagonal in the orthonormal basis|x〉x∈X, we obtain

F(ρ, σ) ≤ F(M(ρ),M(σ)) (3.5.82)

=

∥∥∥∥√

M(ρ)√

M(σ)

∥∥∥∥2

1(3.5.83)

=

∥∥∥∥∥√

∑x∈X

Tr[Λxρ]|x〉〈x|√

∑x∈X

Tr[Λxσ]|x〉〈x|∥∥∥∥∥

2

1

(3.5.84)

=

∥∥∥∥∥

(∑

x∈X

√Tr[Λxρ]|x〉〈x|

)(∑

x∈X

√Tr[Λxσ]|x〉〈x|

)∥∥∥∥∥

2

1

(3.5.85)

=

∥∥∥∥∥∑x∈X

√Tr[Λxρ]

√Tr[Λxσ]|x〉〈x|

∥∥∥∥∥

2

1

(3.5.86)

=

(∑

x∈X

√Tr[Λxρ]

√Tr[Λxσ]

)2

. (3.5.87)

Since the POVM Λxx∈X is arbitrary, we find that

F(ρ, σ) ≤ minΛxx

(∑

x∈X

√Tr[Λxρ]

√Tr[Λxσ]

)2

. (3.5.88)

We now prove the reverse inequality by explicitly constructing a POVMthat achieves the fidelity. First, observe that we can write the fidelity F(ρ, σ)as

F(ρ, σ) =∥∥√ρ√

σ∥∥2

1 = Tr[√√

σρ√

σ

]2= Tr[Aσ]2, (3.5.89)

where

A := σ−12

12 ρσ

12

) 12

σ−12 . (3.5.90)

184

Page 196: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

If σ is not invertible, then the inverse is understood to be on the support ofσ, in which case A is supported on the support of σ. So the fidelity is sim-ply equal to the squared expectation value of the Hermitian operator A withrespect to the state σ. Observe that we can also write

F(ρ, σ) =

(Tr[√√

σρ√

σ

])2(3.5.91)

=

(Tr[√√

σΠσρΠσ

√σ

])2(3.5.92)

= F(ΠσρΠσ, σ), (3.5.93)

where Πσ is the projection onto the support of σ. This holds because√

σ =Πσ√

σ =√

σΠσ.

Now, let us consider a measurement in the eigenbasis of A. Let |ψi〉r−1i=0

be the eigenvectors of A, where r = rank(A). If σ is not invertible, then wecan always add a set |ψi〉d−1

i=r of linearly independent pure states orthogonalto the eigenbasis of A in order to obtain a POVM on the full d-dimensionalspace. Therefore, suppose that

A =d−1

∑i=0

λi|ψi〉〈ψi|, (3.5.94)

where we have included the vectors |ψi〉d−1i=r that have corresponding eigen-

values equal to zero. Then,

Tr[Aσ] = Tr

[d−1

∑i=0

λi|ψi〉〈ψi|σ]

(3.5.95)

=d−1

∑i=0

λi〈ψi|σ|ψi〉 (3.5.96)

=d−1

∑i=0

√〈ψi|λiσλi|ψi〉

√〈ψi|σ|ψi〉 (3.5.97)

=r−1

∑i=0

√〈ψi|AσA|ψi〉

√〈ψi|σ|ψi〉, (3.5.98)

where to obtain the last line we have used the fact that A|ψi〉 = λi|ψi〉 for all0 ≤ i ≤ r − 1, and we have used the fact that λi = 0 for all r ≤ i ≤ d − 1.

185

Page 197: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Now, it is straightforward to show that AσA = ΠσρΠσ. Therefore,

F(ρ, σ) = Tr[Aσ]2 =

(r−1

∑i=0

√〈ψi|ΠσρΠσ|ψi〉

√〈ψi|σ|ψi〉

)2

(3.5.99)

=

(r−1

∑i=0

√〈ψi|ρ|ψi〉

√〈ψi|σ|ψi〉

)2

, (3.5.100)

where to obtain the last line we have used the fact that A is defined on thesupport of σ, which means that Πσ|ψi〉 = |ψi〉 for all 0 ≤ i ≤ r− 1. We thushave

minΛxx

(∑

x∈X

√Tr[Λxρ]

√Tr[Λxσ]

)2

≤(

r−1

∑i=0

√〈ψi|ρ|ψi〉

√〈ψi|σ|ψi〉

)2

(3.5.101)= F(ρ, σ), (3.5.102)

which is precisely the reverse inequality, as desired.

By employing Theorem 3.62, we conclude the following stronger data-processing inequality for the fidelity, which strengthens the statement of The-orem 3.59 considerably:

Proposition 3.63 Improved Data Processing for Fidelity

Let ρ and σ be quantum states, and let N be a positive, trace-preservingmap. Then the following inequality holds:

F(ρ, σ) ≤ F(N(ρ),N(σ)). (3.5.103)

PROOF: The reasoning here follows the reasoning of the proof of Theorem 3.52closely. Let Λ′xx∈X be a POVM. Then consider that

∑x∈X

√Tr[Λ′xN(ρ)]Tr[Λ′xN(σ)] = ∑

x∈X

√Tr[N†(Λ′x)ρ]Tr[N†(Λ′x)σ] (3.5.104)

≥ minΛxx∈X

∑x∈X

√Tr[Λxρ]Tr[Λxσ] (3.5.105)

=√

F(ρ, σ). (3.5.106)

The inequality follows because N†(Λ′x)x∈X is a POVM since Λ′xx∈X isand N is a positive, trace-preserving map, so that N†(Λ′x) ≥ 0 for all x ∈ X

186

Page 198: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

and ∑x∈XN†(Λ′x) = N† (∑x∈X Λ′x) = N† (1) = 1. The last equality followsfrom Theorem 3.62. Since the inequality holds for all POVMs Λ′xx∈X, weconclude that

√F(N(ρ),N(σ)) = min

Λ′xx∈X∑

x∈X

√Tr[Λ′xN(ρ)]Tr[Λ′xN(σ)] (3.5.107)

≥√

F(ρ, σ), (3.5.108)

where we have again employed Theorem 3.62 for the equality.

We now establish a useful relation between trace distance and fidelity.

Theorem 3.64 Relation Between Trace Distance and Fidelity

For two states ρ and σ, the following chain of inequalities relates theirtrace distance with the fidelity between them:

1−√

F(ρ, σ) ≤ 12‖ρ− σ‖1 ≤

√1− F(ρ, σ). (3.5.109)

PROOF: We first prove the upper bound. To do so, recall the formula in (3.5.1)for the trace distance between two pure states. If we let |ψρ〉RA and |ψσ〉RA bepurifications of ρA and σA, respectively, such that F(ρA, σA) = |〈ψρ|ψσ〉|2, andif we use the data-processing inequality for the trace distance with respect tothe partial trace channel TrR, then we obtain

12‖ρA − σA‖1 =

12‖TrR[|ψρ〉〈ψρ|RA − |ψσ〉〈ψσ|RA]‖1 (3.5.110)

≤ 12‖|ψρ〉〈ψρ|RA − |ψσ〉〈ψσ|RA‖1 (3.5.111)

=

√1− |〈ψρ|ψσ〉|2 (3.5.112)

=√

1− F(ρA, σA), (3.5.113)

as required.

For the lower bound, we use the results of Theorems 3.62 and 3.54. Theo-rem 3.62 tells us that there exists a POVM Λxx∈X such that

F(ρ, σ) =

(∑

x∈X

√Tr[Λxρ]

√Tr[Λxσ]

)2

(3.5.114)

187

Page 199: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

≡(

∑x∈X

√p(x)q(x)

)2

, (3.5.115)

where we have let p(x) := Tr[Λxρ] and q(x) := Tr[Λxσ]. Using this, observethat

∑x∈X

(√p(x)−

√q(x)

)2= ∑

x∈X

(p(x)− 2

√p(x)q(x) + q(x)

)(3.5.116)

= 2− 2 ∑x∈X

√p(x)q(x) (3.5.117)

= 2− 2√

F(ρ, σ). (3.5.118)

Now, Theorem 3.54 tells us that

‖ρ− σ‖1 = maxΩyy

∑x∈X|r(y)− s(y)|, (3.5.119)

where r(y) := Tr[Ωyρ] and s(y) := Tr[Ωyσ]. In particular, for the POVMΛxx∈X that achieves the fidelity, we have

∑x∈X|p(x)− q(x)| ≤ ‖ρ− σ‖1 . (3.5.120)

Using this, and the fact that |√

p(x)−√

q(x)| ≤ |√

p(x) +√

q(x)|, we obtain

∑x∈X

(√p(x)−

√q(x)

)2≤ ∑

x∈X

∣∣∣√

p(x)−√

q(x)∣∣∣∣∣∣√

p(x) +√

q(x)∣∣∣ (3.5.121)

= ∑x∈X|p(x)− q(x)| (3.5.122)

≤ ‖ρ− σ‖1 . (3.5.123)

So we have

2− 2√

F(ρ, σ) = ∑x∈X

(√p(x)−

√q(x)

)2≤ ‖ρ− σ‖1 , (3.5.124)

which is the required lower bound.

188

Page 200: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Lemma 3.65 Gentle Measurement

Let ρ be a density operator and Λ a measurement operator, satisfying0 ≤ Λ ≤ I and Tr[Λρ] ≥ 1− ε, for ε ∈ [0, 1]. Then the post-measurementstate

ρ′ :=

√Λρ√

ΛTr[Λρ]

(3.5.125)

satisfies

F(ρ, ρ′) ≥ 1− ε, (3.5.126)12

∥∥ρ− ρ′∥∥

1 ≤√

ε. (3.5.127)

PROOF: Suppose first that ρ is a pure state |ψ〉〈ψ|. The post-measurementstate is then √

Λ|ψ〉〈ψ|√

Λ〈ψ|Λ|ψ〉 . (3.5.128)

The fidelity between the original state |ψ〉 and the post-measurement stateabove is as follows:

〈ψ|(√

Λ|ψ〉〈ψ|√

Λ〈ψ|Λ|ψ〉

)|ψ〉 =

∣∣∣〈ψ|√

Λ|ψ〉∣∣∣2

〈ψ|Λ|ψ〉 ≥ |〈ψ|Λ|ψ〉|2

〈ψ|Λ|ψ〉 (3.5.129)

= 〈ψ|Λ|ψ〉 ≥ 1− ε. (3.5.130)

The first inequality follows because√

Λ ≥ Λ when Λ ≤ I. The second in-equality follows from the hypothesis of the lemma. Now let us consider whenwe have mixed states ρA and ρ′A. Suppose |ψ〉RA and |ψ′〉RA are respectivepurifications of ρA and ρ′A, where

∣∣ψ′⟩

RA ≡IR ⊗

√ΛA|ψ〉RA√

〈ψ|IR ⊗ΛA|ψ〉RA. (3.5.131)

Then we can apply the data-processing inequality for fidelity (Proposition 3.63)and the result above for pure states to conclude that

F(ρA, ρ′A) ≥ F(ψRA, ψ′RA) ≥ 1− ε. (3.5.132)

We obtain the bound on the normalized trace distance 12

∥∥ρA − ρ′A∥∥

1 by ex-ploiting Theorem 3.64.

189

Page 201: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

3.5.2.1 Sine Distance

Unlike the trace distance, the fidelity is not a distance measure in the mathe-matical sense because it does not satisfy the triangle inequality. The followingdistance measure based on the fidelity, however, does satisfy the triangle in-equality, along with the other properties that define a distance measure.

Definition 3.66 Sine Distance

For two states ρ and σ, we define the sine distance between ρ and σ as

P(ρ, σ) :=√

1− F(ρ, σ). (3.5.133)

The measure P(ρ, σ) is known as the sine distance due to the fact thatF(ρ, σ) has the interpretation as the largest value of the squared cosine of theangle between two arbitrary purifications of ρ and σ (see Theorem 3.58), sothat

√1− F(ρ, σ) has the interpretation as the sine of the same angle. Re-

lated to this interpretation, the measure P(ρ, σ) is equal to the minimum tracedistance between purifications of ρ and σ:

inf|ψρ〉RA,|ψσ〉RA

12‖|ψρ〉〈ψρ|RA − |ψσ〉〈ψσ|RA‖1

= inf|ψρ〉RA,|ψσ〉RA

√1− |〈ψρ|ψσ〉RA |2 = P(ρ, σ), (3.5.134)

where the optimization is over all purifications |ψρ〉RA and |ψσ〉RA of ρ and σ,respectively. This follows by applying (3.5.1), as well as Uhlmann’s theorem(Theorem 3.58).

Since the fidelity satisfies the data-processing inequality with respect topositive, trace-preserving maps (Proposition 3.63), so does the sine distance:for two states ρ and σ and a positive, trace-preserving map N, we have that

P(ρ, σ) ≥ P(N(ρ),N(σ)). (3.5.135)

Lemma 3.67 Triangle Inequality for Sine Distance

Let ρ, σ, and ω be quantum states. Then the triangle inequality holds for

190

Page 202: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

the sine distance:

P(ρ, σ) ≤ P(ρ, ω) + P(ω, σ). (3.5.136)

PROOF: Define the canonical purifications as

|ψρ〉RA = (1R ⊗√

ρA)|Γ〉RA, (3.5.137)|ψσ〉RA = (1R ⊗

√σA)|Γ〉RA, (3.5.138)

|ψω〉RA = (1R ⊗√

ωA)|Γ〉RA, (3.5.139)

where |Γ〉RA is the maximally entangled vector from (2.2.26). Recalling (3.5.1),for pure states |φ〉 and |ϕ〉, we have that

12‖|φ〉〈φ| − |ϕ〉〈ϕ|‖1 =

√1− F(φ, ϕ) =

√1− |〈φ|ϕ〉|2. (3.5.140)

Let UR and VR be arbitrary unitaries acting on the reference system R. Fromthe fact that trace distance obeys the triangle inequality and the equality givenabove, we find that

√1− |〈ψσ|RA(WR ⊗ 1A)|ψρ〉RA|2 ≤

√1− |〈ψσ|RA(U†

R ⊗ 1A)|ψω〉RA|2

+√

1− |〈ψω|RA(VR ⊗ 1A)|ψρ〉RA|2, (3.5.141)

where WR := U†RVR. By minimizing the left-hand side with respect to all

unitaries WR, and applying Uhlmann’s theorem, we find that

√1− F(σA, ρA) ≤

√1− |〈ψσ|RA(U†

R ⊗ 1A)|ψω〉RA|2

+√

1− |〈ψω|RA(VR ⊗ 1A)|ψρ〉RA|2. (3.5.142)

Since the inequality holds for arbitrary unitaries U and V, it holds for theminimum of each term on the right, and so this, combined with Uhlmann’stheorem (Theorem 3.58), implies the desired result:

√1− F(σA, ρA) ≤

√1− F(σA, ωA) +

√1− F(ωA, ρA), (3.5.143)

concluding the proof.

191

Page 203: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

3.5.3 Diamond Norm and Diamond Distance

Just as there are measures of distinguishability for quantum states, it is im-portant to develop measures of distinguishability for quantum channels, inorder to assess the performance of quantum information-processing protocolsthat attempt to simulate an ideal process. The measures that we introduce inthis section are again motivated by operational concerns, stemming from theability of an experimenter to distinguish one quantum channel from anotherwhen given access to a single use of the channel. In what follows, our dis-cussion mirrors and generalizes the discussion in Section 3.5.1 that motivatedtrace distance as a measure of distinguishability for quantum states.

How should we measure the distance between two quantum channelsNA→B and MA→B? Related, how should we assess the performance of aquantum information-processing protocol in which the ideal channel to besimulated is NA→B but the channel realized in practice is MA→B? Supposethat a third party is trying to assess how distinguishable the actual channelMA→B is from the ideal channel NA→B. Such an individual has access to boththe input and output ports of the channel, and so the most general strategyfor the distinguisher to employ is to prepare a state ρRA of a reference sys-tem R and the channel input system A. The distinguisher transmits the Asystem of ρRA into the unknown channel. After that, the distinguisher re-ceives the channel output system B and then performs a measurement de-scribed by the POVM Λx

RBx on the reference system R and the channeloutput system B. The probability of obtaining a particular outcome Λx

RB isgiven by the Born rule. In the case that the unknown channel is NA→B, thisprobability is Tr[Λx

RBNA→B(ρRA)], and in the case that the unknown channelis MA→B, this probability is Tr[Λx

RBMA→B(ρRA)]. What we demand is thatthe absolute deviation between the two probabilities Tr[Λx

RBNA→B(ρRA)] andTr[Λx

RBMA→B(ρRA)] is no larger than some tolerance ε. Since this should bethe case for all possible input states and measurement outcomes, what wedemand mathematically is that

supρRA, 0≤ΛRB≤1RB

|Tr[ΛRBNA→B(ρRA)]− Tr[ΛRBMA→B(ρRA)]| ≤ ε. (3.5.144)

As a consequence of the characterization of trace distance from Theorem 3.50we have

supρRA,0≤ΛRB≤1RB

|Tr[ΛRB(NA→B −MA→B)(ρRA)]|

192

Page 204: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

= supρRA

12‖NA→B(ρRA)−MA→B(ρRA)‖1 =:

12‖N−M‖ , (3.5.145)

where 12 ‖N−M‖ is defined to be the normalized diamond distance between

N and M. This indicates that if 12 ‖N−M‖ ≤ ε, then the absolute deviation

between probabilities for every possible input state and measurement oper-ator never exceeds ε, so that the approximation between quantum channelsNA→B and MA→B is naturally quantified by the normalized diamond distance12 ‖N−M‖.

With the above in mind, we now define the diamond norm for Hermiticity-preserving maps, from which the diamond distance measure for channelsarises.

Definition 3.68 Diamond Norm

The diamond norm of a Hermiticity-preserving map PA→B is defined as

‖P‖ := supρRA

‖PA→B(ρRA)‖1 , (3.5.146)

where the optimization is over all states ρRA, with the dimension of Runbounded.

For two quantum channels N and M, note that the difference N −M isa Hermiticity-preserving map. An important simplification of the diamondnorm of a Hermiticity-preserving map is given by the following proposition:

Proposition 3.69

The diamond norm of a Hermiticity-preserving map PA→B can be cal-culated as

‖P‖ = supψRA

‖PA→B(ψRA)‖1 , (3.5.147)

where the optimization is over all pure states ψRA, such that the dimen-sion of R is equal to the dimension of the system A.

PROOF: Let ρRA be an arbitrary state. It has a spectral decomposition as fol-lows:

ρRA = ∑x

p(x)ψxRA, (3.5.148)

193

Page 205: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

where p(x)x is a probability distribution and ψxRAx is a set of pure states.

From the convexity of the trace norm (see Section 2.2.2), it follows that

‖PA→B(ρRA)‖1 ≤∑x

p(x) ‖PA→B(ψxRA)‖1 (3.5.149)

≤ supx‖PA→B(ψ

xRA)‖1 (3.5.150)

≤ supψRA

‖PA→B(ψRA)‖1 (3.5.151)

From the Schmidt decomposition (Theorem 3.2), it follows that the rank of thereduced state ψR is no larger than the dimension of system A. So then it suf-fices to optimize with respect to all pure states ψRA, such that the dimensionof R is equal to the dimension of the system A.

The normalized diamond distance between two quantum channels canbe computed via a semi-definite program (SDP). The following propositionstates this fact formally and also states the dual optimization problem. Ap-pendix 3.C provides a proof.

Proposition 3.70 SDPs for Normalized Diamond Distance

The diamond distance between two quantum channels NA→B andMA→B can be written as the following semi-definite programs:

12‖N−M‖

= supρR≥0,

ΩRB≥0

Tr[ΩRB(ΓNRB − ΓM

RB)] : ΩRB ≤ ρR ⊗ 1B, Tr[ρR] = 1 (3.5.152)

= infµ≥0,

ZRB≥0

µ : ZRB ≥ ΓNRB − ΓM

RB, µ1R ≥ TrB[ZRB]. (3.5.153)

The latter expression is equal to

infZRB≥0

‖TrB[ZRB]‖∞ : ZRB ≥ ΓNRB − ΓM

RB. (3.5.154)

As described at the beginning of this section, the diamond distance has anoperational meaning in terms of the task of channel discrimination, which is ageneralization of state discrimination (see Section 3.1.3). Let us now analyzethe task of channel discrimination in more detail.

194

Page 206: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

?N or M Λ, 1−Λ

ρRA

R

A B

“N” or “M”

FIGURE 3.19: Schematic depiction of a general strategy for the task ofdiscriminating between the quantum channels N and M. Starting withthe state ρRA, the system A is passed through the device that implementseither N or M. Then, both systems R and A are measured according to thePOVM Λ, 1− Λ, the result of which is used to form a guess for whichchannel the device implements.

Suppose that Alice gives Bob a device that implements either the channelN or the channel M, but she does not tell him which channel it implements.Bob’s task is to decide which channel the device implements. Suppose thatthe channels N and M have prior probabilities λ and 1− λ of being selected,respectively. The only way for Bob to determine which channel the deviceimplements (without guessing randomly) is to pass a quantum system, say inthe state ρ, through it. He can then perform a measurement on the resultingoutput state and make a guess as to which channel was implemented. There-fore, in addition to having the freedom to choose any binary measurement(which is the case in state discrimination), in channel discrimination Bob alsohas the freedom to prepare a system A and a reference system R in any stateρRA of his choosing, with the system A being passed through the device.

For every fixed input state ρRA, there are two possible output states, de-pending on which channel was implemented. This means that, for every fixedinput state, the task of channel discrimination reduces to the task of state dis-crimination. Using the result of Theorem 3.13, for the input state ρRA thecorresponding optimal error probability (i.e., the error probability obtainedby optimizing over all measurements) is

p∗err(ρRA) =12(1− ‖λNA→B(ρRA)− (1− λ)MA→B(ρRA)‖1) . (3.5.155)

Then, optimizing over all input states ρRA in order to minimize the error prob-

195

Page 207: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

ability, we find that

infρRA

p∗err(ρRA) =12

(1− sup

ρRA

‖(λNA→B − (1− λ)MA→B)(ρRA)‖1

)(3.5.156)

=12(1− ‖λN− (1− λ)M‖) , (3.5.157)

where to obtain the last line we used the definition of the diamond norm.The optimal error probability for the task of channel discrimination is thus asimple function of the diamond norm.

3.5.4 Fidelity Measures for Channels

The diamond distance is a distance measure for channels that is based on thetrace distance for states. We now define a fidelity-based quantity for channelsthat can be used to assess its ability to preserve entanglement.

Definition 3.71 Entanglement Fidelity of a Channel

For a quantum channel NA with input and output systems of equal di-mension d, we define its entanglement fidelity as

Fe(N) := 〈Φ|RA(idR ⊗NA)(ΦRA)|Φ〉RA, (3.5.158)

where

|Φ〉RA =1√d

d−1

∑i=0|i, i〉RA. (3.5.159)

Notice that the entanglement fidelity of a channel is the fidelity of the max-imally entangled state with the Choi state of the channel. Intuitively, then, theentanglement fidelity quantifies how good a channel is at preserving the en-tanglement between two systems when it acts on one of the two systems.

It turns out that the entanglement fidelity is very closely related to anotherfidelity-based measure on quantum channels called the average fidelity:

F(N) :=∫

ψ〈ψ|N(ψ)|ψ〉 dψ, (3.5.160)

where we integrate over all pure states acting on the input Hilbert space ofN with respect to the Haar measure. The Haar measure is the uniform prob-

196

Page 208: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

ability measure on pure quantum states (see the remark after (2.5.11)). Fora quantum channel N with input system dimension d, the following identityholds

F(N) =dFe(N) + 1

d + 1. (3.5.161)

Instead of taking the average as in (3.5.160), we can take the minimumover all input states to obtain the minimum fidelity:

Fmin(N) := infψ〈ψ|N(ψ)|ψ〉, (3.5.162)

where the optimization is over all pure states ψ acting on the input Hilbertspace of the channel N. By introducing a reference system R and optimizingover all joint states |ψ〉RA of R and the input system A of the channel N, weobtain a fidelity measure that generalizes the entanglement fidelity.

Definition 3.72 Fidelity of a Quantum Channel

For a quantum channel NA with equal input and output system dimen-sion, we define the fidelity of N as

F(N) := infψRA〈ψ|RA(idR ⊗NA)(ψRA)|ψ〉RA, (3.5.163)

where we take the infimum over all pure states |ψ〉RA, with the dimen-sion of R equal to the dimension of A.

Note that the state |ψ〉RA = |Φ〉RA is a special case in the optimization in(3.5.163). This implies that, for a channel N, F(N) ≤ Fe(N).

More generally, we define the fidelity between two quantum channelsNA→B and MA→B as follows:

Definition 3.73 Fidelity of Quantum Channels

Let NA→B and MA→B be quantum channels. Their channel fidelity isdefined as

F(N,M) = infρRA

F(NA→B(ρRA),MA→B(ρRA)), (3.5.164)

where the infimum is taken over all bipartite states ρRA, with the dimen-sion of R arbitrarily large.

197

Page 209: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

REMARK: Similar to the diamond distance, we define the channel fidelity as above inorder to indicate its operational meaning with an infimum over all possible input states,but it is not necessary to take the infimum over all bipartite states. One can instead restrictthe infimum to be over pure bipartite states where the reference system R is isomorphic tothe channel input system A, so that

F(N,M) = infψRA

F(NA→B(ψRA),MA→B(ψRA)), (3.5.165)

where ψRA is a pure bipartite state with system R is isomorphic to system A. The samestatement thus applies to (3.5.163). An argument for this is similar to that given in the proofof Proposition 3.69, except using the joint concavity of root fidelity rather than convexityof the trace norm.

Here, we provide a different argument for this fact. First, we have that

infρRA

F(NA→B(ρRA),MA→B(ρRA)) ≤ infψRA

F(NA→B(ψRA),MA→B(ψRA)) (3.5.166)

which holds simply by restricting the optimization on the left-hand side to pure states.

Next, given a state ρRA, with the dimension of R not necessarily equal to the dimen-sion of A, we can purify it to a state ψR′RA. Then, using the data-processing inequality forthe fidelity with respect to the partial trace channel TrR′ (Proposition 3.63), we find that

F(NA→B(ρRA),MA→B(ρRA)) = F (NA→B(TrR′ [ψR′RA]),MA→B(TrR′ [ψR′RA])) (3.5.167)= F (TrR′ [NA→B(ψR′RA)], TrR′ [MA→B(ψR′RA)]) (3.5.168)≥ F(NA→B(ψR′RA),MA→B(ψR′RA)) (3.5.169)≥ inf

ψR′RAF(NA→B(ψR′RA),MA→B(ψR′RA)). (3.5.170)

Since the state ρRA is arbitrary, we obtain

infρRA

F(NA→B(ρRA),MA→B(ρRA)) ≥ infψRA

F(NA→B(ψRA),MA→B(ψRA)). (3.5.171)

Finally, by the Schmidt decomposition theorem (Theorem 3.2), for every pure stateψRA, the rank of the reduced state ψR need not exceed the dimension of A, implying thatit suffices to optimize over pure states for which the system R has the same dimension asthe system A. We thus obtain

infρRA

F(NA→B(ρRA),MA→B(ρRA)) = infψRA

F(NA→B(ψRA),MA→B(ψRA) (3.5.172)

= F(N,M). (3.5.173)

We then have that F(N) = F(N, id). In other words, the fidelity F(N) of aquantum channel N can be viewed as the fidelity between N and the identitychannel id.

198

Page 210: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Similar to the diamond distance, the fidelity of quantum channels can becomputed by means of primal and dual semi-definite programs:

Proposition 3.74 SDP for Root Fidelity of Channels

Let NA→B and MA→B be quantum channels with respective Choi opera-tors ΓN

RB and ΓMRB. Then their root channel fidelity

√F(NA→B,MA→B) := inf

ψRA

√F(NA→B(ψRA),MA→B(ψRA)) (3.5.174)

can be calculated by means of the following semi-definite program:√

F(NA→B,MA→B)

= supλ≥0,QRB

λ : λIR ≤ Re[TrB[QRB]],

(ΓN

RB Q†RB

QRB ΓMRB

)≥ 0

(3.5.175)

=12

infρR≥0,WRB,ZRB

Tr[ΓN

RBWRB] + Tr[ΓMRBZRB] : Tr[ρR] = 1,

(WRB ρR ⊗ IB

ρR ⊗ IB ZRB

)≥ 0

. (3.5.176)

The expression in (3.5.175) is equal to

supQRB

λmin (Re[TrB[QRB]]) :

(ΓN

RB Q†RB

QRB ΓMRB

)≥ 0

, (3.5.177)

where λmin denotes the minimum eigenvalue of its argument.

PROOF: See Appendix 3.D.2.

The inequality in (3.5.109) relating the fidelity between two states ρ andσ and their trace distance can be used to relate the fidelity-based distancemeasure F(N,M) on channels and the diamond distance 1

2 ‖N−M‖. It isstraightfoward to show that

1−√

F(N,M) ≤ 12‖N−M‖ ≤

√1− F(N,M). (3.5.178)

The following proposition relates the fidelity F(N) of a channel N to itsminimum fidelity Fmin(N) in (3.5.162), telling us that if Fmin(N) is large then

199

Page 211: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

so is F(N).

Proposition 3.75

Let N be a quantum channel. For all ε ∈ [0, 1], if Fmin(N) ≥ 1− ε, thenF(N) ≥ 1− 2

√ε.

PROOF: The inequality in Fmin(N) ≥ 1− ε implies that the following inequal-ity holds for all state vectors |φ〉 ∈ H:

〈φ| [|φ〉〈φ| −N(|φ〉〈φ|)] |φ〉 ≤ ε. (3.5.179)

By (3.5.109), this implies that

‖|φ〉〈φ| −N(|φ〉〈φ|)‖1 ≤ 2√

ε, (3.5.180)

for all state vectors |φ〉 ∈ H. We will show that∣∣∣〈φ|

[|φ〉〈φ⊥| −N(|φ〉〈φ⊥|)

]|φ⊥〉

∣∣∣ ≤ 2√

ε, (3.5.181)

for every orthonormal pair|φ〉, |φ⊥〉

of state vectors in H. Set

|wk〉 :=|φ〉+ ik|φ⊥〉√

2, (3.5.182)

for k ∈ 0, 1, 2, 3. Then, it follows that

|φ〉〈φ⊥| = 12

3

∑k=0

ik|wk〉〈wk|. (3.5.183)

Consider now that∣∣∣〈φ|

[|φ〉〈φ⊥| −N(|φ〉〈φ⊥|)

]|φ⊥〉

∣∣∣

≤∥∥∥|φ〉〈φ⊥| −N(|φ〉〈φ⊥|)

∥∥∥∞

(3.5.184)

≤ 12

3

∑k=0‖|wk〉〈wk| −N(|wk〉〈wk|)‖∞ (3.5.185)

≤ 14

3

∑k=0‖|wk〉〈wk| −N(|wk〉〈wk|)‖1 (3.5.186)

≤ 2√

ε. (3.5.187)

200

Page 212: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

The first inequality follows from the characterization of the operator normin (2.2.82) as ‖X‖∞ = sup|φ〉,|ψ〉 |〈ψ|X|φ〉|, where the optimization is with re-spect to pure states. The second inequality follows from substituting (3.5.183)and applying the triangle inequality and homogeneity of the ∞-norm. Thethird inequality follows because the ∞-norm of a traceless Hermitian opera-tor is bounded from above by half of its trace norm (see Lemma 2.10 below).The final inequality follows from applying (3.5.180). Let |ψ〉 ∈ H′ ⊗H be anarbitrary state. All such states have a Schmidt decomposition of the followingform:

|ψ〉 = ∑x

√p(x)|ζx〉 ⊗ |ϕx〉, (3.5.188)

where p(x)x is a probability distribution and |ζx〉x and |ϕx〉x are setsof states. Then, consider that

1− 〈ψ|(idH′ ⊗N)(|ψ〉〈ψ|)|ψ〉= 〈ψ|(idH′ ⊗ idH − idH′ ⊗N)(|ψ〉〈ψ|)|ψ〉 (3.5.189)= 〈ψ|(idH′ ⊗ [idH −N])(|ψ〉〈ψ|)|ψ〉 (3.5.190)

= ∑x,y

p(x)p(y)〈ϕx|[|ϕx〉〈ϕy| −N(|ϕx〉〈ϕy|)

]|ϕy〉. (3.5.191)

Now, applying the triangle inequality and (3.5.181), we find that the followingholds for all |ψ〉 ∈ H′ ⊗H:

1− 〈ψ|(idH′ ⊗N)(|ψ〉〈ψ|)|ψ〉

=

∣∣∣∣∣∑x,yp(x)p(y)〈ϕx|

[|ϕx〉〈ϕy| −N(|ϕx〉〈ϕy|)

]|ϕy〉

∣∣∣∣∣ (3.5.192)

≤∑x,y

p(x)p(y)∣∣〈ϕx|

[|ϕx〉〈ϕy| −N(|ϕx〉〈ϕy|)

]|ϕy〉

∣∣ (3.5.193)

≤ 2√

ε. (3.5.194)

This concludes the proof.

3.6 Bibliographic Notes

More details on many of the concepts that have been presented in this chap-ter can be found in the books of Nielsen and Chuang (2000), Holevo (2013),Hayashi (2017), Wilde (2017), and Watrous (2018). For a treatment of these

201

Page 213: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

concepts in infinite-dimensional Hilbert spaces, see the books by Holevo (2013)and Heinosaari and Ziman (2012). A more mathematically oriented exposi-tion of completely positive maps is available in Paulsen (2003).

The mathematical theory of quantum mechanics was developed by vonNeumann (1927a, 1932) and Landau (1927). The book by Helstrom (1976)also contains early developments in the theory of quantum measurements.

We have used quantum-optical modes as a concrete example of qubit en-codings throughout this chapter. For a detailed reference on this topic, see thereview by Kok et al. (2007).

Lemma 3.6, regarding purifications of permutation-invariant states, is dueto Renner (2005). Separable and entangled states were defined by Werner(1989b). The relevance of the partial transpose for characterizing entangle-ment in quantum information was given by Peres (1996); Horodecki et al.(1996), and the set of positive-partial-tranpose states was discussed by Horodecki(1997); Horodecki et al. (1998). In particular, the existence of PPT entangledstates was found by Horodecki (1997).

The problem of state discrimination as described in Section 3.1.3 was con-sidered by Helstrom (1967) (see also Helstrom (1969)) and Holevo (1972a),who determined the optimal success probability in the case of projective mea-surements and general POVMs, respectively. The proof of the optimal mea-surement operators in Theorem 3.13 is due to Jencova (2010). Lemma 3.15 isdue to Audenaert et al. (2007), with the simple proof presented here attributedby Jaksic et al. (2012) and Audenaert (2014) as being due to N. Ozawa. TheChernoff bound for probability distributions was given by Chernoff (1952).The corresponding bound for quantum states in Theorem 3.14 was estab-lished in two works: Audenaert et al. (2007) determined the upper bound onthe optimal error exponent, while Nussbaum and Szkoła (2009) establishedthe lower bound. The semi-definite program and complementary slacknessconditions for multiple state discrimination in Section 3.1.3.1 are due to Yuenet al. (1975).

The Choi representation of a linear map is named after Choi (1975), withhis paper establishing the equivalence between complete positivity of a mapand positivity of its Choi representation. Naimark’s dilation theorem for pos-itive operator-valued measures was established by Naimark (1940) (see alsoGelfand and Naimark (1943)). Stinespring’s dilation theorem for completelypositive maps was established by Stinespring (1955). An early exposition ofthe theory of quantum channels was presented by Kraus (1983). Leifer (2007)

202

Page 214: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

proposed the concept of conditional states (see also (Leifer and Spekkens,2013)). Proposition 3.19 was found by Christandl and Winter (2004). Thechannel in (3.2.72) for reversing the action of an isometric channel was pre-sented by Wilde (2017). The notions of a complementary channel and a degrad-able channel were defined by Devetak and Shor (2005). Anti-degradablechannels were defined by Caruso and Giovannetti (2006). The quantum era-sure channel was defined by Grassl et al. (1997). The connection of the am-plitude damping channel to the bosonic pure-loss channel was realized byGiovannetti and Fazio (2005). The quantum instrument formalism was de-veloped by Davies and Lewis (1970) and further developed by Ozawa (1984).Entanglement-breaking channels were defined by Horodecki et al. (2003) andseveral of their properties were established therein. Hadamard channels weredefined by King et al. (2007).

The Petz recovery map was established by Petz (1986b) and Petz (1988)in the context of proving the conditions under which equality holds in themonotonicty of the quantum relative entropy. The results therein are statedfor von Neumann algebras. A more accessible exposition that considers op-erators acting only on finite-dimensional Hilbert spaces can be found in Petz(2003) and Mosonyi and Petz (2004) (see also Hayden et al. (2004)).

The paradigm of local operations and classical communication was de-fined by Bennett et al. (1996c), and its mathematical properties were exploredin more detail by Chitambar et al. (2014). See Section 4.3 of Chitambar et al.(2014) for a justification of the example provided in Section 3.2.11 of a separa-ble channel that is not an LOCC channel. Separable channels were defined byVedral et al. (1997); Barnum et al. (1998). The existence of a separable chan-nel that is not LOCC was found by Bennett et al. (1999a). Completely PPT-preserving channels were defined by Rains (1999a) and further developed byChitambar et al. (2020).

The quantum teleportation protocol for qubits and qudits presented inSection 3.3 was devised by Bennett et al. (1993). The generalization to groupsthat act irreducibly on the input state was presented by Braunstein et al.(2000); Werner (2001). Covariant channels were considered by Holevo (2002b).For an overview of group- and representation-theoretic concepts for finitegroups, see Steinberg (2011). The notion of teleportation simulation of a quan-tum channel was presented by Bennett et al. (1996c) (see Section V therein).The LOCC simulation of a quantum channel was given by Horodecki et al.(1999) (see Eq. (10) therein), and the related PPT simulation of a quantumchannel was given by Kaur and Wilde (2017). The generalized teleporta-

203

Page 215: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

tion simulation of a quantum channel was developed for groups that actirreducibly on the channel input space by Chiribella et al. (2009) (see Sec-tion VII therein). Pirandola et al. (2017) played a role in developing the toolof LOCC/teleportation simulation in more recent years. The fact that thechannel twirl can be realized by generalized teleportation simulation was ob-served by Kaur and Wilde (2017) (see Appendix B therein). The teleportationprotocol was extended to states of bosonic systems by Braunstein and Kimble(1998). Teleportation simulation of bosonic Gaussian channels was given byGiedke and Ignacio Cirac (2002); Niset et al. (2009), and a detailed analysis ofconvergence in this scenario was presented by Wilde (2018a).

The quantum super-dense coding protocol was given by Bennett and Wies-ner (1992).

The quantum fidelity was defined by Uhlmann (1976), and Theorem 3.58is due to Uhlmann (1976). The semi-definite program for root fidelity inProposition 3.56 was established by Watrous (2013), and we have followedthe proof therein. The fact that the fidelity is achieved by a quantum mea-surement was realized by Fuchs and Caves (1995). The relation between tracedistance and fidelity presented in Theorem 3.64 was proved by Fuchs andvan de Graaf (1998). For very closely related inequalities, with the fidelityreplaced by the “Holevo fidelity” (Tr[

√ρ√

σ])2, see Holevo (1972b). The sinedistance was defined by Rastegin (2002, 2003); Gilchrist et al. (2005); Rastegin(2006), and its interpretation in terms of the minimal trace distance of pu-rifications was given by Rastegin (2006). The sine distance was generalizedto subnormalized states by Tomamichel et al. (2010), where it was given thename “purified distance.”

The diamond norm was presented and studied by Kitaev (1997), who ap-plied it to problems in quantum information theory and quantum computa-tion. The operational interpretation of the diamond distance in terms of hy-pothesis testing of quantum channels was given by Kretschmann and Werner(2004); Rosgen and Watrous (2005); Gilchrist et al. (2005). More propertiesof the diamond norm can be found in Watrous (2018). The SDP in Propo-sition 3.70 for the normalized diamond distance of quantum channels wasgiven by Watrous (2009).

Schumacher (1996) introduced the entanglement fidelity of a quantumchannel, and Barnum et al. (2000) made further observations regarding it.Nielsen (2002) provided a simple proof for the relation between entanglementfidelity and average fidelity in (3.5.161). The fidelity of quantum channels wasintroduced by Gilchrist et al. (2005), and it can be understood as a special case

204

Page 216: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

of the generalized channel divergence (Leditzky et al., 2018). A semi-definiteprogram for the root fidelity of channels was given by Yuan and Fung (2017).The particular semi-definite program in Proposition 3.74, for the root fidelityof channels, was presented by Katariya and Wilde (2020). Proposition 3.75was established by Barnum et al. (2000) and reviewed by Kretschmann andWerner (2004). Here we followed the proof given by (Watrous, 2018, The-orem 3.56), which therein established a relation between trace distance anddiamond distance between an arbitrary channel and the identity channel.

Appendix 3.A The Quantum Chernoff Bound

The quantities ξ(ρ, σ) and ξ(ρ, σ) are by definition the optimal error expo-nents, and the quantum Chernoff bound (Theorem 3.14) establishes that theyare equal to the Chernoff divergence in (3.1.147). It is an important quantityto consider because it tells us how fast the error probability converges to zeroas n increases.

We now proceed to the proof of the quantum Chernoff bound, which is thefollowing statement: For quantum states ρ and σ, the optimal error exponentsof discrimination, as defined in (3.1.145) and (3.1.146), are given by

ξ(ρ, σ) = ξ(ρ, σ) = sups∈(0,1)

(− log2 Tr[ρsσ1−s]

). (3.A.1)

3.A.1 Proof of Theorem 3.14

If λ is the probability with which ρ is chosen, and 1− λ the probability withwhich σ is chosen, then an application of Lemma 3.15, with A = λρ⊗n, B =(1− λ)σ⊗n, and s ∈ (0, 1), gives the following:

p∗err(λρ⊗n, λσ⊗n) =12(1−

∥∥λρ⊗n − (1− λ)σ⊗n∥∥1

)(3.A.2)

≤ Tr[λs(ρ⊗n)s(1− λ)1−s(σ⊗n)1−s

](3.A.3)

= λs(1− λ)1−sTr[(ρs)⊗n(σ1−s)⊗n] (3.A.4)

= λs(1− λ)1−s(

Tr[ρsσ1−s])n

(3.A.5)

≤(

Tr[ρsσ1−s])n

, (3.A.6)

205

Page 217: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

where to obtain the last line we used the fact that λs(1− λ)1−s ≤ 1. Since thebound above holds for all s ∈ (0, 1), we obtain the following bound:

ξ(ρ, σ) = lim infn→∞

−1n

log2 p∗err(λρ⊗n, λσ⊗n) ≥ sups∈(0,1)

− log2 Tr[ρsσ1−s].

(3.A.7)

For the reverse inequality, we start with the observation that it sufficesto optimize over projectors when determining the optimal error probabilityp∗err(λρ⊗n, λσ⊗n). (This holds because one choice of an optimal measurementis a projective measurement, as shown in the proof of Theorem 3.13, where wecan set Λ0 = 0.) Next, let ρ and σ have the following spectral decompositions:

ρ =d−1

∑i=0

λi|ψi〉〈ψi|, σ =d−1

∑j=0

µj|φj〉〈φj|, (3.A.8)

where d is the dimension of the underlying Hilbert space. Then, for everyprojection Π,

Tr[(1−Π)ρ] =d−1

∑i=0

λiTr[(1−Π)|ψi〉〈ψi|] (3.A.9)

=d−1

∑i=0

λiTr[(1−Π)|ψi〉〈ψi|(1−Π)] (3.A.10)

=d−1

∑i,j=0

λiTr[|φj〉〈φj|(1−Π)|ψi〉〈ψi|(1−Π)] (3.A.11)

=d−1

∑i,j=0

λi∣∣〈ψi|(1−Π)|φj〉

∣∣2 , (3.A.12)

where we have used the fact that 1 = ∑d−1j=0 |φj〉〈φj|. Similarly, using 1 =

∑d−1i=0 |ψi〉〈ψi|, we have that

Tr[Πσ] =d−1

∑j=0

µjTr[Π|φj〉〈φj|] (3.A.13)

=d−1

∑j=0

µjTr[Π|φj〉〈φj|Π] (3.A.14)

=d−1

∑i,j=0

µjTr[|ψi〉〈ψi|Π|φj〉〈φj|Π] (3.A.15)

206

Page 218: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

=d−1

∑i,j=0

µj∣∣〈ψi|Π|φj〉

∣∣2 . (3.A.16)

Then, using the fact that λi ≥ minλi, µj and µj ≥ minλi, µj for all 0 ≤i, j ≤ d− 1, the error probability under the measurement given by Π is

perr(Π, 1−Π; ρ, σ)= Tr[(1−Π)(λρ)] + Tr[Π(1− λ)σ] (3.A.17)

=d−1

∑i,j=0

(λλi

∣∣〈ψi|(1−Π)|φj〉∣∣2 + (1− λ)µj

∣∣〈ψi|Π|φj〉∣∣2)

(3.A.18)

≥ 12

minλλi, (1− λ)µj(∣∣xi,j

∣∣2 +∣∣yi,j

∣∣2), (3.A.19)

where in the last line we have defined xi,j := 〈ψi|(1 − Π)|φj〉 and yi,j :=〈ψi|Π|φj〉. Using the identity |a|2 + |b|2 ≥ 1

2 |a + b|2, which holds for alla, b ∈ C, we obtain

perr(Π, 1−Π; ρ, σ) ≥ 12

d−1

∑i,j=0

minλλi, (1− λ)µj∣∣⟨ψi|φj

⟩∣∣2 . (3.A.20)

Now, let us define two probability distributions, p, q : 0, 1, . . . , d − 12 →[0, 1] as follows:

p(i, j) := λi∣∣⟨ψi|φj

⟩∣∣2 , q(i, j) := µj∣∣⟨ψi|φj

⟩∣∣2 , ∀ 0 ≤ i, j ≤ d− 1. (3.A.21)

It is straightforward to verify that p and q are indeed probability distribu-tions. Then, since the projector Π is arbitrary, and it suffices to optimize overprojective measurements, as explained above, the following inequality holds

p∗err(ρ, σ) ≥ 12

d−1

∑i,j=0

minλp(i, j), (1− λ)q(i, j). (3.A.22)

The expression on the right-hand side of the above inequality is preciselyhalf the optimal error probability for discriminating the two probability dis-tributions p and q, with a prior probability of λ. Indeed, we can see thisby an application of Theorem 3.13 to the case of commutative A and B. Tothis end, letting A = ∑d−1

i=0 ai|i〉〈i| and B = ∑d−1i=0 bi|i〉〈i| where ai, bi ≥ 0

for all i ∈ 0, . . . , d − 1, it follows that an optimal measurement operatorΠ+ = ∑i:ai≥bi

|i〉〈i| and its complement Π− = ∑i:ai<bi|i〉〈i|, so that

infM:0≤M≤1

Tr[(1−M)A] + Tr[MB] = Tr[Π−A] + Tr[Π+B] (3.A.23)

207

Page 219: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

= ∑i:ai<bi

ai + ∑i:ai≥bi

bi (3.A.24)

=d−1

∑i=0

minai, bi. (3.A.25)

We denote this optimal error probability by perr(p, q). Therefore,

p∗err(ρ, σ) ≥ 12

p∗err(p, q). (3.A.26)

Now, for n ≥ 2, by following the same arguments as above, we obtain

p∗err(λρ⊗n, λσ⊗n) ≥ 12

p∗err(p(n), q(n)), (3.A.27)

where p(n), q(n) are the n-fold i.i.d probability distributions corresponding top and q, respectively. Then, by the classical Chernoff bound (please consultthe Bibliographic Notes in Section 3.6), we find that

ξ(ρ, σ) = lim supn→∞

−1n

log2 p∗err(λρ⊗n, λσ⊗n) (3.A.28)

≤ lim supn→∞

−1n

log2 p∗err(p(n), q(n)) (3.A.29)

= sups∈(0,1)

− log2

d−1

∑i,j=0

p(i, j)sq(i, j)1−s, (3.A.30)

where the factor of 12 in (3.A.27) vanishes in the asymptotic limit. Finally,

observe that

d−1

∑i,j=0

p(i, j)sq(i, j)1−s =d−1

∑i,j=0

λsi

(∣∣⟨ψi|φj⟩∣∣2)s

µ1−sj

(∣∣⟨ψi|φj⟩∣∣2)1−s

(3.A.31)

=d−1

∑i,j=0

λsi µ1−s

j

∣∣⟨ψi|φj⟩∣∣2 (3.A.32)

=d−1

∑i,j=0

λsi µ1−s

j Tr[|ψi〉〈ψi|φj〉〈φj|] (3.A.33)

= Tr

[(d−1

∑i=0

λsi |ψi〉〈ψi|

)(d−1

∑j=0

µ1−sj |φj〉〈φj|

)](3.A.34)

208

Page 220: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

= Tr[ρsσ1−s]. (3.A.35)

Therefore,ξ(ρ, σ) ≤ sup

s∈(0,1)− log2 Tr[ρsσ1−s], (3.A.36)

which, combined with (3.A.7), implies that

ξ(ρ, σ) = ξ(ρ, σ) = sups∈(0,1)

− log2 Tr[ρsσ1−s], (3.A.37)

which is what we set out to prove.

Appendix 3.B Support Containment Lemmas

This appendix contains useful technical lemmas about containments of sup-ports of operators.

Lemma 3.76

Let XAB ∈ L(HA ⊗ HB) be positive semi-definite, and let XA :=TrB[XAB] and XB := TrA[XAB]. Then supp(XAB) ⊆ supp(XA) ⊗supp(XB).

Proof. First suppose that XAB is rank one, so that XAB = |Ψ〉〈Ψ|AB for somevector |Ψ〉AB ∈ HA ⊗HB. Due to the Schmidt decomposition theorem (The-orem 3.2), we have that

|Ψ〉AB = ∑z∈Z

γz|θz〉A ⊗ |ξz〉B, (3.B.1)

where |Z| ≤ mindim(HA), dim(HB), the set γzz is a set of strictly posi-tive numbers, and |θz〉Az and |ξz〉Bz are orthonormal bases. Then

supp(XAB) = span|Ψ〉AB (3.B.2)⊆ span|θz〉A : z ∈ Z ⊗ span|ξz〉B : z ∈ Z. (3.B.3)

The statement then follows for this case because supp(XA) = span|θz〉A :z ∈ Z and supp(XB) = span|ξz〉B : z ∈ Z.

209

Page 221: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

Now suppose that XAB is not rank one. It admits a decomposition intorank-one vectors of the following form:

XAB = ∑x∈X|Ψx〉〈Ψx|AB, (3.B.4)

where |Ψx〉AB ∈ HA ⊗HB for all x ∈ X. Set ΨxAB = |Ψx〉〈Ψx|AB, and let

ΨxA := TrB[Ψx

AB] and ΨxB := TrA[Ψx

AB]. Then

supp(XAB) = span|Ψx〉AB : x ∈ X (3.B.5)

⊆ span

[⋃

x∈X[supp(Ψx

A)⊗ supp(ΨxB)]

](3.B.6)

⊆ span

[⋃

x∈Xsupp(Ψx

A)

]⊗ span

[⋃

x∈Xsupp(Ψx

B)

](3.B.7)

= supp(XA)⊗ supp(XB), (3.B.8)

concluding the proof.

Lemma 3.77

Let XAB, YAB ∈ L(HA⊗HB) be positive semi-definite, and suppose thatsupp(XAB) ⊆ supp(YAB). Then supp(XA) ⊆ supp(YA), where XA :=TrB[XAB] and YA := TrB[YAB].

Proof. First suppose that XAB is rank one, as in the first part of the proof of theprevious lemma, and let us use the same notation as given there. Applyingthe same lemma gives that

supp(XAB) ⊆ supp(YAB) ⊆ supp(YA)⊗ supp(YB), (3.B.9)

which in turn implies that supp(XAB) = span|Ψ〉AB ⊆ supp(YA)⊗ supp(YB).This implies that |θz〉A ∈ supp(YA) for all z ∈ Z, and thus that span|θz〉A ∈supp(YA). We can then conclude the statement in this case because span|θz〉A =supp(XA).

Now suppose that XAB is not rank one. Then it admits a decomposition asgiven in the proof of the previous lemma. Using the same notation, we havethat supp(Ψx

AB) ⊆ supp(YAB) holds for all x ∈ X. Since we have proven the

210

Page 222: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

lemma for rank-one operators, we can conclude that supp(ΨxA) ⊆ supp(YA)

holds for all x ∈ X. As a consequence, we find that

supp(XA) = span

[⋃

x∈Xsupp(Ψx

A)

]⊆ supp(YA), (3.B.10)

concluding the proof.

Appendix 3.C SDPs for Normalized Trace and Di-amond Distance

In this section, we prove that the normalized trace distance between twostates and the normalized diamond distance between two quantum channelscan be computed via SDPs. We start with the normalized trace distance, pro-viding a proof of Proposition 3.51.

3.C.1 Proof of Proposition 3.51

The first expression follows immediately from Theorem 3.50 and (3.5.6). Inparticular, (3.5.8) is in the standard form of (2.4.1), namely,

supTr[DY] : Φ(Y) ≤ C, Y ≥ 0, (3.C.1)

with D = ρ− σ, C = 1, and Φ = id. Then, since Φ† = id, the standard formof the dual problem in (2.4.2) is

infTr[CZ] : Φ†(Z) ≥ D, Z ≥ 0, (3.C.2)

which is precisely (3.5.9). Note that strong duality holds because Z = ρ− σ +δ1 is a strictly feasible point in (3.C.2) for all δ > 0.

We now derive the primal and dual SDPs for the diamond distance be-tween quantum channels, thus providing a proof of Proposition 3.70.

3.C.2 Proof of Proposition 3.70

Employing (3.5.145), consider that

12‖N−M‖

211

Page 223: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

= supψRA≥0,ΛRB≥0

Tr[ΛRB(NA→B −MA→B)(ψRA)] : ΛRB ≤ 1RB,

Tr[ψRA] = 1, Tr[ψ2RA] = 1

, (3.C.3)

where the constraints ψRA ≥ 0, Tr[ψRA] = 1, and Tr[ψ2RA] = 1 correspond to

ψRA being a pure bipartite state. Note that the above is equal to

supψRA≥0,ΛRB≥0

Tr[ΛRB(NA→B −MA→B)(ψRA)] : ΛRB ≤ 1RB, ψR > 0,

Tr[ψRA] = 1, Tr[ψ2RA] = 1

, (3.C.4)

due to the fact that the set of pure states with reduced state ψR positive def-inite is dense in the set of all pure states. Now, recall from (2.2.29) that anysuch pure state can be written as ψRA = XRΓRAX†

R for some linear operatorXR such that Tr[X†

RXR] = 1 and |XR| > 0, where ΓRA defined in (2.2.26).Using this, we find that the objective function can be rewritten as

Tr[ΛRB(NA→B −MA→B)(XRΓRAX†R)] (3.C.5)

= Tr[X†RΛRBXR(NA→B −MA→B)(ΓRA)] (3.C.6)

= Tr[X†RΛRBXR(ΓN

RB − ΓMRB)]. (3.C.7)

Now, observe the following equivalence:

0 ≤ ΛRB ≤ 1RB ⇔ 0 ≤ X†RΛRBXR ≤ X†

RXR ⊗ 1B. (3.C.8)

Thus, defining ΩRB := X†RΛRBXR and ρR := X†

RXR, the optimization in(3.C.4) is equivalent to the following one:

supρR,

ΩRB≥0

Tr[ΩRB(ΓNRB − ΓM

RB)] : ΩRB ≤ ρR ⊗ 1B, ρR > 0, Tr[ρR] = 1, (3.C.9)

giving the equality in (3.5.152). Finally, setting

Y :=(

ΩRB 00 ρR

), (3.C.10)

D :=(

ΓNRB − ΓM

RB 00 0

), (3.C.11)

Φ(Y) :=

ΩRB − ρR ⊗ 1B 0 00 Tr[ρR] 00 0 −Tr[ρR]

, (3.C.12)

212

Page 224: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

C :=

0 0 00 1 00 0 −1

(3.C.13)

we find that (3.5.152) is now in the standard form from (2.4.1), namely,

supY≥0Tr[DY] : Φ(Y) ≤ C. (3.C.14)

Now, to establish the dual SDP in (3.5.153), we first determine the adjointΦ† of Φ using

Tr[Φ(Y)Z] = Tr[YΦ†(Z)], (3.C.15)

where without loss of generality we can take Z to be

Z :=

ZRB 0 00 µ1 00 0 µ2

. (3.C.16)

Then, we find that

Tr[Φ(Y)Z] = Tr[(ΩRB − ρR ⊗ 1B)ZRB] + Tr[ρR]µ1 − Tr[ρR]µ2 (3.C.17)= Tr[ΩRBZRB] + Tr[ρR((µ1 − µ2)1R − TrB[ZRB])], (3.C.18)

from which we conclude that

Φ†(Z) =(

ZRB 00 (µ1 − µ2)1R − TrB[ZRB]

). (3.C.19)

The standard form of the dual SDP from (2.4.2), which is

infZ≥0Tr[CZ] : Φ†(Z) ≥ D, (3.C.20)

then becomes

infµ1≥0,µ2≥0,ZRB≥0

µ1 − µ2 : ZRB ≥ ΓNRB − ΓM

RB, (µ1 − µ2)1R ≥ TrB[ZRB]. (3.C.21)

Now, observe that the variables µ1 and µ2 always appear together in theabove optimization as µ1− µ2, and so can be reduced to the a single real vari-able µ ∈ R. Then, the condition µ1R ≥ TrB[ZRB] implies that µ ≥ 0. Thus, theoptimization in (3.C.21) can be simplified to

infµ≥0,

ZRB≥0

µ : ZRB ≥ ΓNRB − ΓM

RB, µ1R ≥ TrB[ZRB], (3.C.22)

213

Page 225: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

which is precisely (3.5.153). Equality of the primal and dual SDPs is due tostrong duality, which holds for the SDP in (3.5.153) because ZRB = ΓN

RB −ΓM

RB + δ1RB and µ = TrB[ZRB] + δ1R together form a strictly feasible point forall δ > 0 and a feasible point for the primal is ρR = πR and ΩRB = πR ⊗ 1B.

Finally, the equality

infµ≥0,

ZRB≥0

µ : ZRB ≥ ΓNRB − ΓM

RB, µ1R ≥ TrB[ZRB]

= infZRB≥0

‖TrB[ZRB]‖∞ : ZRB ≥ ΓNRB − ΓM

RB (3.C.23)

holds by the expression in (2.4.42) for the Schatten ∞-norm for positive semi-definite operators.

Appendix 3.D SDPs for Fidelity of States and Chan-nels

3.D.1 Proof of Proposition 3.56

First, let us verify that strong duality holds for the primal and dual semi-definite programs in (3.5.25) and (3.5.26), respectively. Consider that X = 0 isa feasible choice for the primal program, while Y = Z = 21 is strictly feasiblefor the dual program. Thus, strong duality holds according to Theorem 2.22.

In order to prove the equality in (3.5.25), we start with the following lemma:

Lemma 3.78

Let P and Q be positive semi-definite operators in L(H), and let X ∈L(H). Then the operator (

P XX† Q

)(3.D.1)

is positive semi-definite if and only if there exists K ∈ L(H) satisfying‖K‖∞ ≤ 1 and X =

√PK√

Q.

PROOF: See Theorem IX.5.9 of Bhatia (1997).

214

Page 226: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

It follows from this lemma that the operators X in the primal optimizationin (3.5.25) can range over X =

√PK√

Q such that K ∈ L(H) and ‖K‖∞ ≤ 1,so that the optimization over X is reduced to an optimization over K. We thenfind that the primal optimal value is given by

12

supX∈L(H)

Tr[X] + Tr[X†] :

(ρ X

X† σ

)≥ 0

=12

supK:‖K‖∞≤1

Tr[√

ρK√

σ] + Tr[√

σK†√ρ] (3.D.2)

= supK:‖K‖∞≤1

Re[Tr[√

ρK√

σ]] (3.D.3)

= supK:‖K‖∞≤1

∣∣Tr[√

ρK√

σ]∣∣ (3.D.4)

=∥∥√ρ√

σ∥∥

1 =√

F(ρ, σ). (3.D.5)

The first equality follows from Lemma 3.78. The third equality follows be-cause we can use the optimization over K to adjust a global phase such thatthe real part is equal to the absolute value (here, one should think of the factthat Re[z] = r cos(θ) for z = reiθ, and then one can optimize the value of θso that Re[z] = r). The final equality follows by a generalization of Proposi-tion 2.9 (in fact the same proof given there implies that the optimization canbe with respect to U satisfying ‖U‖∞ ≤ 1, rather than just with respect toisometries).

We now prove that (3.5.26) is the dual program of (3.5.25). We can rewritethe primal SDP as

12

sup Tr[X] + Tr[X†] (3.D.6)

subject to (ρ 00 σ

)≥(

0 XX† 0

),

(R X

X† S

)≥ 0 (3.D.7)

because R and S are not involved in the objective function and can always bechosen so that the second operator is PSD. Also, the following equivalenceshold

(ρ X

X† σ

)≥ 0 ⇐⇒

(ρ −X−X† σ

)≥ 0 (3.D.8)

⇐⇒(

ρ 00 σ

)≥(

0 XX† 0

). (3.D.9)

215

Page 227: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

As given in (2.4.1) and (2.4.2), the standard forms of primal and dual SDPsfor Hermitian A and B and Hermiticity-preserving map Φ are as follows:

supG≥0Tr[AG] : Φ(G) ≤ B , (3.D.10)

infY≥0

Tr[BY] : Φ†(Y) ≥ A

. (3.D.11)

So the SDP above is in standard form with

G =

(R X

X† S

), A =

(0 1

1 0

), (3.D.12)

Φ(G) =

(0 X

X† 0

), B =

(ρ 00 σ

). (3.D.13)

Setting

Y =

(W VV† Z

), (3.D.14)

the adjoint of Φ is given by

Tr[YΦ(X)] = Tr[(

W VV† Z

)(0 X

X† 0

)](3.D.15)

= Tr[(

VX† WXZX† V†X

)](3.D.16)

= Tr[VX†] + Tr[XV†] (3.D.17)

= Tr[(

0 VV† 0

)(R X

X† S

)], (3.D.18)

so that

Φ†(Y) =(

0 VV† 0

). (3.D.19)

Then the dual is given by

12

inf Tr[(

ρ 00 σ

)(W VV† Z

)](3.D.20)

subject to (0 V

V† 0

)≥(

0 1

1 0

),

(W VV† Z

)≥ 0. (3.D.21)

216

Page 228: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

This simplifies to12

inf Tr[ρW] + Tr[σZ], (3.D.22)

subject to (0 V

V† 0

)≥(

0 1

1 0

),

(W VV† Z

)≥ 0 (3.D.23)

Since(

W VV† Z

)≥ 0 ⇐⇒

(W −V−V† Z

)≥ 0 (3.D.24)

⇐⇒(

W 00 Z

)≥(

0 VV† 0

), (3.D.25)

we find that there is a single condition(

W 00 Z

)≥(

0 VV† 0

)≥(

0 1

1 0

), (3.D.26)

and since V plays no role in the objective function, we can set V = 1. So thefinal SDP simplifies as follows:

12

infW,Z

Tr[ρW] + Tr[σZ] (3.D.27)

subject to (W −1

−1 Z

)≥ 0. (3.D.28)

Using the fact that(

W −1

−1 Z

)≥ 0 ⇐⇒

(W 1

1 Z

)≥ 0 (3.D.29)

we can do one final rewriting as follows:

12

infW,Z

Tr[ρW] + Tr[σZ] (3.D.30)

subject to (W 1

1 Z

)≥ 0. (3.D.31)

217

Page 229: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

3.D.2 Proof of Proposition 3.74

First, strong duality holds, according to Theorem 2.22, because QRB = 0 andλ = 0 is feasible for the primal program, while ρR = 1/|R| and WRB = ZRB =21RB is strictly feasible for the dual.

For a pure bipartite state ψRA, we use (2.2.29) to conclude that

ψRA = XRΓRAX†R, (3.D.32)

where Tr[X†RXR] = 1 to see that

NA→B(ψRA) = XRΓNRBX†

R, MA→B(ψRA) = XRΓMRBX†

R, (3.D.33)

and then plug in to (3.5.26) to get that

√F(NA→B,MA→B) =

12

infWRB,ZRB

Tr[XRΓNRBX†

RWRB] + Tr[XRΓMRBX†

RZRB]

(3.D.34)subject to (

WRB 1RB1RB ZRB

)≥ 0. (3.D.35)

Consider that the objective function can be written as

Tr[ΓNRBW ′RB] + Tr[ΓM

RBZ′RB], (3.D.36)

withW ′RB := X†

RWRBXR, Z′RB := X†RZRBXR (3.D.37)

Now consider that the inequality in (3.D.35) is equivalent to

(XR 00 XR

)† (WRB 1RB1RB ZRB

)(XR 00 XR

)≥ 0. (3.D.38)

(Here we have assumed that XR is invertible, but it suffices to do so for thisoptimization because the set of invertible XR is dense in the set of all possi-ble XR.) Multiplying out the last matrix we find that

(XR 00 XR

)† (WRB 1RB1RB ZRB

)(XR 00 XR

)

=

(X†

RWRBXR X†RXR ⊗ 1B

X†RXR ⊗ 1B X†

RZRBXR

)(3.D.39)

218

Page 230: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

=

(W ′RB ρR ⊗ 1B

ρR ⊗ 1B Z′RB

), (3.D.40)

where we defined ρR = X†RXR. Observing that ρR ≥ 0 and Tr[ρR] = 1, we can

write the final SDP as follows:

√F(NA→B,MA→B) =

12

infρR,WRB,ZRB

Tr[ΓNRBWRB] + Tr[ΓM

RBZRB], (3.D.41)

subject to

ρR ≥ 0, Tr[ρR] = 1,(

WRB ρR ⊗ 1BρR ⊗ 1B ZRB

)≥ 0. (3.D.42)

Now let us calculate the dual SDP to this, using the following standardforms for primal and dual SDPs, with Hermitian operators A and B and aHermiticity-preserving map Φ (as given in (2.4.1) and (2.4.2)):

supX≥0Tr[AX] : Φ(X) ≤ B , inf

Y≥0

Tr[BY] : Φ†(Y) ≥ A

. (3.D.43)

Consider that the constraint in (3.D.42) implies WRB ≥ 0 and ZRB ≥ 0, so thatwe can set

Y =

WRB 0 00 ZRB 00 0 ρR

, B =

ΓNRB 0 00 ΓM

RB 00 0 0

, (3.D.44)

Φ†(Y) =

WRB ρR ⊗ 1B 0 0ρR ⊗ 1B ZRB 0 0

0 0 Tr[ρR] 00 0 0 −Tr[ρR]

, (3.D.45)

A =

0 0 0 00 0 0 00 0 1 00 0 0 −1

. (3.D.46)

Then with

X =

PRB Q†RB 0 0

QRB SRB 0 00 0 λ 00 0 0 µ

(3.D.47)

219

Page 231: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

the map Φ is given by

Tr[Xֆ(Y)]

= Tr

PRB Q†RB 0 0

QRB SRB 0 00 0 λ 00 0 0 µ

WRB ρR ⊗ 1B 0 0ρR ⊗ 1B ZRB 0 0

0 0 Tr[ρR] 00 0 0 −Tr[ρR]

= Tr[PRBWRB] + Tr[Q†RB (ρR ⊗ 1B)] + Tr[QRB(ρR ⊗ 1B)]

+ Tr[SRBZRB] + (λ− µ)Tr[ρR]

= Tr[PRBWRB] + Tr[SRBZRB] + Tr[(TrB[QRB + Q†RB] + (λ− µ) 1R)ρR]

= Tr

PRB 0 00 SRB 00 0 TrB[QRB + Q†

RB] + (λ− µ) 1R

WRB 0 00 ZRB 00 0 ρR

.

(3.D.48)

So then

Φ(X) =

PRB 0 00 SRB 00 0 TrB[QRB + Q†

RB] + (λ− µ) 1R

. (3.D.49)

The primal is then given by

12

sup Tr

0 0 0 00 0 0 00 0 1 00 0 0 −1

PRB Q†RB 0 0

QRB SRB 0 00 0 λ 00 0 0 µ

, (3.D.50)

subject to

PRB 0 00 SRB 00 0 TrB[QRB + Q†

RB] + (λ− µ) 1R

ΓNRB 0 00 ΓM

RB 00 0 0

, (3.D.51)

PRB Q†RB 0 0

QRB SRB 0 00 0 λ 00 0 0 µ

≥ 0, (3.D.52)

which simplifies to12

sup (λ− µ) (3.D.53)

220

Page 232: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

subject to

PRB ≤ ΓNRB, (3.D.54)

SRB ≤ ΓMRB, (3.D.55)

TrB[QRB + Q†RB] + (λ− µ) 1R ≤ 0, (3.D.56)(

PRB Q†RB

QRB SRB

)≥ 0, (3.D.57)

λ, µ ≥ 0. (3.D.58)

We can simplify this even more. We can set λ′ = λ − µ ∈ R, and we cansubstitute QRB with −QRB without changing the value, so then it becomes

12

sup λ′ (3.D.59)

subject to

PRB ≤ ΓNRB, (3.D.60)

SRB ≤ ΓMRB, (3.D.61)

λ′1R ≤ TrB[QRB + Q†RB], (3.D.62)(

PRB −Q†RB

−QRB SRB

)≥ 0, (3.D.63)

λ′ ∈ R. (3.D.64)

We can rewrite(

PRB −Q†RB

−QRB SRB

)≥ 0 ⇐⇒

(PRB Q†

RBQRB SRB

)≥ 0 (3.D.65)

⇐⇒(

PRB 00 SRB

)≥(

0 −Q†RB

−QRB 0

)(3.D.66)

We then have the simplified condition(

0 −Q†RB

−QRB 0

)≤(

PRB 00 SRB

)≤(

ΓNRB 00 ΓM

RB

). (3.D.67)

Since PRB and SRB do not appear in the objective function, we can set them totheir largest value and obtain the following simplification

12

sup λ′ (3.D.68)

221

Page 233: Principles of Quantum Communication Theory: A Modern Approach

Chapter 3: Overview of Quantum Mechanics

subject to

λ′1R ≤ TrB[QRB + Q†RB],

(ΓN

RB Q†RB

QRB ΓMRB

)≥ 0, λ′ ∈ R (3.D.69)

Since a feasible solution is λ′ = 0 and QRB = 0, it is clear that we can restrictto λ′ ≥ 0. After a relabeling, this becomes

12

supλ≥0,QRB

λ : λ1R ≤ TrB[QRB + Q†

RB],(

ΓNRB Q†

RBQRB ΓM

RB

)≥ 0

= supλ≥0,QRB

λ : λ1R ≤ Re[TrB[QRB]],

(ΓN

RB Q†RB

QRB ΓMRB

)≥ 0

. (3.D.70)

This is equivalent to

supQRB

λmin (Re[TrB[QRB]]) :

(ΓN

RB Q†RB

QRB ΓMRB

)≥ 0

. (3.D.71)

This concludes the proof.

222

Page 234: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4

Quantum Entropies andInformation

In this chapter, we introduce various entropic and information quantities thatplay a fundamental role in the analysis of quantum communication proto-cols. Here we see that the notions of entropy and information take manyforms. The most basic and fundamental entropic and information measuresare members of the “von Neumann family,” and these often correspond to op-timal communication rates for information-theoretic tasks in the asymptoticregime of many uses of an independent and identically distributed (i.i.d.) re-source. More refined entropy measures belong to the “Rényi family,” and in-terestingly, in part due to the non-commutativity of quantum states, there areat least two interesting ways of generalizing classical Rényi entropies that aremeaningful for understanding information-theoretic tasks. The Rényi mea-sures reduce to the von Neumann ones in a particular limit, and they are use-ful in characterizing optimal rates of information-theoretic tasks in the non-asymptotic regime, in particular when trying to determine how fast an errorprobability is converging to zero or one (so-called error exponents and strongconverse exponents, respectively). Even more broadly, we define entropymeasures from the meaningful “one-shot” information-theoretic task givenby quantum hypothesis testing. Even though one might debate whether suchan operationally defined quantity is truly an entropy, our opinion is that thisperspective is quite powerful, and so we adopt it in this chapter and the restof the book. Thus, this chapter explores quite broadly a variety of entropicmeasures and their mathematical properties, as they are the basis for analyz-ing a wide variety of quantum information-processing protocols.

223

Page 235: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

We start our development with a brief preview of quantum entropies andinformation (Section 4.1) and then proceed to the well-known quantum rela-tive entropy (Section 4.2), which plays an important role in proofs of achiev-ability for quantum channel capacities. We then proceed to defining the gen-eralized divergence, which is an important concept that plays an importantrole in the proofs of strong converse theorems throughout this book (Sec-tion 4.3). Of particular focus are three prominent examples of generalizeddivergences, the Petz–Rényi relative entropy (Section 4.4), the sandwichedRényi relative entropy (Section 4.5), and the hypothesis testing relative en-tropy (Section 4.9). The first of these plays an important role in proofs ofstrong converses for quantum channel capacities, and the second plays animportant role for both achievability and strong converses.

4.1 Preview

Arguably one of the most important quantities in quantum information the-ory is the quantum generalization of the Shannon entropy, the von Neumannentropy. For every quantum system A in the state ρA ∈ D(HA), the von Neu-mann entropy, which we simply call the quantum entropy from now on, isdefined as

H(ρA) ≡ H(A)ρ := −Tr[ρA log2 ρA]. (4.1.1)

If ρA has a spectral decomposition of the form

ρA =r

∑i=1

λi|ψi〉〈ψi|A, (4.1.2)

where r = rank(ρA), then we can write H(ρA) in terms of the non-zero eigen-values λir

i=1 of ρA as

H(ρA) = −r

∑i=1

λi log2 λi. (4.1.3)

Note that the zero eigenvalues of ρA do not contribute to the entropy due tothe convention that 0 log2 0 = 0, which is taken because limx→0+ x log2 x = 0.By viewing ρA as a probabilistic mixture of the pure states |ψi〉Ar

i=1 of thequantum system A, the quantum entropy quantifies the uncertainty aboutwhich of these pure states the system A is in. In particular, H(ρA) is, in arough sense, the expected information gain upon performing an experimentto determine the state of the system.

224

Page 236: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Note that the right-hand side of (4.1.3) is the formula for the Shannon en-tropy of the probability distribution λii corresponding to the eigenvaluesof ρA. The Shannon entropy of the probability distribution p, 1 − p, forp ∈ [0, 1], shows up frequently and is denoted by h2(p), i.e.,

h2(p) := −p log2 p− (1− p) log2(1− p). (4.1.4)

It is called the binary entropy function.

Just as the Shannon entropy has an operational meaning as the optimalrate of (classical) data compression, the quantum entropy has an operationalinterpretation as the optimal rate of quantum data compression. More pre-cisely, given the state ρ⊗n

A , the quantum entropy H(ρA) is the minimum num-ber of qubits per copy of the state ρA that are needed to faithfully representρ⊗n

A , when n becomes large. This task is also called Schumacher compression.

Other fundamental information-theoretic quantities, which are functionsof the Shannon entropy, have straightforward generalizations to the quantumsetting. Let ρAB be a bipartite state, and let σABC be a tripartite state.

1. The quantum conditional entropy is defined as

H(A|B)ρ := H(AB)ρ − H(B)ρ. (4.1.5)

The quantum conditional entropy quantifies the uncertainty about thestate of the system A in the presence of additional quantum side infor-mation in the form of the quantum system B.

2. The coherent information is defined as

I(A〉B)ρ := H(B)ρ − H(AB)ρ = −H(A|B)ρ, (4.1.6)

and it arises in the context of communication of quantum informationover quantum channels (see Chapter 9). The reverse coherent informationis a trivial modification of the coherent information, defined as

I(B〉A)ρ := H(A)ρ − H(AB)ρ = −H(B|A)ρ, (4.1.7)

and it arises when studying feedback-assisted quantum communication.

3. The quantum mutual information is defined as

I(A; B)ρ := H(A)ρ + H(B)ρ − H(AB)ρ, (4.1.8)= H(A)ρ − H(A|B)ρ (4.1.9)

225

Page 237: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= H(B)ρ − H(B|A)ρ, (4.1.10)

and it arises in the context of communicating classical information overquantum channels (see Chapters 6 and 7).

4. The quantum conditional mutual information is defined as

I(A; B|C)σ := H(A|C)σ + H(B|C)σ − H(AB|C)σ, (4.1.11)

and it is the basis for an entanglement measure called squashed entan-glement (see Chapter 5). An important result is that the quantum con-ditional mutual information is non-negative, i.e., I(A; B|C)σ ≥ 0 for alltripartite states σABC. This inequality goes by the name strong subaddi-tivity of quantum entropy, and we show at the end of Section 4.2 that itfollows from the data-processing inequality for the quantum relative en-tropy (Theorem 4.4).

As it turns out, all of these quantities, including the entropy itself, can bederived from a single parent quantity, the quantum relative entropy, whichwe introduce in the next section.

4.2 Quantum Relative Entropy

We define the quantum relative entropy as follows:

Definition 4.1 Quantum Relative Entropy

For every state ρ and positive semi-definite operator σ, the quantum rel-ative entropy of ρ and σ, denoted by D(ρ‖σ), is defined as

D(ρ‖σ) =

Tr[ρ(log2 ρ− log2 σ)] if supp(ρ) ⊆ supp(σ),+∞ otherwise. (4.2.1)

REMARK: More generally, we could define the quantum relative entropy exactly asabove, but with both arguments being positive semi-definite operators. For our purposesin this book, however, it suffices to consider the first argument of the quantum relativeentropy to be a state.

The quantum relative entropy is a particular quantum generalization ofthe classical Kullback–Leibler divergence or classical relative entropy, which for

226

Page 238: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

two probability distributions p, q : X→ [0, 1] defined on a finite alphabet X isgiven by

D(p‖q) = ∑x∈X

p(x) log2

(p(x)q(x)

). (4.2.2)

The quantum relative entropy has an operational meaning in terms of the taskof quantum hypothesis testing, as is shown below in Section 4.10 on the quan-tum Stein’s lemma (Theorem 4.75). The quantum relative entropy D(ρ‖σ) canalso be interpreted as a distinguishability measure for the quantum states ρand σ, in part due to the facts that D(ρ‖σ) ≥ 0 and D(ρ‖σ) = 0 if and only ifρ = σ, which is shown in Proposition 4.3 below.

The support condition supp(ρ) ⊆ supp(σ) in the definition of the quan-tum relative entropy essentially has to do with the term Tr[ρ log2 σ] and thefact that the logarithm of an operator is really only well defined for positivedefinite operators, while we allow for σ to be positive semi-definite, whichmeans that it could have eigenvalues equal to zero. Recall that the expressionρ log2 ρ is well defined even for states with eigenvalues equal to zero sincewe set 0 log2 0 = 0. We justified this with the fact that limx→0+ x log2 x = 0.We can similarly make sense of the support condition in the definition of thequantum relative entropy by using the following fact.

Proposition 4.2

For every state ρ and positive semi-definite operator σ,

D(ρ‖σ) = limε→0+

Tr[ρ(log2 ρ− log2(σ + ε1))]. (4.2.3)

Consequently, whenever σ does not have full support (i.e., it is positivesemi-definite as opposed to positive definite), we can define D(ρ‖σ) asthe following limit:

D(ρ‖σ) = limε→0+

D(ρ‖σ + ε1). (4.2.4)

PROOF: Observe that, for all ε > 0, the operator σ + ε1 has full support; i.e.,supp(σ + ε1) = H for all ε > 0, where H is the underlying Hilbert space. Thismeans that the quantity

Tr[ρ log2(σ + ε1)] (4.2.5)

is finite for all ε > 0. Now, let us decompose the Hilbert space H into thedirect sum of the orthogonal subspaces supp(σ) and ker(σ), so that H =

227

Page 239: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

supp(σ) ⊕ ker(σ). Let Πσ be the projection onto supp(σ) and Π⊥σ the pro-jection onto ker(σ). Then with respect to this decomposition, the operators ρand σ can be written as the block matrices

ρ =

(ΠσρΠσ ΠσρΠ⊥σΠ⊥σ ρΠσ Π⊥σ ρΠ⊥σ

)≡(

ρ0,0 ρ0,1ρ†

0,1 ρ1,1

),

σ =

(ΠσσΠσ ΠσσΠ⊥σΠ⊥σ σΠσ Π⊥σ σΠ⊥σ

)=

(σ 00 0

).

(4.2.6)

Now, let us first suppose that supp(ρ) ⊆ supp(σ). This means that ρ0,0 =

ρ and that ρ0,1 = 0 and ρ1,1 = 0. Now, using the fact that 1H =

(Πσ 00 Π⊥σ

),

we can write the second term in (4.2.3) as

Tr[ρ log2(σ + ε1)] = Tr[(

ρ0,0 00 0

)log2

(σ + εΠσ 0

0 εΠ⊥σ

)](4.2.7)

= Tr[(

ρ0,0 00 0

)(log2(σ + εΠσ) 0

0 log2(εΠ⊥σ )

)](4.2.8)

= Tr[ρ0,0 log2(σ + εΠσ)]. (4.2.9)

Therefore,

limε→0+

Tr[ρ log2(σ + ε1)] = Tr[ρ0,0 log2(σ + εΠσ)] = Tr[ρ log2 σ], (4.2.10)

which means that

limε→0+

Tr[ρ(log2 ρ− log2(σ + ε1))] = limε→0+

Tr[ρ(log2 ρ− log2 σ)] (4.2.11)

whenever supp(ρ) ⊆ supp(σ).

If supp(ρ) * supp(σ), then the blocks ρ0,1 and ρ1,1 of ρ are generally non-zero, and we obtain

Tr[ρ log2(σ + ε1)]

= Tr[(

ρ0,0 ρ0,1ρ†

0,1 ρ1,1

)(log2(σ + εΠσ) 0

0 log2(εΠ⊥σ )

)](4.2.12)

= Tr[ρ0,0 log2(σ + εΠσ)] + Tr[ρ1,1 log2(εΠ⊥σ )]. (4.2.13)

Then, using the fact that limε→0+(− log2 ε) = +∞, we find that

228

Page 240: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

limε→0+

Tr[ρ(log2 ρ− log2(σ + ε1))] = Tr[ρ log2 ρ]

− limε→0+

Tr[ρ0,0 log2(σ + εΠσ)]− limε→0+

Tr[ρ1,1 log2(εΠ⊥σ )], (4.2.14)

and since

− limε→0+

Tr[ρ1,1 log2(εΠ⊥σ )] = limε→0+

(− log2 ε)Tr[ρ1,1Π⊥σ ] = +∞, (4.2.15)

we have that

limε→0+

Tr[ρ(log2 ρ− log2(σ + ε1))]

=

Tr[ρ(log2 ρ− log2 σ)] if supp(ρ) ⊆ supp(σ),+∞ otherwise.

= D(ρ‖σ),

(4.2.16)

as required. This means that whenever σ does not have full support (i.e., it ispositive semi-definite as opposed to positive definite), we can define D(ρ‖σ)as D(ρ‖σ) = limε→0+ D(ρ‖σ + ε1).

The proposition above, in particular the fact that we can write the quan-tum relative entropy as D(ρ‖σ) = limε→0+ D(ρ‖σ + ε1), allows to take thelogarithm log2(σ) of σ only on the support of σ when determining the quan-tum relative entropy. We can use this fact to write another formula for thequantum relative entropy. We start by writing a spectral decomposition ofthe state ρ and positive semi-definite operator σ. In particular, let

ρ =d

∑j=1

pj|ψj〉〈ψj| =rank(ρ)

∑j=1

pj|ψj〉〈ψj|, (4.2.17)

σ =d

∑k=1

qk|φk〉〈φk| =rank(σ)

∑k=1

qk|φk〉〈φk|, (4.2.18)

be spectral decompositions of ρ and σ, where d = dim(H) and in the secondequality we have restricted the sum to only those eigenvalues that are non-zero. Then,

D(ρ‖σ) =rank(ρ)

∑j=1

pj log2(pj)

− Tr

[(rank(ρ)

∑j=1

pj|ψj〉〈ψj|)(

rank(σ)

∑k=1

log2(qk)|φk〉〈φk|)]

(4.2.19)

229

Page 241: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

=rank(ρ)

∑j=1

pj log2(pj)−rank(ρ)

∑j=1

rank(σ)

∑k=1

∣∣⟨ψj|φk⟩∣∣2 pj log2(qk) (4.2.20)

=rank(ρ)

∑j=1

[pj log2(pj)−

rank(σ)

∑k=1

∣∣⟨ψj|φk⟩∣∣2 pj log2(qk)

]. (4.2.21)

Now, using the fact that the eigenvectors |φk〉 : 1 ≤ k ≤ d form a completeorthonormal basis for H, so that 1H = ∑d

k=1 |φk〉〈φk|, we conclude that

1 =⟨ψj|ψj

⟩=

d

∑k=1

⟨ψj|φk

⟩ ⟨φk|ψj

⟩=

rank(σ)

∑k=1

∣∣⟨ψj|φk⟩∣∣2 , (4.2.22)

for 1 ≤ j ≤ rank(ρ), where to obtain the last equality we used the assumptionthat supp(ρ) ⊆ supp(σ), which implies that the eigenvectors |ψj〉 : 1 ≤ j ≤rank(ρ) of ρ can be expressed as a linear combination of the eigenvectors|φk〉 : 1 ≤ k ≤ rank(σ) of σ. Therefore,

D(ρ‖σ) =rank(ρ)

∑j=1

rank(σ)

∑k=1

∣∣⟨ψj|φk⟩∣∣2pj log2

( pj

qk

), (4.2.23)

whenever supp(ρ) ⊆ supp(σ).

We now detail some important mathematical properties of the quantumrelative entropy.

Proposition 4.3 Basic Properties of the Quantum Relative Entropy

The quantum relative entropy satisfies the following properties for allstates ρ, ρ1, ρ2 and positive semi-definite operators σ, σ1, σ2:

1. Isometric invariance: For every isometry V,

D(VρV†‖VσV†) = D(ρ‖σ). (4.2.24)

2. (a) Klein’s inequality: If Tr(σ) ≤ 1, then D(ρ‖σ) ≥ 0.

(b) Faithfulness: D(ρ‖σ) = 0 if and only if ρ = σ.

(c) If ρ ≤ σ, then D(ρ‖σ) ≤ 0.

(d) If σ ≤ σ′, then D(ρ‖σ) ≥ D(ρ‖σ′).

230

Page 242: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

3. Additivity:

D(ρ1 ⊗ ρ2‖σ1 ⊗ σ2) = D(ρ1‖σ1) + D(ρ2‖σ2). (4.2.25)

As a special case, for all β ∈ (0, ∞),

D(ρ‖βσ) = D(ρ‖σ) + log2

(1β

). (4.2.26)

4. Direct-sum property: Let p : X → [0, 1] be a probability distributionover a finite alphabet X with associated |X|-dimensional system X,and let q : X→ [0, ∞) be a non-negative function on X. Let ρx

Ax∈Xbe a set of states on a system A, and let σx

Ax∈X be a set of positivesemi-definite operators on A. Then,

D(ρXA‖σXA) = D(p‖q) + ∑x∈X

p(x)D(ρxA‖σx

A). (4.2.27)

where

ρXA := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA, (4.2.28)

σXA := ∑x∈X

q(x)|x〉〈x|X ⊗ σxA. (4.2.29)

REMARK: If we let the first argument of the relative entropy be a general positive semi-definite operator instead of just a state, then the special case (4.2.26) of the additivity prop-erty can be written for every α, β ∈ (0, ∞) as

D(αρ‖βσ) = αD(ρ‖σ) + α log2

β

). (4.2.30)

PROOF:

1. Proof of isometric invariance: When supp(ρ) * supp(σ), there is nothingto prove because supp(VρV†) * supp(VσV†), which means that bothD(VρV†‖VσV†) and D(ρ‖σ) are equal to +∞.

Suppose that supp(ρ) ⊆ supp(σ), which implies that supp(VρV†) ⊆supp(VσV†). We use the formula in (4.2.23) to see that if we let ρ and σhave the spectral decompositions as in (4.2.17) and (4.2.18), respectively,

231

Page 243: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

then we find that

D(VρV†‖VσV†) =rank(ρ)

∑j=1

rank(σ)

∑k=1

∣∣∣〈ψj|V†V|φk〉∣∣∣2

pj log2

( pj

qk

)

=rank(ρ)

∑j=1

rank(σ)

∑k=1

∣∣⟨ψj|φk⟩∣∣2 pj log2

( pj

qk

)

= D(ρ‖σ).

(4.2.31)

We used the fact that V is an isometry, i.e., satisfying V†V = 1.

2. (a) Proof of Klein’s inequality: The result is trivial in the case that supp(ρ) *supp(σ), and so we assume that supp(ρ) ⊆ supp(σ). We can writethe relative entropy as in (4.2.21),

D(ρ‖σ) =d

∑j=1

[pj log2(pj)−

d

∑k=1

∣∣⟨ψj|φk⟩∣∣2 pj log2(qk)

], (4.2.32)

where we have extended the sums to include all terms up to d =

dim(H). Now, let cj,k ≡∣∣⟨ψj|φk

⟩∣∣2, and observe that

cj,k ≥ 0 ∀ 1 ≤ j, k ≤ d, andd

∑k=1

cj,k = 1 ∀ 1 ≤ j ≤ d. (4.2.33)

The latter indeed holds because

d

∑k=1

cj,k =d

∑k=1

⟨ψj|φk

⟩ ⟨φk|ψj

⟩(4.2.34)

= 〈ψj|(

d

∑k=1|φk〉〈φk|

)

︸ ︷︷ ︸1

|ψj〉 (4.2.35)

=⟨ψj|ψj

⟩(4.2.36)

= 1. (4.2.37)

Therefore, for each j, cj,k : 1 ≤ k ≤ d is a probability distributionover k. Using the concavity of the function log2, we thus obtain

d

∑k=1

cj,k log2(qk) ≤ log2

(d

∑k=1

cj,kqk

)= log2(rj) (4.2.38)

232

Page 244: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

for all 1 ≤ j ≤ d, where

rj :=d

∑k=1

cj,kqk. (4.2.39)

Therefore, we obtain

D(ρ‖σ) =d

∑j=1

pj log2(pj)−d

∑j=1

pj

(d

∑k=1

cj,k log2(qk)

)(4.2.40)

≥d

∑j=1

pj log2

(pj

rj

)(4.2.41)

= −d

∑j=1

pj log2

(rj

pj

). (4.2.42)

Now, we make use of the fact that

− log2(x) ≥ x− 1ln(2)

∀ x > 0, (4.2.43)

with equality if and only if x = 1. This fact can be readily verified byelementary calculus. Using this, we obtain

D(ρ‖σ) ≥ 1ln(2)

d

∑j=1

pj

(1− rj

pj

)(4.2.44)

=1

ln(2)

d

∑j=1

pj −1

ln(2)

d

∑j=1

rj. (4.2.45)

Now, ∑dj=1 pj = Tr(ρ) = 1, and since

d

∑j=1

cj,k = 〈φk|(

d

∑j=1|ψj〉〈ψj|

)

︸ ︷︷ ︸1

|φk〉 = 1, (4.2.46)

we obtaind

∑j=1

rj =d

∑k=1

qk = Tr(σ). (4.2.47)

233

Page 245: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Therefore,

D(ρ‖σ) ≥ 1ln(2)

(1− Tr(σ)) ≥ 0, (4.2.48)

as required, where the last inequality holds by the assumption thatTr(σ) ≤ 1.

(b) We are now interested in the case of equality in the statement D(ρ‖σ)≥ 0 that we just proved in (a). In that proof, we made use of twoinequalities. The first was in (4.2.38) where we made use of the con-cavity of the log function. Equality holds in (4.2.38) if and only if foreach j there exists k such that cj,k = 1. The second inequality we usedwas in (4.2.43), where equality holds if and only if x = 1. Therefore,equality holds in (4.2.44) if and only if pj = rj for all j, and equalityin (4.2.38) is true if and only if the eigenvectors of ρ and σ are, up torelabeling, the same. Therefore D(ρ‖σ) = 0 if and only if pj = qj forall j and the corresponding eigenvectors are (up to relabeling) equal,which is true if and only if ρ = σ.

(c) Suppose that both ρ and σ are positive definite. Since the log func-tion is operator monotone (see Section 2.2.1.1), the operator inequal-ity ρ ≤ σ implies that log2(ρ) ≤ log2(σ). This implies the inequal-ity ρ

12 log2(ρ)ρ

12 ≤ ρ

12 log2(σ)ρ

12 , which implies that Tr[ρ log2(ρ)] ≤

Tr[ρ log2(σ)], proving the result. In the case that ρ and/or σ are notpositive definite, we first apply the result to the positive definite state(1− δ)ρ + δπ and the positive definite operator σ + ε1, with δ, ε > 0,so that D((1− δ)ρ + δπ‖σ + ε1) ≤ 0. Then, using

limε→0+

limδ→0+

D((1− δ)ρ + δπ‖σ + ε1) = D(ρ‖σ), (4.2.49)

we obtain the desired result.

(d) As in (c), first suppose that ρ, σ, and σ′ are positive definite. Sincethe log function is operator monotone, the operator inequality σ′ ≥ σimplies that log2(σ

′) ≥ log2(σ), which implies that Tr[ρ log2(σ′)] ≥

Tr[ρ log2 σ]. Therefore,

D(ρ‖σ) = Tr[ρ(log2 ρ− log2 σ)] (4.2.50)≥ Tr[ρ(log2 ρ− log2 σ′)] (4.2.51)= D(ρ‖σ′), (4.2.52)

as required. In the general case that the operators are not positivedefinite, as in (c) we apply the result to the positive definite operators

234

Page 246: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

(1− δ)ρ+ δπ, σ+ ε1, and σ′+ ε′1, for δ, ε, ε′ ≥ 0, and then use (4.2.49)to obtain the result.

3. Proof of additivity: Since supp(ρ1⊗ ρ2) = supp(ρ1)⊗ supp(ρ2) and supp(σ1⊗σ2) = supp(σ1) ⊗ supp(σ2), the condition supp(ρ1 ⊗ ρ2) * supp(σ1 ⊗σ2) is equivalent to the condition supp(ρ1) * supp(σ1) or supp(ρ2) *supp(σ2). Therefore, D(ρ1 ⊗ ρ2‖σ1 ⊗ σ2) = +∞ and D(ρ1‖σ1) = +∞ orD(ρ2‖σ2) = +∞ if one of the support conditions is violated. Now sup-pose that supp(ρ1⊗ ρ2) ⊆ supp(σ1⊗ σ2). Letting ρ1 and ρ2 have spectraldecompositions

ρ1 =d

∑j=1

p1j |ψ1

j 〉〈ψ1j |, ρ2 =

d

∑k=1

p2k|ψ2

k〉〈ψ2k |, (4.2.53)

we find that

log2(ρ1 ⊗ ρ2) = log2

(d

∑j,k=1

p1j |ψ1

j 〉〈ψ1j | ⊗ p2

k|ψ2k〉〈ψ2

k |)

(4.2.54)

=d

∑j,k=1

log2(p1j p2

k)|ψ1j 〉〈ψ1

j | ⊗ |ψ2k〉〈ψ2

k | (4.2.55)

=d

∑j,k=1

log2(p1j )|ψ1

j 〉〈ψ1j | ⊗ |ψ2

k〉〈ψ2k | (4.2.56)

+d

∑j,k=1

log2(p2k)|ψ1

j 〉〈ψ1j | ⊗ |ψ2

k〉〈ψ2k | (4.2.57)

= log2(ρ1)⊗ 1 + 1⊗ log2(ρ2). (4.2.58)

Similarly, log2(σ1 ⊗ σ2) = log2(σ1)⊗ 1 + 1⊗ log2(σ2). Therefore,

D(ρ1 ⊗ ρ2‖σ1 ⊗ σ2)

= Tr[(ρ1 ⊗ ρ2)(log2(ρ1)⊗ 1 + 1⊗ log2(ρ2)]

− Tr[(ρ1 ⊗ ρ2)(log2(σ1)⊗ 1 + 1⊗ log2(σ2)] (4.2.59)= Tr(ρ2) (Tr[ρ1 log2(ρ1)]− Tr[ρ1 log2(σ1)])

− Tr(ρ1) (Tr[ρ2 log2(ρ2)]− Tr[ρ2 log2(σ2)]) (4.2.60)= D(ρ1‖σ1) + D(ρ2‖σ2). (4.2.61)

Then, to see (4.2.26), let ρ = ρ1, α = ρ2 = 1, σ = σ1, and β = σ2.Recognizing that the tensor product with a scalar is just multiplication

235

Page 247: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

by the scalar, we find that

D(ρ‖βσ) = D(ρ‖σ) + D(1‖β)= D(ρ‖σ) + (log2(1)− log2 β)

= D(ρ‖σ) + log2

(1β

).

(4.2.62)

4. Proof of the direct-sum property: Define the classical–quantum operators

ρXA = ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA, σXA = ∑

x∈Xq(x)|x〉〈x|X ⊗ σx

A. (4.2.63)

Observe that

log2 ρXA = ∑x∈X|x〉〈x|X ⊗ log2(p(x)ρx

A) (4.2.64)

= ∑x∈X|x〉〈x|X ⊗ log2 p(x)1A + ∑

x∈X|x〉〈x|X ⊗ log2 ρx

A, (4.2.65)

log2 σXA = ∑x∈X|x〉〈x|X ⊗ log2(q(x)σx

A) (4.2.66)

= ∑x∈X|x〉〈x|X ⊗ log2 q(x)1A + ∑

x∈X|x〉〈x|X ⊗ log2 σx

A. (4.2.67)

Then, in the case supp(ρXA) ⊆ supp(σXA), we obtain

D(ρXA‖σXA) = Tr[ρXA log2 ρXA]− Tr[ρXA log2 σXA] (4.2.68)

= Tr

[∑

X∈Xp(x) log2(p(x))|x〉〈x|X ⊗ ρx

A

+ ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA log2 ρx

A

]

− Tr

[∑

x∈Xp(x) log2(q(x))|x〉〈x|X ⊗ ρx

A

+ ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA log2 σx

A

](4.2.69)

= ∑x∈X

[p(x) log2 p(x)− p(x) log2 q(x)]

+ ∑x∈X

p(x)Tr[ρxA log2 ρx

A − ρxA log2 σx

A] (4.2.70)

236

Page 248: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= D(p‖q) + ∑x∈X

p(x)D(ρxA‖σx

A), (4.2.71)

as required.

An important consequence of Klein’s inequality from the proposition abo-ve is that

D(ρ‖σ) ≥ 0 for all states ρ, σ, andD(ρ‖σ) = 0 if and only if ρ = σ.

(4.2.72)

This allows us to use the quantum relative entropy as a distinguishabilitymeasure for quantum states. We emphasize, however, that the quantum rela-tive entropy is not a metric in the mathematical sense since it is neither sym-metric in its two arguments nor satisfies the triangle inequality.

We now come to one of the most important properties of the quantum rel-ative entropy that is used frequently throughout this book: the data-processinginequality. It is also called the monotonicity of the quantum relative entropy.

Theorem 4.4 Data-Processing Inequality for Quantum Relative En-tropy

Let ρ be a state, σ a positive semi-definite operator, and N a quantumchannel. Then,

D(ρ‖σ) ≥ D(N(ρ)‖N(σ)). (4.2.73)

In other words, the quantum relative entropy D(ρ‖σ) can only decreaseor stay the same if we apply the same quantum channel N to the states ρ andσ. When the quantum relative entropy is interpreted as a distinguishabilitymeasure on quantum states, the monotonicity property tells us that the dis-tinguishability of two quantum states cannot increase when we process themthrough a quantum channel; see Figure 4.1. We postpone the proof of thedata-processing inequality theorem to later in the chapter, where it followseasily as a consequence of the data-processing inequality for the Petz–Rényiand sandwiched Rényi relative entropies.

It is typically interesting and illuminating to investigate the conditionsunder which an important inequality is saturated. The data-processing in-equality for quantum relative entropy is no exception. In the case that theaction of the quantum channel can be reversed, so that there exists a recovery

237

Page 249: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

ρ

σ

D(ρ‖σ)

N (ρ)

N (σ)

D(N (ρ)‖N (σ))

N

N

FIGURE 4.1: Illustration of the data-processing inequality for the quan-tum relative entropy (Theorem 4.4). The quantum states ρ, σ, N(ρ), andN(σ) are represented by spheres that signify the amount of “space” theyoccupy in the Hilbert space. While the states ρ and σ are nearly perfectlydistinguishable as depicted, since their spheres do not overlap, after pro-cessing with the channel N, the states become much less distinguishablebecause the spheres overlap significantly.

channel R satisfying

ρ = (R N)(ρ), σ = (R N)(σ), (4.2.74)

then it follows from an application of Theorem 4.4 that

D(N(ρ)‖N(σ)) ≥ D((R N)(ρ)‖(R N)(σ)) (4.2.75)= D(ρ‖σ). (4.2.76)

Thus, by combining with Theorem 4.4, we in fact have that

D(ρ‖σ) = D(N(ρ)‖N(σ)), (4.2.77)

so that the existence of a recovery channel is sufficient for saturation of theinequality in Theorem 4.4.

On the other hand, suppose that ρ, σ, and N are such that the inequalityin Theorem 4.4 is saturated:

D(ρ‖σ) = D(N(ρ)‖N(σ)), (4.2.78)

Then it is a non-trivial result that there exists a recovery channel R such thatthe equality in (4.2.74) holds. In fact, this channel can be taken as the Petz

238

Page 250: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

recovery channel from Definition 3.36. We do not provide a proof here andinstead point to the Bibliographic Notes in Section 4.13 for more details.

One of the remarkable aspects of the data-processing inequality for thequantum relative entropy is that it alone can be used to prove many of theproperties of the quantum relative entropy stated in Proposition 4.3. For ex-ample, Klein’s inequality follows by considering the trace channel Tr, so thatfor every state ρ and positive semi-definite operator σ such that Tr[σ] ≤ 1, wefind that

D(ρ‖σ) ≥ D(Tr(ρ)‖Tr(σ)) = Tr(ρ) log2

(Tr(ρ)Tr(σ)

)(4.2.79)

= log2

(1

Tr(σ)

)≥ 0. (4.2.80)

Another important fact that follows from the data-processing inequalityfor quantum relative entropy is its joint convexity.

Proposition 4.5 Joint Convexity of Quantum Relative Entropy

Let p : X → [0, 1] be a probability distribution over a finite alphabet Xwith associated |X|-dimensional system X, let ρx

Ax∈X be a set of stateson a system A, and let σx

Ax∈X be a set of positive semi-definite opera-tors on A. Then,

∑x∈X

p(x)D(ρxA‖σx

A) ≥ D

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥ ∑x∈X

p(x)σxA

). (4.2.81)

PROOF: Define the classical–quantum state and operator, respectively, as

ρXA := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA, σXA := ∑

x∈Xp(x)|x〉〈x|X ⊗ σx

A. (4.2.82)

By the direct-sum property of the quantum relative entropy (Proposition 4.3),we find that

D(ρXA‖σXA) = ∑x∈X

p(x)D(ρxA‖σx

A). (4.2.83)

Then, since we have ρA = TrX[ρXA] = ∑x∈X p(x)ρxA and σA = TrX[σXA] =

∑x∈X p(x)σxA, we apply the data-processing inequality for quantum relative

entropy with respect to the partial trace channel TrX to obtain

∑x∈X

p(x)D(ρxA‖σx

A) = D(ρXA‖σXA) (4.2.84)

239

Page 251: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

≥ D(TrX(ρXA)‖TrX(σXA)) (4.2.85)

= D

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥ ∑x∈X

p(x)σxA

), (4.2.86)

which is the desired joint convexity of quantum relative entropy.

4.2.1 Information Measures from Quantum Relative Entropy

As stated at the beginning of this chapter, the quantum relative entropy acts,as in the classical case, as a parent quantity to all of the fundamental information-theoretic quantities based on the quantum entropy. Indeed, using the proper-ties of the quantum relative entropy stated previously, it is straightforward toverify the following:

1. The quantum entropy H(ρ) of a state ρ is given by

H(ρ) = −D(ρ‖1). (4.2.87)

2. The quantum conditional entropy H(A|B)ρ of a bipartite state ρAB isgiven by

H(A|B)ρ = −D(ρAB‖1A ⊗ ρB) (4.2.88)= − inf

σB∈D(HB)D(ρAB‖1A ⊗ σB), (4.2.89)

and the coherent information I(A〉B)ρ of a bipartite state ρAB is given by

I(A〉B)ρ = D(ρAB‖1A ⊗ ρB) (4.2.90)= inf

σB∈D(HB)D(ρAB‖1A ⊗ σB). (4.2.91)

Observe thatH(A|B)ρ = −I(A〉B)ρ (4.2.92)

for every bipartite state ρAB. Similarly, we can write the reverse coherentinformation as

I(B〉A)ρ = D(ρAB‖ρA ⊗ 1B) (4.2.93)= inf

σA∈D(HA)D(ρAB‖σA ⊗ 1B) (4.2.94)

240

Page 252: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

3. The quantum mutual information I(A; B)ρ of a bipartite state ρAB is givenby

I(A; B)ρ = D(ρAB‖ρA ⊗ ρB) (4.2.95)= inf

σB∈D(HB)D(ρAB‖ρA ⊗ σB) (4.2.96)

= infτA∈D(HA)

D(ρAB‖τA ⊗ ρB) (4.2.97)

= infτA∈D(HA)σB∈D(HB)

D(ρAB‖τA ⊗ σB). (4.2.98)

4. The quantum conditional mutual information I(A; B|C)ρ of a tripartitestate ρABC is given by

I(A; B|C)ρ = D(ρABC‖σABC), (4.2.99)

where

σABC := 2log2(ρAC⊗1B)+log2(ρBC⊗1A)−log2(1AB⊗ρC). (4.2.100)

The expressions in (4.2.89), (4.2.91), and (4.2.96), in terms of an optimiza-tion of the conditional entropy, coherent information, and mutual informa-tion, respectively, are useful for defining information-theoretic quantities anal-ogous to these ones in the context of generalized divergences, which we in-troduce in the next section. The equalities in (4.2.89), (4.2.91), and (4.2.96) canbe verified using the fact that

D(ρAB‖τA ⊗ ρB) + D(τA ⊗ ρB‖τA ⊗ σB) = D(ρAB‖τA ⊗ σB), (4.2.101)

which holds for all states τA and σB, along with the fact that

D(τA ⊗ ρB‖τA ⊗ σB) ≥ 0 (4.2.102)

for all states τA and σB, by Klein’s inequality as stated in (4.2.72). LettingτA = πA leads to (4.2.89) and (4.2.91), while τA = ρA leads to (4.2.96).

The properties of the quantum relative entropy, such as the ones statedin Propositions 4.3 and 4.5, can be directly translated to properties of the de-rived information measures stated above. Some of these properties are usedfrequently throughout the book, and so we state them here for convenience.They are straightforward to verify using definitions and properties of thequantum relative entropy.

241

Page 253: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

• Additivity of the quantum entropy for product states ρ and τ:

H(ρ⊗ τ) = H(ρ) + H(τ). (4.2.103)

• Isometric invariance of the quantum entropy for a state ρ and an isometry V:

H(ρ) = H(VρV†). (4.2.104)

• Concavity of the quantum entropy: The joint convexity of the quantum rel-ative entropy, as stated in Proposition 4.5, and the identity in (4.2.87) im-ply that the quantum entropy is concave in its input: if p : X → [0, 1] isa probability distribution over a finite alphabet X and ρx

Ax∈X is a set ofstates on a system A, then

H

(∑

x∈Xp(x)ρx

A

)≥ ∑

x∈Xp(x)H(ρx

A). (4.2.105)

By taking a spectral decomposition of a state ρ, applying the concavityinequality above, and the fact that the entropy of a pure state is equal tozero, we conclude that the quantum entropy is non-negative for everystate ρ:

H(ρ) ≥ 0. (4.2.106)

By employing the mixing property of the Heisenberg–Weyl unitaries from(3.2.122), the invariance of entropy under a unitary, its concavity, and thefact that the entropy of the maximally mixed state πA is equal to log2 dA,we conclude the following dimension bound for the entropy of a state ofsystem A:

H(ρ) ≤ log2 dA. (4.2.107)

• Direct-sum property of the quantum entropy: The direct-sum property of thequantum relative entropy (see Proposition 4.3) translates to the followingfor the quantum entropy: If p : X → [0, 1] is a probability distributionover a finite alphabet X with associated |X|-dimensional system X andρx

Ax∈X is a set of states on a system A, then

H

(∑

x∈Xp(x)|x〉〈x|X ⊗ ρx

A

)= H(p) + ∑

x∈Xp(x)H(ρx

A), (4.2.108)

where H(p) := −∑x∈X p(x) log2 p(x) is the (classical) Shannon entropyof the probability distribution p.

242

Page 254: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

• Chain rule for quantum mutual information: For every state ρABC, the fol-lowing equality holds

I(A; BC)ρ = I(A; B)ρ + I(A; C|B)ρ. (4.2.109)

We call this the chain rule because it can be interpreted as saying thatthe correlations between A and BC can be built up by first establishingcorrelations between A and B (signified by I(A; B)ρ), then establishingcorrelations between A and C, given the correlations with B (signified byI(A; C|B)ρ).

4.2.2 Quantum Conditional Mutual Information

We now develop in detail some properties of the quantum conditional mutualinformation that we use later in the book.

We start with the proof of the strong subadditivity property of the quan-tum entropy, which we recall from (4.1.11) is the statement that

I(A; B|C)ρ = H(A|C)ρ + H(B|C)ρ − H(AB|C)ρ ≥ 0 (4.2.110)

for every state ρABC.

Theorem 4.6 Strong Subadditivity of Quantum Entropy

For every state ρABC, the following inequality holds

I(A; B|C)ρ ≥ 0. (4.2.111)

PROOF: The key to proving this result is the data-processing inequality forthe quantum relative entropy, along with the expression for the quantum con-ditional entropy in (4.2.88). We start by using the definition of the quantumconditional entropy to rewrite the quantum conditional mutual informationdefined in (4.1.11) as

I(A; B|C)ρ = H(A|C)ρ + H(B|C)ρ − H(AB|C)ρ (4.2.112)= H(B|C)ρ − H(B|AC)ρ. (4.2.113)

Then, using the expression in (4.2.88) for the quantum conditional entropy interms of the quantum relative entropy, we find that

I(A; B|C)ρ = D(ρABC‖1B ⊗ ρAC)− D(ρBC‖1B ⊗ ρC). (4.2.114)

243

Page 255: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Finally, observe that by the data-processing inequality for the quantum rela-tive entropy with respect to the partial trace channel TrA we obtain

D(ρABC‖1B ⊗ ρAC) ≥ D(TrA[ρABC]‖1B ⊗ TrA[ρAC]) (4.2.115)= D(ρBC‖1B ⊗ ρC), (4.2.116)

which immediately implies that I(A; B|C)ρ ≥ 0, as required.

A direct consequence of strong subadditivity is that the conditional en-tropy H(A|B)σ is non-negative for every separable state σAB. Recall fromDefinition 3.8 that σAB is separable if it can be written as

σAB = ∑x∈X

p(x)τxA ⊗ωx

B, (4.2.117)

where p : X → [0, 1] is a probability distribution over a finite alphabet X andτx

Ax∈X and ωxBx∈X are sets of states. Defining the extension σXAB of σXAB

as∑

x∈Xp(x)|x〉〈x|X ⊗ τx

A ⊗ωxB, (4.2.118)

we then conclude that I(A; X|B)σ ≥ 0, which implies the desired inequality:

H(A|B)σ ≥ H(A|BX)σ (4.2.119)

= ∑x∈X

p(x)H(A|B)τx⊗ωx (4.2.120)

= ∑x∈X

p(x)H(A)τx ≥ 0. (4.2.121)

Another direct consequence is that the conditional entropy is concave:

H(A|B)ρ ≥ ∑x∈X

p(x)H(A|B)ρx , (4.2.122)

where X is a finite alphabet, p : X → [0, 1] is a probability distribution,ρx

ABx∈X is a set of bipartite states, and ρ := ∑x∈X p(x)ρxAB. This follows

directly by constructing the following classical–quantum state ρXAB:

ρXAB = ∑x∈X

p(x)|x〉〈x| ⊗ ρxAB, (4.2.123)

applying strong subadditivity H(A|B)− H(A|BX)ρ = I(A; X|B)ρ ≥ 0, andobserving that

H(A|BX)ρ = ∑x∈X

p(x)H(A|B)ρx . (4.2.124)

244

Page 256: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

By applying (4.2.92), we conclude from (4.2.122) that coherent information isconvex:

I(A〉B)ρ ≤ ∑x∈X

p(x)I(A〉B)ρx . (4.2.125)

We now prove some other properties of the quantum conditional mutualinformation that recur throughout the book.

Proposition 4.7 Properties of Quantum Conditional Mutual Infor-mation

The quantum conditional mutual information has the following proper-ties:

1. Symmetry: For every state ρABC, we have I(A; B|C)ρ = I(B; A|C)ρ.

2. Local entropy and dimension bounds: For every state ρABC, the follow-ing bounds hold

I(A; B|C)ρ ≤ 2 minH(A)ρ, H(B)ρ (4.2.126)≤ 2 log2(mindA, dB). (4.2.127)

Let σXBC be a classical–quantum state of the form

σXBC = ∑x∈X

p(x)|x〉〈x|X ⊗ ρxBC, (4.2.128)

where p : X→ [0, 1] is a probability distribution over a finite alpha-bet X with associated |X|-dimensional system X, and ρx

BC : x ∈ Xis a set of states. Then the following bounds hold

I(X; B|C)σ ≤ H(X)σ ≤ log2 |X|. (4.2.129)

3. Product conditioning system: If the state ρABC is of the form ρABC =σAB ⊗ τC, then

I(A; B|C)ρ = I(A; B)σ. (4.2.130)

4. Additivity: For states ρA1B1C1 and τA2B2C2 , the following equalityholds for the product state ρA1B1C1 ⊗ τA2B2C2 :

I(A1A2; B1B2|C1C2)ρ⊗τ = I(A1; B1|C1)ρ + I(A2; B2|C2)τ. (4.2.131)

5. Direct-sum property: Let p : X → [0, 1] be a probability distributionover a finite alphabet X with associated |X|-dimensional system X,

245

Page 257: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

and let ρxABCx∈X be a set of states. Then, for the classical–quantum

stateσXABC = ∑

x∈Xp(x)|x〉〈x|X ⊗ ρx

ABC (4.2.132)

the following equality holds

I(A; B|CX)σ = ∑x∈X

p(x)I(A; B|C)ρx . (4.2.133)

6. Chain rule: For every state ρAB1B2C,

I(A; B1B2|C)ρ = I(A; B1|C)ρ + I(A; B2|B1C)ρ (4.2.134)

7. Data-processing inequality for local channels: For every state ρABC, thequantum conditional mutual information is non-increasing underthe action of local channels on the non-conditioning systems. Inother words, for every state ρABC and all local channels NA→A′ andMB→B′ , the following inequality holds

I(A; B|C)ρ ≥ I(A′; B′|C)ω, (4.2.135)

where ωA′B′C := (NA→A′ ⊗MB→B′)(ρABC).

PROOF:

1. Symmetry under exchange of A and B follows immediately from the def-inition.

2. Using the definition of conditional entropy, we can write I(A; B|C)ρ as

I(A; B|C)ρ = H(AC)ρ − H(C)ρ + H(BC)ρ − H(ABC)ρ (4.2.136)= H(A|C)ρ − H(A|BC)ρ. (4.2.137)

The inequality H(A|C)ρ ≤ H(A)ρ is a direct consequence of strong sub-additivity, under a relabeling and with trivial conditioning system. Then,by the dimension bound from (4.2.107), it follows that H(A)ρ ≤ log2 dA.Therefore,

H(A|C)ρ ≤ log2 dA. (4.2.138)

Now, let |ψ〉ABCE be a purification of ρABC. Then, since ρABC and ψE havethe same spectrum, and since ρBC and ψAE have the same spectrum, we

246

Page 258: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

obtain

H(A|BC)ρ = H(ABC)ρ − H(BC)ρ (4.2.139)= H(E)ψ − H(AE)ψ (4.2.140)= −H(A|E)ψ (4.2.141)≥ −H(A) (4.2.142)≥ − log2 dA, (4.2.143)

where to obtain the first inequality we used (4.2.138), and to obtain thesecond inequality we used, as before, the fact that H(A)ρ ≤ log2 dA forevery state ρ. Therefore,

H(A|BC)ρ ≥ −H(A)ρ ≥ − log2 dA, (4.2.144)

which means that I(A; B|C)ρ = H(A|C)ρ − H(A|BC)ρ ≤ 2H(A)ρ ≤2 log2 dA. By applying symmetry under exchange of A and B and thesame argument, we conclude that I(A; B|C)ρ ≤ 2H(B)ρ ≤ 2 log2 dB.Thus, we conclude (4.2.126)–(4.2.127).

If the system A is classical (as in (4.2.128)), then the state is separable withrespect to the bipartite cut A|BC. As such, the lower bound in (4.2.144)improves to H(A|BC)ρ ≥ 0 (as a consequence of (4.2.121), implying thatthe upper bound improves to I(A; B|C)ρ ≤ H(A)ρ ≤ log2 dA in this case.

3. For ρABC = σAB ⊗ τC, we have

I(A; B|C)ρ = H(A|C)σ⊗τ + H(B|C)σ⊗τ − H(AB|C)σ⊗τ. (4.2.145)

Now, we use the fact that

H(A|C)σ⊗τ = H(A)σ, (4.2.146)

which follows from (4.2.103). This means that H(B|C)σ⊗τ = H(B)σ andH(AB|C)σ⊗τ = H(AB)σ, and we find that

I(A; B|C)ρ = H(A)σ + H(B)σ − H(AB)σ = I(A; B)σ. (4.2.147)

4. By writing

I(A1A2; B1B2|C1C2)ρ⊗τ = H(A1A2|C1C2)ρ⊗τ + H(B1B2|C1C2)ρ⊗τ

− H(A1A1B1B2|C1C2)ρ⊗τ, (4.2.148)

247

Page 259: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

and recognizing that each conditional entropy on the right-hand side isevaluated on a product state, we can use (4.2.103) to obtain the desiredresult. As an example, we evaluate the first term on the right-hand side:

H(A1A2|C1C2)ρ⊗τ

= H(A1A2C1C2)ρ⊗τ − H(C1C2)ρ⊗τ (4.2.149)= H(A1C1)ρ + H(A2C2)τ − H(C1)ρ − H(C2)τ (4.2.150)= H(A1|C1)ρ + H(A2|C2)τ. (4.2.151)

5. By definition, we have that

I(A; B|CX)σ = H(A|CX)σ + H(B|CX)σ − H(AB|CX)σ. (4.2.152)

Let us consider the first term on the right-hand side. Using the definitionof the quantum conditional entropy and using the direct-sum propertyof the quantum entropy, as stated in (4.2.108), we obtain

H(A|CX)σ

= H(ACX)σ − H(CX)σ (4.2.153)

= H(p) + ∑x∈X

p(x)H(AC)ρx − H(p)− ∑x∈X

p(x)H(C)ρx (4.2.154)

= ∑x∈X

p(x)(

H(AC)ρx − H(C)ρx)

(4.2.155)

= ∑x∈X

p(x)H(A|C)ρx . (4.2.156)

The other terms on the right-hand side of (4.2.152) are evaluated simi-larly, and we ultimately arrive at

I(A; B|CX)σ = ∑x∈X

p(x)(

H(A|C)ρx + H(B|C)ρx − H(AB|C)ρx)

= ∑x∈X

p(x)I(A; B|C)ρx . (4.2.157)

6. This follows straightforwardly by using the definition of the quantumconditional mutual information to expand both sides of (4.2.134) to con-firm that they are equal to each other.

7. Let us first prove the following inequality,

I(A1A2; B1B2|C)ρ ≥ I(A1; B1|C)ρ (4.2.158)

248

Page 260: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

for every state ρA1 A2B1B2C. This inequality implies that discarding parts ofthe non-conditioning systems never increases the quantum conditionalmutual information. Applying the chain rule in (4.2.134), along withstrong subadditivity, we find that

I(A1A2; B1B2|C)ρ = I(A1A2; B1|C)ρ + I(A1A2; B2|B1C)ρ (4.2.159)≥ I(A1A2; B1|C)ρ, (4.2.160)

where to obtain the last line we used strong subadditivity, which im-plies that I(A1A2; B2|B1C)ρ ≥ 0. Applying the chain rule in (4.2.134) andstrong subadditivity once again, we conclude that

I(A1A2; B1|C)ρ = I(A1; B1|C)ρ + I(A2; B1|A1C)ρ (4.2.161)≥ I(A1; B1|C)ρ. (4.2.162)

So we have I(A1A2; B1B2|C)ρ ≥ I(A1; B1|C)ρ.

Now, let VA→A′E1and WB→B′E2

be isometric extensions of the channelsNA→A′ and MB→B′ , respectively, with appropriate environment systemsE1 and E2. Then, by (4.2.158) and the isometric invariance of the quantumentropy, we find that

I(A′; B′|C)ω ≤ I(A′E1; B′E2|C)(V⊗W)ρ(V⊗W)† (4.2.163)

= I(A; B|C)ρ, (4.2.164)

as required.

Proposition 4.8 Uniform Continuity of Conditional Mutual Infor-mation

Let ρABC and σABC be tripartite quantum states such that

12‖ρABC − σABC‖1 ≤ ε, (4.2.165)

for ε ∈ [0, 1]. Then the following bound applies to their conditionalmutual informations:∣∣I(A; B|C)ρ − I(A; B|C)σ

∣∣ ≤ 2ε log2(mindA, dB) + 2g2(ε), (4.2.166)

where g2(ε) := (ε + 1) log2(ε + 1)− ε log2 ε.

249

Page 261: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

REMARK: See Section 2.3 for a definition of uniform continuity.

PROOF: Suppose without loss of generality that ε > 0 (otherwise the state-ment trivially holds). Let ωλ

ABC := λρABC + (1− λ) σABC for λ ∈ [0, 1]. Thenthe following inequality holds

λI(A; B|C)ρ + (1− λ) I(A; B|C)σ ≤ I(A; B|C)ωλ + h2(λ), (4.2.167)

because for the classical–quantum state

ωλABCX := λρABC ⊗ |0〉〈0|X + (1− λ) σABC ⊗ |1〉〈1|X, (4.2.168)

we have that

λI(A; B|C)ρ + (1− λ) I(A; B|C)σ = I(A; B|CX)ωλ (4.2.169)≤ I(AX; B|C)ωλ (4.2.170)= I(A; B|C)ωλ + I(X; B|CA)ωλ (4.2.171)≤ I(A; B|C)ωλ + H(X)ωλ (4.2.172)= I(A; B|C)ωλ + h2(λ). (4.2.173)

The first equality follows from (4.2.133). The first inequality follows from thechain rule and strong subadditivity. The second equality follows from thechain rule. The second inequality follows from the local entropy bound in(4.2.129). We also have that

I(A; B|C)ωλ ≤ I(AX; B|C)ωλ (4.2.174)= I(X; B|C)ωλ + I(A; B|CX)ωλ (4.2.175)≤ h2(λ) + λI(A; B|C)ρ + (1− λ) I(A; B|C)σ, (4.2.176)

which together imply that∣∣λI(A; B|C)ρ + (1− λ) I(A; B|C)σ − I(A; B|C)ωλ

∣∣ ≤ h2(λ). (4.2.177)

Then consider the state

ζABC :=1

1 + ε

(ρABC + [σABC − ρABC]+

), (4.2.178)

where [·]+ denotes the positive part of an operator, and for this choice, wehave that

11 + ε

ρABC +ε

1 + εξ1

ABC = ζABC (4.2.179)

250

Page 262: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

=1

1 + εσABC +

ε

1 + εξ2

ABC, (4.2.180)

analogous to the approach of Thales of Milete, where the states ξ1ABC and ξ2

ABCare defined as

ξ1ABC :=

1ε[σABC − ρABC]+ , (4.2.181)

ξ2ABC :=

1ε((1 + ε)ζABC − σABC) . (4.2.182)

Applying (4.2.177) to the convex decompositions above, we find that

11 + ε

(I(A; B|C)ρ − I(A; B|C)σ

)

≤ ε

1 + ε

(I(A; B|C)ξ2 − I(A; B|C)ξ1

)+ 2h2

1 + ε

)(4.2.183)

≤ ε

1 + εI(A; B|C)ξ2 + 2h2

1 + ε

)(4.2.184)

≤ ε

1 + ε2 log(mindA, dB) + 2h2

1 + ε

), (4.2.185)

where to obtain the last line we used the dimension bound in (4.2.127). Mul-tiplying through by 1 + ε and using the fact that g2(ε) = (1 + ε) h2

1+ε

), we

conclude that

I(A; B|C)ρ − I(A; B|C)σ ≤ 2ε log2(mindA, dB) + 2g2(ε). (4.2.186)

To arrive at the other inequality, we again apply (4.2.177) to the convex de-compositions above and find that

11 + ε

(I(A; B|C)σ − I(A; B|C)ρ

)

≤ ε

1 + ε

(I(A; B|C)ξ1 − I(A; B|C)ξ2

)+ 2h2

1 + ε

). (4.2.187)

Then we apply the same reasoning as above to find that

I(A; B|C)σ − I(A; B|C)ρ ≤ 2ε log2(mindA, dB) + 2g2(ε), (4.2.188)

concluding the proof.

251

Page 263: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

4.2.3 Quantum Mutual Information

The quantum mutual information of a bipartite state ρAB is a measure of allcorrelations between the A and B systems. We defined it previously in (4.1.8)and (4.2.95), and we recall its definition here:

I(A; B)ρ := H(A)ρ + H(B)ρ − H(AB)ρ. (4.2.189)

It can be understood as a special case of the conditional mutual informa-tion in which the C system is trivial (i.e., formally, a one-dimensional systemthat is in tensor product with ρAB and equal to the number one). As such,all of the properties of the conditional mutual information apply directly tomutual information, and we list them here for convenience:

Corollary 4.9 Non-Negativity of Quantum Mutual Information

For every state ρAB, the following inequality holds

I(A; B)ρ ≥ 0. (4.2.190)

We can view non-negativity in (4.2.190) as a consequence of strong subad-ditivity in (4.2.111). However, the conclusion in (4.2.190) follows more easilyas a consequence of non-negativity of quantum relative entropy for quantumstates from (4.2.72) because

I(A; B)ρ = D(ρAB‖ρA ⊗ ρB) ≥ 0. (4.2.191)

Proposition 4.10 Properties of Quantum Mutual Information

The quantum mutual information has the following properties:1. Symmetry: For every state ρAB, we have I(A; B)ρ = I(B; A)ρ.

2. Local entropy and dimension bounds: For every state ρAB, the followingbounds hold

I(A; B)ρ ≤ 2 minH(A)ρ, H(B)ρ (4.2.192)≤ 2 log2(mindA, dB). (4.2.193)

Let σXB be a classical–quantum state of the form

σXB = ∑x∈X

p(x)|x〉〈x|X ⊗ ρxB, (4.2.194)

252

Page 264: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

where p : X→ [0, 1] is a probability distribution over a finite alpha-bet X with associated |X|-dimensional system X, and ρx

Bx∈X is aset of states. Then the following bounds hold

I(X; B)σ ≤ H(X)σ ≤ log2 |X|. (4.2.195)

3. Additivity: For states ρA1B1 and τA2B2 , the following equality holdsfor the product state ρA1B1 ⊗ τA2B2 :

I(A1A2; B1B2)ρ⊗τ = I(A1; B1)ρ + I(A2; B2)τ. (4.2.196)

4. Direct-sum property: Let p : X → [0, 1] be a probability distributionover a finite alphabet X with associated |X|-dimensional system X,and let ρx

ABx∈X be a set of states. Then, for the classical–quantumstate

σXAB = ∑x∈X

p(x)|x〉〈x|X ⊗ ρxAB (4.2.197)

the following equality holds

I(A; B|X)σ = ∑x∈X

p(x)I(A; B)ρx . (4.2.198)

5. Data-processing inequality for local channels: For every state ρAB, thequantum mutual information is non-increasing under the action oflocal channels. In other words, for every state ρAB and all local chan-nels NA→A′ and MB→B′ , the following inequality holds

I(A; B)ρ ≥ I(A′; B′)ω, (4.2.199)

where ωA′B′ := (NA→A′ ⊗MB→B′)(ρAB).

Proposition 4.11 Uniform Continuity of Mutual Information

Let ρAB and σAB be bipartite quantum states such that

12‖ρAB − σAB‖1 ≤ ε, (4.2.200)

for ε ∈ [0, 1]. Then the following bound applies to their mutual informa-

253

Page 265: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

tions:∣∣I(A; B)ρ − I(A; B)σ

∣∣ ≤ 2ε log2(mindA, dB) + 2g2(ε), (4.2.201)

where g2(ε) := (ε + 1) log2(ε + 1)− ε log2 ε.

Proposition 4.12 Mutual Information of Classical–Quantum States

Let p : X → [0, 1] be a probability distribution over a finite alphabet Xwith associated |X|-dimensional system X, and let ρx

Ax∈X be a set ofstates. Then, for the classical–quantum state

ρXA = ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA (4.2.202)

the following equalities hold

I(X; A)ρ = H(ρA)− ∑x∈X

p(x)H(ρxA) (4.2.203)

= ∑x∈X

p(x)D(ρxA‖ρA), (4.2.204)

where ρA := ∑x∈X p(x)ρxA.

REMARK: The mutual information I(X; A)ρ of a classical–quantum state ρXA is oftencalled Holevo information, a term we use throughout this book.

PROOF: Consider from (4.2.95) that

I(X; A)ρ = D(ρXA‖ρX ⊗ ρA) (4.2.205)= Tr[ρXA log2 ρXA]− Tr[ρXA log2(ρX ⊗ ρA)]. (4.2.206)

Using (4.2.65) in the proof of Proposition 4.3, we find that

log2 ρXA = ∑x∈X|x〉〈x|X ⊗ log2 p(x)1A + ∑

x∈X|x〉〈x|X ⊗ log ρx

A, (4.2.207)

which leads to

Tr[ρXA log ρXA] = ∑x∈X

p(x) log2 p(x) + ∑x∈X

p(x)Tr[ρxA log2 ρx

A] (4.2.208)

254

Page 266: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= −H(X)− ∑x∈X

p(x)H(ρxA). (4.2.209)

Then, using log2(ρX ⊗ ρA) = log2 ρX ⊗ 1A + 1X ⊗ log2 ρA, we conclude that

Tr[ρXA log2(ρX ⊗ ρA)] = Tr[ρX log ρX] + Tr[ρA log2 ρA] (4.2.210)= −H(X)− H(ρA). (4.2.211)

However, ρA = ρA, so that

I(X; A)ρ = H(ρA)− ∑x∈X

p(x)H(ρxA), (4.2.212)

which is the statement in (4.2.203).

Now we prove (4.2.204). Starting from (4.2.203), consider that

H(ρA)− ∑x∈X

p(x)H(ρxA)

= −Tr[ρA log2 ρA] + ∑x∈X

p(x)Tr[ρxA log2 ρx

A] (4.2.213)

= − ∑x∈X

p(x)Tr[ρxA log2 ρA] + ∑

x∈Xp(x)Tr[ρx

A log2 ρxA] (4.2.214)

= ∑x∈X

p(x)Tr[ρxA(log2 ρx

A − log2 ρA)] (4.2.215)

= ∑x∈X

p(x)D(ρxA‖ρA), (4.2.216)

which establishes (4.2.204).

4.3 Generalized Divergences

The quantum relative entropy is a function on states and positive semi-defi-nite operators that satisfies the data-processing inequality under quantumchannels. The data-processing inequality is quite powerful and a unifyingconcept, in the sense that it allows for establishing many properties of in-formation and distinguishability measures. As such, this motivates the def-inition of generalized divergence as a function on states and positive semi-definite operators that satisfies the data-processing inequality.

255

Page 267: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Definition 4.13 Generalized Divergence

For every Hilbert space H, a function D : D(H)× L+(H) → R ∪ +∞is called a generalized divergence if it satisfies the data-processing inequal-ity under every channel, i.e., for all channels N, states ρ, and positivesemi-definite operators σ,

D(ρ‖σ) ≥ D(N(ρ)‖N(σ)). (4.3.1)

Many of the strong converse theorems in this book rely heavily on thedata-processing inequality, and so we employ the generalized divergence toemphasize this point.

We have already mentioned the quantum relative entropy as an exampleof a generalized divergence. Other examples discussed later in this chapter,which are relevant in the context of strong converse theorems, are the Petz–,sandwiched, and geometric Rényi relative entropies.

From the fact that generalized divergences satisfy the data-processing in-equality by definition, we immediately obtain two properties of interest.

Proposition 4.14 Basic Properties of the Generalized Divergence

For every generalized divergence D, for every state ρ, and every positivesemi-definite operator σ:

1. The generalized divergence is isometrically invariant, i.e.,

D(ρ‖σ) = D(VρV†‖VσV†), (4.3.2)

for all isometries V.

2. For every state τ,

D(ρ‖σ) = D(ρ⊗ τ‖σ⊗ τ). (4.3.3)

PROOF:

1. Since the map ρ 7→ VρV† is a channel, we immediately obtain D(ρ‖σ) ≥D(VρV†‖VσV†). To prove that D(ρ‖σ) ≤ D(VρV†‖VσV†), consider thechannel RV , which was defined in (3.2.72) as

RV(ω) = V†ωV + Tr[(1−VV†)ω]τ (4.3.4)

256

Page 268: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

for every positive semi-definite operator ω, where τ is an arbitrary (butfixed) state. Recall from Section 3.2.4.3 that this is a general way to takethe map ρ 7→ V†ρV, which is completely positive but not trace preserv-ing, and make it trace preserving and hence a channel. Then, recall thatRV(VρV†) = ρ and RV(VσV†) = σ. Therefore, by the data-processinginequality,

D(VρV†‖VσV†) ≥ D(RV(VρV†)‖RV(VσV†)) = D(ρ‖σ), (4.3.5)

and so D(ρ‖σ) = D(VρV†‖VσV†).

2. Since taking the tensor product with a fixed state is a channel (recall Defi-nition 3.22), by definition of generalized divergence we obtain D(ρ‖σ) ≥D(ρ ⊗ τ‖σ ⊗ τ). On the other hand, the partial trace is also a channel,and so by discarding the second system in the operators ρ⊗ τ and σ⊗ τ,we obtain

D(ρ‖σ) = D(Tr2(ρ⊗ τ)‖Tr2(σ⊗ τ)) ≤ D(ρ⊗ τ‖σ⊗ τ), (4.3.6)

which means that D(ρ‖σ) = D(ρ⊗ τ‖σ⊗ τ).

Proposition 4.15

Suppose that the generalized divergence obeys the following direct-sumproperty:

D(ρXA‖σXA) = ∑x∈X

p(x)D(ρxA‖σx

A), (4.3.7)

where X is a finite alphabet, p : X → [0, 1] is a probability distribution,ρx

Ax∈X is a set of states, σxAx∈X is a set of positive semi-definite op-

erators, and

ρXA := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA, (4.3.8)

σXA := ∑x∈X

p(x)|x〉〈x|X ⊗ σxA. (4.3.9)

Then the generalized divergence is jointly convex; i.e., the following in-equality holds

∑x∈X

p(x)D(ρxA‖σx

A) ≥ D(ρA‖σA), (4.3.10)

where ρA := ∑x∈X p(x)ρxA and σA := ∑x∈X p(x)σx

A.

257

Page 269: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

PROOF: The proof is the same as the proof of Proposition 4.5 with D replacedby D.

Now, just as we defined entropic quantities like the entropy, conditionalentropy, and mutual information using the quantum relative entropy, we candefine their generalized counterparts using the generalized divergence.

Definition 4.16 Generalized Information Measures for States

Let D be a generalized divergence, as given in Definition 4.13.1. The generalized quantum entropy H(ρ), for a state ρ, is defined as

H(ρ) := −D(ρ‖1). (4.3.11)

2. The generalized quantum conditional entropy H(A|B)ρ, for a bipartitestate ρAB, is defined as

H(A|B)ρ := − infσB∈D(HB)

D(ρAB‖1A ⊗ σB), (4.3.12)

and the generalized coherent information I(A〉B)ρ, for a bipartite stateρAB, is defined as

I(A〉B)ρ := infσB∈D(HB)

D(ρAB‖1A ⊗ σB). (4.3.13)

3. The generalized quantum mutual information I(A; B)ρ, for a bipartitestate ρAB, is defined as

I(A; B)ρ := infσB∈D(HB)

D(ρAB‖ρA ⊗ σB). (4.3.14)

The data-processing inequality for every generalized divergence trans-lates to the derived generalized information measures for states in the fol-lowing way.

Proposition 4.17 Data-Processing Inequality for Generalized Infor-mation Measures for States

Let D be a generalized divergence.1. The generalized quantum entropy does not decrease under the ac-

258

Page 270: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

tion of a unital channel, i.e.,

H(ρ) ≤ H(N(ρ)), (4.3.15)

for every state ρ and every unital channel N.

2. For every bipartite state ρAB, the generalized quantum conditionalentropy does not decrease under the action of an arbitrary unitalchannel on A and an arbitrary channel on B, i.e.,

H(A|B)ρ ≤ H(A′|B′)ρ′ , (4.3.16)

where ρ′A′B′ := (NA→A′ ⊗MB→B′)(ρAB), with NA→A′ an arbitraryunital channel and MB→B′ an arbitrary channel. It follows by defi-nition that

I(A〉B)ρ ≥ I(A′〉B′)ρ′ . (4.3.17)

3. For every bipartite state ρAB, the generalized quantum mutual in-formation does not increase under the action of arbitrary channelson A and B, i.e.,

I(A; B)ρ ≥ I(A′; B′)ρ′ , (4.3.18)

where ρ′A′B′ = (NA→A′ ⊗MB→B′)(ρAB), with NA→A′ and MB→B′arbitrary channels.

PROOF:

1. Let ρ be an arbitrary state, and let N be a unital channel. The unitalityof N means, by definition, that N(1) = 1. Using this, along with thedata-processing inequality for the generalized divergence D, we obtain

H(N(ρ)) = −D(N(ρ)‖1) (4.3.19)= −D(N(ρ)‖N(1)) (4.3.20)≥ −D(ρ‖1) (4.3.21)= H(ρ), (4.3.22)

as required.

2. Let ρAB be an arbitrary bipartite state, let NA→A′ be an arbitrary unitalchannel, and let MB→B′ be an arbitrary channel. Also, let σB be an arbi-trary state. Then, using the data-processing inequality of the generalized

259

Page 271: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

divergence D and the unitality of N, we obtain

D(ρAB‖1A ⊗ σB) ≥ D((N⊗M)(ρAB)‖(N⊗M)(1A ⊗ σB)) (4.3.23)= D(ρ′A′B′‖N(1A)⊗M(σB)) (4.3.24)= D(ρ′A′B′‖1A′ ⊗M(σB)) (4.3.25)≥ inf

σB′D(ρ′A′B′‖1A′ ⊗ σB′) (4.3.26)

= −H(A′|B′)ρ′ . (4.3.27)

Since the state σB is arbitrary, we find that

H(A|B)ρ = − infσB

D(ρAB‖1A ⊗ σB) ≤ H(A′|B′)ρ′ , (4.3.28)

as required. The data-processing inequality in (4.3.17) for the generalizedcoherent information follows from the fact that, by definition, I(A〉B)ρ =−H(A|B)ρ.

3. Let ρAB be an arbitrary bipartite state, and let NA→A′ and MB→B′ be ar-bitrary channels. Also, let σB be an arbitrary state. Then, using the data-processing inequality for the generalized divergence D, we obtain

D(ρAB‖ρA ⊗ σB) ≥ D((N⊗M)(ρAB)‖(N⊗M)(ρA ⊗ σB)) (4.3.29)= D(ρ′A′B′‖ρ′A′ ⊗M(σB)) (4.3.30)≥ inf

σB′D(ρ′A′B′‖ρ′A′ ⊗ σB′) (4.3.31)

= I(A′; B′)ρ′ . (4.3.32)

Since the state σB is arbitrary, we find that

I(A; B)ρ = infσB

D(ρAB‖ρA ⊗ σB) ≥ I(A′; B′)ρ′ , (4.3.33)

as required.

4.4 Petz–Rényi Relative Entropy

One important example of a generalized divergence is the Petz–Rényi relativeentropy.

260

Page 272: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Definition 4.18 Petz–Rényi Relative Entropy

For all α ∈ (0, 1) ∪ (1, ∞), we define the Petz–Rényi relative quasi-entropyfor every state ρ and positive semi-definite operator σ as

Qα(ρ‖σ) :=

Tr[ρασ1−α]if α ∈ (0, 1), orα ∈ (1, ∞) and supp(ρ) ⊆ supp(σ),

+∞ otherwise.(4.4.1)

The Petz–Rényi relative entropy is then defined as

Dα(ρ‖σ) :=1

α− 1log2 Qα(ρ‖σ). (4.4.2)

By following the recipe in (4.3.11), i.e., setting σ = 1 and applying a minussign out in front, the Petz–Rényi relative entropy reduces to the Rényi entropyof a quantum state ρ:

Hα(ρ) = −Dα(ρ‖1) =1

1− αlog2 Tr[ρα] =

α

1− αlog2 ‖ρ‖α . (4.4.3)

If the state ρ is defined on system A, we also employ the notation

Hα(A)ρ ≡ Hα(ρ). (4.4.4)

The Rényi entropy is defined for all α ∈ (0, 1) ∪ (1, ∞), and one evaluates itsvalue at α ∈ 0, 1, ∞ by taking limits:

H0(ρ) := limα→0

Hα(ρ) = log2 rank(ρ), (4.4.5)

H1(ρ) := limα→1

Hα(ρ) = −Tr[ρ log2 ρ] = H(ρ), (4.4.6)

H∞(ρ) := limα→∞

Hα(ρ) = − log λmax(ρ). (4.4.7)

Turning back to the Petz–Rényi relative entropy, note that 1− α is negativefor α > 1. In this case, the inverse of σ is taken on its support. An alternativeto this convention is to define Dα(ρ‖σ) for α > 1 for only positive definite σand for positive semi-definite σ define Dα(ρ‖σ) = limε→0 Dα(ρ‖σ + ε1). Bothalternatives are equivalent, as we now show.

261

Page 273: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.19

For every state ρ and positive semi-definite operator σ,

Dα(ρ‖σ) = limε→0+

1α− 1

log2 Tr[ρα(σ + ε1)1−α

]. (4.4.8)

PROOF: For α ∈ (0, 1), this is immediate from the fact that the logarithm,trace, and power functions are continuous, so that the limit can be broughtinside the trace and inside the power (σ + ε1)1−α.

For α ∈ (1, ∞), since 1− α is negative and σ is not necessarily invertible,we first decompose the underlying Hilbert space H as H = supp(σ)⊕ker(σ),just as we did in (4.2.6), in order to write

ρ =

(ρ0,0 ρ0,1ρ†

0,1 ρ1,1

), σ =

(σ 00 0

). (4.4.9)

Then, writing 1 = Πσ + Π⊥σ , where Πσ is the projection onto the support ofσ and Π⊥σ is the projection onto the orthogonal complement of supp(σ), wefind that

σ + ε1 =

(σ + εΠσ 0

0 εΠ⊥σ

), (4.4.10)

which implies that

(σ + ε1)1−α =

((σ + εΠσ)1−α 0

0 (εΠ⊥σ )1−α

). (4.4.11)

If supp(ρ) ⊆ supp(σ), then ρ = ρ0,0, ρ0,1 = 0, and ρ1,1 = 0, which meansthat

ρα(σ + ε1)1−α =

(ρα(σ + εΠσ)1−α 0

0 0

), (4.4.12)

so that

limε→0+

1α− 1

log2 Tr[ρα(σ + ε1)1−α

]

= limε→0+

1α− 1

log2 Tr[ρα(σ + εΠσ)

1−α]

(4.4.13)

=1

α− 1log2 Tr

[ρασ1−α

](4.4.14)

262

Page 274: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= Dα(ρ‖σ), (4.4.15)

as required.

If supp(ρ) * supp(σ), then the blocks ρ0,1 and ρ1,1 are generally non-zero,and we obtain

ρα(σ + ε1)1−α

=

(ρ0,0 ρ0,1ρ†

0,1 ρ1,1

)α ((σ + εΠσ)1−α 0

0 (εΠ⊥σ )1−α

)(4.4.16)

= ε1−α

[(ρ0,0 ρ0,1ρ†

0,1 ρ1,1

)α (εα−1(σ + εΠσ)1−α 0

0 (Π⊥σ )1−α

)]. (4.4.17)

Due to the fact that α ∈ (1, ∞) it holds that limε→0+ ε1−α = +∞, and since thelimit ε→ 0+ of the matrix in square brackets in (4.4.17) is finite, we find that

limε→0

1α− 1

log2 Tr[ρα(σ + ε1)1−α

]= +∞ = Dα(ρ‖σ) (4.4.18)

for the case α ∈ (1, ∞) and supp(ρ) * supp(σ).

The Petz–Rényi relative entropy is a natural generalization of the classicalRényi relative entropy, which is defined for a probability distribution p : X→[0, 1] and a positive measure q : X→ [0, ∞) over the same alphabet X as

Dα(p‖q) = 1α− 1

log2 ∑x∈X

p(x)αq(x)1−α (4.4.19)

for α ∈ (0, 1). For α ∈ (1, ∞), the same expression defines Dα(p‖q) wheneverq(x) = 0 implies p(x) = 0 for all x ∈ X; otherwise, Dα(p‖q) = +∞.

Recall the quantum Chernoff bound from Theorem 3.14 in Section 3.1.3,which states that optimal error exponent for the task of discriminating be-tween the states ρ and σ is

ξ(ρ, σ) = ξ(ρ, σ) = C(ρ‖σ) := supα∈(0,1)

− log2 Tr[ρασ1−α]. (4.4.20)

Using the definition of the Petz–Rényi relative entropy, we find that

C(ρ‖σ) = infα∈(0,1)

(1− α)Dα(ρ‖σ). (4.4.21)

263

Page 275: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

The Petz–Rényi relative entropy thus plays a role in providing the optimal er-ror exponent for the task of discriminating between two states (i.e., symmetrichypothesis testing).

Like the quantum relative entropy, the Petz–Rényi relative entropy is faith-ful, meaning that, for α ∈ (0, 1) ∪ (1, ∞) and states ρ, σ,

Dα(ρ‖σ) = 0 ⇐⇒ ρ = σ. (4.4.22)

We prove this in Proposition 4.34 in the next section, as it requires results fromboth this section and the next one. Also, like the quantum relative entropy,the Petz–Rényi relative entropy is a generalized divergence for certain valuesof α, which is shown in Theorem 4.22 below.

Before getting to Theorem 4.22, we first discuss several important prop-erties of the Petz–Rényi relative entropy. Let us note that, if ρ and σ act on ad-dimensional Hilbert space and are invertible, then the Petz–Rényi relativequasi-entropy can be written as

Qα(ρ‖σ) = 〈ϕρ|(ρ−1 ⊗ σT)1−α|ϕρ〉, (4.4.23)

where |ϕρ〉 := (ρ12 ⊗ 1)|Γ〉 is a purification of ρ and |Γ〉 = ∑d

i=1 |i, i〉. This isdue to the transpose trick in (2.2.30) and the identity in (2.2.33). We can ex-tend the applicability of this expression to states ρ and positive semi-definiteoperators σ that are not invertible by noting that

Qα(ρ‖σ) = limε→0+

limδ→0+

1α− 1

log2 Tr[((1− δ)ρ + δπ)α (σ + ε1)1−α

]. (4.4.24)

We start by establishing the important fact that the quantum relative en-tropy is a special case of the Petz–Rényi relative entropy in the limit α→ 1.

Proposition 4.20

Let ρ be a state and σ a positive semi-definite operator. Then, in thelimit α → 1, the Petz–Rényi relative entropy converges to the quantumrelative entropy:

limα→1

Dα(ρ‖σ) = D(ρ‖σ). (4.4.25)

PROOF: Let us first consider the case α ∈ (1, ∞). If supp(ρ) * supp(σ),then Dα(ρ‖σ) = +∞, so that limα→1+ Dα(ρ‖σ) = +∞, consistent with the

264

Page 276: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

definition of the quantum relative entropy in this case (see Definition 4.1). Ifsupp(ρ) ⊆ supp(σ), then Dα(ρ‖σ) is finite and we can write

Dα(ρ‖σ) =1

α− 1log2 Qα(ρ‖σ). (4.4.26)

Now, let us define the function

Qα,β(ρ‖σ) := Tr[ρασ1−β], (4.4.27)

so that Qα(ρ‖σ) = Qα,α(ρ‖σ). By noting that supp(ρ) ⊆ supp(σ) impliesQ1(ρ‖σ) = Tr[ρΠσ] = 1 (where Πσ is the projection onto the support of σ),and since log2 1 = 0, we can write Dα(ρ‖σ) as

Dα(ρ‖σ) =log2 Qα(ρ‖σ)− log2 Q1(ρ‖σ)

α− 1, (4.4.28)

so that

limα→1

Dα(ρ‖σ) =d

dαlog2 Qα(ρ‖σ)

∣∣∣∣α=1

(4.4.29)

=1

ln(2)

ddα Qα(ρ‖σ)

∣∣∣α=1

Q1(ρ‖σ)(4.4.30)

=1

ln(2)d

dαQα(ρ‖σ)

∣∣∣∣α=1

, (4.4.31)

where to obtain the first equality we used the definition of the derivative andto obtain the second equality we used the derivative of the natural logarithmalong with the chain rule. Using the function Qα,β and the chain rule, wewrite

ddα

Qα(ρ‖σ)∣∣∣∣α=1

=d

dαQα,1(ρ‖σ)

∣∣∣∣α=1

+d

dβQ1,β(ρ‖σ)

∣∣∣∣β=1

. (4.4.32)

Then,

ddα

Qα,1(ρ‖σ) =d

dαTr[ραΠσ] =

ddα

Tr[ρα] = Tr[ρα ln ρ], (4.4.33)

where we used the fact that ραΠσ = ρα since supp(ρ) ⊆ supp(σ). Therefore,

ddα

Qα,1(ρ‖σ)∣∣∣∣α=1

= Tr[ρ ln ρ]. (4.4.34)

265

Page 277: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Similarly,d

dβQ1,β(ρ‖σ) =

ddβ

Tr[ρσ1−β] = −Tr[ρσ1−β ln σ], (4.4.35)

so thatd

dβQ1,β(ρ‖σ)

∣∣∣∣β=1

= −Tr[ρΠσ ln σ] = −Tr[ρ ln σ], (4.4.36)

where to obtain the last equality we again used the fact that the support con-dition supp(ρ) ⊆ supp(σ) holds. So we find that

limα→1

Dα(ρ‖σ) =1

ln(2)d

dαQα(ρ‖σ)

∣∣∣∣α=1

= Tr[ρ log2 ρ]− Tr[ρ log2 σ] (4.4.37)

when supp(ρ) ⊆ supp(σ), which means that for α ∈ (1, ∞),

limα→1+

Dα(ρ‖σ) =

Tr[ρ log2 ρ]− Tr[ρ log2 σ] if supp(ρ) ⊆ supp(σ),+∞ otherwise

= D(ρ‖σ).(4.4.38)

Let us now consider the case α ∈ (0, 1). If supp(ρ) ⊆ supp(σ), then sincethe limit in (4.4.37) holds from both sides, we find that

limα→1−

Dα(ρ‖σ) = Tr[ρ log2 ρ]− Tr[ρ log2 σ]. (4.4.39)

If supp(ρ) * supp(σ) (and Tr[ρσ] 6= 0), then observe that we can write Dα as

Dα(ρ‖σ) =log2 Qα(ρ‖σ)− log2 Q1(ρ‖σ)

α− 1+

log2 Q1(ρ‖σ)α− 1

, (4.4.40)

so that

limα→1−

Dα(ρ‖σ) =d

dαlog2 Qα(ρ‖σ)

∣∣∣∣α=1

+ limα→1−

log2 Tr[ρΠσ]

α− 1, (4.4.41)

where we have used Q1(ρ‖σ) = Tr[ρΠσ]. Now, since Tr[ρσ] 6= 0 and supp(ρ)* supp(σ), we have that 0 < Tr[ρΠσ] < 1, which means that log2 Tr[ρΠσ] <

0. Since limα→1−1

α−1 = −∞, we get that the second term in (4.4.41) is equalto +∞, which means that limα→1− Dα(ρ‖σ) = +∞. Therefore,

limα→1−

Dα(ρ‖σ)

=

Tr[ρ log2 ρ]− Tr[ρ log2 σ] if supp(ρ) ⊆ supp(σ),+∞ otherwise

= D(ρ‖σ).

(4.4.42)

266

Page 278: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

To conclude, we have that limα→1+ Dα(ρ‖σ) = limα→1− Dα(ρ‖σ) = D(ρ‖σ),which means that

limα→1

Dα(ρ‖σ) = D(ρ‖σ), (4.4.43)

as required.

Proposition 4.21 Properties of the Petz–Rényi Relative Entropy

For all states ρ, ρ1, ρ2 and positive semi-definite operators σ, σ1, σ2, thePetz–Rényi relative entropy satisfies the following properties.

1. Isometric invariance: For all α ∈ (0, 1) ∪ (1, ∞) and for all isome-tries V,

Dα(ρ‖σ) = Dα(VρV†‖VσV†). (4.4.44)

2. Monotonicity in α: For all α ∈ (0, 1) ∪ (1, ∞), Dα is monotonicallyincreasing in α, i.e., α < β implies Dα(ρ‖σ) ≤ Dβ(ρ‖σ).

3. Additivity: For all α ∈ (0, 1) ∪ (1, ∞),

Dα(ρ1 ⊗ ρ2‖σ1 ⊗ σ2) = Dα(ρ1‖σ1) + Dα(ρ2‖σ2). (4.4.45)

4. Direct-sum property: Let p : X → [0, 1] be a probability distributionover a finite alphabet X with associated |X|-dimensional system X,and let q : X → [0, ∞) be a positive function on X. Let ρx

Ax∈X bea set of states on a system A, and let σx

Ax∈X be a set of positivesemi-definite operators on A. Then,

Qα(ρXA‖σXA) = ∑x∈X

p(x)αq(x)1−αQα(ρxA‖σx

A), (4.4.46)

where

ρXA := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA, (4.4.47)

σXA := ∑x∈X

q(x)|x〉〈x|X ⊗ σxA. (4.4.48)

REMARK: Observe that the direct-sum property analogous to that for the quantum rel-ative entropy (see Proposition 4.3) does not hold for the Petz–Rényi relative entropy forevery α ∈ (0, 1)∪ (1, ∞). We can instead only make a statement for the Petz–Rényi relativequasi-entropy.

267

Page 279: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

PROOF:

1. Proof of isometric invariance: Let us start by writing Dα(ρ‖σ) as in (4.4.8):

Dα(ρ‖σ) = limε→0+

1α− 1

log2 Tr[ρα(σ + ε1)1−α

]. (4.4.49)

Then, using the fact that (VρV†)α = VραV†, we find that

Dα(VρV†‖VσV†)

= limε→0+

1α− 1

log2 Tr[(VρV†)α(VσV† + ε1)1−α

](4.4.50)

= limε→0+

1α− 1

log2 Tr[VραV†(VσV† + ε1)1−α

]. (4.4.51)

Now, let Π := VV† be the projection onto the image of V, and let Π :=1−Π. Then, we can write

VσV† + ε1 = VσV† + εΠ + εΠ = V(σ + ε1)V† + εΠ. (4.4.52)

Since V(σ + ε1)V† and εΠ are supported on orthogonal subspaces, weobtain

(VσV† + ε1)1−α = V(σ + ε1)1−αV† + ε1−αΠ. (4.4.53)

Therefore,

Tr[VραV†(VσV† + ε1)1−α

]

= Tr[VραV†V(σ + ε1)1−αV† + ε1−αVραV†Π

](4.4.54)

= Tr[Vρα(σ + ε1)1−αV†

](4.4.55)

= Tr[ρα(σ + ε1)1−α

], (4.4.56)

where to obtain the second equality we used the fact that V†ΠV = V†V−V†VV†V = 1− 1 = 0, and to obtain the last equality we used cyclicity ofthe trace. Therefore,

Dα(VρV†‖VσV†) = limε→0+

1α− 1

log2 Tr[ρα(σ + ε1)1−α

](4.4.57)

= Dα(ρ‖σ), (4.4.58)

as required.

268

Page 280: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

2. Proof of monotonicity in α: Using the expression in (4.4.2) for Dα along withthe form in (4.4.23) for the quasi-entropy Qα, let us write Dα(ρ‖σ) as

Dα(ρ‖σ) =1

α− 1ln〈ϕρ|X1−α|ϕρ〉

ln(2)= − 1

γ

ln〈ϕρ|Xγ|ϕρ〉ln(2)

, (4.4.59)

where X = ρ−1 ⊗ σT, γ := 1− α, and |ϕρ〉 = (ρ12 ⊗ 1)|Γ〉 is a purification

of ρ. We first prove the result for ρ invertible, and the proof for non-invertible states ρ follows by (4.4.24). Now, since d

dα = ddγ

dγdα = − d

dγ , wefind that

ddα

Dα(ρ‖σ)

=1

ln(2)d

(1γ

ln〈ϕρ|Xγ|ϕρ〉)

(4.4.60)

=1

ln(2)

(− 1

γ2 ln〈ϕρ|Xγ|ϕρ〉+ 1γ

〈ϕρ|Xγ ln X|ϕρ〉〈ϕρ|Xγ|ϕρ〉

)(4.4.61)

=1

ln(2)−〈ϕρ|Xγ|ϕρ〉 ln〈ϕρ|Xγ|ϕρ〉+ γ〈ϕρ|Xγ ln X|ϕρ〉

γ2〈ϕρ|Xγ|ϕρ〉 (4.4.62)

=1

ln(2)−〈ϕρ|Xγ|ϕρ〉 ln〈ϕρ|Xγ|ϕρ〉+ 〈ϕρ|Xγ ln Xγ|ϕρ〉

γ2〈ϕρ|Xγ|ϕρ〉 . (4.4.63)

Letting g(x) := x log2 x, it follows that

ddα

Dα(ρ‖σ) =〈ϕρ|g(Xγ)|ϕρ〉 − g(〈ϕρ|Xγ|ϕρ〉)

γ2〈ϕρ|Xγ|ϕρ〉 . (4.4.64)

Then, since g is operator convex, by the operator Jensen inequality in(2.3.22), we conclude that

〈ϕρ|g(Xγ)|ϕρ〉 ≥ g(〈ϕρ|Xγ|ϕρ〉), (4.4.65)

which means that ddα Dα(ρ‖σ) ≥ 0. Therefore, Dα(ρ‖σ) is monotonically

increasing in α, as required.

3. Proof of additivity: When all quantities are finite, we have that

Dα(ρ1 ⊗ ρ2‖σ1 ⊗ σ2) =1

α− 1log2 Tr

[(ρ1 ⊗ ρ2)

α(σ1 ⊗ σ2)1−α]

. (4.4.66)

Using the fact that (X ⊗ Y)β = Xβ ⊗ Yβ for all positive semi-definiteoperators X, Y and all β ∈ R, we obtain

Qα(ρ1 ⊗ ρ2‖σ1 ⊗ σ2) = Tr[(ρ1 ⊗ ρ2)

α(σ1 ⊗ σ2)1−α]

(4.4.67)

269

Page 281: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= Tr[(ρα

1 ⊗ ρα2)(σ

1−α1 ⊗ σ1−α

2 )]

(4.4.68)

= Tr[ρα

1σ1−α1 ⊗ ρα

2σ1−α2

](4.4.69)

= Tr[ρα1σ1−α

1 ] · Tr[ρα2σ1−α

2 ] (4.4.70)= Qα(ρ1‖σ1) ·Qα(ρ2‖σ2). (4.4.71)

Applying 1α−1 log2 and definitions, additivity follows.

4. Proof of the direct-sum property: Define the classical–quantum operators

ρXA := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA, σXA := ∑

x∈Xq(x)|x〉〈x|X ⊗ σx

A. (4.4.72)

Then,

ραXA = ∑

x∈X|x〉〈x|X ⊗ (p(x)ρx

X)α (4.4.73)

σ1−αXA = ∑

x∈X|x〉〈x|X ⊗ (q(x)σx

A)1−α, (4.4.74)

so that

ραXAσ1−α

XA = ∑x∈X|x〉〈x|X ⊗ (p(x)ρx

A)α(q(x)σx

A)1−α (4.4.75)

= ∑x∈X

p(x)αq(x)1−α|x〉〈x|X ⊗ (ρxA)

α(σxA)

1−α, (4.4.76)

and

Qα(ρXA‖σXA) = Tr[ραXAσ1−α

XA ] (4.4.77)

= ∑x∈X

p(x)αq(x)1−αTr[(ρxA)

α(σxA)

1−α] (4.4.78)

= ∑x∈X

p(x)αq(x)1−αQα(ρxA‖σx

A), (4.4.79)

as required.

We now prove the data-processing inequality for the Petz–Rényi relativeentropy for α ∈ [0, 1) ∪ (1, 2].

270

Page 282: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Theorem 4.22 Data-Processing Inequality for Petz–Rényi RelativeEntropy

Let ρ be a state, σ a positive semi-definite operator, and N a quantumchannel. Then, for all α ∈ [0, 1) ∪ (1, 2],

Dα(ρ‖σ) ≥ Dα(N(ρ)‖N(σ)). (4.4.80)

PROOF: We prove the statement for α ∈ (0, 1) ∪ (1, 2]. The case of α = 0then follows by taking the limit α→ 0. From Stinespring’s theorem (Theorem3.18), we know that the action of every channel N on a linear operator X canbe written as

N(X) = TrE[VXV†], (4.4.81)

for some V, where V is an isometry and E is an auxiliary system with di-mension dE ≥ rank(ΓN). As stated in (4.4.44), Dα is isometrically invariant.Therefore, it suffices to show the data-processing inequality for Dα under par-tial trace; i.e., it suffices to show that for every state ρAB and every positivesemi-definite operator σAB,

Dα(ρAB‖σAB) ≥ Dα(ρA‖σA), α ∈ (0, 1) ∪ (1, 2]. (4.4.82)

We now proceed to prove this inequality. We prove it for ρAB, and henceρA, invertible, as well as for σAB and σA invertible. The result follows inthe general case of ρAB and/or ρA non-invertible, as well as σAB and/or σAnon-invertible, by applying the result to the invertible operators (1− δ)ρAB +δπAB and σAB + ε1AB, with δ, ε > 0, and taking the limit δ→ 0+, followed byε→ 0+, because

Dα(ρAB‖σAB) = limε→0+

limδ→0+

Dα((1− δ)ρAB + δπAB‖σAB + ε1AB), (4.4.83)

Dα(ρA‖σA) = limε→0+

limδ→0+

Dα((1− δ)ρA + δπA‖σA + dBε1A), (4.4.84)

which can be verified in a similar manner to the proof of (4.4.8) in Proposi-tion 4.19.

Using the quasi-entropy Qα, we can equivalently write (4.4.82) as

Qα(ρAB‖σAB) ≥ Qα(ρA‖σA), for α ∈ (1, 2],Qα(ρAB‖σAB) ≤ Qα(ρA‖σA), for α ∈ (0, 1).

(4.4.85)

The remainder of this proof is thus devoted to establishing (4.4.85).

271

Page 283: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Consider that

Qα(ρAB‖σAB) = 〈ϕρAB | f (ρ−1AB ⊗ σT

AB)|ϕρAB〉, (4.4.86)

Qα(ρA‖σA) = 〈ϕρA | f (ρ−1A ⊗ σT

A)|ϕρA〉, (4.4.87)

where we have setf (x) := x1−α (4.4.88)

and

|ϕρAB〉 = (ρ12AB ⊗ 1AB)|Γ〉ABAB = (ρ

12AB ⊗ 1AB)|Γ〉AA|Γ〉BB, (4.4.89)

|ϕρA〉 = (ρ12A ⊗ 1A)|Γ〉AA. (4.4.90)

Now, let us define the isometry VAA→ABAB as1

VAA→ABAB = ρ12AB(ρ

− 12

A ⊗ 1A)|Γ〉BB. (4.4.91)

Observe then that

VAA→ABAB|ϕρA〉AA = ρ12AB(ρ

− 12

A ⊗ 1A)(ρ12A ⊗ 1A)|Γ〉AA|Γ〉BB (4.4.92)

= (ρ12AB ⊗ 1AB)|Γ〉AA|Γ〉BB (4.4.93)

= |ϕρAB〉. (4.4.94)

We thus obtain, using the operator Jensen inequality (Theorem 2.15),

Qα(ρAB‖σAB) = 〈ϕρA |V† f (ρ−1AB ⊗ σT

AB)V|ϕρA〉 (4.4.95)

≥ 〈ϕρA | f (V†(ρ−1AB ⊗ σT

AB)V)|ϕρA〉, for α ∈ (1, 2], (4.4.96)

and

Qα(ρAB‖σAB) = 〈ϕρA |V† f (ρ−1AB ⊗ σT

AB)V|ϕρA〉 (4.4.97)

≤ 〈ϕρA | f (V†(ρ−1AB ⊗ σT

AB)V)|ϕρA〉, for α ∈ [0, 1). (4.4.98)

Note that the operator Jensen inequality is applicable because for α ∈ (1, 2]the function f in (4.4.88) is operator convex and for α ∈ (0, 1) it is operatorconcave.2 Now, consider that

V†(ρ−1AB ⊗ σT

AB)V

1Observe that the isometry is related to the isometric extension in (3.2.274) of the Petzrecovery channel for the partial trace, as discussed in Section 3.2.10.

2Indeed, the function xβ is operator convex for β ∈ [−1, 0) ∪ [1, 2] and operator concavefor β ∈ (0, 1], where here β = 1− α.

272

Page 284: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= 〈Γ|BB(ρ− 1

2A ⊗ 1A)ρ

12AB(ρ

−1AB ⊗ σT

AB)ρ12AB(ρ

− 12

A ⊗ 1A)|Γ〉BB (4.4.99)

= 〈Γ|BB(ρ− 1

2A ⊗ 1A)(ρ

0AB ⊗ σT

AB)(ρ− 1

2A ⊗ 1A)|Γ〉BB (4.4.100)

= 〈Γ|BB(ρ− 1

2A ⊗ 1A)(1AB ⊗ σT

AB)(ρ− 1

2A ⊗ 1A)|Γ〉BB (4.4.101)

= ρ−1A ⊗ 〈Γ|BBσT

AB|Γ〉BB (4.4.102)

= ρ−1A ⊗ σT

A, (4.4.103)

where to obtain the last equality we used the fact that

〈Γ|BBσT

AB|Γ〉BB = TrB[σT

AB] = σT

A, (4.4.104)

the last equality due to the fact that the transpose is taken on a product basisfor HA ⊗HB. Therefore, we find that

Qα(ρAB‖σAB) ≥ Qα(ρA‖σA), for α ∈ (1, 2], (4.4.105)Qα(ρAB‖σAB) ≤ Qα(ρA‖σA), for α ∈ (0, 1), (4.4.106)

as required. This establishes the data-processing inequality for Dα underpartial trace. Combining this with the isometric invariance of Dα and Stine-spring’s theorem, we conclude that

Dα(ρ‖σ) ≥ Dα(N(ρ)‖N(σ)), α ∈ (0, 1) ∪ (1, 2] (4.4.107)

for every state ρ, positive semi-definite operator σ, and channel N.

By taking the limit α → 1 in the statement of data-processing inequalityfor Dα, and using Proposition 4.20, we immediately obtain the data-processinginequality for the quantum relative entropy, stated previously as Theorem 4.4.

Corollary 4.23 Data-Processing Inequality for Quantum RelativeEntropy

Let ρ be a state, σ a positive semi-definite operator, and N a quantumchannel. Then,

D(ρ‖σ) ≥ D(N(ρ)‖N(σ)). (4.4.108)

The data-processing inequality for the Petz–Rényi relative entropy can bewritten using the Petz–Rényi relative quasi-entropy Qα as

1α− 1

log2 Qα(ρ‖σ) ≥1

α− 1log2 Qα(N(ρ)|N(σ)) (4.4.109)

273

Page 285: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

for all α ∈ [0, 1)∪ (1, 2]. Then, since α− 1 is negative for α ∈ [0, 1), we can usethe monotonicity of the function log2 to conclude that

Qα(ρ‖σ) ≥ Qα(N(ρ)‖N(σ)), α ∈ (1, 2], (4.4.110)Qα(ρ‖σ) ≤ Qα(N(ρ)‖N(σ)), α ∈ [0, 1). (4.4.111)

With the data-processing inequality for the Petz–Rényi relative entropyin hand, it is now straightforward to prove some of the following additionalproperties.

Proposition 4.24 Additional Properties of Petz–Rényi Relative En-tropy

The Petz–Rényi relative entropy Dα satisfies the following properties forevery state ρ and positive semi-definite operator σ for α ∈ (0, 1) ∪ (1, 2]:

1. If Tr(σ) ≤ Tr(ρ) = 1, then Dα(ρ‖σ) ≥ 0.

2. Faithfulness: If Tr[σ] ≤ 1, we have that Dα(ρ‖σ) = 0 if and only ifρ = σ.

3. If ρ ≤ σ, then Dα(ρ‖σ) ≤ 0.

4. For every positive semi-definite operator σ′ such that σ′ ≥ σ, wehave Dα(ρ‖σ) ≥ Dα(ρ‖σ′).

PROOF:

1. By the data-processing inequality for Dα with respect to the trace channelTr, and letting x = Tr(ρ) = 1 and y = Tr(σ), we find that

Dα(ρ‖σ) ≥ Dα(x‖y) = 1α− 1

log2 Tr[xαy1−α] (4.4.112)

=1

α− 1log2(y

1−α) (4.4.113)

=1− α

α− 1log2 y (4.4.114)

= − log2 y (4.4.115)≥ 0, (4.4.116)

where the last line follows from the assumption that y = Tr(σ) ≤ 1.

2. Proof of faithfulness: If ρ = σ, then the following equalities hold for all

274

Page 286: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

α ∈ (0, 1) ∪ (1, 2):

Dα(ρ‖ρ) =1

α− 1log2 Tr[ραρ1−α] (4.4.117)

=1

α− 1log2 Tr(ρ) (4.4.118)

= 0. (4.4.119)

Next, suppose that α ∈ (0, 1) ∪ (1, 2) and Dα(ρ‖σ) = 0. From the above,we conclude that Dα(Tr(ρ)‖Tr(σ)) = − log2 y ≥ 0. From the fact thatlog2 y = 0 if and only if y = 1, we conclude that Dα(ρ‖σ) = 0 impliesTr(σ) = Tr(ρ) = 1, so that σ is a density operator. Then, for every mea-surement channel M,

Dα(M(ρ)‖M(σ)) ≤ Dα(ρ‖σ) = 0. (4.4.120)

On the other hand, since Tr(σ) = Tr(ρ),

D(M(ρ)‖M(σ)) ≥ Dα(Tr(M(ρ))‖Tr(M(σ))) (4.4.121)= Dα(Tr(ρ)‖Tr(σ)) (4.4.122)= 0, (4.4.123)

which means that Dα(M(ρ)‖M(σ)) = 0 for all measurement channels M.Now, recall that M(ρ) and M(σ) are effectively probability distributionsdetermined by the measurement. Since the classical Rényi relative en-tropy is equal to zero if and only if its two arguments are equal, we canconclude that M(ρ) = M(σ). Since this is true for every measurementchannel, we conclude from Theorem 3.54 and the fact that the trace normis a norm that ρ = σ.

So we have that Dα(ρ‖σ) = 0 if and only if ρ = σ, as required.

3. Consider that ρ ≤ σ implies that σ − ρ ≥ 0. Then define the followingpositive semi-definite operators:

ρ := |0〉〈0| ⊗ ρ, (4.4.124)σ := |0〉〈0| ⊗ ρ + |1〉〈1| ⊗ (σ− ρ) . (4.4.125)

By exploiting the direct-sum property of Petz–Rényi relative entropy in(4.4.46) and the data-processing inequality, we find that

0 = Dα(ρ‖ρ) = Dα(ρ‖σ) ≥ Dα(ρ‖σ), (4.4.126)

where the inequality follows from data processing with respect to partialtrace over the classical register.

275

Page 287: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

4. Consider the state ρ := |0〉〈0| ⊗ ρ and the operator σ := |0〉〈0| ⊗ σ +|1〉〈1| ⊗ (σ′ − σ), which is positive semi-definite because σ′ ≥ σ by as-sumption.

Thenρασ1−α = |0〉〈0| ⊗ ρασ1−α, (4.4.127)

which implies thatDα(ρ‖σ) = Dα(ρ‖σ). (4.4.128)

Then, observing that Tr1[σ] = σ′, and using the data-processing inequal-ity for Dα with respect to the partial trace channel Tr1, we conclude that

Dα(ρ‖σ′) = Dα(Tr1(ρ)‖Tr1(σ)) ≤ Dα(ρ‖σ) = Dα(ρ‖σ), (4.4.129)

as required.

Proposition 4.25 Joint Convexity & Concavity of the Petz–RényiRelative Quasi-Entropy

Let p : X → [0, 1] be a probability distribution over a finite alphabet Xwith associated |X|-dimensional system X, let ρx

Ax∈X be a set of stateson a system A, and let σx

Ax∈X be a set of positive semi-definite opera-tors on A. Then,

Qα(ρXA‖σXA) ≤ ∑x∈X

p(x)Qα(ρxA‖σx

A), for α ∈ (1, 2], (4.4.130)

Qα(ρXA‖σXA) ≥ ∑x∈X

p(x)Qα(ρxA‖σx

A), for α ∈ [0, 1), (4.4.131)

where

ρXA := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA, (4.4.132)

σXA := ∑x∈X

p(x)|x〉〈x|X ⊗ σxA. (4.4.133)

Thus, the Petz–Rényi relative entropy Dα is jointly convex for α ∈ [0, 1):

Dα(ρXA‖σXA) ≤ ∑x∈X

p(x)Dα(ρxA‖σx

A), α ∈ [0, 1). (4.4.134)

PROOF: By the direct-sum property of Qα and applying (4.4.110)–(4.4.111)and Proposition 4.15, we conclude (4.4.130)–(4.4.131).

276

Page 288: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

For α ∈ [0, 1), applying log2 to both sides of (4.4.131) and multiplying by1

α−1 , which is negative, we conclude that

1α− 1

log2 Qα

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥ ∑x∈X

p(x)σxA

)

≤ 1α− 1

log2

(∑

x∈Xp(x)Qα(ρ

xA‖σx

A)

).

(4.4.135)

Then, since − log2 is a convex function, and using the definition of Dα interms of Qα, we find that

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥ ∑x∈X

p(x)σxA

)≤ ∑

x∈Xp(x)

1α− 1

log2 Qα(ρxA‖σx

A) (4.4.136)

= ∑x∈X

p(x)Dα(ρxA‖σx

A), (4.4.137)

as required.

4.5 Sandwiched Rényi Relative Entropy

A second example of a generalized divergence is the sandwiched Rényi rela-tive entropy, which we define as follows.

Definition 4.26 Sandwiched Rényi Relative Entropy

For all α ∈ (0, 1) ∪ (1, ∞), we define the sandwiched Rényi relative quasi-entropy for every state ρ and positive semi-definite operator σ as

Qα(ρ‖σ)

:=

Tr[(

σ1−α2α ρσ

1−α2α

)α] if α ∈ (0, 1), orα ∈ (1, ∞), supp(ρ) ⊆ supp(σ),

+∞ otherwise.

(4.5.1)

The sandwiched Rényi relative entropy is then defined as

Dα(ρ‖σ) :=1

α− 1log2 Qα(ρ‖σ). (4.5.2)

277

Page 289: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Observe that we can use the definition of the Schatten norm from (2.2.71)to write the sandwiched Rényi relative entropy Dα in the following differentways:

Dα(ρ‖σ) =α

α− 1log2

∥∥∥σ1−α2α ρσ

1−α2α

∥∥∥α

(4.5.3)

α− 1log2

∥∥∥ρ12 σ

1−αα ρ

12

∥∥∥α

(4.5.4)

=2α

α− 1log2

∥∥∥ρ12 σ

1−α2α

∥∥∥2α

. (4.5.5)

The expression in (4.5.4) and Proposition 2.12 then lead us to the followingvariational characterization of Dα for α ∈ (0, 1) ∪ (1, ∞):

Dα(ρ‖σ) = supτ>0,

Tr(τ)=1

Dα(ρ‖σ; τ), (4.5.6)

where

Dα(ρ‖σ; τ)

:=

+∞ if α > 1, supp(ρ) * supp(σ),

αα−1 log2 Tr

12 σ

1−αα ρ

12 τ

α−1α

]otherwise.

(4.5.7)

Also observe that for α = 12 , the sandwiched Rényi relative entropy D1

2(ρ‖σ)

can be expressed asD1

2(ρ‖σ) = − log2 F(ρ, σ), (4.5.8)

where we recall the definition of the fidelity F(ρ, σ) from Definition 3.55.

In the case α ∈ (1, ∞), since 1− α is negative, we take the inverse of σ.In case σ is not invertible, we take the inverse of σ on its support. An alter-native to this convention is to define Dα(ρ‖σ) for α > 1 using only positivedefinite σ, and for positive semi-definite σ, define

Dα(ρ‖σ) = limε→0+

Dα(ρ‖σ + ε1). (4.5.9)

Both alternatives are equivalent, as we now show (similar to what we did inthe proofs of Propositions 4.2 and 4.19).

278

Page 290: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.27

For every state ρ and positive semi-definite operator σ,

Dα(ρ‖σ) = limε→0+

1α− 1

log2 Tr[(

ρ12 (σ + ε1)

1−αα ρ

12

)α]. (4.5.10)

PROOF: For α ∈ (0, 1), this is immediate from the fact that the logarithm,trace, and power functions are continuous, so that the limit can be broughtinside the trace and inside the power (σ + ε1)

1−α2α .

For α ∈ (1, ∞), since 1− α is negative and σ is not necessarily invertible, letus start by decomposing the underlying Hilbert space H as H = supp(σ)⊕ker(σ), as in (4.2.6), so that

ρ12 =

((ρ

12 )0,0 (ρ

12 )0,1

(ρ12 )†

0,1 (ρ12 )1,1

), σ =

(σ 00 0

). (4.5.11)

Then, writing 1 = Πσ + Π⊥σ , where Πσ is the projection onto the support ofσ and Π⊥σ is the projection onto the orthogonal complement of supp(σ), wefind that

σ + ε1 =

(σ + εΠσ 0

0 εΠ⊥σ

), (4.5.12)

which implies that

(σ + ε1)1−α

α =

((σ + εΠσ)

1−αα 0

0 (εΠ⊥σ )1−α

α

). (4.5.13)

If supp(ρ) ⊆ supp(σ), then (ρ12 )1,0 = 0, (ρ

12 )1,1 = 0, and (ρ

12 )0,0 = ρ

12 ,

which means that

ρ12 (σ + ε1)

1−αα ρ

12 =

((ρ

12 )0,0(σ + εΠσ)

1−αα (ρ

12 )0,0 0

0 0

), (4.5.14)

so that

limε→0+

1α− 1

log2 Tr[(

ρ12 (σ + ε1)

1−αα ρ

12

)α]= Dα(ρ‖σ), α ∈ (1, ∞), (4.5.15)

as required.

279

Page 291: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

If supp(ρ) * supp(σ), then (ρ12 )1,1 is non-zero. In this case, we use the

fact that

(σ + ε1)1−α

α =

((σ + εΠσ)

1−αα 0

0 (εΠ⊥σ )1−α

α

)≥(

0 00 (εΠ⊥σ )

1−αα

)(4.5.16)

to conclude that

ρ12 (σ + ε1)

1−αα ρ

12

≥((ρ

12 )0,1(εΠ⊥σ )

1−αα (ρ

12 )†

0,1 (ρ12 )0,1(εΠ⊥σ )

1−αα (ρ

12 )1,1

(ρ12 )1,1(εΠ⊥σ )

1−αα (ρ

12 )†

0,1 (ρ12 )1,1(εΠ⊥σ )

1−αα (ρ

12 )1,1

)(4.5.17)

= ε1−α

α

((ρ

12 )0,1(Π⊥σ )

1−αα (ρ

12 )†

0,1 (ρ12 )0,1(Π⊥σ )

1−αα (ρ

12 )1,1

(ρ12 )1,1(Π⊥σ )

1−αα (ρ

12 )†

0,1 (ρ12 )1,1(Π⊥σ )

1−αα (ρ

12 )1,1

). (4.5.18)

Now, since α ∈ (1, ∞), we have that limε→0+ ε1−α

α = +∞; therefore, by conti-nuity arguments similar to those given above, we conclude that

limε→0+

1α− 1

log2 Tr[(

ρ12 (σ + ε1)

1−αα ρ

12

)α]≥ +∞, (4.5.19)

for the case α ∈ (1, ∞) and supp(ρ) * supp(σ). This implies that

limε→0+

1α− 1

log2 Tr[(

ρ12 (σ + ε1)

1−αα ρ

12

)α]= Dα(ρ‖σ), (4.5.20)

for the case α ∈ (1, ∞) and supp(ρ) * supp(σ), as required.

The Petz–Rényi and sandwiched Rényi relative entropies are two waysof defining a quantum generalization of the classical Rényi relative entropyin (4.4.19). Indeed, if ρ and σ are both classical, commuting states (i.e., bothare diagonal in the same basis), then both Dα(ρ‖σ) and Dα(ρ‖σ) reduce to theclassical Rényi relative entropy in (4.4.19). In general, there are often many (infact, typically infinitely many) ways to generalize classical quantities to thequantum (i.e., non-commutative) case such that we recover the original clas-sical quantity in the special case of commuting operators. What distinguishesone generalization from another is the role that they play in characterizingoperational tasks in quantum information theory, which is a theme exploredthroughout this book.

We now establish the important fact that the quantum relative entropy isa special case of the sandwiched Rényi relative entropy in the limit α → 1.The proof proceeds very similarly to the proof of the same property for thePetz–Rényi relative entropy.

280

Page 292: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.28

Let ρ be a state and σ a positive semi-definite operator. Then, in the limitα→ 1, the sandwiched Rényi relative entropy converges to the quantumrelative entropy:

limα→1

Dα(ρ‖σ) = D(ρ‖σ). (4.5.21)

PROOF: Let us first consider the case α ∈ (1, ∞). If supp(ρ) * supp(σ),then Dα(ρ‖σ) = +∞ for all α ∈ (1, ∞), so that limα→1+ Dα(ρ‖σ) = +∞. Ifsupp(ρ) ⊆ supp(σ), then Dα(ρ‖σ) is finite and using (4.5.4) we write

Dα(ρ‖σ) =1

α− 1log2 Qα(ρ‖σ) =

1α− 1

log2 Tr[(

ρ12 σ

1−αα ρ

12

)α]. (4.5.22)

Let us define the function

Qα,β(ρ‖σ) := Tr[(

ρ12 σ

1−αα ρ

12

)β]

, (4.5.23)

so that Qα(ρ‖σ) = Qα,α(ρ‖σ). By noting that supp(ρ) ⊆ supp(σ) impliesQ1(ρ‖σ) = Tr[ρΠσ] = 1 (where Πσ is the projection onto the support of σ),and since log2 1 = 0, we can write Dα(ρ‖σ) as

Dα(ρ‖σ) =log2 Qα(ρ‖σ)− log2 Q1(ρ‖σ)

α− 1, (4.5.24)

so that

limα→1

Dα(ρ‖σ) =d

dαlog2 Qα(ρ‖σ)

∣∣∣∣α=1

(4.5.25)

=1

ln(2)

ddα Qα(ρ‖σ)

∣∣∣α=1

Q1(ρ‖σ)(4.5.26)

=1

ln(2)d

dαQα(ρ‖σ)

∣∣∣∣α=1

, (4.5.27)

where to obtain the first equality we used the definition of the derivative andto obtain the second equality we used the derivative of the natural logarithmalong with the chain rule. Using the function Qα,β and the chain rule, wewrite

ddα

Qα(ρ‖σ)∣∣∣∣α=1

=d

dαQα,1(ρ‖σ)

∣∣∣∣α=1

+d

dβQ1,β(ρ‖σ)

∣∣∣∣β=1

. (4.5.28)

281

Page 293: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Then,

ddα

Qα,1(ρ‖σ) =d

dαTr[ρσ

1−αα

]= − 1

α2 Tr[ρσ

1−αα ln σ

], (4.5.29)

so that

ddα

Qα,1(ρ‖σ)∣∣∣∣α=1

= −Tr[ρΠσ ln σ] = −Tr[ρ ln σ], (4.5.30)

where we used the fact that supp(ρ) ⊆ supp(σ) to obtain the last equality.Similarly,

ddβ

Q1,β(ρ‖σ) =d

dβTr[(

ρ12 Πσρ

12

)β]=

ddβ

Tr[ρβ] = Tr[ρβ ln ρ], (4.5.31)

where we again used the fact that supp(ρ) ⊆ supp(σ) in order to concludethat ρ

12 Πσρ

12 = ρ. Therefore,

ddβ

Q1,β(ρ‖σ)∣∣∣∣β=1

= Tr[ρ ln ρ]. (4.5.32)

So we find that

limα→1

Dα(ρ‖σ) =1

ln(2)d

dαQα(ρ‖σ)

∣∣∣∣α=1

= Tr[ρ log2 ρ]− Tr[ρ log2 σ] (4.5.33)

when supp(ρ) ⊆ supp(σ). Therefore, for α ∈ (1, ∞),

limα→1+

Dα(ρ‖σ)

=

Tr[ρ log2 ρ]− Tr[ρ log2 σ] if supp(ρ) ⊆ supp(σ),+∞ otherwise

= D(ρ‖σ).

(4.5.34)

Let us now consider the case α ∈ (0, 1). If supp(ρ) ⊆ supp(σ), then sincethe limit in (4.5.33) holds from both sides, we find that

limα→1−

Dα(ρ‖σ) = Tr[ρ log2 ρ]− Tr[ρ log2 σ]. (4.5.35)

If supp(ρ) * supp(σ) (and Tr[ρσ] 6= 0), then observe that we can write Dα as

Dα(ρ‖σ) =log2 Qα(ρ‖σ)− log2 Q1(ρ‖σ)

α− 1+

log2 Q1(ρ‖σ)α− 1

, (4.5.36)

282

Page 294: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

so that

limα→1−

Dα(ρ‖σ) =d

dαlog2 Qα(ρ‖σ)

∣∣∣∣α=1

+ limα→1−

log2 Tr[ρΠσ]

α− 1, (4.5.37)

where we have used Q1(ρ‖σ) = Tr[ρΠσ]. Now, since supp(ρ) * supp(σ) andTr[ρσ] 6= 0, we have that 0 < Tr[ρΠσ] < 1, which means that log2 Tr[ρΠσ] < 0.Since limα→1−

1α−1 = −∞, we find that the second term in (4.5.37) is equal to

+∞, which means that limα→1− Dα(ρ‖σ) = +∞. Therefore,

limα→1−

Dα(ρ‖σ)

=

Tr[ρ log2 ρ]− Tr[ρ log2 σ] if supp(ρ) ⊆ supp(σ),+∞ otherwise

= D(ρ‖σ).

(4.5.38)

To conclude, we have that limα→1+ Dα(ρ‖σ) = limα→1− Dα(ρ‖σ) = D(ρ‖σ),which means that (4.5.21) holds.

In the following proposition, we state several basic properties of the sand-wiched Rényi relative entropy. The proofs of the first four properties are anal-ogous to those of the same properties of the Petz–Rényi relative entropy. Thelast property in the proposition establishes that the sandwiched Rényi relativeentropy is always less than or equal to the Petz–Rényi relative entropy.

Proposition 4.29 Properties of Sandwiched Rényi Relative Entropy

For all states ρ, ρ1, ρ2 and positive semi-definite operators σ, σ1, σ2, thesandwiched Rényi relative entropy Dα satisfies the following properties:

1. Isometric invariance: For all α ∈ (0, 1) ∪ (1, ∞) and for every isome-try V,

Dα(ρ‖σ) = Dα(VρV†‖VσV†). (4.5.39)

2. Monotonicity in α: For all α ∈ (0, 1) ∪ (1, ∞), Dα is monotonicallyincreasing in α, i.e., α < β implies Dα(ρ‖σ) ≤ Dβ(ρ‖σ).

3. Additivity: For all α ∈ (0, 1) ∪ (1, ∞),

Dα(ρ1 ⊗ ρ2‖σ1 ⊗ σ2) = Dα(ρ1‖σ1) + Dα(ρ2‖σ2). (4.5.40)

4. Direct-sum property: Let p : X → [0, 1] be a probability distributionover a finite alphabet X with associated |X|-dimensional system X,

283

Page 295: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

and let q : X → [0, ∞) be a positive function on X. Let ρxAx∈X be

a set of states on a system A, and let σxAx∈X be a set of positive

semi-definite operators on A. Then,

Qα(ρXA‖σXA) = ∑x∈X

p(x)αq(x)1−αQα(ρxA‖σx

A). (4.5.41)

where

ρXA := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA, (4.5.42)

σXA := ∑x∈X

q(x)|x〉〈x|X ⊗ σxA. (4.5.43)

5. If ρ1 ≤ γρ2 for some γ ≥ 1, then

Dα(ρ1‖σ) ≤α

α− 1log2 γ + Dα(ρ2‖σ), α > 1. (4.5.44)

6. For all α ∈ (0, 1)∪ (1, ∞), the sandwiched Rényi relative entropy Dα

is always less than or equal to the Petz–Rényi relative entropy Dα,i.e.,

Dα(ρ‖σ) ≤ Dα(ρ‖σ). (4.5.45)

Furthermore, for α ∈ (0, 1), we have

αDα(ρ‖σ) + (1− α)(− log2 Tr[σ]) ≤ Dα(ρ‖σ). (4.5.46)

PROOF:

1. Proof of isometric invariance: Let us start by writing Dα(ρ‖σ) using thefunction ‖·‖α as in (4.5.4):

Dα(ρ‖σ) = limε→0

α

α− 1log2

∥∥∥ρ12 (σ + ε1)

1−αα ρ

12

∥∥∥α

, (4.5.47)

where we have also made use of the fact that for positive semi-definiteoperators, Dα(ρ‖σ) can be defined as in (4.5.10). Now,

Dα(VρV†‖VσV†)

= limε→0

α

α− 1log2

∥∥∥(VρV†)12 (VσV† + ε1)

1−αα (VρV†)

12

∥∥∥α

.(4.5.48)

284

Page 296: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Since (VρV†)12 = Vρ

12 V†, we find that

∥∥∥(VρV†)12 (VσV† + ε1)

1−αα (VρV†)

12

∥∥∥α

=∥∥∥Vρ

12 V†(VσV† + ε1)

1−αα Vρ

12 V†

∥∥∥α

(4.5.49)

=∥∥∥ρ

12 V†(VσV† + ε1)

1−αα Vρ

12

∥∥∥α

, (4.5.50)

where to obtain the last equality we used the isometric invariance of thefunction ‖·‖α. Now, let Π := VV† be the projection onto the image of V,and let Π := 1−Π. Then, we write

VσV† + ε1 = VσV† + εΠ + εΠ = V(σ + ε1)V† + εΠ. (4.5.51)

Since V(σ + ε1)V† and εΠ are supported on orthogonal subspaces, weobtain

(VσV† + ε1)1−α

α = V(σ + ε1)1−α

α V† + ε1−α

α Π. (4.5.52)

Continuing from (4.5.50), we thus find that∥∥∥ρ

12 V†(VσV† + ε1)

1−αα Vρ

12

∥∥∥α

=∥∥∥ρ

12 V†

(V(σ + ε1)

1−αα V† + ε

1−αα Π

)Vρ

12

∥∥∥α

(4.5.53)

=∥∥∥ρ

12 V†V(σ + ε1)

1−αα V†Vρ

12 + ε

1−αα ρ

12 V†ΠVρ

12

∥∥∥α

(4.5.54)

=∥∥∥ρ

12 (σ + ε1)

1−αα ρ

12

∥∥∥α

, (4.5.55)

where to obtain the last equality we used the fact that V†ΠV = V†V −V†VV†V = 1− 1 = 0. Therefore,

Dα(VρV†‖VσV†) = limε→0

α

α− 1log2

∥∥∥ρ12 (σ + ε1)

1−αα ρ

12

∥∥∥α

= Dα(ρ‖σ),(4.5.56)

as required.

2. Proof of monotonicity in α: We make use of the function Dα(ρ‖σ; τ) definedin (4.5.7), which we can write as

Dα(ρ‖σ; τ) = − 1γ

log2〈ϕρ|Xγ|ϕρ〉 = − 1γ

ln〈ϕρ|Xγ|ϕρ〉ln(2)

, (4.5.57)

285

Page 297: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

where X = τ−1 ⊗ σT, γ := 1−αα and |ϕρ〉 = (ρ

12 ⊗ 1)|Γ〉 is a purification

of ρ. We prove monotonicity of this quantity by taking its derivative withrespect to α and showing that it is non-negative. Since dγ

dα = − 1α2 , we can

express the derivative with respect to α in terms of the derivative withrespect to γ using d

dα = ddγ

dγdα = − 1

α2d

dγ . Therefore,

ddα

Dα(ρ‖σ; τ) = − 1α2

ddγ

(− 1

γ

ln〈ϕρ|Xγ|ϕρ〉ln(2)

)=

1α2

d fdγ

, (4.5.58)

where

f (γ) :=1γ

ln〈ϕρ|Xγ|ϕρ〉ln(2)

. (4.5.59)

Now,

d fdγ

=1

ln(2)

(− 1

γ2 ln〈ϕρ|Xγ|ϕρ〉+ 1γ

〈ϕρ|Xγ ln X|ϕρ〉〈ϕρ|Xγ|ϕρ〉

)(4.5.60)

=−〈ϕρ|Xγ|ϕρ〉 log2〈ϕρ|Xγ|ϕρ〉+ 〈ϕρ|Xγ log2 Xγ|ϕρ〉

γ2〈ϕρ|Xγ|ϕρ〉 . (4.5.61)

Now, let g(x) := x log2 x. Then, we can write

d fdγ

=〈ϕρ|g(Xγ)|ϕρ〉 − g(〈ϕρ|Xγ|ϕρ〉)

γ2〈ϕρ|Xγ|ϕρ〉 . (4.5.62)

Since g is operator convex, the operator Jensen inequality in (2.3.22) im-plies that

〈ϕρ|g(Xγ)|ϕρ〉 ≥ g(〈ϕρ|Xγ|ϕρ〉), (4.5.63)

which implies that d fdγ ≥ 0. Therefore, Dα(ρ‖σ; τ) is monotonically in-

creasing in α for all ρ, σ, τ. By (4.5.6), we conclude that Dα(ρ‖σ) is mono-tonically increasing in α, as required.

3. Proof of additivity: When all quantities are finite, we have that

Dα(ρ1 ⊗ ρ2‖σ1 ⊗ σ2)

=1

α− 1log2 Tr

[((σ1 ⊗ σ2)

1−α2α (ρ1 ⊗ ρ2)(σ1 ⊗ σ2)

1−α2α

)α].

(4.5.64)

Using the fact that (X ⊗ Y)β = Xβ ⊗ Yβ for all positive semi-definiteoperators X, Y and all β ∈ R, we obtain

Qα(ρ1 ⊗ ρ2‖σ1 ⊗ σ2)

286

Page 298: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= Tr[(

(σ1 ⊗ σ2)1−α2α (ρ1 ⊗ ρ2)(σ1 ⊗ σ2)

1−α2α

)α](4.5.65)

= Tr[((

σ1−α2α

1 ⊗ σ1−α2α

2

)(ρ1 ⊗ ρ2)

1−α2α

1 ⊗ σ1−α2α

2

))α](4.5.66)

= Tr[(

σ1−α2α

1 ρ1σ1−α2α

1 ⊗ σ1−α2α

2 ρ2σ1−α2α

2

)α](4.5.67)

= Tr[(

σ1−α2α

1 ρ1σ1−α2α

1

⊗(

σ1−α2α

2 ρ2σ1−α2α

2

)α](4.5.68)

= Tr[(

σ1−α2α

1 ρ1σ1−α2α

1

)α]· Tr[(

σ1−α2α

2 ρ2σ1−α2α

2

)α](4.5.69)

= Qα(ρ1‖σ1) · Qα(ρ2‖σ2). (4.5.70)

Applying 1α−1 log2 and definitions, additivity follows.

4. Proof of the direct-sum property: Define the classical–quantum state andoperator, respectively, as

ρXA := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA, σXA := ∑

x∈Xq(x)|x〉〈x|X ⊗ σx

A. (4.5.71)

Then, since

σ1−α2α

XA = ∑x∈X|x〉〈x|X ⊗ (q(x)σx

A)1−α2α , (4.5.72)

we find that

σ1−α2α

XA ρXAσ1−α2α

XA = ∑x∈X|x〉〈x|X ⊗ (q(x)σx

A)1−α2α (p(x)ρx

A)(q(x)σxA)

1−α2α (4.5.73)

= ∑x∈X

p(x)q(x)1−α

α |x〉〈x|X ⊗ (σxA)

1−α2α ρx

A(σxA)

1−α2α , (4.5.74)

which means that(

σ1−α2α

XA ρXAσ1−α2α

XA

= ∑x∈X

p(x)αq(x)1−α|x〉〈x|X ⊗((σx

A)1−α2α ρx

A(σxA)

1−α2α

)α.

(4.5.75)

Taking the trace on both sides of this equation, and using the definitionof Qα, we conclude that

Qα(ρXA‖σXA) = ∑x∈X

p(x)αq(x)1−αQα(ρxA‖σx

A), (4.5.76)

as required.

287

Page 299: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

5. From the assumption that ρ1 ≤ γρ2, we have that

σ1−α2α ρ1σ

1−α2α ≤ γσ

1−α2α ρ2σ

1−α2α . (4.5.77)

Then, using (2.2.133), we obtain

Tr[(

σ1−α2α ρ1σ

1−α2α

)α]≤ γαTr

[(σ

1−α2α ρ2σ

1−α2α

)α]. (4.5.78)

The result follows after applying the logarithm and dividing by α− 1 onboth sides of this inequality.

6. This follows from the Araki–Lieb–Thirring inequalities, which we statehere without proof (see the Bibliographic Notes in Section 4.13): for posi-tive semi-definite operators A and B acting on a finite-dimensional Hilbertspace, and for q ≥ 0, the following inequalities hold

(a) Tr[(

B12 AB

12

)rq]≥ Tr

[(B

r2 ArB

r2

)q]for all r ∈ [0, 1].

(b) Tr[(

B12 AB

12

)rq]≤ Tr

[(B

r2 ArB

r2

)q]for all r ≥ 1.

For α ∈ (0, 1), we make use of the first of these inequalities. In particular,we set q = 1, r = α, A = ρ and B = σ

1−αα . Then, letting γ := 1−α

2α , weobtain

Tr[(σγρσγ)α] ≥ Tr[σαγρασαγ] = Tr

1−α2 ρασ

1−α2

]= Tr[ρασ1−α], (4.5.79)

where the last equality holds by cyclicity of the trace. Since the logarithmfunction is monotonically increasing, this inequality implies that

log2 Tr[(

σ1−α2α ρσ

1−α2α

)α]≥ log2 Tr

[ρασ1−α

]. (4.5.80)

Finally, since α− 1 < 0 for all α ∈ (0, 1), we conclude that

1α− 1

log2 Tr[(

σ1−α2α ρσ

1−α2α

)α]≤ 1

α− 1log2 Tr[ρασ1−α]. (4.5.81)

That is, Dα(ρ‖σ) ≤ Dα(ρ‖σ), as required.

For α ∈ (1, ∞), we make use of the second Araki–Lieb–Thirring inequal-ity above. As before, we let q = 1, r = α, A = ρ, and B

12 = σγ. We find

that

Tr[(

σ1−α2α ρσ

1−α2α

)α]≤ Tr

1−α2 ρασ

1−α2

]= Tr[ρασ1−α]. (4.5.82)

288

Page 300: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Then, since the logarithm function is a monotonically increasing functionand α− 1 > 0 for all α ∈ (1, ∞), we conclude that

1α− 1

log2 Tr[(

σ1−α2α ρσ

1−α2α

)α]≤ 1

α− 1log2 Tr[ρασ1−α], (4.5.83)

i.e., Dα(ρ‖σ) ≤ Dα(ρ‖σ), as required.

For the inequality in (4.5.46), with ρ a state and σ a positive semi-definiteoperator, we use the following “reverse” Araki–Lieb–Thirring inequality,which we state here without proof (see the Bibliographic Notes in Sec-tion 4.13):

Tr[(

B12 AB

12

)rq]≤(

Tr[(

Br2 ArB

r2

)q])r ∥∥∥A1−r

2

∥∥∥2rq

a

∥∥∥B1−r

2

∥∥∥2rq

b. (4.5.84)

This inequality holds for all positive semi-definite operators A and B, aswell as for q > 0, r ∈ (0, 1], a, b ∈ (0, ∞], and for 1

2rq = 12q +

1a +

1b . Taking

q = 1, r = α, A = ρ, B = σ1−α

α , a = 21−α , and b = 2α

(1−α)2 , we obtain

Tr[(

σ1−α2α ρσ

1−α2α

)α]

≤(

Tr[(

σ1−α

2 ρσ1−α

2

)])α ∥∥∥ρ1−α

2

∥∥∥2α

21−α

∥∥∥∥σ(1−α)2

∥∥∥∥2α

2α(1−α)2

. (4.5.85)

Now, because ρ is a state,

∥∥∥ρ1−α

2

∥∥∥2α

21−α

=

(Tr[∣∣∣ρ 1−α

2

∣∣∣2

1−α

])α(1−α)

(4.5.86)

=

(Tr[(

ρ1−α

2

) 21−α

])α(1−α)

(4.5.87)

= (Tr[ρ])α(1−α) (4.5.88)= 1. (4.5.89)

For σ, we obtain ∥∥∥∥σ(1−α)2

∥∥∥∥2α

2α(1−α)2

= (Tr[σ])(1−α)2. (4.5.90)

Therefore,

Tr[(

σ1−α2α ρσ

1−α2α

)α]≤(

Tr[ρασ1−α])α· (Tr[σ])(1−α)2

. (4.5.91)

289

Page 301: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Taking the logarithm of both sides and multiplying by 1α−1 , which is neg-

ative for α ∈ (0, 1), we obtain the inequality in (4.5.46).

The monotonicity in α of the sandwiched Rényi relative entropy estab-lishes an inequality relating the quantum relative entropy and the fidelity ofquantum states ρ and σ, by picking α = 1 and α = 1/2, respectively, andapplying Proposition 4.28:

D(ρ‖σ) ≥ − log2 F(ρ, σ). (4.5.92)

We can modify the lower bound a bit to establish an inequality relating thequantum relative entropy and the trace distance:

Corollary 4.30 Quantum Pinsker Inequality

Let ρ and σ be quantum states. Then the following inequality holds

D(ρ‖σ) ≥ 14 ln 2

‖ρ− σ‖21 . (4.5.93)

REMARK: The constant prefactor can be improved from 14 ln 2 to 1

2 ln 2 , but we do not givea proof here (please consult the Bibliographic Notes in Section 4.13).

PROOF: We can rewrite (4.5.92) as follows:

D(ρ‖σ) ≥ − 1ln 2

ln F(ρ, σ) (4.5.94)

= − 1ln 2

ln[1− (1− F(ρ, σ))] (4.5.95)

≥ 1ln 2

(1− F(ρ, σ)) (4.5.96)

≥ 14 ln 2

‖ρ− σ‖21 . (4.5.97)

The second inequality follows from − ln(1 − x) ≥ x, which holds for x ∈[0, 1]. the final inequality follows from Theorem 3.64.

Like the quantum relative entropy and the Petz–Rényi relative entropy,the sandwiched Rényi relative entropy is faithful, meaning that for all statesρ, σ and all α ∈ (0, 1) ∪ (1, ∞),

Dα(ρ‖σ) = 0 ⇐⇒ ρ = σ. (4.5.98)

290

Page 302: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

We prove this in Proposition 4.34 below.

We now prove the data-processing inequality for the sandwiched Rényirelative entropy Dα for α ∈ [1/2, 1)∪ (1, ∞). This, along with Proposition 4.28,gives us a different way (apart from using data-processing inequality for thePetz–Rényi relative entropy) to prove the data-processing inequality for thequantum relative entropy.

Theorem 4.31 Data-Processing Inequality for Sandwiched RényiRelative Entropy

Let ρ be a state, σ a positive semi-definite operator, and N a quantumchannel. Then, for all α ∈ [1/2, 1) ∪ (1, ∞),

Dα(ρ‖σ) ≥ Dα(N(ρ)‖N(σ)). (4.5.99)

PROOF: This proof follows steps very similar to those in the proof of the data-processing inequality for the Petz–Rényi relative entropy (Theorem 4.22), withthe key difference being that in this case we make use of the fact that the sand-wiched Rényi relative entropy can be written as the optimization in (4.5.6).

From Stinespring’s theorem (Theorem 3.18), we know that the action of achannel N on a linear operator X can be written as

N(X) = TrE[VXV†], (4.5.100)

for some V, where V is an isometry and E is an auxiliary system with di-mension dE ≥ rank(ΓN). As stated in (4.5.39), Dα is isometrically invariant.Therefore, it suffices to prove the data-processing inequality for Dα underpartial trace; i.e., it suffices to show that for every state ρAB, every positivesemi-definite operator σAB, and all α ∈ [1/2, 1) ∪ (1, ∞):

Dα(ρAB‖σAB) ≥ Dα(ρA‖σA). (4.5.101)

We now proceed to prove this inequality. We prove it for ρAB, and henceρA, invertible, as well as for σAB and σA invertible. The result follows inthe general case of ρAB and/or ρA non-invertible, as well as σAB and/or σAnon-invertible, by applying the result to the invertible operators (1− δ)ρAB +δπAB and σAB + ε1AB, with δ, ε > 0, and taking the limits ε→ 0+ and δ→ 0+,since

Dα(ρAB‖σAB) = limε→0+

limδ→0+

Dα((1− δ)ρAB + δπAB‖σAB + ε1AB), (4.5.102)

291

Page 303: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Dα(ρA‖σA) = limε→0+

limδ→0+

Dα((1− δ)ρA + δπA‖σA + dBε1A), (4.5.103)

which can be verified in a similar manner to the proof of (4.5.10) in Proposi-tion 4.27.

Let us start by defining the quantity Qα(ρ‖σ; τ) as

Qα(ρ‖σ; τ) := 〈ϕρ|(τ−1 ⊗ σT)1−α

α |ϕρ〉, (4.5.104)

where τ is a positive definite state and

|ϕρ〉 := (ρ12 ⊗ 1)|Γ〉 (4.5.105)

is a purification of ρ. We note that

Qα(ρ‖σ; τ) = Tr[ρ

12 σ

1−αα ρ

12 τ

α−1α

](4.5.106)

so thatDα(ρ‖σ; τ) =

α

α− 1log2 Qα(ρ‖σ; τ), (4.5.107)

where we recall the quantity Dα(ρ‖σ; τ) defined in (4.5.7). Now, to prove(4.5.101), we show that for every positive definite state ωA, there exists a pos-itive definite state τAB such that

Qα(ρAB‖σAB; τAB) ≥ Qα(ρA‖σA; ωA), for α ∈ (1, ∞),

Qα(ρAB‖σAB; τAB) ≤ Qα(ρA‖σA; ωA), for α ∈ [1/2, 1) .(4.5.108)

With these two inequalities, along with (4.5.107) and (4.5.6), the result follows.

Consider that

Qα(ρAB‖σAB; τAB) = 〈ϕρAB | f (τ−1AB ⊗ σT

AB)|ϕρAB〉, (4.5.109)

Qα(ρA‖σA; ωA) = 〈ϕρA | f (ω−1A ⊗ σT

A)|ϕρA〉, (4.5.110)

where we have setf (x) := x

1−αα (4.5.111)

and

|ϕρAB〉 = (ρ12AB ⊗ 1AB)|Γ〉ABAB = (ρ

12AB ⊗ 1AB)|Γ〉AA|Γ〉BB, (4.5.112)

|ϕρA〉 = (ρ12A ⊗ 1A)|Γ〉AA. (4.5.113)

292

Page 304: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Now, let us use the same isometry VAA→ABAB from (4.4.91) that we used inthe proof of data-processing inequality for the Petz–Rényi relative entropy;that is, let

VAA→ABAB := ρ12AB(ρ

− 12

A ⊗ 1A)|Γ〉BB. (4.5.114)

Recall that

VAA→ABAB|ϕρA〉AA = ρ12AB(ρ

− 12

A ⊗ 1A)(ρ12A ⊗ 1A)|Γ〉AA|Γ〉BB (4.5.115)

= (ρ12AB ⊗ 1AB)|Γ〉AA|Γ〉BB (4.5.116)

= |ϕρAB〉. (4.5.117)

We thus obtain, for all τAB,

Qα(ρAB‖σAB; τAB) = 〈ϕρA |V† f (τ−1AB ⊗ σT

AB)V|ϕρA〉

≥ 〈ϕρA | f (V†(τ−1AB ⊗ σT

AB)V)|ϕρA〉(4.5.118)

for α ∈ (1, ∞) and

Qα(ρAB‖σAB; τAB) = 〈ϕρA |V† f (τ−1AB ⊗ σT

AB)V|ϕρA〉

≤ 〈ϕρA | f (V†(τ−1AB ⊗ σT

AB)V)|ϕρA〉(4.5.119)

for α ∈ [1/2, 1), where to obtain the last inequality in each case we used theoperator Jensen inequality (Theorem 2.15), which is applicable since for α ∈(1, ∞) the function f in (4.5.111) is operator convex and for α ∈ [1/2, 1) it isoperator concave.

Now, recall that to conclude (4.5.101), we should perform an optimizationover invertible states τAB as per the definition in (4.5.6) of Dα(ρAB‖σAB; τAB)

in order to obtain Dα(ρAB‖σAB). Since we only require a lower bound onDα(ρAB‖σAB), we can obtain the lower bound in (4.5.101) on Dα(ρAB‖σAB)by simply picking a particular state τAB in the optimization in (4.5.6). Let ustherefore take

τAB = ξAB(ωA) := ρ12AB(ρ

− 12

A ωAρ− 1

2A ⊗ 1B)ρ

12AB, (4.5.120)

where ωA is an arbitrary invertible state. Note that this choice of τAB is indeeda state because it is the result of applying the Petz recovery channel PρAB,TrBdefined in (3.2.265) to ωA. It is also invertible; in particular,

τ−1AB = [ξAB(ωA)]

−1 = ρ− 1

2AB(ρ

12Aω−1

A ρ12A ⊗ 1B)ρ

− 12

AB. (4.5.121)

293

Page 305: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

With the choice in (4.5.120) for τAB, we find that

V†(τ−1AB ⊗ σT

AB)V

= 〈Γ|BB(ρ− 1

2A ⊗ 1A)ρ

12AB(τ

−1AB ⊗ σT

AB)ρ12AB(ρ

− 12

A ⊗ 1A)|Γ〉BB (4.5.122)

= 〈Γ|BB

(ρ− 1

2A ρ

12ABρ

− 12

AB(ρ12Aω−1

A ρ12A ⊗ 1B)ρ

− 12

ABρ12ABρ

− 12

A ⊗ σT

AB

)|Γ〉BB (4.5.123)

= 〈Γ|BBω−1A ⊗ 1B ⊗ σT

AB|Γ〉BB (4.5.124)

= ω−1A ⊗ 〈Γ|BBσT

AB|Γ〉BB (4.5.125)

= ω−1A ⊗ σT

A, (4.5.126)

where we have used the fact that 〈Γ|BBσT

AB|Γ〉BB = TrB[σ

T

AB] = σT

A, the last

equality due to the fact that the transpose is taken on a product basis forHA ⊗HB.

Therefore, for α ∈ (1, ∞), taking the logarithm on both sides of (4.5.118)and using the state in (4.5.120), we find that

log2 Qα(ρAB‖σAB; ξAB(ωA)) ≥ log2〈ϕρA | f (ω−1A ⊗ σT

A)|ϕρA〉 (4.5.127)

= log2 Qα(ρA‖σA; ωA). (4.5.128)

Multiplying both sides of this inequality by αα−1 , we obtain

Dα(ρAB‖σAB) = supτAB>0

Tr[τAB]=1

Dα(ρAB‖σAB; τAB) (4.5.129)

≥ Dα(ρAB‖σAB; ξAB(ωA)) (4.5.130)

≥ Dα(ρA‖σA; ωA) (4.5.131)

for all invertible states ωA. Finally, taking the supremum over the set ωA :ωA > 0, Tr[ωA] = 1, we conclude that

Dα(ρAB‖σAB) ≥ Dα(ρA‖σA), for α ∈ (1, ∞). (4.5.132)

For α ∈ [1/2, 1), taking the logarithm on both sides of (4.5.119) and usingthe state in (4.5.120), we conclude that

log2 Qα(ρAB‖σAB; ξAB(ωA)) ≤ log2〈ϕρA | f (ω−1A ⊗ σT

A)|ϕρA〉 (4.5.133)

= log2 Qα(ρA‖σA; ωA). (4.5.134)

294

Page 306: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Multiplying both sides of this inequality by αα−1 , which is negative in this case,

so thatα

α− 1log2 Qα(ρAB‖σAB; ξAB(ωA)) ≥

α

α− 1log2 Qα(ρA‖σA; ωA), (4.5.135)

we obtain

Dα(ρAB‖σAB) = supτAB>0,

Tr[τAB]=1

Dα(ρAB‖σAB; τAB) (4.5.136)

≥ Dα(ρAB‖σAB; ξAB(ωA)) (4.5.137)

≥ Dα(ρA‖σA; ωA) (4.5.138)

for all invertible states ωA. Finally, taking the supremum over the set ωA :ωA > 0, Tr[ωA] = 1, we conclude that

Dα(ρAB‖σAB) ≥ Dα(ρA‖σA), for α ∈ [1/2, 1) . (4.5.139)

Having established the data-processing inequality for Dα under the partialtrace channel for α ∈ [1/2, 1) ∪ (1, ∞), we conclude that

Dα(ρ‖σ) ≥ Dα(N(ρ)‖N(σ)), (4.5.140)

for α ∈ [1/2, 1) ∪ (1, ∞), all states ρ, positive semi-definite operators σ, and allchannels N.

By taking the limit α → 1 in the statement of the data-processing in-equality for Dα, along with Proposition 4.28, we immediately obtain the data-processing inequality for the quantum relative entropy.

Corollary 4.32 Data-Processing Inequality for Quantum RelativeEntropy

Let ρ be a state, σ a positive semi-definite operator, and N a quantumchannel. Then,

D(ρ‖σ) ≥ D(N(ρ)‖N(σ)). (4.5.141)

With the data-processing inequality for the sandwiched Rényi relative en-tropy in hand, it is now straightforward to prove some of the following addi-tional properties.

295

Page 307: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.33 Additional Properties of Sandwiched Rényi Rela-tive Entropy

The sandwiched Rényi relative entropy Dα satisfies the following prop-erties for every state ρ and positive semi-definite operator σ for α ∈[1/2, 1) ∪ (1, ∞).

1. If Tr(σ) ≤ Tr(ρ) = 1, then Dα(ρ‖σ) ≥ 0.

2. Faithfulness: If Tr[σ] ≤ 1, we have that Dα(ρ‖σ) = 0 if and only ifρ = σ.

3. If ρ ≤ σ, then Dα(ρ‖σ) ≤ 0.

4. For every positive semi-definite operator σ′ such that σ′ ≥ σ, wehave Dα(ρ‖σ) ≥ Dα(ρ‖σ′).

PROOF:

1. By the data-processing inequality for Dα with respect to the trace channelTr, and letting x = Tr(ρ) = 1 and y = Tr(σ), we find that

Dα(ρ‖σ) ≥ Dα(x‖y) = 1α− 1

log2 Tr[(y1−α2α xy

1−α2α )α] (4.5.142)

=1

α− 1log2(y

1−α) (4.5.143)

=1− α

α− 1log2 y (4.5.144)

= − log2 y (4.5.145)≥ 0, (4.5.146)

where the last line follows from the assumption that y = Tr(σ) ≤ 1.

2. Proof of faithfulness: If ρ = σ, then the following equalities hold for allα ∈ [1/2, 1) ∪ (1, ∞):

Dα(ρ‖ρ) =1

α− 1log2 Tr

[(ρ

1−α2α ρρ

1−α2α

)α](4.5.147)

=1

α− 1log2 Tr

1−α2 ραρ

1−α2

](4.5.148)

=1

α− 1log2 Tr[ρ1−αρα] (4.5.149)

296

Page 308: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

=1

α− 1log2 Tr(ρ) (4.5.150)

= 0. (4.5.151)

Next, suppose that α ∈ [1/2, 1) ∪ (1, ∞) and Dα(ρ‖σ) = 0. From theabove, we conclude that Dα(Tr(ρ)‖Tr(σ)) = − log2 y ≥ 0. From the factthat log2 y = 0 if and only if y = 1, we conclude that Dα(ρ‖σ) = 0implies Tr(σ) = Tr(ρ) = 1, so that σ is a density operator. Then, for everymeasurement channel M,

Dα(M(ρ)‖M(σ)) ≤ Dα(ρ‖σ) = 0. (4.5.152)

On the other hand, since Tr(σ) = Tr(ρ),

D(M(ρ)‖M(σ)) ≥ Dα(Tr(M(ρ))‖Tr(M(σ))) (4.5.153)

= Dα(Tr(ρ)‖Tr(σ)) (4.5.154)= 0, (4.5.155)

which means that Dα(M(ρ)‖M(σ)) = 0 for all measurement channels M.Now, recall that M(ρ) and M(σ) are effectively probability distributionsdetermined by the measurement. Since the classical Rényi relative en-tropy is equal to zero if and only if its two arguments are equal, we canconclude that M(ρ) = M(σ). Since this is true for every measurementchannel, we conclude from Theorem 3.54 and the fact that the trace normis a norm that ρ = σ. So we have that Dα(ρ‖σ) = 0 if and only if ρ = σ,as required.

3. Consider that ρ ≤ σ implies that σ − ρ ≥ 0. Then define the followingpositive semi-definite operators:

ρ := |0〉〈0| ⊗ ρ, (4.5.156)σ := |0〉〈0| ⊗ ρ + |1〉〈1| ⊗ (σ− ρ) . (4.5.157)

By exploiting the direct-sum property of sandwiched Rényi relative en-tropy (Proposition 4.29) and the data-processing inequality, we find that

0 = Dα(ρ‖ρ) = Dα(ρ‖σ) ≥ Dα(ρ‖σ), (4.5.158)

where the inequality follows from data processing with respect to partialtrace over the classical register.

297

Page 309: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

4. Consider the state ρ := |0〉〈0| ⊗ ρ and the operator σ := |0〉〈0| ⊗ σ +|1〉〈1| ⊗ (σ′ − σ), which is positive semi-definite because σ′ ≥ σ by as-sumption. Then

σ1−α2α ρσ

1−α2α = |0〉〈0| ⊗ σ

1−α2α ρσ

1−α2α , (4.5.159)

which implies thatDα(ρ‖σ) = Dα(ρ‖σ). (4.5.160)

Then, observing that Tr1[σ] = σ′, and using the data-processing inequal-ity for Dα with respect to the partial trace channel Tr1, we conclude that

Dα(ρ‖σ′) = Dα(Tr1(ρ)‖Tr1(σ)) ≤ Dα(ρ‖σ) = Dα(ρ‖σ), (4.5.161)

as required.

Let us now prove the faithfulness of both the Petz–Rényi and sandwichedRényi relative entropies for the full range of parameters for which they aredefined.

Proposition 4.34 Faithfulness of the Petz–Rényi and SandwichedRényi Relative Entropies

For all α ∈ (0, 1)∪ (1, ∞) and for all states ρ, σ, the Petz–Rényi and sand-wiched Rényi relative entropies are faithful, meaning that

Dα(ρ‖σ) = 0 if and only if ρ = σ, (4.5.162)

Dα(ρ‖σ) = 0 if and only if ρ = σ. (4.5.163)

PROOF: Note that the equality Dα(ρ‖ρ) = 0 for all α ∈ (0, 1) ∪ (1, ∞) isimmediate from the definition (see also (4.5.147)–(4.5.151)). The conversestatement has already been established in property 2. of Proposition 4.33 forα ∈ [1/2, 1) ∪ (1, ∞). Before getting to the range α ∈ (0, 1/2), let us considerthe Petz–Rényi relative entropy.

It is immediately clear from the definition that Dα(ρ‖ρ) = 0 for all α ∈(0, 1) ∪ (1, ∞). For α ∈ [0, 1) ∪ (1, 2], the converse follows from the data-processing inequality, which holds for this parameter range as shown in The-orem 4.22, as well as from arguments analogous to those in the proof of prop-erty 2. in Proposition 4.33. For α ∈ (2, ∞), we use the fact that Dα(ρ‖σ) ≥Dα(ρ‖σ) for all ρ, σ, as shown in Proposition 4.29. In particular, if Dα(ρ‖σ) =0, then Dα(ρ‖σ) ≤ 0. However, because ρ and σ are states, by property 1. of

298

Page 310: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.33, we have that Dα(ρ‖σ) ≥ 0, which means that Dα(ρ‖σ) = 0.Then, by property 2. of Proposition 4.33, we immediately get that ρ = σ.

Finally, suppose that Dα(ρ‖σ) = 0, where α ∈ (0, 1/2). Then, using (4.5.46),we have that αDα(ρ‖σ) ≤ 0. However, because ρ and σ are states, by the data-processing inequality we have that

Dα(ρ‖σ) ≥ Dα(Tr[ρ]‖Tr[σ]) =1

α− 1log2

(Tr[ρ]αTr[σ]1−α

)= 0. (4.5.164)

Therefore, Dα(ρ‖σ) = 0, which implies that ρ = σ by the faithfulness of thePetz–Rényi relative entropy, which we just proved.

The data-processing inequality for the sandwiched Rényi relative entropycan be written using the sandwiched Rényi relative quasi-entropy Qα as

1α− 1

log2 Qα(ρ‖σ) ≥1

α− 1log2 Qα(N(ρ)|N(σ)). (4.5.165)

Then, since α− 1 is negative for α ∈ [1/2, 1), we can use the monotonicity ofthe function log2 to obtain

Qα(ρ‖σ) ≥ Qα(N(ρ)‖N(σ)), for α ∈ (1, ∞), (4.5.166)

Qα(ρ‖σ) ≤ Qα(N(ρ)‖N(σ)), for α ∈ [1/2, 1) . (4.5.167)

Just as with the Petz–Rényi relative entropy, we can use this to prove the jointconvexity of the sandwiched Rényi relative entropy.

Proposition 4.35 Joint Convexity & Concavity of Sandwiched RényiRelative Quasi-Entropy

Let p : X → [0, 1] be a probability distribution over a finite alphabet Xwith associated |X|-dimensional system X, let ρx

Ax∈X be a set of stateson a system A, and let σx

Ax∈X be a set of positive semi-definite opera-tors on A. Then, for α ∈ (1, ∞)

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥ ∑x∈X

p(x)σxA

)≤ ∑

x∈Xp(x)Qα(ρ

xA‖σx

A), (4.5.168)

299

Page 311: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

and for α ∈ [1/2, 1),

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥ ∑x∈X

p(x)σxA

)≥ ∑

x∈Xp(x)Qα(ρ

xA‖σx

A). (4.5.169)

Consequently, the sandwiched Rényi relative entropy Dα is jointly con-vex for α ∈ [1/2, 1):

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥ ∑x∈X

p(x)σxA

)≤ ∑

x∈Xp(x)Dα(ρ

xA‖σx

A). (4.5.170)

PROOF: By the direct-sum property of Qα and applying (4.5.166)–(4.5.167)and Proposition 4.15, we conclude (4.5.168)–(4.5.169).

For α ∈ [1/2, 1), taking log2 of both sides and multiplying by 1α−1 , which is

negative, we find that

1α− 1

log2 Qα

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥ ∑x∈X

p(x)σxA

)

≤ 1α− 1

log2

(∑

x∈Xp(x)Qα(ρ

xA‖σx

A

). (4.5.171)

Then, since − log2 is a convex function, and using the definition of Dα interms of Qα, we conclude that

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥ ∑x∈X

p(x)σxA

)≤ ∑

x∈Xp(x)

1α− 1

log2 Qα(ρxA‖σx

A) (4.5.172)

= ∑x∈X

p(x)Dα(ρxA‖σx

A), (4.5.173)

as required.

Although the sandwiched Rényi relative entropy is not jointly convex forα ∈ (1, ∞), it is jointly quasi-convex, in the sense that

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥ ∑x∈X

p(x)σxA

)≤ max

x∈XDα(ρ

xA‖σx

A), (4.5.174)

300

Page 312: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

for every finite alphabet X, probability distribution p : X→ [0, 1], set ρxAx∈X

of states, and set σxAx∈X of positive semi-definite operators. Indeed, from

(4.5.168), we immediately obtain

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥ ∑x∈X

p(x)σxA

)≤ max

x∈XQα(ρ

xA‖σx

A). (4.5.175)

Taking the logarithm and multiplying by 1α−1 on both sides of this inequality

leads to (4.5.174).

4.6 Geometric Rényi Relative Entropy

In the previous two sections, we considered two examples of generalized di-vergences, the Petz– and sandwiched Rényi relative entropies. Both of theseare quantum generalizations of the classical Rényi relative entropy definedin (4.4.19).

In this section, we consider another generalization of the classical Rényirelative entropy, called the geometric Rényi relative entropy. Unlike the twoprevious Rényi relative entropies, the geometric Rényi relative entropy doesnot converge to the quantum relative entropy in the α → 1 limit. Instead,it converges to what is called the Belavkin–Staszewski relative entropy, asshown in Section 4.7. This latter quantity represents a different quantum gen-eralization of the classical relative entropy in (4.2.2). The main use of thegeometric Rényi and Belavkin–Staszewski relative entropies is in establish-ing upper bounds on the rates of feedback-assisted quantum communicationprotocols, the latter of which is the main focus of Part III of this book.

Definition 4.36 Geometric Rényi Relative Entropy

Let ρ be a state, σ a positive semi-definite operator, and α ∈ (0, 1) ∪(1, ∞). The geometric Rényi relative quasi-entropy is defined as

Qα(ρ‖σ) := limε→0+

Tr[

σε

(σ− 1

2ε ρσ

− 12

ε

)α]= lim

ε→0+Tr[Gα(σε, ρ)], (4.6.1)

where σε := σ + ε1 and

Gα(σε, ρ) := σ12ε

(σ− 1

2ε ρσ

− 12

ε

σ12ε (4.6.2)

301

Page 313: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

is the weighted operator geometric mean of σε and ρ. The geometric Rényirelative entropy is then defined as

Dα(ρ‖σ) :=1

α− 1log2 Qα(ρ‖σ). (4.6.3)

REMARK: In general, the weighted operator geometric mean of two positive definite op-erators X and Y is defined as

Gβ(X, Y) := X12

(X−

12 YX−

12

)βX

12 , (4.6.4)

where β ∈ R is the weight parameter. We recover the standard operator geometric meanfor β = 1

2 .

An important property of the weighted operator geometric mean is that

Gβ(X, Y) = G1−β(Y, X) (4.6.5)

for all positive definite X, Y, and all β ∈ R. To see this, observe that

G1−β(Y, X) = Y12

(Y−

12 XY−

12

)1−βY

12 (4.6.6)

= Y12

(Y−

12 XY−

12

) (Y−

12 XY−

12

)−βY

12 (4.6.7)

= X12 X

12 Y−

12

(Y−

12 X

12 X

12 Y−

12

)−βY

12 (4.6.8)

Now we apply Lemma 2.6. Specifically, we set L = X12 Y−

12 and f (x) = x−β therein to

conclude that

G1−β(Y, X) = X12

(X

12 Y−

12 Y−

12 X

12

)−βX

12 Y−

12 Y

12 (4.6.9)

= X12

(X−

12 YX−

12

)βX

12 (4.6.10)

= Gβ(X, Y). (4.6.11)

Definition 4.36 of the geometric Rényi relative entropy involves a limit,which has to do with the possibility that σ might not be invertible (i.e., itmight not be positive definite). Recall that the same situation arises for thePetz– and sandwiched Rényi relative entropies, which leads to expressionsfor them in terms of a limit in Propositions 4.19 and 4.27, respectively. Forthese two quantities, the limits evaluate to a finite value with an explicit ex-pression under the condition α ∈ (0, 1) and Tr[ρσ] 6= 0, or α ∈ (1, ∞) andsupp(ρ) ⊆ supp(σ). For the geometric Rényi relative entropy, however, there

302

Page 314: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

are several cases for which the limit in (4.6.1) is finite and has an explicit ex-pression. The following proposition outlines some of the simpler cases inwhich σ is positive definite:

Proposition 4.37

Let ρ be a state, and let σ be a positive definite operator. Then,

Qα(ρ‖σ) = Tr[σ(

σ−12 ρσ−

12

)α]= Tr[Gα(σ, ρ)] (4.6.12)

for all α ∈ (0, 1) ∪ (1, ∞). If ρ is a positive definite state and σ a positivedefinite operator, then

Qα(ρ‖σ) = Tr[

ρ(

ρ−12 σρ−

12

)1−α]= Tr[G1−α(ρ, σ)] (4.6.13)

= Tr[

ρ(

ρ12 σ−1ρ

12

)α−1]

. (4.6.14)

for all α ∈ (0, 1) ∪ (1, ∞).

Proof. If σ is positive definite, then the support of σ is the entire Hilbert space,and so the limit ε → 0+ in (4.6.1) simply evaluates to Qα(ρ‖σ) = Tr[Gα(σ, ρ)]for all α ∈ (0, 1) ∪ (1, ∞).

If ρ is also positive definite, then by invoking the equality in (4.6.5), weconclude that Tr[Gα(σ, ρ)] = Tr[G1−α(ρ, σ)] for all α ∈ (0, 1)∪ (1, ∞). Further-more, since both ρ and σ are positive definite, the following equality holds

(ρ−

12 σρ−

12

)1−α=(

ρ12 σ−1ρ

12

)α−1. (4.6.15)

Therefore, the equality in (4.6.14) holds.

We now provide explicit expressions for the geometric Rényi relative quasi-entropy Qα(ρ‖σ) that are consistent with the limit-based definition in (4.6.1)whenever ρ and/or σ are not positive definite. The expressions given in(4.6.16) below cover all possible values of α ∈ (0, 1) ∪ (1, ∞) and supportconditions. Additional expressions are given in (4.6.19).

303

Page 315: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.38 Explicit Expressions for Geometric Rényi RelativeQuasi-Entropy

For every state ρ, positive semi-definite operator σ, and α ∈ (0, 1) ∪(1, ∞), the following equality holds for the geometric Rényi relativequasi-entropy:

Qα(ρ‖σ) =

Tr[σ(

σ−12 ρσ−

12

)α] if α ∈ (0, 1) ∪ (1, ∞)and supp(ρ) ⊆ supp(σ)

Tr[σ(

σ−12 ρσ−

12

)α] if α ∈ (0, 1)and supp(ρ) 6⊆ supp(σ)

+∞ if α ∈ (1, ∞) andsupp(ρ) 6⊆ supp(σ),

(4.6.16)where

ρ := ρ0,0 − ρ0,1ρ−11,1 ρ†

0,1, ρ =

(ρ0,0 ρ0,1ρ†

0,1 ρ1,1

), (4.6.17)

ρ0,0 := ΠσρΠσ, ρ0,1 := ΠσρΠ⊥σ , ρ1,1 := Π⊥σ ρΠ⊥σ , (4.6.18)

Πσ is the projection onto the support of σ, Π⊥σ is the projection onto thekernel of σ, and the inverses σ−

12 and ρ−1

1,1 are taken on the supports ofσ and ρ1,1, respectively. We also have the alternative expressions belowfor certain cases:

Qα(ρ‖σ) =

Tr[

ρ(

ρ−12 σρ−

12

)1−α]

if α ∈ (0, 1)and supp(σ) ⊆ supp(ρ)

Tr[

ρ(

ρ12 σ−1ρ

12

)α−1]

if α ∈ (1, ∞)and supp(ρ) ⊆ supp(σ)

,

(4.6.19)where the inverses ρ−

12 and σ−1 are taken on the supports of ρ and σ,

respectively.

PROOF: The proof is similar in spirit to the proofs of Propositions 4.19 and4.27, but it is more complicated than these previous proofs. We provide it inSection 4.6.2.

304

Page 316: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Observe that when supp(ρ) ⊆ supp(σ) and α ∈ (0, 1), the expressionTr[σ(σ−1/2ρσ−1/2)α] is actually a special case of Tr[σ(σ−1/2ρσ−1/2)α], becausethe operators ρ0,1 and ρ1,1 are both equal to zero in this case, so that Πσρ =ρΠσ = ρ and ρ = ρ0,0.

The main intuition behind the first expression in (4.6.16) and those in(4.6.19) is as follows. If ρ and σ are positive definite, then the following equal-ities hold

Tr[σ(

σ−12 ρσ−

12

)α]= Tr

[ρ(

ρ−12 σρ−

12

)1−α]

(4.6.20)

= Tr[

ρ(

ρ12 σ−1ρ

12

)α−1]

, (4.6.21)

for all α ∈ (0, 1) ∪ (1, ∞), as shown previously in Proposition 4.37.

1. If the support condition supp(ρ) ⊆ supp(σ) holds, then we can thinkof supp(σ) as being the whole Hilbert space and σ being invertible onthe whole space. So then generalized inverses like σ−

12 or σ−1 are true

inverses on supp(σ), and the expression Tr[σ(σ−1/2ρσ−1/2)α] is sensiblefor α ∈ (0, 1) ∪ (1, ∞), with the only inverse in the expression being σ−

12 ;

this expression results after taking the limit ε→ 0+.

2. Similarly, the expression Tr[ρ(ρ1/2σ−1ρ1/2)α−1] is sensible for α ∈ (1, ∞),with the only inverse in the expression being σ−1; this latter expressionresults after taking the limit ε→ 0+.

3. If the support condition supp(σ) ⊆ supp(ρ) holds, then we can thinkof supp(ρ) as being the whole Hilbert space and ρ being invertible onthe whole space. So then the generalized inverse ρ−

12 is a true inverse

on supp(ρ), and the expression Tr[ρ(ρ−1/2σρ−1/2)1−α] is sensible for α ∈(0, 1), with the only inverse in the expression being ρ−

12 ; this expression

also results after taking the limit ε→ 0+.

In order to understand the first expression in (4.6.19) further, observe thatthe following identities hold for all α ∈ (0, 1) ∪ (1, ∞):

Qα(ρ‖σ) = limε→0+

limδ→0+

Tr[

σε

(σ− 1

2ε ρδσ

− 12

ε

)α](4.6.22)

= limε→0+

limδ→0+

Tr[Gα(σε, ρδ)], (4.6.23)

305

Page 317: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

whereρδ := (1− δ) ρ + δπ, (4.6.24)

and π is the maximally mixed state. This holds because the expression for thegeometric Rényi relative quasi-entropy in Definition 4.36 does not involve aninverse of the state ρ.

As it turns out, the order of the limits in (4.6.22) does not matter for α ∈(0, 1):

Lemma 4.39 Limit Interchange for Geometric Rényi Relative Quasi-Entropy

Let ρ be a state and σ a positive semi-definite operator. For α ∈ (0, 1),the following equality holds

Qα(ρ‖σ) = limε→0+

limδ→0+

Tr[

σε

(σ− 1

2ε ρδσ

− 12

ε

)α](4.6.25)

= infε,δ>0

Tr[

σε

(σ− 1

2ε ρδσ

− 12

ε

)α](4.6.26)

= limδ→0+

limε→0+

Tr[

σε

(σ− 1

2ε ρδσ

− 12

ε

)α], (4.6.27)

where ρδ := (1− δ) ρ + δπ, δ ∈ (0, 1), π is the maximally mixed state,σε := σ + ε1, and ε > 0.

PROOF: See Section 4.6.1.

Now, because both σε and ρδ are positive definite for ε, δ > 0, we can usethe property in (4.6.5), along with Lemma 4.39, to obtain the following forα ∈ (0, 1):

Qα(ρ‖σ) = limδ→0+

limε→0+

Tr[G1−α(ρδ, σε)] (4.6.28)

= limδ→0+

limε→0+

Tr

[ρδ

(ρ− 1

2δ σερ

− 12

δ

)1−α]

(4.6.29)

= limδ→0+

Tr

[ρδ

(ρ− 1

2δ σρ

− 12

δ

)1−α]

, (4.6.30)

where the last equality holds for the analogous reason that (4.6.22) holds,namely, that the inverse of σ is not involved. We are now in a situation that

306

Page 318: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

looks like the expression in (4.6.1), except that the roles of ρ and σ are re-versed and α is substituted with 1 − α. Then, in the limit δ → 0+, if thesupport condition supp(σ) ⊆ supp(ρ) holds, the expression converges toTr[ρ(ρ−1/2σρ−1/2)1−α].

It is worthwhile to consider the special case of α = 2. In this case, thegeometric Rényi relative quasi-entropy collapses to the Petz–Rényi relativequasi-entropy when supp(ρ) ⊆ supp(σ):

Q2(ρ‖σ) = limε→0+

Tr

[σε

(σ− 1

2ε ρσ

− 12

ε

)2]

(4.6.31)

= limε→0+

Tr[

σε

(σ− 1

2ε ρσ

− 12

ε

)(σ− 1

2ε ρσ

− 12

ε

)](4.6.32)

= limε→0+

Tr[ρσ−1ε ρ] (4.6.33)

= limε→0+

Tr[ρ2σ−1ε ] (4.6.34)

= Q2(ρ‖σ), (4.6.35)

with the last line following from Proposition 4.19. The development aboveimplies that the corresponding Rényi relative entropies are equal:

D2(ρ‖σ) = D2(ρ‖σ). (4.6.36)

The geometric and sandwiched Rényi relative entropies also converge to thesame value in the limit α→ ∞, as shown in Section 4.8.

A first property of the geometric Rényi relative entropy that we establishis its relation to the sandwiched Rényi relative entropy.

Proposition 4.40 Ordering of Sandwiched and Geometric RényiRelative Entropies

Let ρ be a state and σ a positive semi-definite operator. The geometricRényi relative entropy is not smaller than the sandwiched Rényi relativeentropy for all α ∈ (0, 1) ∪ (1, ∞):

Dα(ρ‖σ) ≤ Dα(ρ‖σ). (4.6.37)

PROOF: This is a direct consequence of the Araki–Lieb–Thirring inequality(Lemma 2.14), which we recall here for convenience. For positive semi-definite

307

Page 319: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

operators X and Y, q ≥ 0, and r ∈ [0, 1], the following inequality holds

Tr[(

Y12 XY

12

)rq]≥ Tr

[(Y

r2 XrY

r2

)q]. (4.6.38)

For r ≥ 1, the following inequality holds

Tr[(

Y12 XY

12

)rq]≤ Tr

[(Y

r2 XrY

r2

)q]. (4.6.39)

By employing (4.6.38) with q = 1, r = α ∈ (0, 1), Y = σ1αε , and X = σ

− 12

ε ρσ− 1

2ε ,

and recalling that σε := σ + ε1, we find that

Qα(ρ‖σε) = Tr[

σε

(σ− 1

2ε ρσ

− 12

ε

)α](4.6.40)

= Tr[(

σ1

2αε

)α (σ− 1

2ε ρσ

− 12

ε

)α (σ

12αε

)α](4.6.41)

≤ Tr[(

σ1

2αε σ− 1

2ε ρσ

− 12

ε σ1

2αε

)α](4.6.42)

= Tr[(

σ1−α2α

ε ρσ1−α2α

ε

)α](4.6.43)

= Qα(ρ‖σε), (4.6.44)

which implies for α ∈ (0, 1), by using definitions, that

Dα(ρ‖σε) ≤ Dα(ρ‖σε). (4.6.45)

Now taking the limit as ε → 0+, employing Proposition 4.27 and Defini-tion 4.36, we arrive at the inequality in (4.6.37).

Since the Araki–Lieb–Thirring inequality is reversed for r = α ∈ (1, ∞),we can employ similar reasoning as above, using (4.6.39) and definitions, toarrive at (4.6.37) for α ∈ (1, ∞).

If the state ρ is pure, then the geometric Rényi relative entropy simplifiesas follows, such that it is independent of α:

Proposition 4.41 Geometric Rényi Relative Entropy for Pure States

Let ρ = |ψ〉〈ψ| be a pure state and σ a positive semi-definite operator.

308

Page 320: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Then the following equality holds for all α ∈ (0, 1) ∪ (1, ∞):

Dα(ρ‖σ) =

log2〈ψ|σ−1|ψ〉 if supp(|ψ〉〈ψ|) ⊆ supp(σ)+∞ otherwise

, (4.6.46)

where the inverse σ−1 is taken on the support of σ. If σ is also a rank-oneoperator, so that σ = |φ〉〈φ| and ‖|φ〉‖2 > 0, then the following equalityholds for all α ∈ (0, 1) ∪ (1, ∞):

Dα(ρ‖σ) =− log2‖|φ〉‖

22 if ∃c ∈ C such that |ψ〉 = c|φ〉

+∞ otherwise. (4.6.47)

In particular, if σ = |φ〉〈φ| is a state so that ‖|φ〉‖22 = 1, then

Dα(ρ‖σ) =

0 if |ψ〉 = |φ〉+∞ otherwise . (4.6.48)

PROOF: Defining σε := σ + ε1, consider that

Tr[

σε

(σ− 1

2ε ρσ

− 12

ε

)α]= Tr

[σε

(σ− 1

2ε |ψ〉〈ψ|σ−

12

ε

)α](4.6.49)

=

(∥∥∥∥σ− 1

2ε |ψ〉

∥∥∥∥2

2

Tr

σε

σ− 1

2ε |ψ〉〈ψ|σ−

12

ε∥∥∥∥σ− 1

2ε |ψ〉

∥∥∥∥2

2

α (4.6.50)

=

(∥∥∥∥σ− 1

2ε |ψ〉

∥∥∥∥2

2

Tr

σε

σ− 1

2ε |ψ〉〈ψ|σ−

12

ε∥∥∥∥σ− 1

2ε |ψ〉

∥∥∥∥2

2

(4.6.51)

=

(∥∥∥∥σ− 1

2ε |ψ〉

∥∥∥∥2

2

)α−1

Tr[

σεσ− 1

2ε |ψ〉〈ψ|σ−

12

ε

](4.6.52)

=

(∥∥∥∥σ− 1

2ε |ψ〉

∥∥∥∥2

2

)α−1

Tr[|ψ〉〈ψ|] (4.6.53)

=[〈ψ|σ−1

ε |ψ〉]α−1

. (4.6.54)

The third equality follows because |ϕ〉〈ϕ|α = |ϕ〉〈ϕ| for all α ∈ (0, 1) ∪ (1, ∞)

309

Page 321: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

when ‖|ϕ〉‖2 = 1. Applying the chain of equalities above, we find that

1α− 1

log2 Tr[

σε

(σ− 1

2ε ρσ

− 12

ε

)α]=

1α− 1

log2

[〈ψ|σ−1

ε |ψ〉]α−1

(4.6.55)

= log2〈ψ|σ−1ε |ψ〉. (4.6.56)

Now let a spectral decomposition of σ be given by

σ = ∑y

µyQy, (4.6.57)

where µy are the non-negative eigenvalues and Qy are the eigenprojections.In this decomposition, we are including values of µy for which µy = 0. Thenit follows that

σε = σ + ε1 = ∑y

(µy + ε

)Qy, (4.6.58)

and we find thatσ−1

ε = ∑y

(µy + ε

)−1 Qy. (4.6.59)

We then conclude that

〈ψ|σ−1ε |ψ〉 = 〈ψ|∑

y

(µy + ε

)−1 Qy|ψ〉 (4.6.60)

= ∑y

(µy + ε

)−1 〈ψ|Qy|ψ〉 (4.6.61)

= ∑y:µy 6=0

(µy + ε

)−1 〈ψ|Qy|ψ〉+ ε−1〈ψ|Qy0 |ψ〉, (4.6.62)

where y0 is the value of y for which µy = 0 (if no such value of y exists, thenQy0 is equal to the zero operator). Thus, if 〈ψ|Qy0 |ψ〉 6= 0 (equivalent to |ψ〉being outside the support of σ), then it follows that

limε→0+

log2〈ψ|σ−1ε |ψ〉 = +∞. (4.6.63)

Otherwise the expression converges as claimed.

Now suppose that σ is a rank-one operator, so that σ = |φ〉〈φ| and ‖|φ〉‖2 >0. By defining

|φ′〉 :=|φ〉√‖|φ〉‖2

, N := ‖|φ〉‖22 , (4.6.64)

310

Page 322: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

we find that

σε = |φ〉〈φ|+ ε1 (4.6.65)= N |φ′〉〈φ′|+ ε

(1− |φ′〉〈φ′|+ |φ′〉〈φ′|

)(4.6.66)

= (N + ε) |φ′〉〈φ′|+ ε(1− |φ′〉〈φ′|

), (4.6.67)

so that

σ−1ε = (N + ε)−1 |φ′〉〈φ′|+ ε−1 (1− |φ′〉〈φ′|

)(4.6.68)

=((N + ε)−1 − ε−1

)|φ′〉〈φ′|+ ε−11 (4.6.69)

and then

〈ψ|σ−1ε |ψ〉 = 〈ψ|

[((N + ε)−1 − ε−1

)|φ′〉〈φ′|+ ε−11

]|ψ〉 (4.6.70)

=((N + ε)−1 − ε−1

) ∣∣〈ψ|φ′〉∣∣2 + ε−1 (4.6.71)

=|〈ψ|φ′〉|2

N + ε+

1− |〈ψ|φ′〉|2ε

. (4.6.72)

Note that we always have |〈ψ|φ′〉|2 ∈ [0, 1] because |ψ〉 and |φ′〉 are unit vec-tors. In the case that |〈ψ|φ′〉|2 ∈ [0, 1), then we find that

limε→0+

log2

[〈ψ|σ−1

ε |ψ〉]= lim

ε→0+log2

[|〈ψ|φ′〉|2

N + ε+

1− |〈ψ|φ′〉|2ε

](4.6.73)

= +∞. (4.6.74)

Otherwise, if |〈ψ|φ′〉|2 = 1, then

limε→0+

log2

[〈ψ|σ−1

ε |ψ〉]= lim

ε→0+log2

[|〈ψ|φ′〉|2

N + ε+

1− |〈ψ|φ′〉|2ε

](4.6.75)

= limε→0+

log2

[1

N + ε

](4.6.76)

= − log2 N, (4.6.77)

concluding the proof.

We note here that, for pure states ρ and σ and as indicated by (4.6.48), thegeometric Rényi relative entropy is either equal to zero or +∞, depending onwhether ρ = σ. This behavior of the geometric Rényi relative entropy for pure

311

Page 323: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

states ρ and σ is very different from that of the Petz– and sandwiched Rényirelative entropies. The latter quantities always evaluate to a finite value if thepure states are non-orthogonal.

The geometric Rényi relative entropy possesses a number of useful prop-erties, similar to those for the Petz– and sandwiched Rényi relative entropies,which we delineate now.

Proposition 4.42 Properties of Geometric Rényi Relative Entropy

For all states ρ, ρ1, ρ2 and positive semi-definite operators σ, σ1, σ2, thegeometric Rényi relative entropy satisfies the following properties:

1. Isometric invariance: For all α ∈ (0, 1) ∪ (1, ∞) and for every isom-etry V,

Dα(ρ‖σ) = Dα(VρV†‖VσV†). (4.6.78)

2. Monotonicity in α: For all α ∈ (0, 1) ∪ (1, ∞), the geometric Rényirelative entropy Dα is monotonically increasing in α; i.e., α < β im-plies Dα(ρ‖σ) ≤ Dβ(ρ‖σ).

3. Additivity: For all α ∈ (0, 1) ∪ (1, ∞),

Dα(ρ1 ⊗ ρ2‖σ1 ⊗ σ2) = Dα(ρ1‖σ1) + Dα(ρ2‖σ2). (4.6.79)

4. Direct-sum property: Let p : X→ [0, 1] be a probability distributionover a finite alphabet X with associated |X|-dimensional system X,and let q : X → [0, ∞) be a positive function on X. Let

ρx

A

x∈X bea set of states on a system A, and let

σx

A

x∈X be a set of positivesemi-definite operators on A. Then,

Qα(ρXA‖σXA) = ∑x∈X

p(x)αq(x)1−αQα(ρxA‖σx

A), (4.6.80)

where

ρXA := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA, (4.6.81)

σXA := ∑x∈X

q(x)|x〉〈x|X ⊗ σxA. (4.6.82)

PROOF:

312

Page 324: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

1. Proof of isometric invariance: Let us start by writing Dα(ρ‖σ) as in (4.6.1)–(4.6.3):

Dα(ρ‖σ) = limε→0+

1α− 1

log2 Tr[

σε

(σ− 1

2ε ρσ

− 12

ε

)α]. (4.6.83)

whereσε := σ + ε1. (4.6.84)

Let V be an isometry. Then, defining

ωε := VσV† + ε1, (4.6.85)

we find that

Dα(VρV†‖VσV†) = limε→0+

1α− 1

log2 Tr[

ωε

(ω− 1

2ε VρV†ω

− 12

ε

)α].

(4.6.86)Now let Π := VV† be the projection onto the image of V, so that ΠV = V,and let Π := 1−Π. Then we can write

ωε = VσV† + εΠ + εΠ = VσεV† + εΠ. (4.6.87)

Since VσεV† and εΠ are supported on orthogonal subspaces, we obtain

ω− 1

2ε = Vσ

− 12

ε V† + ε−12 Π. (4.6.88)

Consider then that

ω− 1

2ε VρV†ω

− 12

ε

=

(Vσ

− 12

ε V† + ε−12 Π)

ΠVρV†Π(

Vσ− 1

2ε V† + ε−

12 Π)

(4.6.89)

=

(Vσ

− 12

ε V†)

ΠVρV†Π(

Vσ− 1

2ε V†

)(4.6.90)

= Vσ− 1

2ε ρσ

− 12

ε V†, (4.6.91)

where the second equality follows because ΠΠ = ΠΠ = 0. Thus,(

ω− 1

2ε VρV†ω

− 12

ε

= V(

σ− 1

2ε ρσ

− 12

ε

V†, (4.6.92)

313

Page 325: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

and we find that

Tr[

ωε

(ω− 1

2ε VρV†ω

− 12

ε

)α]

= Tr[(

VσεV† + εΠ)

V(

σ− 1

2ε ρσ

− 12

ε

V†]

(4.6.93)

= Tr[

σε

(σ− 1

2ε ρσ

− 12

ε

)α]. (4.6.94)

Since the equality

Tr[

ωε

(ω− 1

2ε VρV†ω

− 12

ε

)α]= Tr

[σε

(σ− 1

2ε ρσ

− 12

ε

)α](4.6.95)

holds for all ε > 0, we conclude the proof of isometric invariance bytaking the limit ε→ 0+.

2. Proof of monotonicity in α: We prove this by showing that the derivative isnon-negative for all α > 0. By applying (4.6.22), we can consider ρ andσ to be positive definite without loss of generality. By applying (4.6.14),consider that

Qα(ρ‖σ) = Tr[

ρ(

ρ−12 σρ−

12

)1−α]

(4.6.96)

= Tr[

ρ(

ρ12 σ−1ρ

12

)α−1]

. (4.6.97)

Now defining |ϕρ〉 = (ρ12 ⊗ 1)|Γ〉 as a purification of ρ, and setting

γ := α− 1, (4.6.98)

X := ρ12 σ−1ρ

12 , (4.6.99)

we can write the geometric Rényi relative entropy as

Dα(ρ‖σ) =1γ

log2〈ϕρ|Xγ ⊗ 1|ϕρ〉 = 1γ

ln〈ϕρ|Xγ ⊗ 1|ϕρ〉ln(2)

, (4.6.100)

where we made use of (4.6.97). Then ddα = d

dγdγdα = d

dγ , and so we findthat

ln(2)d

dαDα(ρ‖σ)

314

Page 326: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

=d

[1γ

ln〈ϕρ|Xγ ⊗ 1|ϕρ〉]

(4.6.101)

=

[− 1

γ2 ln〈ϕρ|Xγ ⊗ 1|ϕρ〉+ 1γ

ddγ

ln〈ϕρ|Xγ ⊗ 1|ϕρ〉]

(4.6.102)

=

[− 1

γ2 ln〈ϕρ|Xγ ⊗ 1|ϕρ〉+ 1γ

〈ϕρ|Xγ ln X⊗ 1|ϕρ〉〈ϕρ|Xγ ⊗ 1|ϕρ〉

](4.6.103)

=

[−〈ϕρ|Xγ ⊗ 1|ϕρ〉 ln〈ϕρ|Xγ ⊗ 1|ϕρ〉+ γ〈ϕρ|Xγ ln X⊗ 1|ϕρ〉γ2〈ϕρ|Xγ ⊗ 1|ϕρ〉

]

(4.6.104)

=

[−〈ϕρ|Xγ ⊗ 1|ϕρ〉 ln〈ϕρ|Xγ ⊗ 1|ϕρ〉+ 〈ϕρ|Xγ ln Xγ ⊗ 1|ϕρ〉γ2〈ϕρ|Xγ ⊗ 1|ϕρ〉

].

(4.6.105)

Letting g(x) := x log2 x, we write

ddα

Dα(ρ‖σ) =〈ϕρ|g(Xγ ⊗ 1)|ϕρ〉 − g(〈ϕρ|(Xγ ⊗ 1)|ϕρ〉)

γ2〈ϕρ|Xγ ⊗ 1|ϕρ〉 . (4.6.106)

Then, since g(x) is operator convex, by the operator Jensen inequality in(2.3.22), we conclude that

〈ϕρ|g(Xγ ⊗ 1)|ϕρ〉 ≥ g(〈ϕρ|(Xγ ⊗ 1)|ϕρ〉), (4.6.107)

which means that ddα Dα(ρ‖σ) ≥ 0. Therefore, Dα(ρ‖σ) is monotonically

increasing in α, as required.

3. Proof of additivity: The proof of (4.6.79) is found by direct evaluation. Con-sider that

limε1→0+

Qα(ρ1‖σ1,ε1) · limε2→0+

Qα(ρ2‖σ2,ε2)

= limε1→0+

limε2→0+

Qα(ρ1‖σ1,ε1) · Qα(ρ2‖σ2,ε2), (4.6.108)

where σ1,ε1:= σ1 + ε11 and σ2,ε2 := σ2 + ε21. We then find that

Qα(ρ1‖σ1,ε1) · Qα(ρ2‖σ2,ε2)

= Tr[

σ1,ε1

(σ− 1

21,ε1

ρ1σ− 1

21,ε1

)α]Tr[

σ2,ε2

(σ− 1

22,ε2

ρ2σ− 1

22,ε2

)α](4.6.109)

= Tr[

σ1,ε1

(σ− 1

21,ε1

ρ1σ− 1

21,ε1

⊗ σ2,ε2

(σ− 1

22,ε2

ρ2σ− 1

22,ε2

)α](4.6.110)

315

Page 327: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= Tr[(

σ1,ε1 ⊗ σ2,ε2

) ((σ− 1

21,ε1

ρ1σ− 1

21,ε1

⊗(

σ− 1

22,ε2

ρ2σ− 1

22,ε2

)α)](4.6.111)

= Tr[(

σ1,ε1 ⊗ σ2,ε2

) (σ− 1

21,ε1

ρ1σ− 1

21,ε1⊗ σ

− 12

2,ε2ρ2σ− 1

22,ε2

)α](4.6.112)

= Tr[(

σ1,ε1 ⊗ σ2,ε2

) ((σ1,ε1 ⊗ σ2,ε2

)− 12 (ρ1 ⊗ ρ2)

(σ1,ε1 ⊗ σ2,ε2

)− 12

)α]

(4.6.113)

= Qα(ρ1 ⊗ ρ2‖σ1,ε1 ⊗ σ2,ε2). (4.6.114)

By considering that

limε1→0+

limε2→0+

σ1,ε1 ⊗ σ2,ε2 = limε→0+

σ1 ⊗ σ2 + ε1⊗ 1, (4.6.115)

along with continuity of the underlying functions, we conclude that

limε1→0+

limε2→0+

Qα(ρ1 ⊗ ρ2‖σ1,ε1 ⊗ σ2,ε2)

= limε→0+

Qα(ρ1 ⊗ ρ2‖σ1 ⊗ σ2 + ε1⊗ 1). (4.6.116)

Finally, by applying the continuous function 1α−1 log2(·) to all sides of the

equalities established, we conclude that additivity holds.

4. Proof of direct-sum property: Define the classical–quantum state ρXA andoperator σXA, respectively, as

ρXA := ∑x∈X

p(x)|x〉〈x|X⊗ ρxA, σXA := ∑

x∈Xq(x)|x〉〈x|X⊗ σx

A. (4.6.117)

Define

σεXA := ∑

x∈Xq(x)|x〉〈x|X ⊗ σx

A + ε1X ⊗ 1A (4.6.118)

= ∑x∈X,q(x) 6=0

q(x)|x〉〈x|X ⊗ σxA + ε ∑

x∈X|x〉〈x|X ⊗ 1A (4.6.119)

= ∑x∈X,q(x) 6=0

|x〉〈x|X ⊗ q(x)σxA,ε + ∑

x∈X,q(x)=0|x〉〈x|X ⊗ ε1A, (4.6.120)

whereσx

A,ε := σxA + ε1A. (4.6.121)

Then we find that

316

Page 328: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

(σεXA)

− 12 = ∑

x∈X,q(x) 6=0|x〉〈x|X ⊗

(q(x)σx

A,ε)− 1

2 +

∑x∈X,q(x)=0

|x〉〈x|X ⊗ ε−12 1A, (4.6.122)

so that (omitting some lines of calculation)((σε

XA)− 1

2 ρXA (σεXA)

− 12)α

= ∑x∈X,q(x) 6=0,

p(x) 6=0

|x〉〈x|X ⊗(

p(x)q(x)

)α ((σx

A,ε)− 1

2 ρxA(σx

A,ε)− 1

2

+ ∑x∈X,q(x)=0,

p(x) 6=0

|x〉〈x|X ⊗ ε−α(ρxA)

α. (4.6.123)

Defining

ωxA :=

(σx

A,ε)− 1

2 ρxA(σx

A,ε)− 1

2 , (4.6.124)

it then follows that

Qα(ρXA‖σεXA)

= Tr[

σεXA

((σε

XA)− 1

2 ρXA (σεXA)

− 12)α]

(4.6.125)

= Tr

x∈X,q(x) 6=0

q(x)|x〉〈x|X ⊗ σxA,ε

∑x′∈X,

q(x′) 6=0,p(x′) 6=0

|x′〉〈x′|X ⊗(

p(x′)q(x′)

)α (ωx′

A

+ Tr

x∈X,q(x) 6=0

q(x)|x〉〈x|X ⊗ σxA,ε

∑x′∈X,

q(x′)=0,p(x′) 6=0

|x′〉〈x′|X ⊗ ε−α(ρx′A)

α

+ Tr

x∈X,q(x)=0

|x〉〈x|X ⊗ ε1A

∑x′∈X,

q(x′) 6=0,p(x′) 6=0

|x′〉〈x′|X ⊗(

p(x′)q(x′)

)α (ωx′

A

317

Page 329: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

+ Tr

x∈X,q(x)=0

|x〉〈x|X ⊗ ε1A

x′∈X,q(x′)=0,p(x′) 6=0

|x′〉〈x′|X ⊗ ε−α(ρx′A)

α

(4.6.126)

= ∑x∈X,q(x) 6=0,p(x) 6=0

p(x)αq(x)1−α Tr[σx

A,ε (ωxA)

α]

+ ∑x∈X,q(x)=0,p(x) 6=0

ε1−αTr[(ρxA)

α]. (4.6.127)

Now observing that Tr[σx

A,ε(ωx

A)α]= Qα(ρx

A‖σxA,ε) and taking the limit

ε→ 0+ in the last line above, we find that

limε→0+

∑x∈X,

q(x) 6=0,p(x) 6=0

p(x)αq(x)1−αQα(ρxA‖σx

A,ε) + ∑x∈X,

q(x)=0,p(x) 6=0

ε1−αTr[(ρxA)

α]

= ∑x∈X

p(x)αq(x)1−αQα(ρxA‖σx

A) (4.6.128)

if α ∈ (0, 1) or if α ∈ (1, ∞), supp(ρxA) ⊆ supp(σx

A), and there does notexist a value of x for which p(x) 6= 0 and q(x) = 0. The latter supportconditions are precisely the same as supp(ρXA) ⊆ supp(σXA). If α ∈(1, ∞) and the support conditions do not hold, then the limit evaluates to+∞, consistent with the right-hand side above. This concludes the proofof the direct-sum property.

We now establish the data-processing inequality for the geometric Rényirelative entropy for α ∈ (0, 1) ∪ (1, 2].

Theorem 4.43 Data-Processing Inequality for Geometric Rényi Rel-ative Entropy

Let ρ be a state, σ a positive semi-definite operator, and N a quantumchannel. Then, for all α ∈ (0, 1) ∪ (1, 2],

Dα(ρ‖σ) ≥ Dα(N(ρ)‖N(σ)). (4.6.129)

318

Page 330: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

PROOF: From Stinespring’s dilation theorem (Theorem 3.18), we know thatthe action of a quantum channel N on every linear operator X can be writtenas

N(X) = TrE[VXV†], (4.6.130)

where V is an isometry and E is an auxiliary system with dimension dE ≥rank(ΓN

AB), with ΓNAB the Choi operator for the channel N. As stated in Propo-

sition 4.42, the geometric Rényi relative entropy Dα is isometrically invariant.Therefore, it suffices to establish the data-processing inequality for Dα underpartial trace; i.e., it suffices to show that for every state ρAB, positive semi-definite operator σAB, and for all α ∈ (0, 1) ∪ (1, 2]:

Dα(ρAB‖σAB) ≥ Dα(ρA‖σA). (4.6.131)

We now proceed to prove this inequality. We prove it for ρAB, and henceρA, invertible, as well as for σAB and σA invertible. The result follows inthe general case of ρAB and/or ρA non-invertible, as well as σAB and/or σAnon-invertible, by applying the result to the invertible operators (1− δ) ρAB +δπAB and σAB + ε1AB, with δ ∈ (0, 1) and ε > 0, and taking the limit δ → 0+

followed by ε→ 0+, because

Dα(ρAB‖σAB) = limε→0+

limδ→0+

Dα((1− δ) ρAB + δπAB‖σAB + ε1AB), (4.6.132)

Dα(ρA‖σA) = limε→0+

limδ→0+

Dα((1− δ) ρA + δπA‖σA + dBε1A), (4.6.133)

which follows from (4.6.22) and the fact that the dimensional factor dB doesnot affect the limit in the second quantity above.

To establish the data-processing inequality, we make use of the Petz re-covery channel for partial trace (see Section 3.2.10.1), as well as the opera-tor Jensen inequality (Theorem 2.15). Recall that the Petz recovery channelPσAB,TrB for partial trace is defined as

PσAB,TrB(XA) ≡ P(XA) := σ12AB

(σ− 1

2A XAσ

− 12

A ⊗ 1B

12AB. (4.6.134)

The Petz recovery channel has the following property:

P(σA) = σAB, (4.6.135)

which can be verified by inspection. Since PσAB,TrB is completely positive andtrace preserving, it follows that its adjoint

P†(YAB) := σ− 1

2A TrB[σ

12ABYABσ

12AB]σ

− 12

A , (4.6.136)

319

Page 331: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

is completely positive and unital. Observe that

P†(σ− 1

2AB ρABσ

− 12

AB ) = σ− 1

2A ρAσ

− 12

A . (4.6.137)

We then find for α ∈ (1, 2] that

Qα(ρAB‖σAB) = Tr[

σAB

(σ− 1

2AB ρABσ

− 12

AB

)α](4.6.138)

= Tr[P(σA)

(σ− 1

2AB ρABσ

− 12

AB

)α](4.6.139)

= Tr[

σAP†((

σ− 1

2AB ρABσ

− 12

AB

)α)](4.6.140)

≥ Tr[

σA

(P†(

σ− 1

2AB ρABσ

− 12

AB

))α](4.6.141)

= Tr[

σA

(σ− 1

2A ρAσ

− 12

A

)α](4.6.142)

= Qα(ρA‖σA). (4.6.143)

The second equality follows from (4.6.135). The sole inequality is a conse-quence of the operator Jensen inequality and the fact that xα is operator con-vex for α ∈ (1, 2]. Indeed, for M a completely positive unital map, item 2. ofTheorem 2.15 implies that

f (M(X)) ≤M( f (X)) (4.6.144)

for Hermitian X and an operator convex function f . The second-to-last equal-ity follows from (4.6.137).

Applying the same reasoning as above, but using the fact that xα is oper-ator concave for α ∈ (0, 1), we find for α ∈ (0, 1) that

Qα(ρA‖σA) ≥ Qα(ρAB‖σAB). (4.6.145)

Putting together the above and employing definitions, we find that the fol-lowing inequality holds for α ∈ (0, 1) ∪ (1, 2]:

Dα(ρAB‖σAB) ≥ Dα(ρA‖σA), (4.6.146)

concluding the proof.

With the data-processing inequality for the geometric Rényi relative en-tropy in hand, we can establish some additional properties.

320

Page 332: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.44 Additional Properties of Geometric Rényi RelativeEntropy

The geometric Rényi relative entropy Dα satisfies the following prop-erties for every state ρ and positive semi-definite operator σ for α ∈(0, 1) ∪ (1, 2]:

1. If Tr[σ] ≤ Tr[ρ] = 1, then Dα(ρ‖σ) ≥ 0.

2. Faithfulness: Suppose that Tr[σ] ≤ Tr[ρ] = 1 and let α ∈ (0, 1) ∪(1, ∞). Then Dα(ρ‖σ) = 0 if and only if ρ = σ.

3. If ρ ≤ σ, then Dα(ρ‖σ) ≤ 0.

4. For every positive semi-definite operator σ′ such that σ′ ≥ σ, thefollowing inequality holds Dα(ρ‖σ) ≥ Dα(ρ‖σ′).

PROOF:

1. Apply the data-processing inequality with the channel being the full trace-out channel:

Dα(ρ‖σ) ≥ Dα(Tr[ρ]‖Tr[σ]) (4.6.147)

=1

α− 1log2

[(Tr[ρ])α (Tr[σ])1−α

](4.6.148)

= − log2 Tr[σ] (4.6.149)≥ 0. (4.6.150)

2. If ρ = σ, then it follows by direct evaluation that Dα(ρ‖σ) = 0 for α ∈(0, 1) ∪ (1, ∞).

To see the other implication, suppose first that (0, 1)∪ (1, 2]. Then Dα(ρ‖σ) =0 implies that equality is achieved in the two inequalities in item 1. above.So then Tr[σ] = 1. Furthermore, we conclude from data processing thatDα(M(ρ)‖M(σ)) = 0 for all measurement channels M. This includes themeasurement that achieves the trace distance. By applying the faithful-ness of the classical Rényi relative entropy on the distributions that resultfrom the optimal trace-distance measurement, we conclude that ρ = σ.To get the range outside the data-processing interval of (0, 1)∪ (1, 2], notethat Dα(ρ‖σ) = 0 for α > 2 implies by monotonicity (Property 2 of Propo-sition 4.42) that Dα(ρ‖σ) = 0 for α ≤ 2. Then it follows that ρ = σ.

321

Page 333: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

3. Consider that ρ ≤ σ implies that σ − ρ ≥ 0. Then define the followingpositive semi-definite operators:

ρ := |0〉〈0| ⊗ ρ, (4.6.151)σ := |0〉〈0| ⊗ ρ + |1〉〈1| ⊗ (σ− ρ) . (4.6.152)

By exploiting the direct-sum property of geometric Rényi relative en-tropy (Proposition 4.42) and the data-processing inequality (Theorem 4.43),we find that

0 = Dα(ρ‖ρ) = Dα(ρ‖σ) ≥ Dα(ρ‖σ), (4.6.153)

where the inequality follows from data processing with respect to partialtrace over the classical register.

4. Similar to the above proof, the condition σ′ ≥ σ implies that σ′ − σ ≥ 0.Then define the following positive semi-definite operators:

ρ := |0〉〈0| ⊗ ρ, (4.6.154)σ := |0〉〈0| ⊗ σ + |1〉〈1| ⊗

(σ′ − σ

). (4.6.155)

By exploiting the direct-sum property of geometric Rényi relative en-tropy (Proposition 4.42) and the data-processing inequality (Theorem 4.43),we find that

Dα(ρ‖σ) = Dα(ρ‖σ) ≥ Dα(ρ‖σ′), (4.6.156)

where the inequality follows from data processing with respect to partialtrace over the classical register.

The data-processing inequality for the geometric Rényi relative entropycan be written using the geometric Rényi relative quasi-entropy Qα(ρ‖σ) as

1α− 1

log2 Qα(ρ‖σ) ≥1

α− 1log2 Qα(N(ρ)‖N(σ)). (4.6.157)

Since α − 1 is negative for α ∈ (0, 1), we can use the monotonicity of thefunction log2 to obtain

Qα(ρ‖σ) ≥ Qα(N(ρ)‖N(σ)), for α ∈ (1, 2], (4.6.158)

Qα(ρ‖σ) ≤ Qα(N(ρ)‖N(σ)), for α ∈ (0, 1). (4.6.159)

We can use this to establish some convexity/concavity statements for the ge-ometric Rényi relative entropy.

322

Page 334: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.45 Joint Convexity & Concavity of the GeometricRényi Relative Quasi-Entropy

Let p : X → [0, 1] be a probability distribution over a finite alphabet Xwith associated |X|-dimensional system X, let

ρx

A

x∈X be a set of stateson system A, and let

σx

A

x∈X be a set of positive semi-definite operatorson A. Then, for α ∈ (1, 2],

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥∑x∈X

p(x)σxA

)≤ ∑

x∈Xp(x)Qα(ρ

xA‖σx

A), (4.6.160)

and for α ∈ (0, 1),

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥∑x∈X

p(x)σxA

)≥ ∑

x∈Xp(x)Qα(ρ

xA‖σx

A). (4.6.161)

Consequently, the geometric Rényi relative entropy Dα is jointly convexfor α ∈ (0, 1):

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥∑x∈X

p(x)σxA

)≤ ∑

x∈Xp(x)Dα(ρ

xA‖σx

A). (4.6.162)

PROOF: The first two inequalities follow directly from the direct-sum prop-erty of the geometric Rényi relative entropy (Proposition 4.42), the data-processinginequality (Theorem 4.43), and Proposition 4.15. The last inequality followsfrom the first by applying the logarithm, scaling by 1/ (α− 1), and taking amaximum.

Although the geometric Rényi relative entropy is not jointly convex forα ∈ (1, 2] , it is jointly quasi-convex, in the sense that

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥∑x∈X

p(x)σxA

)≤ max

x∈XDα(ρ

xA‖σx

A), (4.6.163)

for every finite alphabet X, probability distribution p : X→ [0, 1], set

ρxA

x∈Xof states, and set

σx

A

x∈X of positive semi-definite operators. Indeed, from

323

Page 335: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

(4.6.160), we immediately obtain

(∑

x∈Xp(x)ρx

A

∥∥∥∥∥∑x∈X

p(x)σxA

)≤ max

x∈XQα(ρ

xA‖σx

A). (4.6.164)

Taking the logarithm and multiplying by 1α−1 on both sides of this inequality

leads to (4.6.163).

The geometric Rényi relative entropy has another interpretation, which isworthwhile to mention.

Proposition 4.46 Geometric Rényi Relative Entropy from ClassicalPreparations

Let ρ be a state and σ a positive semi-definite operator satisfyingsupp(ρ) ⊆ supp(σ). For all α ∈ (0, 1) ∪ (1, 2], the geometric Rényi rel-ative entropy is equal to the smallest value that the classical Rényi rel-ative entropy can take by minimizing over classical–quantum channelsthat realize the state ρ and the positive semi-definite operator σ. That is,the following equality holds

Dα(ρ‖σ) = infp,q,P

Dα(p‖q) : P(ω(p)) = ρ, P(ω(q)) = σ , (4.6.165)

where the classical Rényi relative entropy is defined in (4.4.19), the chan-nel P is a classical–quantum channel, p : X → [0, 1] is a probability dis-tribution over a finite alphabet X, q : X → [0, ∞) is a positive functionon X, ω(p) := ∑x∈X p(x)|x〉〈x|, and ω(q) := ∑x∈X q(x)|x〉〈x|.

PROOF: First, suppose that there exists a quantum channel P such that

P(ω(p)) = ρ, P(ω(q)) = σ. (4.6.166)

Then consider the following chain of inequalities:

Dα(p‖q) = Dα(ω(p)‖ω(q)) (4.6.167)

≥ Dα(P(ω(p))‖P(ω(q))) (4.6.168)

= Dα(ρ‖σ). (4.6.169)

The first equality follows because the geometric Rényi relative entropy re-duces to the classical Rényi relative entropy for commuting operators. The

324

Page 336: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

inequality is a consequence of the data-processing inequality for the geomet-ric Rényi relative entropy (Theorem 4.43). The final equality follows from theconstraint in (4.6.166). Since the inequality holds for arbitrary p, q, and P

satisfying (4.6.166), we conclude that

infp,q,P

Dα(p‖q) : P(p) = ρ,P(q) = σ ≥ Dα(ρ‖σ). (4.6.170)

The equality in (4.6.165) then follows by demonstrating a specific distri-bution p, positive function q, and preparation channel P that saturate the in-equality in (4.6.170). The optimal choices of p, q, and P are given by

p(x) := λxq(x), (4.6.171)q(x) := Tr[Πxσ], (4.6.172)

P(·) := ∑x〈x|(·)|x〉σ

12 Πxσ

12

q(x), (4.6.173)

where the spectral decomposition of the positive semi-definite operator σ−12 ρσ−

12

is given byσ−

12 ρσ−

12 = ∑

xλxΠx. (4.6.174)

The choice of p(x) above is a probability distribution because

∑x

p(x) = ∑x

λxq(x) = ∑x

λx Tr[Πxσ] = Tr[σ−12 ρσ−

12 σ] = Tr[Πσρ] = 1.

(4.6.175)

The preparation channel P is a classical–quantum channel that measures the

input in the basis |x〉x and prepares the state σ12 Πxσ

12

q(x) if the measurementoutcome is x. We find that

P(ω(p)) = ∑x

p(x)q(x)

σ12 Πxσ

12 = ∑

x

λxq(x)q(x)

σ12 Πxσ

12 = σ

12

(∑x

λxΠx

12

= σ12

(σ−

12 ρσ−

12

12 = ΠσρΠσ = ρ, (4.6.176)

and

P(ω(q)) = ∑x

q(x)q(x)

σ12 Πxσ

12 = σ

12

(∑x

Πx

12 = σ. (4.6.177)

325

Page 337: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Finally, consider the classical Rényi relative quasi-entropy:

∑x

p(x)αq(x)1−α = ∑x(λxq(x))α q(x)1−α = ∑

xλα

xq(x) = ∑x

λαx Tr[Πxσ]

= Tr

(∑x

λαxΠx

)]= Tr

[σ(

σ−12 ρσ−

12

)α]= Qα(ρ‖σ),

(4.6.178)

where the second-to-last equality follows from the spectral decompositionin (4.6.174) and the form of the geometric Rényi relative quasi-entropy fromProposition 4.38. As a consequence of the equality

∑x

p(x)αq(x)1−α = Qα(ρ‖σ), (4.6.179)

and the fact that these choices of p, q, and P satisfy the constraints P(p) = ρand P(q) = σ, we conclude that

Dα(p‖q) = Dα(ρ‖σ). (4.6.180)

Combining this equality with (4.6.170), we conclude the equality in (4.6.165).

The following proposition establishes the ordering between the sandwiched,Petz–, and geometric Rényi relative entropies for the interval α ∈ (0, 1) ∪(1, 2]. It follows by applying similar reasoning as in the proof of Proposi-tion 4.46.

Proposition 4.47

Let ρ be a state and σ a positive semi-definite operator. For α ∈ (0, 1) ∪(1, 2], the following inequalities hold

Dα(ρ‖σ) ≤ Dα(ρ‖σ) ≤ Dα(ρ‖σ), (4.6.181)

for the sandwiched (Dα), Petz (Dα), and geometric (Dα) Rényi relativeentropies.

PROOF: The first inequality was stated as the last property of Proposition 4.29.So we establish the proof of the second inequality here. Suppose that P is aclassical–quantum channel, p : X → [0, 1] is a probability distribution over afinite alphabet X, and q : X→ (0, ∞) is a positive function on X satisfying

P(ω(p)) = ρ, P(ω(q)) = σ, (4.6.182)

326

Page 338: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

where

ω(p) := ∑x∈X

p(x)|x〉〈x|, ω(q) := ∑x∈X

q(x)|x〉〈x|. (4.6.183)

Then consider the following chain of inequalities:

Dα(p‖q) = Dα(ω(p)‖ω(q)) (4.6.184)≥ Dα(P(ω(p))‖P(ω(q))) (4.6.185)= Dα(ρ‖σ). (4.6.186)

The first equality follows because the Petz–Rényi relative entropy reduces tothe classical Rényi relative entropy for commuting operators. The inequalityfollows from the data-processing inequality for the Petz–Rényi relative en-tropy for α ∈ (0, 1) ∪ (1, 2] (Theorem 4.22). The final equality follows fromthe constraint in (4.6.182). Since the inequality above holds for all p, q, and P

satisfying (4.6.182), we conclude that

infp,q,P

Dα(p‖q) : P(p) = ρ,P(q) = σ ≥ Dα(ρ‖σ). (4.6.187)

Now applying Proposition 4.46, we conclude the second inequality in (4.6.181).

4.6.1 Proof of Proposition 4.39

First consider that(1− δ) ρ′δ ≤ ρδ ≤ ρ′δ, (4.6.188)

whereρ′δ := ρ + δπ. (4.6.189)

By operator monotonicity of xα for α ∈ (0, 1), we conclude that

(1− δ)α Tr[

σε

(σ− 1

2ε ρ′δσ

− 12

ε

)α]≤ Tr

[σε

(σ− 1

2ε ρδσ

− 12

ε

)α](4.6.190)

≤ Tr[

σε

(σ− 1

2ε ρ′δσ

− 12

ε

)α]. (4.6.191)

The prefactors in these bounds to the left of the trace expressions are uniformand independent of ε, and so it follows that

limε→0+

limδ→0+

Tr[

σε

(σ− 1

2ε ρδσ

− 12

ε

)α]= lim

ε→0+lim

δ→0+Tr[

σε

(σ− 1

2ε ρ′δσ

− 12

ε

)α],

(4.6.192)

327

Page 339: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

limδ→0+

limε→0+

Tr[

σε

(σ− 1

2ε ρδσ

− 12

ε

)α]= lim

δ→0+lim

ε→0+Tr[

σε

(σ− 1

2ε ρ′δσ

− 12

ε

)α].

(4.6.193)

Again from the operator monotonicity of xα for α ∈ (0, 1), we conclude forfixed ε > 0 that

δ1 ≤ δ2 ⇒ Tr[

σε

(σ− 1

2ε ρ′δ1

σ− 1

)α]≤ Tr

[σε

(σ− 1

2ε ρ′δ2

σ− 1

)α],

(4.6.194)where δ1 > 0. By exploiting the identity

Tr[

σε

(σ− 1

2ε ρ′δσ

− 12

ε

)α]= Tr

[ρ′δ

((ρ′δ)− 1

2 σε

(ρ′δ)− 1

2

)1−α]

(4.6.195)

from Proposition 4.37 and operator monotonicity of x1−α for α ∈ (0, 1), weconclude for fixed δ > 0 that

ε1 ≤ ε2 ⇒ Tr[

σε1

(σ− 1

2ε1 ρδσ

− 12

ε1

)α]≤ Tr

[σε2

(σ− 1

2ε2 ρ′δσ

− 12

ε2

)α],

(4.6.196)where ε1 > 0. Thus, we find that

limε→0+

limδ→0+

Tr[

σε

(σ− 1

2ε ρ′δσ

− 12

ε

)α]= inf

ε>0infδ>0

Tr[

σε

(σ− 1

2ε ρ′δσ

− 12

ε

)α], (4.6.197)

limδ→0+

limε→0+

Tr[

σε

(σ− 1

2ε ρ′δσ

− 12

ε

)α]= inf

δ>0infε>0

Tr[

σε

(σ− 1

2ε ρ′δσ

− 12

ε

)α]. (4.6.198)

Since infima can be exchanged, we conclude the statement of the proposition.

4.6.2 Proof of Proposition 4.38

First suppose that α ∈ (1, ∞) and supp(ρ) 6⊆ supp(σ). Then from Propo-sitions 4.27 and 4.40 and the fact that the sandwiched Rényi relative quasi-entropy Qα(ρ‖σ) = +∞ in this case, it follows that Qα(ρ‖σ) = +∞, thusestablishing the third expression in (4.6.16).

Now suppose that α ∈ (0, 1) ∪ (1, ∞) and supp(ρ) ⊆ supp(σ). Let usemploy the decomposition of the Hilbert space H as H = supp(σ)⊕ ker(σ).Then we can write ρ as

ρ =

(ρ0,0 ρ0,1ρ†

0,1 ρ1,1

), σ =

(σ 00 0

). (4.6.199)

328

Page 340: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Writing 1 = Πσ + Π⊥σ , where Πσ is the projection onto the support of σ andΠ⊥σ is the projection onto the orthogonal complement of supp(σ), we find that

σε =

(σ + εΠσ 0

0 εΠ⊥σ

), (4.6.200)

which implies that

σ− 1

2ε =

((σ + εΠσ)

− 12 0

0 ε−12 Π⊥σ

). (4.6.201)

The condition supp(ρ) ⊆ supp(σ) implies that ρ0,1 = 0 and ρ1,1 = 0. Then

σ− 1

2ε ρσ

− 12

ε =

((σ + εΠσ)

− 12 ρ0,0 (σ + εΠσ)

− 12 0

0 0

), (4.6.202)

so that

Tr[

σε

(σ− 1

2ε ρσ

− 12

ε

)α]

= Tr

[(σ + εΠσ 0

0 εΠ⊥σ

)([(σ + εΠσ)

− 12 ρ0,0 (σ + εΠσ)

− 12]α

00 0

)](4.6.203)

= Tr[(σ + εΠσ)

[(σ + εΠσ)

− 12 ρ0,0 (σ + εΠσ)

− 12]α]

. (4.6.204)

Taking the limit ε→ 0+ then leads to

limε→0+

Tr[

σε

(σ− 1

2ε ρσ

− 12

ε

)α]= Tr

[σ(

σ−12 ρ0,0σ−

12

)α](4.6.205)

= Tr[σ(

σ−12 ρσ−

12

)α], (4.6.206)

thus establishing the first expression in (4.6.16).

We now establish (4.6.19). For α ∈ (1, ∞) and supp(ρ) ⊆ supp(σ), thesame analysis implies that

Tr[

σε

(σ− 1

2ε ρσ

− 12

ε

)α]= Tr

[σε

(σ− 1

2ε ρ0,0σ

− 12

ε

)α], (4.6.207)

whereσε := σ + εΠσ. (4.6.208)

329

Page 341: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Since(

σ− 1

2ε ρ0,0σ

− 12

ε

= σ− 1

2ε ρ0,0σ

− 12

ε

(σ− 1

2ε ρ0,0σ

− 12

ε

)α−1(4.6.209)

for α > 1, we have that

Tr

[σεσ− 1

2ε ρ0,0σ

− 12

ε

(σ− 1

2ε ρ0,0σ

− 12

ε

)α−1]

= Tr

12ε ρ

120,0ρ

120,0σ

− 12

ε

(σ− 1

2ε ρ

120,0ρ

120,0σ

− 12

ε

)α−1]

(4.6.210)

= Tr

12ε ρ

120,0

120,0σ

− 12

ε σ− 1

2ε ρ

120,0

)α−1ρ

120,0σ

− 12

ε

](4.6.211)

= Tr

120,0σ

− 12

ε σ12ε ρ

120,0

120,0σ−1

ε ρ120,0

)α−1]

(4.6.212)

= Tr

[ρ0,0

120,0σ−1

ε ρ120,0

)α−1]

, (4.6.213)

where we applied Lemma 2.6 with f (x) = xα−1 and L = ρ120,0σ

− 12

ε . Now takingthe limit ε→ 0+, we conclude that

limε→0+

Tr[

σε

(σ− 1

2ε ρσ

− 12

ε

)α]= lim

ε→0+Tr

[ρ0,0

120,0σ−1

ε ρ120,0

)α−1]

(4.6.214)

= Tr

[ρ0,0

120,0σ−1ρ

120,0

)α−1]

(4.6.215)

= Tr[

ρ(

ρ12 σ−1ρ

12

)α−1]

, (4.6.216)

for the case α ∈ (1, ∞) and supp(ρ) ⊆ supp(σ), thus establishing (4.6.19).

For the case that α ∈ (0, 1) and supp(σ) ⊆ supp(ρ), we can employthe limit exchange from Lemma 4.39 and a similar argument as in (4.6.199)–(4.6.206), but with respect to the decomposition H = supp(ρ) ⊕ ker(ρ), toconclude that

Qα(ρ‖σ) = Tr[

ρ(

ρ−12 σρ−

12

)1−α]

, (4.6.217)

330

Page 342: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

thus establishing the second expression in (4.6.16). This case amounts to theexchange ρ↔ σ and α↔ 1− α.

We finally consider the case α ∈ (0, 1) and supp(ρ) 6⊆ supp(σ), which isthe most involved case. Consider that

σε := σ + ε1 =

(σε 00 εΠ⊥σ

), (4.6.218)

where σε := σ + εΠσ. Let us define

ρδ := (1− δ) ρ + δπ, (4.6.219)

with δ ∈ (0, 1) and π the maximally mixed state. By invoking Lemma 4.39,we conclude that the following exchange of limits is possible for α ∈ (0, 1):

limε→0+

Dα(ρ‖σε) = limε→0+

limδ→0+

Dα(ρδ‖σε) = limδ→0+

limε→0+

Dα(ρδ‖σε). (4.6.220)

Now define

ρδ0,0 := ΠσρδΠσ, ρδ

0,1 := ΠσρδΠ⊥σ , ρδ1,1 := Π⊥σ ρδΠ⊥σ , (4.6.221)

so that

ρδ =

(ρδ

0,0 ρδ0,1

(ρδ0,1)

† ρδ1,1

). (4.6.222)

Then

Dα(ρδ‖σε) =1

α− 1log2 Tr

[σε

(σ− 1

2ε ρδσ

− 12

ε

)α]. (4.6.223)

Consider that

σ− 1

2ε ρδσ

− 12

ε =

(σε 00 εΠ⊥σ

)− 12(

ρδ0,0 ρδ

0,1(ρδ

0,1)† ρδ

1,1

)(σε 00 εΠ⊥σ

)− 12

(4.6.224)

=

(σ− 1

2ε 00 ε−

12 Π⊥σ

)(ρδ

0,0 ρδ0,1

(ρδ0,1)

† ρδ1,1

)(σ− 1

2ε 00 ε−

12 Π⊥σ

)(4.6.225)

=

σ

− 12

ε ρδ0,0σ

− 12

ε ε−12 σ− 1

2ε ρδ

0,1Π⊥σε−

12 Π⊥σ (ρδ

0,1)†σ− 1

2ε ε−1Π⊥σ ρδ

1,1Π⊥σ

(4.6.226)

=

σ

− 12

ε ρδ0,0σ

− 12

ε ε−12 σ− 1

2ε ρδ

0,1

ε−12 (ρδ

0,1)†σ− 1

2ε ε−1ρδ

1,1

. (4.6.227)

331

Page 343: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

So then

Tr[

σε

(σ− 1

2ε ρδσ

− 12

ε

)α]

= Tr

(

σε 00 εΠ⊥σ

) σ

− 12

ε ρδ0,0σ

− 12

ε ε−12 σ− 1

2ε ρδ

0,1

ε−12 (ρδ

0,1)†σ− 1

2ε ε−1ρδ

1,1

α (4.6.228)

= Tr

(

σε 00 εΠ⊥σ

)ε−1

εσ

− 12

ε ρδ0,0σ

− 12

ε ε12 σ− 1

2ε ρδ

0,1

ε12 (ρδ

0,1)†σ− 1

2ε ρδ

1,1

α (4.6.229)

= Tr

(

ε−ασε 00 ε1−αΠ⊥σ

)εσ

− 12

ε ρδ0,0σ

− 12

ε ε12 σ− 1

2ε ρδ

0,1

ε12 (ρδ

0,1)†σ− 1

2ε ρδ

1,1

α (4.6.230)

Let us define

K(ε) :=

εσ

− 12

ε ρδ0,0σ

− 12

ε ε12 σ− 1

2ε ρδ

0,1

ε12 (ρδ

0,1)†σ− 1

2ε ρδ

1,1

, (4.6.231)

so that we can write

Tr[

σε

(σ− 1

2ε ρδσ

− 12

ε

)α]= Tr

[(ε−ασε 0

0 ε1−αΠ⊥σ

)(K(ε))α

]. (4.6.232)

Now let us invoke Lemma 4.48 with the substitutions

A↔ ρδ1,1, (4.6.233)

B↔ (ρδ0,1)

†σ− 1

2ε , (4.6.234)

C ↔ σ− 1

2ε ρδ

0,0σ− 1

2ε , (4.6.235)

ε↔ ε12 . (4.6.236)

Defining

L(ε) :=(

εS(ρδ, σε) 00 ρδ

1,1 + εR

), (4.6.237)

S(ρδ, σε) := σ− 1

(ρδ

0,0 − ρδ0,1(ρ

δ1,1)−1(ρδ

0,1)†)

σ− 1

2ε , (4.6.238)

R := Re[(ρδ1,1)−1(ρδ

0,1)†(σε)

−1(ρδ0,1)], (4.6.239)

332

Page 344: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

we conclude from Lemma 4.48 that∥∥∥K(ε)− e−i

√εGL(ε)ei

√εG∥∥∥

∞≤ o(ε), (4.6.240)

where G in Lemma 4.48 is defined from A and B above. The inequality in(4.6.240) in turn implies the following operator inequalities:

e−i√

εGL(ε)ei√

εG − o(ε)1 ≤ K(ε) ≤ e−i√

εGL(ε)ei√

εG + o(ε)1. (4.6.241)

Observe that

e−i√

εGL(ε)ei√

εG + o(ε)1 = e−i√

εG [L(ε) + o(ε)1] ei√

εG. (4.6.242)

Now invoking these and the operator monotonicity of the function xα forα ∈ (0, 1), we find that

Tr[

σε

(σ− 1

2ε ρδσ

− 12

ε

)α](4.6.243)

= Tr[(

ε−ασε 00 ε1−αΠ⊥σ

)(K(ε))α

](4.6.244)

≤ Tr[(

ε−ασε 00 ε1−αΠ⊥σ

)(e−i√

εG [L(ε) + o(ε)1] ei√

εG)α]

(4.6.245)

= Tr[(

ε−ασε 00 ε1−αΠ⊥σ

)e−i√

εG (L(ε) + o(ε)1)α ei√

εG]

. (4.6.246)

Consider that

(L(ε) + o(ε)1)α

=

(εS(ρδ, σε) + o(ε)1 0

0 ρδ1,1 + εR + o(ε)1

(4.6.247)

=

((εS(ρδ, σε) + o(ε)1

)α 0

0(

ρδ1,1 + εR + o(ε)1

)(4.6.248)

=

(εα(S(ρδ, σε) + o(1)1

)α 0

0(

ρδ1,1 + εR + o(ε)1

). (4.6.249)

Now expanding ei√

εG to first order in order to evaluate (4.6.246) (higher orderterms will end up being irrelevant), we find that

Tr[(

ε−ασε 00 ε1−αΠ⊥σ

)e−i√

εG (L(ε) + o(ε)1)α ei√

εG]

333

Page 345: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= Tr[(

ε−ασε 00 ε1−αΠ⊥σ

)(L(ε) + o(ε)1)α

]

+ Tr[(

ε−ασε 00 ε1−αΠ⊥σ

) (−i√

εG)(L(ε) + o(ε)1)α

]

+ Tr[(

ε−ασε 00 ε1−αΠ⊥σ

)(L(ε) + o(ε)1)α (i

√εG)]

+ o(1) (4.6.250)

= Tr

[(σε

(S(ρδ, σε) + o(1)1

)α 0

0 ε1−αΠ⊥σ(

ρδ1,1 + εR + o(ε)1

)]

− i√

ε Tr

[((S(ρδ, σε) + o(1)1

)ασε 0

0 ε1−α(

ρδ1,1 + εR + o(ε)1

)αΠ⊥σ

)G

]

+ i√

ε Tr

[(σε

(S(ρδ, σε) + o(1)1

)α 0

0 ε1−αΠ⊥σ(

ρδ1,1 + εR + o(ε)1

)G

]+ o(1)

(4.6.251)

= Tr

[(σε

(S(ρδ, σε) + o(1)1

)α 0

0 ε1−αΠ⊥σ(

ρδ1,1 + εR + o(ε)1

)]+ o(1)

(4.6.252)

= Tr[σε

(S(ρδ, σε) + o(1)1

)α]+ ε1−α Tr[Π⊥σ

(ρδ

1,1 + εR + o(ε)1)α

] + o(1).

(4.6.253)

By observing the last line, we see that higher order terms for ei√

εG includeprefactors of ε (or higher powers), which vanish in the ε → 0+ limit. Nowtaking the limit ε→ 0+, we find that

limε→0+

Tr[(

ε−ασε 00 ε1−αΠ⊥σ

)e−i√

εG (L(ε) + o(ε)1)α ei√

εG]

= Tr[σ(

σ−12

(ρδ

0,0 − ρδ0,1(ρ

δ1,1)−1(ρδ

0,1)†)

σ−12

)α], (4.6.254)

where the inverses are taken on the support of σ. By proceeding in a simi-lar way, but using the lower bound in (4.6.241), we find the following lowerbound on (4.6.243):

Tr[(

ε−ασε 00 ε1−αΠ⊥σ

)e−i√

εG (L(ε)− o(ε)1)α ei√

εG]

. (4.6.255)

Then by the same argument above, the lower bound on (4.6.243) after taking

334

Page 346: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

the limit ε→ 0+ is the same as in (4.6.254). So we conclude that

limε→0+

Tr[

σε

(σ− 1

2ε ρδσ

− 12

ε

)α]= Tr

[σ(

σ−12

(ρδ

0,0 − ρδ0,1(ρ

δ1,1)−1(ρδ

0,1)†)

σ−12

)α].

(4.6.256)Now consider that

limδ→0+

ρδ0,0 − ρδ

0,1(ρδ1,1)−1(ρδ

0,1)† = ρ0,0 − ρ0,1ρ−1

1,1 ρ†0,1, (4.6.257)

where the inverse on the right is taken on the support of ρ1,1. This followsbecause the image of ρ†

0,1 is contained in the support of ρ1,1. Thus, we take thelimit δ→ 0+, and find that

limδ→0+

limε→0+

Tr[

σε

(σ− 1

2ε ρδσ

− 12

ε

)α]= Tr

[σ(

σ−12

(ρ0,0 − ρ0,1ρ−1

1,1 ρ†0,1

)σ−

12

)α],

(4.6.258)where all inverses are taken on the support. This concludes the proof.

Lemma 4.48

Let A be an invertible Hermitian operator, B a linear operator, C a Her-mitian operator, and let ε > 0. Then with

M(ε) :=[

A εBεB† ε2C

], (4.6.259)

D(ε) :=[

A + ε2 Re[A−1BB†] 00 ε2 (C− B† A−1B

)]

, (4.6.260)

G :=[

0 −iA−1BiB† A−1 0

], (4.6.261)

the following inequality holds∥∥∥M(ε)− e−iεGD(ε)eiεG

∥∥∥∞≤ o(ε2). (4.6.262)

Proof. Observe that G is Hermitian and consider that

eiεG M(ε)e−iεG =

(I + iεG− ε2

2G2)

M(ε)

(I − iεG− ε2

2G2)+ o(ε2).

Then we find that

335

Page 347: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

(I + iεG− ε2

2G2)

M(ε)

(I − iεG− ε2

2G2)= M(ε) + iε [GM(ε)−M(ε)G]

+ ε2[

GM(ε)G− 12

G2M(ε)− 12

M(ε)G2]+ o(ε2). (4.6.263)

Now observe that

GM(ε) =

[0 −iA−1B

iB† A−1 0

] [A εB

εB† ε2C

](4.6.264)

=

[−iεA−1BB† −iε2A−1BCiB† iεB† A−1B

](4.6.265)

=

[−iεA−1BB† o(ε)iB† iεB† A−1B

], (4.6.266)

M(ε)G = [GM(ε)]† (4.6.267)

=

[iεBB† A−1 −iB

o(ε) −iεB† A−1B

], (4.6.268)

which implies that

iε [GM(ε)−M(ε)G]

= iε([−iεA−1BB† o(ε)

iB† iεB† A−1B

]−[

iεBB† A−1 −iBo(ε) −iεB† A−1B

])(4.6.269)

=

[2ε2 Re[A−1BB†] −εB + o(ε2)−εB† + o(ε2) −2ε2B† A−1B

]. (4.6.270)

Also, observe that

GM(ε)G =

[o(1) o(ε)iB† o(1)

] [0 −iA−1B

iB† A−1 0

](4.6.271)

=

[o(ε) o(1)o(1) B† A−1B

], (4.6.272)

G2M(ε) = G[GM(ε)] (4.6.273)

=

[0 −iA−1B

iB† A−1 0

] [o(1) o(ε)iB† o(1)

](4.6.274)

=

[A−1BB† o(1)

o(1) o(ε)

], (4.6.275)

M(ε)G2 =[

G2M(ε)]†

(4.6.276)

336

Page 348: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

=

[BB† A−1 o(1)

o(1) o(ε)

]. (4.6.277)

So then we find that

ε2[

GM(ε)G− 12

G2M(ε)− 12

M(ε)G2]

= ε2([

o(ε) o(1)o(1) B† A−1B

]− 1

2

[A−1BB† o(1)

o(1) o(ε)

]− 1

2

[BB† A−1 o(1)

o(1) o(ε)

])

(4.6.278)

=

[−ε2 Re[A−1BB†] + o(ε3) o(ε2)o(ε2) ε2B† A−1B + o(ε3)

]. (4.6.279)

So then(

I + iεG− ε2

2G2)

M(ε)

(I − iεG− ε2

2G2)

= M(ε) + iε [GM(ε)−M(ε)G]

+ ε2[

GM(ε)G− 12

G2M(ε)− 12

M(ε)G2]+ o(ε2) (4.6.280)

=

[A εB

εB† ε2C

]+

[2ε2 Re[A−1BB†] −εB + o(ε2)−εB† + o(ε2) −2ε2B† A−1B

]

+

[−ε2 Re[A−1BB†] + o(ε3) o(ε2)o(ε2) ε2B† A−1B + o(ε3)

]+ o(ε2) (4.6.281)

=

[A + ε2 Re[A−1BB†] 0

0 ε2 (C− B† A−1B)]+ o(ε2) (4.6.282)

= D(ε) + o(ε2). (4.6.283)

So we conclude that

eiεG M(ε)e−iεG = D(ε) + o(ε2), (4.6.284)

which in turn implies that

M(ε) = e−iεGD(ε)eiεG + o(ε2), (4.6.285)

from which we conclude the claim in (4.6.262).

337

Page 349: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

4.7 Belavkin–Staszewski Relative Entropy

A different quantum generalization of the classical relative entropy is givenby the Belavkin–Staszewski3 relative entropy:

Definition 4.49 Belavkin–Staszewski Relative Entropy

The Belavkin–Staszewski relative entropy of a quantum state ρ and apositive semi-definite operator σ is defined as

D(ρ‖σ) :=

Tr[ρ log2

12 σ−1ρ

12

)]if supp(ρ) ⊆ supp(σ)

+∞ otherwise, (4.7.1)

where the inverse σ−1 is taken on the support of σ and the logarithm isevaluated on the support of ρ.

This quantum generalization of classical relative entropy is not known topossess an information-theoretic meaning. However, it is quite useful for ob-taining upper bounds on quantum channel capacities and quantum channeldiscrimination rates, as considered in Part III of this book.

An important property of the Belavkin–Staszewski relative entropy is thatit is the limit of the geometric Rényi relative entropy as α→ 1.

Proposition 4.50

Let ρ be a state and σ a positive semi-definite operator. Then, in the limitα → 1, the geometric Rényi relative entropy converges to the Belavkin–Staszewski relative entropy:

limα→1

Dα(ρ‖σ) = D(ρ‖σ). (4.7.2)

PROOF: Suppose at first that supp(ρ) ⊆ supp(σ). Then Dα(ρ‖σ) is finite forall α ∈ (0, 1) ∪ (1, ∞), and we can write the following explicit formula for thegeometric Rényi relative entropy by employing Proposition 4.38:

Dα(ρ‖σ) =1

α− 1log2 Qα(ρ‖σ) (4.7.3)

3The name Staszewski is pronounced Stah·shev·ski, with emphasis on the second syllable.

338

Page 350: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

=1

α− 1log2 Tr

[σ(

σ−12 ρσ−

12

)α]. (4.7.4)

Our assumption implies that Tr[Πσρ] = 1, and we find that

Q1(ρ‖σ) = Tr[σ(

σ−12 ρσ−

12

)](4.7.5)

= Tr[Πσρ] (4.7.6)= 1. (4.7.7)

Since log2 1 = 0, we can write

Dα(ρ‖σ) =log2 Qα(ρ‖σ)− log2 Q1(ρ‖σ)

α− 1, (4.7.8)

so that

limα→1

Dα(ρ‖σ) = limα→1

log2 Qα(ρ‖σ)− log2 Q1(ρ‖σ)α− 1

(4.7.9)

=d

dαlog2 Qα(ρ‖σ)

∣∣∣∣α=1

(4.7.10)

=1

ln(2)

ddα Qα(ρ‖σ)

∣∣∣α=1

Q1(ρ‖σ)(4.7.11)

=1

ln(2)d

dαQα(ρ‖σ)

∣∣∣∣α=1

. (4.7.12)

Then

ddα

Qα(ρ‖σ)∣∣∣∣α=1

=d

dαTr[σ(

σ−12 ρσ−

12

)α]∣∣∣∣α=1

= Tr[

σd

(σ−

12 ρσ−

12

)α]∣∣∣∣

α=1.

For a positive semi-definite operator X with spectral decomposition

X = ∑z

νzΠz, (4.7.13)

it follows that

ddα

∣∣∣∣α=1

=d

dα ∑z

ναz Πz

∣∣∣∣∣α=1

(4.7.14)

339

Page 351: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= ∑z

(d

dανα

z

∣∣∣∣α=1

)Πz (4.7.15)

= ∑z(να

z ln ναz |α=1)Πz (4.7.16)

= ∑z(νz ln νz)Πz (4.7.17)

= X ln∗ X, (4.7.18)

where

ln∗(x) :=

ln(x) x > 00 x = 0 . (4.7.19)

Thus we find that

Tr[

σd

(σ−

12 ρσ−

12

)α]∣∣∣∣

α=1

= Tr[σ(

σ−12 ρσ−

12

)ln∗(

σ−12 ρσ−

12

)](4.7.20)

= Tr[σ

12 ρ

12 ρ

12 σ−

12 ln∗

(σ−

12 ρ

12 ρ

12 σ−

12

)](4.7.21)

= Tr[σ

12 ρ

12 ln∗

12 σ−

12 σ−

12 ρ

12

12 σ−

12

](4.7.22)

= Tr[ρ

12 Πσρ

12 ln∗

12 σ−

12 σ−

12 ρ

12

)](4.7.23)

= Tr[ρ ln(

ρ12 σ−1ρ

12

)]. (4.7.24)

The third equality follows from Lemma 2.6. The final equality follows fromthe assumption supp(ρ) ⊆ supp(σ) and by applying the interpretation of thelogarithm exactly as stated in Definition 4.49. Then we find that

limα→1

Dα(ρ‖σ) = Tr[ρ log2

12 σ−1ρ

12

)], (4.7.25)

for the case in which supp(ρ) ⊆ supp(σ).

Now suppose that α ∈ (1, ∞) and supp(ρ) 6⊆ supp(σ). Then Dα(ρ‖σ) =+∞, so that limα→1+ Dα(ρ‖σ) = +∞, consistent with the definition of theBelavkin–Staszewski relative entropy in this case (see Definition 4.49).

Suppose that α ∈ (0, 1) and supp(ρ) 6⊆ supp(σ). Employing Proposi-tion 4.40, we have that Dα(ρ‖σ) ≥ Dα(ρ‖σ) for all α ∈ (0, 1). Since limα→1− Dα(ρ‖σ) =+∞ in this case, it follows that limα→1− Dα(ρ‖σ) = +∞.

340

Page 352: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Therefore,

limα→1−

Dα(ρ‖σ)

=

Tr[ρ log2

12 σ−1ρ

12

)]if supp(ρ) ⊆ supp(σ)

+∞ otherwise(4.7.26)

= D(ρ‖σ). (4.7.27)

To conclude, we have established that limα→1+ Dα(ρ‖σ) = limα→1− Dα(ρ‖σ) =D(ρ‖σ), which means that

limα→1

Dα(ρ‖σ) = D(ρ‖σ), (4.7.28)

as required.

The following inequality relates the quantum relative entropy to the Belavkin–Staszewski relative entropy:

Proposition 4.51

Let ρ be a state and σ a positive semi-definite operator. Then the quan-tum relative entropy is never larger than the Belavkin–Staszewski rela-tive entropy:

D(ρ‖σ) ≤ D(ρ‖σ). (4.7.29)

PROOF: If supp(ρ) 6⊆ supp(σ), then there is nothing to prove in this casebecause both

D(ρ‖σ) = D(ρ‖σ) = +∞, (4.7.30)

and so the inequality in (4.7.29) holds trivially in this case. So let us supposeinstead that supp(ρ) ⊆ supp(σ). From Propositions 4.40 and 4.38, we con-clude for all α ∈ (0, 1) ∪ (1, ∞) that

Dα(ρ‖σ) ≤ Dα(ρ‖σ). (4.7.31)

From Proposition 4.28, we know that

limα→1

Dα(ρ‖σ) = D(ρ‖σ). (4.7.32)

While from Proposition 4.50, we know that

limα→1

Dα(ρ‖σ) = D(ρ‖σ). (4.7.33)

341

Page 353: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Thus, applying the limit α → 1 to (4.7.31) and the two equalities above, weconclude (4.7.29).

Similar to what was shown in Proposition 4.2, Definition 4.49 is consistentwith the following limit:

Proposition 4.52

For every state ρ and positive semi-definite operator σ, the followinglimit holds

D(ρ‖σ) = limε→0+

limδ→0+

Tr[

ρδ log2

12δ σ−1

ε ρ12δ

)], (4.7.34)

where δ ∈ (0, 1) and

ρδ := (1− δ) ρ + δπ, σε := σ + ε1, (4.7.35)

with π the maximally mixed state.

PROOF: Suppose first that supp(ρ) ⊆ supp(σ). We follow an approach simi-lar to that given in the proof of Proposition 4.38. Let us employ the decompo-sition of the Hilbert space into supp(σ)⊕ ker(σ). Then we can write ρ and σas in (4.6.199), so that

σ−1ε =

((σ + εΠσ)

−1 00 ε−1Π⊥σ

), (4.7.36)

where we have followed the developments in (4.6.199)–(4.6.201). The condi-tion supp(ρ) ⊆ supp(σ) implies that ρ0,1 = 0 and ρ1,1 = 0. It thus followsthat limδ→0+ ρδ = ρ0,0. We then find that

Tr[

ρδ log2

12δ σ−1

ε ρ12δ

)]= Tr

12δ σ

12ε σ− 1

2ε ρ

12δ log2

12δ σ− 1

2ε σ

− 12

ε ρ12δ

)](4.7.37)

= Tr[

ρ12δ σ

12ε log2

(σ− 1

2ε ρ

12δ ρ

12δ σ− 1

)σ− 1

2ε ρ

12δ

](4.7.38)

= Tr[

log2

(σ− 1

2ε ρδσ

− 12

ε

)(σ− 1

2ε ρδσ

− 12

ε

)σε

](4.7.39)

= Tr[

σε

(σ− 1

2ε ρδσ

− 12

ε

)log2

(σ− 1

2ε ρδσ

− 12

ε

)](4.7.40)

= Tr[

σεη

(σ− 1

2ε ρδσ

− 12

ε

)], (4.7.41)

342

Page 354: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

where the second equality follows from applying Lemma 2.6 with f = log2

and L = ρ12δ σ− 1

2ε . The second-to-last equality follows because σ

− 12

ε ρδσ− 1

2ε com-

mutes with log2(σ− 1

2ε ρδσ

− 12

ε ), and by employing cyclicity of trace. In the lastline, we made use of the following function:

η(x) := x log2 x, (4.7.42)

defined for all x ∈ [0, ∞) with η(0) = 0. By appealing to the continuity of thefunction η(x) on x ∈ [0, ∞) and the fact that limδ→0+ ρδ = ρ0,0, we find that

limδ→0+

Tr[

σεη

(σ− 1

2ε ρδσ

− 12

ε

)]= Tr

[σεη

(σ− 1

2ε ρ0,0σ

− 12

ε

)]. (4.7.43)

Now recall the function log2,∗ defined in (4.7.19). Using it, we can write

Tr[

σεη

(σ− 1

2ε ρ0,0σ

− 12

ε

)]

= Tr[

σεσ− 1

2ε ρ0,0σ

− 12

ε log2,∗

(σ− 1

2ε ρ0,0σ

− 12

ε

)](4.7.44)

= Tr[

σ12ε ρ

120,0ρ

120,0σ

− 12

ε log2,∗

(σ− 1

2ε ρ

120,0ρ

120,0σ

− 12

ε

)](4.7.45)

= Tr[

σ12ε ρ

120,0 log2,∗

120,0σ

− 12

ε σ− 1

2ε ρ

120,0

120,0σ

− 12

ε

](4.7.46)

= Tr[

ρ0,0 log2,∗

120,0σ−1

ε ρ120,0

)](4.7.47)

= Tr[

ρ0,0 log2,∗

120,0 (σ + εΠσ)

−1 ρ120,0

)], (4.7.48)

where the last line follows because

ρ120,0 (σ + εΠσ)

−1 ρ120,0

=

120,0 00 0

)((σ + εΠσ)

−1 00 ε−1Π⊥σ

)(ρ

120,0 00 0

)(4.7.49)

=

120,0 (σ + εΠσ)

−1 ρ120,0 0

0 0

). (4.7.50)

Now taking the limit as ε → 0+, and appealing to continuity of log2,∗(x) andx−1 for x > 0, we find that

limε→0+

Tr[

ρ0,0 log2,∗

120,0 (σ + εΠσ)

−1 ρ120,0

)]

343

Page 355: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= Tr[

ρ0,0 log2,∗

120,0σ−1ρ

120,0

)](4.7.51)

= Tr[ρ log2

12 σ−1ρ

12

)](4.7.52)

where the formula in the last line is interpreted exactly as stated in Defini-tion 4.49. Thus, we conclude that

limε→0+

limδ→0+

Tr[

ρδ log2

12δ σ−1

ε ρ12δ

)]= Tr

[ρ log2

12 σ−1ρ

12

)]. (4.7.53)

Now suppose that supp(ρ) 6⊆ supp(σ). Then applying Proposition 4.51,we find that the following inequality holds for all δ ∈ (0, 1) and ε > 0:

D(ρδ‖σε) ≥ D(ρδ‖σε). (4.7.54)

Now taking limits and applying Proposition 4.2, we find that

limε→0+

limδ→0+

D(ρδ‖σε) ≥ limε→0+

limδ→0+

D(ρδ‖σε) (4.7.55)

= limε→0+

D(ρ‖σε) (4.7.56)

= +∞. (4.7.57)

This concludes the proof.

By taking the limit α→ 1 in the statement of the data-processing inequal-ity for Dα, and applying Proposition 4.50, we immediately obtain the data-processing inequality for the Belavkin–Staszewski relative entropy.

Corollary 4.53 Data-Processing Inequality for Belavkin–StaszewskiRelative Entropy

Let ρ be a state, σ a positive semi-definite operator, and N a quantumchannel. Then

D(ρ‖σ) ≥ D(N(ρ)‖N(σ)). (4.7.58)

Some basic properties of the Belavkin–Staszewski relative entropy are asfollows:

344

Page 356: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.54 Properties of Belavkin–Staszewski Relative En-tropy

The Belavkin–Staszewski relative entropy satisfies the following prop-erties for states ρ, ρ1, ρ2 and positive semi-definite operators σ, σ1, σ2.

1. Isometric invariance: For every isometry V,

D(VρV†‖VσV†) = D(ρ‖σ). (4.7.59)

2. (a) If Tr[σ] ≤ 1, then D(ρ‖σ) ≥ 0.

(b) Faithfulness: Suppose that Tr[σ] ≤ Tr[ρ] = 1. Then D(ρ‖σ) = 0if and only if ρ = σ.

(c) If ρ ≤ σ, then D(ρ‖σ) ≤ 0.

(d) If σ ≤ σ′, then D(ρ‖σ) ≥ D(ρ‖σ′).3. Additivity:

D(ρ1 ⊗ ρ2‖σ1 ⊗ σ2) = D(ρ1‖σ1) + D(ρ2‖σ2). (4.7.60)

As a special case, for every β ∈ (0, ∞),

D(ρ‖βσ) = D(ρ‖σ) + log2

(1β

). (4.7.61)

4. Direct-sum property: Let p : X → [0, 1] be a probability distributionover a finite alphabet X with associated |X|-dimensional system X,and let q : X→ [0, ∞) be a positive function on X. Let ρx

A : x ∈ Xbe a set of states on a system A, and let σx

A : x ∈ X be a set ofpositive semi-definite operators on A. Then,

D(ρXA‖σXA) = D(p‖q) + ∑x∈X

p(x)D(ρxA‖σx

A). (4.7.62)

where

ρXA := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA, (4.7.63)

σXA := ∑x∈X

q(x)|x〉〈x|X ⊗ σxA. (4.7.64)

345

Page 357: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

PROOF:

1. Isometric invariance is a direct consequence of Propositions 4.42 and 4.50.

2. All of the properties in the second item follow from data processing (Corol-lary 4.53).

(a) Applying the trace-out channel, we find that

D(ρ‖σ) ≥ D(Tr[ρ]‖Tr[σ]) (4.7.65)= Tr[ρ] log2(Tr[ρ]/ Tr[σ]) (4.7.66)= − log2 Tr[σ] (4.7.67)≥ 0. (4.7.68)

(b) If ρ = σ, then it follows by direct evalution that D(ρ‖σ) = 0. IfD(ρ‖σ) = 0 and Tr[σ] ≤ 1, then D(ρ‖σ) = 0 by Proposition 4.51 andwe conclude that ρ = σ from faithfulness of the quantum relativeentropy (Proposition 4.3).

(c) If ρ ≤ σ, then σ− ρ is positive semi-definite, and the following oper-ator is positive semi-definite:

σ := |0〉〈0| ⊗ ρ + |1〉〈1| ⊗ (σ− ρ) . (4.7.69)

Defining ρ := |0〉〈0| ⊗ ρ, we find from the direct-sum property that

0 = D(ρ‖ρ) = D(ρ‖σ) ≥ D(ρ‖σ), (4.7.70)

where the inequality follows from data processing by tracing out thefirst classical register of ρ and σ.

(d) If σ ≤ σ′, then the operator σ′ − σ is positive semi-definite and so isthe following one:

σ := |0〉〈0| ⊗ σ + |1〉〈1| ⊗(σ′ − σ

). (4.7.71)

Defining ρ := |0〉〈0| ⊗ ρ, we find from the direct-sum property that

D(ρ‖σ) = D(ρ‖σ) ≥ D(ρ‖σ′), (4.7.72)

where the inequality follows from data processing by tracing out thefirst classical register of ρ and σ.

3. Additivity follows by direct evaluation.

4. The direct-sum property follows by direct evaluation.

A statement similar to that made by Proposition 4.46 holds for the Belavkin–Staszewski relative entropy:

346

Page 358: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.55 Belavkin–Staszewski Relative Entropy from Clas-sical Preparations

Let ρ be a state and σ a positive semi-definite operator satisfyingsupp(ρ) ⊆ supp(σ). The Belavkin–Staszewski relative entropy is equalto the smallest value that the classical relative entropy can take by min-imizing over classical–quantum channels that realize the state ρ and thepositive semi-definite operator σ. That is, the following equality holds

D(ρ‖σ) = infp,q,P

D(p‖q) : P(ω(p)) = ρ, P(ω(q)) = σ , (4.7.73)

where the classical relative entropy is defined in (4.2.2), the channel P isa classical–quantum channel, p : X → [0, 1] is a probability distributionover a finite alphabet X, q : X → [0, ∞) is a positive function on X,ω(p) := ∑x∈X p(x)|x〉〈x|, and ω(q) := ∑x∈X q(x)|x〉〈x|.

PROOF: The proof is very similar to the proof of Proposition 4.46, and sowe use the same notation to provide a brief proof. By following the samereasoning that leads to (4.6.170), it follows that

infp,q,P

D(p‖q) : P(p) = ρ,P(q) = σ ≥ D(ρ‖σ). (4.7.74)

The optimal choices of p, q, and P saturating the inequality in (4.7.74) areagain given by (4.6.171)–(4.6.173). Consider for those choices that

∑x

p(x) log2

(p(x)q(x)

)= ∑

xp(x) log2(λx) (4.7.75)

= ∑x

λxq(x) log2(λx) (4.7.76)

= ∑x

λx Tr[Πxσ] log2(λx) (4.7.77)

= Tr

(∑x

λx log2(λx)Πx

)](4.7.78)

= Tr[σ(

σ−12 ρσ−

12

)log2

(σ−

12 ρσ−

12

)](4.7.79)

= Tr[ρ log2

12 σ−1ρ

12

)], (4.7.80)

where the last equality follows from reasoning similar to that used to justify(4.7.20)–(4.7.24). Then by following the reasoning at the end of the proof ofProposition 4.46, we conclude (4.7.73).

347

Page 359: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

4.8 Max-Relative Entropy

An important generalized divergence that appears in the context of placingupper bounds on communication rates of feedback-assisted protocols is themax-relative entropy.

Definition 4.56 Max-Relative Entropy

The max-relative entropy Dmax(ρ‖σ) of a state ρ and a positive semi-definite operator σ is defined as

Dmax(ρ‖σ) := log2

∥∥∥σ−12 ρσ−

12

∥∥∥∞

, (4.8.1)

if supp(ρ) ⊆ supp(σ); otherwise, Dmax(ρ‖σ) = +∞.

The max-relative entropy has the following equivalent representations:

Dmax(ρ‖σ) = log2

∥∥∥ρ12 σ−1ρ

12

∥∥∥∞

(4.8.2)

= 2 log2

∥∥∥ρ12 σ−

12

∥∥∥∞

(4.8.3)

= log2 infλ : ρ ≤ λσ. (4.8.4)= log2 sup

M≥0Tr[Mρ] : Tr[Mσ] ≤ 1. (4.8.5)

The second-to-last equality demonstrates that Dmax(ρ‖σ) can be calculatedusing a semi-definite program (SDP) (see Section 2.4). Indeed, the optimiza-tion in (4.8.4) can be cast in the standard form in (2.4.2), i.e.,

infλ : ρ ≤ λσ =

infimum Tr[BY]subject to Φ†(Y) ≥ A,

Y ≥ 0,(4.8.6)

with Y ≡ λ, A ≡ ρ, B ≡ 1, and Φ†(Y) = Yσ. (Note that taking the trace onboth sides of the constraint ρ ≤ λσ in (4.8.4) results in λ ≥ 1/Tr[σ], so thatλ ≥ 0.) The final equality in (4.8.5) results from calculating the SDP dual tothat in (4.8.4).

348

Page 360: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.57 Data-Processing Inequality for Max-Relative En-tropy

Let ρ be a state, σ a positive semi-definite operator, and N a quantumchannel. Then,

Dmax(ρ‖σ) ≥ Dmax(N(ρ)‖N(σ)). (4.8.7)

REMARK: This result holds more generally for positive maps that are not necessarilytrace preserving.

PROOF: To see this, let λ be such that the operator inequality ρ ≤ 2λσ holds.Then the operator inequality N(ρ) ≤ 2λN(σ) holds because the quantumchannel N is a positive map. Then

Dmax(N(ρ)‖N(σ)) = inf µ : N(ρ) ≤ 2µN(σ) ≤ λ. (4.8.8)

Since the inequality holds for all choices of λ such that ρ ≤ 2λσ holds, weconclude that

Dmax(N(ρ)‖N(σ)) ≤ infλ : ρ ≤ 2λσ = Dmax(ρ‖σ). (4.8.9)

This concludes the proof.

It turns out that the max-relative entropy is a limiting case of the sand-wiched and geometric Rényi relative entropies, as we now show.

Proposition 4.58

The sandwiched and geometric Rényi relative entropies converge to themax-relative entropy in the limit α→ ∞:

limα→∞

Dα(ρ‖σ) = limα→∞

Dα(ρ‖σ) = Dmax(ρ‖σ). (4.8.10)

PROOF: We begin with the case in which supp(ρ) 6⊆ supp(σ). We triviallyhave Dα(ρ‖σ) = Dα(ρ‖σ) = Dmax(ρ‖σ) = +∞ for all α > 1, which impliesthe equality in (4.8.10) in this case.

In the case that supp(ρ) ⊆ supp(σ), we can consider, without loss of gen-erality, that supp(σ) = H, so that λmin(σ) > 0.

We begin with the sandwiched Rényi relative entropy. Consider that

Dα(ρ‖σ) =1

α− 1log2 Tr[(ρ

12 σ

1−αα ρ

12 )α] (4.8.11)

349

Page 361: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

=1

α− 1log2 Tr[(ρ

12 σ−

12 σ

1α σ−

12 ρ

12 )α]. (4.8.12)

By the operator inequalities [λmin(σ)]1α 1 ≤ σ

1α ≤ [λmax(σ)]

1α 1 and the mono-

tonicity Tr[Xα] ≤ Tr[Yα] for positive semi-definite X and Y satisfying X ≤ Y(see Lemma 2.13), we find for α > 1 that

λmin(σ)Tr[(ρ12 σ−1ρ

12 )α] ≤ Tr[(ρ

12 σ−

12 σ

1α σ−

12 ρ

12 )α] (4.8.13)

≤ λmax(σ)Tr[(ρ1/2σ−1ρ12 )α]. (4.8.14)

Using the fact that

Tr[(ρ12 σ−1ρ

12 )α] =

∥∥∥ρ12 σ−1ρ

12

∥∥∥α

α, (4.8.15)

and taking a logarithm followed by multiplication of 1α−1 , we find that

1α− 1

log2 λmin(σ) +α

α− 1log2

∥∥∥ρ12 σ−1ρ

12

∥∥∥α≤ Dα(ρ‖σ)

≤ 1α− 1

log2 λmax(σ) +α

α− 1log2

∥∥∥ρ12 σ−1ρ

12

∥∥∥α

. (4.8.16)

Now taking the limit α→ ∞ and applying the fact that limα→∞ ‖X‖α = ‖X‖∞(Proposition 2.8), we conclude the equality limα→∞ Dα(ρ‖σ) = Dmax(ρ‖σ).

We now consider the geometric Rényi relative entropy. Since we have that

λmin(σ)1 ≤ σ ≤ λmax(σ)1, (4.8.17)

it follows that

λmin(σ)Tr[(

σ−12 ρσ−

12

)α]≤ Tr

[σ(

σ−12 ρσ−

12

)α](4.8.18)

≤ λmax(σ)Tr[(

σ−12 ρσ−

12

)α]. (4.8.19)

Now taking a logarithm, dividing by α− 1, and applying definitions, we findthat the following inequalities hold for α > 1:

1α− 1

log2 λmin(σ) +1

α− 1log2 Tr

[(σ−

12 ρσ−

12

)α]

≤ Dα(ρ‖σ) (4.8.20)

≤ 1α− 1

log2 λmax(σ) +1

α− 1log2 Tr

[(σ−

12 ρσ−

12

)α]. (4.8.21)

350

Page 362: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Rewriting

1α− 1

log2 Tr[(

σ−12 ρσ−

12

)α]=

α

α− 1log2

(Tr[(

σ−12 ρσ−

12

)α]) 1α

(4.8.22)

α− 1log2

∥∥∥σ−12 ρσ−

12

∥∥∥α

. (4.8.23)

Then by applying limα→∞ ‖X‖α = ‖X‖∞, it follows that

limα→∞

1α− 1

log2 Tr[(

σ−12 ρσ−

12

)α]= Dmax(ρ‖σ). (4.8.24)

Combining this limit with the inequalities in (4.8.20) and (4.8.21), we arrive atthe equality limα→∞ Dα(ρ‖σ) = Dmax(ρ‖σ).

As a consequence of Proposition 4.58, it is customary to use the notations

D∞(ρ‖σ) ≡ Dmax(ρ‖σ) ≡ D∞(ρ‖σ) (4.8.25)

to denote the max-relative entropy. It also follows that the max-relative en-tropy satisfies the properties of isometric invariance and additivity, as statedin Proposition 4.29. Proposition 4.29 also tells us that the sandwiched Rényirelative entropy Dα is monotonically increasing in α, which means that themax-relative entropy has the largest value among all sandwiched Rényi rel-ative entropies. Due to Proposition 4.42, a similar conclusion holds for thegeometric Rényi relative entropies. The max-relative entropy also satisfies allof the properties stated in Proposition 4.33.

The conditional entropy arising from the max-relative entropy (accordingto the general definition in (4.3.12)) is known as the conditional min-entropy:

Hmin(A|B)ρ := H∞(A|B)ρ = − infσB

Dmax(ρAB‖1A ⊗ σB) (4.8.26)

for all bipartite states ρAB, where the optimization is over states σB. Since themax-relative entropy has the largest value among all the sandwiched Rényirelative entropies, the quantity in (4.8.26) has the smallest value among allconditional sandwiched Rényi entropies, which is why it is called the condi-tional min-entropy. On the other hand, the quantity

Hmax(A|B)ρ := H12(A|B)ρ = − inf

σBD1

2(ρAB‖1A ⊗ σB) (4.8.27)

is known as the conditional max-entropy of the state ρAB, where the optimiza-tion is over states σB. This name comes from the fact that α = 1

2 is the small-est value of α for which the sandwiched Rényi relative entropy is known to

351

Page 363: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

satisfy the data-processing inequality (recall Theorem 4.31). Since the sand-wiched Rényi relative entropy Dα is monotonically increasing in α, the quan-tity in (4.8.27) is known as the conditional max-entropy because it has thelargest value among all conditional sandwiched Rényi entropies for whichthe data-processing inequality is known to hold.

REMARK: Let ρXB be a classical–quantum state of the form

ρXB = ∑x∈X

p(x)|x〉〈x|X ⊗ ρxB, (4.8.28)

where X is a finite alphabet p : X→ [0, 1] is a probability distribution, and ρxBx∈X is a set

of states. Using the duality of semi-definite programs (see Section 2.4), as done in (3.1.181),it follows that

Hmin(X|B)ρ = − log2 p∗succ((p(x), ρxB)x), (4.8.29)

where p∗succ((p(x), ρxB)), defined in (3.1.173), is the optimal success probability for mul-

tiple state discrimination. The conditional min-entropy of a classical–quantum state thushas an operational interpretation in terms of the optimal success probability for multiplestate discrimination.

4.8.1 Smooth Max-Relative Entropy

For the analysis of lower bounds on quantum and private communicationrates, we require the smooth max-relative entropy, which is an example of asmooth generalized divergence. A smooth generalized divergence, denotedby Dε(ρ‖σ), is defined by taking a generalized divergence D(ρ‖σ), for a givenstate ρ and positive semi-definite operator σ, and optimizing the quantityD(ρ‖σ) over states ρ that are within a distance ε from the given state ρ. Specif-ically, it is defined as follows:

Dε(ρ‖σ) := infρ∈Bε(ρ)

D(ρ‖σ), (4.8.30)

whereBε(ρ) := τ : τ ≥ 0, Tr[τ] = 1, P(ρ, τ) ≤ ε (4.8.31)

is the set of states τ that are ε-close to ρ in terms of the sine distance (Defini-tion 3.66).

352

Page 364: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Definition 4.59 Smooth Max-Relative Entropy

Let ρ be a state and σ a positive semi-definite operator. Then, the ε-smooth max-relative entropy, for ε ∈ [0, 1), is defined as

Dεmax(ρ‖σ) := inf

ρ∈Bε(ρ)Dmax(ρ‖σ). (4.8.32)

Just like the max-relative entropy, the smooth max-relative entropy is ageneralized divergence, satisfying the data-processing inequality:

Proposition 4.60 Data-Processing Inequality for Smooth Max-Relative Entropy

Let ρ be a state, σ a positive semi-definite operator, and N a quantumchannel. The smooth max-relative entropy obeys the following data-processing inequality for all ε ∈ (0, 1):

Dεmax(ρ‖σ) ≥ Dε

max(N(ρ)‖N(σ)). (4.8.33)

PROOF: To see this, let ρ be an arbitrary state such that

P(ρ, ρ) ≤ ε. (4.8.34)

Then from the data-processing inequality for the sine distance under positivetrace-preserving maps (see (3.5.135)), it follows that

P(N(ρ),N(ρ)) ≤ ε. (4.8.35)

So it follows that

Dmax(ρ‖σ) ≥ Dmax(N(ρ)‖N(σ)) (4.8.36)≥ Dε

max(N(ρ)‖N(σ)). (4.8.37)

Since the inequality holds for an arbitrary state ρ satisfying (4.8.34), we con-clude (4.8.33).

REMARK: The proof given above holds more generally when N is a positive, trace-preserving map, so that (4.8.33) holds in this more general case.

The smooth max-relative entropy can be related to the sandwiched Rényirelative entropy as follows:

353

Page 365: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.61 Smooth Max- to Sandwiched Rényi Relative En-tropy

Let ρ be a state, σ a positive semi-definite operator, α ∈ (1, ∞), andε ∈ (0, 1). Then,

Dεmax(ρ‖σ) ≤ Dα(ρ‖σ) +

1α− 1

log2

(1ε2

)+ log2

(1

1− ε2

). (4.8.38)

PROOF: The statement is trivially true if ρ = σ or if supp(ρ) 6⊆ supp(σ). Inthe former case, Dε

max(ρ‖σ) = Dα(ρ‖σ) = 0, and in the latter, Dα(ρ‖σ) = +∞.

So going forward, we assume that ρ 6= σ and supp(ρ) ⊆ supp(σ). Asmentioned in (4.8.5), the SDP dual of Dmax(τ‖ω) is given by

Dmax(τ‖ω) = log2 supΛ≥0Tr[Λτ] : Tr[Λω] ≤ 1 , (4.8.39)

implying that

Dεmax(ρ‖σ) = log2 inf

ρ:P(ρ,ρ)≤εsup

Λ≥0,Tr[Λσ]≤1Tr[Λρ]. (4.8.40)

Since the objective function Tr[Λρ] is linear in Λ and ρ, the set Λ : Λ ≥0, Tr[Λσ] ≤ 1 is compact and concave, and the set

ρ : P(ρ, ρ) ≤ ε, ρ ≥ 0, Tr[ρ] = 1 (4.8.41)

is compact and convex (due to convexity of sine distance), the minimax theo-rem (Theorem 2.18) applies and we find that

Dεmax(ρ‖σ) = log2 sup

Λ≥0,Tr[Λσ]≤1inf

ρ:P(ρ,ρ)≤εTr[Λρ]. (4.8.42)

For a fixed operator Λ ≥ 0 with spectral decomposition

Λ = ∑i

λi|φi〉〈φi|, (4.8.43)

let us define the following set, for a choice of λ > 0 to be specified later:

S :=

i : 〈φi|ρ|φi〉 > 2λ〈φi|σ|φi〉

. (4.8.44)

354

Page 366: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

LetΠ := ∑

i∈S|φi〉〈φi|. (4.8.45)

Then from the definition, we find that

Tr[Πρ] > 2λ Tr[Πσ], (4.8.46)

which implies thatTr[Πρ]

Tr[Πσ]> 2λ. (4.8.47)

Now consider from the data-processing inequality under the channel

∆(ω) := Tr[Πω]|0〉〈0|+ Tr[Πω]|1〉〈1| (4.8.48)

that

Dα(ρ‖σ) ≥ Dα(∆(ρ)‖∆(σ)) (4.8.49)

=1

α− 1log2

((Tr[Πρ])α (Tr[Πσ])1−α

+(Tr[Πρ]

)α (Tr[Πσ])1−α

)(4.8.50)

≥ 1α− 1

log2

((Tr[Πρ])α (Tr[Πσ])1−α

)(4.8.51)

=1

α− 1log2

(Tr[Πρ]

(Tr[Πρ]

Tr[Πσ]

)α−1)

(4.8.52)

=1

α− 1log2(Tr[Πρ]) + log2

(Tr[Πρ]

Tr[Πσ]

)(4.8.53)

≥ 1α− 1

log2(Tr[Πρ]) + λ. (4.8.54)

Now picking

λ = Dα(ρ‖σ) +1

α− 1log2

(1ε2

), (4.8.55)

we conclude thatTr[Πρ] ≤ ε2. (4.8.56)

Defining Π := 1−Π, this means that

Tr[Πρ] ≥ 1− ε2. (4.8.57)

Thus, the state

ρ′ :=ΠρΠ

Tr[Πρ](4.8.58)

355

Page 367: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

is such thatF(ρ, ρ′) ≥ 1− ε2, (4.8.59)

by applying Lemma 3.65, and in turn that

P(ρ, ρ′) ≤ ε. (4.8.60)

We also have that

ρ′ ≤ ΠρΠ1− ε2 . (4.8.61)

Now let Λ be an arbitrary operator satisfying Λ ≥ 0 and Tr[Λσ] ≤ 1, andlet Π be the projection defined in (4.8.45) for this choice of Λ. Then we findthat

(1− ε2

)Tr[Λρ′] ≤ Tr[ΛΠρΠ] (4.8.62)

= Tr[ΠΛΠρ] (4.8.63)

= ∑i/∈S

λi〈φi|ρ|φi〉 (4.8.64)

≤ 2λ ∑i/∈S

λi〈φi|σ|φi〉 (4.8.65)

≤ 2λ Tr[Λσ] (4.8.66)

≤ 2λ. (4.8.67)

Thus, we have found the following uniform bound for any operator Λ satis-fying Λ ≥ 0 and Tr[Λσ] ≤ 1, with ρ′ the state in (4.8.58) depending on Λ andsatisfying P(ρ, ρ′) ≤ ε:

Tr[Λρ′] ≤ 2λ+log2

(1

1−ε2

). (4.8.68)

Then it follows that

Dεmax(ρ‖σ) = log2 sup

Λ≥0,Tr[Λσ]≤1inf

ρ:P(ρ,ρ)≤εTr[Λρ] (4.8.69)

≤ log2 supΛ≥0,Tr[Λσ]≤1

Tr[Λρ′] (4.8.70)

≤ λ + log2

(1

1− ε2

). (4.8.71)

This concludes the proof.

356

Page 368: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

A quantity of interest is the smooth conditional min-entropy, which is a con-ditional entropy that we define via the general construction of conditionalentropies in (4.3.12). For every bipartite state ρAB, and every ε ∈ [0, 1), wedefine it as

Hεmin(A|B)ρ := − inf

σBDε

max(ρAB‖1A ⊗ σB), (4.8.72)

where we take the infimum over states σB.

Using the definition of the smooth conditional min-entropy in (4.8.72) andapplying Proposition 4.61, we conclude that

Hεmin(A|B)ρ ≥ Hα(A|B)ρ −

1α− 1

log2

(1ε2

)− log2

(1

1− ε2

), (4.8.73)

for all α > 1 and ε ∈ (0, 1).

4.9 Hypothesis Testing Relative Entropy

We now explore another important generalized divergence, called the hy-pothesis testing relative entropy. This particular entropy is defined to be theoptimal value of an operationally defined problem in the context of quantumhypothesis testing. As such, it is debatable as to whether such a quantityshould be given the name “entropy.” However, our perspective is that theadvantages of doing so far outweigh this semantic point about nomenclature,and so we adopt this perspective here and throughout the book. At the mostfundamental level, the hypothesis testing relative entropy obeys the quan-tum data-processing inequality, and for this reason and others, it is useful forcharacterizing the optimal limits of various communication protocols.

Definition 4.62 ε-Hypothesis Testing Relative Entropy

Given a state ρ, a positive semi-definite operator σ, and ε ∈ [0, 1], theε-hypothesis testing relative entropy is defined as

DεH(ρ‖σ) := − log2 βε(ρ‖σ), (4.9.1)

where

βε(ρ‖σ) := infΛTr[Λσ] : 0 ≤ Λ ≤ 1, Tr[Λρ] ≥ 1− ε . (4.9.2)

357

Page 369: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Observe that DεH(ρ‖σ) can be written as

DεH(ρ‖σ) = sup

Λ− log2 Tr[Λσ] : 0 ≤ Λ ≤ 1, Tr[Λρ] = 1− ε. (4.9.3)

That is, the monotonicity of the log2 function allows us to bring − log2 insidethe minimization in the definition of βε(ρ‖σ), and it suffices to optimize overmeasurement operators that meet the constraint Tr[Λρ] ≥ 1− ε with equality.This follows because for every measurement operator Λ such that Tr[Λρ] >1 − ε, we can modify it by scaling it by a positive number λ ∈ [0, 1) suchthat Tr[(λΛ) ρ] = 1− ε. The new operator λΛ is a legitimate measurementoperator and the error probability Tr[(λΛ)σ] only decreases under this scaling(i.e., Tr[(λΛ)σ] < Tr[Λσ]), which allows us to conclude (4.9.3).

The hypothesis testing relative entropy can be computed using a semi-definite program, as indicated in the following proposition:

Proposition 4.63 Hypothesis Testing Relative Entropy as an SDP

For every state ρ, positive semi-definite operator σ, and ε ∈ [0, 1], theε-hypothesis testing relative entropy can be expressed as the followingSDPs:

DεH(ρ‖σ) = − log2 inf

Λ≥0Tr[Λσ] : Λ ≤ 1, Tr[Λρ] ≥ 1− ε (4.9.4)

= − log2 supµ≥0,Z≥0

µ(1− ε)− Tr[Z] : µρ ≤ σ + Z. (4.9.5)

Complementary slackness implies that the following equalities hold foroptimal Λ, µ, and Z:

Λ(σ + Z) = µΛρ, Tr[Λρ]µ = (1− ε)µ, ΛZ = Z. (4.9.6)

PROOF: The primal formulation in (4.9.4) is immediate from Definition 4.62.Indeed, considering the standard form in (2.4.2), we see that we can set

B = σ, Y = Λ, Φ†(Y) =(

Tr[Λρ] 00 −Λ

), A =

(1− ε 0

0 −1

). (4.9.7)

To figure out the dual and having already identified A, B, and Φ†, we needto determine the map Φ and plug into the standard form in (2.4.1). Letting

X :=(

µ 00 Z

), (4.9.8)

358

Page 370: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

we find that

Tr[Φ†(Y)X] = Tr[(

Tr[Λρ] 00 −Λ

)(µ 00 Z

)](4.9.9)

= µTr[Λρ]− Tr[ΛZ] (4.9.10)= Tr[Λ(µρ− Z)], (4.9.11)

which implies thatΦ(X) = µρ− Z. (4.9.12)

Now substituting into (2.4.1) and simplifying, we conclude that the right-hand side of (4.9.5) is the dual SDP.

To show that this is equal to DεH(ρ‖σ), we should demonstrate that the

primal and dual SDPs satisfy the strong duality property. It is clear that Λ = 1

is a feasible point for the primal SDP. Furthermore, the choices µ = 1 andZ = 1 + [σ − ρ]+, where [σ − ρ]+ is the positive part of σ − ρ, are strictlyfeasible for the dual. Thus, we conclude (4.9.5) by applying Theorem 2.22.

The complementary slackness conditions in (4.9.6) follow directly fromProposition 2.23.

Proposition 4.64 Optimal Measurement for Hypothesis Testing Rel-ative Entropy

For every state ρ, positive semi-definite operator σ, and ε ∈ [0, 1], theε-hypothesis testing relative entropy Dε

H(ρ‖σ) is achieved by the follow-ing measurement operator:

Λ(µ∗, p∗) := Πµ∗ρ>σ + p∗Πµ∗ρ=σ, (4.9.13)

where Πµ∗ρ>σ is the projection onto the strictly positive part of µ∗ρ− σ,the projection Πµ∗ρ=σ projects onto the zero eigenspace of µ∗ρ− σ, andµ∗ ≥ 0 and p∗ ∈ [0, 1] are chosen as follows:

µ∗ := sup

µ : Tr[Πµρ>σρ] ≤ 1− ε

, (4.9.14)

p∗ :=1− ε− Tr[Πµ∗ρ>σρ]

Tr[Πµ∗ρ=σρ]. (4.9.15)

PROOF: To find the form of an optimal measurement operator for the hy-pothesis testing relative entropy, let Λ be a measurement operator satisfying

359

Page 371: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Tr[Λρ] = 1− ε and let µ ≥ 0. Then

Tr[Λσ] = Tr[Λσ] + µ (1− ε− Tr[Λρ]) (4.9.16)= −µε + Tr[(I −Λ)µρ] + Tr[Λσ] (4.9.17)

≥ −µε +12(Tr[µρ + σ]− ‖µρ− σ‖1) (4.9.18)

= −µε +12(µ + Tr[σ]− ‖µρ− σ‖1) . (4.9.19)

The sole inequality follows as an application of Theorem 3.13, with B = σ andA = µρ. Observe that the final expression is a universal bound independentof Λ. To determine an optimal measurement operator, we can look to Theo-rem 3.13. There, it was established that the following measurement operatoris an optimal one for infΛ:0≤Λ≤I Tr[(I −Λ)µρ] + Tr[Λσ]:

Λ(µ, p) := Πµρ>σ + pΠµρ=σ, (4.9.20)

where Πµρ>σ is the projection onto the strictly positive part of µρ− σ, the pro-jection Πµρ=σ projects onto the zero eigenspace of µρ− σ, and p ∈ [0, 1]. Themeasurement operator Λ(µ, p) is called a quantum Neyman–Pearson test. Westill need to choose the parameters µ ≥ 0 and p ∈ [0, 1]. Let us pick µ accord-ing to the following optimization:

µ∗ := sup

µ : Tr[Πµρ>σρ] ≤ 1− ε

. (4.9.21)

If it so happens that µ∗ is such that Tr[Πµ∗ρ>σρ] = 1− ε, then we are done; wecan pick p = 0. However, if µ∗ is such that Tr[Πµ∗ρ>σρ] < 1− ε, then we pickp∗ ∈ [0, 1] such that

p∗ :=1− ε− Tr[Πµ∗ρ>σρ]

Tr[Πµ∗ρ=σρ], (4.9.22)

with it following that p∗ ∈ [0, 1] because

Tr[Πµ∗ρ>σρ] < 1− ε ≤ Tr[Πµ∗ρ≥σρ]. (4.9.23)

With these choices, we then find that

Tr[Λ(µ∗, p∗)ρ] = 1− ε. (4.9.24)

By the analysis in (4.9.16)–(4.9.19), it then follows that

Tr[Λσ] ≥ Tr[Λ(µ∗, p∗)σ] (4.9.25)

for all measurement operators Λ satisfying 0 ≤ Λ ≤ I and Tr[Λρ] = 1− ε.

360

Page 372: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Note that the other generalized divergenes we have considered so far sat-isfy D(ρ‖ρ) = 0 for all states ρ. The ε-hypothesis testing relative entropy,however, does not have this property unless ε = 0. In fact, it is clear from thedefinition, along with (4.9.3), that

DεH(ρ‖ρ) = − log2(1− ε) (4.9.26)

for all states ρ and ε ∈ [0, 1].

Like the quantum relative entropy, the Petz–Rényi relative entropy, thesandwiched Rényi relative entropy, and the max-relative entropy, the ε-hypo-thesis testing relative entropy is also a generalized divergence, meaning thatis satisfies the data-processing inequality.

Theorem 4.65 Data-Processing Inequality for Hypothesis TestingRelative Entropy

Let ρ be a state, σ a positive semi-definite operator, and N a quantumchannel. Then, for all ε ∈ [0, 1],

DεH(ρ‖σ) ≥ Dε

H(N(ρ)‖N(σ)). (4.9.27)

PROOF: The intuition for this proof is as follows: A measurement operatorΛ satisfying the constraints 0 ≤ Λ ≤ 1 and Tr[ΛN(ρ)] ≥ 1− ε can be un-derstood as a particular measurement strategy for distinguishing ρ from σ inwhich we first apply the channel N and then apply the measurement operatorΛ. Then this particular measurement strategy cannot perform better than theoptimal measurement strategy for distinguishing ρ from σ.

We start by writing DεH(N(ρ)‖N(σ)) as in (4.9.1):

DεH(N(ρ)‖N(σ))

= supΛ− log2 Tr[ΛN(σ)] : 0 ≤ Λ ≤ 1, Tr[ΛN(ρ)] ≥ 1− ε. (4.9.28)

Fix Λ such that 0 ≤ Λ ≤ 1 and Tr[ΛN(ρ)] ≥ 1− ε. By definition of the adjoint,we have that

Tr[ΛN(σ)] = Tr[N†(Λ)σ], Tr[ΛN(ρ)] = Tr[N†(Λ)ρ]. (4.9.29)

Also, note that 0 ≤ N†(Λ) ≤ 1. The leftmost inequality is due to the fact thatN† is a positive map because N is. The rightmost inequality is due to the fact

361

Page 373: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

that N† is subunital because N is trace non-increasing. By the positivity of N†,we obtain

Λ ≤ 1⇒ 1−Λ ≥ 0⇒ N†(1−Λ) ≥ 0⇒ N†(Λ) ≤ N†(1) ≤ 1. (4.9.30)

Since Λ is arbitrary, we bound (4.9.28) as

DεH(N(ρ)‖N(σ))

≤ supΛ− log2 Tr[N†(Λ)σ] : 0 ≤ N†(Λ) ≤ 1, Tr[N†(Λ)ρ] ≥ 1− ε.

(4.9.31)Now, by enlarging the optimization set from measurement operators N†(Λ)satisfying 0 ≤ N†(Λ) ≤ 1 and Tr[N†(Λ)ρ] ≥ 1− ε to all measurement opera-tors, say Λ′, satisfying 0 ≤ Λ′ ≤ 1 and Tr[Λ′ρ] ≥ 1− ε, we obtain

DεH(N(ρ)‖N(σ))

≤ supΛ− log2 Tr[N†(Λ)σ] : 0 ≤ N†(Λ) ≤ 1, Tr[N†(Λ)ρ] ≥ 1− ε (4.9.32)

≤ supΛ′− log2 Tr[Λ′σ] : 0 ≤ Λ′ ≤ 1, Tr[Λ′ρ] ≥ 1− ε (4.9.33)

= DεH(ρ‖σ), (4.9.34)

as required.

REMARK: Inspection of the proof above reveals that it holds more generally for N a pos-itive, trace-non-increasing map.

Proposition 4.66 Properties of Hypothesis Testing Relative Entropy

The ε-hypothesis testing relative entropy satisfies the following proper-ties for all ε ∈ [0, 1]:

1. If ε′ ∈ (ε, 1], thenDε

H(ρ‖σ) ≤ Dε′H(ρ‖σ). (4.9.35)

2. The following limit holds

limε→0

DεH(ρ‖σ) = D0(ρ‖σ), (4.9.36)

where D0(ρ‖σ) = − log2 Tr[Πρσ] is the Petz–Rényi relative entropyof order zero and Πρ is the projection onto the support of ρ.

362

Page 374: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

3. For every state ρ and positive semi-definite operators σ, σ′ such thatσ′ ≥ σ, we have that Dε

H(ρ‖σ) ≥ DεH(ρ‖σ′).

4. For every state ρ, positive semi-definite operator σ, and α > 0, wehave that Dε

H(ρ‖ασ) = DεH(ρ‖σ)− log2 α.

5. Let p, q : X → [0, 1] be two probability distributions over a finitealphabet X with associated |X|-dimensional system X, let ρx

Ax∈Xbe a set of states on a system A, and let σA be a state on system A.Then,

DεH

(∑

x∈Xp(x)|x〉〈x|X ⊗ ρx

A

∥∥∥∥∥ ∑x∈X

q(x)|x〉〈x|X ⊗ σxA

)

≥ minx∈X

DεH(ρ

xA‖σx

A). (4.9.37)

PROOF:

1. Eq. (4.9.35) follows from Definition 4.62: increasing ε increases the setof measurement operators Λ over which we can optimize, and Dε

H(ρ‖σ)does not decrease under such a change.

2. Consider that the following inequality holds for all ε ∈ (0, 1):

DεH(ρ‖σ) ≥ D0(ρ‖σ), (4.9.38)

because the measurement operator Πρ (projection onto support of ρ) sat-isfies Tr[Πρρ] ≥ 1− ε for all ε ∈ (0, 1). So we conclude that

lim infε→0

DεH(ρ‖σ) ≥ D0(ρ‖σ). (4.9.39)

Alternatively, suppose that Λ is a measurement operator satisfying Tr[Λρ] =1− ε (note that when optimizing Dε

H, it suffices to optimize over measure-ment operators satisfying the constraint Tr[Λρ] ≥ 1− ε with equality, asmentioned in (4.9.3)). Then applying the data-processing inequality forDα(ρ‖σ) under the measurement Λ, I −Λ, which holds for α ∈ (0, 1),we find that

Dα(ρ‖σ) ≥1

α− 1log2

[(1− ε)α Tr[Λσ]1−α + εα (1− Tr[Λσ])1−α

].

(4.9.40)

363

Page 375: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Since this bound holds for all measurement operators Λ satisfying Tr[Λρ] =1− ε, we conclude the following bound for all α ∈ (0, 1):

Dα(ρ‖σ) ≥1

α− 1log2

[(1− ε)α

(2−Dε

H(ρ‖σ))1−α

+ εα(

1− 2−DεH(ρ‖σ)

)1−α]

. (4.9.41)

Now taking the limit of the right-hand side as ε → 0, we find that thefollowing bound holds for all α ∈ (0, 1):

Dα(ρ‖σ) ≥ lim supε→0

DεH(ρ‖σ). (4.9.42)

Since the bound holds for all α ∈ (0, 1), we can take the limit on theleft-hand side to arrive at

limα→0

Dα(ρ‖σ) = D0(ρ‖σ) ≥ lim supε→0

DεH(ρ‖σ). (4.9.43)

Now putting together (4.9.39) and (4.9.43), we conclude (4.9.36).

3. Fix an operator Λ satisfying 0 ≤ Λ ≤ 1. The assumption σ′ ≥ σ impliesthat Tr[Λσ′] ≥ Tr[Λσ], which in turn implies that

− log2 Tr[Λσ′] ≤ − log2 Tr[Λσ]. (4.9.44)

Since the operator Λ is arbitrary, we obtain DεH(ρ‖σ) ≥ Dε

H(ρ‖σ′), asrequired.

4. This follows immediately from the fact that

− log2 Tr[Λ(ασ)] = − log2(αTr[Λσ]) = − log2 Tr[Λσ]− log2 α (4.9.45)

for all operators Λ satisfying 0 ≤ Λ ≤ 1.

5. Since DεH(ρ‖σ) = − log2 βε(ρ‖σ), where βε(ρ‖σ) is defined in (4.9.2), we

can equivalently show that

βε

(∑

x∈Xp(x)|x〉〈x|X ⊗ ρx

A

∥∥∥∥∥ ∑x∈X

q(x)|x〉〈x|X ⊗ σxA

)

≤ maxx∈X

βε(ρxA‖σx

A). (4.9.46)

364

Page 376: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Now, in the definition of βε on the left-hand side of the inequality above,let us restrict the infimum over all measurement operators to those of theform ΛXA = ∑x∈X |x〉〈x|X⊗Mx

A such that 0 ≤ MxA ≤ 1A and Tr[Mx

AρxA] ≥

1− ε for all x ∈ X. Doing this leads to

βε

(∑

x∈Xp(x)|x〉〈x|X ⊗ ρx

A

∥∥∥∥∥ ∑x∈X

q(x)|x〉〈x|X ⊗ σxA

)(4.9.47)

≤ infMx

Ax∈X

x∈Xq(x)Tr[Mx

AσxA] : 0 ≤ Mx

A ≤ 1A,

Tr[MxAρx

A] ≥ 1− ε ∀x ∈ X

(4.9.48)

= ∑x∈X

q(x) infMx

A

Tr[MxAσx

A] : 0 ≤ MxA ≤ 1A,

Tr[MxAρx

A] ≥ 1− ε (4.9.49)

= ∑x∈X

q(x)βε(ρxA‖σx

A) (4.9.50)

≤ maxx∈X

βε(ρxA‖σx

A). (4.9.51)

The last inequality follows because βε(ρxA‖σx

A) ≤ maxx∈X βε(ρxA‖σx

A) forall x ∈ X. So we have shown that the inequality (4.9.46) holds, whichcompletes the proof.

4.9.1 Connection to Quantum Relative Entropy

We now prove a bound relating the ε-hypothesis testing relative entropy tothe quantum relative entropy.

Proposition 4.67 Hypothesis Testing to Quantum Relative Entropy

Fix ε ∈ [0, 1). Let ρ be a state and σ a positive semi-definite operator.Then the following bound relates the ε-hypothesis testing relative en-tropy to the quantum relative entropy:

DεH(ρ‖σ) ≤

11− ε

(D(ρ‖σ) + h2(ε) + ε log2 Tr[σ]) , (4.9.52)

where h2(ε) is the binary entropy defined in (4.1.4).

365

Page 377: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

PROOF: To see this, we use the fact that the optimization in the definition ofDε

H(ρ‖σ) can be restricted as in (4.9.3), i.e.,

DεH(ρ‖σ) = sup

Λ− log2 Tr[Λσ] : 0 ≤ Λ ≤ 1, Tr[Λρ] = 1− ε. (4.9.53)

For every measurement operator Λ such that Tr[Λρ] = 1− ε, the data-processinginequality for the quantum relative entropy (Theorem 4.4) implies that

D(ρ‖σ) ≥ D(1− ε, ε‖Tr[Λσ], Tr[σ]− Tr[Λσ])

= (1− ε) log2

(1− ε

Tr[Λσ]

)+ ε log2

Tr[σ]− Tr[Λσ]

)(4.9.54)

= (1− ε) log2(1− ε)− (1− ε) log2(Tr[Λσ])

+ ε log2(ε) + ε log2

(1

Tr[σ] (1− Tr[Λσ/Tr[σ]])

)(4.9.55)

= − (1− ε) log2 Tr[Λσ]− h2(ε)

+ ε log2

(1

1− Tr[Λσ/Tr[σ]]

)− ε log2 Tr[σ] (4.9.56)

≥ − (1− ε) log2 Tr[Λσ]− h2(ε)− ε log2 Tr[σ], (4.9.57)

where the inequality holds because ε log2

(1

1−Tr[Λσ/Tr[σ]]

)≥ 0. Rewriting this

gives

− log Tr[Λσ] ≤ 11− ε

[D(ρ‖σ) + h2(ε) + ε log2 Tr[σ]] . (4.9.58)

Since this bound holds for all measurement operators Λ satisfying Tr[Λρ] =1− ε, we conclude (4.9.52).

4.9.2 Connections to Quantum Rényi Relative Entropies

We now show a connection between the hypothesis testing relative entropyand the Petz– and sandwiched Rényi relative entropies.

366

Page 378: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.68 Hypothesis Testing to Sandwiched Rényi RelativeEntropy

Let ρ be a state and σ a positive semi-definite operator. Let α ∈ (1, ∞)and ε ∈ [0, 1). Then the following inequality holds

DεH(ρ‖σ) ≤ Dα(ρ‖σ) +

α

α− 1log2

(1

1− ε

). (4.9.59)

In particular, in the limit α→ ∞,

DεH(ρ‖σ) ≤ Dmax(ρ‖σ) + log2

(1

1− ε

). (4.9.60)

PROOF: If the support condition supp(ρ) ⊆ supp(σ) does not hold, then theright-hand side of (4.9.59) is equal to +∞, and so the result is trivially true.Thus, in what follows, we suppose that the support condition supp(ρ) ⊆supp(σ) holds. Let Λ denote a measurement operator such that Tr[Λρ] =1− ε. Let q := Tr[Λσ]. By the data-processing inequality for the sandwichedRényi relative entropy for α > 1 (Theorem 4.31), under the measurementchannel

ω 7→ Tr[Λω]|0〉〈0|+ Tr[(1−Λ)ω]|1〉〈1|, (4.9.61)

we find that

Dα(ρ‖σ) ≥ Dα(1− ε, ε‖q, Tr[σ]− q) (4.9.62)

=1

α− 1log2

[(1− ε)α q1−α + εα (Tr[σ]− q)1−α

](4.9.63)

≥ 1α− 1

log2[(1− ε)α q1−α] (4.9.64)

α− 1log2(1− ε)− log2 q. (4.9.65)

The statement in (4.9.59) follows by taking the supremum over all Λ such thatTr[Λρ] = 1 − ε. Furthermore, (4.9.60) follows because limα→∞ Dα(ρ‖σ) =Dmax(ρ‖σ) (as shown in Proposition 4.58) and the fact that limα→∞

αα−1 =

1.

The following proposition establishes an inequality relating hypothesistesting relative entropy and the Petz–Rényi relative entropy, and it repre-sents a counterpart to Proposition 4.68. After giving its proof, we show how

367

Page 379: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.68 and the following proposition lead to a proof of the quantumStein’s lemma.

Proposition 4.69 Hypothesis Testing to Petz–Rényi Relative Entropy

Let ρ be a state, and let σ be a positive semi-definite operator. Let α ∈(0, 1) and ε ∈ (0, 1]. Then the following inequality holds:

DεH(ρ‖σ) ≥ Dα(ρ‖σ) +

α

α− 1log2

(1ε

). (4.9.66)

PROOF: The statement is trivially true if ρσ = 0 because both DεH(ρ‖σ) = +∞

and Dα(ρ‖σ) = +∞ in this case. So we consider the non-trivial case when thisequality does not hold. Recall from Lemma 3.15 that the following inequalityholds for positive semi-definite operators A and B and for α ∈ (0, 1):

infΛ:0≤Λ≤1

Tr[(1−Λ)A] + Tr[ΛB] =12(Tr[A + B]− ‖A− B‖1) (4.9.67)

≤ Tr[AαB1−α], (4.9.68)

where the first equality is the result of Theorem 3.13. For p ∈ (0, 1), pickA = pρ and B = (1− p) σ. Plugging in to the inequality above, we find thatthere exists a measurement operator Λ∗ = Λ(p, ρ, σ) such that

pTr[(1−Λ∗)ρ] + (1− p)Tr[Λ∗σ] ≤ pα(1− p)1−αTr[ρασ1−α]. (4.9.69)

This implies that

pTr[(1−Λ∗)ρ] ≤ pα(1− p)1−αTr[ρασ1−α], (4.9.70)

and in turn that

Tr[(1−Λ∗)ρ] ≤(

1− pp

)1−α

Tr[ρασ1−α]. (4.9.71)

For a given ε ∈ (0, 1] and α ∈ (0, 1), we pick p ∈ (0, 1) such that(

1− pp

)1−α

Tr[ρασ1−α] = ε. (4.9.72)

This is possible because we can rewrite the equation above as

ε =

(1− p

p

)1−α

Tr[ρασ1−α]

368

Page 380: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

=

(1p− 1)1−α

Tr[ρασ1−α] (4.9.73)

⇐⇒(

1p− 1)1−α

Tr[ρασ1−α](4.9.74)

⇐⇒ 1p=

Tr[ρασ1−α]

) 11−α

+ 1 (4.9.75)

⇐⇒ p =1

Tr[ρασ1−α]

) 11−α

+ 1∈ (0, 1) . (4.9.76)

This means that Λ∗ = Λ(p, ρ, σ), with p selected as above, is a measurementoperator such that

Tr[(1−Λ∗)ρ] ≤ ε. (4.9.77)

Now, we use the fact that

(1− p)Tr[Λ∗σ] ≤ pα(1− p)1−αTr[ρασ1−α] (4.9.78)

implies

Tr[Λ∗σ] ≤(

p1− p

Tr[ρασ1−α]. (4.9.79)

Considering that

ε =

(1− p

p

)1−α

Tr[ρασ1−α] =

(p

1− p

)α−1Tr[ρασ1−α] (4.9.80)

implies that(

ε

Tr[ρασ1−α]

) 1α−1

=p

1− p, (4.9.81)

we get that

Tr[Λ∗σ] ≤(

p1− p

Tr[ρασ1−α] (4.9.82)

=

((ε

Tr[ρασ1−α]

) 1α−1)α

Tr[ρασ1−α] (4.9.83)

= εα

α−1

(Tr[ρασ1−α]

) α1−α Tr[ρασ1−α] (4.9.84)

369

Page 381: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= εα

α−1

(Tr[ρασ1−α]

) 11−α . (4.9.85)

Then, by taking the negative logarithm and optimizing over all Λ∗ satisfying(4.9.77), we find that

DεH(ρ‖σ) ≥ − log2 Tr[Λ∗σ] (4.9.86)

≥ − log2

αα−1

(Tr[ρασ1−α]

) 11−α

)(4.9.87)

= − α

α− 1log2(ε) +

1α− 1

log2 Tr[ρασ1−α] (4.9.88)

= − α

α− 1log2(ε) + Dα(ρ‖σ). (4.9.89)

Rearranging this inequality leads to (4.9.66), as required.

4.10 Quantum Stein’s Lemma

In this section, we show how Propositions 4.68 and 4.69 lead to a proof ofthe quantum Stein’s lemma, which is one of the most important results inthe asymptotic theory of quantum hypothesis testing. Before doing so, let usdiscuss the task of asymmetric hypothesis testing.

The general setting of asymmetric i.i.d. hypothesis testing is illustated inFigure 4.2. Bob is given n copies of a quantum system, each of which is ei-ther in the state ρ or in the state σ, and his task is to determine in whichstate the systems have been prepared. Bob’s strategy consists of performinga joint measurement on all of the systems at once, described by the POVMΛ(n), 1⊗n−Λ(n), guessing “ρ” if the outcome corresponds to Λ(n) and guess-ing “σ” if the outcome corresponds to 1⊗n −Λ(n). In this case, there are twotypes of errors that can occur.

1. Type-I Error: Bob guesses “σ”, but the systems are in the state ρ⊗n. Theprobability of this occurring is Tr[(1⊗n −Λ(n))ρ⊗n].

2. Type-II Error: Bob guesses “ρ”, but the systems are in the state σ⊗n. Theprobability of this occurring is Tr[Λ(n)σ⊗n].

The quantity βε(ρ⊗n‖σ⊗n), defined from (4.9.2), can be interpreted as theminimum type-II error probability subject to the constraint 1− Tr[Λ(n)ρ] =

370

Page 382: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

ρ or σ

Λ(n), 1⊗n −Λ(n)

“ρ” or “σ”

FIGURE 4.2: Schematic depiction of hypothesis testing. Many copies, sayn, of an identical quantum system are each prepared either in the stateρ or the state σ. A joint binary measurement, described by the POVMΛ(n), 1⊗n−Λ(n), is made on the overall state, which is either ρ⊗n or σ⊗n.

Tr[(1⊗n − Λ(n))ρ] ≤ ε on the type-I error probability. The rate of this proto-col, given a type-I error probability constraint of ε, is 1

n DεH(ρ

⊗n‖σ⊗n), whichis essentially the minimum type-II error probability exponent per copy of thestates ρ and σ. By increasing the number n of copies, one might imaginethat this normalized minimum type-II error probability exponent can be in-creased. Also, there is generally a trade-off between the type-I and type-IIerror probabilities, meaning that both cannot be made arbitrarily small si-multaneously. However, by increasing the number n of copies of the states ρand σ, we might expect that the type-I error probability can be brought downall the way to zero. We thus define the maximum rate for hypothesis testing ofthe states ρ and σ as the largest value of the normalized type-II error proba-bility exponent, as n → ∞, such that the type-I error probability vanishes inthis limit, i.e.,

infε∈(0,1)

lim infn→∞

DεH(ρ

⊗n‖σ⊗n)

n. (4.10.1)

A tractable expression for this optimal rate is given by the quantum Stein’slemma, which we state and prove in Theorem 4.75 below.

The task of asymmetric hypothesis testing is similar to the task of state dis-crimination that we considered in Section 3.1.3. While in hypothesis testingwe consider two error probabilities, the type-I and type-II error probabilities,in state discrimination we consider only one error probability that is in factthe average of the type-I and type-II error probabilities taken with respect toa prior probability distribution. If λ ∈ [0, 1] is the probability of choosing thestate ρ, then the average of the type-I and type-II error probabilities is

371

Page 383: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

λTr[(1⊗n −Λ(n))ρ⊗n] + (1− λ)Tr[Λ(n)σ⊗n]

= perr(Λ(n), 1⊗n −Λ(n), ρ⊗n, σ⊗n), (4.10.2)

where we recall the definition of the state discrimination error probabilityperr(Λ(n), 1⊗n − Λ(n); ρ, σ) from (3.1.117). Due to the fact that we com-bine both the type-I and type-II error probabilities, the task of state discrim-ination is often referred to as symmetric hypothesis testing, while the task ofhypothesis testing that we are considering in this section is referred to asasymmetric hypothesis testing.

More formally, a hypothesis testing protocol is defined by the four elements(n, ρ, σ, Λ(n)), where n is the number of copies of the system, each of whichis either in the state ρ or σ, and 0 ≤ Λ(n) ≤ 1⊗n is the operator defining thePOVM Λ(n), 1⊗n −Λ(n) used to decide the state of the system.

Definition 4.70 (n, εII, εI) Hypothesis Testing Protocol

Let (n, ρ, σ, Λ(n)) be the elements of a hypothesis testing protocol. Theprotocol is called an (n, εII, εI) protocol for ρ and σ if the type-I error prob-ability Tr[(1⊗n −Λ(n))ρ⊗n] satisfies

Tr[(1⊗n −Λ(n))ρ⊗n] ≤ εI, (4.10.3)

and the type-II error probability Tr[Λ(n)σ⊗n] satisfies

Tr[Λ(n)σ⊗n] ≤ εII. (4.10.4)

Observe that, by definition, if there exists an (n, εII, εI) hypothesis testingprotocol for ρ and σ, then there exists an (n, ε′II, εI) hypothesis testing pro-tocol for all ε′II ≥ εII because every measurement operator Λ(n) for whichTr[Λ(n)σ⊗n] ≤ εII clearly satisfies Tr[Λ(n)σ⊗n] ≤ ε′II. Similarly, if there doesnot exist an (n, εII, εI) hypothesis testing protocol, then there does not exist an(n, ε′II, εI) hypothesis testing protocol for all ε′II ≤ εII.

We define the rate of a hypothesis testing protocol (n, ρ, σ, Λ(n)) as

R(n, ρ, σ, Λ(n)) := −1n

log2 Tr[Λ(n)σ⊗n], (4.10.5)

The rate is also called the (normalized) type-II error exponent because the type-IIerror probability Tr[Λ(n)σ⊗n] can be expressed as Tr[Λ(n)σ⊗n] = 2−nR. The

372

Page 384: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

goal of the hypothesis testing protocol is to find the highest rate such that thetype-I error probability tends to zero as n increases.

Definition 4.71 Achievable Rate for Hypothesis Testing

Given quantum states ρ and σ, a rate R ∈ R+ is called an achievable ratefor hypothesis testing of ρ and σ if for all ε ∈ (0, 1], δ > 0, and sufficientlylarge n there exists an (n, 2−n(R−δ), ε) hypothesis testing protocol for ρand σ.

REMARK: When we say that there exists an (n, 2−n(R−δ), ε) hypothesis testing protocolfor ρ and σ for all ε ∈ (0, 1], δ > 0, and “sufficiently large n”, we mean that for all ε ∈ (0, 1]and δ > 0, there exists a number Nε,δ ∈ N such that for all n ≥ Nε,δ, there exists an(n, 2−n(R−δ), ε) hypothesis testing protocol for ρ and σ. This convention with the nomen-clature “sufficiently large n” is taken throughout the rest of the book.

Note that, by definition, for every achievable rate R there exists a value ofn such that the type-I error probability ε becomes arbitrarily close to zero.

Definition 4.72 Optimal Achievable Rate for Hypothesis Testing

Given quantum states ρ and σ, the optimal achievable rate, denoted byE(ρ, σ), is defined as the supremum of all achievable rates for hypothesistesting of ρ and σ, i.e.,

E(ρ, σ) := supR : R is an achievable rate for ρ, σ. (4.10.6)

Definition 4.73 Strong Converse Rate for Hypothesis Testing

Given quantum states ρ and σ, a rate R ∈ R+ is called a strong converserate for hypothesis testing of ρ and σ if for all ε ∈ [0, 1), δ > 0, and suffi-ciently large n, there does not exist an (n, 2−n(R+δ), ε) hypothesis testingprotocol for ρ and σ.

Note that, by definition, for every strong converse rate R there exists avalue of n such that the type-I error probability ε is arbitrarily close to one.

373

Page 385: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Definition 4.74 Optimal Strong Converse Rate for Hypothesis Test-ing

Given quantum states ρ and σ, the optimal strong converse rate, denotedby E(ρ, σ), is defined as the infimum of all strong converse rates for hy-pothesis testing of ρ and σ, i.e.,

E(ρ, σ) := infR : R is a strong converse rate for ρ, σ. (4.10.7)

Note that the following inequality is a direct consequence of definitions:

E(ρ, σ) ≤ E(ρ, σ). (4.10.8)

Indeed, suppose for a contradiction that this is not true, i.e., that E(ρ, σ) >

E(ρ, σ) holds. This means that there exists an achievable rate R such thatE(ρ, σ) < R < E(ρ, σ), so that for all ε ∈ (0, 1], all δ > 0, and all suffi-ciently large n there exists an (n, 2−n(R−δ), ε) hypothesis testing protocol. Onthe other hand, since E(ρ, σ) is the optimal strong converse rate, there existsa δ > 0 such that R − δ > E(ρ, σ) is a strong converse rate. This implies,by definition, that an (n, 2−n(R−δ+δ′), ε) protocol, with δ′ ∈ (0, δ), does notexist. However, since R was claimed to be an achievable rate, such a proto-col should exist since R− δ + δ′ < R. We have thus reached a contradiction.The inequality E(ρ, σ) > E(ρ, σ) therefore cannot be true, which means thatE(ρ, σ) ≤ E(ρ, σ).

We now state and prove the quantum Stein’s lemma: both the optimalachievable and strong converse rates are equal to the quantum relative en-tropy. As alluded to at the beginning of Section 4.2 on the quantum rela-tive entropy, the quantum Stein’s lemma gives the quantum relative entropyits most fundamental operational meaning as the optimal rate in asymmetricquantum hypothesis testing.

Theorem 4.75 Quantum Stein’s Lemma

For all states ρ and σ, the optimal achievable and strong converse ratesare equal to the quantum relative entropy of ρ and σ, i.e.,

E(ρ, σ) = E(ρ, σ) = D(ρ‖σ). (4.10.9)

PROOF: Note that if supp(ρ) * supp(σ), then D(ρ‖σ) = +∞ by definition.

374

Page 386: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

In this singular case, the optimal strong converse rate E(ρ, σ) is undefined,and so we prove that E(ρ, σ) = +∞.

Fix ε ∈ (0, 1], and let Πσ be the projection onto the support of σ. Notethat since supp(ρ) * supp(σ), we have that Tr[Πσρ] < 1. Now, pick n largeenough so that (Tr[Πσρ])n ≤ ε. Define the measurement operator Λ(n) asΛ(n) := 1− (Πσ)⊗n. Observe that the type-I error probability is

Tr[(1⊗n −Λ(n))ρ⊗n] = (Tr[Πσρ])n ≤ ε (4.10.10)

and the type-II error probability is

Tr[Λ(n)σ⊗n] = 1− (Tr[Πσσ])n = 0. (4.10.11)

Therefore, for all sufficiently large n such that (Tr[Πσρ])n ≤ ε holds, the el-ements (n, ρ, σ, Λ(n)) constitute an (n, 0, ε) hypothesis testing protocol for ρand σ, the rate of which is +∞ = D(ρ‖σ). Since ε is arbitrary, we concludethat, for all ε ∈ (0, 1] and sufficiently large n, there exists a hypothesis testingprotocol for ρ and σ with rate R = +∞. This implies that E(ρ, σ) = +∞ in thesingular case of supp(ρ) * supp(σ).

For the remainder of the proof, we assume that the support conditionsupp(ρ) ⊆ supp(σ) holds, so that D(ρ‖σ) is finite.

Let us first show that D(ρ‖σ) is an achievable rate, which establishes thatE(ρ, σ) ≥ D(ρ‖σ). To this end, fix ε ∈ (0, 1] and δ > 0. Let δ1, δ2 > 0 be suchthat

δ1 + δ2 = δ. (4.10.12)

Set α ∈ (0, 1) such that

δ1 ≥ D(ρ‖σ)− Dα(ρ‖σ), (4.10.13)

which is possible because limα→1 Dα(ρ‖σ) = D(ρ‖σ) by Proposition 4.20 andDα is monotonically increasing in α, as established in Proposition 4.21. Then,with this choice of α, take n large enough so that

δ2 ≥α

n(1− α)log2

(1ε

). (4.10.14)

Now, let 0 ≤ Λ(n) ≤ 1⊗n be a measurement operator that achieves theε-hypothesis testing relative entropy Dε

H(ρ⊗n‖σ⊗n), which means that

Tr[(1⊗n −Λ(n))ρ⊗n] = ε (4.10.15)

375

Page 387: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

and− 1

nlog2 Tr[Λ(n)σ⊗n] =

1n

DεH(ρ

⊗n‖σ⊗n). (4.10.16)

The elements (n, ρ, σ, Λ(n)) thus constitute an (n, 2−n( 1n Dε

H(ρ⊗n‖σ⊗n)), ε) hypoth-esis testing protocol for ρ and σ. We now apply Proposition 4.69 and the ad-ditivity of the Petz–Rényi relative entropy from Proposition 4.21 to find that

1n

DεH(ρ

⊗n‖σ⊗n) ≥ 1n

Dα(ρ⊗n‖σ⊗n) +

α

n(α− 1)log2

(1ε

)(4.10.17)

= Dα(ρ‖σ) +α

n(α− 1)log2

(1ε

). (4.10.18)

Rearranging the right-hand side of this inequality and using (4.10.12)–(4.10.14),we conclude that

1n

DεH(ρ

⊗n‖σ⊗n)

≥ D(ρ‖σ)−(

D(ρ‖σ)− Dα(ρ‖σ) +α

n(1− α)log2

(1ε

))(4.10.19)

≥ D(ρ‖σ)− (δ1 + δ2) (4.10.20)≥ D(ρ‖σ)− δ. (4.10.21)

We thus haveD(ρ‖σ)− δ ≤ 1

nDε

H(ρ⊗n‖σ⊗n). (4.10.22)

The error 2−n(D(ρ‖σ)−δ) is then greater than or equal to 2−n( 1n Dε

H(ρ⊗n‖σ⊗n)),which means, by the fact stated in the paragraph immediately after Defini-tion 4.70, that there exists an (n, 2−n(R−δ), ε) hypothesis testing protocol withR = D(ρ‖σ) for all sufficiently large n such that (4.10.14) holds. Since ε and δare arbitrary, we conclude that for all ε ∈ (0, 1], δ > 0, and sufficiently large n,there exists an (n, 2−n(R−δ), ε) hypothesis testing protocol with R = D(ρ‖σ).Then D(ρ‖σ) is an achievable rate, so that

E(ρ, σ) ≥ D(ρ‖σ). (4.10.23)

Let us now show that the quantum relative entropy D(ρ‖σ) is a strongconverse rate, which establishes that E(ρ, σ) ≤ D(ρ‖σ). Fix ε ∈ [0, 1) andδ > 0. Let δ1, δ2 > 0 be such that

δ > δ1 + δ2 =: δ′. (4.10.24)

376

Page 388: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Set α ∈ (1, ∞) such that

δ1 ≥ Dα(ρ‖σ)− D(ρ‖σ), (4.10.25)

which is possible because limα→1 Dα(ρ‖σ) = D(ρ‖σ) by Proposition 4.28 andDα is monotonically increasing in α, as established in Proposition 4.29. Withthis value of α, take n large enough so that

δ2 ≥α

n(α− 1)log2

(1

1− ε

). (4.10.26)

Now, consider an arbitrary measurement operator Λ(n) such that the hy-pothesis testing protocol given by (n, ρ, σ, Λ(n)) satisfies ε ≥ Tr[(1⊗n−Λ(n))ρ⊗n]

and εII ≥ Tr[Λ(n)σ⊗n]. By definition of the hypothesis testing relative entropy,we have that− log2 Tr[Λ(n)σ⊗n] ≤ Dε

H(ρ⊗n‖σ⊗n). Applying Proposition 4.68,

we thus find that

−1n

log2 Tr[Λ(n)σ⊗n] ≤ 1n

DεH(ρ

⊗n‖σ⊗n) (4.10.27)

≤ α

n(α− 1)log2

(1

1− ε

)+

1n

Dα(ρ⊗n‖σ⊗n) (4.10.28)

n(α− 1)log2

(1

1− ε

)+ Dα(ρ‖σ), (4.10.29)

where to obtain the second line we used the additivity of the sandwichedRényi relative entropy, as shown in Proposition 4.29. Rearranging the right-hand side of this inequality and using (4.10.24)–(4.10.26), we obtain

−1n

log2 Tr[Λ(n)σ⊗n] ≤ D(ρ‖σ) + Dα(ρ‖σ)− D(ρ‖σ)

n(α− 1)log2

(1

1− ε

)(4.10.30)

≤ D(ρ‖σ) + δ′ (4.10.31)< D(ρ‖σ) + δ. (4.10.32)

We thus have that Tr[Λ(n)σ⊗n] > 2−n(D(ρ‖σ)+δ). Since Λ(n) is an arbitrarymeasurement operator satisfying ε ≥ Tr[(1⊗n −Λ(n))ρ⊗n], we see that, for allsufficiently large n such that (4.10.26) holds, an (n, 2−n(D(ρ‖σ)+δ), ε) hypoth-esis testing protocol cannot exist, for if it did we would have Tr[Λ(n)σ⊗n] ≤2−n(D(ρ‖σ)+δ) for some Λ(n). Since ε and δ are arbitrary, we have that for all ε ∈

377

Page 389: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

[0, 1), δ > 0, and sufficiently large n, there does not exist an (n, 2−n(D(ρ‖σ)+δ), ε)hypothesis testing protocol for ρ and σ, which means that D(ρ‖σ) is a strongconverse rate, so that

E(ρ, σ) ≤ D(ρ‖σ). (4.10.33)

Using (4.10.23), (4.10.33), and (4.10.8), we obtain

E(ρ, σ) ≤ E(ρ, σ) ≤ D(ρ‖σ) ≤ E(ρ, σ), (4.10.34)

which means that E(ρ, σ) = E(ρ, σ) = D(ρ‖σ).

We can conclude the main result of Theorem 4.75 in a different yet relatedway. Recall that an alternate definition of the optimal type-II error exponentis given in (4.10.1), i.e.,

E(ρ, σ) = infε∈(0,1)

lim infn→∞

DεH(ρ

⊗n‖σ⊗n)

n. (4.10.35)

It is straightforward to show that

infε∈(0,1)

lim infn→∞

DεH(ρ

⊗n‖σ⊗n)

n= D(ρ‖σ). (4.10.36)

Indeed, using (4.10.18), we find that

infε∈(0,1)

lim infn→∞

1n

DεH(ρ

⊗n‖σ⊗n) ≥ Dα(ρ‖σ) (4.10.37)

for all α ∈ (0, 1), so that taking the supremum over α ∈ (0, 1) on the right-hand side leads to

E(ρ, σ) ≥ D(ρ‖σ). (4.10.38)

We can also write the optimal strong converse type-II error exponent as

E(ρ, σ) = supε∈(0,1)

lim supn→∞

1n

DεH(ρ

⊗n‖σ⊗n). (4.10.39)

Similarly, using (4.10.29), we conclude that

supε∈(0,1)

lim supn→∞

1n

DεH(ρ

⊗n‖σ⊗n) ≤ Dα(ρ‖σ) (4.10.40)

378

Page 390: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

for all α > 1, so that taking the infimum over α ∈ (1, ∞) on the right-handside leads to

E(ρ, σ) ≤ D(ρ‖σ). (4.10.41)

We therefore conclude that E(ρ, σ) = E(ρ, σ) = D(ρ‖σ). Note that in thearguments above for the lower bound on E(ρ, σ) and the upper bound onE(ρ, σ), we did not have to explicitly take the infimum or supremum overε ∈ (0, 1), respectively.

4.10.1 Error and Strong Converse Exponents

Given states ρ and σ, as well as the number n of copies of the states, we canchange our perspective a bit from that given in the previous section and in-stead determine bounds on the type-I error probability ε. In particular, wecan change our focus a bit, such that we are now interested in how fast thetype-I error probability converges to zero if the type-II error exponent is equalto a constant smaller than the quantum relative entropy, and we are also in-terested in how fast the type-I error probability converges to one if the type-IIerror exponent is equal to a constant larger than the quantum relative en-tropy. To assist with this analysis, we establish the following propositions,whose proofs are closely related to the proofs of Propositions 4.68 and 4.69.

Proposition 4.76

Let ρ be a state, and let σ be a positive semi-definite operator. Let α > 1and R ≥ 0. Then, for Λ a measurement operator satisfying

Tr[Λσ] ≤ 2−R, (4.10.42)

the following bound holds

Tr[(I −Λ) ρ] ≥ 1− 2−(α−1

α )(R−Dα(ρ‖σ)). (4.10.43)

REMARK: Note that the second bound is nontrivial only in the case that R > D(ρ‖σ)because Dα(ρ‖σ) > D(ρ‖σ) for α > 1.

PROOF: The proof is similar to the proof of Proposition 4.68. Let p := Tr[Λρ]and q := Tr[Λσ]. By applying the data-processing inequality for the sand-wiched Rényi relative entropy along with the measurement channel from

379

Page 391: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

(4.9.61), we conclude that

Dα(ρ‖σ) ≥ Dα(p, 1− p ‖ q, Tr[σ]− q) (4.10.44)

=1

α− 1log2

[pαq1−α + (1− p)α (Tr[σ]− q)1−α

](4.10.45)

≥ 1α− 1

log2

[pαq1−α

](4.10.46)

α− 1log2 p− log2 q (4.10.47)

≥ α

α− 1log2 p + R (4.10.48)

α− 1log2 Tr[Λρ] + R, (4.10.49)

which implies thatTr[Λρ] ≤ 2−(

α−1α )(R−Dα(ρ‖σ)) (4.10.50)

Rewriting this gives the bound in the statement of the proposition.

Proposition 4.77

Let ρ be a state, and let σ be a positive semi-definite operator. Let α ∈(0, 1) and R ≥ 0. Then, there exists a measurement operator Λ such that

Tr[Λσ] ≤ 2−R, (4.10.51)

Tr[(I −Λ) ρ] ≤ 2−(1−α

α )(Dα(ρ‖σ)−R). (4.10.52)

REMARK: Note that the second bound above is nontrivial only in the case that R <D(ρ‖σ) because Dα(ρ‖σ) < D(ρ‖σ) for α ∈ (0, 1).

PROOF: The proof is similar to the proof of Proposition 4.69. Employing thesame measurement operator Λ∗ therein, we conclude from the same reason-ing in that proof that

Tr[(I −Λ∗) ρ] ≤(

1− pp

)1−α

Tr[ρασ1−α], (4.10.53)

Tr[Λ∗σ] ≤(

p1− p

Tr[ρασ1−α]. (4.10.54)

We then pick p ∈ (0, 1) such that the following equation is satisfied

2−R =

(p

1− p

Tr[ρασ1−α] (4.10.55)

380

Page 392: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

=

(p

1− p

2(α−1)Dα(ρ‖σ) (4.10.56)

⇐⇒ 2−R2(1−α)Dα(ρ‖σ) =(

p1− p

(4.10.57)

⇐⇒ 2R/α2(α−1)Dα(ρ‖σ)/α =

(1− p

p

). (4.10.58)

We see that picking p in such a way is always possible because one more stepof the development above leads to the conclusion that

p =1

1 + 2R/α2(α−1)Dα(ρ‖σ)/α∈ (0, 1) . (4.10.59)

Substituting into (4.10.53), we find that

Tr[(I −Λ∗) ρ] ≤(

2R/α2(α−1)Dα(ρ‖σ)/α)1−α

2(α−1)Dα(ρ‖σ) (4.10.60)

= 2(1−α

α )R2−(1−α)2

α Dα(ρ‖σ)2(α−1)Dα(ρ‖σ) (4.10.61)

= 2(1−α

α )R2(α−1)

α Dα(ρ‖σ) (4.10.62)

= 2−(1−α

α )(Dα(ρ‖σ)−R), (4.10.63)

concluding the proof.

The inequalities from Propositions 4.76 and 4.77 lead to the followingbounds on the type-I error probability ε for quantum hypothesis testing, whenthe type-II error probability has a fixed rate R:

1− 2−n( α−1α )(R−Dα(ρ‖σ)) ≤ ε ≤ 2−n( 1−α

α )(Dα(ρ‖σ)−R). (4.10.64)

The left inequality holds for all α > 1, while the right inequality holds for allα ∈ (0, 1). Let us now examine the behavior of ε above and below the optimalrate D(ρ‖σ).

1. Consider a sequence (n, 2−nR, εn)n∈N of hypothesis testing protocolsfor ρ and σ such that the rate of each protocol has some arbitrary (butfixed) value R < D(ρ‖σ). By Proposition 4.77, the sequence can be cho-sen such that each element satisfies the right-most inequality in (4.10.64),so that

εn ≤ 2−n( 1−αα )(Dα(ρ‖σ)−R). (4.10.65)

381

Page 393: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

n→ ∞

Type-IIError

Exponent,R

Type-IError

Probability,εn

D(ρ‖σ)0

1

FIGURE 4.3: The type-I error probability εn as a function of the rate R,i.e., the type-II error exponent, as the number n of copies of the systemapproaches infinity for the task of asymmetric hypothesis testing for thestates ρ and σ. The optimal rate of D(ρ‖σ) for this task, as established bythe quantum Stein’s lemma in Theorem 4.75, has what is called the strongconverse property, which means that it is the optimal strong converse rate.Therefore, for every rate above it, the type-I error probability converges toone in the limit of arbitrarily many copies of the system.

Since the Petz–Rényi relative entropy Dα(ρ‖σ) is monotonically increas-ing in α, as established in Proposition 4.21, there exists an α∗ < 1 such thatDα∗(ρ‖σ) lies in between D(ρ‖σ) and R; i.e., we have that R < Dα∗(ρ‖σ).We thus obtain

εn ≤ 2−n(

1−α∗α∗)(Dα∗ (ρ‖σ)−R). (4.10.66)

Since R < Dα∗(ρ‖σ), taking the limit n → ∞ on both sides of this in-equality gives us limn→∞ εn ≤ 0. However, εn ≥ 0 for all n because εn isby definition a probability. So we find that

εn → 0 as n→ ∞ if R < D(ρ‖σ). (4.10.67)

2. Now consider a sequence (n, 2−nR, εn)n∈N of hypothesis testing pro-tocols for ρ and σ such that the rate of each protocol has some arbitrary(but fixed) value R > D(ρ‖σ). In other words, suppose that we wouldlike to perform hypothesis testing at a rate above the optimal rate. Eachelement of the sequence satisfies the left-most inequality in (4.10.64), sothat

εn ≥ 1− 2−n( α−1α )(R−Dα(ρ‖σ)). (4.10.68)

Then, recalling from Proposition 4.29 that the sandwiched Rényi relativeentropy Dα(ρ‖σ) is monotonically increasing in α, there exists an α∗ > 1such that Dα∗(ρ‖σ) lies in between D(ρ‖σ) and R; i.e., we have that R >

382

Page 394: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Dα∗(ρ‖σ). We thus obtain

εn ≥ 1− 2−n(

α∗−1α∗)(R−Dα∗ (ρ‖σ)). (4.10.69)

Since R > Dα∗(ρ‖σ), taking the limit n → ∞ on both sides of this in-equality gives us limn→∞ εn ≥ 1. However, εn ≤ 1 for all n because εn isby definition a probability. So we conclude that

εn → 1 as n→ ∞ if R > D(ρ‖σ). (4.10.70)

So the type-I error probability grows exponentially to one in the limitn→ ∞ for every rate above the optimal rate.

The optimal rate D(ρ‖σ) is therefore a sharp dividing point below whichthe type-I error probability εn exponentially drops to zero as n → ∞ andabove which it exponentially increases to one as n → ∞. This behavior isillustrated in Figure 4.3.

4.11 Information Measures for Quantum Channels

We conclude this chapter with a discussion of information measures for quan-tum channels.

Given an information measure defined for quantum states, we define acorresponding information measure for channels by sending one share of abipartite state through a given channel and evaluating the information mea-sure on the corresponding output state. Specifically, for every generalizeddivergence D : D(H) × L+(H) → R ∪ +∞, which was given in Defini-tion 4.13, we define the generalized channel divergence as follows:

Definition 4.78 Generalized Channel Divergence

Given a generalized divergence D : D(H) × L+(H) → R ∪ +∞, aquantum channel NA→B, and a completely positive map MA→B, we de-fine the generalized channel divergence of N and M as

D(N‖M) := supρRA

D(NA→B(ρRA)‖MA→B(ρRA)) , (4.11.1)

where the supremum is over all mixed states ρRA with an arbitrary ref-erence system R.

383

Page 395: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.79

Let D, NA→B, and MA→B be as given in Definition 4.78. It suffices tooptimize the generalized channel divergence D(N‖M) with respect topure states ψRA with the reference system R isomorphic to the inputsystem A:

D(N‖M) = supψRA

D(NA→B(ψRA)‖MA→B(ψRA)) (4.11.2)

= supρA

D(√

ρAΓNAB√

ρA‖√

ρAΓMAB√

ρA). (4.11.3)

In the second line, ΓNAB and ΓM

AB denote the Choi operators of NA→Band MA→B, respectively, and the optimization is with respect to everydensity operator ρA.

PROOF: Observe that if we take a purification |φ〉R′RA of the state ρRA in theoptimization in (4.11.1), then we find that

D(NA→B(ρRA)‖MA→B(ρRA)) (4.11.4)= D(NA→B(TrR′ [φR′RA])‖MA→B(TrR′ [φR′RA])) (4.11.5)= D(TrR′ [NA→B(φR′RA)] ‖TrR′ [MA→B(φR′RA)]) (4.11.6)≤ D(NA→B(φR′RA)‖MA→B(φR′RA)) , (4.11.7)

where we have used the data-processing inequality for the generalized diver-gence in the last line. This means that for every state ρRA, the generalizeddivergence in (4.11.4) is never larger than the corresponding generalized di-vergence evaluated on a purification of ρRA. This means that it suffices in(4.11.1) to optimize over only pure states. Furthermore, by the Schmidt de-composition theorem (Theorem 3.2), the purifying space HR′R need not havedimension exceeding that of the dimension of HA. Therefore, the generalizedchannel divergence can be written as in (4.11.2).

To see (4.11.3), we first use the fact in (2.2.29), which implies that for everypure state ψRA, there exists a state ρR and a unitary UR such that

|ψ〉RA = (UR√

ρR ⊗ 1A)|Γ〉RA. (4.11.8)

Thus, it follows that

NA→B(ψRA) = NA→B((UR√

ρR ⊗ 1A)ΓRA(√

ρRU†R ⊗ 1A)) (4.11.9)

384

Page 396: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= (UR√

ρR ⊗ 1B)NA→B(ΓRA)(√

ρRU†R ⊗ 1B) (4.11.10)

= (UR√

ρR ⊗ 1B)ΓNRB(√

ρRU†R ⊗ 1B), (4.11.11)

where we employed the Choi representation ΓNRB of the channel N. Similarly,

MA→B(ψRA) = UR√

ρRΓMRB√

ρRU†R. By employing the unitary invariance of a

generalized divergence, we conclude that

D(NA→B(ψRA)‖MA→B(ψRA)) = D(√

ρAΓNAB√

ρA‖√

ρAΓMAB√

ρA). (4.11.12)

Then it suffices to optimize over states ρA, so that (4.11.3) holds.

Proposition 4.80

Let D, NA→B, and MA→B be as given in Definition 4.78, and supposethat D obeys the direct-sum property in (4.3.7). Then the function

f (ρA,MA→B) := D(√

ρAΓNAB√

ρA‖√

ρAΓMAB√

ρA) (4.11.13)

is concave in the first argument and convex in the second. If M is aconvex set of completely positive maps, then

infM∈M

supρA

D(√

ρAΓNAB√

ρA‖√

ρAΓMAB√

ρA)

= supρA

infM∈M

D(√

ρAΓNAB√

ρA‖√

ρAΓMAB√

ρA). (4.11.14)

Equivalently,

infM∈M

D(N‖M) = supψRA

infM∈M

D(NA→B(ψRA)‖MA→B(ψRA)), (4.11.15)

where ψRA is a pure state with system R isomorphic to system A.

PROOF: To see the concavity, let ψ0RA and ψ1

RA be pure states with reducedstates ψ0

A and ψ1A. Let ψλ

SRA denote the following pure state:

|ψλ〉SRA :=√

1− λ|0〉S|ψ0〉RA +√

λ|1〉S|ψ1〉RA. (4.11.16)

Observe thatψλ

A = (1− λ)ψ0A + λψ1

A, (4.11.17)

385

Page 397: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

so that the reduced state ψλA is a convex combination of the reduced states ψ0

Aand ψ1

A. Define

ψλSRA := (1− λ) |0〉〈0|S ⊗ ψ0

RA + λ|1〉〈1|S ⊗ ψ1RA, (4.11.18)

which is the state resulting from the action of a completely dephasing qubitchannel on system S. Let φλ

RA be an arbitrary pure state with reduced stateequal to ψλ

A. Then we find that

D(NA→B(φλRA)‖MA→B(φ

λRA))

= D(NA→B(ψλSRA)‖MA→B(ψ

λSRA)) (4.11.19)

≥ D(NA→B(ψλSRA)‖MA→B(ψ

λSRA)) (4.11.20)

= λD(NA→B(ψ0RA)‖MA→B(ψ

0RA))

+ (1− λ) D(NA→B(ψ1RA)‖MA→B(ψ

1RA)). (4.11.21)

The first equality follows because every two purifications of the same stateare related by an isometric channel acting on the reference system, as wellas the isometric invariance of the generalized divergence. The inequality fol-lows from quantum data processing, by the action of a completely dephasingqubit channel on the system S. The final equality follows from the direct-sumproperty and because the generalized divergence is invariant under tensoringin the same state (Proposition 4.14). Finally, we have the following equalities,by employing the isometric invariance of the generalized divergence (Propo-sition 4.14), the equality in (4.11.12), and the definition in (4.11.13):

D(NA→B(φλRA)‖MA→B(φ

λRA)) = f (ψλ

A,MA→B), (4.11.22)

D(NA→B(ψ0RA)‖MA→B(ψ

0RA)) = f (ψ0

A,MA→B), (4.11.23)

D(NA→B(ψ1RA)‖MA→B(ψ

1RA)) = f (ψ1

A,MA→B). (4.11.24)

To see the convexity in M, consider that for every λ ∈ [0, 1] and completelypositive maps M1,M2 ∈M, the joint convexity of the generalized divergence(Proposition 4.15) gives that

f (ρA, λM1 + (1− λ)M2)

= D(√

ρAΓNAB√

ρA‖√

ρAΓλM1+(1−λ)M2AB

√ρA) (4.11.25)

= D(λ√

ρAΓNAB√

ρA + (1− λ)√

ρAΓNAB√

ρA

‖λ√ρAΓM1AB√

ρA + (1− λ)√

ρAΓM2AB√

ρA) (4.11.26)

386

Page 398: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

≤ λD(√

ρAΓNAB√

ρA‖√

ρAΓM1AB√

ρA)

+ (1− λ)D(√

ρAΓNAB√

ρA‖√

ρAΓM2AB√

ρA) (4.11.27)= λ f (ρA,M1) + (1− λ) f (ρA,M2). (4.11.28)

The equality in (4.11.14) follows from what was just shown and Sion’sminimax theorem (Theorem 2.18). The equality in (4.11.15) follows from (4.11.12)and Proposition 4.79.

The generalized channel divergence takes a simple form if the channel Nand the completely positive map M both happen to be jointly covariant withrespect to a group, as shown in the following proposition.

Proposition 4.81 Generalized Divergence for Jointly CovariantChannels

Let D : D(H)×L+(H)→ R∪ +∞ be a generalized divergence and Ga finite group with unitary representations Ug

Ag∈G and VgBg∈G. Let

NA→B be a quantum channel and MA→B a completely positive map thatare both covariant with respect to G, i.e., (Definition 3.48)

N(UgρU†g) = VgN(ρ)V†

g ,

M(UgρU†g) = VgM(ρ)V†

g ,(4.11.29)

for every g ∈ G and state ρ. Then, for every state ψA′A, with the dimen-sion of A′ equal to the dimension of A,

D(NA→B(ψA′A)‖MA→B(ψA′A))

≤ D(NA→B(φρRA)‖MA→B(φ

ρRA)), (4.11.30)

where ρA := ψA = TrA′ [ψA′A],

ρA := TG(ρA) :=1|G| ∑

g∈GUg

AρAUg†A , (4.11.31)

and φρRA is a purification of ρA. Consequently, the generalized channel

divergence D(N‖M) is given by optimizing over pure states ψA′A suchthat the reduced state ψA is invariant under the channel TG; i.e.,

D(N‖M)

387

Page 399: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= supψA′AD(NA→B(ψA′A)‖MA→B(ψA′A)) : ψA = TG(ψA). (4.11.32)

In particular, if the representation UgAg∈G is irreducible, then the opti-

mal state in (4.11.32) is the maximally entangled state ΦA′A, so that

D(N‖M) = D(ρNAB‖ρMAB), (4.11.33)

where ρNAB and ρMAB are Choi states of N and M, respectively.

PROOF: The inequality

D(N‖M) ≥ supψA′AD(NA→B(ψA′A)‖MA→B(ψA′A)) : ψA = TG(ψA) (4.11.34)

is immediate from the fact that the set ψA′A : ψA = TG(ψA) of pure statesis a subset of all pure states. The remainder of this proof is devoted to thereverse inequality.

Let ψA′A be an arbitrary state, let ρA = ψA, and let ρA = TG(ρA). Fur-thermore, let φ

ρRA be a purification of ρA. Let us also consider the following

purification of ρA:

|ψρ〉R′A′A :=1√|G| ∑

g∈G|g〉R′ ⊗ (1A′ ⊗Ug

A)|ψ〉A′A, (4.11.35)

where |g〉g∈G is an orthonormal basis for HR′ indexed by the elements ofG. Since all purifications of a state can be mapped to each other by isometriesacting on the purifying systems, there exists an isometry WR→R′A′ such that|ψρ〉R′A′A = WR→R′A′ |φρ〉RA. Using this, we find that

D(NA→B(ψρR′A′A)‖MA→B(ψ

ρR′A′A)) (4.11.36)

= D(NA→B(WR→R′A′(φρRA))‖MA→B(WR→R′A′(φ

ρRA))) (4.11.37)

= D(WR→R′A′(NA→B(φρRA))‖WR→R′A′(MA→B(φ

ρRA))) (4.11.38)

= D(NA→B(φρRA)‖MA→B(φ

ρRA)), (4.11.39)

where to obtain the last equality we used the fact that every generalized di-vergence is isometrically invariant (recall Proposition 4.14). Now, let us applythe dephasing map X 7→ ∑g∈G |g〉〈g|X|g〉〈g| to the R′ system. Since this map

388

Page 400: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

is a channel, by the data-processing inequality for the generalized divergence,we obtain

D(NA→B(ψ

ρR′A′A)‖MA→B(ψ

ρR′A′A)

)

≥ D

(1|G| ∑

g∈G|g〉〈g|R′ ⊗ (NA→B Ug

A)(ψA′A)

∥∥∥∥∥

1|G| ∑

g∈G|g〉〈g|R′ ⊗ (MA→B Ug

A)(ψA′A)

). (4.11.40)

Then, because generalized divergences are invariant under unitaries, we canapply the unitary channel given by the unitary ∑g∈G |g〉〈g| ⊗Vg†

B at the outputof N and M to obtain

D(NA→B(ψ

ρR′A′A)‖MA→B(ψ

ρR′A′A)

)

≥ D

(1|G| ∑

g∈G|g〉〈g|R′ ⊗ ((V

gB)

† NA→B UgA)(ψA′A)

∥∥∥∥∥

1|G| ∑

g∈G|g〉〈g|R′ ⊗ ((V

gB)

† MA→B UgA)(ψA′A)

). (4.11.41)

Finally, since the group-covariance of N and M with respect to the represen-tations Ug

Ag∈G and VgBg∈G implies that

(VgB)

† N UgA = N, V

g†B M U

gA = M (4.11.42)

for all g ∈ G, and since from Proposition 4.14 generalized divergences areinvariant under tensoring with the same state in both arguments, we obtain

D(NA→B(φρRA)‖MA→B(φ

ρRA)) (4.11.43)

= D(NA→B(ψ

ρR′A′A)‖MA→B(ψ

ρR′A′A)

)(4.11.44)

≥ D

(1|G| ∑

g∈G|g〉〈g|R′ ⊗NA→B(ψA′A)

∥∥∥∥∥1|G| ∑

g∈ G|g〉〈g|R′ ⊗MA→B(ψA′A)

)

(4.11.45)= D(NA→B(ψA′A)‖(MA→B(ψA′A)) , (4.11.46)

which is precisely (4.11.30). By definition, the pure state φρRA is such that its

reduced state on A is invariant under the channel TG. Therefore, optimizing

389

Page 401: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

over all such pure states, we obtain

D(NA→B(ψA′A)‖MA→B(ψA′A))

≤ supφRA

D(NA→B(φRA)‖MA→B(φRA)) : φA = TG(φA). (4.11.47)

Since this inequality holds for all pure states ψA′A, we obtain

D(N‖M) = supψA′A

D(NA→B(ψA′A)‖MA→B(ψA′A)) (4.11.48)

≤ supψA′AD(NA→B(ψA′A)‖MA→B(ψA′A)) : ψA = TG(ψA). (4.11.49)

Combining this inequality with (4.11.34) gives us (4.11.32).

Finally, if the representation UgAg∈G acting on the input space of the

channel N and the completely positive map M is irreducible, then for everystate ψA′A such that ρA = ψA, it holds that ρA = 1A/dA. Then, since themaximally entangled state is a purification of the maximally mixed state, welet φ

ρRA = ΦRA, which implies by (4.11.39) that

D(NA→B(ψρR′A′A)‖MA→B(ψ

ρR′A′A)) (4.11.50)

= D(NA→B(ΦRA)‖MA→B(ΦRA)) (4.11.51)

= D(ρNRB‖ρMRB). (4.11.52)

Then, by (4.11.46), we have that

D(ρNRB‖ρMRB) ≥ D(NA→B(ψA′A)‖MA→B(ψA′A)) (4.11.53)

for all states ψA′A, so that D(ρNRB‖ρMRB) ≥ D(N‖M). Since the reverse inequal-ity trivially holds, we obtain (4.11.33).

We can also define information measures for channels based on informa-tion measures for states derived from generalized divergences. Using thegeneralized coherent information and the generalized mutual informationdefined in (4.3.13) and (4.3.14), respectively, we make the following defini-tions.

390

Page 402: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Definition 4.82 Generalized Information Measures for QuantumChannels

Let D be a generalized divergence, as defined in Definition 4.13, and letNA→B be a quantum channel.

1. The generalized mutual information of N, denoted by I(N), is definedas

I(N) := supψRA

I(R; B)ω

= supψRA

infσB

D(NA→B(ψRA)‖ψR ⊗ σB),(4.11.54)

where ωRB = NA→B(ψRA), ψRA is a pure state with the dimensionof R the same as the dimension of A, and the generalized mutualinformation I(A; B)ρ of a bipartite state ρAB has been defined in(4.3.14).

2. The generalized coherent information of N, denoted by Ic(N), is definedas

Ic(N) := supψRA

I(R〉B)ω

= supψRA

infσB

D(NA→B(ψRA)‖1R ⊗ σB),(4.11.55)

where ωRB = NA→B(ψRA), ψRA is a pure state with the dimensionof R the same as the dimension of A, and the generalized coher-ent information Ic(A〉B)ρ of a bipartite state ρAB has been definedin (4.3.13).

3. The generalized Holevo information of N, denoted by χ(N), is definedas

χ(N) := supρXA

I(X; B)ω

= supρXA

infσB

D(NA→B(ρXA)‖ρX ⊗ σB),(4.11.56)

where ωXA = NA→B(ρXA). Here, ρXA = ∑x∈X p(x)|x〉〈x|X ⊗ ρxA

is a classical–quantum state, with X a finite alphabet with corre-sponding |X|-dimensional system X, ρx

Ax∈X is a set of states, andp : X → [0, 1] is a probability distribution on X. The supremum isadditionally over the classical system X; i.e., no restriction is madeon the size of X.

391

Page 403: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

The generalized mutual information I(N) and the generalized coherentinformation Ic(N) both involve an optimization over pure states. It is straigh-tforward to show that it suffices to optimize over pure states for both quanti-ties. The argument is similar to that in (4.11.4)–(4.11.7) above.

For the generalized mutual information of a covariant channel, we canprove a result that is analogous to Proposition 4.81.

Proposition 4.83 Generalized Mutual Information for CovariantChannels

Let NA→B be a G-covariant quantum channel for a finite group G. Then,for all pure states ψRA, with the dimension of R equal to the dimensionof A, we have that

I(R; B)ω ≤ I(R; B)ω, (4.11.57)

where ωRB := NA→B(ψRA), ωRB := NA→B(φρRA),

ρA =1|G| ∑

g∈GUg

AρAUg†A =: TG(ρA), (4.11.58)

ρA = ψA = TrR[ψRA], and φρRA is a purification of ρA. Consequently,

I(N) = supφRA

I(R; B)ω : φA = TG(φA), ωRB = NA→B(φRA). (4.11.59)

In other words, in order to calculate I(N), it suffices to optimize overpure states ψRA such that the reduced state ψA is invariant under thechannel TG defined in (4.11.58).

If the representation UgAg∈G is irreducible, then I(N) is equal to the

generalized mutual information of the Choi state of the channel, i.e.,

I(N) = I(R; B)ρN . (4.11.60)

PROOF: The proof is similar to the proof of Proposition 4.81 and uses somesteps therein.

The inequality

I(N) ≥ supφRA

I(R; B)ω : φA = TG(φA), ωRB = NA→B(φRA) (4.11.61)

392

Page 404: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

holds simply by restricting the optimization in the definition of I(N) to purestates φRA whose reduced states φA are invariant under the channel TG. Theremainder of the proof is devoted to showing that the reverse inequality holdsas well.

Let ψRA be an arbitrary pure state, let ρA = ψA, and let ρA = TG(ρA).Furthermore, let φ

ρRA be a purification of ρA. Let MA→B be the replacement

channel that traces out the A system and replaces it with a state σB. Nowfollowing the steps in (4.11.35)–(4.11.41), and incorporating the covariance ofthe channel N, we conclude that

D(NA→B(ϕρRA)‖MA→B(ϕ

ρRA))

= D(NA→B(ϕρRA)‖ϕ

ρR ⊗ σB) (4.11.62)

≥ D

(1|G| ∑

g∈G|g〉〈g|R′ ⊗NA→B(ψRA)

∥∥∥∥∥1|G| ∑

g∈G|g〉〈g|R′ ⊗ ψR ⊗Vg†

B σBVgB

)

(4.11.63)≥ D(NA→B(ψRA)‖ψR ⊗ τB), (4.11.64)

where to obtain the last line we applied the data-processing inequality underthe partial trace channel TrR′ and we let τB := 1

|G| ∑g∈G Vg†B σBVg

B . By takingthe infimum over all states τB on the right-hand side of the inequality above,we find that

D(NA→B(φρRA)‖φ

ρR ⊗ σB) ≥ D(NA→B(ψRA)‖ψR ⊗ τB) (4.11.65)

≥ infτB

D(NA→B(ψRA)‖ψR ⊗ τB) (4.11.66)

= I(R; B)ω, (4.11.67)

where ωRB = NA→B(ψRA). The inequality above holds for all states ψRA andall states σB. Therefore, optimizing over all states σB on the left-hand side ofthe above inequality leads to

infσB

D(NA→B(φρRA)‖φ

ρR ⊗ σB) = I(R; B)ω ≥ I(R; B)ω, (4.11.68)

where ωRB = NA→B(φρRA). Thus, we conclude (4.11.57).

Next, by construction, the state φρRA is such that its reduced state on A is

invariant under the channel TG. Optimizing over all such states leads to

supφRA

I(R; B)ω : φA = TG(φA), ωRB = NA→B(φRA) ≥ I(R; B)ω. (4.11.69)

393

Page 405: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Since this inequality holds for all pure states ψRA, we finally obtain

supφRA

I(R; B)ω : φA = TG(φA), ωRB = NA→B(φRA)

≥ supψRA

I(R; B)ω = I(N), (4.11.70)

as required.

To prove (4.11.60), note that if UgAg∈G is irreducible, then for every state

ψRA, the state ρA = ψA satisfies ρA = TG(ρA) = 1AdA

. Then, since the max-imally entangled state is a purification of the maximally mixed state, we letφ

ρRA = ΦRA, which implies via (4.11.68) that

infσB

D(NA→B(ΦRA)‖πR ⊗ σB) = I(R; B)ρN ≥ I(R; B)ω, (4.11.71)

where ωRB = NA→B(ψRA). Since the pure state ψRA is arbitrary, we have that

I(R; B)ρN ≥ supψRA

I(R; B)ω = I(N). (4.11.72)

The reverse inequality holds simply by restricting the optimization in the def-inition of I(N) to the maximally entangled state ΦRA. We thus have (4.11.60),as required.

The following proposition is helpful in simplifying the computation of thegeneralized Holevo information χ(N) of a quantum channel N:

Proposition 4.84

Let N be a quantum channel. To compute its generalized Holevo infor-mation χ(N), as defined in (4.11.56), it suffices to optimize over ensem-bles consisting of pure states. If the underlying generalized divergenceis continuous, then no more than d2 pure states are needed for the opti-mization, where d is the dimension of the input space of N.

PROOF: Let ρXA be a classical–quantum state of the form

ρXA = ∑x∈X

p(x)|x〉〈x|X ⊗ ρxA, (4.11.73)

where X is a finite alphabet with associated |X|-dimensional system X, p :X→ [0, 1] is a probability distribution on X, and ρx

Ax∈X is a set of states.

394

Page 406: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

First, by simply restricting the optimization in the definition of the Holevoinformation to ensembles containing pure states only, we obtain

χ(N) = supρXA

I(X; B)NA→B(ρXA)≥ sup

τZA

I(Z; B)NA→B(τZA), (4.11.74)

where τZA is a classical–quantum state consisting only of pure states, i.e.,

τZA = ∑z∈Z

p(x)|z〉〈z|Z ⊗ |ψz〉〈ψz|A. (4.11.75)

Now, let each state ρxA in the classical–quantum state ρXA have a spectral

decomposition of the form

ρxA =

rx

∑y=1

λxy|φx

y〉〈φxy |, (4.11.76)

where rx = rank(ρxA). So ρXA can be written as

ρXA = ∑x∈X

rx

∑y=1

p(x)λxy|x〉〈x|X ⊗ |φx

y〉〈φxy |A. (4.11.77)

Now, let us define the state

ωXYA = ∑x∈X

rx

∑y=1

p(x)λxy|x〉〈x|X ⊗ |y〉〈y|Y ⊗ |φx

y〉〈φxy |A. (4.11.78)

Then, we have thatρXA = TrY[ωXYA]. (4.11.79)

Therefore, by the data-processing inequality for the generalized divergence,we find that

I(X; B)N(ρ) = infσB

D(NA→B(ρXA)‖ρX ⊗ σB) (4.11.80)

= infσB

D(TrY[NA→B(ωXYA)]‖TrY[ωXY]⊗ σB) (4.11.81)

≤ infσB

D(NA→B(ωXYA)‖ωXY ⊗ σB) (4.11.82)

= I(XY; B)N(ω). (4.11.83)

Since ωXYA is a classical–quantum state with pure states, it holds that

I(XY; B)N(ω) ≤ supτZA

I(Z; B)N(τ). (4.11.84)

395

Page 407: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Therefore,

χ(N) = supρXA

I(X; B)NA→B(ρXA)≤ sup

τZA

I(Z; B)NA→B(τZA)(4.11.85)

which means thatχ(N) = sup

τZA

I(Z; B)NA→B(τZA), (4.11.86)

as required.

When the underlying generalized divergence is continuous, the fact thatthe alphabet X of the classical–quantum states τZA need not exceed d2 ele-ments is due to the Fenchel–Eggleston–Carathéodory Theorem (Theorem 2.17)and the fact that dimension of the space of density operators on a d-dimensionalspace is d2.

The information measures for channels on which we primarily focus inthis book are those based on the following generalized divergences: the quan-tum relative entropy, the Petz–Rényi relative entropy, the sandwiched Rényirelative entropy, and the hypothesis testing relative entropy. Specifically, giv-en a channel NA→B, we are interested in the following mutual informationquantities. In each case, ψRA is a pure state with the dimension of the sys-tem R the same as that of A, the state ωRB = NA→B(ψRA), and σB is a state.

1. ε-hypothesis testing mutual information of N, defined for ε ∈ [0, 1] as

IεH(N) := sup

ψRA

IεH(R; B)ω, (4.11.87)

whereIεH(A; B)ρ := inf

σBDε

H(ρAB‖ρA ⊗ σB) (4.11.88)

is the ε-hypothesis testing mutual information of the bipartite state ρAB.

2. Petz–Rényi mutual information of N:

Iα(N) := supψRA

Iα(R; B)ω ∀ α ∈ [0, 1) ∪ (1, 2], (4.11.89)

whereIα(A; B)ρ := inf

σBDα(ρAB‖ρA ⊗ σB) (4.11.90)

is the Petz–Rényi mutual information of the bipartite state ρAB.

396

Page 408: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

3. Sandwiched Rényi mutual information of N:

Iα(N) := supψRA

Iα(R; B)ω ∀ α ∈ [1/2, 1) ∪ (1, ∞), (4.11.91)

whereIα(A; B)ρ := inf

σBDα(ρAB‖ρA ⊗ σB) (4.11.92)

is the sandwiched Rényi mutual information of the bipartite state ρAB.

For each of these quantities, we are also interested in the special case ofclassical–quantum states. If X is a finite alphabet with corresponding |X|-dimensional system X, ρx

Ax∈X is a set of states, p : X → [0, 1] a prob-ability distribution on X, and ρXA = ∑x∈X p(x)|x〉〈x|X ⊗ ρx

A, then for ev-ery channel N we define the following quantities. In each case, we defineωXB := NA→B(ρXA).

1. ε-hypothesis testing Holevo information of N, defined for all ε ∈ [0, 1] as

χεH(N) := sup

ρXA

IεH(X; B)ω. (4.11.93)

2. Petz–Rényi Holevo information of N:

χα(N) := supρXA

Iα(X; B)ω ∀ α ∈ [0, 1) ∪ (1, 2]. (4.11.94)

3. Sandwiched Rényi Holevo information of N:

χα(N) := supρXA

Iα(X; B)ω ∀ α ∈ [1/2, 1) ∪ (1, ∞). (4.11.95)

We are also interested in the corresponding coherent information quanti-ties. In each case, ψRA is a pure state with the dimension of the system R thesame as that of A, the state ωRB = NA→B(ψRA), and σB is a state. The coher-ent information quantities are defined as follows for every quantum channelNA→B:

1. ε-hypothesis testing coherent information of N, defined for all ε ∈ [0, 1] as

Ic,εH (N) := sup

ψRA

IεH(R〉B)ω, (4.11.96)

whereIεH(A〉B)ρ := inf

σBDε

H(ρAB‖1A ⊗ σB) (4.11.97)

is the ε-hypothesis testing coherent information of the bipartite state ρAB.

397

Page 409: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

2. Petz–Rényi coherent information of N:

Icα(N) := sup

ψRA

Iα(R〉B)ω ∀ α ∈ [0, 1) ∪ (1, 2], (4.11.98)

whereIα(A〉B)ρ := inf

σBDα(ρAB‖1A ⊗ σB) (4.11.99)

is the Petz–Rényi coherent information of the bipartite state ρAB.

3. Sandwiched Rényi coherent information of N:

Icα(N) := sup

ψRA

Iα(R〉B)ω ∀ α ∈ [1/2, 1) ∪ (1, ∞), (4.11.100)

whereIα(A〉B)ρ := inf

σBDα(ρAB‖1A ⊗ σB) (4.11.101)

is the sandwiched Rényi coherent information of the bipartite state ρAB.

For all of the quantities defined above, we define the corresponding quan-tities based on the quantum relative entropy by taking the limit α → 1. Thekey such quantities of interest in this book are the following:

1. Mutual information of N, denoted by I(N) and defined as

I(N) := supψRA

I(R; B)ω, (4.11.102)

where ωRB = NA→B(ψRA), and we recall from (4.2.95) that the mutualinformation I(A; B)ρ of a bipartite state ρAB is given by

I(A; B)ρ = H(A)ρ + H(B)ρ − H(AB)ρ (4.11.103)= D(ρAB‖ρA ⊗ ρB) (4.11.104)= inf

σBD(ρAB‖ρA ⊗ σB), (4.11.105)

where the optimization in the last line is over states σB.

2. Holevo information of N, denoted by χ(N) and defined as

χ(N) := supρXA

I(X; B)ω, (4.11.106)

where ωRB = NA→B(ρXA), and the supremum is over all classical-quan-tum states of the form ρXA = ∑x∈X p(x)|x〉〈x|X ⊗ ρx

A, with X a finite al-phabet with associated |X|-dimensional system X, ρx

Ax∈X a set of states,and p : X→ [0, 1] a probability distribution on X.

398

Page 410: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

3. Coherent information of N, denoted by Ic(N) and defined as

Ic(N) := supψRA

I(R〉B)ω, (4.11.107)

where ωRB = NA→B(ψRA), and we recall from (4.2.90) that the coherentinformation I(A〉B)ρ of a bipartite state ρAB is given by

I(A〉B)ρ = H(B)ρ − H(AB)ρ (4.11.108)= D(ρAB‖1A ⊗ ρB) (4.11.109)= inf

σBD(ρAB‖1A ⊗ σB), (4.11.110)

where the optimization in the last line is over states σB.

Using (4.11.108), we can write the coherent information of the channel Nas

Ic(N) = supψRA

H(B)ω − H(RB)ω. (4.11.111)

Given a Stinespring representation of N, so that N(ρ) = TrE[VρV†], whereV : HA → HB⊗HE is an isometry such that dE ≥ rank(ΓN), observe that

H(RB)ω = H(E)τ, (4.11.112)

where τE = Nc(ψA). This is due to the fact that the state φRBE := VψRAV†

is pure, meaning that ωRB and τE = TrRB[VψRAV†] = Nc(ψA) have thesame spectrum. Furthermore, the state on which H(B)ω is evaluated isequal to N(ψA). Therefore, we have that

Ic(N) = supρH(N(ρ))− H(Nc(ρ)), (4.11.113)

where the optimization is over all states ρ.

4.11.1 Simplified Formulas for Rényi Information Measures

In this section, we provide some simplified formulas for the Petz–Rényi in-formation quantities for general bipartite states and for all Rényi informationquantities for pure bipartite states.

399

Page 411: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.85 Quantum Sibson Identities

Let ρAB be a bipartite state. Then the Petz–Rényi mutual and coherentinformations simplify as follows for all α ∈ (0, 1) ∪ (1, ∞):

Iα(A; B)ρ =α

α− 1log2 Tr

[(TrA[ρ

αABρ1−α

A ]) 1

α

], (4.11.114)

Iα(A〉B)ρ =α

α− 1log2 Tr

[(TrA[ρ

αAB])

]. (4.11.115)

PROOF: We show both identities with a unified approach. Let τA be a positivesemi-definite operator, let σB be a state, and let ωB(α) denote the followingstate:

ωB(α) :=

(TrA[ρ

αABτ1−α

A ]) 1

α

Tr[(

TrA[ραABτ1−α

A ]) 1

α

] , (4.11.116)

so that(

TrA[ραABτ1−α

A ]) 1

α= Tr

[(TrA[ρ

αABτ1−α

A ]) 1

α

]·ωB(α) (4.11.117)

We first prove that

Dα(ρAB‖τA ⊗ σB) = Dα(ρAB‖τA ⊗ωB(α)) + Dα(ωB(α)‖σB) (4.11.118)≥ Dα(ρAB‖τA ⊗ωB(α)), (4.11.119)

where the inequality follows because Dα(ωB(α)‖σB) ≥ 0 for all states. Con-sider that

Qα(ρAB‖τA ⊗ σB)

= Tr[ραAB (τA ⊗ σB)

1−α] (4.11.120)

= Tr[ραAB(τ

1−αA ⊗ σ1−α

B )] (4.11.121)

= TrB[TrA[ραAB(τ

1−αA ⊗ σ1−α

B )]] (4.11.122)

= Tr[TrA[ραABτ1−α

A ]σ1−αB ] (4.11.123)

= Tr

[(Tr[(

TrA[ραABτ1−α

A ]) 1

α

]·ωB(α)

σ1−αB

](4.11.124)

=

(Tr[(

TrA[ραABτ1−α

A ]) 1

α

])α

· Tr[ωB(α)

ασ1−αB

]. (4.11.125)

400

Page 412: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Applying the function (·)→ 1α−1 log2(·) to both sides, we conclude that

Dα(ρAB‖τA ⊗ σB) =

α

α− 1log2 Tr

[(TrA[ρ

αABτ1−α

A ]) 1

α

]+ Dα(ωB(α)‖σB). (4.11.126)

Now consider that

Qα(ρAB‖τA ⊗ωB(α))

= Tr[ραAB(τ

1−αA ⊗ωB(α)

1−α)] (4.11.127)

= Tr[TrA[ραABτ1−α

A ]ωB(α)1−α] (4.11.128)

= Tr

TrA[ραABτ1−α

A ]

(TrA[ρ

αABτ1−α

A ]) 1

α

Tr[(

TrA[ραABτ1−α

A ]) 1

α

]

1−α

(4.11.129)

=

(Tr[(

TrA[ραABτ1−α

A ]) 1

α

])α−1

Tr[

TrA[ραABτ1−α

A ](

TrA[ραABτ1−α

A ]) 1−α

α

]

(4.11.130)

=

(Tr[(

TrA[ραABτ1−α

A ]) 1

α

])α−1

Tr[(

TrA[ραABτ1−α

A ]) 1

α

](4.11.131)

=

(Tr[(

TrA[ραABτ1−α

A ]) 1

α

])α

. (4.11.132)

Now applying the function (·)→ 1α−1 log2(·) to both sides, we conclude that

Dα(ρAB‖τA ⊗ωB(α)) =α

α− 1log2 Tr

[(TrA[ρ

αABτ1−α

A ]) 1

α

]. (4.11.133)

So this establishes (4.11.118). We then conclude from (4.11.119) that

infσB

Dα(ρAB‖τA ⊗ σB) = Dα(ρAB‖τA ⊗ωB(α)), (4.11.134)

because the lower bound in (4.11.119) is achieved by picking σB = ωB(α). Sothis establishes that

infσB

Dα(ρAB‖τA ⊗ σB) =α

α− 1log2 Tr

[(TrA[ρ

αABτ1−α

A ]) 1

α

]. (4.11.135)

We conclude the formula in (4.11.114) by setting τA = ρA, and we concludethe formula in (4.11.115) by setting τA = 1A.

401

Page 413: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

Proposition 4.86 Rényi Information Measures for Pure BipartiteStates

Let ψAB be a pure bipartite state, and let α ∈ (0, 1) ∪ (1, ∞). Then thePetz–, sandwiched, and geometric Rényi mutual informations simplifyas follows:

Iα(A; B)ψ = 2H2−αα(A)ψ, (4.11.136)

Iα(A; B)ψ = 2H 12α−1

(A)ψ, (4.11.137)

Iα(A; B)ψ = 2H0(A)ψ, (4.11.138)

where the Rényi entropy Hα(A) is defined in (4.4.3). The Petz–, sand-wiched, and geometric Rényi coherent informations simplify as follows:

Iα(A〉B)ψ = H 1α(A)ψ, (4.11.139)

Iα(A〉B)ψ = H α2α−1

(A)ψ, (4.11.140)

Iα(A〉B)ψ = H12(A)ψ. (4.11.141)

PROOF: We start by proving (4.11.136). We assume that ψAB is in its Schmidtform, so that without loss of generality the Hilbert spaces for systems A andB are isomorphic and each have dimension equal to the Schmidt rank of ψAB.With ΓAB the maximally entangled operator with local bases chosen to match

those from the Schmidt decomposition, we have that ψAB = ψ12AΓABψ

12A, where

ψA = TrB[ψAB]. We apply the Sibson identity in (4.11.114) to find that

2α−1

α Iα(A;B)ψ = Tr[(

TrA[ψαABψ1−α

A ]) 1

α

]= Tr

[(TrA[ψABψ1−α

A ]) 1

α

](4.11.142)

= Tr

[(TrA[ψ

12AΓABψ

12Aψ1−α

A ]

) 1α

](4.11.143)

= Tr

[(TrA[ΓABψ

12Aψ1−α

A ψ12A]

) 1α

](4.11.144)

= Tr[(

TrA[ΓABψ2−αA ]

) 1α

](4.11.145)

= Tr[(

TrA[ΓAB(ψTB)

2−α]) 1

α

]= Tr

[((ψT

B)2−α) 1

α

](4.11.146)

402

Page 414: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= Tr[

ψ2−α

αB

]= Tr

2−αα

A

]. (4.11.147)

The fourth equality follows from cyclicity of partial trace and the sixth followsfrom the transpose trick in (2.2.30). Rearranging the first and last lines gives

Iα(A; B)ψ =α

α− 1log2 Tr

2−αα

A

](4.11.148)

= 2

(1

1− 2−αα

)log2 Tr

2−αα

A

](4.11.149)

= 2H2−αα(A)ψ. (4.11.150)

We now prove (4.11.139). It follows from the Sibson identity in (4.11.115):

2α−1

α Iα(A〉B)ψ = Tr[(TrA[ψ

αAB])

]= Tr

[(TrA[ψAB])

](4.11.151)

= Tr[(ψB)

]= Tr

1αA

]. (4.11.152)

Rearranging this gives

Iα(A〉B)ψ =α

α− 1log2 Tr

1αA

]=

11− 1

α

log2 Tr[

ψ1αA

]= H 1

α(A)ψ. (4.11.153)

We now prove (4.11.137). Consider, for an arbitrary state σB, that

Qα(ψAB‖ψA ⊗ σB)

= Tr[(

ψ12AB (ψA ⊗ σB)

1−αα ψ

12AB

)α](4.11.154)

= Tr[(|ψ〉〈ψ|AB (ψA ⊗ σB)

1−αα |ψ〉〈ψ|AB

)α]

(4.11.155)

=(〈ψ|AB (ψA ⊗ σB)

1−αα |ψ〉AB

)αTr[|ψ〉〈ψ|αAB] (4.11.156)

=(〈ψ|AB (ψA ⊗ σB)

1−αα |ψ〉AB

)α(4.11.157)

=

(〈Γ|ABψ

12A

1−αα

A ⊗ σ1−α

αB

12A|Γ〉AB

(4.11.158)

=

(〈Γ|AB

1αA ⊗ σ

1−αα

B

)|Γ〉AB

(4.11.159)

403

Page 415: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

=

(〈Γ|AB

1αA [TA(σA)]

1−αα ⊗ 1B

)|Γ〉AB

(4.11.160)

=

(Tr[

ψ1αA [TA(σA)]

1−αα

])α

. (4.11.161)

Now applying the function (·)→ 1α−1 log2(·) to both sides, we conclude that

Dα(ψAB‖ψA ⊗ σB) =α

α− 1log2 Tr

1αA [TA(σA)]

1−αα

], (4.11.162)

and applying Proposition 2.12, we conclude that

Iα(A; B)ψ = infσB

Dα(ψAB‖ψA ⊗ σB) (4.11.163)

= infσA

α

α− 1log2 Tr

1αA [TA(σA)]

1−αα

](4.11.164)

α− 1log2

∥∥∥∥ψ1αA

∥∥∥∥α

2α−1

(4.11.165)

α− 1log2

(Tr

[(ψ

1αA

) α2α−1]) 2α−1

α

(4.11.166)

α− 12α− 1

αlog2 Tr

[(ψ

1αA

) α2α−1]

(4.11.167)

=2α− 1α− 1

log2 Tr[

ψ1

2α−1A

](4.11.168)

= 2

(1

1− 12α−1

)log2 Tr

12α−1A

](4.11.169)

= 2H 12α−1

(A)ψ. (4.11.170)

The third equality follows from Proposition 2.12.

We now prove (4.11.140):

Qα(ψAB‖1A ⊗ σB)

= Tr[(

ψ12AB (1A ⊗ σB)

1−αα ψ

12AB

)α](4.11.171)

= Tr[(|ψ〉〈ψ|AB

(1A ⊗ σ

1−αα

B

)|ψ〉〈ψ|AB

)α](4.11.172)

404

Page 416: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

=

(〈ψ|AB

(1A ⊗ σ

1−αα

B

)|ψ〉AB

Tr[(|ψ〉〈ψ|AB)

α] (4.11.173)

=

(〈ψ|AB

(1A ⊗ σ

1−αα

B

)|ψ〉AB

(4.11.174)

=

(〈Γ|AB

(ψA ⊗ σ

1−αα

B

)|Γ〉AB

(4.11.175)

=(〈Γ|AB

(ψA [TA(σA)]

1−αα ⊗ 1B

)|Γ〉AB

)α(4.11.176)

=(

Tr[ψA [TA(σA)]

1−αα

])α. (4.11.177)

Then consider that

Iα(A〉B)ψ = infσB

1α− 1

log2 Qα(ψAB‖1A ⊗ σB) (4.11.178)

= infσA

1α− 1

log2

(Tr[ψA [TA(σA)]

1−αα

])α(4.11.179)

α− 1log2 ‖ψA‖ α

2α−1(4.11.180)

α− 1log2

(Tr[

ψα

2α−1A

]) 2α−1α

(4.11.181)

α− 12α− 1

αlog2 Tr

α2α−1A

](4.11.182)

=2α− 1α− 1

log2 Tr[

ψα

2α−1A

](4.11.183)

=1

1− α2α−1

log2 Tr[

ψα

2α−1A

](4.11.184)

= H α2α−1

(A)ψ. (4.11.185)

The third equality follows from Proposition 2.12.

We prove (4.11.138). Let σB be a state with the same support as ψB. Recallthe formula in Proposition 4.41 for the geometric Rényi relative entropy whenthe state ρ is pure. We use this to conclude that

Dα(ψAB‖ψA ⊗ σB) = log2〈ψ|AB (ψA ⊗ σB)−1 |ψ〉AB (4.11.186)

= log2〈ψ|AB

(ψ−1

A ⊗ σ−1B

)|ψ〉AB (4.11.187)

= log2〈Γ|ABψ12A

(ψ−1

A ⊗ σ−1B

12A|Γ〉AB (4.11.188)

405

Page 417: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

= log2〈Γ|AB

(1A ⊗ σ−1

B

)|Γ〉AB (4.11.189)

= log2 Tr[σ−1

B

]. (4.11.190)

Now consider that the minimum value of infσB Tr[σ−1

B

]occurs when σB is the

maximally mixed state πB. This follows from using the Lagrange multipliermethod (or alternatively infσB Tr

[σ−1

B

]can be evaluated as d2

B by applyingProposition 2.12 again, with an implicit identity operator acting on the sup-port of ψB). We then conclude that

Iα(A; B)ψ = infσB

log2 Tr[σ−1

B

]= log2 Tr

[π−1

B

](4.11.191)

= 2 log2 rank(ψA) = 2H0(A)ψ. (4.11.192)

We finally prove (4.11.141):

Dα(ψAB‖1A ⊗ σB) = log2〈ψ|AB (1A ⊗ σB)−1 |ψ〉AB (4.11.193)

= log2〈ψ|AB

(1A ⊗ σ−1

B

)|ψ〉AB (4.11.194)

= log2〈Γ|AB

(ψA ⊗ σ−1

B

)|Γ〉AB (4.11.195)

= log2〈Γ|AB

(1A ⊗ σ−1

B TB(ψB))|Γ〉AB (4.11.196)

= log2 Tr[σ−1

B TB(ψB)]

. (4.11.197)

Now applying Proposition 2.12, we conclude that

Iα(A〉B)ψ = infσB

Dα(ψAB‖1A ⊗ σB) = infσB

log2 Tr[σ−1

B TB(ψB)]

(4.11.198)

= log2 ‖TB(ψB)‖ 12= log2 ‖ψB‖ 1

2(4.11.199)

= log2 ‖ψA‖ 12= H1

2(A)ψ. (4.11.200)

This concludes the proof.

4.11.2 Remarks on Defining Channel Quantities from StateQuantities

Observe that all of the generalized information measures for quantum chan-nels given in Definition 4.82, as well as all of the channel information mea-

406

Page 418: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

sures given above for specific generalized divergences, are defined in a com-mon manner. Specifically, given a function f : D(HAB) → R for bipartitestates, the corresponding function f for quantum channels4 is defined as

f (N) := supρRA

f (R; B)ω, (4.11.201)

where NA→B is a quantum channel, ρRA is a quantum state, with the dimen-sion of R unbounded, and ωRB = NA→B(ρRA). In other words, we definethe channel quantity by taking a quantum state ρRA of a bipartite system con-sisting of the input system A of the channel and a reference system R (whosedimension is in general unbounded), passing A through the channel, thenevaluating the state quantity on the output state NA→B(ρRA). We then opti-mize over all states ρRA.

A similar principle as in (4.11.201) has been used in Definition 4.78 for thegeneralized divergence between two quantum channels. In particular, if wehave a function f : D(H) × L+(H) → R ∪ +∞ on two quantum states,5

then we define a corresponding quantity f for two quantum channels as

f (N,M) := supρRA

f (NA→B(ρRA),MA→B(ρRA)), (4.11.202)

where NA→B and MA→B are quantum channels and ρRA is a quantum state,with the dimension of R unbounded. In other words, we define the chan-nel quantity by evaluating the state quantity on the states NA→B(ρRA) andMA→B(ρRA) and optimizing over all states ρRA. We have already seen thisprinciple being used in Chapter 3 to define the diamond distance (Defini-tion 3.68) and fidelity (Definition 3.73) of two quantum channels, in whichcase the state quantity f is the trace distance or fidelity.

In both (4.11.201) and (4.11.202), properties of the underlying state quan-tity (namely, the data-processing inequality), as well as the Schmidt decompo-sition theorem, allow us to restrict the optimizations in (4.11.201) and (4.11.202)to pure states |ψ〉RA without loss of generality, with the dimension of R equalto the dimension of A. If the underlying state quantity f in (4.11.201) is in-variant with respect to local unitaries, then we can use this simplification towrite f (N) as

f (N) = supρA

f (ρA,NA→B), (4.11.203)

4In a slight abuse of notation, we use the same letter to denote the channel quantity as thestate quantity.

5We allow for the second argument to be a positive semi-definite operator more generally.

407

Page 419: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

where the optimization is now only over states ρA for the input system A forthe channel N, and

f (ρA,NA→B) := f (A; B)ω, ωAB =√

ρAΓNAB√

ρA. (4.11.204)

This holds due to (2.2.29), which states for every purification |ψρ〉AA′ of ρAthere exists an operator YA such that (YA ⊗ 1A′)|Γ〉AA′ = |ψρ〉AA′ . Then, bythe polar decomposition (Theorem 2.2), and the fact that YAY†

A = ρA, it holdsthat YA = UA

√ρA for some unitary UA. Finally, using the definition of the

Choi representation ΓNAB and the unitary invariance of f , we obtain (4.11.203).

This equivalent formulation of the channel quantity f (N) has been used in(4.11.113) for the coherent information of a channel.

If the underlying state quantity in (4.11.202) is unitarily invariant, then byusing the same reasoning as above we can write f (N,M) in a form analogousto (4.11.203):

f (N,M) = supρA

f (√

ρAΓNAB√

ρA,√

ρAΓMAB√

ρA). (4.11.205)

4.12 Summary

In this chapter, we studied various entropic quantities, starting with quantumrelative entropy. We proved many of its most important properties, and wesaw that it acts as a parent quantity for well-known quantities such as vonNeumann entropy, quantum conditional entropy, quantum mutual informa-tion and conditional mutual information, and coherent information. We thenstudied the Petz–Rényi, sandwiched Rényi, geometric Rényi, and hypothe-sis testing relative entropies, and we proved many of their most importantproperties.

The unifying concept of this chapter is that of generalized divergence. Ageneralized divergence is a function D : D(H)×L+(H)→ R that satisifes thedata-processing inequality: for every state ρ, positive semi-definite operatorσ, and quantum channel N,

D(ρ‖σ) ≥ D(N(ρ)‖N(σ)). (4.12.1)

This inequality holds for all of the quantum relative entropies that we con-sidered in this chapter, and many of their important properties (such as jointconvexity) can be derived using it. The data-processing inequality is a core

408

Page 420: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

concept in information theory, and it underlies virtually all of the results thatwe present in this book.

At the end of the chapter, we defined information measures for quantumchannels. Given a state information measure (or, more generally, a general-ized divergence), we define an information measure for channels in a manneranalogous to the way that the diamond norm (a generalized divergence forchannels) is defined from the trace norm (a generalized divergence for states):we send one share of a bipartite pure state through the channel, evaluate thestate measure, and then optimize over all input states. All of the upper andlower bounds on communication capacities that we present in this book aregiven in terms of channel information measures defined in this way.

4.13 Bibliographic Notes

The von Neumann quantum entropy was originally defined by von Neu-mann (1927b). The quantum conditional entropy was considered implicitlyby Lieb and Ruskai (1973a,b), proposed as a quantum-information theoreticquantity of interest by Cerf and Adami (1997), and the coherent informationthereafter by Schumacher and Nielsen (1996). The modern definition of thequantum mutual information was proposed by Stratonovich (1965). Its non-negativity was proved by Molière and Delbrück (1935); Lanford and Robin-son (1968). The quantum conditional mutual information was considered im-plicitly by Lanford and Robinson (1968); Lieb and Ruskai (1973a,b) and wasproposed as a quantum information-theoretic quantity of interest by Cerf andAdami (1997, 1998). Chain rules for quantum conditional entropy and infor-mation were employed by Cerf and Adami (1998).

The definition of the quantum relative entropy as presented in Defini-tion 4.1 is due to Umegaki (1962). It took many years after this until the paperby Hiai and Petz (1991) was published, which solidified the operational in-terpretation of the “Umegaki quantum relative entropy” in terms of quantumhypothesis testing and the quantum Stein’s lemma. The strong converse forthe quantum Stein’s lemma was established by Ogawa and Nagaoka (2000).

The non-negativity of quantum relative entropy follows as a consequenceof an inequality by Klein (1931), and its data-processing inequality for quan-tum channels was established by Lindblad (1975). The data-processing in-equality for the quantum relative entropy under partial trace was established

409

Page 421: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

by Lieb and Ruskai (1973a), from which its joint convexity follows. The ex-pression in (4.2.99) for the quantum conditional mutual information was givenimplicitly by Lieb and Ruskai (1973a). Strong subadditivity of quantum en-tropy (or equivalently, non-negativity of quantum conditional mutual infor-mation) was established by Lieb and Ruskai (1973a,b). The data-processinginequality for the quantum conditional mutual information under local chan-nels was established by Christandl and Winter (2004). The uniform continuitybound for quantum conditional mutual information, as presented in Proposi-tion 4.8, was established by Shirokov (2017).

The notion of generalized divergence in the classical case was proposed byPolyanskiy and Verdú (2010), and in the quantum case by Sharma and Warsi(2012). Proposition 4.14 was established by Wilde et al. (2014); Tomamichelet al. (2017). The generalized mutual information and conditional quantumentropy were proposed by Sharma and Warsi (2012).

The Petz–Rényi relative entropy was proposed by Petz (1985, 1986a), whereinits data-processing inequality was established for the case of partial trace.The general definition of Petz–Rényi relative entropy incorporating supportconditions and its data-processing inequality for general channels was estab-lished by Tomamichel et al. (2009), along with several of its other propertiessuch as monotonicity in α. The expression in (4.4.23) for the Petz–Rényi rela-tive entropy was presented by Tomamichel et al. (2009); Sharma (2010).

The sandwiched Rényi relative entropy was independently proposed byMüller-Lennert et al. (2013) and Wilde et al. (2014), and the alternative ex-pression in (4.5.5) was given by Dupuis and Wilde (2016). The variationalexpression in (4.5.6), as well as Proposition 4.27, are due to Müller-Lennertet al. (2013). Proposition 4.28 was established by Müller-Lennert et al. (2013);Wilde et al. (2014). The fact that sandwiched Rényi relative entropy is mono-tone with respect to α is due to Müller-Lennert et al. (2013), with an indepen-dent proof for α > 1 by Beigi (2013). The inequality in (4.5.45) for α > 1 is dueto Wilde et al. (2014), as a direct consequence of the inequality by Lieb andThirring (1976). The inequality in (4.5.45) for α ∈ (0, 1) is due to Datta andLeditzky (2014), as a direct consequence of the Araki–Lieb–Thirring inequal-ities by Lieb and Thirring (1976); Araki (1990). The “reverse” Araki–Lieb–Thirring inequality, which leads to the inequality in (4.5.46), was proved byIten et al. (2017). The data-processing inequality for the sandwiched Rényirelative entropy was established in a number of papers for various parameterranges of α: Müller-Lennert et al. (2013); Wilde et al. (2014); Frank and Lieb(2013); Beigi (2013); Mosonyi and Ogawa (2015), being established for the full

410

Page 422: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

range α ∈ [1/2, ∞] by Frank and Lieb (2013). The proof that we presented hereis due to Wilde (2018b). Counterexamples to data processing for the sand-wiched Rényi relative entropy in the range α ∈ (0, 1/2) were given by Bertaet al. (2017).

The geometric Rényi relative entropy has its roots in work of Petz andRuskai (1998), and it was further developed by Matsumoto (2013, 2018). Seealso (Tomamichel, 2015; Hiai and Mosonyi, 2017) for other expositions. Itwas given the name “geometric Rényi relative entropy” by Fang and Fawzi(2019) because it is a function of the matrix geometric mean of its arguments.See, e.g., Lawson and Lim (2001) for a review of matrix geometric means.Proposition 4.38 was established by Katariya and Wilde (2020), with rootsin the earlier work of Matsumoto (2014a,b). In particular, the expressionTr[σ(σ−1/2ρσ−1/2)α] for α = 1/2 and supp(ρ) 6⊆ supp(σ) was identifiedin (Matsumoto, 2014a, Section 3) and later generalized to all α ∈ (0, 1) in(Matsumoto, 2014b, Section 2). Lemma 4.39 and Proposition 4.41 were alsoestablished by Katariya and Wilde (2020), along with monotoniticity in αfor all α ∈ (0, 1) ∪ (1, ∞) (in Proposition 4.42). The inequality in Proposi-tion 4.40 was established for the interval α ∈ (0, 1) ∪ (1, 2] in Tomamichel(2015) (by making use of a general result in Matsumoto (2013, 2018)) andfor the full interval α ∈ (1, ∞) in Wang et al. (2019a). Here, we have fol-lowed the approach of Wang et al. (2019a) and offered a unified proof interms of the Araki–Lieb–Thirring inequality Araki (1990); Lieb and Thirring(1976). The first inequality in Proposition 4.47 was established for α ∈ (1, 2]in Wilde et al. (2014) and for α ∈ (0, 1) in Datta and Leditzky (2014), by em-ploying the Araki–Lieb–Thirring inequality Araki (1990); Lieb and Thirring(1976). The second inequality was established by Matsumoto (2013, 2018)and reviewed by Tomamichel (2015). Data processing was established by anoperator-theoretic approach in Petz and Ruskai (1998) and by an operationalmethod in Matsumoto (2013, 2018). The operator-theoretic approach takenhere has its roots in (Hiai and Petz, 1991, Proposition 2.5) and was reviewedin (Hiai and Mosonyi, 2017, Corollary 3.31). The interpretation of geomet-ric Rényi relative entropy given in Proposition 4.46 was discovered by Mat-sumoto (2013, 2018). Lemma 4.48 was presented by Katariya and Wilde (2020)and is based on (Zhou and Jiang, 2019, Lemma 3).

Belavkin and Staszewski (1982) discovered the quantum generalization ofthe classical relative entropy given in Section 4.7, now known as the Belavkin–Staszewski relative entropy. Matsumoto (2013, 2018) showed that the Belavkin–Staszewski relative entropy is the limit of the geometric Rényi relative en-tropy as α → 1. The proof given here was presented by Katariya and Wilde

411

Page 423: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

(2020). Hiai and Petz (1991) found the inequality in Proposition 4.51 relat-ing the quantum relative entropy to the Belavkin–Staszewski relative entropy(the proof given here is due to Katariya and Wilde (2020)). Hiai and Petz(1991) established the data-processing inequality for the Belavkin–Staszewskirelative entropy, by a method different from that given here. Matsumoto(2013, 2018) found the interpretation of the Belavkin–Staszewski relative en-tropy given in Proposition 4.55.

The max-relative entropy was proposed by Datta (2009b) as a quantuminformation-theoretic quantity of interest. Datta also established many basicinformation processing properties of the max-relative entropy and studied itsrole in quantum hypothesis testing and entanglement theory Datta (2009b,a).Proposition 4.58 is due to Müller-Lennert et al. (2013); Katariya and Wilde(2020). Proposition 4.61 is due to Wang and Wilde (2019b) and Anshu et al.(2019).

The conditional min-entropy was defined by Renner (2005), as well as thesmooth conditional min-entropy. The operational interpretations of the con-ditional min- and max-entropies were examined by Koenig et al. (2009).

The hypothesis testing relative entropy was studied implicitly by a varietyof authors for a long time in the context of quantum hypothesis testing: Hiaiand Petz (1991); Ogawa and Nagaoka (2000); Hayashi and Nagaoka (2003);Nagaoka (2006); Hayashi (2007). It was proposed as a quantum information-theoretic quantity of interest by Buscemi and Datta (2010a) (in the contextof operator smoothing of a one-shot entropic quantity), and given the namehypothesis testing relative entropy by Wang and Renner (2012), wherein itsconnection to classical communication was explored (see also Hayashi andNagaoka (2003)). An alternate proof of the data-processing inequality forquantum relative entropy in terms of hypothesis testing was given by Bje-lakovic and Siegmund-Schultze (2003). Various properties of the hypothesistesting relative entropy were established by Dupuis et al. (2013) and Dattaet al. (2016), including the fact that it can be written as an SDP (see alsoWang and Wilde (2019b) in this context). Eq. (4.9.36) was established by Wangand Wilde (2019b). Proposition 4.64 was considered by Helstrom (1976) anddiscussed more recently by Vazquez-Vilar (2016). A special case of Proposi-tion 4.67 was established by Wang and Renner (2012); Matthews and Wehner(2014). Proposition 4.68 was established by Cooney et al. (2016). Proposi-tion 4.69 is essentially due to Hayashi (2007), with a refinement by Audenaertet al. (2012) and a later rediscovery of it, formulated in a different way, by Qiet al. (2018). Proposition 4.77 is essentially due to Hayashi (2007).

412

Page 424: Principles of Quantum Communication Theory: A Modern Approach

Chapter 4: Quantum Entropies and Information

The generalized channel divergence of Definition 4.78 was proposed byLeditzky et al. (2018), and Proposition 4.81 was established as well by Led-itzky et al. (2018). The various generalized channel information measures canbe found in the papers of Wilde et al. (2014); Gupta and Wilde (2015), andthe related channel information measures based on hypothesis testing, Petz–Rényi, and sandwiched Rényi relative entropy are from Koenig and Wehner(2009); Sharma and Warsi (2012); Wilde et al. (2014); Gupta and Wilde (2015);Datta et al. (2016). The channel mutual information was defined by Adamiand Cerf (1997), the channel Holevo information by Schumacher and West-moreland (1997) (based on the Holevo quantity for ensembles Holevo (1973)),and the channel coherent information by Lloyd (1997). These papers togetherthus developed a general concept of promoting a measure of correlations in aquantum state to a measure of a channel’s ability to create the same correla-tions, by optimizing the state measure with respect to a (subset of) all of thestates that can be generated by means of the channel.

The review by Ruskai (2002) is helpful not only for understanding entropyinequalities in quantum information, but also for understanding the historyof developments with respect to quantum entropy and information. The bookof Tomamichel (2015) provides an exposition of Rényi relative entropies andtheir properties (see also Leditzky (2016)). The book of Wilde (2017) providesan overview of entropies in the von Neumann family, their properties, andthe derived channel information measures.

413

Page 425: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5

Entanglement Measures

In the previous chapter, we laid the foundation for analyzing quantum com-munication protocols by defining entropic quantities, such as the Petz– andsandwiched Rényi relative entropies, as well as information measures forquantum states and channels derived from these relative entropies. We nowuse these information measures to define entanglement measures for quan-tum states and channels. Given quantum systems A and B, an entanglementmeasure is a function E : D(HAB) → R that quantifies the amount of en-tanglement present in a state ρAB of these systems. The notion of “quanti-fying entanglement” is explained in Section 5.1 below, with the defining re-quirement of an entanglement measure being that it does not increase underchannels realized by local operations and classical communication (Defini-tion 3.37). We can think of this requirement of “LOCC monotonicity” as arestricted form of the data-processing inequality, but now applied to a singlebipartite state rather than to a pair of states. The data-processing inequalityindicates that the distinguishability of two states does not increase under theaction of the same quantum channel (Definition 4.13), whereas LOCC mono-tonicity indicates that entanglement does not increase under the action of anLOCC channel on a bipartite state.

Given an entanglement measure E for states, the corresponding entan-glement measure for channels is defined using the general principle in Sec-tion 4.11.2, which is to optimize the state measure with respect to all bipartitestates that can be shared between the sender and receiver of the channel bymaking use of the channel. We develop entanglement measures for channelsin Section 5.5, and these naturally quantify how much entanglement can begenerated by a channel connecting a sender to a receiver.

414

Page 426: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Entanglement measures feature prominently in the analysis of optimalrates for distillation and communication protocols. In particular, entangle-ment measures for states arise as upper bounds on their distillable entangle-ment and secret key, which we examine in Chapters 8 and 10, respectively.Entanglement measures for quantum channels arise as upper bounds on therates of quantum and private communication over a quantum channel, whichwe consider in Chapters 9 and 11, respectively, as well as for their feedback-assisted counterparts that we consider in Part III of this book.

Being a uniquely quantum-mechanical property, it is perhaps not surpris-ing that entanglement features prominently in quantum communication pro-tocols, both in the encoding and decoding of messages, as well as in the anal-ysis of their optimal rates. In fact, any state that is not entangled (i.e., sep-arable) is useless for entanglement and secret key distillation, and similarly,entanglement breaking channels are useless for quantum and private com-munication. The distinguishability of a given state from the set of separablestates, which we show in this chapter is an entanglement measure for states,can thus give an indication of how much entanglement or secret key can bedistilled from it. Similarly, for communication tasks over quantum channels,the distinguishability of a given quantum channel from the set of entangle-ment breaking channels, which we show in this chapter is an entanglementmeasure for channels, can be used to determine how good the channel is forquantum or private communication.

The rest of this chapter proceeds as follows. We start in Section 5.1 byformally defining what it means for a function E to be an entanglement mea-sure for bipartite states and by providing examples of entanglement mea-sures. In Section 5.2, we consider entanglement measures that quantify thedistinguishability of a given state ρAB from the set SEP(A : B) of separablestates, with the measure given by some generalized divergence D (see Defi-nition 4.13). We also consider in Section 5.3 a class of entanglement measuresbased on the distinguishability of a given state from the larger set PPT′ ⊃ SEP.In Section 5.4, we consider a different kind of entanglement measure for statescalled squashed entanglement. Finally, in Section 5.5, we extend all of thestate entanglement measures to those for channels.

415

Page 427: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

5.1 Definition and Basic Properties

Recall from Definition 3.8 that a bipartite state ρAB is called entangled if it isnot separable, meaning that it cannot be written in the following form

∑x∈X

p(x)τxA ⊗ωx

B, (5.1.1)

for some finite alphabet X, probability distribution p : X → [0, 1], and setsτx

Ax∈X and ωxBx∈X of states. Determining whether a given quantum state

ρAB is entangled is a fundamental problem in quantum information theory.In Section 3.1.1, in the discussion after Definition 3.8, we listed the followingcriteria for the entanglement of pure and mixed states:

• Schmidt rank criterion: A pure bipartite state ψAB is entangled if and onlyif its Schmidt rank is strictly greater than one.

• PPT criterion: If a bipartite mixed state ρAB has negative partial transpose(i.e., the partial transpose ρTB

AB has at least one negative eigenvalue), thenit is entangled. If both systems A and B are qubit systems or if one of thesystems is a qubit and the other a qutrit, then ρAB is entangled if and onlyif ρTB

AB has negative partial transpose.

In the case of mixed states, there is generally not a simple necessary and suf-ficient criterion to determine whether a given bipartite state is entangled, andin fact it is known that it is computationally difficult, in a precise sense, todecide if a state is entangled (please consult the Bibliographic Notes in Sec-tion 5.8).

In addition to determining whether or not a given quantum state is entan-gled, we are interested in quantifying the amount of entanglement present ina quantum state. Doing so allows us to compare quantum states based on theamount of entanglement present in them. An entanglement measure is a func-tion E : D(HAB) → R from the set of density operators acting on the Hilbertspace of a bipartite system to the set of real numbers, and it quantifies the en-tanglement of a quantum state ρAB ∈ D(HAB). (The formal definition of anentanglement measure is given in Definition 5.1.) To indicate the partitioningof the subsystems explicitly, we often write E(A; B)ρ instead of E(ρAB).

How exactly do we quantify entanglement? Suppose that we have a bi-partite state ρAB and we would like to quantify the entanglement between thesystems A and B. One fundamental observation is that the entanglement of

416

Page 428: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

ρAB cannot increase under the action of a local operations and classical com-munication (LOCC) channel (recall Definition 3.37). This is intuitive becauseentanglement is a non-local property of a bipartite state, and so local opera-tions alone do not increase it. Similarly, classical communication should onlyaffect the classical correlations between the two systems A and B, and not thequantum correlations, i.e., the entanglement.

Given the reasoning above, the defining property of an entanglement mea-sure E : D(HAB) → R for the quantum systems A and B is that it does notincrease under the action of an LOCC channel:

Definition 5.1 Entanglement Measure

We say that E(A; B)ρ is an entanglement measure for a bipartite stateρAB if the following inequality holds for every bipartite state ρAB andevery LOCC channel LAB→A′B′ that acts on ρAB:

E(A; B)ρ ≥ E(A′; B′)ω, (5.1.2)

where ωA′B′ := LAB→A′B′(ρAB).

This property of LOCC monotonicity is the most operational property ofan entanglement measure, in the sense that it relates directly to the distillationor communication tasks that we consider in this book, which involve LOCCbetween the two parties sharing a state ρAB or between sender and receiverat the terminals of a quantum channel, respectively. It is also analogous to thedata-processing inequality for generalized divergences.

The defining requirement of LOCC monotonicity implies that an entan-glement measure E takes on its minimum value on the set of separable states.To see this, recall from the discussion after Definition 3.8 that a separable statecan be prepared by LOCC. Thus, starting from an arbitrary state ρAB, Aliceand Bob can trace out their local systems A and B and perform LOCC to pre-pare a separable state σA′B′ . The serial concatenation of these two actions isitself an LOCC channel. Thus, it follows from the definition above that

E(A; B)ρ ≥ E(A′; B′)σ (5.1.3)

for every separable state σA′B′ . Now, given another separable state σ′A′′B′′ , itis possible to transform between σA′B′ and σ′A′′B′′ using LOCC, meaning thatE(A′; B′)σ ≥ E(A′′; B′′)σ′ and E(A′; B′)σ ≤ E(A′′; B′′)σ′ . Therefore,

E(A′; B′)σ = E(A′′; B′′)σ′ , (5.1.4)

417

Page 429: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

for all separable states σAB and σ′A′B′ . As a consequence, an entanglementmeasure E takes on its minimum value and is equal to a constant c ∈ R for allseparable states. It is often convenient and simpler if an entanglement mea-sure E is equal to zero for all separable states. If this is not the case, then wecan simplify redefine the entanglement measure as E′(A; B)ρ = E(A; B)ρ − c.By this reasoning and adjustment (if needed), every entanglement measure(as per Definition 5.1) satisfies the following two properties of non-negativityon all states and vanishing on separable states:

1. Non-negativity: E(ρAB) ≥ 0 for every state ρAB.

2. Vanishing for separable states: E(σAB) = 0 for every separable state σAB.

Other properties that are desirable for an entanglement measure E are asfollows:

1. Faithfulness: E(σAB) = 0 if and only if σAB is separable, so that E(ρAB) > 0if and only if ρAB is entangled.

2. Invariance under classical communication: For every finite alphabet X, prob-ability distribution p : X→ [0, 1] on X, and set ρx

ABx∈X of states, definethe following classical–quantum state:

ρXAB := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxAB. (5.1.5)

Then the entanglement measure E satisfies invariance under classicalcommunication if

E(XA; B)ρ = E(A; BX)ρ = ∑x∈X

p(x)E(A; B)ρx . (5.1.6)

This property has the interpretation of invariance under classical com-munication because the equality E(XA; B)ρ = E(A; BX)ρ indicates thatthe classical value x in register X can be communicated classically to Boband discarded locally, and the entanglement measure does not changeunder this action. Furthermore, the value of the entanglement is sim-ply the expected entanglement, where the expectation is calculated withrespect to the probability distribution p(x).

This property is also known as the “flags” property in the research liter-ature.

418

Page 430: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

3. Convexity: For every finite alphabet X, probability distribution p : X →[0, 1] on X, and set ρx

ABx∈X of states,

∑x∈X

p(x)E(ρxAB) ≥ E

(∑

x∈Xp(x)ρx

AB

). (5.1.7)

Convexity is an intuitive property of entanglement, and for entangle-ment measures that are invariant under classical communication (obey-ing (5.1.6)), it captures the idea that entanglement should not increase onaverage if classical information about the identity of a state is lost. (Infact, convexity is an immediate consequence of LOCC monotonicity andinvariance under classical communication.)

4. Additivity: The entanglement of a tensor-product state ρA1 A2B1B2 = τA1B1⊗ωA2B2 is the sum of the entanglement of the individual states in the tensorproduct:

E(A1A2; B1B2)τ⊗ω = E(A1; B1)τ + E(A2; B2)ω. (5.1.8)

If instead we have only that

E(A1A2; B1B2)τ⊗ω ≤ E(A1; B1)τ + E(A2; B2)ω (5.1.9)

for all states τA1B1 and ωA2B2 , then the entanglement measure E is subad-ditive.

5. Selective LOCC monotonicity: A property stronger than LOCC monotonic-ity is that E is non-increasing on average under an LOCC instrument. Inmore detail, let ρAB be a bipartite state, and let Lx

AB→A′B′x∈X be a col-lection of maps, such that L↔AB→A′B′ is an LOCC channel of the form:

L↔AB→A′B′ = ∑x∈X

LxAB→A′B′ , (5.1.10)

for some finite alphabet X and where each map LxAB→A′B′ is completely

positive such that the sum map L↔AB→A′B′ is trace preserving (i.e., a quan-tum channel). Furthermore, each map Lx

AB→A′B′ can be written in theform of (3.2.278), as follows:

LxAB→A′B′ = ∑

y∈YE

x,yA→A′ ⊗ F

x,yB→B′ , (5.1.11)

where Ex,yA→A′x∈X and Fx,y

B→B′x∈X are sets of completely positive maps.Set

p(x) := Tr[LxAB→A′B′(ρAB)], (5.1.12)

419

Page 431: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

and for x ∈ X such that p(x) 6= 0, set

ωxAB :=

1p(x)

LxAB→A′B′(ρAB). (5.1.13)

If the classical value of x is not discarded, then the given state ρAB istransformed to the ensemble (p(x), ωx

AB)x∈X via LOCC.

The entanglement measure E satisfies selective LOCC monotonicity if

E(ρAB) ≥ ∑x∈X:p(x) 6=0

p(x)E(ωxAB), (5.1.14)

for every ensemble (p(x), ωxAB)x∈X that arises from ρAB via LOCC as

specified above. Selective LOCC monotonicity indicates that entangle-ment does not increase on average under the action of LOCC. Many en-tanglement measures satisfy this stronger property.

Observe that selective LOCC monotonicity in (5.1.14) implies LOCC mono-tonicity in (5.1.2), simply because the alphabet X in (5.1.10) can consist ofonly one letter.

The entanglement measures that we consider in this chapter satisfy many ofthe properties listed above.

Given that we would like to quantify entanglement, it makes sense to askwhat the basic unit of entanglement should be. We take as our unit of en-tanglement the two-qubit maximally entangled Bell state |Φ〉 = 1√

2(|0, 0〉 +

|1, 1〉), and we thus say that the state |Φ〉 represents “one ebit.” A maximallyentangled state of Schmidt rank d is then referred to as having “log2 d ebits.”All of the entanglement measures that we consider in this chapter are equal toone for a two-qubit maximally entangled state, which is another justificationfor using it as the unit of entanglement.1 Similarly, for a maximally entangledstate of Schmidt rank d, all of the entanglement measures that we consider inthis chapter are equal to log2 d.

To close out this introductory section, we prove a lemma that helps toreduce the difficulty in determining whether a given function is an entangle-ment measure.

1This “normalization” condition is sometimes taken to be a requirement for an entangle-ment measure.

420

Page 432: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Lemma 5.2

Let E : D(HAB)→ R be a function that, for every bipartite state ρAB, is1. invariant under classical communication, as defined in (5.1.6), and

2. obeys data processing under local channels, in the sense that

E(A; B)ρ ≥ E(A′; B′)ω, (5.1.15)

for all channels NA→A′ and MB→B′ , where

ωA′B′ := (NA→A′ ⊗MB→B′)(ρAB). (5.1.16)

Then E is convex, as defined in (5.1.7), and a selective LOCC monotone,as defined in (5.1.14).

PROOF: We first prove convexity. Let

ρXAB := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxAB, (5.1.17)

where X is a finite alphabet, p : X → [0, 1] is a probability distribution, andρx

ABx∈X is a set of states. Then

∑x∈X

p(x)E(A; B)ρx = E(XA; B)ρ (5.1.18)

≥ E(A; B)ρ, (5.1.19)

where the entanglement E(A; B) in the last line is evaluated with respect tothe reduced state ρAB = ∑x∈X p(x)ρx

AB. The equality follows from the as-sumption of invariance under classical communication, as defined in (5.1.6),and the inequality follows because the partial trace channel TrX is a localchannel that discards the classical system X.

To establish selective LOCC monotonicity, consider that an LOCC chan-nel L↔AB→A′B′ of the form in (5.1.10), by Definition 3.37, can be built up as aserial concatenation of one-way LOCC channels. Furthermore, each one-wayLOCC channel can keep some classical information and discard some of it. Itis helpful conceptually to think of the retained classical information as part ofthe variable x in (5.1.10) and the discarded classical information as part of thevariable y in (5.1.11). In more detail, each such one-way LOCC channel (from

421

Page 433: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Alice to Bob in the case discussed below) has the following form:

∑k,`

Ek,`A→A′ ⊗ Fk,`

B→B′ , (5.1.20)

where Ek,`A k,` is a collection of completely positive maps such that the sum

map ∑k,` Ek,`A is trace preserving, and Fk,`

B k,` is a collection of quantum chan-nels. For now and for simplicity, let use the superindex m := (k, `). We shouldthink of the classical information k as that which is being kept and that in ` asthat which is being lost or let go. The one-way LOCC channel in (5.1.20) canbe implemented using the following steps:

1. Alice applies the following local quantum channel:

τAB →∑m

EmA→A′(τAB)⊗ |m〉〈m|MA . (5.1.21)

2. Alice employs a classical communication channel

(·)MA →∑m|m〉MB〈m|MA(·)MA |m〉MA〈m|MB (5.1.22)

to communicate the value in MA to Bob:

∑m

EmA→A′(τAB)⊗ |m〉〈m|MA →∑

mEm

A→A′(τAB)⊗ |m〉〈m|MB . (5.1.23)

3. Bob performs the local channel

(·)BMB →∑m

FmB→B′(·)⊗ |m〉〈m|MB(·)MB |m〉〈m|MB , (5.1.24)

which can be understood as “looking in the classical register MB” to de-termine the value m and performing the local quantum channel Fm

B→B′based on the value m found. Under this local channel, the global statebecomes as follows:

∑m

EmA→A′(τAB)⊗ |m〉〈m|MB →∑

m(Em

A→A′ ⊗ FmB→B′)(τAB)⊗ |m〉〈m|MB .

(5.1.25)

4. Bob then discards the ` part of the classical information m, as follows:

∑m(Em

A→A′ ⊗ FmB→B′)(τAB)⊗ |m〉〈m|MB

422

Page 434: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

= ∑k,`(Ek,`

A→A′ ⊗ Fk,`B→B′)(τAB)⊗ |k, `〉〈k, `|KBLB (5.1.26)

→∑k,`(Ek,`

A→A′ ⊗ Fk,`B→B′)(τAB)⊗ |k〉〈k|KB (5.1.27)

= ∑k

p(k)ωkA′B′ ⊗ |k〉〈k|KB , (5.1.28)

where

p(k) := Tr

[∑`

(Ek,`A→A′ ⊗ Fk,`

B→B′)(τAB)

], (5.1.29)

ωkA′B′ :=

1p(k) ∑

`

(Ek,`A→A′ ⊗ Fk,`

B→B′)(τAB). (5.1.30)

Bob could, if desired, finally discard the classical register KB to imple-ment the one-way LOCC channel in (5.1.20). However, it is helpful tohold on to it for our analysis below.

Now we analyze how the entanglement changes under each of these steps,omitting the state subscripts at each step except for the first and last lines,because those not shown are clear from the context:

E(A; B)τ ≥ E(A′MA; B) (5.1.31)= E(A′; BMB) (5.1.32)≥ E(A′; B′MB) (5.1.33)= E(A′; B′KBLB) (5.1.34)≥ E(A′; B′KB) (5.1.35)

= ∑k

p(k)E(A′; B′)ωk . (5.1.36)

The first inequality follows from data processing under the local channel in(5.1.21). The first equality follows from the assumption of invariance of clas-sical communication, i.e., invariance under the action of the classical channelin (5.1.22). The second inequality follows from data processing under thelocal channel in (5.1.25). The second equality is trivial, following becauseMB = (KB, LB) by definition. The third inequality follows again from dataprocessing under the local channel in (5.1.27). The final equality follows againfrom invariance under classical communication.

Thus, we have shown strong one-way LOCC monotonicity (from Alice to

423

Page 435: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Bob) in the following sense:

E(A; B)τ ≥∑k

p(k)E(A′; B′)ωk , (5.1.37)

where the ensemble (p(k), ωkA′B′)k arises from the state τAB by means of

one-way LOCC from Alice to Bob. By the same argument, but flipping therole of Alice and Bob, it follows that strong one-way LOCC monotonicityfrom Bob to Alice holds for the function E. Since every LOCC channel is builtup as a serial concatenation of one-way LOCC channels and since we haveproven that strong monotonicity holds for the function E for each of them, itfollows that E obeys selective LOCC monotonicity.

5.1.1 Examples

Let us now consider some examples of entanglement measures. The first twoentanglement measures that we consider are related to the Schmidt rank cri-terion and the PPT criterion, respectively, stated in Section 3.1.1 and reiteratedat the beginning of this chapter. They are known as the entanglement of for-mation and the log-negativity, respectively, and are some of the simplest andearliest entanglement measures defined. They are also conceptually linked tomore complex entanglement measures like squashed entanglement and theRains relative entropy, the latter of which are the best known upper boundson distillable entanglement (studied in Chapter 8).

5.1.1.1 Entanglement of Formation

Given a pure bipartite state ψAB = |ψ〉〈ψ|AB, there exists a Schmidt decompo-sition of |ψ〉AB such that

|ψ〉AB =r

∑k=1

√λk|ek〉A ⊗ | fk〉B, (5.1.38)

where r is the Schmidt rank, λk > 0 are the Schmidt coefficients, and |ek〉Ark=1,

| fk〉Brk=1 are orthonormal sets. Observe that the reduced states ψA := TrB[ψAB]

and ψB := TrA[ψAB] have the same non-zero eigenvalues, which means thattheir entropies are equal, i.e., H(ψA) = H(ψB). Furthermore, H(ψA) = 0 ifand only if r = 1, and r = 1 if and only if ψAB is separable, by the Schmidt

424

Page 436: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

rank criterion. Therefore, the entropy of the reduced state of a pure bipartitestate provides us with a signature of entanglement for pure bipartite states:

ψAB entangled ⇐⇒ H(ψA) > 0. (5.1.39)

We let

EF(ψAB) := H(TrB[ψAB]) = −r

∑k=1

λk log2 λk (5.1.40)

for every pure state ψAB.

The function EF is an entanglement measure, as proven in Proposition 5.3below. When evaluated on pure bipartites as above, it is known as the en-tropy of entanglement or entanglement entropy, and it is often simply denoted byE(ψAB) in the research literature. By (5.1.39), it is also a faithful entanglementmeasure on pure states, i.e., EF(ψAB) = 0 if and only if ψAB is a separablestate. Recall from Section 3.1.1.4 that a maximally entangled state is definedby having λk = 1

r for all 1 ≤ k ≤ r. For such states, EF(ψAB) = log2 r, whichjustifies calling them maximally entangled because log2 r is the largest valueof the quantum entropy for states supported on an r-dimensional space.

The definition in (5.1.40) for the entanglement measure EF, so far, has beendefined only for pure states. To extend the definition to mixed states, we usethe fact that a mixed state ρAB can be decomposed into a convex combinationof pure states as follows:

ρAB = ∑x∈X

p(x)ψxAB, (5.1.41)

where X is a finite alphabet, p : X → [0, 1] is a probability distribution, andψx

ABx∈X is a set of pure states. We can then measure the entanglementof ρAB by taking the expected entanglement of the pure states involved inthe decomposition of ρAB, i.e., by ∑x∈X p(x)H(ψx

A), where ψxA := TrB[ψ

xAB].

However, this strategy can lead to different values for the entanglement ofρAB, depending on the chosen decomposition, because the decomposition ofmixed states into a convex combination of pure states is generally not unique.We can address this issue by minimizing over all possible decompositions,leading to the following definition:

425

Page 437: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Definition 5.3 Entanglement of Formation

The entanglement of formation of a bipartite state ρAB is defined as

EF(ρAB) := inf(p(x),ψx

AB)x∈X

x∈Xp(x)H(ψx

A) : ρAB = ∑x∈X

p(x)ψxAB

,

(5.1.42)where ρAB = ∑x∈X p(x)ψx

AB is a pure-state decomposition of ρAB.

It suffices to take |X| ≤ dim(HAB)2 in the optimization in (5.1.42), and

furthermore, the infimum is achieved by at least one pure-state decomposi-tion. The fact that the alphabet X need not exceed dim(HAB)

2 elements isdue to the entropy being a continuous function and the Fenchel–Eggleston–Carathéodory Theorem (Theorem 2.17) and the fact that dimension of thespace of density operators on a dim(HAB)-dimensional space is dim(HAB)

2.The fact that the infimum is achieved follows because the optimization is withrespect to a compact space and the function being optimized is continuous.

For every pure bipartite state ψAB, the equality in (5.1.40) holds because itis not possible to decompose a pure state ψAB as a mixture of other pure statesdifferent from ψAB. As mentioned above, the entanglement of formation isalso known as the entropy of entanglement for the special case of a pure bipartitestate, due to the fact that it is equal to the entropy of the reduced state.

It is a direct consequence of Definition 5.3 and the non-negativity of quan-tum entropy in (4.2.106) that the entanglement of formation is non-negative,i.e.,

EF(ρAB) ≥ 0 (5.1.43)

for every bipartite state ρAB.

Not only is the entanglement of formation non-negative, but it is alsofaithful; and we can make an even more refined statement about approxi-mate faithfulness (when ρAB is close to separable or when EF(ρAB) is close tozero).

Before establishing this statement in Proposition 5.5 below, we first provea uniform continuity bound for the entanglement of formation. Uniform con-tinuity is a desirable property of an entanglement measure: if two bipartitestates ρAB and σAB are not very distinguishable from each other (according tosome distinguishability measure), then the difference of their entanglement

426

Page 438: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

should be small. (See Section 2.3 for a definition of uniform continuity.)

Proposition 5.4 Uniform Continuity of Entanglement of Formation

Let ρAB and σAB be states satisfying

12‖ρAB − σAB‖1 ≤ ε, (5.1.44)

where ε ∈ [0, 1]. Then the entanglement of formations of ρAB and σABsatisfy

|EF(ρAB)− EF(σAB)| ≤ δ log2 min dA, dB+ g2(δ), (5.1.45)

where δ :=√

ε (2− ε) and g2(x) := (x + 1) log2(x + 1)− x log2 x.

PROOF: By applying Theorem 3.64 to (5.1.44), we find that√

F(ρAB, σAB) ≥ 1− ε, (5.1.46)

which implies that√

1− F(ρAB, σAB) ≤√

1− (1− ε)2 =√

ε (2− ε). (5.1.47)

Let|ψρ〉RAB := ∑

x

√p(x)|x〉R|ψx〉AB. (5.1.48)

be a purification of ρAB, with ρAB = ∑x p(x)ψxAB a pure-state decomposition

of ρAB. By applying Uhlmann’s theorem (Theorem 3.58), there exists a purifi-cation ψσ

RAB of σAB such that F(ψρRAB, ψσ

RAB) = F(ρAB, σAB). By combiningthis observation with (5.1.47), and the fact that the sine distance of two purestates is equal to the normalized trace distance (see (3.5.1)), we conclude that

12

∥∥∥ψρRAB − ψσ

RAB

∥∥∥1≤√

ε (2− ε). (5.1.49)

We now apply the measurement channel MR→X(·) := ∑x |x〉X〈x|R(·)|x〉R〈x|Xto the R systems, as well as the data-processing inequality for the trace dis-tance, to conclude that

12

∥∥∥MR→X(ψρRAB)−MR→X(ψ

σRAB)

∥∥∥1≤√

ε (2− ε). (5.1.50)

427

Page 439: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Consider that

ρXAB := MR→X(ψρRAB) = ∑

xp(x)|x〉〈x|X ⊗ ψx

AB, (5.1.51)

and there exists a probability distribution q(x) and a set ϕxABx, satisfying

σAB = ∑x

q(x)ϕxAB, (5.1.52)

such thatσXAB := MR→X(ψ

σRAB) = ∑

xq(x)|x〉〈x|X ⊗ ϕx

AB. (5.1.53)

Now applying the uniform continuity of conditional mutual information (Propo-sition 4.8), we conclude that

12

I(A; B|X)σ ≤12

I(A; B|X)ρ + δ log2 mindA, dB+ g2(δ), (5.1.54)

with δ =√

ε (2− ε). Since the states of systems AB are pure when condi-tioned on the classical system X, for both ρXAB and σXAB, consider that

12

I(A; B|X)σ = H(A|X)σ,12

I(A; B|X)ρ = H(A|X)ρ. (5.1.55)

So we conclude that

EF(σAB) ≤ H(A|X)σ ≤ H(A|X)ρ + δ log2 mindA, dB+ g2(δ), (5.1.56)

where the first inequality follows from Definition 5.3 and (5.1.52). Since thepure-state decomposition of ρAB is arbitrary, we conclude that

EF(σAB) ≤ EF(ρAB) + δ log2 mindA, dB+ g2(δ). (5.1.57)

Running the argument again, but starting from an arbitrary pure-state de-composition of σAB, we conclude the inequality

EF(ρAB) ≤ EF(σAB) + δ log2 mindA, dB+ g2(δ), (5.1.58)

which, together with (5.1.57), implies (5.1.45).

428

Page 440: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Proposition 5.5 Faithfulness of Entanglement of Formation

The entanglement of formation is faithful, so that EF(ρAB) = 0 if andonly if the state ρAB is separable. More quantitatively, for a state ρABand ε ∈ [0, 1], if

infσAB∈SEP(A:B)

12‖ρAB − σAB‖1 ≤ ε, (5.1.59)

thenEF(ρAB) ≤ δ log2 mindA, dB+ g2(δ), (5.1.60)

where δ :=√

ε (2− ε). Conversely, for ε ≥ 0, if

EF(ρAB) ≤ ε, (5.1.61)

theninf

σAB∈SEP(A:B)

12‖ρAB − σAB‖1 ≤

√ε ln 2. (5.1.62)

PROOF: We begin by proving the first statement. Suppose that the state σABis separable. Then by the remark after Definition 3.8, there exists a pure-statedecomposition of σAB as

σAB = ∑x

p(x)φxA ⊗ ϕx

B. (5.1.63)

For this decomposition, we have that ∑x p(x)H(φxA) = 0 because the quan-

tum entropy is equal to zero for a pure state. This implies by definition thatEF(σAB) = 0. The statement in (5.1.59)–(5.1.60) then follows by combiningthis observation with (5.1.44)–(5.1.45), as well as the fact that the function onthe right-hand side of (5.1.60) is monotone in ε.

To see the second statement, let

ρAB = ∑x

p(x)ψxAB (5.1.64)

be an arbitrary pure-state decomposition of ρAB. By applying the observationin (5.1.55), we find that

∑x

p(x)H(ψxA) =

12 ∑

xp(x)I(A; B)ψx (5.1.65)

429

Page 441: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

=12 ∑

xp(x)D(ψx

AB‖ψxA ⊗ ψx

B) (5.1.66)

≥ 14 ln 2 ∑

xp(x) ‖ψx

AB − ψxA ⊗ ψx

B‖21 (5.1.67)

≥ 14 ln 2

∥∥∥∥∥∑xp(x)ψx

AB −∑x

p(x)ψxA ⊗ ψx

B

∥∥∥∥∥

2

1

(5.1.68)

=1

4 ln 2

∥∥∥∥∥ρAB −∑x

p(x)ψxA ⊗ ψx

B

∥∥∥∥∥

2

1

(5.1.69)

≥ 1ln 2

(inf

σAB∈SEP(A:B)

12‖ρAB − σAB‖1

)2. (5.1.70)

The second equality follows from rewriting the mutual information in termsof relative entropy (see (4.2.95)). The first inequality follows from the quan-tum Pinsker inequality (Corollary 4.30 and the remark thereafter). The secondinequality follows from convexity of the square function and the trace norm.Since the inequality holds for an arbitrary pure-state decomposition of ρAB,we conclude that

EF(ρAB) ≥1

ln 2

(inf

σAB∈SEP(A:B)

12‖ρAB − σAB‖1

)2. (5.1.71)

From this and (5.1.61), we conclude (5.1.62).

Finally, to conclude exact faithfulness from approximate faithfulness, weargue that the infimum in (5.1.59) and (5.1.62) is achieved (i.e., can be replacedwith a minimum). This follows because the trace distance is continuous inσAB and the set of separable states is compact.

As the following proposition indicates, the entanglement of formation isan entanglement measure according to Definition 5.1. The proof detailed be-low is also a good opportunity to apply Lemma 5.2 to a simple example.

Proposition 5.6

The entanglement of formation is convex, so that (5.1.7) holds with E setto EF, and it is a selective LOCC monotone, so that (5.1.14) holds with Eset to EF.

PROOF: By Lemma 5.2, we only need to show that the entanglement of for-mation does not increase under the action of a local channel and is invariant

430

Page 442: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

under classical communication. We begin with the first one. Let ρAB be a bi-partite state, and let NA→A′ be a local quantum channel. Let (p(x), ψx

AB)xbe a pure-state decomposition of ρAB, i.e., satisfying ∑x p(x)ψx

AB = ρAB. LetNA→A′ have the Kraus representation Ny

A→A′y. Then

ωA′B := NA→A′(ρAB) = ∑x

p(x)NA→A′(ψxAB), (5.1.72)

and

NA→A′(ψxAB) = ∑

yNy

A→A′ψxAB(Ny

A→A′)† = ∑

yp(y|x)ϕ

x,yA′B, (5.1.73)

where

p(y|x) := Tr[NyA→A′ψ

xAB(Ny

A→A′)†], (5.1.74)

ϕx,yA′B :=

1p(y|x)Ny

A→A′ψxAB(Ny

A→A′)†. (5.1.75)

Thus, (p(x)p(y|x), ϕx,yA′B)x,y is a pure-state decomposition of ωA′B. Also,

observe that

ψxB = TrA[ψ

xAB] (5.1.76)

= TrA′ [NA→A′(ψxAB)] (5.1.77)

= ∑y

p(y|x)TrA′ [ϕx,yA′B] (5.1.78)

= ∑y

p(y|x)ϕx,yB . (5.1.79)

Then we have that

∑x

p(x)H(ψxB) ≥∑

x,yp(x)p(y|x)H(ϕ

x,yB ) (5.1.80)

≥ EF(A′; B)ω, (5.1.81)

where the first inequality follows from the concavity of entropy (see (4.2.105))and the second from the definition of entanglement of formation. Since theinequality holds for all pure-state decompositions of ρAB, we conclude thedesired inequality:

EF(A; B)ρ ≥ EF(A′; B)ω. (5.1.82)

By flipping the role of Alice and Bob in the analysis above, we conclude thatthe entanglement of formation does not increase under the action of a localchannel on Bob’s system.

431

Page 443: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Now we prove that EF is invariant under classical communication. LetρXAB be the classical–quantum state defined in (5.1.5). A pure-state decom-position of ρXAB has the form (p(x)p(y|x), |x〉〈x|X ⊗ ψ

x,yAB)x,y, where

ρxAB = ∑

yp(y|x)ψx,y

AB. (5.1.83)

This ensemble serves as a decomposition of ρXAB for EF(XA; B)ρ. Then

∑x,y

p(x)p(y|x)H(ψx,yB ) ≥∑

xp(x)EF(A; B)ρx . (5.1.84)

Since the inequality holds for all pure-state decompositions of ρXAB, we con-clude that

EF(XA; B)ρ ≥∑x

p(x)EF(A; B)ρx . (5.1.85)

Now let (p(y|x), ψx,yAB)y be a pure-state decomposition of ρx

AB. Then we findthat

∑x,y

p(x)p(y|x)H(ψx,yB ) ≥ EF(XA; B)ρ (5.1.86)

because the ensemble (p(x)p(y|x), |x〉〈x|X ⊗ ψx,yAB)x,y is a particular pure-

state decomposition of ρXAB. Since the inequality holds for all pure-state de-compositions of ρx

AB, we conclude that

∑x

p(x)EF(A; B)ρx ≥ EF(XA; B)ρ. (5.1.87)

Putting together (5.1.85) and (5.1.87), we conclude that

∑x

p(x)EF(A; B)ρx = EF(XA; B)ρ. (5.1.88)

By the same argument, but exchanging the roles of Alice and Bob, we con-clude that

∑x

p(x)EF(A; B)ρx = EF(A; BX)ρ. (5.1.89)

This concludes the proof.

Proposition 5.7 Subadditivity of Entanglement of Formation

The entanglement of formation is subadditive; i.e., (5.1.9) holds with Eset to EF.

432

Page 444: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

PROOF: Let ∑x p(x)ψxA1B1

and ∑y q(y)φyA2B2

be respective pure-state decom-positions of τA1B1 and ωA2B2 . Then ∑x,y p(x)q(y)ψx

A1B1⊗ φ

yA2B2

is a pure-statedecomposition of τA1B1 ⊗ωA2B2 . It follows that

EF(A1A2; B1B2)τ⊗ω ≤∑x,y

p(x)q(y)H(ψxA1⊗ φ

yA2) (5.1.90)

= ∑x,y

p(x)q(y)[H(ψxA1) + H(φ

yA2)] (5.1.91)

= ∑x

p(x)H(ψxA1) + ∑

yq(y)H(φ

yA2). (5.1.92)

Since the inequality holds for arbitrary pure-state decompositions of τA1B1and ωA2B2 , we conclude that subadditivity holds.

The opposite inequality, superadditivity of entanglement of formation, isknown not to hold in general. Thus, the entanglement of formation is non-additive in general. The proof of this statement is highly nontrivial (pleaseconsult the Bibliographic Notes in Section 5.8).

The entanglement of formation is connected to an information-theoretictask: EF(ρAB) is an achievable rate for the task of preparing the state ρABfrom many copies of the two-qubit maximally entangled state |Φ〉AB whenallowing LOCC for free (please consult the Bibliographic Notes in Section 5.8).

For two-qubit states ρAB, the entanglement of formation has the followinganalytic expression:

EF(ρAB) = h2

(1 +

√1− C(ρAB)2

2

)(two-qubit states). (5.1.93)

(please consult the Bibliographic Notes in Section 5.8.) Here,

C(ρAB) = max0, λ1 − λ2 − λ3 − λ4, (5.1.94)

where λ1, λ2, λ3, λ4 are the eigenvalues of√√

ρAB ρAB√

ρAB in decreasing or-der. The operator ρAB is defined as ρAB := (Y⊗ Y)ρAB(Y⊗ Y), with Y beingthe Pauli-Y operator (see (3.2.106)) and ρAB being the complex conjugate ofρAB in the standard basis.

5.1.1.2 Negativity and Logarithmic Negativity

Similar to how we motivated the entanglement of formation from the Schmidtrank criterion for pure states, we can also motivate an entanglement measure

433

Page 445: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

from the PPT criterion. The PPT criterion states that if the partial transposeTB(ρAB) of a given state ρAB has a negative eigenvalue, then ρAB is entan-gled.2 We use this fact to define the negativity of ρAB as

N(ρAB) :=‖TB(ρAB)‖1 − 1

2, (5.1.95)

and the logarithmic negativity (often written simply as log-negativity) of ρAB as

EN(ρAB) := log2 ‖TB(ρAB)‖1 . (5.1.96)

Both the negativity and the log-negativity quantify the extent to which thepartial transpose TB(ρAB) has negative eigenvalues. In particular, supposethat TB(ρAB) has the following Jordan–Hahn decomposition:

TB(ρAB) = P− N, (5.1.97)

where P and N are the positive and negative parts of TB(ρAB), satisfyingP, N ≥ 0 and PN = 0, and we have used (2.2.51) and (2.2.52). By definitionof the trace norm,

‖TB(ρAB)‖1 = Tr[P + N]. (5.1.98)

On the other hand, observe that Tr[TB(ρAB)] = Tr[ρAB] = 1, so that

1 = Tr[TB(ρAB)] = Tr[P− N]. (5.1.99)

Therefore,

N(ρAB) =‖TB(ρAB)‖1 − 1

2=‖TB(ρAB)‖1 − Tr[TB(ρAB)]

2= Tr[N]. (5.1.100)

So, according to (2.2.52), the negativity is the sum of the absolute values ofthe negative eigenvalues of ρTB

AB.

By utilizing Hölder duality and semi-definite programming duality, it ispossible to write ‖TB(ρAB)‖1 as the following primal and dual semi-definiteprograms:

‖TB(ρAB)‖1 = supRAB

Tr[RABρAB] : −1AB ≤ TB(RAB) ≤ 1AB , (5.1.101)

= infKAB,LAB≥0

Tr[KAB + LAB] : TB(KAB − LAB) = ρAB . (5.1.102)

where the optimization in the first line is with respect to Hermitian RAB. Wegive a proof of (5.1.102)–(5.1.101) in Appendix 5.A.

2Note that it does not matter in which basis the transpose is defined.

434

Page 446: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Proposition 5.8

The log-negativity is non-negative for all bipartite states, and it is faith-ful on the set of PPT states (i.e., it is equal to zero if and only if a stateis PPT).

PROOF: To see the first statement, we note that the choice RAB = 1AB isfeasible for the primal SDP in (5.1.101), so that ‖TB(ρAB)‖1 ≥ 1, and henceEN(ρAB) ≥ 0, for every bipartite state ρAB.

Suppose that ρAB is a PPT state. Then ‖TB(ρAB)‖1 = Tr[TB(ρAB)] =Tr[ρAB] = 1 due to the assumption that TB(ρAB) ≥ 0, implying that EN(ρAB) =0 for every PPT state.

Finally, suppose that EN(ρAB) = 0. Then ‖TB(ρAB)‖1 = 1, and employingthe notation of (5.1.98)–(5.1.99), we conclude that 1 = Tr[P + N] = Tr[P− N],which implies that Tr[N] = 0. Since N ≥ 0, this implies that N = 0. Thus,TB(ρAB) has no negative part and ρAB is thus a PPT state.

Definition 5.9 Selective PPT Monotonicity

As a generalization of selective LOCC monotonicity defined in (5.1.14),we say that a function E : D(HAB) → R is a selective PPT monotone ifit satisfies

E(ρAB) ≥ ∑x∈X:p(x)>0

p(x)E(ωxA′B′), (5.1.103)

for every bipartite state ρAB and C-PPT-P instrument PxAB→A′B′x∈X,

with

p(x) := Tr[PxAB→A′B′(ρAB)], (5.1.104)

ωxA′B′ :=

1p(x)

PxAB→A′B′(ρAB). (5.1.105)

A C-PPT-P instrument is such that every map PxAB→A′B′ is completely

positive and TB′ PxAB→A′B′ TB is completely positive, and the sum

map ∑x∈X PxAB→A′B′ is trace preserving.

It follows that E is a PPT monotone if it is a selective PPT monotone,because the former is a special case of the latter in which the alphabet Xhas only one letter.

435

Page 447: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

An LOCC instrument in (5.1.10) is a C-PPT-P instrument because everymap Lx

AB→A′B′ in an LOCC instrument satisfies the requirements for a C-PPT-P instrument.

The negativity and the log-negativity are entanglement measures, as shownin Proposition 5.10 below. Interestingly, the method of proof does not in-volve making use of Lemma 5.2 because it is impossible to do so for the log-negativity, as the latter is not convex. In any case, we prove a stronger resultthan selective LOCC monotonicity for the log-negativity: we prove that it is aselective PPT monotone.

Proposition 5.10

The negativity and the log-negativity are selective PPT monotones, sat-isfying (5.1.103). The negativity is convex, satisfying (5.1.7), but the log-negativity is not.

PROOF: Define ρAB and (p(x), ωxA′B′)x∈X as in (5.1.104)–(5.1.105), and let

PxAB→A′B′x∈X be a C-PPT-P instrument. Let KAB and LAB be arbitrary posi-

tive semi-definite operators satisfying

TB(KAB − LAB) = ρAB. (5.1.106)

Then we find that

p(x)ωxA′B′ = Px

AB→A′B′(ρAB) (5.1.107)= Px

AB→A′B′(TB(KAB − LAB)) (5.1.108)= TB′(TB′ Px

AB→A′B′ TB)(KAB − LAB)). (5.1.109)

Let us define

KxA′B′ :=

1p(x)

(TB′ PxAB→A′B′ TB)(KAB), (5.1.110)

LxA′B′ :=

1p(x)

(TB′ PxAB→A′B′ TB)(LAB), (5.1.111)

so thatωx

A′B′ = TB′(KxA′B′ − Lx

A′B′). (5.1.112)

Furthermore, since TB′ PxAB→A′B′ TB is completely positive, it follows that

KxA′B′ , Lx

A′B′ ≥ 0. Thus, KxA′B′ and Lx

A′B′ are feasible for the SDP in (5.1.102) for∥∥TB(ωxA′B′)

∥∥1, and we conclude that

Tr[KxA′B′ + Lx

A′B′ ] ≥ ‖TB(ωxA′B′)‖1 . (5.1.113)

436

Page 448: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Then consider that

Tr[KAB + LAB]

= Tr[TB(KAB + LAB)] (5.1.114)

= ∑x∈X:p(x)>0

Tr[PxAB→A′B′(TB(KAB + LAB))] (5.1.115)

= ∑x∈X:p(x)>0

Tr[(TB′ PxAB→A′B′ TB)(KAB + LAB)] (5.1.116)

= ∑x∈X:p(x)>0

p(x)Tr[KxA′B′ + Lx

A′B′ ] (5.1.117)

≥ ∑x∈X:p(x)>0

p(x) ‖TB(ωxA′B′)‖1 . (5.1.118)

The first and third equalities hold because the trace is invariant under a partialtranspose. The second equality follows because the sum map ∑x P

xAB→A′B′ is

trace preserving. The fourth equality follows from the definitions in (5.1.110)–(5.1.111). The inequality follows from (5.1.113). Since the inequality holds forall KAB, LAB ≥ 0 satisfying (5.1.106), we conclude that

‖TB(ρAB)‖1 ≥ ∑x∈X:p(x)>0

p(x) ‖TB(ωxA′B′)‖1 . (5.1.119)

By applying the definition in (5.1.95), we conclude that the negativity is aselective PPT monotone. Now considering (5.1.119) and taking the logarithm,and using its monotonicity and concavity, we conclude that the log-negativityis a selective PPT monotone:

EN(ρAB) = log2 ‖TB(ρAB)‖1 (5.1.120)

≥ log2

x∈X:p(x)>0p(x) ‖TB(ω

xA′B′)‖1

(5.1.121)

≥ ∑x∈X:p(x)>0

p(x) log2 ‖TB(ωxA′B′)‖1 (5.1.122)

= ∑x∈X:p(x)>0

p(x)EN(ωxA′B′). (5.1.123)

That the negativity is convex follows directly from the definition, convex-ity of the trace norm, and linearity of the partial transpose.

437

Page 449: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

The lack of convexity of log-negativity follows from direct evaluation forthe states ΦAB := 1

2 ∑i,j∈0,1 |ii〉〈jj|AB, σAB := 12 ∑i∈0,1 |ii〉〈ii|AB, and ρAB :=

12 (ΦAB + σAB), for which we have

EN(ΦAB) = 1, EN(σAB) = 0, EN(ρAB) = log232

, (5.1.124)

so thatEN(ρAB) >

12(EN(ΦAB) + EN(σAB)) . (5.1.125)

This concludes the proof.

Proposition 5.11 Additivity of Log-Negativity

The logarithmic negativity is additive; i.e., (5.1.8) holds with E set to EN.

PROOF: For every two states τA1B1 and ωA2B2 , consider that

EN(A1A2; B1B2)τ⊗ω = log2

∥∥TB1B2(τA1B1 ⊗ωA2B2)∥∥

1 (5.1.126)

= log2

∥∥TB1(τA1B1)⊗ TB2(ωA2B2)∥∥

1 (5.1.127)

= log2(∥∥TB1(τA1B1)

∥∥1 ·∥∥TB2(ωA2B2)

∥∥1

)(5.1.128)

= log2

∥∥TB1(τA1B1)∥∥

1 + log2

∥∥TB2(ωA2B2)∥∥

1 (5.1.129)

= EN(A1; B1)τ + EN(A2; B2)ω. (5.1.130)

This concludes the proof.

Proposition 5.12 Log-Negativity of Pure Bipartite States

For a pure bipartite state ψAB, the log-negativity is equal to the Rényientropy of order 1

2 of the reduced state ψA:

EN(ψAB) = H12(ψA). (5.1.131)

PROOF: For every pure state ψAB = |ψ〉〈ψ|AB such that

|ψ〉AB =r

∑k=1

√λk|ek〉A ⊗ | fk〉B (5.1.132)

is a Schmidt decomposition, we have that

|ψ〉〈ψ|TBAB =

r

∑k,k′=1

√λkλk′ |ek〉〈ek′ |A ⊗ | fk′〉〈 fk|B, (5.1.133)

438

Page 450: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

where we have taken the partial transpose with respect to the orthonormalset | fk〉Br

k=1. Observe that

|ψ〉〈ψ|TBAB = FAB

(r

∑k′=1

√λk′ |ek′〉〈ek′ |A ⊗

r

∑k=1

√λk| fk〉〈 fk|B

), (5.1.134)

where FAB = ∑rk,k′=1 |ek′〉〈ek|A⊗ | fk〉〈 fk′ |B is a unitary swap operator. Thus, by

unitary invariance of the trace norm, we obtain

EN(ψAB) = log2

(r

∑k=1

√λk

)2

= 2 log2

(r

∑k=1

√λk

). (5.1.135)

By comparing with (4.4.3), we conclude the statement of the proposition.

For a maximally entangled state (λk = 1r for all 1 ≤ k ≤ r), we find that

EN(ψAB) = log2 r, exactly as with the entanglement of formation.

5.1.1.3 Divergence-Based Measures

The two entanglement measures considered above are based on specific math-ematical properties of entanglement. However, using the fact that entangledstates are, by definition, not separable, we can construct a broad class of en-tanglement measures by finding the divergence of a given state ρAB with theset of separable states. This idea is illustrated in Figure 5.1. We primarilyconsider such divergence-based entanglement measures in this book (in the re-search literature, these are also called “distance-based” entanglement mea-sures, even though divergences that are not distances, such as relative en-tropy, are used in this approach).

As an example of a divergence-based entanglement measure, let us con-sider a concrete divergence, the normalized trace distance, which we definedin Section 3.5.1 as 1

2 ‖ρ− σ‖1 for every two states ρ and σ. Mathematically,the distance of a point to a set is defined by finding the element of that setthat is closest to the given point. With this idea, we define the trace distance ofentanglement of a state ρAB as the normalized trace distance from ρAB to theclosest state σAB ∈ SEP(A : B):

ET(A; B)ρ := infσAB∈SEP(A:B)

12‖ρAB − σAB‖1 . (5.1.136)

439

Page 451: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

ρAB σ∗AB

D(HAB)

SEP(A : B)

D(ρAB‖σ∗AB)

FIGURE 5.1: A simple way to measure the entanglement of a bipartite stateρAB is to calculate its divergence with the set of separable states. If weuse a generalized divergence D as our measure, then the measure of theentanglement in ρAB is given by the smallest value of D(ρAB‖σAB), whereσAB ∈ SEP(A : B) is a separable state, i.e., by the quantity D(ρAB‖σ∗AB) =infσAB∈SEP(A:B) D(ρAB‖σAB).

Note that the infimum is indeed achieved, because SEP(A : B) is a compact setand the trace norm is continuous in σAB, so that there always exists a closestseparable state to the given state ρAB. Recall that we implicitly introduced thetrace distance of entanglement in Proposition 5.5, when considering approxi-mate faithfulness of the entanglement of formation.

The quantity ET is indeed an entanglement measure. To see this, we usethe data-processing inequality for the trace distance (Theorem 3.52), and thefact that separable states are preserved under LOCC channels (which followsimmediately from the definition of LOCC channels). Then, for every stateρAB, every LOCC channel LAB→A′B′ , and letting ωA′B′ = LAB→A′B′(ρAB), weobtain

ET(A; B)ρ = infσAB∈SEP(A:B)

12‖ρAB − σAB‖1 (5.1.137)

≥ infσAB∈SEP(A:B)

12‖LAB→A′B′(ρAB)−LAB→A′B′(σAB)‖1 (5.1.138)

≥ infτA′B′∈SEP(A′:B′)

12‖LAB→A′B′(ρAB)− τA′B′‖1 (5.1.139)

= ET(A′; B′)ω. (5.1.140)

Although the simple proof above makes it clear that the trace distance of en-

440

Page 452: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

tanglement is an LOCC monotone, it is known that the trace distance of en-tanglement is not a selective LOCC monotone, as defined in (5.1.14) (pleaseconsult the Bibliographic Notes in Section 5.8).

The trace distance of entanglement is also faithful, which is due to the factthat the trace distance is a metric in the mathematical sense: 1

2 ‖ρAB − σAB‖1 ≥0 for all states ρAB, σAB, and 1

2 ‖ρAB − σAB‖1 = 0 if and only if ρAB = σAB.

Beyond the trace distance, we can take any distinguishability measure anddefine an entanglement measure analogous to the one in (5.1.136). That is, wecan take any generalized divergence D as our divergence. Recall from Defini-tion 4.13 that a generalized divergence is a function D : D(H) × L+(H) →R ∪ +∞ that obeys the data-processing inequality. We then define the gen-eralized divergence of entanglement of ρAB as follows:

E(A; B)ρ := infσAB∈SEP(A:B)

D(ρAB‖σAB). (5.1.141)

See Figure 5.1 for a visual depiction of the idea behind this quantity. If thegeneralized divergence D is continuous in its second argument, then the infi-mum in (5.1.141) is achieved. We study this entanglement measure in muchmore detail in Section 5.2. By the data-processing inequality for the gener-alized divergence, as well as the fact that separable states are preserved un-der LOCC channels, it follows that E(A; B)ρ is an entanglement measure. Weprove this and other properties of the generalized divergence of entanglementin Proposition 5.16. As was the case for the trace distance of entanglement, itdoes not necessarily follow that the generalized divergence of entanglementis a selective LOCC monotone, as defined in (5.1.14), but it does hold for someimportant cases.

Although the generalized divergence of entanglement of a bipartite stateis conceptually simple, it is in general difficult to optimize over the set of sep-arable states because it does not have a simple characterization (except in lowdimensions). This means that the generalized divergence of entanglement isdifficult to compute in most cases.

To obtain an entanglement measure that is simpler to compute, one idea isto relax the optimization in (5.1.141) from the set of separable states to someother set of states that contains the set of separable states. It is ideal if thisother set is easier to characterize than the set of separable states. As a firststep, let us recall the PPT criterion from Section 3.1.1.5, which states that ifa bipartite state is separable, then it is PPT, meaning that it has positive par-tial transpose (recall Definition 3.10). This fact immediately leads to the con-

441

Page 453: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

tainment SEP(A : B) ⊆ PPT(A : B). (As stated in Section 3.1.1.5, if both Aand B are qubits, or if one of them is a qubit and the other a qutrit, thenPPT(A : B) = SEP(A : B).) We can thus define the PPT generalized divergence ofa bipartite state ρAB as

EPPT(A; B)ρ := infσAB∈PPT(A:B)

D(ρAB‖σAB). (5.1.142)

If the generalized divergence D is continuous in its second argument, thenthe infimum is achieved. Also, note that EPPT(A; B)ρ = E(A; B)ρ when bothA and B are qubits or when one of them is a qubit and the other a qutrit.

Like the generalized divergence of entanglement, the PPT divergence isan entanglement measure. In fact, the PPT divergence is monotone underC-PPT-P channels, as defined in Definition 3.42. This is due to the data-processing inequality for the generalized divergence and the fact that the setof PPT states is closed under C-PPT-P channels (see Proposition 3.43). Sincethe set of LOCC channels is contained in the set of C-PPT-P channels (Propo-sitions 3.39 and 3.44), it follows that the PPT divergence is an entanglementmeasure.

Unlike the generalized divergence of entanglement, the PPT divergenceis not a faithful entanglement measure. It is true that EPPT(A; B)ρ = 0 for allseparable states ρAB due to the containment

SEP(A : B) ⊆ PPT(A : B). (5.1.143)

However, the converse statement is not true because the infimum in (5.1.142),if achieved, need not be achieved by a separable state. In other words, thereexist PPT entangled states ρAB for which EPPT(A; B)ρ = 0.

It turns out to be useful to relax the set of PPT states further:

Definition 5.13 PPT’

Let PPT′(A : B) denote the following convex set of positive semi-definiteoperators:

PPT′(A : B) := σAB : σAB ≥ 0, ‖TB(σAB)‖1 ≤ 1 . (5.1.144)

Convexity of the set PPT′(A : B) follows from convexity of the trace norm.Furthermore, the set PPT′(A : B) contains the set of PPT states because everyPPT state σAB satisfies ‖TB(σAB)‖1 = 1. Furthermore, every operator σAB ∈

442

Page 454: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

PPT(A : B) SEP(A : B)PPT′(A : B)

FIGURE 5.2: The set SEP(A : B) of separable states acting on the Hilbertspace HAB is contained in the set PPT(A : B) of positive partial transpose(PPT) states, which in turn is contained in the set PPT′(A : B) of operatorsdefined in (5.1.144). The sets PPT and PPT′ are relaxations of the set ofseparable states that can be easily characterized in terms of semi-definiteconstraints.

PPT′(A : B) is subnormalized, satisfying Tr[σAB] ≤ 1, which follows because

Tr[σAB] = Tr[TB(σAB)] ≤ ‖TB(σAB)‖1 ≤ 1. (5.1.145)

The set PPT′(A : B) can be written equivalently as

PPT′(A : B) := σAB : σAB ≥ 0, EN(σAB) ≤ 0 , (5.1.146)

by inspecting the formula for log-negativity in (5.1.96). By comparing with(3.1.79), we clearly have the containment

SEP(A : B) ⊆ PPT(A : B) ⊆ PPT′(A : B). (5.1.147)

See Figure 5.2 for a visual depiction of this containment.

We define the generalized Rains divergence of a bipartite state ρAB as

R(A; B)ρ := infσAB∈PPT′(A:B)

D(ρAB‖σAB). (5.1.148)

If the underlying generalized divergence D is continuous in its second argu-ment, then the infimum is achieved in (5.1.148). Like the entanglement mea-sures E(A; B)ρ and EPPT(A; B)ρ, the generalized Rains divergence is an entan-glement measure. In fact, it is monotone under C-PPT-P channels, which fol-lows from the data-processing inequality for the generalized divergence and

443

Page 455: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

because the set PPT′ is preserved under C-PPT-P channels (a consequence ofLemma 5.14 below). Since the set of LOCC channels is contained in the set ofC-PPT-P channels (Propositions 3.39 and 3.44), it follows that the generalizedRains divergence is an entanglement measure.

As with the PPT divergence, the generalized Rains divergence is not afaithful entanglement measure. It is true that R(A; B)ρ = 0 for all separablestates ρAB, due to the containment SEP(A : B) ⊆ PPT′(A : B). However, theconverse statement is not true because the infimum in (5.1.148) need not beachieved by a separable state.

Depending on the form of the generalized divergence D, the relaxationfrom the set SEP to the set PPT′ leads to an entanglement measure that can becomputed efficiently via semi-definite programming (Section 2.4). We inves-tigate one such example of an entanglement measure in Section 5.3.1. Also,due to the containments in (5.1.147), we have that

E(A; B)ρ ≥ EPPT(A; B)ρ ≥ R(A; B)ρ, (5.1.149)

for every bipartite state ρAB. Thus, as we show later in the book, the relaxationfrom SEP to PPT′ via the generalized Rains divergence not only allows for thepossibility of efficiently computable entanglement measures, but due to theinequality in (5.1.149), it also allows for the possibility of obtaining a tighterupper bound on communication rates in certain scenarios. We investigate theproperties of the generalized Rains divergence in detail in Section 5.3.

Before proceeding, let us state some properties of the set PPT′.

Lemma 5.14 Properties of the Set PPT′

The set PPT′(A : B) defined in (5.1.144) has the following properties:1. It is closed under completely PPT-preserving channels (recall Def-

inition 3.42). In more detail, let PAB→A′B′ be a completely PPT-preserving channel. Then, for every state ρAB ∈ PPT′(A : B), wehave that PAB→A′B′(ρAB) ∈ PPT′(A′ : B′).

2. It is closed under LOCC channels (recall Definition 3.37). In moredetail, let LAB→A′B′ be an LOCC channel. Then, for every stateρAB ∈ PPT′(A : B), it holds that LAB→A′B′(ρAB) ∈ PPT′(A′ : B′).

444

Page 456: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

REMARK: We emphasize that not all operators in the set PPT′ are quantum states, mean-ing that not all operators σAB ∈ PPT′(A : B) satisfy Tr[σAB] = 1.

PROOF:

1. Let σA′B′ = PAB→A′B′(ρAB). Since PAB→A′B′ is a channel, and ρAB is astate, we have that σA′B′ ≥ 0. Then,

TB′(σA′B′) = (TB′ PAB→A′B′)(ρAB) (5.1.150)= (TB′ PAB→A′B′ TB TB)(ρAB) (5.1.151)= (TB′ PAB→A′B′ TB)(TB(ρAB)). (5.1.152)

Now, consider that the induced trace norm of TB′ PAB→A′B′ TB satisfies‖TB′ PAB→A′B′ TB‖1 = 1, which follows from (2.2.159)–(2.2.160) andthe fact that TB′ PAB→A′B′ TB is a channel by definition of PAB→A′B′ .Furthermore, we have that ‖TB(ρAB)‖1 ≤ 1 because ρAB ∈ PPT′(A : B).Putting these observations together, we find that

‖TB′(σA′B′)‖1 = ‖(TB′ PAB→A′B′ TB)(TB(ρAB))‖1 (5.1.153)≤ ‖TB′ PAB→A′B′ TB‖1 ‖TB(ρAB)‖1 (5.1.154)≤ 1. (5.1.155)

Therefore, σA′B′ ∈ PPT′(A : B).

2. Since every LOCC channel is a completely PPT-preserving channel (Propo-sitions 3.39 and 3.44), the result immediately follows from the proof above.

5.1.1.4 Squashed Entanglement

In the previous example, we considered the generalized divergence of entan-glement, which is simply the generalized divergence of a given bipartite statewith the set of separable states. If we restrict ourselves to product states only,i.e., states of the form τA⊗ σB, and we consider the quantum relative entropy,then we obtain

infτA,σB

D(ρAB‖τA ⊗ σB) = I(A; B)ρ, (5.1.156)

where we recall the expression in (4.2.98) for the mutual information I(A; B)ρ

of the state ρAB. Thus, the mutual information is the minimal relative en-tropy between the state of interest and the set of product states. We can thusview the mutual information as a measure of the correlations contained in the

445

Page 457: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

bipartite state ρAB. However, the mutual information quantifies both classi-cal and quantum correlations because it detects any correlation whatsoever.Thus, it cannot be the case that the mutual information is an entanglementmeasure, because it is strictly positive even for some separable states thathave only classical correlations, and so it can in general increase under LOCCchannels.

Regardless, we can still use the mutual information in a meaningful wayto quantify entanglement. For example, suppose that σAB is a separable state,so that

σAB = ∑x∈X

p(x)ρxA ⊗ τx

B , (5.1.157)

for a finite alphabet X, probability distribution p : X→ [0, 1], and sets ρxAx∈X,

τxBx∈X of states. Let us form the following extension of σAB to a state ωABX,

with X a classical register:

ωABX = ∑x∈X

p(x)ρxA ⊗ τx

B ⊗ |x〉〈x|X. (5.1.158)

This is indeed an extension because TrX[ωABX] = σAB. Let us now considerthe conditional mutual information I(A; B|X)ω of this extension (recall thedefinition of the quantum conditional mutual information in (4.1.11)). SinceωABX is a classical–quantum state, it follows that

I(A; B|X)ω = ∑x∈X

p(x)I(A; B)ρx⊗τx = 0. (5.1.159)

Therefore, while the mutual information of a separable state can in generalbe non-zero, the conditional mutual information is always zero. Intuitively,this is due to the fact that the classical system acts as a “probe,” which, whenmeasured, reveals a value x ∈ X for the classical system X. Conditioned onthis value, the joint state ρx

A ⊗ τxB is product.

Thus, for every separable state ρAB, there exists a classical extension ωABXsuch that the conditional mutual information I(A; B|X)ω is equal to zero. Us-ing this idea, we could propose a potential measure of entanglement as fol-lows:

12

infωABX

I(A; B|X)ω : TrX[ωABX] = ρAB , (5.1.160)

where the optimization is with respect to extensions of ρAB having a classicalextension system X of arbitrary (finite) dimension |X|, so that

ωABX = ∑x∈X

p(x)ρxAB ⊗ |x〉〈x|X (5.1.161)

446

Page 458: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

for some set ρxABx∈X of states and a probability distribution p(x) satisfy-

ing ρAB = ∑x∈X p(x)ρxAB. The normalization factor of 1

2 is there for reasonsthat become apparent later. If we require that every state ρx

AB in the extensionωABX should be pure, then the measure in (5.1.160) reduces to the entangle-ment of formation (this was actually used in (5.1.55) in the proof of Proposi-tion 5.4).

The quantity proposed in (5.1.160) is non-negative for every state ρAB,due to the non-negativity of mutual information and the fact that conditionalmutual information with a classical conditioning system is equal to a con-vex combination of mutual informations. It is already clear that the quantityproposed in (5.1.160) is equal to zero for every separable state—if a state isseparable, then the optimization in (5.1.160) finds the separable decomposi-tion and the value of the quantity is zero, as discussed just after (5.1.159). Theconverse is also true, which follows from the same proof given for (5.1.61)–(5.1.62). It is actually also possible to show that the quantity in (5.1.160) is anentanglement measure. However, we do not make further use of this quan-tity in this book, because there is an entanglement measure more suitable forour purposes, as introduced below.

Instead of taking a classical extension of the separable state σAB in (5.1.157),as we did in (5.1.158), we can take a “quantum extension,” in the sense thatwe could define an extension ωABE in which the system E is some finite-dimensional quantum system. Optimizing over all such extensions, we ob-tain the squashed entanglement:

Esq(A; B)ρ :=12

infωABEI(A; B|E)ω : TrE[ωABE] = ρAB, (5.1.162)

which can only be smaller than the quantity proposed in (5.1.160). Note thatwe optimize with respect to extensions ωABE for which the extension systemE can have arbitrary finite dimension. It is not known whether the optimiza-tion can be restricted to extension systems of a certain fixed dimension. Ingeneral, therefore, it is not known whether the infimum in (5.1.162) can bereplaced by a minimum.

The squashed entanglement is indeed an entanglement measure, as weshow in Section 5.4. It is also a faithful entanglement measure. That it van-ishes for separable states follows from the arguments presented above. Fora proof of the converse direction, please consult the Bibliographic Notes inSection 5.8. We establish other important properties of the squashed entan-glement in Section 5.4.

447

Page 459: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

5.2 Generalized Divergence of Entanglement

In this section, we investigate properties of the generalized divergence of en-tanglement, which is a general construction of an entanglement measure. Letus recall from (5.1.141) above that the generalized divergence of entanglementof a bipartite state is the generalized divergence between that state and the setof separable states.

Definition 5.15 Generalized Divergence of Entanglement

Let D be a generalized divergence (see Definition 4.13). For every bipar-tite state ρAB, we define the generalized divergence of entanglement of ρABas

E(A; B)ρ := infσAB∈SEP(A:B)

D(ρAB‖σAB). (5.2.1)

If the underlying generalized divergence D is continuous in its secondargument, then the infimum can be replaced by a minimum.

We are particularly interested throughout the rest of this book in the fol-lowing generalized divergences of entanglement for every state ρAB:

1. The relative entropy of entanglement of ρAB,

ER(A; B)ρ := infσAB∈SEP(A:B)

D(ρAB‖σAB), (5.2.2)

where D(ρAB‖σAB) is the quantum relative entropy of ρAB and σAB (Def-inition 4.1).

2. The ε-hypothesis testing relative entropy of entanglement of ρAB,

EεH(A; B)ρ := inf

σAB∈SEP(A:B)Dε

H(ρAB‖σAB), (5.2.3)

where DεH(ρAB‖σAB) is the ε-hypothesis testing relative entropy of ρAB

and σAB (Definition 4.62).

3. The sandwiched Rényi relative entropy of entanglement of ρAB,

Eα(A; B)ρ := infσAB∈SEP(A:B)

Dα(ρAB‖σAB), (5.2.4)

where Dα(ρAB‖σAB), α ∈ [1/2, 1) ∪ (1, ∞), is the sandwiched Rényi rel-ative entropy of ρAB and σAB (Definition 4.26). Note that Eα(A; B)ρ is

448

Page 460: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

monotonically increasing in α for all ρAB (see Proposition 4.29). This fact,along with the fact that limα→1 Dα = D (see Proposition 4.28), leads to

limα→1

Eα(A; B)ρ = ER(A; B)ρ (5.2.5)

for every state ρAB. See Appendix 5.B for details of the proof.

4. The max-relative entropy of entanglement of ρAB,

Emax(A; B)ρ := infσAB∈SEP(A:B)

Dmax(ρAB‖σAB), (5.2.6)

where Dmax(ρAB‖σAB) is the max-relative entropy of ρAB and σAB (Defi-nition 4.56). Using the fact that limα→∞ Dα = Dmax (see Proposition 4.58),we find that

Emax(A; B)ρ = limα→∞

Eα(A; B)ρ (5.2.7)

for every state ρAB. See Appendix 5.B for details of the proof. As a conse-quence of this fact, and the fact that Eα(A; B)ρ is monotonically increasingin α for all ρAB, we have that

Emax(A; B)ρ ≥ Eα(A; B)ρ (5.2.8)

for all α ∈ (1, ∞) and every state ρAB.

In addition to the quantities above, we also can define the Petz– and geo-metric Rényi relative entropies of entanglement in a similar way, but based onDefinitions 4.18 and 4.36, respectively, for the range of α for which data pro-cessing holds. These are denoted by Eα(A; B)ρ and Eα(A; B)ρ, respectively.

Proposition 5.16 Properties of Generalized Divergence of Entangle-ment

Let D be a generalized divergence that is continuous in its secondargument, and consider the generalized divergence of entanglementE(A; B)ρ of a state ρAB, as defined in (5.2.1).

1. Separable monotonicity: For every separable channel SAB→A′B′ ,the generalized divergence of entanglement is monotone non-increasing:

E(A; B)ρ ≥ E(A′; B′)ω, (5.2.9)

where ωA′B′ = SAB→A′B′(ρAB). Since every LOCC channel is a sep-

449

Page 461: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

arable channel, the generalized divergence of entanglement is alsomonotone non-increasing under LOCC channels. It is therefore anentanglement measure as per Definition 5.1.

2. Faithfulness: If D satisfies D(ρ‖σ) ≥ 0 and D(ρ‖σ) = 0 if and onlyif ρ = σ (for all states ρ and σ), then E(A; B)ρ = 0 if and only ifρAB ∈ SEP(A : B). The generalized divergence of entanglement isthen a faithful entanglement measure.

3. Subadditivity: If D is additive for product positive semi-definite op-erators, i.e., D(ρ ⊗ ω‖σ ⊗ τ) = D(ρ‖σ) + D(ω‖τ), then for everytwo quantum states ρA1B1 and ωA2B2 , the generalized divergence ofentanglement is sub-additive:

E(A1A2; B1B2)ρ⊗ω ≤ E(A1; B1)ρ + E(A2; B2)ω. (5.2.10)

4. Convexity: If D is jointly convex, meaning that

D

(∑

x∈Xp(x)ρx

AB

∥∥∥∥∥ ∑x∈X

p(x)σxAB

)≤ ∑

x∈Xp(x)D(ρx

AB‖σxAB), (5.2.11)

for every finite alphabet X, probability distribution p : X → [0, 1],and sets ρx

ABx∈X, σxABx∈X of states, then the generalized diver-

gence of entanglement is convex:

E(A; B)ρ ≤ ∑x∈X

p(x)E(A; B)ρx , (5.2.12)

where ρAB = ∑x∈X p(x)ρxAB.

Properties 1., 2., 3. are satisfied when the generalized divergence is thequantum relative entropy, the Petz–, sandwiched, and geometric Rényirelative entropy, and the max-relative entropy. Property 4. is satisfiedwhen the generalized divergence is the quantum relative entropy andthe Petz–, sandwiched, and geometric Rényi relative entropies for therange of α < 1 for which data processing holds.

PROOF:

450

Page 462: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

1. For ωA′B′ = SAB→A′B′(ρAB), we have by definition,

E(A′; B′)ω = infτA′B′∈SEP(A′:B′)

D(ωA′B′‖τA′B′) (5.2.13)

= infτA′B′∈SEP(A′:B′)

D(SAB→A′B′(ρAB)‖τA′B′). (5.2.14)

Now, recall that every separable channel SAB→A′B′ takes σAB ∈ SEP(A : B)to a state in SEP(A′ : B′), as shown already in (3.2.290)–(3.2.291). There-fore, restricting the optimization in (5.2.14) leads to

E(A′; B′)ω ≤ infσAB∈SEP(A:B)

D(SAB→A′B′(ρAB)‖SAB→A′B′(σAB)) (5.2.15)

≤ infσAB∈SEP(A:B)

D(ρAB‖σAB) (5.2.16)

= E(A; B)ρ, (5.2.17)

as required, where we used the data-processing inequality for the gener-alized divergence to obtain the second inequality.

2. We haveE(A; B)ρ = inf

σAB∈SEP(A:B)D(ρAB‖σAB). (5.2.18)

If ρAB ∈ SEP(A : B), then the state ρAB itself achieves the minimum in(5.2.18) because D(ρAB‖ρAB) = 0. We thus have E(A; B)ρ = 0. On theother hand, if E(A; B)ρ = 0, then there exists a separable state σ∗AB suchthat D(ρAB‖σ∗AB) = 0, which by assumption implies that ρAB = σ∗AB, i.e.,that ρAB is separable.

3. By definition, the optimization in the definition of E(A1A2; B1B2)ρ⊗ω isover the set SEP(A1A2 : B1B2). It is straightforward to see that this setcontains states of the form ξA1B1 ⊗ τA2B2 , where ξA1B1 ∈ SEP(A1 : B1) andτA2B2 ∈ SEP(A2 : B2). By restricting the optimization to such states, andby using additivity of the generalized divergence D, we obtain

E(A1A2; B1B2)ρ⊗ω ≤ D(ρA1B1 ⊗ωA2B2‖ξA1B1 ⊗ τA2B2) (5.2.19)= D(ρA1B1‖ξA1B1) + D(ωA2B2‖τA2B2). (5.2.20)

Since ξA1B1 ∈ SEP(A1 : B1) and τA2B2 ∈ SEP(A2 : B2) are arbitrary, theinequality in (5.2.10) follows.

4. We have

E(A; B)ρ = infσAB∈SEP(A:B)

D

(∑

x∈Xp(x)ρx

AB

∥∥∥∥∥σAB

). (5.2.21)

451

Page 463: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Let us restrict the optimization over all separable states to an optimiza-tion over sets σx

ABx∈X of separable states indexed by the alphabet X.Then, ∑x∈X p(x)σx

AB is a separable state because the set of separable statesis convex. Therefore, using the joint convexity of D, we obtain

E(A; B)ρ ≤ infσx

ABx⊂SEP(A:B)D

(∑

x∈Xp(x)ρx

AB

∥∥∥∥∥ ∑x∈X

p(x)σxAB

)(5.2.22)

≤ infσx

ABx⊂SEP(A:B)∑

x∈Xp(x)D(ρx

AB‖σxAB) (5.2.23)

≤ ∑x∈X

p(x) infσx

AB∈SEP(A:B)D(ρx

AB‖σxAB) (5.2.24)

= ∑x∈X

p(x)E(A; B)ρx , (5.2.25)

as required.

We now delve a bit more into particular examples of the generalized di-vergence of entanglement, which are based on the relative entropy and thePetz–, sandwiched, and geometric Rényi relative entropies.

Proposition 5.17

The relative entropy of entanglement is invariant under classical com-munication; i.e., (5.1.6) holds with E set to ER.

PROOF: Let ρXAB be a classical–quantum state of the form in (5.1.5). Letσx

ABx∈X be an arbitrary set of separable states, and set

σXAB := ∑x∈X

p(x)|x〉〈x|X ⊗ σxAB. (5.2.26)

Consider that

ER(XA; B)ρ = infσXAB∈SEP(XA:B)

D(ρXAB‖σXAB) (5.2.27)

≤ D(ρXAB‖σXAB) (5.2.28)

= ∑x∈X

p(x)D(ρxAB‖σx

AB), (5.2.29)

where the last equality follows from the direct-sum property of relative en-tropy in (4.2.27). Since the inequality holds for every set σx

ABx∈X of separa-ble states, we conclude that

ER(XA; B)ρ ≤ ∑x∈X

p(x)ER(A; B)ρx . (5.2.30)

452

Page 464: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Now suppose that σXAB is an arbitrary separable state of the systemsXA|B (here we assume that the system X is not necessarily classical). Afterperforming the completely dephasing channel ∆X(·) := ∑x∈X |x〉〈x|X(·)|x〉〈x|Xon the X system of σXAB, the resulting state is a classical–quantum state of thefollowing form:

∆X(σXAB) = ∑x∈X

q(x)|x〉〈x|X ⊗ σxAB, (5.2.31)

where q(x) is a probability distribution and σxABx∈X is a set of separable

states. Then consider that

D(ρXAB‖σXAB) ≥ D(∆X(ρXAB)‖∆X(σXAB)) (5.2.32)

= ∑x∈X

p(x)D(ρxAB‖σx

AB) + D(p‖q) (5.2.33)

≥ ∑x∈X

p(x)D(ρxAB‖σx

AB) (5.2.34)

≥ ∑x∈X

p(x)ER(A; B)ρx . (5.2.35)

The first inequality follows from the data-processing inequality for relativeentropy (Theorem 4.4). The equality follows from the direct-sum propertyof relative entropy in (4.2.27). The second inequality follows from the non-negativity of the classical relative entropy D(p‖q). The final inequality fol-lows from the definition of the relative entropy of entanglement and the factthat σx

AB is separable. Since the chain of inequalities holds for every separablestate σXAB, we conclude that

ER(XA; B)ρ ≥ ∑x∈X

p(x)ER(A; B)ρx . (5.2.36)

Putting together (5.2.30) and (5.2.36), and noting that the same argument ap-plies when exchanging the roles of Alice and Bob, we conclude the statementof the proposition.

As an immediate corollary of Proposition 5.17, Lemma 5.2, and Prop-erty 1. of Proposition 5.16, we conclude that the relative entropy of entangle-ment is a selective LOCC monotone. However, we can conclude somethingstronger, which is what we prove in Proposition 5.19 below after definingselective separable monotonicity.

453

Page 465: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Definition 5.18 Selective Separable Monotonicity

As a generalization of selective LOCC monotonicity defined in (5.1.14)and in the spirit of the selective PPT monotonicity from Definition 5.9,we say that a function E : D(HAB) → R is a selective separable mono-tone if it satisfies

E(ρAB) ≥ ∑x∈X:p(x)>0

p(x)E(ωxA′B′), (5.2.37)

for every bipartite state ρAB and separable instrument SxAB→A′B′x∈X,

with

p(x) := Tr[SxAB→A′B′(ρAB)], (5.2.38)

ωxA′B′ :=

1p(x)

SxAB→A′B′(ρAB). (5.2.39)

A separable instrument is such that every map SxAB→A′B′ is completely

positive and separable (with Kraus operators of the form in (3.2.289)),and the sum map ∑x∈X Sx

AB→A′B′ is trace preserving.It follows that E is a separable monotone if it is a selective separablemonotone, because the former is a special case of the latter in which thealphabet X has only one letter.

Proposition 5.19 Selective Separable Monotonicity of Relative En-tropies of Entanglement

The relative entropy of entanglement is a selective separable monotone;i.e., (5.2.37) holds with E set to ER. The Petz–, sandwiched, and geo-metric Rényi relative entropies of entanglement are selective separablemonotones for the range α > 1 for which data processing holds.

PROOF: Let us begin with the relative entropy of entanglement. Let ρAB bean arbitrary bipartite state, let Sx

AB→A′B′x∈X be a separable instrument, andlet SAB→XA′B′ denote the following quantum channel:

SAB→XA′B′(ωAB) := ∑x∈X|x〉〈x|X ⊗ Sx

AB→A′B′(ωAB). (5.2.40)

454

Page 466: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Let τXA′B′ := SAB→XA′B′(ρAB), and note that

τXA′B′ = ∑x∈X

p(x)|x〉〈x|X ⊗ τxA′B′ , (5.2.41)

for some probability distribution p(x) and set τxA′B′x∈X of states. Then con-

sider that

ER(A; B)ρ ≥ ER(XA′; B′)τ (5.2.42)

= ∑x∈X

p(x)ER(A′; B′)τx . (5.2.43)

The inequality follows because ER is monotone under separable channels(Property 1. of Proposition 5.16), and the equality follows from Proposition 5.17.

Let us now consider proving the statement for the Petz–Rényi relative en-tropy of entanglement for α ∈ (1, 2]. Consider the same channel SAB→XA′B′and state τXA′B′ defined above. Let σAB be an arbitrary separable state, andconsider that

SAB→XA′B′(σAB) = ∑x∈X

q(x)|x〉〈x|X ⊗ σxA′B′ , (5.2.44)

for some probability distribution q(x) and set σxA′B′x∈X of separable states.

Then we find that

Qα(ρAB‖σAB) ≥ Qα(SAB→XA′B′(ρAB)‖SAB→XA′B′(σAB)) (5.2.45)

= ∑x∈X:p(x)>0

p(x)αq(x)1−αQα(τxA′B′‖σx

A′B′) (5.2.46)

= ∑x∈X:p(x)>0

p(x)(

p(x)q(x)

)α−1

Qα(τxA′B′‖σx

A′B′). (5.2.47)

The first inequality follows from the data-processing inequality for the Petz–Rényi relative quasi-entropy (Theorem 4.22), and the first equality followsfrom its direct-sum property (see (4.4.46)). Now applying the monotonicityand concavity of the function (·)→ 1

α−1 log2(·) for α ∈ (1, 2], we find that

Dα(ρAB‖σAB)

=1

α− 1log2 Qα(ρAB‖σAB) (5.2.48)

≥ 1α− 1

log2

x∈X:p(x)>0p(x)

(p(x)q(x)

)α−1

Qα(τxA′B′‖σx

A′B′)

(5.2.49)

455

Page 467: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

≥ ∑x∈X:p(x)>0

p(x)1

α− 1log2

[(p(x)q(x)

)α−1

Qα(τxA′B′‖σx

A′B′)

](5.2.50)

= ∑x∈X:p(x)>0

p(x)[

log2

(p(x)q(x)

)+ Dα(τ

xA′B′‖σx

A′B′)

](5.2.51)

= D(p‖q) + ∑x∈X:p(x)>0

p(x)Dα(τxA′B′‖σx

A′B′) (5.2.52)

≥ ∑x∈X:p(x)>0

p(x)Eα(A′; B′)τx . (5.2.53)

The final two equalities follow by direct evaluation and applying definitions.The final inequality follows because D(p‖q) ≥ 0 for probability distributionsp and q, and it also follows from the definition of the Petz–Rényi relativeentropy of entanglement and the fact that the state σx

A′B′ is separable. Sincethe inequality holds for every separable state σAB, we conclude the desiredinequality:

Eα(A; B)ρ ≥ ∑x∈X:p(x)>0

p(x)Eα(A′; B′)τx . (5.2.54)

By applying the same method of proof for the sandwiched and geomet-ric Rényi relative entropies for the range of α > 1 for which data processingholds, along with their data processing and direct-sum properties, we con-clude the same inequality for the sandwiched and geometric Rényi relativeentropies of entanglement.

The following additional facts are known specifically about the relativeentropy of entanglement and the Petz–, sandwiched, and geometric Rényirelative entropies of entanglement.

Proposition 5.20

1. For every bipartite state ρAB,

ER(A; B)ρ ≥ maxI(A〉B)ρ, I(B〉A)ρ, (5.2.55)Eα(A; B)ρ ≥ maxIα(A〉B)ρ, Iα(B〉A)ρ, (5.2.56)

Eα(A; B)ρ ≥ max Iα(A〉B)ρ, Iα(B〉A)ρ (5.2.57)

Eα(A; B)ρ ≥ max Iα(A〉B)ρ, Iα(B〉A)ρ, (5.2.58)

where the last three inequalities hold for the range of α for whichdata processing holds.

456

Page 468: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

2. For every pure bipartite state ψAB,

ER(A; B)ψ = H(A)ψ, (5.2.59)Eα(A; B)ψ = H 1

α(A)ψ, (5.2.60)

Eα(A; B)ψ = H α2α−1

(A)ψ, (5.2.61)

Eα(A; B)ψ = H0(A)ψ, (5.2.62)

where the last three equalities hold for the range of α for which dataprocessing holds.

REMARK: Observe that for pure states, the relative entropy of entanglement is equal tothe entanglement of formation (see (5.1.40)).

PROOF:

1. Let σAB be an arbitrary separable state, which can be written as

σAB = ∑x∈X

p(x)ωxA ⊗ τx

B , (5.2.63)

where X is some finite alphabet p : X→ [0, 1] is a probability distribution,and ωx

Ax∈X, τxBx∈X are sets of states. Since every ωx

A is a state (inparticular, since all of its eigenvalues are bounded from above by one),the inequality ωx

A ≤ 1A holds, which implies that 1A ⊗ τxB ≥ ωx

A ⊗ τxB for

all x ∈ X, which thus implies that

1A ⊗ σB = ∑x∈X

p(x)1A ⊗ τxB ≥ ∑

x∈Xp(x)ωx

A ⊗ τxB = σAB. (5.2.64)

Therefore, using property 2.(d) in Proposition 4.3, we have that

D(ρAB‖σAB) ≥ D(ρAB‖1A ⊗ σB) (5.2.65)≥ inf

σBD(ρAB‖1A ⊗ σB) (5.2.66)

= I(A〉B)ρ (5.2.67)

for all separable states, where the optimization is with respect to everystate σB on the right-hand side, and where we have used the expressionin (4.2.91) for coherent information. We thus have

ER(A; B)ρ = infσAB∈SEP(A:B)

D(ρAB‖σAB) ≥ I(A〉B)ρ. (5.2.68)

457

Page 469: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

By the same argument, but flipping the roles of Alice and Bob, we con-clude that

ER(A; B)ρ ≥ I(B〉A)ρ. (5.2.69)

Combining this inequality with the one in (5.2.68) leads to the desiredresult.

The same proof, but using (5.2.64) and Property 4. of Propositions 4.24,4.33, and 4.44, leads to the inequalities in (5.2.56)–(5.2.58).

2. Let

|ψ〉AB =r

∑k=1

√λk|ek〉A ⊗ | fk〉B (5.2.70)

be the Schmidt decomposition of |ψ〉AB, where r is the Schmidt rank,λk > 0 for all 1 ≤ k ≤ r, and |ek〉Ar

k=1, | fk〉Brk=1 are orthonormal sets

of vectors.

Since the entropy H(AB)ψ vanishes for all pure states, we immediatelyhave

I(A〉B)ψ = H(B)ψ = H(A)ψ = I(B〉A)ψ, (5.2.71)

where the equality H(B)ψ = H(A)ψ follows from the Schmidt decompo-sition in (5.2.70), which tells us that the reduced states ψA and ψB have thesame non-zero eigevalues. Based on the fact that ER(A; B)ψ ≥ I(A〉B)ψ,which we just proved above, we thus have the lower bound

ER(A; B)ψ ≥ H(A)ψ. (5.2.72)

The same reasoning, but using (4.11.139)–(4.11.141), leads to the inequal-ities

Eα(A; B)ψ ≥ H 1α(A)ψ, (5.2.73)

Eα(A; B)ψ ≥ H α2α−1

(A)ψ, (5.2.74)

Eα(A; B)ψ ≥ H0(A)ψ. (5.2.75)

For the reverse inequality, let

Π :=r

∑k=1|ek〉〈ek|A ⊗ | fk〉〈 fk|B (5.2.76)

be the projection onto the r2-dimension subspace of HAB on which ψABis supported. Also, define a channel N as

N(XAB) := ΠXABΠ + (1AB −Π)XAB(1AB −Π). (5.2.77)

458

Page 470: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Note that, by definition, N(ψAB) = ψAB. Also, let σB be a state, and, forp(k) := 〈 fk|σB| fk〉B, set

σAB :=r

∑k=1

p(k)|ek〉〈ek|A ⊗ | fk〉〈 fk|B, (5.2.78)

which is a separable state. It is straightforward to show that

Π(1A ⊗ σB)Π = σAB. (5.2.79)

Also, we have that

D(N(ψAB)‖N(1A ⊗ σB))

= D(ψAB‖σAB + (1AB −Π)(1A ⊗ σB)(1AB −Π)) (5.2.80)= D(ψAB‖σAB), (5.2.81)

because the operator (1AB −Π)(1A ⊗ σB)(1AB −Π) is supported on thespace orthogonal to the support of ψAB. Therefore, by the data-processinginequality for quantum relative entropy, we obtain

ER(A; B)ψ = infσAB∈SEP(A:B)

D(ψAB‖σAB) (5.2.82)

≤ D(ψAB‖σAB) (5.2.83)= D(N(ψAB)‖N(1A ⊗ σB)) (5.2.84)≤ D(ψAB‖1A ⊗ σB). (5.2.85)

Since the inequality holds for every state σB, we conclude that

ER(A; B)ψ ≤ infσB

D(ψAB‖1A ⊗ σB) = I(A〉B)ψ = H(A)ψ. (5.2.86)

Combining this with (5.2.72) gives us ER(A; B)ψ = H(A)ψ, as required.

Applying the same method of proof, but using the properties of the Petz–,sandwiched, and geometric Rényi relative entropies, as well as (4.11.139)–(4.11.141), we conclude the inequalities

Eα(A; B)ψ ≤ H 1α(A)ψ, (5.2.87)

Eα(A; B)ψ ≤ H α2α−1

(A)ψ, (5.2.88)

Eα(A; B)ψ ≤ H0(A)ψ, (5.2.89)

which, combined with (5.2.73)–(5.2.75), leads to (5.2.60)–(5.2.62).

459

Page 471: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

5.2.1 Cone Program Formulations

Computing a generalized divergence of entanglement involves an optimiza-tion over the set of separable states, which is known to be hard (please con-sult the Bibliographic Notes in Section 5.8). The optimization is made morecomplicated by the fact that most of the generalized divergences we considerin this book are non-linear functions of the input state (for example, sand-wiched Rényi relative entropy). However, both the max-relative entropy andhypothesis testing relative entropy can be formulated as semi-definite pro-grams (SDPs). Indeed, recall from (4.8.4) that

Dmax(ρ‖σ) = log2 infλ : ρ ≤ λσ, (5.2.90)

and recall from (4.9.2) that

DεH(ρ‖σ) = − log2 infTr[Λσ] : 0 ≤ Λ ≤ 1, Tr[Λρ] ≥ 1− ε. (5.2.91)

As discussed earlier, both of these generalized divergences can be cast intothe SDP standard forms in Definition 2.20, and thus their corresponding gen-eralized divergence of entanglement can be formulated as a cone program. Acone program is an optimization problem over a convex cone3 with a convexobjective function. An SDP is a special case of a cone program in which theconvex cone is the set of positive semi-definite operators.

The convex cone of interest here is the set SEP(A : B) of all separable oper-ators, which we define as follows: XAB ∈ SEP(A : B) if there exists a positiveinteger ` and positive semi-definite operators Px

A`x=1 and QxB`x=1 such that

XAB =`

∑x=1

PxA ⊗Qx

B. (5.2.92)

In what follows, we sometimes employ the shorthands SEP and SEP whenthe bipartition is clear from the context.

We now show that the max-relative entropy of entanglement can be writ-ten as a cone program.

3A subset C of a vector space is called a cone if αx ∈ C for every x ∈ C and α > 0. Aconvex cone is one for for which αx + βy ∈ C for all α, β > 0 and x, y ∈ C.

460

Page 472: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Proposition 5.21 Cone Program for Max-Relative Entropy of Entan-glement

Let ρAB be a bipartite state. Then,

Emax(A; B)ρ = log2 Gmax(A; B)ρ, (5.2.93)

where

Gmax(A; B)ρ := inf

Tr[XAB] : ρAB ≤ XAB, XAB ∈ SEP

. (5.2.94)

PROOF: Employing the expression in (5.2.90), we find that

Emax(A; B)ρ infσAB∈SEP(A:B)

Dmax(ρAB‖σAB)

= log2 infµ : ρAB ≤ µσAB, σAB ∈ SEP (5.2.95)

= log2 inf

Tr[XAB] : ρAB ≤ XAB, XAB ∈ SEP

, (5.2.96)

as required, where in the last step we made the change of variable µσAB ≡XAB. Since σAB ∈ SEP(A : B) and µ ≥ 0, we have that XAB ∈ SEP(A : B).

Next, we show that the hypothesis testing relative entropy of entangle-ment can be written as a cone program.

Proposition 5.22 Cone Program for Hypothesis Testing Relative En-tropy of Entanglement

Let ρAB be a bipartite state. Then, for all ε ∈ [0, 1],

EεH(A; B)ρ = − log2 supµ(1− ε)− Tr[ZAB] : µ ≥ 0, ZAB ≥ 0,

σAB ∈ SEP, µρAB ≤ σAB + ZAB, Tr[σAB] = 1. (5.2.97)

PROOF: This follows from the definition in (5.2.3) and the dual formulationof the hypothesis testing relative entropy stated in (4.9.5).

Recall that SEP = PPT in the case of qubit-qubit and qubit-qutrit states,which means that the optimizations in (5.2.94) and (5.2.97) are SDPs whenρAB is either a two-qubit state or a qubit-qutrit state.

461

Page 473: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

5.3 Generalized Rains Divergence

As explained in Section 5.1.1, the generalized divergence of entanglement isin general complicated to compute because the set of separable states does notadmit a simple characterization, making optimization over separable statesdifficult. By relaxing the set of separable states to the set PPT′ defined in(5.1.144), which has a simple characterization in terms of semi-definite con-straints, we defined the generalized Rains divergence in (5.1.148). Recall that

PPT′(A : B) = σAB : σAB ≥ 0, ‖TB(σAB)‖1 ≤ 1. (5.3.1)

In this section, we investigate properties of the generalized Rains diver-gence. We start by recalling its definition.

Definition 5.23 Generalized Rains Divergence of a Bipartite State

Let D be a generalized divergence (see Definition 4.13). For every bipar-tite state ρAB, we define the generalized Rains divergence of ρAB as

R(A; B)ρ := infσAB∈PPT′(A:B)

D(ρAB‖σAB). (5.3.2)

If D is continuous in its second argument, then the infimum can be re-placed by a minimum.

Since SEP ⊆ PPT′, optimizing over states in PPT′ can never lead to a valuethat is greater than the value obtained by optimizing over separable states.Therefore, as stated in (5.1.149),

R(A; B)ρ ≤ E(A; B)ρ (5.3.3)

for every state ρAB.

We are particularly interested throughout the rest of this book in the fol-lowing generalized Rains divergences for every state ρAB:

1. The Rains relative entropy of ρAB,

R(A; B)ρ := infσAB∈PPT′(A:B)

D(ρAB‖σAB), (5.3.4)

where D(ρAB‖σAB) is the quantum relative entropy of ρAB and σAB (Def-inition 4.1). For two-qubit states, it is known that the infimum in (5.3.4)

462

Page 474: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

is achieved by a separable state, which means that ER(A; B)ρ = R(A; B)ρ

for two-qubit states ρAB (please consult the Bibliographic Notes in Sec-tion 5.8).

2. The ε-hypothesis testing Rains relative entropy of ρAB,

RεH(A; B)ρ := inf

σAB∈PPT′(A:B)Dε

H(ρAB‖σAB), (5.3.5)

where DεH(ρAB‖σAB) is the ε-hypothesis testing relative entropy of ρAB

and σAB (Definition 4.62).

3. The sandwiched Rényi Rains relative entropy of ρAB,

Rα(A; B)ρ := infσAB∈PPT′(A:B)

Dα(ρAB‖σAB), (5.3.6)

where Dα(ρAB‖σAB), α ∈ [1/2, 1) ∪ (1, ∞), is the sandwiched Rényi rel-ative entropy of ρAB and σAB (Definition 4.26). Note that Rα(A; B)ρ ismonotonically increasing in α for all ρAB (see Proposition 4.29). This fact,along with the fact that limα→1 Dα = D (see Proposition 4.28), leads to

limα→1

Rα(A; B)ρ = R(A; B)ρ (5.3.7)

for every state ρAB. See Appendix 5.B for details of the proof.

4. The max-Rains relative entropy of ρAB,

Rmax(A; B)ρ := infσAB∈PPT′(A:B)

Dmax(ρAB‖σAB), (5.3.8)

where Dmax(ρAB‖σAB) is the max-relative entropy of ρAB and σAB (Defi-nition 4.56). Using the fact that limα→∞ Dα = Dmax (see Proposition 4.58),we find that

Rmax(A; B)ρ = limα→∞

Rα(A; B)ρ (5.3.9)

for every state ρAB. See Appendix 5.B for details of the proof. Due to thisfact, along with the fact that Rα(A; B)ρ is monotonically increasing in αfor all ρAB, we have that

Rmax(A; B)ρ ≥ Rα(A; B)ρ (5.3.10)

for all α ∈ (1, ∞) and every state ρAB.

The following inequalities relate the log-negativity, as defined in (5.1.96),to the Rains relative entropy, the sandwiched Rényi Rains relative entropy,and the max-Rains relative entropy:

463

Page 475: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Proposition 5.24 Log-Negativity to Rains Relative Entropies

For a bipartite state ρAB, the following inequalities hold

R(A; B)ρ ≤ Rmax(A; B)ρ ≤ EN(A; B)ρ. (5.3.11)

Furthermore, for all α, β ∈ [1/2, 1) ∪ (1, ∞) such that α < β, we have that

Rα(A; B)ρ ≤ Rβ(A; B)ρ. (5.3.12)

PROOF: The inequality R(A; B)ρ ≤ Rmax(A; B)ρ and the inequality in (5.3.12)are a direct consequence of the monotonicity in α of the sandwiched Rényirelative entropy (Proposition 4.29), as well as (5.3.7) and (5.3.9).

To see the inequality Rmax(A; B)ρ ≤ EN(A; B)ρ, consider picking

σAB =ρAB

‖TB(ρAB)‖1(5.3.13)

in (5.3.8). For this choice, we have that σAB ≥ 0 and ‖TB(σAB)‖1 ≤ 1, so thatσAB ∈ PPT′(A : B). Thus

Rmax(A; B)ρ ≤ Dmax(ρAB‖σAB) (5.3.14)= Dmax(ρAB‖ρAB) + log2 ‖TB(ρAB)‖1 (5.3.15)= EN(A; B)ρ. (5.3.16)

The first equality follows by direct evaluation using Definition 4.56. Thelast equality follows because Dmax(ρAB‖ρAB) = 0 and from the definitionin (5.1.96).

Proposition 5.25 Properties of Generalized Rains Divergence

Let D be a generalized divergence, and consider the generalized Rainsdivergence R(A; B)ρ of a state ρAB as defined in (5.3.2).

1. PPT monotonicity: For every completely PPT-preserving channelPAB→A′B′ ,

R(A; B)ρ ≥ R(A′; B′)ω, (5.3.17)

where ωA′B′ = PAB→A′B′(ρAB). In other words, the general-ized Rains divergence is monotonically non-increasing under com-pletely PPT-preserving channels. Since every LOCC channel is a

464

Page 476: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

completely PPT-preserving channel (Propositions 3.39 and 3.44),the generalized Rains divergence is monotone non-increasing un-der LOCC channels, and thus it is an entanglement measure as perDefinition 5.1.

2. Subadditivity: If D is additive for product positive semi-definite op-erators, i.e., D(ρ ⊗ ω‖σ ⊗ τ) = D(ρ‖σ) + D(ω‖τ), then for everytwo quantum states ρA1B1 and ωA2B2 the generalized Rains relativeentropy is subadditive:

R(A1A2; B1B2)ρ⊗ω ≤ R(A1; B1)ρ + R(A2; B2)ω. (5.3.18)

3. Convexity: If D is jointly convex, meaning that

D

(∑

x∈Xp(x)ρx

AB

∥∥∥∥∥ ∑x∈X

p(x)σxAB

)≤ ∑

x∈Xp(x)D(ρx

AB‖σxAB) (5.3.19)

for every finite alphabet X, probability distribution p : X → [0, 1],set ρx

ABx∈X of states, set σxABx∈X of positive semi-definite oper-

ators, then the generalized Rains divergence is convex:

R(A; B)ρ ≤ ∑x∈X

p(x)R(A; B)ρx , (5.3.20)

where ρAB = ∑x∈X p(x)ρxAB.

Properties 1. and 2. are satisfied when the generalized divergence is thequantum relative entropy, the Petz–, sandwiched, and geometric Rényirelative entropies, and the max-relative entropy. Property 3. is satisfiedwhen the generalized divergence is the quantum relative entropy andthe Petz–, sandwiched, and geometric Rényi relative entropies for therange of α < 1 for which data processing holds.

REMARK: Note that the generalized Rains divergence is generally not a faithful entangle-ment measure. Although R(A; B)ρ = 0 for all separable states ρAB due to the containmentSEP(A : B) ⊆ PPT′(A : B), the converse statement is not generally true because the infimumin the definition of R(A; B)ρ is not generally achieved by a separable state.

PROOF:

465

Page 477: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

1. For ωA′B′ = PAB→A′B′(ρAB), we have by definition,

R(A′; B′)ω = infτA′B′∈PPT′(A′:B′)

D(ωA′B′‖τA′B′) (5.3.21)

= infτA′B′∈PPT′(A′:B′)

D(PAB→A′B′(ρAB)‖τA′B′). (5.3.22)

Now, recall from Lemma 5.14 that the set PPT′ is closed under completelyPPT-preserving channels. Based on this, it follows that the output oper-ators of the completely PPT-preserving channel PAB→A′B′ are in the setPPT′(A′ : B′). In other words, we have

PAB→A′B′(σAB) : σAB ∈ PPT′(A : B) ⊆ PPT′(A′ : B′). (5.3.23)

Therefore, restricting the optimization in (5.3.22) to this set leads to

R(A′; B′)ω ≤ infσAB∈PPT′(A:B)

D(PAB→A′B′(ρAB)‖PAB→A′B′(σAB)) (5.3.24)

≤ infσAB∈PPT′(A:B)

D(ρAB‖σAB) (5.3.25)

= R(A; B)ρ, (5.3.26)

as required, where to obtain the second inequality we used the data-processing inequality for the generalized divergence.

2. By definition, the optimization in the definition of R(A1A2; B1B2)ρ⊗ω isover the set

PPT′(A1A2 : B1B2)

= σA1 A2B1B1 : σA1 A2B1B2 ≥ 0,∥∥TB1B2(σA1 A2B1B2)

∥∥1 ≤ 1, (5.3.27)

which contains operators of the form σA1 A2B1B2 = ξA1B1 ⊗ τA2B2 , whereξA1B1 ∈ PPT′(A1 : B1) and τA2B2 ∈ PPT′(A2 : B2). By restricting the opti-mization to such tensor product operators, and by using the additivity ofthe generalized divergence D, we obtain

R(A1A2; B1B2)ρ⊗ω ≤ D(ρA1B1 ⊗ωA2B2‖ξA1B1 ⊗ τA2B2) (5.3.28)= D(ρA1B1‖ξA1B1) + D(ωA2B2‖τA2B2). (5.3.29)

Since ξA1B1 ∈ PPT′(A1 : B1) and τA2B2 ∈ PPT′(A2 : B2) are arbitrary, theinequality in (5.3.18) follows.

466

Page 478: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

3. We have

R(A; B)ρ = infσAB∈PPT′(A:B)

D

(∑

x∈Xp(x)ρx

AB

∥∥∥∥∥σAB

). (5.3.30)

Let us restrict the optimization over all PPT′ operators to an optimiza-tion over sets σx

ABx∈X of PPT′ operators indexed by the alphabet X.Then, because PPT′(A : B) is a convex set, we have that ∑x∈X p(x)σx

AB ∈PPT′(A : B). Therefore, using the joint convexity of D, we obtain

R(A; B)ρ ≤ infσx

ABx⊂PPT′(A:B)D

(∑

x∈Xp(x)ρx

AB

∥∥∥∥∥ ∑x∈X

p(x)σxAB

)(5.3.31)

≤ infσx

ABx⊂PPT′(A:B)∑

x∈Xp(x)D(ρx

AB‖σxAB) (5.3.32)

≤ ∑x∈X

p(x) infσx

AB∈PPT′(A:B)D(ρx

AB‖σxAB) (5.3.33)

= ∑x∈X

p(x)R(A; B)ρx , (5.3.34)

as required.

By Proposition 5.25, the sandwiched Rényi Rains relative entropy Rα isconvex for α ∈ [1/2, 1). Although convexity does not hold for α > 1, we dohave quasi-convexity, which we now prove.

Proposition 5.26 Quasi-Convexity of Rényi Rains Relative Entropy

Let p : X → [0, 1] be a probability distribution over a finite alphabet X,and let ρx

ABx∈X be a set of states. Then, for all α ∈ (1, ∞),

Rα(A; B)ρ ≤ maxx∈X

Rα(A; B)ρx , (5.3.35)

where ρAB = ∑x∈X p(x)ρxAB.

PROOF: We have

Rα(A; B)ρ = infσAB∈PPT′(A:B)

(∑

x∈Xp(x)ρx

AB

∥∥∥∥∥σAB

). (5.3.36)

Let us restrict the optimization over all PPT′ operators to an optimization oversets σx

ABx∈X of PPT′ operators indexed by the alphabet X. Then, because

467

Page 479: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

PPT′(A : B) is a convex set, we have that ∑x∈X p(x)σxAB ∈ PPT′(A : B). Let us

also recall from (4.5.174) that the sandwiched Rényi relative entropy is jointlyquasi-convex, meaning that

(∑

x∈Xp(x)ρx

AB

∥∥∥∥∥ ∑x∈X

p(x)σxAB

)≤ max

x∈XDα(ρ

xAB‖σx

AB), (5.3.37)

We thus obtain

Rα(A; B)ρ ≤ infσx

ABx⊂PPT′(A:B)Dα

(∑

x∈Xp(x)ρx

AB

∥∥∥∥∥ ∑x∈X

p(x)σxAB

)(5.3.38)

≤ infσx

ABx⊂PPT′(A:B)maxx∈X

Dα(ρxAB‖σx

AB) (5.3.39)

≤ maxx∈X

infσx

AB∈PPT′(A:B)Dα(ρ

xAB‖σx

AB) (5.3.40)

= maxx∈X

Rα(A; B)ρx , (5.3.41)

as required.

5.3.1 Semi-Definite Program Formulations

One of the advantages of using the generalized Rains divergence as an en-tanglement measure is that the set PPT′ involved in its definition has a simplecharacterization in terms of semi-definite constraints. Indeed, let us recall that

PPT′(A : B) = σAB : σAB ≥ 0, ‖TB(σAB)‖1 ≤ 1. (5.3.42)

Using the expression in (5.1.102) for ‖TB(σAB)‖1, this set can equivalently bewritten as follows:

PPT′(A : B) = σAB : σAB ≥ 0, ∃KAB, LAB ≥ 0 such thatTr[KAB + LAB] ≤ 1, TB(KAB − LAB) = σAB, (5.3.43)

which can be further simplified to

PPT′(A : B) = TB(KAB − LAB) : TB(KAB − LAB) ≥ 0,KAB, LAB ≥ 0, Tr[KAB + LAB] ≤ 1, (5.3.44)

468

Page 480: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

In this section, we show how these characterizations of the set PPT′ allow usto compute both the max-Rains relative entropy and the hypothesis testingRains relative entropy via semi-definite programs (Section 2.4).

We first consider the max-Rains relative entropy, which we recall is de-fined as

Rmax(A; B)ρ = infσAB∈PPT′(A:B)

Dmax(ρAB‖σAB). (5.3.45)

Let us also recall from (4.8.4) that Dmax(ρAB‖σAB) can be written as follows:

Dmax(ρAB‖σAB) = log2 infλ : ρAB ≤ λσAB. (5.3.46)

As shown in the discussion after (4.8.4), the optimization in the equationabove is a semi-definite program (SDP). This, along with the definition in(5.3.42) of the set PPT′(A : B), leads to the following SDP formulation for Rmax.

Proposition 5.27 SDPs for the Max-Rains Relative Entropy

Let ρAB be a bipartite state. Then the max-Rains relative entropy can bewritten as

Rmax(A; B)ρ = log2 Wmax(A; B)ρ, (5.3.47)

where

Wmax(A; B)ρ

= infKAB,LAB≥0

Tr[KAB + LAB] : TB[KAB − LAB] ≥ ρAB, (5.3.48)

= supYAB≥0

Tr[YABρAB] : ‖TB[YAB]‖∞ ≤ 1. (5.3.49)

PROOF: First, we establish the equality for W(A; B)ρ in (5.3.48). Due to thefact that the infimum over PPT′ operators in the definition of Rmax can beachieved, we have that

Rmax(A; B)ρ = infσAB∈PPT′(A:B)

Dmax(ρAB‖σAB) (5.3.50)

= infσAB∈PPT′(A:B)

log2 infλ : ρAB ≤ λσAB (5.3.51)

= log2 infλ : ρAB ≤ λσAB, σAB ≥ 0, ‖TB[σAB]‖1 ≤ 1.(5.3.52)

The constraint ρAB ≤ λσAB implies (by taking the trace on both sides ofthe inequality) that λ ≥ 1. This in turn implies that λσAB ≥ 0 and that

469

Page 481: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

‖TB[λσAB]‖1 ≤ λ. So we have

Rmax(A; B)ρ = log2 infλ : ρAB ≤ λσAB, λσAB ≥ 0, ‖TB[λσAB]‖1 ≤ λ(5.3.53)

Let us now make the change of variable SAB ≡ λσAB. We then have

Rmax(A; B)ρ = log2 infλ : ρAB ≤ SAB, SAB ≥ 0, ‖TB[SAB]‖1 ≤ λ (5.3.54)= log2 inf‖TB[SAB]‖1 : ρAB ≤ SAB, (5.3.55)

where to obtain the last line we eliminated the constraint SAB ≥ 0 (because itis implied by ρAB ≤ SAB) and we used the fact that µ ≥ ‖TB[SAB]‖1, meaningthat the smallest value of λ is ‖TB[SAB]‖1.

Now, for an arbitrary operator SAB satisfying ρAB ≤ SAB, recall from(2.2.52) that the Jordan–Hahn decomposition of TB[SAB] is given by TB(SAB) =KAB − LAB, with KAB, LAB ≥ 0 and KABLAB = 0. We then find that

‖TB[SAB]‖1 = Tr[KAB + LAB], (5.3.56)SAB = TB[KAB − LAB]. (5.3.57)

The following inequality thus holds:

infTr[KAB + LAB] : ρAB ≤ TB[KAB − LAB], KAB, LAB ≥ 0≤ inf‖TB(SAB)‖1 : ρAB ≤ SAB. (5.3.58)

To see the opposite inequality, let KAB and LAB be arbitrary operators suchthat KAB, LAB ≥ 0 and ρAB ≤ TB[KAB − LAB]. Then, setting SAB = TB[KAB −LAB], we find that ρAB ≤ SAB and

‖TB(SAB)‖1 = ‖KAB − LAB‖1 (5.3.59)≤ ‖KAB‖1 + ‖LAB‖1 (5.3.60)= Tr[KAB + LAB], (5.3.61)

where we used the triangle inequality. This implies that

inf‖TB(SAB)‖1 : ρAB ≤ SAB ≤infTr[KAB + LAB] : ρAB ≤ TB(KAB − LAB), KAB, LAB ≥ 0. (5.3.62)

Putting together (5.3.58) and (5.3.62), we conclude that Wmax(A; B)ρ is givenby the equality in (5.3.48).

470

Page 482: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

To see the equality in (5.3.49), we employ semi-definite programming du-ality (see Section 2.4). We first put (5.3.48) into standard form (see Defini-tion 2.20) as follows:

infX≥0Tr[CX] : Φ(X) ≥ D, (5.3.63)

with

X =

(KAB 0

0 LAB

), C =

(1AB 0

0 1AB

), (5.3.64)

Φ(X) = TB[KAB − LAB], D = ρAB. (5.3.65)

The dual program is then given by

supY≥0Tr[DY] : Φ†(Y) ≤ C. (5.3.66)

In order to determine the adjoint Φ† of Φ, consider that

Tr[Φ(X)Y] = Tr[TB[KAB − LAB]YAB] (5.3.67)= Tr[(KAB − LAB)TB[YAB]] (5.3.68)

= Tr[(

KAB 00 LAB

)(TB[YAB] 0

0 −TB[YAB]

)]. (5.3.69)

Therefore,

Φ†(Y) =(

TB[YAB] 00 −TB[YAB]

)(5.3.70)

so that Φ†(Y) ≤ C is equivalent to(

TB[YAB] 00 −TB[YAB]

)≤(

1AB 00 1AB

). (5.3.71)

This is equivalent to the condition −1AB ≤ TB[YAB] ≤ 1AB, which is equiva-lent to ‖TB[YAB]‖∞ ≤ 1 (see (2.4.20)). We thus conclude that (5.3.66) is equalto (5.3.49).

To arrive at the equality in (5.3.48)–(5.3.49), we need to verify that strongduality holds, and so we check the conditions of Theorem 2.22. For the dualprogram in (5.3.49), pick YAB = 1

21AB. Then the constraints in (5.3.48) arestrict, because YAB > 0 and ‖TB(YAB)‖∞ < 1 for this choice. Furthermore,for the primal in (5.3.48), pick KAB and LAB equal to the positive and negativeparts of TB(ρAB), respectively. These are positive semi-definite by definitionand satisfy ρAB = TB(KAB− LAB), so that they are feasible for the primal.

471

Page 483: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

It is worthwhile to observe the close connection of the SDP in (5.3.48),related to the max-Rains relative entropy, to that in (5.1.102), related to thelog-negativity. In fact, the SDP in (5.3.48) is a relaxation of that in (5.1.102),and this leads to an alternate proof of the rightmost inequality from (5.3.11):

Rmax(A; B)ρ ≤ EN(A; B)ρ, (5.3.72)

holding for every bipartite state ρAB.

Furthermore, the form of the SDP in (5.3.48), along with the same proofgiven for Proposition 5.10, except with (5.1.106) and (5.1.112) replaced withinequality constraints, leads to the following conclusion:

Proposition 5.28 Max-Rains Relative Entropy is a Selective PPTMonotone

The max-Rains relative entropy is a selective PPT monotone; i.e.,(5.1.103) holds with E set to Rmax.

Due to the additivity of Dmax, Proposition 5.25 implies that Rmax is sub-additive:

Rmax(A1A2; B1B2)ρ⊗ω ≤ Rmax(A1; B1)ρ + Rmax(A2; B2)ω, (5.3.73)

where the inequality holds for all states ρA1B1 and ωA2B2 . Using the dualformulation in (5.3.48) for Rmax, we find that the reverse inequality also holds,implying that Rmax is an additive entanglement measure.

Proposition 5.29 Additivity of Max-Rains Relative Entropy

Let ρA1B1 and ωA2B2 be quantum states. Then,

Rmax(A1A2; B1B2)ρ⊗ω = Rmax(A1; B1)ρ + Rmax(A2; B2)ω. (5.3.74)

PROOF: We prove the inequality reverse to the one in (5.3.73). To this end,we employ the dual formulation of Rmax in (5.3.49). Let YA1B1 and SA2B2 bearbitrary operators satisfying

∥∥TB1 [YA1B1 ]∥∥

∞ ≤ 1, YA1B1 ≥ 0, (5.3.75)∥∥TB2 [SA2B2 ]∥∥

∞ ≤ 1, SA2B2 ≥ 0. (5.3.76)

Then it follows from multiplicativity of the Schatten ∞-norm under tensorproducts (see (2.2.104)) that

∥∥TB1B2 [YA1B1 ⊗ SA2B2 ]∥∥

∞ =∥∥TB1 [YA1B1 ]⊗ TB2 [SA2B2 ]

∥∥∞ (5.3.77)

472

Page 484: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

=∥∥TB1 [YA1B1 ]

∥∥∞

∥∥TB2 [SA2B2 ]∥∥

∞ (5.3.78)

≤ 1. (5.3.79)

Furthermore, we have that YA1B1 ⊗ SA2B2 ≥ 0. So it follows that

log2 Tr[YA1B1ρA1B1 ] + log2 Tr[SA2B2ωA2B2 ]

= log2(Tr[YA1B1ρA1B1 ]Tr[SA2B2ωA2B2 ]

)(5.3.80)

= log2(Tr[(YA1B1 ⊗ SA2B2

) (ρA1B1 ⊗ωA2B2

)])

(5.3.81)≤ Rmax(A1A2; B1B2)ρ⊗ω. (5.3.82)

The inequality follows because YA1B1 ⊗ SA2B2 is a particular operator satisfy-ing the constraints in (5.3.48) for Rmax(A1A2; B1B2)ρ⊗ω. Since the inequalityholds for all YA1B1 and SA2B2 satisfying (5.3.75)–(5.3.76), the superadditivityinequality

Rmax(A1A2; B1B2)ρ⊗ω ≥ Rmax(A1; B1)ρ + Rmax(A2; B2)ω (5.3.83)

follows.

We now consider the hypothesis testing Rains relative entropy, which isdefined as

RεH(A; B)ρ = inf

σAB∈PPT′(A:B)Dε

H(ρAB‖σAB), (5.3.84)

for ε ∈ [0, 1]. Recall the primal and dual formulations of DεH from Proposi-

tion 4.63. Using the dual formulation, we obtain the following:

Proposition 5.30 SDP for Hypothesis Testing Rains Relative En-tropy

Let ρAB be a bipartite state. Then, the hypothesis testing Rains relativeentropy can be written as

RεH(A; B)ρ = − log2 Wε

H(A; B)ρ, (5.3.85)

for all ε ∈ [0, 1], where

WεH(A; B)ρ

:= supµ≥0,ZAB,

KAB,LAB≥0

µ(1− ε)− Tr[ZAB] : µρAB ≤ TB(KAB − LAB) + ZAB,

TB(KAB − LAB) ≥ 0, Tr[KAB + LAB] ≤ 1 (5.3.86)

473

Page 485: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

= infMAB,NAB≥0

‖TB(MAB + NAB)‖∞ : Tr[MABρAB] ≥ 1− ε, MAB ≤ 1AB.(5.3.87)

PROOF: Using the dual SDP formulation of the hypothesis testing relativeentropy from Proposition 4.63, we obtain

WεH(A; B)ρ =

supµ≥0,ZAB≥0,σAB∈PPT′(A:B)

µ(1− ε)− Tr[ZAB] : µρAB ≤ σAB + ZAB (5.3.88)

Now, combining with the characterization of PPT′(A : B) from (5.3.44), weconclude (5.3.86).

The SDP for WεH(A; B)ρ in (5.3.86) can be written in the standard form of

(2.4.1) assupX≥0Tr[AX] : Φ(X) ≤ B , (5.3.89)

where

X =

µ 0 0 00 ZAB 0 00 0 KAB 00 0 0 LAB

, A =

1− ε 0 0 00 −1AB 0 00 0 0 00 0 0 0

, (5.3.90)

Φ(X) =

µρAB − TB(KAB − LAB)− ZAB 0 00 −TB(KAB − LAB) 00 0 Tr[KAB + LAB]

,

(5.3.91)

B =

0 0 00 0 00 0 1

. (5.3.92)

Setting

Y =

MAB 0 00 NAB 00 0 λ

, (5.3.93)

we compute the dual map Φ† as follows:

Tr[YΦ(X)]

474

Page 486: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

= Tr[MAB (µρAB − TB(KAB − LAB)− ZAB)]− Tr[NABTB(KAB − LAB)]

+ λ Tr[KAB + LAB] (5.3.94)= µ Tr[MABρAB]− Tr[MABZAB] + Tr[(λ1AB − TB(MAB + NAB))KAB]

+ Tr[(λ1AB + TB(MAB + NB)) LAB], (5.3.95)

which implies that

Φ†(Y) =

Tr[MABρAB] 0 0 00 −MAB 0 00 0 λ1AB − TB(MAB + NAB) 00 0 0 λ1AB + TB(MAB + NB)

.

(5.3.96)

Then using the standard form of the dual program in (2.4.2), i.e.,

infY≥0

Tr[BY] : Φ†(Y) ≥ A

, (5.3.97)

we find that the dual SDP is given by

infλ,MAB,NAB≥0

λ : Tr[MABρAB] ≥ 1− ε, MAB ≤ 1AB,

λ1AB ± TB(MAB + NAB) ≥ 0. (5.3.98)

This can alternatively be written as

infMAB,NAB≥0

‖TB(MAB + NAB)‖∞ : Tr[MABρAB] ≥ 1− ε, MAB ≤ 1AB.(5.3.99)

Finally, we verify that strong duality holds, by applying Theorem 2.22. Astrictly feasible choice for the primal variables is µ = 1, ZAB = 1AB, KAB =πAB/2, LAB = πAB/3. A feasible choice for the dual variables is MAB = 1ABand NAB = 1AB.

5.4 Squashed Entanglement

In this section, we investigate the properties of the squashed entanglement,which we introduced as an entanglement measure in Section 5.1.1. We firstrecall the definition of squashed entanglement from (5.1.162).

475

Page 487: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Definition 5.31 Squashed Entanglement

Let ρAB be a bipartite state. Then, the squashed entanglement is definedas

Esq(A; B)ρ :=12

infωABEI(A; B|E)ω : TrE[ωABE] = ρAB , (5.4.1)

where the quantum conditional mutual information I(A; B|E)ω is de-fined in (4.1.11) as

I(A; B|E)ω = H(A|E)ω + H(B|E)ω − H(AB|E)ω (5.4.2)= H(AE)ω − H(E)ω + H(BE)ω − H(ABE)ω (5.4.3)= H(B|E)ω − H(B|AE)ω (5.4.4)

The optimization in the definition of squashed entanglement is with re-spect to all extensions ωABE of ρAB, with the extension system E having ar-bitrarily large, yet finite dimension, which means that the infimum cannot ingeneral be replaced by a minimum. The fact that the extension system canhave arbitrarily large, yet finite dimension also means that computing thesquashed entanglement is in general difficult; however, we can always placean upper bound on it by calculating the quantum conditional mutual infor-mation of a specific extension.

To understand why the quantity in (5.4.1) is called squashed entangle-ment, it is helpful to think in terms of a cryptographic scenario. This crypto-graphic perspective turns out to be useful in the context of secret key agree-ment, which we consider in Chapters 10 and 15. Consider three parties, theprotagonists Alice and Bob, as well as an eavesdropper. Alice and Bob possessthe quantum systems A and B, respectively, and the eavesdropper possessesa system E, such that the global state ωABE is consistent with the reducedstate ρAB of Alice and Bob. The conditional mutual information I(A; B|E)ω

corresponds to the correlations between Alice and Bob from the perspectiveof the eavesdropper. The squashed entanglement Esq(A; B)ρ is then an op-timization over all possible global states ωABE such that TrE[ωABE] = ρAB,and this optimization corresponds to the worst possible scenario in which theeavesdropper attempts to “squash down” the correlations of Alice and Bob,i.e., to reduce the value of I(A; B|E)ω as much as possible. This cryptographicperspective actually allows us to write the squashed entanglement in an al-ternative way, which we do in Proposition 5.37 below.

476

Page 488: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

We begin by establishing some basic properties of squashed entangle-ment. As we will see, the squashed entanglement possesses all of the desiredproperties of an entanglement measure stated at the beginning of Section 5.1.

Proposition 5.32 Properties of Squashed Entanglement

The squashed entanglement Esq(A; B)ρ has the following properties.1. Non-negativity: For every bipartite state ρAB,

Esq(A; B)ρ ≥ 0. (5.4.5)

2. Faithfulness: We have Esq(A; B)σ = 0 if and only if σAB is a separablestate.

3. Convexity: Let X be a finite alphabet, p : X → [0, 1] a probabilitydistribution on X, and ρx

ABx∈X a set of states. Then,

Esq(A; B)ρ ≤ ∑x∈X

p(x)Esq(A; B)ρx , (5.4.6)

where ρAB = ∑x∈X p(x)ρxAB.

4. Superadditivity and additivity: For every quantum state ρA1 A2B1B2 ,

Esq(A1A2; B1B2)ρ ≥ Esq(A1; B1)ρ + Esq(A2; B2)ρ. (5.4.7)

In particular, for a tensor product state ρA1 A2B1B2 = ωA1B1 ⊗ τA2B2 ,we have

Esq(A1A2; B1B2)ω⊗τ = Esq(A1; B1)ω + Esq(A2; B2)τ. (5.4.8)

PROOF:

1. This follows immediately from the fact that the conditional mutual infor-mation of an arbitrary state is non-negative (Theorem 4.6).

2. The statement “if σAB is a separable state, then Esq(A; B)σ = 0” followsfrom the line of reasoning in (5.1.157)–(5.1.159) used to motivate the def-inition of squashed entanglement. For a proof of the converse statement,please consult the Bibliographic Notes in Section 5.8.

477

Page 489: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

3. Let ωxABE denote an arbitrary extension of ρx

AB. Then

ωABEX := ∑x∈X

p(x)ωxABE ⊗ |x〉〈x|X (5.4.9)

is a particular extension of ρAB. It follows that

2 · Esq(A; B)ρ ≤ I(A; B|EX)ω (5.4.10)

= ∑x∈X

p(x)I(A; B|E)ωx , (5.4.11)

where to obtain the equality we used the direct-sum property of con-ditional mutual information (see Proposition 4.7). Since the inequalityholds for arbitrary extensions of ρx

AB, we conclude the desired inequality.

4. To see the inequality in (5.4.7), let ωA1 A2B1B2E be an extension of ρA1 A2B1B2 .Then by the chain rule for conditional mutual information, we have that

I(A1A2; B1B2|E)ω = I(A1; B1B2|E)ω + I(A2; B1B2|EA1)ω (5.4.12)= I(A1; B1|E)ω + I(A1; B2|EB1)ω

+ I(A2; B2|EA1)ω + I(A2; B1|EA1B2)ω (5.4.13)≥ I(A1; B1|E)ω + I(A2; B2|EA1)ω (5.4.14)≥ 2

[Esq(A1; B1)ρ + Esq(A2; B2)ρ

]. (5.4.15)

The first inequality follows because conditional mutual information isnon-negative. The second inequality follows because ωA1B1E is a partic-ular extension of the reduced state ρA1B1 , and ωA1 A2B2E is a particularextension of the reduced state ρA2B2 . Since the extension ωA1 A2B1B2E wasarbitrary, optimizing over all such extensions on the left-hand side of theinequality above gives (5.4.7).To see the inequality in (5.4.8), let ωA1B1E1 be an extension of ωA1B1 , andlet τA2B2E2 be an extension of τA2B2 . Then ωA1B1E1⊗ τA2B2E2 is an extensionof ωA1B1 ⊗ τA2B2 . We then have that

2 · Esq(A1A2; B1B2)σ ≤ I(A1A2; B1B2|E1E2)ω⊗τ (5.4.16)= I(A1; B1|E1)ω + I(A2; B2|E2)τ, (5.4.17)

where the equality follows from the additivity of conditional mutual in-formation with respect to tensor-product states (Proposition 4.7). Sincethe extensions ωA1B1E1 and τA2B2E2 are arbitrary, the inequality in (5.4.8)follows.

Let us now prove that the squashed entanglement is indeed an entangle-ment measure, as per Definition 5.1.

478

Page 490: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Theorem 5.33 Selective LOCC Monotonicity of Squashed Entangle-ment

The squashed entanglement is a selective LOCC monotone, so that(5.1.14) holds with E set to Esq.

PROOF: We prove that the conditions of Lemma 5.2 hold and then we applyit. The first part of the proof follows from the fact that conditional mutualinformation does not increase under the action of local channels (see Propo-sition 4.7): for a tripartite state ξABE and local channels NA→A′ and MB→B′ ,

I(A; B|E)ξ ≥ I(A′; B′|E)ζ , (5.4.18)

where ζA′B′E := (NA→A′ ⊗MB→B′)(ξABE). After incorporating optimizations,it then follows immediately that squashed entanglement does not increaseunder the action of local channels:

Esq(A; B)ρ ≥ Esq(A′; B′)κ, (5.4.19)

where κA′B′ := (NA→A′ ⊗MB→B′)(ρAB).

Also, the squashed entanglement is invariant under classical communica-tion, which follows from Lemma 5.34 below. Now applying Lemma 5.2, weconclude the statement of the theorem.

Lemma 5.34 Invariance of Squashed Entanglement Under ClassicalCommunication

Let ρXAB be a classical–quantum state of the following form:

ρXAB := ∑x∈X

p(x)|x〉〈x|X ⊗ ρxAB, (5.4.20)

where X is a finite alphabet, p : X → [0, 1] is a probability distribution,and ρx

ABx∈X is a set of states. Then

Esq(XA; B)ρ = Esq(A; BX)ρ = ∑x∈X

p(x)Esq(A; B)ρx . (5.4.21)

PROOF: Note that discarding or appending a classical state |x〉〈x|XA is a localchannel, so that Esq(A; B)ρx = Esq(XA; B)|x〉〈x|⊗ρx . In Proposition 5.32, weproved that the squashed entanglement is a convex function. Using this fact,

479

Page 491: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

it follows that∑

x∈Xp(x)Esq(A; B)ρx ≥ Esq(XA; B)ρ. (5.4.22)

Similarly, we have that

∑x∈X

p(x)Esq(A; B)ρx ≥ Esq(A; BX)ρ. (5.4.23)

So it suffices to establish the following inequalities:

∑x∈X

p(x)Esq(A; B)ρx ≤ Esq(XA; B)ρ, Esq(A; BX)ρ. (5.4.24)

To this end, consider that an arbitrary extension of ρXA AB has the followingform:

θXA ABE := ∑x∈X

p(x)|x〉〈x|XA ⊗ θxABE, (5.4.25)

for states θxABE that extend ρx

AB. We then find that

2 ∑x∈X

p(x)Esq(A; B)ρx ≤∑x

p(x)I(A; B|E)ρx (5.4.26)

= I(A; B|EX)θ (5.4.27)≤ I(XA; B|E)θ. (5.4.28)

The last inequality follows from the chain rule

I(A; B|EX)θ = I(XA; B|E)θ − I(X; B|E)θ (5.4.29)

and non-negativity of conditional mutual information: I(X; B|E)θ ≥ 0. Sincethe inequality holds for an arbitrary extension of ρXAB, we conclude that

∑x∈X

p(x)Esq(A; B)ρx ≤ Esq(XA; B)ρ. (5.4.30)

A proof for the other inequality

∑x∈X

p(x)Esq(A; B)ρx ≤ Esq(A; BX)ρ (5.4.31)

follows similarly.

In Section 5.1.1, we introduced the entanglement of formation as an en-tanglement measure. The following inequality relates the entanglement offormation to the squashed entanglement:

480

Page 492: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Proposition 5.35 Squashed Entanglement and Entanglement of For-mation

The entanglement of formation is never smaller than the squashed en-tanglement:

Esq(A; B)ρ ≤ EF(A; B)ρ, (5.4.32)

for every bipartite state ρAB.

PROOF: We already alluded to this inequality when introducing squashedentanglement in Section 5.1.1.4. If the extension ωABE is restricted to be of theform

ωABE = ∑x

p(x)φxAB ⊗ |x〉〈x|, (5.4.33)

where p(x) is a probability distribution and φxABx is a set of pure states

satisfying ρAB = ∑x p(x)φxAB, then it follows that

Esq(A; B)ρ ≤12

I(A; B|X)ω = H(A|X)ω. (5.4.34)

Since the inequality holds for all such extensions, the inequality in (5.4.32) fol-lows by applying the definition of entanglement of formation in (5.1.42).

For pure bipartite states, the entanglement of formulation is simply theentropy of the reduced state of one of the subsystems. It turns out that thesquashed entanglement reduces to the same quantity for pure bipartite states.

Proposition 5.36 Squashed Entanglement for Pure States

Let ψAB be a pure bipartite state. Then, the squashed entanglement ofψAB is equal to the entropy of its reduced state on A:

Esq(A; B)ψ = H(A)ψ. (5.4.35)

PROOF: Every extension ωABE of the pure bipartite state ψAB is a productstate of the form ωABE = ψAB ⊗ ρE for some state ρE. By Proposition 4.7, itfollows that

I(A; B|E)ψ⊗ρ = I(A; B)ψ = 2H(A)ψ, (5.4.36)

where the second equality holds because H(AB)ψ = 0 and H(A)ψ = H(B)ψ

for a pure bipartite state. We thus have Esq(A; B)ψ = H(A)ψ.

481

Page 493: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

An immediate consequence of Proposition 5.36 is that, for the maximallyentangled state |Φ〉AB = 1√

d ∑d−1i=0 |i, i〉AB of Schmidt rank d, the squashed

entanglement is equal to log2 d:

Esq(A; B)Φ = log2 d. (5.4.37)

Let us now return to the discussion after Definition 5.31 on the crypto-graphic interpretation of squashed entanglement. We stated that the quantumconditional mutual information I(A; B|E)ω can be interpreted as the amountof correlations between Alice (A) and Bob (B) from the point of view of aneavesdropper (E), where ωABE is the joint state shared by all three parties. Ifthe eavesdropper wants to reduce, or “squash” these correlations as much aspossible, then we optimize with respect to every state ωABE that is consistentwith the state ρAB shared by Alice and Bob, leading to the squashed entan-glement Esq(A; B)ρ. Now, recall Proposition 3.19, which states that for everyextension ωABE of a given state ρAB there exists a quantum channel SE′→Esuch that SE′→E(ψ

ρABE′) = ωABE, where ψ

ρABE′ is a purification of ρAB. We

therefore immediately have the following.

Proposition 5.37 Other Representations of Squashed Entanglement

Let ρAB be a bipartite quantum state, and let ψρABE′ be a purification of

it. Then the squashed entanglement Esq(A; B)ρ can be written as

Esq(A; B)ρ =12

infSE′→E

I(A; B|E)ω : ωABE = SE′→E(ψρABE′), (5.4.38)

where the infimum is with respect to every quantum channel SE′→E. An-other representation of squashed entanglement is

Esq(A; B)ρ

=12

infVE′→EF

H(B|E)θ + H(B|F)θ : θBEF := VE′→EF(ψ

ρABE′)

, (5.4.39)

where the infimum is with respect to every isometric channel VE′→EF.

The act of “squashing” the correlations between Alice and Bob can thus bethought of explicitly in terms of an eavesdropper applying the channel SE′→Eto their purifying system E′ of ρAB. For this reason, we call SE′→E a squashingchannel.

482

Page 494: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

PROOF: The equality (5.4.38) follows immediately from Proposition 3.19.

In order to establish (5.4.39), let SE′→E be an arbitrary squashing chan-nel, and let VE′→EF be an isometric extension of SE′→E, so that SE′→E = TrF VE′→EF, where the system F satisfies dF ≥ rank(ΓS

EE′). Consider that θABEF :=VE′→EF(ψ

ρABE′) is a purification of both ωABE and θBEF, i.e., ωABE = SE′→E(ψ

ρABE′) =

TrF[θABEF] and θBEF = TrA[θABEF]. Then, using (5.4.4), it follows that

I(A; B|E)ω = H(B|E)ω − H(B|AE)ω (5.4.40)= H(B|E)θ − H(B|AE)θ. (5.4.41)

Now, since θABEF is a pure state, we have from duality of conditional entropythat

− H(B|AE)θ = H(B|F)θ. (5.4.42)

Therefore,I(A; B|E)ω = H(B|E)θ + H(B|F)θ. (5.4.43)

We conclude that (5.4.39) holds because the squashing channel SE′→E is arbi-trary in the development above.

We now establish an explicit uniform continuity bound for the squashedentanglement:

Proposition 5.38 Uniform Continuity of Squashed Entanglement

Let ρAB and σAB be bipartite states such that the following fidelity boundholds

F(ρAB, σAB) ≥ 1− ε, (5.4.44)

for ε ∈ [0, 1]. Then the following bound applies to their squashed entan-glements:∣∣Esq(A; B)ρ − Esq(A; B)σ

∣∣ ≤√

ε log2 min dA, dB+ g2(√

ε), (5.4.45)

whereg2(δ) := (δ + 1) log2(δ + 1)− δ log2 δ. (5.4.46)

PROOF: Due to Uhlmann’s theorem (Theorem 3.58) and Proposition 3.19, foran arbitrary extension ρABE of ρAB, there exists an extension σABE of σAB suchthat

F(ρABE, σABE) = F(ρAB, σAB) ≥ 1− ε. (5.4.47)

483

Page 495: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

By the relation between trace distance and fidelity (Theorem 3.64), it followsthat

12‖ρABE − σABE‖1 ≤

√ε. (5.4.48)

Then, applying the uniform continuity of conditional mutual information(Proposition 4.8), we find that

2Esq(A; B)σ ≤ I(A; B|E)σ (5.4.49)

≤ I(A; B|E)ρ + 2√

ε log2 min dA, dB+ 2g2(√

ε). (5.4.50)

Since the extension ρABE is arbitrary, it follows that

Esq(A; B)σ ≤ Esq(A; B)ρ +√

ε log2 min dA, dB+ g2(√

ε). (5.4.51)

The other bound,

Esq(A; B)ρ ≤ Esq(A; B)σ +√

ε log2 min dA, dB+ g2(√

ε), (5.4.52)

follows from a similar proof.

5.5 Entanglement Measures for Channels

So far we have considered entanglement measures for quantum states. Wenow consider entanglement measures for channels. Using the general princi-ple discussed in Section 4.11.2 for constructing channel quantities out of statequantitites, we arrive at the following definition for the entanglement of achannel.

Definition 5.39 Entanglement of a Quantum Channel

From an entanglement measure E defined on bipartite quantum states,we define the corresponding entanglement of a quantum channel NA→Bas follows:

E(N) := supρRA

E(R; B)ω, (5.5.1)

where ωRB := NA→B(ρRA), and the optimization is with respect to everybipartite state ρRA, with system R arbitrarily large, yet finite.

484

Page 496: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

REMARK: Note that it suffices to optimize (5.5.1) with respect to pure states ψRA, withthe dimension of R equal to the dimension of A, when calculating the entanglement of achannel, so that

E(N) := supψRA

E(R; B)ω, (5.5.2)

where ωRB := NA→B(ψRA). This follows from the fact that an entanglement measure forstates is, by definition, monotone under LOCC channels. It is therefore monotone under alocal partial trace channel. In particular, consider a mixed state ρRA, with the dimensionof R not necessarily equal to the dimension of A. Let ωRB = NA→B(ωRA). Then, if we takea purification φR′RA of ρRA, we obtain

E(R; B)ω = E(NA→B(ρRA)) (5.5.3)= E(NA→B(TrR′ [φR′RA])) (5.5.4)= E(TrR′ [NA→B(φR′RA)]) (5.5.5)≤ E(NA→B(φR′RA)) (5.5.6)

= E(R′R; B)τ, (5.5.7)

where τR′RB = NA→B(φR′RA) and to obtain the inequality we used the fact that E is mono-tone under the partial trace channel TrR′ . This demonstrates that it suffices to optimizewith respect to pure states when calculating the entanglement of a channel. Furthermore,by the Schmidt decomposition theorem (Theorem 3.2), the dimension of the purifying sys-tem R′R need not exceed the dimension of A.

Note that in the definition above the channel NA→B acts locally on the stateψRA to produce the state ωRB = NA→B(ψRA). We can thus view NA→B as anLOCC channel, which means that E(R; B)ω ≤ E(R; A)ψ, by the definition ofan entanglement measure for states. In other words, by sending one share ofa bipartite state through the channel N, the entanglement can only stay thesame or go down. The quantity E(N) thus indicates how well entanglementis preserved when one share of it is sent through the channel N.

Let us consider three examples of entanglement measures for quantumchannels, defined using entanglement measures for bipartite quantum states.

1. The generalized divergence of entanglement of a channel NA→B, defined forevery generalized divergence D as

E(N) := supψRA

E(R; B)ω (5.5.8)

= supψRA

infσRB∈SEP(R:B)

D(NA→B(ψRA)‖σRB), (5.5.9)

where ωRB = NA→B(ψRA), and the optimization is with respect to purestates ψRA, with the dimension of R equal to the dimension of A. Weinvestigate this entanglement measure in Section 5.5.2.

485

Page 497: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

2. The generalized Rains divergence of a channel NA→B, defined for every gen-eralized divergence D as

R(N) := supψSA

R(A; B)ω (5.5.10)

= supψSA

infσSB∈PPT′(S:B)

D(NA→B(ψSA)‖σSB), (5.5.11)

where ωSB = NA→B(ψSA), and the optimization is with respect to purestates ψSA, with the dimension of S equal to the dimension of A. Weinvestigate this entanglement measure in Section 5.5.3.

3. The squashed entanglement of a channel NA→B, defined as

Esq(N) := supψRA

Esq(R; B)ω, (5.5.12)

where ωRB = NA→B(ψRA), and the optimization is with respect to purestates ψRA, with the dimension of R equal to the dimension of A. Weinvestigate this entanglement measure in Section 5.5.4.

Many properties of state entanglement measures carry over, or have ananalogue, to the corresponding channel entanglement measure.

Proposition 5.40 Properties of Entanglement Measures for Channels

Let E be an entanglement measure for states, as defined in Definition 5.1,and consider the corresponding channel entanglement measure definedin Definition 5.39.

1. Faithfulness: If E vanishes for all separable states, then E(N) = 0 forall entanglement breaking channels. If E is faithful (vanishing if andonly if the input state is separable), then E(N) = 0 implies that N isentanglement breaking.

2. Convexity: If E is a convex entanglement measure for states, then thecorresponding channel entanglement measure is convex: for everyfinite alphabet X, probability distribution p : X → [0, 1], and setNx

A→Bx∈X of quantum channels,

E

(∑

x∈Xp(x)Nx

)≤ ∑

x∈Xp(x)E(Nx). (5.5.13)

486

Page 498: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

3. Superadditivity: If E is superadditive, meaning that

E(A1A2; B1B2)ρ⊗τ ≥ E(A1; B1)ρ + E(A2; B2)τ (5.5.14)

for all states ρA1B1 and τA2B2 , then the channel entanglment measureis also superadditive: for every two channels NA1→B1 and MA2→B2 ,

E(N⊗M) ≥ E(N) + E(M). (5.5.15)

PROOF:

1. Let N be an entanglement breaking channel. This means that ωRB =NA→B(ψRA) is separable for every pure state ψRA. Therefore, since Evanishes for all separable states, we have E(R; B)ω = 0 for every purestate ψRA, so that E(N) = 0.

Now, suppose that E is a faithful state entanglement measure, and letE(N) = 0. Since E is faithful, it is non-negative for all input states, whichimplies that E(R; B)ω = 0 for every state ωRB = NA→B(ψRA), i.e., forevery pure state ψRA. Furthermore, by faithfulness of E for states, it holdsthat NA→B(ψRA) is separable for every pure state ψRA. Therefore, N isentanglement breaking.

2. Let ψRA be an arbitrary pure state, and let

ωRB =

(∑

x∈XNx

A→B

)(ψRA) = ∑

x∈Xp(x)ωx

RB, (5.5.16)

where ωxRB = Nx

A→B(ψRA). Then, by the convexity of E for states,

E(R; B)ω ≤ ∑x∈X

p(x)E(R; B)ωx ≤ ∑x∈X

p(x)E(Nx), (5.5.17)

where for the last inequality we used the definition of the channel entan-glement measure. Therefore, for every pure state ψRA, we have

E(R; B)ω ≤ ∑x∈X

p(x)E(Nx). (5.5.18)

Thus,

E

(∑

x∈Xp(x)Nx

)= sup

ψRA

E(R; B)ω ≤ ∑x∈X

p(x)E(Nx), (5.5.19)

as required.

487

Page 499: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

3. By restricting the optimization in the definition of E(N ⊗M) to pureproduct states φR1 A1 ⊗ ϕR2 A2 , letting ξR1B1 = NA1→B1(ϕR1 A1), τR2B2 =MA2→B2(ϕR2 A2), and using superadditivity of the state entanglement mea-sure E, we obtain

E(N⊗M) = supψRA1 A2

E(R; B1B2)ω (5.5.20)

≥ supφR1 A2⊗ϕR2 A2

E(R1R2; B1B2)ξ⊗τ (5.5.21)

≥ supφR1 A1

E(R1; B1)ξ + supϕR2 A2

E(R2; B2)τ (5.5.22)

= E(N) + E(M), (5.5.23)

as required.

5.5.1 Amortized Entanglement

There is another way to define the entanglement of a quantum channel froman entanglement measure on quantum states, and this method turns out tobe useful in the feedback-assisted communciation protocols that we considerin Part III. To see how this measure is defined, consider the situation shownin Figure 5.3. In this setup, Alice and Bob each have access to systems A′

and B′, respectively. Alice also possesses the system A, and she passes itthrough the channel NA→B to Bob. The initial joint state ρA′AB′ then becomesωABB′ = NA→B(ρA′AB′). The systems A′ and B′ can be thought of as “memorysystems” that hold auxiliary information. In feedback-assisted quantum com-munication protocols, these memory systems can hold the results of previousrounds of communication between Alice and Bob for the purpose of decid-ing what local operations to perform in subsequent rounds. By taking thedifference between the final entanglement in the state ωA′BB′ and the initialentanglement in the state ρA′AB′ , and optimizing over all initial states ρA′AB′ ,we arrive at what is called the amortized entanglement.

Definition 5.41 Amortized Entanglement of a Quantum Channel

From an entanglement measure E defined on bipartite quantum states,we define the amortized entanglement of a quantum channel NA→B as fol-

488

Page 500: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

N ωABB′ρA′AB′

A

A′

B′B

AliceBob

FIGURE 5.3: Starting from a state ρA′AB′ , Alice sends the system A throughthe channel NA→B to Bob, resulting in the state ωA′BB′ = NA→B(ρA′AB′).The difference between the final and initial entanglement (as quantifiedby an entanglement measure for states), optimized over all initial statesρA′AB′ , is equal to the amortized entanglement of N; see Definition 5.41.

lows:

EA(N) := supρA′AB′

E(A′; BB′)ω − E(A′A; B′)ρ

, (5.5.24)

where ωA′BB′ := NA→B(ρA′AB′) and the optimization is with respect tostates ρA′AB′ . The systems A′ and B′ have arbitrarily large, yet finitedimensions.

Due to the fact that the systems A′ and B′ can be arbitrarily large, it isnot necessarily the case that the supremum above can be achieved. Thus, ingeneral, it might be difficult to compute a channel’s amortized entanglement.

For every entanglement measure E that is equal to zero for all separablestates, we always have that the entanglement of the channel never exceedsthe amortized entanglement of the channel:

Lemma 5.42

For a given quantum channel N and entanglement measure E that isequal to zero for all separable states, the channel’s entanglement doesnot exceed its amortized entanglement:

E(N) ≤ EA(N). (5.5.25)

PROOF: By choosing the input state ρA′AB′ in the optimization for amortized

489

Page 501: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

entanglement to have a trivial (one-dimensional) system B′ (so that ρA′AB′ istrivially a separable state between Alice and Bob), we find that E(A′; BB′)ω =E(A′; B)ω and E(AA′; B′)ρ = 0. Since such a state is an arbitrary state to con-sider for optimizing the channel’s entanglement, the inequality follows.

Whether the inequality reverse to the one in (5.5.25) holds, which wouldimply that E(N) = EA(N), depends on the entanglement measure E. In Sec-tion 5.6 below, we show that this so-called “amortization collapse” occurs forsome entanglement measures.

The amortized entanglement of a channel has several interesting proper-ties, which we list in some detail in this section. These include convexity,faithfulness, and (sub)additivity.

Proposition 5.43 Properties of Amortized Entanglement of a Quan-tum Channel

1. Dimension bound: Let E be a subadditive entanglement measure forstates, and let NA→B be a quantum channel. Then,

EA(N) ≤ minE(A; A′)Φ, E(B; B′)Φ, (5.5.26)

where A′ has the same dimension as A, B′ has the same dimensionas B, and Φ+ is a maximally entangled state of systems AA′ or BB′.

2. Faithfulness: Let E be an entanglement measure that is equal to zerofor all separable states. If a channel N is entanglement-breaking,then its amortized entanglement EA(N) is equal to zero. If the en-tanglement measure E is faithful (equal to zero if and only if thestate is separable) and the amortized entanglement EA(N) of a chan-nel N is equal to zero, then the channel N is entanglement breaking.

3. Convexity: Let E be a convex entanglement measure for states. Then,the amortized entanglement EA of a channel is convex: for everyfinite alphabet X, probability distribution p : X → [0, 1], and setNx

A→Bx∈X of quantum channels,

EA

(∑

x∈Xp(x)Nx

)≤ ∑

x∈Xp(x)EA(Nx). (5.5.27)

4. Subdditivity and additivity: For every entanglement measure E, theamortized entanglement EA of a channel is subadditive, meaning

490

Page 502: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

that for every two quantum channels N and M,

EA(N⊗M) ≤ EA(N) + EA(M). (5.5.28)

If E is an additive entanglement measure, then the amortized entan-glement EA is additive, meaning that

EA(N⊗M) = EA(N) + EA(M) (5.5.29)

for every two quantum channels N and M.

PROOF:

1. To prove (5.5.26), we use the fact that N can be simulated via telepora-tion. Specifically, from (3.3.35) and (3.3.36), we can represent the actionof NA→B on every state ρA′AB′ in the following two ways:

NA→B(ρA′AB′) = NB′i→B

(TAAiBi→B′i

(ρA′AB′ ⊗ΦAiBi))

, (5.5.30)

NA→B(ρA′AB′) = TA′o AoBo→B

(NA→A′o(ρA′AB′)⊗Φ+

AoBo

), (5.5.31)

where Ai, Bi, B′i are auxiliary systems such that dAi = dBi = dB′i= dA (i.e.,

systems with the same dimension as the input system A of the channel)and Ao, A′o, Bo are auxiliary systems such that dAo = dA′o = dBo = dB (i.e.,systems with the same dimension as the output system B of the channel).The teleportation channel T is given by (3.3.29). Since T is an LOCC chan-nel, and N is a local channel, by LOCC monotonicity of the entanglementmeasure E we obtain

E(A′; BB′)ω ≤ E(A′AAi; B′Bi)ρ⊗Φ, (5.5.32)

E(A′; BB′)ω ≤ E(A′AAo; B′Bo)ρ⊗Φ, (5.5.33)

where ωA′BB′ = NA→B(ρA′AB′). Then, by subadditivity of E, we find that

E(A′; BB′)ω ≤ E(A′A; B′)ρ + E(Ai; Bi)Φ, (5.5.34)E(A′; BB′)ω ≤ E(A′A; B′)ρ + E(Ao; Bo)Φ. (5.5.35)

Since the state ρA′AB′ is arbitrary, we conclude that

EA(N) ≤ E(Ai; Bi)Φ = E(A; A′)Φ, (5.5.36)

491

Page 503: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

EA(N) ≤ E(Ao; Bo)Φ = E(B; B′)Φ, (5.5.37)

where for the last equality in each case we used dAi = dBi = dA′ = dAand dAo = dBo = dB′ = dB. We thus have

EA(N) ≤ minE(A; A′)Φ, E(B; B′)Φ, (5.5.38)

as required.

2. Let N be an entanglement breaking channel. For every state ρA′AB′ , letωABB′ = NA→B(ρA′AB′). Recall from Section 3.2.8, specifically Theo-rem 3.33, that every entanglement breaking channel can be representedas a measurement of the input system followed by the preparation of astate on the output system conditioned on the outcome of the measure-ment. As such, every entanglement breaking channel can be simulatedby an LOCC channel. Therefore, by the monotonicity of the entangle-ment measure E under LOCC,

E(A′; BB′)ω ≤ E(A′A; B′)ρ, (5.5.39)

which means that

E(A′; BB′)ω − E(A′A; B′)ρ ≤ E(A′A; B′)ρ − E(A′A; B′)ρ = 0 (5.5.40)

for every state ρA′AB′ . Therefore, EA(N) ≤ 0. On the other hand, becauseE vanishes for all separable states, it holds that E(N) ≥ 0. Therefore, byLemma 5.42, EA(N) ≥ 0, and we conclude that EA(N) = 0.

Now, let E be a faithful entanglement measure, meaning that it vanishesif and only if the input state is separable, and suppose that EA(N) =0. By Lemma 5.42, we have that 0 = EA(N) ≥ E(N), which in turnimplies that E(N) = 0 because E(N) ≥ 0 for all channels N. Therefore,by Proposition 5.40, we conclude that N is entanglement breaking.

3. Let ρA′AB′ be an arbitrary state, and let

ωA′BB′ =

(∑

x∈Xp(x)Nx

A→B

)(ρA′AB′) = ∑

x∈Xp(x)ωx

A′BB′ , (5.5.41)

where ωxA′BB′ = Nx

A→B(ρA′AB′) for all x ∈ X. Then, by convexity of theentanglement measure E, we obtain

E(A′; BB′)ω ≤ ∑x∈X

p(x)E(A′; BB′)ωx . (5.5.42)

492

Page 504: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Also, E(A′A; B′)ρ = ∑x∈X p(x)E(A′A; B′)ρ. Therefore, by the definitionof amortized entanglement, we find that

E(A′; BB′)ω − E(A′A; B′)ρ (5.5.43)

≤ ∑x∈X

p(x)(E(A′; BB′)ωx − E(A′A; B′)ρ

)(5.5.44)

≤ ∑x∈X

p(x)EA(Nx). (5.5.45)

Since the state ρA′AB′ is arbitrary, by optimizing over every state ρA′AB′on the left-hand side of the inequality above, we obtain

EA

(∑

x∈Xp(x)Nx

)≤ ∑

x∈Xp(x)EA(Nx), (5.5.46)

as required.

4. Let A1 and B1 denote the respective input and output systems for thequantum channel N, and let A2 and B2 denote the respective input andoutput quantum systems for the quantum channel M. Let ρA′A1 A2B′ bean arbitrary state. Let

ωA′B1B2B′ = (NA1→B1 ⊗MA2→B2)(ρA′A1 A2B′) (5.5.47)= NA1→B1(τA′A1B2B′), (5.5.48)

whereτA′A1B2B′ := MA2→B2(ρA′A1 A2B′). (5.5.49)

Observe that the state ωA′B1B2B′ is both an example of an output state inthe optimization defining EA(N ⊗M) and in the optimization definingEA(N) (with an appropriate identification of the A′, B, and B′ systemsfor the latter). Observe also that τA′A1B2B′ is an example of an outputstate in the optimization defining EA(M). Therefore,

E(A′; B1B2B′)ω − E(A′A1A2; B′)ρ

= E(A′; B1B2B′)ω − E(A′A1; B2B′)τ

+ E(A′A1; B2B′)τ − E(A′A1A2; B′)ρ (5.5.50)

≤ EA(N) + EA(M). (5.5.51)

Since the state ρA′A1 A2B′ is arbitrary, we can optimize over all such stateson the left-hand side of the inequality above to obtain

EA(N⊗M) ≤ EA(N) + EA(M). (5.5.52)

493

Page 505: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Now, suppose that E is an additive entanglement measure. Let us restrictthe optimization over states ρA′A1 A2B′ in the definition of EA(N⊗M) suchthat A′ ≡ A′1A′2, B′ ≡ B′1B′2, and ρA′A1 A2B′ = ρ1

A′1 A1B′1⊗ ρ2

A′2 A2B′2, for states

ρ1A′1 A1B′1

and ρ2A′2 A2B′2

. Then,

ωA′B1B2B′ = (NA1→B1 ⊗MA2→B2)(ρA′A1 A2B′) (5.5.53)

= NA1→B1(ρ1A′1 A1B′1

)⊗MA2→B2(ρ2A′2 A2B′2

) (5.5.54)

=: ω1A′1B1B′1

⊗ω2A2B2B′2

. (5.5.55)

Therefore, using additivity of E, we obtain

EA(N⊗M) = supρA′A1 A2B′

E(A′; B1B2B′)ω − E(A′A1A2; B′)ρ (5.5.56)

≥ supρ1

A′1 A1B′1⊗ρ2

A′2 A2B′2

E(A′1A′2; B1B2B′1B′2)ω1⊗ω2

− E(A′1A′2A1A2; B′1B′2)ρ1⊗ρ2 (5.5.57)

≥ supρ1

A′1 A1B′1⊗ρ2

A′2 A2B′2

E(A′1; B1B′1)ω1 + E(A′2; B2B′2)ω2

− E(A′1A1; B′1)ρ1 − E(A′2A2; B′2)ρ2 (5.5.58)

= supρ1

A′1 A1B′1

E(A′1; B1B′1)ω1 − E(A′1A1; B′1)ρ1

+ supρ2

A′2 A2B′2

E(A′2; B2B′2)ω2 − E(A′2A2; B′2)ρ2

(5.5.59)

= EA(N) + EA(M). (5.5.60)

Combining this with (5.5.52), we conclude that

EA(N⊗M) = EA(N) + EA(M), (5.5.61)

as required.

An immediate consequence of (5.5.28) is the following inequality:

supM

[EA(N⊗M)− EA(M)

]≤ EA(N), (5.5.62)

where the supremum is with respect to quantum channels M. This inequalitydemonstrates that no other channel can help to enhance the amortized entan-glement of a quantum channel.

494

Page 506: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

5.5.1.1 Amortized Entanglement and Teleportation Simulation

Teleportation (or, more generally, LOCC, separable, or PPT) simulation of aquantum channel is a key tool that we can use to establish upper bounds oncapacities of certain quantum channels when they are assisted by LOCC. Re-calling Section 3.3.5, the basic idea behind this tool is that a quantum channelcan be simulated by the action of a teleportation protocol, with a maximallyentangled resource state shared between the sender A and receiver B. Moregenerally, recalling Definition 3.40, a channel NA→B with input system A andoutput system B is defined to be LOCC-simulable with associated resourcestate ωRB′ if the following equality holds for all input states ρA:

NA→B(ρA) = LARB′→B(ρA ⊗ωRB′), (5.5.63)

where LARB′→B is a quantum channel consisting of LOCC between the sender,who has systems A and R, and the receiver, who has system B′. Whenever theunderlying state entanglement measure is subadditive, the amortized entan-glement of an LOCC-simulable channel can be bounded from above by theentanglement of the resource state. In fact, this is precisely what we did whenproving the dimension bound in Proposition 5.43 above. We can thereforeunderstand the dimension bound as being a consequence of the fact that allchannels are teleportation simulable, and hence LOCC simulable, by using amaximally entangled resource state.

Proposition 5.44

Let E be a subadditive state entanglement measure (recall (5.1.9)). Ifa quantum channel NA→B is LOCC-simulable with associated resourcestate ωRB′ , i.e.,

NA→B(ρA) = LARB′→B(ρA ⊗ωRB′), (5.5.64)

where LARB′→B is an LOCC channel, then the amortized entanglementEA(N) of N is bounded from above by the entanglement of the resourcestate:

EA(N) ≤ E(R; B′)ω. (5.5.65)

PROOF: For every state ρA′AB′′ , we use monotonicity of the state entangle-ment measure under LOCC, as well as subadditivity of the measure, to obtain

E(A′; BB′′)L(ρ⊗ω) − E(A′A; B′′)ρ

495

Page 507: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

≤ E(A′AR; B′′B′)ρ⊗ω − E(A′A; B′′)ρ (5.5.66)≤ E(A′A; B′′)ρ + E(R; B′)ω − E(A′A; B′′)ρ (5.5.67)= E(R; B′)ω, (5.5.68)

where for the first inequality we made use of LOCC monotonicity and for thesecond inequality we made use of the assumption of subadditivity. Since thestate ρA′AB′′ was arbitrary, we conclude (5.5.65).

If it happens that a channel NA→B is LOCC-simulable with resource stateωRB′ = NA→B′(ρRA) for some state ρRA, then the inequality in (5.5.65) be-comes an equality. In Section 3.3.5, we saw an example in which such a situ-ation arises, namely, when the channel N is group covariant. In this case, theresource state is simply the Choi state of the channel.

Proposition 5.45

Let E be an entanglement measure that is subadditive with respect tostates and zero on separable states, and let EA denote its amortized ver-sion. If a channel NA→B is LOCC-simulable with associated resourcestate ωRB = NA→B(ρRA) for some input state ρRA, then the followingequality holds

EA(N) = E(R; B)ω. (5.5.69)

PROOF: From Proposition 5.44, we have that EA(N) ≤ E(R; B)ω. For thereverse inequality, we take ρA′AB′′ = ρRA in the optimization that definesEA(N), where we identify A′ ≡ R and B′′ ≡ ∅ (i.e., B′′ is a trivial one-dimensional system). Then, NA→B(ρA′AB′′) = NA→B(ρRA), which is the re-source state. Furthermore, since B′′ is a one-dimensional system, the stateρA′AB′′ is trivially separable, so that E(A′A; B′′)ρ = 0. Therefore, EA(N) ≥E(A′; B)ω ≡ E(R; B)ω.

5.5.2 Generalized Divergence of Entanglement

In this section, we examine the generalized divergence of entanglement ofquantum channels, which is a channel entanglement measure that arises fromthe generalized divergence of entanglement for quantum states that we con-sidered in Section 5.2.

496

Page 508: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Definition 5.46 Generalized Divergence of Entanglement

Let D be a generalized divergence (see Definition 4.13). For every quan-tum channel NA→B, we define the generalized divergence of entanglementof N as

E(N) := supψRA

E(R; B)ω (5.5.70)

= supψRA

infσRB∈SEP(R:B)

D(NA→B(ψRA)‖σRB), (5.5.71)

where ωRB := NA→B(ψRA). The supremum is with respect to every purestate ψRA, with the dimension of R equal to the dimension of A.

In the remark immediately after Definition 5.39, we stated how it sufficesto optimize with respect to pure bipartite states (with equal dimension foreach subsystem) when calculating the generalized divergence of entangle-ment of a quantum channel.

We can write the generalized divergence of entanglement of NA→B in thefollowing alternate form:

E(N) = supρA

E(NA→B, ρA), (5.5.72)

E(NA→B, ρA) := infσAB∈SEP(A:B)

D(√

ρAΓNAB√

ρA‖σAB). (5.5.73)

This is indeed true because we can write every purification of ρA as (VA′√

ρA′⊗1A)|Γ〉A′A for some isometry VA′ (see (2.2.29) and Theorem 2.2), that the setof separable states is invariant under local isometries, and that generalizeddivergences are invariant under local isometries (see Proposition 4.14).

As with the state quantities, we are interested in the following generalizeddivergences of entanglement of a quantum channel NA→B:

1. The relative entropy of entanglement of N,

ER(N) := supψSA

ER(S; B)ω (5.5.74)

= supψSA

infσSB∈SEP(S:B)

D(NA→B(ψSA)‖σSB), (5.5.75)

where ωSB = NA→B(ψSA).

497

Page 509: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

2. The ε-hypothesis testing relative entropy of entanglement of N,

EεH(N) := sup

ψRA

EεH(R; B)ω (5.5.76)

= supψRA

infσRB∈SEP(R:B)

DεH(NA→B(ψRA)‖σRB), (5.5.77)

where ωRB = NA→B(ψRA).

3. The sandwiched Rényi relative entropy of entanglement of N,

Eα(N) := supψRA

Eα(R; B)ω (5.5.78)

= supψRA

infσRB∈SEP(R:B)

Dα(NA→B(ψRA)‖σRB), (5.5.79)

where ωRB ∈ NA→B(ψRA) and α ∈ [1/2, 1) ∪ (1, ∞). It follows fromProposition 4.29 that Eα(N) is monotonically increasing in α. Also, inAppendix 5.B, we prove that

ER(N) = limα→1+

Eα(N) (5.5.80)

for every quantum channel N.

4. The max-relative entropy of entanglement of N,

Emax(N) := supψRA

Emax(R; B)ω (5.5.81)

= supψRA

infσRB∈SEP(R:B)

Dmax(NA→B(ψRA)‖σRB), (5.5.82)

where ωRB = NA→B(ψRA). In Appendix 5.B, we prove that

Emax(N) = limα→∞

Eα(N) (5.5.83)

for every quantum channel N.

The generalized divergence of entanglement of a quantum channel satis-fies all of the general properties of a channel entanglement measure shown inProposition 5.40 except for the superadditivity property, which holds whenthe generalized divergence of a bipartite state is superadditive. However,as shown in Proposition 5.16, the generalized divergence of entanglement of

498

Page 510: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

a bipartite state is generally only subadditive. Thus, neither the superaddi-tivity nor the subadditivity of the generalized divergence of entanglementof a channel immediately follows. In Section 5.6 below, we show that themax-relative entropy of entanglement of a quantum channel is subadditive,meaning that

Emax(N⊗M) ≤ Emax(N) + Emax(M) (5.5.84)

for all quantum channels N and M.

The amortized generalized divergence of entanglement, defined accord-ing to Definition 5.41 as

EA(N) := supρA′AB′

E(A′; BB′)ω − E(A′A; B)ρ, (5.5.85)

where ωA′BB′ = NA→B(ρA′AB′), satisfies all of the properties stated in Propo-sition 5.43. In particular, it is subadditive. We show in Section 5.6 below thatEmax(N) = EA

max(N) for every quantum channel N, and this is what leads tothe subadditivity statement in (5.5.84).

For covariant channels, the optimization over pure input states in the gen-eralized divergence of entanglement can be simplified, as we now show. Thissimplification is similar to the simplification that occurs for the generalizedchannel divergence for jointly covariant channels (see Proposition 4.81).

Proposition 5.47 Generalized Divergence of Entanglement for Co-variant Channels

Let NA→B be a G-covariant quantum channel for a finite group G (recallDefinition 3.48). Then, for every pure state ψRA, with the dimension ofR equal to the dimension of A, we have that

E(R; B)ω ≤ E(R; B)ω, (5.5.86)

where ωRB := NA→B(ψRA), ωRB := NA→B(φρRA),

ρA =1|G| ∑

g∈GUg

AψAUg†A =: TG(ψA), (5.5.87)

and φρRA is a purification of ρA. Consequently,

E(N) = supφRA

E(R; B)ω : φA = TG(φA), (5.5.88)

499

Page 511: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

where ωRB = NA→B(φRA). In other words, in order to calculate E(N), itsuffices to optimize with respect to pure states φRA such that the reducedstate φA is invariant under the channel TG defined in (5.5.87).

REMARK: Using (5.5.72), we can write (5.5.86) as

E(N, ρ) ≤ E(N,TG(ρ)), (5.5.89)

which holds for every state ρ acting on the input space of the channel N. We can write(5.5.88) as

E(N) = supρE(N, ρ) : ρ = TG(ρ). (5.5.90)

PROOF: The proof is similar to the proof of Proposition 4.81. Let ψRA be anarbitrary pure state, and let ρA = TG(ψA). Furthermore, let φ

ρRA be a purifica-

tion of ρA. Let us also consider the following purification of ρA:

|ψρ〉R′RA :=1√|G| ∑

g∈G|g〉R′ ⊗ (1R ⊗Ug

A)|ψ〉RA, (5.5.91)

where |g〉R′g∈G is an orthonormal basis for HR′ indexed by the elements ofG. Since all purifications of a state can be mapped to each other by isome-tries on the purifying systems, there exists an isometry WR→R′R such that|ψρ〉R′RA = WR→R′R|φρ〉RA. Then, because the set SEP of separable statesis invariant under local isometries, for every state σRB ∈ SEP(R : B) we havethat τR′RB := WR→R′R(σRB) ∈ SEP(R′R : B). Therefore,

D(NA→B(ψρR′RA)‖τR′RB) (5.5.92)

= D(NA→B(WR→R′R(φρRA))‖WR→R′A′(σRB)) (5.5.93)

= D(WR→R′R(NA→B(φρRA))‖WR→R′R(σRB)) (5.5.94)

= D(NA→B(φρRA)‖σRB), (5.5.95)

where, to obtain the last equality, we used the fact that any generalized di-vergence is isometrically invariant (recall Proposition 4.14). Now, if we applythe dephasing channel X 7→ ∑g∈G |g〉〈g|X|g〉〈g| to the R′ system, then by thedata-processing inequality for the generalized divergence D, we obtain

D(NA→B(ψρR′RA)‖τR′RB)

500

Page 512: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

≥ D

(1|G| ∑

g∈G|g〉〈g|R′ ⊗ (NA→B Ug

A)(ψRA)

∥∥∥∥∥ ∑g∈G

p(g)|g〉〈g|R′ ⊗ τgRB

)

(5.5.96)

= D

(1|G| ∑

g∈G|g〉〈g|R′ ⊗ ((V

gB)

† NA→B UgA)(ψRA)

∥∥∥∥∥

∑g∈G

p(g)|g〉〈g|R′ ⊗Vg†B τ

gRBVg

B

), (5.5.97)

where to obtain the last line we applied the unitary channel given by the uni-tary ∑g∈G |g〉〈g|R′ ⊗ Vg†

B and we used the fact that generalized divergencesare invariant under unitaries. Furthermore, we wrote the action of the de-phasing channel on τR′RB as ∑g∈G p(g)|g〉〈g|R′ ⊗ τ

gRB, where p : G → [0, 1] is

a probability distribution and τgRBg∈G is a set of states. This operator is in

the set SEP(R′R : B) because the set SEP is closed under local channels. Next,due to the covariance of N, we have that (Vg

B)† N Ug

A = N, so that

D(NA→B(ψρR′RA)‖τR′RB) (5.5.98)

≥ D

(1|G| ∑

g∈G|g〉〈g|R′ ⊗NA→B(ψRA)

∥∥∥∥∥ ∑g∈G

p(g)|g〉〈g|R′ ⊗Vg†B τ

gRBVg

B

)

(5.5.99)

≥ D

(NA→B(ψRA)

∥∥∥∥∥ ∑g∈G

p(g)Vg†B τ

gRBVg

B

), (5.5.100)

where to obtain the last inequality we used the data-processing inequality forD under the channel TrR′ . Now, observe that the state ∑g∈G p(g)|g〉〈g|R′ ⊗Vg†

B τgRBVg

B is in the set SEP(R′R : B). This is due to the fact that ∑g∈G |g〉〈g|R′⊗Vg†

B is a controlled unitary, and since register R′ is classical, this controlledunitary can be implemented as an LOCC channel. Also, the set SEP is closedunder LOCC channels. It follows then that ∑g∈G p(g)Vg†

B τRBVgB ∈ SEP(R : B)

because we obtain it from the previous separable state by applying a localpartial trace over R′. By taking the infimum over every state τRB ∈ SEP(R : B)in (5.5.100), we have that

D(NA→B(φρRA)‖σRB) = D(NA→B(ψ

ρR′RA)‖τR′RB) (5.5.101)

≥ infτRB∈SEP(R:B)

D(NA→B(ψRA)‖τRB) (5.5.102)

501

Page 513: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

= E(R; B)ω, (5.5.103)

where ωRB = NA→B(ψRA). This inequality holds for every state ψRA andevery state σRB ∈ SEP(R : B). Therefore, optimizing over all σRB ∈ SEP(R : B)leads to

infσRB∈SEP(R:B)

D(NA→B(φρRA)‖σRB) = E(R; B)ω ≥ E(R; B)ω, (5.5.104)

where ωRB = NA→B(φρRA). This is precisely the inequality in (5.5.86).

Next, by construction, the state φρRA is such that its reduced state on A is

invariant under the channel TG. Optimizing over all such states leads to

supφRA

E(R; B)ω : φA = TG(φA), ωRB = NA→B(φRA) ≥ E(R; B)ω. (5.5.105)

Since this inequality holds for every pure state ψRA, we finally obtain

supφRA

E(R; B)ω : φA = TG(φA), ωRB = NA→B(φRA) ≥ E(N). (5.5.106)

Since the reverse inequality trivially holds, we obtain (5.5.88).

We saw in Section 5.2.1 that both the max-relative entropy of entangle-ment and the hypothesis testing relative entropy of entanglement can be for-mulated as cone programs. We now show that the max-relative entropy ofentanglement for channels can also be formulated as a cone program.

Proposition 5.48 Cone Program for the Max-Relative Entropy of En-tanglement of a Quantum Channel

Let NA→B be a quantum channel. Then

Emax(N) = log2 Σmax(N), (5.5.107)

where

Σmax(N) := infYSB∈SEP

‖TrB[YSB]‖∞ : ΓN

SB ≤ YSB

, (5.5.108)

and ΓNSB is the Choi operator of the channel NA→B.

502

Page 514: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

PROOF: Using the definition of a channel’s max-relative entropy of entan-glement, and the cone program formulation of the max-relative entropy ofentanglement for states from Proposition 5.21, we have

Emax(N) = supψSA

Emax(S; B)ω (5.5.109)

= supψSA

infXSB∈SEP

log2Tr[XSB] : NA→B(ψSA) ≤ XSB , (5.5.110)

where ωSB = NA→B(ψSA). Now, recall from (2.2.29) that an arbitrary purebipartite state ψSA can be written as ZSΓSAZ†

S, where ΓSA = |Γ〉〈Γ|, |Γ〉SA =

∑dA−1i=0 |i, i〉SA, and ZS is an operator satisfying Tr[Z†

SZS] = 1. Then

NA→B(ψSA) = NA→B(ZSΓSAZ†S) (5.5.111)

= ZSNA→B(ΓSA)Z†S (5.5.112)

= ZSΓNSBZ†

S, (5.5.113)

where ΓNSB = NA→B(ΓSA) is the Choi operator of NA→B. Since the set of

operators ZS satisfying Z†SZS > 0 and Tr[Z†

SZS] = 1 is dense in the set of alloperators satisfying Tr[Z†

SZS] = 1, we find that

Emax(N)

= log2 supZS

infXSB∈SEP

Tr[XSB] : ZSΓN

SBZ†S ≤ XSB, Z†

SZS > 0, Tr[Z†SZS] = 1

.

(5.5.114)

Let us now make a change of variable, defining the variable YSB according tothe relation XSB = ZSYSBZ†

S. Then, since

ZSΓNSBZ†

S ≤ XSB = ZSYSBZ†S ⇐⇒ ΓN

SB ≤ YSB, (5.5.115)

XSB ∈ SEP ⇐⇒ YSB ∈ SEP, (5.5.116)

we find that

Eq. (5.5.114)

= supZS

infYSB∈SEP

Tr[ZSYSBZ†

S] : ΓNSB ≤ YSB, Z†

SZS > 0, Tr[Z†SZS] = 1

= supZS

infYSB∈SEP

Tr[Z†

SZSYSB] : ΓNSB ≤ YSB, Z†

SZS > 0, Tr[Z†SZS] = 1

503

Page 515: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

= supρS

infYSB∈SEP

Tr[ρSYSB] : ΓN

SB ≤ YSB

, (5.5.117)

where in the last line we made the substitution ρS = Z†SZS, so that the op-

timization is with respect to density operators. Furthermore, we have em-ployed the fact that the set of density operators satisfying ρS > 0 is densein the set of all density operators. Now observing that the objective functionis linear in ρS and YSB, the set of density operators is compact and convex,and the set of separable operators is convex, the Sion minimax theorem (The-orem 2.18) applies, such that we can exchange the optimizations to find that

supρS

infYSB∈SEP

Tr[ρSYSB] : ΓN

SB ≤ YSB

= infYSB∈SEP

supρS

Tr[ρSYSB] : ΓN

SB ≤ YSB

(5.5.118)

= infYSB∈SEP

supρS

Tr[ρSTrB[YSB]] : ΓN

SB ≤ YSB

(5.5.119)

= infYSB∈SEP

‖TrB[YSB]‖∞ : ΓN

SB ≤ YSB

(5.5.120)

= Σmax(N). (5.5.121)

The second equality follows from partial trace, and the third follows because‖X‖∞ = supρ Tr[Xρ] for positive semi-definite operators X, where the opti-mization is with respect to density operators (see (2.2.84)).

5.5.3 Generalized Rains Divergence

We now examine the generalized Rains divergence of quantum channels,which is a channel entanglement measure that arises from the generalizedRains divergence for bipartite quantum states that we considered in Section 5.3.

Definition 5.49 Generalized Rains Information of a Quantum Chan-nel

Let D be a generalized divergence (see Definition 4.13. For every quan-tum channel NA→B, we define the generalized Rains information of N as

R(N) := supψSA

R(S; B)ω (5.5.122)

504

Page 516: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

= supψSA

infσSB∈PPT′(S:B)

D(NA→B(ψSA)‖σSB), (5.5.123)

where ωSB := NA→B(ψSA). The supremum is with respect to every purestate ψSA, with the dimension of S the same as the dimension of A.

In the remark immediately after Definition 5.39 we show that it suffices tooptimize with respect to pure bipartite states (with equal dimension for eachsubsystem) when calculating the generalized Rains divergence of a quantumchannel.

We can write the generalized Rains divergence of NA→B in the followingalternate form:

R(N) = supρA

R(NA→B, ρA), (5.5.124)

R(NA→B, ρA) := infσAB∈PPT′(A:B)

D(√

ρAΓNAB√

ρA‖σAB). (5.5.125)

This is indeed true because we can write every purification of ρA as (VS√

ρS⊗1A)|Γ〉SA for some isometry VS (see (2.2.29) and Theorem 2.2), the set of PPT′

operators is invariant under local isometries, and the generalized divergencesare invariant under local isometries (see Proposition 4.14).

As with the state quantities, we are interested in the following generalizedRains information quantities for every quantum channel NA→B. For everycase below, we define ωSB = NA→B(ψSA).

1. The Rains information of N,

R(N) := supψSA

R(S; B)ω (5.5.126)

= supψSA

infσSB∈PPT′(S:B)

D(NA→B(ψSA)‖σSB). (5.5.127)

2. The ε-hypothesis testing Rains information of N,

RεH(N) := sup

ψSA

RεH(S; B)ω (5.5.128)

= supψSA

infσSB∈PPT′(S:B)

DεH(NA→B(ψSA)‖σSB), (5.5.129)

505

Page 517: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

3. The sandwiched Rényi Rains information of N,

Rα(N) := supψSA

Rα(S; B)ω (5.5.130)

= supψSA

infσSB∈PPT′(S:B)

Dα(NA′→B(ψSA)‖σSB), (5.5.131)

where α ∈ [1/2, 1) ∪ (1, ∞). It follows from Proposition 4.29 that Rα(N) ismonotonically increasing in α. Also, in Appendix 5.B, we prove that

R(N) = limα→1

Rα(N) (5.5.132)

for every quantum channel N.

4. The max-Rains information of N,

Rmax(N) := supψSA

Rmax(S; B)ω (5.5.133)

= supψSA

infσSB∈PPT′(S:B)

Dmax(NA→B(ψSA)‖σSB). (5.5.134)

In Appendix 5.B, we prove that

Rmax(N) = limα→∞

Rα(N) (5.5.135)

for every quantum channel N.

The generalized Rains divergence of a quantum channel satisfies all of theproperties of a channel entanglement measure laid out in Proposition 5.40,except for faithfulness and superadditivity. Faithfulness generally does nothold because the generalized Rains divergence of a bipartite quantum stateis not faithful. Superadditivity does not hold in general because the Rainsdivergence of a bipartite quantum state is generally only subadditive. Themax-Rains information of a quantum channel, however, is additive, meaningthat

Rmax(N⊗M) = Rmax(N) + Rmax(M) (5.5.136)

for all quantum channels N and M. We defer a proof of this to Section 5.6below.

The amortized generalized Rains divergence RA(N), defined according toDefinition 5.41 as

RA(N) := supρA′AB′

R(A′; BB′)ω − R(A′A; B)ρ, (5.5.137)

506

Page 518: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

where ωA′BB′ = NA→B(ρA′AB′), satisfies all of the properties stated in Propo-sition 5.43 except for faithfulness, because the generalized Rains divergenceof a bipartite quantum state is not faithful. In particular, due to additivityof the max-Rains relative entropy (Proposition 5.29), we immediately obtainadditivity of the amortized max-Rains information of a quantum channel, i.e.,

RAmax(N⊗M) = RA

max(N) + RAmax(M) (5.5.138)

for all quantum channels N and M. We show in Section 5.6 below that

Rmax(N) = RAmax(N) (5.5.139)

for every quantum channel N, and it is this fact that leads to the additivitystatement in (5.5.136).

For covariant channels, the optimization over pure input states in the gen-eralized Rains divergence simplifies in the same way as it does for the gener-alized divergence of entanglement.

Proposition 5.50 Generalized Rains Information for CovariantChannels

Let NA→B be a G-covariant quantum channel for a finite group G (recallDefinition 3.48). Then, for every pure state ψRA, with the dimension ofR equal to the dimension of A, we have that

R(S; B)ω ≤ R(S; B)ω, (5.5.140)

where ωRB := NA→B(ψRA), ωRB := NA→B(φρRA),

ρA =1|G| ∑

g∈GUg

AρAUg†A =: TG(ρA), (5.5.141)

ρA = ψA = TrS[ψSA], and φρSA a purification of ρA. Consequently,

R(N) = supφSA

R(S; B)ω : φA = TG(φA), ωSB = NA→B(φSA). (5.5.142)

In other words, in order to calculate R(N), it suffices to optimize withrespect to pure states φSA such that the reduced state φA is invariantunder the channel TG defined in (5.5.141).

507

Page 519: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

REMARK: Using (5.5.124), we can write (5.5.140) as

R(N, ρ) ≤ R(N,TG(ρ)), (5.5.143)

which holds for every state ρ acting on the input space of the channel N. We can write(5.5.142) as

R(N) = supRR(N, ρ) : ρ = TG(ρ). (5.5.144)

PROOF: The proof is identical to the proof of Proposition 5.47, with the excep-tion that the set PPT’ is involved rather than the set SEP. The LOCC channelsdiscussed there preserve the set PPT’, and this is the main reason why thesame proof applies.

Using the SDP formulation of max-Rains relative entropy in Proposition 5.27,we arrive at an SDP formulation for the max-Rains information of a quantumchannel.

Proposition 5.51 SDPs for Max-Rains Information of a QuantumChannel

Let NA→B be a quantum channel. Then

Rmax(N) = log2 Γmax(N), (5.5.145)

where

Γmax(N)

= infYSB,VSB≥0

‖TrB[VSB + YSB]‖∞ : TB(VSB −YSB) ≥ ΓNSB (5.5.146)

= supρS≥0Tr[ΓN

SBXSB] : Tr[ρS] ≤ 1, −ρS ⊗ 1B ≤ TB(XSB) ≤ ρS ⊗ 1B.

(5.5.147)

PROOF: To arrive at (5.5.147), consider that, with ωSB = NA→B(ψSA),

2Rmax(N)

= supψSA

2Rmax(S;B)ω (5.5.148)

= sup Tr[NA→B(ψSA)XSB] : ‖TB(XSB)‖∞ ≤ 1, XSB ≥ 0 , (5.5.149)

where the last equality follows from (5.3.47)–(5.3.49). Recall from (2.2.29) thatan arbitrary pure bipartite state ψSA can be written as ZSΓSAZ†

S, where ΓSA =

508

Page 520: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

|Γ〉〈Γ|SA, |Γ〉SA = ∑dA=1i=0 |i, i〉SA, and ZS is an operator satisfying Tr[Z†

SZS] =1. Then

Tr[NA→B(ψSA)XSB] = Tr[NA→B(ZSΓSAZ†S)XSB] (5.5.150)

= Tr[ZSNA→B(ΓSA)Z†SXSB] (5.5.151)

= Tr[ΓNSBZ†

SXSBZS], (5.5.152)

where ΓNSB = NA→B(ΓSA) denotes the Choi operator of the channel NA→B.

Since the set of operators ZS satisfying Z†SZS > 0 and Tr[Z†

SZS] = 1 is densein the set of all operators satisfying Tr[Z†

SZS] = 1, we find that

2Rmax(N) = supTr[ΓNSBZ†

SXSBZS] : ‖TB(XSB)‖∞ ≤ 1,

XSB ≥ 0, Z†SZS > 0, Tr[Z†

SZS] = 1. (5.5.153)

Consider that XSB ≥ 0⇔ Z†SXSBZS ≥ 0 and

‖TB(XSB)‖∞ ≤ 1⇐⇒ − 1SB ≤ TB(XSB) ≤ 1SB (5.5.154)

⇐⇒ − Z†SZS ⊗ 1B ≤ Z†

STB(XSB)ZS ≤ Z†SZS ⊗ 1B (5.5.155)

⇐⇒ − Z†SZS ⊗ 1B ≤ TB(Z†

SXSBZS) ≤ Z†SZS ⊗ 1B. (5.5.156)

We now set X′SB := Z†SXSBZS and ρS = Z†

SZS > 0 and rewrite as follows:

2Rmax(N) = supTr[ΓNSBX′SB] : −ρS ⊗ 1B ≤ TB(X′SB) ≤ ρS ⊗ 1B,

X′SB ≥ 0, ρS > 0, Tr[ρS] = 1, (5.5.157)

which is the equality in (5.5.145) and (5.5.147), after observing that the setρS : ρS > 0, Tr[ρS] = 1 is dense in the set ρS : ρS ≥ 0, Tr[ρS] = 1.

To arrive at the equality in (5.5.146), we employ the dual formulation ofthe max-Rains relative entropy in (5.3.49). Consider that

2Rmax(N)

= supψSA

2Rmax(S;B)ω (5.5.158)

= supψSA

infKSB,LSB≥0

Tr[KSB + LSB] : TB(KSB − LSB) ≥ NA→B(ψSA). (5.5.159)

509

Page 521: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Making the same observations as we did previously, we have that NA→B(ψSA) =ZSΓN

SBZ†S, as well as

TB(KSB − LSB) ≥ NA→B(ψSA) ⇐⇒ TB(K′SB − L′SB

)≥ ΓN

SB, (5.5.160)

where K′SB and L′SB are such that KSB = ZSK′SBZ†S and LSB = ZSL′SBZ†

S, re-spectively. Then KSB, LSB ≥ 0⇐⇒ K′SB, L′SB ≥ 0, and we find that

2Rmax(N) = supZS

infK′SB,L′SB≥0

Tr[ZSK′SBZ†S + ZSL′SBZ†

S] : TB(K′SB − L′SB

)≥ ΓN

SB,

Z†SZS > 0, Tr[Z†

SZS] = 1. (5.5.161)

Employing cyclicity of trace, setting ρS = Z†SZS, and exploiting the fact that

the set ρS : ρS > 0, Tr[ρS] = 1 is dense in the set ρS : ρS ≥ 0, Tr[ρS] = 1,we find that

2Rmax(N) = supρS

infK′SB,L′SB

Tr[ρS(K′SB + L′SB)] : K′SB, L′SB ≥ 0,

TB(K′SB − L′SB

)≥ ΓN

SB, ρS ≥ 0, Tr[ρS] = 1. (5.5.162)

The function that we are optimizing is linear in ρS and jointly convex in K′SBand L′SB (the set with respect to which the infimum is performed is also com-pact), so that the minimax theorem (Theorem 2.18) applies and we can ex-change sup with inf to find that

2Rmax(N) = infK′SB,L′SB

supρS

Tr[ρS(K′SB + L′SB)] : K′SB, L′SB ≥ 0,

TB(K′SB − L′SB

)≥ ΓN

SB, ρS ≥ 0, Tr[ρS] = 1. (5.5.163)

For fixed K′SB and L′SB, consider that

supρS

Tr[ρS(K′SB + L′SB)] : ρS ≥ 0, Tr[ρS] = 1

= supρS

Tr[ρSTrB[K′SB + L′SB]] : ρS ≥ 0, Tr[ρS] = 1 (5.5.164)

=∥∥TrB[K′SB + L′SB]

∥∥∞ , (5.5.165)

where for the last line we used (2.2.84). Substituting back in, we find that

2Rmax(N) = infK′SB,L′SB

∥∥TrB[K′SB + L′SB]

∥∥∞ : K′SB, L′SB ≥ 0,

510

Page 522: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

TB(K′SB − L′SB

)≥ ΓN

SB, (5.5.166)

as claimed in (5.5.147).

According to Theorem 2.22, strong duality holds by picking VSB and YSBequal to the positive and negative parts of TB(ΓN

SB), respectively, which arefeasible for (5.5.147). Furthermore, we can pick ρS = 1S/(2dS) and XSB =1SB/(3dS), which are strictly feasible for (5.5.146).

5.5.4 Squashed Entanglement

We now move on to the squashed entanglement of a quantum channel, whichis a channel entanglement measure that arises from the squashed entangle-ment of a bipartite state, the latter defined in Section 5.4.

Definition 5.52 Squashed Entanglement of a Quantum Channel

For every quantum channel NA→B, we define the squashed entanglementof N as

Esq(N) = supψRA

Esq(R; B)ω (5.5.167)

=12

supψRA

infτRBEI(R; B|E)τ : TrE[τRBE] = ωRB, (5.5.168)

where ωRB = NA→B(ψRA) and the quantum conditional mutual infor-mation I(R; B|E)τ is defined as

I(R; B|E)τ = H(R|E)τ + H(B|E)τ − H(RB|E)τ. (5.5.169)

We can write the squashed entanglement of a quantum channel in thefollowing alternate form:

Esq(N) = supρA

Esq(N, ρ), (5.5.170)

Esq(N, ρ) := Esq(R; B)ω. (5.5.171)

where ωRB = NA→B(ψρRA), with ψ

ρRA some purification of ρA. This is indeed

true because the squashed entanglement of a bipartite state is invariant underisometries and because all purifications of a given state are related to eachother by local isometries acting on the purifying system.

511

Page 523: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Let us now recall the following alternate expression for the squashed en-tanglement of a bipartite state from (5.4.39):

Esq(A; B)ρ =12

infVE′→EF

H(B|E)θ + H(B|F)θ : θBEF = VE′→EF(ψ

ρABE′),

,

(5.5.172)

where ψρABE′ is some purification of ρAB. Now, given an input state ρA of a

quantum channel NA→B, let φRA be a purification of ρA. Then, for the stateωRB := NA→B(φRA), we can take a purification to be ψRBE′ = UN

A→BE′(φRA),where UN is an isometric channel that extends N. Then, for every isometricchannel VE′→EF, we can define the state

θBEF = VE′→EF(ψBE′) = (VE′→EF UNA→BE′)(ρA), (5.5.173)

where ψBE′ = TrR[ψRBE′ ]. We then have

Esq(N, ρA) = E(R; B)ω =

12

infVE′→EF

H(B|E)θ + H(B|F)θ : θBEF = (VE′→EF UN

A→BE′)(ρA)

(5.5.174)

for every state ρA. We thus obtain the following expression for the squashedentanglement of a channel:

Esq(N) =

12

supρA

infVE′→EF

H(B|E)θ + H(B|F)θ : θBEF = (VE′→EF UN

A→BE′)(ρA)

.

(5.5.175)

The squashed entanglement of a quantum channel, as well as its amor-tized version EA

sq defined as

EAsq(N) := sup

ρA′AB′Esq(A′; BB′)ω − Esq(A′A; B′)ρ, (5.5.176)

with ωA′BB′ = NA→B(ρA′AB′), satisfy all of the properties stated in Proposi-tion 5.40 and Proposition 5.43, respectively. In particular, because the squashedentanglement for states is additive (see (5.4.8)), we immediately have that theamortized squashed entanglement of a channel is additive, i.e.,

EAsq(N⊗M) = EA

sq(N) + EAsq(M) (5.5.177)

512

Page 524: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

for all quantum channels N and M. In Section 5.6 below, we prove thatEsq(N) = EA

sq(N) for every quantum channel N, which then implies the addi-tivity of the squashed entanglement of a channel, i.e.,

Esq(N⊗M) = Esq(N) + Esq(M) (5.5.178)

for all channels N and M.

Proposition 5.53

Let NA→B be a quantum channel. The function ρ 7→ Esq(N, ρ), whereEsq(N, ρ) is defined in (5.5.171), is concave: for X a finite alphabet, p :X → [0, 1] a probability distribution on X, and ρx

Ax∈X a set of states,the following inequality holds

Esq

(N, ∑

x∈Xp(x)ρx

A

)≥ ∑

x∈Xp(x)Esq(N, ρx

A). (5.5.179)

PROOF: In order to prove this, we make use of the expression for Esq(N, ρA)in (5.5.174).

For every state ρxA, with x ∈ X, define the state

θxBEF := (VE′→EF UN

A→BE′)(ρxA), (5.5.180)

where VE′→EF is an arbitrary isometric channel. Then, using (5.5.174), wehave

Esq(N, ρxA) = E(R; B)ωx ≤ 1

2(H(B|E)θx + H(B|F)θx , ) (5.5.181)

Now, let

ρA := ∑x∈X

p(x)ρxA, (5.5.182)

θBEF := ∑x∈X

p(x)θxBEF (5.5.183)

= ∑x∈X

p(x)(VE′→EF UNA→BE′)(ρ

xA) (5.5.184)

= (VE′→EF UNA→BE′)(ρA). (5.5.185)

Using concavity of conditional entropy (see (4.2.122)), we obtain

∑x∈X

p(x)Esq(N, ρxA) ≤

12 ∑

x∈Xp(x) (H(B|E)θx + H(B|F)θx) (5.5.186)

513

Page 525: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

≤ 12(H(B|E)θ + H(B|F)θ). (5.5.187)

Finally, since the isometric channel VE′→EF is arbitrary, taking the infimumover all such channels on the right-hand side of the inequality above andusing (5.5.185) gives us

∑x∈X

p(x)Esq(N, ρxA) ≤

12

infVE′→EF

H(B|E)θ + H(B|F)θ (5.5.188)

= Esq(N, ρA), (5.5.189)

which is what we set out to prove.

5.6 Amortization Collapses

In Lemma 5.42, we proved the following relation between the entanglementE(N) of a channel N and its amortized entanglement EA(N):

E(N) ≤ EA(N). (5.6.1)

In general, therefore, amortization can yield a larger value for the entangle-ment of a channel than the usual channel entanglement measure.

For which entanglement measures does the reverse inequality hold? Inthis section, we investigate this question, and we prove that three of thechannel entanglement measures that we have considered in this chapter —max-relative entropy of entanglement, max-Rains information, and squashedentanglement — satisfy the reverse inequality. Thus, for these three entan-glement measures, amortization does not yield a higher entanglement valuethan the usual channel entanglement measure. This so-called “amortizationcollapse” is important because it immediately implies additivity of the usualchannel entanglement measure.

5.6.1 Max-Relative Entropy of Entanglement

We start by proving that the amortization collapse occurs for max-relative en-tropy of entanglement. The key tools in the proof are Propositions 5.21 and

514

Page 526: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

5.48, which provide cone programs for both the max-relative entropy of en-tanglement for bipartite states and the max-relative entropy of entanglementfor quantum channels. Let us recall these now:

Emax(A; B)ρ = log2 Gmax(A; B)ρ (5.6.2)= log2 inf

XAB∈SEPTr[XAB] : ρAB ≤ XAB , (5.6.3)

Emax(N) = log2 Σmax(N) (5.6.4)

= log2 infYSB∈SEP

‖TrB[YSB]‖∞ : ΓN

SB ≤ YSB

, (5.6.5)

where ρAB is a bipartite state and N is a quantum channel with Choi operatorΓN

SB.

Theorem 5.54 Amortization Collapse for Max-Relative Entropy ofEntanglement

Let NA→B be a quantum channel. For every state ρA′AB′ ,

Emax(A′; BB′)ω ≤ Emax(N) + Emax(A′A; B′)ρ, (5.6.6)

where ωA′BB′ := NA→B(ρA′AB′). Consequently,

EAmax(N) ≤ Emax(N), (5.6.7)

and thus (by Lemma 5.42), we have that

EAmax(N) = Emax(N) (5.6.8)

for every quantum channel N.

PROOF: Using the cone program formulations in (5.6.2)–(5.6.5), we find thatthe inequality in (5.6.6) is equivalent to

Gmax(A′; BB′)ω = Σmax(N) · Gmax(A′A; B′)ρ. (5.6.9)

We now set out to prove this inequality.

Using (5.6.3), we find that

Gmax(A′A; B′)ρ = inf Tr[CA′AB′ ], (5.6.10)

515

Page 527: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

subject to the constraints

CA′AB′ ∈ SEP(A′A : B′), (5.6.11)CA′AB′ ≥ ρA′AB′ , (5.6.12)

andGmax(A′; BB′)ω = inf Tr[DA′BB′ ], (5.6.13)

subject to the constraints

DA′BB′ ∈ SEP(A′ : BB′), (5.6.14)DA′BB′ ≥ NA→B(ρA′AB′). (5.6.15)

Furthermore, (5.6.5) gives that

Σmax(N) = inf ‖TrB[YSB]‖∞ , (5.6.16)

subject to the constraints

YSB ∈ SEP(S : B), (5.6.17)

YSB ≥ ΓNSB. (5.6.18)

With these optimizations in place, we can now establish the inequalityin (5.6.9) by making a judicious choice for DA′BB′ . Let CA′AB′ be an arbi-trary operator to consider in the optimization for Gmax(A′A; B′)ρ (i.e., sat-isfying (5.6.11)–(5.6.12)), and let YSB be an arbitrary operator to consider inthe optimization for Σmax(N) (i.e., satisfying (5.6.17)–(5.6.18)). Let |Γ〉SA =

∑dA−1i=0 |i, i〉SA. Pick

DA′BB′ = 〈Γ|SACA′AB′ ⊗YSB|Γ〉SA. (5.6.19)

We need to prove that DA′BB′ is feasible for Gmax(A′; BB′)ω. To this end, con-sider that

〈Γ|SACA′AB′ ⊗YSB|Γ〉SA ≥ 〈Γ|SAρA′AB′ ⊗ ΓNSB|Γ〉SA (5.6.20)

= NA→B(ρA′AB′), (5.6.21)

which follows from (5.6.12), (5.6.18), and (3.2.8). Now, since CA′AB′ ∈ SEP(A′A :B′), it can be written as ∑x∈X Px

A′A ⊗ QxB′ for some finite alphabet X and for

sets PxA′Ax∈X, Qx

B′x∈X of positive semi-definite operators. Similarly, sinceYSB ∈ SEP (S : B), it can be written as ∑y∈Y Ly

S ⊗MyB, for some finite alphabet

516

Page 528: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Y and sets LySy∈Y, My

By∈Y of positive semi-definite operators. Then, using(2.2.30) and (2.2.33), we have that

〈Γ|SACA′AB′ ⊗YSB|Γ〉SA

= ∑x∈Xy∈Y

〈Γ|SAPxA′A ⊗Qx

B′ ⊗ LyS ⊗My

B|Γ〉SA (5.6.22)

= ∑x∈Xy∈Y

〈Γ|SAPxA′ATA(Ly

A)⊗QxB′ ⊗ 1S ⊗My

B|Γ〉SA (5.6.23)

= ∑x∈Xy∈Y

TrA[PxA′ATA(Ly

A)]⊗QxB′ ⊗My

B (5.6.24)

∈ SEP(A′ : BB′). (5.6.25)

For the second equality, we used the transpose trick from (2.2.30), and for thethird, we used (2.2.33). The last statement follows because

TrA[PxA′ATA(Ly

A)] = TrA

[√TA(Ly

A)PxA′A

√TA(Ly

A)

](5.6.26)

is positive semi-definite for each x ∈ X and y ∈ Y. Thus, DA′BB′ is feasible forGmax(A′; BB′)ω. Finally, using (2.2.33) again, consider that

Gmax(A′; BB′)ω ≤ Tr[DA′BB′ ] (5.6.27)= Tr[〈Γ|SACA′AB′ ⊗YSB|Γ〉SA] (5.6.28)= Tr[CA′AB′TA(YAB)] (5.6.29)= Tr[CA′AB′TA(TrB[YAB])] (5.6.30)

For the second equality, we used the transpose trick from (2.2.30). SinceCA′AB′ and YSB are positive semi-definite (this follows from (5.6.12) and (5.6.18),respectively), using (2.2.106) we have that

Tr[CA′AB′TA(TrB[YAB])] = |Tr[CA′AB′TA(TrB[YAB])]| (5.6.31)≤ ‖CA′AB′‖1 ‖TA(TrB[YAB])‖∞ (5.6.32)= Tr[CA′AB′ ] ‖TA(TrB[YAB])‖∞ (5.6.33)= Tr[CA′AB′ ] ‖TrB[YAB]‖∞ , (5.6.34)

where for the last equality we used the fact that the spectrum of an operatoris invariant under the action of a full transpose (note, in this case, that TA is afull transpose because the operator TrB[YAB] acts only on A). Therefore,

Gmax(A′; BB′)ω ≤ Tr[CA′AB′ ] ‖TrB[YAB]‖∞ . (5.6.35)

517

Page 529: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Since this inequality holds for all CA′AB′ satisfying (5.6.11)–(5.6.12) and for allYSB satisfying (5.6.17)–(5.6.18), we conclude (5.6.9) after taking an infimumwith respect to all such operators.

Having shown that

Emax(A′; BB′)ω ≤ Emax(N) + Emax(A′A; B′)ρ

=⇒ Emax(A′; BB′)ω − Emax(A′A; B′)ρ ≤ Emax(N) (5.6.36)

for every state ρA′AB′ , it follows immediately from the definition of EAmax(N)

that EAmax(N) ≤ Emax(N). Combined with the result of Lemma 5.42, which is

the reverse inequality, we obtain EAmax(N) = Emax(N).

With the equality EAmax(N) = Emax(N) in hand, the subadditivity of max-

relative entropy of entanglement of quantum channels immediately follows.

Corollary 5.55 Subadditivity of Max-Relative Entropy of Entangle-ment of a Quantum Channel

The max-relative entropy of entanglement of a quantum channel is sub-additive: for every two quantum channels N and M,

Emax(N⊗M) ≤ Emax(N) + Emax(M). (5.6.37)

PROOF: Given that the amortized entanglement of a quantum channel is al-ways subadditive, regardless of whether or not the underlying state entan-glement measure is additive (see Proposition 5.43), we have that

EAmax(N⊗M) ≤ EA

max(N) + EAmax(M) (5.6.38)

for all quantum channels N and M. Using (5.6.8), we immediately obtain thedesired result.

5.6.2 Max-Rains Information

Let us now prove that the amortization collapse also occurs for the max-Rainsinformation of a quantum channel. The key tools needed for the proof arePropositions 5.27 and 5.51, which provide semi-definite programs for both themax-Rains relative entropy for bipartite states and the max-Rains informationfor quantum channels. Let us recall these now:

2Rmax(A;B)ρ = Wmax(A; B)ρ (5.6.39)

518

Page 530: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

= infKAB,LAB≥0

Tr[KAB + LAB] : TB[KAB − LAB] ≥ ρAB, (5.6.40)

2Rmax(N) = Γmax(N) (5.6.41)

= infYSB,VSB≥0

‖TrB[VSB + YSB]‖∞ : TB[VSB −YSB] ≥ ΓNSB, (5.6.42)

where ρAB is a bipartite state and N is a quantum channel with Choi opera-tor ΓN

SB.

Theorem 5.56 Amortization Collapse for Max-Rains Information

Let NA→B be a quantum channel. For every state ρA′AB′ ,

Rmax(A′; BB′)ω ≤ Rmax(N) + Rmax(A′A; B′)ρ, (5.6.43)

where ωA′BB′ := NA→B(ρA′AB′). Consequently,

RAmax(N) ≤ Rmax(N), (5.6.44)

and thus (by Lemma 5.42), we have that

RAmax(N) = Rmax(N) (5.6.45)

for every quantum channel N.

PROOF: The proof given below is conceptually similar to the proof of Theo-rem 5.54, but it has some key differences.

Using the semi-definite program formulations in (5.6.39)–(5.6.42), we findthat the inequality in (5.6.43) is equivalent to

Wmax(A′; BB′)ω ≤ Γmax(N) ·Wmax(A′A; B′)ρ, (5.6.46)

and so we aim to prove this one.

Using (5.6.40), we have that

W(A′A; B′)ρ = inf Tr[CA′AB′ + DA′AB′ ], (5.6.47)

subject to the constraints

CA′AB′ , DA′AB′ ≥ 0, (5.6.48)TB′(CA′AB′ − DA′AB′) ≥ ρA′AB′ , (5.6.49)

519

Page 531: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

and we also have

W(A′; BB′)ω = inf Tr[GA′BB′ + FA′BB′ ], (5.6.50)

subject to the constraints

GA′BB′ , FA′BB′ ≥ 0, (5.6.51)NA→B(ρA′AB′) ≤ TBB′ [GA′BB′ − FA′BB′ ]. (5.6.52)

Using (5.6.42), we have that

Γmax(N) = inf ‖TrB[VSB + YSB]‖∞ , (5.6.53)

subject to the constraints

YSB, VSB ≥ 0, (5.6.54)

TB[VSB −YSB] ≥ ΓNSB. (5.6.55)

With these SDP formulations in place, we can now establish the inequal-ity in (5.6.46) by making judicious choices for GA′BB′ and FA′BB′ . Let CA′AB′and DA′AB′ be arbitrary operators in the optimization for Wmax(A′A; B′)ρ, andlet YSB and VSB be arbitrary operators in the optimization for Γmax(N). Let|Γ〉SA = ∑dA−1

i=0 |i, i〉SA. Pick

GA′BB′ = 〈Γ|SACA′AB′ ⊗VSB + DA′AB′ ⊗YSB|Γ〉SA, (5.6.56)FA′BB′ = 〈Γ|SACA′AB′ ⊗YSB + DA′AB′ ⊗VSB|Γ〉SA. (5.6.57)

Note that GA′BB′ , FA′BB′ ≥ 0 because CA′AB′ , DA′AB′ , YSB, VSB ≥ 0. Using(5.6.49) and (5.6.55), consider that

TBB′ [GA′BB′ − FA′BB′ ]

= TBB′ [〈Γ|SA(CA′AB′ − DA′AB′)⊗ (VSB −YSB)|Γ〉SA] (5.6.58)= 〈Γ|SATB′ [CA′AB′ − DA′AB′ ]⊗ TB[VSB −YSB]|Γ〉SA (5.6.59)

≥ 〈Γ|SAρA′AB′ ⊗ ΓNSB|Γ〉SA (5.6.60)

= NA→B(ρA′AB′), (5.6.61)

where the last equality follows from (3.2.8). Our choices of GA′BB′ and FA′BB′are thus feasible points for Wmax(A′; BB′)ω. Using this, along with (2.2.30)and (2.2.33), we obtain

Wmax(A′; BB′)ω

520

Page 532: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

≤ Tr[GA′BB′ + FA′BB′ ] (5.6.62)= Tr[〈Γ|SA(CA′AB′ + DA′AB′)⊗ (VSB + YSB)|Γ〉SA] (5.6.63)= Tr[(CA′AB′ + DA′AB′)TA(VAB + YAB)] (5.6.64)= Tr[(CA′AB′ + DA′AB′)TA(TrB[VAB + YAB])]. (5.6.65)

The second equality follows from the transpose trick from (2.2.30). Now, sinceCA′AB′ , DA′AB′ ≥ 0 (recall (5.6.48)), and VAB, YAB ≥ 0 (recall (5.6.54)), we canuse (2.2.106) to conclude that

Tr[(CA′AB′ + DA′AB′)TA(TrB[VAB + YAB])] (5.6.66)= |Tr[(CA′AB′ + DA′AB′)TA(TrB[VAB + YAB])]| (5.6.67)≤ ‖CA′AB′ + DA′AB′‖1 ‖TA(TrB[VAB + YAB])‖∞ (5.6.68)= Tr[CA′AB′ + DA′AB′ ] ‖TA(TrB[VAB + YAB])‖∞ (5.6.69)= Tr[CA′AB′ + DA′AB′ ] ‖TrB[VAB + YAB]‖∞ , (5.6.70)

where the final equality follows because the spectrum of an operator is in-variant under the action of a (full) transpose (note, in this case, that TA is afull transpose because the operator TrB[VAB + YAB] acts only on system A).We thus have

Wmax(A′; BB′)ω ≤ Tr[CA′AB′ + DA′AB′ ] ‖TA[TrB[VAB + YAB]]‖∞ (5.6.71)

Since this inequality holds for all CA′AB′ and DA′AB′ satisfying (5.6.49) and forall VAB and YAB satisfying (5.6.55), we conclude the inequality in (5.6.46).

Having shown that

Rmax(A′; BB′)ω ≤ Rmax(N) + Rmax(A′A; B′)ρ,=⇒ Rmax(A′; BB′)ω − Rmax(A′A; B′)ρ ≤ Rmax(N) (5.6.72)

for every state ρA′AB′ , it immediately follows from the definition of amortizedentanglement of a channel that

RAmax(N) ≤ Rmax(N). (5.6.73)

Thus, by Lemma 5.42, which proves the reverse inequality, we obtain

RAmax(N) = Rmax(N) (5.6.74)

for every quantum channel N.

With the equality RAmax(N) = Rmax(N) in hand, along with additivity of

max-Rains relative entropy for bipartite states, additivity of max-Rains infor-mation of a quantum channel immediately follows.

521

Page 533: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Corollary 5.57 Additivity of Max-Rains Information of a Channel

The max-Rains information of a quantum channel is additive: for everytwo quantum channels N and M,

Rmax(N⊗M) = Rmax(N) + Rmax(M). (5.6.75)

PROOF: The additivity of max-Rains relative entropy for bipartite states (seeProposition 5.29), along with Proposition 5.43, implies that the amortizedmax-Rains information of a quantum channel is additive, meaning that

RAmax(N⊗M) = RA

max(N) + RAmax(M) (5.6.76)

for all quantum channels N and M. Then, from (5.6.45), we obtain the desiredresult.

5.6.3 Squashed Entanglement

Finally, let us prove that the amortization collapse occurs for the squashedentanglement of a quantum channel.

Theorem 5.58 Amortization Collapse for Squashed Entanglement

The squashed entanglement of a channel does not increase under amor-tization, i.e.,

Esq(N) = EAsq(N) (5.6.77)

for every quantum channel N.

PROOF: The inequality Esq(N) ≤ EAsq(N) has already been shown in Lemma 5.42.

We thus prove the reverse inequality.

Let ρA′AB′ be an arbitrary state, and let ωA′BB′ = NA→B(ρA′AB′). LetUN

A→BE be an isometric channel extending NA→B, and let ϕA′AB′R be a purifi-cation of ρA′AB′ . Then, the state θA′BB′ER := UN

A→BE(ϕA′AB′R) is a purificationof ωA′BB′ . As we show in Lemma 5.60 below, the following inequality holds

Esq(A′; BB′)ω = Esq(A′; BB′)θ ≤ Esq(A′B′R; B)θ + Esq(A′BE; B′)θ. (5.6.78)

Now, squashed entanglement is invariant under the action of an isometricchannel, in particular UN

A→BE, so that Esq(A′BE; B′)θ = Esq(A′A; B′)ρ. Also,

522

Page 534: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

the state θA′BB′ER can be viewed as a particular state in the optimization overstates that defines Esq(N) because θA′BB′ER is a purification of ωA′BB′ . There-fore,

Esq(A′B′R; B)θ ≤ Esq(N). (5.6.79)

Altogether, we thus have

Esq(A′; BB′)ω ≤ Esq(N) + Esq(A′A; B′)ρ

=⇒ Esq(A′; BB′)ω − Esq(A′A; B′)ρ ≤ Esq(N) (5.6.80)

for every state ρA′AB′ , which means by definition of amortized entanglementthat

EAsq(N) ≤ Esq(N), (5.6.81)

which is what we sought to prove.

With the equality EAsq(N) = Esq(N) in hand, along with additivity of

squashed entanglement for bipartite states (Property 4. of Proposition 5.32),additivity of squashed entanglement for channels immediately follows.

Corollary 5.59 Additivity of Squashed Entanglement of a Channel

The squashed entanglement of a quantum channel is additive: for everytwo quantum channels N and M,

Esq(N⊗M) = Esq(N) + Esq(M). (5.6.82)

PROOF: The additivity of the squashed entanglement for bipartite states (seeProposition 5.32), along with Proposition 5.43, implies that the amortizedsquashed entanglement of a quantum channel is additive, meaning that

EAsq(N⊗M) = EA

sq(N) + EAsq(M) (5.6.83)

for all quantum channels N and M. Then, from (5.6.77), we obtain the desiredresult.

Lemma 5.60

Let ψKL1L2M1M2 be a pure state. Then

Esq(K; L1L2)ψ ≤ Esq(KL2M2; L1)ψ + Esq(KL1M1; L2)ψ. (5.6.84)

523

Page 535: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

PROOF: We make use of the squashing channel representation of squashedentanglement in (5.4.38), namely,

Esq(A; B)ρ =12

infSE′→E

I(A; B|E)ω : ωABE = SE′→E(ψρABE′), (5.6.85)

where ψρABE′ is a purification of ρ and the infimum is with respect to every

quantum channel SE′→E. Let us also recall that

I(A; B|E)ω = H(A|E)ω + H(B|E)ω − H(AB|E)ω (5.6.86)= H(B|E)ω − H(B|AE)ω, (5.6.87)

and that strong subadditivity (Theorem 4.6) is the statement that I(A; B|E)ω ≥0. From this we obtain the following two inequalitites:

H(AB|E)ω ≤ H(A|E)ω + H(B|E)ω, (5.6.88)H(B|AE)ω ≤ H(B|E)ω. (5.6.89)

Now, the given pure state ψKL1L2M1M2 can be thought of as a purification ofthe reduced state ψKL1L2 for which the squashed entanglement Esq(K; L1L2)ψ

is evaluated, with the purifying systems being M1 and M2. Then, consideringan arbitrary product squashing channel S1

M1→M′1⊗ S2

M2→M′2, and letting

ωKL1L2M′1M′2= (S1

M1→M′1⊗ S2

M2→M′2)(ψKL1L2M1M2), (5.6.90)

we find from (5.6.85) that

2Esq(K; L1L2)ψ ≤ I(K; L1L2|M′1M′2)ω. (5.6.91)

Expanding the quantum conditional mutual information using (5.6.87), wehave that

I(K; L1L2|M′1M′2)ω = H(L1L2|M′1M′2)ω − H(L1L2|M′1M′2K)ω. (5.6.92)

Now, let φKL1L2M′1M′2R be a purification of ωKL1L2M′1M′2with purifying sys-

tem R. Then, by definition of conditional entropy, and using the fact thatφKL1L2M′1M′2R is pure, we obtain4

H(L1L2|M′1M′2K)ω = H(L1L2M′1M′2K)ω − H(M′1M′2K)ω (5.6.93)

4The steps in (5.6.93)–(5.6.95) establish a general fact called duality of conditional entropy:for every pure state ψABE, the following equality holds H(A|E)ψ + H(B|E)ψ = 0.

524

Page 536: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

= H(R)φ − H(L1L2R)φ (5.6.94)= −H(L1L2|R)φ. (5.6.95)

Therefore,

2Esq(K; L1L2)ψ ≤ H(L1L2|M′1M′2)φ + H(L1L2|R)φ. (5.6.96)

Using the inequality in (5.6.88) followed by two applications of (5.6.89) (withappropriate identification of subsystems in all three cases), we obtain

H(L1L2|M′1M′2)φ ≤ H(L1|M′1M′2)φ + H(L2|M′1M′2)φ (5.6.97)≤ H(L1|M′1)φ + H(L2|M′2)φ. (5.6.98)

Next, using (5.6.88), we conclude that

H(L1L2|R)φ ≤ H(L1|R)φ + H(L2|R)φ. (5.6.99)

Therefore, continuing from (5.6.96), we have

2Esq(K; L1L2)ψ ≤ H(L1|M′1)φ + H(L2|M′2)φ + H(L1|R)φ + H(L2|R)φ.(5.6.100)

Now, applying reasoning analogous to that in (5.6.93)–(5.6.95) for the last twoterms, we find that

2Esq(K; L1L2)ψ ≤ H(L1|M′1)ω − H(L2|KL2M′1M′2)ω

+ H(L2|M′2)ω − H(L2|KL1M′1M′2)ω (5.6.101)= I(KL2M′2; L1|M′1)ω + I(KL1M′1; L2|M′2)ω, (5.6.102)

where in the last line we used the expression in (5.6.87) for the conditionalmutual information.

Now, let us regard ψKL1L2M1M2 as a purification of the reduced state ψKL2M2L1 ,for which the squashed entanglement Esq(KL2M2; L1)ψ is evaluated, with thepurifying system being M1. Then, the state

τKL2M2L1M′1:= S1

M1→M′1(ψKL2M1L1M1) (5.6.103)

is a particular extension of ψKL2M2L1 in the optimization for Esq(KL2M2; L1)ψ.Similarly, we can regard ψKL1L2M1M2 as a purification of the reduced stateψKL1M1L2 , for which the squashed entanglement Esq(KL1M1; L2)ψ is evalu-ated, with the purifying system being M2. Then, the state

σKL1M1L2M′2:= S2

M2→M′2(ψKL1M1L2M2) (5.6.104)

525

Page 537: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

is a particular extension of ψKL1M1L2 in the optimization for Esq(KL1M1; L2)ψ.Using all of this, we proceed from (5.6.102) to obtain

2Esq(K; L1L2)ψ ≤ I(KL2M′2; L1|M′1)ω + I(KL1M′1; L2|M′2)ω (5.6.105)≤ I(KL2M2; L1|M′1)τ + I(KL1M1; L2|M′2)σ, (5.6.106)

where for the second inequality we used the data-processing inequality forconditional mutual information (Proposition 4.7). Since the squashing chan-nels S1

M1→M′1and S2

M2→M′2are arbitrary, optimizing over all such channels on

the right-hand side of the inequality above leads us to

Esq(K; L1L2)ψ ≤ Esq(KL2M2; L1)ψ + Esq(KL1M1; L2)ψ, (5.6.107)

as required.

5.7 Summary

In this chapter, we studied entanglement measures for quantum states andquantum channels. The defining property of an entanglement measure forstates is monotonicity under local operations and classical communication(LOCC): a function E : HAB → R is an entanglement measure if E(ρAB) ≥E(L(ρAB)) for every bipartite state ρAB and every LOCC channel L. LOCCmonotonicity can be thought of as a special kind of data-processing inequal-ity, and it is a core concept in entanglement theory in the same way that thedata-processing inequality is the core concept behind generalized divergence.

An important type of state entanglement measure for our purposes inthis book is a divergence-based measure, in which the entanglement in agiven bipartite quantum state is quantified by its divergence with the setof separable states. As our divergence, we take a generalized divergenceD : D(H) × L+(H) → R, and we call the resulting quantity generalized di-vergence of entanglement. Due to the data-processing inequality (which holdsfor a generalized divergence by definition), we immediately obtain LOCCmonotonicity for the generalized divergence of entanglement, thus making itan entanglement measure. We also consider the divergence with the largerset of PPT′ operators that contains all separable states, and call the resultingquantity generalized Rains divergence.

We considered two types of channel entanglement measures. The firsttype quantifies the entanglement of a bipartite state after one share of it is

526

Page 538: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

sent through the given quantum channel, in a manner analogous to the chan-nel information measures defined in Chapter 4. The second type of chan-nel entanglement measure is called amortized entanglement, which essentiallyquantifies the difference in entanglement between a bipartite state and thestate obtained after sending one share of it through the given channel. Theconcept of amortized entanglement turns out to play an important role infeedback-assisted communciation scenarios (as considered in Part III), as itcan be used to prove important properties of entanglement measures of thefirst kind.

5.8 Bibliographic Notes

Entanglement theory has a long history, with one of the seminal papers beingthat of Bennett et al. (1996c) (see also Bennett et al. (1996b,a) for earlier works).Bennett et al. (1996c) introduced the resource theory of entanglement, withthe separable states as the free states and LOCC channels as the free channels,along with the related operational notions of distillable entanglement and en-tanglement cost. The review of Horodecki et al. (2009b) is a useful resourcefor gaining an understanding of notable accomplishments in the area. Seealso the review of Plenio and Virmani (2007), which focuses specifically onentanglement measures.

The axiomatic approach to defining an entanglement measure that wehave taken in this chapter was proposed by Bennett et al. (1996c); Vedralet al. (1997); Vedral and Plenio (1998); Vidal (2000), with LOCC monotonic-ity emerging as the defining property of an entanglement measure Horodeckiet al. (2009b). Vidal (2000) and Horodecki (2005) established simplified con-ditions for proving that a function is a selective LOCC monotone. Lemma 5.2is in this spirit.

Entanglement of formation was defined by Bennett et al. (1996c), and theyshowed that it is a selective LOCC monotone. They also showed that it is anupper bound on entanglement cost, and thus an upper bound on distillableentanglement (we did not define these concepts here, but we will define distil-lable entanglement in detail in Chapter 8). The uniform continuity bound forentanglement of formation in Proposition 5.4 was given by Winter (2016). Thefaithfulness bounds for entanglement of formation in Proposition 5.5 weregiven by Li and Winter (2018) (see also Nielsen (2000)). The non-additivityof entanglement of formation was established by Hastings (2009), which built

527

Page 539: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

upon an earlier result of Shor (2004) connecting various additivity conjecturesin quantum information theory. The formula in (5.1.93) for the entanglementof formation of two-qubit states was determined by Wootters (1998).

Negavitity and log-negativity were defined by Zyczkowski et al. (1998)(see also (Vidal and Werner, 2002) and (Plenio, 2005)). Vidal and Werner(2002) showed that log-negativity is monotone under LOCC. They also provedthat it is additive. See also (Plenio, 2005) for a proof of selective LOCC mono-tonicity of log-negativity, and for a proof of the fact that log-negativity is notconvex. The semi-definite programming approach, which we have taken hereto prove selective LOCC monotonicity of log-negativity, is based on Wangand Duan (2016a).

The relative entropy of entanglement was defined by Vedral et al. (1997);Vedral and Plenio (1998), who also proved many of its properties, such asLOCC monotonicity and convexity. They also proposed concepts closely re-lated to the generalized divergence of entanglement. The fact that trace dis-tance of entanglement is not a selective LOCC monotone was shown by Qiaoet al. (2018). Optimization over the set of separable states has been shownby Gurvits (2004); Gharibian (2010) to be NP-hard in general (see also (Ioan-nou, 2007; Shi and Wu, 2012)). A closed-form formula for the relative en-tropy of entanglement for two-qubit states was derived by Miranowicz andIshizaka (2008). The max-relative entropy of entanglement was defined byDatta (2009b), the hypothesis testing relative entropy of entanglement by Bran-dao and Datta (2011), and the sandwiched Rényi relative entropy of entangle-ment by Wilde et al. (2017). The fact that the relative entropy of entanglementis invariant under classical communication (Proposition 5.17) was stated byHorodecki (2005) and proven by Kaur and Wilde (2017). Our proof of se-lective separable monotonicity in Proposition 5.19, for the Rényi relative en-tropies, is based on the approach from Wang and Wilde (2019a). The proof ofProperty 1. of Proposition 5.20 follows the approach from Plenio et al. (2000),and the proof of Property 2. follows the approach of (Tomamichel et al., 2017,Proposition 10). Property 2. of Proposition 5.20 was first established by Vedraland Plenio (1998). The cone program formulations of max-relative entropy ofentanglement of a bipartite state, as well as max-relative entropy of entangle-ment of a quantum channel, were given by Berta and Wilde (2018).

The relative entropy with the set of PPT states was defined by Rains (1999a)in the context of entanglement distillation (see Chapter 8). Rains (2001) thenmodified the quantity to obtain a tighter bound on the distillable entangle-ment. After this development, Audenaert et al. (2002) defined the set PPT′

528

Page 540: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

and showed that this improved bound can be written as the relative entropywith the set of PPT′ operators, which we refer to here as the Rains relativeentropy. The fact that the set PPT’ is preserved by completely PPT preservingchannels (Property 1. of Lemma 5.14) was shown by Tomamichel et al. (2017).The sandwiched Rains relative entropy of a bipartite state was defined byTomamichel et al. (2017), as well as the generalized Rains divergence, both asgeneralizations of the Rains relative entropy of Rains (2001); Audenaert et al.(2002). The semi-definite program formulation of the max-Rains relative en-tropy of a bipartite state was given by Wang and Duan (2016a); Wang et al.(2019b), who also recognized that the semi-definite programming bound fromWang and Duan (2016a) is equal to the max-Rains relative entropy. Wang andDuan (2016a) proved that the max-Rains relative entropy is a selective PPTmonotone (Proposition 5.28) and that it is additive (Proposition 5.29). Mira-nowicz and Ishizaka (2008) proved that the Rains relative entropy is equal tothe relative entropy of entanglement for two-qubit states.

The squashed entanglement of a bipartite state was defined by Chris-tandl and Winter (2004), who established several of its key properties men-tioned in Propositions 5.32, 5.33, and 5.36, including non-negativity, vanish-ing on separable states, convexity, superadditivity in general, additivity fortensor-product states, LOCC monotonicity, reduction for pure states, and thesquashing channel representation in (5.4.38). The faithfulness of squashedentanglement was established by Brandao et al. (2011). A function related tosquashed entanglement was discussed by Tucci (1999, 2002). Our discussionsmotivating squashed entanglement are related to those presented by Tucci(1999, 2002). The representation of squashed entanglement in (5.4.39) is dueto Takeoka et al. (2014). Uniform continuity of the squashed entanglement ofa bipartite state was established by Alicki and Fannes (2004), and the explicitbound given here is due to Shirokov (2017).

The entanglement of a quantum channel was presented by Takeoka et al.(2014); Tomamichel et al. (2017), by employing the squashed entanglementand the Rains relative entropy entanglement measures, respectively. Theamortized entanglement of a quantum channel has its roots in early workby Bennett et al. (2003); Leifer et al. (2003), and it was formally defined and itsvarious properties established by Kaur and Wilde (2017). The work by Rigo-vacca et al. (2018) is related to the notion of amortized entanglement. Theconnection of amortized entanglement to teleportation simulation of quan-tum channels was elucidated by Kaur and Wilde (2017). A channel’s rela-tive entropy of entanglement was defined by Pirandola et al. (2017), max-relative entropy of entanglement by Christandl and Müller-Hermes (2017),

529

Page 541: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

and the hypothesis testing and sandwiched Rényi relative entropy of entan-glement by Wilde et al. (2017). Proposition 5.47 is based on (Tomamichel et al.,2017, Proposition 2) and (Wilde et al., 2017, Proposition 14). The general-ized Rains information of a quantum channel was defined by Tomamichelet al. (2017), who explicitly considered the Rains information and the sand-wiched Rényi Rains information as special cases. Tomamichel et al. (2016)focused on the hypothesis testing Rains information of a quantum channel.Wang et al. (2019b) observed that the quantity defined by Wang and Duan(2016b) is equal to the max-Rains information of a quantum channel. Thesemi-definite program formulation of the max-Rains information of a quan-tum channel in Proposition 5.51 is due to Wang and Duan (2016b); Wanget al. (2019b). Proposition 5.50 is due to Tomamichel et al. (2017). Concavityof a channel’s unoptimized squashed entanglement (Proposition 5.53) is dueto Takeoka et al. (2014). The amortization collapse of max-relative entropyof entanglement and of max-Rains information (Theorems 5.54 and 5.56, re-spectively) were shown by Berta and Wilde (2018), with the amortization col-lapse of max-relative entropy implicitly considered by Christandl and Müller-Hermes (2017). Lemma 5.60 is due to Takeoka et al. (2014), and the explicitobservation that Lemma 5.60 implies that amortization does not increase thesquashed entanglement of a quantum channel (Theorem 5.58) was realizedby Kaur and Wilde (2017). Corollary 5.59 was established by Takeoka et al.(2014).

Appendix 5.A Semi-Definite Programs for Negativ-ity

Here we prove that the quantity ‖TB(ρAB)‖1 has the following primal anddual SDP formulations:

‖TB(ρAB)‖1 = supRAB

Tr[RABρAB] : −1AB ≤ TB(RAB) ≤ 1AB , (5.A.1)

= infKAB,LAB≥0

Tr[KAB + LAB] : TB(KAB − LAB) = ρAB , (5.A.2)

where the optimization in the first line is with respect to Hermitian RAB. Sincethis is the core quantity underlying both the negativity and the log-negativity,it follows that these entanglement measures can be computed by means ofsemi-definite programs. To see the first equality, consider that

‖TB(ρAB)‖1 = ‖TB(ρAB)‖1 (5.A.3)

530

Page 542: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

= sup‖RAB‖∞≤1

Tr[RABTB(ρAB)] (5.A.4)

= sup‖RAB‖∞≤1

Tr[TB(RAB)ρAB] (5.A.5)

= sup‖TB(RAB)‖∞≤1

Tr[RABρAB] (5.A.6)

= supRAB

Tr[RABρAB] : −1AB ≤ TB(RAB) ≤ 1AB . (5.A.7)

The second equality follows from Hölder duality (see (2.2.106)), and sinceTB(ρAB) is Hermitian, it suffices to optimize over Hermitian RAB. The thirdequality follows because the partial transpose is its own Hilbert–Schmidt ad-joint. The fourth equality follows from the substitution RAB → TB(RAB). Thefinal equality follows because the inequality ‖TB(RAB)‖∞ ≤ 1 is equivalentto −1AB ≤ TB(RAB) ≤ 1AB for a Hermitian operator TB(RAB).

Now consider that the set of Hermitian operators is equivalent to the setof operators formed as differences of positive semi-definite operators. So thisimplies that

‖TB(ρAB)‖1 =

supPAB,QAB≥0

Tr[(PAB −QAB) ρAB] : −1AB ≤ TB(PAB −QAB) ≤ 1AB . (5.A.8)

Then by setting

X =

(PAB 0

0 QAB

), A =

(ρAB 0

0 −ρAB

), B =

(1AB 0

0 1AB

), (5.A.9)

Φ(X) =

(TB(PAB −QAB) 0

0 −TB(PAB −QAB)

), (5.A.10)

this primal SDP is now in the standard form of (2.4.1). Then, setting

Y =

(KAB 0

0 LAB

), (5.A.11)

we can calculate the Hilbert–Schmidt adjoint of Φ as

Tr[YΦ(X)]

= Tr[(

KAB 00 LAB

)(TB(PAB −QAB) 0

0 −TB(PAB −QAB)

)](5.A.12)

= Tr[KAB(TB(PAB −QAB))]− Tr[LAB(TB(PAB −QAB))] (5.A.13)

531

Page 543: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

= Tr[TB(KAB)(PAB −QAB))]− Tr[TB(LAB)(PAB −QAB)] (5.A.14)= Tr[TB(KAB − LAB)PAB] + Tr[TB(LAB − KAB)QAB] (5.A.15)

= Tr[(

TB(KAB − LAB) 00 −TB(KAB − LAB)

)(PAB 0

0 QAB

)], (5.A.16)

so that

Φ†(Y) =(

TB(KAB − LAB) 00 −TB(KAB − LAB)

). (5.A.17)

Then, plugging into the standard form for the dual SDP in (2.4.2) and simpli-fying a bit, we find that it is given by

infKAB,LAB≥0

Tr[KAB + LAB] : TB(KAB − LAB) ≥ ρAB,

−TB(KAB − LAB) ≥ −ρAB

= infKAB,LAB≥0

Tr[KAB + LAB] : TB(KAB − LAB) = ρAB . (5.A.18)

Strong duality holds according to Theorem 2.22. Indeed, setting KAB andLAB to the respective positive and negative parts of TB(ρAB) is feasible for thedual, while setting RAB = 1AB/2 is strictly feasible for the primal.

Appendix 5.B The α → 1 and α → ∞ Limits of theSandwiched Rényi EntanglementMeasures

In this section, we prove the α → 1 and α → ∞ limits of the sandwichedRényi state and channel entanglement measures that we have considered inthis chapter. Specifically, we consider the limits of

Eα(A; B)ρ = infσAB∈SEP(A:B)

Dα(ρAB‖σAB), (5.B.1)

Eα(N) = supψRA

infσRB∈SEP(R:B)

Dα(NA→B(ψRA)‖σRB), (5.B.2)

Rα(A; B)ρ = infσAB∈PPT′(A:B)

Dα(ρAB‖σAB), (5.B.3)

Rα(N) = supψRA

infσRB∈PPT′(R:B)

Dα(NA→B(ψRA)‖σRB), (5.B.4)

532

Page 544: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

where α ∈ [1/2, 1) ∪ (1, ∞).

We make consistent use throughout this section of the fact that the sand-wiched Rényi relative entropy Dα is monotonically increasing in α for all α ∈(0, 1) ∪ (1, ∞) (see Proposition 4.29), as well as the fact that limα→1 Dα = D,where D is the quantum relative entropy (see Proposition 4.28). We also usethe fact that limα→∞ Dα = Dmax (see Proposition 4.58).

5.B.0.1 α → 1 Limits

We start by showing that

limα→1

Eα(A; B)ρ = ER(A; B) = infσAB∈SEP(A:B)

D(ρAB‖σAB). (5.B.5)

The proof of

limα→1

Rα(A; B)ρ = R(A; B)ρ = infσAB∈PPT′(A:B)

D(ρAB‖σAB) (5.B.6)

is analogous, and so we omit it.

When approaching one from above, due to monotonicity in α of Dα (Propo-sition 4.29), we have that

limα→1+

Dα = infα∈(1,∞)

Dα = D. (5.B.7)

Therefore, we readily obtain

limα→1+

Eα(A; B)ρ = infα∈(1,∞)

infσAB∈SEP(A:B)

Dα(ρAB‖σAB) (5.B.8)

= infσAB∈SEP(A:B)

infα∈(1,∞)

Dα(ρAB‖σAB) (5.B.9)

= infσAB∈SEP(A:B)

D(ρAB‖σAB) (5.B.10)

= ER(A; B)ρ. (5.B.11)

Now, when approaching one from below, we have

limα→1−

Dα = supα∈(0,1)

Dα = D. (5.B.12)

533

Page 545: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Therefore,

limα→1−

Eα(A; B)ρ = supα∈(1/2,1)

infσAB∈SEP(A:B)

Dα(ρAB‖σAB). (5.B.13)

Now, we apply Theorem 2.19. Specifically, we can apply the theorem in orderto exchange the order of the infimum and supremum because the function

(σAB, α) 7→ Dα(ρAB‖σAB) (5.B.14)

is continuous in the first argument and montonically increasing in the secondargument (also, the set of separable states is compact). We thus obtain

limα→1−

Eα(A; B)ρ = supα∈(1/2,1)

infσAB∈SEP(A:B)

Dα(ρAB‖σAB) (5.B.15)

= infσAB∈SEP(A:B)

supα∈(1/2,1)

Dα(ρAB‖σAB) (5.B.16)

= infσAB∈SEP(A:B)

D(ρAB‖σAB) (5.B.17)

= ER(A; B)ρ. (5.B.18)

This concludes the proof of (5.B.5).

Now, for the channel measure, we show that

limα→1

Eα(N) = ER(N) = supψRA

infσRB∈SEP(R:B)

D(NA→B(ψRA)‖σRB). (5.B.19)

The proof of

limα→1

Rα(N) = R(N) = supψRA

infσRB∈PPT′(R:B)

D(NA→B(ψRA)‖σRB) (5.B.20)

is analogous, and so we omit it.

When approaching one from below, we use exactly the same argumentsas above to exchange the infimum and supremum, in order to conclude that

limα→1−

Eα(N) = supα∈(1/2,1)

supψRA

infσRB∈SEP(R:B)

Dα(NA→B(ψRA)‖σRB) (5.B.21)

= supψRA

supα∈(1/2,1)

infσRB∈SEP(R:B)

Dα(NA→B(ψRA)‖σRB) (5.B.22)

= supψRA

infσRB∈SEP(R:B)

supα∈(1/2,1)

Dα(NA→B(ψRA)‖σRB) (5.B.23)

534

Page 546: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

= supψRA

infσRB∈SEP(R:B)

D(NA→B(ψRA)‖σRB) (5.B.24)

= ER(N). (5.B.25)

Next, when approaching from above, we again use Theorem 2.19. This time,since the function

(ψRA, α) 7→ infσRB∈SEP(R:B)

Dα(NA→B(ψRA)‖σRB) (5.B.26)

is continuous in the first argument and monotonically increasing in the sec-ond argument, we can exchange the order of the infimum and supremum toobtain

limα→1+

Eα(N) = infα∈(1,∞)

supψRA

infσRB∈SEP(R:B)

Dα(NA→B(ψRA)‖σRB) (5.B.27)

= supψRA

infα∈(1,∞)

infσRB∈SEP(R:B)

Dα(NA→B(ψRA)‖σRB) (5.B.28)

= supψRA

infσRB∈SEP(R:B)

infα∈(1,∞)

Dα(NA→B(ψRA)‖σRB) (5.B.29)

= supψRA

infσRB∈SEP(R:B)

D(NA→B(ψRA)‖σRB) (5.B.30)

= ER(N). (5.B.31)

This concludes the proof of (5.B.19).

5.B.0.2 α → ∞ Limits

We now move on to the α→ ∞ limits. We first show that

limα→∞

Eα(A; B)ρ = Emax(A; B) = infσAB∈SEP(A:B)

Dmax(ρAB‖σAB). (5.B.32)

The proof of

limα→∞

Rα(A; B)ρ = Rmax(A; B)ρ = infσAB∈PPT′(A:B)

Dmax(ρAB‖σAB) (5.B.33)

is analogous, and so we omit it.

Note that due to monotonicity in α of Dα, the following equality holds

limα→∞

Dα = supα∈(1,∞)

Dα = Dmax. (5.B.34)

535

Page 547: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

Therefore,

limα→∞

Eα(A; B)ρ = supα∈(1,∞)

infσAB∈SEP(A:B)

Dα(ρAB‖σAB). (5.B.35)

Now, since the function

(σAB, α) 7→ Dα(ρAB‖σAB) (5.B.36)

is continuous in the first argument, monotonically increasing in the secondargument, and because the set of separable states is compact, we can use The-orem 2.19 to change the order of the supremum and infimum to obtain

limα→∞

Eα(A; B)ρ = supα∈(1,∞)

infσAB∈SEP(A:B)

Dα(ρAB‖σAB) (5.B.37)

= infσAB∈SEP(A:B)

supα∈(1,∞)

Dα(ρAB‖σAB) (5.B.38)

= infσAB∈SEP(A:B)

Dmax(ρAB‖σAB) (5.B.39)

= Emax(A; B)ρ. (5.B.40)

This completes the proof of (5.B.32).

Finally, we prove that

limα→∞

Eα(N) = Emax(N) = supψRA

infσRB∈SEP(R:B)

Dmax(NA→B(ψRA)‖σRB). (5.B.41)

The proof of

limα→∞

Rα(N) = Rmax(N) = supψRA

infσRB∈PPT′(R:B)

Dmax(NA→B(ψRA)‖σRB)

(5.B.42)is analogous, and so we omit it.

Using exactly the same argument as in the proof of (5.B.32) above, weobtain

limα→∞

Eα(N) = supα∈(1,∞)

supψRA

infσRB∈SEP(R:B)

Dα(NA→B(ψRA)‖σRB) (5.B.43)

= supψRA

supα∈(1,∞)

infσRB∈SEP(R:B)

Dα(NA→B(ψRA)‖σRB) (5.B.44)

= supψRA

infσRB∈SEP(R:B)

supα∈(1,∞)

Dα(NA→B(ψRA)‖σRB) (5.B.45)

536

Page 548: Principles of Quantum Communication Theory: A Modern Approach

Chapter 5: Entanglement Measures

= supψRA

infσRB∈SEP(R:B)

Dmax(NA→B(ψRA)‖σRB) (5.B.46)

= Emax(N), (5.B.47)

where in the third line we used Theorem 2.19 in order to exchange the infi-mum and supremum based on exactly the same arguments used in the proofof (5.B.32) above. This concludes the proof of (5.B.41).

537

Page 549: Principles of Quantum Communication Theory: A Modern Approach

Part II

Quantum CommunicationProtocols

We now begin our study of quantum communication protocols. Inthis part of the book, we focus on point-to-point communicationprotocols, most of which do not make use of feedback between thesender and receiver. The settings include classical communication,entanglement-assisted classical communication, entanglement distil-lation, quantum communication, secret key distillation, and privatecommunication. These point-to-point protocols are the most basiccommunication models in quantum information, and in many cases,they are relevant from a practical perspective. Furthermore, theseprotocols serve as benchmarks for the usefulness of the feedback-assisted protocols that we consider in Part III, in the sense that theoptimal communication rates of any feedback-assisted protocol shouldnot be smaller than the corresponding point-to-point protocol, in or-der for the feedback-assisted protocol to be deemed useful or advan-tageous.

Page 550: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6

Entanglement-AssistedClassical Communication

The first communication task that we consider is entanglement-assisted clas-sical communication. In this scenario, Alice and Bob are allowed to share anunlimited amount of entanglement prior to communication, and the goal isfor Alice to transmit the maximum possible amount of classical informationover a given channel N, by using this prior shared entanglement as a resource.We consider this particular setting before all other communication settingsbecause, perhaps unexpectedly, the main information-theoretic results in thissetting are much simpler than those in all other communication settings thatwe consider in this book.

Entanglement is a uniquely quantum phenomenon, and it is natural toask, when communicating over quantum channels, whether it can be usedto provide an advantage for sending classical information. The super-densecoding protocol, described in Section 3.4, is an example of such an advantage.Recall that in this protocol, Alice and Bob share a pair of quantum systems inthe maximally entangled state |Φ+〉 = 1√

2(|0, 0〉 + |1, 1〉), and they are con-

nected by a noiseless qubit channel. With this shared entanglement, alongwith only one use of the channel, Alice can communicate two bits of classi-cal information to Bob. In the case of qudits, using the maximally entangledstate 1√

d ∑d−1i=0 |i, i〉, Alice can communication 2 log2 d bits to Bob with only one

use of a noiseless qudit quantum channel. Does this kind of advantage existin general? Specifically, supposing that we allow Alice and Bob unlimitedshared entanglement, what is the maximum amount of classical information

539

Page 551: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

that can be communicated over a given quantum channel N?

The answer to this question is provided by Theorem 6.16, which tells usthat the entanglement-assisted classical capacity of a channel N is equal to themutual information I(N) of the channel (see (4.11.102)). The strength of thisresult is that it holds for all channels. Entanglement-assisted classical com-munication is one of the few scenarios in which such a profoundly simplestatement—applying to all channels—can be made. Furthermore, the fact thatthe mutual information I(N) is the optimal rate for entanglement-assistedclassical communication for all channels N makes this communication sce-nario formally analogous to communication over classical channels. Indeed,the famous result of Shannon from 1948 is that the capacity of a classical chan-nel described by a conditional probability distribution pY|X(y|x) with inputand output random variables X and Y, respectively, is equal to maxpX I(X; Y),where I(X; Y) is the mutual information between the random variables X andY and the optimization is performed over all probability distributions pX cor-responding to the input X. Entanglement-assisted classical communicationcan thus be viewed as a “natural” analogue of classical communication in thequantum setting.

6.1 One-Shot Setting

We begin by considering the one-shot setting for entanglement-assisted clas-sical communication over N, with such a protocol depicted in Figure 6.1. Wecall this the “one-shot setting” because the channel N is used only once. Thisis in contrast to the “asymptotic setting” that we consider in the next section,in which the channel may be used an arbitrarily large number of times.

The protocol depicted in Figure 6.1 is defined by the four elements (M,ΨA′B′ ,EM′A′→A,DBB′→M), in which M is a message set, ΨA′B′ is an entangledstate shared by Alice and Bob, EM′A′→A is an encoding channel, and DBB′→Mis a decoding channel. The triple (Ψ,E,D), consisting of the resource state andencoding and decoding channels, is called an entanglement-assisted code or,more simply, a code. In what follows, we employ the abbreviation

P ≡ (Ψ,E,D), (6.1.1)

where the notation P indicates protocol.

Given that there are |M| messages in the message set, it holds that eachmessage can be uniquely associated with a bit string of size at least log2 |M|.

540

Page 552: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

E NM 3 m A

ΨA′B′AliceBob

A′

B′

B

m

FIGURE 6.1: Depiction of a protocol for entanglement-assisted classicalcommunication over one use of the quantum channel N. Alice and Bobinitially share a pair of quantum systems in the state ΨA′B′ . Alice, whowishes to send a message m selected from a set M of messages, first en-codes the message into a quantum state on a quantum system A by usingan encoding channel E. She then sends the quantum system A throughthe channel N. After Bob receives the system B, he performs a measure-ment on both of his systems BB′, using the outcome of the measurementto give an estimate m of the message sent by Alice.

The quantity log2 |M| thus represents the number of bits that are communi-cated in the protocol. One of the goals of this section is to obtain upper andlower bounds, in terms of information measures for channels, on the maxi-mum number log2 |M| of bits that can be communicated in an entanglement-assisted classical communication protocol.

The protocol proceeds as follows: let p : M → [0, 1] be a probability dis-tribution over the message set. Alice starts by preparing two systems M andM′ in the following classically correlated state:

ΦpMM′ := ∑

m∈Mp(m)|m〉〈m|M ⊗ |m〉〈m|M′ . (6.1.2)

Note that if Alice wishes to send a particular message m deterministically,then she can choose the distribution p to be the degenerate distribution, equalto one for m and zero for all other messages. Alice and Bob share the stateΨA′B′ before communication begins, so that the global state shared betweenthem is

ΦpMM′ ⊗ΨA′B′ . (6.1.3)

Alice then sends the M′ and A′ registers through the encoding channelEM′A′→A. Due to the fact that the system M′ is classical, this encoding channelrealizes a set Em

A′→Am∈M of quantum channels as follows:

EmA′→A(τA′) := EM′A′→A(|m〉〈m|M′ ⊗ τA′) (6.1.4)

541

Page 553: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

for every state τA′ . The global state after the encoding channel is therefore

EM′A′→A(ΦpMM′ ⊗ΨA′B′) = ∑

m∈Mp(m)|m〉〈m|M ⊗ Em

A′→A(ΨA′B′). (6.1.5)

Alice then transmits the A system through the channel NA→B, leading tothe state

(NA→B EM′A′→A)(ΦpMM′ ⊗ΨA′B′)

= ∑m∈M

p(m)|m〉〈m|M ⊗ (NA→B EmA′→A)(ΨA′B′).

= ∑m∈M

p(m)|m〉〈m|M ⊗ τmBB′ ,

(6.1.6)

whereτm

BB′ := (NA→B EmA′→A)(ΨA′B′) ∀ m ∈M. (6.1.7)

Bob, whose task is to determine which message Alice sent, applies a de-coding channel DBB′→M on his system B′ and the system B received throughthe channel N. The decoding channel is a quantum-classical channel (Defini-tion 3.28) associated with a POVM Λm

BB′m∈M, so that

DBB′→M(τmBB′) := ∑

m∈MTr[Λm

BB′τmBB′ ]|m〉〈m|M, (6.1.8)

for all m ∈M. The global state in (6.1.6) thus becomes

ωpMM

:= (DBB′→M NA→B EM′A′→A)(ΦpMM′ ⊗ΨA′B′) (6.1.9)

= ∑m,m∈M

p(m)|m〉〈m|M ⊗ Tr[ΛmBB′NA→B(E

mA′→A(ΨA′B′))]|m〉〈m|M.

(6.1.10)

The final decoding measurement by Bob induces the conditional proba-bility distribution q : M×M→ [0, 1] defined by

q(m|m) := Pr[M = m|M = m] (6.1.11)

= Tr[ΛmBB′NA→B(E

mA′→A(ΨA′B′))]. (6.1.12)

Bob’s strategy is such that if the outcome m occurs from his measurement,then he declares that the message sent was m.

542

Page 554: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

The probability that Bob correctly identifies a given message m is thengiven by q(m|m). The message error probability of the code P ≡ (Ψ,E,D) andmessage m is then given by

perr(m,P;N) := 1− q(m|m)

= Tr[(1BB′ −ΛmBB′)NA→B(E

mA′→A(ΨA′B′))]

= ∑m∈M\m

q(m|m).(6.1.13)

The average error probability of the code is

perr(P; p,N) := ∑m∈M

p(m)perr(m;P)

= ∑m∈M

p(m)(1− q(m|m))

= ∑m∈M

∑m∈M\m

p(m)q(m|m).

(6.1.14)

The maximal error probability of the code is

p∗err(P;N) := maxm∈M

perr(m,P;N). (6.1.15)

Each of these three error probabilities can be used to assess the reliability of theprotocol, i.e., how well the encoding and decoding allow Alice to transmit hermessage to Bob.

Definition 6.1 (|M|, ε) Entanglement-Assisted Classical Communi-cation Protocol

Let (M, ΨA′B′ ,EM′A′→A,DBB′→M) be the elements of an entanglement-assisted classical communication protocol over the channel NA→B. Theprotocol is called an (|M|, ε) protocol, with ε ∈ [0, 1], if p∗err(P;N) ≤ ε.

Lemma 6.2

The following equalities hold

perr(P; p,N) =12

∥∥∥ΦpMM′ −ω

pMM

∥∥∥1

, (6.1.16)

p∗err(P;N) = maxp:M→[0,1]

12

∥∥∥ΦpMM′ −ω

pMM

∥∥∥1

, (6.1.17)

543

Page 555: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

where ΦpMM′ and ω

pMM

are defined in (6.1.2) and (6.1.9), respec-tively. Thus, the error criterion p∗err(P;N) ≤ ε is equivalent tomaxp:M→[0,1]

12

∥∥∥ΦpMM′ −ω

pMM

∥∥∥1≤ ε.

REMARK: The final criterion above states that the normalized trace distance betweenthe initial and final states of the protocol, maximized over all possible prior probabilitydistributions, does not exceed ε.

PROOF: To see this, let us first note that the normalized trace distance in(6.1.17) is equal to the average error probability of the code. Indeed,

12

∥∥∥ΦpMM′ −ω

pMM

∥∥∥1

=12

∥∥∥∥∥ ∑m∈M

p(m)|m, m〉〈m, m|MM′

− ∑m,m∈M

p(m)q(m|m)|m, m〉〈m, m|MM

∥∥∥∥∥1

(6.1.18)

=12

∥∥∥∥∥ ∑m∈M

p(m)|m〉〈m|M

⊗(|m〉〈m|M′ − ∑

m∈Mq(m|m)|m〉〈m|M

)∥∥∥∥∥1

(6.1.19)

=12 ∑

m∈Mp(m)

∥∥∥∥∥|m〉〈m|M′ − ∑m∈M

q(m|m)|m〉〈m|M

∥∥∥∥∥1

(6.1.20)

=12 ∑

m∈Mp(m) ‖(1− q(m|m))|m〉〈m|

− ∑m∈M\m

q(m|m)|m〉〈m|M

∥∥∥∥∥∥1

(6.1.21)

=12 ∑

m∈Mp(m)

(1− q(m|m)) + ∑

m∈M\mq(m|m)

(6.1.22)

=12 ∑

m∈Mp(m)(1− q(m|m))

︸ ︷︷ ︸perr(P;p,N)

+12 ∑

m∈M∑

m∈M\mp(m)q(m|m)

︸ ︷︷ ︸perr(P;p,N)

(6.1.23)

544

Page 556: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

= perr(P; p,N), (6.1.24)

where to obtain the third and fifth equality we used (2.2.105) with α = 1.Then, if m∗ ∈M is the message attaining the maximum error probability p∗err,let p : M → [0, 1] be the probability distribution such that p(m∗) = 1 andp(m) = 0 for all m 6= m∗. Using this probability distribution, we obtain

p∗err(P;N) = maxm∈M

perr(m,P;N) (6.1.25)

= ∑m∈M

p(m)perr(m,P;N) (6.1.26)

= perr(P; p,N) (6.1.27)

=12

∥∥∥Φ pMM′ −ω

pMM

∥∥∥1

(6.1.28)

≤ maxp:M→[0,1]

12

∥∥∥ΦpMM′ −ω

pMM

∥∥∥1

. (6.1.29)

Furthermore, letting p∗ be the distribution attaining the maximum averageerror probability, we find that

maxp:M→[0,1]

12

∥∥∥ΦpMM′ −ω

pMM

∥∥∥1= perr(P; p∗,N) (6.1.30)

= ∑m∈M

p∗(m)perr(m,P;N) (6.1.31)

≤ ∑m∈M

p∗(m)p∗err(P;N) (6.1.32)

= p∗err(P;N), (6.1.33)

where to obtain the inequality we used the fact that

perr(m,P;N) ≤ maxm∈M

perr(m,P;N) = p∗err(P;N) (6.1.34)

for all m ∈M. So we find that

p∗err(P;N) = maxp:M→[0,1]

perr(P; p) (6.1.35)

= maxp:M→[0,1]

12

∥∥∥ΦpMM′ −ω

pMM

∥∥∥1

. (6.1.36)

This concludes the proof.

545

Page 557: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Another way to define the error criterion of an (|M|, ε) protocol, whichis equivalent to the average error probability, is through what is called thecomparator test. The comparator test is a measurement defined by the two-element POVM ΠMM, 1−ΠMM, where ΠMM is the projection defined as

ΠMM := ∑m∈M|m〉〈m|M ⊗ |m〉〈m|M. (6.1.37)

Note that Tr[ΠMMωpMM

] is equal to the probability that the classical registers

M and M in the state ωMM have the same values. In particular, observe that

Tr[ΠMMω

pMM

]

= Tr

[(∑

m∈M|m, m〉〈m, m|MM

)

×(

∑m′,m∈M

p(m′)q(m|m′)|m′, m〉〈m′, m|MM

)](6.1.38)

= ∑m,m′,m∈M

p(m′)q(m|m′)δm,m′δm,m (6.1.39)

= ∑m∈M

p(m)q(m|m) (6.1.40)

= 1− perr(P; p,N). (6.1.41)

We can interpret this expression as the average success probability of the codeP ≡ (Ψ,E,D), which we denote by

psucc(P; p,N) := 1− perr(P; p,N). (6.1.42)

As mentioned at the beginning of this chapter, our goal is to bound (fromabove and below) the maximum number log2 |M| of transmitted bits in anyentanglement-assisted classical communication protocol over N. Given anerror probability tolerance of ε, we call the maximum bits of transmitted bitsthe one-shot entanglement-assisted classical capacity of N.

Definition 6.3 One-Shot Entanglement-Assisted Classical Capacity

Given a quantum channel NA→B and ε ∈ [0, 1], the one-shot ε-errorentanglement-assisted classical capacity of N, denoted by Cε

EA(N), is de-fined to be the maximum number log2 |M| of transmitted bits among all

546

Page 558: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

(|M|, ε) entanglement-assisted classical communication protocols overN. In other words,

CεEA(N) := sup

(M,Ψ,E,D)

log2 |M| : p∗err((Ψ,E,D);N) ≤ ε, (6.1.43)

where the optimization is with respect to all protocols(M, ΨA′B′ ,EM′A′→A,DBB′→M) such that dM′ = dM = |M|.

In addition to finding, for a given ε ∈ [0, 1], the maximum number oftransmitted bits among all (|M|, ε) classical communication protocols overNA→B, we can consider the following complementary problem: for a givennumber of messages |M|, find the smallest possible error among all (|M|, ε)entanglement-assisted classical communication protocols, which we denoteby ε∗EA(|M|;N). In other words, the problem is to determine

ε∗EA(|M|;N) := inf(Ψ,E,D)

p∗err((Ψ,E,D);N) : dM′ = dM = |M|, (6.1.44)

where the optimization is over every state ΨA′B′ , encoding channel EM′A′→A,and decoding channel DBB′→M, such that dM′ = dM = |M|. In this chapter,we focus primarily on the problem of optimizing the number of transmittedbits rather than the error, and so our primary quantity of interest is the one-shot capacity Cε

EA(N).

6.1.1 Protocol Over a Useless Channel

Our first goal is to obtain an upper bound on the one-shot entanglement-assisted classical capacity defined in (6.1.43). To do so, along with the entanglement-assisted classical communication protocol over the actual channel N describedabove, we also consider the same protocol but over the useless channel de-picted in Figure 6.2. This useless channel discards the quantum state encodedwith the message and replaces it with some arbitrary (but fixed) state σB. Thisreplacement channel is useless for communication because the state σB doesnot contain any information about the message m. Intuitively, we can say thatsuch a channel corresponds to “cutting the communication line.” As we showin Lemma 6.4, comparing this protocol over the useless channel with the ac-tual protocol allows us to obtain an upper bound on the quantity log2 |M|,which we recall represents the number of bits that are transmitted over thechannel.

547

Page 559: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

E PσBM 3 m A

ΨA′B′AliceBob

A′

B′

B

m

FIGURE 6.2: Depiction of a protocol that is useless for entanglement-assisted classical communication. The state encoding the message m via E

is discarded and replaced by an arbitrary (but fixed) state σB.

The definition of the useless channel implies that, for every message m ∈M,

EmA′→A(ΨA′B′) 7→ PσB TrA Em

A′→A(ΨA′B′) = σB ⊗ΨB′ , (6.1.45)

where ΨB′ := TrA′ [ΨA′B′ ]. Making use of the definition of the replacementchannel RσB

A→B in Definition 3.23, we can write this as

EmA′→A(ΨA′B′) 7→ R

σBA→B(EA′→A(ΨA′B′)). (6.1.46)

The state at the end of the protocol over the useless channel is then

τpMM

:= ∑m,m∈M

p(m)Tr[ΛmBB′(σB ⊗ΨB′)]|m〉〈m|M ⊗ |m〉〈m|M

= πpM ⊗ ∑

m∈MTr[Λm

BB′(σB ⊗ΨB′)]|m〉〈m|M,(6.1.47)

whereπ

pM := ∑

m∈Mp(m)|m〉〈m|M. (6.1.48)

Now, recall from (6.1.10) that the state ωpMM

at the end of the actual protocolover the channel N is given by

ωpMM

= ∑m,m∈M

p(m)|m〉〈m|M ⊗ Tr[ΛmBB′NA→B(E

mA′→A(ΨA′B′))]|m〉〈m|M.

(6.1.49)It is helpful in what follows to let

ΦMM′ :=1|M| ∑

m∈M|m〉〈m|M ⊗ |m〉〈m|M′ (6.1.50)

548

Page 560: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

be the state in (6.1.2) in which the probability distribution p is the uniformdistribution over M.

We now state a lemma that is helpful for placing an upper bound on thenumber log2 |M| of bits communicated in an entanglement-assisted classicalcommunication protocol. This lemma can also be used for the same pur-pose for unassisted classical communication protocols, as discussed in thenext chapter.

Lemma 6.4

Let ΦMM′ be the state defined in (6.1.50), and let ωMM′ be a state on thetwo classical registers M and M′ such that ωM = TrM′ [ωMM′ ] = πM =1M|M| . If the probability Tr[ΠMM′ωMM′ ] that the state ωMM′ passes thecomparator test defined by the POVM ΠMM, 1−ΠMM, where ΠMMis the projection defined in (6.1.37), satisfies

Tr[ΠMM′ωMM′ ] ≥ 1− ε, (6.1.51)

for some ε ∈ [0, 1], then

log2 |M| ≤ IεH(M; M′)ω, (6.1.52)

where the ε-hypothesis testing mutual information IεH(M; M′)ω is de-

fined in (4.11.88).

PROOF: By assumption, we have that

Tr[ΠMM′ωMM′ ] ≥ 1− ε, (6.1.53)

Now, consider a state τMM′ of the form τMM′ = ωM⊗ σM′ = πM⊗ σM′ , whereσM′ is some state.1 Then,

Tr[ΠMM′τMM′ ] = Tr[ΠMM′(πM ⊗ σM′)] (6.1.54)

=1|M|Tr[ΠMM′(1M ⊗ σM′)] (6.1.55)

=1|M|Tr[TrM[ΠMM′ ]σM′ ] (6.1.56)

1Note that the state in (6.1.47) at the end of the entanglement-assisted classical commu-nication protocol over the useless channel has precisely this form (when p is taken to be theuniform distribution over the message set M).

549

Page 561: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

=1|M| , (6.1.57)

We thus obtain

log2 |M| = − log2 Tr[ΠMM′τMM′ ] (6.1.58)≤ Dε

H(ωMM′‖τMM′) (6.1.59)= Dε

H(ωMM′‖ωM ⊗ σM′), (6.1.60)

where the inequality follows from the definition of the hypothesis testing rel-ative entropy in (4.9.1) (i.e., ΠMM′ is a particular measurement operator sat-isfying (6.1.53), but Dε

H(ωMM′‖τMM′) involves an optimization over all suchoperators). Since the state σM′ is arbitrary, we conclude that

log2 |M| ≤ infσM′

DεH(ωMM′‖ωM ⊗ σM′) = Iε

H(M; M′)ω, (6.1.61)

which is (6.1.52), as required.

The right-hand side of (6.1.52) is an upper bound on the number log2 |M|of bits communicated using an (|M|, ε) entanglement-assisted classical com-munication protocol over the channel N. Indeed, since the error criterionp∗err(P) ≤ ε holds by definition of an (|M|, ε) protocol, using (6.1.36) and(6.1.24) we obtain

perr(P; p,N) ≤ maxp:M→[0,1]

perr(P; p,N) (6.1.62)

= maxp:M→[0,1]

12

∥∥∥ΦpMM′ −ω

pMM

∥∥∥1

(6.1.63)

= p∗err(P;N) (6.1.64)≤ ε (6.1.65)

for every probability distribution p on M. In particular, the inequality aboveholds with p being the uniform distribution on M, so that p(m) = 1

|M| for allm ∈M. Let us define the state

ωMM :=1|M| ∑

m,m∈MTr[Λm

BB′NA→B(EmA′→A(ΨA′B′))]|m, m〉〈m, m|MM, (6.1.66)

which is the state ωpMM

with p being the uniform distribution over M. Ob-serve that

TrM[ωMM] =1|M| ∑

m∈MTr

[(∑

m∈MΛm

BB′

)NA→B(E

mA′→A(ΨA′B′))

]|m〉〈m|M

(6.1.67)

550

Page 562: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

=1|M| ∑

m∈MTr[NA→B(E

mA′→A(ΨA′B′))]|m〉〈m|M (6.1.68)

= πM, (6.1.69)

where the last equality follows because the channels NA→B and EmA′→A are

trace preserving. Finally, since the probability of passing the comparator testis given by (6.1.41), i.e.,

Tr[ΠMMω

pMM

]= 1− perr(P; p,N) (6.1.70)

for every probability distribution p, we find that Tr[ΠMMωMM] ≥ 1− ε. Thestate ωMM thus satisfies the condition of Lemma 6.4. We conclude that

log2 |M| ≤ IεH(M; M)ω (6.1.71)

for every (|M|, ε) entanglement-assisted classical communication protocol.

Recall from Section 4.9 that the hypothesis testing relative entropy hasan operational meaning as the optimal type-II error exponent in asymmet-ric hypothesis testing. The quantity Dε

H(ωMM‖ωM ⊗ σM) thus represents theoptimal type-II error exponent, subject to the upper bound of ε on the type-Ierror exponent, for distinguishing between the state resulting from the ac-tual entanglement-assisted classical communication protocol over N and thestate resulting from an entanglement-assisted classical communication proto-col over a useless channel, which discards the state encoded with the messageand replaces it with the state σM. By taking an infimum over all states σM, thequantity Iε

H(M; M)ω represents the smallest possible minimum type-II errorexponent. The bound in (6.1.71) thus establishes a close link between the tasksof reliable communication and hypothesis testing.

Given a particular choice of the encoding and decoding channels, as wellas a particular choice of the shared state ΨA′B′ , if p∗err(P;N) ≤ ε, then thequantity Iε

H(M; M)ω in (6.1.71) is an upper bound on the maximum num-ber of bits that can be transmitted over the channel N. The optimal value ofthis upper bound can be realized by finding the state σM defining the uselesschannel that optimizes the quantity Iε

H(M; M)ω in addition to the measure-ment that achieves the ε-hypothesis testing relative entropy. Importantly, adifferent choice of encoding, decoding, and of the state ΨA′B′ produces a dif-ferent value for this upper bound. We would thus like to find an upper boundthat applies regardless of which specific protocol is chosen. In other words,we would like an upper bound that is a function of the channel N only, andthis is the topic of the next section.

551

Page 563: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

6.1.2 Upper Bound on the Number of Transmitted Bits

We now give a general upper bound on the number of transmitted bits possi-ble for an arbitrary one-shot entanglement-assisted classical communicationprotocol for a channel N. This result is stated in Theorem 6.6. The upperbound obtained therein holds independently of the encoding and decodingchannels used in the protocol and depends only on the given communicationchannel N.

Let us start with an arbitrary (|M|, ε) entanglement-assisted classical com-munication protocol over N corresponding to, as described at the beginningof this chapter, a message set M, a prior shared entangled state ΨA′B′ , an en-coding channel E, and a decoding channel D. The error criterion p∗err(P;N) ≤ε holds by the definition of an (|M|, ε) protocol. Then, by the argumentsat the end of the previous section, Lemma 6.4 implies that the inequalitylog2 |M| ≤ Iε

H(M; M)ω holds. Using this bound on the number log2 |M| oftransmitted bits, we obtain the following result:

Proposition 6.5 Upper Bound on One-Shot Entanglement-AssistedClassical Capacity

Let NA→B be a quantum channel. For every (|M|, ε) entanglement-assisted classical communication protocol over NA→B, with ε ∈ [0, 1],the number of bits transmitted over N is bounded from above by theε-hypothesis testing mutual information of N, defined in (4.11.87), i.e.,

log2 |M| ≤ IεH(N). (6.1.72)

Consequently, for the one-shot entanglement-assisted classical capacity,we have

CεEA(N) ≤ Iε

H(N) (6.1.73)

for all ε ∈ [0, 1].

PROOF: First, let us apply Lemma 6.4 to conclude that

log2 |M| ≤ IεH(M; M)ω, (6.1.74)

where ωMM is defined in (6.1.66). Using the data-processing inequality forthe hypothesis testing mutual information under the action of the decoding

552

Page 564: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

channel DBB′→M (from Proposition 4.17), we find that

IεH(M; M)ω ≤ Iε

H(M; BB′)θ, (6.1.75)

where the state θMBB′ is the same as that in (6.1.6):

θMBB′ := (NA→B EM′A′→A)(ΦMM′ ⊗ΨA′B′). (6.1.76)

Observe that the reduced state θMB′ is a product state because the channelNA→B and encoding EM′A′→A are trace preserving:

θMB′ = TrB[(NA→B EM′A′→A)(ΦMM′ ⊗ΨA′B′)] (6.1.77)

= TrM′A′ [ΦMM′ ⊗ΨA′B′ ] (6.1.78)

= ΦM ⊗ΨB′ (6.1.79)= θM ⊗ θB′ . (6.1.80)

Now, by definition, we have that

IεH(M; BB′)θ = inf

σBB′Dε

H(θMBB′‖θM ⊗ σBB′) (6.1.81)

Choosing σBB′ to be σB ⊗ θB′ and optimizing over σB only, we find that

IεH(M; BB′)θ ≤ inf

σBDε

H(θMBB′‖θM ⊗ σB ⊗ θB′) (6.1.82)

= infσB

DεH(θMBB′‖θMB′ ⊗ σB) (6.1.83)

= IεH(MB′; B)θ, (6.1.84)

where the first equality follows from the observation in (6.1.80) and the sec-ond equality follows by definition. Now, observe that the state θMBB′ has theform NA→B(ρSA), where S ≡ MB′ and ρSA ≡ EM′A′→A(ΦMM′ ⊗ΨA′B′). Thismeans that we can optimize over every state ρSA to obtain

IεH(MB′; B)θ ≤ sup

ρSA

IεH(S; B)ξ , (6.1.85)

where ξSB = NA→B(ρSA). Note that this optimization over states ρSA is ef-fectively an optimization over all possible encoding channels EM′A′→A thatdefine the (|M|, ε) protocol. Now, since it suffices to take pure states whenoptimizing the ε-hypothesis testing mutual information of bipartite states,following from the same reasoning in (4.11.4)–(4.11.7) in the context of gener-alized divergences, we find that

IεH(MB′; B)θ ≤ sup

ψSA

IεH(S; B)ζ (6.1.86)

553

Page 565: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

= IεH(N), (6.1.87)

where ψSA is a pure state, with the dimension of S the same as that of A, andζSB = NA→B(ψSA). So we have

log2 |M| ≤ IεH(M; M)ω ≤ Iε

H(MB′; B)θ ≤ IεH(N), (6.1.88)

as required.

The result of Proposition 6.5 can be written explicitly as

log2 |M| ≤ supψRA

infσB

DεH(NA→B(ψRA)‖ψR ⊗ σB) (6.1.89)

= supψRA

infσB

DεH(NA→B(ψRA)‖RσB

A→B(ψRA)), (6.1.90)

in which we explictly see the comparison, via the hypothesis testing relativeentropy, between the actual entanglement-assisted classical communicationprotocol and the protocols over the useless channels RσB

A→B, with each labeledby the state σB. The state ψRA corresponds to the state after the encodingchannel, and optimizing over these states is effectively an optimization overall encoding channels and all shared entangled states. The latter is true sincethe preparation of the shared state ΨA′B′ can always be incorporated into alarger encoding channel. Specifically, the encoding EM′A′→A(ΦMM′ ⊗ ΨA′B′)can be written as E′M′→A(ΦMM′), where E′M′→A := EM′A′→A PΨA′B′ .

As an immediate consequence of Propositions 6.5, 4.67, and 4.68, we havethe following two bounds:

Theorem 6.6 One-Shot Upper Bounds for Entanglement-AssistedClassical Communication

Let NA→B be a quantum channel, and let ε ∈ [0, 1). For all (|M|, ε)entanglement-assisted classical communication protocols over the chan-nel N, the following bounds hold,

log2 |M| ≤1

1− ε(I(N) + h2(ε)), (6.1.91)

log2 |M| ≤ Iα(N) +α

α− 1log2

(1

1− ε

)∀ α > 1, (6.1.92)

where I(N) is the mutual information of N, as defined in (4.11.102),and Iα(N) is the sandwiched Rényi mutual information of N, as defined

554

Page 566: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

in (4.11.91).

Since the bounds in (6.1.91) and (6.1.92) hold for every (|M|, ε) entanglement-assisted classical communication protocol over N, we have that

CεEA(N) ≤ 1

1− ε(I(N) + h2(ε)), (6.1.93)

CεEA(N) ≤ Iα(N) +

α

α− 1log2

(1

1− ε

)∀ α > 1, (6.1.94)

for all ε ∈ [0, 1).

Let us recap the steps we took to arrive at the bounds in (6.1.91) and(6.1.92).

1. We first compared the entanglement-assisted classical communicationprotocol over N with the same protocol over a useless channel, by us-ing the hypothesis testing relative entropy. Lemma 6.4 plays a role inbounding the maximum number of transmitted bits for a particular pro-tocol.

2. We then used the data-processing inequality for the hypothesis testingrelative entropy to obtain a quantity that is independent of the decodingchannel, as well as being minimized over all useless protocols when com-pared to the actual protocol. This is done in (6.1.75) and (6.1.81)–(6.1.84)in the proof of Proposition 6.5.

3. Finally, we optimized over all encoding channels (and, effectively, overall shared states ΨA′B′) in (6.1.85)–(6.1.87) to obtain Proposition 6.5, inwhich the bound is a function solely of the channel and the error proba-bility.

4. Using Propositions 4.67 and 4.68, which relate the hypothesis testing rel-ative entropy to the quantum relative entropy and the sandwiched Rényirelative entropy, respectively, we arrived at Theorem 6.6.

The bounds in (6.1.91) and (6.1.92) are fundamental upper bounds on thenumber of transmitted bits for every entanglement-assisted classical commu-nication protocol. A natural question to ask now is whether the upper boundsin (6.1.91) and (6.1.92) can be achieved. In other words, is it possible to deviseprotocols such that the number of transmitted bits is equal to the right-handside of either (6.1.91) or (6.1.92)? We do not know how to, especially if we

555

Page 567: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

demand that we exactly attain the right-hand side of either (6.1.91) or (6.1.92).However, when given many uses of a channel (in the asymptotic setting),we can come close to achieving these upper bounds. This motivates findinglower bounds on the number of transmitted bits.

6.1.3 Lower Bound on the Number of Transmitted Bits viaPosition-Based Coding and Sequential Decoding

To obtain lower bounds on the number of transmitted bits, as discussed inAppendix A, we should devise particular coding schemes. Concretely, weshould devise, for all ε ∈ (0, 1), an entanglement-assisted classical commu-nication protocol (M, ΨA′B′ ,E,D) that is an (|M|, ε) protocol, meaning thatthe maximal error probability p∗err(P;N) satisfies p∗err(P;N) ≤ ε. Recall from(6.1.15) that the maximal error probability is defined as

p∗err(P;N) = maxm∈M

perr(m,P;N), (6.1.95)

where for m ∈ M the message error probability perr(m,P;N) is defined in(6.1.13) as

perr(m,P;N) = 1− q(m|m), (6.1.96)

with q(m|m) being the probability of identifying the message sent as m, giventhat the message m was sent.

We make use of a technique called position-based coding along with sequen-tial decoding to establish the lower bound (6.1.107) in Proposition 6.8 below,which is analogous to the upper bound (6.1.72) in Proposition 6.5. We nowgive a brief description of position-based coding and sequential decoding,while leaving the details to the proof of Proposition 6.8.

Let us consider an entanglement-assisted classical communication proto-col defined by the four elements (M, ρ

⊗|M|A′B′ ,E,D) and depicted in Figure 6.3.

The state shared by Alice and Bob prior to communication is |M| copies ofa state ρA′B′ . The encoding E is defined such that if Alice wishes to senda message m ∈ M, then she sends her mth A′ system through the channel.Specifically, the encoding channels Em

(A′)|M|→Aare defined as

Em(A′)|M|→A(ρ

⊗|M|A′B′ ) = ρB′1

⊗ · · · ⊗ ρAB′m ⊗ · · · ⊗ ρB′m = TrAm

[ρ⊗|M|A′B′

], (6.1.97)

556

Page 568: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Alice Bob

N

...

...

...

...

A′1

A′2

...

A′m

...

A|M|

B′1

B′2

...

B′m

...

B|M|

ρA′1B′1

ρA′2B′2

...

...

ρA′|M|B′|M|

m 3M

FIGURE 6.3: Schematic depiction of position-based coding. Alice and Bobstart with |M| copies of a state ρA′B′ . If Alice wants to send the messagem ∈M, she sends the mth system through the channel N to Bob.

where Am indicates all systems Ak except for Am. This encoding procedureis called position-based coding because the message is encoded into the par-ticular A′ system that is sent to Bob. In other words, the message is encodedinto the “position” of the A′ systems.2

The state held by the receiver Bob after Alice sends the A system in (6.1.97)through the channel NA→B is

τmB′1···B′m···B′|M|B

:= ρB′1⊗ · · · ⊗NA→B(ρAB′m)⊗ · · · ⊗ ρB′|M|

. (6.1.98)

Bob, whose task is to determine the message m sent to him, should apply adecoding channel that ideally succeeds with high probability. The sequentialdecoding strategy consists of Bob performing a sequence of measurementson systems B′i and B. Each of these measurements is defined by the POVMΠB′BR, 1B′BR−ΠB′BR, where R is an arbitrary (finite-dimensional) referencesystem held by Bob that he makes use of to help with the decoding. In par-ticular, Bob has identical reference systems R1, . . . , R|M|, each associated with

2In practice, it would be wasteful for the sender to discard so much entanglement byexplicitly using the encoding procedure in (6.1.97). The explicit encoding given should thusbe considered a conceptual tool for understanding that the mth system is sent through thechannel, and in practice it can be realized simply by sending the mth system through thechannel.

557

Page 569: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

his B′ systems. The projectors defining the sequential decoding strategy arethen

Pi := 1B′1R1⊗ · · · ⊗ 1B′i−1Ri−1

⊗ΠB′i BRi⊗ 1B′i+1Ri+1

⊗ · · · ⊗ 1B′|M|R|M|(6.1.99)

for all 1 ≤ i ≤ |M|, and they correspond to measuring systems B′i BRi withthe POVM ΠB′BR, 1B′BR−ΠB′BR. This measurement can be thought of intu-itively as asking the question “Was the ith message sent?”, with the outcomecorresponding to Pi being “yes” and the outcome corresponding to

Pi := 1− Pi (6.1.100)

being “no.” Bob performs a measurement on the systems B′1BR1, followed bya measurement on B′2BR2, followed by a measurement on B′3BR3, etc., untilhe obtains the outcome corresponding to “yes.” The system number corre-sponding to this outcome is then his guess for the message. The probabilityq(m|m) of guessing m given that the message m was sent is, therefore,

q(m|m) = Tr[PmPm−1 · · · P1ωmB′1···B′|M|BR1···R|M| P1 · · · Pm−1Pm], (6.1.101)

where

ωmB′1···B′|M|BR1···R|M| := τm

B′1···B′|M|B⊗ |0, · · · , 0〉〈0, · · · , 0|R1···R|M| . (6.1.102)

The message error probability is then

perr(m;P) = 1− Tr[PmPm−1 · · · P1ωmB′1···B′|M|BR1···R|M| P1 · · · Pm−1Pm] (6.1.103)

for all m ∈M.

Recall that our goal is to place an upper bound on the maximal error prob-ability p∗err(P;N) of this position-based coding and sequential decoding pro-tocol. We obtain an upper bound on perr(m,P;N) for each message m fromapplying the following theorem, called the quantum union bound, whose proofcan be found in Appendix 6.A. This theorem can be thought of as a quantumgeneralization of the union bound from probability theory. Indeed, if A1, . . . ,AN is a sequence of events, then the union bound is as follows:

Pr[(A1 ∩ · · · ∩ AN)c] = Pr[Ac

1 ∪ · · · ∪ AcN] ≤

N

∑i=1

Pr[Aci ], (6.1.104)

where the superscript c denotes the complement of an event.

558

Page 570: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Theorem 6.7 Quantum Union Bound

Let PiNi=1 be a set of projectors. For every state ρ and c > 0,

1−Tr[PNPN−1 · · · P1ρP1 · · · PN−1PN]

≤ (1 + c)Tr[(1− PN)ρ] + (2 + c + c−1)N−1

∑i=1

Tr[(1− Pi)ρ].(6.1.105)

PROOF: See Appendix 6.A.

Using this theorem, we place the following upper bound on the messageerror probability perr(m,P;N):

perr(m,P;N) ≤ (1 + c)Tr[PmωmB′1···B′|M|BR1···R|M| ]

+ (2 + c + c−1)m−1

∑i=1

Tr[PiωmB′1···B′|M|BR1···R|M| ],

(6.1.106)

which holds for all c > 0. By making a particular choice for the projectorsP1, . . . , P|M|, and a particular choice for the constant c, we obtain the follow-ing.

Proposition 6.8 Lower Bound on One-Shot Entanglement-AssistedClassical Capacity

Let NA→B be a quantum channel. For ε ∈ (0, 1) and η ∈ (0, ε), there ex-ists an (|M|, ε) entanglement-assisted classical communication protocolover NA→B such that

log2 |M| = I ε−ηH (N)− log2

(4ε

η2

)(6.1.107)

Consequently, for all ε ∈ (0, 1) and for all η ∈ (0, ε),

CεEA(N) ≥ Iε−η

H (N)− log2

(4ε

η2

). (6.1.108)

Here,Iε

H(N) := supψRA

IεH(R; B)ω, (6.1.109)

559

Page 571: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

where ωRB = NA→B(ψRA), ψRA is a pure state with the dimension of Rthe same as that of A, and

IεH(A; B)ρ := Dε

H(ρAB‖ρA ⊗ ρB). (6.1.110)

REMARK: The quantity IεH(A; B)ρ defined in the statement of Proposition 6.8 above is

similar to the quantity IεH(A; B)ρ defined in (4.11.88), except that we do not perform an

optimization over states σB. The resulting channel quantity IεH(N) is then similar to the

quantity IεH(N) defined in (4.11.87). The fact that it suffices to optimize over pure states

ψRA in IεH(N), with the dimension of R the same as that of A, follows from arguments

analogous to those presented in Section 4.11.

PROOF: Fix ε ∈ (0, 1) and η ∈ (0, ε). Starting with the encoded state on Bob’ssystems as defined in (6.1.98), observe that the state of the systems B and B′mis given by

τmBB′m

=

NA′→B(ρA′B′) if m = m,

NA′→B(ρA′)⊗ ρB′ if m 6= m. (6.1.111)

Recall that the system B results from the system Am being sent through thechannel by Alice. If, along with system B, he measures the system B′m, thenBob is performing a measurement on the state NA′→B(ρA′B′). On the otherhand, if Bob measures a system B′m, with m 6= m, then Bob is performing ameasurement on the state NA′→B(ρA′)⊗ ρB′ . Bob, knowing the states ρA′B′ aswell as the channel N, can devise the measurement ΛB′B, 1B′B − ΛB′B thatachieves the minimum value of the probability Tr[ΛB′B(ρB′ ⊗ NA′→B(ρA′))]while satisfying

Tr[ΛB′BNA′→B(ρA′B′)] ≥ 1− (ε− η). (6.1.112)

Equivalently, by recalling Definition 4.62, the measurement achieves the ε-hypothesis testing relative entropy

Dε−ηH (NA′→B(ρA′B′)‖ρB′ ⊗NA′→B(ρA′)), (6.1.113)

meaning that

− log2 Tr[ΛB′B(ρB′ ⊗NA′→B(ρA′))]

= Dε−ηH (NA′→B(ρA′B′)‖ρB′ ⊗NA′→B(ρA′))

= Iε−ηH (B′; B)ξ ,

(6.1.114)

where ξB′B := NA′→B(ρA′B′).

560

Page 572: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

The measurement with POVM ΛB′B, 1B′B−ΛB′B forms one part of Bob’sdecoding strategy. The other part of the decoding strategy is based on thefact that Bob does not know the position corresponding to the system B hereceives through the channel from Alice. He therefore does not know whichof the systems B′1, . . . , B′|M| to measure along with B. As described beforethe statement of the proposition, the sequential decoding strategy consists ofBob performing a sequence of projective measurements on the systems B′i BRcorresponding to the question “Was the ith message sent?”. Let us definethe projectors ΠB′BR, 1B′BR−ΠB′BR on which this measurement is based asfollows:

ΠB′BR := U†B′BR(1B′B ⊗ |1〉〈1|R)UB′BR, (6.1.115)

where R is a qubit system and the unitary UB′BR is defined as

UB′BR :=√

1B′B −ΛB′B ⊗ (|0〉〈0|R + |1〉〈1|R)+√

ΛB′B ⊗ (|1〉〈0|R − |0〉〈1|R) . (6.1.116)

Then, it follows that

Tr[ΠB′BR(NA′→B(ρA′B′)⊗ |0〉〈0|R)]= Tr[(1B′B ⊗ 〈0|R)ΠB′BR(1B′B ⊗ |0〉R)NA′→B(ρA′B′)] (6.1.117)= Tr[ΛB′BNA′→B(ρA′B′)] (6.1.118)≥ 1− (ε− η), (6.1.119)

where the second equality holds by the definition of ΠB′BR in (6.1.115) and thefact that (1B′B ⊗ 〈1|R)UB′BR(1B′B ⊗ |0〉R) =

√ΛB′B, which can be seen from

(6.1.116). To obtain the inequality, we used (6.1.112). Defining the projectionoperators Pi|M|i=1 as in (6.1.99), and defining the state ω as in (6.1.102), it alsoholds that

Tr[PiωmB′1···B′|M|BR1···R|M| ] = Tr[ΛB′B(ρB′ ⊗NA′→B(ρA′))] (6.1.120)

for all i < m, and

Tr[PmωmB′1···B′|M|BR1···R|M| ] = Tr[(1B′B −ΛB′B)NA′→B(ρB′A′)]. (6.1.121)

Now, recall that the message error probability perr(m,P;N) is defined as in(6.1.103), i.e.,

perr(m,P;N)

= 1− Tr[PmPm−1 · · · P1ωB′1···B′|M|BR1···R|M| P1 · · · Pm−1Pm],(6.1.122)

561

Page 573: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

and that we can use the quantum union bound (Theorem 6.7) to place anupper bound on this quantity as in (6.1.106), i.e.,

perr(m;P) ≤ (1 + c)Tr[PmωmB′1···B′|M|BR1···R|M| ]

+ (2 + c + c−1)m−1

∑i=1

Tr[PiωmB′1···B′|M|BR1···R|M| ]

(6.1.123)

for all c > 0. Using (6.1.120) and (6.1.121), the inequality in (6.1.112), and theequality in (6.1.114), the upper bound can be simplified so that

perr(m,P;N)

≤ (1 + c)Tr[(1B′B −ΛB′B)NA′→B(ρB′A′)]

+ (2 + c + c−1)(m− 1)Tr[ΛB′B(ρB′ ⊗NA′→B(ρA′)] (6.1.124)

≤ (1 + c)(ε− η) + (2 + c + c−1)|M|2−Iε−ηH (B′;B)ξ (6.1.125)

for all c > 0, where to obtain the second inequality we used the fact thatm− 1 ≤ |M|. The inequality in (6.1.125) holds for all m ∈ M, which meansthat for all c > 0,

p∗err(P;N) ≤ (1 + c)(ε− η) + (2 + c + c−1)|M|2−Iε−ηH (B′;B)ξ . (6.1.126)

By taking c = η2ε−η , and letting the set M of messages be such that

|M| = 2Iε−η

H (B′;B)ξ−log2

(4εη2

)

, (6.1.127)

we find thatp∗err(P) ≤ ε. (6.1.128)

This proves the existence of an (|M|, ε) protocol with |M| given by (6.1.127).

However, (6.1.127) holds for every state ρA′B′ , which means that we cantake

log2 |M| = supρA′B′

Iε−ηH (B′; B)ξ − log2

(4ε

η2

)

= Iε−ηH (N)− log2

(4ε

η2

),

(6.1.129)

and have (6.1.128) hold. This is precisely (6.1.107), and since ε ∈ (0, 1) andη ∈ (0, ε) are arbitrary, the proof is complete.

562

Page 574: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Let us take note of the following two facts from the proof of Proposition 6.8given above:

1. Given a particular ε ∈ (0, 1) and an η ∈ (0, ε), we can construct a position-based coding and sequential decoding protocol achieving a maximal er-ror probability of p∗err(P) ≤ ε by taking

log2 |M| = Iε−ηH (B′; B)ξ − log2

(4ε

η2

), (6.1.130)

where ξB′B = NA′→B(ρA′B′) and ρA′B′ is the shared state that Alice andBob start with (see (6.1.127)). Note that this holds for every state ρA′B′ . Wetake the supremum at the end of the proof in order to obtain the highestpossible number of transmitted bits, and also to obtain a quantity that isa function of the channel N only. The right-hand side of (6.1.107) can thusbe achieved by determining the optimal shared state ρA′B′ .

2. Since it suffices to optimize over pure states in order to obtain Iε−ηH (N),

we see that the shared state ρA′B′ can be taken to be pure, with the dimen-sion of B′ the same as that of A′.

An immediate consequence of Propositions 6.8 and 4.69 is the followingtheorem.

Theorem 6.9 One-Shot Lower Bounds for Entanglement-AssistedClassical Communication

Let NA→B be a quantum channel. For all ε ∈ (0, 1), η ∈ (0, ε), andα ∈ (0, 1), there exists an (|M|, ε) entanglement-assisted classical com-munication protocol over NA→B such that

log2 |M| ≥ Iα(N) +α

α− 1log2

(1

ε− η

)− log2

(4ε

η2

). (6.1.131)

Here,Iα(N) := sup

ψRA

Iα(R; B)ω, (6.1.132)

where ωRB = NA→B(ψRA), ψRA is a pure state, R has dimension equalto that of A, and

Iα(A; B)ρ := Dα(ρAB‖ρA ⊗ ρB). (6.1.133)

563

Page 575: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

REMARK: The quantity Iα(A; B)ρ defined in the statement of Theorem 6.9 above is sim-ilar to the quantity Iα(A; B)ρ defined in (4.11.90), except that we do not perform an opti-mization over states σB. The resulting channel quantity Iα(N) is then similar to the quan-tity Iα(N) defined in (4.11.89). The fact that it suffices to optimize over pure states ψRA inIα(N), with the dimension of R the same as that of A, follows from arguments analogousto those presented in Section 4.11.

PROOF: From Proposition 6.8, we know that for all ε ∈ (0, 1) and η ∈ (0, ε)there exists an (|M|, ε) entanglement-assisted classical communication proto-col such that

log2 |M| = Iε−ηH (N)− log2

(4ε

η2

). (6.1.134)

Proposition 4.69 relates the hypothesis testing relative entropy to the Petz–Rényi relative entropy according to

DεH(ρ‖σ) ≥ Dα(ρ‖σ) +

α

α− 1log2

(1ε

)(6.1.135)

for all α ∈ (0, 1), which implies that

IεH(N) ≥ Iα(N) +

α

α− 1log2

(1ε

). (6.1.136)

Combining this inequality with (6.1.134), we obtain the desired result.

Since the inequality in (6.1.131) holds for all (|M|, ε) entanglement-assistedclassical communication protocols, we have that

CεEA(N) ≥ Iα(N) +

α

α− 1log2

(1

ε− η

)− log2

(4ε

η2

)(6.1.137)

for all α ∈ (0, 1), where ε ∈ (0, 1) and η ∈ (0, ε).

6.2 Entanglement-Assisted Classical Capacity of aQuantum Channel

Let us now consider the asymptotic setting of entanglement-assisted classicalcommunication. In this scenario, depicted in Figure 6.4, instead of encodingthe message into one quantum system and consequently using the channel Nonly once, Alice encodes the message into n ≥ 1 quantum systems A1, . . . , An,

564

Page 576: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

E

ΨA′B′

A′

B′

N

N

...

N

N

M 3 m

A1

A2

...An−1

An

B1

B2

...Bn−1

Bnm

AliceBob

FIGURE 6.4: The most general entanglement-assisted classical communi-cation protocol over a multiple number n ≥ 1 uses of a quantum channelN. Alice and Bob intially share a pair of quantum systems in the stateΨA′B′ . Alice, who wishes to send a message m from a set M of messages,first encodes the message into a quantum state on n quantum systems us-ing an encoding channel E. She then sends each quantum system throughthe channel N. After Bob receives the systems, he performs a joint mea-surement on them and the system B′, using the outcome of the measure-ment to give an estimate m of the message m sent to him by Alice.

all with the same dimension as that of A, and sends each one of these throughthe channel N. We call this the asymptotic setting because the number n canbe arbitrarily large.

The analysis of the asymptotic setting is almost exactly the same as thatof the one-shot setting. This is due to the fact that n independent uses of thechannel N can be regarded as a single use of the tensor-product channel N⊗n.So the only change that needs to be made is to replace N with N⊗n and todefine the states and POVM elements as acting on n systems instead of justone. In particular, the state at the end of the protocol is

ωpMM

= (DBnB′→M N⊗nA→B EM′A′→An)(Φp

MM′ ⊗ΨA′B′), (6.2.1)

where p is the prior probability distribution over the set of messages M, theencoding channel EM′A′→An defines a set Em

M′→Anm∈M of channels so that

EM′A′→An(ΦpMM′ ⊗ΨA′B′) = ∑

m∈Mp(m)|m〉〈m|M ⊗ Em

A′→An(ΨA′B′), (6.2.2)

and the decoding channel DBnB′→M, with associated POVM ΛmBnB′m∈M, is

defined asDBnB′→M(τBnB′) = ∑

m∈MTr[Λm

BnB′τBnB′ ]|m〉〈m|M. (6.2.3)

565

Page 577: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Then, for a given code specified by the encoding and decoding channels,the definitions of the message error probability of the code, the average er-ror probability of the code, and the maximal error probability of the code allfollow analogously from their definitions in (6.1.13), (6.1.14), and (6.1.15), re-spectively, from the one-shot setting.

Definition 6.10 (n, |M|, ε) Entanglement-Assisted Classical Com-munication Protocol

Let (M, ΨA′B′ ,EM′A′→An ,DBnB′→M) be the elements of an entanglement-assisted classical communication protocol over n uses of the channelNA→B. The protocol is called an (n, |M|, ε) protocol, with ε ∈ [0, 1], ifp∗err(P;N⊗n) ≤ ε.

Note that if there exists an (n, |M|, ε) entanglement-assisted classical com-munication protocol, then there exists an (n, |M′|, ε) entanglement-assistedclassical communication protocol for all M′ satisfying |M′| ≤ |M|. Indeed,simply take a subset M′ ⊆M of size |M′| and define the encoding and decod-ing channels E′ and D′ as the restrictions of the original channels E,D to theset M′. Then, using the shorthand P′ ≡ (Ψ,E′,D′),

p∗err(P′;N) = max

m′∈M′perr(m′,P′;N) (6.2.4)

≤ maxm∈M

perr(m,P;N) (6.2.5)

= p∗err(P;N) (6.2.6)≤ ε, (6.2.7)

where the first inequality holds because M′ is a subset of M. So we have an(n, |M′|, ε) protocol. Similarly, if there does not exist an (n, |M|, ε) entanglement-assisted classical communication protocol, then there does not exist an (n, |M′|, ε)entanglement-assisted classical communication protocol for all M′ satisfying|M′| ≥ |M|.

The rate of an entanglement-assisted classical communication protocol ov-er n uses of a channel is equal to the number of bits that can be transmittedper channel use, i.e.,

R(n, |M|) :=1n

log2 |M|. (6.2.8)

Observe that the rate depends only on the number of messages in the set andon the number of uses of the channel. In particular, it does not directly depend

566

Page 578: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

on the communication channel nor on the encoding and decoding channels.Given a channel NA→B and ε ∈ (0, 1), the maximum rate of entanglement-assisted classical communication over N among all (n, |M|, ε) protocols is

Cn,εEA(N) :=

1n

CεEA(N

⊗n) (6.2.9)

= sup(M,Ψ,E,D)

1n

log2 |M| : p∗err((Ψ,E,D);N⊗n) ≤ ε

, (6.2.10)

where the optimization is over all entanglement-assisted classical communi-cation protocols (M, ΨA′B′ ,EM′A′→An ,DBnB′→M) over N⊗n, with dM′ = dM =|M|.

The goal of an entanglement-assisted classical communication protocol inthe asymptotic setting is to maximize the rate while at the same time keepingthe maximal error probability low, using the number n of channel uses as atunable parameter. Ideally, we would want the error probability to vanish,and since we want to determine the highest possible rate, we are not neces-sarily concerned about the practical question regarding how many channeluses might be required, at least in the asymptotic setting. In particular, asindicated by definitions, it might take an arbitrarily large number of channeluses to obtain the highest rate with a vanishing error probability.

Definition 6.11 Achievable Rate for Entanglement-Assisted Classi-cal Communication

Given a quantum channel N, a rate R ∈ R+ is called an achievable rate forentanglement-assisted classical communication over N if for all ε ∈ (0, 1], δ >

0, and sufficiently large n, there exists an (n, 2n(R−δ), ε) entanglement-assisted classical communication protocol.

As we prove in Appendix A,

R achievable rate ⇐⇒ limn→∞

ε∗EA(2n(R−δ);N⊗n) = 0 ∀ δ > 0. (6.2.11)

In other words, a rate R is achievable if the optimal error probability for asequence of protocols with rate R − δ, δ > 0, vanishes as the number n ofuses of N increases.

567

Page 579: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Definition 6.12 Entanglement-Assisted Classical Capacity of aQuantum Channel

The entanglement-assisted classical capacity of a quantum channel N, de-noted by CEA(N), is defined as the supremum of all achievable rates,i.e.,

CEA(N) := supR : R is an achievable rate for N. (6.2.12)

The entanglement-assisted classical capacity can also be written as

CEA(N) = infε∈(0,1]

lim infn→∞

1n

CεEA(N

⊗n). (6.2.13)

See Appendix A for a proof.

Definition 6.13 Weak Converse Rate for Entanglement-AssistedClassical Communication

Given a quantum channel N, a rate R ∈ R+ is called a weak converse ratefor entanglement-assisted classical communication over N if every R′ > R isnot an achievable rate for N.

We show in Appendix A in that

R weak converse rate ⇐⇒ limn→∞

ε∗EA(2n(R−δ);N⊗n) > 0 ∀ δ > 0. (6.2.14)

In other words, a weak converse rate is a rate above which the optimal errorprobability cannot be made to vanish in the limit of a large number of channeluses.

Definition 6.14 Strong Converse Rate for Entanglement-AssistedClassical Communication

Given a quantum channel N, a rate R ∈ R+ is called a strong converse ratefor entanglement-assisted classical communication over N if for all ε ∈ [0, 1),δ > 0, and sufficiently large n, there does not exist an (n, 2n(R+δ), ε)entanglement-assisted classical communication protocol over N.

We show in Appendix A that

R strong converse rate ⇐⇒ limn→∞

ε∗EA(2n(R+δ);N⊗n) = 1 ∀ δ > 0. (6.2.15)

568

Page 580: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

In other words, unlike the weak converse, in which the optimal error is re-quired to simply be bounded away from zero as the number n of channeluses increases, in order to have a strong converse rate the optimal error has toconverge to one as n increases. By comparing (6.2.14) and (6.2.15), it is clearthat every strong converse rate is a weak converse rate.

Definition 6.15 Strong Converse Entanglement-Assisted ClassicalCapacity of a Quantum Channel

The strong converse entanglement-assisted classical capacity of a quantumchannel N, denoted by CEA(N), is defined as the infimum of all strongconverse rates, i.e.,

CEA(N) := infR : R is a strong converse rate for N. (6.2.16)

As shown in general in Appendix A, we have that

CEA(N) ≤ CEA(N) (6.2.17)

for every quantum channel N. We can also write the strong converse entanglement-assisted classical capacity as

CEA(N) = supε∈[0,1)

lim supn→∞

1n

CεEA(N

⊗n). (6.2.18)

See Appendix A for a proof.

Having defined the entanglement-assisted classical capacity of a quantumchannel, as well as the strong converse capacity, we now state the main theo-rem of this chapter, which gives an expression for the entanglement-assistedclassical capacity of a quantum channel.

Theorem 6.16 Entanglement-Assisted Classical Capacity

For every quantum channel N, its entanglement-assisted classical ca-pacity CEA(N) and its strong converse entanglement-assisted classicalcapacity CEA(N) are both equal to the mutual information I(N), i.e.,

CEA(N) = CEA(N) = I(N), (6.2.19)

where I(N) is defined in (4.11.102).

569

Page 581: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

There are two ingredients to proving Theorem 6.16:

1. Achievability: We show that I(N) is an achievable rate. In general, to showthat R ∈ R+ is achievable, we define the shared entangled state ΨA′B′ andconstruct encoding and decoding channels such that for all ε ∈ (0, 1] andsufficiently large n, the encoding and decoding channels correspond to(n, 2nr, ε) protocols, as per Definition 6.10, with rates r < R. Thus, if R isan achievable rate, then, for every error probability ε, it is possible to findan n large enough, along with encoding and decoding channels, suchthat the resulting protocol has rate arbitrarily close to R and maximalerror probability bounded from above by ε.

The achievability part of the proof establishes that CEA(N) ≥ I(N).

2. Strong Converse: We show that I(N) is a strong converse rate, from whichit follows that CEA(N) ≤ I(N). In general, to show that R ∈ R+ is astrong converse rate, we show that, given any shared entangled stateΨA′B′ and any encoding and decoding channels, for every rate r > R,ε ∈ [0, 1), and sufficiently large n, the communication protocol definedby the encoding and decoding channels is not an (n, 2nr, ε) protocol.

After showing the achievability and strong converse parts, we can use theinequality in (6.2.17) to conclude that

I(N) ≤ CEA(N) ≤ CEA(N) ≤ I(N), (6.2.20)

which immediately implies that CEA(N) = CEA(N) = I(N).

We first establish in Section 6.2.1 that the rate I(N) is achievable for entan-glement-assisted classical communication over N. We then address the addi-tivity of the mutual information of a channel, in particular of the sandwichedRényi mutual information of a channel, in Section 6.2.2. Finally, we prove thatI(N) is a strong converse rate in Section 6.2.3. This implies that I(N) is a weakconverse rate; however, in Section 6.2.4, we provide an independent proof ofthis fact, as the technique used in the proof is useful for alternate communi-cation scenarios (besides entanglement-assisted communication) for which astrong converse theorem is not known to hold.

6.2.1 Proof of Achievability

In this section, we prove that I(N) is an achievable rate for entanglement-assisted classical communication over N.

570

Page 582: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

First, recall from Theorem 6.9 that for all ε ∈ (0, 1), η ∈ (0, ε), and α ∈(0, 1), there exists an (|M|, ε) entanglement-assisted classical communicationprotocol over N such that

log2 |M| ≥ Iα(N) +α

α− 1log2

(1

ε− η

)− log2

(4ε

η2

), (6.2.21)

where Iα(N) is defined in (6.1.132). We obtained this result through a position-based coding strategy along with sequential decoding. A simple corollary ofthis result is the following.

Corollary 6.17 Lower Bound for Entanglement-Assisted ClassicalCommunication in Asymptotic Setting

Let N be a quantum channel. For all ε ∈ (0, 1], n ∈ N, and α ∈ (0, 1),there exists an (n, |M|, ε) entanglement-assisted classical communica-tion protocol over n uses of N such that

1n

log2 |M| ≥ Iα(N)− 1n(1− α)

log2

(2ε

)− 3

n. (6.2.22)

PROOF: Fix ε ∈ (0, 1]. The inequality (6.2.21) holds for every channel N,which means that it holds for N⊗n. Applying the inequality in (6.2.21) toN⊗n and dividing both sides by n, we obtain

log2 |M|n

≥ 1n

Iα(N⊗n) +

α

n(α− 1)log2

(1

ε− η

)− 1

nlog2

(4ε

η2

)(6.2.23)

for all α ∈ (0, 1). By restricting the optimization in the definition of Iα(N⊗n)to tensor-power states, we conclude that Iα(N⊗n) ≥ nIα(N). This followsfrom the additivity of the Petz–Rényi relative entropy under tensor-productstates (see Proposition 4.21). So we obtain

log2 |M|n

≥ Iα(N) +α

n(α− 1)log2

(1

ε− η

)− 1

nlog2

(4ε

η2

)(6.2.24)

for all α ∈ (0, 1). Letting η = ε2 , and using the fact that α− 1 is negative for

α ∈ (0, 1), this inequality becomes

log2 |M|n

≥ Iα(N)− 1n(1− α)

log2

(2ε

)− 3

n(6.2.25)

for all α ∈ (0, 1). Since ε is arbitrary, we find that for all ε ∈ (0, 1], there existsan (n, |M|, ε) protocol such that (6.2.22) is satisfied, as required.

571

Page 583: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

The inequality in (6.2.22) gives us, for every ε ∈ (0, 1] and n ∈ N, alower bound on the size |M| of the message set that we can take for a cor-responding (n, |M|, ε) entanglement-assisted classical communication proto-col defined by position-based coding and sequential decoding. If instead wefix a particular communication rate R by letting |M| = 2nR, then we can re-arrange the inequality in (6.2.22) to obtain an upper bound on the maximalerror probability of the corresponding (n, 2nR, ε) entanglement-assisted clas-sical communication protocol. Specifically, we conclude that

ε ≤ 2 · 2−n(1−α)(Iα(N)−R− 3n) (6.2.26)

for all α ∈ (0, 1).

The inequality in (6.2.22) implies that

Cn,εEA(N) ≥ Iα(N)− 1

n(α− 1)log2

(2ε

)− 3

n(6.2.27)

for all ε ∈ (0, 1] and α ∈ (0, 1).

We can now use (6.2.22) to prove that the mutual information I(N) is anachievable rate for entanglement-assisted classical communication over N.

Proof of the Achievability Part of Theorem 6.16

Fix ε ∈ (0, 1] and δ > 0. Let δ1, δ2 > 0 be such that

δ = δ1 + δ2. (6.2.28)

Set α ∈ (0, 1) such thatδ1 ≥ I(N)− Iα(N), (6.2.29)

which is possible since Iα(N) is monotonically increasing in α (this followsfrom Proposition 4.21), and since limα→1− Iα(N) = I(N) (see Appendix 6.Bfor a proof). With this value of α, take n large enough so that

δ2 ≥1

n(1− α)log2

(2ε

)+

3n

. (6.2.30)

Now, making use of the inequality in (6.2.22) of Corollary 6.17, there existsan (n, |M|, ε) protocol, with n and ε chosen as above, such that

log2 |M|n

≥ Iα(N)− 1n(1− α)

log2

(2ε

)− 3

n. (6.2.31)

572

Page 584: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Rearranging the right-hand side of this inequality, and using (6.2.28)–(6.2.30),we find that

log2 |M|n

≥ I(N)−(

I(N)− Iα(N) +1

n(1− α)log2

(2ε

)+

3n

)(6.2.32)

≥ I(N)− (δ1 + δ2) (6.2.33)= I(N)− δ. (6.2.34)

We thus have I(N) − δ ≤ 1n log2 |M|. By the fact stated immediately after

Definition 6.10, we conclude that there exists an (n, 2n(R−δ), ε) entanglement-assisted classical communication protocol with R = I(N) for all sufficientlylarge n such that (6.2.30) holds. Since ε and δ are arbitrary, we have that forall ε ∈ (0, 1], δ > 0, and sufficiently large n, there exists an (n, 2n(I(N)−δ),ε) entanglement-assisted classical communication protocol. This means thatI(N) is an achievable rate, and thus that CEA(N) ≥ I(N). See Appendix 6.Cfor a discussion of a different way of seeing the achievability proof.

6.2.2 Additivity of the Sandwiched Rényi Mutual Informa-tion of a Channel

We now turn our attention to establishing converse bounds for entanglement-assisted classical communication in the asymptotic setting. Recall from Theo-rem 6.6 that, for every quantum channel N, ε ∈ [0, 1), and all (|M|, ε) entanglement-assisted classical communication protocols over N,

log2 |M| ≤1

1− ε[I(N) + h2(ε)], (6.2.35)

log2 |M| ≤ Iα(N) +α

α− 1log2

(1

1− ε

)∀ α > 1. (6.2.36)

To obtain these inequalities, we considered an entanglement-assisted classi-cal communication protocol over a useless channel and used the hypothesistesting relative entropy to compare this protocol with the actual protocol overthe channel N. The useless channel in the asymptotic setting is analogous tothe one in Figure 6.2 and is shown in Figure 6.5. A simple corollary of Theo-rem 6.6, which is relevant for the asymptotic setting, is the following:

573

Page 585: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

E

ΨA′B′

A′

B′

PσBn...

M 3 m

A1

A2

...An−1

An

B1

B2

...Bn−1

Bnm

AliceBob

FIGURE 6.5: Depiction of a protocol that is useless for entanglement-assisted classical communication in the asymptotic setting. The state en-coding the message m via E is discarded and replaced by an arbitrary (butfixed) state σBn .

Corollary 6.18 Upper Bounds for Entanglement-Assisted ClassicalCommunication in Asymptotic Setting

Let N be a quantum channel. For all ε ∈ [0, 1), n ∈ N, and (n, |M|, ε)entanglement-assisted classical communication protocols over n uses ofN, the rate of transmitted bits is bounded from above as follows:

1n

log2 |M| ≤1

1− ε

(1n

I(N⊗n) +1n

h2(ε)

), (6.2.37)

1n

log2 |M| ≤1n

Iα(N⊗n) +

α

n(α− 1)log2

(1

1− ε

)∀ α > 1. (6.2.38)

PROOF: Since the inequalities (6.2.35) and (6.2.36) of Theorem 6.6 hold forevery channel N, they hold for the channel N⊗n. Therefore, applying (6.2.35)and (6.2.36) to N⊗n and dividing both sides by n, we immediately obtain thedesired result.

The inequalities in the corollary above give us, for all ε ∈ [0, 1) and n ∈N,an upper bound on the size |M| of the message set we can take for every corre-sponding (n, |M|, ε) entanglement-assisted classical communication protocol.If instead we fix a particular communication rate R by letting |M| = 2nR, thenwe can obtain a lower bound on the maximal error probability of the corre-

574

Page 586: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

sponding (n, 2nR, ε) entanglement-assisted classical communication protocol.Specifically, using (6.2.38), we find that

ε ≥ 1− 2−n( α−1α )(R− 1

n Iα(N⊗n)) (6.2.39)

for all α > 1.

The inequality in (6.2.37) can be simplified by using the fact that the mu-tual information of a channel is additive, as stated in the following theorem:

Theorem 6.19 Additivity of Mutual Information of a QuantumChannel

For every two quantum channels N1 and N2, the mutual information ofN1 ⊗N2 is equal to the sum of mutual informations of N1 and N2, i.e.,

I(N1 ⊗N2) = I(N1) + I(N2). (6.2.40)

PROOF: We first recall from (4.11.102) that the mutual information I(N) ofthe channel N is defined as

I(N) = supψRA

I(R; B)ω

= supψRA

D(NA→B(ψRA)‖ψR ⊗NA→B(ψA)),(6.2.41)

where ωRB = NA→B(ψRA), ψRA is a pure state, and R is a system with thesame dimension as A. Now that, as shown in Section 4.11 in the context ofgeneralized divergences, optimizing over all states ρRA is not required, since

supρRA

I(R; B)ω = supψRA

I(R; B)ω. (6.2.42)

We also recall that the mutual information I(A; B)ρ of a bipartite state ρAB isa special case of conditional mutual information I(A; B|C) with trivial con-ditioning system C. By applying the additivity of conditional mutual infor-mation (see (4.2.131)), we thus conclude additivity of mutual information forproduct states τA1B1 ⊗ωA2B2 :

I(A1A2; B1B2)τ⊗ω = I(A1; B1)τ + I(A2; B2)ω. (6.2.43)

Using these facts, the inequality

I(N1 ⊗N2) ≥ I(N1) + I(N2), (6.2.44)

575

Page 587: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

is straightforward to establish. Letting

ρR1R2B1B2 := ((N1)A1→B1 ⊗ (N2)A2→B2)(ψR1R2 A1 A2), (6.2.45)τR1B1 := (N1)A1→B1(φR1 A1), (6.2.46)

ωR2B2 := (N2)A2→B2(ϕR2 A2), (6.2.47)

and restricting the optimization in the mutual information of N to pure prod-uct states φ⊗ ϕ, we get that

I(N1 ⊗N2) = supψ

I(R1R2; B1B2)ρ (6.2.48)

≥ supφ⊗ϕ

I(R1R2; B1B2)τ⊗ω (6.2.49)

= supφ⊗ϕI(R1; B1)τ + I(R2; B2)ω (6.2.50)

= supφ

I(R1; B1)τ + supϕ

I(R2; B2)ω (6.2.51)

= I(N1) + I(N2). (6.2.52)

To prove the reverse inequality, let ρRB1B1 := ((N1)A1→B1 ⊗ (N2)A2→B2)(ψRA1 A2). Then, using the formula in (4.1.8) for the mutual information interms of the quantum entropy, it is straightforward to verify that

I(R; B1B2)ρ = I(R; B1)ρ + I(RB1; B2)ρ − I(B1; B2)ρ. (6.2.53)

Now, Klein’s inequality in Proposition 4.3, implies that the mutual informa-tion is non-negative. Using this fact on the last term in (6.2.53), we find that

I(R; B1B2)ρ ≤ I(R; B1)ρ + I(RB1; B2)ρ. (6.2.54)

Since N2 is trace preserving, we have that

ρRB1 = TrB2 [ρRB1B2 ] = (N1)A1→B1(ψRA1). (6.2.55)

Therefore,I(R; B1)ρ ≤ sup

ρRA1

I(R; B1)τ = I(N1), (6.2.56)

where to obtain the equality we have used (6.2.42). Similarly, by writingρRB1B2 as

ρRB1B2 = (N2)A2→B2(ωRB1 A2), ωRB1 A2 := (N1)A1→B1(ψRA1 A2), (6.2.57)

576

Page 588: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

we get thatI(RB1; B2)ρ ≤ I(N2). (6.2.58)

Therefore,I(R; B1B1)ρ ≤ I(N1) + I(N2). (6.2.59)

Since the state ψRA1 A2 that we started with is arbitrary, we obtain

I(N1 ⊗N2) = supψRA1 A2

I(R; B1B2)ρ ≤ I(N1) + I(N2), (6.2.60)

as required. Combining this inequality with that in (6.2.44), we have the re-quired equality, I(N1 ⊗N2) = I(N1) + I(N2).

Using the additivity of the mutual information of a channel, the inequalityin (6.2.37) can be rewritten as

log2 |M|n

≤ 11− ε

(I(N) +

1n

h2(ε)

), (6.2.61)

which implies that

Cn,εEA(N) ≤ 1

1− ε

(I(N) +

1n

h2(ε)

)(6.2.62)

for all n ≥ 1 and ε ∈ (0, 1). Using this inequality, it is straightforward toconclude that I(N) is a weak converse rate for entanglement-assisted classicalcommunication, and the interested reader can jump ahead to Section 6.2.4 tosee this.

We now show that the sandwiched Rényi mutual information Iα(N) of achannel N is also additive, i.e., that

Iα(N1 ⊗N2) = Iα(N1) + Iα(N2) (6.2.63)

for all α > 1. One ingredient of the proof is the additivity of the sandwichedRényi mutual information of bipartite states with respect to tensor-productstates, i.e.,

Iα(A1A2; B1B2)ξ⊗ω = Iα(A1; B1)ξ + Iα(A2; B2)ω, (6.2.64)

where ξA1B1 and ωA2B2 are states. To show this, let us first recall the definitionof the sandwiched Rényi mutual information of a bipartite state ρAB from(4.11.92):

Iα(A; B)ρ = infσB

Dα(ρAB‖ρA ⊗ σB), (6.2.65)

577

Page 589: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

where the optimization is over states σB. This quantity, as well as the sand-wiched Rényi mutual information of a channel, can be written in an alternateway, as we show in the following lemma, the proof of which can be found inAppendix 6.D.

Lemma 6.20

For every bipartite state ρAB and α > 1, the sandwiched Rényi mutualinformation Iα(A; B)ρ can be written as

Iα(A; B)ρ

α− 1log2 sup

τC

∥∥∥∥TrAC

[(ρ

1−αα

A ⊗ τα−1

αC

)|ψ〉〈ψ|ABC

]∥∥∥∥α

2α−1

,(6.2.66)

where |ψ〉ABC is a purification of ρAB and τC is a state. The sandwichedRényi mutual information I(N) of a channel N can be written as

Iα(N) =α

α− 1infσB

log2

∥∥∥S(α)σB N∥∥∥

CB, 1→α, (6.2.67)

where S(α)σB (·) := σ

1−α2α

B (·)σ1−α2α

B and

‖M‖CB, 1→α := supYR>0,

Tr[YR]≤1

∥∥∥∥MA→B

(Y

12αR |Γ〉〈Γ|RAY

12αR

)∥∥∥∥α

, (6.2.68)

with M a completely positive map. (See Appendix 6.E for an alternateexpression for ‖·‖CB,1→α.)

PROOF: See Appendix 6.D.

Using the alternate expression in (6.2.66), we establish the additivity state-ment in (6.2.64) of the sandwiched Rényi mutual information of a bipartitestate.

Proposition 6.21 Additivity of Sandwiched Rényi Mutual Informa-tion of Bipartite States

For every product state ξA1B1 ⊗ωA2B2 and α > 1, the sandwiched Rényi

578

Page 590: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

mutual information Iα(A1A2; B1B2)ξ⊗ω is additive, i.e.,

Iα(A1A2; B1B2)ξ⊗ω = Iα(A1; B1)ξ + Iα(A2; B2)ω. (6.2.69)

PROOF: By definition, we have that

Iα(A1A2; B1B2)ξ⊗ω = infσB1B2

Dα(ξA1B1 ⊗ωA2B2‖ξA1 ⊗ωA2 ⊗ σB1B2) (6.2.70)

If we restrict the optimization to product states σ1B1⊗ σ2

B2, then we find that

Iα(A1A2; B1B2)ξ⊗ω

≤ infσ1⊗σ2

Dα(ξA1B1 ⊗ωA2B2‖ξA1 ⊗ωA2 ⊗ σ1B1⊗ σ2

B2) (6.2.71)

= infσ1,σ2

Dα(ξA1B1‖ξA1 ⊗ σ1

B1) + Dα(ωA2B2‖ωA2 ⊗ σ2

B2)

(6.2.72)

= Iα(A1; B1)ξ + Iα(A2; B2)ω. (6.2.73)

SoIα(A1A2; B1B2)ξ⊗ω ≤ Iα(A1; B1)ξ + Iα(A2; B2)ω. (6.2.74)

To show the reverse inequality, we use the alternate expression (6.2.66) inLemma 6.20 for the sandwiched Rényi mutual information of a bipartite state.In this expression, if we take a product purification |ψ1〉A1B1C1 ⊗ |ψ2〉A2B2C2 ofξA1B1 ⊗ωA2B2 and restrict the optimization to product states, we obtain

Iα(A1A2; B1B2)ξ⊗ω

α− 1log2 sup

τC1C2

∥∥∥∥TrA1 A2C1C2

[((ξA1 ⊗ωA2

) 1−αα ⊗ τ

α−1α

C1C2

)

× |ψ1〉〈ψ1|A1B1C1 ⊗ |ψ2〉〈ψ2|A2B2C2

]∥∥α

2α−1(6.2.75)

≥ α

α− 1log2 sup

τC1⊗τC2

∥∥∥∥TrA1 A2C1C2

[(ξ

1−αα

A1⊗ω

1−αα

A2⊗ τ

α−1α

C1⊗ τ

α−1α

C2

)

×|ψ1〉〈ψ1|A1B1C1 ⊗ |ψ2〉〈ψ2|A2B2C2

]∥∥α

2α−1(6.2.76)

α− 1log2 sup

τC1,τC2

∥∥∥∥TrA1C1

[(ξ

1−αα

A1⊗ τ

α−1α

C1

)|ψ1〉〈ψ1|A1B1C1

]

⊗TrA2C2

[(ω

1−αα

A2⊗ τ

α−1α

C2

)|ψ2〉〈ψ2|A2B2C2

]∥∥∥∥α

2α−1

(6.2.77)

579

Page 591: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

α− 1log2 sup

τC1,τC2

∥∥∥∥TrA1C1

[(ξ

1−αα

A1⊗ τ

α−1α

C1

)|ψ1〉〈ψ1|A1B1C1

]∥∥∥∥α

2α−1

×∥∥∥∥TrA2C2

[(ω

1−αα

A2⊗ τ

α−1α

C2

)|ψ2〉〈ψ2|A2B2C2

]∥∥∥∥α

2α−1

(6.2.78)

α− 1log2 sup

τC1

∥∥∥∥TrA1C1

[(ξ

1−αα

A1⊗ τ

α−1α

C1

)|ψ1〉〈ψ1|A1B1C1

]∥∥∥∥α

2α−1

α− 1log2 sup

τC2

∥∥∥∥TrA2C2

[(ω

1−αα

A2⊗ τ

α−1α

C2

)|ψ2〉〈ψ2|A2B2C2

]∥∥∥∥α

2α−1

(6.2.79)

= Iα(A1; B1)ξ + Iα(A2; B2)ω. (6.2.80)

We have thus shown (6.2.69), as required.

Theorem 6.22 Additivity of Sandwiched Rényi Mutual Informationof a Channel

For every two channels N1 and N2, and for all α > 1,

Iα(N1 ⊗N2) = Iα(N1) + Iα(N2). (6.2.81)

PROOF: Recall thatIα(N) = sup

ψRA

Iα(R; B)ω, (6.2.82)

where ωRB := NA→B(ψRA), and the supremum is taken over every pure stateψRA, with R having the same dimension as A. The superadditivity of thesandwiched Rényi mutual information of a channel, namely,

Iα(N1 ⊗N2) ≥ Iα(N1) + Iα(N2) (6.2.83)

follows immediately by restricting the optimization in the definition (6.2.82)to product states and using the additivity of the sandwiched Rényi mutual in-formation of bipartite states, as proven in Proposition 6.21. An explicit proofof this statement goes along the same lines as (6.2.48)–(6.2.52), with I replacedby Iα.

To prove the reverse inequality, namely, the subadditivity of the sand-wiched Rényi mutual information of a channel, we use the expression in(6.2.67). By restricting the infimum in (6.2.67) to product states, we find that

Iα(N1 ⊗N2)

580

Page 592: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

α− 1inf

σB1B2

log2

∥∥∥S(α)σB1B2 (N1 ⊗N2)

∥∥∥CB, 1→α

(6.2.84)

≤ α

α− 1inf

σ1B1⊗σ2

B2

log2

∥∥∥∥S(α)

σ1B1⊗σ2

B2

(N1 ⊗N2)

∥∥∥∥CB, 1→α

(6.2.85)

α− 1inf

σ1B1

,σ2B2

log2

∥∥∥∥(S(α)

σ1B1

N1

)⊗(S(α)

σ2B2

N2

)∥∥∥∥CB, 1→α

, (6.2.86)

where to obtain the last equality we have used the fact that

S(α)

σ1B1⊗σ2

B2

=

(S(α)

σ1B1

⊗ idB2

)(

idB1 ⊗ S(α)

σ2B2

). (6.2.87)

Now, consider that the norm ‖·‖CB, 1→α is multiplicative with respect to tensorproducts of completely positive maps, i.e.,

‖M1 ⊗M2‖CB, 1→α = ‖M1‖CB, 1→α ‖M2‖CB, 1→α (6.2.88)

for every two completely positive maps M1,M2 and all α > 1 (see Appendix 6.Ffor a proof). Using this, we find that

Iα(N1 ⊗N2)

≤ α

α− 1inf

σ1B1

,σ2B2

log2

∥∥∥∥(S(α)

σ1B1

N1

)⊗(S(α)

σ2B2

N2

)∥∥∥∥CB, 1→α

(6.2.89)

α− 1infσ1

B1

log2

∥∥∥∥S(α)

σ1B1

N1

∥∥∥∥CB, 1→α

α− 1infσ2

B2

log2

∥∥∥∥S(α)

σ2B2

N2

∥∥∥∥CB, 1→α

(6.2.90)

= Iα(N1) + Iα(N2). (6.2.91)

We have thus shown that Iα(N1 ⊗N2) ≤ Iα(N1) + Iα(N2), and by combiningthis with (6.2.83), we conclude (6.2.81).

Note that the additivity of the mutual information of a channel, i.e., The-orem 6.19, follows straightforwardly from the theorem above by taking thelimit α→ 1 (see Appendix 6.B for a proof).

Using the additivity of the sandwiched Rényi mutual information of achannel (Theorem 6.21), the inequality in (6.2.38) can be written as

1n

log2 |M| ≤ Iα(N) +α

n(α− 1)log2

(1

1− ε

)(6.2.92)

581

Page 593: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

for all α > 1. This implies that

Cn,εEA(N) ≤ Iα(N) +

α

n(α− 1)log2

(1

1− ε

)(6.2.93)

for all n ≥ 1, ε ∈ (0, 1), and α > 1.

6.2.3 Proof of the Strong Converse

With the inequality in (6.2.93) in hand, we can now prove that the mutualinformation I(N) is a strong converse rate for entanglement-assisted classicalcommunication over N and establish that CEA(N) = I(N).

Proof of the Strong Converse Part of Theorem 6.16

Fix ε ∈ [0, 1) and δ > 0. Let δ1, δ2 > 0 be such that

δ > δ1 + δ2 =: δ′. (6.2.94)

Set α ∈ (1, ∞) such thatδ1 ≥ Iα(N)− I(N), (6.2.95)

which is possible since Iα(N) is monotonically increasing with α (followingfrom Proposition 4.29), and since limα→1+ Iα(N) = I(N) (see Appendix 6.Bfor a proof). With this value of α, take n large enough so that

δ2 ≥α

n(α− 1)log2

(1

1− ε

). (6.2.96)

Now, with the values of n and ε as above, every (n, |M|, ε) entanglement-assisted classical communication protocol satisfies (6.2.92). Rearranging theright-hand side of this inequality, and using (6.2.94)–(6.2.96), we obtain

log2 |M|n

≤ I(N) + Iα(N)− I(N) +α

n(α− 1)log2

(1

1− ε

)(6.2.97)

≤ I(N) + δ1 + δ2 (6.2.98)= I(N) + δ′ (6.2.99)< I(N) + δ. (6.2.100)

582

Page 594: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

So we have that I(N) + δ >log2 |M|

n for all (n, |M|, ε) entanglement-assistedclassical communication protocols and sufficiently large n. Due to this strictinequality, it follows that there cannot exist an (n, 2n(I(N)+δ), ε) entanglement-assisted classical communication protocol for all sufficiently large n such that(6.2.96) holds, for if it did there would exist a set M such that |M| = 2n(I(N)+δ),which we have just seen is not possible. Since ε and δ are arbitrary, we con-clude that for all ε ∈ [0, 1), δ > 0, and sufficiently large n, there does not existan (n, 2n(I(N)+δ), ε) entanglement-assisted classical communication protocol.This means that I(N) is a strong converse rate, and thus that CEA(N) ≤ I(N).See Appendix 6.G for a different way of understanding the strong converse.

6.2.4 Proof of the Weak Converse

We now conclude Section 6.2 by providing an independent proof of the factthat the mutual information I(N) of a channel N is a weak converse rate.3

Theorem 6.23 Weak Converse for Entanglement-Assisted ClassicalCommunication

For every quantum channel N, the mutual information I(N) is aweak converse rate for entanglement-assisted classical communicationover N.

PROOF: Suppose that R is an achievable rate. Then, by definition, for all ε ∈(0, 1], δ > 0, and sufficiently large n, there exists an (n, 2n(R−δ), ε) entanglement-assisted classical communication protocol over N. For all such protocols, theinequality (6.2.61) holds by Corollary 6.18 and the additivity of the mutualinformation of a channel, i.e.,

R− δ ≤ 11− ε

(I(N) +

1n

h2(ε)

). (6.2.101)

Since this bound holds for all sufficiently large n, it holds in the limit n→ ∞,so that

R ≤ 11− ε

I(N) + δ. (6.2.102)

3Recall that any strong converse rate is also a weak converse rate, so that by the proof ofthe strong converse part of Theorem 6.16 we can immediately conclude that I(N) is a weakconverse rate.

583

Page 595: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Then, since this inequality holds for all ε ∈ (0, 1] and δ > 0, we obtain

R ≤ limε,δ→0

1

1− εI(N) + δ

= I(N). (6.2.103)

We have thus shown that if R is an achievable rate, then R ≤ I(N). Thecontrapositive of this statement is that if R > I(N), then R is not an achievablerate. By definition, therefore, I(N) is a weak converse rate.

6.3 Examples

In this section, we determine the entanglement-assisted classical capacity ofsome of the channels that we introduced in Chapter 3. Recall that Theo-rem 6.16 states that the entanglement-assisted classical capacity CEA(N) isgiven by the mutual information of the channel N, i.e.,

CEA(N) = I(N) = supψRA

I(R; B)ω, (6.3.1)

where ωRB = NA→B(ψRA) and the optimization is over every pure state ψRA,with the dimension of R the same as the dimension of the input system A ofthe channel. The mutual information I(R; B)ω can be calculated using eitherthe quantum relative entropy or the quantum entropy via

I(R; B)ω = D(NA→B(ψRA)‖ψR ⊗NA→B(ψA)) (6.3.2)= H(R)ω + H(B)ω − H(AB)ω. (6.3.3)

In what follows, we consider the entanglement-assisted classical capacityof channels that are covariant with respect to a group G, and then we pro-vide an explicit expression for the entanglement-assisted classical capacity ofthe depolarizing, erasure, and generalized amplitude damping channels (seeSection 3.2.5). See Figure 6.6 for a plot of these capacities.

6.3.1 Covariant Channels

Let us start with covariant channels. Recall from Definition 3.48 that a chan-nel N is covariant with respect to a group G if there exist projective unitaryrepresentations Ug

Ag∈G and VgBg∈G such that

N(UgAρ(Ug

A)†) = Vg

BN(ρ)(VgB )

† (6.3.4)

584

Page 596: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

0.0 0.2 0.4 0.6 0.8 1.0p

0.00

0.25

0.50

0.75

1.00

1.25

1.50

1.75

2.00

CE

(bit

s)

Dp

Ep

Ap

FIGURE 6.6: [CHANGE Y-AXIS LABEL] The entanglement-assisted clas-sical capacity CEA of the depolarizing channel Dp (expressed in (6.3.19)),the erasure channel Ep (expressed in (6.3.35)), and the amplitude dampingchannel Ap (expressed in (6.3.48)), all of which are defined for the param-eter p ∈ [0, 1].

for every state ρ and all g ∈ G.

Suppose that a channel N is covariant with respect to a group G such thatthe representation Ug

Ag∈G of G acting on the input space of N satisfies

1|G| ∑

g∈GUg

AρA(UgA)

† = Tr[ρA]1

d, (6.3.5)

for all ρA, where d is the dimension of the input space of the channel N. Suchchannels are called irreducibly covariant.

Let us now recall Proposition 4.83, which tells us that the generalized mu-tual information for every covariant channel is given as follows:

I(N) = supφRA

I(R; B)ω : φA = TG(φA), (6.3.6)

where ωRB = NA→B(φRA) and TG(·) := 1|G| ∑g∈G Ug(·)Ug†. In other words,

for covariant channels, it suffices to optimize over pure states φRA for whichthe reduced state φA is invariant under the channel TG. For irreducibly co-variant channels, the expression in (6.3.6) simplifies to

I(N) = I(A; B)ρN , (6.3.7)

585

Page 597: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

where ρNAB = NA′→B(ΦAA′) is the Choi state of N.

Using (6.3.6), we immediately obtain the following theorem.

Theorem 6.24 Entanglement-Assisted Classical Capacity of Covari-ant Channels

If a channel NA→B is covariant with respect to a group G as in (6.3.4),then its entanglement-assisted classical capacity is given by

CEA(N) = supφRA

I(R; B)ω : φA = TG(φA), (6.3.8)

where ωRB = NA→B(φRA). If the representation UgAg∈G is irreducible,

then the entanglement-assisted classical capacity of N is given by themutual information of its Choi state ρNAB, i.e.,

CEA(N) = I(A; B)ρN . (6.3.9)

6.3.1.1 Depolarizing Channel

In Section 3.2.5, specifically in (3.2.112), we defined the depolarizing channelon qubits as

Dp(ρ) := (1− p)ρ +p3(XρX + YρY + ZρZ) (6.3.10)

From this, it follows that the channel is covariant with respect to the Paulioperators, meaning that

Dp(XρX) = XDp(ρ)X, (6.3.11)Dp(YρY) = YDp(ρ)Y, (6.3.12)Dp(ZρZ) = ZDp(ρ)Z. (6.3.13)

Furthermore, we have the identity in (3.2.113), which asserts that for everystate ρ,

14

ρ +14

XρX +14

YρY +14

ZρZ =1

2. (6.3.14)

586

Page 598: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

This means that the operators 1, X, Y, Z satisfy the property in (6.3.5).4 ByTheorem 6.24, we thus have that the entanglement-assisted classical capacityof the depolarizing channel is simply the mutual information of its Choi state.

It is straightforward to see that the Choi state ρDpAB of the depolarizing

channel is

ρDpAB = (1− p)|Φ+〉〈Φ+|AB

+p3(|Ψ+〉〈Ψ+|AB + |Ψ−〉〈Ψ−|AB + |Φ−〉〈Φ−|AB

). (6.3.15)

Since H(A)ρDp = H(B)

ρDp = log2(2) = 1 and

H(AB)ρDp = −(1− p) log2(1− p)− p log2

( p3

), (6.3.16)

we find that

CEA(Dp) = I(A; B)ρDp = H(A)

ρDp + H(B)ρDp − H(AB)

ρDp (6.3.17)

= 2 + (1− p) log2(1− p) + p log2

( p3

)(6.3.18)

= 2− h2(p)− p log2(3) (6.3.19)

for all p ∈ [0, 1]. See Figure 6.6 above for a plot of the capacity.

Let us also briefly analyze the lower and upper bounds obtained in Corol-laries 6.17 and 6.18, respectively. Specifically, let us consider the followingbounds on the maximal error probability that results from these bounds, i.e.,

ε ≤ 2 · 2−n(1−α)(Iα(Dp)−R− 3n), (6.3.20)

ε ≥ 1− 2−n( α−1α )(R− Iα(Dp)), (6.3.21)

which are rearrangements of (6.2.22) and (6.2.38) and are discussed further inAppendices 6.C and 6.G, respectively. Now, by (6.3.7), we have

Iα(Dp) = Iα(R; B)ρDp ≤ Iα(R; B)

ρDp , (6.3.22)

whereIα(A; B)ρ := Dα(ρAB‖ρA ⊗ ρB). (6.3.23)

4The operators 1, X, Y, Z form a projective unitary representation of the group Z2 ×Z2 = (0, 0), (0, 1), (1, 0), (1, 1), where Z2 is the group consisting of the set 0, 1 with ad-dition modulo two. Specficially, we have U(0,0) = 1, U(0,1) = X, U(1,0) = Z, and U(1,1) = Y.

587

Page 599: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

0.0 0.2 0.4 0.6 0.8 1.0Rate, R

0.0

0.2

0.4

0.6

0.8

1.0

Erro

r,ε

n = 106

n = 2 · 105

n = 105

0.0 0.2 0.4 0.6 0.8 1.0Rate, R

0.0

0.2

0.4

0.6

0.8

1.0

Erro

r,ε

n = 107

FIGURE 6.7: Plot of the error bounds in (6.3.24) and (6.3.25) for the depo-larizing channel Dp with p = 0.4. By increasing the number n of channeluses, it is possible to communicate at rates closer to the capacity (indicatedby the black vertical line) with vanishing error probability. Furthermore,for every rate above the capacity, as n increases, the error probability ap-proaches one at an exponential rate, consistent with the fact that the mu-tual information I(Dp) is a strong converse rate.

For simplicity, let us use the quantity in (6.3.22), which does not involve anoptimization over states σB, to place a lower bound on ε, so that

ε ≥ 1− 2−n( α−1α )

(R− Iα(R;B)ω

), (6.3.24)

where ωRB = ρDpRB is the Choi state of Dp. Similarly, for simplicity, let us take

the quantity Iα(N), which by definition involves an optimization over all purestates ψRA, and let ψRA be the maximally entangled state Φ+

RA. So we take theupper bound on the error probability to be

ε ≤ 2 · 2−n(1−α)(Iα(R;B)ω−R− 3n), (6.3.25)

where ωRB = ρDpRB is the Choi state of Dp. Then, for p = 0.4, we plot in Fig-

ure 6.7 the bounds in (6.3.24) and (6.3.25) (with α = 1.0001 and α = 0.9999,respectively) to obtain plots that are analogous to the generic plot in Fig-ure 6.8 in Appendix 6.G. As portrayed in Figure 6.8, we indeed see that, as thenumber n of channel uses increases, the capacity CEA(Dp) becomes a clearerdividing point between reliable communication—with nearly-vanishing er-ror probability—and unreliable communication—with error probability ap-proaching one at an exponential rate.

588

Page 600: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Let us consider the qudit depolarizing channel, as defined in (3.2.138)in terms of the Heisenberg–Weyl operators Wz,xz,x from (3.2.115). Fromthe definition in (3.2.138) and the properties stated in (3.2.118)–(3.2.120), itfollows that the qudit depolarizing channel is covariant with respect to theHeisenberg–Weyl operators. Furthermore, the Heisenberg–Weyl operatorsform an irreducible projective unitary representation of the group Zd ×Zd,and thus satisfy

1d2

d−1

∑z,x=0

Wz,xρW†z,x = Tr[ρ]

1

d(6.3.26)

for all ρ. Therefore, by Theorem 6.24, we have that

CEA(D(p)p ) = I(A; B)

ρD(d)p

. (6.3.27)

By calculations analogous to those above, we obtain

CEA(D(d)p ) = 2 log2 d− h2(p)− p log2(d

2 − 1). (6.3.28)

6.3.1.2 Erasure Channel

In Section 3.2.5, specifically in (3.2.99), we defined the qubit erasure channelas

Ep(ρ) := (1− p)ρ + pTr[ρ]|e〉〈e| (6.3.29)

for p ∈ [0, 1], where |e〉 is a state that is orthogonal to all states in the inputqubit system. Recall that if we let the output space simply be a qutrit systemwith the orthonormal basis |0〉, |1〉, |2〉, then the input qubit space can benaturally embedded into the subspace of the qutrit system spanned by |0〉and |1〉, so that we can let the erasure state simply be |2〉. This means that wecan write the action of the erasure channel as

Ep(ρ) = (1− p)ρ + p|2〉〈2|. (6.3.30)

Observe that, like the depolarizing channel, the erasure channel is covari-ant with respect to the group Z2 ×Z2, with the representation 1, X, Y, Zon the input qubit space and the representation 1 ⊕ |2〉〈2|, X ⊕ |2〉〈2|, Y ⊕|2〉〈2|, Z ⊕ |2〉〈2| on the output space. Then, by Theorem 6.24, the entangle-ment-assisted classical capacity of the erasure channel is equal to the mutualinformation of its Choi state.

589

Page 601: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

The Choi state ρEpAB of the erasure channel is

ρEpAB = (1− p)|Φ+〉〈Φ+|AB + p

1A2⊗ |2〉〈2|. (6.3.31)

It is straightforward to verify that

H(A)ρEp = 1, (6.3.32)

H(B)ρEp = −(1− p) log2

(1− p

2

)− p log2 p, (6.3.33)

H(AB)ρEp = −(1− p) log2(1− p)− p log2

( p2

), (6.3.34)

so thatCEA(Ep) = I(A; B)

ρEp = 2(1− p). (6.3.35)

In general, as discussed in Section 3.2.5.2, we can consider an erasurechannel E

(d)p acting on a qudit system with dimension d and orthonormal

basis |1〉, . . . , |d〉, so that the output of the erasure channel is a state on a(d + 1)-dimensional system, i.e.,

E(d)p (ρ) = (1− p)ρ + pTr[ρ]|d + 1〉〈d + 1|. (6.3.36)

Then, it is straightforward to see that the qudit erasure channel is irreduciblycovariant with respect to the group Zd ×Zd, with the corresponding repre-sentation on the input space being Wz,x : 0 ≤ z, x ≤ d− 1 and the repre-sentation on the output space being Wz,x ⊕ |d + 1〉〈d + 1| : 0 ≤ z, x ≤ d− 1.Therefore, by reasoning analogous to the above, we obtain

CEA(E(d)p ) = 2(1− p) log2 d. (6.3.37)

6.3.2 Generalized Amplitude Damping Channel

In (3.2.96), we defined the generalized amplitide damping channel Aγ,N asthe channel with the four Kraus operators in (3.2.97) and (3.2.98), i.e.,

Aγ,N(ρ) = A1ρA†1 + A2ρA†

2 + A3ρA†3 + A4ρA†

4, (6.3.38)

where

A1 =√

1− N(

1 00√

1− γ

), A2 =

√1− N

(0√

γ0 0

), (6.3.39)

590

Page 602: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

A3 =√

N(√

1− γ 00 1

), A4 =

√N(

0 0√γ 0

). (6.3.40)

Now, it is straightforward to verify that ZA1 = A1Z, ZA2 = −A2Z,ZA3 = A3Z, and ZA4 = −ZA4. Therefore,

Aγ,N(ZρZ) = ZAγ,N(ρ)Z (6.3.41)

for every state ρ. The generalized amplitude damping channel is thus covari-ant with respect to 1, Z, which can be viewed as a representation of thegroup G = Z2 on both the input and output spaces of the channel. Note thatthis representation does not satisfy the property in (6.3.5).

Nevertheless, we can use the expression in (6.3.8) to determine the entan-glement-assisted classical capacity of the generalized amplitude damping ch-annel. First, observe that the channel ρ 7→ 1

|G| ∑g∈G UgρU†g is same as the

completely dephasing channel defined in (3.2.109). Since the output of thecompletely dephasing channel is always diagonal in the basis |0〉, |1〉, by(6.3.8) it suffices to optimize over pure states ψRA such that the reduced stateψA is diagonal in the basis |0〉, |1〉. Thus, up to an (irrelevant) unitary onthe system R, the pure states ψRA in (6.3.8) can be taken to have the form

|ψ〉RA =√

1− p|0, 0〉RA +√

p|1, 1〉RA (6.3.42)

for p ∈ [0, 1]. For every such state, it is straightforward to show that thecorresponding output state ωRB = (Aγ,N)A→B(ψRA) has four eigenvalues:(1− p) γN, (1− N) pγ, and

λ± :=12(1− γ(N + p− 2Np)± f (N, p, γ)) , (6.3.43)

f (N, p, γ) :=√

γ2(N − p)2 − 2γ(N + p− 2Np) + 1, (6.3.44)

which means that

H(RB)ρAγ,N = −(1− p)γN log2((1− p)γN)

− (1− N)pγ log2((1− N)pγ)

− λ+ log2 λ+ − λ− log2 λ−. (6.3.45)

Since ωR = ψR = (1− p)|0〉〈0| + p|1〉〈1|, we have that H(R)ρAγ,N = h2(p).

Finally, we have

H(B)ρAγ,N = h2(γ(p− N) + 1− p)). (6.3.46)

591

Page 603: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Therefore,

I(Aγ,N) = maxp∈[0,1]

h2(p) + h2(γ(p− N) + 1− p))

+ (1− p)γN log2((1− p)γN)

+ (1− N)pγ log2((1− N)pγ)

+ λ+ log2 λ+ + λ− log2 λ−. (6.3.47)

In the case N = 0, the channel Aγ,0 is the amplitude damping channel Aγ

(see (3.2.87)). Using (6.3.47), we find that

I(Aγ) = maxp∈[0,1]

h2(p(1− γ)) + h2(p)− h2(γp) (6.3.48)

for all γ ∈ [0, 1].

6.4 Summary

In this chapter, we formally defined and studied entanglement-assisted clas-sical communication. We began with the fundamental one-shot setting, inwhich a quantum channel is used just once for entanglement-assisted classicalcommunication, and we defined the one-shot entanglement-assisted classicalcapacity in (6.1.43). We then derived upper and lower bounds on the one-shot capacity (Propositions 6.5 and 6.8), making a fundamental link betweencommunication and hypothesis testing for both bounds. To derive the up-per bound, the main conceptual point was to compare an actual protocol forentanglement-assisted communication with a useless one. This approach ledto the hypothesis testing mutual information as an upper bound. To derivethe lower bound, we employed the combined approach of sequential decod-ing and position-based coding, which at its core, is about how well a corre-lated state can be distinguished from a product state. Stepping back a bit, thisis conceptually similar to the idea behind the converse upper bound, whichultimately features a comparison between a correlated state and a productstate. We can consider the one-shot setting to contain the fundamental infor-mation theoretic argument for entanglement-assisted communication.

With the one-shot setting in hand, we moved on the asymptotic setting,in which the channel is allowed to be used multiple times (as a model of how

592

Page 604: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

communication channels would be used in practice). We defined various no-tions of communication rates, including achievable rates, capacity, weak con-verse rates, strong converse rates, and strong converse capacity. With thefundamental one-shot bounds in place, we then substituted one use of thechannel N with n uses (the tensor-product channel N⊗n) and applied vari-ous technical arguments to prove that the mutual information of a channelis equal to both its capacity and strong converse capacity for entanglement-assisted communication. As a main step to establish the capacity, we provedthat the mutual information of a channel is additive, and as a main step toestablish the strong converse capacity, we proved that the sandwiched Rényimutual information of a channel is additive.

Finally, we calculated the entanglement-assisted classical for several keychannels, including the depolarizing, erasure, and generalized amplitude damp-ing channels, in order to illustrate the theory on some concrete examples.

As it turns out, the strongest results known in quantum information the-ory are for the entanglement-assisted capacity. The results stated above holdfor all quantum channels, and thus can be viewed from the physics perspec-tive as universal physical laws delineating the ultimate limits of entanglement-assisted classical communication for any physical process (i.e., described bya quantum channel). In this sense, shared entanglement simplifies quantuminformation theory immensely.

Going forward from here, the same concepts such as capacity, achievablerate, etc. can be defined for different communication tasks (i.e., unassistedclassical communication, quantum communication, private communication,etc.). What changes is that the known results are not as strong as they arefor the entanglement-assisted setting. We know the capacity of these othercommunication tasks only for certain subclasses of channels. This might beconsidered unfortunate, but a different perspective is that it is exciting, be-cause rather exotic phenomena such as superadditivity and superactivationcan occur.

6.5 Bibliographic Notes

Entanglement-assisted classical communication was devised by Bennett et al.(1999b), as an information-theoretic extension of super-dense coding. Theentanglement-assisted classical capacity theorem was proven by Bennett et al.

593

Page 605: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

(2002), and Holevo (2002a) gave a different proof for this theorem.

Entanglement-assisted classical communication in the one-shot setting wasconsidered by Datta and Hsieh (2013); Matthews and Wehner (2014); Dattaet al. (2016); Anshu et al. (2019); Qi et al. (2018); Anshu et al. (2019). Proposi-tion 6.5 is due to Matthews and Wehner (2014), and the proof given here wasfound independently by Qi et al. (2018) and Anshu et al. (2019).

The position-based coding method was introduced by Anshu et al. (2019).It can be understood as a quantum generalization of pulse position modu-lation (Verdu, 1990; Cariolaro and Erseghe, 2003). Sequential decoding wasconsidered by Giovannetti et al. (2012); Sen (2011); Wilde (2013); Gao (2015);Oskouei et al. (2019), and Theorem 6.7 is due to Oskouei et al. (2019). Propo-sition 6.8 is due to Qi et al. (2018), and the proof given here was found byOskouei et al. (2019).

The strong converse for entanglement-assisted classical capacity was es-tablished by Bennett et al. (2014) and Gupta and Wilde (2015), with the latterpaper employing the Rényi entropic method used in this chapter. Eq. (6.2.66)and the additivity of sandwiched Rényi mutual information of bipartite states(Proposition 6.21) were established by Beigi (2013). Eq. (6.2.67) was estab-lished by Gupta and Wilde (2015), and the completely-bound 1 → α normwas studied in depth by Devetak et al. (2006). Theorem 6.22 was provenby Gupta and Wilde (2015), by employing the multiplicativity of completelybounded norms (Eq. (6.2.88)) found by Devetak et al. (2006).

The entanglement-assisted classical capacity of the depolarizing and era-sure channels was evaluated by Bennett et al. (1999b), the same capacity forthe amplitude damping channel was evaluated by Giovannetti and Fazio(2005), and the same capacity for the generalized amplitude damping channelwas evaluated by Li-Zhen and Mao-Fa (2007a).

The proofs in Appendix 6.B are due to Cooney et al. (2016), and the proofsin Appendices 6.E and 6.F are due to Jencova (2006) (with the proofs in thisbook containing some slight variations). The Lieb concavity theorem (Theo-rem 6.30) is due to Lieb (1973).

Appendix 6.A Proof of Theorem 6.7

Theorem 6.7 is a consequence of the following more general result:

594

Page 606: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Theorem 6.25

Let PiNi=1 be a finite set of projectors. For every vector |ψ〉 and c > 0,

‖|ψ〉‖22 − ‖PNPN−1 · · · P1|ψ〉‖2

2 ≤ (1 + c) ‖(1− PN)|ψ〉‖22

+ (2 + c + c−1)N−1

∑i=2‖(1− Pi)|ψ〉‖2

2

+ (2 + c−1) ‖(1− P1)|ψ〉‖22 .

(6.A.1)

Indeed, recall from Theorem 2.4 that every density operator ρ has a spec-tral decomposition of the following form:

ρ = ∑j∈J

p(j)|ψj〉〈ψj|, (6.A.2)

where p : J→ [0, 1] is a probability distribution, and |ψj〉j∈J is an orthonor-mal set of eigenvectors. Applying Theorem 6.25, we find that

1− Tr[PNPN−1 · · · P1|ψj〉〈ψj|P1 · · · PN−1]

=∥∥|ψj〉

∥∥22 −

∥∥PNPN−1 · · · P1|ψj〉∥∥2

2 (6.A.3)

≤ (1 + c)∥∥(1− PN) |ψj〉

∥∥22 + (2 + c + c−1)

N−1

∑i=2

∥∥(1− Pi) |ψj〉∥∥2

2

+ (2 + c−1)∥∥(1− P1) |ψj〉

∥∥22 (6.A.4)

= (1 + c)Tr[(1− PN) |ψj〉〈ψj|] + (2 + c + c−1)N−1

∑i=2

Tr[(1− Pi)|ψj〉〈ψj|]

+ (2 + c−1)Tr[(1− P1) |ψj〉〈ψj|]. (6.A.5)

The reduction from Theorem 6.25 to Theorem 6.7 follows by averaging theabove inequality over the probability distribution p : J → [0, 1] and from thefact that the right-hand side of the resulting inequality can be bounded fromabove by

(1 + c)Tr[(1− PN)ρ] + (2 + c + c−1)N−1

∑i=1

Tr[(1− Pi)ρ], (6.A.6)

595

Page 607: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

so that

1− Tr[PNPN−1 · · · P1ρP1 · · · PN−1PN]

≤ (1 + c)Tr[(1− PN)ρ] + (2 + c + c−1)N−1

∑i=1

Tr[(1− Pi)ρ].(6.A.7)

We therefore shift our focus to proving Theorem 6.25, and we do so with theaid of several lemmas. To simplify the notation, we hereafter employ thefollowing shorthands:

‖· · · ‖ψ ≡ ‖· · · |ψ〉‖2 , (6.A.8)

〈· · · 〉ψ ≡ 〈ψ| · · · |ψ〉, (6.A.9)

Pi ≡ 1− Pi, (6.A.10)

where for a given operator A the expression 〈A〉ψ = 〈ψ|A|ψ〉 is assumedto mean 〈ψ|(A|ψ〉). Furthermore, we also assume without loss of generalitythat the vector |ψ〉 in Theorem 6.25 is a unit vector. This assumption can bedropped by scaling the resulting inequality by an arbitrary positive numbercorresponding to the norm of |ψ〉.

First recall that, due to the fact that P2 = P for every projector P, we havethe following identities holding for all i ∈ 1, 2, . . . , N:

〈PiPi−1 · · · P1〉ψ = 〈PiPiPi−1 · · · P1〉ψ,〈P1 · · · Pi〉ψ = 〈P1 · · · PiPi〉ψ,

(6.A.11)

under the convention that Pi−1 · · · P1 = P1 · · · Pi−1 = 1 for i = 1.

Lemma 6.26

For a set PiNi=1, a unit vector |ψ〉, and employing the shorthand in

(6.A.8)–(6.A.10), we have the following identities and inequality:

N

∑i=1〈PiPi−1 · · · P1〉ψ = 1− 〈PN · · · P1〉ψ, (6.A.12)

N

∑i=1〈P1 · · · Pi−1Pi〉ψ = 1− 〈P1 · · · PN〉ψ, (6.A.13)

N

∑i=1〈P1 · · · Pi−1PiPi−1 · · · P1〉ψ = 1− 〈P1 · · · PN · · · P1〉ψ, (6.A.14)

596

Page 608: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

1−√〈PN〉ψ

√〈P1 · · · PN · · · P1〉ψ

≤N

∑i=1

√〈Pi〉ψ

√〈P1 · · · Pi−1PiPi−1 · · · P1〉ψ,

(6.A.15)

under the convention that Pi−1 · · · P1 = P1 · · · Pi−1 = 1 for i = 1.

PROOF: The following identities are straightforward to verify:

1 = 〈P1〉ψ + 〈P2P1〉ψ + · · ·+ 〈PN−1PN−2 · · · P1〉ψ + 〈PNPN−1 · · · P1〉ψ+ 〈PNPN−1 · · · P1〉ψ, (6.A.16)

1 = 〈P1〉ψ + 〈P1P2〉ψ + · · ·+ 〈P1 · · · PN−2PN−1〉ψ + 〈P1 · · · PN−1PN〉ψ+ 〈P1 · · · PN−1PN〉ψ, (6.A.17)

1 = 〈P1〉ψ + 〈P1P2P1〉ψ + · · ·+ 〈P1 · · · PN−2PN−1PN−2 · · · P1〉ψ+ 〈P1 · · · PN−1PNPN−1 · · · P1〉ψ + 〈P1 · · · PN−1PNPN−1 · · · P1〉ψ.

(6.A.18)

Consequently, from the equalities in (6.A.16), (6.A.17), and (6.A.18), we obtain(6.A.12), (6.A.13), and (6.A.14), respectively. The following equality is a directconsequence of (6.A.16) and (6.A.11):

1 = 〈P1〉ψ + 〈P2P2P1〉ψ + · · ·+ 〈PN−1PN−1PN−2 · · · P1〉ψ+ 〈PN PNPN−1 · · · P1〉ψ + 〈PNPNPN−1 · · · P1〉ψ. (6.A.19)

By applying the Cauchy–Schwarz inequality from (2.2.23) to (6.A.19), we findthat

1 ≤ 〈P1〉ψ +√〈P2〉ψ

√〈P1P2P1〉ψ

+ · · ·+√〈PN〉ψ

√〈P1 · · · PN−1PNPN−1 · · · P1〉ψ

+√〈PN〉ψ

√〈P1 · · · PN−1PNPN−1 · · · P1〉ψ, (6.A.20)

from which (6.A.15) immediately follows.

597

Page 609: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Lemma 6.27

For a set PiNi=1 of projectors, a unit vector |ψ〉, and employing the

shorthand in (6.A.8)–(6.A.10), the following inequality holds for N ≥ 2:

N

∑i=1‖Pi(1− Pi−1 · · · P1)‖2

ψ ≤N−1

∑i=1‖Pi‖2

ψ, (6.A.21)

under the convention that Pi−1 · · · P1 = P1 · · · Pi−1 = 1 for i = 1. Equiv-alently,

N

∑i=2‖Pi(1− Pi−1 · · · P1)‖2

ψ ≤N−1

∑i=1‖Pi‖2

ψ, (6.A.22)

due to the aforementioned convention.

PROOF: Consider the following chain of equalities:

N

∑i=1‖Pi(1− Pi−1 · · · P1)‖2

ψ

=N

∑i=1‖Pi − PiPi−1 · · · P1‖2

ψ (6.A.23)

=N

∑i=1

(‖Pi‖2

ψ − 〈PiPi−1 · · · P1〉ψ − 〈P1 · · · Pi−1Pi〉ψ

+〈P1 · · · Pi−1PiPi−1 · · · P1〉ψ)

(6.A.24)

=

(N

∑i=1‖Pi‖2

ψ

)− 1 + 〈PN · · · P1〉ψ − 1 + 〈P1 · · · PN〉ψ

+ 1− 〈P1 · · · PN · · · P1〉ψ (6.A.25)

=

(N

∑i=1‖Pi‖2

ψ

)− 1 + 〈PNPNPN−1 · · · P1〉ψ + 〈P1 · · · PN−1PNPN〉ψ

− 〈P1 · · · PN · · · P1〉ψ. (6.A.26)

To obtain (6.A.24), we used the identities in (6.A.11). Next, to get (6.A.25), theidentities in (6.A.12), (6.A.13), and (6.A.14) of Lemma 6.26 were used. Con-

598

Page 610: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

tinuing, we have that

Eq. (6.A.26) ≤(

N

∑i=1‖Pi‖2

ψ

)− 1− 〈P1 · · · PN · · · P1〉ψ

+ 2√〈PN〉ψ

√〈P1 · · · PN · · · P1〉ψ (6.A.27)

=

(N

∑i=1‖Pi‖2

ψ

)− 1 + 〈PN〉ψ

−(√〈PN〉ψ −

√〈P1 · · · PN · · · P1〉ψ

)2

(6.A.28)

≤(

N

∑i=1‖Pi‖2

ψ

)− ‖PN‖2

ψ =N−1

∑i=1‖Pi‖2

ψ. (6.A.29)

To obtain (6.A.27), the Cauchy-Schwarz inequality was employed.

We are now in a position to prove Theorem 6.25.

Proof of Theorem 6.25

Consider that

1− ‖PN · · · P1‖2ψ = 1− 〈P1 · · · PN · · · P1〉ψ

+ 2(

1−√〈PN〉ψ

√〈P1 · · · PN · · · P1〉ψ

)

− 2(

1−√〈PN〉ψ

√〈P1 · · · PN · · · P1〉ψ

)(6.A.30)

= 2(

1−√〈PN〉ψ

√〈P1 · · · PN · · · P1〉ψ

)

−(√〈PN〉ψ −

√〈P1 · · · PN · · · P1〉ψ

)2

− 1 + 〈PN〉ψ. (6.A.31)

Continuing, we have that

Eq. (6.A.31)

≤ −‖PN‖2ψ + 2

(1−

√PN

√〈P1 · · · PN · · · P1〉ψ

)(6.A.32)

≤ −‖PN‖2ψ + 2

N

∑i=1

√〈Pi〉ψ

√〈P1 · · · Pi−1PiPi−1 · · · P1〉ψ (6.A.33)

599

Page 611: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

≤ −‖PN‖2ψ + 2

N

∑i=1

√〈Pi〉ψ

(‖Pi‖ψ + ‖Pi(1− Pi−1 · · · P1)‖ψ

). (6.A.34)

First, (6.A.32) is obtained by observing that

−(√〈PN〉ψ −

√〈P1 · · · PN · · · P1〉ψ

)2− 1 + 〈PN〉ψ ≤ −1 + 〈PN〉ψ

= −‖PN‖2ψ. (6.A.35)

Next, (6.A.33) follows from (6.A.15) of Lemma 6.26. Then, (6.A.34) is a conse-quence of the triangle inequality:

√〈P1 · · · Pi−1PiPi−1 · · · P1〉ψ =

√〈P1 · · · Pi−1PiPiPi−1 · · · P1〉ψ (6.A.36)

= ‖PiPi−1 · · · P1‖ψ (6.A.37)

= ‖Pi(−1 + 1− Pi−1 · · · P1)‖ψ (6.A.38)

= ‖−Pi + Pi(1− Pi−1 · · · P1)‖ψ (6.A.39)

≤ ‖Pi‖ψ + ‖Pi(1− Pi−1 · · · P1)‖ψ. (6.A.40)

Continuing, we have that

Eq. (6.A.34)

= −‖PN‖2ψ + 2

N

∑i=1‖Pi‖2

ψ + 2N

∑i=1

(‖Pi‖ψ‖Pi(1− Pi−1 · · · P1)‖ψ

)(6.A.41)

= −‖PN‖2ψ + 2

N

∑i=1‖Pi‖2

ψ + 2N

∑i=2

(‖Pi‖ψ‖Pi(1− Pi−1 · · · P1)‖ψ

)(6.A.42)

≤ −‖PN‖2ψ + 2

N

∑i=1‖Pi‖2

ψ

+N

∑i=2

(c‖Pi‖2

ψ + c−1‖Pi(1− Pi−1 · · · P1)‖2ψ

)(6.A.43)

≤ −‖PN‖2ψ + 2

N

∑i=1‖Pi‖2

ψ + cN

∑i=2‖Pi‖2

ψ + c−1N−1

∑i=1‖Pi‖2

ψ (6.A.44)

≤ (1 + c)‖PN‖2ψ + (2 + c−1)‖P1‖2

ψ + (2 + c + c−1)N−1

∑i=2‖Pi‖2

ψ. (6.A.45)

Eq. (6.A.42) follows from the convention that Pi−1 · · · P1 = 1 for i = 1. Eq.(6.A.43) is a consequence of the inequality 2xy ≤ cx2 + c−1y2, holding forx, y ∈ R and c > 0. Finally, (6.A.44) is obtained by using Lemma 6.27.

600

Page 612: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Appendix 6.B The α → 1 Limit of the Sandwic-hed Rényi Mutual Information of aChannel

In this appendix, we show that

limα→1−

Iα(N) = limα→1+

Iα(N) = I(N), (6.B.1)

where we recall that

Iα(N) = supψRA

Dα(NA→B(ψRA)‖ψR ⊗NA→B(ψA)), (6.B.2)

Iα(N) = supψRA

infσB

Dα(NA→B(ψRA)‖ψR ⊗ σB). (6.B.3)

In these definitions, ψRA is a pure state, with the dimension of R equal to thedimension of A, and in the definition of Iα(N) the infimum is over states σB.

All of the arguments presented here are similar to those in Appendix 5.B,which we refer to for additional details.

As a consequence of the fact that Iα(N) increases monotonically with α(see Proposition 4.21), as well as the fact that limα→1 Dα(ρ‖σ) = D(ρ‖σ) (seeProposition 4.20), we find that

limα→1−

Iα(N) = supα∈(0,1)

supψRA

Dα(NA→B(ψRA)‖ψR ⊗NA→B(ψA)) (6.B.4)

= supψRA

supα∈(0,1)

Dα(NA→B(ψRA)‖ψR ⊗NA→B(ψA)) (6.B.5)

= supψRA

D(NA→B(ψRA)‖ψR ⊗NA→B(ψA)) (6.B.6)

= I(N), (6.B.7)

as required.

Similarly, for the sandwiched Rényi mutual information, we use the factthat it increases monotonically with α (see Proposition 4.29), along with thefact that limα→1 Dα(ρ‖σ) = D(ρ‖σ) (see Proposition 4.28), to obtain

limα→1+

Iα(N) = infα∈(1,∞)

supψRA

infσB

Dα(NA→B(ψRA)‖ψR ⊗ σB) (6.B.8)

601

Page 613: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

= supψRA

infα∈(1,∞)

infσB

Dα(NA→B(ψRA)‖ψR ⊗ σB) (6.B.9)

= supψRA

infσB

infα∈(1,∞)

Dα(NA→B(ψRA)‖ψR ⊗ σB) (6.B.10)

= supψRA

infσB

D(NA→B(ψRA)‖ψR ⊗ σB) (6.B.11)

= supψRA

D(NA→B(ψRA)‖ψR ⊗NA→B(ψA)) (6.B.12)

= I(N). (6.B.13)

To obtain the second equality, we made use of the minimax theorem in The-orem 2.19 to exchange infα∈(1,∞) and supψRA

. Specifically, we applied thattheorem to the function

(α, ψRA) 7→ infσB

Dα(NA→B(ψRA)‖ψR ⊗ σB), (6.B.14)

which is monotonically increasing in the first argument and continuous in thesecond argument.

Appendix 6.C Achievability from a Different Pointof View

Here we show that the mutual information I(N) is an achievable rate basedon the alternate definition given in Appendix A. According to that definition,a rate R ∈ R+ is an achievable rate for entanglement-asssisted classical com-munication over N if there exists a sequence (n, |Mn|, εn)n∈N of (n, |M|,ε) entanglement-assisted classical communication protocols over n uses of Nsuch that

lim infn→∞

1n

log2 |Mn| ≥ R and limn→∞

εn = 0. (6.C.1)

To start, let us recall Corollary 6.17, which states that for all ε ∈ (0, 1],n ∈N, and α ∈ (0, 1), there exists an (n, |M|, ε) protocol satisfying

1n

log2 |M| ≥ Iα(N)− 1n(1− α)

log2

(2ε

)− 3

n. (6.C.2)

Fix constants δ1, δ2 satisfying 0 < δ2 < δ1 < 1. Pick αn ∈ (0, 1) and εn ∈ (0, 1]as follows:

αn := 1− n−(1−δ1), εn := 2−nδ2 . (6.C.3)

602

Page 614: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Plugging in to (6.C.2), we find that there exists a sequence of (n, |Mn|, εn)n∈N

protocols satisfying

1n

log2 |Mn| ≥ Iαn(N)− 1n(1− αn)

log2

(2εn

)− 3

n(6.C.4)

= Iαn(N)− 1 + nδ2

nδ1− 3

n(6.C.5)

= Iαn(N)− 1nδ1− 1

nδ1−δ2− 3

n. (6.C.6)

Now taking the limit n→ ∞, we find that

lim infn→∞

1n

log2 |Mn| ≥ lim infn→∞

[Iαn(N)− 1

nδ1− 1

nδ1−δ2− 3

n

](6.C.7)

= I(N), (6.C.8)

and limn→∞ εn = 0. The equality above follows because limn→∞ αn = 1 andlimα→1 Iα(N) = I(N) (see Appendix 6.B for a proof). Thus, it follows that themutual information rate I(N) is achievable according to the alternate defini-tion given in Appendix A.

In the approach detailed above, the error probability decays subexponen-tially to zero (i.e., slower than an exponential decay) and the rate increasesto I(N) with increasing n. If we would like to have exponential decay of theerror probability, then we can instead fix the rate R to be a constant satisfyingR < I(N) and reconsider the analysis. Rearranging the inequality in (6.2.22)in order to get a bound on εn, we find that for all α ∈ (0, 1), there exists asequence of (n, |Mn|, εn)n∈N protocols satisfying

εn ≤ 2 · 2−n(1−α)(Iα(N)−R− 3n). (6.C.9)

Since R < I(N), limα→1 Iα(N) = I(N), and since Iα(N) is monotonically in-creasing in α (this follows from Proposition 4.29), there exists an α∗ < 1 suchthat Iα∗(N) > R. Applying the bound in (6.C.9) to this value of α, we find that

εn ≤ 2 · 2−n(1−α∗)(Iα∗ (N)−R− 3n). (6.C.10)

Then, taking the limit n → ∞ on both sides of this inequality, we concludethat limn→∞ εn = 0 exponentially fast. Thus, by choosing R as a constantsatisfying R < I(N) it follows that there exists a sequence of (n, 2nR, εn)n∈N

protocols such that the error probability εn decays exponentially fast to zero.

603

Page 615: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Appendix 6.D Proof of Lemma 6.20

We start by writing the definition of Iα(A; B)ρ as

Iα(A; B)ρ = infσB

α

α− 1log2

∥∥∥∥(

ρ1−α2α

A ⊗ σ1−α2α

B

)ρAB

1−α2α

A ⊗ σ1−α2α

B

)∥∥∥∥α

(6.D.1)

α− 1log2 inf

σB

∥∥∥∥(

ρ1−α2α

A ⊗ σ1−α2α

B

)ρAB

1−α2α

A ⊗ σ1−α2α

B

)∥∥∥∥α

, (6.D.2)

where α > 1 and the optimization is over states σB. Then, for every purifica-tion |ψ〉ABC of ρAB, we have(

ρ1−α2α

A ⊗ σ1−α2α

B

)ρAB

1−α2α

A ⊗ σ1−α2α

B

)

= TrC

[(ρ

1−α2α

A ⊗ σ1−α2α

B

)|ψ〉〈ψ|ABC

1−α2α

A ⊗ σ1−α2α

B

)]. (6.D.3)

Now, the operator inside TrC on the last line in the equation above is rankone, which means that

TrC

[(ρ

1−α2α

A ⊗ σ1−α2α

B

)|ψ〉〈ψ|ABC

1−α2α

A ⊗ σ1−α2α

B

)]and

TrAB

[(ρ

1−α2α

A ⊗ σ1−α2α

B

)|ψ〉〈ψ|ABC

1−α2α

A ⊗ σ1−α2α

B

)]

have the same non-zero eigenvalues. This means that their Schatten normsare equal, so that

Iα(A; B)ρ

α− 1log2 inf

σB

∥∥∥∥TrC

[(ρ

1−α2α

A ⊗ σ1−α2α

B

)|ψ〉〈ψ|ABC

1−α2α

A ⊗ σ1−α2α

B

)]∥∥∥∥α

(6.D.4)

α− 1log2 inf

σB

∥∥∥∥TrAB

[(ρ

1−α2α

A ⊗ σ1−α2α

B

)|ψ〉〈ψ|ABC

1−α2α

A ⊗ σ1−α2α

B

)]∥∥∥∥α

(6.D.5)

Now, we use the variational characterization of the Schatten norm in (2.2.106),which states that for every operator X,

‖X‖p = sup‖Y‖p′=1

∣∣∣Tr[Y†X]∣∣∣ (6.D.6)

604

Page 616: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

for all 1 ≤ p ≤ ∞, where p′ satisfies 1p + 1

p′ = 1. Note that if X is positivesemi-definite, then we can restrict the optimization to positive semi-definiteoperators Y. Using (6.D.6), with p = α and p′ = α

α−1 , on the expressionin (6.D.5), and since the argument of the norm in that expression is positivesemi-definite, we can optimize over positive semi-definite operators τC to ob-tain

Iα(A; B)ρ

α− 1log2 inf

σBsup

τC

Tr[

τα−1

αC TrAB

[(ρ

1−α2α

A ⊗ σ1−α2α

B

|ψ〉〈ψ|ABC

1−α2α

A ⊗ σ1−α2α

B

)]](6.D.7)

α− 1log2 inf

σBsup

τC

Tr[(

ρ1−α2α

A ⊗ σ1−α2α

B ⊗ τα−12α

C

)

×|ψ〉〈ψ|ABC

1−α2α

A ⊗ σ1−α2α

B ⊗ τα−12α

C

)](6.D.8)

α− 1log2 inf

σBsup

τC

Tr[(

ρ1−α

αA ⊗ σ

1−αα

B ⊗ τα−1

αC

)|ψ〉〈ψ|ABC

](6.D.9)

α− 1log2 sup

τC

infσB

Tr[(

ρ1−α

αA ⊗ σ

1−αα

B ⊗ τα−1

αC

)|ψ〉〈ψ|ABC

], (6.D.10)

where to obtain the last line we applied Sion’s minimax theorem (Theorem 2.18)to the function

(τC, σB) 7→ Tr[(

ρ1−α

αA ⊗ σ

1−αα

B ⊗ τα−1

αC

)|ψ〉〈ψ|ABC

], (6.D.11)

which is convex in the first argument because σB 7→ σ1−α

αB is operator convex

and concave in the second argument because τC 7→ τα−1

αC is operator concave.

Finally, we use Proposition 2.12, which is that

‖X‖p = infY≥0,‖Y‖p′=1

Tr[XY] (6.D.12)

for all 0 < p < 1, where 1p +

1p′ = 1. Applying this to (6.D.10) with p′ = α

1−α ,so that p = α

2α−1 , we conclude that

Iα(A; B)ρ

605

Page 617: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

α− 1log2 sup

τC

infσB

Tr[(

ρ1−α

αA ⊗ σ

1−αα

B ⊗ τα−1

αC

)|ψ〉〈ψ|ABC

](6.D.13)

α− 1log2 sup

τC

infσB

Tr[

σ1−α

αB TrAC

[(ρ

1−αα

A ⊗ τα−1

αC

)|ψ〉〈ψ|ABC

]](6.D.14)

α− 1log2 sup

τC

∥∥∥∥TrAC

[(ρ

1−αα

A ⊗ τα−1

αC

)|ψ〉〈ψ|ABC

]∥∥∥∥α

2α−1

, (6.D.15)

the last line of which is (6.2.66), as required.

To prove (6.2.67), we use the fact that the definition of the sandwichedRényi mutual information of a bipartite state can be written as in (6.D.2), i.e.,

Iα(R; B)ρ =α

α− 1log2 inf

σB

∥∥∥∥(

ρ1−α2α

R ⊗ σ1−α2α

B

)ρRB

1−α2α

R ⊗ σ1−α2α

B

)∥∥∥∥α

, (6.D.16)

which means that the definition in (6.2.82) can be written as

Iα(N)

α− 1supψRA

log2 infσB

∥∥∥∥(

ψ1−α2α

R ⊗ σ1−α2α

B

)NA→B(ψRA)

1−α2α

R ⊗ σ1−α2α

B

)∥∥∥∥α

.

(6.D.17)Now, we use the fact mentioned in (2.2.28), which is that for every pure stateψRA, with the systems R and A having the same dimensions, there exists anoperator XR such that

|ψ〉RA = (XR ⊗ 1A)|Γ〉RA, (6.D.18)

and Tr[X†RXR] = 1, with this latter equality following from (2.2.33). By taking

a polar decomposition of XR as XR = UR√

τR for a unitary UR and a state τR(see Theorem 2.2), we can then write

|ψ〉RA = (UR√

τR ⊗ 1A)|Γ〉RA. (6.D.19)

This implies that

ψR = TrB[NA→B(ψRA)] (6.D.20)

= TrB[(UR√

τR ⊗ 1B)NA→B(ΓRA)(√

τRU†R ⊗ 1B)] (6.D.21)

= URτRU†R, (6.D.22)

where to obtain the last equality we used the fact that N is trace preservingand the fact that TrA[|Γ〉〈Γ|RA] = 1R. Using this, we find that(

ψ1−α2α

R ⊗ σ1−α2α

B

)NA→B(ψRA)

1−α2α

R ⊗ σ1−α2α

B

)

606

Page 618: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

=

(URτ

1−α2α

R U†R ⊗ σ

1−α2α

B

)UR√

τRNA→B(ΓRA)√

τRU†R

(URτ

1−α2α

R U†R ⊗ σ

1−α2α

B

)

=

(URτ

12αR ⊗ σ

1−α2α

B

)NA→B(ΓRA)

12αR U†

R ⊗ σ1−α2α

B

). (6.D.23)

Therefore, by exploiting unitary invariance of the α-Schatten norm, we canwrite Iα(N) as

Iα(N)

α− 1sup

ρR

log2 infσB

∥∥∥∥(

ρ1

2αR ⊗ σ

1−α2α

B

)NA→B(ΓRA)

12αR ⊗ σ

1−α2α

B

)∥∥∥∥α

(6.D.24)

α− 1sup

ρR

log2 infσB

∥∥∥∥NA→B(ΓRA)12

1αR ⊗ σ

1−αα

B

)NA→B(ΓRA)

12

∥∥∥∥α

. (6.D.25)

Now, the function

(ρR, σB) 7→∥∥∥∥NA→B(ΓRA)

12

1αR ⊗ σ

1−αα

B

)NA→B(ΓRA)

12

∥∥∥∥α

(6.D.26)

is concave in the first argument (this follows from Lemma 6.29 in Appendix 6.Fbelow) and convex in the second argument (this follows from the operator

convexity of σB 7→ σ1−α

αB for α > 1 and convexity of the Schatten norm).

Thus, by the Sion minimax theorem (Theorem 2.18), we can exchange supρR

and infσB . Also, we define the completely positive map S(α)σB by S

(α)σB (·) :=

σ1−α2α

B (·)σ1−α2α

B . We can then further rewrite Iα(N) as

Iα(N)

α− 1infσB

log2 supρR

∥∥∥∥(S(α)σB NA→B)

12αR |Γ〉〈Γ|RAρ

12αR

)∥∥∥∥α

(6.D.27)

α− 1infσB

log2

∥∥∥S(α)σB N∥∥∥

CB, 1→α, (6.D.28)

where, to arrive at the last line, we used the definition in (6.2.68). Also, con-sider that the optimum in (6.2.68) is achieved when Tr[YR] = 1. Therefore,

Iα(N) =α

α− 1infσB

log2

∥∥∥S(α)σB N∥∥∥

CB, 1→α, (6.D.29)

as required.

607

Page 619: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Appendix 6.E Alternate Expression for the 1 → αCB Norm

In this section, we show that

‖M‖CB,1→α = supYRA>0

‖MA→B(YRA)‖α

‖TrA[YRA]‖α

(6.E.1)

for every completely positive map M. We start with the expression in (6.2.68)and write it alternatively as follows:

‖M‖CB,1→α = supYR>0,

Tr[YR]≤1

∥∥∥∥MA→B

(Y

12αR |Γ〉〈Γ|RAY

12αR

)∥∥∥∥α

(6.E.2)

= supYR>0,‖YR‖α≤1

∥∥∥∥MA→B

(Y

12R |Γ〉〈Γ|RAY

12R

)∥∥∥∥α

(6.E.3)

= supYR>0

∥∥∥∥MA→B

(Y

12R |Γ〉〈Γ|RAY

12R

)∥∥∥∥α

‖YR‖α

. (6.E.4)

Now, we use the fact that there is a one-to-one correspondence between theoperators YR and the vectors

|ΓY〉RA := (Y12R ⊗ 1A)|Γ〉RA. (6.E.5)

This allows us to rewrite the optimization in (6.E.4) in terms of such vec-tors. Then, by employing isometric invariance of the norms with respect toan isometry acting on the reference system R, we can restrict the optimizationto arbitrary vectors |ψ〉RA. Therefore, we have that

‖M‖CB,1→α = supψRA

‖MA→B(ψRA)‖α

‖TrA[ψRA]‖α

, (6.E.6)

where ψRA ≡ |ψ〉〈ψ|RA. Since the optimization in (6.E.6) is over a subset of thepositive semi-definite operators YRA (and by approximation), we concludethe inequality

‖M‖CB,1→α ≤ supYRA>0

‖MA→B(YRA)‖α

‖TrA[YRA]‖α

. (6.E.7)

608

Page 620: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

It remains to show the opposite inequality. Consider a vector |φ〉SRA thatpurifies YRA > 0, in the sense that TrS[φSRA] = YRA. Then we have that

‖MA→B(YRA)‖α

‖TrA[YRA]‖α

=‖(MA→B ⊗ TrS)(φSRA)‖α

‖TrSA[φSRA]‖α

(6.E.8)

≤ sup|φ〉SRA

‖(MA→B ⊗ TrS)(φSRA)‖α

‖TrSA[φSRA]‖α

(6.E.9)

= ‖M⊗ Tr‖CB,1→α (6.E.10)

= ‖M‖CB,1→α ‖Tr‖CB,1→α (6.E.11)

= ‖M‖CB,1→α (6.E.12)

The third-to-last equality follows from (6.E.6). The second-to-last equalityfollows from (6.2.88), as shown in Appendix 6.F. The final inequality followsbecause ‖Tr‖CB,1→α = 1, as can be readily verified.

Appendix 6.F Proof of the Multiplicativity of the1 → α CB Norm

In this appendix, we prove the statement in (6.2.88), which is that

‖M1 ⊗M2‖CB,1→α = ‖M1‖CB,1→α ‖M2‖CB,1→α (6.F.1)

for every two completely positive maps M1 and M2 and all α > 1. Recall from(6.2.68) that

‖M‖CB,1→α = supYR>0,

Tr[YR]≤1

∥∥∥∥Y1

2αR ΓM

RBY1

2αR

∥∥∥∥α

, (6.F.2)

where ΓMRB := MA→B(ΓRA) is the Choi representation of M, and the dimen-

sion of R is the same as the dimension of A. For a completely positive mapPC→D, let us also define

‖P‖α→α := supZC>0

‖PC→D(ZC)‖α

‖ZC‖α

. (6.F.3)

Note that

‖P‖α→α = supZC>0

‖PC→D(ZC)‖α

‖ZC‖α

(6.F.4)

609

Page 621: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

= supZC>0,‖ZC‖α≤1

‖PC→D(ZC)‖α (6.F.5)

= supYC>0,

Tr[YC]≤1

∥∥∥∥PC→D(Y1α

C )

∥∥∥∥α

, (6.F.6)

where the last equality follows from the substitution YC = ZαC so that Tr[YC] =

Tr[ZαC] = ‖ZC‖α

α.

Now, it immediately follows that

‖M1 ⊗M2‖CB,1→α ≥ ‖M1‖CB,1→α ‖M2‖CB,1→α . (6.F.7)

Indeed, due to the fact that the Choi representation of M1 ⊗M2 has a tensor-product form (see (3.2.19)), we can restrict the optimization in the definitionof the norm ‖M1 ⊗M2‖CB,1→α to tensor-product operators YR1⊗YR2 to obtain

‖M1 ⊗M2‖CB,1→α

= supYR1R2>0,

Tr[YR1R2 ]≤1

∥∥∥∥Y1

2αR1R2

(ΓM1R1B1⊗ ΓM2

R2B2)Y

12αR1R2

∥∥∥∥α

(6.F.8)

≥ supYR1>0,YR2>0,

Tr[YR1 ]≤1,Tr[YR2 ]≤1

∥∥∥∥(Y1

2αR1⊗Y

12αR2)(ΓM1

R1B1⊗ ΓM2

R2B2)(Y

12αR1⊗Y

12αR2)

∥∥∥∥α

(6.F.9)

= supYR1>0,YR2>0,

Tr[YR1 ]≤1,Tr[YR2 ]≤1

∥∥∥∥Y1

2αR1

ΓM1R1B1

Y1

2αR1⊗Y

12αR2

ΓM2R2B2

Y1

2αR2

∥∥∥∥α

(6.F.10)

= supYR1>0,YR2>0,

Tr[YR1 ]≤1,Tr[YR2 ]≤1

∥∥∥∥Y1

2αR1

ΓM1R1B1

Y1

2αR1

∥∥∥∥α

∥∥∥∥Y1

2αR2

ΓM2R2B2

Y1

2αR2

∥∥∥∥α

(6.F.11)

= supYR1>0,

Tr[YR1 ]≤1

∥∥∥∥Y1

2αR1

ΓM1R1B1

Y1

2αR1

∥∥∥∥α

supY2>0,

Tr[YR2 ]≤1

∥∥∥∥Y1

2αR2

ΓM2R2B2

Y1

2αR2

∥∥∥∥α

(6.F.12)

= ‖M1‖CB,1→α ‖M2‖CB,1→α . (6.F.13)

Now we establish the opposite inequality. Let UMA→BE be a linear map that

extends MA→B, in the sense that there is a linear operator UMA→BE such that

UMA→BE(YA) = UM

A→BEYA(UMA→BE)

†, (6.F.14)

610

Page 622: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

TrE[UMA→BE(YA)] = MA→B(YA). (6.F.15)

Due to the fact that Y1

2αR UM

A→BE(ΓRA)Y1

2αR is a rank-one operator, and from an

application of a generalization of the Schmidt decomposition (Theorem 3.2),the following operators have the same non-zero eigenvalues:

TrE[Y1

2αR UM

A→BE(ΓRA)Y1

2αR ] = Y

12αR MA→B(ΓRA)Y

12αR (6.F.16)

and

TrRB[Y1

2αR UM

A→BE(ΓRA)Y1

2αR ] = TrR[Y

12αR Mc

A→B(ΓRA)Y1

2αR ] (6.F.17)

= McA→E((Y

TA)

1α ), (6.F.18)

where McA→E = TrB UM

A→BE denotes the complementary map and the lastequality follows by applying the transpose trick in (2.2.30) and the fact thatTrR[ΓRA] = 1A. Then we find that

‖M‖CB,1→α = supYA>0,

Tr[YA]≤1

∥∥∥McA→E((Y

TA)

1α )∥∥∥

α(6.F.19)

= supYA>0,

Tr[YA]≤1

∥∥∥∥McA→E(Y

1αA)

∥∥∥∥α

(6.F.20)

= supYA>0

∥∥McA→E(YA)

∥∥α

‖YA‖α

(6.F.21)

= ‖Mc‖α→α , (6.F.22)

where to obtain the second equality we used (6.F.4)–(6.F.6). So we have that

‖MA→B‖CB,1→α = ‖McA→E‖α→α . (6.F.23)

Finally, for an arbitrary operator YA1 A2 satisfying YA1 A2 > 0, and settingXA1E2 := (Mc

2)A2→E2(YA1 A2), we can write((Mc

1)A1→E1 ⊗ (Mc2)A2→E2

)(YA1 A2)

= (Mc1)A1→E1

((Mc

2)A2→E2(YA1 A2))

(6.F.24)= (Mc

1)A1→E1(XA1E2). (6.F.25)

Then, multiplying and dividing by∥∥XA1E2

∥∥α=∥∥(Mc

2)A2→E2(YA1 A2)∥∥

αgives

∥∥∥((

Mc1)

A1→E1⊗ (Mc

2)A2→E2

)(YA1 A2)

∥∥∥α∥∥YA1 A2

∥∥α

611

Page 623: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

=

∥∥∥(Mc

1)

A1→E1

(XA1E2

)∥∥∥α∥∥XA1E2

∥∥α

∥∥∥(Mc2)A2→E2

(YA1 A2)∥∥∥

α∥∥YA1 A2

∥∥α

(6.F.26)

≤ sup

XA1E2>0

∥∥∥(Mc

1)

A1→E1(XA1E2)

∥∥∥α∥∥XA1E2

∥∥α

× sup

YA1 A2>0

∥∥∥(Mc2)A2→E2

(YA1 A2)∥∥∥

α∥∥YA1 A2

∥∥α

(6.F.27)

=∥∥∥(Mc

1)A1→E1⊗ idE2

∥∥∥α→α

∥∥∥idA1 ⊗ (Mc2)A2→E2

∥∥∥α→α

(6.F.28)

=∥∥∥(Mc

1)A1→E1

∥∥∥α→α

∥∥∥(Mc2)A2→E2

∥∥∥α→α

(6.F.29)

= ‖M1‖CB,1→α ‖M2‖CB,1→α . (6.F.30)

The third equality follows from Lemma 6.28 below. The final equality holdsby (6.F.23). Since YA1 A2 is arbitrary, we find that

‖M1 ⊗M2‖CB,1→α = ‖Mc1 ⊗Mc

2‖α→α (6.F.31)

= supYA1 A2>0

∥∥(Mc1 ⊗Mc

2)(YA1 A2)∥∥

α∥∥YA1 A2

∥∥α

(6.F.32)

≤ ‖M1‖CB,1→α ‖M2‖CB,1→α . (6.F.33)

So we have that ‖M1 ⊗M2‖CB,1→α = ‖M1‖CB,1→α ‖M2‖CB,1→α.

Lemma 6.28

Let M be a completely positive map. Then, for id an arbitrary identitymap, the following equality holds,

‖id⊗M‖α→α = ‖M‖α→α . (6.F.34)

PROOF: The inequality ‖id⊗M‖α→α ≥ ‖M‖α→α immediately follows by re-stricting the optimization on the left-hand side of the inequality. So we nowestablish the non-trivial inequality ‖id⊗M‖α→α ≤ ‖M‖α→α. Letting theidentity map act on a reference system R, consider from (6.F.6) that

‖id⊗M‖α→α = supYRA>0,

Tr[YRA]≤1

∥∥∥∥MA→B(Y1αRA)

∥∥∥∥α

. (6.F.35)

612

Page 624: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Let Vid2R

i=1 denote a set of Heisenberg–Weyl operators acting on the referencesystem R (see (3.2.115)), so that

1d2

R

d2R

∑i=1

ViR(·)(Vi

R)† = Tr[·] 1

dR. (6.F.36)

Then, for an arbitrary YRA > 0 satisfying Tr[YRA] ≤ 1, we use the unitaryinvariance of the Schatten norm to obtain

∥∥∥∥MA→B(Y1αRA)

∥∥∥∥α

=1

d2R

d2R

∑i=1

∥∥∥∥ViRMA→B(Y

1αRA)(V

iR)

†∥∥∥∥

α

(6.F.37)

=1

d2R

d2R

∑i=1

∥∥∥MA→B((ViRYRA(Vi

R)†)

1α )∥∥∥

α(6.F.38)

∥∥∥∥∥∥∥MA→B

1

d2R

d2R

∑i=1

ViRYRA(Vi

R)†

∥∥∥∥∥∥∥α

(6.F.39)

=∥∥∥MA→B

([πR ⊗YA]

)∥∥∥α

, (6.F.40)

where the inequality follows from Lemma 6.29 below, which states that thefunction X 7→

∥∥∥M(X1α )∥∥∥

αis concave for all α > 1. The last equality follows

from (3.2.122), with πR = 1R|R| the maximally mixed state and YA = TrR[YRA].

Continuing, we find that∥∥∥MA→B

([πR ⊗YA]

)∥∥∥α=

∥∥∥∥MA→B

1αR ⊗Y

1αA

)∥∥∥∥α

(6.F.41)

=

∥∥∥∥π1αR ⊗MA→B(Y

1αA)

∥∥∥∥α

(6.F.42)

=

∥∥∥∥π1αR

∥∥∥∥α

∥∥∥∥MA→B(Y1αA)

∥∥∥∥α

(6.F.43)

=

∥∥∥∥MA→B(Y1αA)

∥∥∥∥α

(6.F.44)

≤ supYA>0,

Tr[YA]≤1

∥∥∥∥MA→B(Y1αA)

∥∥∥∥α

(6.F.45)

= ‖M‖α→α . (6.F.46)

613

Page 625: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Since the inequality holds for arbitrary YRA > 0 satisfying Tr[YRA] ≤ 1, wefind that

‖id⊗M‖α→α ≤ ‖M‖α→α , (6.F.47)

concluding the proof.

Lemma 6.29

Let X be a positive semi-definite operator, and let M be a completelypositive map. For α > 1, the following function is concave:

X 7→∥∥∥M(X

1α )∥∥∥

α. (6.F.48)

PROOF: Since M is completely positive, it has a Kraus representation as

M(Z) = ∑i

MiZM†i . (6.F.49)

From Proposition 2.12, consider that∥∥∥M(X

1α )∥∥∥

α= sup

Y>0,‖Y‖ α

α−1≤1

Tr[M(X

1α )Y

](6.F.50)

= supY>0,

Tr[Y]≤1

Tr[M(X

1α )Y

α−1α

](6.F.51)

= supY>0,

Tr[Y]≤1

∑i

Tr[

MiX1α M†

i Yα−1

α

]. (6.F.52)

The Lieb concavity theorem (see Theorem 6.30 below) is the statement that thefollowing function is jointly concave with respect to positive semi-definite Rand S for arbitrary t ∈ (0, 1) and an arbitrary operator K:

(R, S) 7→ Tr[KRtK†S1−t]. (6.F.53)

Let X0, X1 ≥ 0 and let Y0, Y1 > 0 be such that Tr[Y0], Tr[Y1] ≤ 1. Then forλ ∈ (0, 1], and defining

Xλ := λX0 + (1− λ) X1, Yλ := λY0 + (1− λ)Y1, (6.F.54)

we find that

λTr[M(X

1α0 )Y

α−1α

0

]+ (1− λ)Tr

[M(X

1α1 )Y

α−1α

1

]

614

Page 626: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

= ∑i

λTr[

MiX1α0 M†

i Yα−1

α0

]+ ∑

i(1− λ)Tr

[MiX

1α1 M†

i Yα−1

α1

](6.F.55)

= ∑i

λTr[

MiX1α0 M†

i Yα−1

α0

]+ (1− λ)Tr

[MiX

1α1 M†

i Yα−1

α1

](6.F.56)

≤∑i

(Tr[

MiX1αλ M†

i Yα−1

αλ

])(6.F.57)

= Tr[M(X

1αλ )Y

α−1α

λ

](6.F.58)

≤ supYA>0,

Tr[YA]≤1

Tr[M(X

1αλ )Y

α−1α

A

]=

∥∥∥∥M(X1αλ )

∥∥∥∥α

, (6.F.59)

where the first inequality follows from an application of the Lieb concavitytheorem, and the second inequality follows from applying (6.F.52) and be-cause Yλ is a particular operator satisfying Yλ > 0 and Tr[Yλ] ≤ 1. Since thechain of inequalities holds for arbitary Y0, Y1 > 0 such that Tr[Y0], Tr[Y1] ≤ 1,we conclude that

λ

∥∥∥∥M(X1α0 )

∥∥∥∥α

+ (1− λ)

∥∥∥∥M(X1α1 )

∥∥∥∥α

≤∥∥∥∥M(X

1αλ )

∥∥∥∥α

, (6.F.60)

which concludes the proof.

Theorem 6.30 Lieb Concavity

The following function is jointly concave with respect to positive semi-definite operators R and S for arbitrary t ∈ (0, 1) and an arbitrary oper-ator K:

(R, S) 7→ Tr[KRtK†S1−t]. (6.F.61)

PROOF: We begin by restricting the first argument of the function in (6.F.61)to positive definite operators. Defining |K〉RA = K†

A|Γ〉RA, consider that

Tr[KRtK†S1−t] = 〈Γ|RA1R ⊗(

KRtK†S1−t)

A|Γ〉RA (6.F.62)

= 〈Γ|RA(STR)

1−t ⊗ KARtAK†

A|Γ〉RA (6.F.63)

= 〈Γ|RAKA

[(ST

R)1−t ⊗ Rt

A

]K†

A|Γ〉RA (6.F.64)

= 〈K|RAR12A

[(ST

R)1−t ⊗ Rt−1

A

]R

12A|K〉RA (6.F.65)

615

Page 627: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

= 〈K|RAR12A

[ST

R ⊗ R−1A

]1−tR

12A|K〉RA (6.F.66)

= 〈K|RAR12Ag(ST

R ⊗ R−1A )R

12A|K〉RA, (6.F.67)

where the fourth equality holds by the positive definiteness of R, and whereg(x) := x1−t is an operator concave function. For λ ∈ [0, 1], let

Rλ := λR0 + (1− λ) R1, Sλ := λS0 + (1− λ) S1, (6.F.68)

where R0 and R1 are positive definite and S0 and S1 are positive semi-definite.Also, let

G0 := 1⊗√

λR0 (Rλ)− 1

2 , (6.F.69)

G1 := 1⊗√(1− λ) R1 (Rλ)

− 12 . (6.F.70)

Then

G†0 G0 + G†

1 G1 = 1⊗ (Rλ)− 1

2 λR0 (Rλ)− 1

2

+ 1⊗ (Rλ)− 1

2 (1− λ) R1 (Rλ)− 1

2 (6.F.71)

= 1⊗ (Rλ)− 1

2 Rλ (Rλ)− 1

2 (6.F.72)= 1⊗ 1. (6.F.73)

A variation of the operator Jensen inequality (Theorem 2.15) is that the follow-ing inequality holds for an operator concave function f , a finite set Xii ofHermitian operators, and a finite set Aii of operators satisfying ∑i A†

i Ai =1:

∑i

A†i f (Xi)Ai ≤ f

(∑

iA†

i Xi Ai

). (6.F.74)

Then from the operator Jensen inequality and (6.F.67), we conclude that

λTr[KRt0K†S1−t

0 ] + (1− λ)Tr[KRt1K†S1−t

1 ]

= λ〈K|RA (R0)12A g(ST

0 ⊗ R−10 ) (R0)

12A |K〉RA

+ (1− λ) 〈K|RA (R1)12A g(ST

1 ⊗ R−11 ) (R1)

12A |K〉RA (6.F.75)

= λ〈K|RA (R0)12A g(λST

0 ⊗ (λR0)−1) (R0)

12A |K〉RA

+ (1− λ) 〈K|RA (R1)12A g((1− λ) ST

1 ⊗ ((1− λ) R1)−1) (R1)

1/2A |K〉RA

(6.F.76)

616

Page 628: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

= 〈K|RA (Rλ)12A G†

0 g(λST0 ⊗ (λR0)

−1)G0 (Rλ)12A |K〉RA

+ 〈K|RA (Rλ)12A G†

1 g((1− λ) ST1 ⊗ ((1− λ) R1)

−1)G1 (Rλ)12A |K〉RA (6.F.77)

≤ 〈K|RA (Rλ)12A g(L) (Rλ)

12A |K〉RA, (6.F.78)

where to obtain the third equality we used the fact that 1R ⊗√

λR120 = 1R ⊗

(Rλ)12 G†

0 . In the last line, we have let

L := G†0

(λST

0 ⊗ (λR0)−1)

G0

+ G†1

((1− λ) ST

1 ⊗ ((1− λ) R1)−1)

G1. (6.F.79)

Consider that

L = G†0

(λST

0 ⊗ (λR0)−1)

G0 + G†1

((1− λ) ST

1 ⊗ ((1− λ) R1)−1)

G1

=(

1⊗ (Rλ)− 1

2√

λR0

) (λST

0 ⊗ (λR0)−1) (

1⊗√

λR0 (Rλ)− 1

2)

+

(1⊗ (Rλ)−

12

√(1− λ) R1

)

×((1− λ) ST

1 ⊗ ((1− λ) R1)−1)(

1⊗√(1− λ) R1 (Rλ)

− 12

)

(6.F.80)

= λST0 ⊗ (Rλ)

−1 + (1− λ) ST1 ⊗ (Rλ)

−1 (6.F.81)

= STλ ⊗ (Rλ)

−1 . (6.F.82)

Continuing, we find that

λTr[KRt0K†S1−t

0 ] + (1− λ)Tr[KRt1K†S1−t

1 ]

≤ 〈K|RA (Rλ)12A g (L) (Rλ)

12A |K〉RA (6.F.83)

= 〈K|RA (Rλ)12A g(ST

λ ⊗ (Rλ)−1) (Rλ)

12A |K〉RA (6.F.84)

= Tr[KRtλK†S1−t

λ ]. (6.F.85)

So the function (R, S) 7→ Tr[KRtK†S1−t] is jointly concave when the first ar-gument is restricted to be a positive definite operator. The more general caseof positive semi-definite operators in the first argument can be established byadding ε1 to any positive semi-definite operator to ensure that it is positivedefinite, applying the above inequality, and then taking the limit ε→ 0 at theend. This concludes the proof.

617

Page 629: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

Appendix 6.G The Strong Converse from a Differ-ent Point of View

Here we show that the mutual information I(N) is a strong converse ratebased on the the alternate definition given in Appendix A. According to thatdefinition, a rate R ∈ R+ is a strong converse rate for entanglement-assistedclassical communication over a channel N if for every sequence (n, |Mn|, εn)n∈N

of (n, |M|, ε) entanglement-assisted classical communication protocols over nuses of N, we have that lim infn→∞

1n log2 |Mn| > R⇒ limn→∞ εn = 1.

Let us show that the mutual information I(N) of the channel N is a strongconverse rate under this alternate definition. Let (n, |Mn| , εn)n∈N be a se-quence of protocols satisfying lim infn→∞

1n log2 |Mn| > I(N). Due to this

strict inequality, the fact that limα→1 Iα(N) = I(N), and since the sandwichedRényi mutual information Iα(N) is monotonically increasing in α (this followsfrom Proposition 4.29), there exists a value α∗ > 1 such that

lim infn→∞

1n

log2 |Mn| > Iα∗(N). (6.G.1)

Now recall the following bound from (6.2.92), which holds for all α > 1 andfor every (n, |M| , ε) protocol:

1n

log2 |M| ≤ Iα(N) +α

n (α− 1)log2

(1

1− ε

). (6.G.2)

We can apply it in our case to conclude that

1n

log2|Mn| ≤ Iα∗(N) +α∗

n (α∗ − 1)log2

(1

1− εn

). (6.G.3)

Now suppose thatlim inf

n→∞εn = c ∈ [0, 1). (6.G.4)

Then it follows that

lim infn→∞

1n

log2|Mn| ≤ lim infn→∞

[Iα∗(N) +

α∗

n (α∗ − 1)log2

(1

1− εn

)](6.G.5)

= Iα∗(N) + lim infn→∞

[α∗

n (α∗ − 1)log2

(1

1− εn

)](6.G.6)

= Iα∗(N), (6.G.7)

618

Page 630: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

where the last equality follows because α∗ > 1 is a constant and the sequenceεnn∈N converges to a constant c ∈ [0, 1). However, this contradicts (6.G.1).Thus, (6.G.4) cannot hold, and so we conclude that lim infn→∞ εn = 1.

The argument given above makes no statement about how fast the errorprobability converges to one in the large n limit. If we fix the rate R of com-munication to be a constant satisfying R > I(N), then we can argue that theerror probability converges exponentially fast to one. To this end, consider asequence (n, 2nR, εn)n∈N of (n, |M|, ε) protocols, with each element of thesequence having an arbitrary (but fixed) rate R > I(N). For each element ofthe sequence, the inequality in (6.2.92) holds, which means that

R ≤ Iα(N) +α

n(α− 1)log2

(1

1− εn

)(6.G.8)

for all α > 1. Rearranging this inequality leads to the following lower boundon the error probabilities εn:

εn ≥ 1− 2−n( α−1α )(R− Iα(N)) (6.G.9)

for all α > 1. Now, since R > I(N), limα→1 Iα(N) = I(N), and since thesandwiched Rényi mutual information Iα(N) is monotonically increasing inα (this follows from Proposition 4.29), there exists an α∗ > 1 such that R >

Iα∗(N). Applying the inequality in (6.G.9) to this value of α, we find that

εn ≥ 1− 2−n(

α∗−1α∗)(R− Iα∗ (N)). (6.G.10)

Then, taking the limit n → ∞ on both sides of this inequality, we concludethat limn→∞ εn = 1 and the convergence to one is exponentially fast.

From the arguments above, we find not only that I(N) is a strong converserate according to the alternate definition provided in Appendix A, but alsothat the maximal error probability of every sequence of (n, |M|, ε) entanglement-assisted classical communication protocols with fixed rate strictly above themutual information I(N) approaches one at an exponential rate.

In Section 6.C, we showed that the error probability vanishes in the limitn → ∞ for every fixed rate R < I(N). We thus see that, as n → ∞, themutual information I(N) is a sharp dividing point between reliable, error-free communication and communication with error probability approachingone exponentially fast. This situation is depicted in Figure 6.8.

619

Page 631: Principles of Quantum Communication Theory: A Modern Approach

Chapter 6: Entanglement-Assisted Classical Communication

n→ ∞

Rate, Rn

ErrorProbability,

εn

I(N)0

1

FIGURE 6.8: The error probability εn as a function of the rate Rn forentanglement-assisted classical communication over a quantum chan-nel N. As n→ ∞, for every rate below the mutual information I(N), thereexists a sequence of protocols with error probability converging to zero.For every rate above the mutual information I(N), the error probabilityconverges to one for all possible protocols.

620

Page 632: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7

Classical Communication

We now move on to classical communication over quantum channels. Un-like the previous chapter, here we suppose that Alice and Bob do not haveaccess to shared entanglement prior to communication. Thus, the scenarioconsidered in this chapter is more practical than the entanglement-assistedsetting—in the previous chapter, we made the simplifying assumption thatshared entanglement is available for free to the sender and receiver. How-ever, without widespread entanglement-sharing networks available, this as-sumption is not really practical, and so the entanglement-assisted capacity ismostly of academic interest at the moment.

Without shared entanglement available to the sender and receiver, is it stilladvantageous to use a quantum strategy to send classical information over aquantum channel? At first glance, it may seem that, without prior sharedentanglement, there might not be any point in using a quantum strategy tosend classical information over a quantum channel. However, when usinga channel multiple times, there is still the possibility of encoding a messageinto a state at the encoder that is entangled across multiple channel uses andthen performing a collective measurement at the decoder. For many exam-ples of channels, it is known that collective measurements can enhance com-munication capacity, and it is known that in principle, there exists a channelfor which entangled states at the encoder provides a further enhancement tocommunication capacity.

Although it may seem that determining the maximum amount of classi-cal information that can be communicated using a given quantum channel,i.e., determining the classical capacity of a quantum channel, might be easier

621

Page 633: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

than its entanglement-assisted counterpart, this problem turns out to be oneof the most challenging problems in quantum Shannon theory. This is dueto the fact that, as we discuss in this chapter, the relevant quantity in calcu-lating the classical capacity of a quantum channel N is related to its Holevoinformation χ(N), and this quantity is not known to be additive for all quan-tum channels. This means that the best we can say for a given channel is thatits Holevo information is an achievable rate for classical communication—wecannot necessarily say that it is the highest possible achievable rate. This is instark contrast to the case of entanglement-assisted classical communication,for which we know that the mutual information I(N) of a channel is addi-tive for all channels and thus is equal to the entanglement-assisted classicalcapacity of any quantum channel.

We start in the next brief section by considering some simple motivatingexamples of communication. Then we consider the one-shot setting. This set-ting for classical communication is similar to the one-shot setting of entanglement-assisted classical communication from the previous chapter. The only differ-ence is that, in this case, there is no entanglement assistance. We then moveon to the asymptotic setting, for which we prove Theorem 7.13, which statesthat the classical capacity of a quantum channel is equal to the regularizedHolevo information of the channel. This is a quantity that in general requirescomputing the Holevo information for an arbitrarily large number of uses ofthe given channel, and it is therefore intractable unless the Holevo informa-tion happens to be additive for the channel. From here, we consider variousclasses of channels for which the Holevo information is additive or for whichwe can establish the strong converse. We also consider various methods forbounding the classical capacity from above. Finally, we calculate the classicalcapacity for some examples of channels.

Simple Example of Classical Communication Over a Quantum Channel

At the beginning of the previous chapter, we stated that super-dense cod-ing is a simple example of an entanglement-assisted classical communica-tion protocol over a noiseless quantum channel. The essense of that proto-col is the encoding of d2 messages into d2 mutually orthogonal pure states(the maximally entangled states defined in (3.3.13)). The d2 messages containlog2 d2 = 2 log2 d bits of classical information, which can be communicatedwithout error from Alice to Bob, with just one use of a noiseless qudit quan-tum channel.

622

Page 634: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Now, without the assistance of prior shared entanglement, one result ofthis chapter is that the maximum amount of classical information that canbe communicated over a noiseless quantum channel without error is log2 d,where d is the dimension of the channel. Let us describe a simple protocolthat achieves this number of communicated bits. Consider a discrete set M ofmessages, and suppose that Alice encodes each message m ∈ M into a quan-tum state |m〉, such that the set |m〉m∈M, is orthonormal, i.e., 〈m|m′〉 = δm,m′for all m, m′ ∈ M. Bob, knowing Alice’s encoding of the messages, devises ameasurement to extract the message described by the POVM |m〉〈m|m∈M.His strategy is to guess that the message sent was “m” if the outcome of hismeasurement is m ∈ M. If Alice sends the state |m〉〈m| through a noiselessquantum channel, then Bob is guaranteed to receive the state |m〉〈m| unal-tered, so that his guess will always be correct. Alice can thus send log2 |M|bits of classical information to Bob without error.

Now, if the channel is noisy, the initially orthogonal states in general be-come non-orthogonal, so that if Alice sends the state |m〉〈m| through the chan-nel then Bob generally receive a mixed state ρm instead. As a consequence ofusing a noisy quantum channel, Bob’s decoding strategy will not always suc-ceed, meaning that there will be errors. In order to mitigate the effects ofnoise, Alice can choose a more clever encoding of the message, and similarlyBob can devise a more clever decoding strategy.1 Alice and Bob can also usethe channel multiple times, which can decrease the error in general, while alsoallowing for the messages to be encoded into higher-dimensional entangledstates.

Observe that the task of classical communication over a quantum channelis closely related to the task of state discrimination (see Section 3.1.3). Re-call that the goal of state discrimination is to minimize the error probabilityfor a given set ρmm∈M of states corresponding to the message set M anda particular decoding POVM Λm

B m∈M indexed by the messages. In classi-cal communication, we focus primarily on maximizing the rate 1

n log2 |M| ofcommunication for a given error probability ε, and we are interested in deter-mining the maximum rate R for which ε vanishes as the number n of channeluses increases.

1We assume, as in all communication tasks considered in this book, that Alice and Bobknow the channel connecting them, so that they can use this knowledge to develop theirencoding and decoding.

623

Page 635: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

E NM 3 mA B

m

Alice Bob

FIGURE 7.1: Depiction of a protocol for classical communication over oneuse of the quantum channel N. Alice, who wishes to send a message mchosen from a set M of messages, first encodes the message into a quan-tum state on a quantum system A, using a classical–quantum encodingchannel E. She then sends the quantum system A through the channelNA→B. After Bob receives the system B, he performs a measurement onit, using the outcome of the measurement to give an estimate m of themessage sent by Alice.

7.1 One-Shot Setting

In the one-shot setting, we start by considering a classical communicationprotocol over a quantum channel N, as depicted in Figure 7.1. The protocol isdefined by the triple (M,EM→A,DB→M), consisting of a message set M, an en-coding channel EM→A, and a decoding channel DB→M. The pair (E,D) of encod-ing and decoding channels is often called a code and denoted by C = (E,D).The encoding channel is a classical–quantum channel (see Definition 3.27),and the decoding channel is a quantum–classical or measurement channel(see Definition 3.28).

Now, given that there are |M| messages in the message set, it follows thateach message can be uniquely associated with a bit string of size at leastlog2 |M|. The quantity log2 |M| thus represents the number of bits commu-nicated in the protocol. One of the goals of this section is to obtain upper andlower bounds on maximum number of log2 |M| of bits that can be communi-cated in any classical communication protocol.

The protocol proceeds as follows: let p : M → [0, 1] be a probability dis-tribution over the message set. With probability p(m), Alice picks a messagem ∈ M and makes a local copy of it. Letting |m〉m∈M be an orthonormalbasis indexed by the messages, her initial state is described by the followingclassically correlated state:

ΦpMM′ := ∑

m∈Mp(m)|m〉〈m|M ⊗ |m〉〈m|M′ . (7.1.1)

624

Page 636: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Note that if Alice wishes to send a particular message m deterministically,then she can choose the distribution p to be the degenerate distribution, equalto one for m and zero for all other messages.

She then uses an encoding channel EM→A to map the message to a quan-tum state ρm

A. We can explicitly define the encoding channel EM′→A as

EM′→A(|m〉〈m′|M′) = δm,m′ρmA ∀ m, m′ ∈M. (7.1.2)

Note that this channel has the form of a classical–quantum channel (recallDefinition 3.27). The action of the encoding channel on the initial state in(7.1.1) is as follows:

EM′→A(ΦpMM′) = ∑

m∈Mp(m)|m〉〈m|M ⊗ EM′→A(|m〉〈m|) (7.1.3)

= ∑m∈M

p(m)|m〉〈m|M ⊗ ρmA (7.1.4)

=: ρpMA. (7.1.5)

Alice then sends the system A through the channel NA→B, resulting in thestate

NA→B(ρMA) = ∑m∈M

p(m)|m〉〈m|M ⊗NA→B(ρmA). (7.1.6)

Bob, whose task is to determine which message Alice sent, performs a de-coding measurement on his received system B, which has the correspondingPOVM Λm

B m∈M. The measurement is associated with the decoding chan-nel DB→M, which is simply a quantum–classical channel as given in Defini-tion 3.28, i.e.,

DB→M(τB) := ∑m∈M

Tr[ΛmB τB]|m〉〈m|M (7.1.7)

for every state τB. So the final state of the protocol is

ωpMM

:= (DB→M NA→B EM′→A)(ΦpMM′) (7.1.8)

= ∑m,m∈M

p(m)|m〉〈m|M ⊗ Tr[ΛmBNA→B(ρA)]|m〉〈m|M. (7.1.9)

The measurement by Bob induces the conditional probability distributionq : M×M→ [0, 1] defined by

q(m|m) := Pr[M = m|M = m] = Tr[ΛmBNA→B(ρ

mA)]. (7.1.10)

625

Page 637: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Bob’s strategy is such that if the outcome m occurs from his measurement,then he guesses that the message sent was m. The probability that Bob cor-rectly identifies a given message m is then equal to q(m|m). The message errorprobability of the code is given by

perr(m, (E,D);N) := 1− q(m|m)

= Tr[(1B −ΛmB )NA→B(ρ

mA)]

= ∑m∈M\m

q(m|m).(7.1.11)

The average error probability of the code is

perr((E,D); p,N) := ∑m∈M

p(m)perr(m, (E,D);N) (7.1.12)

= ∑m∈M

p(m)(1− q(m|m)) (7.1.13)

= ∑m∈M

∑m∈M\m

p(m)q(m|m). (7.1.14)

The maximal error probability of the code is

p∗err(E,D;N) := maxm∈M

perr(m, (E,D);N). (7.1.15)

Just as in the case of entanglement-assisted classical communication in Chap-ter 6, each of these three error probabilities can be used to assess the reliabil-ity of the protocol, i.e., how well the encoding and decoding allows Alice totransmit her message to Bob.

Definition 7.1 (|M|, ε) Classical Communication Protocol

A classical communication protocol (M,EM→A,DB→M) over the chan-nel NA→B is called an (|M|, ε) protocol, with ε ∈ [0, 1], if p∗err(E,D;N) ≤ ε.

As with entanglement-assisted classical communication, the error crite-rion perr(E,D;N)∗ ≤ ε is equivalent to

maxp:M→[0,1]

12

∥∥∥ΦpMM′ −ω

pMM

∥∥∥1≤ ε, (7.1.16)

and the steps to show this are the same as those shown in the proof of Lemma 6.2.In particular, the following equality holds

12

∥∥∥ΦpMM′ −ω

pMM

∥∥∥1= perr((E,D); p), (7.1.17)

626

Page 638: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

which leads to

p∗err(E,D;N) = maxp:M→[0,1]

perr((E,D); p,N)

= maxp:M→[0,1]

12

∥∥∥ΦpMM′ −ω

pMM

∥∥∥1

.(7.1.18)

Also, as in Chapter 6, another way to define the error criterion of the pro-tocol is through a comparator test. Recall that the comparator test is a mea-surement defined by the two-element POVM ΠMM, 1MM − ΠMM, whereΠMM is the projection defined as

ΠMM := ∑m∈M|m〉〈m|M ⊗ |m〉〈m|M. (7.1.19)

Note that Tr[ΠMMωpMM

] is simply the probability that the classical registers

M and M in the state ωpMM

have the same values. In particular, following thesame steps as in (6.1.38)–(6.1.40), we have

Tr[ΠMMω

pMM

]= 1− perr((E,D); p,N) =: psucc((E,D); p,N), (7.1.20)

where we have acknowledged that the expression on the left-hand side can beinterpreted as the average success probability of the code (E,D) and denotedit by psucc((E,D); p,N).

As mentioned at the beginning of this chapter, our goal is to bound (fromabove and below) the maximum number log2 |M| of transmitted bits for everyclassical communication protocol over N. Given an error probability thresh-old of ε, we call the maximum number of transmitted bits the one-shot classicalcapacity of N.

Definition 7.2 One-Shot Classical Capacity of a Quantum Channel

Given a quantum channel NA→B and ε ∈ [0, 1], the one-shot ε-error classi-cal capacity of N, denoted by Cε(N), is defined to be the maximum num-ber log2 |M| of transmitted bits among all (|M|, ε) classical communica-tion protocols over N. In other words,

Cε(N) := sup(M,E,D)

log2 |M| : p∗err(E,D;N) ≤ ε, (7.1.21)

where the optimization is over all protocols (M,EM′→A,DB→M) satisfy-

627

Page 639: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

ing dM′ = dM = |M|.

In addition to finding, for a given ε ∈ [0, 1], the maximum number oftransmitted bits among all (|M|, ε) classical communication protocols overNA→B, we can consider the following complementary problem: for a givennumber of messages |M|, find the smallest possible error probability amongall (|M|, ε) classical communication protocols, which we denote by ε∗C(|M|;N).In other words, to problem is to determine

ε∗C(|M|;N) := infE,Dp∗err(E,D;N) : dM′ = dM = |M|, (7.1.22)

where the optimization is over encoding channels E with input space dimen-sion |M| and decoding channels D with output space dimension |M|. In thisbook, we focus primarily on the problem of optimizing the number of trans-mitted bits rather than the error probability, and so our primary quantity ofinterest is the one-shot capacity Cε(N).

7.1.1 Protocol Over a Useless Channel

We now turn to establishing an upper bound on the one-shot classical capac-ity, and our approach is similar to the approach outlined in Section 6.1.1. Withthis goal in mind, along with the actual classical communication protocol, wealso consider the same protocol but performed over a useless channel as de-picted in Figure 7.2. This useless channel discards the state encoded withthe message and replaces it with some arbitrary (but fixed) state σB. In otherwords,

ρmA 7→ σB =: (PσB Tr)(ρm

A) ∀ m ∈M, (7.1.23)

where the encoded states ρmA are defined in (7.1.2). This channel is useless

because the state σB does not contain any information about the message. Aswith entanglement-assisted classical communication, comparing this proto-col over the useless channel with the actual protocol allows us to obtain anupper bound on the quantity log2 |M|, which we recall represents the num-ber of bits that are transmitted over the channel.

The state at the end of the protocol over the useless channel is the follow-ing tensor-product state:

τpMM

:= ∑m∈M

p(m)|m〉〈m|M ⊗ ∑m∈M

Tr[ΛmB σB]|m〉〈m|M, (7.1.24)

628

Page 640: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

E PσBM 3 mA B

m

Alice Bob

FIGURE 7.2: Depiction of a protocol that is useless for classical communi-cation. The state encoding the message m via E is discarded and replacedwith an arbitrary (but fixed) state σB.

which indicates that the decoded message system M is independent of themessage system M in this case. Now, recall from (7.1.9) that the state ω

pMM

atthe end of the actual protocol over the channel N is given by

ωpMM

= ∑m,m∈M

p(m)Tr[ΛmBNA→B(ρ

mA)]|m〉〈m|M ⊗ |m〉〈m|M. (7.1.25)

Similar to the notation from Chapter 6, we let

ωMM :=1|M| ∑

m,m∈MTr[Λm

BNA→B(ρmA)]|m〉〈m|M ⊗ |m〉〈m|M, (7.1.26)

be the state ωpMM

with the probability distribution p over the message set

equal to the uniform distribution, i.e., p(m) = 1|M| . We also let

ΦMM′ :=1|M| ∑

m∈M|m〉〈m|M ⊗ |m〉〈m|M′ (7.1.27)

be the state in (7.1.1) with p being the uniform distribution over M.

Now, observe that TrM[ωMM] = πM. Also, for every (|M|, ε) classicalcommunication protocol, the condition p∗err(E,D;N) ≤ ε holds. By followingthe same steps as in (6.1.62)–(6.1.65), this condition implies that perr((E,D); p,N) ≤ε for the uniform distribution p. Then, by (7.1.20), we find that

Tr[ΠMMωMM] ≥ 1− ε. (7.1.28)

We can therefore use Lemma 6.4 to conclude that

log2 |M| ≤ IεH(M; M)ω (7.1.29)

for every (|M|, ε) classical communication protocol. This means that, given aparticular choice of the encoding and decoding channels, if p∗err(E,D;N) ≤ ε,

629

Page 641: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

then the upper bound in (7.1.29) is the maximum number of bits that can betransmitted over the channel N. The optimal value of this upper bound isrealized by finding the state σM defining the useless channel that optimizesthe quantity Iε

H(M; M)ω in addition to the measurement that achieves the ε-hypothesis testing relative entropy in (6.1.60). Importantly, a different choiceof encoding and decoding produces a different value for this upper bound.We would thus like to find an upper bound that applies regardless of whichspecific protocol is chosen. In other words, we would like an upper boundthat is a function of the channel N only.

7.1.2 Upper Bound on the Number of Transmitted Bits

We now give a general upper bound on the number of transmitted bits thatcan be communicated in any classical communication protocol. This resultis stated in Theorem 7.4, and the upper bound obtained therein holds inde-pendently of the encoding and decoding channels used in the protocol anddepends only on the given communication channel N.

Let us start with an arbitrary (|M|, ε) classical communication protocolover the channel N, corresponding to, as described at the beginning of thischapter, a message set M, an encoding channel E, and a decoding channel D.The error criterion p∗err(E,D;N) ≤ ε holds by definition of an (|M|, ε) pro-tocol, which implies the upper bound in (7.1.29) for the number log2 |M| oftransmitted bits in any (|M|, ε) classical communication protocol. Using thisupper bound, we obtain the following:

Proposition 7.3 Upper Bound on One-Shot Classical Capacity

Let N be a quantum channel. For every (|M|, ε) classical communicationprotocol over N, with ε ∈ [0, 1], the number of bits transmitted over N isbounded from above by the ε-hypothesis testing Holevo information ofN, as defined in (4.11.93), i.e.,

log2 |M| ≤ χεH(N). (7.1.30)

Therefore,Cε(N) ≤ χε

H(N). (7.1.31)

630

Page 642: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

PROOF: We start with the upper bound in (7.1.29), i.e.,

log2 |M| ≤ IεH(M; M)ω, (7.1.32)

where the state ωMM is defined in (7.1.26). Recall that this bound followsfrom Lemma 6.4. Note that the state ωMM can be written as

ωMM = DB→M(θMB), (7.1.33)

whereθMB :=

1|M| ∑

m∈M|m〉〈m|M ⊗NA→B(ρ

mA). (7.1.34)

Now, from the data-processing inequality for the hypothesis testing rela-tive entropy under the action of the decoding channel DB→M, we find that

IεH(M; M)ω = inf

σ′M

DεH(ωMM‖ωM ⊗ σ′M) ≤ Iε

H(M; B)θ, (7.1.35)

where we have used the fact that θM = πM = ωM. Note that

θMB = NA→B(ρMA), (7.1.36)

where ρMA is the classical–quantum state ρpMA defined in (7.1.5) with p equal

to the uniform probability distribution. Optimizing over all classical–quan-tum states ξMA then leads to

IεH(M; B)θ ≤ sup

ξMA

IεH(M; B)ζ = χε

H(N), (7.1.37)

where ζMB = NA→B(ξMA) and we have used the definition in (4.11.93) forthe ε-hypothesis testing Holevo information of a channel. Note that this opti-mization over all classical–quantum states is effectively an optimization overall possible encoding channels EM′→A that define the (|M|, ε) protocol. Putt-ing everything together, we obtain

log2 |M| ≤ IεH(M; M)ω ≤ Iε

H(M; B)θ ≤ χεH(N), (7.1.38)

as required.

The result of Proposition 7.3 can be written explicitly as

log2 |M| ≤ supρMA

infσB

DεH(NA→B(ρMA)‖ρM ⊗ σB) (7.1.39)

631

Page 643: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

= supρMA

infσB

DεH(NA→B(ρMA)‖RσB

A→B(ρMA)), (7.1.40)

where ρMA is a classical–quantum state. By doing so, we explictly see herethe comparison, via the hypothesis testing relative entropy, between the ac-tual classical communication protocol and the protocol over the useless chan-nels R

σBA→B, labeled by the states σB. The state ρMA corresponds to the state

after the encoding channel, and optimizing over these states is effectively anoptimization over all encoding channels.

As an immediate consequence of Propositions 7.3, 4.67, and 4.68, we havethe following two bounds:

Theorem 7.4 One-Shot Upper Bounds for Classical Communication

Let N be a quantum channel, let ε ∈ [0, 1), and let α > 1. For every(|M|, ε) classical communication protocol over N, the following boundshold

log2 |M| ≤1

1− ε(χ(N) + h2(ε)) , (7.1.41)

log2 |M| ≤ χα(N) +α

α− 1log2

(1

1− ε

), (7.1.42)

where χ(N) is the Holevo information of N, as defined in (4.11.106),and χα(N) is the sandwiched Rényi Holevo information of N, as definedin (4.11.95).

Since the bounds in (7.1.41) and (7.1.42) hold for every (|M|, ε) classicalcommunication protocol over N, we have that

Cε(N) ≤ 11− ε

(χ(N) + h2(ε)), (7.1.43)

Cε(N) ≤ χα(N) +α

α− 1log2

(1

1− ε

)∀ α > 1, (7.1.44)

for all ε ∈ [0, 1).

Let us recap the steps that we took to arrive at the bounds in (7.1.41)and (7.1.42).

1. We first compared the classical communication protocol over the chan-nel N with a protocol over a useless channel, by using the ε-hypothesistesting relative entropy. This led us to the upper bound in (7.1.29).

632

Page 644: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

2. We then used the data-processing inequality for the hypothesis testingrelative entropy to obtain a quantity that is independent of the decodingchannel, and also optimized over all useless protocols. This is done in(7.1.35) in the proof of Proposition 7.3.

3. Finally, to obtain a bound that is a function solely of the channel N andthe error probability, we optimized over all encoding channels to obtainProposition 7.3.

4. Using Propositions 4.67 and 4.68, which relate the hypothesis testing rel-ative entropy to the quantum relative entropy and the sandwiched-Rényirelative entropy, respectively, we arrived at Theorem 7.4.

The bounds in (7.1.41) and (7.1.42) are fundamental upper bounds on thenumber of transmitted bits for every classical communication protocol. A nat-ural question to ask is whether the upper bounds in (7.1.41) and (7.1.42) canbe achieved. In other words, is it possible to devise protocols such that thenumber of transmitted bits is equal to the right-hand side of either (7.1.41)or (7.1.42)? We do not know how to, especially if we demand that we ex-actly attain the right-hand side of either (7.1.41) or (7.1.42). However, whengiven many uses of a channel (in the asymptotic setting), we can come closeto achieving these upper bounds. This motivates finding lower bounds onthe number of transmitted bits.

7.1.3 Lower Bound on the Number of Transmitted Bits

Having obtained upper bounds on the number transmitted bits in the previ-ous section, let us now determine lower bounds. The key result of this sectionis Proposition 7.5, resulting in Theorem 7.6, which contains a lower boundon the number of transmitted bits for every (|M|, ε) classical communicationprotocol.

As we saw in the previous chapter on entanglement-assisted classical co-mmunication, in order to obtain a lower bound on the number of transmittedbits, we should devise an explicit classical communication protocol (M,E,D)such that the maximal error probability satisfies p∗err(E,D;N) ≤ ε for ε ∈ [0, 1].Recall from (7.1.15) that the maximal error probability is defined as

p∗err(E,D;N) = maxm∈M

perr(m, (E,D);N), (7.1.45)

633

Page 645: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

where, for all m ∈ M, the message error probability perr(m; (E,D)) is definedin (7.1.11) as

perr(m, (E,D);N) = 1− q(m|m), (7.1.46)

with q(m|m) being the probability of identifying the message sent as m giventhat the message m was sent.

The classical communication protocol discussed here is related to the enta-nglement-assisted classical communication protocol in Section 6.1.3. We sup-pose that Alice and Bob have some shared randomness prior to communi-cation. This shared randomness is strictly speaking not part of the classi-cal communication protocol as outlined at the beginning of Section 7.1, butthe advantage of using it is that we can directly employ all of the develop-ments for the position-based coding and sequential decoding strategy fromSection 6.1.3. We then perform what is called derandomization and expurga-tion (both of which we outline below) ultimately to remove this shared ran-domness from the protocol and thus obtain the desired lower bound on thenumber of transmitted bits for the true unassisted classical communicationprotocol.

Proposition 7.5 Lower Bound on One-Shot Classical Capacity

Let NA→B be a quantum channel. For all ε ∈ (0, 1) and η ∈(0, ε

2), there

exists an (|M|, ε) classical communication protocol over NA→B such that

log2 |M| = χε2−ηH (N)− log2

(4ε

η2

). (7.1.47)

Consequently, for all ε ∈ (0, 1) and η ∈(0, ε

2),

Cε(N) ≥ χε2−ηH (N)− log2

(4ε

η2

). (7.1.48)

Here,χε

H(N) := supρXA

IεH(X; B)ω, (7.1.49)

where ωXB = NA→B(ρXA), the state ρXA is a classical–quantum state,and

IεH(X; B)ω = Dε

H(ωXB‖ωX ⊗ωB). (7.1.50)

634

Page 646: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

REMARK: The quantity χεH(N) defined in the statement of Proposition 7.5 above is sim-

ilar to the quantity χεH(N) defined in (4.11.93), except that it is defined with respect to

the mutual information IεH(X; B)ρ that we encountered in Proposition 6.8, which does not

involve an optimization over states σB.

PROOF: As described before the statement of the proposition, we start witha protocol based on randomness-assisted classical communication, in whichAlice and Bob have shared randomness prior to communication via the stateρ⊗|M′|XB′ , where M′ is a message set and ρXB′ is the following classically corre-

lated state:ρXB′ := ∑

x∈Xr(x)|x〉〈x|X ⊗ |x〉〈x|B′ . (7.1.51)

Here, X is a finite alphabet and r : X → [0, 1] is a probability distributionon X. The system X is held by Alice and the system B′ is held by Bob. Wedenote the encoding and decoding channels for this protocol by E′ and D′,respectively, and they correspond to the position-based coding and sequentialdecoding strategies developed in Section 6.1.3. The goal is to use this protocolto determine the existence of encoding and decoding channels E and D for an(|M|, ε) classical communication protocol.

As a first step, Alice processes each of her X systems with a classical–quantum channel x 7→ ρx

A (see Definition 3.27), so that the state shared by

them becomes ρ⊗|M′|A′B′ , where

ρA′B′ = ∑x∈X

r(x)ρxA′ ⊗ |x〉〈x|B′ . (7.1.52)

Just as in Section 6.1.3, the rest of the encoding channel E′ is defined such thatif Alice wishes to send the message m ∈ M, then she sends the mth A systemthrough the channel. Thus, the state shared by Alice and Bob becomes

ρA′1B′1⊗ · · · ⊗NA→B(ρAB′m)⊗ · · · ⊗ ρA′|M′ |B

′|M′ |

(7.1.53)

for all m ∈M′, where

ρA′iB′i

:= ∑x∈X

r(x)ρxA′i⊗ |x〉〈x|B′i . (7.1.54)

The reduced state on Bob’s systems is then

τB′1···B′m···B′|M|B = ρB′1⊗ · · · ⊗NA→B(ρAB′m)⊗ · · · ⊗ ρB′|M′ |

. (7.1.55)

635

Page 647: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

In particular, we have that

τB′mB =

NA′→B(ρA′B′) if m = m,

ρB′ ⊗NA′→B(ρA′) if m 6= m, (7.1.56)

for all m ∈ M′. Bob’s task is to perform a test to guess which of the twostates NA′→B(ρA′B′) and ρB′ ⊗NA′→B(ρA′) he has on B and the |M′| systemsB′1 · · · B′|M′|. Since the shared state in the proof of Proposition 6.8 is arbitrary,we can apply all the arguments in that proof with the state ρA′B′ defined in(7.1.54) above to conclude immediately via (6.1.125) that

perr(m, (E′,D′);N) ≤ ε (7.1.57)

for all m ∈M′, provided that

log2 |M′| = Iε−ηH (B′; B)ξ − log2

(4ε

η2

), (7.1.58)

where ξB′B := NA′→B(ρA′B′). Note that ξB′B is a classical–quantum state,which means that Iε−η

H (B′; B)ξ = χε−ηH (B′; B)ξ . Furthermore, by taking a

supremum over every input ensemble (r(x), ρxA)x∈X, we find that

log2 |M′| = χε−ηH (N)− log2

(4ε

η2

). (7.1.59)

Combining (7.1.57) and (7.1.59), we can already conclude that, when sharedrandomness is available for free, a lower bound on the number of transmittedbits is given by (7.1.59). The condition in (7.1.57) on the message error prob-ability implies that the average error probability perr((E

′,D′); p), with p theuniform distribution over M′, satisfies

perr((E′,D′); p,N) = ∑

m∈M

1|M′| perr(m, (E′,D′),N) ≤ ε. (7.1.60)

Let us now use the expression in (6.1.103) to derive an exact expressionfor the average error probability. The expression in (6.1.103) is

perr(m, (E,D);N)

= 1− Tr[PmPm−1 · · · P1ωmB′1···B′|M|BR1···R|M| P1 · · · Pm−1Pm]

(7.1.61)

636

Page 648: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

for all m ∈ M′. First, observe that both NA′→B(ρA′B′) and ρB′ ⊗NA′→B(ρA′)are classical–quantum states. This means that, for every measurement opera-tor ΛBB′ , we find that

Tr[ΛBB′NA′→B(ρB′A′)] = ∑x∈X

r(x)Tr[ΛB′B(|x〉〈x|B′ ⊗ ρxB)] (7.1.62)

= ∑x∈X

r(x)Tr[(〈x|B′ ⊗ 1B)ΛB′B(|x〉B′ ⊗ 1B)ρxB] (7.1.63)

= ∑x∈X

r(x)Tr[MxBρx

B], (7.1.64)

where ρxB = NA′→B(ρ

xA′), and we have defined the operators

MxB := TrB′ [(|x〉〈x|B′ ⊗ 1B′)ΛB′B]. (7.1.65)

Similarly, letting ρB := ∑x∈X r(x)ρxB, we find that

Tr[ΛB′B(ρB′ ⊗NA′→B(ρA′))] = ∑x∈X

r(x)Tr[ΛB′B(|x〉〈x|B′ ⊗ ρB)] (7.1.66)

= ∑x∈X

r(x)Tr[MxBρB]. (7.1.67)

This implies that the measurement operator Λ∗B′B that achieves the optimalvalue for the quantity Dε−η

H (NA′→B(ρA′B′)‖ρB′ ⊗NA′→B(ρA′)) can be taken tohave the form

Λ∗B′B = ∑x∈X|x〉〈x|B′ ⊗Mx

B. (7.1.68)

Now using the fact that√

Λ∗B′B = ∑x∈X|x〉〈x|B′ ⊗

√Mx

B, (7.1.69)

the projectors ΠB′BR defined in (6.1.115) have the following form:

ΠB′BR = ∑x∈X|x〉〈x|B′ ⊗Πx

BR, (7.1.70)

where R is a reference system held by Bob to help with the decoding, and theprojector Πx

BR is given by

ΠxBR := (Ux

BR)† (1B ⊗ |1〉〈1|R)Ux

BR, (7.1.71)

UxBR :=

√1B −Mx

B ⊗ (|0〉〈0|R + |1〉〈1|R)

637

Page 649: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

+√

MxB ⊗ (|1〉〈0|R − |0〉〈1|R) . (7.1.72)

This in turn implies that the measurement operators Pi, which are used forthe sequential decoding and are defined in (6.1.99), have the form

Pi = ∑x1,...,x|M′ |∈X

|x〉〈x|B′1···B′|M′ | ⊗ Pxii , (7.1.73)

where |x〉 ≡ |x1, . . . , x|M′|〉 and

Pxii := 1R1 ⊗ · · · ⊗ 1Ri−1 ⊗Πxi

BRi⊗ 1Ri+1 ⊗ · · · ⊗ 1R|M′ | . (7.1.74)

Finally, since we can write the state τB′1···B′m···B′|M′ |Bin (7.1.55) as

τB′1···B′|M′ |B= ∑

x1,...,x|M′ |∈Xr(x1) · · · r(x|M′|)|x〉〈x|B′1···B′|M′ | ⊗ ρxm

B , (7.1.75)

we find that

ωB′1···B′|M′ |BR1···R|M′ |:= τB′1···B′|M′ |B

⊗ |0〉〈0|R1···R|M′ | (7.1.76)

= ∑x1,...,x|M′ |∈X

r(x1) · · · r(x|M′|)|x〉〈x|B′1···B′|M′ | ⊗ (ρxmB ⊗ |0〉〈0|R1···R|M′ |),

(7.1.77)

where |0〉 ≡ |0, . . . , 0〉. Therefore, by definition,

perr(m; (E′,D′))

= 1− Tr[PmPm−1 · · · P1ωB′1···B′|M′ |BR1···R|M′ | P1 · · · Pm−1Pm] (7.1.78)

= ∑x1,...,x|M′ |∈X

[r(x1) · · · r(x|M′|)

×(

1− Tr[Ωxmm (ρxm

B ⊗ |0〉〈0|R1···R|M′ |)])]

, (7.1.79)

for all m ∈M′, where

Ωxmm := Px1

1 · · · Pxm−1m−1 Pxm

m Pxm−1m−1 · · · P

x11 . (7.1.80)

Therefore, the average error probability is bounded as

perr((E′,D′); p)

638

Page 650: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

= ∑m∈M′

1|M′| ∑

x1,...,x|M′ |∈X

[r(x1) · · · r(x|M′|)

×(

1− Tr[Ωxmm (ρxm

B ⊗ |0〉〈0|R1···R|M′ |)])]

(7.1.81)

≤ ∑m∈M′

1|M′| ∑

x1,...,x|M′ |∈X

[r(x1) · · · r(x|M′|)

×(

γITr[(IB −MxmB )ρxm

B ] + γII

m−1

∑i=1

Tr[MxiB ρxm

B ]

)](7.1.82)

≤ ε, (7.1.83)

where γI := 1 + c and γII := 2 + c + c−1, with c = η2ε−η , and the inequality

on the last line holds due to (7.1.60). Exchanging the sum over M′ with thesum over the elements x1, . . . , x|M′| of X (in the spirit of the famous trick ofShannon), we find that

∑x1,...,x|M′ |∈X

r(x1) · · · r(x|M′|)perr(C; p)

≤ ∑x1,...,x|M′ |∈X

r(x1) · · · r(x|M′|)uerr(C; p) ≤ ε, (7.1.84)

where

perr(C; p,N) := ∑m∈M′

1|M′|

(1− Tr[Ωxm

m (ρxmB ⊗ |0〉〈0|R1···R|M′ |)]

)(7.1.85)

is the average error probability under a code C in which each message m isencoded as m 7→ xm 7→ ρxm

A and

uerr(C; p,N) := ∑m∈M′

1|M′|

(γI Tr[(IB −Mxm

B )ρxmB ] + γII

m−1

∑i=1

Tr[MxiB ρxm

B ]

)

(7.1.86)is an upper bound on the average error probability perr(C; p,N). The decod-ing is defined by the measurement operators Ωxm

m m∈M′ . Note that the codeC is a random variable, in the sense that the string x1, . . . , x|M′| of length |M′|is used for the encoding and decoding with probability r(x1) · · · r(x|M′|).

Since the minimum does not exceed the average, the inequality in (7.1.84)implies that there exists a code C∗, with corresponding string x∗1 , . . . , x∗|M′|,such that

uerr(C∗; p,N) ≤ ε, (7.1.87)

639

Page 651: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

and in turn, via Theorem 6.7, that

perr(C∗; p,N) ≤ uerr(C

∗; p,N) ≤ ε. (7.1.88)

By choosing this particular code, we can now follow through the entire ar-gument above without the shared randomness (in the form of the state ρA′B′)in order to conclude that with the code C∗, the number of transmitted bits isgiven by (7.1.59), and the average error probability of the code is boundedfrom above by ε. This completes the derandomization part of the proof.

Finally, we are interested in a code, call it (E,D), satisfying the maximalerror probability criterion p∗err(E,D;N) ≤ ε instead of the average error prob-ability criterion. To find such a code, we can apply expurgation to the code C∗

defined above. Formally, this means the following: since we have a code sat-isfying (7.1.87), by Markov’s inequality (see (2.3.19)), half of the codewords inC∗ (call them c1, . . . , c |M′ |

2) satisfy

uerr(m;C∗) ≤ 2ε, (7.1.89)

for all m ∈M′ corresponding to the codewords c1, . . . , c |M′ |2

, where

uerr(m,C∗;N) := γITr[(IB −MxmB )ρxm

B ] + γII

m−1

∑i=1

Tr[MxiB ρxm

B ]. (7.1.90)

We thus define a new message set M ⊂ M′, with |M| = |M′|2 , by removing all

but those messages in M′ whose encodings are given by c1, . . . , c |M′ |2

. Let C de-

note the expurgated code. Due to the fact that all of the terms in uerr(m,C∗;N)are non-negative, we find for all m ∈M that

uerr(m,C;N) ≤ uerr(m,C∗;N), (7.1.91)

where

uerr(m,C;N) := γITr[(IB −McmB )ρcm

B ] + γII

m−1

∑i=1

Tr[MciB ρcm

B ]. (7.1.92)

Again applying the quantum union bound (Theorem 6.7), we then find that

perr(m,C;N) ≤ uerr(m,C;N), (7.1.93)

where

640

Page 652: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

perr(m,C;N) := 1−Tr[Pcm

m Pcm−1m−1 · · · P

c11 (ρcm

B ⊗ |0〉〈0|R1···R|M′ |)Pc11 · · · P

cm−1m−1 Pcm

m ]. (7.1.94)

We thus have a code (E,D) satisfying

p∗err(E,D;N) ≤ 2ε. (7.1.95)

Specifically, the encoding is given by m 7→ cm for all m ∈ M, and the decod-ing is given by the sequential decoding procedure consisting of sequentiallyapplying the binary measurements Pcm

m , Pcmm for all m ∈M and decoding as

message m as soon as the outcome Pcmm occurs.

Therefore, we can use (7.1.59) to obtain the following for the numberlog2 |M| of transmitted bits with the reduced message set:

log2 |M| = log2

( |M′|2

)= log2 |M′| − log2(2) (7.1.96)

= χε−ηH (N)− log2

(4ε

η2

)− log2(2) (7.1.97)

= χε−ηH (N)− log2

(8ε

η2

). (7.1.98)

Since ε and η are arbitrary, we have shown that for all ε ∈ (0, 1) and η ∈(0, ε), there exists an (|M|, 2ε) classical communication protocol satisfyinglog2 |M| = χ

ε−ηH (N) − log2

(8εη2

). By the substitution 2ε → ε, we can finally

say that for all ε ∈ (0, 1) and η ∈(0, ε

2), there exists an (|M|, ε) classical com-

munication protocol satisfying

log2 |M| = χε2−ηH (N)− log2

(4ε

η2

). (7.1.99)

This concludes the proof.

An immediate consequence of Propositions 7.5 and 4.69 is the followingtheorem.

Theorem 7.6 One-Shot Lower Bounds for Classical Communication

Let NA→B be a quantum channel. For all ε ∈ (0, 1), η ∈(0, ε

2), and

α ∈ (0, 1), there exists an (|M|, ε) classical communication protocol over

641

Page 653: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

NA→B such that

log2 |M| ≥ χα(N) +α

α− 1log2

(1

ε2 − η

)− log2

(4ε

η2

). (7.1.100)

Here,χα(N) := sup

ρXA

Iα(X; B)ω, (7.1.101)

where ωXB = NA→B(ρXA), the state ρXA is a classical–quantum state,and

Iα(X; B)ω := Dα(ωXB‖ωX ⊗ωB). (7.1.102)

REMARK: The quantity χα(N) defined in the statement of Theorem 7.6 above is similar tothe quantity χα(N) defined in (4.11.94), except that it is defined with respect to the mutualinformation Iα(X; B)ω that we encountered in Theorem 6.9, which does not involve anoptimization over states σB.

PROOF: From Proposition 7.5, we know that for all ε ∈ (0, 1) and η ∈(0, ε

2),

there exists an (|M|, ε) classical communication protocol such that

log2 |M| = χε2−ηH (N)− log2

(4ε

η2

). (7.1.103)

Proposition 4.69 relates the hypothesis testing relative entropy to the Petz–Rényi relative entropy according to

DεH(ρ‖σ) ≥ Dα(ρ‖σ) +

α

α− 1log2

(1ε

)(7.1.104)

for all α ∈ (0, 1), which implies that

χεH(N) ≥ χα(N) +

α

α− 1log2

(1ε

). (7.1.105)

Combining this inequality with (7.1.103), we immediately get the desired re-sult.

Since the inequality in (7.1.100) holds for every (|M|, ε) classical commu-nication protocol, we have that

Cε(N) ≥ χα(N) +α

α− 1log2

(1

ε2 − η

)− log2

(4ε

η2

)(7.1.106)

for all α ∈ (0, 1), ε ∈ (0, 1), and η ∈(0, ε

2).

642

Page 654: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

E

N

N

...

N

N

M 3 m

A1

A2

...An−1

An

B1

B2

...Bn−1

Bn

m

Alice Bob

FIGURE 7.3: The most general classical communication protocol over amultiple number n ≥ 1 uses of a quantum channel N. Alice, who wishesto send a message m selected from a set M, first encodes the message into aquantum state on n quantum systems using a classical–quantum encodingchannel E. She then sends each quantum system through the channel N.After Bob receives the systems, he performs a collective measurement onthem, using the outcome of the measurement to give an estimate m of themessage m sent to him by Alice.

7.2 Classical Capacity of a Quantum Channel

Let us now consider the asymptotic setting of classical communication, asdepicted in Figure 7.3. Similar to entanglement-assisted classical communi-cation, instead of encoding the message into one quantum system and conse-quently using the channel N only once, Alice encodes the message into n ≥ 1quantum systems A1, . . . , An, all with the same dimension as A, and sendseach one of these through the channel N. We call this the asymptotic settingbecause the number n of channel uses can be arbitrarily large.

Recall that in the case of entanglement-assisted classical communication,we showed that encoding channels that entangle the n systems A1, . . . , An donot help to achieve higher rates in the asymptotic setting. This is due to theadditivity of the mutual information and the additivity of the sandwichedRényi mutual information of a channel for all channels and α > 1. In thecase of classical communication that we consider in this chapter, it turns outthat, so far, such a statement is known to be generally false for the Holevoinformation of a quantum channel (please consult the Bibliographic Notes inSection 7.5). That is, in principle there exists a channel for which the Holevoinformation is not additive. Therefore, unlike entanglement-assisted classicalcommunication, concrete expressions for the classical capacity exist only for

643

Page 655: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

specific classes of channels.

The analysis of the classical communication protocol in the asymptoticsetting is almost exactly the same as in the one-shot setting. This is due to thefact that n independent uses of the channel N can be regarded as a single useof the channel N⊗n. So the only change that needs to be made is to replace N

with N⊗n and to define the states and POVM elements as acting on n systemsinstead of just one. In particular, the state at the end of the protocol presentedin (7.1.8)–(7.1.9) at the beginning of Section 7.1 is

ωpMM

= (DBn→M N⊗nA→B EM′→An)(Φp

MM′), (7.2.1)

where p is the prior probability distribution over the message set M, the en-coding channel EM′→An is defined as

EM′→An(|m〉〈m|M′) = ρmAn ∀ m ∈M, (7.2.2)

and the decoding channel DBn→M, with associated POVM ΛmBnm∈M, is de-

fined asDBn→M(τBn) = ∑

m∈MTr[Λm

BnτBn ]|m〉〈m|M. (7.2.3)

Then, for every given code specified by the encoding and decoding channels,the definitions of the message error probability of the code, the average er-ror probability of the code, and the maximal error probability of the code allfollow analogously from their definitions in (7.1.11), (7.1.13), and (7.1.15), re-spectively, in the one-shot setting.

Definition 7.7 (n, |M|, ε) Classical Communication Protocol

A classical communication protocol (M,EM→An ,DBn→M) over n usesof the channel NA→B is called an (n, |M|, ε) protocol, with ε ∈ [0, 1], ifp∗err(E,D) ≤ ε.

Just as in the case of entanglement-assisted classical communication, therate of a classical communication protocol over n uses of a channel is simplythe number of bits that can transmitted per channel use, i.e.,

R(n, |M|) :=1n

log2 |M|. (7.2.4)

Given a channel NA→B and ε ∈ [0, 1], the maximum rate of classical commu-

644

Page 656: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

nication over N among all (n, |M|, ε) protocols is

Cn,ε(N) :=1n

Cε(N⊗n) = sup(M,E,D)

1n

log2 |M| : p∗err(E,D;N⊗n) ≤ ε

, (7.2.5)

where the optimization is with respect to every classical communication pro-tocol (M,EM′→A,DB→M) over N⊗n, with dM′ = dM = |M|.

As with entanglement-assisted classical communication, the goal of a clas-sical communication protocol is to maximize the rate while at the same timekeeping the maximal error probability low. Ideally, we would want the errorprobability to vanish, and since we want to determine the highest possiblerate, we are not concerned about the practical question regarding how manychannel uses might be required, at least in the asymptotic setting. In particu-lar, as we will see below, it might take an arbitrarily large number of channeluses to obtain the highest rate with a vanishing error probability.

Definition 7.8 Achievable Rate for Classical Communication

Given a quantum channel N, a rate R ∈ R+ is called an achievable rate forclassical communication over N if for all ε ∈ (0, 1], δ > 0, and sufficientlylarge n, there exists an (n, 2n(R−δ), ε) classical communication protocol.

As we prove in Appendix A,

R acheivable rate ⇐⇒ limn→∞

ε∗C(2n(R−δ);N⊗n) = 0 ∀ δ > 0. (7.2.6)

In other words, a rate R is achievable if the optimal error probability for asequence of protocols with rate R − δ, δ > 0, vanishes as the number n ofuses of N increases.

Definition 7.9 Classical Capacity of a Quantum Channel

The classical capacity of a quantum channel N, denoted by C(N), is de-fined as the supremum of all achievable rates, i.e.,

C(N) := supR : R is an achievable rate for N. (7.2.7)

The classical capacity can also be written as

C(N) = infε∈(0,1]

lim infn→∞

1n

Cε(N⊗n). (7.2.8)

645

Page 657: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

See Appendix A for a proof.

Definition 7.10 Weak Converse Rate for Classical Communication

Given a quantum channel N, a rate R ∈ R+ is called a weak converse ratefor classical communication over N if every R′ > R is not an achievable ratefor N.

We show in Appendix A that

R weak converse rate ⇐⇒ limn→∞

ε∗C(2n(R−δ);N⊗n) > 0 ∀ δ > 0. (7.2.9)

In other words, a weak converse rate is a rate above which the optimal errorprobability cannot be made to vanish in the limit of a large number of channeluses.

Definition 7.11 Strong Converse Rate for Classical Communication

Given a quantum channel N, a rate R ∈ R+ is called a strong converse ratefor classical communication over N if for all ε ∈ [0, 1), δ > 0, and sufficientlylarge n, there does not exist an (n, 2n(R+δ), ε) classical communicationprotocol over N.

We show in Appendix A that

R strong converse rate ⇐⇒ limn→∞

ε∗C(2n(R+δ);N⊗n) = 1 ∀ δ > 0. (7.2.10)

In other words, unlike the weak converse, in which the optimal error proba-bility is required to simply be bounded away from zero as the number n ofchannel uses increases, in order to have a strong converse rate the optimalerror has to converge to one as n increases. By comparing (7.2.9) and (7.2.10),it is clear that every strong converse rate is a weak converse rate.

Definition 7.12 Strong Converse Classical Capacity of a QuantumChannel

The strong converse classical capacity of a quantum channel N, denoted byC(N), is defined as the infimum of all strong converse rates,i.e.,

C(N) := infR : R is a strong converse rate for N. (7.2.11)

646

Page 658: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

As shown in general in Appendix A, the following inequality holds

C(N) ≤ C(N) (7.2.12)

for every quantum channel N. We can also write the strong converse classicalcapacity as

C(N) = supε∈[0,1)

lim supn→∞

1n

Cε(N⊗n). (7.2.13)

See Appendix A for a proof.

Having defined the classical capacity of a quantum channel, as well asthe strong converse capacity, we now state one of the main theorems of thischapter, which gives us a formal expression for the classical capacity of everyquantum channel.

Theorem 7.13 Classical Capacity of a Quantum Channel

The classical capacity of a quantum channel N is equal to its regularizedHolevo information χreg(N) of N, i.e.,

C(N) = χreg(N) := limn→∞

1n

χ(N⊗n). (7.2.14)

REMARK: The quantity χreg(N) := limn→∞1n χ(N⊗n) is called the regularization of the

Holevo information. It can be shown that the limit in the definition of χreg(N) does indeedexist (please consult the Bibliographic Notes in Section 7.5).

Note that, unlike the case of entanglement-assisted classical communica-tion in Chapter 6, the right-hand side of (7.2.14) does not depend on only asingle use of the channel N. Rather, the capacity formula involves a limit overan arbitrarily large number of uses of the channel and is essentially impossi-ble to compute in general, firstly because of the difficulty of computing theHolevo information of a channel and secondly due to the limit over an arbi-trarily large number of uses of the channel. The issue is essentially that theHolevo information of a channel is not known to be additive for all channels,while the mutual information of a channel is known to have this property(we show this in Theorem 6.19). However, as we show below in Section 7.2.3,the Holevo information is superadditive, meaning that χ(N⊗n) ≥ nχ(N). Thisimplies that the Holevo information is always a lower bound on the quantumcapacity of every channel N:

C(N) ≥ χ(N) for all channels N. (7.2.15)

647

Page 659: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Channels for which the Holevo information is known to be additive includethe following:

1. All entanglement-breaking channels. (See Definition 3.30.)

2. All Hadamard channels. (See Definition 3.34.)

3. The depolarizing channel. (See (3.2.112).)

4. The erasure channel. (See (3.2.99).)

For all of these channels, we thus have that C(N) = χ(N).

Also notice that, unlike Theorem 6.16 for entanglement-assisted classicalcommunication, Theorem 7.13 only makes a statement about the classical ca-pacity C(N) of all channels, not about the strong converse classical capacityC(N). In the case of entanglement-assisted classical communication, prov-ing that the mutual information of a channel is a strong converse rate in-volved proving that the sandwiched Rényi mutual information is additivefor all channels, which we established in Theorem 6.22. Similarly, in the caseof classical communication and attempting to follow a similar approach, therelevant quantity is the sandwiched Rényi Holevo information χα, defined in(4.11.95). Unlike the sandwiched Rényi mutual information, the sandwichedRényi Holevo information is not known to be additive for all channels. How-ever, as we show in Section 7.2.3.1, it is additive for all entanglement-breakingchannels. It is also additive for Hadamard and depolarizing channels (pleaseconsult the Bibliographic Notes in Section 7.5). The best we can do, at themoment, is to say that the regularized Holevo information is a weak converserate for all channels.

There are two ingredients to the proof of Theorem 7.13:

1. Achievability: We show that χreg(N) is an achievable rate. In general, toshow that R ∈ R+ is achievable, we define encoding and decoding chan-nels such that for all ε ∈ (0, 1] and sufficiently large n, the encoding anddecoding channels correspond to (n, 2nr, ε) protocols with rates r < R, asper Definition 7.8. Thus, if R is an achievable rate, then, given an errorprobability ε, it is possible to find an n large enough, along with encodingand decoding channels, such that the resulting protocol has rate arbitrar-ily close to R and maximal error probability bounded from above by ε.

The achievability part of the proof establishes that C(N) ≥ χreg(N).

2. Weak Converse: We show that χreg(N) is a weak converse rate, from which

648

Page 660: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

it follows that C(N) ≤ χreg(N). To show that χreg(N) is a weak converserate, we show that every achievable rate r satisfies r ≤ χreg(N).

The achievability and weak converse proofs establish that the classical ca-pacity is equal to the regularized Holevo information: C(N) = χreg(N). The-orem 7.13 and the inequality in (7.2.12) allow us to conclude that

C(N) ≥ limn→∞

1n

χ(N⊗n). (7.2.16)

We first establish in Section 7.2.1 that the rate χreg(N) is achievable forclassical communication over N. Then, in Section 7.2.2, we prove that χreg(N)is a weak converse rate. We prove that the sandwiched Rényi Holevo infor-mation of a entanglement-breaking channel is additive in Section 7.2.3. Withthis additivity result, we prove in Section 7.2.4 that C(N) = C(N) = χ(N) forall entanglement-breaking channels.

7.2.1 Proof of Achievability

In this section, we prove that χreg(N) is an achievable rate for classical com-munication over N.

First, recall from Theorem 7.6 that for all ε ∈ (0, 1) and η ∈ (0, ε2), there

exists an (|M|, ε) classical communication protocol over N such that

log2 |M| ≥ χα(N) +α

α− 1log2

(1

ε2 − η

)− log2

(4ε

η2

)(7.2.17)

for all α ∈ (0, 1), where we recall from (7.1.101) that

χα(N) = supρXA

Iα(X; B)ω (7.2.18)

= supρXA

Dα(NA→B(ρXA)‖ρX ⊗NA→B(ρA)), (7.2.19)

where ωXB := NA→B(ρXA) and the optimization is over all classical–quantumstates ρXA. A simple corollary of this result is the following.

649

Page 661: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Corollary 7.14 Lower Bound for Classical Communication inAsymptotic Setting

Let N be a quantum channel. For all ε ∈ (0, 1], n ∈ N, and α ∈ (0, 1),there exists an (n, |M|, ε) classical communication protocol over n usesof N such that

1n

log2 |M| ≥ χα(N)− 1n(1− α)

log2

(4ε

)− 4

n. (7.2.20)

PROOF: The inequality (7.2.17) holds for every channel N, which means thatit holds for N⊗n. Applying the inequality in (7.2.17) to N⊗n and dividing bothsides by n, we obtain

1n

log2 |M| ≥1n

χα(N⊗n) +

α

n(α− 1)log2

(1

ε2 − η

)− 1

nlog2

(4ε

η2

)(7.2.21)

for all α ∈ (0, 1). By restricting the optimization in the definition of χα(N⊗n)

to tensor-power states, we find that χα(N⊗n) ≥ nχα(N). This follows from

the additivity of the Petz–Rényi relative entropy under tensor-product states(see Proposition 4.21). So we obtain

1n

log2 |M| ≥ χα(N) +α

n(α− 1)log2

(1

ε2 − η

)− 1

nlog2

(4ε

η2

)(7.2.22)

for all α ∈ (0, 1). Now, letting η = ε4 and using the fact that α− 1 is negative

for α ∈ (0, 1), the following inequality holds for all α ∈ (0, 1):

1n

log2 |M| ≥ χα(N)− 1n(1− α)

log2

(4ε

)− 4

n. (7.2.23)

In other words, for all ε ∈ (0, 1], there exists an (n, |M|, ε) classical communi-cation protocol such that (7.2.20) is satisfied. This concludes the proof.

The inequality in (7.2.20) gives us, for every ε ∈ (0, 1] and n ∈ N, a lowerbound on the rate of a corresponding (n, |M|, ε) classical communication pro-tocol, which is known to exist due to Proposition 7.5. If instead we fix a par-ticular communication rate R by letting |M| = 2nR, then we can rearrange theinequality in (7.2.20) to obtain an exponentially decaying upper bound on themaximal error probability of the corresponding (n, 2nR, ε) classical communi-cation protocol. Specifically, we find that

ε ≤ 4 · 2−(1−α)(χα(N)−R− 4n) (7.2.24)

650

Page 662: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

for all α ∈ (0, 1).

The inequality in (7.2.20) implies that

Cn,ε(N) ≥ χα(N)− 1n(α− 1)

log2

(4ε

)− 4

n(7.2.25)

for all n ≥ 1, ε ∈ (0, 1), and α ∈ (0, 1).

We can now use (7.2.20) to prove that χreg(N) is an achievable rate forclassical communication over N.

Proof of the Achievability Part of Theorem 7.13

Fix ε ∈ (0, 1] and δ > 0. Let δ1, δ2 > 0 be such that

δ = δ1 + δ2. (7.2.26)

Set α ∈ (0, 1) such thatδ1 ≥ χ(N)− χα(N), (7.2.27)

which is possible because χα(N) is monotonically increasing in α (this followsfrom Proposition 4.21), because limα→1− χα(N) = χ(N) (the proof of this isanalogous to the one presented in Appendix 6.B). With this value of α, take nlarge enough so that

δ2 ≥1

n(1− α)log2

(4ε

)+

4n

. (7.2.28)

Now, making use of the inequality in (7.2.20) of Corollary 7.14, there existsan (n, |M|, ε) protocol, with n and ε chosen as above, such that

1n

log2 |M| ≥ χα(N)− 1n(1− α)

log2

(4ε

)− 4

n. (7.2.29)

Rearranging the right-hand side of this inequality, and using (7.2.26)–(7.2.28),we find that

1n

log2 |M| ≥ χ(N)−(

χ(N)− χα(N) +1

n(1− α)log2

(4ε

)+

4n

)(7.2.30)

≥ χ(N)− (δ1 + δ2) (7.2.31)= χ(N)− δ. (7.2.32)

651

Page 663: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

We thus have χ(N) − δ ≤ 1n log2 |M|. Recall that if an (n, |M|, ε) protocol

exists, then an (n, |M′|, ε) also exists for all M′ satisfying |M′| ≤ |M|. Wethus conclude that there exists an (n, 2n(R−δ), ε) classical communication withR = χ(N) for all sufficiently large n such that (7.2.28) holds. Since ε and δ arearbitrary, we conclude that, for all ε ∈ (0, 1], δ > 0, and sufficiently large n,there exists an (n, 2n(χ(N)−δ), ε) classical communication protocol. This meansthat χ(N) is an achievable rate, and thus that C(N) ≥ χ(N).

Now, we can repeat the arguments above for the tensor-power channel N⊗k

with k ≥ 1, and we conclude that 1k χ(N⊗k) is an achievable rate. Since this

holds for all k, we conclude that limk→∞1k χ(N⊗k) = χreg(N) is an achievable

rate. Therefore, C(N) ≥ χreg(N).

7.2.1.1 Achievability from a Different Point of View

Using arguments similar to those given in Appendix 6.C, we can make thefollowing statement: there exists a sequence (n, 2nRn , εn)n∈N of (n, |M|, ε)classical communication protocols over N, such that lim infn→∞ Rn ≥ χ(N)and limn→∞ εn = 0. If we consider a sequence (n, 2nR, εn)n∈N of (n, |M|, ε)classical communication protocols, this time keeping the rate at an arbitrary(but fixed) value R < χ(N) and varying the error probability, we concludethat there exists a sequence of protocols for which the error probabilities εnapproach zero exponentially fast as n→ ∞.

7.2.2 Proof of the Weak Converse

We now show that the regularized Holevo information χreg(N) is a weak con-verse rate. The result is to establish that C(N) ≤ χreg(N) and therefore thatC(N) = χreg(N), completing the proof of Theorem 7.13.

Let us first recall from Theorem 7.4 that for every quantum channel N wehave the following: for all ε ∈ [0, 1) and (|M|, ε) classical communicationprotocols over N,

log2 |M| ≤1

1− ε(χ(N) + h2(ε)) , (7.2.33)

log2 |M| ≤ χα(N) +α

α− 1log2

(1

1− ε

), ∀ α > 1. (7.2.34)

652

Page 664: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

E PσBn...M 3 m

A1

A2

...An−1

An

B1

B2

...Bn−1

Bn

m

Alice Bob

FIGURE 7.4: Depiction of a protocol that is useless for classical communi-cation in the asymptotic setting. The state encoding the message m via E

is discarded and replaced by an arbitrary (but fixed) state σBn .

To obtain these inequalities, we considered a classical communication proto-col over a useless channel and used the hypothesis testing relative entropy tocompare this protocol with the actual protocol over the channel N. The use-less channel in the asymptotic setting is analogous to the one in Figure 7.2 andis shown in Figure 7.4. A simple corollary of Theorem 7.4, which is relevantfor the asymptotic setting, is the following.

Corollary 7.15 Upper Bounds for Classical Communication inAsymptotic Setting

Let N be a quantum channel. For all ε ∈ [0, 1), n ∈ N, and (n, |M|, ε)classical communication protocols over n uses of N, the rate of transmit-ted bits is bounded from above as follows:

log2 |M|n

≤ 11− ε

(1n

χ(N⊗n) +1n

h2(ε)

), (7.2.35)

log2 |M|n

≤ 1n

χα(N⊗n) +

α

n(α− 1)log2

(1

1− ε

)∀ α > 1. (7.2.36)

PROOF: Since the inequalities in (7.2.33) and (7.2.34) of Theorem 7.4 hold forevery channel N, they hold for the channel N⊗n. Therefore, applying (7.2.33)and (7.2.34) to N⊗n and dividing both sides by n, we immediately obtain thedesired result.

The inequalities in the corollary above give us, for every ε ∈ [0, 1) andn ∈ N, an upper bound on the size |M| of the message set we can take for

653

Page 665: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

an arbitrary (n, |M|, ε) classical communication protocol. If instead we fix aparticular communication rate R by letting |M| = 2nR, then we can obtain alower bound on the maximal error probability of an arbitrary (n, 2nR, ε) clas-sical communication protocol. Specifically, using (7.2.36), we find that

ε ≥ 1− 2−n( α−1α )(R− 1

n χα(N⊗n)) (7.2.37)

for all α > 1.

The inequalities in (7.2.35) and (7.2.36) imply that

Cn,ε(N) ≤ 11− ε

(1n

χ(N⊗n) +1n

h2(ε)

), (7.2.38)

Cn,ε(N) ≤ 1n

χα(N⊗n) +

α

n(α− 1)log2

(1

1− ε

)∀ α > 1, (7.2.39)

where n ≥ 1 and ε ∈ (0, 1).

Using (7.2.35), we can now prove the weak converse part of Theorem 7.13.

Proof of the Weak Converse Part of Theorem 7.13

Suppose that R is an achievable rate. Then, by definition, for all ε ∈ (0, 1],δ > 0, and sufficiently large n, there exists an (n, 2n(R−δ), ε) classical com-munication protocol over N. For all such protocols, the inequality (7.2.35) inCorollary 7.15 holds, so that

R− δ ≤ 11− ε

(1n

χ(N⊗n) +1n

h2(ε)

). (7.2.40)

Since this bound holds for all n, it holds in the limit n→ ∞, so that

R ≤ limn→∞

11− ε

(1n

χ(N⊗n) +1n

h2(ε)

)+ δ (7.2.41)

=1

1− εlim

n→∞

1n

χ(N⊗n) + δ. (7.2.42)

Then, since this inequality holds for all ε, δ > 0, we then conclude that

R ≤ limε,δ→0

1

1− εlim

n→∞

1n

χ(N⊗n) + δ

= lim

n→∞

1n

χ(N⊗n). (7.2.43)

654

Page 666: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

We have thus shown that if R is an achievable rate, then R ≤ χreg(N). Thecontrapositive of this statement is that if R > χreg(N), then R is not an achiev-able rate. By definition, therefore, χreg(N) is a weak converse rate.

Recall that Theorem 7.13 only gives an expression for the capacity C(N),and not for the strong converse capacity C(N). The sandwiched Rényi Holevoinformation χα(N) of a channel N can be used to obtain the upper bound in(7.2.36), holding for every (n, |M|, ε) protocol. This inequality then leads toan expression for the strong converse capacity in the case that χα(N) happensto be additive for N. We now, therefore, address this question regarding theadditivity of the sandwiched Rényi Holevo information.

7.2.3 The Additivity Question

Although we have shown that the classical capacity C(N) of a channel N isgiven by the regularized Holevo information χreg(N) = limn→∞

1n χ(N⊗n), as

mentioned earlier, without the additivity of χ(N) this result is not particu-larly helpful since it is not known how to compute the regularized Holevoinformation in general.

Note, however, that for every two channels N1 and N2 we always have thesuperadditivity of the Holevo information, i.e.,

χ(N1 ⊗N2) ≥ χ(N1) + χ(N2). (7.2.44)

This follows by performing exactly the same steps in (6.2.44)–(6.2.52), butwith the systems R1 and R2 therein taken to be classical systems. There-fore, to prove the additivity of χ for a channel N, it suffices to show thatχ(N⊗M) ≤ χ(N) + χ(M).

Similarly, the sandwiched Rényi Holevo information is superadditive; i.e.,for all α ≥ 1 and every two channels N1 and N2, it holds that

χα(N1 ⊗N2) ≥ χα(N1) + χα(N2). (7.2.45)

First, recall thatχα(N) = sup

ρXA

Iα(X; B)ω, (7.2.46)

where ωXB = NA→B(ρXA), and where we optimize over classical–quantumstates ρXA = ∑x∈X p(x)|x〉〈x|X ⊗ ρx

A, with X a finite alphabet with associated

655

Page 667: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

|X|-dimensional system X and ρxAx∈X a set of states. Also, recall that for

every state ρAB,Iα(A; B)ρ = inf

σBDα(ρAB‖ρA ⊗ σB). (7.2.47)

Now, the proof of (7.2.45) proceeds similarly to the proof of the corre-sponding inequality (7.2.44) for the Holevo information. By restricting theoptimization in the definition of χα(N1 ⊗ N2) to product states, and lettingρ′X1X2B1B2

be defined as

ρ′X1X2B1B2:= ((N1)A1→B1 ⊗ (N2)A2→B2)(ρX1X2 A1 A2), (7.2.48)

for a classical–quantum state ρX1X2 A1 A2 (X systems classical and A systemsquantum), we find that

χα(N1 ⊗N2) = supρ

Iα(X1X2; B1B2)ρ′ (7.2.49)

≥ supτ⊗ω

Iα(X1X2; B1B2)ξ ′⊗ω′ , (7.2.50)

where τ′X1B1:= (N1)A1→B1(ξX1 A2) and ω′X2B2

:= (N2)A2→B2(ωX2 A2). Proposi-tion 6.21 states that the sandwiched Rényi mutual information Iα is additivefor product states, meaning that

Iα(A1A2; B1B2)τ⊗ω = Iα(A1; B1)τ + Iα(A2; B2)ω (7.2.51)

for every two states τA1B1 ⊗ωA2B2 . Using this, we find that

χα(N1 ⊗N2) ≥ supτ,ω

Iα(X1; B1)τ′ + Iα(X2; B2)ω′

(7.2.52)

= supτ

Iα(X1; B1)τ′ + supω

Iα(X2; B2)ω′ (7.2.53)

= χα(N1) + χα(N2), (7.2.54)

i.e.,χα(N1 ⊗N2) ≥ χα(N1) + χα(N2), (7.2.55)

as required.

We see that in order to show the additivity of the sandwiched Rényi Holevoinformation for N, it suffices to show subadditivity for N, i.e.,

χα(N⊗n) ≤ nχα(N) ∀ n ≥ 1. (7.2.56)

We now show that subadditivity, and thus additivity, of the sandwiched RényiHolevo information holds for all entanglement-breaking channels.

656

Page 668: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

7.2.3.1 Entanglement-Breaking Channels

In this section, we prove that the sandwiched Rényi Holevo information isadditive for all entanglement-breaking channels.

Theorem 7.16 Additivity of χα for Entanglement-Breaking Chan-nels

For an entanglement-breaking channel N and an arbitrary channel M,the following equality holds for all α > 1:

χα(N⊗M) = χα(N) + χα(M). (7.2.57)

The proof of this theorem relies on two lemmas, the first of which statesthat the sandwiched Rényi Holevo information χα(N) of a channel N is equalto a quantity Kα(N), called the sandwiched Rényi information radius of N.

Lemma 7.17

For every quantum channel N and α > 1, the following equality holds

χα(N) = infσ

supρ

Dα(N(ρ)‖σ) =: Kα(N), (7.2.58)

where the optimizations are over states ρ and σ. The quantity Kα(N) iscalled the sandwiched Rényi information radius of N.

PROOF: To prove this lemma, we show that χα(N) ≤ Kα(N) and χα(N) ≥Kα(N).

First, using the definition in (4.11.95) of χα(N), we find that for everystate τB,

χα(N)

= supρXA

Iα(X; B)ω (7.2.59)

= supρXA

infσB

(∑

x∈Xp(x)|x〉〈x|X ⊗N(ρx

A)

∥∥∥∥∥ ∑x∈X

p(x)|x〉〈x|X ⊗ σB

)(7.2.60)

≤ supρXA

(∑

x∈Xp(x)|x〉〈x|X ⊗N(ρx

A)

∥∥∥∥∥ ∑x∈X

p(x)|x〉〈x|X ⊗ τB

), (7.2.61)

657

Page 669: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

where ωXB = NA→B(ρXA) and the supremum is over all classical–quantumstates ρXA of the form ρXA = ∑x∈X p(x)|x〉〈x|X ⊗ ρx

A, with X a finite alphabetwith associated |X|-dimensional quantum system X and ρx

Ax∈X is a set ofstates. Now, recall from (4.5.174) that the sandwiched Rényi relative entropyis jointly quasi-convex for α > 1 and invariant under tensoring in the samestate |x〉〈x|, which implies that

χα(N) ≤ supρXA

(∑

x∈Xp(x)|x〉〈x|X ⊗N(ρx

A)

∥∥∥∥∥ ∑x∈X

p(x)|x〉〈x|X ⊗ τB

)(7.2.62)

≤ supρXA

maxx∈X

Dα(N(ρxA)‖τB) (7.2.63)

≤ supρA

Dα(N(ρA)‖τB). (7.2.64)

The final inequality above holds for every state τB, which implies that

χα(N) ≤ infτB

supρA

Dα(N(ρA)‖τB) = Kα(N), (7.2.65)

i.e., χα(N) ≤ Kα(N).

We now show that Kα(N) ≤ χα(N). First, consider that

Kα(N) = infσB

supρA

Dα(N(ρA)‖σB) (7.2.66)

=1

α− 1infσB

supρA

log2 Qα(N(ρA)‖σB) (7.2.67)

=1

α− 1log2 inf

σBsupρA

Qα(N(ρA)‖σB). (7.2.68)

Now, by taking a supremum over all probability measures µ on the set of allstates ρA, we find that

supρA

Qα(N(ρA)‖σB) ≤ supµ

∫Qα(N(ρA)‖σB) dµ(ρA). (7.2.69)

So we have that

Kα(N) ≤ 1α− 1

log2 infσB

supµ

∫Qα(N(ρA)‖σB) dµ(ρA). (7.2.70)

658

Page 670: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

We now apply the Sion minimax theorem (Theorem 2.18) to exchange infσBand supµ. This theorem is applicable because the function

(µ, σB) 7→∫

Qα(N(ρA)‖σB) dµ(ρA) (7.2.71)

is linear in the measure µ and convex in the states σB. The latter is indeed truebecause

Qα(N(ρA)‖σB) =

∥∥∥∥N(ρ)12 σ

1−αα

B N(ρA)12

∥∥∥∥α

α

, (7.2.72)

for all α > 1 the function σB 7→ σ1−α

αB is operator convex, the Schatten norm

‖·‖α is convex, and the function x 7→ xα is convex for all x ≥ 0. So we findthat

Kα(N) ≤ 1α− 1

log2 supµ

infσB

∫Qα(N(ρA)‖σB) dµ(ρA). (7.2.73)

Now, by Carathéodory’s theorem (Theorem 2.17), if ρ is a density operatoracting on a d-dimensional space, then there exists an alphabet X of size nomore than d2, a probability distribution p : X → [0, 1] on X, and an ensemble(p(x), ρx

A)x∈X of states such that∫

Qα(N(ρA)‖σB) dµ(ρ) = ∑x∈X

p(x)Qα(N(ρxA)‖σB). (7.2.74)

Therefore,

Kα(N) ≤ 1α− 1

log2 sup(p(x),ρx

A)x

infσB

∑x∈X

p(x)Qα(N(ρxA)‖σB) (7.2.75)

=1

α− 1log2 sup

(p(x),ρxA)x

infσB

Qα(NA→B(ρXB)‖ρX ⊗ σB) (7.2.76)

= supρXA

infσB

Dα(ρXA‖ρX ⊗ σB) (7.2.77)

= supρXA

Iα(X; B)ω (7.2.78)

= χα(N), (7.2.79)

where ωXB = NA→B(ρXA), and to obtain the first equality we used the direct-sum property of Qα in (4.5.41). So we have Kα(N) ≤ χα(N) in addition toKα(N) ≥ χα(N), which means that Kα(N) = χα(N), as required.

659

Page 671: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Using the fact that limα→1+ χα(N) = χ(N) (the proof of this analogousto the one presented in Appendix 6.B), we obtain the following alternate for-mula for the Holevo information of a quantum channel N:

χ(N) = infσ

supρ

D(N(ρ)‖σ), (7.2.80)

where the optimizations are over states ρ and σ.

Before stating the following lemma, let us define an entanglement-breakingmap NA→B to be a completely positive map such that NA→B(XRA) is a sepa-rable operator of systems R and B for every positive semi-definite input oper-ator XRA. We can think of it as a generalization of an entanglement-breakingchannel (Definition 3.30) in which there is no requirement of trace preserva-tion.

Lemma 7.18

Let NA→B be a completely positive entanglement-breaking map, and letMA′→B′ be a completely positive map. Then, for all α ≥ 1, the followingequality holds

να(N⊗M) = να(N) · να(M), (7.2.81)

whereνα(P) := max

ρ‖P(ρ)‖α , (7.2.82)

for every completely positive map P and the supremum is taken overevery density operator ρ in the domain of P.

PROOF: We start with the observation that

να(N⊗M) ≥ να(N) · να(M) (7.2.83)

holds for every two completely positive maps N and M, simply by restrict-ing the optimization in the definition of να(N⊗M) to tensor-product states.Specifically,

να(N⊗M) = supρAA′‖(N⊗M)(ρAA′)‖α (7.2.84)

≥ supσA,ωA′

‖(N⊗M)(σA ⊗ωA′)‖α (7.2.85)

= supσA,ωA′

‖N(σA)⊗M(ωA′)‖α (7.2.86)

660

Page 672: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

= supσA,ωA′

‖N(σA)‖α · ‖M(ωA′)‖α (7.2.87)

= supσA

‖N(σA)‖α · supωA′‖M(ωA′)‖α (7.2.88)

= να(N) · να(M). (7.2.89)

We now show that the reverse inequality να(N ⊗M) ≤ να(N) · να(M)holds for an arbitrary completely positive map M and a completely positiveentanglement-breaking map N.

Since N is an entanglement breaking map, for every bipartite state ρAA′ ,the following equality holds

NA→B(ρAA′) = ∑x∈X

SxB ⊗ τx

A′ . (7.2.90)

where X is a finite alphabet, SxBx∈X is a set of positive semi-definite opera-

tors, and τxA′x∈X is a set of states. Then,

(NA→B ⊗MA′→B′)(ρAA′) = ∑x∈X

SxB ⊗MA′→B′(τ

xA′) = VTV†, (7.2.91)

whereV := ∑

x∈X〈x| ⊗

√Sx

B ⊗ 1B′ (7.2.92)

andT := ∑

x∈X|x〉〈x| ⊗ 1B ⊗MA′→B′(τ

xA′). (7.2.93)

This implies that

Tr[((NA→B ⊗MA′→B′)(ρAA′))

α] = Tr[(

VTV†)α]

. (7.2.94)

Now, let us apply the Araki–Lieb–Thirring inequality (Lemma 2.14), whichstates that for every two positive semi-definite operators X and Y acting on afinite-dimensional Hilbert space,

Tr[(

Y12 XY

12

)rq]≤ Tr

[(Y

r2 XrY

r2

)q](7.2.95)

for all q ≥ 0 and all r ≥ 1. For q = 1, we obtain

Tr[(

Y12 XY

12

)r]≤ Tr[XrYr] . (7.2.96)

661

Page 673: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Now, for every operator Z, note that ZXZ† has the same non-zero eigenval-ues as (Z†Z)

12 X(Z†Z)

12 (this follows by considering the polar decomposition

of Z). In addition, since Z†Z is positive semi-definite, applying (7.2.96) withY = Z†Z gives us

Tr[(

ZXZ†)r]

= Tr[(

(Z†Z)12 X(Z†Z)

12

)r]≤ Tr

[(Z†Z)rXr

]. (7.2.97)

Substituting r = α, Z = V, and X = T into this inequality gives us

Tr[((NA→B ⊗MA′→B′)(ρAA′))

α] = Tr[(

VTV†)α]

(7.2.98)

≤ Tr[(V†V)αTα

]. (7.2.99)

Letting

R := ∑x∈X〈x| ⊗

√Sx

B, (7.2.100)

observe thatV†V = R†R⊗ 1B′ , (7.2.101)

which implies that(V†V)α = (R†R)α ⊗ 1B′ . (7.2.102)

Therefore, since T, and thus Tα, is block diagonal, we find that

Tr[(V†V)αTα

]

= Tr

[((R†R)α ⊗ 1B′

)(∑

x∈X|x〉〈x| ⊗ 1B ⊗ (MA′→B′(τ

xA′))

α

)](7.2.103)

= ∑x∈X

Tr[(

(R†R)α)

x,x

]Tr[(MA′→B′(τ

xA′))

α] , (7.2.104)

where ((R†R)α

)x,x

:= (〈x| ⊗ 1B)(R†R)α(|x〉 ⊗ 1B). (7.2.105)

Now,

Tr[M(τxA′)

α]1α = ‖M(τx

A′)‖α ≤ να(M)⇒ Tr[M(τxA′)

α] ≤ να(M)α. (7.2.106)

Also, by taking the partial trace over A′ in (7.2.90), we find that

N(ρA) = ∑x∈X

SxB = RR†. (7.2.107)

662

Page 674: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Using this, we find that

∑x∈X

Tr[(

(R†R)α)

x,x

]= Tr

[(R†R)α

]

= Tr[(RR†)α

]

= Tr[N(ρA)α]

≤ να(N)α.

(7.2.108)

Therefore, putting everything together,

‖(NA→B ⊗MA′→B′)(ρAA′)‖α

= Tr[((NA→B ⊗M)A′→B′)(ρAA′))

α] 1α (7.2.109)

= Tr[(

VTV†)α] 1

α(7.2.110)

≤ Tr[(V†V)αTα

] 1α (7.2.111)

≤ να(N) · να(M). (7.2.112)

Since the state ρAA′ is arbitrary, we obtain

supρAA′‖(NA→B ⊗MA′→B′)(ρAA′)‖α = να(N⊗M) ≤ να(N) · να(M). (7.2.113)

Along with (7.2.83), we conclude that να(N ⊗M) = να(N) · να(M), as re-quired.

With Lemmas 7.17 and 7.18 in hand, we can now prove Theorem 7.16.

Proof of Theorem 7.16

We start by using (4.5.3) to write the definition of Kα(N) as

Kα(N) = infσB

supρA

Dα(N(ρA)‖σB) (7.2.114)

= infσB

supρA

α

α− 1log2

∥∥∥∥σ1−α2α

B N(ρA)σ1−α2α

B

∥∥∥∥α

. (7.2.115)

Then, for every channel M,

Kα(N⊗M)

663

Page 675: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

= infσA′B′

supρAB

Dα((N⊗M)(ρAB)‖σA′B′) (7.2.116)

α− 1inf

σA′B′supρAB

log2

∥∥∥∥σ1−α2α

A′B′ ((N⊗M)(ρAB)) σ1−α2α

A′B′

∥∥∥∥α

(7.2.117)

α− 1inf

σA′B′log2 sup

ρAB

∥∥∥∥σ1−α2α

A′B′ ((N⊗M)(ρAB)) σ1−α2α

A′B′

∥∥∥∥α

(7.2.118)

≤ α

α− 1inf

σA′ ,τB′log2 sup

ρAB

∥∥∥∥∥

1−α2α

A′ ⊗ τ1−α2α

B′

)((N⊗M)(ρAB))

1−α2α

A′ ⊗ τ1−α2α

B′

)∥∥∥∥∥α

,

(7.2.119)

where to obtain the inequality we have restricted the infimum to tensor prod-uct states. Now, observe that since N is entanglement-breaking, then sand-

wiching the output of the channel by the positive semi-definite operator σ1−α2α

A′leads to a new map N′ that is a completely positive entanglement-breakingmap (though not necessarily trace preserving). Similarly, sandwiching the

output of M by the positive semi-definite operator τ1−α2α

B′ leads to a new com-pletely positive map M′. Therefore, using Lemma 7.18, we obtain

Kα(N⊗M)

≤ α

α− 1inf

σA′ ,τB′log2

(supρAB

∥∥(N′ ⊗M′)(ρAB)∥∥

α

)(7.2.120)

α− 1inf

σA′ ,τB′log2 να(N

′ ⊗M′) (7.2.121)

α− 1inf

σA′ ,τB′log2

(να(N

′)να(M′))

(7.2.122)

α− 1inf

σA′ ,τB′

[log2 να(N

′) + log2 να(M′)]

(7.2.123)

= infσA′

α

α− 1log2 sup

ρA

∥∥N′(ρA)∥∥

α+ inf

τB′

α

α− 1log2 sup

ωB

∥∥M′(ωB)∥∥

α(7.2.124)

= infσA′

supρA

α

α− 1log2

∥∥N′(ρA)∥∥

α+ inf

τB′supωB

α

α− 1log2

∥∥M′(ωB)∥∥

α(7.2.125)

= Kα(N) + Kα(M). (7.2.126)

So we have that Kα(N⊗M) ≤ Kα(N) + Kα(M) for every channel M. UsingLemma 7.17, we obtain the desired result.

Note that the additivity of the Holevo information of every entanglement-breaking channel follows from the additivity of the sandwiched Rényi Holevo

664

Page 676: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

information of such channels by taking the limit α → 1+ (the proof is analo-gous to the one presented in Appendix 6.B).

7.2.4 Proof of the Strong Converse for Entanglement-Break-ing Channels

Having shown that the sandwiched Rényi Holevo information is additivefor all entanglement-breaking channels, we can now proceed further from(7.2.36) to prove a strong converse theorem for all entanglement-breakingchannels. Moreover, since the sandwiched Rényi Holevo information χα(N)satisfies limα→1+ χα(N) = χ(N) (the proof of this is analogous to the one pre-sented in Appendix 6.B), we can go beyond the statement of Theorem 7.13and say that C(N) = χ(N) for all entanglement-breaking channels N.

Theorem 7.19 Classical Capacity of Entanglement-Breaking Chan-nels

For every entanglement-breaking channel N,

C(N) = C(N) = χ(N). (7.2.127)

REMARK: Note that this theorem holds more generally for every channel N for which thesandwiched Rényi Holevo information χα(N) is additive.

PROOF: Since limα→1+ χα(N) = χ(N) (the proof of this is analogous to theone presented in Appendix 6.B), we find that the Holevo information is addi-tive for all entanglement-breaking channels. The equality C(N) = χ(N) thenfollows from Theorem 7.13.

The remainder of the proof is devoted to establishing that χ(N) is a strongconverse rate for classical communication over N, from which it follows thatC(N) ≤ χ(N), which in turn implies, via (7.2.12), that C(N) = χ(N).

Fix ε ∈ [0, 1) and δ > 0. Let δ1, δ2 > 0 be such that

δ > δ1 + δ2 =: δ′. (7.2.128)

Set α ∈ (1, ∞) such thatδ1 ≥ χα(N)− χ(N), (7.2.129)

665

Page 677: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

which is possible since χα(N) is monotonically increasing with α (this followsfrom Proposition 4.29), and since limα→1+ χα(N) = χ(N). With this value ofα, take n large enough so that

δ2 ≥α

n(α− 1)log2

(1

1− ε

). (7.2.130)

Now, with the values of n and ε chosen as above, every (n, |M|, ε) classicalcommunication protocol satisfies (7.2.36) in Corollary 7.15. In particular, us-ing the additivity of the sandwiched Rényi Holevo information for all α > 1,we can write (7.2.36) as

1n

log2 |M| ≤ χα(N) +α

n(α− 1)log2

(1

1− ε

). (7.2.131)

Rearranging the right-hand side of this inequality, and using the assumptionsin (7.2.128)–(7.2.130), we obtain

1n

log2 |M| ≤ χ(N) + χα(N)− χ(N) +α

n(α− 1)log2

(1

1− ε

)(7.2.132)

≤ χ(N) + δ1 + δ2 (7.2.133)= χ(N) + δ′ (7.2.134)< χ(N) + δ. (7.2.135)

So we have that χ(N) + δ > 1n log2 |M| for all (n, |M|, ε) classical commu-

nication protocols with n sufficiently large. Due to this strict inequality, itfollows that there cannot exist an (n, 2n(χ(N)+δ), ε) classical communicationprotocol for all sufficiently large n such that (7.2.130) holds, for if it did therewould exist some message set M such that 1

n log2 |M| = χ(N) + δ, whichwe have just seen is not possible. Since ε and δ are arbitrary, we concludethat for all ε ∈ [0, 1), δ > 0, and sufficiently large n, there does not exist an(n, 2n(χ(N)+δ), ε) classical communication protocol. This means that χ(N) is astrong converse rate, which completes the proof.

7.2.4.1 The Strong Converse from a Different Point of View

Just as we did in the case of entanglement-assisted classical communicationin Appendix 6.G, we can use the alternative definitions of classical capac-ity and strong converse classical capacity (stated in Appendix A) to see that

666

Page 678: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

n→ ∞

Rate, Rn

ErrorProbability,

εn

χ(N)0

1

FIGURE 7.5: The error probability εn as a function of the rate Rn for classi-cal communication over a quantum channel N for which the sandwichedRényi Holevo information χα(N) is additive. As n → ∞, for every ratebelow the Holevo information χ(N), there exists a sequence of protocolswith error probability converging to zero. For every rate above the Holevoinformation χ(N), the error probability converges to one for all possibleprotocols.

C(N) = C(N) = χ(N) for all channels N for which the sandwiched RényiHolevo information is additive and that, as shown in Figure 7.5, the quantityχ(N) is a sharp dividing point between reliable, error-free communicationand communication with error approaching one exponentially fast. Specifi-cally, by following the arguments in Appendix 6.G, we obtain the following:for every sequence (n, 2nR, εn)n∈N of (n, |M|, ε) protocols, with each ele-ment of the sequence having an arbitrary (but fixed) rate R > χ(N), the se-quence εnn∈N of error probabilities approaches one at an exponential rate.

7.2.5 General Upper Bounds on the Strong Converse Classi-cal Capacity

The difficulty in proving the additivity of the Holevo information for a gen-eral channel, and thus obtaining an upper bound on its classical capacity, hasmotivated the study of other, more tractable upper bounds on the classicalcapacity of a quantum channel. In this section, we present two upper boundson the strong converse classical capacity of a quantum channel.

667

Page 679: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

7.2.5.1 Υ-Information Upper Bound

Recall from Proposition 7.3 that the following upper bound on the number oftransmitted bits holds for every (|M|, ε) classical communication protocol:

log2 |M| ≤ χεH(N), (7.2.136)

where the ε-hypothesis testing Holevo information χεH(N) of the quantum

channel N is defined in (4.11.93) as

χεH(N) = sup

ρXA

IεH(X; B)ω (7.2.137)

= supρXA

infσB

DεH(NA→B(ρXA)‖ρX ⊗ σB). (7.2.138)

Here, ωXB = NA→B(ρXA), and the optimization is over classical–quantumstates ρXA of the form ρXA = ∑x∈X p(x)|x〉〈x|X ⊗ ρx

A, where X is a finite al-phabet, p : X → [0, 1] is a probability distribution, and (p(x), ρx

A)x∈X is anensemble of states.

We derived the inequality in (7.2.136) by comparing the actual classicalcommunication protocol over the channel N with a classical communicationprotocol over the replacement channel R (see (7.1.23) in Section 7.1.1). The re-placement channel is useless for classical communication because it discardsthe state encoded with the message and replaces it with a fixed state. We canmake the comparison between the channel N and the channel R for classicalcommunication more explicit by writing the quantity χε

H(H) as

χεH(N) = sup

ρXA

infRA→B

DεH(NA→B(ρXA)‖RA→B(ρXA)), (7.2.139)

where RA→B has the action RA→B(ρXA) = ρX ⊗ σB.

Intuitively, an approach for obtaining an alternative upper bound on thestrong converse classical capacity is to expand the set of useless channels fromthe replacement channels to the following set of completely positive tracenon-increasing maps:

F := FA→B : ∃ σB ≥ 0, Tr[σB] ≤ 1, FA→B(ρA) ≤ σB ∀ρA ∈ D(HA),(7.2.140)

This set of maps, even though they contain completely positive, non-trace-preserving maps, can be thought of intuitively as also being useless for clas-sical communication, and it contains the set of replacement channels. Usingthis set, we define the generalized Υ-information as follows:

668

Page 680: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Definition 7.20 Generalized Υ-Information

Let D be a generalized divergence (see Definition 4.13). For every quan-tum channel NA→B, we define the generalized Υ-information of N as

Υ(N) := supψRA

infF∈F

D(NA→B(ψRA)‖FA→B(ψRA)), (7.2.141)

where the supremum is over pure states ψRA, with the dimension of Rthe same as the dimension of A.

REMARK: Note that it suffices to optimize over pure states ψRA, with the dimension of Requal to the dimension of A, when calculating the generalized Υ-information of a channel,i.e., for general states ρRA (with the dimension of R not necessarily equal to the dimensionof A),

supρRA

infF∈F

D(NA→B(ρRA)‖FA→B(ρRA)) = Υ(N). (7.2.142)

The proof of this proceeds analogously to the steps in (4.11.4)–(4.11.2) for proving that itsuffices to optimize over pure states when calculating the generalized channel divergence.

In this section, we are interested in the following generalized Υ-informa-tion channel quantities:

1. The Υ-information of N,

Υ(N) := supψRA

infF∈F

D(NA→B(ψRA)‖FA→B(ψRA)). (7.2.143)

2. The ε-hypothesis testing Υ-information of N,

ΥεH(N) := sup

ψRA

infF∈F

DεH(NA→B(ψRA)‖FA→B(ψRA)). (7.2.144)

3. The sandwiched Rényi Υ-information of N,

Υα(N) := supψRA

infF∈F

Dα(NA→B(ψRA)‖FA→B(ψRA)), (7.2.145)

where α ∈ [1/2, 1) ∪ (1, ∞).

669

Page 681: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Proposition 7.21 Holevo Information and Υ-Information

The Υ-information Υ(N) of a quantum channel N is greater than or equalto its Holevo information χ(N):

Υ(N) ≥ χ(N). (7.2.146)

PROOF: To see this, we apply Proposition 4.80 to obtain

Υ(N) = supρA

infF∈F

D(√

ρAΓNAB√

ρA‖√

ρAΓFAB√

ρA) (7.2.147)

= infF∈F

supρA

D(√

ρAΓNAB√

ρA‖√

ρAΓFAB√

ρA). (7.2.148)

Then, by employing (7.2.148), we find that

Υ(N) = infF∈F

supρA

D(√

ρAΓNAB√

ρA‖√

ρAΓFAB√

ρA) (7.2.149)

≥ infF∈F

supρA

D(NA→B(ρA)‖FA→B(ρA)), (7.2.150)

where, to obtain the inequality, we used the fact that√

ρAΓNAB√

ρA = (√

ρA′ ⊗ 1B)NA→B(ΓA′A)(√

ρA′ ⊗ 1B) (7.2.151)= NA→B ((

√ρA′ ⊗ 1A)ΓA′A(

√ρA′ ⊗ 1A)) (7.2.152)

= NA→B

((1A′ ⊗

√ρT

A)ΓAA(1A′ ⊗√

ρTA))

. (7.2.153)

In the last line we used the transpose trick (see (2.2.30). Then, we appliedthe monotonicity of the quantum relative entropy with respect to the partialtrace TrA. Finally, we used the fact that ρT

A is a state for every ρA, so that theoptimization over states remains unchanged.

Continuing, we have that

Υ(N) ≥ infF∈F

supρA

D(NA→B(ρA)‖σF) (7.2.154)

≥ infσB

supρA

D(NA→B(ρA)‖σB) (7.2.155)

= χ(N). (7.2.156)

To obtain the first inequality, we first used the fact that for every map F ∈ Fthere exists a state, which we call σF, such that FA→B(ρA) ≤ σF for all input

670

Page 682: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

states ρA. We then used 2.(d) in Proposition 4.3. To obtain the last inequality,we simply enlarged the set over which the infimum is performed to includeall states. Then, to obtain the equality on the last line, we used the expressionin (7.2.80) for the Holevo information.

We now prove an analogue of Proposition 7.3 involving the ε-hypothesistesting Υ-information, Υε

H(N), in place of the ε-hypothesis testing Holevo in-formation χε

H(N).

Proposition 7.22

Let N be a quantum channel. For every (|M|, ε) classical communicationprotocol over N, the number of bits transmitted over N is bounded fromabove by the ε-hypothesis testing Υ-information of N, i.e.,

log2 |M| ≤ ΥεH(N). (7.2.157)

PROOF: For every (|M|, ε) classical communication protocol, with encodingand decoding channel given by E and D, respectively, the maximal error prob-ability criterion p∗err(E,D;N) ≤ ε holds. This implies perr((E,D); p,N) ≤ ε forthe average probability, where p : M→ [0, 1] is the uniform prior probabilitydistribution over the messages in M. If the encoding channel E is defined suchthat we obtain the set ρm

Am∈M of states associated to each message m ∈ M

(see (7.1.2)), and the decoding channel D is defined by the POVM ΛmB m∈M,

then we can write the average success probability psucc((E,D); p,N) of thecode (E,D) as

psucc((E,D); p,N) = 1− perr((E,D); p,N) (7.2.158)

=1|M| ∑

m∈MTr[Λm

BNA→B(ρmA)], (7.2.159)

and we have that psucc((E,D); p,N) ≥ 1− ε. Now, recall from (3.2.7) that wecan write the action of NA→B in terms of its Choi representation ΓN

AB as

NA→B(ρmA) = TrA

[((ρm

A)T ⊗ 1B

)ΓN

AB

](7.2.160)

for all m ∈M. Also, let us define the average state

ρA :=1|M| ∑

m∈Mρm

A, (7.2.161)

671

Page 683: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

and a purification of it

|φ〉AA′ := (1A ⊗√

ρA′)|Γ〉AA′ =(√

ρTA ⊗ 1A′

)|Γ〉A′A, (7.2.162)

where we used the transpose trick in (2.2.30) to obtain the last equality. Then,observe that

NA′→B(φAA′) =√

ρTANA′→B(ΓAA′)

√ρT

A (7.2.163)

=√

ρTAΓN

AB

√ρT

A, (7.2.164)

which implies that the Choi operator ΓNAB can be written as2

ΓNAB = (ρT

A)− 1

2NA′→B(φAA′)(ρTA)− 1

2 . (7.2.165)

Therefore,

psucc((E,D); p,N)

=1|M| ∑

m∈MTr[Λm

BNA→B(ρmA)] (7.2.166)

=1|M| ∑

m∈MTr[((ρm

A)T ⊗Λm

B)

ΓNAB

](7.2.167)

=1|M| ∑

m∈MTr[((ρm

A)T ⊗Λm

B)(ρT

A)− 1

2NA′→B(φAA′)(ρTA)− 1

2

](7.2.168)

= Tr

[(ρT

A)− 1

2

(1|M| ∑

m∈M(ρm

A)T ⊗Λm

B

)(ρT

A)− 1

2NA′→B(φAA′)

](7.2.169)

= Tr[ΩABNA′→B(φAA′)], (7.2.170)

where

ΩAB := (ρTA)− 1

2

(1|M| ∑

m∈M(ρm

A)T ⊗Λm

B

)(ρT

A)− 1

2 . (7.2.171)

Note that ΩAB is positive semi-definite, i.e., ΩAB ≥ 0. Also, observe that sinceΛm

B ≤ 1B for all m ∈M, we have that

ΩAB ≤ (ρTA)

(1|M| ∑

m∈M(ρm

A)T ⊗ 1B

)(ρT

A)− 1

2 = 1AB. (7.2.172)

2Note that if ρTA is not invertible, then the inverse is understood to be on the support of ρT

A.

672

Page 684: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Together with ΩAB ≥ 0, this means that ΩAB is a measurement operator. Sowe have that

psucc((E,D); p,N) = Tr[ΩABNA→B(φAA′)] ≥ 1− ε. (7.2.173)

Now, let F ∈ F. This means that there exists a state, call it σB, such thatF(ρA) ≤ σB for all states ρA. We find that

Tr[ΩABFA′→B(φAA′)]

= Tr

[(ρT

A)− 1

2

(1|M| ∑

m∈M(ρm

A)T ⊗Λm

B

)(ρT

A)− 1

2FA′→B(φAA′)

](7.2.174)

=1|M| ∑

m∈MTr[((ρm

A)T ⊗Λm

B)

ΓFAB

](7.2.175)

=1|M| ∑

m∈MTr[Λm

BFA→B(ρmA)] (7.2.176)

≤ 1|M| ∑

m∈MTr[Λm

B σB] (7.2.177)

≤ 1|M| , (7.2.178)

where we used (7.2.165) to obtain the second equality and we used the factthat F(ρA) ≤ σB for every input state ρA to obtain the second-to-last inequal-ity.

Now, by optimizing the quantity Tr[ΩABFA′→B(φAA′)] over all measure-ment operators, subject to the constraint Tr[ΩABNA′→B(φAA′)] ≥ 1 − ε, weget that

log2 |M| ≤ DεH(NA′→B(φAA′)‖FA′→B(φAA′)). (7.2.179)

Since this holds for every F ∈ F, we have that

log2 |M| ≤ infF∈F

DεH(NA′→B(φAA′)‖FA′→B(φAA′)). (7.2.180)

Finally, optimizing over all pure states φAA′ , we conclude that

log2 |M| ≤ supψRA

infF∈F

DεH(NA→B(ψRA)‖FA→B(ψRA)) = Υε

H(N), (7.2.181)

as required.

As an immediate consequence of Propositions 7.22, 4.67, and 4.68, we havethe following two bounds:

673

Page 685: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Proposition 7.23

Let N be a quantum channel, let ε ∈ [0, 1), and let α > 1. For every(|M|, ε) classical communication protocol over N, the following boundshold

log2 |M| ≤1

1− ε(Υ(N) + h2(ε)) , (7.2.182)

log2 |M| ≤ Υα(N) +α

α− 1log2

(1

1− ε

). (7.2.183)

In the asymptotic setting, the bounds above become the following:

1n

log2 |M| ≤1

1− ε

(1nΥ(N⊗n) +

1n

h2(ε)

), (7.2.184)

1n

log2 |M| ≤1nΥα(N

⊗n) +α

n(α− 1)log2

(1

1− ε

)∀ α > 1. (7.2.185)

These upper bounds hold for every (n, |M|, ε) classical communication proto-col over a quantum channel N, where n ∈N and ε ∈ [0, 1).

Now, as with the Holevo information and the sandwiched Rényi Holevoinformation, we are faced with the additivity of the Υ-information and thesandwiched Rényi Υ-information. Our primary focus is on the latter, sincewe would like to make a statement about the strong converse for channelsmore general than entanglement-breaking channels. It turns out that thesandwiched Rényi Υ-information is additive for irreducibly-covariant chan-nels.

Recall from Definition 3.48 that a channel NA→B is covariant with respectto a group G if there exist projective unitary representations Ug

Ag∈G andVg

Bg∈G such that

N(UgAρA(U

gA)

†) = VgBN(ρA)(V

gB )

† (7.2.186)

for all states ρA and all g ∈ G. The channel N is called irreducibly covari-ant if the representation Ug

Ag∈G acting on the input space of the channel isirreducible, which means that it satisfies

1|G| ∑

g∈GUg

AρA(UgA)

† =1

dA(7.2.187)

for every state ρA.

674

Page 686: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Proposition 7.24 Generalized Υ-Information for Irreducibly-Covariant Channels

Let NA→B be an irreducibly-covariant quantum channel. Then the gen-eralized Υ-information of N can be calculated using the maximally en-tangled state ΦRA, i.e.,

Υ(N) = infF∈F

D(NA→B(ΦRA)‖FA→B(ΦRA))

= infF∈F

D(ρNRB‖ρFRB).(7.2.188)

PROOF: By simply restricting the optimization over states ψRA in the defini-tion of the generalized Υ-information to the maximally entangled state ΦRA,we obtain

Υ(N) = supψRA

infF∈F

D(NA→B(ψRA)‖FA→B(ψRA)) (7.2.189)

≥ infF∈F

D(NA→B(ΦRA)‖FA→B(ΦRA)). (7.2.190)

To prove the reverse inequality, let us recall Proposition 4.81, specifically itsproof. Let G be the group with respect to which N is irreducibly covariant, letUg

Ag∈G be the irreducible representation of G acting on the input space ofNA→B, and let Vg

Bg∈G be the representation of G acting on the output spaceof N. Since the maps FA→B in F, in particular the map achieving the infimumin the definition of Υ(N), need not be irreducibly covariant, we cannot useProposition 4.81 directly. Instead, we consider (4.11.41) in its proof. For everyFA→B ∈ F, by using (4.11.52), the inequality in (4.11.41) becomes

D(NA→B(ΦRA)‖FA→B(ΦRA))

≥ D

(1|G| ∑

g∈G|g〉〈g|R′ ⊗

((V

gB)

† NA→B UgA

)(ψRA)

∥∥∥∥∥ (7.2.191)

1|G| ∑

g∈G|g〉〈g|R′ ⊗

((V

gB)

† FA→B UgA

)(ψRA)

)(7.2.192)

for every pure state ψRA, with the dimension of R equal to the dimension ofA. Using the data-processing inequality for the generalized divergence withrespect to the partial trace TrR, and using the fact that N is covariant, we findthat

D(NA→B(ΦRA)‖FA→B(ΦRA))

675

Page 687: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

≥ D

(NA→B(ψRA)

∥∥∥∥∥1|G| ∑

g∈G

((V

gB)

† FA→B UgA

)(ψRA)

). (7.2.193)

Now, observe that the map F′A→B := 1|G| ∑g∈G(V

gB)

† FA→B UgA is in the set

F since F is. Therefore, optimizing over all F′ ∈ F, we find that

D(NA→B(ΦRA)‖FA→B(ΦRA))

≥ infF′∈F

D(NA→B(ψRA)‖F′A→B(ψRA)). (7.2.194)

Since the map F ∈ F is arbitrary, we conclude that

infF∈F

D(NA→B(ΦRA)‖FA→B(ΦRA))

≥ infF′∈F

D(NA→B(ψRA)‖F′A→B(ψRA)). (7.2.195)

Finally, since the state ψRA is arbitrary, we obtain

infF∈F

D(NA→B(ΦRA)‖FA→B(ΦRA))

≥ supψRA

infF′∈F

D(NA→B(ψRA)‖F′A→B(ψRA)) (7.2.196)

= Υ(N). (7.2.197)

We thus conclude (7.2.188), as required.

By Proposition 7.24, we have that for irreducibly-covariant channels thesandwiched Rényi Υ-information can be calculated without an optimizationover all pure states ψRA—we simply set ψRA = ΦRA. Using this fact, weobtain the following:

Proposition 7.25 Subadditivity of Sandwiched Rényi Υ-Information for Irreducibly-Covariant Channels

Let NA→B be an irreducibly covariant quantum channel. Then, for allα ∈ [1/2, 1)∪ (1, ∞), the sandwiched Rényi Υ-information Υα(N) is sub-additive, i.e.,

Υα(N⊗n) ≤ nΥα(N) (7.2.198)

for all n ≥ 1.

676

Page 688: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

PROOF: Since N is irreducibly covariant, we use Proposition 7.24 to concludethat

Υα(N⊗n) = inf

F∈FnDα(N

⊗nA→B(ΦRn An)‖FAn→Bn(ΦRn An)), (7.2.199)

where Fn is the set of completely positive maps in (7.2.140) acting on the spaceof the system An. Now, the maximally entangled state ΦRn An on n identicalcopies R1 · · · Rn and A1 · · · An of the systems R and A splits into a tensorproduct in the following way:

ΦRn An = ΦR1 A1 ⊗ · · · ⊗ΦRn An . (7.2.200)

Furthermore, if we restrict the optimization over maps FAn→Bn ∈ Fn to atensor product of identical maps G in the set F such that

FAn→Bn = GA1→B1 ⊗ · · · ⊗ GAn→Bn , (7.2.201)

then, we arrive at the following bound:

Υα(N⊗n)

≤ infG∈F

Dα(NA1→B1(ΦR1 A1)⊗ · · · ⊗NAn→Bn(ΦRn An)‖

GA1→B1(ΦR1 A1)⊗ · · · ⊗ GAn→Bn(ΦRn An)) (7.2.202)

= infG∈F1

nDα(NA→B(ΦRA)‖GA→B(ΦRA)) (7.2.203)

= nΥα(N), (7.2.204)

as required, where the first equality follows from the additivity of the sand-wiched Rényi relative entropy for tensor-product states (see (4.5.40) in Propo-sition 4.29).

With the subadditivity of the sandwiched Rényi Υ-information for irre-ducibly covariant channels, we can now state the following strong conversetheorem.

Theorem 7.26 Υ-Information Upper Bound on the Strong ConverseClassical Capacity of Irreducibly-Covariant Channels

The Υ-information Υ(N) of an irreducibly-covariant quantum chan-nel N is a strong converse rate for classical communication over N; i.e.,

C(N) ≤ Υ(N). (7.2.205)

677

Page 689: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

PROOF: Fix ε ∈ [0, 1) and δ > 0. Let δ1, δ2 > 0 be such that

δ > δ1 + δ2 =: δ′. (7.2.206)

Set α ∈ (1, ∞) such thatδ1 ≥ Υα(N)−Υ(N), (7.2.207)

which is possible because Υα(N) is monotonically increasing with α (this fol-lows from Proposition 4.29) and limα→1+ Υα(N) = Υ(N) (see Appendix 7.Afor a proof). With this value of α, take n large enough so that

δ2 ≥α

n(α− 1)log2

(1

1− ε

). (7.2.208)

Now, with the values of n and ε chosen as above, every (n, |M|, ε) classicalcommunication protocol satisfies (7.2.185). In particular, using the subaddi-tivity of the sandwiched Rényi Υ-information for all α > 1, we can write(7.2.185) as

1n

log2 |M| ≤ Υα(N) +α

n(α− 1)log2

(1

1− ε

). (7.2.209)

Rearranging the right-hand side of this inequality, and using the assumptionsin (7.2.206)–(7.2.208), we obtain

1n

log2 |M| ≤ Υ(N) + Υα(N)−Υ(N) +α

n(α− 1)log2

(1

1− ε

)(7.2.210)

≤ Υ(N) + δ1 + δ2 (7.2.211)= Υ(N) + δ′ (7.2.212)< Υ(N) + δ. (7.2.213)

So we have that Υ(N) + δ > 1n log2 |M| for all (n, |M|, ε) classical communica-

tion protocols with n sufficiently large. Due to this strict inequality, it followsthat there cannot exist an (n, 2n(Υ(N)+δ), ε) classical communication protocolfor all sufficiently large n such that (7.2.208) holds, for if it did there would ex-ist some message set M such that 1

n log2 |M| = Υ(N) + δ, which we have justseen is not possible. Since ε and δ are arbitrary, we have that for all ε ∈ [0, 1),δ > 0, and sufficiently large n, there does not exist an (n, 2n(Υ(N)+δ), ε) classi-cal communication protocol. This means that Υ(N) is a strong converse rate,which completes the proof.

678

Page 690: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Theorem 7.26 thus gives us an upper bound on the strong converse clas-sical capacity C(N) of any irreducibly-covariant channel N, namely,

C(N) ≤ Υ(N), (7.2.214)

which in turn implies, via (7.2.12), that

C(N) ≤ Υ(N) (7.2.215)

for every irreducibly-covariant channel N. Recall that the Holevo informationχ(N) is an achievable rate for classical communication over any quantumchannel, which implies that C(N) ≥ χ(N). (This in fact gives another way forconcluding that Υ(N) ≥ χ(N).)

It turns out that the Υ-information Υ(N) is equal to the Holevo informationin the case of the erasure channel E(d)p . Recall from Section 6.3.1.2 that the era-sure channel is irreducibly covariant. This fact, along with other reasoning,allows us to conclude that

C(E(d)p ) = C(E(d)p ) = χ(E(d)p ) = (1− p) log2 d (7.2.216)

for all dimensions d ≥ 2 and all p ∈ [0, 1]. We provide a proof of this chain ofequalities in Section 7.3.1.2 below.

7.2.5.2 SDP Upper Bound

While the Υ-information gives us an upper bound on the strong converseclassical capacity of any irreducibly-covariant channel, computing it is rel-atively challenging due to the minimization over the set F. In this section,we define a subset of F, denoted by Fβ, that allows us to obtain a quantitythat can be computed using a semi-definite program (SDP). Furthermore, thisquantity turns out to be additive for all channels, which means that it is anupper bound on the strong converse classical capacity for all channels.

The set Fβ is defined as the following set of completely positive maps:

Fβ := F completely positive : β(F) ≤ 1, (7.2.217)

where β(F) is defined as the solution to the following optimization problem:

β(F) :=

infimum Tr[SB]subject to −RAB ≤ (ΓF

AB)TB ≤ RAB,

−1A ⊗ SB ≤ RTBAB ≤ 1A ⊗ SB

(7.2.218)

679

Page 691: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Note that the optimization occurs over the operators SB and RAB, and thatthe optimization problem as a whole is an SDP. Indeed, recalling the generalform of an SDP from Section 2.4, we can write it as

β(F) =

infimum Tr[CX]subject to Φ(X) ≥ D,

X ≥ 0,(7.2.219)

where

X =

(SB 00 RAB

), C =

(1B 00 0AB

), (7.2.220)

Φ(X) =

RAB 0 0 00 RAB 0 00 0 1R ⊗ SB − RTB

AB 00 0 0 1R ⊗ SB + RTB

AB

, (7.2.221)

D =

(ΓFAB)

TB 0 0 00 −(ΓF

AB)TB 0 0

0 0 0 00 0 0 0

. (7.2.222)

Note that the constraints in (7.2.218) imply that SB and RAB are positive semi-definite, since the constraint −RAB ≤ (ΓF

AB)TB ≤ RAB implies that

RAB − (ΓFAB)

TB ≥ 0, (7.2.223)

RAB + (ΓFAB)

TB ≥ 0. (7.2.224)

Adding the two inequalities leads to RAB ≥ 0. Similarly, the constraint−1A⊗SB ≤ RTB

AB ≤ 1A ⊗ SB implies that SB ≥ 0.

It is straightforward to show that Fβ is a subset of F, i.e., Fβ ⊆ F. Indeed,suppose that for a given completely positive map F ∈ Fβ, the quantity β(F)in (7.2.218) is achieved by the operators (R∗AB, S∗B). Note that since F ∈ Fβ, bydefinition we have that Tr[S∗B] ≤ 1, and we also have that (ΓF

AB)TB ≤ R∗AB and

(R∗AB)TB ≤ 1A ⊗ S∗B. Letting σB ≡ S∗B, for every state ρA we find that

FA→B(ρA) = TrA

[(ρT

A ⊗ 1B)ΓFAB

](7.2.225)

=(

TrA

[(ρT

A ⊗ 1B)(ΓFAB)

TB])T

(7.2.226)

=(

TrA

[(√

ρTA ⊗ 1B)(ΓF

AB)TB(√

ρTA ⊗ 1B)

])T

(7.2.227)

680

Page 692: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

≤(

TrA

[(√

ρTA ⊗ 1B)R∗AB(

√ρT

A ⊗ 1B)])T

(7.2.228)

= TrA

[(√

ρTA ⊗ 1B)(R∗AB)

TB(√

ρTA ⊗ 1B)

](7.2.229)

≤ TrA[(ρT

A ⊗ 1B)(1A ⊗ σB)]

(7.2.230)= σB, (7.2.231)

where to obtain the first inequality we used (ΓFAB)

TB ≤ R∗AB and to obtain thesecond inequality we used (R∗AB)

TB ≤ 1A ⊗ σB. Therefore, FA→B(ρA) ≤ σBfor all ρA, which means that FA→B ∈ F. Since F ∈ Fβ is arbitrary, we concludethat Fβ ⊂ F.

By replacing the set F in the definition of the generalized Υ-informationof a channel N with the set Fβ, we obtain the following quantity:

Υβ(N) := supψRA

infF∈Fβ

D(NA→B(ψRA)‖FA→B(ψRA)), (7.2.232)

which we call the Υβ-information of N. When we take the generalized diver-gence D to be the quantum relative entropy, the hypothesis testing relativeentropy, and the sandwiched Rényi relative entropy, we have

Υβ(N) := supψRA

infF∈Fβ

D(NA→B(ψRA)‖FA→B(ψRA)), (7.2.233)

Υβ,εH (N) := sup

ψRA

infF∈Fβ

DεH(NA→B(ψRA)‖FA→B(ψRA)), (7.2.234)

Υβα(N) := sup

ψRA

infF∈Fβ

Dα(NA→B(ψRA)‖FA→B(ψRA)). (7.2.235)

Since the set Fβ is a subset of F, minimizing over Fβ can never lead to asmaller value compared to minimizing over F, which means that

Υ(N) ≤ Υβ(N). (7.2.236)

Therefore, using the Υβ-information, the bound in Proposition 7.22 on every(|M|, ε) classical communication protocol thus becomes

log2 |M| ≤ Υβ,εH (N). (7.2.237)

This bound is looser than the one in Proposition 7.22, but it has the advan-tange that it can be computed using an SDP. This is due to the fact that thehypothesis testing relative entropy can itself be computed via an SDP.

681

Page 693: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Although we get an efficiently computable upper bound in the one-shotsetting via the Υβ-information, in the asymptotic setting this bound is notknown to be additive, making its evaluation computationally prohibitive asthe number n of channel uses increases. Instead, for the purpose of obtainingan efficiently computable upper bound in the asymptotic setting, we definethe following quantity for every quantum channel N:

Cβ(N) = log2 β(N), (7.2.238)

Since β(N) can be computed using an SDP (in particular, via the optimizationproblem in (7.2.218)), we have that Cβ(N) can also be computed using an SDP.

A useful fact about the quantity Cβ(N) is the fact that it is additive, i.e.,

Cβ(N1 ⊗N2) = Cβ(N1) + Cβ(N2) (7.2.239)

for every two channels N1 and N2, as proved in Appendix 7.B by employingsemi-definite programming duality. We use this fact to prove that Cβ(N) is astrong converse rate for classical communication over a channel N in Theo-rem 7.28 below. However, first we establish the following proposition:

Proposition 7.27

For a quantum channel N, the following inequalities hold for all α > 1:

Υ(N) ≤ Cβ(N), Υα(N) ≤ Cβ(N). (7.2.240)

PROOF: Let F = 1β(N)

N. Then, β(F) = 1β(N)

β(N) = 1, which means thatF ∈ Fβ. Then, since Fβ ⊆ F, we can choose F as above when performing theinfimum in the definition of Υ(N). This leads to

Υ(N) = supψRA

infF∈F

D(NA→B(ψRA)‖FA→B(ψRA)) (7.2.241)

≤ supψRA

D

(NA→B(ψRA)

∥∥∥∥∥1

β(N)NA→B(ψRA)

)(7.2.242)

= supψRA

D(NA→B(ψRA)‖NA→B(ψRA)) + log2 β(N) (7.2.243)

= Cβ(N), (7.2.244)

as required, where the second equality follows because D(ρ‖ 1x σ) = D(ρ‖σ)+

log2 x for every state ρ, positive semi-definite operator σ, and x > 0 (see(4.2.26)). The last equality follows because D(ρ‖ρ) = 0.

682

Page 694: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Similarly, using the fact that Dα(ρ‖ 1x σ) = Dα(ρ‖σ) + log2 x, and using the

same choice for the map F as above, we find that Υα(N) ≤ Cβ(N).

The additivity of Cβ, along with Proposition 7.27, leads us to the follow-ing:

Theorem 7.28 SDP Upper Bound on Strong Converse Classical Ca-pacity

For a quantum channel N, the quantity Cβ(N) is a strong converse ratefor classical communication over N.

PROOF: We start by observing that, using Proposition 7.27, the inequality in(7.2.185) can be written as

1n

log2 |M| ≤1n

Cβ(N⊗n) +

α

n(α− 1)log2

(1

1− ε

)(7.2.245)

for all α > 1. This inequality holds for all (n, |M|, ε) classical communicationprotocols. Using the additivity of Cβ, we find that

1n

log2 |M| ≤ Cβ(N) +α

n(α− 1)log2

(1

1− ε

). (7.2.246)

Now, let us fix ε ∈ [0, 1) and δ > 0. Let δ′ be such that δ > δ′. Since Cβ(N)does not depend on α, let us choose α such that the right-hand side of theabove inequality is as small as possible, which occurs as α → ∞. With thischoice of α, take n large enough so that

δ′ ≥ 1n

log2

(1

1− ε

). (7.2.247)

Then, we obtain

1n

log2 |M| ≤ Cβ(N) +1n

log2

(1

1− ε

)(7.2.248)

≤ Cβ(N) + δ′ (7.2.249)

< Cβ(N) + δ. (7.2.250)

So we have that Cβ(N)+ δ > 1n log2 |M| for all (n, |M|, ε) classical communica-

tion protocols with n sufficiently large. Due to this strict inequality, it follows

683

Page 695: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

that there cannot exist an (n, 2n(Cβ(N)+δ), ε) classical communication protocolfor all sufficiently large n such that (7.2.247) holds, for if it did there would ex-ist some message set M such that 1

n log2 |M| = Cβ(N) + δ, which we have justseen is not possible. Since ε and δ are arbitrary, we have that for all ε ∈ [0, 1),δ > 0, and sufficiently large n, there does not exist an (n, 2n(Cβ(N)+δ), ε) classi-cal communication protocol. This means that Cβ(N) is a strong converse rate,which completes the proof.

By examining (7.2.248) in the above proof, we see that the following boundholds for an arbitrary (n, |M|, ε) classical communication protocol:

1n

log2 |M| ≤ Cβ(N) +1n

log2

(1

1− ε

). (7.2.251)

If we fix the rate R = 1n log2 |M|, then this bound can be rewritten as follows:

1− ε ≤ 2−n(R−Cβ(N)), (7.2.252)

which indicates that communicating at a rate R > Cβ(N) implies the successprobability 1 − ε of every sequence of such protocols decays exponentiallyfast to zero.

7.3 Examples

In this section, we present various examples of channels with known for-mulas for the Holevo information and/or known results on additivity of theHolevo information.

Let us start by making some observations about the Holevo informationχ(N) of a channel N. First, by expanding the definition of the Holevo informa-tion using the expression for the mutual information in terms of the relativeentropy, we arrive at the following:

Proposition 7.29 Alternate Forms for Channel Holevo Information

For a channel N, the following equalities hold

χ(N) = sup(p(x),ψx

A)x

[H(N(ρA))− ∑

x∈Xp(x)H(N(ψx

A))

], (7.3.1)

684

Page 696: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

= sup(p(x),ψx

A)x

[H(N(ρA))− ∑

x∈Xp(x)H(Nc(ψx

A))

]. (7.3.2)

where (p(x), ψxA)x∈X is an ensemble of pure states, ρA :=

∑x∈X p(x)ψxA, and we recall from (4.1.1) and (4.2.87) that H(ρ) =

−Tr[ρ log ρ] is the quantum entropy of ρ.

PROOF: We start by recalling the definition of the Holevo information χ(N)of N from (4.11.106):

χ(N) := supρXA

I(X; B)ω, (7.3.3)

where ωRB = NA→B(ρXA), and the supremum is over all classical-quantumstates of the form ρXA = ∑x∈X p(x)|x〉〈x|X ⊗ ρx

A, with X a finite alphabetwith associated |X|-dimensional system X, ρx

Ax∈X a set of states, and p :X→ [0, 1] a probability distribution on X. Recall from Proposition 4.84 that tocompute the Holevo information it suffices to take ensembles consisting onlyof pure states.

Defining the classical–quantum state ρXA as

ρXA = ∑x∈X

p(x)|x〉〈x|X ⊗ ψxA, (7.3.4)

where ψxAx∈X is a set of pure states, and defining ρ′XB = NA→B(ρXA) is

another classical–quantum state, it follows from Proposition 4.12 that

I(X; B)ρ′ = H

(∑

x∈Xp(x)N(ψx

A)

)− ∑

x∈Xp(x)H(N(ψx

A)) (7.3.5)

for all states ρXA. Since optimizing over classical–quantum states is equiva-lent to optimizing over ensembles (p(x), ψx

A)x∈X, we obtain (7.3.1).

To prove (7.3.2), let VA→BE be an isometric extension of N. Then, for everypure state ψ on the channel input system A, we can write

N(ψ) = TrE[VψV†]. (7.3.6)

Since V|ψ〉 is a pure state, it follows that TrE[VψV†] and TrB[VψV†] have thesame (non-zero) eigenvalues. The latter state is equal to Nc(ψA) by definition,which means that

H(N(ψA)) = H(Nc(ψA)). (7.3.7)

Therefore, (7.3.2) follows.

685

Page 697: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

7.3.1 Covariant Channels

For irreducibly-covariant channels (see Definition 3.48), the Holevo informa-tion takes a particularly simple form.

Theorem 7.30 Holevo Information of Irreducibly-Covariant Chan-nels

Suppose NA→B is a covariant channel with respect to a finite group G,with an irreducible representation Ug

Ag∈G of G acting on the inputspace of the channel and another representation Vg

Bg∈G of G acting onthe output space of the channel. Then,

χ(N) = H(N(πA))− Hmin(N), (7.3.8)

where πA = 1AdA

and

Hmin(N) := minρA

H(N(ρA)) (7.3.9)

is called the minimum output entropy of N. The minimization is with re-spect to all input states ρA in the domain of N.

PROOF: We have

χ(N) = sup(p(x),ρx

A)x

[H

(∑

x∈Xp(x)N(ρx

A)

)− ∑

x∈Xp(x)H(N(ρx

A))

](7.3.10)

≤ sup(p(x),ρx

A)x

H

(∑

x∈Xp(x)N(ρx

A)

)

+ sup(p(x),ρx

A)x

− ∑

x∈Xp(x)H(N(ρx

A))

(7.3.11)

≤ supρA

H(N(ρA)) + supρA

−H(N(ρA)) (7.3.12)

= supρA

H(N(ρA))− infρA

H(N(ρA)). (7.3.13)

Now, by the unitary invariance of the quantum entropy, for every state ρA weobtain

H(N(ρA)) = H(VgBN(ρA)(V

gB )

†) = H(N(UgAρA(U

gA)

†)) (7.3.14)

686

Page 698: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

for all g ∈ G. This implies that

H(N(ρA)) =1|G| ∑

g∈GH(N(Ug

AρA(UgA)

†)) (7.3.15)

≤ H

(∑

g∈G

1|G|N(Ug

AρA(UgA)

†)

)(7.3.16)

= H

(N

(1|G| ∑

g∈GUg

AρA(UgA)

))(7.3.17)

= H(N(πA)) , (7.3.18)

where the inequality follows from concavity of the quantum entropy and thelast equality follows because Ugg∈G is an irreducible representation, whichimplies that

1|G| ∑

g∈GUg

AρA(UgA)

† =1AdA

(7.3.19)

for every state ρA. Then, since we are optimizing a continuous function overa compact and convex set, the infimum in (7.3.13) can be achieved, meaningthat we can replace the infimum in (7.3.13) with a minimum, which meansthat

χ(N) ≤ H(N(πA))− Hmin(N). (7.3.20)

To show the reverse inequality, let ρ∗A be a state for which H(N(ρ∗A)) =

Hmin(N). Then, we consider the ensemble(

1|G| , Ug

Aρ∗A(UgA)

†)

g∈Gand ob-

tain

χ(N) ≥ H

(∑

g∈G

1|G|N(Ug

Aρ∗A(UgA)

†)

)− ∑

g∈G

1|G|H(N(Ug

Aρ∗A(UgA)

†)) (7.3.21)

= H(N(πA))− ∑g∈G

1|G|H(Vg

B ρ∗A(VgB )

†) (7.3.22)

= H(N(πA))− ∑g∈G

1|G|H(N(ρ∗A)) (7.3.23)

= H(N(πA))− H(N(ρ∗A)) (7.3.24)= H(N(πA))− Hmin(N). (7.3.25)

Therefore,χ(N) ≥ H(N(πA))− Hmin(N), (7.3.26)

687

Page 699: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

and the proof is complete.

We note that to compute the minimum output entropy of any channel N,it suffices to optimize over pure states. Indeed, by restricting the optimizationto pure states ψ in the definition of Hmin(N), we find that

Hmin(N) = minρ

H(N(ρ)) ≤ minψ

H(N(ψ)). (7.3.27)

On the other hand, since every state ρ can be written as a convex combinationof pure states, so that ρ = ∑x∈X p(x)ψx, we see that

H(N(ρ)) = H

(N

(∑

x∈Xp(x)ψx

))(7.3.28)

= H

(∑

x∈Xp(x)N(ψx)

)(7.3.29)

≥ ∑x∈X

p(x)H(N(ψx)) (7.3.30)

≥ minx∈X

H(N(ψx)) (7.3.31)

≥ minψ

H(N(ψ)), (7.3.32)

where the first inequality follows from concavity of the quantum entropy. Sowe have

Hmin(N) = minψ

H(N(ψ)). (7.3.33)

We now look at two irreducibly covariant channels, the depolarizing chan-nel and the erasure channel. A plot of the Holevo information for these chan-nels is given in Figure 7.6.

7.3.1.1 Depolarizing Channel

In Section 3.2.5, specifically in (3.2.112), we defined the qubit depolarizingchannel as

Dp(ρ) := (1− p)ρ +p3(XρX + YρY + ZρZ) (7.3.34)

for all p ∈ [0, 1]. From the arguments in Section 6.3.1.1, it follows that thischannel is covariant with respect to the Pauli operators on both the input

688

Page 700: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

0.0 0.2 0.4 0.6 0.8 1.0p

0.0

0.2

0.4

0.6

0.8

1.0

χ(N

)

Dp

Ep

FIGURE 7.6: The Holevo information of the depolarizing channel Dp (ex-pressed in (7.3.37)) and the erasure channel Ep (expressed in (7.3.57)), bothof which are defined for the parameter p ∈ [0, 1].

and output spaces. Furthermore, the Pauli operators 1, X, Y, Z satisfy theproperty in (7.3.19). We thus conclude that

χ(Dp) = H(Dp(π)

)− Hmin(Dp) = 1− Hmin(Dp), (7.3.35)

where we used the fact that the depolarizing channel is unital, i.e., Dp(1) = 1,and that H(π) = log2 2 = 1. To compute the minimum output entropy, weuse the fact that it suffices to minimize over pure states. It is straightforwardto show that the two eigenvalues of Dp(ψ) are (2p/3, 1− 2p/3) for every purestate ψ. Therefore,

Hmin(Dp) = h2

(2p3

)(7.3.36)

so that

χ(Dp) = 1− h2

(2p3

)(7.3.37)

for p ∈ [0, 1]. Since the Holevo information is known to be additive for thedepolarizing channel (please consult the Bibliographic Notes in Section 7.5), itfollows that χ(Dp) is equal to the classical capacity of the depolarizing chan-nel. See Figure 7.6 for a plot of the Holevo information χ(Dp) of the depolar-izing channel.

689

Page 701: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

For the qudit depolarizing channel D(d)p , recall from the discussion around

(6.3.26) that it is irreducibly covariant. Therefore, by Theorem 7.30, we obtain

χ(D(d)p ) = log2 d− Hmin(D

dp). (7.3.38)

The calculation of the minimum output entropy for the qudit depolarizingchannel is analogous to the calculation of the minimum output entropy of thequbit depolarizing channel. In particular, for every pure state ψ, the eigenval-ues of D(d)

p (ψ) are 1− dd+1 p (with multiplicity one) and d

d2−1 p (with multiplic-

ity d− 1). Indeed, by using the parameterization in (3.2.140) with q = pd2

d2−1 ,consider that

D(d)p (ψ) = (1− q)ψ + q

Id

(7.3.39)

=(

1− q +qd

)ψ +

qd(I − ψ) (7.3.40)

=

(1− pd

d + 1

)ψ +

pdd2 − 1

(I − ψ) . (7.3.41)

Therefore,

Hmin(D(d)p ) = −

(1− dp

d + 1

)log2

(1− dp

d + 1

)

− dpd + 1

log2

(dp

d2 − 1

), (7.3.42)

so that

χ(D(d)p ) = log2 d +

(1− dp

d + 1

)log2

(1− dp

d + 1

)

+dp

d + 1log2

(dp

d2 − 1

)(7.3.43)

for d ≥ 2 and p ∈ [0, 1].

The Holevo information is also known to be additive for the qudit depo-larizing channel, which means that the expression in (7.3.43) is equal to itsclassical capacity.

690

Page 702: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Theorem 7.31 Additivity of the Holevo Information for the Depo-larizing Channel

For every channel M,

χ(D(d)p ⊗M) = χ(D

(d)p ) + χ(M) (7.3.44)

for all d ≥ 2 and all p ∈ [0, 1]. Consequently,

C(D(d)p ) = χ(D

(d)p ). (7.3.45)

PROOF: Please consult the Bibliographic Notes in Section 7.5.

It also holds that the Holevo information is the strong converse classicalcapacity of the qudit depolarizing channel, i.e.,

C(D(d)p ) = χ(D

(d)p ) (7.3.46)

for all d ≥ 2 and all p ∈ [0, 1]. Please consult the Bibliographic Notes inSection 7.5 for a reference to the proof.

7.3.1.2 Erasure Channel

Let us now consider the erasure channel. Recall from (3.2.99) that the erasurechannel Ep, with p ∈ [0, 1], is defined as

Ep(ρ) = (1− p)ρ + pTr[ρ]|e〉〈e|, (7.3.47)

where |e〉 is called the erasure state and is not in the Hilbert space of theinput system A. In other words, the state |e〉〈e| is supported on the spaceorthogonal to the input space. As argued in Section 6.3.1.2, we can considerthe output space of the channel to be a qutrit system with the orthonormalbasis |0〉, |1〉, |2〉, and we can let the state |2〉 be the erasure state. Then,

Ep(ρ) = (1− p)ρ + p|2〉〈2| (7.3.48)

for every state ρ.

We also argued in Section 6.3.1.2 that the erasure channel is irreduciblycovariant. Therefore, by Theorem 7.30, we have that

χ(Ep) = H(Ep(π)

)− Hmin(Ep). (7.3.49)

691

Page 703: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Now, on the input qubit space, we have π = 12 |0〉〈0|+ 1

2 |1〉〈1|. Therefore,

Ep(π) =1− p

2|0〉〈0|+ 1− p

2|1〉〈1|+ p|2〉〈2|, (7.3.50)

which means that

H(Ep(π)

)= −(1− p) log2

(1− p

2

)− p log2 p (7.3.51)

= 1− p + h2(p). (7.3.52)

In addition, for every pure state ψ, we have

H(Ep(ψ)) = H((1− p)ψ + p|2〉〈2|) (7.3.53)= −(1− p) log2(1− p)− p log2 p, (7.3.54)= h2(p), (7.3.55)

where the second equality follows because the state |2〉〈2| is orthogonal to ψ.Therefore,

Hmin(Ep) = h2(p), (7.3.56)

which means that the Holevo information of the erasure channel is

χ(Ep) = 1− p. (7.3.57)

This is consistent with what one might expect intuitively because communi-cation over the erasure channel is only possible with probability 1− p, whenno erasure occurs, and conditioned on this outcome, the erasure channel issimply the identity channel.

In general, for the qudit erasure channel E(d)p , whose action can be definedon the d-dimensional space with orthonormal basis |1〉, . . . , |d〉 such thatthe state |d + 1〉 is the erasure state, we have that it is irreducibly covariant(see Section 6.3.1.2). Using this fact, which implies that

χ(E(d)p ) = H

(E(d)p (π)

)− Hmin(E

(d)p ), (7.3.58)

along with arguments analogous to those presented above, we obtain

χ(E(d)p ) = (1− p) log2 d. (7.3.59)

692

Page 704: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Proposition 7.32 Υ-Information of the Erasure Channel

The Υ-information of the qudit erasure channel E(d)p is given by

Υ(E(d)p ) = χ(E

(d)p ) = (1− p) log2 d. (7.3.60)

PROOF: By combining (7.3.59) and Proposition 7.21, we conclude that Υ(E(d)p ) ≥χ(E

(d)p ) = (1− p) log2 d. So we establish the opposite inequality.

Since the erasure channel is irreducibly covariant, Theorem 7.24 impliesthat the optimization over states ψRA in the definition of the Υ-information isunnecessary, and we have that

Υ(E(d)p ) = inf

F∈FD((E

(d)p )A→B(ΦRA)‖FA→B(ΦRA)), (7.3.61)

where ΦRA is the maximally entangled state with Schmidt rank d. The Choi

state ρE(d)p

RB = (E(d)p )A→B(ΦRA) of the erasure channel is

ρE(d)p

RB = (1− p)ΦRB + p1Rd⊗ |e〉〈e|. (7.3.62)

Now, let us make a particular choice of the map F in the minimization overthe completely positive maps in F. Suppose that FA→B is such that

FA→B(ΦRA) =1− p

dΦRB + p

1Rd⊗ |e〉〈e| =: σF

RB, (7.3.63)

which implies via (3.2.16) that its action on a general input state ρA is as fol-lows:

FA→B(ρA) =1− p

dρA + p|e〉〈e| ≤ 1− p

d1A + p|e〉〈e|. (7.3.64)

Note that the map FA→B defined in this way is indeed in the set F, as demon-strated by the inequality in (7.3.64) and the fact that the operator on the right-hand side of (7.3.64) is a quantum state. Then,

Υ(E(d)p ) ≤ D(ρ

E(d)p

RB ‖σFRB). (7.3.65)

It is straightforward to show that

Tr[

ρE(d)p

RB log2 ρE(d)p

RB

]= (1− p) log2(1− p) + p log2

( pd

), (7.3.66)

693

Page 705: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

and

Tr[

ρE(d)p

RB log2 σFRB

]= (1− p) log2

(1− p

d

)+ p log2

( pd

), (7.3.67)

which means that

Υ(E(d)p ) ≤ (1− p) log2 d = χ(E

(d)p ). (7.3.68)

This concludes the proof.

The classical capacity of the quantum erasure channel and its strong con-verse now follow as a direct corollary of (7.3.59), (7.2.15), Proposition 7.32, theirreducible covariance of the erasure channel, and Theorem 7.26.

Theorem 7.33 Classical Capacity of the Erasure Channel

For d ≥ 2 and p ∈ [0, 1], the following equality holds for the classicalcapacity and strong converse classical capacity of the quantum erasurechannel E(d)p :

C(E(d)p ) = C(E(d)p ) = (1− p) log2 d. (7.3.69)

7.3.2 Amplitude Damping Channel

Recall from (3.2.87) that the amplitude damping channel is defined as

Aγ(ρ) = A1ρA†1 + A2ρA†

2, (7.3.70)

whereA1 =

√γ|0〉〈1|, A2 = |0〉〈0|+

√1− γ|1〉〈1|. (7.3.71)

It can be shown that

χ(Aγ) =12

(f (r∗)− log2(1− q2)− q f ′(q)

), (7.3.72)

where

f (x) := (1 + x) log2(1 + x)− (1− x) log2(1− x), (7.3.73)

f ′(x) = log2

(1 + x1− x

), (7.3.74)

694

Page 706: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

0.0 0.2 0.4 0.6 0.8 1.0γ

0.0

0.2

0.4

0.6

0.8

1.0 Cβ(Aγ)

χ(Aγ)

FIGURE 7.7: The Holevo information χ(Aγ) in (7.3.72) of the amplitudedamping channel, which represents a lower bound on its classical ca-pacity. Also shown is the upper bound Cβ(Aγ) on the classical capacity,which is defined in (7.2.238) via the SDP in (7.2.218), and for the amplitudedamping channel it is given by the expression in (7.3.77).

r∗ :=

√1− γ− (q− γ)2

1− γ+ q2, (7.3.75)

and q is determined via

(γq− γ2 − γ(1− γ)) f ′(r∗) = −r∗(1− γ) f ′(q). (7.3.76)

(Please consult the Bibliographic Notes in Section 7.5.) It is worth noting thatneither the additivity of the Holevo information for nor the classical capacityof the amplitude damping channel are not known.

As we have seen, the quantity Cβ(Aγ) is an upper bound on the classi-cal capacity of the amplitude damping channel, and it can be shown (pleaseconsult the Bibliographic Notes in Section 7.5) that

Cβ(Aγ) = log2(1 +√

1− γ). (7.3.77)

See Figure 7.7 for a plot of both χ(Aγ) and Cβ(Aγ).

695

Page 707: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

7.3.3 Hadamard Channels

In this section, we prove that the Holevo information is additive for all Hada-mard channels.

Theorem 7.34 Additivity of Holevo Information for HadamardChannels

For a Hadamard channel N and an arbitrary channel M, the followingadditivity relation holds

χ(N⊗M) = χ(N) + χ(M). (7.3.78)

PROOF: Using the expression in (7.3.2), we have that

χ(N⊗M)

= sup(p(x),ψx)x∈X

[H

(∑

x∈Xp(x)(N⊗M)(ψx

A1 A2)

)

− ∑x∈X

p(x)H((Nc ⊗Mc)(ψxA1 A2

))

].

(7.3.79)

Now, for every bipartite state ρAB, it follows from strong subadditivitythat H(ρAB) ≤ H(ρA) + H(ρB) a fact known as the subadditivity of the quan-tum entropy (consider (5.6.88) with system C trivial). Using this for the firstterm in (7.3.79), we find that

H

(∑

x∈Xp(x)(N⊗M)(ψx

A1 A2)

)

≤ H

(∑

x∈Xp(x)N(ψx

A1)

)+ H

(∑

x∈Xp(x)M(ψx

A2)

).

(7.3.80)

We now make use of the following identity, which is straightforward to verify:for every finite alphabet X and ensemble (p(x), ρx

AB),

∑x∈X

p(x)D(ρxAB‖ρx

A ⊗ ρxB) = ∑

x∈Xp(x)H(ρx

A) + ∑x∈X

p(x)H(ρxB)

− ∑x∈X

p(x)H(ρxAB). (7.3.81)

696

Page 708: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

We use this for the second term in (7.3.79) to conclude that

∑x∈X

p(x)H((Nc ⊗Mc)(ψxA1 A2

))

= ∑x∈X

p(x)H(Nc(ψxA1)) + ∑

x∈Xp(x)H(Mc(ψx

A2))

− ∑x∈X

p(x)D((Nc ⊗Mc)(ψxA1 A2

)‖Nc(ψxA1)⊗Mc(ψx

A2)).

(7.3.82)

Now, let us focus on the relative entropy term in the expression above. SinceN is a Hadamard channel, by Proposition 3.35 we know that the comple-mentary channel Nc is entanglement-breaking. Then, from Theorem 3.33, weknow that every entanglement-breaking channel can be written as the com-position of a measurement channel followed by a preparation channel. Thismeans that we can write Nc as Nc = P Mqc, where Mqc is the measurement(or quantum–classical) channel, and P is the preparation channel. Using thedata-processing inequality for the quantum relative entropy, for all x ∈ X weobtain

D((Nc ⊗Mc)(ψxA1 A2

)‖Nc(ψxA1)⊗M(ψx

A2))

= D((P Mcqc ⊗Mc)(ψx

A1 A2)‖(P Mqc)(ψ

xA1)⊗Mc(ψx

A2)) (7.3.83)

≤ D((Mqc ⊗M)(ψxA1 A2

)‖Mcqc(ψ

xA1)⊗Mc(ψx

A2)). (7.3.84)

Then, using the identity (7.3.81) once again, we obtain

∑x∈X

p(x)D((Mqc ⊗Mc)(ψxA1 A2

)‖Mqc(ψxA1)⊗Mc(ψx

A2))

= ∑x∈X

p(x)H(Mqc(ψxA1)) + ∑

x∈Xp(x)H(Mc(ψx

A2))

− ∑x∈X

p(x)H((Mqc ⊗Mc)(ψxA1 A2

)). (7.3.85)

Now, let the measurement channel Mqc have the associated POVM MyA1y∈Y

for some finite alphabet Y. Then, letting q(y|x) := Tr[MyψxA1], we have that

H(Mqc(ψxA1)) = H(Y|X = x) (7.3.86)

for all x ∈ X. Also, for every x ∈ X, we find that

(Mqc ⊗Mc)(ψxA1 A2

) = ∑y∈Y

q(y|x)|y〉〈y|Y ⊗ ρx,yA2

, (7.3.87)

697

Page 709: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

where ρx,yA2

:= 1q(y|x)TrA1 [(My

A1⊗ 1A2)ψ

xA1 A2

]. Note that

∑y∈Y

q(y|x)ρx,yA2

= ∑y∈Y

TrA1 [(MyA1⊗ 1A2)ψ

xA1 A2

]

= TrA1 [ψxA1 A2

]

= ψxA2

(7.3.88)

for all x ∈ X. Therefore, for all x ∈ X,

H((Mqc ⊗M)(ψxA1 A2

))

= H

(∑y∈Y

q(y|x)|y〉〈y|Y ⊗Mc(ρx,yA2)

)(7.3.89)

= H(Y|X = x) + ∑y∈Y

q(y|x)H(Mc(ρx,yA2)), (7.3.90)

where the last equality follows from the direct-sum property of the quantumentropy. Putting everything together, we obtain

H

(∑

x∈Xp(x)(N⊗M)(ψx

A1 A2)

)− ∑

x∈Xp(x)H((Nc ⊗Mc)(ψx

A1 A2))

≤ H

(∑

x∈Xp(x)N(ψx

A1)

)− ∑

x∈Xp(x)H(Nc(ψx

A1)) + H

(∑

x∈Xp(x)M(ψx

A2)

)

− ∑x∈X

p(x)H(Mc(ψxA2)) + ∑

x∈Xp(x)H(Y|X = x)

+ ∑x∈X

p(x)H(Mc(ψxA2))− ∑

x∈Xp(x)H(Y|X = x)

− ∑x∈X

∑y∈Y

p(x)q(y|x)H(Mc(ψx,yA2)) (7.3.91)

= H

(∑

x∈Xp(x)N(ψx

A1)

)− ∑

x∈Xp(x)H(Nc(ψx

A1)) (7.3.92)

+ H

(∑

x∈X∑y∈Y

p(x)q(y|x)M(ρx,yA2)

)

− ∑x∈X

∑y∈Y

p(x)q(y|x)H(Mc(ρx,yA2))

≤ χ(N) + χ(M), (7.3.93)

698

Page 710: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

where we have used (7.3.88) and, to obtain the last inequality, the fact thatthe first two terms in (7.3.93) are of the form of the objective function in theexpression in (7.3.1) for the Holevo information χ(N), and similarly for thelast two terms, in which the ensemble is (p(x)q(y|x), ρ

x,yA2) : x ∈ X, y ∈ Y.

Since the ensemble (p(x), ψxA1 A2

)x∈X used to obtain (7.3.93) is arbitrary,we conclude that

χ(N⊗M) ≤ χ(M) + χ(N), (7.3.94)

which implies, via the superadditivity in (7.2.44) that χ(N ⊗M) = χ(N) +χ(M), as required.

7.4 Summary

In this chapter, we developed the theory of classical communication over aquantum channel, adopting a similar structure to that of the previous chap-ter. We began with the one-shot setting of classical communication, and wedefined the one-shot classical capacity of a quantum channel in Definition 7.2.We then derived upper (Proposition 7.3) and lower (Proposition 7.5) boundson the one-shot classical capacity in terms of the hypothesis testing Holevoinformation of a quantum channel. The approaches to doing so are conceptu-ally similar to those from the previous chapter. However, there are extra stepsinvolved in deriving the lower bound, called derandomization and expurga-tion, that establish the existence of a code with maximum error probability nolarger than a given threshold and number of bits transmitted roughly equalto the one-shot Holevo information.

With the fundamental information-theoretic arguments established in theone-shot setting, we then moved on to the asymptotic setting of classical com-munication. One of the main results is that the regularized Holevo informa-tion of a channel is equal to its classical capacity (Theorem 7.13). We thenconsidered some special cases: for entanglement-breaking, Hadamard, depo-larizing, and erasue channels, the Holevo information is not only equal to theclassical capacity but also equal to the strong converse classical capacity (weshowed the proofs in full for entanglement-breaking and erasure channels,but deferred to the literature for the others). We discussed general upperbounds on the classical capacity, including the Υ-information and Cβ semi-definite programming bound.

Going forward from here, the methods of position-based coding and se-

699

Page 711: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

quential decoding are useful for the tasks of secret key distillation (Chap-ter 10) and private communication (Chapter 11), and the concept of deran-domization appears again in the context of private communication. The Holevoinformation will also play a role in achievable rates for these tasks.

7.5 Bibliographic Notes

Classical communication over quantum channels is one of the earliest settingsconsidered in quantum information theory. A key early work on the topic in-cludes Holevo (1973), in which the Holevo upper bound on classical capacitywas established. Many years later, after the advent of quantum computing,the Holevo information lower bound on classical capacity was establishedby Holevo (1998) and Schumacher and Westmoreland (1997). Prior to theseworks, Hausladen et al. (1996) proved the same lower bound for the spe-cial case of a channel that accepts classical inputs and outputs pure quantumstates.

Classical communication in the one-shot setting has been considered by anumber of authors, including Hayashi and Nagaoka (2003); Hayashi (2007);Mosonyi and Datta (2009); Mosonyi and Hiai (2011); Renes and Renner (2011);Wang and Renner (2012); Matthews and Wehner (2014); Sharma and Warsi(2012); Datta et al. (2013); Tomamichel and Hayashi (2013); Wilde (2013); An-shu et al. (2019); Qi et al. (2018); Oskouei et al. (2019). Proposition 7.3 is dueto Matthews and Wehner (2014). The second part of Theorem 7.4 is due toWilde et al. (2014). A variation of Theorem 7.5 (for average error probabil-ity with uniformly random message) is due to Wang and Renner (2012). Theproof given here is due to Oskouei et al. (2019), however with some variationsgiven in this book to account for maximal error probability. At the same time,the proof uses the method of position-based coding (Anshu et al., 2019), withthe derandomization argument as given by Qi et al. (2018).

Additivity of Holevo information for entanglement-breaking channels wasestablished by Shor (2002a), for Hadamard channels by King et al. (2007),for the depolarizing channel by King (2003b), and for the erasure channel byBennett et al. (1997). The fact that the Holevo information is the strong con-verse classical capacity of the depolarizing channel was proven by Koenigand Wehner (2009). Additivity of the sandwiched Rényi-Holevo informationfor entanglement-breaking channels (Theorem 7.16) was established by Wildeet al. (2014), by building upon earlier seminal results of King (2003a) subse-

700

Page 712: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

quently generalized by Holevo (2006). That is, Lemma 7.18 is due to King(2003a); Holevo (2006). Lemma 7.17 is due to Wilde et al. (2014).

The Υ-information of a quantum channel and its variants were defined byWang et al. (2019c). The same authors established bounds on classical capac-ity involving Υ-information. The strong converse for the classical capacityof the quantum erasure channel is due to Wilde and Winter (2014), but herewe have followed the approach of Wang et al. (2019c). The semi-definite pro-gramming upper bound Cβ(N) for the classical capacity of a quantum chan-nel N was established by Wang et al. (2018).

The Holevo information of covariant channels was studied by Holevo(2002b).

A proof of the fact that the limit in the definition of the regularized Holevoinformation of a channel exists was given by Barnum et al. (1998).

The formula in (7.3.72) for the Holevo information of the amplitude damp-ing channel was derived by Li-Zhen and Mao-Fa (2007b), using the tech-niques of Cortese (2002) and Berry (2005). The formula in (7.3.77) for thequantity Cβ for the same channel was determined by Wang et al. (2018) (seealso Khatri et al. (2020)).

Appendix 7.A The α → 1 Limit of the SandwichedRényi Υ-Information of a Channel

In this section, we show that

limα→1+

Υα(N) = Υ(N), (7.A.1)

where we recall that

Υα(N) = supψRA

infF∈F

Dα(NA→B(ψRA)‖FA→B(ψRA)), (7.A.2)

Υ(N) = supψRA

infF∈F

D(NA→B(ψRA)‖FA→B(ψRA)). (7.A.3)

Here, ψRA is a pure state, with the dimension of R equal to the dimension ofA, and the infimum is over the set F of completely positive maps defined as

F = FA→B : ∃ σB ≥ 0, Tr[σB] ≤ 1, FA→B(ρA) ≤ σB ∀ ρA ∈ D(HA). (7.A.4)

701

Page 713: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Now, since the sandwiched Rényi relative entropy increases monotoni-cally with α (see Proposition 4.29), and since limα→1 Dα(ρ‖σ) = D(ρ‖σ) (seeProposition 4.28), we obtain

limα→1+

Υα(N) = infα∈(1,∞)

supψRA

infF∈F

Dα(NA→B(ψRA)‖FA→B(ψRA)) (7.A.5)

= supψRA

infα∈(1,∞)

infF∈F

Dα(NA→B(ψRA)‖FA→B(ψRA)) (7.A.6)

= supψRA

infF∈F

infα∈(1,∞)

Dα(NA→B(ψRA)‖FA→B(ψRA)) (7.A.7)

= supψRA

infF∈F

D(NA→B(ψRA)‖FA→B(ψRA)) (7.A.8)

= Υ(N), (7.A.9)

as required, where to obtain the second equality we made use of the minimaxtheorem in Theorem 2.19 to exchange infα∈(1,∞) and supψRA

. Specifically, weapplied that theorem to the function

(α, ψRA) 7→ infF∈F

Dα(NA→B(ψRA)‖FA→B(ψRA)), (7.A.10)

which is monotonically increasing in the first argument and continuous in thesecond argument.

Appendix 7.B Proof of the Additivity of Cβ(N)

In this section, we prove that

Cβ(N1 ⊗N2) = Cβ(N1) + Cβ(N2). (7.B.1)

Noting that Cβ(N) = log2 β(N), with β(N) defined in (7.2.218), the expressionabove for the additivity of Cβ(N) is equivalent to the multiplicativity of β(N),i.e.,

β(N1 ⊗N2) = β(N1) · β(N2). (7.B.2)

We now prove that this equality holds. We start with the following lemma.

702

Page 714: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Lemma 7.35

Let A and B be Hermitian operators such that −A ≤ B ≤ A, and let Cand D be Hermitian operators such that −C ≤ D ≤ C. Then,

− A⊗ C ≤ B⊗ D ≤ A⊗ C. (7.B.3)

PROOF: The condition −A ≤ B ≤ A is equivalent to A− B ≥ 0 and A + B ≥0, and the condition −C ≤ D ≤ C is equivalent to C− D ≥ 0 and C + D ≥ 0.These inequalities imply that

(A− B)⊗ (C− D) ≥ 0, (7.B.4)(A + B)⊗ (C + D) ≥ 0, (7.B.5)(A− B)⊗ (C + D) ≥ 0, (7.B.6)(A + B)⊗ (C− D) ≥ 0. (7.B.7)

Expanding the left-hand side of these inequalities gives

A⊗ C− B⊗ C− A⊗ D + B⊗ D ≥ 0, (7.B.8)A⊗ C + B⊗ C + A⊗ D + B⊗ D ≥ 0, (7.B.9)A⊗ C− B⊗ C + A⊗ D− B⊗ D ≥ 0, (7.B.10)A⊗ C + B⊗ C− A⊗ D− B⊗ D ≥ 0. (7.B.11)

Now, adding the first two of these inequalities implies that A⊗ C + B⊗ D ≥0, which is equivalent to the left-hand side of (7.B.3). Adding the last twoinequalities implies that A⊗ C− B⊗D ≥ 0, which is equivalent to the right-hand side of (7.B.3).

An immediate corollary of the lemma above is the following: for all Her-mitian operators A, B, C, D such that 0 ≤ B ≤ A and 0 ≤ D ≤ C, it holdsthat

0 ≤ B⊗ D ≤ A⊗ C. (7.B.12)

Indeed, the condition 0 ≤ B ≤ A implies that A ≥ 0, which is equivalent to−A ≤ 0, which means that −A ≤ B holds. Similarly, we get that −C ≤ D.So we have that −A ≤ B ≤ A and −C ≤ D ≤ C. The result then follows byapplying the lemma above.

Now, let us start the proof of (7.B.2) by showing

β(N1 ⊗N2) ≤ β(N1) · β(N2). (7.B.13)

703

Page 715: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Recall from (7.2.218) that

β(N) =

infimum Tr[SB]subject to −RAB ≤ TB[ΓN

AB] ≤ RAB,−1A ⊗ SB ≤ TB[RAB] ≤ 1A ⊗ SB.

(7.B.14)

Now, let (R1AB, S1

B) be a feasible point in the SDP for β(N1), and let (R2AB, S2

B)be a feasible point in the SDP for β(N2). Each pair thus satisfies the constraintsin (7.B.14). Using Lemma 7.35, the first of these constraints implies that

− R1A1B1⊗ R2

A2B2≤ TB1 [Γ

N1A1B1

]⊗ TB2 [ΓN2A2B2

] ≤ R1A1B1⊗ R2

A2B2. (7.B.15)

Furthermore, observe that

TB1 [ΓN1A1B1

]⊗ TB2 [ΓN2A2B2

] = TB1B2 [ΓN1A1B1⊗ ΓN2

A2B2] (7.B.16)

= TB1B2 [ΓN1⊗N2A1 A2B1B2

]. (7.B.17)

Using this, along with Lemma 7.35, the second constraint in (7.B.14) impliesthat

− 1A1 A2 ⊗ S1B1⊗ S2

B2≤ TB1B2 [R

1A1B1⊗ R2

A2B2] ≤ 1A1 A2 ⊗ S1

B1⊗ S2

B2. (7.B.18)

Now, the inequalities in (7.B.17) and (7.B.18) imply that (R1A1B1⊗ R2

A2B2, S1

B1⊗

S2B2) is a feasible point in the SDP for β(N1 ⊗N2). This means that

β(N1 ⊗N2) ≤ Tr[S1B1⊗ S2

B2] = Tr[S1

B1]Tr[S2

B2]. (7.B.19)

Since (R1A1B1

, S1B1) and (R2

A2B2, S2

B2) are arbitrary feasible points in the SDPs for

β(N1) and β(N2), respectively, the inequality in (7.B.19) holds for the feasiblepoints achieving β(N1) and β(N2). This means that

β(N1 ⊗N2) ≤ β(N1) · β(N2), (7.B.20)

as required.

The prove the reverse inequality, i.e.,

β(N1 ⊗N2) ≥ β(N1)β(N2), (7.B.21)

we turn to the SDP dual to the one in (7.B.14).

704

Page 716: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Lemma 7.36

For every quantum channel N, the SDP dual to the SDP in (7.B.14) forβ(N) is given by

β(N) :=

supremum Tr[TB[ΓN

AB](KAB −MAB)]

,subject to KAB + MAB ≤ TB[EAB + FAB],

EB + FB ≤ 1B,KAB, MAB, EAB, FAB ≥ 0.

(7.B.22)

Furthermore, it holds that β(N) = β(N).

PROOF: Using the formulation of the SDP for β(N) as in (7.2.219), the dual tothe SDP for β(N) is simply

β(N) =

supremum Tr[DY]subject to Φ†(Y) ≤ C,

Y ≥ 0,(7.B.23)

where

C =

(1B 00 0AB

), D =

TB[ΓNAB] 0 0 0

0 −TB[ΓNAB] 0 0

0 0 0 00 0 0 0

, (7.B.24)

and the map Φ is defined as

Φ(X) =

RAB 0 0 00 RAB 0 00 0 1R ⊗ SB − TB[RAB] 00 0 0 1R ⊗ SB + TB[RAB]

, (7.B.25)

X =

(SB 00 RAB

). (7.B.26)

To determine the adjoint Φ†, we first observe that, since the operators C and Dare block diagonal, the objective function Tr[DY] of the dual problem involvesonly the diagonal blocks of Y. Furthermore, the fact that Φ(X) and X areblock diagonal means that the condition Tr[Φ(X)Y] = Tr[XΦ†(Y)] definingthe adjoint map Φ† involves only the diagonal blocks of Y. Therefore, if the

705

Page 717: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

dual problem is feasible, then there is always a feasible point Y that is blockdiagonal. This means that, without loss of generality, we can let

Y =

Y1AB 0 0 00 Y2

AB 0 00 0 Y3

AB 00 0 0 Y4

AB

, (7.B.27)

with Y1AB, Y2

AB, Y3AB, Y4

AB ≥ 0. Then,

Tr[Φ(X)Y]

= Tr

RAB 0 0 00 RAB 0 00 0 1R ⊗ SB − TB[RAB] 00 0 0 1R ⊗ SB + TB[RAB]

×

Y1AB 0 0 00 Y2

AB 0 00 0 Y3

AB 00 0 0 Y4

AB

(7.B.28)

= Tr[

RABY1AB + RABY2

AB + (1R ⊗ SB − TB[RAB])Y3AB

+(1R ⊗ SB + TB[RAB])Y4AB

](7.B.29)

= Tr[SB(Y3B + Y4

B)] + Tr[RAB(Y1AB + Y2

AB + TB[Y4AB −Y3

AB])] (7.B.30)

= Tr[(

SB 00 RAB

)(Y3

B + Y4B 0

0 Y1AB + Y2

AB + TB[Y4AB −Y3

AB]

)]. (7.B.31)

For the last line to be equal to Tr[Xֆ(Y)], we must have

Φ†(Y) =(

Y3B + Y4

B 00 Y1

AB + Y2AB + TB[Y4

AB −Y3AB]

). (7.B.32)

Then, the condition Φ†(Y) ≤ C is given by(

Y3B + Y4

B 00 Y1

AB + Y2AB + TB[Y4

AB −Y3AB]

)≤(

1B 00 0RB

), (7.B.33)

which implies that

Y3B + Y4

B ≤ 1B, (7.B.34)

706

Page 718: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

Y1AB + Y2

AB ≤ TB[Y3AB −Y4

AB]. (7.B.35)

Then,

Tr[DY] = Tr[TB[ΓNAB]Y

1AB]− Tr[TB[ΓN

AB]Y2AB] (7.B.36)

= Tr[TB[ΓNAB](Y

1AB −Y2

AB)]. (7.B.37)

Therefore, the dual is given by

β(N) =

supremum Tr[TB[ΓN

AB](KAB −MAB)]

subject to KAB + MAB ≤ TB[EAB − FAB],EB + FB ≤ 1B,KAB, MAB, EAB, FAB ≥ 0,

(7.B.38)

as required.

To show that β(N) = β(N), we need to check that Slater’s condition holds(Theorem 2.22). We can pick EAB = 1AB

3dA, FAB = 1AB

6dA, and KAB = MAB = 1AB

24dA,

where dA is the dimension of the space of the system A. Then we have strictinequalities for all of the constraints of the dual problem, which means thatSlater’s condition holds. The primal β(N) and dual β(N) are thus equal.

With the dual problem in hand, we can now prove (7.B.21). Let (K1A1B1

,M1

A1B1, E1

A1B1, F1

A1B1) be a feasible point for the dual SDP for N1, and let (K2

A2B2,

M2A2B2

, E2A2B2

, F2A2B2

) be a feasible point for the dual SDP for N2. Then, pick

KA1B1 A2B2 = K1A1B1⊗ K2

A2B2+ M1

A1B1⊗M2

A2B2, (7.B.39)

MA1B1 A2B2 = K1A1B1⊗M2

A2B2+ M1

A1B1⊗ K2

A2B2, (7.B.40)

EA1B1 A2B2 = E1A1B1⊗ E2

A2B2+ F1

A1B1⊗ F2

A2B2, (7.B.41)

FA1B1 A2B2 = E1A1B1⊗ F2

A2B2+ F1

A1B1⊗ E2

A2B2. (7.B.42)

Note that KA1B1 A2B2 , MA1B1 A2B2 , EA1B1 A2B2 , FA1B1 A2B2 ≥ 0. Then,

KA1B1 A2B2 −MA1B1 A2B2 = (K1A1B1−M1

A1B1)⊗ (K2

A2B2−M2

A2B2), (7.B.43)

KA1B1 A2B2 + MA1B1 A2B2 = (K1A1B1

+ M1A1B1

)⊗ (K2A2B2

+ M2A2B2

), (7.B.44)

EA1B1 A2B2 − FA1B1 A2B2 = (E1A1B1− F1

A1B1)⊗ (E2

A2B2− F2

A2B2), (7.B.45)

EA1B1 A2B2 + FA1B1 A2B2 = (E1A1B1

+ F1A1B1

)⊗ (E2A2B2

+ F2A2B2

). (7.B.46)

Consider that

KA1B1 A2B2 + MA1B1 A2B2

707

Page 719: Principles of Quantum Communication Theory: A Modern Approach

Chapter 7: Classical Communication

= (K1A1B1

+ M1A1B1

)⊗ (K2A2B2

+ M2A2B2

) (7.B.47)

≤ TB1 [E1A1B1− F1

A1B1]⊗ TB2 [E

2A2B2− F2

A2B2] (7.B.48)

= TB1B2 [(E1A1B1− F1

A1B1)⊗ (E2

A2B2− F2

A2B2)] (7.B.49)

= TB1B2 [EA1B1 A2B2 − FA1B1 A2B2 ], (7.B.50)

where the inequality follows from the constraints KiAiBi

, MiAiBi≥ 0 and Ki

AiBi

+ MiAiBi≤ TBi [E

iAiBi− Fi

AiBi] for i ∈ 1, 2 and from an application of (7.B.12).

Furthermore, we have that

EB1B2 + FB1B2 = (E1B1+ F1

B1)⊗ (E2

B2+ F2

B2) (7.B.51)

≤ 1B1 ⊗ 1B2 (7.B.52)= 1B1B2 , (7.B.53)

where the inequality follows from the constraints EiBi

, FiBi≥ 0 and Ei

Bi+

FiBi≤ 1Bi for i ∈ 1, 2 and from an application of (7.B.12). The collection

(KA1B1 A2B2 , MA1B1 A2B2 , EA1B1 A2B2 , FA1B1 A2B2) thus constitutes a feasible pointfor the SDP in (7.B.38). By restricting the optimization in the SDP to this point,we find that

β(N1 ⊗N2) (7.B.54)

≥ Tr[TB1B2 [Γ

N1⊗N2A1 A2B1B2

](KA1B1 A2B2 −MA1B1 A2B2)]

(7.B.55)

= Tr[(

TB1 [ΓN1A1B1

]⊗ TB2 [ΓN2A2B2

])(KA1B1 A2B2 −MA1B1 A2B2)

](7.B.56)

= Tr[(

TB1 [ΓN1A1B1

]⊗ TB2 [ΓN2A2B2

])

×((K1

A1B1−M1

A1B1)⊗ (K2

A2B2−M2

A2B2))]

(7.B.57)

= Tr[TB1 [Γ

N1A1B1

](K1A1B1−M1

A1B1)]

× Tr[TB2 [Γ

N2A2B2

](K2A2B2−M2

A2B2)]

. (7.B.58)

Now, since (K1AB, M1

AB, E1AB, F1

AB) (K2AB, M2

AB, E2AB, F2

AB) were arbitrary feasi-ble points in the SDPs for β(N1) = β(N1) and β(N2) = β(N2), respectively,the inequality in (7.B.58) holds for the feasible points achieving β(N1) andβ(N2). Therefore,

β(N1 ⊗N2) ≥ β(N1) · β(N2). (7.B.59)

We have thus shown that β(N1 ⊗N2) = β(N1) · β(N2).

708

Page 720: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8

Entanglement Distillation

In the last two chapters, we explored classical communication over quantumchannels, in which classical information is encoded into a quantum state,transmitted over a quantum channel, and decoded at the receiving end. Inthis chapter, we begin our exploration of quantum communication. The goalhere is to send quantum information between two spatially separated parties.By “quantum information,” we mean that a particular quantum state is trans-mitted, which is carried physically by some quantum system. As was the casein previous chapters, the particular information carrier is unimportant to uswhen developing the theoretical results; however, the most common phys-ical manifestation is a photonic encoding, which is useful for long-distancequantum communication.

A basic quantum communication protocol is teleportation, which we de-veloped in Section 3.3. In this protocol, the sender, Alice, initially shares amaximally entangled state with the receiver, Bob. This shared entanglement,along with classical communication, can be used to transmit an arbitraryquantum state perfectly from Alice to Bob. Specifically, if Alice and Bob sharea maximally entangled state of Schmidt rank d ≥ 2, then using this entangle-ment along with 2 log2 d bits of classical communication, Alice can perfectlytransmit an arbitrary state of log2 d qubits to Bob. Thus, the quantum tele-portation protocol realizes a noiseless quantum channel between Alice andBob without having to physically transport the particles carrying the quan-tum information. Of course, this achievement comes at the cost of having apre-shared maximally entangled state.

How do we obtain maximally entangled states in the first place? In prac-

709

Page 721: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

L↔ ΦABρ⊗nAB

An A

Bn B

Alice

Bob

FIGURE 8.1: Given a bipartite state ρAB shared by Alice and Bob, the taskof entanglement distillation is to find the largest d for which a maximallyentangled state |Φ〉AB of Schmidt rank d can be extracted from n copiesof ρAB with the smallest possible error, given a two-way LOCC channelL↔

AnBn→ABbetween Alice and Bob.

tice, due to noise and other device imperfections, physical sources of entan-glement often only produce mixed entangled states, not the pure, maximallyentangled states that are needed for quantum teleportation. The purpose ofthis chapter is to show that many copies of a mixed entangled state can beused to extract, or distill, some smaller number of pure maximally entangledstates. These distilled maximally entangled states can then be used for quan-tum communication via the teleportation protocol. This is a basic strategy forquantum communication that we consider in more detail in Chapter 14, inorder to obtain achievable rates for quantum communication over a quantumchannel.

Similar to quantum teleportation, in which the allowed resources are lo-cal operations by Alice and Bob and one-way classical communication fromAlice to Bob, in entanglement distillation we allow Alice and Bob local op-erations with two-way classical communication (that is, communication fromAlice to Bob and from Bob to Alice); see Figure 8.1. The goal is to determine,given many copies of a quantum state ρAB, the maximum rate at which max-imally entangled states (i.e., ebits) can be distilled approximately from ρAB,where the rate is defined as the ratio 1

n log2 d between the number log2 d ofapproximate ebits extracted and the initial number n of copies of ρAB. In theasymptotic setting, this maximum rate of entanglement distillation is calledthe distillable entanglement of ρAB, and we denote it by ED(ρAB). We often writeED(ρAB) as ED(A; B)ρ in order to explicitly indicate the bipartition betweenthe subsystems.

The shared resource state ρAB for entanglement distillation has to be en-tangled to begin with in order for entanglement distillation to be successful.

710

Page 722: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

If ρAB is separable to begin with, then it stays separable after the applicationof an LOCC channel, and it is not possible to distill high fidelity maximallyentangled states from a separable state. This intuitive reasoning becomes for-malized in this chapter: some of the entanglement measures from Chapter 5serve as upper bounds on the distillable entanglement, in both the one-shot(Section 8.1) and asymptotic (Section 8.2) settings. In particular, the Rainsrelative entropy and squashed entanglement are upper bounds on distillableentanglement. These entanglement measures are currently the best knownupper bounds on distillable entanglement, and so we focus exclusively onthem for this purpose in this chapter. It is then a trivial consequence of Propo-sition 5.24, (5.1.149), and Proposition 5.35 that log-negativity, relative entropyof entanglement, and entanglement of formation are upper bounds on distil-lable entanglement, and so we do not focus on these entanglement measuresin this chapter.

We also consider lower bounds on distillable entanglement in this chap-ter: the lower bound on distillable entanglement in the one-shot setting inSection 8.1.2 is based on the concept of decoupling, which is an important con-cept that we discuss later. This lower bound, when applied in the asymptoticsetting, leads to the coherent information lower bound ED(ρAB) ≥ I(A〉B)ρ

on distillable entanglement.

8.1 One-Shot Setting

The one-shot setting for entanglement distillation begins with Alice and Bobsharing the state ρAB. An entanglement distillation protocol for ρAB is defined bythe pair (d,L↔

AB→AB), where d ∈ N, d ≥ 2, and L↔

AB→ABis an LOCC channel

(Definition 3.37), with dA = dB = d. The distillation error perr(L↔; ρAB) of theprotocol is given by the infidelity, defined as

perr(L↔; ρAB) := 1− F(ΦAB,L↔AB→AB(ρAB)) (8.1.1)

= 1− 〈Φ|ABL↔AB→AB(ρAB)|Φ〉AB, (8.1.2)

where ΦAB is the maximally entangled state of Schmidt rank d, defined as

|Φ〉AB =1√d

d−1

∑i=0|i〉A ⊗ |i〉B, (8.1.3)

and F is the fidelity (see Section 3.5.2). To obtain (8.1.2), we used the formulain (3.5.23) for the fidelity between a pure state and a mixed state.

711

Page 723: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

The figure of merit in (8.1.2) is sensible: the error probability perr(L↔; ρAB)is equal to the probability that the state ωAB := L↔

AB→AB(ρAB) fails an “en-

tanglement test,” which is a measurement defined by the POVM

ΦAB, 1AB −ΦAB. (8.1.4)

Passing the test corresponds to the measurement operator ΦAB and failingcorresponds to 1AB − ΦAB. If 1− Tr[ωABΦAB] ≤ ε ∈ [0, 1], and dA = dB =d ≥ 2, then we say that the final state ωAB contains log2 d ε-approximate ebits.

Definition 8.1 (d, ε) Entanglement Distillation Protocol

An entanglement distillation protocol (d,L↔AB→AB

) for the state ρAB iscalled a (d, ε) protocol, with ε ∈ [0, 1], if perr(L↔; ρAB) ≤ ε.

Given ε ∈ [0, 1], the highest number log2 d of ε-approximate ebits thatcan be extracted from a state ρAB among all (d, ε) entanglement distillationprotocols is called the one-shot ε-distillable entanglement of ρAB.

Definition 8.2 One-Shot Distillable Entanglement

Given a bipartite state ρAB and ε ∈ [0, 1], the one-shot ε-distillable entan-glement of ρAB, denoted by Eε

D(ρAB) ≡ EεD(A; B)ρ, is defined as

EεD(A; B)ρ := sup

(d,L↔)

log2 d : perr(L↔; ρAB) ≤ ε, (8.1.5)

where the optimization is over all d ≥ 2 and every LOCC channelL↔

AB→ABwith dA = dB = d.

In addition to finding the largest number log2 d of ε-approximate ebitsthat can be extracted from all (d, ε) entanglement distilltion protocols for agiven ε ∈ [0, 1], we can consider the following complementary question: for agiven d ≥ 2, what is the lowest value of ε that can be attained among all (d, ε)entanglement distillation protocols? In other words, what is the value of

ε∗D(d; ρAB) := infL↔

AB→AB∈LOCC

perr(L↔; ρAB) : dA = dB = d, (8.1.6)

where the optimization is over every LOCC channel L↔AB→AB

, with dA =dB = d? In this book, we focus primarily on the problem of optimizing the

712

Page 724: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

number of extracted (approximate) ebits rather than the error, and so our pri-mary quantity of interest is the one-shot distillable entanglement Eε

D(ρAB).

Calculating the one-shot distillable entanglement is generally a difficulttask, because it involves optimizing over every Schmidt rank d ≥ 2 of themaximally entangled state ΦAB and over every LOCC channel LAB→AB, withdA = dB = d. We therefore try to estimate the one-shot distillable entangle-ment by devising upper and lower bounds. We begin in the next section withupper bounds.

8.1.1 Upper Bounds on the Number of Ebits

In this section, we provide three different upper bounds on one-shot distil-lable entanglement, based on coherent information, Rains relative entropy,and squashed entanglement. Our study of upper bounds on one-shot dis-tillable entanglement begins with coherent information and the followinglemma. This can be understood as a fully quantum generalization of Lemma 6.4,in which we are performing the entanglement test in (8.1.4) rather than thecomparator test from (6.1.37).

Lemma 8.3

Let A and B be quantum systems with the same dimension d ≥ 2. LetΦAB be a maximally entangled state of Schmidt rank d, and let ωAB be anarbitrary bipartite state. If the probability Tr[ΦABωAB] that the state ωABpasses the entanglement test defined by the POVM ΦAB, 1AB − ΦABsatisfies

Tr[ΦABωAB] ≥ 1− ε (8.1.7)

for some ε ∈ [0, 1], then

log2 d ≤ IεH(A〉B)ω, (8.1.8)

where IεH(A〉B)ω is the ε-hypothesis testing coherent information (see

(4.11.97)).

If, in addition to (8.1.7), we have that TrB[ωAB] = πA = 1Ad , then

2 log2 d ≤ IεH(A; B)ω. (8.1.9)

713

Page 725: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

PROOF: By assumption, we have that

F(ΦAB, ρAB) = Tr[ΦABωAB] ≥ 1− ε. (8.1.10)

Now, for every state σB, we have that

Tr[ΦAB(1A ⊗ σB)] =1d〈Γ|AB(1A ⊗ σB)|Γ〉AB (8.1.11)

=1d

Tr[σB] (8.1.12)

=1d

, (8.1.13)

where we used (2.2.33). Next, recall from (4.11.97) that

IεH(A〉B)ω = inf

σBDε

H(ωAB‖1A ⊗ σB), (8.1.14)

where

DεH(ρAB‖1A ⊗ σB) = − log2 inf

ΛABTr[ΛAB(1A ⊗ σB)] : 0 ≤ ΛAB ≤ 1AB,

Tr[ΛABωAB] ≥ 1− ε. (8.1.15)

Based on (8.1.10), we see that ΦAB is a measurement operator satisfying theconstraints for the optimization in the definition of Dε

H(ωAB‖1A⊗ σB). There-fore,

Tr[ΦAB(1A ⊗ σB)] =1d≥ 2−Dε

H(ωAB‖1A⊗σB), (8.1.16)

which implies thatlog2 d ≤ Dε

H(ωAB‖1A ⊗ σB) (8.1.17)

for every state σB. Optimizing over σB leads to

log2 d ≤ infσB

DεH(ωAB‖1A ⊗ σB) = Iε

H(A〉B)ω, (8.1.18)

which is precisely (8.1.8).

Similarly,

Tr[ΦAB(πA ⊗ σB)] =1d

Tr[ΦAB(1A ⊗ σB)] (8.1.19)

=1d2 〈Γ|AB(1A ⊗ σB)|Γ〉AB (8.1.20)

714

Page 726: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

=1d2 , (8.1.21)

where the last line follows from the same reasoning for (8.1.11)–(8.1.13). Next,recall that

IεH(A; B)ω = inf

σBDε

H(ωAB‖ωA ⊗ σB). (8.1.22)

Therefore, by definition of the hypothesis testing relative entropy,

Tr[ΦAB(πA ⊗ σB)] =1d2 ≥ 2−Dε

H(ωAB‖πA⊗σB), (8.1.23)

which implies that2 log2 d ≤ Dε

H(ωAB‖πA ⊗ σB). (8.1.24)

Since the state σB is arbitrary, we obtain

2 log2 d ≤ infσB

DεH(ωAB‖πA ⊗ σB) = Iε

H(A; B)ω, (8.1.25)

which is precisely (8.1.9). To obtain the last equality, we made use of theassumption TrB[ωAB] = πA.

Note that the result of Lemma 8.3 is general and applies to every bipartitestate that is close in fidelity to a maximally entangled state. Applying it to thestate ωAB = LAB→AB(ρAB) at the output of a (d, ε) entanglement distillationprotocol for a state ρAB, we obtain the following result:

Theorem 8.4 Upper Bound on One-Shot Distillable Entanglement

Let ρAB be a bipartite state. For every (d, ε) entanglement distillationprotocol (d,LAB→AB) for ρAB, with ε ∈ (0, 1] and dA = dB = d, thenumber of ε-approximate ebits extracted at the end of the protocol isbounded from above by the LOCC-optimized ε-hypothesis testing co-herent information of ρAB, i.e.,

log2 d ≤ supL

IεH(A′〉B′)L(ρ), (8.1.26)

where the optimization is over every LOCC channel LAB→A′B′ . Conse-quently, for the one-shot ε-distillable entanglement, we obtain

EεD(A; B)ρ ≤ sup

L

IεH(A′〉B′)L(ρ). (8.1.27)

715

Page 727: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

PROOF: For a (d, ε) entanglement distillation protocol (d,LAB→AB) for ρAB,by definition the state ωAB = LAB→AB(ρAB) satisfies Tr[ΦABωAB] ≥ 1 −ε. Therefore, using (8.1.8), we conclude that log2 d ≤ Iε

H(A〉B)L(ρ). SinceLAB→AB is a particular LOCC channel, we conclude that

IεH(A〉B)L(ρ) ≤ sup

L

IεH(A′〉B′)L(ρ). (8.1.28)

We thus conclude (8.1.26). Now using the definition of EεD(A; B)ρ in (8.1.5),

we obtain the inequality in (8.1.27).

We now consider an upper bound based on the Rains relative entropy.In order to place an upper bound on the one-shot distillable entanglementEε

D(ρAB) for a given state ρAB and ε ∈ [0, 1], we consider states that are uselessfor entanglement distillation. This is entirely analogous conceptually to whatis done for classical and entanglement-assisted classical communication inthe previous two chapters, in which we used the set of replacement channels(which are useless for both of these communication tasks) to place an upperbound on the number of transmitted bits in an (|M|, ε) protocol, such that theupper bound depends only on the channel N being used for communication.

What states are useless for entanglement distillation? Note that an intu-itive necessary condition for successful entanglement distillation is that theinitial state ρAB should be entangled: if ρAB is separable, then the outputstate LAB→AB(ρAB) of an arbitrary entanglement distillation protocol is stilla separable state. This suggests that separable states are useless for entanglementdistillation. To be more precise, separable states are useless for entanglementdistillation because they have a very small probability of passing the entan-glement test. As we show in Lemma 8.5 below, the following bound holds forevery separable state σAB:

Tr[ΦABσAB] ≤1d

, (8.1.29)

where d is the Schmidt rank of ΦAB. More generally, operators in the set PPT′,defined as

PPT′(A : B) = σAB : σAB ≥ 0, ‖TB[σAB]‖1 ≤ 1, (8.1.30)

are also useless for entanglement distillation, in the sense that a statementanalgous to (8.1.29) can be made for them. We now prove this statement.

716

Page 728: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Lemma 8.5

Let A and B be quantum systems with the same dimension d ≥ 2.Let ΦAB be a maximally entangled state of Schmidt rank d. If σAB ∈PPT′(A : B), then Tr[ΦABσAB] ≤ 1

d .

REMARK: Note that Lemma 8.5 implies that Tr[ΦABσAB] ≤ 1d for every separable state

σAB because SEP(A : B) ⊆ PPT′(A : B) (recall Figure 5.2).

PROOF: Using the fact that the partial transpose is self-inverse and self-adjoint,as discussed in (3.1.77) and (3.1.78), respectively, we find that

Tr[ΦABσAB] = Tr[TB(ΦAB)TB(σAB)] (8.1.31)

=1d

Tr[FABTB(σAB)], (8.1.32)

where FAB = 1dTB(ΦAB) is the dimension-scaled unitary swap operator, as

shown in (3.1.87). Since FAB is a unitary operator, by the variational charac-terization of the trace norm (see (2.2.88)), we obtain

Tr[ΦABσAB] ≤1d‖TB(σAB)‖1 (8.1.33)

≤ 1d

, (8.1.34)

where the last line follows from the definition of the set PPT′(A : B) in (8.1.30).

Due to the fact that SEP ⊆ PPT ⊆ PPT′ (see Figure 5.2), Lemma 8.5 tells usthat both separable and PPT states are useless for entanglement distillation.However, due to the fact that separable states are strictly contained in the setof PPT states for all bipartite states except for qubit-qubit and qubit-qutritstates, it follows that there are PPT entangled states that are useless for entan-glement distillation. We elaborate upon this point further in Section 8.2.0.1below, and we show that the distillable entanglement (in the asymptotic set-ting) vanishes for all PPT states.

The steps followed in the proof of Lemma 8.5 above are completely anal-ogous to the steps in (8.1.11)–(8.1.13) and in (8.1.19)–(8.1.21) of the proof ofLemma 8.3. Therefore, just as Lemma 8.3 was used to establish Proposi-tion 8.4, we can use Lemma 8.5 to place an upper bound on the number log2 dof approximate ebits in a bipartite state ωAB.

717

Page 729: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Proposition 8.6

Fix ε ∈ [0, 1], and let A and B be quantum systems with the same dimen-sion d ≥ 2. Fix a maximally entangled state ΦAB of Schmidt rank d. LetωAB be an ε-approximate maximally entangled state, in the sense that

F(ΦAB, ωAB) = Tr[ΦABωAB] ≥ 1− ε. (8.1.35)

Then, the number log2 d of ε-approximate ebits in ωAB is bounded fromabove as follows:

log2 d ≤ RεH(A; B)ω, (8.1.36)

where RεH(A; B)ω is the ε-hypothesis testing Rains relative entropy of

ωAB (see (5.3.5)).

PROOF: Let σAB be an arbitrary operator in PPT′(A : B). The inequalityTr[ΦABωAB] ≥ 1− ε guarantees that ωAB passes the entanglement test withprobability greater than 1 − ε. Thus, we conclude that ΦAB is a particularmeasurement operator satisfying the constraints for 2−Dε

H(ωAB‖σAB). Apply-ing Lemma 8.5 and the definition of Dε

H(ωAB‖σAB), we conclude that

2−DεH(ωAB‖σAB) ≤ Tr[ΦABσAB] ≤

1d

. (8.1.37)

Rearranging this leads to

log2 d ≤ DεH(ωAB‖σAB) (8.1.38)

Since this inequality holds for every operator σAB ∈ PPT′(A : B), we concludethat

log2 d ≤ infσAB∈PPT′(A:B)

DεH(ωAB‖σAB) = Rε

H(A; B)ω, (8.1.39)

where we used the definition of RεH(A; B)ω in (5.3.5) to obtain the last equal-

ity.

A consequence of Proposition 8.6 is the following upper bound on theone-shot distillable entanglement of ρAB.

718

Page 730: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Theorem 8.7 Rains Upper Bound on One-Shot Distillable Entangle-ment

Let ρAB be a bipartite state. For every (d, ε) entanglement distillationprotocol for ρAB, with ε ∈ [0, 1], we have that

log2 d ≤ RεH(A; B)ρ. (8.1.40)

Consequently, for the one-shot distillable entanglement, we have

EεD(A; B)ρ ≤ Rε

H(A; B)ρ (8.1.41)

for every state ρAB and all ε ∈ [0, 1].

PROOF: Consider a (d, ε) entanglement distillation protocol for ρAB with thecorresponding LOCC channel LAB→AB. Then, by definition, we have that

perr(L; ρAB) = 1− Tr[ΦABLAB→AB(ρAB)] ≤ ε. (8.1.42)

Letting ωAB = LAB→AB(ρAB), we have that Tr[ΦABωAB] ≥ 1− ε. The out-put state ωAB of the entanglement distillation protocol therefore satisfies theconditions of Proposition 8.6, which means that

log2 d ≤ RεH(A; B)ω. (8.1.43)

Now, it follows from Proposition 5.25 that RεH(A; B) is an entanglement mea-

sure. Thus, it satisfies the data-processing inequality under LOCC chan-nels, which means that Rε

H(A; B)ω ≤ RεH(A; B)ρ. We thus have log2 d ≤

RεH(A; B)ρ. Since this inequality holds for all d ≥ 2 and every LOCC chan-

nel LAB→AB, by definition of one-shot ε-distillable entanglement, we obtainEε

D(A; B)ρ ≤ RεH(A; B)ρ, as required.

The main step that allows us to conclude the bound in (8.1.40) in terms ofthe state ρAB alone is the fact that Rε

H is an entanglement measure, meaningthat it is monotone non-increasing under LOCC channels. In other words,the set of PPT′ operators is preserved under LOCC channels. This fact is nottrue for the set 1A ⊗ σB : σB ∈ D(H) appearing in the optimization thatdefines the ε-hypothesis testing coherent information, meaning that the op-erator LAB→AB(1A ⊗ σB) (where LAB→AB is an LOCC channel) is not in gen-eral of the form 1B ⊗ τB for some state τB. We therefore cannot use the data-processing inequality for the bound in (8.4) in order to reduce it to log2 d ≤IεH(A〉B)ρ.

719

Page 731: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Combining Theorems 8.4 and 8.7 with Propositions 4.67 and 4.68 immedi-ately leads to the following upper bounds.

Corollary 8.8

Let ρAB be a bipartite state, and let ε ∈ [0, 1/2). For every (d, ε) entan-glement distillation protocol (d,LAB→AB), with dA = dB = d, we havethat

log2 d ≤ 11− 2ε

(supL

I(A′〉B′)L(ρ) + h2(ε)

), (8.1.44)

where I(A′〉B′)L(ρ) is the coherent information of LAB→A′B′(ρAB) (see(4.2.91)). For ε ∈ [0, 1),

log2 d ≤ Rα(A; B)ρ +α

α− 1log2

(1

1− ε

)∀ α > 1, (8.1.45)

whereRα(A; B)ρ = inf

σAB∈PPT′(A:B)Dα(ρAB‖σAB) (8.1.46)

is the sandwiched Rényi Rains relative entropy of ρAB (see (5.3.6)).

PROOF: Combining the upper bound in (8.1.26) from Theorem 8.4 with theupper bound in (4.9.52) from Proposition 4.67, we obtain

log2 d ≤ supL

IεH(A′〉B′)L(ρ) (8.1.47)

≤ 11− ε

(supL

I(A′〉B′)L(ρ) + h2(ε)

)+

ε

1− εlog2 d. (8.1.48)

Rearranging this and simplifying leads to

log2 d ≤ 11− 2ε

(supL

I(A′〉B′)L(ρ) + h2(ε)

), (8.1.49)

which is the inequality in (8.1.44). The inequality in (8.1.45) follows fromTheorem 8.7 and (4.9.59) in Proposition 4.68.

Since the upper bounds in (8.1.44) and (8.1.45) hold for all (d, ε) entan-glement distillation protocols, we conclude the following upper bounds on

720

Page 732: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

distillable entanglement:

EεD(A; B)ρ ≤

11− 2ε

(supL

I(A′〉B′)L(ρ) + h2(ε)

), (8.1.50)

EεD(A; B)ρ ≤ Rα(A; B)ρ +

α

α− 1log2

(1

1− ε

)∀ α > 1. (8.1.51)

We finally turn to squashed entanglement and establish it as an upperbound on one-shot distillable entanglement:

Theorem 8.9 Squashed Entanglement Upper Bound on One-ShotDistillable Entanglement

Let ρAB be a bipartite state. For every (d, ε) entanglement distillationprotocol for ρAB, with ε ∈ [0, 1), we have that

log2 d ≤ 11−√ε

(Esq(A; B)ρ + g2(

√ε))

, (8.1.52)

where Esq(A; B)ρ is the squashed entanglement of ρAB (see (5.1.162)) andg2(δ) := (δ + 1) log2(δ + 1) − δ log2 δ. Consequently, for the one-shotdistillable entanglement, we have

EεD(A; B)ρ ≤

11−√ε

(Esq(A; B)ρ + g2(

√ε))

(8.1.53)

for every state ρAB and ε ∈ [0, 1).

PROOF: Consider a (d, ε) entanglement distillation protocol for ρAB with thecorresponding LOCC channel LAB→AB. From the LOCC monotonicity ofsquashed entanglement (Theorem 5.33), we have that

Esq(A; B)ω ≤ Esq(A; B)ρ, (8.1.54)

where ωAB = LAB→AB(ρAB). Continuing, by definition, the following in-equality holds

perr(L; ρAB) = 1− Tr[ΦABLAB→AB(ρAB)] ≤ ε. (8.1.55)

It follows that Tr[ΦABωAB] ≥ 1− ε, which is the same as

F(ΦAB, ωAB) ≥ 1− ε. (8.1.56)

721

Page 733: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

As a consequence of Proposition 5.38, we find that

Esq(A; B)ω ≥ Esq(A; B)Φ −(√

ε log2 min∣∣A

∣∣ ,∣∣B∣∣+ g2(

√ε))

(8.1.57)

= log2 d−(√

ε log2 d + g2(√

ε))

(8.1.58)

= (1−√

ε) log2 d− g2(√

ε). (8.1.59)

The first equality follows from Proposition 5.36. We can finally rearrange theestablished inequality Esq(A; B)ρ ≥ (1−√ε) log2 d− g2(

√ε) to be in the form

stated in the theorem.

8.1.2 Lower Bound on the Number of Ebits via Decoupling

Having found upper bounds on one-shot distillable entanglement, we nowfocus on lower bounds. In order to find a lower bound on distillable entan-glement, we have to find an explicit entanglement distillation protocol thatworks for an arbitrary bipartite state ρAB and an arbitrary error ε ∈ (0, 1). Re-call that the goal of entanglement distillation is for two parties, Alice and Bob,to make use of LOCC to transform their shared bipartite state ρAB to the max-imally entangled state ΦAB, for some dA = dB = d ≥ 2. Now, the initial stateρAB has some purification |ψρ〉ABE, with the purifying system E in generalcorrelated with A and B. However, because the maximally entangled state ispure, every purification of it must be of the form ΦAB ⊗ φE′ , with the systemE′ in tensor product with systems A and B. Since the goal of entanglementdistillation is to distill a maximally entangled state and the maximally entan-gled state has this property, we can thus think of entanglement distillation asthe task of decoupling A and B from their environment E; see Figure 8.2. Ourlower bound on one-shot distillable entanglement tells us what dimension dof A and B is sufficient in order to achieve this decoupling up to error ε.

The lower bound on one-shot distillable entanglement that we determinein this section is expressed in terms of an information measure that is derivedfrom a smoothed version of the max-relative entropy Dmax, which we brieflycover in Section 4.8.1. Recall from Definition 4.56 that the max-relative en-tropy of a state ρ and a positive semi-definite operator σ is defined as

Dmax(ρ‖σ) = log2

∥∥∥σ−12 ρσ−

12

∥∥∥∞

. (8.1.60)

Using this, we define the conditional min-entropy as

Hmin(A|B)ρ := − infσB

Dmax(ρAB‖1A ⊗ σB) (8.1.61)

722

Page 734: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

LψABE Φ+AB

ρE

A

B

A

B

E

FIGURE 8.2: The task of entanglement distillation can be understood fromthe perspective of decoupling: given a bipartite state ρAB with purificationψABE, the entanglement distillation protocol given by the LOCC channelL should result in the pure maximally entangled state ΦAB, which by def-inition is in tensor product with the environment, so that the joint state isΦAB ⊗ ρE, with ρE = TrAB[ψABE].

where the optimization is with respect to every state σB.

From Definition 4.59, the smooth max-relative entropy is defined as

Dεmax(ρ‖σ) = inf

ρ∈Bε(ρ)Dmax(ρ‖σ), (8.1.62)

where we recall from (4.8.31) that

Bε(ρ) = ρ : P(ρ, ρ) ≤ ε, (8.1.63)

and the sine distance P(ρ, ρ) is given by (see Definition 3.66)

P(ρ, ρ) =√

1− F(ρ, ρ). (8.1.64)

Using the smooth max-relative entropy, we define the smooth conditionalmin-entropy of ρAB as

Hεmin(A|B)ρ = − inf

σBDε

max(ρAB‖1A ⊗ σB) (8.1.65)

= supρ∈Bε(ρ)

Hmin(A|B)ρ (8.1.66)

for all ε ∈ (0, 1), where the optimization in the first line is with respect tostates σB.

We also need the smooth conditional max-entropy of ρAB, which is definedas

Hεmax(A|B)ρ := inf

ρ∈Bε(ρ)Hmax(A|B)ρ (8.1.67)

723

Page 735: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

for all ε ∈ (0, 1), where

Hmax(A|B)ρ := H12(A|B)ρ (8.1.68)

= − infσB

D12(ρAB‖1A ⊗ σB) (8.1.69)

= supσB

log2 F(ρAB, 1A ⊗ σB). (8.1.70)

The obtain the last equality, we made use of (4.5.8), and the optimizationtherein is with respect to states σB.

For every state ρAB, the conditional min- and max-entropies are related asfollows:

Hmax(A|B)ρ = −Hmin(A|E)ψ (8.1.71)Hε

max(A|B)ρ = −Hεmin(A|E)ψ, (8.1.72)

for all ε ∈ (0, 1), where ψABE is a purification of ρAB.

Both the conditional min-entropy and the smooth conditional min-entropycan be formulated as semi-definite programs. The same is true for the condi-tional max-entropy and the smooth conditional max-entropy. Please consultthe Bibliographic Notes in Section 8.5 for details.

Finally, we need the quantity

H2(A|B)ρ := − infσB

D2(ρAB‖1A ⊗ σB) (8.1.73)

= − infσB

log2 Tr

[(σ− 1

4B ρABσ

− 14

B

)2]

, (8.1.74)

which is known as the “conditional collision entropy” of ρAB, where the op-timization is with respect to states σB. Due to monotonicity in α of the sand-wiched Rényi relative entropy Dα (see Proposition 4.29), we have that

Hmin(A|B)ρ ≤ H2(A|B)ρ (8.1.75)

for every bipartite state ρAB.

We are now ready to state a lower bound on one-shot distillable entangle-ment.

724

Page 736: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Theorem 8.10 Lower Bound on One-Shot Distillable Entanglement

Let ρAB be a quantum state. For all ε ∈ (0, 1] and η ∈ [0,√

ε), there existsa (d, ε) one-way entanglement distillation protocol for ρAB with

log2 d = −H√

ε−ηmax (A|B)ρ + 4 log2 η. (8.1.76)

Consequently, for the one-shot distillable entanglement of ρAB, we have

EεD(A; B)ρ ≥ sup

L

(−H

√ε−η

max (A′|B′)L(ρ))+ 4 log2 η (8.1.77)

for all ε ∈ [0, 1] and η ∈ [0,√

ε), where the optimization is over everyLOCC channel LAB→A′B′ .

In order to prove Theorem 8.10, we exhibit an entanglement distillation

protocol (d,LAB→AB), with dA = dB = d = 2−H√

ε−ηmax (A|B)ρ+4 log2 η, such that

perr(L; ρAB) ≤ ε for all ε ∈ (0, 1]. To this end, we construct a one-way LOCCchannel L→

AB→ABof the form

L→AB→AB = ∑x∈X

ExA→A ⊗Dx

B→B, (8.1.78)

where X is a finite alphabet, ExA→Ax∈X is a set of completely positive maps

such that ∑x∈X ExA→A

is trace preserving, and DxB→Bx∈X is a set of channels.

Recall from Section 3.2.11 that every one-way Alice-to-Bob LOCC channel canbe written as

L→AB→AB = DBXB→B CXA→XB EA→AXA, (8.1.79)

where EA→AXAis a local channel for Alice that corresponds to the quantum

instrument given by the maps ExA→Ax∈X, i.e., (see also (3.2.180))

EA→AXA(ρA) = ∑

x∈XEx

A→A(ρA)⊗ |x〉〈x|XA . (8.1.80)

The map CXA→XB is a noiseless classical channel that transforms the classicalregister XA, held by Alice, to the classical register XB (which is simply a copyof XA), held by Bob. The final channel DBXB→B is a local channel for Bobdefined as

DBXB→B(ρB ⊗ |x〉〈x|XB) = DxB→B(ρB) (8.1.81)

725

Page 737: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

for all x ∈ X. In the proof below, we explicitly construct the CP maps EA→Ax∈Xand the channels Dx

B→Bx∈X.

In addition to providing explicit forms for the channels EA→AXAand DBXB→B

involved in the LOCC channel L→AB→AB

in (8.1.78), we prove that perr(L; ρAB) ≤ε. To do this, we make use of the following general decoupling result, whichwe explain and prove in Appendix 8.A.

Theorem 8.11

Given a subnormalized state ρAE (i.e., Tr[ρAE] ≤ 1), and a completelypositive map NA→A′ , the following bound holds

UA

∥∥∥NA→A′(UAρAEU†A)−ΦN

A′ ⊗ ρE

∥∥∥1

dUA

≤ 2−12 H2(A|E)ρ− 1

2 H2(A|A′)ΦN , (8.1.82)

where ΦNA′ := TrA[ΦN

AA′ ], ΦNAA′ is the Choi state of NA→A′ , ρE :=

TrA[ρAE], and the integral is over unitaries UA acting on system A, takenwith respect to the Haar measure.

PROOF: See Appendix 8.A.

REMARK: The integral in (8.1.82) with respect to the Haar measure should be thought ofas a uniform average over the continuous set of all unitaries UA acting on the system A. Inother words, the integral is analogous to a uniform average over a discrete set of unitaries.In fact, for every dimension d ≥ 2, there exists a set Uxx∈X of unitaries, called a unitaryone-design, such that

UUXU† dU =

1|X| ∑

x∈XUxXU†

x = Tr[X]1

d(8.1.83)

for every operator X. An example of a unitary one-design is the Heisenberg-Weyl opera-tors Wz,x : 0 ≤ z, x ≤ d− 1, which are defined in (3.2.115)–(3.2.117). Please consult theBibliographic Notes in Section 8.5 for more information about integration over unitarieswith respect to the Haar measure and about unitary designs. A simple argument for theright-most equality in (8.1.83) goes as follows. First, it follows for a unitary V that

V(∫

UUXU† dU

)V† =

UVUX(VU)† dU =

UUXU† dU, (8.1.84)

where the final equality follows because the Haar measure is a unitarily invariant mea-sure. So it follows that the operator

∫U UXU† dU commutes with all unitaries. The only

operator that does so is the identity operator, which implies that∫

U UXU† dU ∝ 1. The

726

Page 738: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

normalization factor of Tr[X]/d follows by taking a trace of the left-hand side, using itscyclicity, and the fact that dU is a probability measure.

Proof of Theorem 8.10

Fix η ∈ (0,√

ε), and let ψABE be a purification of ρAB, with ρAE := TrB[ψABE].Fix d such that (8.1.76) holds. Then, starting from the expression in (8.1.76)and using (8.1.72), we find that

log2 d = H√

ε−ηmin (A|E)ρ + 4 log2 η. (8.1.85)

Now, pick a state ρAE ∈ B√

ε−η(ρAE) such that

Hmin(A|E)ρ = H√

ε−ηmin (A|E)ρ. (8.1.86)

Then, using (8.1.75), we find that

log2 d = Hmin(A|E)ρ + 4 log2 η (8.1.87)

≤ H2(A|E)ρ + 4 log2 η (8.1.88)

= H2(A|E)ρ − 2 log2

(1η2

), (8.1.89)

where

H2(A|E)ρ = − infσE

log2 Tr

[(σ− 1

4E ρAEσ

− 14

E

)2]

(8.1.90)

We now define a channel EA→AXAas follows:

EA→AXA(·) := ∑

x∈XVx

A→AΠxA(·)Πx

AVx†A→A ⊗ |x〉〈x|XA , (8.1.91)

where dA = d, X is a finite alphabet with1 |X| = dXA = dAd , Πx

Ax∈X is a setof projectors such that ∑x∈X Πx

A = 1A, and VxA→Ax∈X is a set of isometries.

So we have thatEx

A→A(·) := VxA→AΠx

A(·)ΠxAVx†

A→A (8.1.92)

1We assume that d divides dA without loss of generality. If it is not the case, then wecan repeat the whole analysis with the system A embedded in a larger Hilbert space that isdivided by d. We would also need to start with a state ρAE such that (8.1.86) holds with the

definition H√

ε−ηmin (A|E)ρ := − infσE D

√ε−η

max (ρAE‖ΠA ⊗ σB), where ΠA is the projection ontothe support of TrE[ρAE]. Then we would repeat the whole analysis with such a ρAE. We donot go into further details here.

727

Page 739: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

for all x ∈ X. Each isometry VxA→A

takes the subspace of HA onto which ΠxA

projects and embeds it into the fixed d-dimensional space HA, i.e., im(VxA→A

) =

HA for all x ∈ X. The projectors ΠxAx∈X correspond to a measurement of

the input state, with ΠxA(·)Πx

A the (unnormalized) post-measurement state,and the isometries Vx

A→Ax∈X can be thought of as encodings of the initial

system A into the system A on which one share of the desired maximallyentangled state ΦAB is to be generated. We have

EA→AXA(ρAB) = ∑

x∈XEA→A(ρAB)⊗ |x〉〈x|XA (8.1.93)

= ∑x∈X

VxA→A(Π

xAρABΠx

A)⊗ |x〉〈x|XA (8.1.94)

= ∑x∈X

p(x)ωxAB ⊗ |x〉〈x|XA , (8.1.95)

where

p(x) := Tr[ΠAρA], (8.1.96)

ωxAB :=

1p(x)

VxA→A(Π

xAρABΠx

A). (8.1.97)

Now, by Theorem 8.11, the following inequality holds∫

UA

∥∥∥EA→AXA(UAρAEU†

A)−ΦEAXA⊗ ρE

∥∥∥1

dUA

≤ 2−12 H2(A|E)ρ− 1

2 H2(A|AXA)ΦE , (8.1.98)

where ΦEAAXA

is the Choi state of EA→AXA. Given that log2 d ≤ H2(A|E)ρ −

2 log2

(1η2

), we obtain

2−12 H2(A|E)ρ ≤ 1√

dη2. (8.1.99)

We also have that

H2(A|AXA)ΦE = supσAXA

− log2 Tr

[(σ− 1

4AXA

ΦEAAXA

σ− 1

4AXA

)2]

(8.1.100)

≥ − log2 d. (8.1.101)

Indeed, in the optimization over σAXA, take

σAXA=

1dXA

∑x∈X

πA ⊗ |x〉〈x|XA (8.1.102)

728

Page 740: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

= πA ⊗ πXA (8.1.103)

=1

dA1A ⊗ 1XA . (8.1.104)

With this choice of σAXA, we find that

Tr

[(σ− 1

4AXA

ΦEAAXA

σ− 1

4AXA

)2]

= dA Tr[(

ΦEAAXA

)2]

(8.1.105)

= dA Tr

(

∑x∈X

VxA→AΠx

AΦAAΠxA(V

xA→A)

† ⊗ |x〉〈x|XA

)2 (8.1.106)

= dA ∑x∈X

Tr[(

VxA→AΠx

AΦAAΠxA(V

xA→A)

†)2]

(8.1.107)

= dA ∑x∈X

Tr

(

VxA→AΠx

A1

dA

dA−1

∑i,j=0|i〉〈j|A ⊗ |i〉〈j|AΠx

A(VxA→A)

)2 (8.1.108)

= dA ∑x∈X

Tr

(

1dA

d−1

∑i,j=0|i〉〈j|A ⊗ |i〉〈j|A

)2 (8.1.109)

= dA ∑x∈X

Tr

(

ddA

1d

d−1

∑i,j=0|i〉〈j|A ⊗ |i〉〈j|A

)2 (8.1.110)

=d2

dA∑

x∈XTr

(

1d

d−1

∑i,j=0|i〉〈j|A ⊗ |i〉〈j|A

)2 (8.1.111)

=d2

dA|X| (8.1.112)

= d, (8.1.113)

where we recall that dXA = |X| = dAd . We thus have

2−12 H2(A|AXA)ΦE ≤

√d, (8.1.114)

which means that∫

UA

∥∥∥EA→AXA(UAρAEU†

A)−ΦEAXA⊗ ρE

∥∥∥1

dUA ≤ η2. (8.1.115)

729

Page 741: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Note that

ΦEAXA

=1

dA∑

x∈XVx

A→AΠxA1AΠx

AVx†A→A ⊗ |x〉〈x|XA (8.1.116)

=1

dA∑

x∈XVx

A→AΠxAVx†

A→A ⊗ |x〉〈x|XA (8.1.117)

=1

dA1A ⊗ 1XA (8.1.118)

= πA ⊗ πXA , (8.1.119)

where the last equality follows because dXA = dAd . So we have

UA

∥∥∥EA→AXA(UAρAEU†

A)− πA ⊗ πXA ⊗ ρE

∥∥∥1

dUA ≤ η2. (8.1.120)

Now, since the average over a set of elements is never less than the minimumover the same set, we have that∫

UA

∥∥∥EA→AXA(UAρAEU†

A)− πA ⊗ πXA ⊗ ρE

∥∥∥1

dUA

≥ minUA

∥∥∥EA→AXA(UAρAEU†

A)− πA ⊗ πXA ⊗ ρE

∥∥∥1

(8.1.121)

This implies that there exists a unitary UA (in particular, the one that achievesthe minimum on the right-hand side of the above inequality) such that

η2 ≥∫

UA

∥∥∥EA→AXA(UAρAEU†

A)− πA ⊗ πXA ⊗ ρE

∥∥∥1

dUA

≥∥∥∥EA→AXA

(UAρAEU†A)− πA ⊗ πXA ⊗ ρE

∥∥∥1

(8.1.122)

Now, let

ωAXAE = EA→AXA(UAρAEU†

A), τAXAE = πA ⊗ πXA ⊗ ρE, (8.1.123)

ωAXAE = EA→AXA(UAρAEU†

A), τAXAE = πA ⊗ πXA ⊗ ρE. (8.1.124)

Then, by the Fuchs–van de Graaf inequality (see (3.5.109)), and by the defini-tion of the sine distance (see Definition 3.66), we have that

η2 ≥∥∥∥ωAXAE − τAXAE

∥∥∥1

(8.1.125)

≥ 2− 2√

F(ωAXAE, τAXAE) (8.1.126)

730

Page 742: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

≥ 2− 2√

1− P(ωAXAE, τAXAE)2, (8.1.127)

which implies that

P(ωAXAE, τAXAE) ≤√

1−(

1− η2

2

)2

(8.1.128)

= η

√1− η2

4(8.1.129)

≤ η. (8.1.130)

Then, by the triangle inequality for sine distance (Lemma 3.67), we have

P(ωAXAE, τAXAE) ≤ P(ωAXAE, ωAXAE) + P(ωAXAE, τAXAE) (8.1.131)

≤ P(ρAE, ρAE) + η (8.1.132)

≤√

ε− η + η (8.1.133)

=√

ε, (8.1.134)

where the second inequality follows from the data-processing inequality forthe sine distance, unitary invariance of the sine distance, and the inequalityin (8.1.130). To obtain the last inequality, we used the definition of the stateρAE as one that is (

√ε − η)-close to ρAE in sine distance. We can write the

inequality in (8.1.134) in terms of fidelity as

F(ωAXAE, τAXAE) ≥ 1− ε. (8.1.135)

Note that both ωAXAE and τAXAE are classical-quantum states because XA isa classical register. In particular,

ωAXAE = ∑x∈X

p(x)|x〉〈x|XA ⊗ωxAE, (8.1.136)

p(x) = Tr[ΠxAρA], (8.1.137)

ωxAE =

1p(x)

VxA→A(Π

xAρAEΠx

A). (8.1.138)

Also,

τAXAE = πA ⊗1

dXA∑

x∈X|x〉〈x|XA ⊗ ρE. (8.1.139)

Then, using the direct-sum property of the root fidelity (see (3.5.79)), we ob-tain

F(ωAXAE, τAXAE) =(√

F(ωAXAE, τAXAE))2

(8.1.140)

731

Page 743: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

=

(∑

x∈X

√p(x)dXA

√F(ωx

AE, πA ⊗ ρE)

)2

. (8.1.141)

Now, let

ψxABE :=

1p(x)

VxA→A(Π

xAψABEΠx

A) (8.1.142)

be a purification of ωxAE

for all x ∈ X, and let ΦAB ⊗ φEE′ be a purification ofπA ⊗ ρE. Then, by Uhlmann’s theorem (Theorem 3.58), for every x ∈ X thereexists an isometric channel Wx

B→BE′ such that√

F(ωxAE, πA ⊗ ρE) =

√F(Wx

B→BE′(ψxABE), ΦAB ⊗ φEE′) (8.1.143)

for all x ∈ X. Using the set WxB→BE′x∈X, we define the quantum channels

DxB→Bx∈X as follows:

DxB→B := TrE′ Wx

B→BE′ . (8.1.144)

By the data-processing inequality for fidelity (see Theorem 3.59) under thepartial trace channel TrEE′ , we obtain

√F(ωx

AE, πA ⊗ ρE)

=√

F(WxB→BE′(ψ

xABE), ΦAB ⊗ φEE′) (8.1.145)

≤√

F(TrEE′ [WxB→BE′(ψ

xABE)], TrEE′ [ΦAB ⊗ φEE′ ]) (8.1.146)

=√

F(DxB→B(ω

xAB), ΦAB), (8.1.147)

for all x ∈ X, where we recall the definition of ωxAB

from (8.1.97). Since theinequality in (8.1.147) holds for all x ∈ X, we have that

(∑

x∈X

√p(x)dXA

√F(ωx

AE, πA ⊗ ρE)

)2

≤(

∑x∈X

√p(x)dXA

√F(Dx

B→B(ωxAB), ΦAB)

)2

. (8.1.148)

Now, the final state of Alice and Bob after executing the LOCC channeldefined by (8.1.79), with EA→AXA

defined by (8.1.91) and DBXB→B defined by(8.1.81) and (8.1.144), is

ωAB = (DXBB→B CXA→XB EA→AXA)(ρAB) (8.1.149)

732

Page 744: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

= ∑x∈X

DXBB→B(ExA→A(ρAB)⊗ |x〉〈x|XB) (8.1.150)

= ∑x∈X

(ExA→A ⊗Dx

B→B)(ρAB) (8.1.151)

= TrXB

[∑

x∈Xp(x)|x〉〈x|XB ⊗Dx

B→B(ωxAB)

], (8.1.152)

where in the third equality we recognize the required form in (8.1.78) for aone-way Alice-to-Bob LOCC channel, and in the last inequality we made useof (8.1.97). Using the form of ωAB in the last equality, along with all of thedevelopments above, we finally obtain

F(ωAB, ΦAB)

= F

(TrXB

[∑

x∈Xp(x)|x〉〈x|XB ⊗Dx

B→B(ωxAB)

], TrXB [πXB ⊗ΦAB]

)

(8.1.153)

≥ F

(∑

x∈Xp(x)|x〉〈x|XB ⊗Dx

B→B(ωxAB),

1dXA

∑x∈X|x〉〈x|XA ⊗ΦAB

)

(8.1.154)

=

(∑

x∈X

√p(x)dXA

√F(Dx

B→B(ωxAB), ΦAB)

)2

(8.1.155)

=

(∑

x∈X

√p(x)dXA

√F((TrEE′ Wx

B→BE′)(ψxABE), TrEE′ [ΦAB ⊗ φEE′ ])

)2

(8.1.156)

≥(

∑x∈X

√p(x)dXA

√F(Wx

B→BE′(ψxABE), ΦAB ⊗ φEE′)

)2

(8.1.157)

=

(∑

x∈X

√p(x)dXA

√F(ωx

AE, πA ⊗ ρE)

)2

(8.1.158)

≥ 1− ε. (8.1.159)

Therefore,perr(L; ρAB) = 1− F(ωAB, ΦAB) ≤ ε. (8.1.160)

To summarize, we have shown that, given a state ρAB and ε ∈ (0, 1),there exists a (d, ε) one-way entanglement distillation protocol LAB→AB if

733

Page 745: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

the dimension d = dA = dB satisfies log2 d = −H√

ε−ηmax (A|B)ρ + 4 log2 η,

where η ∈ [0,√

ε). Although we explicitly constructed the encoding channelsEx

A→Ax∈X on Alice’s side, on Bob’s side we relied on Uhlmann’s theorem to

guarantee the existence of a set of decoding channels DxB→Bx∈X such that

the overall LOCC channel LAB→AB satisfies perr(L; ρAB) ≤ ε.

Combining Theorem 8.10 with (4.8.73), and using (8.1.72), leads to the fol-lowing lower bound on the one-shot distillable entanglement:

Corollary 8.12

Let ρAB be a bipartite quantum state with purification ψABE. For all ε ∈(0, 1), η ∈ [0,

√ε), and α > 1, there exists a (d, ε) one-way entanglement

distillation protocol for ρAB satisfying

log2 d ≥ Hα(A|E)ψ −1

α− 1log2

(1

(√

ε− η)2

)

− log2

(1

1− (√

ε− η)2

)+ 4 log2 η. (8.1.161)

PROOF: The inequality follows from taking the results of Theorem 8.10, using(8.1.72), and applying the inequality in (4.8.73).

Since the inequality in (8.1.161) holds for all (d, ε) entanglement distilla-tion protocols, we obtain the following bound for all ε ∈ (0, 1), η ∈ [0,

√ε),

and α > 1:

EεD(A; B)ρ ≥ sup

L

Hα(A′|E′)φ −1

α− 1log2

(1

(√

ε− η)2

)

− log2

(1

1− (√

ε− η)2

)+ 4 log2 η, (8.1.162)

where the optimization is with respect to every LOCC channel LAB→A′B′ ,such that φA′B′E′ is a purification of LAB→A′B′(ρAB). This comes about by firstapplying the LOCC channel LAB→A′B′ to ρAB for free, applying Corollary 8.12to the state LAB→A′B′(ρAB), and finally optimizing over every LOCC channelLAB→A′B′ .

734

Page 746: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

8.2 Distillable Entanglement of a Quantum State

Having found upper and lower bounds on the one-shot distillable entangle-ment Eε

D(A; B)ρ of a bipartite quantum state ρAB, let us now move on to theasymptotic setting. In this setting, we allow Alice and Bob to make use ofan arbitrarily large number n of copies of the state ρAB in order to obtaina maximally entangled state. An entanglement distillation protocol for n copiesof ρAB is defined by the triple (n, d,LAnBn→AB), consisting of the number nof copies of ρAB, an integer d ≥ 2, and an LOCC channel LAnBn→AB withdA = dB = d. Observe that an entanglement distillation protocol for n copiesof ρAB is equivalent to a (one-shot) entanglement distillation protocol for thestate ρ⊗n

AB. All of the results of Section 8.1 thus carry over to the asymptoticsetting simply by replacing ρAB with ρ⊗n

AB. In particular, the error probabilityfor an entanglement distillation protocol for ρAB defined by (n, d,LAnBn→AB)is equal to

perr(L; ρ⊗nAB) = 1− 〈Φ|ABLAnBn→AB(ρ

⊗nAB)|Φ〉AB. (8.2.1)

Definition 8.13 (n, d, ε) Entanglement Distillation Protocol

An entanglement distillation protocol (n, d,LAnBn→AB) for n copies ofρAB, with dA = dB = d, is called an (n, d, ε) protocol, with ε ∈ [0, 1], ifperr(L; ρ⊗n

AB) ≤ ε.

Based on the discussion above, we note that an (n, d, ε) entanglement dis-tillation protocol for ρAB is a (d, ε) entanglement distillation protocol for ρ⊗n

AB.

The rate R(n, d) of an (n, d, ε) entanglement distillation protocol for n copiesof a given state is

R(n, d) :=log2 d

n, (8.2.2)

which can be thought of as the number of ε-approximate ebits contained inthe final state of the protocol, per copy of the given initial state. Given a stateρAB and ε ∈ [0, 1], the maximum rate of entanglement distillation among all(n, d, ε) entanglement distillation protocols for ρAB is given by

En,εD (ρAB) ≡ En,ε

D (A; B)ρ :=1n

EεD(ρ

⊗nAB) (8.2.3)

= sup(d,L)

log2 d

n: perr(L; ρ⊗n

AB) ≤ ε

, (8.2.4)

735

Page 747: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

where the optimization is with respect to all d ≥ 2 and every LOCC channelLAnBn→AB with dA = dB = d.

Definition 8.14 Achievable Rate for Entanglement Distillation

Given a bipartite quantum state ρAB, a rate R ∈ R+ is called an achievablerate for entanglement distillation for ρAB if for all ε ∈ (0, 1], δ > 0, and suf-ficiently large n, there exists an (n, 2n(R−δ), ε) entanglement distillationprotocol for ρAB.

As we prove in Appendix A,

R achievable rate ⇐⇒ limn→∞

εD(2n(R−δ); ρ⊗nAB) = 0 ∀ δ > 0. (8.2.5)

In other words, a rate R is achievable if the optimal error probability for asequence of protocols with rate R − δ, δ > 0, vanishes as the number n ofcopies of ρAB increases.

Definition 8.15 Distillable Entanglement of a Quantum State

The distillable entanglement of a bipartite state ρAB, denoted by ED(A; B)ρ,is defined to be the supremum of all achievable rates for entanglementdistillation for ρAB, i.e.,

ED(A; B)ρ := supR : R is an achievable rate for ρAB. (8.2.6)

The distillable entanglement can also be written as

ED(A; B)ρ = infε∈(0,1]

lim infn→∞

1n

EεD(ρ

⊗nAB). (8.2.7)

See Appendix A for a proof.

Definition 8.16 Weak Converse Rate for Entanglement Distillation

Given a bipartite state ρAB, a rate R ∈ R+ is called a weak converse ratefor entanglement distillation for ρAB if every R′ > R is not an achievablerate for ρAB.

As we show in Appendix A,

R weak converse rate ⇐⇒ limn→∞

εD(2n(R−δ); ρ⊗nAB) > 0 ∀ δ > 0. (8.2.8)

736

Page 748: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Definition 8.17 Strong Converse Rate for Entanglement Distillation

Given a bipartite state ρAB, a rate R ∈ R+ is called a strong converserate for entanglement distillation for ρAB if for all ε ∈ [0, 1), δ > 0, andsufficiently large n, there does not exist an (n, 2n(R+δ), ε) entanglementdistillation protocol for ρAB.

We show in Appendix A that

R strong converse rate ⇐⇒ limn→∞

εD(2n(R+δ); ρAB) = 1 ∀ δ > 0. (8.2.9)

Definition 8.18 Strong Converse Distillable Entanglement of aQuantum State

The strong converse distillable entanglement of a bipartite state ρAB, de-noted by ED(A; B)ρ, is defined as the infimum of all strong converserates, i.e.,

ED(A; B)ρ := infR : R is a strong converse rate for ρAB. (8.2.10)

Note thatED(A; B)ρ ≤ ED(A; B)ρ (8.2.11)

for all bipartite states ρAB. We can also write the strong converse distillableentanglement as

ED(A; B)ρ = supε∈[0,1)

lim supn→∞

1n

EεD(ρ

⊗nAB). (8.2.12)

See Appendix A for a proof.

We are now ready to present a general expression for the distillable entan-glement of a bipartite quantum state, as well as two upper bounds on it.

Theorem 8.19 Distillable Entanglement of a Bipartite State

The distillable entanglement of a bipartite state ρAB is given by

ED(A; B)ρ = limn→∞

1n

supL(n)

I(A′〉B′)L(n)(ρ⊗n), (8.2.13)

737

Page 749: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

where the optimization is with respect to (two-way) LOCC channelsL(n)AnBn→A′B′ . Furthermore, the Rains relative entropy R(A; B)ρ from

(5.3.4) is a strong converse rate for distillable entanglement, in the sensethat

ED(A; B)ρ ≤ R(A; B)ρ, (8.2.14)

and the squashed entanglement from (5.4.1) is a weak converse rate, inthe sense that

ED(A; B)ρ ≤ Esq(A; B)ρ. (8.2.15)

If we define

D↔(ρAB) ≡ D↔(A; B)ρ := supL

I(A′〉B′)L(ρ), (8.2.16)

then we can write (8.2.13) as

ED(A; B)ρ = limn→∞

1n

D↔(ρ⊗nAB) =: D↔reg(ρAB), (8.2.17)

so that the distillable entanglement can be viewed as the regularized versionof D↔, and it is reminiscent of the regularized Holevo information that givesthe classical capacity of a quantum channel.

Let us make the following observations about Theorem 8.19.

• The coherent information of a bipartite state ρAB is an achievable rate forentanglement distillation, i.e.,

ED(A; B)ρ ≥ I(A〉B)ρ = H(B)ρ − H(AB)ρ. (8.2.18)

This follows immediately from (8.2.13) by dropping the optimization overtwo-way LOCC channels and due to the fact that the coherent informa-tion is additive for product states, meaning that I(An〉Bn)ρ⊗n = nI(A〉B)ρ.As we show in Section 8.2.1 below, the strategy to attain the coherent in-formation rate is essentially the one-way entanglement distillation pro-tocol considered in Section 8.1.2 for the one-shot lower bound, and it issometimes called the “hashing protocol.” For this reason, the inequalityin (8.2.18) is known as the hashing bound (please consult the BibliographicNotes in Section 8.5 for pointers to the research literature).

• In order to obtain a higher entanglement distillation rate than I(A〉B)ρ,one strategy is to use n ≥ 2 copies of ρAB along with a two-way LOCC

738

Page 750: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

channel LAnBn→A′B′ in order to obtain a state ωA′B′ := LAnBn→A′B′(ρ⊗nAB)

whose coherent information is potentially larger than that of ρAB. Then,we can apply the hashing protocol to the state ωA′B′ . The overall rate ofthis strategy (the two-way LOCC channel followed by the hashing pro-tocl) is then 1

n I(A′〉B′)ω, and Theorem 8.19 tells us that such a strategy isoptimal in the large n limit. With increasingly more copies of ρAB to startwith, it might be possible to obtain a better rate, which is why we needto regularize in general.

As with the proof of the entanglement-assisted classical capacity and clas-sical capacity theorems in Chapters 6 and 7, respectively, we prove Theo-rem 8.19 in two steps:

1. Achievability: We show that the right-hand side of (8.2.13) is an achiev-able rate for entanglement distillation for ρAB. Doing so involves exhibit-ing an explicit entanglement distillation protocol. The protocol we use isbased on the one we used in Section 8.1.2 to obtain a lower bound on theone-shot distillable entanglement.

The achievability part of the proof establishes that

ED(A; B)ρ ≥ limn→∞

1n

supL

I(A′〉B′)L(ρ⊗n). (8.2.19)

2. Weak converse: We show that the right-hand side of (8.2.13) is a weakconverse rate for entanglement distillation for ρAB, from which it followsthat ED(A; B)ρ ≤ limn→∞

1n supL I(A′〉B′)L(ρ⊗n). In order to show this,

we use the one-shot upper bounds from Section 8.1.1 to prove that everyachievable rate R satisfies R ≤ limn→∞

1n supL I(A′〉B′)L(ρ⊗n).

We go through the achievability part of the proof of Theorem 8.19 in Sec-tion 8.2.1. We then proceed with the weak converse part in Section 8.2.2.

The expression in (8.2.13) for the distillable entanglement involves botha limit over an unbounded number of copies of the state ρAB, as well as anoptimization over all two-way LOCC channels. Computing the distillableentanglement is therefore intractable in general. After establishing a proofof (8.2.13), we proceed to establish upper bounds on distillable entanglementthat depend only on the given state ρAB. Specifically, in Section 8.2.3, we usethe one-shot results in Section 8.1.1 to show that the Rains relative entropy isa strong converse rate for entanglement distillation. We also show that thesquashed entanglement is a weak converse rate for entanglement distillation.

739

Page 751: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

PPT(Bound

Entangled)SEP Entangled

(a) Qubit-qubit and qubit-qutrit states (b) General high-dimensional states

— Distillable — Non-Distillable

SEP = PPT Entangled

FIGURE 8.3: The set of all bipartite states can be split into distillable andnon-distillable sets. (a) For qubit-qubit and qubit-qutrit states entangle-ment and distillability are in one-to-one correspondence because all PPTstates are separable. (b) In higher dimensions, there are entangled statesbelonging to the set PPT, which we call bound entangled states. Distil-lability and entanglement are thus not synonymous in general for high-dimensional quantum systems.

8.2.0.1 Bound Entanglement

The inequality in (8.2.14) implies that ED(A; B)ρ ≤ 0 for every PPT state ρAB,because, by definition, the Rains relative entropy vanishes for all PPT states.On the other hand, we always have ED(A; B)ρ ≥ 0 for every state ρAB. There-fore,

ED(A; B)ρ = 0 for all PPT states. (8.2.20)

Recall from the discussion after Lemma 8.5 (see also Section 3.1.1.5) that thereexist PPT entangled states in higher dimensional bipartite systems becauseSEP(A : B) 6= PPT(A : B) except for when A and B are both qubits or whenone is a qubit and the other is a qutrit. All of these entangled states havezero distillable entanglement, and thus we refer to them as bound entangled.Remarkably, therefore, except for qubit-qubit and qubit-qutrit states, priorentanglement is only necessary, but not sufficient, for distilling pure maxi-mally entangled states. Please consult the Bibliographic Notes in Section 8.5for more information about bound entanglement.

As shown in Figure 8.3, we can use entanglement distillation to split upthe set of all bipartite states into distillable and non-distillable states. For two-

740

Page 752: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

qubit states and qubit-qutrit states, non-distillable states are exactly equal tothe set of separable states by the PPT criterion. For higher dimensions, asstated above, this is not the case. Also in higher dimensions, it is in gen-eral possible to have states with negative partial transpose (NPT) that arenonetheless non-distillable. These NPT bound entangled states are shown inFigure 8.3(b) as the region between the PPT bound entangled states and thedistillable entangled states. It is not known whether NPT bound entangledstates exist, but since they have not been ruled out, we nevertheless depictthem in the figure.

8.2.1 Proof of Achievability

As the first step in proving the achievability part of Theorem 8.19, let us recallCorollary 8.12: given a bipartite state ρAB with purification ψABE, for all ε ∈(0, 1), η ∈ [0,

√ε), and α > 1, there exists a (d, ε) entanglement distillation

protocol for ρAB such that

log2 d ≥ Hα(A|E)ψ −1

α− 1log2

(1

(√

ε− η)2

)

− log2

(1

1− (√

ε− η)2

)+ 4 log2 η, (8.2.21)

whereHα(A|E)ψ = − inf

σEDα(ψAE‖1A ⊗ σE) (8.2.22)

is the sandwiched Rényi conditional entropy. Applying this inequality to thestate ρ⊗n

AB for all n ≥ 1 leads to the following:

Proposition 8.20

For every state ρAB and ε ∈ (0, 1), there exists an (n, d, ε) entanglementdistillation protocol for ρAB such that the rate log2 d

n satisfies

log2 dn≥ Hα(A|E)ψ −

2α− 1n(α− 1)

log2

(4ε

)− 1

nlog2

(1

1− ε4

), (8.2.23)

for all n ≥ 1 and α > 1, where ψABE is a purification of ρAB. In general,

En,εD (A; B) ≥ sup

L

1n

Hα(A′|E′)φ −2α− 1

n(α− 1)log2

(4ε

)

741

Page 753: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

− 1n

log2

(1

1− ε4

), (8.2.24)

for all n ≥ 1 and α > 1, where the optimization is over ev-ery LOCC channel LAnBn→A′B′ , such that φA′B′E′ is a purification ofLAnBn→A′B′(ρ

⊗nAB).

PROOF: Let ψABE be a purification of ρAB, and use the tensor-product pu-rification ψ⊗n

ABE for ρ⊗nAB. Also, let η =

√ε

2 . Substituting all of this into theinequality in (8.2.21) and simplifying leads to

log2 dn≥ 1

nHα(An|En)ψ⊗n − 1

n(α− 1)log2

(4ε

)

− 1n

log2

(1

1− ε4

)− 2

nlog2

(4ε

), (8.2.25)

Then, optimizing over every LOCC channel LAnBn→A′B′ , and using the defi-nition of En,ε

D (A; B)ρ in (8.2.4), we obtain (8.2.24).

If in (8.2.25) we restrict the optimization in the definition of Hα(An|En)ψ⊗n

to product states and use additivity of the sandwiched Rényi conditional en-tropy for all α > 1, we have that

Hα(An|En)ψ⊗n = nHα(A|E)ψ. (8.2.26)

Note that the proof of additivity follows similarly to the proof of Proposi-tion 6.21. This leads to (8.2.23).

With the inequality in (8.2.23), we can prove that the coherent informationis an achievable rate for entanglement distillation.

Theorem 8.21 Achievability of Coherent Information for Entangle-ment Distillation

The coherent information I(A〉B)ρ of a bipartite state ρAB is an achiev-able rate for entanglement distillation for ρAB. In other words,ED(A; B)ρ ≥ I(A〉B)ρ for every bipartite state ρAB.

PROOF: Let ψABE be a purification of ρAB. Fix ε ∈ (0, 1] and δ > 0. Letδ1, δ2 > 0 be such that

δ = δ1 + δ2. (8.2.27)

742

Page 754: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Set α ∈ (1, ∞) such that

δ1 ≥ I(A〉B)ρ − Hα(A|E)ψ. (8.2.28)

Note that this is possible because Hα(A|E)ψ increases monotonically with de-creasing α (this follows from Proposition 4.21), so that

limα→1+

Hα(A|E)ψ = supα∈(1,∞)

Hα(A|E)ψ. (8.2.29)

Also,

limα→1+

Hα(A|E)ψ = supα∈(1,∞)

Hα(A|E)ψ (8.2.30)

= supα∈(1,∞)

(− inf

σEDα(ψAE‖1A ⊗ σE)

)(8.2.31)

= − infα∈(1,∞)

infσE

Dα(ψAE‖1A ⊗ σE) (8.2.32)

= − infσE

infα∈(1,∞)

Dα(ψAE‖1A ⊗ σE) (8.2.33)

= − infσE

D(ψAE‖1A ⊗ σE) (8.2.34)

= H(A|E)ψ, (8.2.35)

where the fifth equality follows from Proposition 4.28. Then, by definition ofconditional entropy, and the fact that ψABE is a pure state, we find that

H(A|E)ψ = H(AE)ψ − H(E)ψ = H(B)ψ − H(AB)ψ = I(A〉B)ρ. (8.2.36)

With α ∈ (1, ∞) chosen such that (8.2.28) holds, take n large enough sothat

δ2 ≥2α− 1

n(α− 1)log2

(4ε

)+

1n

log2

(1

1− ε4

). (8.2.37)

Now, we use the fact that for the n and ε chosen above there exists an (n, d, ε)protocol such that

log2 dn≥ Hα(A|E)ψ −

2α− 1n(α− 1)

log2

(4ε

)− 1

nlog2

(1

1− ε4

). (8.2.38)

(This follows from Proposition 8.20 above.) Rearranging the right-hand sideof this inequality, and using (8.2.27), (8.2.28), and (8.2.37), we find that

log2 dn≥ I(A〉B)ρ −

(I(A〉B)ρ − Hα(A|E)ψ +

2α− 1n(α− 1)

log2

(4ε

)

743

Page 755: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

+1n

log2

(1

1− ε4

))(8.2.39)

≥ I(A〉B)ρ − (δ1 + δ2) (8.2.40)= I(A〉B)ρ − δ. (8.2.41)

We thus have that there exists an (n, d, ε) entanglement distillation protocolwith rate log2 d

n ≥ I(A〉B)ρ − δ. Therefore, there exists an (n, 2n(R−δ), ε) en-tanglement distillation protocol with R = I(A〉B)ρ for all sufficiently large nsuch that (8.2.37) holds. Since ε and δ are arbitrary, we conclude that for allε ∈ (0, 1], δ > 0, and sufficiently large n, there exists an (n, 2n(I(A〉B)ρ−δ), ε)entanglement distillation protocol. This means that, by definition, I(A〉B)ρ isan achievable rate.

Proof of the Achievability Part of Theorem 8.19

Let LAkBk→A′B′ be an arbitrary LOCC channel with k ≥ 1, let

ωA′B′ = LAkBk→A′B′(ρ⊗kAB), (8.2.42)

and let φA′B′E′ be a purification of ωA′B′ . Fix ε ∈ (0, 1] and δ > 0. Let δ1, δ2 > 0be such that

δ = δ1 + δ2. (8.2.43)

Set α ∈ (1, ∞) such that

δ1 ≥1k

I(A′〉B′)ω −1k

Hα(A′|E′)φ, (8.2.44)

which is possible based on the arguments given in the proof of Theorem 8.21above. Then, with this choice of α, take n large enough so that

δ2 ≥2α− 1

kn(α− 1)log2

(4ε

)+

1kn

log2

(1

1− ε4

). (8.2.45)

Now, we use the fact that, for the chosen n and ε, there exists an (n, d, ε) en-tanglement distillation protocol such that (8.2.23) holds, i.e.,

log2 dn≥ Hα(A′|E′)φ −

2α− 1n(α− 1)

log2

(4ε

)− 1

nlog2

(1

1− ε4

). (8.2.46)

744

Page 756: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Dividing both sides by k gives

log2 dkn

≥ 1k

Hα(A′|E′)φ −2α− 1

kn(α− 1)log2

(4ε

)− 1

knlog2

(1

1− ε4

). (8.2.47)

Rearranging the right-hand side of this inequality, and using (8.2.43)–(8.2.45),we find that

log2 dkn

≥ 1k

I(A′〉B′)ω −(

1k

I(A′〉B′)ω −1k

Hα(A′|E′)φ

+2α− 1

kn(α− 1)log2

(4ε

)+

1kn

log2

(1

1− ε4

))(8.2.48)

≥ 1k

I(A′〉B′)ω − (δ1 + δ2) (8.2.49)

=1k

I(A′〉B′)ω − δ. (8.2.50)

Thus, there exists a (kn, d, ε) entanglement distillation protocol with rate log2 dkn ≥

1k I(A′〉B′)ω − δ. Therefore, letting n′ ≡ kn, there exists an (n′, 2n′(R−δ), ε) en-tanglement distillation protocol with R = 1

k I(A′〉B′)ω for all sufficiently largen such that (8.2.45) holds. Since ε and δ are arbitrary, we conclude that for allε ∈ (0, 1], δ > 0, and sufficiently large n, there exists an (n, 2n( 1

k I(A′〉B′)ω−δ), ε)entanglement distillation protocol. This means that 1

k I(A′〉B′)ω is an achiev-able rate. (Recall that ωA′B′ = LAkBk→A′B′(ρ

⊗kAB).)

Now, since in the arguments above the LOCC channel LAkBk→A′B′ is arbi-trary, we conclude that

1k

supL

I(A′〉B′)L(ρ⊗k) (8.2.51)

is an achievable rate. Finally, since the number k of copies of ρAB is arbitrary,we conclude that

limk→∞

1k

supL

I(A′〉B′)L(ρ⊗k) (8.2.52)

is an achievable rate.

8.2.2 Proof of the Weak Converse

In order to prove the weak converse part of Theorem 8.19, we make use ofCorollary 8.8, specifically (8.1.44): given a bipartite state ρAB, for every (d, ε)

745

Page 757: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

entanglement distillation protocol for ρAB, with ε ∈ [0, 1/2), it holds that

log2 d ≤ 11− 2ε

(supL

I(A′〉B′)L(ρ) + h2(ε)

), (8.2.53)

where the optimization is with respect to every LOCC channel LAB→A′B′ . Ap-plying this inequality to the state ρ⊗n

AB immediately leads to the following.

Proposition 8.22

Let ρAB be a bipartite state, and let n ≥ 1 and ε ∈ [0, 1/2). For an(n, d, ε) entanglement distillation protocol for ρAB with correspondingLOCC channel LAnBn→AB such that dA = dB = d, the rate log2 d

n satisfies

log2 dn≤ 1

1− 2ε

(supL

1n

I(A′〉B′)L(ρ⊗n) +1n

h2(ε)

). (8.2.54)

Consequently,

En,εD (A; B)ρ ≤

11− 2ε

(1n

supL

I(A′〉B′)L(ρ⊗n) +1n

h2(ε)

), (8.2.55)

where the optimization is over every LOCC channel LAnBn→A′B′ .

PROOF: The inequality in (8.2.54) is immediate from (8.1.44) in Corollary 8.8by applying that inequality to the state ρ⊗n

AB and dividing both sides by n. Theinequality in (8.2.55) follows immediately by definition of En,ε

D in (8.2.4).

Proof of the Weak Converse Part of Theorem 8.19

Suppose that R is an achievable rate for entanglement distillation for the bi-partite state ρAB. Then, by definition, for all ε ∈ (0, 1], δ > 0, and sufficientlylarge n, there exists an (n, 2n(R−δ), ε) entanglement distillation protocol forρAB. For all such protocols for which ε ∈ (0, 1/2), the inequality in (8.2.54)holds, so that

R− δ ≤ 11− 2ε

(1n

supL

I(A′〉B′)L(ρ⊗n) +1n

h2(ε)

). (8.2.56)

746

Page 758: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Since the inequality holds for all sufficiently large n, it holds in the limit n →∞, so that

R ≤ limn→∞

11− 2ε

(1n

supL

I(A′〉B′)L(ρ⊗n) +1n

h2(ε)

)+ δ (8.2.57)

=1

1− 2εlim

n→∞

1n

supL

I(A′〉B′)L(ρ⊗n) + δ. (8.2.58)

Then, since this inequality holds for all ε ∈ (0, 1/2), δ > 0, we conclude that

R ≤ limε,δ→0

(1

1− 2εlim

n→∞

1n

supL

I(A′〉B′)L(ρ⊗n) + δ

)(8.2.59)

= limn→∞

1n

supL

I(A′〉B′)L(ρ⊗n). (8.2.60)

We have thus shown that the quantity limn→∞1n supL I(A′〉B′)L(ρ⊗n) is a weak

converse rate for entanglement distillation for ρAB.

8.2.3 Rains Relative Entropy Strong Converse Upper Bound

As stated previously, the expression in (8.2.13) for distillable entanglementinvolves both a limit over an unbounded number of copies of the initial stateρAB, as well an optimization over all two-way LOCC channels. Computingthe distillable entanglement is therefore intractable in general. In this section,we use the one-shot upper bound established in Section 8.1.1 to show thatthe Rains relative entropy is a strong converse upper bound on the distillableentanglement of a bipartite state.

We start by recalling the upper bound in (8.1.45), which tell us that

log2 d ≤ Rα(A; B)ρ +α

α− 1log2

(1

1− ε

)∀ α > 1, (8.2.61)

for an arbitrary (d, ε) entanglement distillation protocol, where ε ∈ (0, 1).Recall that

Rα(A; B)ρ = infσAB∈PPT′(A:B)

Dα(ρAB‖σAB). (8.2.62)

The upper bound above is a consequence of the fact that PPT′ operatorsare useless for entanglement distillation, in the sense that for every σAB ∈PPT′(A : B), the bound Tr[ΦABσAB] ≤ 1

d holds.

747

Page 759: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Applying the upper bound in (8.2.61) to the state ρ⊗nAB leads to the follow-

ing result:

Corollary 8.23

Let ρAB be a bipartite state, let n ≥ 1, ε ∈ [0, 1), and α > 1. For an (n, d, ε)entanglement distillation protocol, the following bound holds

log2 dn≤ Rα(A; B)ρ +

α

n(α− 1)log2

(1

1− ε

). (8.2.63)

Consequently,

En,εD (A; B)ρ ≤ Rα(A; B)ρ +

α

n(α− 1)log2

(1

1− ε

). (8.2.64)

PROOF: An (n, d, ε) entanglement distillation protocol for ρAB is a (d, ε) en-tanglement distillation protocol for ρ⊗n

AB. Therefore, applying the inequality in(8.2.61) to the state ρ⊗n

AB and dividing both sides by n leads to

log2 dn≤ 1

nRα(An; Bn)ρ⊗n +

α

n(α− 1)log2

(1

1− ε

). (8.2.65)

Now, by subadditivity of the sandwiched Rényi Rains relative entropy (see(5.3.18)), we have that

Rα(An; Bn)ρ⊗n ≤ nRα(A; B)ρ. (8.2.66)

Therefore,log2 d

n≤ Rα(A; B)ρ +

α

n(α− 1)log2

(1

1− ε

), (8.2.67)

as required. Since this inequality holds for all (n, d, ε) protocols, we obtain(8.2.64) by optimizing over all protocols (d,LAB→AB), with dA = dB = d ≥2.

Given an ε ∈ (0, 1), the inequality in (8.2.63) gives us a bound on the rateof an arbitrary (n, d, ε) entanglement distillation protocol for a state ρAB. Ifinstead we fix the rate to be r, so that d = 2nr, then the inequality in (8.2.63) isas follows:

r ≤ Rα(A; B)ρ +α

n(α− 1)log2

(1

1− ε

)(8.2.68)

748

Page 760: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

for all α > 1. Rearranging this inequality gives us the following lower boundon ε:

ε ≥ 1− 2−n( α−1α )(r−Rα(A;B)ρ) (8.2.69)

for all α > 1.

Theorem 8.24 Strong Converse Upper Bound on Distillable Entan-glement

Let ρAB be a bipartite state. The Rains relative entropy R(A; B)ρ is astrong converse rate for entanglement distillation for ρAB, i.e.,

ED(A; B)ρ ≤ R(A; B)ρ. (8.2.70)

where we recall that R(A; B)ρ is defined as

R(A; B)ρ = infσAB∈PPT′(A:B)

D(ρAB‖σAB). (8.2.71)

PROOF: Let ε ∈ [0, 1) and δ > 0. Let δ1, δ2 > 0 be such that

δ > δ1 + δ2 =: δ′. (8.2.72)

Set α ∈ (1, ∞) such that

δ1 ≥ Rα(A; B)ρ − R(A; B)ρ, (8.2.73)

which is possible because Rα(A; B)ρ is monotonically increasing in α (whichfollows from Proposition 4.29) and because limα→1+ Rα(A; B)ρ = R(A; B)ρ

(see Appendix 5.B for a proof). With this value of α, take n large enough sothat

δ2 ≥α

n(α− 1)log2

(1

1− ε

). (8.2.74)

Now, with the values of n and ε as above, an arbitrary (n, d, ε) entangle-ment distillation protocol for ρAB satisfies (8.2.63), i.e.,

log2 dn≤ Rα(A; B)ρ +

α

n(α− 1)log2

(1

1− ε

)(8.2.75)

for all α ∈ (1, ∞). Rearranging the right-hand side of this inequality, andusing (8.2.72)–(8.2.74), we obtain

log2 dn≤ R(A; B)ρ + Rα(A; B)ρ − R(A; B)ρ +

α

n(α− 1)log2

(1

1− ε

)(8.2.76)

749

Page 761: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

≤ R(A; B)ρ + δ1 + δ2 (8.2.77)= R(A; B)ρ + δ′ (8.2.78)< R(A; B)ρ + δ. (8.2.79)

So we have that log2 dn < R(A; B)ρ + δ for all (n, d, ε) entanglement distillation

protocols for ρAB with sufficiently large n such that (8.2.74) holds. Due to thisstrict inequality, it follows that there cannot exist an (n, 2n(R(A;B)ρ+δ), ε) en-tanglement distillation protocol for ρAB for all sufficiently large n such that(8.2.74) holds. For if it were to exist, there would be a d ≥ 2 such thatlog2 d = n(R(A; B)ρ + δ), which we have just seen is not possible. Since εand δ are arbitrary, we conclude that for all ε ∈ [0, 1), δ > 0, and sufficientlylarge n, there does not exist an (n, 2n(R(A;B)ρ+δ), ε) entanglement distillationprotocol for ρAB. This means that R(A; B)ρ is a strong converse rate, so thatED(A; B)ρ ≤ R(A; B)ρ.

Given that the Rains relative entropy is a strong converse rate for distil-lable entanglement, by following arguments analogous to those in the proofabove, we can conclude that 1

k R(Ak; Bk)ρ⊗k is a strong converse rate for allk ≥ 2. Therefore, the regularized quantity

Rreg(A; B)ρ := limn→∞

1n

R(An; Bn)ρ⊗n , (8.2.80)

is a strong converse rate for entanglement distillation for ρAB, so that

ED(A; B)ρ ≤ Rreg(A; B)ρ. (8.2.81)

By subadditivity of Rains relative entropy (see (5.3.18)),

Rreg(A; B)ρ ≤ R(A; B)ρ, (8.2.82)

so that the regularized quantity in general gives a tighter upper bound ondistillable entanglement.

8.2.3.1 The Strong Converse from a Different Point of View

We now show that the Rains relative entropy is a strong converse rate usingthe equivalent characterization of a strong converse rate in (8.2.9). In otherwords, given a bipartite state ρAB, we show that for an arbitrary sequence(n, 2nr, εn)n∈N of (n, d, ε) protocols with rates r > R(A; B)ρ, it holds that

750

Page 762: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

limn→∞ εn = 1. Indeed, for every element of the sequence, the inequality in(8.2.68) applies; namely,

r ≤ Rα(A; B)ρ +α

n(α− 1)log2

(1

1− εn

)(8.2.83)

for all α > 1. Rearranging this inequality gives us the following lower boundon εn:

εn ≥ 1− 2−n( α−1α )(r−Rα(A;B)ρ) (8.2.84)

for all α > 1. Now, since r > R(A; B)ρ, limα→1+ Rα(A; B)ρ = R(A; B)ρ

(see Appendix 5.B), and because Rα(A; B)ρ is monotonically increasing in α(this follows from Proposition 4.29), there exists an α∗ > 1 such that r >

Rα∗(A; B)ρ. Applying the inequality in (8.2.84) to this value of α, we find that

εn ≥ 1− 2−n(

α∗−1α∗)(r−Rα∗ (A;B)ρ). (8.2.85)

Then, taking the limit n→ ∞ on both sides of this inequality, we conclude thatlimn→∞ εn ≥ 1. But εn ≤ 1 for all n because εn is a probability by definition.So we obtain limn→∞ εn = 1. Since the rate r > R(A; B)ρ is arbitrary, weconclude that R(A; B)ρ is a strong converse rate for entanglement distillationfor ρAB. We also see from (8.2.85) that the sequence εnn∈N approaches oneat an exponential rate.

8.2.4 Squashed Entanglement Weak Converse Upper Bound

In this section, we establish the squashed entanglement of a bipartite stateas a weak converse upper bound on its distillable entanglement. The mainidea is to apply the one-shot bound from Theorem 8.9 and the additivity ofthe squashed entanglement (Proposition 5.32) in order to arrive at this con-clusion.

Corollary 8.25

Let ρAB be a bipartite state, let n ≥ 1, and let ε ∈ [0, 1). For an (n, d, ε)entanglement distillation protocol, the following bound holds

1n

log2 d ≤ 11−√ε

(Esq(A; B)ρ +

g2(√

ε)

n

). (8.2.86)

751

Page 763: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

PROOF: An (n, d, ε) entanglement distillation protocol for ρAB is a (d, ε) en-tanglement distillation protocol for ρ⊗n

AB. Therefore, applying the inequality in(8.9) to the state ρ⊗n

AB and dividing both sides by n leads to

1n

log2 d ≤ 11−√ε

(1n

Esq(An; Bn)ρ⊗n +g2(√

ε)

n

). (8.2.87)

Now, by additivity of the squashed entanglement (see (5.4.8)), we have that

Esq(An; Bn)ρ⊗n = nEsq(A; B)ρ. (8.2.88)

This concludes the proof.

We now provide a proof of (8.2.15), the statement that the squashed entan-glement is a weak converse rate for entanglement distillation. Suppose that Ris an achievable rate for entanglement distillation for the bipartite state ρAB.Then, by definition, for all ε ∈ (0, 1], δ > 0, and sufficiently large n, there ex-ists an (n, 2n(R−δ), ε) entanglement distillation protocol for ρAB. For all suchprotocols, the inequality in (8.2.86) holds, so that

R− δ ≤ 11−√ε

(Esq(A; B)ρ +

1n

g2(√

ε)

). (8.2.89)

Since the inequality holds for all sufficiently large n, it holds in the limit n →∞, so that

R ≤ limn→∞

11−√ε

(Esq(A; B)ρ +

1n

g2(√

ε)

)+ δ (8.2.90)

=1

1−√εEsq(A; B)ρ + δ. (8.2.91)

Then, since this inequality holds for all ε ∈ (0, 1] and δ > 0, we conclude that

R ≤ limε,δ→0

(1

1−√εEsq(A; B)ρ + δ

)(8.2.92)

= Esq(A; B)ρ. (8.2.93)

We have thus shown that the squashed entanglement Esq(A; B)ρ is a weakconverse rate for entanglement distillation.

8.2.5 One-Way Entanglement Distillation

In Section 8.1.2, we considered a one-way entanglement distillation protocolto derive a lower bound on the one-shot distillable entanglement of a bipartite

752

Page 764: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

state. In the asymptotic setting, this leads to the coherent information lowerbound on distillable entanglement, i.e.,

ED(A; B)ρ ≥ I(A〉B)ρ = H(B)ρ − H(AB)ρ, (8.2.94)

which holds for every bipartite state ρAB. By simply reversing the roles ofAlice and Bob in the protocol, it follows that

ED(A; B)ρ ≥ I(B〉A)ρ = H(A)ρ − H(AB)ρ. (8.2.95)

The quantity on the right-hand side of the above inequality is sometimescalled reverse coherent information. Thus, in general, we have the followinglower bound on distillable entanglement:

ED(A; B)ρ ≥ maxI(A〉B)ρ, I(B〉A)ρ, (8.2.96)

which holds for every bipartite state ρAB.

The coherent information lower bound can be improved by first applyinga two-way LOCC channel to n copies of the given state, and then performinga one-way entanglement distillation protocol at the coherent information rate.This leads to

ED(A; B)ρ = limn→∞

1n

supL

I(A′〉B′)L(ρ⊗n) = limn→∞

1n

D↔(ρ⊗nAB) (8.2.97)

where the optimization is over every two-way LOCC channel LAnBn→A′B′ .

If we restrict the optimization in (8.2.97) above to one-way LOCC channelsof the form LAnBn→A′B′ , then we obtain what is called the one-way distillableentanglement, denoted by E→D (A; B)ρ, and defined operationally in a similarway to the distillable entanglement ED(A; B)ρ, but with the free operationsallowed restricted to one-way LOCC. A key result is the following equality:

E→D (A; B)ρ = limn→∞

1n

supL→

I(A′〉B′)L→(ρ⊗n) = limn→∞

1n

D→(ρ⊗nAB), (8.2.98)

whereD→(ρAB) := sup

L→I(A′〉B′)L→(ρ). (8.2.99)

Like the distillable entanglement, the one-way distillable entanglement is anoperational quantity of interest in entanglement theory. Furthermore, theequality in (8.2.98) can be proved along similar lines to how we proved (8.2.13).

In what follows, we show that this expression for one-way distillable en-tanglement can be simplified.

753

Page 765: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Theorem 8.26 One-Way Distillable Entanglement of a BipartiteState

The one-way distillable entanglement of bipartite state ρAB is given by

E→D (A; B)ρ = limn→∞

1n

supV

I(A′〉XBn)Vρ⊗nV† , (8.2.100)

where VAn→A′XE is an isometry of the form

VAn→A′XE = ∑x∈X

KxAn→A′ ⊗ |x〉X ⊗ |x〉E, (8.2.101)

with ∑x∈X(KxAn)†Kx

An = 1An , dA′ = dnA, dX ≤ d2

A, and dE = |X| = dX.Additionally,

D→(ρAB) = supV

I(A′〉XB)VρV† , (8.2.102)

with the optimization over isometries V as in (8.2.101), with n = 1.

This theorem tells us that, to determine the one-way distillable entangle-ment of a bipartite state, it suffices to optimize over one-way LOCC channelsthat consist of only a quantum instrument for Alice, with each of the corre-sponding maps containing just one Kraus operator. Furthermore, it sufficesto take A′ = An and B′ = Bn.

PROOF: We start by recalling from Section 3.2.11 (see also the beginning ofSection 8.1.2) that every one-way LOCC channel L→AnBn→A′B′ can be expressedas

ωA′B′ := L→AnBn→A′B′(ρAnBn) (8.2.103)

= ∑x∈X

(ExAn→A′ ⊗Dx

Bn→B′)(ρAnBn) (8.2.104)

=(DXBBn→B′ CXA→XB EAn→A′XA

)(ρAnBn), (8.2.105)

where X is some finite alphabet dXA = dXB = |X|, ExAn→A′x∈X is a set

of completely positive maps such that ∑x∈X ExAn→A′ is trace preserving, and

DxBn→B′x∈X is a set of channels. In particular,

EAn→A′XA(ρAnBn) = ∑

x∈XEx

An→A′(ρAnBn)⊗ |x〉〈x|XA , (8.2.106)

DXBBn→B′(|x〉〈x|XB ⊗ ρAnBn) = DxBn→B′(ρAnBn). (8.2.107)

754

Page 766: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

For every n ≥ 1, if we restrict the optimization in (8.2.98) such that |X| =d2

An = d2nA , d = dA′ = dn

A, DxBn→B′ = idBn for all x ∈ X, and Ex

An→A′(·) =

KxAn(·)Kx

An for all x ∈ X such that ∑x∈X(KxAn)†Kx

An = 1An , then the LOCCchannel L→AnBn→A′B′ reduces to

L→AnBn→A′B′(ρAnBn) = ∑x∈X

KxAnρAnBn(Kx

An)† ⊗ |x〉〈x|X (8.2.108)

for every state ρAnBn , and it has an isometric extension of the form

VAn→AnXE = ∑x∈X

KxAn ⊗ |x〉X ⊗ |x〉E. (8.2.109)

We thus obtainE→D (A; B)ρ ≥ lim

n→∞

1n

supV

I(An〉XBn)ω (8.2.110)

The rest of the proof is devoted to proving the reverse inequality. LetL→AnBn→A′B′ be an arbitrary one-way LOCC channel of the form in (8.2.103)–(8.2.105). For every state ρAnBn , let

px := Tr[ExAn→A′(ρAnBn)], (8.2.111)

ρxA′Bn :=

1px

ExAn→A′(ρAnBn), (8.2.112)

ρxBn :=

1px

TrA′ [ExAn→A′(ρAnBn)], (8.2.113)

for all x ∈ X. Then, using the data-processing inequality for coherent infor-mation in (4.3.17) and the direct-sum property for quantum relative entropy,we obtain

I(A′〉B′)ω ≤ I(A′〉BnXB)E(ρ) (8.2.114)

= ∑x∈X

pxD(ρxA′Bn‖1A′ ⊗ ρx

Bn) (8.2.115)

= ∑x∈X

px I(A′〉Bn)ρx . (8.2.116)

Now,

EAn→A′XA(ρAnBn) = ∑

x∈XEx

An→A′(ρAB)⊗ |x〉〈x|XA (8.2.117)

= ∑x∈X

pxρxA′B ⊗ |x〉〈x|XA . (8.2.118)

755

Page 767: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Suppose that ExAn→A′ has the following Kraus representation:

ExAn→A′(·) = ∑

y∈YKx,y

An→A′(·)(Kx,yAn→A′)

†, (8.2.119)

where Y is some finite alphabet. Let

qx,y := Tr[Kx,yAn→A′ρAnBn(Kx,y

An→A′)†], (8.2.120)

and observe that px = ∑y∈Y qx,y. Therefore, for each x ∈ X, the values ry|x :=qx,ypx

constitute a probability distribution on Y, in the sense that ry|x ≥ 0 for ally ∈ Y, and ∑y∈Y ry|x = 1. Using this, and letting

ρx,yA′Bn :=

1qx,y

Kx,yAn→A′ρAnBn(Kx,y

An→A′)†, (8.2.121)

so thatpxρx

A′Bn = ∑y∈Y

qx,yρx,yA′Bn , (8.2.122)

we find that

px I(A′〉Bn)ρx ≤ px ∑y∈Y

ry|x I(A′〉Bn)ρx,y , (8.2.123)

for all x ∈ X, where the inequality follows from convexity of coherent infor-mation (see (4.2.125)). Without loss of generality, we can take A′ ≡ An: ifA′ has a dimension smaller than that of An, we can always first isometricallyembed A′ into An. The coherent information remains unchanged under thisisometric embedding.

Combining the last inequality above with the one in (8.2.116), we concludethat

I(A′〉B′)ω ≤ ∑x∈X

px ∑y∈Y

ry|x I(A′〉Bn)ρx,y = ∑x∈X,y∈Y

qx,y I(A′〉Bn)ρx,y . (8.2.124)

Then assigning the superindex z = (x, y) with Z = X× Y, we finally have

I(A′〉B′)ω ≤ supE

I(A′〉BnZB)E(ρ⊗n), (8.2.125)

where the optimization is over channels EAn→A′ZBof the form

EAn→A′ZB(·) := ∑

z∈XKz

An→A′(·)(KzAn→A′)

† ⊗ |z〉〈z|ZB , (8.2.126)

756

Page 768: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

such that ∑z∈Z(KzAn→A′)

†KzAn→A′ = 1An . (This is effectively an optimization

over operators KzAn→A′z∈Z such that ∑z∈Z(Kz

An→A′)†Kz

An→A′ = 1An .) Thechannel EAn→A′ZB

has an isometric extension of the form

VAn→A′ZBE = ∑z∈Z

KzAn→A′ ⊗ |z〉ZB ⊗ |z〉E, (8.2.127)

where dE = dZB = |Z|. Since the number of Kraus operators need not exceedd2

An = d2nA (see Theorem 3.18), we can take dZ = d2n

A without loss of generality.We can thus optimize over all isometries of the form in (8.2.127). Altogether,we have that

I(A′〉B′)ω ≤ supV

I(A′〉ZBn)Vρ⊗nV† (8.2.128)

for every one-way LOCC channel L→AnBn→A′B′ and all n ≥ 1. Optimizing overall one-way LOCC channels on the left-hand side of the inequality above, andtaking the limit n→ ∞ leads us to conclude that

E→D (A; B)ρ ≤ limn→∞

1n

supV

I(A′〉ZBn)Vρ⊗nV† . (8.2.129)

Combining this with (8.2.110) and reassigning Z as X finishes the proof.

Lemma 8.27

For every bipartite state ρAB, the optimized coherent information lowerbound on distillable entanglement is non-negative, i.e., D→(ρAB) ≥ 0.

PROOF: Let ψABR = |ψ〉〈ψ|ABR be a purification of ρAB, and consider thefollowing Schmidt decomposition of |ψ〉ABR:

|ψ〉ABR =r−1

∑k=0

√λk|φk〉A ⊗ |ϕk〉BR. (8.2.130)

Then, let

VA→A′XE :=r−1

∑k=0|k〉A′〈φk|A ⊗ |k〉X ⊗ |k〉E. (8.2.131)

It is then straightforward to show that I(A′〉XB)VρV† = 0. Since V is anexample of an isometry in the optimization for D→(ρAB), we conclude thatD→(ρAB) ≥ I(A′〉XB)VρV† = 0.

757

Page 769: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

8.3 Examples

We now consider classes of bipartite states and evaluate the upper and lowerbounds on their distillable entanglement that we have established in thischapter. In some cases, the distillable entanglement can be determined ex-actly because the upper and lower bounds coincide.

8.3.1 Pure States

The simplest example for which distillable entanglement can be determinedexactly is the class of pure bipartite states. In this case, the coherent infor-mation lower bound and the Rains relative entropy upper bound coincideand are equal to the entropy of the reduced state. Indeed, for the coherentinformation, the joint entropy H(AB)ψ = 0 for every pure state ψAB, so that

I(A〉B)ψ = H(B)ψ − H(AB)ψ = H(B)ψ = H(A)ψ, (8.3.1)

where the last equality follows from the Schmidt decomposition theorem(Theorem 3.2) to see that the reduced states ψA and ψB have the same non-zero eigenvalues, and thus the same value for the entropy. On the other hand,Proposition 5.20 states that the relative entropy of entanglement ER(A; B)ψ =H(A)ψ, and we also know that ER(A; B)ψ ≥ R(A; B)ψ (see (5.1.149)). We thushave the following:

Theorem 8.28 Distillable Entanglement for Pure States

The distillable entanglement of a pure bipartite state ψAB is equal to theentropy of the reduced state on A, i.e.,

ED(A; B)ψ = H(A)ψ. (8.3.2)

8.3.2 Degradable and Anti-Degradable States

In this section, we define two classes of states for which the one-way distill-able entanglement takes on a simple form.

758

Page 770: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Definition 8.29 Degradable and Anti-Degradable Bipartite States

Given a bipartite state ρAB with purification ψABE, we call it degradableif there exists a quantum channel DB→E′ such that the state τAE′E :=DB→E′(ψABE) satisfies

τAE′ = τAE = ψAE. (8.3.3)

We call ρAB anti-degradable if there exists a quantum channel AE→B′ suchthat the state ωABB′ := AE→B′(ψABE) satisfies

ωAB′ = ωAB = ρAB. (8.3.4)

REMARK: Degradable and anti-degradable states are the state counterparts of degrad-able and anti-degradable channels; see Definition 3.25. In fact, observe that the Choi stateof a degradable channel is a degradable state, and the Choi state of an anti-degradablechannel is an anti-degradable state.

Anti-degradable states are also sometimes called symmetrically extendible states ortwo-extendible states (please consult the Bibliographic Notes in Section 8.5).

Intuitively, a degradable state is one for which the system B can be usedto simulate (via a quantum channel DB→E′) the correlations between A andE. Analogously, an anti-degrdable state is one for which the system E can beused to simulate (via a quantum channel AE→B′) the correlations between Aand B.

An anti-degradable state ρAB is one for which the environment E (corre-sponding to the purifying system of ρAB) cannot be decoupled from A andB through LOCC from A to B alone. Indeed, recall the task of decouplingfrom Section 8.1.2 (in particular, see Figure 8.2). Since a channel can alwaysbe applied to E in order to simulate the correlations between A and B, fromthe point of view of A, the systems B and E become indistinguishable, so thatA and B cannot be (perfectly) decoupled from E. Given that decoupling isnot possible for anti-degradable states, we might expect that anti-degradablestates have zero one-way distillable entanglement. This is indeed true, as wenow show.

759

Page 771: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Theorem 8.30 One-Way Distillable Entanglement for Anti-Degradable States

For an anti-degradable state ρAB, the one-way distillable entanglementis equal to zero, i.e., E→D (A; B)ρ = 0.

PROOF: Let VA→A′XE be an arbitrary isometry in the optimization for D→(ρAB).Also, let ψABR be a purification of ρAB. Then, because ρAB is anti-degradable,there exists a channel AR→B such that

ρAB = AR→B(ψAR). (8.3.5)

Now, letωA′XEBR = VA→A′XEψABRV†

A→A′XE, (8.3.6)

which is a pure state. Then, using the fact that

ωA′XB = TrE[VA→A′XEρABV†A→A′XE] (8.3.7)

= (AR→B TrE)(VA→A′XEψARV†A→A′XE) (8.3.8)

= AR→B(ωA′XR), (8.3.9)

and that ωXB = AR→B(ωXR), we find that

I(A′〉XB)ω ≤ I(A′〉RX)ω (8.3.10)= H(RX)ω − H(A′RX)ω (8.3.11)= H(A′EB)ω − H(EB)ω (8.3.12)= H(A′XB)ω − H(XB)ω (8.3.13)= −I(A′〉XB)ω, (8.3.14)

where we used the data-processing inequality in (4.3.17), and for the subse-quent equalities we used the fact that ωA′XEBR is a pure state that is symmetricin X and E. We thus have I(A′〉XB)VρV† ≤ 0 for every isometry V used inthe optimization for D→(ρAB), implying that D→(ρAB) ≤ 0. However, sinceD→(ρAB) ≥ 0 by Lemma 8.27, we obtain D→(ρAB) = 0. The statement thatE→D (A; B)ρ = 0 follows by repeating the same argument for n copies of ρABand using the fact that ρ⊗n

AB is an anti-degradable state if ρAB is.

760

Page 772: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Theorem 8.31 One-Way Distillable Entanglement for DegradableStates

For a degradable state ρAB, we have

D→(ρAB) = I(A〉B)ρ. (8.3.15)

Consequently, D→(ρ⊗nAB) = nD→(ρAB), and thus the one-way distillable

entanglement of a degradable state ρAB is equal to its coherent informa-tion:

E→D (A; B)ρ = I(A〉B)ρ. (8.3.16)

PROOF: First, observe that if we pick the isometry V in (8.2.102) to be VA→A′XE =1A ⊗ |0, 0〉XE (so that X and E are one-dimensional systems), then we obtainD→(ρAB) ≥ I(A〉B)ρ. Then, we conclude that D→(ρ⊗n

AB) ≥ nI(A〉B)ρ be-cause coherent information is additive for product states; thus, ED(A; B)ρ =

limn→∞1n D→(ρ⊗n

AB) ≥ I(A〉B)ρ.

We now prove the reverse inequality. Let VA→A′XE be an arbitrary isome-try in the optimization for D→(ρAB). Also, let ψABR be a purification of ρAB.Then, since ρAB is degradable, there exists a channel DB→R′ such that the stateτAR′R := DB→R′(ψABR) satisfies

τAR′ = τAR = ψAR. (8.3.17)

Let WB→R′F be an isometric extension of DB→R′ , and let

|ϕ〉AR′FR = WB→R′F|ψ〉ABR, (8.3.18)|ω〉A′XEBR = VA→A′XE|ψ〉ABR, (8.3.19)|φ〉A′XER′FR = WB→R′F|ω〉A′XEBR = VA→A′XE|ϕ〉AR′FR. (8.3.20)

Then, by invariance of entropy under the isometry WB→R′F,

I(A′〉XB)ω = H(XB)ω − H(A′XB)ω (8.3.21)= H(XR′F)φ − H(A′XR′F)φ (8.3.22)= H(XR′F)φ − H(ER)φ (8.3.23)= H(XR′F)φ − H(XR)φ, (8.3.24)

where the second-to-last line follows because φA′XER′FR is a pure state, sothat H(A′XR′F)φ = H(ER)φ, and then for the last line we used the fact that

761

Page 773: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

φA′XER′FR is symmetric in X and E by definition of VA→A′XE. Next, due to thefact that τAR′ = τAR, it holds that φXR′ = φXR. We thus obtain

I(A′〉XB)ω = H(XR′F)φ − H(XR′)φ (8.3.25)= −I(F〉XR′)φ (8.3.26)≤ −I(F〉R′)φ (8.3.27)= H(FR′)φ − H(R′)φ (8.3.28)= H(FR′)φ − H(R)φ, (8.3.29)

where the inequality follows from the data-processing inequality with thepartial trace channel TrX, and the last equality follows because φR = φR′ ,due to the degradability of ρAB. Finally, observe that

φR′F = WB→R′FρBW†B→R′F, (8.3.30)

so that H(R′F)φ = H(B)ρ by isometric invariance of entropy. Also, φR =TrAB[ψABR], which implies that H(R)φ = H(AB)ρ. We thus obtain

I(A′〉XB)ω ≤ I(A〉B)ρ (8.3.31)

for every isometry VA→A′XE. This implies that D→(ρAB) ≤ I(A〉B)ρ, i.e.,

D→(ρAB) = I(A〉B)ρ. (8.3.32)

In other words, the trivial isometry VA→A′XE = 1A ⊗ |0, 0〉XE is optimal forD→(ρAB) when ρAB is a degradable state. Thus, by additivity of coherent in-formation for product states, we obtain ED(A; B)ρ = I(A〉B)ρ, as required.

8.4 Summary

In this chapter, we studied the task of entanglement distillation, in whichthe goal is for Alice and Bob to convert many copies of a shared entangledstate ρAB to some (smaller) number of ebits, i.e., copies of a two-qubit max-imally entangled state using local operations and classical communication.The highest rate at which this can be done, given arbitrarily many copies ofρAB and such that the error vanishes, is called distillable entanglement, andwe denote it by ED(A; B)ρ. We started with the one-shot setting, in whichwe allow for some error in the distillation protocol, and we determined bothupper and lower bounds on the number of (approximate) ebits that can be

762

Page 774: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

obtained. Then, in the asymptotic setting, we showed that the coherent in-formation I(A〉B)ρ is a lower bound on distillable entanglement for everybipartite state ρAB. We also found that two different entanglement measuresare upper bounds on distillable entanglement, namely, the Rains relative en-tropy and squashed entanglement. These are the best known upper boundson distillable entanglement.

By first performing entanglement distillation to transform their mixed en-tangled states to approximately pure maximally entangled states and thenperforming the quantum teleportation protocol, Alice can transmit any quan-tum state to Bob. In this sense, entanglement distillation can be used to real-ize a near-ideal quantum channel between Alice and Bob. This fact underliesachievable strategies for quantum communication, which is the task of per-fectly transmitting an arbitrary quantum state from Alice to Bob when theresource is a quantum channel NA→B connecting Alice and Bob rather than ashared bipartite state ρAB. Quantum communication is the subject of the nextchapter.

8.5 Bibliographic Notes

Although our focus in this book is on communication, maximally entangledstates are useful resources for many other quantum information processingtasks, (see, e.g., Horodecki et al. (2009b) for applications of entanglement inquantum computing), which makes entanglement distillation a relevant topicin its own right.

The concept of entanglement distillation was initially developed by Ben-nett et al. (1996b,c), who also provided one-way and two-way protocols fordistillation from two-qubit Bell-diagonal states. The precise mathematicaldefinition of distillable entanglement was given by Rains (1998, 1999a,b); Horodeckiet al. (2000); Plenio and Virmani (2007). Relaxing the set of allowed operationsfrom LOCC channels to separable and completely PPT-preserving channels asa means to obtain upper bounds on distillable entanglement was consideredby Rains (1998, 1999a,b).

Berta (2008); Buscemi and Datta (2010b); Brandao and Datta (2011); Wildeet al. (2017) have considered lower bounds on distillable entanglement in theone-shot setting. The lower bound that we present in Proposition 8.10 is theone given by (Wilde et al., 2017, Proposition 21), which makes use of the one-

763

Page 775: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

shot decoupling results obtained by Dupuis et al. (2014). In particular, theproof of Theorem 8.11 provided in Appendix 8.A comes directly from theproof of (Dupuis et al., 2014, Theorem 3.3). The notion of decoupling hasplayed an important role in the development of quantum information the-ory. It was originally proposed by Schumacher and Westmoreland (2002)in the context of understanding approximate quantum error correction andquantum communication. It was then developed in much more detail byHorodecki et al. (2005b, 2007) in the context of state merging and by Hay-den et al. (2008a) for understanding the coherent information lower boundon quantum capacity. Dupuis (2010) developed the method in more detailin his PhD thesis for a variety of information-processing tasks, and this cul-minated in the general decoupling theorem presented as (Dupuis et al., 2014,Theorem 3.3).

For the SDP formulations of conditional min- and max-entropy, as wellas their smoothed variants, see (Tomamichel, 2015, Chapter 6). For more in-formation about unitary designs and about Haar measure integration overunitaries, we refer to (Collins and Sniady, 2006; Roy and Scott, 2009). Theone-shot upper bound that we present in Proposition 8.6 and Theorem 8.7based on the fact that PPT’ operators are useless for entanglement distillationwas determined by Tomamichel et al. (2016).

In the asymptotic setting, Devetak and Winter (2005) used random codingarguments to establish the coherent information lower bound (also called the“hashing inequality”) on the distillable entanglement of a bipartite quantumstate. The corresponding hashing protocol was presented by Bennett et al.(1996c) for two-qubit Bell-diagonal states. Devetak and Winter (2005) alsodetermined that the general expression in Theorem 8.19 is an achievable ratefor entanglement distillation from a bipartite state, and they also proved theconverse. Horodecki et al. (2000) conjectured this formula earlier, conditionedon the hashing inequality being true.

Theorem 8.26 is due to Devetak and Winter (2005). The other resultsin Sections 8.2.5 and 8.3.2 on one-way entanglement distillation were ob-tained by Leditzky et al. (2018), who in the same work used the conceptsof approximate degradablity and approximate anti-degradability of bipar-tite states to derive upper bounds on distillable entanglement. We note thatanti-degradable quantum states, as defined by Leditzky et al. (2018), are alsoknown as symmetrically extendible states, two-extendible states, or two-shareablestates (Werner, 1989a; Doherty et al., 2004; Yang, 2006).

PPT entangled states (i.e., bound entangled states) were discovered by

764

Page 776: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

NU

ρAE τB ⊗ ρE

A B

E

Environment

Alice

Alice Bob

FIGURE 8.4: Given a bipartite state ρAE and a quantum channel NA→B,the goal of decoupling is to obtain a state τB ⊗ ρE that is in tensor productwith the environment E, where ρE = TrA[ρAE]. To assist with the task,Alice is allowed to apply an arbitrary unitary U to her system A.

Horodecki (1997), and Horodecki et al. (1998) showed that PPT states are use-less for entanglement distillation. A major open and challenging question,which is alluded to in Section 8.2.0.1 (see also Figure 8.3) is whether there ex-ist NPT (negative partial transpose) bound entangled states. For discussionsconcerning NPT bound entanglement, we refer to (Horodecki and Horodecki,1999; DiVincenzo et al., 2000; Dür et al., 2000).

Appendix 8.A One-Shot Decoupling and Proof ofTheorem 8.11

The key insight needed to obtain the lower bound on one-shot distillable en-tanglement in Theorem 8.10 is that entanglement distillation can be thoughtof in terms of decoupling. The general scenario of decoupling is depicted inFigure 8.4. Given a quantum channel NA→B and a bipartite state ρAE, the goalof decoupling is to obtain a state NA→B(ρAE) that is decoupled from the sys-tem E, i.e., a state approximately of the form τB ⊗ ρE, where ρE = TrA[ρAE]and τB is some state. Note that the reduced state of E at the output is the sameas the reduced state of E at the input because we apply a channel only to thesystem A and such a channel is trace preserving. We also allow Alice an ar-bitrary unitary that she can apply to her system before sending it through thechannel NA→B, so that we require

NA→B(UAρAEU†A) ≈ε τB ⊗ ρE, (8.A.1)

for some state τB, up to some error ε. Of course, exact equality in (8.A.1)cannot be obtained in general, and we make the goal instead to obtain an

765

Page 777: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

output state NA→B(UAρAEU†A) that is as close as possible to the state ΦN

B ⊗ ρE,where ΦN

B = TrA[ΦNAB] and ΦN

AB is the Choi state of NA→B. The choice forτB given by the reduced Choi state might seem arbitrary, but it is taken foranalytical considerations and leads to a good bound for our purposes. Byusing trace distance as our measure of closeness, the goal is to determine anupper bound on the following quantity:

minUA

∥∥∥NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

∥∥∥1

(8.A.2)

Note that the minimum over all unitaries never exceeds the average, meaningthat

minUA

∥∥∥NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

∥∥∥1

≤∫

UA

∥∥∥NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

∥∥∥1

dUA, (8.A.3)

where the integral over all unitaries UA is with respect to the Haar measure,and it can be thought of as a uniform average over the continuous set of allunitaries UA acting on the system A. Theorem 8.11 provides an upper boundon the right-hand side of the inequality above, and we restate the result herefor convenience:

UA

∥∥∥NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

∥∥∥1

dUA ≤ 2−12 H2(A|E)ρ− 1

2 H2(A|B)ΦN ,

(8.A.4)where we recall that the sandwiched Rényi conditional entropy of order twoof a bipartite state ωCD is defined as

H2(C|D)ω = − infσD

D2(ωCD‖1C ⊗ σD) (8.A.5)

= − infσD

log2 Tr

[(σ− 1

4D ωCDσ

− 14

D

)2]

, (8.A.6)

and the optimization is over every state σD.

Proof of Theorem 8.11

Let

MBE := NA→B(UAρAEU†A)−ΦN

B ⊗ ρE, (8.A.7)

766

Page 778: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

σBE := τB ⊗ ζE, (8.A.8)

with τB and ζE arbitrary positive definite states. By the variational character-ization of the trace norm in (2.2.88), we have that

‖MBE‖1 = maxUBE|Tr[UBEMBE]| , (8.A.9)

where the optimization is over every unitary UBE. Using the Cauchy–Schwarzinequality (see (2.2.23)), and suppressing system labels for brevity, we obtain

‖M‖1 = maxU|Tr[UM]| (8.A.10)

= maxU

∣∣∣Tr[(

σ14 Uσ

14

) (σ−

14 Mσ−

14

)]∣∣∣ (8.A.11)

≤ maxU

√Tr[(

σ14 Uσ

14

) (σ

14 U†σ

14

)]Tr[σ−

14 Mσ−

12 M†σ−

14

](8.A.12)

=

√max

UTr[σ

12 Uσ

12 U†

]Tr[σ−

14 Mσ−

12 M†σ−

14

]. (8.A.13)

Since σ12 and Uσ

12 U† are positive definite for every unitary U, by the Cauchy–

Schwarz inequality, we conclude that

Tr[σ

12 Uσ

12 U†

]=∣∣∣Tr[σ

12 Uσ

12 U†

]∣∣∣ (8.A.14)

≤√

Tr[σ

12 σ

12

]Tr[Uσ

12 U†Uσ

12 U†

](8.A.15)

= Tr[σ] (8.A.16)= 1 (8.A.17)

for every unitary U, which implies that

maxU

Tr[σ

12 Uσ

12 U†

]≤ Tr[σ] = 1. (8.A.18)

On the other hand, by taking U = 1 in the optimization over U, we obtain

maxU

Tr[σ

12 Uσ

12 U†

]≥ Tr[σ] = 1, (8.A.19)

which means thatmax

UTr[σ

12 Uσ

12 U†

]= Tr[σ] = 1. (8.A.20)

767

Page 779: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Therefore,

‖M‖1 ≤√

Tr[σ−

14 Mσ−

12 M†σ−

14

](8.A.21)

=

√Tr[(

σ−14 Mσ−

14

)2]

, (8.A.22)

where the last line follows because M is Hermitian. So we have that∥∥∥NA→B(UAρAEU†

A)−ΦNB ⊗ ρE

∥∥∥1

≤√

Tr[(

(τB ⊗ ζE)− 1

4 (NA→B(UAρAEU†A)−ΦN

B ⊗ ρE)(τB ⊗ ζE)− 1

4

)2]

.

(8.A.23)

Now, define

NA→B(·) := τ− 1

4B NA→B(·)τ

− 14

B , (8.A.24)

ρAE := ζ− 1

4E ρAEζ

− 14

E . (8.A.25)

Using these definitions, we can write the inequality above as

∥∥∥NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

∥∥∥1

≤√

Tr[(

NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

)2]

. (8.A.26)

Taking the integral over unitaries UA on both sides of this inequality, andusing Jensen’s inequality (see (2.3.20)), which applies because the square rootfunction is concave, we obtain

UA

∥∥∥NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

∥∥∥1

dUA

≤∫

UA

√Tr[(

NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

)2]

dUA (8.A.27)

≤√∫

UA

Tr[(

NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

)2]

dUA. (8.A.28)

768

Page 780: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

Expanding the integral on the right-hand side of the inequality above leadsto∫

UA

Tr[(

NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

)2]

dUA

=∫

UA

Tr[(

NA→B(UAρAEU†A))2]

dUA (8.A.29)

− 2∫

UA

Tr[NA→B(UAρAEU†

A)(ΦNB ⊗ ρE)

]dUA + Tr[(ΦN

B ⊗ ρE)2]

(8.A.30)

=∫

UA

Tr[(

NA→B(UAρAEU†A))2]

dUA

− 2 Tr[NA→B

(∫

UA

UAρAEU†A dUA

)(ΦN

B ⊗ ρE)

]+ Tr[(ΦN

B ⊗ ρE)2].

(8.A.31)

Now, using (8.1.83), we have that

NA→B

(∫

UA

UAρAEU†A dUA

)= NA→B(πA ⊗ TrA[ρAE]) = ΦN

B ⊗ ρE.

(8.A.32)Therefore,

UA

Tr[(

NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

)2]

dUA

=∫

UA

Tr[(

NA→B(UAρAEU†A))2]

dUA − Tr[(ΦNB )

2]Tr[ρ2E]. (8.A.33)

We now use the fact thatTr[X2] = Tr[X⊗2F] (8.A.34)

for every operator X, where F is the swap operator defined in (3.1.88). In(8.A.33) above, we have X ≡ XBE = NA→B(UAρAEU†

A), which is a bipartiteoperator. The corresponding swap operator is FBE = FB ⊗ FE, with FB theswap operator acting on two copies of HB and FE the swap operator actingon two copies of HE. Therefore,

Tr[(

NA→B(UAρAEU†A))2]

= Tr[(

NA→B(UAρAEU†A))⊗2

(FB ⊗ FE)

](8.A.35)

769

Page 781: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

= Tr[(

N⊗2A→B(U

⊗2A ρ⊗2

AE(U†A)⊗2))(FB ⊗ FE)

](8.A.36)

= Tr[ρ⊗2

AE

(((U†

A)⊗2(N⊗2

A→B)†(FB)U⊗2

A

)⊗ FE

)], (8.A.37)

where the last line follows from the definition of the adjoint of a channel. Wethus have∫

UA

Tr[(

NA→B(UAρAEU†A))2]

dUA

= Tr[

ρ⊗2AE

((∫

UA

(U†A)⊗2(N⊗2

A→B)†(FB)U⊗2

A dUA

)⊗ FE

)]. (8.A.38)

Now, we use the following known fact (a standard result in Schur–Weyl du-ality): for every operator X acting on Cd ⊗Cd, with d ≥ 2,

U(U†)⊗2XU⊗2 dU = α1 + βF, (8.A.39)

where F is again the swap operator, and

α =Tr[X]

d2 − 1− Tr[XF]

d(d2 − 1), (8.A.40)

β =Tr[XF]d2 − 1

− Tr[X]

d(d2 − 1). (8.A.41)

Taking X ≡ (N⊗2A→B)

†(FB), which is an operator acting on two copies of HA,we obtain

Tr[(N⊗2A→B)

†(FB)] = Tr[N⊗2A→B(1

⊗2A )FB] (8.A.42)

= d2ATr[(ΦN

B )⊗2FB] (8.A.43)

= d2ATr[(ΦN

B )2], (8.A.44)

where the last line follows from (8.A.34). By similar reasoning, we obtain

Tr[FA(N⊗2A→B)

†(FB)] = Tr[N⊗2A→B(FA)FB] (8.A.45)

= d2ATr[(FA ⊗ FB)(ΦN

AB)⊗2] (8.A.46)

= d2ATr[(ΦN

AB)2], (8.A.47)

where the second equality follows by expressing the action of N⊗2A→B on FA

with the Choi state ΦNAB, using (3.2.7). To obtain the last equality, we again

770

Page 782: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

used (8.A.34). We thus have

α =Tr[(ΦN

B )2]

d2A − 1

(d2

A − dATr[(ΦN

AB)2]

Tr[(ΦNB )

2]

), (8.A.48)

β =Tr[(ΦN

AB)2]

d2A − 1

(d2

A − dATr[(ΦN

B )2]

Tr[(ΦNAB)

2]

). (8.A.49)

We now make use of the following general fact, whose proof we provide be-low in Lemma 8.32: for every non-zero positive semi-definite operator PABwith PB := TrA[PAB], the following inequalities hold

1dA≤ Tr[P2

AB]

Tr[P2B]≤ dA. (8.A.50)

Applying these inequalities to the expressions for α and β above, we obtain

α ≤ Tr[(ΦNB )

2], β ≤ Tr[(ΦNAB)

2]. (8.A.51)

We thus have that∫

UA

Tr[(

NA→B(UAρAEU†A))2]

dUA (8.A.52)

= Tr[ρ⊗2

AE

((α1⊗2

A + βFA)⊗ FE

)](8.A.53)

= αTr[ρ2E] + βTr[ρ2

AE] (8.A.54)

≤ Tr[(ΦNB )

2]Tr[ρ2E] + Tr[(ΦN

AB)2]Tr[ρ2

AE]. (8.A.55)

Combining this with (8.A.33), we find that

UA

Tr[(

NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

)2]

dUA

≤ Tr[(ΦNAB)

2]Tr[ρ2AE]. (8.A.56)

Then, by (8.A.28), and recalling the definitions in (8.A.24) and (8.A.25), weobtain

UA

∥∥∥NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

∥∥∥1

dUA

≤√

Tr[(ΦNAB)

2]Tr[ρ2AE] (8.A.57)

771

Page 783: Principles of Quantum Communication Theory: A Modern Approach

Chapter 8: Entanglement Distillation

=

(Tr

[(τ− 1

4B ΦN

ABτ− 1

4B

)2]) 1

2(

Tr

[(ζ−

14 ρAEζ

− 14

E

)2]) 1

4

. (8.A.58)

This inequality holds for all states τB and ζE, which means that∫

UA

∥∥∥NA→B(UAρAEU†A)−ΦN

B ⊗ ρE

∥∥∥1

dUA

≤(

infτB

Tr

[(τ− 1

4B ΦN

ABτ− 1

4B

)2]) 1

2(

infζE

Tr

[(ζ− 1

4E ρAEζ

− 14

E

)2]) 1

2

(8.A.59)

= 2−12 H2(A|B)ΦN− 1

2 H2(A|E)ρ (8.A.60)

= 2−12 H2(A|E)ρ− 1

2 H2(A|B)ΦN , (8.A.61)

which completes the proof.

Lemma 8.32

For every non-zero positive semi-definite operator PAB, with PB =TrA[PAB], it holds that

1dA≤ Tr[P2

AB]

Tr[P2B]≤ dA. (8.A.62)

PROOF: Letting A′ denote a copy of A, and applying the Cauchy–Schwarzinequality (see (2.2.23)), we find that

Tr[P2B] = Tr[(PAB ⊗ 1A′)(PA′B ⊗ 1A)] (8.A.63)

≤√

Tr[(PAB ⊗ 1A′)2]Tr[(PA′B ⊗ 1A)2] (8.A.64)

= Tr[P2AB ⊗ 1A′ ] (8.A.65)

= dATr[P2AB]. (8.A.66)

The other inequality follows from the operator inequality PAB ≤ dA1A ⊗PB, after sandwiching it by P

12AB and taking a full trace. This operator inequal-

ity follows in turn because 1d2

A∑d2−1

i=0 UiAPAB(Ui

A)† = πA ⊗ PB (Lemma 3.26),

for UiAi the set of Heisenberg–Weyl operators, and by noticing that all terms

in the sum are positive semi-definite and one term in the sum is 1d2

APAB.

772

Page 784: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9

Quantum Communication

In the previous chapter, we considered entanglement distillation, which is thetask of taking many copies of a mixed entangled state ρAB shared by Alice andBob and transforming them to a maximally entangled state ΦAB of Schmidtrank d ≥ 2. Using the quantum teleportation protocol, the maximally entan-gled state resulting from entanglement distillation can be used for quantumcommunication, in the sense that Alice can transfer an arbitrary state of log2 dqubits to Bob.

Now, if Alice and Bob are distantly separated, then how do they obtainmany copies of the shared entangled state ρAB in the first place? Typically, oneof the parties, say Alice, prepares two quantum systems in an entangled stateand sends one of them through a quantum channel NA→B to Bob, therebyestablishing the shared entangled state. Rather than use the shared entangledstate as the resource for communication, it is more natural to use the quantumchannel itself as the resource, as it could in principle lead to better strategiesand higher rates. This is the scenario that we consider in this chapter.

Recall that in the case of classical communication from Chapter 7, we con-sidered messages from a set M, and the goal was to find upper and lowerbounds on the maximum number log2 |M| of transmitted bits over a quan-tum channel for a given error ε. Now, in the case of quantum communication,the goal is to transmit a given number of qubits, rather than bits, for a givenerror ε. Formally, suppose that the sender, Alice, holds a quantum systemA′ with dimension d ≥ 2 that she would like to transmit over the channel Nto Bob, the receiver. In general, the state of this system could be entangledwith the state of some other system R (of arbitrary dimension) to which Alice

773

Page 785: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

E N DA B

ΨRA′ ωRB′

A′ B′

Alice Bob

R

FIGURE 9.1: Depiction of a quantum communication protocol for one useof the quantum channel N. Alice shares the maximally entangled stateΨRA′ with an inaccessible reference system R. She uses the channel EA′→Ato encode her system A′ into a system A, which is sent through the chan-nel NA→B. Bob then applies the decoding channel DB→B′ , such that thefinal state is ωRB′ shared between Bob and the reference system.

does not have access, and so we suppose that the joint state is a pure stateΨRA′ with Schmidt rank d. Note that, by the Schmidt decomposition theo-rem (Theorem 3.2), the dimension of R need not exceed the dimension of A′,which is d. The goal is to determine the largest value of log2 d (which can bethought of as the number of qubits in the system A′) for which the A′ part ofan arbitrary entangled state ΨRA′ can be transmitted with error at most ε. Thisgeneral quantum communication scenario is known as strong subspace trans-mission. As usual, Alice and Bob are allowed local encoding and decodingchannels, respectively, to help with this task; see Figure 9.1 for a depiction ofa one-shot protocol for quantum communication. In the asymptotic setting,they are also allowed as many uses of the channel N as desired. The quan-tum capacity of N, denoted by Q(N), is then the largest value of 1

n log2 d suchthat the A′ part of an arbitrary pure state ΨRA′ can be transmitted to Bob witherror that vanishes as the number n of channel uses increases.

Note that the notion of quantum communication presented above (strongsubspace transmission) is completely general and includes as special casesthe following information-processing tasks:

1. Entanglement transmission: Here, Alice’s system A′ is in the maximally en-tangled state ΦRA′ with the reference system R, and the goal is to transmit

774

Page 786: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

the system A′ to Bob. This is a special case of strong subspace transmis-sion in which ΨRA′ = ΦRA′ .

2. Entanglement generation: Alice prepares a pure entangled state ΨA′A, withdA′ = d ≥ 2 and A the input system to the channel N. The goal is totransmit the system A to Bob such that the resulting state shared by Aliceand Bob is the maximally entangled state of Schmidt rank d. We showin Appendix 9.A how entanglement generation is related to the notion ofquantum communication that we consider here.

Note that entanglement generation is similar to entanglement distillation,and we elaborate more upon this similarity in Section 9.1.3.

3. Subspace transmission: In this scenario, Alice wishes to send a system A′,in an arbitrary pure state ϕA′ , to Bob. This is a special case of the protocolabove in which the system R is not entangled with Alice’s system, so thatΨRA′ = φR ⊗ ϕA′ for some pure state φR on R.

Note that subspace transmission can be accomplished by first perform-ing an entanglement transmission or entanglement generation protocoland then performing the quantum teleportation protocol (however, thisapproach requires the use of a forward classical channel).

We discuss these alternative notions of quantum communication in more de-tail, as well as prove some relationships between them, in Appendix 9.A.

We start our development of quantum communication with the one-shotsetting. Recall that mutual-information channel measures appear in the one-shot upper bounds for entanglement-assisted classical communication andthat Holevo-information channel measures appear in the one-shot upper boundsfor classical communication. In the case of quantum communication, we findthat coherent-information channel measures appear in the one-shot upperbounds. The one-shot lower bound that we obtain is based on the one-shot,one-way entanglement distillation protocol in Section 8.1.2 of the previouschapter, along with an argument for the removal of the classical communica-tion used in that protocol. We then move on to the asymptotic setting, andwe find that the quantum capacity of a quantum channel N is equal to itsregularized coherent information, i.e.,

Q(N) = Icreg(N) := lim

n→∞

1n

Ic(N⊗n), (9.0.1)

where we recall the definition of the coherent information Ic(N) of a channelfrom (4.11.107). Thus, as with the classical capacity, the quantum capacity is

775

Page 787: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

difficult to compute in general. We then find tractable upper bounds on thequantum capacity, and for this purpose, the channel entanglement measuresdefined in Section 5.5 play an important role.

9.1 One-Shot Setting

A (strong subspace) quantum communication protocol for a quantum chan-nel N in the one-shot setting is illustrated in Figure 9.1. It is defined by thethree elements (d,EA′→A,DB→B′), in which d is the dimension of the sys-tem A′, EA′→A is an encoding channel with dA′ = d, and DB→B′ is a decodingchannel with dB′ = dA′ = d. We call the pair (E,D) of encoding and decodingchannels a quantum communication code1 for N.

REMARK: In a strong subspace transmission protocol, the goal is to transmit one shareof a pure state ΨRA′ , with corresponding state vector |Ψ〉RA′ . Note that the state vector|Ψ〉RA′ has a Schmidt decomposition of the form

|Ψ〉RA′ =d

∑x=1

√p(x)|ξx〉R ⊗ |ζx〉A′ , (9.1.1)

where p(x)dx=1 are the Schmidt coefficients and |ξx〉Rd

x=1, |ζx〉A′dx=1 are orthonor-

mal sets of vectors for R and A′, respectively. When written in this form, the state vec-tor |Ψ〉RA′ can be understood as a coherent version of the initial state Φp

MM′ for classicaland entanglement-assisted classical communication (see, e.g., (6.1.2)). The key differencein the classical-communication case is that there is a fixed orthonormal basis |m〉m∈Mcorresponding to the messages m in the message set M. In quantum communication, thegoal is to transmit a state of a quantum system, which means that there is no particular ba-sis used for communication. The encoding and decoding channels should thus be definedso that they can reliably transmit states of the system in an arbitrary basis.

The protocol proceeds as follows: we start with the entangled state ΨRA′ ,where the system A′ belongs to Alice and the system R is an arbitrary refer-ence system inaccessible to Alice. Alice then sends the system A′ through theencoding channel EA′→A and sends A through the channel NA→B. Once Bobreceives the system B, he applies the decoding channel DB→B′ to it. The finalstate of the protocol is therefore

ωRB′ := (DB→B′ NA→B EA′→A)(ΨRA′). (9.1.2)1The quantum communication codes that we consider in this chapter are essentially

equivalent to codes for performing approximate quantum error correction. Please consult theBibliographic Notes in Section 9.5 for more information.

776

Page 788: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Let us now quantify the reliability of the protocol described above, i.e.,how close the final state ωRB′ is to the initial state ΨRA′ . In Section 3.5, wediscussed two measures of closeness for states:

• Normalized trace distance, using which the distance between the initialand final states is 1

2 ‖ΨRA′ −ωRB′‖1. The lower the normalized trace dis-tance, the more reliable the protocol is.

• Fidelity, in which case we have F(ΨRA′ , ωRB′) =∥∥√ΨRA′

√ωRB′

∥∥21 =

〈Ψ|RA′ωRB′ |Ψ〉RA′ as the closeness between the initial and final states ofthe protocol. The higher the fidelity, the more reliable the protocol is.

These two measures of closeness are arguably equivalent to each other, inthe sense that one can be used to bound the other via the inequality (3.5.109)shown in Theorem 3.64, which we restate here: for all states ρ and σ,

1−√

F(ρ, σ) ≤ 12‖ρ− σ‖1 ≤

√1− F(ρ, σ). (9.1.3)

Now, our figure of merit for the quantum communication protocol shouldnot be based on just one particular initial state, in this case ΨRA′ . Recall thatthe task of quantum communication is to reliably transmit one share of an ar-bitrary pure state through the channel NA→B. Intuitively, therefore, the closerthe overall channel DB→B′ NA→B EA′→A is to the identity channel idA′→B′ ,the better the code (E,D) is at the quantum communication task, and so ourfigure of merit should quantify this distance. One method to determine thisdistance is to calculate how well a given code can transmit one share of a statein the worst case, i.e., by either the highest value of the trace distance or by thelowest value of the fidelity. If a code can be designed such that, in the worstcase, the fidelity (trace distance) is high (low), then by definition any otherstate will do just as well or better. We are thus led to define the following twofigures of merit:

1. Worst-case trace distance: We define this as

supΨRA′

12‖ΨRA′ − (DB→B′ NA→B EA′→A)(ΨRA′)‖1 . (9.1.4)

Recalling Definition 3.68, we see that worse-case trace distance is equal tothe diamond distance between the identity channel idA′→B′ and the channelDB→B′ NA→B EA′→A:

12‖idA′→B′ −DB→B′ NA→B EA′→A‖ . (9.1.5)

777

Page 789: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

2. Worst-case fidelity: We define this as

infΨRA′〈Ψ|RA′(DB→B′ NA→B EA′→A)(ΨRA′)|Ψ〉RA′ , (9.1.6)

which is the same as the channel fidelity F(DN E) from Definition 3.72.

These two figures of merit are arguably equivalent, as mentioned before, dueto the inequality in (9.1.3) relating the trace distance and the fidelity. For therest of this chapter, we exclusively use the worst-case fidelity of the code asthe figure of merit, and we define the error probability of the quantum communi-cation code (E,D) for N as

p∗err(E,D;N) := 1− F(D N E) (9.1.7)= sup

ΨRA′1− 〈Ψ|RA′(DB→B′ NA→B EA′→A)(ΨRA′)|Ψ〉RA′ .

(9.1.8)

One can view this quantity as the quantum analogue of the maximum errorprobability for the classical communication tasks of Chapters 6 and 7, whichis why we use the same notation for it as in those chapters.

Definition 9.1 (d, ε) Quantum Communication Protocol

Let (d,E,D) be the elements of a quantum communication protocol forthe quantum channel N. The protocol is called a (d, ε) protocol, withε ∈ [0, 1], if p∗err(E,D;N) ≤ ε.

As alluded to at the beginning of this chapter, a special case of interestin quantum communication is entanglement transmission, which is when thestate ΨRA′ is fixed to be the maximally entangled state ΦRA′ = |Φ〉〈Φ|RA′ ,where we recall that

|Φ〉RA′ =1√d

d−1

∑i=0|i, i〉RA′ . (9.1.9)

Now, since this state is a particular state in the optimization in (9.1.8) for theerror probability p∗err(E,D;N), we conclude that

p∗err(E,D;N)

≥ 1− 〈Φ|RA′(DB→B′ NA→B EA′→A)(ΦRA′)|Φ〉RA′ (9.1.10)= 1− Fe(DB→B′ NA→B EA′→A) (9.1.11)=: perr(E,D;N), (9.1.12)

778

Page 790: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

where in the second line we have identified the entanglement fidelity of thechannel DB→B′ NA→B EA′→A, as stated in Definition 3.71. In the last line,we have defined the quantity perr(E,D;N). As the notation suggests, thisquantity is a quantum analogue of the average error probability for clas-sical and entanglement-assisted classical communication. In classical andentanglement-assisted classical communication, the average error probabilitycorresponds to taking a uniform distribution over the messages being sent.Similarly, in quantum communication, the average error probability can bethought of as taking a uniform distribution for the Schmidt coefficients in(9.1.1), which by definition gives a maximally entangled state.

Another way of writing the average error probability for a quantum com-munication code is via what is known as the entanglement test, which we intro-duced in the previous chapter. It is analogous to the comparator test that wedefined in Chapters 6 and 7 in the context of classical communication. The en-tanglement test is defined by the POVM ΦRB′ , 1RB′ −ΦRB′. The outcomesof the entanglement test tell us whether the state being measured is the max-imally entangled state ΦRB′ . Since the state ΦRB′ is pure, using (3.5.23), theprobability that the state ωRB′ at the end of the protocol is in the maximallyentangled state, i.e., the probability that the state “passes the entanglementtest,” is

Tr[ΦRB′ωRB′ ] = 〈Φ|RB′ωRB′ |Φ〉RB′ (9.1.13)= F(ΦRB′ , ωRB′) (9.1.14)= 1− perr(E,D;N). (9.1.15)

As stated at the beginning of this chapter, the goal of quantum commu-nication is to determine the maximum number log2 d of qubits that can betransmitted over a quantum channel N, in the sense that the A′ part of anarbitrary pure state ΨRA′ , with dA′ = d, can be transmitted over the channelwith error at most ε ∈ (0, 1]. We call this maximum number of transmittedqubits the one-shot quantum capacity of N.

Definition 9.2 One-Shot Quantum Capacity of a Quantum Channel

Given a quantum channel N and ε ∈ (0, 1], the one-shot ε-error quantumcapacity of N, denoted by Qε(N), is defined to be the maximum numberlog2 d of transmitted qubits among all (d, ε) quantum communication

779

Page 791: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

E PσB DA B

ΨRA′ ωRB′

A′ B′

Alice Bob

R

FIGURE 9.2: Depiction of a protocol that is useless for entanglement trans-mission. The encoded half of Alice’s share of the pure state ΨRA′ is dis-carded and replaced by an arbitrary (but fixed) state σB.

protocols over N. In other words,

Qε(N) := sup(d,E,D)

log2 d : p∗err(E,D;N) ≤ ε, (9.1.16)

where the optimization is with respect to d ∈ N, d ≥ 2, encoding chan-nels E with input system dimension d, and decoding channels D withoutput system dimension d.

In addition to finding, for a given ε ∈ (0, 1], the maximum number oftransmitted qubits among all (d, ε) quantum communication protocols overNA→B, we can consider the following complementary problem: for a givendimension d ≥ 2, find the smallest possible error among all (d, ε) quantumcommunication protocols for NA→B, which we denote by ε∗Q(d;N). In otherwords, the complementary problem is to determine

ε∗Q(d;N) := infE,Dp∗err(E,D;N) : dA′ = dB′ = d, (9.1.17)

where the optimization is with respect to all encoding channels EA′→A anddecoding channels DB→B′ such that dA′ = dB′ = d. In this book, we focus pri-marily on the problem of optimizing the number of transmitted qubits ratherthan the error, and so our primary quantity of interest is the one-shot quan-tum capacity Qε(N).

780

Page 792: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

9.1.1 Protocol for a Useless Channel

Consider an arbitrary (d, ε) quantum communication protocol for a chan-nel NA→B, with ε ∈ (0, 1] and with encoding and decoding channels E and D,respectively. This means that p∗err(E,D;N) ≤ ε. By the arguments in (9.1.10)–(9.1.12), this protocol realizes a (d, ε) entanglement transmission protocol, inthe sense that

Tr[ΦRB′ρRB′ ] ≥ 1− ε, (9.1.18)

where ΦRA′ is the maximally entangled state defined in (9.1.9) and

ρRB′ := (DB→B′ NA→B EA′→A)(ΦRA′). (9.1.19)

Consider now the same protocol but over the useless channel depicted in Fig-ure 9.2. This useless channel is exactly the same as the one considered inChapters 6 and 7; namely, it is the replacement channel for some state σB. Forthe initial state ΦRA′ , the state at the end of the protocol for the replacementchannel is

τRB′ = (DB→B′ RσBA→B EA′→A)(ΦRA′) = πR ⊗DB→B′(σB). (9.1.20)

As in classical communication and entanglement-assisted classical commu-nication, we now use the hypothesis testing relative entropy to compare thestate ρRB′ obtained at the end of the quantum communication protocol overthe channel N with the state τRB′ obtained at the end of the quantum com-munication protocol for the replacement channel RσB . In particular, we makeuse of Lemma 8.3, because the state τRB′ satisfies TrB′ [τRB] = πR and due to(9.1.18). Therefore, using (8.1.9) in Lemma 8.3, we conclude that

log2 d ≤ 12

IεH(R; B′)ρ. (9.1.21)

Another bound from Lemma 8.3, namely, the one in (8.1.8), is a more generalupper bound that requires only the assumption in (9.1.18) and does not havethe interpretation of being a comparison between a quantum communicationprotocol for N and a quantum communication protocol for RσB

A→B. Applyingthis bound gives

log2 d ≤ IεH(R〉B′)ρ. (9.1.22)

This latter bound is the one that we employ in this chapter because it leads to aformula for the quantum capacity of some channels of interest in applications.This inequality tells us that, given an arbitrary (d, ε) quantum communication

781

Page 793: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

protocol with corresponding code (E,D), the ε-hypothesis testing coherentinformation Iε

H(R〉B′)ρ, with ρRB′ given by (9.1.19), is an upper bound on themaximum number of qubits that can be transmitted over the channel witherror at most ε. Note that a different choice for the encoding and decodinggenerally produces a different value for the upper bound. We would like anupper bound that applies regardless of the specific protocol. In other words,we would like an upper bound that is a function of the channel NA→B only.

9.1.2 Upper Bound on the Number of Transmitted Qubits

We now establish a general upper bound on the number of transmitted qubitsin an arbitrary quantum communication protocol. This bound holds inde-pendently of the encoding and decoding channels used in the protocol anddepends only on the given communication channel NA→B and the error ε.

Theorem 9.3 Upper Bound on One-Shot Quantum Capacity

Let NA→B be a quantum channel. For a (d, ε) quantum communicationprotocol for NA→B, with ε ∈ (0, 1], the number of qubits transmittedover N is bounded from above by the ε-hypothesis testing coherent in-formation of N defined in (4.11.96), i.e.,

log2 d ≤ Ic,εH (N). (9.1.23)

Consequently, for the one-shot quantum capacity of N,

Qε(N) ≤ Ic,εH (N). (9.1.24)

PROOF: Let E and D be the encoding and decoding channels, respectively, fora (d, ε) quantum communication protocol for N. Then, by (9.1.22), we havethat

log2 d ≤ IεH(R〉B′)ρ = inf

σB′Dε

H(ρRB′‖1R ⊗ σB′), (9.1.25)

where ρRB′ = (DB→B′ NA→B EA′→A)(ΦRA′) is the state defined in (9.1.19).By restricting the optimization in the definition of Iε

H(R〉B′)ρ over every state σB′to the set DB→B′(τB) : τB ∈ D(HB), we obtain

IεH(R〉B′)ρ

= infσB′

DεH(ρRB′‖1R ⊗ σB′) (9.1.26)

782

Page 794: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

≤ infτB

DεH((DB→B′ NA→B EA′→A)(ΦRA′)‖1R ⊗DB→B′(τB)) (9.1.27)

≤ infτB

DεH(NA→B(ρRA)‖1R ⊗ τB) (9.1.28)

where the second inequality follows from the data-processing inequality forhypothesis testing relative entropy and we let ρRA := EA′→A(ΦRA′). We nowtake the supremum over every state ρRA, which effectively corresponds totaking the supremum over all encoding channels, and since it suffices to con-sider only pure states when optimizing the coherent information (see the ar-guments after Definition 4.82), we conclude that

IεH(R〉B′)ρ ≤ inf

τBDε

H(NA→B(ρRA)‖1R ⊗ τB) (9.1.29)

≤ supψRA

infτB

DεH(NA→B(ψRA)‖1R ⊗ τB) (9.1.30)

= Ic,εH (N), (9.1.31)

as required.

As an immediate consequence of Theorem 9.3 and Propositions 4.67 and4.68, we obtain the following:

Corollary 9.4

Let NA→B be a quantum channel, and let ε ∈ [0, 1). For all (d, ε) quantumcommunication protocols for N, the following bounds hold:

(1− 2ε) log2 d ≤ Ic(N) + h2(ε), (9.1.32)

log2 d ≤ Icα(N) +

α

α− 1log2

(1

1− ε

)∀ α > 1, (9.1.33)

where Ic(N) is the coherent information of N, as defined in (4.11.107),and Ic

α(N) is the sandwiched Rényi coherent information of N, as de-fined in (4.11.100).

The proof of (9.1.32) is analogous to the proof of (8.1.44). The proof of(9.1.33) follows by combining Theorem 9.3 with Proposition 4.68.

Since the bounds in (9.1.32) and (9.1.33) hold for an arbitrary (d, ε) quan-tum communication protocol for N, we have that

(1− 2ε)Qε(N) ≤ Ic(N) + h2(ε), (9.1.34)

783

Page 795: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Qε(N) ≤ Icα(N) +

α

α− 1log2

(1

1− ε

)∀ α > 1, (9.1.35)

for all ε ∈ [0, 1).

Let us summarize the steps that we took to arrive at the bounds in (9.1.32)and (9.1.33):

1. We first compared a quantum communication protocol for N with thesame protocol for a useless channel by using the hypothesis testing rel-ative entropy. This led us to Lemma 8.3, and the resulting upper boundin (9.1.22).

2. We then used the data-processing inequality for the hypothesis testingrelative entropy to remove the decoding channel from the bound in (9.1.22).This is done in (9.1.28) in the proof of Theorem 9.3.

3. Finally, we optimized over all encoding channels in (9.1.29)–(9.1.31) toobtain Theorem 9.3, in which the bound is a function solely of the chan-nel and the error probability. Using Propositions 4.67 and 4.68, which re-late hypothesis testing relative entropy to quantum relative entropy andsandwiched Rényi relative entropy, we arrived at Corollary 9.4.

9.1.3 Lower Bound on the Number of Transmitted Qubits viaEntanglement Distillation

Having derived upper bounds on the number of transmitted qubits for anarbitrary quantum communication protocol, let us now determine a lowerbound on the number of transmitted qubits. As with the other communica-tion scenarios that we have considered so far, in order to obtain a lower boundon the number qubits that can be transmitted, we need to devise an explicit(d, ε) quantum communication protocol for all ε ∈ (0, 1). The protocol weconsider is based on the one-shot, one-way entanglement distillation proto-col from Proposition 8.10 in Chapter 8, which establishes that, for an arbitrarybipartite state ρAB and for all ε ∈ (0, 1] and η ∈ [0,

√ε), there exists a (d, ε)

one-way entanglement distillation protocol for ρAB such that

log2 d = −H√

ε−ηmax (A|B)ρ + 4 log2 η. (9.1.36)

The goal in this section is to show that entanglement distillation can be usedto develop a quantum communication strategy. Specifically, we show that

784

Page 796: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

the existence of a (d, ε) one-way entanglement distillation protocol for thebipartite state ωAB = NA′→B(ψAA′) implies the existence of a (d′, ε′) quantumcommunication protocol, with d′ and ε′ being functions of d and ε. The claimis as follows:

Theorem 9.5 Lower Bound on One-Shot Quantum Capacity

Let NA→B be a quantum channel. For all ε ∈ (0, 1), η ∈ [0, ε√

δ/4), and δ ∈(0, 1), there exists a (d, ε) quantum communication protocol for NA→Bsuch that

log2 d = supψAA′

(−H

ε√

δ4 −η

max (A|B)ω

)+ log2(η

4(1− δ)), (9.1.37)

where ωAB = NA′→B(ψAA′). Consequently,

Qε(N) ≥ supψAA′

(−H

ε√

δ4 −η

max (A|B)ω

)+ log2(η

4(1− δ)) (9.1.38)

for all η ∈ [0, ε√

δ/4) and δ ∈ (0, 1), where ωAB = NA′→B(ψAA′).

The first step in the proof of Theorem 9.5 is to observe that one-way en-tanglement distillation is an example of entanglement generation, albeit withforward (i.e., sender to receiver) classical communication, which we intro-duced at the beginning of this chapter and formally define below. We thenshow that forward classical communication does not help for entanglementgeneration, even in the non-asymptotic setting. One-way entanglement dis-tillation thus implies entanglement generation. We then show that entangle-ment generation implies entanglement transmission, which we defined at thebeginning of this chapter. Finally, we show that entanglement transmissionimplies quantum communication.

Before proceeding with the proof of Theorem 9.5, let us formally defineentanglement generation (with and without one-way LOCC assistance) andentanglement transmission.

• Entanglement generation: An entanglement generation protocol for NA→Bis defined by the three elements (d, ΨA′A,DB→B′), where ΨA′A is a purestate with dA′ = d, and DB→B′ is a decoding channel with dB′ = d. The

785

Page 797: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

goal of the protocol is to transmit the system A such that the final state

σA′B′ := (DB→B′ NA→B)(ΨA′A) (9.1.39)

is close in fidelity to a maximally entangled state of Schmidt rank d. Theentanglement generation error of the protocol is given by

p(EG)err (ΨA′A,D;N) := 1− 〈Φ|A′B′σA′B′ |Φ〉A′B′ (9.1.40)

= 1− F(ΦA′B′ , σA′B′). (9.1.41)

We call the protocol (d, ΨA′A,DB→B′) a (d, ε) protocol, with ε ∈ [0, 1], ifp(EG)

err (ΨA′A,D;N) ≤ ε.

Note that an entanglement generation protocol (d, ΨA′A,DB→B′) over NA→Bis an example of an entanglement distillation protocol (d,LAB→AB) forthe state ρA′B = NA→B(ΨA′A), with A ≡ A′, B ≡ B′, and LAB→AB ≡DB→B′ .

• Entanglement generation assisted by one-way LOCC: An entanglement gen-eration protocol for NA→B assisted by one-way LOCC from A to B is de-fined by (d, ΨA′A, Ex

A′A→A′Ax, DxB→B′x), where d ≥ 2, ΨA′A is a pure

state with dA′ = d, ExA′A→A′Ax∈X is a set of completely positive maps

indexed by a finite alphabet X such that ∑x∈X ExA′A→A′A is trace preserv-

ing, and DxB→B′x∈X is a set of quantum channels indexed by X, with

dB′ = d. The goal of the protocol is to transmit the system A such that thefinal state

σ→A′B′ := ∑x∈X

(DxB→B′ NA→B Ex

A′A→A′A)(ΨA′A) (9.1.42)

is close in fidelity to a maximally entangled state of Schmidt rank d. Theerror of the protocol is given by

p(EG),→err (ΨA′A, Exx, Dxx;N) = 1− F(ΦA′B′ , σ→A′B′). (9.1.43)

We call the protocol (d, ΨA′A, ExA′A→A′Ax, Dx

B→B′x) a (d, ε) protocol,with ε ∈ [0, 1], if p(EG),→

err (ΨA′A, Exx, Dxx;N) ≤ ε.

• Entanglement transmission: An entanglement transmission protocol forNA→B consists of the three elements (d,E,D), where d ≥ 2, EA′→A isan encoding channel with dA′ = d, and DB→B′ is a decoding channelwith dB′ = d. The goal of the protocol is to transmit the A′ system of

786

Page 798: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

a maximally entangled state ΦRA′ of Schmidt rank d such that the finalstate

ωRB′ := (DB→B′ NA→B EA′→A)(ΦRA′) (9.1.44)

is close to the initial maximally entangled state. The entanglement trans-mission error of the protocol is

p(ET)err (E,D;N) := 1− 〈Φ|RB′ωRB′ |Φ〉RB′ (9.1.45)

= 1− Fe(D N E), (9.1.46)

where we recall the entanglement fidelity of a channel from Definition 3.71.We call the protocol (d,E,D) a (d, ε) protocol, with ε ∈ [0, 1], if p(ET)

err (E,D;N)≤ ε.

Observe that the error criterion for entanglement transmission is the sameas the average error criterion for quantum communication (see (9.1.10)–(9.1.12)). This means that the existence of an arbitrary (d, ε) quantumcommunication protocol implies the existence of a (d, ε) entanglementtransmission protocol. Also observe that the existence of an arbitrary(d, ε) entanglement transmission protocol implies the existence of a (d, ε)entanglement generation protocol with respect to the state ΨRA ≡ EA′→A(ΦRA′)(with the systems R, A, and A′ belonging to Alice).

9.1.3.1 Proof of Theorem 9.5

We start by showing that an arbitrary entanglement distillation protocol forthe state NA→B(ψA′A), with ψA′A a pure state, has the same performance pa-rameters as an entanglement generation protocol with one-way LOCC assis-tance.

Consider an arbitrary (d, ε) entanglement distillation protocol for NA→B(ψA′A)given by a one-way LOCC channel LA′B→AB, with dA = dB = d. In general,this LOCC channel has the form LA′B→AB = ∑x∈X Ex

A′→A⊗Dx

B→B, where X

is some finite alphabet, ExA′→A

x∈X is a set of completely positive maps suchthat ∑x∈X Ex

A′→Ais trace preserving, and Dx

B→Bx∈X is a set of channels. The

output state of the entanglement distillation protocol is

∑x∈X

(ExA′→A ⊗Dx

B→B)(NA→B(ψA′A))

= ∑x∈X

(DxB→B NA→B Ex

A′→A)(ψA′A), (9.1.47)

787

Page 799: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

which has the form of a state at the output of an entanglement generation pro-tocol with one-way LOCC assistance. We thus have that a (d, ε) entanglementdistillation protocol for NA→B(ψA′A) is equivalent to a (d, ε) entanglementgeneration protocol for N with one-way LOCC assistance. We now show thatone-way LOCC assistance does not help for entanglement generation.

Lemma 9.6

Given a (d, ε) entanglement generation protocol for a channel N, assistedby one-way LOCC, with d ≥ 2 and ε ∈ [0, 1], there exists a (d, ε) entan-glement generation protocol for N (without one-way LOCC assistance).

PROOF: Consider an arbitrary (d, ε) entanglement generation protocol as-sisted by one-way LOCC. The output state of such a protocol has the formin (9.1.42), i.e.,

σ→A′B′ = ∑x∈X

(DxB→B′ NA→B Ex

A′A→A′A)(ΨA′A), (9.1.48)

and by definition we have

F(ΦA′B′ , σ→A′B′) = Tr[ΦA′B′σ→A′B′ ] ≥ 1− ε. (9.1.49)

Now, let

p(x) := Tr[ExA′A→A′A(ΨA′A)], (9.1.50)

ρxA′A :=

1p(x)

ExA′A→A′A(ΨA′A). (9.1.51)

Using this, we can write σ→A′B′ as

σ→A′B′ = ∑x∈X

p(x)(DxB→B′ NA→B)(ρ

xA′A). (9.1.52)

For every x ∈ X, let ρxA′A have the following spectral decomposition:

ρxA′A =

rx

∑k=1

q(k|x)φx,kA′A, (9.1.53)

where rx = rank(ρxA′A). We thus have that

σ→A′B′ = ∑x∈X

rx

∑k=1

p(x)q(k|x)(DxB→B′ NA→B)(φ

x,kA′A). (9.1.54)

788

Page 800: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Then, letting σx,kA′B′ := (Dx

B→B′ NA→B)(φx,kA′A), we conclude that

Tr[ΦA′B′σ→A′B′ ] = ∑

x∈X

rx

∑k=1

p(x)q(k|x)Tr[ΦA′B′σx,kA′B′ ] (9.1.55)

≤ maxx∈X,

1≤k≤rx

Tr[ΦA′B′σx,kA′B′ ] (9.1.56)

= maxx∈X,

1≤k≤rx

Tr[ΦA′B′(DxB→B′ NA→B)(φ

x,kA′A)]. (9.1.57)

In other words, there exists a pair (φx,kA′A,Dx

B→B′) (namely, the one that achievesthe maximum on the right-hand side of the inequality above) such that

Tr[ΦA′B′σx,kA′B′ ] ≥ Tr[ΦA′B′σA′B′ ] ≥ 1− ε. (9.1.58)

Therefore, the triple (d, φk,xA′A,Dx

B→B′) constitutes a (d, ε) entanglement gener-ation protocol (without one-way LOCC assistance).

As we have seen in Chapter 8, forward classical communication certainlyhelps in general for entanglement distillation. However, it does not help forentanglement generation because the resource for entanglement generationis a quantum channel, whereas for entanglement distillation the resource is abipartite quantum state. Having a quantum channel as the resource is morepowerful than having a bipartite quantum state because, when using a quan-tum channel, there is an extra degree of freedom in the input state to thechannel. The proof of Lemma 9.6 demonstrates that the forward classicalcommunication in an arbitrary (d, ε) is not needed, and the proof essentiallyrelies on convexity of the entanglement fidelity performance criterion.

We now show that entanglement generation implies entanglement trans-mission, up to a transformation of the performance parameters.

Lemma 9.7 Entanglement Generation to Entanglement Transmis-sion

Given a (d, ε) entanglement generation protocol for a channel NA→B,with d ≥ 2 and ε ∈ [0, 1], there exists a (d, 4ε) entanglement transmissionprotocol for N.

PROOF: Let (d, ΨA′A,DB→B′) be the elements of a (d, ε) entanglement gener-ation protocol for NA→B, with dA′ = dB′ = d. This implies that the output

789

Page 801: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

stateσA′B′ = (DB→B′ NA→B)(ΨA′A) (9.1.59)

satisfiesF(ΦA′B′ , σA′B′) ≥ 1− ε. (9.1.60)

We now construct an entanglement transmission protocol. To this end, letA′ ≡ R be a reference system inaccessible to both Alice and Bob. By the data-processing inequality for fidelity (Theorem 3.59) with respect to the partialtrace channel TrB′ , we have that

F(ΦR, ΨR) = F(TrB′ [ΦRB′ ], TrB′ [σRB′ ]) ≥ F(ΦRB′ , σRB′) ≥ 1− ε. (9.1.61)

Next, by Uhlmann’s theorem (Theorem 3.58), there exists an isometric chan-nel UA′→A such that

F(ΦR, ΨR) = F(UA′→A(ΦRA′), ΨRA′) ≥ 1− ε. (9.1.62)

We let this isometric channel UA′→A be the encoding channel for the entan-glement transmission protocol, and we let

ωRB′ = (DB→B′ NA→B UA′→A)(ΦRA′). (9.1.63)

Next, using the sine distance (Definition 3.66), by definition of the (d, ε) en-tanglement generation protocol, we have that P(ΦRB′ , σRB′) ≤

√ε. Similarly,

from (9.1.62) we have that

P(ΨRA,UA′→A(ΦRA′)) ≤√

ε. (9.1.64)

Therefore, by the triangle inequality for the sine distance (Lemma 3.67), weconclude that

P(ΦRB′ , ωRB′) ≤ P(ΦRB′ , σRB′) + P(σRB′ , ωRB′) (9.1.65)

≤√

ε +√

ε (9.1.66)

≤ 2√

ε, (9.1.67)

where the second inequality follows from the data-processing inequality forsine distance (see (3.5.135)) and (9.1.64) to see that

P(σRB′ , ωRB′) (9.1.68)= P((DB→B′ NA→B)(ΨRA), (DB→B′ NA→B UA′→A)(ΦRA′)) (9.1.69)≤ P(ΨRA,UA′→A(ΦRA′)) (9.1.70)

790

Page 802: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

≤√

ε. (9.1.71)

Therefore, by definition of the sine distance, we conclude that

1− F(ΦRB′ , ωRB′) ≤ 4ε, (9.1.72)

so that (d,UA′→A,DB→B′) constitutes a (d, 4ε) entanglement transmission pro-tocol, as required.

Finally, we show that entanglement transmission implies quantum com-munication, up to a transformation of the performance parameters. We couldalternatively call this statement “quantum expurgation,” because the argu-ments in the proof are analogous to the expurgation arguments applied in theproof of the lower bound for one-shot classical communication in Proposi-tion 7.5.

Lemma 9.8 Entanglement Transmission to Quantum Communica-tion

Given a (d, ε) entanglement transmission protocol for a channel NA→B,with d ≥ 2 and ε ∈ [0, 1], for all δ ∈ (0, 1), there exists a (b(1 −δ)dc, 2

√ε/δ) quantum communication protocol for N.

PROOF: Suppose that a (d, ε) entanglement transmission code for NA→B ex-ists, and let EA′→A and DB→B′ be the corresponding encoding and decodingchannels, respectively, with dA′ = dB′ = d. The condition perr(E,D;N) ≤ εthen holds, namely,

1− Tr[ΦRB′ωRB′ ] ≤ ε, (9.1.73)

whereωRB′ = (DB→B′ NA→B EA′→A)(ΦRA′). (9.1.74)

LetCA′→B′ := DB→B′ NA→B EA′→A. (9.1.75)

We proceed with the following algorithm:

1. Set k = d and Hd = HA′ . Suppose for now that (1− δ)d is a positiveinteger.

2. Set |φk〉 ∈ Hk to be a state vector that achieves the minimum fidelity ofCA′→B′ :

|φk〉 := arg min|φ〉∈Hk

〈φ|CA′→B′(|φ〉〈φ|)|φ〉, (9.1.76)

791

Page 803: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

and set the fidelity Fk of |φk〉 as follows:

Fk := min|φ〉∈Hk

〈φ|CA′→B′(|φ〉〈φ|)|φ〉 (9.1.77)

= 〈φk|CA′→B′(|φk〉〈φk|)|φk〉. (9.1.78)

3. SetHk−1 := span|ψ〉 ∈ Hk : |〈ψ|φk〉| = 0. (9.1.79)

That is, Hk−1 is set to the orthogonal complement of |φk〉 in Hk, so thatHk = Hk−1 ⊕ span|φk〉. Set k→ k− 1.

4. Repeat steps 2-3 until k = (1− δ)d after step 3.

The idea behind this algorithm is to successively remove minimum fi-delity states from HA′ until k = (1− δ)d. By the structure of the algorithmand some analysis given below, we are then guaranteed that for this k andlower that

1− min|φ〉∈Hk

〈φ|C(|φ〉〈φ|)|φ〉 ≤ ε/δ. (9.1.80)

That is, the subspace Hk is good for quantum communication of states at thechannel input with fidelity at least 1− ε/δ (to be precise, the subspace Hk isgood for subspace transmission as defined in the introduction of this chapter).Furthermore, the algorithm implies that

Fd ≤ Fd−1 ≤ · · · ≤ F(1−δ)d, (9.1.81)

Hd ⊇ Hd−1 ⊇ · · · ⊇ H(1−δ)d. (9.1.82)

Also, |φk〉`k=1 is an orthonormal basis for H`, where ` ∈ 1, . . . , d. Notethat the unit vectors |φk〉, k ∈ (1− δ)d − 1, . . . , 1 can be generated by re-peating the algorithm above exhaustively.

We now analyze the claims above by employing Markov’s inequality andsome other tools. From (9.1.73), we have that

F(ΦRB′ ,CA′→B′(ΦRA′)) ≥ 1− ε. (9.1.83)

Since |φk〉dk=1 is an orthonormal basis for Hd, we can write

|Φ〉RA′ =1√d

d

∑k=1|φk〉R ⊗ |φk〉A′ , (9.1.84)

792

Page 804: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

where complex conjugation is taken with respect to the basis |i〉d−1i=0 used

in (9.1.9). A consequence of the data-processing inequality for fidelity underthe dephasing channel ωR 7→ ∑d

k=1 |φk〉〈φk|RωR|φk〉〈φk|R and convexity of thesquare function is that

F(ΦRB′ , (idR ⊗ CA′→B′)(ΦRA′)) ≤1d

d

∑k=1〈φk|CA′→B′(|φk〉〈φk|)|φk〉 (9.1.85)

=1d

d

∑k=1

Fk. (9.1.86)

This means that

1d

d

∑k=1

Fk ≥ 1− ε⇐⇒ 1d

d

∑k=1

(1− Fk) ≤ ε. (9.1.87)

Now, taking K to be a uniform random variable with realizations k ∈ 1, . . . , dand applying Markov’s inequality (see (2.3.19)), we find that

Pr[1− FK ≥ ε/δ] ≤ E[1− FK]ε/δ

≤ εε/δ

= δ. (9.1.88)

So this implies that (1− δ)d of the Fk values are such that Fk ≥ 1− ε/δ. Sincethey are ordered as given in (9.1.81), we conclude that H(1−δ)d, which by def-inition has dimension (1− δ)d, is a good subspace for quantum communica-tion in the following sense (subspace transmission):

min|φ〉∈H(1−δ)d

〈φ|CA′→B′(|φ〉〈φ|)|φ〉 ≥ 1− ε/δ. (9.1.89)

Now, applying Proposition 3.75 to (9.1.89), we conclude that

min|ψ〉∈H′

(1−δ)d⊗H(1−δ)d

〈ψ|(idH′(1−δ)d

⊗ CA′→B′)(|ψ〉〈ψ|)|φ〉 ≥ 1− 2√

ε/δ, (9.1.90)

i.e., p∗err(E,D;N) ≤ 2√

ε/δ, which is the criterion for strong subspace trans-mission (the strongest notion of quantum communication).

To finish off the proof, suppose that (1− δ)d is not an integer. Then thereexists a δ′ > δ such that (1− δ′)d = b(1− δ)dc is a positive integer. By theabove reasoning, there exists a code satisfying (9.1.90), except with δ replacedby δ′, and with the code dimension equal to b(1− δ)dc. We also have that1− 2

√ε/δ′ > 1− 2

√ε/δ. This concludes the proof.

793

Page 805: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

We now return to the proof of Theorem 9.5. To finish it off, we combinethe results of Lemmas 9.6, 9.7, and 9.8 to conclude that the existence of a(d, ε) entanglement distillation protocol for ωAB = NA′→B(ψAA′) implies theexistence of a (d′, ε′) quantum communication protocol, where

d′ = (1− δ)d, ε′ = 4√

ε

δ, δ ∈ (0, 1). (9.1.91)

Recalling that d is given by (9.1.36), we conclude that

log2 d′ = −Hε′√

δ4 −η

max (A|B)ω + log2(η4(1− δ)). (9.1.92)

Then, since the pure state ψAA′ used in (9.1.92) is arbitrary, we conclude thatthere exists a (d′, ε′) quantum communication protocol satisfying

log2 d′ = supψAA′

(−H

ε′√

δ4 −η

max (A|B)ω

)+ log2(η

4(1− δ)) (9.1.93)

for all η ∈ [0, ε′√

δ/4) and δ ∈ (0, 1). This is precisely the statement in (9.1.37),and so the proof of Theorem 9.5 is complete.

Applying the relation between smooth conditional min- and max-entropyin (8.1.72) to the result of Theorem 9.5, and combining it with (4.8.73), weobtain the following.

Corollary 9.9

Let NA→B be a quantum channel, and let VNA→BE be an isometric channel

extending NA→B. For all ε ∈ (0, 1), δ ∈ (0, 1), η ∈ [0, ε√

δ/4), and α > 1,there exists a (d, ε) quantum communication protocol for N with

log2 d ≥ supψAA′

Hα(A|E)φ −1

α− 1log2

(1

f (ε, δ, η)

)

− log2

(1

1− f (ε, δ, η)

)+ log2(η

4(1− δ)), (9.1.94)

where f (ε, δ, η) :=(

ε√

δ4 − η

)2, φAE = TrB[V

NA′→BE(ψAA′)] =

NcA′→E(ψAA′), and ψAA′ is a pure state with the dimension of A′ equal

to the dimension of A.

794

Page 806: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Since the inequality in (9.1.94) holds for all (d, ε) quantum communicationprotocols, we have that

Qε(N) ≥ supψAA′

Hα(A|E)φ −1

α− 1log2

(1

f (ε, δ, η)

)

− log2

(1

1− f (ε, δ, η)

)+ log2(η

4(1− δ)), (9.1.95)

whereφAE = TrB[V

NA′→BE(ψAA′)] = Nc

A′→E(ψAA′), (9.1.96)

f (ε, δ, η) is defined just above, η ∈ [0, ε√

δ/4), δ ∈ (0, 1), and α > 1.

9.1.3.2 Remark on Forward Classical Communication

To summarize what we did in this section, we used the result from Propo-sition 8.10 on one-shot entanglement distillation to prove the existence of aquantum communication protocol in the one-shot setting. Note that the en-tanglement distillation protocol of Proposition 8.10 involves one-way classi-cal communication, while quantum communication (as we defined it at thebeginning of this chapter) does not. In other words, in this section we man-aged to remove the one-way classical communication from the entanglementdistillation protocol and thereby argue for the existence of a quantum commu-nication protocol. More generally, it holds that forward classical communica-tion (i.e., from the sender to the receiver) does not enhance the correspondingquantum capacity of the channel. In other words, if Q→(N) denotes the quan-tum capacity of the channel N when classical communication from the senderto the receiver is allowed as part of the protocol, then Q→(N) = Q(N). Thisis a direct consequence of the chain of reasoning given in Lemmas 9.6, 9.7,and 9.8, and we return to this point in Chapter 14 when we consider LOCC-assisted quantum communication.

9.2 Quantum Capacity of a Quantum Channel

We now consider the asymptotic setting. In this scenario, depicted in Fig-ure 9.3, the quantum system A′ to be transmitted to Bob is encoded inton copies A1, . . . , An of a quantum system A, for n ≥ 1. Each of these systems

795

Page 807: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

DE

ΨRA′ ωRB′

R

A′

N

N

...

N

N

A1

A2

...An−1

An

B1

B2

...Bn−1

Bn

B′

Alice Bob

FIGURE 9.3: A general quantum communication protocol for n ≥ 1 mem-oryless/unassisted uses of a quantum channel N. Alice uses the chan-nel E to encode her share A′ of the pure state ΨRA′ into n quantum sys-tems A1, A2, . . . , An. She then sends each one of these through the chan-nel N. Bob finally applies a joint decoding channel D on the systemsB1, B2, . . . , Bn, resulting in the state ωRB′ given by (9.2.1).

is then sent independently through the channel N. We call this the asymptoticsetting because the number n can be arbitrarily large.

Analysis of the asymptotic setting is almost exactly the same as that of theone-shot setting. This is due to the fact that n independent uses of the channelN can be regarded as a single use of the channel N⊗n. So the only change thatneeds to be made is to replace N with N⊗n and to define the encoding anddecoding channels as acting on n systems instead of just one. In particular,the state at the end of the protocol becomes

ωRB′ = (DBn→B′ N⊗nA→B EA′→An)(ΨRA′). (9.2.1)

Then, just as in the one-shot setting, we define the error probability of thecode (E,D) for n independent uses of N as

p∗err(E,D;N⊗n) = 1− F(D N⊗n E). (9.2.2)

796

Page 808: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Definition 9.10 (n, d, ε) Quantum Communication Protocol

Let (d,EA′→An ,DBn→B′) be the elements of a quantum communicationprotocol for n independent uses of the channel NA→B, where dA′ =dB′ = d. The protocol is called an (n, d, ε) protocol, with ε ∈ [0, 1], ifp∗err(E,D;N⊗n) ≤ ε.

The rate of an (n, d, ε) quantum communication protocol is defined as thenumber of qubits transmitted per channel use, i.e.,

R(n, d) :=log2 d

n. (9.2.3)

Observe that the rate depends only on the Schmidt rank d of the pure state ΨRA′to be transmitted and on the number of channel uses. In particular, it does notdirectly depend on the communication channel nor on the encoding and de-coding channels. For a given ε ∈ [0, 1] and n ≥ 1, the highest rate among all(n, d, ε) protocols is denoted by Qn,ε(N), and it is defined as

Qn,ε(N) :=1n

Qε(N⊗n) = sup

(d,E,D)

log2 d

n: p∗err(E,D;N⊗n) ≤ ε

, (9.2.4)

where in the second equality we use the definition of the one-shot quantumcapacity Qε given in (9.1.16), and the supremum is over all d ≥ 2, encodingchannels E with input system dimension d, and decoding channels D withoutput system dimension d.

Definition 9.11 Achievable Rate for Quantum Communication

Given a quantum channel N, a rate R ∈ R+ is called an achievable rate forquantum communication over N if for all ε ∈ (0, 1], δ > 0, and sufficientlylarge n, there exists an (n, 2n(R−δ), ε) quantum communication protocolfor N.

As we prove in Appendix A,

R achievable rate ⇐⇒ limn→∞

ε∗Q(2n(R−δ);N⊗n) = 0 ∀ δ > 0. (9.2.5)

In other words, a rate R is achievable if for all δ > 0, the optimal error proba-bility for a sequence of protocols with rate R− δ vanishes as the number n ofuses of N increases.

797

Page 809: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Definition 9.12 Quantum Capacity of a Quantum Channel

The quantum capacity of a quantum channel N, denoted by Q(N), is definedto be the supremum of all achievable rates, i.e.,

Q(N) := supR : R is an achievable rate for N (9.2.6)

An equivalent definition of quantum capacity is

Q(N) = infε∈(0,1]

lim infn→∞

1n

Qε(N⊗n). (9.2.7)

We prove this in Appendix A.

Definition 9.13 Weak Converse Rate for Quantum Communication

Given a quantum channel N, a rate R ∈ R+ is called a weak converse ratefor quantum communication over N if every R′ > R is not an achievablerate for N.

We show in Appendix A that

R weak converse rate ⇐⇒ limn→∞

ε∗Q(2n(R−δ);N⊗n) > 0 ∀ δ > 0. (9.2.8)

In other words, a weak converse rate is a rate for which the optimal errorprobability cannot be made to vanish, even in the limit of a large number ofchannel uses.

Definition 9.14 Strong Converse Rate for Quantum Communication

Given a quantum channel N, a rate R ∈ R+ is called a strong converserate for quantum communication over N if for all ε ∈ [0, 1), δ > 0, andsufficiently large n, there does not exist an (n, 2n(R+δ), ε) quantum com-munication protocol for N.

We show in Appendix A that

R strong converse rate ⇐⇒ limn→∞

ε∗Q(2n(R+δ);N⊗n) = 1 ∀ δ > 0. (9.2.9)

Unlike the weak converse, in which the optimal error is required to simply bebounded away from zero as the number n of channel uses increases, in order

798

Page 810: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

to have a strong converse rate, the optimal error has to converge to one asn increases. By comparing (9.2.8) and (9.2.9), we conclude that every strongconverse rate is a weak converse rate.

Definition 9.15 Strong Converse Quantum Capacity of a QuantumChannel

The strong converse quantum capacity of a quantum channel N, denotedby Q(N), is defined as the infimum of all strong converse rates, i.e.,

Q(N) := infR : R is a strong converse rate for N (9.2.10)

We can also write the strong converse quantum capacity as

Q(N) = supε∈[0,1)

lim supn→∞

1n

Qε(N⊗n). (9.2.11)

See Appendix A for a proof. We also show in Appendix A that

Q(N) ≤ Q(N) (9.2.12)

for every quantum channel N.

We now state the main theorem of this chapter, which gives an expressionfor the quantum capacity of a quantum channel.

Theorem 9.16 Quantum Capacity

The quantum capacity of a quantum channel NA→B is equal to the regu-larized coherent information Ic

reg(N) of N, i.e.,

Q(N) = Icreg(N) := lim

n→∞

1n

Ic(N⊗n). (9.2.13)

Recall from (4.11.107) that the channel coherent information is defined as

Ic(N) = supψRA

I(R〉B)ω, (9.2.14)

where ωRB = NA→B(ψRA) and ψRA is a pure state with the dimension of Requal to the dimension of A.

Observe that the expression in (9.2.13) for the quantum capacity of a quan-tum channel is somewhat similar to the expression in (7.2.14) for the classical

799

Page 811: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

capacity of a quantum channel, in the sense that both capacities involve aregularization of a corresponding channel measure. In the case of quantumcommunication, we obtain the regularization of the channel’s coherent infor-mation, whereas in the case of classical communication, we obtain the regu-larization of Holevo information. Due to the regularization, which involvesa limit of an arbitrarily large number of uses of the channel, the quantumcapacity is in general difficult to compute.

We show below in Section 9.2.3 that the coherent information is always su-peradditive, meaning that Ic(N⊗n) ≥ nIc(N) for every channel N. This meansthat the coherent information is always a lower bound on the quantum ca-pacity of a channel N:

Q(N) ≥ Ic(N) for every channel N. (9.2.15)

If the coherent information happens to be additive for a particular channel,then the regularization in (9.2.13) is not required. The coherent information isknown to be additive for all degradable and anti-degradable channels. (SeeDefinition 3.25.) In fact, for anti-degradable channels, the coherent informa-tion is equal to zero, a fact that we prove in Section 9.3.2 below. Examplesof degradable and anti-degradable channels include the amplitude dampingchannel (Section 3.2.5.1) and the quantum erasure channel (Section 3.2.5.2).For all such channels, we thus have Q(N) = Ic(N).

Also, just as with classical communication, in this case Theorem 9.16 onlymakes a statement about the quantum capacity and not about the strong con-verse quantum capacity. One way of attempting to prove that the coherentinformation of a channel is equal to its strong converse quantum capacity in-volves proving that the sandwiched Rényi coherent information is additivefor the channel. Unfortunately, this quantity has not been shown to be ad-ditive for any quantum channel thus far, which means that this approach isnot known to be useful for obtaining strong converse quantum capacities.We consider another approach to strong converse quantum capacities in Sec-tion 9.2.4, which leads to a strong converse theorem for dephasing channels(see (3.2.137)). In terms of a general statement about the converse, the bestwe can say generally is that the regularized coherent information is a weakconverse rate for all quantum channels.

There are two ingredients to the proof of Theorem 9.16:

1. Achievability: We show that Icreg(N) is an achievable rate, which involves

explicitly constructing a quantum communication protocol. The devel-opments in Section 9.1.3 on a lower bound for one-shot quantum capac-

800

Page 812: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

ity can be used (via the substitution N→ N⊗n) to argue for the existenceof a quantum communication protocol for N in the asymptotic setting atthe rate Ic

reg(N).

The achievability part of the proof establishes that Q(N) ≥ Icreg(N).

2. Weak Converse: We show that Icreg(N) is a weak converse rate, from which

it follows that Q(N) ≤ Icreg(N). To show that Ic

reg(N) is a weak converserate, we use the one-shot upper bounds from Section 9.1.2 to concludethat every achievable rate r satisfies r ≤ Ic

reg(N).

We first establish in Section 9.2.1 that the quantity Icreg(N) is an achievable

rate for quantum communication over N. Then, in Section 9.2.2, we provethat Ic

reg(N) is a weak converse rate.

9.2.1 Proof of Achievability

In this section, we prove that Icreg(N) is an achievable rate for quantum com-

munication over a channel N.

First, recall from Corollary 9.9 that for all ε ∈ (0, 1), δ ∈ (0, 1), η ∈[0, ε√

δ/4), and α > 1, there exists a (d, ε) quantum communication protocolfor NA→B with

log2 d ≥ supψAA′

Hα(A|E)φ −1

α− 1log2

(1

f (ε, δ, η)

)

− log2

(1

1− f (ε, δ, η)

)+ log2(η

4(1− δ)), (9.2.16)

where f (ε, δ, η) :=(

ε√

δ4 − η

)2, φAE = Nc

A′→E(ψAA′), and ψAA′ is a pure statewith the dimension of A′ equal to the dimension of A. Note that

Hα(A|E)φ = − infσE

Dα(φAE‖1A ⊗ σE), (9.2.17)

where the optimization is with respect to states σE. A simple corollary of(9.2.16) is the following.

801

Page 813: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Corollary 9.17 Lower Bound for Quantum Communication in theAsymptotic Setting

Let NA→B be a quantum channel. For all n ∈ N, ε ∈ (0, 1), and α > 1,there exists an (n, d, ε) quantum communication protocol for N with

log2 dn≥ sup

ψAA′Hα(A|E)φ −

1n(α− 1)

log2

(128ε2

)

− 1n

log2

(1

1− ε2

128

)− 4

nlog2

(1ε

)− 15

n, (9.2.18)

where φAE = NcA′→E(ψAA′) and ψAA′ is a pure state with the dimension

of A′ equal to the dimension of A.

PROOF: Applying the inequality in (9.2.16) to the channel N⊗n, letting δ =1/2, and letting η = ε

8√

2leads to

log2 dn≥ sup

ΨAn A′n

H(An|En)Φ −1

n(α− 1)log2

(128ε2

)

− 1n

log2

(1

1− ε2

128

)− 4

nlog2

(1ε

)− 15

n, (9.2.19)

where ΦAnEn = (NcA′→E)

⊗n(ΨAn A′n), and the optimization is with respectto pure states ΨAn A′n with dA = dA′ . Now, let ψAA′ be an arbitrary purestate, and let φAE = Nc

A′→E(ψAA′). Then, restricting the optimization in thedefinition of Hα(An|En)φ⊗n to product states (see (9.2.17)) leads to

Hα(An|En)φ⊗n ≥ nHα(A|E)φ. (9.2.20)

In other words,

supΨ

An A′n

Hα(An|En)Φ ≥ Hα(An|En)φ⊗n ≥ nHα(A|E)φ (9.2.21)

for every pure state ψAA′ , which implies that

supΨ

An A′n

Hα(An|En)Φ ≥ n supψAA′

Hα(A|E)φ. (9.2.22)

Therefore, the inequality in (9.2.19) simplifies to (9.2.18), as required.

802

Page 814: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

The inequality in (9.2.18) implies that

Qn,ε(N) ≥ supψAA′

Hα(A|E)φ −1

n(α− 1)log2

(128ε2

)

− 1n

log2

(1

1− ε2

128

)− 4

nlog2

(1ε

)− 15

n, (9.2.23)

for all n ≥ 1, ε ∈ (0, 1), and α > 1, where dA′ = dA and φAE = NcA′→E(ψAA′).

We can now use (9.2.18) to prove that the regularized coherent informationIcreg(N) is an achievable rate for quantum communication over N.

Proof of the Achievability Part of Theorem 9.16

Fix ε ∈ (0, 1] and δ > 0. Let δ1, δ2 > 0 be such that

δ = δ1 + δ2. (9.2.24)

Set α ∈ (1, ∞) such that

δ1 ≥ Ic(N)− supψAA′

Hα(A|E)φ, (9.2.25)

where ψAA′ is a pure state with the dimension of A′ equal to the dimensionof A and φAE = Nc

A′→E(ψAA′). Note that this is possible because Hα(A|E)φ in-creases monotonically with decreasing α (this follows from Proposition 4.21),so that

limα→1+

supψAA′

Hα(A|E)φ = supα∈(1,∞)

supψAA′

Hα(A|E)φ (9.2.26)

= supα∈(1,∞)

supψAA′

(− inf

σEDα(φAE‖1A ⊗ σE)

)(9.2.27)

= − infα∈(1,∞)

infψAA′

infσE

Dα(φAE‖1A ⊗ σE) (9.2.28)

= − infψAA′

infσE

infα∈(1,∞)

Dα(φAE‖1⊗ σE) (9.2.29)

= − infψAA′

infσE

D(φAE‖1A ⊗ σE) (9.2.30)

= supψAA′

(− inf

σED(φAE‖1A ⊗ σE)

)(9.2.31)

803

Page 815: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

= supψAA′

H(A|E)φ, (9.2.32)

where the fifth equality follows from Proposition 4.20. Now, let VNA′→BE be

an isometric channel extending NA′→B such that φAE = NcA′→E(ψAA′) =

TrB[VNA′→BE(ψAA′)]. Since the state VN

A′→BE(ψAA′) is pure, we can view it as apurification of ωAB = NA′→B(ψAA′), so that

H(A|E)φ = H(AE)φ − H(E)φ (9.2.33)= H(B)ω − H(AB)ω (9.2.34)= I(A〉B)ω (9.2.35)

for every pure state ψAA′ . Therefore,

supψAA′

H(A|E)φ = supψAA′

I(A〉B)ω = Ic(N). (9.2.36)

With α ∈ (1, ∞) chosen such that (9.2.25) holds, take n large enough sothat

δ2 ≥1

n(α− 1)log2

(128ε2

)+

1n

log2

(1

1− ε2

128

)+

4n

log2

(1ε

)+

15n

. (9.2.37)

Now, we use the fact that for the n and ε chosen above there exists an (n, d, ε)protocol such that

log2 dn≥ sup

ψAA′Hα(A|E)φ −

1n(α− 1)

log2

(128ε2

)

− 1n

log2

(1

1− ε2

128

)− 4

nlog2

(1ε

)− 15

n, (9.2.38)

which holds due to Corollary 9.17. Rearranging the right-hand side of thisinequality, and using (9.2.24), (9.2.25), and (9.2.37), we find that

log2 dn≥ Ic(N)−

(Ic(N)− sup

ψAA′Hα(A|E)φ +

1n(α− 1)

log2

(128ε2

)

+1n

log2

(1

1− ε2

128

)+

4n

log2

(1ε

)+

15n

)(9.2.39)

804

Page 816: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

≥ Ic(N)− (δ1 + δ2) (9.2.40)= Ic(N)− δ. (9.2.41)

Thus, there exists an (n, d, ε) quantum communication protocol with rate log2 dn ≥

Ic(N)− δ. Therefore, there exists an (n, 2n(R−δ), ε) quantum communicationprotocol with R = Ic(N) for all sufficiently large n such that (9.2.37) holds.Since ε and δ are arbitrary, we conclude that for all ε ∈ (0, 1], δ > 0, andsufficiently large n, there exists an (n, 2n(Ic(N)−δ), ε) quantum communicationprotocol. This means, by definition, that Ic(N) is an achievable rate for quan-tum communication over N.

Now, we can repeat the arguments above for the tensor-power channelN⊗k for all k ≥ 1, and so we conclude that 1

k Ic(N⊗k) is an achievable rate (thearguments are similar to the arguments in the proof of the achievability partof Proposition 8.19). Since k is arbitrary, we conclude that limk→∞

1k Ic(N⊗k) =

Icreg(N) is an achievable rate for quantum communication over N.

9.2.2 Proof of the Weak Converse

We now show that the regularized coherent information Icreg(N) is a weak

converse rate for quantum communication over N. This establishes that Q(N)≤ Ic

reg(N) and therefore that Q(N) = Icreg(N), completing the proof of Theo-

rem 9.16.

Let us first recall from Theorem 9.4 that for every quantum channel N,we have the following: for all ε ∈ [0, 1) and (d, ε) quantum communicationprotocols for N,

(1− 2ε) log2 d ≤ Ic(N) + h2(ε), (9.2.42)

log2 d ≤ Icα(N) +

α

α− 1log2

(1

1− ε

)∀ α > 1. (9.2.43)

To obtain these inequalities, we considered a quantum communication pro-tocol for a useless channel. The useless channel in the asymptotic settingis analogous to the one in Figure 9.2 and is shown in Figure 9.4. Applying(9.2.42) and (9.2.43) to the channel N⊗n leads to the following.

805

Page 817: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

DE

ΨRA′ ωRB′

R

A′ ... PσBn

A1

A2

...An−1

An

B1

B2

...Bn−1

Bn

B′

Alice Bob

FIGURE 9.4: Depiction of a protocol that is useless for quantum commu-nication in the asymptotic setting. The state encoding Alice’s share ofthe pure state ΨRA′ is discarded and replaced by an arbitrary (but fixed)state σBn .

Corollary 9.18 Upper Bounds for Quantum Communication in theAsymptotic Setting

Let N be a quantum channel. For all ε ∈ [0, 1), n ∈N, and (n, d, ε) quan-tum communication protocols over n uses of N, the number of transmit-ted qubits is bounded from above as follows:

(1− 2ε)log2 d

n≤ 1

nIc(N⊗n) +

1n

h2(ε), (9.2.44)

log2 dn≤ 1

nIcα(N

⊗n) +α

n(α− 1)log2

(1

1− ε

)∀ α > 1.

(9.2.45)

PROOF: Since the inequalities in (9.2.42) and (9.2.43) of Theorem 9.4 hold forevery channel N, they hold for the channel N⊗n. Therefore, applying (9.2.42)and (9.2.43) to N⊗n and dividing both sides by n, we obtain the desired re-sult.

The inequalities in the corollary above give us, for all ε ∈ [0, 1) and n ∈N,an upper bound on the rate of an arbitrary (n, d, ε) quantum communication

806

Page 818: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

protocol. If instead we fix a particular rate R by letting d = 2nR, then wecan obtain a lower bound on the error probability of an (n, 2nR, ε) quantumcommunication protocol. Specifically, using (9.2.45), we find that

ε ≥ 1− 2−n( α−1α )(R− 1

n Icα(N

⊗n)) (9.2.46)

for all α > 1.

The inequalities in (9.2.44) and (9.2.45) imply that

(1− 2ε)Qn,ε(N) ≤ 1n

Ic(N⊗n) +1n

h2(ε), (9.2.47)

Qn,ε(N) ≤ 1n

Icα(N

⊗n) +α

n(α− 1)log2

(1

1− ε

)∀ α > 1, (9.2.48)

with n ≥ 1 and ε ∈ [0, 1).

Using (9.2.44), we can now prove the weak converse part of Theorem 9.16.

Proof of the Weak Converse Part of Theorem 9.16

Suppose that R is an achievable rate. Then, by definition, for all ε ∈ (0, 1],δ > 0, and sufficiently large n, there exists an (n, 2n(R−δ), ε) quantum com-munication protocol for N. For all such protocols, the inequality (9.2.44) inCorollary 9.18 holds, so that

(1− 2ε)(R− δ) ≤ 1n

Ic(N⊗n) +1n

h2(ε). (9.2.49)

Since this bound holds for all sufficiently large n, it holds in the limit n→ ∞,so that

(1− 2ε)R ≤ limn→∞

(1n

Ic(N⊗n) +1n

h2(ε)

)+ (1− 2ε)δ, (9.2.50)

= limn→∞

1n

Ic(N⊗n) + (1− 2ε)δ. (9.2.51)

Then, since this inequality holds for all ε ∈ (0, 1/2) and δ > 0, we obtain

R ≤ limε,δ→0

1

1− 2εlim

n→∞

1n

Ic(N) + δ

= lim

n→∞

1n

Ic(N⊗n) = Icreg(N). (9.2.52)

We have thus shown that if R is an achievable rate, then R ≤ Icreg(N). The con-

trapositive of this statement is that if R > Icreg(N), then R is not an achievable

rate. By definition, therefore, Icreg(N) is a weak converse rate.

807

Page 819: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

9.2.3 The Additivity Question

Although we have shown that the quantum capacity Q(N) of a channel N isgiven by its regularized coherent information Ic

reg(N) = limn→∞1n Ic(N⊗n),

without the additivity of Ic(N), this result is not particularly helpful since itis not clear whether the regularized coherent information can be computed ingeneral.

The coherent information is always superadditive, meaning that for twoarbitrary quantum channels N1 and N2,

Ic(N1 ⊗N2) ≥ Ic(N1) + Ic(N2). (9.2.53)

This follows from the fact that coherent information is additive for productstates τA1B1 ⊗ωA2B2 :

I(A1A2〉B1B2)τ⊗ω = I(A1〉B1)τ + I(A2〉B2)ω, (9.2.54)

which is a consequence of (4.1.6) and the additivity of entropy for productstates (see (4.2.103)).

Now, let ψR1R2 A1 A2 , φR1 A1 , ϕR2 A2 be arbitrary pure states, where A1 andA2 are input systems to the channels N1 and N2, respectively, and dR1 = dA1and dR2 = dA2 . Then, letting

ρR1R2B1B2 := ((N1)A1→B1 ⊗ (N2)A2→B2)(ψR1R2 A1 A2), (9.2.55)τR1B1 := (N1)A1→B1(φR1 A1), (9.2.56)

ωR2B2 := (N2)A2→B2(ϕR2 A2), (9.2.57)

and restricting the optimization in the definition of coherent information of achannel to pure product states, we find that

Ic(N1 ⊗N2) = supψR1R2 A1 A2

I(R1R2〉B1B2)ρ (9.2.58)

≥ supφR1 A1

⊗ϕR2 A2

I(R1R2〉B1B2)τ⊗ω (9.2.59)

= supφR1 A1

⊗ϕR2 A2

I(R1〉A1)τ + I(R2〉B2)ω (9.2.60)

= supφR1 A1

I(R1〉B1)τ + supϕR2 A2

I(R2〉B2)ω (9.2.61)

= Ic(N1) + Ic(N2), (9.2.62)

808

Page 820: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

which is precisely (9.2.53). The reverse inequality does not hold in general,but it does for degradable channels (see Section 9.3.1 below).

For the sandwiched Rényi coherent information of a bipartite state ρAB,which is defined as

Icα(A〉B)ρ = inf

σBDα(ρAB‖1A ⊗ σB), (9.2.63)

where the optimization is over states σB, the following additivity equalityholds for all product states τA1B1 ⊗ωA2B2 and α ∈ (1, ∞):

Iα(A1A2〉B1B2)τ⊗ω = Iα(A1〉B1)τ + Iα(A2〉B2)ω. (9.2.64)

This equality follows by reasoning similar to that given for the proof of Propo-sition 6.21. By the same reasoning given in (9.2.58)–(9.2.62), we conclude that

Icα(N1 ⊗N2) ≥ Ic

α(N1) + Icα(N2) (9.2.65)

for all α ∈ (1, ∞), where Icα(N) is the sandwiched Rényi coherent information

of the channel Icα(N). Whether the reverse inequality holds, even for particu-

lar classes of channels, is an open question.

9.2.4 Rains Information Strong Converse Upper Bound

Except for channels for which the coherent information is known to be ad-ditive (such as the class of degradable channels; see Section 9.3.1 below), thequantum capacity of a channel is difficult to compute. This prompts us to findtractable upper bounds on quantum capacity. This search for tractable upperbounds is entirely analogous to what was done in Section 7.2.5 for classicalcommunication in order to obtain tractable strong converse upper bounds onclassical capacity.

Recall that in the previous chapter on entanglement distillation, our ap-proach to obtaining strong converse upper bounds on distillable entangle-ment consisted of comparing the state at the output of an entanglement dis-tillation protocol with one that is useless for entanglement distillation. Weconsidered the set of PPT′ operators as the useless set, and we obtained stateentanglement measures as upper bounds in the one-shot and asymptotic set-tings. Now, observe that entanglement transmission is similar to entangle-ment distillation in the sense that, like entanglement distillation, the errorcriterion for entanglement transmission involves comparing the output state

809

Page 821: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

of the protocol to the maximally entangled state. This suggests that the stateentanglement measures defined in Sections 5.2 and 5.3, and in particular theresults of Proposition 8.6 and Corollary 8.7, are relevant. However, the mainresource that we are considering in this chapter is a quantum channel andnot a quantum state, and so we have an extra degree of freedom in the inputstate to the channel, which we can optimize. This suggests that the channelentanglement measures from Section 5.5 are relevant, and this is indeed whatwe find.

Proposition 9.19

Let NA→B be a quantum channel. For an arbitrary (d, ε) quantum com-munication protocol for NA→B, the number log2 d of qubits transmittedover N satisfies

log2 d ≤ RεH(N), (9.2.66)

where

RεH(N) = sup

ψSA

RεH(S; B)ω (9.2.67)

= supψSA

infσSB∈PPT′(S:B)

D(NA→B(ψSA)‖σSB) (9.2.68)

is the ε-hypothesis testing Rains information of the channel N, defined in(5.5.129). Consequently, the one-shot quantum capacity does not exceedthe ε-hypothesis testing Rains information of the channel N:

Qε(N) ≤ RεH(N). (9.2.69)

REMARK: Note that in the expression for RεH(N) above it suffices to optimize over pure

states ψRA, with the dimension of S equal to the dimension of A. We showed this in(5.5.3)–(5.5.6) immediately after Definition 5.39.

PROOF: By the arguments in (9.1.10)–(9.1.12), a (d, ε) quantum communica-tion protocol is a (d, ε) entanglement transmission protocol. As such, we con-clude that the state ρRB′ defined in (9.1.18) satisfies Tr[ΦRB′ρRB′ ] ≥ 1− ε. Wecan therefore apply Proposition 8.6 to conclude that

log2 d ≤ RεH(S; B′)ρ. (9.2.70)

Note that

RεH(S; B′)ρ

810

Page 822: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

= infσSB′∈PPT′(S:B′)

DεH(ρSB′‖σSB′) (9.2.71)

= infσSB′∈PPT′(S:B′)

DεH((DB→B′ NA→B EA′→A)(ΦSA′)‖σSB′). (9.2.72)

Now, since every local channel is completely PPT preserving (this followsimmediately from Proposition 3.44 and Lemma 3.45), we conclude that thechannel DB→B′ ≡ idS ⊗DB→B′ is completely PPT preserving, so that the set

DB→B′(τSB) : τSB ∈ PPT′(S : B) (9.2.73)

is a subset of PPT′(S : B′). Thus, by restricting the optimization over all oper-ators σSB′ ∈ PPT′(S : B′) to the outputs DB→B′(τSB) of the decoding channelDB→B′ acting on operators τSB ∈ PPT′(S : B), we obtain

RεH(S; B′)ω

≤ infτSB∈PPT′(S:B)

DεH((DB→B′ NA→B EA′→A)(ΦSA′)‖DB→B′(τSB))

(9.2.74)≤ inf

τSB∈PPT′(S:B)Dε

H((NA→B EA′→A)(ΦSA′)‖τSB) (9.2.75)

= infτSB∈PPT′(S:B)

DεH(NA→B(ρSA)‖τSB), (9.2.76)

where the second inequality follows from the data-processing inequality forhypothesis testing relative entropy, and the equality follows by letting ρSA =EA′→A(ΦSA′). Finally, after optimizing over all states ρSA, we obtain

RεH(S; B′)ρ ≤ sup

ρSA

infτSB∈PPT′(S:B)

DεH(NA→B(ρSA)‖τSB) (9.2.77)

= RεH(N), (9.2.78)

so that, by (9.2.70), we conclude that

log2 d ≤ RεH(N), (9.2.79)

as required.

The result of Proposition 9.19 is analogous to the result of Theorem 9.3.Combining it with Proposition 4.68 leads to the following:

811

Page 823: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Corollary 9.20 One-Shot Rains Upper Bound for Quantum Commu-nication

Let NA→B be a quantum channel, and let ε ∈ [0, 1). For all (d, ε) quantumcommunication protocols over N, we have that

log2 d ≤ Rα(N) +α

α− 1log2

(1

1− ε

)∀ α > 1, (9.2.80)

where

Rα(N) = supψSA

Rα(S; B)ω (9.2.81)

= supψSA

infσSB∈PPT′(S:B)

Dα(NA→B(ψSA)‖σSB) (9.2.82)

is the sandwiched Rényi Rains information of N, defined in (5.5.131).

REMARK: Note that in the expression for Rα(N) above it suffices to optimize over purestates ψSA, with the dimension of S equal to the dimension of A. We showed this in (5.5.3)–(5.5.6) immediately after Definition 5.39.

Since the inequality in (9.2.80) holds for all (d, ε) quantum communicationprotocols, we conclude the following upper bound on the one-shot quantumcapacity:

Qε(N) ≤ Rα(N) +α

α− 1log2

(1

1− ε

)(9.2.83)

for all α > 1.

For n channel uses, the bound in (9.2.80) becomes

log2 dn≤ 1

nRα(N

⊗n) +α

n(α− 1)log2

(1

1− ε

)∀ α > 1, (9.2.84)

which holds for an arbitrary (n, d, ε) quantum communication protocol thatemploys n uses of the channel N, where n ≥ 1 and ε ∈ [0, 1). We can simplifythis inequality by making use of the following fact.

812

Page 824: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Proposition 9.21 Weak Subadditivity of Rényi Rains Information ofa Channel

Let NA→B be a quantum channel, with dA the dimension of the inputsystem A. For all α > 1 and all n ∈N, we have

Rα(N⊗n) ≤ nRα(N) +

α(d2A − 1)

α− 1log2(n + 1). (9.2.85)

PROOF: Throughout this proof, for convenience we make use of the alternatenotation

Rα(ρAB) ≡ Rα(A; B)ρ (9.2.86)

where ρAB is a bipartite state.

Let ψSAn be an arbitrary pure state, with the dimension of S equal to thedimension of An, and let ρAn := TrS[ψSAn ]. We start by observing that thechannel N⊗n is covariant with respect to the symmetric group Sn. In particu-lar, if we let Wπ

Anπ∈Sn and WπBnπ∈Sn be the unitary representations of Sn,

defined in (2.5.1), acting on H⊗nA and H⊗n

B , respectively, then for every stateρAn , we have that

N⊗n(WπAnρAnWπ†

An ) = WπBnN

⊗n(ρAn)Wπ†Bn (9.2.87)

for all π ∈ Sn. Consequently, by Proposition 5.50, in particular by (5.5.140),we find that

Rα(N⊗nA→B(ψSAn)) ≤ Rα(N

⊗nA→B(ψ

ρSAn)), (9.2.88)

where the state ρAn is defined as

ρAn =1n! ∑

π∈Sn

WπAnρAnWπ†

An , (9.2.89)

and ψρSAn is a purification of ρAn .

Since the state ρAn is permutation invariant by definition, by Lemma 3.6, ithas a purification |φρ〉An An ∈ Symn(HAA), where the dimension of A is equalto the dimension of A. Consequently, there exists an isometry VS→An suchthat

VS→An |ψρ〉SAn = |φρ〉An An . (9.2.90)

Therefore, by isometric invariance of the sandwiched Rényi Rains relativeentropy, we obtain

Rα(N⊗nA→B(ψ

ρSAn)) = Rα(N

⊗nA→B(φ

ρ

An An)). (9.2.91)

813

Page 825: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Now, since φρ

An An is a state, the operator inequality φρ

An An ≤ 1An An holds.Multiplying both sides of this inequality from the left and right by the projec-tion ΠSymn(HAA)

≡ ΠAn An onto the symmetric subspace of H⊗nAA

, we obtain

ΠAn An |φρ〉〈φρ|An AnΠAn An ≤ Π2An An = ΠAn An . (9.2.92)

But |φρ〉An An ∈ Symn(HAA), which means that

ΠAn An |φρ〉〈φρ|An AnΠAn An = |φρ〉〈φρ|An An . (9.2.93)

Therefore,

φρ

An An ≤ ΠAn An =

(d2

A + n− 1n

) ∫φ⊗n

AAdφ, (9.2.94)

where the equality follows from (2.5.11). Now, note that

(n + d2

A − 1n

)=

(n + d2A − 1)(n + d2

A − 2) · · · (n + 2)(n + 1)(d2

A − 1)(d2A − 2) · · · 2 · 1 (9.2.95)

=n + d2

A − 1d2

A − 1· n + d2

A − 2d2

A − 2· · · n + 2

2· n + 1

1. (9.2.96)

Then, using the fact that n+kk ≤ n + 1 for all k ≥ 1, and applying this inequal-

ity to each factor on the right-hand side of the above equation, we obtain(

n + d2A − 1

n

)≤ (n + 1)d2

A−1 (9.2.97)

Therefore,

φρ

An An ≤ (n + 1)d2A−1

∫φ⊗n

AAdφ ≡ (n + 1)d2

A−1ξ An An . (9.2.98)

Next, we use (4.5.44) to obtain

Rα(N⊗nA→B(φ

ρ

An An) ≤ Rα(N⊗nA→B(ξ An An)) +

α

α− 1log2(n + 1)d2

A−1. (9.2.99)

Then, by quasi-convexity of the Rényi Rains relative entropy (Proposition 5.26),we obtain

Rα(N⊗nA→B(ξ An An)) ≤ sup

φAA

Rα(N⊗nA→B(φ

⊗nAA

)). (9.2.100)

814

Page 826: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Then, using subadditivity of the sandwiched Rényi Rains relative entropy fortensor-product states, as given by (5.3.18), we find that

Rα(N⊗nA→B(φ

⊗nAA

)) ≤ nRα(NA→B(φAA)). (9.2.101)

Putting everything together, we finally obtain

Rα(N⊗nA→B(ψSAn))

≤ n supφAA

Rα(NA→B(φAA)) +α(d2

A − 1)α− 1

log2(n + 1) (9.2.102)

= nRα(N) +α(d2

A − 1)α− 1

log2(n + 1). (9.2.103)

Since the pure state ψSAn is arbitrary, we conclude that

Rα(N⊗n) = sup

ψSAn

Rα(N⊗nA→B(ψSAn)) ≤ nRα(N) +

α(d2A − 1)

α− 1log2(n + 1),

(9.2.104)as required.

Combining (9.2.85) with (9.2.84) leads to the following upper bound on therate of an arbitrary (n, d, ε) quantum communication protocol for a quantumchannel NA→B:

log2 dn≤ Rα(N) +

α

n(α− 1)log2

((n + 1)d2

A−1

1− ε

)(9.2.105)

for all α > 1. Consequently, the following bound holds for the n-shot quan-tum capacity:

Qn,ε(N) ≤ Rα(N) +α

n(α− 1)log2

((n + 1)d2

A−1

1− ε

)(9.2.106)

for all α > 1.

With this bound, we are now ready to state the main result of this section,which is that the Rains information of a channel is an upper bound on thestrong converse capacity of an arbitrary quantum channel N.

815

Page 827: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Theorem 9.22 Strong Converse Upper Bound on Quantum Capacity

The Rains information R(N) of a quantum channel NA→B is a strongconverse rate for quantum communication over N. In other words,Q(N) ≤ R(N) for every quantum channel N.

Recall from (5.5.127) that

R(N) = supψSA

infσSB∈PPT′(S:B)

D(NA→B(ψSA)‖σSB), (9.2.107)

where the supremum is with respect to pure states ψSA with dS = dA.

PROOF: Let ε ∈ [0, 1) and δ > 0. Let δ1, δ2 > 0 be such that

δ > δ1 + δ2 =: δ′. (9.2.108)

Set α ∈ (1, ∞) such thatδ1 ≥ Rα(N)− R(N), (9.2.109)

which is possible because Rα(N) is monotonically increasing in α (which fol-lows from Proposition 4.29) and because limα→1+ Rα(N) = R(N) (see Ap-pendix 5.B for a proof). With this value of α, take n large enough so that

δ2 ≥α

n(α− 1)log2

((n + 1)d2

A−1

1− ε

), (9.2.110)

where dA is the dimension of the input space of the channel N.

Now, with the values of n and ε as above, an (n, d, ε) quantum communi-cation protocol satisfies (9.2.105), i.e.,

log2 dn≤ Rα(N) +

α

n(α− 1)log2

((n + 1)d2

A−1

1− ε

), (9.2.111)

for all α > 1. Rearranging the right-hand side of this inequality, and using(9.2.108)–(9.2.110), we obtain

log2 dn≤ R(N) + Rα(N)− R(N) +

α

n(α− 1)log2

((n + 1)d2

A−1

1− ε

)(9.2.112)

≤ R(N) + δ1 + δ2 (9.2.113)

816

Page 828: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

= R(N) + δ′ (9.2.114)< R(N) + δ. (9.2.115)

So we have that R(N) + δ >log2 d

n for all (n, d, ε) quantum communicationprotocols with sufficiently large n. Due to this strict inequality, it follows thatthere cannot exist an (n, 2n(R(N)+δ), ε) quantum communication protocol forall sufficiently large n such that (9.2.110) holds, for if it did there would exista d such that log2 d = n(R(N) + δ), which we have just seen is not possible.Since ε and δ are arbitrary, we conclude that for all ε ∈ [0, 1), δ > 0, andsufficiently large n, there does not exist an (n, 2n(R(N)+δ), ε) quantum commu-nication protocol. This means that R(N) is a strong converse rate, and thusthat Q(N) ≤ R(N).

9.2.4.1 The Strong Converse from a Different Point of View

Let us now show that the Rains relative entropy of a quantum channel N isa strong converse rate according to the definition of a strong converse ratein Appendix A. To this end, consider a sequence (n, 2nr, εn)n∈N of (n, d, ε)quantum communication protocols, with each element of the sequence hav-ing an arbitrary (but fixed) rate r > R(N). For each element of the sequence,the inequality in (9.2.105) holds, which means that

r ≤ Rα(N) +α

n(α− 1)log2

((n + 1)d2

A−1

1− εn

)(9.2.116)

for all α > 1. Rearranging this inequality leads to the following lower boundon the error probabilities εn:

εn ≥ 1− (n + 1)d2A−1 · 2−n( α−1

α )(r−Rα(N)). (9.2.117)

Now, since r > R(N), limα→1+ Rα(N) = R(N) (see Appendix 5.B for a proof),and since the sandwiched Rényi Rains relative entropy is monotonically in-creasing in α (see Proposition 4.29), there exists an α∗ > 1 such that R >

Rα∗(N). Applying the inequality in (9.2.117) to this value of α, we find that

εn ≥ 1− (n + 1)d2A−1 · 2−n

(α∗−1

α∗)(r−Rα∗ (N)). (9.2.118)

Then, taking the limit n → ∞ on both sides of this inequality, we concludethat limn→∞ εn ≥ 1. However, εn ≤ 1 for all n since εn is a probability by

817

Page 829: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

definition. So we conclude that limn→∞ εn = 1. Since the rate r > R(N) isarbitrary, we conclude that R(N) is a strong converse rate. We also see from(9.2.118) that the sequence εnn∈N approaches one at an exponential rate.

9.2.5 Squashed Entanglement Weak Converse Bound

One of the results of Chapter 14 is that the squashed entanglement of a quan-tum channel (see Definition 5.52) is a weak converse rate for quantum com-munication assisted by LOCC. Since the LOCC-assisted quantum capacityis an upper bound on the unassisted quantum capacity considered in thischapter, we conclude that the squashed entanglement of a channel is a weakconverse rate for unassisted quantum communication.

We present some statements along these lines briefly here, indicating thatthe squashed entanglement gives an upper bound on the one-shot quantumcapacity, the n-shot quantum capacity, as well as the asymptotic quantumcapacity. Complete proofs of these statements are available in Chapter 14, andthey follow from the fact that the assistance of LOCC can only increase ratesof quantum communication. Propositions 9.23 and 9.24 below are a directconsequence of Theorem 14.4, and Theorem 9.25 below is a consequence ofTheorem 14.15.

Proposition 9.23 One-Shot Squashed Entanglement Upper Bound

Let NA→B be a quantum channel, and let ε ∈ [0, 1). For all (d, ε) quantumcommunication protocols over the channel NA→B, the following boundholds

log2 d ≤ 11−√ε

(Esq(N) + g2(

√ε))

, (9.2.119)

where Esq(N) is the squashed entanglement of the channel N (see Defi-nition 5.52) and g2(δ) := (δ + 1) log2(δ + 1)− δ log2 δ. Consequently, forthe one-shot quantum capacity of N,

Qε(N) ≤ 11−√ε

(Esq(N) + g2(

√ε))

. (9.2.120)

818

Page 830: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Proposition 9.24 n-Shot Squashed Entanglement Upper Bound

Let NA→B be a quantum channel, and let ε ∈ [0, 1). For all (n, d, ε) quan-tum communication protocols over the channel NA→B, the followingbound holds

1n

log2 d ≤ 11−√ε

(Esq(N) +

g2(√

ε)

n

). (9.2.121)

Consequently, for the n-shot quantum capacity of N,

Qn,ε(N) ≤ 11−√ε

(Esq(N) +

g2(√

ε)

n

). (9.2.122)

Theorem 9.25 Weak Converse Upper Bound on Quantum Capacity

The squashed entanglement Esq(N) of a quantum channel NA→B is aweak converse rate for quantum communication over N. In other words,Q(N) ≤ Esq(N) for every quantum channel N.

9.3 Examples

We now consider the quantum capacity for particular classes of quantumchannels. As remarked earlier, computing the quantum capacity of an ar-bitrary channel is a difficult task. This task is made more difficult by the factthat, in some cases, the coherent information is known to be strictly superaddi-tive, meaning that

Ic(N⊗n) > nIc(N). (9.3.1)

This fact confirms that regularization of the coherent information really isneeded in general in order to compute the quantum capacity, and that addi-tivity of coherent information is simply not true for all channels. Another in-teresting phenomenon related to quantum capacity is superactivation, which iswhen two channels N1 and N2, each with zero quantum capacity, i.e., Q(N1) =Q(N2) = 0, can combine to have non-zero quantum capacity, i.e., Q(N1 ⊗N2) > 0. Please consult the Bibliographic Notes in Section 9.5 for more infor-mation about strict superadditivity and superactivation.

819

Page 831: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

In this section, we show that coherent information is additive for all degrad-able channels, which means that regularization is not needed in order to com-pute their capacities. The same turns out to be true for generalized dephasingchannels, and we prove this by showing that the Rains relative entropy ofthose channels coincides with their coherent information. We also show thatanti-degradable channels have zero quantum capacity. Finally, we evaluatethe upper and lower bounds established in this chapter for the generalizedamplitude damping channel.

Before starting, let us first recall the definition of coherent information ofa channel:

Ic(N) = supψRA

I(R〉B)ω = supρH(N(ρ))− H(Nc(ρ)), (9.3.2)

where ωRB = NA→B(ψRA) and the second equality is explained in (4.11.111)–(4.11.113). We let

Ic(ρ,N) := H(N(ρ))− H(Nc(ρ)). (9.3.3)

9.3.1 Degradable Channels

Recall from Definition 3.25 that a channel NA→B is degradable if there existsa channel DB→E such that

Nc = D N, (9.3.4)

where Nc is a channel complementary to N (see Definition 3.24) and dE ≥rank(ΓN

AB). In particular, if VA→BE is an isometric extension of N, so that

N(ρ) = TrE[VρV†] (9.3.5)

for every state ρ, thenNc(ρ) = TrB[VρV†]. (9.3.6)

We now show that the coherent information is additive for degradablequantum channels, meaning that

Ic(N⊗M) = Ic(N) + Ic(M) (9.3.7)

for all degradable quantum channels N and M. Consequently, regularizationis unnecessary, and we conclude that the quantum capacity of a degradablechannel is equal to its coherent information:

Q(N) = Ic(N) for every degradable channel N. (9.3.8)

820

Page 832: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Proposition 9.26 Additivity of Coherent Information for Degrad-able Channels

Let N and M be degradable channels. Then, the coherent information isadditive, i.e.,

Ic(N⊗M) = Ic(N) + Ic(M). (9.3.9)

PROOF: As shown in Section 9.2.3, we always have superadditivity of coher-ent information, so that Ic(N ⊗M) ≥ Ic(N) + Ic(M). So we prove that thereverse inequality also holds for the case of degradable channels.

Let D1 and D2 be the degrading channels for N and M, respectively, mean-ing that

Nc = D1 N, Mc = D2 M. (9.3.10)

Now, let ρA1 A2 be an arbitrary state on the input systems A1 and A2 of thechannels N and M, respectively. Using (4.11.103) and (4.11.104), along withthe fact that (N⊗M)c = Nc ⊗Mc, we find that

H(Nc(ρA1)) + H(Mc(ρA2)− H((N⊗M)c(ρA1 A2) (9.3.11)= D((N⊗M)c(ρA1 A2)‖Nc(ρA1)⊗Mc(ρA2)) (9.3.12)= D((D1 N⊗D2 M)(ρA1 A2)‖(D1 N)(ρA1)⊗ (D2 M)(ρA2))

(9.3.13)≤ D((N⊗M)(ρA1 A2)‖N(ρA1)⊗M(ρA2)) (9.3.14)= H(N(ρA1)) + H(M(ρA2))− H((N⊗M)(ρA1 A2)), (9.3.15)

where the third equality follows from (9.3.10), the inequality follows from thedata-processing inequality for quantum relative entropy, and the last equal-ity from (4.11.103) and (4.11.104). Rearranging this inequality and applyingsubadditivity of the entropy H((N⊗M)(ρA1 A2)) gives

H((N⊗M)(ρA1 A2))− H((N⊗M)c(ρA1 A2)) (9.3.16)≤ H(N(ρA1))− H(Nc(ρA2)) + H(M(ρA2))− H(Mc(ρA2)) (9.3.17)≤ sup

ρA1

H(N(ρA1))− H(Nc(ρA1)) (9.3.18)

+ supρA2

H(M(ρA2))− H(Mc(ρA2)) (9.3.19)

= Ic(N) + Ic(M) (9.3.20)

821

Page 833: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Since the state ρA1 A2 is arbitrary, we conclude that

Ic(N⊗M) = supρA1 A2

H((N⊗M)(ρA1 A2))− H((N⊗N)c(ρA1 A2)) (9.3.21)

≤ Ic(N) + Ic(M), (9.3.22)

as required.

Another useful fact about a degradable channel N is that the coherent in-formation Ic(ρ,N) defined in (9.3.3) is concave in the input state ρ.

Lemma 9.27

For a degradable channel N, the function ρ 7→ Ic(ρ,N) is concave in theinput state ρ. In other words, for every finite alphabet X, probabilitydistribution p : X→ [0, 1], and set ρx

Ax∈X of states,

Ic

(∑

x∈Xp(x)ρx

A,N

)≥ ∑

x∈Xp(x)Ic(ρx

A,N). (9.3.23)

PROOF: Let D be a degrading channel corresponding to N, so that Nc = D N. Next, define the following states:

ωXB := ∑x∈X

p(x)|x〉〈x|X ⊗NA→B(ρxA), (9.3.24)

τXE := ∑x∈X

p(x)|x〉〈x|X ⊗Nc(ρxA) = DB→E(ωXB) (9.3.25)

Then, by noting that τXE = DB→E(ωXB) and applying the data-processinginequality for quantum mutual information (see (4.2.199)), we obtain

I(X; E)τ ≤ I(X; B)ω . (9.3.26)

Then, using (4.1.10) and rearranging leads to

H(B)ω − H(E)τ ≥ H(B|X)ω − H(E|X)τ, (9.3.27)

which is the desired inequality in (9.3.23). Indeed, the left-hand side of theinequality above is simply Ic(∑x∈X p(x)ρx

A,N). For the right-hand side, we

find that

H(B|X)ω = ∑x∈X

p(x)H(N(ρxA)), (9.3.28)

822

Page 834: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

H(E|X)τ = ∑x∈X

p(x)H(Nc(ρxA)), (9.3.29)

because ωXB and τXE are classical-quantum states. Therefore, the right-handside of (9.3.27) is equal to ∑x∈X p(x)Ic(ρx

A,N).

Generalized Dephasing Channels

While additivity of coherent information for degradable channels allows fora tractable formula for their quantum capacity, the question about the strongconverse quantum capacity Q(N) remains. In other words, is it the case thatQ(N) = Ic(N) for all degradable quantum channels N? We answer this ques-tion here for a particular class of degradable channels.

We consider the class of degradable channels N called generalized dephasingchannels. Such channels are defined by the following isometric extension:

VNA→BE =

d−1

∑i=0|i〉B〈i|A ⊗ |ψi〉E, (9.3.30)

where d ≥ 2 and where the state vectors |ψi〉d−1i=0 are arbitrary (not neces-

sarily orthonormal). Recalling the discussion in Section 3.2.9 on Hadamardchannels, in particular (3.2.229), we see that generalized dephasing channelsare Hadamard channels, as in (3.2.222), with V therein set to 1.

For a state ρA, we have that

NA→B(ρA) = TrE[VNA→BEρAVN†

A→BE] =d−1

∑i,j=0〈i|ρA|j〉

⟨ψi|ψj

⟩|i〉〈j|B, (9.3.31)

and

NcA→E(ρA) = TrB[VN

A→BEρAVN†A→BE] =

d−1

∑i=0〈i|ρA|i〉|ψi〉〈ψi|E. (9.3.32)

Then, it is straightforward to see that

Nc N(ρ) = Nc(ρ) (9.3.33)

for every state ρ. This implies that generalized dephasing channels N aredegradable, with Nc being the degrading channel.

823

Page 835: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

We now show that Q(N) = Ic(N) for every generalized dephasing chan-nel N. We do this by showing that the Rains information R(N) of a general-ized dephasing channel is equal to its coherent information.

Theorem 9.28 Quantum Capacity of Generalized Dephasing Chan-nels

For every generalized dephasing channel N (defined by the isometricextension in (9.3.30)), the following equalities hold

Q(N) = Q(N) = R(N) = Ic(N), (9.3.34)

which establish the coherent information as the quantum capacity andstrong converse quantum capacity.

PROOF: It suffices to show that R(N) = Ic(N). Note that the inequalityIc(N) ≤ R(N) holds for every quantum channel N by combining the result ofTheorem 9.16 with the result of Theorem 9.22. We now show that the reverseinequality holds for all generalized dephasing channels.

We start by observing that every generalized dephasing channel N is co-variant with respect to the operators Z(j)d−1

j=0 defined in (3.2.116):

N(Z(j)ρZ(j)†) = Z(j)N(ρ)Z(j)† (9.3.35)

for every state ρ and for all 0 ≤ j ≤ d− 1, where

Z(j) =d−1

∑k=0

e2πikj

d |k〉〈k|. (9.3.36)

Then, for every state ρ, the average state

ρ :=1d

d−1

∑j=0

Z(j)ρZ(j)† =d−1

∑k=0〈k|ρ|k〉|k〉〈k| (9.3.37)

is diagonal in the basis |i〉d−1i=0 . Since the quantities 〈k|ρ|k〉 are probabilities,

we conclude that for every state ρ, its corresponding average state ρ has apurification of the following form:

|φρ〉RA =d−1

∑i=0

√p(i)|i〉R ⊗ |i〉A =: |ψp〉RA, (9.3.38)

824

Page 836: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

where p : 0, 1, . . . , d− 1 → [0, 1] is a probability distribution. Therefore, byProposition 5.50, when calculating the Rains information R(N), it suffices tooptimize over the pure states ψ

pRA:

R(N) = supψ

pRA

infσRB∈PPT′(R:B)

D(NA→B(ψpRA)‖σRB). (9.3.39)

Now, restricting the optimization in the definition of the coherent infor-mation Ic(N) to the pure states in (9.3.38), we obtain

Ic(N) = supψRA

infσB

D(NA→B(ψRA)‖1R ⊗ σB) (9.3.40)

≥ supψ

pRA

infσB

D(NA→B(ψpRA)‖1R ⊗ σB) (9.3.41)

≥ supψ

pRA

infσB

D(∆(NA→B(ψpRA))‖∆(1R ⊗ σB)), (9.3.42)

where the last line follows from the data-processing inequality for quantumrelative entropy, and we introduced the following channel:

∆(ρ) := ΠρΠ + (1−Π)ρ(1−Π), Π =d−1

∑i=0|i〉〈i|R ⊗ |i〉〈i|B. (9.3.43)

Now, it is straightforward to check that

ΠNA→B(ψpRA)Π = NA→B(ψ

pRA), (9.3.44)

which implies that

∆(NA→B(ψpRA)) = ΠNA→B(ψ

pRA)Π. (9.3.45)

Therefore, because Π and 1−Π project onto orthogonal subspaces, we obtain

D(∆(NA→B(ψpRA))‖∆(1R ⊗ σB))

= D(ΠNA→B(ψpRA)Π‖Π(1R ⊗ σB)Π) (9.3.46)

= D

(ΠNA→B(ψ

pRA)Π

∥∥∥∥∥d−1

∑i=0

q(i)|i〉〈i|R ⊗ |i〉〈i|B)

, (9.3.47)

where the last line follows because

Π(1R ⊗ σB)Π =d−1

∑i=0

q(i)|i〉〈i|R ⊗ |i〉〈i|B, (9.3.48)

825

Page 837: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

with the probability distribution q(i) := 〈i|σB|i〉. Note that the right-handside of the equation above is a state in PPT′(R : B). Therefore, we have

Ic(N) ≥ supψ

pRA

infσB

D(∆(NA→B(ψpRA))‖∆(1R ⊗ σB)) (9.3.49)

= supψ

pRA

infσB

D(ΠNA→B(ψpRA)Π‖Π(1R ⊗ σB)Π) (9.3.50)

= supψ

pRA

infσB

D

(NA→B(ψ

pRA)

∥∥∥∥∥d−1

∑i=0〈i|σB|i〉|i〉〈i|R ⊗ |i〉〈i|B

)(9.3.51)

≥ supψ

pRA

infσRB∈PPT′(R:B)

D(NA→B(ψpRA)‖σRB) (9.3.52)

= R(N), (9.3.53)

completing the proof.

9.3.2 Anti-Degradable Channels

Let us now consider anti-degradable channels. Recall from Definition 3.25that a channel NA→B is anti-degradable if there exists an anti-degrading chan-nel AE→B such that

N = A Nc, (9.3.54)

where Nc is a channel complementary to N and dE ≥ rank(ΓNAB).

Proposition 9.29 Coherent Information for Anti-Degradable Chan-nels

The coherent information vanishes for all anti-degradable channels, i.e.,Ic(N) = 0 for every anti-degradable channel N. Therefore, Q(N) = 0 forall anti-degradable channels.

PROOF: Let NA→B have the following Stinespring representation: N(ρA) =TrE[VA→BEρAV†

A→BE]. Then, for every pure state ψRA, the state vector

|φ〉RBE := VA→BE|ψ〉RA (9.3.55)

is such thatI(R〉B)φ =

12(

I(R; B)φ − I(R; E)φ

). (9.3.56)

826

Page 838: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

To see this, let us first note that H(E)φ = H(RB)φ and H(RE)φ = H(B)φ.These identities hold because φRBE is a pure state, implying that the reducedstates φE and φRB have the same spectrum and the reduced states φRE and φBhave the same spectrum. This, along with (4.11.103), leads to

12(

I(R; B)φ − I(R; E)φ

)(9.3.57)

=12(

H(R)φ + H(B)φ − H(RB)φ − H(R)φ − H(E)φ + H(RE)φ

)(9.3.58)

= H(B)φ − H(RB)φ (9.3.59)= I(R〉B)φ. (9.3.60)

Next, using (4.11.104), noting that φRB = NA→B(ψRA) =: ωRB, and using thefact that N is anti-degradable, so that N = A Nc, we have

I(R; B)φ ≤ I(R; E)φ, (9.3.61)

where the inequality follows from the data-processing inequality for mutualinformation under local channels (see (4.2.199)) and the facts that NA→B(ψRA) =AE→B Nc

A→E(ψRA) and the reduced state φRE = NcA→E(ψRA). Therefore,

from (9.3.56) we conclude that

I(R〉B)ω ≤ 0 (9.3.62)

for every pure state ψRA. This implies that

Ic(N) = supψRA

I(R〉B)ω = 0, (9.3.63)

as required.

9.3.3 Generalized Amplitude Damping Channel

Let us recall the definition of the generalized amplitude damping channel(GADC) from (3.2.96):

Aγ,N(ρ) = A1ρA†1 + A2ρA†

2 + A3ρA†3 + A4ρA†

4, (9.3.64)

where

A1 =√

1− N(

1 00√

1− γ

), A2 =

√1− N

(0√

γ0 0

), (9.3.65)

827

Page 839: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

A3 =√

N(√

1− γ 00 1

), A4 =

√N(

0 0√γ 0

), (9.3.66)

and γ, N ∈ [0, 1]. It is straightforward to show that

Aγ,N(ρ) = XAγ,1−N(XρX)X (9.3.67)

for every state ρ and all γ, N ∈ [0, 1]. In other words, the GADC Aγ,N is re-lated to the GADC Aγ,1−N via a simple pre- and post-processing by the Pauliunitary X = |0〉〈1| + |1〉〈0|. The information-theoretic aspects of the GADCare thus invariant under the interchange N ↔ 1− N, which means that wecan, without loss of generality, restrict the parameter N to the interval [0, 1/2].

For N = 0, the GADC reduces to the amplitude damping channel Aγ

defined in (3.2.87), which is degradable. Indeed, we first note that

Acγ,0 = A1−γ,0, (9.3.68)

where the complementary channel Acγ,0 (recall Definition 3.24) is defined via

the following isometric extension:

Vγ,N := A1 ⊗ |0〉+ A2 ⊗ |1〉+ A3 ⊗ |2〉+ A4 ⊗ |3〉. (9.3.69)

We now use the fact that, for all γ1, γ2, N1, N2 ∈ [0, 1],

Aγ,N = Aγ2,N2 Aγ1,N1 , (9.3.70)

where γ = γ1 + γ2− γ1γ2 and N = γ1(1−γ2)N1+γ2N2γ1+γ2−γ1γ2

. From this fact, it followsthat the defining condition for degradability, namely, Dγ,0 Aγ,0 = Ac

γ,0 =A1−γ,0, is satisfied by the quantum channel Dγ,0 := A 1−2γ

1−γ ,0. It can be shown

that for N > 0, the GADC Aγ,N is not degradable for all γ ∈ (0, 1] (pleaseconsult the Bibliographic Notes in Section 9.5).

Since Aγ,0 is degradable, its coherent information is additive, which meansthat its quantum capacity is equal to its coherent information, i.e.,

Q(Aγ,0) = Ic(Aγ,0) = supρ

H(Aγ,0(ρ))− H(Ac

γ,0(ρ))

(9.3.71)

= supρ

Ic(ρ,Aγ,0), (9.3.72)

where we have used the expression in (9.3.2). Now, as explained in Sec-tion 6.3.2, the GADC is covariant with respect to the Pauli operator Z. Fur-thermore, by Lemma 9.27, the function ρ 7→ Ic(ρ,Aγ,0) is concave. Therefore,

828

Page 840: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

for every state ρ,

Ic(

12

ρ +12

ZρZ,Aγ,0

)≥ 1

2Ic(ρ,Aγ,0) +

12

Ic(ZρZ,Aγ,0). (9.3.73)

Now, using the fact that Aγ,0 is covariant with respect to Z, and the fact thatAc

γ,0 = A1−γ,0, we obtain

Ic(ZρZ,Aγ,0) = H(Aγ,0(ZρZ))− H(Acγ,0(ZρZ)) (9.3.74)

= H(ZAγ,0(ρ)Z)− H(ZA1−γ,0(ρ)Z) (9.3.75)= H(Aγ,0(ρ))− H(A1−γ,0(ρ)) (9.3.76)= H(Aγ,0(ρ))− H(Ac

γ,0(ρ)) (9.3.77)

= Ic(ρ,Aγ,0). (9.3.78)

Therefore,

Ic(

12

ρ +12

ZρZ,Aγ,0

)≥ Ic(ρ,Aγ,0) (9.3.79)

for every state ρ. Recalling from (3.2.109) that the state 12ρ + 1

2 ZρZ resultsfrom the action of the completely dephasing channel on ρ, which means thatit is diagonal in the standard basis, we find that

maxp∈[0,1]

Ic((1− p)|0〉〈0|+ p|1〉〈1|,Aγ,0) ≥ Ic(ρ,Aγ,0) (9.3.80)

for every state ρ, which means that

Q(Aγ,0) = Ic(Aγ,0) = maxp∈[0,1]

Ic((1− p)|0〉〈0|+ p|1〉〈1|,Aγ,0) (9.3.81)

= maxp∈[0,1]

h2((1− γ)p)− h2(γp), (9.3.82)

where in the last line we have evaluated Ic((1− p)|0〉〈0|+ p|1〉〈1|,Aγ,0). SeeFigure 9.5 for a plot of the quantum capacity of the amplitude damping chan-nel Aγ,0. Note that the capacity vanishes at γ = 1

2 , which is due to the factthat for γ ≥ 1

2 the amplitude damping channel Aγ,0 (and more generally theGADC Aγ,N for N ∈ [0, 1]) is anti-degradable. From Proposition 9.29, we thushave that Q(Aγ,N) = 0 for all N ∈ [0, 1] and γ ≥ 1

2 .

Let us now consider the coherent information of the GADC Aγ,N for N >0. In this case, the coherent information Ic(Aγ,N) is a lower bound on thequantum capacity of the GADC. As with the amplitude damping channel,

829

Page 841: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

0.0 0.2 0.4 0.6 0.8 1.0γ

0.0

0.2

0.4

0.6

0.8

1.0

Q(A

γ,0

)

FIGURE 9.5: Quantum capacity of the amplitude damping channel, asgiven by (9.3.82). The capacity is equal to zero for γ ≥ 1

2 because in thisparameter range the channel is anti-degradable.

it can be shown that for the GADC Aγ,N with N > 0 it suffices to optimizeover states diagonal in the standard basis in order to compute the coherentinformation:

Ic(Aγ,N) = maxp∈[0,1]

Ic((1− p)|0〉〈0|+ p|1〉〈1|,Aγ,N), (9.3.83)

for all γ ∈ (0, 1) and all N > 0. The proof of this is more involved, since forN > 0 the GADC is not degradable, meaning that we cannot use Lemma 9.27.Please consult the Bibliographic Notes in Section 9.5 for a source of the proof.

In Figure 9.6, we plot the coherent information lower bound given by(9.3.83). We also plot the Rains information upper bound R(Aγ,N) as wellas four other upper bounds that are based on the following identities, whichfollow from (9.3.70):

Aγ,N = AγN,1 Aγ(1−N)1−γN ,0

, (9.3.84)

Aγ,N = Aγ(1−N),0 A γN1−γ(1−N)

,1. (9.3.85)

It then follows that

Q(Aγ,N) ≤ Q(Aγ(1−N)1−γN ,0

) =: QUBDP,1(γ, N), (9.3.86)

830

Page 842: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

0.0 0.1 0.2 0.3 0.4 0.5

γ

0.0

0.2

0.4

0.6

0.8

1.0N = 0.1

0.0 0.1 0.2 0.3 0.4 0.5

γ

0.0

0.2

0.4

0.6

0.8

1.0N = 0.2

0.0 0.1 0.2 0.3 0.4 0.5

γ

0.0

0.2

0.4

0.6

0.8

1.0N = 0.3

0.0 0.1 0.2 0.3 0.4 0.5

γ

0.0

0.2

0.4

0.6

0.8

1.0N = 0.4

0.0 0.1 0.2 0.3 0.4 0.5

γ

0.0

0.2

0.4

0.6

0.8

1.0N = 0.45

0.0 0.1 0.2 0.3 0.4 0.5

γ

0.0

0.2

0.4

0.6

0.8

1.0N = 0.5

Ic(Aγ,N) QUBDP,1 QUB

DP,2 QUBDP,3 QUB

DP,4 R(Aγ,N)

FIGURE 9.6: The coherent information lower bound Ic(Aγ,N) and four up-per bounds on the quantum capacity of the generalized amplitude damp-ing channel Aγ,N . The quantum capacity lies within the shaded region.

Q(Aγ,N) ≤ Q(Aγ(1−N),0) =: QUBDP,2(γ, N), (9.3.87)

Q(Aγ,N) ≤ Q(AγN,0) =: QUBDP,3(γ, N), (9.3.88)

Q(Aγ,N) ≤ Q(A γN1−γ(1−N)

,0) =: QUBDP,4(γ, N). (9.3.89)

Note that the right-hand side of each inequality can be calculated using (9.3.82).We have also made use of (9.3.67), which implies that Q(Aγ,1) = Q(Aγ,0).These inequalities hold due to the fact that, for the composition of two quan-tum channels N and M,

Q(N M) ≤ Q(M) and Q(N M) ≤ Q(N). (9.3.90)

The first inequality holds by the data-processing inequality. The second in-equality can be viewed as a lower bound on the quantum capacity of thechannel N that arises from a coding strategy consisting of some encoding fol-lowed by many uses of the channel M.

831

Page 843: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

9.4 Summary

In this chapter, we studied quantum communication. Given a quantum chan-nel NA→B connecting Alice and Bob, the goal in quantum communication isto determine the highest rate, called the quantum capacity and denoted byQ(N), at which the A′ part of an arbitrary pure state ΨRA′ can be transmittedto Bob without error. At the disposal of Alice and Bob are local encoding anddecoding channels, as well as an arbitrary number of (unassisted) uses of thechannel NA→B. By unassisted, we mean that Alice and Bob are not allowedto communicate with each other between channel uses. We found that thecoherent information Ic(N) of N is always a lower bound on its quantum ca-pacity, and that, in general, computing the exact value of the capacity involvesa regularization, so that Q(N) = Ic

reg(N).

Starting with the one-shot setting, in which only one use of the channelis allowed and there is some tolerable non-zero error, we determined bothupper and lower bounds on the number of qubits that can be transmitted.The one-shot upper bound involves the hypothesis testing relative entropyin a way similar to how it is involved in classical communication and entan-glement distillation. Specifically, we establish the hypothesis testing coher-ent information as an upper bound. This leads to the coherent information(hence regularized coherent information) weak converse upper bound in theasymptotic setting. To obtain a lower bound, we used the results of Chapter 8on entanglement distillation. We found that we could take the entanglementdistillation protocol developed in that chapter and convert it to a suitablequantum communication protocol. We proved that this lower bound is opti-mal when applied to the asymptotic setting, in the sense that it leads to the co-herent information (hence regularized coherent information) as an achievablerate, which matches the upper bound. For degradable channels, we showedthat the coherent information is additive, meaning that Q(N) = Ic(N) forall degradable channels. We also showed that anti-degradable channels havezero quantum capacity.

With the goal of obtaining tractable estimates of quantum capacity forgeneral channels, we found that the Rains information R(N) of N is a strongconverse upper bound on the quantum capacity of N. This allowed us toconclude that the quantum capacity of the generalized dephasing channel isequal to its coherent information, because its Rains information and coher-ent information coincide. We also looked ahead to Chapter 14 and concludedfrom the results there that the squashed entanglement of a quantum channel

832

Page 844: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

is an upper bound on quantum capacity.

9.5 Bibliographic Notes

The problem of determining the capacity of a quantum channel for trans-mitting quantum information, in a manner analogous to Shannon’s channelcapacity theorem, was proposed by Shor (1995). The notion of quantum com-munication that we consider in this chapter, as well as the notion of entan-glement transmission, was defined by Schumacher (1996). The notion of sub-space transmission was defined by Barnum et al. (2000) (see also (Bennettet al., 1997)), and the notion of entanglement generation was defined by De-vetak (2005). These different notions of quantum communication, and theconnections between them, have been examined by Kretschmann and Werner(2004), where they also proved that the capacities for these variations are allequal to each other.

Upper and lower bounds on one-shot quantum capacity have been es-tablished by Buscemi and Datta (2010a); Datta and Hsieh (2013); Beigi et al.(2016); Tomamichel et al. (2016); Anshu et al. (2019); Wang et al. (2019b). Theapproach of using hypothesis testing relative entropy for obtaining an upperbound on one-shot quantum capacity (specifically, Theorem 9.3) comes fromwork by Matthews and Wehner (2014). The lower bound on the one-shotquantum capacity in Theorem 9.5 comes from work on one-shot decoupling(Dupuis et al., 2014), which was then used by (Wilde et al., 2017, Proposi-tion 21) to obtain a lower bound on the one-shot distillable entanglement.The various code conversions in Lemmas 9.6, 9.7, and 9.8 are available in anumber of works, including Barnum et al. (2000); Kretschmann and Werner(2004); Klesse (2007); Watrous (2018) (see also Wilde and Qi (2018)). The one-shot Rains upper bound in Corollary 9.20 was obtained by Tomamichel et al.(2017).

In the asymptotic setting, Schumacher (1996); Schumacher and Nielsen(1996); Barnum et al. (1998, 2000) established coherent information as an up-per bound on quantum capacity, and Lloyd (1997); Shor (2002b); Devetak(2005) established the lower bound. (See also the proofs of Klesse (2008);Hayden et al. (2008b).) Decoupling as a method for understanding quantumcapacity was initially studied by Schumacher and Westmoreland (2002) anddeveloped in further detail by Hayden et al. (2008a). The Rains informationstrong converse upper bound (Theorem 9.22) was established by Tomamichel

833

Page 845: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

et al. (2017). Weak subadditivity of Rényi Rains information of a channel(Proposition 9.21) is also due to Tomamichel et al. (2017). We also mentionthat upper bounds on quantum capacity based on approximate degradabil-ity and approximate anti-degradability of channels have been established bySutter et al. (2017) (see also (Leditzky et al., 2018)).

Additivity of coherent information for degradable channels was shown byDevetak and Shor (2005). Smith and Yard (2008); Smith et al. (2011) demon-strated the phenomenon of superactivation of quantum capacity, and DiVin-cenzo et al. (1998); Smith and Smolin (2007); Cubitt et al. (2015); Elkoussand Strelchuk (2015) demonstrated superadditivity of coherent information.Lemma 9.27 was presented in (Yard et al., 2008, Lemma 5). The quantumcapacity of generalized dephasing channels was established by Devetak andShor (2005) and the strong converse by Tomamichel et al. (2017). See (Morganand Winter, 2014) for the pretty-strong converse for the quantum capacity ofdegradable channels. The generalized amplitude damping channel (GADC)has been studied in detail by Khatri et al. (2020). The fact that this channelis not degradable for N ∈ (0, 1) and γ ∈ (0, 1] follows from (Cubitt et al.,2008, Theorem 4). The expression in (9.3.82) for the quantum capacity of theamplitude damping channel was given by Giovannetti and Fazio (2005). Fora proof of (9.3.83), see (García-Patrón et al., 2009, Appendix). A proof of anti-degradability of the GADC Aγ,N for all γ ≥ 1

2 can be found in (Khatri et al.,2020, Proposition 2).

Appendix 9.A Alternative Notions ofQuantum Communication

At the beginning of this chapter, we considered three alternative notions ofquantum communication, and we described how they are implied by the no-tion of quantum communication as defined at the beginning of Section 9.1.We now precisely define these other notions of quantum communication, andwe show how the notion of quantum communication considered in the chap-ter (strong subspace transmission) implies all three alternatives.

834

Page 846: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Definition 9.30 Entanglement Transmission

An entanglement transmission protocol for NA→B consists of the three el-ements (d,E,D), where d ≥ 2, EA′→A is an encoding channel withdA′ = d, and DB→B′ is a decoding channel with dB′ = d. The goal ofthe protocol is to transmit the A′ system of a maximally entangled stateΦRA′ of Schmidt rank d such that the final state

ωRB′ := (DB→B′ NA→B EA′→A)(ΦRA′) (9.A.1)

is close to the initial maximally entangled state. The entanglement trans-mission error of the protocol is

p(ET)err (E,D;N) := 1− 〈Φ|RB′ωRB′ |Φ〉RB′ (9.A.2)

= 1− Fe(D N E), (9.A.3)

where we recall the entanglement fidelity of a channel from Defini-tion 3.71. We call the protocol (d,E,D) a (d, ε) protocol, with ε ∈ [0, 1], ifp(ET)

err (E,D;N) ≤ ε.

It is straightforward to see that if there exists a (d, ε) quantum communi-cation protocol for a quantum channel N (as per Definition 9.1), then thereexists a (d, ε) entanglement transmission protocol. Indeed, for a (d, ε) quan-tum communication protocol with encoding and decoding channel E and D,we have that

p∗err(E,D;N)

= max|Ψ〉RA′

1− 〈Ψ|RA′(DB→B′ NA→B EA′→A)(ΨRA′)|Ψ〉RA′ (9.A.4)

= 1− F(D N E) (9.A.5)≤ ε. (9.A.6)

However, since the maximally entangled state ΦAB′ is a particular pure statein the optimization for F(D N E), we conclude that

p(ET)err (E,D;N) = 1− Fe(D N E) ≤ 1− F(D N E) ≤ ε. (9.A.7)

So the elements (d,E,D) form a (d, ε) entanglement transmission protocol.

835

Page 847: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

Definition 9.31 Entanglement Generation

An entanglement generation protocol for NA→B is defined by the three el-ements (d, ΨA′A,DB→B′), where ΨA′A is a pure state with dA′ = d, andDB→B′ is a decoding channel with dB′ = d. The goal of the protocol is totransmit the system A such that the final state

σA′B′ := (DB→B′ NA→B)(ΨA′A) (9.A.8)

is close in fidelity to a maximally entangled state of Schmidt rank d. Theentanglement generation error of the protocol is given by

p(EG)err (ΨA′A,D;N) := 1− 〈Φ|A′B′σA′B′ |Φ〉A′B′ (9.A.9)

= 1− F(ΦA′B′ , σA′B′). (9.A.10)

We call the protocol (d, ΨA′A,DB→B′) a (d, ε) protocol, with ε ∈ [0, 1], ifp(EG)

err (ΨA′A,D;N) ≤ ε.

Consider a (d, ε) quantum communication protocol for NA→B given by theelements (d,EA′→A,DB→B′), where dA′ = dB′ = d. Then, by the argumentsabove, the same elements constitute a (d, ε) entanglement transmission pro-tocol, so that

〈Φ|RB′(DB→B′ NA→B EA′→A)(ΦRA′)|Φ〉RB′ ≥ 1− ε. (9.A.11)

Now, let the system R ≡ A belong to Alice, and let ΨAA := EA′→A(ΦAA).Then,

〈Φ|AB′(DB→B′ NA→B)(ΨAA)|Φ〉RB′ ≥ 1− ε. (9.A.12)

Therefore, by definition, the elements (d, ΨAA,DB→B′) constitute a (d, ε) en-tanglement generation protocol.

Definition 9.32 Subspace Transmission

A subspace transmission protocol over the quantum channel NA→B consistsof the three elements (d,E,D), where d ≥ 2 and E and D are encodingand decoding channels. The goal of the protocol is to transmit an arbi-trary pure state ψA′ such that the final state

ωB′ := (DB→B′ NA→B EA′→A)(ψA′) (9.A.13)

is close in fidelity to the initial state. The state transmission error of the

836

Page 848: Principles of Quantum Communication Theory: A Modern Approach

Chapter 9: Quantum Communication

protocol is

p(ST)err (E,D;N) := 1−min

ψ〈ψ|N(ψ)|ψ〉 = 1− Fmin(D N E), (9.A.14)

where we recall the minimum fidelity of a channel defined in (3.5.162).We call the protocol (d,E,D) a (d, ε) protocol, with ε ∈ [0, 1], ifp(ST)

err (E,D;N) ≤ ε.

REMARK: An alternative way to define the error criterion for a subspace transmissioncode would be to use the average fidelity, defined in (3.5.160); please consult the Biblio-graphic Notes in Section 9.5.

Given a (d, ε) quantum communication protocol for the channel N withthe elements (d,E,D), the equality p∗err(E,D;N) = 1 − F(D N E) holds,where F(·) is the channel fidelity defined in (3.5.163). Then, restricting the op-timization in F(D N E) to pure states ΨRA′ = |Ψ〉〈Ψ|RA′ such that |Ψ〉RA′ =|φ〉R ⊗ |ψ〉A′ , we obtain

p(ST)err (E,D;N)

= 1− Fmin(D N E) (9.A.15)= 1−min

|ψ〉〈ψ|N(ψ)|ψ〉 (9.A.16)

= 1− min|φ〉,|ψ〉

(〈φ|R ⊗ 〈ψ|A′)(φR ⊗N(ψA′))(|φ〉R ⊗ |ψ〉A′) (9.A.17)

≤ 1− F(D N E) (9.A.18)≤ ε. (9.A.19)

So the elements (d,E,D) form a (d, ε) subspace transmission protocol.

837

Page 849: Principles of Quantum Communication Theory: A Modern Approach

Chapter 10

Secret Key Distillation

[IN PROGRESS]

10.1 Tripartite key states and bipartite private states

838

Page 850: Principles of Quantum Communication Theory: A Modern Approach

Chapter 10: Secret Key Distillation

Definition 10.1 Tripartite Key State

A state γABE is a tripartite key state, containing log2 K bits of secrecy, iflocal measurements of the A and B systems lead to the same uniformlyrandom outcome and the system E is product with the measurementoutcomes. That is, after Alice and Bob send their systems through localdephasing (measurement) channels

MA,B(·) :=K

∑i=1|i〉〈i|A,B(·)|i〉〈i|A,B, (10.1.1)

the resulting state on AB and E is as follows:

(MA ⊗MB)(γABE) = ΦAB ⊗ σE, (10.1.2)

for some state σE and where ΦAB is the maximally classically correlatedstate

ΦAB :=1K

K

∑i=1|i〉〈i|A ⊗ |i〉〈i|B. (10.1.3)

Definition 10.2 Bipartite private state

A state γABA′B′ is a bipartite private state, containing log2 K bits of se-crecy, if after purifying γABA′B′ to a pure state γABA′B′E with purifyingsystem E and tracing over the systems A′B′, the resulting state γABE isa tripartite key state. The systems A and B are called key systems, andthe systems A′ and B′ are called shield systems.

Theorem 10.3

A state γABA′B′ is a bipartite private state if and only if it has the follow-ing form:

γABA′B′ = UABA′B′ (ΦAB ⊗ ρA′B′)U†ABA′B′ , (10.1.4)

for a maximally entangled state ΦAB, where

ΦAB :=1K

K

∑i,j=1|i〉〈j|A ⊗ |i〉〈j|B, (10.1.5)

839

Page 851: Principles of Quantum Communication Theory: A Modern Approach

Chapter 10: Secret Key Distillation

some state ρA′B′ , and a global twisting unitary UABA′B′ of the followingform:

UABA′B′ :=K

∑i,j=1|i〉〈i|A ⊗ |j〉〈j|B ⊗Uij

A′B′ ., (10.1.6)

where UijA′B′ is a unitary for all i, j ∈ 1, . . . , K.

PROOF: Suppose that γABA′B′ has the form in (10.1.4). A particular purifica-tion of γABA′B′ is

|φγ〉ABA′B′E = UABA′B′ |Φ〉AB ⊗ |ψρ〉A′B′E (10.1.7)

=

(K

∑i,j=1|i〉〈i|A ⊗ |j〉〈j|B ⊗Uij

A′B′

)(1√K

K

∑k=1|k〉A|k〉B ⊗ |ψρ〉A′B′E

)

(10.1.8)

=1√K

K

∑k=1|k〉A|k〉B ⊗Ukk

A′B′ |ψρ〉A′B′E. (10.1.9)

The local dephasing channels in (10.1.1) lead to the following state

(MA⊗MB)(|φγ〉〈φγ|ABA′B′E) =1K

K

∑k=1|k〉〈k|A⊗|k〉〈k|B⊗Ukk

A′B′ |ψρ〉〈ψρ|A′B′E(

UkkA′B′

)†.

(10.1.10)Taking a partial trace over the A′B′ systems leads to

TrA′B′

[1K

K

∑k=1|k〉〈k|A ⊗ |k〉〈k|B ⊗Ukk

A′B′ |ψρ〉〈ψρ|A′B′E(

UkkA′B′

)†]

=1K

K

∑k=1|k〉〈k|A ⊗ |k〉〈k|B ⊗ TrA′B′

[Ukk

A′B′ |ψρ〉〈ψρ|A′B′E(

UkkA′B′

)†]

(10.1.11)

=1K

K

∑k=1|k〉〈k|A ⊗ |k〉〈k|B ⊗ TrA′B′

[(Ukk

A′B′

)†Ukk

A′B′ |ψρ〉〈ψρ|A′B′E]

(10.1.12)

= ΦAB ⊗ ρE. (10.1.13)

Thus, the particular purification |φγ〉ABA′B′E leads to a tripartite key state onsystems ABE.

Now, in the above we chose a particular purification of γABA′B′ . However,given that all purifications are related by isometries on the purifying system,

840

Page 852: Principles of Quantum Communication Theory: A Modern Approach

Chapter 10: Secret Key Distillation

any purification can be written as VE→E′ |φγ〉ABA′B′E for some isometry VE→E′ .Then repeating the above calculation gives that the reduced state on ABE′

after local dephasing channels on A and B is

ΦAB ⊗VEρEV†E , (10.1.14)

so that there is no correlation between the measurement outcomes of Aliceand Bob and the system E′. Furthermore, the measurement outcomes are per-fectly correlated and uniformly random. So we conclude that a state γABA′B′of the form in (10.1.4) is a bipartite private state.

Conversely, suppose now that γABA′B′ is a bipartite private state held byAlice and Bob, and let |φγ〉ABA′B′E be a purification of it, with E the purifyingsystem. Expanding the state in the basis of the local measurements of Aliceand Bob gives

|φγ〉ABA′B′E =K

∑i,j=1

αi,j|i〉A|j〉B|φi,j〉A′B′E, (10.1.15)

for some states |φi,j〉A′B′E and probability amplitudes αi,ji,j. However, in or-der for the measurement outcomes of Alice and Bob to be perfectly correlatedand uniformly random, it is necessary that

|αi,j|2 =

1K if i = j0 if i 6= j

. (10.1.16)

(Any other values for the amplitudes αi,j would lead to a different distributionupon measurement of the A and B systems.) So the global state should havethe following form:

|φγ〉ABA′B′E =K

∑i=1

1√K|i〉A|i〉Beiϕi |φi,i〉A′B′E. (10.1.17)

In order for the reduced density operator on E to be independent of the mea-surement outcomes of Alice and Bob, it is necessary for it to be a fixed statewith no dependence on i:

TrA′B′ |φi,i〉〈φi,i|A′B′E = σE. (10.1.18)

In such a case, then all of the states |φi,i〉A′B′E are purifications of the samestate σE, so that there exists a unitary Ui

A′B′ relating each |φi,i〉A′B′E to a fixedpurification |φσ〉A′B′E of σ:

eiϕi |φi,i〉A′B′E = UiA′B′ |φσ〉A′B′E. (10.1.19)

841

Page 853: Principles of Quantum Communication Theory: A Modern Approach

Chapter 10: Secret Key Distillation

Thus, we can write the global state as

1√K

K

∑i=1|i〉A|i〉BUi,i

A′B′ |φσ〉A′B′E, (10.1.20)

which is equivalent to(

∑i,j|i〉〈i|A ⊗ |j〉〈j|B ⊗Ui,j

A′B′

)|Φ〉AB ⊗ |φσ〉A′B′E, (10.1.21)

after setting Ui,jA′B′ = Ui

A′B′ for all j ∈ 1, . . . , K (there is in fact full freedom

in how the unitary Ui,jA′B′ is chosen for i 6= j). One can now deduce that the

reduced state on systems ABA′B′ has the form in (10.1.4).

Definition 10.4 ε-Approximate Tripartite Key State

Fix ε ∈ [0, 1]. A state ρABE is an ε-approximate tripartite key state if thereexists a tripartite key state γABE, as in Definition 10.1, such that

F(ρABE, γABE) ≥ 1− ε. (10.1.22)

Similarly, a state ρABA′B′ is an ε-approximate bipartite private state ifthere exists a bipartite private state γABA′B′ , as in Definition 10.2, suchthat

F(ρABA′B′ , γABA′B′) ≥ 1− ε. (10.1.23)

Approximate tripartite key states are in one-to-one correspondence withapproximate bipartite private states, as summarized below:

Proposition 10.5

If ρABA′B′ is an ε-approximate bipartite key state with K key values, thenAlice and Bob hold an ε-approximate tripartite key state with K key val-ues. The converse statement is true as well.

PROOF: Prove it using Uhlmann’s theorem.

842

Page 854: Principles of Quantum Communication Theory: A Modern Approach

Chapter 10: Secret Key Distillation

10.2 Privacy test and the ε-relative entropy of en-tanglement of bipartite private states

Here we define a “privacy test” as a method for testing whether a given bi-partite state is private. It forms an essential component in Proposition 10.10,which states that the ε-relative entropy of entanglement is an upper bound onthe number of private bits in an ε-approximate bipartite private state.

Definition 10.6 Privacy test

Let γABA′B′ be a bipartite private state as given in Definition 10.2. Aprivacy test corresponding to γABA′B′ (a γ-privacy test) is defined as thefollowing dichotomic measurement:

ΠABA′B′ , IABA′B′ −ΠABA′B′ , (10.2.1)

whereΠABA′B′ := UABA′B′ (ΦAB ⊗ IA′B′)U†

ABA′B′ (10.2.2)

and UABA′B′ is the twisting unitary specified in (10.1.6).

If one has access to the systems ABA′B′ of a bipartite state ρABA′B′ andhas a description of γABA′B′ satisfying (10.1.23), then the γ-privacy test de-cides whether ρABA′B′ is a private state with respect to γABA′B′ . The firstoutcome corresponds to the decision “yes, it is a γ-private state,” and thesecond outcome corresponds to “no.” Physically, this test is just untwistingthe purported private state and projecting onto a maximally entangled state.The following lemma states that the probability for an ε-approximate bipartiteprivate state to pass the γ-privacy test is high:

Lemma 10.7

Let ε ∈ [0, 1] and let ρABA′B′ be an ε-approximate private state as givenin Definition 10.4, with γABA′B′ satisfying (10.1.23). The probability forρABA′B′ to pass the γ-privacy test is never smaller than 1− ε:

Tr[ΠABA′B′ρABA′B′ ] ≥ 1− ε, (10.2.3)

where ΠABA′B′ is defined as above.

843

Page 855: Principles of Quantum Communication Theory: A Modern Approach

Chapter 10: Secret Key Distillation

PROOF: One can see this bound explicitly by inspecting the following steps:

Tr[ΠABA′B′ρABA′B′ ] = 〈Φ|AB TrA′B′ [U†ABA′B′ρABA′B′UABA′B′ ]|Φ〉AB (10.2.4)

= F(ΦAB, TrA′B′ [U†ABA′B′ρABA′B′UABA′B′ ]) (10.2.5)

≥ F(ΦAB ⊗ θA′B′ , U†ABA′B′ρABA′B′UABA′B′) (10.2.6)

= F(UABA′B′(ΦAB ⊗ θA′B′)U†ABA′B′ , ρABA′B′) (10.2.7)

= F(γABA′B′ , ρABA′B′) (10.2.8)≥ 1− ε. (10.2.9)

The steps follow as a consequence of several properties of the fidelity recalledin Section .... .

On the other hand, a separable state σABA′B′ ∈ S(AA′ : BB′) of the key andshield systems has a small chance of passing any γ-privacy test:

Lemma 10.8

For a separable state σABA′B′ ∈ S(AA′ : BB′), the probability of passingany γ-privacy test is never larger than 1/K:

Tr[ΠABA′B′σABA′B′ ] ≤1K

, (10.2.10)

where K is the number of values that the secret key can take (i.e., K =dim(HA) = dim(HB)).

PROOF: The idea is to begin by establishing the bound for any pure productstate |φ〉AA′ ⊗ |ϕ〉BB′ , i.e., to show that

Tr[ΠABA′B′ |φ〉〈φ|AA′ ⊗ |ϕ〉〈ϕ|BB′ ] ≤1K

. (10.2.11)

We can expand these states with respect to the standard bases of A and B asfollows:

|φ〉AA′ ⊗ |ϕ〉BB′ =

[K

∑i=1

αi|i〉A ⊗ |φi〉A′]⊗[

K

∑j=1

β j|j〉B ⊗ |ϕj〉B′]

, (10.2.12)

where ∑Ki=1 |αi|2 = ∑K

j=1∣∣β j∣∣2 = 1. We then find that

Tr[ΠABA′B′ |φ〉〈φ|AA′ ⊗ |ϕ〉〈ϕ|BB′ ]

844

Page 856: Principles of Quantum Communication Theory: A Modern Approach

Chapter 10: Secret Key Distillation

= Tr[UABA′B′ (ΦAB ⊗ IA′B′)U†ABA′B′ |φ〉〈φ|AA′ ⊗ |ϕ〉〈ϕ|BB′ ] (10.2.13)

=∥∥∥(〈Φ|AB ⊗ IA′B′)U†

ABA′B′ |φ〉AA′ ⊗ |ϕ〉BB′∥∥∥

2

2(10.2.14)

=

∥∥∥∥∥∥1√K

(K

∑i=1〈i|A ⊗ 〈i|B ⊗Uii†

A′B′

)

K

∑i′,j′=1

αi′β j′ |i′〉A ⊗ |j′〉B ⊗ |φi′〉A′ |ϕj′〉B′∥∥∥∥∥∥

2

2(10.2.15)

=1K

∥∥∥∥∥∥

K

∑i,i′,j′=1

αi′β j′〈i|i′〉A ⊗ 〈i|j′〉B ⊗Uii†A′B′ |φi′〉A′ |ϕj′〉B′

∥∥∥∥∥∥

2

2

(10.2.16)

=1K

∥∥∥∥∥K

∑i=1

αiβiUii†A′B′ |φi〉A′ |ϕi〉B′

∥∥∥∥∥

2

2

(10.2.17)

=1K

∥∥∥∥∥K

∑i=1

αiβi|ξi〉A′B′∥∥∥∥∥

2

2

(10.2.18)

=1K

K

∑i,j=1

αiβiα∗j β∗j 〈ξ j|ξi〉A′B′ . (10.2.19)

where |ξi〉A′B′ := (UiiA′B′)

†|φi〉A′ |ϕi〉B′ is a quantum state. The desired boundin (10.2.11) is then equivalent to

K

∑i,j=1

αiβiα∗j β∗j 〈ξ j|ξi〉A′B′ ≤ 1. (10.2.20)

Setting αi =√

pieiθi and βi =√

qieiηi , we find that

K

∑i,j=1

αiβiα∗j β∗j 〈ξ j|ξi〉A′B′ =

∣∣∣∣∣K

∑i,j=1

√piqi pjqje

i(θi+ηi−θj−ηj)〈ξ j|ξi〉A′B′∣∣∣∣∣ (10.2.21)

≤K

∑i,j=1

√piqi pjqj

∣∣〈ξ j|ξi〉A′B′∣∣ (10.2.22)

≤K

∑i,j=1

√piqi pjqj (10.2.23)

=

[K

∑i=1

√piqi

]2

≤ 1, (10.2.24)

845

Page 857: Principles of Quantum Communication Theory: A Modern Approach

Chapter 10: Secret Key Distillation

where the last inequality holds for all probability distributions (this is just thestatement that the classical fidelity cannot exceed one). The above reasoningthus establishes (10.2.10) for pure product states, and the bound for generalseparable states follows because every such state can be written as a convexcombination of pure product states.

Definition 10.9 ε-Relative Entropy of Entanglement

The ε-relative entropy of entanglement of a bipartite state ρAB is definedas

EεR(A; B)ρ := inf

σAB∈S(A:B)Dε

H(ρAB‖σAB). (10.2.25)

This quantity is an LOCC monotone, meaning that

EεR(A; B)ρ ≥ Eε

R(A′; B′)ω, (10.2.26)

for ωA′B′ := ΛAB→A′B′(ρAB), with ΛAB→A′B′ an LOCC channel. This followsbecause the underlying quantity Dε

H is monotone non-increasing with respectto quantum channels and the set of separable states is closed under LOCCchannels.

Proposition 10.10

Fix ε ∈ [0, 1]. Let ρABA′B′ be an ε-approximate bipartite private state, asgiven in Definition 10.4. Then the number log2 K of private bits in sucha state is bounded from above by the ε-relative entropy of entanglementof ρABA′B′ :

EεR(AA′; BB′)ρ ≥ log2 K. (10.2.27)

PROOF: Let σABA′B′ be an arbitrary separable state in S(AA′ : BB′). FromDefinition 10.4 and Lemma 10.7, we conclude that the γ-privacy test ΠABA′B′from (10.2.2) is a particular measurement operator satisfying the constraintTr[ΠABA′B′ρABA′B′ ] ≥ 1− ε for βε(ρABA′B′‖σABA′B′). Applying Lemma 10.8and the definition of βε, we conclude that

βε(ρABA′B′‖σABA′B′) ≤ Tr[ΠABA′B′σABA′B′ ] ≤1K

. (10.2.28)

Since the inequality holds for all separable states σABA′B′ ∈ S(AA′ : BB′), we

846

Page 858: Principles of Quantum Communication Theory: A Modern Approach

Chapter 10: Secret Key Distillation

conclude that

supσABA′B′∈S(AA′:BB′)

βε(ρABA′B′‖σABA′B′) ≤1K

. (10.2.29)

Applying a negative logarithm and Definition 10.9, we arrive at the inequalityin (10.2.27).

847

Page 859: Principles of Quantum Communication Theory: A Modern Approach

Chapter 11

Private Communication

[IN PROGRESS]

848

Page 860: Principles of Quantum Communication Theory: A Modern Approach

Part III

Feedback-Assisted QuantumCommunication Protocols

We now delve into interactive quantum communication protocols.Such protocols involve interaction between the sender and receiverof a quantum channel, beyond the quantum channel that connectsthem, and this interaction can potentially increase a given commu-nication capacity because it represents an additional resource thatthe sender and receiver have at their disposal. Such protocols arericher than the non-interactive protocols that we considered previ-ously, and as such, their analysis is more involved.

One objective of the following chapters is to understand what rolethis interaction plays and whether it can increase capacity. For themost part, what we accomplish is the establishment of limitationson the ability of feedback to increase capacity. In some cases, suchas the case presented in the first chapter, a surprising conclusion isthat interaction does not increase capacity at all, so that the theorysimplifies.

Page 861: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12

Quantum-Feedback-AssistedCommunication

In this chapter, we begin our foray into interactive quantum communicationby analyzing communication protocols in which the goal is for the senderto communicate a classical message to the receiver, with the assistance of afree noiseless quantum feedback channel. By a quantum feedback channel,we mean a quantum channel from the receiver to the sender that is separatefrom the channel from the sender to the receiver being used to communicatethe message. We thus call this communication scenario “quantum-feedback-assisted communication.”

One simple (yet effective) way to make use of this free noiseless quantumfeedback channel is for the receiver to transmit one share of a bipartite quan-tum state to the sender. By doing so, they can establish shared entanglement,and the rates of classical communication that are achievable with such a strat-egy are given by the limits on entanglement-assisted communication that westudied previously in Chapter 6.

Perhaps surprisingly, we show here that the same non-asymptotic con-verse bounds established in (6.2.61) and (6.2.92) apply to protocols assistedby noiseless quantum feedback. These non-asymptotic converse bounds im-ply that the quantum-feedback-assisted classical capacity of a channel is nolarger than its entanglement-assisted capacity. Furthermore, the strong con-verse property holds for the quantum-feedback-assisted capacity, so that thestrong converse capacity is equal to the mutual information of a quantumchannel.

850

Page 862: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

E0

N N N

D1

E1 En− 1

D2

M 3 m

A1

A′1

A2

A′2 · · ·An

A′n

ΨF0B′0AliceBob

F0

B′0B1 F1

F1

B′1B2 F2

. ..

. ..

B′2 · · ·

A′n−1

Fn−1

BnB′n−1 m

FIGURE 12.1: A general quantum-feedback-assisted communication pro-tocol for the channel N, which uses it n times.

This result demonstrates that the entanglement-assisted capacity of a quan-tum channel is a rather robust communication capacity. Not only is the mu-tual information of a channel equal to the strong converse entanglement-assisted capacity for all channels, but it is also equal to the strong conversequantum feedback-assisted capacity for all channels. Thus, the theory ofentanglement-assisted and quantum-feedback-assisted communication sim-plifies immensely.

It is worth remarking that Shannon proved that a similar result holds forclassical channels, and the strong converse property was later demonstratedas well. In this sense, the entanglement-assisted capacity of a quantum chan-nel represents the fully quantum generalization of the classical capacity of aclassical channel. Related, the quantum mutual information of a quantumchannel represents the fully quantum generalization of the classical mutualinformation of a classical channel.

12.1 n-Shot Quantum Feedback-Assisted Commu-nication Protocols

We begin by defining the most general form for an n-shot classical commu-nication protocol assisted by a noiseless quantum feedback channel, wheren ∈ N. Such a protocol is depicted in Figure 12.1, and it is defined by thefollowing elements:

(M, ΨF0B′0,E0

M′F0→A′1 A1, Ei

A′iFi→A′i+1 Ai+1,Di

BiB′i−1→FiB′in−1

i=1 ,DnBnB′n−1→M),

(12.1.1)

851

Page 863: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

where M is the message set, ΨF0B′0denotes a bipartite quantum state, the ob-

jects denoted by E are encoding channels, and the objects denoted by D aredecoding channels. Let C denote all of these elements, which together consti-tute the quantum-feedback-assisted code. The quantum systems labeled by Frepresent the feedback systems that Bob sends back to Alice. The primed sys-tems A′i and B′i represent local quantum memory or “scratch” registers thatAlice and Bob can exploit in the feedback-assisted protocol.

In such an n-round feedback-assisted protocol, the protocol proceeds asfollows: let p : M → [0, 1] be a probability distribution over the message set.Alice starts by preparing two classical registers M and M′ in the followingstate:

ΦpMM′ := ∑

m∈Mp(m)|m〉〈m|M ⊗ |m〉〈m|M′ . (12.1.2)

Furthermore, Alice and Bob also initially share a quantum state ΨF0B′0on Al-

ice’s system F0 and Bob’s system B′0. This state is prepared by Bob locally, andthen he transmits the system F0 to Alice via the noiseless quantum feedbackchannel. The initial global state shared between them is

ΦpMM′ ⊗ΨF0B′0

. (12.1.3)

Alice then sends the M′ and F0 registers through the first encoding channelE0

M′F0→A′1 A1. This encoding channel realizes a set E0,m

F0→A′1 A1m∈M of quantum

channels as follows:

E0,mF0→A′1 A1

(τF0) := E0M′F0→A′1 A1

(|m〉〈m|M′ ⊗ τF0), (12.1.4)

for all input states τF0 . The global state after the first encoding channel is thenas follows:

E0M′F0→A′1 A1

(ΦpMM′ ⊗ΨF0B′0

). (12.1.5)

Note that the scratch system A′1 can contain a classical copy of the particularmessage m that is being communicated, and the same is true for all of the laterscratch systems A′i, for i ∈ 2, . . . , n. In fact, this is necessary in order for thecommunication protocol to be effective. Alice then transmits the A1 systemthrough the channel NA1→B1 , leading to the state

ρ1MA′1B1B′0

:= (NA1→B1 E0M′F0→A′1 A1

)(ΦpMM′ ⊗ΨF0B′0

). (12.1.6)

852

Page 864: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

After receiving the B1 system, Bob performs the decoding channel D1B1B′0→F1B′1

,such that the state is then

D1B1B′0→F1B′1

(ρ1MA′1B1B′0

), (12.1.7)

with it being understood that the system B′1 is Bob’s new scratch register andthe feedback system F1 gets sent over the noiseless quantum feedback channelback to Alice.

In the next round, Alice processes the A′1F1 systems with the encodingchannel E1

A′1F1→A′2 A2, and she sends system A2 over the channel NA2→B2 , lead-

ing to the state

ρ2MA′2B2B′1

:= (NA2→B2 E1A′1F1→A′2 A2

D1B1B′0→F1B′1

)(ρ1MA′1B1B′0

). (12.1.8)

Bob then applies the second decoding channel D2B2B′1→F2B′2

. This process theniterates n − 2 more times, and the state after each use of the channel is asfollows:

ρiMA′iBiB′i−1

:=

(NAi→Bi Ei−1A′i−1Fi−1→A′i Ai

Di−1Bi−1B′i−2→Fi−1B′i−1

)(ρi−1MA′i−1Bi−1B′i−2

), (12.1.9)

for i ∈ 3, . . . , n.In the final round, Bob performs the decoding channel Dn

BnB′n−1→M, which

is a quantum-to-classical channel that finally decodes the transmitted mes-sage. The final classical-classical state of the protocol is then as follows:

ωpMM

:= DnBnB′n−1→M(TrA′n [ρ

nMA′nBnB′n−1

]). (12.1.10)

Now, just as we did in Chapter 6 in the case of entanglement-assisted clas-sical communication, we can define the message error probability, averageerror probability, and maximal error probability as in (6.1.13), (6.1.14), and(6.1.15), respectively. Using the alternative expression in (6.1.24) for the av-erage error probability, we have that the error probability for the quantum-feedback-assisted code C is given by

perr(C; p) =12

∥∥∥ΦpMM′ −ω

pMM

∥∥∥1

, (12.1.11)

853

Page 865: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

Using the alternative expression in (6.1.36) for the maximal error probability,we have that the maximal error probability of the quantum-feedback-assistedcode C is given by

p∗err(C) = maxp:M→[0,1]

12

∥∥∥ΦpMM′ −ω

pMM

∥∥∥1

. (12.1.12)

Definition 12.1 (n, |M|, ε) Quantum-Feedback-Assisted ClassicalCommunication Protocol

Let (M, ΨF0B′0,E0

M′F0→A′1 A1, Ei

A′iFi→A′i+1 Ai+1,Di

BiB′i−1→FiB′in−1

i=1 ,DnBnB′n−1→M

)

be the elements of an n-shot quantum-feedback-assisted classical com-munication protocol over the channel NA→B. The protocol is called an(n, |M|, ε) protocol, with ε ∈ [0, 1], if p∗err(C) ≤ ε.

12.1.1 Protocol over a Useless Channel

As before, when determining converse bounds on the rate at which classicalmessages can be communicated reliably using such feedback-assisted proto-cols, it is helpful to consider a useless channel. Our plan is again to use rel-ative entropy (or some generalized divergence) to compare the states at eachtime step of the actual protocol with those resulting from employing a use-less channel instead of the actual channel. As before, a useless channel thatconveys no information at all is one in which the input state is discarded andreplaced with some state at the output:

RA→B := PσB TrA, (12.1.13)

where PσB denotes a preparation channel that prepares the arbitrary (butfixed) state σB at the output.

We can modify the ith step of the protocol discussed in the previous sec-tion, such that instead of the actual channel NAi→Bi being applied, the replace-ment channel RAi→Bi is applied; see Figure 12.2.

The state after the first round in this protocol over the useless channel is

τ1MA′1B1B′0

:= (RA1→B1 E0M′F0→A′1 A1

)(ΦpMM′ ⊗ΨF0B′0

) (12.1.14)

854

Page 866: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

E0

PσB PσB PσB

D1

E1 En− 1

D2

M 3 m

A1

A′1

A2

A′2 · · ·An

A′n

ΨF0B′0AliceBob

F0

B′0B1 F1

F1

B′1B2 F2

. ..

. ..

B′2 · · ·

A′n−1

Fn−1

BnB′n−1 m

FIGURE 12.2: Depiction of a protocol that is useless for quantum-feedback-assisted classical communication. In each round, the encodedstate is discarded and replaced with an arbitrary (but fixed) state σB.

= TrA1 [E0M′F0→A′1 A1

(ΦpMM′ ⊗ΨF0B′0

)]⊗ σB1 , (12.1.15)

where we observe that

τ1MA′1B1B′0

= τ1MA′1B′0

⊗ σB1 , (12.1.16)

and furthermore that

τ1MB1B′0

= TrA′1 A1[E0

M′F0→A′1 A1(Φp

MM′ ⊗ΨF0B′0)]⊗ σB1 (12.1.17)

= TrM′F0[Φp

MM′ ⊗ΨF0B′0]⊗ σB1 (12.1.18)

= πpM ⊗ΨB′0

⊗ σB1 , (12.1.19)

where the second equality holds due to the fact that the first encoding channelE0

M′F0→A′1 A1is trace preserving, and where

πpM = ∑

m∈Mp(m)|m〉〈m|M. (12.1.20)

Thus, there is no correlation whatsover between the message system M andBob’s systems B1B′0 after tracing over all of Alice’s systems. Intuitively, thisis a consequence of the fact that the “communication line has been cut” whenemploying the replacement channel.

The state after the second replacement channel RA2→B2 is then given by

τ2MA′2B2B′1

:= (RA2→B2 E1A′1F1→A′2 A2

D1B1B′0→F1B′1

)(τ1MA′1B1B′0

) (12.1.21)

855

Page 867: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

= TrA2 [(E1A′1F1→A′2 A2

D1B1B′0→F1B′1

)(τ1MA′1B1B′0

)]⊗ σB2 (12.1.22)

= TrA2 [(E1A′1F1→A′2 A2

D1B1B′0→F1B′1

)(τ1MA′1B′0

⊗ σB1)]⊗ σB2 , (12.1.23)

where we used (12.1.16) to obtain the last line. If we take the partial traceover system A′2, then the fact that the encoding channel E1

A′1F1→A′2 A2is trace

preserving implies that

τ2MB2B′1

= TrA′2 A2[(E1

A′1F1→A′2 A2D1

B1B′0→F1B′1)(τ1

MA′1B′0⊗ σB1)]⊗ σB2 (12.1.24)

= TrA′1F1[D1

B1B′0→F1B′1(τ1

MA′1B′0⊗ σB1)]⊗ σB2 (12.1.25)

= TrF1 [D1B1B′0→F1B′1

(τ1MB′0⊗ σB1)]⊗ σB2 . (12.1.26)

Then, using (12.1.19), which implies that τ1MB′0

= πpM ⊗ΨB′0

, we find that

τ2MB2B′1

= TrF1 [D1B1B′0→F1B′1

(πpM ⊗ΨB′0

⊗ σB1)]⊗ σB2 (12.1.27)

= πpM ⊗ TrF1 [D

1B1B′0→F1B′1

(ΨB′0⊗ σB1)]⊗ σB2 (12.1.28)

= πpM ⊗ τ2

B′1⊗ σB2 . (12.1.29)

Thus, we find again that there is no correlation whatsoever between the mes-sage system M and Bob’s systems B2B′1 after tracing over all of Alice’s sys-tems.

The states for the other rounds i ∈ 3, . . . , n are given by

τiMA′iBiB′i−1

:= (RAi→Bi Ei−1A′i−1Fi−1→A′i Ai

Di−1Bi−1B′i−2→Fi−1B′i−1

)(τi−1MA′i−1Bi−1B′i−2

) (12.1.30)

= TrAi [Ei−1A′i−1Fi−1→A′i Ai

Di−1Bi−1B′i−2→Fi−1B′i−1

)(τi−1MA′i−1Bi−1B′i−2

)]⊗ σBi (12.1.31)

= TrAi [Ei−1A′i−1Fi−1→A′i Ai

Di−1Bi−1B′i−2→Fi−1B′i−1

)(τi−1MA′i−1B′i−2

⊗ σBi−1)]⊗ σBi .

(12.1.32)

Repeating a calculation similar to the above leads to a similar conclusion asabove:

τiMBiB′i−1

= πpM ⊗ τi

B′i−1⊗ σBi , (12.1.33)

for all i ∈ 3, . . . , n. That is, there is no correlation whatsoever between themessage system M and Bob’s systems BiB′i−1 after tracing over all of Alice’s

856

Page 868: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

systems. Again, this is intuitively a consequence of the fact that the “commu-nication line has been cut” when employing the replacement channel.

Bob’s final decoding channel DnBnB′n−1→M

therefore leads to the following

classical-classical state:

τMM := πM ⊗DnBnB′n−1→M(τ2

B′n−1⊗ σBn) = π

pM ⊗ τM, (12.1.34)

where τM := ∑m∈M t(m)|m〉〈m|M for some probability distribution t : M →[0, 1], which corresponds to Bob’s measurement.

12.1.2 Upper Bound on the Number of Transmitted Bits

We now give a general upper bound on the number transmitted bits in anyquantum-feedback-assisted classical communication protocol. This result isstated in Theorem 12.2, and it holds independently of the encoding and de-coding channels used in the protocol and depends only on the given commu-nication channel N. Recall from the previous section that log2 |M| representsthe number of bits that are transmitted over the channel N.

Theorem 12.2 n-Shot Upper Bounds for Quantum-Feedback-Assisted Classical Communication

Let NA→B be a quantum channel, and let ε ∈ [0, 1). For all (n, |M|, ε)quantum-feedback-assisted classical communication protocols over thechannel NA→B, the following bounds hold,

log2 |M|n

≤ 11− ε

(I(N) +

1n

h2(ε)

), (12.1.35)

log2 |M|n

≤ Iα(N) +α

n(α− 1)log2

(1

1− ε

)∀ α > 1, (12.1.36)

where I(N) is the mutual information of N, as defined in (4.11.102),and Iα(N) is the sandwiched Rényi mutual information of N, as definedin (4.11.91).

PROOF: Let us start with an arbitrary (n, |M|, ε) quantum-feedback-assistedclassical communication protocol over a channel NA→B, corresponding to, asdescribed earlier, a message set M, a shared quantum state ΨF0B′0

, the encod-

857

Page 869: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

ing channels E0M′F0→A′1 A1

and EiA′iFi→A′i+1 Ai+1

n−1i=1 , and the decoding channels

DiBiB′i−1→FiB′i

n−1i=1 and Dn

BnB′n−1→M. Recall that we refer to all of these objects

collectively as the code C. The error criterion p∗err(C) ≤ ε holds by the defini-tion of an (n, |M|, ε) protocol, which implies that for all probability distribu-tions p : M→ [0, 1] on the message set M:

perr(C; p) ≤ p∗err(C) ≤ ε. (12.1.37)

(The reasoning for this is analogous to that in (6.1.62)–(6.1.65).) In particular,the above inequality holds with p being the uniform distribution on M, sothat p(m) = 1

|M| for all m ∈M.

Now, let ΦMM be the state defined in (12.1.2) with p the uniform distri-bution, and similarly let ωMM, defined in (12.1.10), be the state at the end ofthe protocol such that p is the uniform prior probability distribution. Observethat Tr[ωMM] = πM. Also, letting

ΠMM = ∑m∈M|m〉〈m|M ⊗ |m〉〈m|M (12.1.38)

be the projection defining the comparator test, as in (6.1.37), observe that

1− Tr[ΠMMωMM] =12

∥∥ΦMM −ωMM

∥∥1 ≤ ε, (12.1.39)

where the first equality follows by combining (6.1.24) with (6.1.41). Thismeans that

Tr[ΠMMωMM] ≥ 1− ε. (12.1.40)

We thus have all of the ingredients to apply Lemma 6.4. Doing so gives thefollowing critical first bound:

log2 |M| ≤ IεH(M; M)ω. (12.1.41)

Invoking Proposition 4.67, the definition of IεH(M; M) from (4.11.88), and

the expression for mutual information from (4.2.96), we find that

IεH(M; M)ω ≤

11− ε

(I(M; M)ω + h2(ε)

). (12.1.42)

Now, using the monotonicity of the mutual information (see Proposition 4.17)with respect to the last decoding channel, Dn

BnB′n−1→M, we find that

I(M; M)ω ≤ I(M; BnB′n−1)ρn . (12.1.43)

858

Page 870: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

Then, using the chain rule for mutual information in (4.2.109), we obtain

I(M; BnB′n−1)ρn = I(M; Bn|B′n−1)ρn + I(M; B′n−1)ρn (12.1.44)

≤ I(MB′n−1; Bn)ρn + I(M; B′n−1)ρn , (12.1.45)

where the second line is a consequence of the chain rule, as well as non-negativity of mutual information:

I(M; Bn|B′n−1)ρn = I(MB′n−1; Bn)ρn − I(B′n−1; Bn)ρn (12.1.46)

≤ I(MB′n−1; Bn)ρn . (12.1.47)

Finally, observe that the state ρnMBnB′n−1

has the following form:

ρnMBnB′n−1

= NAn→Bn(ζnMB′n−1 An

), (12.1.48)

where

ζnMB′n−1 An

:=

TrA′n [(En−1A′n−1Fn−1→A′n An

Dn−1Bn−1B′n−2→Fn−1B′n−1

)(ρn−1MA′n−1Bn−1B′n−2

)]. (12.1.49)

That is, the state ζnMB′n−1 An

is a particular state to consider in the optimization

of the mutual information of a channel (with the channel input system beingAn and the external correlated systems being MB′n−1), whereas the definitionof the mutual information of a channel involves an optimization over all suchstates. This means that

I(MB′n−1; Bn)ρn ≤ I(N). (12.1.50)

Putting together (12.1.43), (12.1.45), and (12.1.50), we find that

I(M; M)ω ≤ I(N) + I(M; B′n−1)ρn . (12.1.51)

The quantity I(M; B′n−1)ρn can be bounded using steps analogous to theabove. In particular, using the monotonicity of the mutual information withrespect to the second-to-last decoding channel Dn−1

Bn−1B′n−2→Fn−1B′n−1, then em-

ploying the same steps as above, we conclude that

I(M; B′n−1)ρn ≤ I(M; Bn−1B′n−2)ρn−1 (12.1.52)

= I(M; Bn−1|B′n−2)ρn−1 + I(M; B′n−2)ρn−1 (12.1.53)

859

Page 871: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

≤ I(MB′n−2; Bn−1)ρn−1 + I(M; B′n−2)ρn−1 (12.1.54)

≤ I(N) + I(M; B′n−2)ρn−1 , (12.1.55)

Overall, this leads to

I(M; M)ω ≤ 2I(N) + I(M; B′n−2)ρn−1 . (12.1.56)

Then, bounding I(M; B′n−2) in the same manner as above, and continuing thisprocess n− 3 more times such that we completely “unwind” the protocol, weobtain

I(M; M)ω ≤ 2I(N) + I(M; B′n−1)ρn−1 (12.1.57)

≤ 3I(N) + I(M; B′n−3)ρn−2 (12.1.58)... (12.1.59)≤ nI(N) + I(M; B′0)ρ1 . (12.1.60)

However, from (12.1.6), we have that ρ1MB′0

= ΦM ⊗ ΨB′0, which means that

I(M; B′0)ρ1 = 0. Therefore, putting together (12.1.41), (12.1.42), and (12.1.57)–(12.1.60), we get

log2 |M| ≤ IεH(M; M)ω (12.1.61)

≤ 11− ε

(I(M; M)ω + h2(ε)

)(12.1.62)

≤ 11− ε

(nI(N) + h2(ε)) , (12.1.63)

and the last line is equivalent to (12.1.35), as required.

We now establish the bound in (12.1.36). Combining (12.1.41) with Propo-sition 4.68, we conclude that the following bound holds for all α > 1:

log2 |M| ≤ Iα(M; M)ω +α

α− 1log2

(1

1− ε

). (12.1.64)

Recall that the sandwiched Rényi mutual information Iα(M; M)ω is definedas

Iα(M; M)ω = infξM

Dα(ωMM‖ωM ⊗ ξM) (12.1.65)

= infξM

Dα(ωMM‖πM ⊗ ξM). (12.1.66)

860

Page 872: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

Our goal now is to compare the actual protocol with one that results fromemploying a useless, replacement channel. To this end, let R

σBA→B be the

replacement channel defined in (12.1.13), with σB an arbitrary (but fixed)state. Then as discussed in Section 12.1.1 (in particular, in (12.1.34)), the fi-nal state of the protocol conduced with the replacement channel is given byτMM = πM ⊗ τM. Then, we find that

Iα(M; M)ω = infξM

Dα(ωMM‖πM ⊗ ξM) (12.1.67)

≤ Dα(ωMM‖πM ⊗ τM) (12.1.68)

= Dα(ωMM‖τMM). (12.1.69)

We now proceed with a similar method considered in the proof of the boundin (12.1.35), but using the sandwiched Rényi relative entropy as our main toolfor analysis. By the monotonicity of the sandwiched Rényi mutual informa-tion with respect to the last decoding channel, and using (12.1.33), we findthat

Dα(ωMM‖τMM) ≤ Dα(ρnMBnB′n−1

‖τnMBnB′n−1

) (12.1.70)

= Dα(ρnMBnB′n−1

‖πM ⊗ τnB′n−1⊗ σBn) (12.1.71)

α− 1log2 Qα(ρ

nMBnB′n−1

‖πM ⊗ τnB′n−1⊗ σBn)

1α , (12.1.72)

where in the last line we used the definition in (4.5.2) of the sandwiched Rényirelative entropy. Now, recalling that ρn

MBnB′n−1= NAn→Bn(ζ

nMB′n−1 An

) with the

state ζ(n)MB′n−1 An

defined in (12.1.49), and defining the positive semi-definiteoperator

X(α)MB′n−1 An

:=(

πM ⊗ τnB′n−1

) 1−α2α

ζnMB′n−1 An

(πM ⊗ τn

B′n−1

) 1−α2α , (12.1.73)

as well as the completely positive map

S(α)σB (·) := σ

1−α2α

Bn(·)σ

1−α2α

Bn, (12.1.74)

we use the definition of Qα in (4.5.3) to obtain

Qα(ρnMBnB′n−1

‖πM ⊗ τnB′n−1⊗ σBn)

=

∥∥∥∥(

πM ⊗ τnB′n−1⊗ σBn

) 1−α2α

ρnMBnB′n−1

(πM ⊗ τn

B′n−1⊗ σBn

) 1−α2α

∥∥∥∥α

(12.1.75)

861

Page 873: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

=

∥∥∥∥S(α)σB

((πM ⊗ τn

B′n−1

) 1−α2α

NAn→Bn(ζnMB′n−1 An

)(

πM ⊗ τnB′n−1

) 1−α2α

)∥∥∥∥α

(12.1.76)

=

∥∥∥∥(S(α)σB NAn→Bn)(X(α)

MB′n−1 An)

∥∥∥∥α

. (12.1.77)

Multiplying and dividing by∥∥∥∥X(α)

MB′n−1

∥∥∥∥α

leads to

∥∥∥∥(S(α)σB NAn→Bn)(X(α)

MB′n−1 An)

∥∥∥∥α

=

∥∥∥∥(S(α)σB NAn→Bn)(X(α)

MB′n−1 An)

∥∥∥∥α∥∥∥∥X(α)

MB′n−1

∥∥∥∥α

∥∥∥∥X(α)MB′n−1

∥∥∥∥α

(12.1.78)

=

∥∥∥∥(S(α)σB NAn→Bn)(X(α)

MB′n−1 An)

∥∥∥∥α∥∥∥∥X(α)

MB′n−1

∥∥∥∥α

×

∥∥∥∥(

π1−α2α

M ⊗(

τnB′n−1

) 1−α2α

)ζn

MB′n−1

1−α2α

M ⊗(

τnB′n−1

) 1−α2α

)∥∥∥∥α

(12.1.79)

=

∥∥∥∥(S(α)σB NAn→Bn)(X(α)

MB′n−1 An)

∥∥∥∥α∥∥∥∥X(α)

MB′n−1

∥∥∥∥α

· Qα(ζnMB′n−1

‖πM ⊗ τnB′n−1

)1α (12.1.80)

≤ supYMB′n−1 An

≥0

∥∥∥(S(α)σB NAn→Bn)(YMB′n−1 An)∥∥∥

α∥∥∥YMB′n−1

∥∥∥α

· Qα(ζnMB′n−1

‖πM ⊗ τnB′n−1

)1α

(12.1.81)

=∥∥∥S(α)σB NAn→Bn

∥∥∥CB,1→α

· Qα(ζnMB′n−1

‖πM ⊗ τnB′n−1

)1α , (12.1.82)

where to obtain the inequality we performed an optimization with respect toall positive semi-definite operators YMB′n−1 An

. To obtain the last line, we haveused the norm ‖·‖CB,1→α defined in (6.2.68), which we show in Appendix 6.Ecan be written as

‖M‖CB,1→α = supYRA≥0

‖MA→B(YRA)‖α

‖TrA[YRA]‖α

(12.1.83)

862

Page 874: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

for any completely positive map M. Plugging (12.1.82) back in to (12.1.72),we conclude the following bound:

Dα(ρnMBnB′n−1

‖τnMBnB′n−1

)

≤ α

α− 1log2

∥∥∥S(α)σB NAn→Bn

∥∥∥CB,1→α

+ Dα(ζnMB′n−1

‖πM ⊗ τnB′n−1

). (12.1.84)

As in the proof of (12.1.35), we now iterate the above by successively bound-ing the sandwiched Rényi relative entropy terms Dα(ζ i

MB′i−1‖πM ⊗ τi

B′i−1) for

i ∈ 1, . . . , n. Starting with the term Dα(ζnMB′n−1

‖πM ⊗ τnB′n−1

), we use the

monotonicity of the sandwiched Rényi relative entropy under the second-to-last decoding channel Dn−1

Bn−1B′n−2→Fn−1B′n−1, then apply the same reasoning as

in (12.1.75)–(12.1.82) to obtain

Dα(ζnMB′n−1

‖πM ⊗ τnB′n−1

)

≤ Dα(ρn−1MBn−1B′n−2

‖πM ⊗ τn−1Bn−1B′n−2

) (12.1.85)

= Dα(ρn−1MBn−1B′n−2

‖πM ⊗ τn−1B′n−2⊗ σB) (12.1.86)

≤ α

α− 1log2

∥∥∥S(α)σB NAn−1→Bn−1

∥∥∥CB,1→α

+ Dα(ζn−1MB′n−2

‖πM ⊗ τn−1B′n−2

).

(12.1.87)

Iterating this reasoning n− 2 more times, we end up with the following bound:

Dα(ωMM‖τMM)

≤ nα

α− 1log2

∥∥∥S(α)σB NA→B

∥∥∥CB,1→α

+ Dα(ρ1MB′0‖πM ⊗ΨB′0

)

= nα

α− 1log2

∥∥∥S(α)σB NA→B

∥∥∥CB,1→α

, (12.1.88)

where the equality holds because ρ1MB′0

= πM⊗ΨB′0. Putting together (12.1.64),

(12.1.69), and (12.1.88), we finally obtain

log2 |M| ≤

α− 1log2

∥∥∥S(α)σB NA→B

∥∥∥CB,1→α

α− 1log2

(1

1− ε

). (12.1.89)

Since we proved that this bound holds for any choice of the state σB, we con-clude that

log2 |M|863

Page 875: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

≤ nα

α− 1infσB

log2

∥∥∥S(α)σB NA→B

∥∥∥CB,1→α

α− 1log2

(1

1− ε

)(12.1.90)

= nIα(N) +α

α− 1log2

(1

1− ε

), (12.1.91)

where the last equality follows from Lemma 6.20, and it implies (12.1.36).

12.1.3 The Amortized Perspective

In this section, we revisit the proofs above for Theorem 12.2 that establishbounds on non-asymptotic quantum feedback-assisted capacity. In particular,we adopt a different perspective, which we call the amortized perspective andwhich turns out to be useful in establishing bounds for all kinds of feedback-assisted protocols other than the ones considered in this chapter.

For the case of quantum feedback-assisted protocols, the amortized per-spective consists of defining the amortized mutual information of a quantumchannel as the largest net difference of the output and input mutual informa-tion that can be realized by the channel, when Alice and Bob are allowed toshare an arbitrary state before the use of the channel. A key property of theamortized mutual information of a channel is that it is more readily seen tolead to an upper bound on the non-asymptotic quantum feedback-assistedcapacity. Furthermore, another key property is that the amortized mutualinformation of a channel collapses to the usual mutual information of a chan-nel, and so this leads to an alternative way of understanding the previous re-sults. Furthermore, as indicated above, this perspective becomes quite usefulin later chapters when we analyze LOCC-assisted quantum communicationprotocols and LOPC-assisted private communication protocols.

12.1.3.1 Quantum Mutual Information

We begin by defining the following key concept, the amortized mutual infor-mation of a quantum channel:

864

Page 876: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

Definition 12.3 Amortized Mutual Information of a Channel

The amortized mutual information of a quantum channel NA→B is defined as

IA(N) := supρA′AB′

[I(A′; BB′)ω − I(A′A; B′)ρ

], (12.1.92)

whereωA′BB′ := NA→B(ρA′AB′) (12.1.93)

and the optimization is over states ρA′AB′ .

Intuitively, the amortized mutual information is equal to the largest netmutual information that can be realized by the channel, if we allow Alice andBob to share an arbitrary state before communication begins. As mentionedabove, this concept turns out to be useful for understanding the feedback-assisted protocols presented previously.

We have the following simple relationship between mutual informationand amortized mutual information:

Lemma 12.4

The mutual information of any channel NA→B does not exceed its amor-tized mutual information:

I(N) ≤ IA(N). (12.1.94)

PROOF: Let us restrict the optimization in the definition of the amortized mu-tual information to states ρA′AB′ that have a trivial B′ system. This means thatρA′AB′ is of the form ρA′AB′ = ρA′A ⊗ |0〉〈0|B′ . Therefore, I(A′A; B′)ρ = 0 andI(A′; BB′)ω = I(A′; B)ω, where ωA′BB′ = NA→B(ρA′AB′), so that

IA(N) = supρA′AB′

[I(A′; BB′)ω − I(A′A; B)ω

](12.1.95)

≥ supρA′A

I(A′; B)ω (12.1.96)

= I(N), (12.1.97)

i.e., IA(N) ≥ I(N), as required.

An important question is whether the opposite inequality, i.e., IA ≤ I(N),holds, which would allow us to conclude that IA = I(N). In this case, we find

865

Page 877: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

that it does. Specifically, we have the following.

Proposition 12.5

Given an arbitrary quantum channel N, amortization does not increaseits mutual information:

I(N) = IA(N). (12.1.98)

PROOF: To see this, consider that for an arbitrary input state ρA′AB′ , we canuse the chain rule for mutual information in (4.2.109) twice to obtain

I(A′; BB′)ω − I(A′A; B′)ρ

= I(A′; B)ω + I(A′; B′|B)ω − I(A′A; B′)ρ (12.1.99)≤ I(A′; B)ω + I(A′B; B′)ω − I(A′A; B′)ρ. (12.1.100)

In particular, to obtain the third line, note that (4.2.109) implies

I(A′B; B′)ω = I(B; B′)ω + I(A′; B′|B)ω ≥ I(A′; B′|B)ω (12.1.101)

since I(B; B′)ω ≥ 0. Continuing, we apply the monotonicity of the mutual in-formation under the channel N, which implies that I(A′A; B)ρ ≥ I(A′B; B′)ω.Therefore,

I(A′; BB′)ω − I(A′A; B′)ρ (12.1.102)≤ I(A′; B)ω + I(A′B; B′)ω − I(A′B; B′)ω (12.1.103)= I(A′; B)ω (12.1.104)≤ I(N), (12.1.105)

where the last line follows because the state ωA′B = NA→B(ρA′A) has the formof states that we consider when performing the optimization in the definitionof the mutual information of a channel. Since the inequality

I(A′; BB′)ω − I(A′A; B′)ρ ≤ I(N) (12.1.106)

holds for an arbitrary input state ρA′AB′ , we conclude the bound in (12.1.98).

We note here that the equality in (12.1.98) is stronger than the additivityof mutual information shown in Chapter 6 (in particular, that shown in The-orem 6.19). Indeed, the equality in (12.1.98) actually implies the additivity re-lation discussed previously. To see this, consider that the equality in (12.1.98)implies that

I(A′; BB′)ω − I(A′A; B′)ρ ≤ I(N) (12.1.107)

866

Page 878: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

for an arbitrary input state ρA′AB′ , where ωA′BB′ = NA→B(ρA′AB′). Now letρA′AB′ = MA′′→B′(σA′AA′′) for some channel MA′′→B′ and some state σA′AA′′ .Then it follows that

ωA′BB′ = (NA→B ⊗MA′′→B′)(σA′AA′′), (12.1.108)

and applying (12.1.107), we have that

I(A′; BB′)ω ≤ I(N) + I(A′A; B′)ρ (12.1.109)≤ I(N) + sup

σA′AA′′I(A′A; B′)ρ (12.1.110)

= I(N) + I(M), (12.1.111)

where the inequality follows because the state σA′AA′′ is a particular state toconsider for the optimization in the definition of the mutual information ofthe channel MA′′→B′ . Since the inequality holds for all input states σA′AA′′ toNA→B ⊗MA′′→B′ , we conclude that

I(N⊗M) ≤ I(N) + I(M), (12.1.112)

which is the non-trivial inequality needed in the proof of the additivity of themutual information of a channel (see the proof of Theorem 6.19).

How is the amortized mutual information relevant for analyzing a feedback-assisted protocol? Consider that the bound in (12.1.43) involves the mutualinformation I(M; BnB′n−1)ρn , so that

I(M; BnB′n−1)ρn

= I(M; BnB′n−1)ρn − I(M; B′0)ρ1 (12.1.113)

= I(M; BnB′n−1)ρn − I(M; B′0)ρ1 +n−1

∑i=1

I(M; B′i)ρi − I(M; B′i)ρi (12.1.114)

≤ I(M; BnB′n−1)ρn − I(M; B′0)ρ1 +n−1

∑i=1

I(M; BiB′i−1)ρi − I(M; B′i)ρi (12.1.115)

=n

∑i=1

I(M; BiB′i−1)ρi − I(M; B′i−1)ρi (12.1.116)

≤ n · supρA′AB′

I(A′; BB′)ω − I(A′A; B′)ρ (12.1.117)

= n · IA(N) = n · I(N). (12.1.118)

867

Page 879: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

The first equality follows because the state ρ1MB′0

is a product state. The secondequality follows by adding and subtracting the mutual information of thestate of the message system M and Bob’s memory system B′i . The inequality isa consequence of data processing under the action of the decoding channels.The third equality follows from collecting terms. The final inequality followsbecause the state ρi

MBiB′i−1is a particular state to consider in the optimization

of the amortized mutual information, and the final equality follows from theamortization collapse in Proposition 12.5.

Thus, we observe that the bound in (12.1.35), at a fundamental level, is aconsequence of the amortization collapse from Proposition 12.5.

12.1.3.2 Sandwiched Rényi Mutual Information

We can also consider the concept of amortization for the sandwiched Rényimutual information, and in this subsection, we revisit the bound in (12.1.36)to understand it from this perspective.

Definition 12.6 Amortized Sandwiched Mutual Information

The amortized sandwiched Rényi mutual information of a quantumchannel NA→B is defined for α ∈ (0, 1) ∪ (1, ∞) as follows:

IAα (N) := supρA′AB′

[Iα(A′; BB′)ω − Iα(A′A; B′)ρ

], (12.1.119)

where ωA′BB′ := NA→B(ρA′AB′) and the optimization is over statesρA′AB′ .

Just as with the mutual information of a channel, we find that for all α ∈(0, 1) ∪ (1, ∞),

Iα(N) ≤ IAα (N), (12.1.120)

and the proof of this analogous to the proof of Lemma 12.4, which establishesthe corresponding inequality for the mutual information. So the question isto determine whether the opposite inequality holds. Indeed, we find againthat it is the case, at least for α > 1.

868

Page 880: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

Proposition 12.7

Amortization does not increase the sandwiched Rényi mutual informa-tion of a quantum channel N for all α > 1:

Iα(N) = IAα (N). (12.1.121)

PROOF: Let ρA′AB′ be an arbitrary input state, and let σB and τB′ be arbitrarystates. Then, letting ωA′BB′ = NA→B(ρA′AB′), we find that

Iα(A′; BB′)ω

= infξBB′

Dα(ωA′BB′‖ωA′ ⊗ ξBB′) (12.1.122)

≤ Dα(ωA′BB′‖ωA′ ⊗ σB ⊗ τB′) (12.1.123)

= Dα(NA→B(ρA′AB′)‖ρA′ ⊗ σB ⊗ τB′) (12.1.124)

α− 1log2

∥∥∥(ρA′ ⊗ σB ⊗ τB′)1−α2α NA→B(ρA′AB′) (ρA′ ⊗ σB ⊗ τB′)

1−α2α

∥∥∥α

,

(12.1.125)

where to obtain the last equality we used the alternate expression in (4.5.3)for the sandwiched Rényi relative entropy. Defining

X(α)A′AB′

:= (ρA′ ⊗ τB′)1−α2α ρA′AB′ (ρA′ ⊗ τB′)

1−α2α , (12.1.126)

and making use of the completely positive map S(α)σB from (12.1.74), we find

that ∥∥∥(ρA′ ⊗ σB ⊗ τB′)1−α2α NA→B(ρA′AB′) (ρA′ ⊗ σB ⊗ τB′)

1−α2α

∥∥∥α

=∥∥∥(S(α)σB NA→B)(X(α)

A′AB′)∥∥∥

α(12.1.127)

=

∥∥∥(S(α)σB NA→B)(X(α)A′AB′)

∥∥∥α∥∥∥X(α)

A′B′

∥∥∥α

∥∥∥X(α)A′B′

∥∥∥α

(12.1.128)

≤ supYA′AB′≥0

∥∥∥(S(α)σB NA→B)(YA′AB′)∥∥∥

α

‖YA′B′‖α

×∥∥∥[ρA′ ⊗ τB′ ]

1−α2α ρA′B′ [ρA′ ⊗ τB′ ]

1−α2α

∥∥∥α

(12.1.129)

=∥∥∥S(α)σB NA→B

∥∥∥CB,1→α

· Qα(ρA′B′‖ρA′ ⊗ τB′)1α , (12.1.130)

869

Page 881: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

where in the last line we have used the expression in (6.E.1) for the norm‖·‖CB,1→α. Plugging (12.1.130) back into (12.1.125), we find that

Iα(A′; BB′)ω ≤α

α− 1log2

∥∥∥S(α)σB NA→B

∥∥∥CB,1→α

+ Dα(ρA′B′‖ρA′ ⊗ τB′). (12.1.131)

Since the inequality holds for arbitrary states σB and τB′ , we conclude that

Iα(A′; BB′)ω

≤ α

α− 1infσB

log2

∥∥∥S(α)σB NA→B

∥∥∥CB,1→α

+ infτB′

Dα(ρA′B′‖ρA′ ⊗ τB′) (12.1.132)

= Iα(N) + Iα(A′; B′)ρ (12.1.133)

≤ Iα(N) + Iα(A′A; B′)ρ, (12.1.134)

where the equality follows from Lemma 6.20 and the final inequality frommonotonicity of the mutual information under the partial trace TrA. Since wehave shown that the following inequality holds for an arbitrary input stateρA′AB′ :

Iα(A′; BB′)ω − Iα(A′A; B′)ρ ≤ Iα(N), (12.1.135)

we conclude that IAα (N) ≤ Iα(N), which leads to IAα (N) = Iα(N) after com-bining with (12.1.120).

By following exactly the same steps in (12.1.107)–(12.1.112), but replacingI with Iα, we can conclude that the amortization collapse in Proposition 12.7implies the additivity relation (Theorem 6.22) for sandwiched Rényi mutualinformation of quantum channels for all α > 1:

Iα(N) = IAα (N) ∀ α > 1 =⇒ Iα(N⊗M) ≤ Iα(N) + Iα(M), (12.1.136)

where N and M are quantum channels. Furthermore, by following exactly thesame steps as in (12.1.113)–(12.1.118), but replacing I with Iα, and employingProposition 12.7, we conclude the following bound

Iα(M; BnB′n−1)ρn ≤ n · Iα(N), (12.1.137)

which in turn implies the bound in (12.1.36). Thus, we can alternatively an-alyze feedback-assisted protocols and arrive at the bound in (12.1.36) by uti-lizing the concept of amortization.

870

Page 882: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

12.2 Quantum Feedback-Assisted Classical Capac-ity of a Quantum Channel

In this section, we analyze the asymptotic case, in which we allow for an ar-bitrarily high number of rounds of feedback. This task is now rather straight-forward, given the bounds that we have established in the previous section.So we keep this section brief, only stating some definitions and then some the-orems that follow as a direct consequence of definitions and developments inprevious chapters.

Definition 12.8 Achievable Rate for Quantum-Feedback-AssistedClassical Communication

Given a quantum channel N, a rate R ∈ R+ is called an achievablerate for quantum-feedback-assisted classical communication over N iffor all ε ∈ (0, 1], all δ > 0, and all sufficiently large n, there exists an(n, 2n(R−δ), ε) quantum-feedback-assisted classical communication pro-tocol.

Definition 12.9 Quantum-Feedback-Assisted Classical Capacity of aQuantum Channel

The quantum-feedback-assisted classical capacity of a quantum channelN, denoted by CQFB(N), is defined as the supremum of all achievablerates, i.e.,

CQFB(N) := supR : R is an achievable rate for N. (12.2.1)

Definition 12.10 Strong Converse Rate for Quantum-Feedback-Assisted Classical Communication

Given a quantum channel N, a rate R ∈ R+ is called a strong converserate for quantum-feedback-assisted classical communication over N iffor all ε ∈ [0, 1), all δ > 0, and all sufficiently large n, there does notexist an (n, 2n(R+δ), ε) quantum-feedback-assisted classical communica-tion protocol.

871

Page 883: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

Definition 12.11 Strong Converse Quantum-Feedback-AssistedClassical Capacity of a Quantum Channel

The strong converse quantum-feedback-assisted classical capacity of aquantum channel N, denoted by CQFB(N), is defined as the infimum ofall strong converse rates, i.e.,

CQFB(N) := infR : R is a strong converse rate for N. (12.2.2)

The main result of this section is the following capacity theorem:

Theorem 12.12 Quantum-Feedback-Assisted Classical Capacity

For any quantum channel N, its quantum-feedback-assisted classical ca-pacity CQFB(N) and its strong converse quantum-feedback-assisted clas-sical capacity are both equal to its mutual information I(N), i.e.,

CQFB(N) = CQFB(N) = I(N), (12.2.3)

where I(N) is defined in (4.11.102).

PROOF: By previous reasoning, we have that

CQFB(N) ≤ CQFB(N), (12.2.4)

and by Theorem 6.16 and the fact that any entanglement-assisted classicalcommunication protocol is a particular kind of quantum-feedback-assistedclassical communication protocol, we have that

I(N) ≤ CQFB(N) ≤ CQFB(N). (12.2.5)

The upper bound CQFB(N) ≤ I(N) follows from (12.1.36) and the same rea-soning given in the proof detailed in Section 6.2.3.

12.3 Bibliographic Notes

Shannon (1956) proved that feedback does not increase the classical capacityof a classical channel (his result is a weak-converse bound). The strong con-verse for the feedback-assisted classical capacity of a classical channel was

872

Page 884: Principles of Quantum Communication Theory: A Modern Approach

Chapter 12: Quantum-Feedback-Assisted Communication

established independently by Kemperman and Kesten. Kesten’s proof ap-peared in (Wolfowitz, 1964, Chapter 4) and Kemperman’s proof appearedlater in (Kemperman, 1971). See (Ulrey, 1976) for a discussion of this history.Polyanskiy and Verdú (2010) employed a Rényi-entropic method to extendShannon’s result to a strong converse statement, i.e., that the mutual infor-mation of a classical channel is equal to the strong converse feedback-assistedclassical capacity.

Bowen (2004) proved that a quantum feedback channel does not increasethe entanglement-assisted classical capacity of a quantum channel (his resultis a weak-converse bound). Bennett et al. (2014) proved that the mutual in-formation of a quantum channel is equal to the strong converse quantum-feedback-assisted classical capacity. Their approach was to employ the quan-tum reverse Shannon theorem to do so. Cooney et al. (2016) used a Rényi-entropic method to prove this same result, and this is the approach that wehave followed in this book. As far as we are aware, the concept of amortizedmutual information of a quantum channel and the fact that it reduces to themutual information of a quantum channel are original to this book.

873

Page 885: Principles of Quantum Communication Theory: A Modern Approach

Chapter 13

Classical-Feedback-AssistedCommunication

[IN PROGRESS - incorporate results of arXiv:1506.02228, arXiv:1902.02490,and arXiv:2010.01058]

874

Page 886: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14

LOCC-Assisted QuantumCommunication

This chapter develops an important variation of quantum communication, inwhich we allow the sender and receiver the free use of classical communica-tion. That is, in between every use of a given quantum communication chan-nel NA→B, the sender and receiver are allowed to perform local operationsand classical communication (LOCC). For this reason, the capacity consid-ered in this chapter is called the LOCC-assisted quantum capacity.

The practical motivation for this kind of feedback-assisted quantum ca-pacity comes from the fact that, these days, classical communication is rathercheap and plentiful. Thus, from a resource-theoretic perspective, it can besensible to simply allow classical communication as a free resource (similarto how we did for entanglement-assisted communication in Chapter 6). Thenour goal is to place informative bounds on the rate at which quantum infor-mation can be communicated from the sender to the receiver in this setting.Furthermore, these bounds are relevant for understanding and placing limita-tions on the speed at which distributed quantum computation can be carriedout.

In order to establish upper bounds on LOCC-assisted quantum capacity,we revisit the concept of amortization introduced in Section 12.1.3. However,in this context, we proceed somewhat differently, instead employing entan-glement measures to quantify how much entanglement can be generated bymultiple uses of a quantum channel. Then we define the amortized entangle-ment of a quantum channel as the largest difference between the output and

875

Page 887: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

input entanglement of the channel. In particular, two entanglement measureson which we focus are the squashed entanglement and the Rains relative en-tropy, as well as a variant of the latter called max-Rains relative entropy. Onekey property of the squashed entanglement and the max-Rains relative en-tropy is that these entanglement measures do not increase under amortiza-tion, similar to how we previously observed that the mutual information ofa channel does not increase under amortization. This key property leads tothe conclusion that these entanglement measures can serve as upper boundson the LOCC-assisted quantum capacity of a quantum channel, and arrivingat this conclusion is one of the main goals of this chapter. At the end of thechapter, we demonstrate the utility of the squashed entanglement and Rainsfamily of bounds by evaluating them for several example quantum channelsof interest.

Combining Entanglement Distillation and Teleportation to Obtain a Quan-tum Communication Protocol

We can use entanglement distillation along with teleportation in the asymp-totic setting to obtain a lower bound on the number of transmitted qubits in aquantum communication protocol. The strategy is as follows; see Figure 14.1.

1. Alice prepares several copies, say n, of a pure state ψAA, with the dimen-sion of A equal to the dimension of A, and sends each of the A systemsthrough the channel NA→B to Bob.

2. Alice and Bob now share n copies of the state ωAB = NA→B(ψAA). Theyperform a one-way entanglement distillation protocol to convert thesemixed entangled states to a perfect, maximally entangled state ΦAB ofSchmidt rank d ≥ 2. Roughly speaking, as shown in Chapter 8, a Schmidtrank of 2nI(A〉B)ω is achievable for n copies of ωAB, as n→ ∞.

3. Using the distilled maximally entangled state, along with 2 log2 d bits ofclassical communication, Alice and Bob perform the quantum teleporta-tion protocol to transmit the A′ part of an arbitrary pure state ΨRA′ fromAlice to Bob, with dA′ = d.

Since there are n uses of the channel in this strategy, we see that as n→ ∞, therate of this strategy (the number of qubits per channel use) is log2 d

n = I(A〉B)ω.By optimizing over all initial pure states ψAA prepared by Alice, we find that,in the asymptotic setting, the rate supψAA

I(A〉B)ω = Ic(N) is achievable,

876

Page 888: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

ψAn An

ΨRA′ ΨRB′

N⊗n L→

One-wayentanglement

distillation

T→

Teleportation

An

An

A

A′

Bn B

B′

R

Alice

Bob

Reference

Alice

FIGURE 14.1: Sketch of a quantum communication protocol over n uses ofthe channel N, which makes use of entanglement distillation and telepor-tation. The arrow “→” indicates that the channel contains classical com-munication from Alice to Bob only. We show in [] that, in the asymptoticlimit, this strategy achieves the quantum capacity of N.

where we recognize the coherent information Ic(N) of the channel N, whichwe define in (4.11.107). Note that the strategy we have outlined here alsoinvolves classical communication between Alice and Bob during the entan-glement distillation step and during the teleportation step. The results of Sec-tion 9.1.3 and Section 9.2.1 show that this strategy is nonetheless a valid quan-tum communication protocol, in the sense that the rate Ic(N) can be achievedby a strategy that does not make use of forward classical communication.

Now, Alice can do better than prepare the state ψ⊗nAA

and send each Asystem through the channel: she can prepare a state ψA1···An A1···An

≡ ψAn An

such that the systems A1, . . . , An being sent through N are entangled. One canthen achieve a rate of 1

n supψAn AnI(An〉Bn)ω = 1

n Ic(N⊗n). Since we are freeto use the channel N as many times as we want, we can optimize over n toobtain the communication rate

supn∈N

1n

Ic(N⊗n) =: Icreg(N), (14.0.1)

and it is this regularized coherent information of N that is optimal for quantumcommunication over the channel N. In other words, Q(N) = Ic

reg(N), and weprove this in Section 9.2.2.

877

Page 889: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

14.1 n-Shot LOCC-Assisted Quantum Communica-tion Protocol

This section discusses the most general form for an n-shot LOCC-assistedquantum communication protocol.

Before starting, we should clarify that the goal of such a protocol is to pro-duce an approximate maximally entangled state, with the Schmidt rank of theideal target state as large as possible. Due to the assumption of free classicalcommunication, as well as the quantum teleportation protocol discussed inSection 3.3, generating an approximate maximally entangled state is equiv-alent to generating an approximate identity quantum channel, such that thedimension of the ideal target identity channel is equal to the Schmidt rank ofthe target maximally entangled state. To make this statement more quanti-tative, suppose that ωA′B′ is an approximate maximally entangled state; i.e.,suppose that it is ε-close in normalized trace distance to a maximally entan-gled state ΦA′B′ of Schmidt rank d:

12‖ωA′B′ −ΦA′B′‖1 ≤ ε. (14.1.1)

Let TAA′B′→B denote the one-way LOCC channel corresponding to quantumteleportation (as discussed around (3.3.27)). Then by applying (3.3.28), it fol-lows that teleportation over the ideal resource state ΦA′B′ realizes an identitychannel idA→B of dimension d:

TAA′B′→B((·)⊗ΦA′B′) = idA→B(·). (14.1.2)

Let TωA→B denote the channel realized by teleportation over the unideal state ωA′B′ :

TωA→B(·) := TAA′B′→B((·)⊗ωA′B′). (14.1.3)

We would then like to determine the deviation of the ideal channel fromTω

A→B, and to do so, we can employ the normalized diamond distance. Thenconsider that, from the data-processing inequality for trace distance,

‖TωA→B − idA→B‖

= supψRA

‖TωA→B(ψRA)− idA→B(ψRA)‖1 (14.1.4)

= supψRA

‖TAA′B′→B(ψRA ⊗ωA′B′)− TAA′B′→B(ψRA ⊗ΦA′B′)‖1 (14.1.5)

878

Page 890: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

≤ supψRA

‖ψRA ⊗ωA′B′ − ψRA ⊗ΦA′B′‖1 (14.1.6)

= ‖ωA′B′ −ΦA′B′‖1 ≤ 2ε, (14.1.7)

so that we arrive at the desired statement mentioned above:

12‖ωA′B′ −ΦA′B′‖1 ≤ ε ⇒ 1

2‖Tω

A→B − idA→B‖ ≤ ε. (14.1.8)

Thus, for the above reason, we focus exclusively on LOCC-assisted proto-cols whose aim is to generate an approximate maximally entangled state. Inwhat follows, all bipartite cuts for separable states or LOCC channels shouldbe understood as being between Alice’s and Bob’s systems.

A protocol for LOCC-assisted quantum communication is depicted in Fig-ure [NEED TO INCLUDE FIGURE], and it is defined by the following ele-ments:

(ρ(1)A′1 A1B′1

, L(i)A′i−1Bi−1B′i−1→A′i AiB′i

ni=2,L(n+1)

A′nBnB′n→MA MB), (14.1.9)

where ρ(1)A′1 A1B′1

is a separable state, L(i)A′i−1Bi−1B′i−1→A′i AiB′i

is an LOCC channel

for i ∈ 2, . . . , n, and L(n+1)A′nBnB′n→MA MB

is a final LOCC channel that gener-ates the approximate maximally entangled state in systems MA and MB. LetC denote all of these elements, which together constitute the LOCC-assistedquantum communication code. All systems with primed labels should beunderstood as local quantum memory or scratch registers that Alice or Bobcan employ in this information-processing task. They are also assumed to befinite-dimensional, but could be arbitrarily large. The unprimed systems arethe ones that are either input to or output from the quantum communicationchannel NA→B.

The LOCC-assisted quantum communication protocol begins with Aliceand Bob performing an LOCC channel L(1)

∅→A′1 A1B′1, which leads to the sepa-

rable state ρ(1)A′1 A1B′1

mentioned above, where A′1 and B′1 are systems that arefinite-dimensional but arbitrarily large. The system A1 is such that it can befed into the first channel use. Alice sends system A1 through the first channeluse, leading to a state

ω(1)A′1B1B′1

:= NA1→B1(ρ(1)A′1 A1B′1

). (14.1.10)

879

Page 891: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

Alice and Bob then perform the LOCC channel L(2)A′1B1B′1→A′2 A2B′2

, which leadsto the state

ρ(2)A′2 A2B′2

:= L(2)A′1B1B′1→A′2 A2B′2

(ω(1)A′1B1B′1

). (14.1.11)

Alice sends system A2 through the second channel use NA2→B2 , leading to thestate

ω(2)A′2B2B′2

:= NA2→B2(ρ(2)A′2 A2B′2

). (14.1.12)

This process iterates: the protocol uses the channel n times. In general, wehave the following states for all i ∈ 2, . . . , n:

ρ(i)A′i AiB′i

:= L(i)A′i−1Bi−1B′i−1→A′i AiB′i

(ω(i−1)A′i−1Bi−1B′i−1

), (14.1.13)

ω(i)A′iBiB′i

:= NAi→Bi(ρ(i)A′i AiB′i

), (14.1.14)

where L(i)A′i−1Bi−1B′i−1→A′i AiB′i

is an LOCC channel. The final step of the protocol

consists of an LOCC channel L(n+1)A′nBnB′n→MA MB

, which generates the systemsMA and MB for Alice and Bob, respectively. The protocol’s final state is asfollows:

ωMA MB := L(n+1)A′nBnB′n→MA MB

(ω(n)A′nBnB′n

). (14.1.15)

The goal of the protocol is for the final state ωMA MB to be close to a max-imally entangled state, and we define the quantum error probability of thecode as follows:

qerr(C) := 1− F(ωMA MB , ΦMA MB) (14.1.16)= 1− 〈Φ|MA MBωMA MB |Φ〉MA MB , (14.1.17)

where F denotes the quantum fidelity (Definition 3.55) and the maximallyentangled state ΦMA MB = |Φ〉〈Φ|MA MB is defined from

|Φ〉MA MB :=1√M

M

∑m=1|m〉MA ⊗ |m〉MB , (14.1.18)

such that it has Schmidt rank M. Intuitively, the quantum error probabilityqerr(C) is equal to the probability that one obtains the outcome “not max-imally entangled state ΦMA MB” when performing the test or measurementΦMA MB , IMA MB −ΦMA MB on the final state ωMA MB of the protocol.

880

Page 892: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

Definition 14.1 (n, M, ε) LOCC-Assisted Quantum CommunicationProtocol

Let C := (ρ(1)A′1 A1B′1

, L(i)A′i−1Bi−1B′i−1→A′i AiB′i

ni=2,L(n+1)

A′nBnB′n→MA MB) be the ele-

ments of an n-round LOCC-assisted quantum communication protocolover the channel NA→B. The protocol is called an (n, M, ε) protocol, withε ∈ [0, 1], if qerr(C) ≤ ε.

14.1.1 Lower Bound on the Number of Transmitted Qubits

[IN PROGRESS]

one-shot lower bound in terms of coherent information of a state. Thisachieves coherent information of a channel, as well as reverse coherent infor-mation of a channel.

14.1.2 Amortized Entanglement as a General Upper Boundfor LOCC-Assisted Quantum Communication Protocols

It is an interesting question to determine whether the inequality opposite tothat in Lemma 5.42 holds, and as the following proposition demonstrates,this question is intimately related to finding useful upper bounds on the rateat which maximal entanglement can be distilled by employing an LOCC-assisted quantum communication protocol. The following proposition repre-sents our fundamental starting point when analyzing limitations on LOCC-assisted quantum communication protocols.

Proposition 14.2

Let NA→B be a quantum channel, let ε ∈ [0, 1], and let E be an en-tanglement measure that is equal to zero for all separable states. Foran (n, M, ε) LOCC-assisted quantum communication protocol with finalstate ωMA MB , the following bound holds

E(MA; MB)ω ≤ n · EA(N). (14.1.19)

881

Page 893: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

PROOF: Consider an LOCC-assisted quantum communication protocol as pre-sented in Section 14.1. From the monotonicity of the entanglement measure Ewith respect to LOCC channels, we find that

E(MA; MB)ω

≤ E(A′n; BnB′n)ω(n) (14.1.20)= E(A′n; BnB′n)ω(n) − E(A′1A1; B′1)ρ(1) (14.1.21)

= E(A′n; BnB′n)ω(n) +

[n

∑i=2

E(A′i Ai; B′i)ρ(i) − E(A′i Ai; B′i)ρ(i)

]

− E(A′1A1; B′1)ρ(1) (14.1.22)

≤n

∑i=1

[E(A′i; BiB′i)ω(i) − E(A′i Ai; B′i)ρ(i)

](14.1.23)

≤ n · EA(N). (14.1.24)

The first equality follows because the state ρ(1)A′1 A1B′1

is a separable state, and byassumption, the entanglement measure E vanishes for all such states. Thesecond equality follows by adding and subtracting equal terms. The sec-ond inequality follows because E(A′i Ai; B′i)ρ(i) ≤ E(A′i−1; Bi−1B′i−1)ω(i−1) forall i ∈ 2, . . . , n, due to monotonicity of the entanglement measure E withrespect to LOCC channels. The final inequality follows from the definitionof amortized entanglement and the fact that the states ω

(i)A′iBiB′i

and ρ(i)A′i AiB′i

areparticular states to consider in its optimization.

The inequality in (14.1.19) states that the entanglement of the final outputstate ωMA MB , as quantified by E, cannot exceed n times the amortized entan-glement of the channel NA→B. Intuitively, the only resource allowed in theprotocol, which has the potential to generate entanglement, is the quantumcommunication channel NA→B. All of the LOCC channels allowed for freehave no ability to generate entanglement on their own. Thus, the entangle-ment of the final state should not exceed the largest possible amount of en-tanglement that could ever be generated with n calls to the channel, and thislargest entanglement is exactly the amortized entanglement of the channel.

Observe that the bound in Proposition 14.2 depends on the final stateωMA MB , and thus it is not a universal bound, depending only on the param-eters n, M, and ε, because this state in turn depends on the entire protocol.Similar to the upper bounds established in previous chapters, it is desirableto refine this bound such that it depends only on n, M, and ε, which are the

882

Page 894: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

parameters characterizing any generic LOCC-assisted quantum communica-tion protocol. In the forthcoming sections, we consider particular entangle-ment measures, such as squashed entanglement and Rains relative entropy,which allow us to relate the parameters M and ε to the final state ωMA MB .

To end this section, we note here that the bound in Proposition 14.2 sim-plifies for teleportation-simulable channels and for entanglement measuresthat are subadditive with respect to states and equal to zero for all separablestates. This conclusion is a consequence of Propositions 5.44 and 14.2:

Corollary 14.3 Reduction by Teleportation

Let ES be an entanglement measure that is subadditive with respect tostates and equal to zero for all separable states. Let NA→B be a channelthat is teleportation-simulable with associated resource state θRB′ . Letε ∈ [0, 1]. For an (n, M, ε) LOCC-assisted quantum communication pro-tocol with final state ωMA MB , the following bound holds

ES(MA; MB)ω ≤ n · ES(R; B′)θ. (14.1.25)

14.1.3 Squashed Entanglement Upper Bound on the Numberof Transmitted Qubits

We now establish the squashed entanglement upper bound on the numberof qubits that a sender can transmit to a receiver by employing an LOCC-assisted quantum communication protocol:

Theorem 14.4 n-Shot Squashed Entanglement Upper Bound

Let NA→B be a quantum channel, and let ε ∈ [0, 1). For all (n, M, ε)LOCC-assisted quantum communication protocols over the channelNA→B, the following bound holds

log2 M ≤ 11−√ε

[n · Esq(N) + g2(

√ε)]

. (14.1.26)

PROOF: Consider an arbitrary (n, M, ε) LOCC-assisted quantum communi-cation protocol over the channel NA→B, as defined in Section 14.1. The squashedentanglement is an entanglement measure (monotone under LOCC as shown

883

Page 895: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

in Theorem 5.33) and it is equal to zero for separable states (Proposition 5.4.5).Thus, Proposition 14.2 applies, and we find that

Esq(MA; MB)ω ≤ n · EAsq(N) = n · Esq(N), (14.1.27)

where the equality follows from Theorem 5.58. Applying Definition 14.1 leadsto

F(ΦMA MB , ωMA MB) ≥ 1− ε. (14.1.28)

As a consequence of Proposition 5.38, we find that

Esq(MA; MB)ω

≥ Esq(MA; MB)Φ −[√

ε log2 min |MA| , |MB|+ g2(√

ε)]

(14.1.29)

= log2 M−[√

ε log2 M + g2(√

ε)]

(14.1.30)

= (1−√

ε) log2 M− g2(√

ε). (14.1.31)

The first equality follows from Proposition 5.36. We can finally rearrange theestablished inequality n ·Esq(N) ≥ (1−√ε) log2 M− g2(

√ε) to be in the form

stated in the theorem.

14.2 n-Shot PPT-Assisted Quantum CommunicationProtocol

Recalling the completely PPT-preserving channels of Definition 3.42 as form-ing a superset of LOCC channels, we can also consider quantum communica-tion protocols assisted by completely PPT-preserving channels (abbreviatedas C-PPT-P channels). Such a PPT-assisted protocol has exactly the same formas given in Section 14.1, but it is instead defined by the following elements:

(ρ(1)A′1 A1B′1

, P(i)A′i−1Bi−1B′i−1→A′i AiB′i

ni=2,P(n+1)

A′nBnB′n→MA MB), (14.2.1)

where ρ(1)A′1 A1B′1

is a PPT state, P(i)A′i−1Bi−1B′i−1→A′i AiB′i

is a C-PPT-P channel for

i ∈ 2, . . . , n, and P(n+1)A′nBnB′n→MA MB

is a final C-PPT-P channel that generatesan approximate maximally entangled state in systems MA and MB. Denotingthe final state of the protocol again by ωMA MB , the criterion for such a protocolis again given by (14.1.16). We then arrive at the following definition:

884

Page 896: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

Definition 14.5 (n, M, ε) PPT-Assisted Quantum CommunicationProtocol

Let C := (ρ(1)A′1 A1B′1

, P(i)A′i−1Bi−1B′i−1→A′i AiB′i

ni=2,P(n+1)

A′nBnB′n→MA MB) be the el-

ements of an n-round PPT-assisted quantum communication protocolover the channel NA→B. The protocol is called an (n, M, ε) protocol, withε ∈ [0, 1], if qerr(C) ≤ ε.

Since every LOCC channel is a C-PPT-P channel, we can make the follow-ing observation immediately:

REMARK: Every (n, M, ε) LOCC-assisted quantum communication protocol is also an(n, M, ε) PPT-assisted quantum communication protocol.

As a consequence of the above observation, any converse bound or limi-tation that we establish for an arbitrary (n, M, ε) PPT-assisted protocol is alsoa converse bound for an (n, M, ε) LOCC-assisted protocol.

By the exact same reasoning as in the proof of Proposition 14.2, but re-placing LOCC channels with C-PPT-P channels, we arrive at the followingproposition:

Proposition 14.6

Let NA→B be a quantum channel, let ε ∈ [0, 1], and let E be an entangle-ment measure that is monotone under completely PPT-preserving chan-nels and is equal to zero for all PPT states. For an (n, M, ε) PPT-assistedquantum communication protocol with final state ωMA MB , the followingbound holds

E(MA; MB)ω ≤ n · EA(N). (14.2.2)

Recalling Definition 3.47, a channel NA→B with input system A and out-put system B is defined to be PPT-simulable with associated resource stateωRB′ if the following equality holds for all input states ρA:

NA→B(ρA) = PARB′→B(ρA ⊗ωRB′), (14.2.3)

where PARB′→B is a completely PPT-preserving channel between the sender,who has systems A and R, and the receiver, who has system B′.

885

Page 897: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

By the same reasoning used to arrive at Corollary 14.3, but replacing LOCCchannels with completely PPT-preserving ones, we find the following:

Corollary 14.7

Let ES denote an entanglement measure that is monotone non-increasing with respect to completely PPT-preserving channels, subad-ditive with respect to states, and equal to zero for all PPT states. LetNA→B be a channel that is PPT-simulable with associated resource stateθRB′ . Let ε ∈ [0, 1]. For an (n, M, ε) PPT-assisted quantum communica-tion protocol with final state ωMA MB , the following bound holds

ES(MA; MB)ω ≤ n · ES(R; B′)θ. (14.2.4)

14.2.1 Rényi–Rains Information Upper Bounds on the Num-ber of Transmitted Qubits

We now establish the max-Rains information upper bound on the number ofqubits that a sender can transmit to a receiver by employing a PPT-assistedquantum communication protocol:

Theorem 14.8 n-Shot Max-Rains Upper Bound

Let NA→B be a quantum channel, and let ε ∈ [0, 1). For all (n, M, ε) PPT-assisted quantum communication protocols over the channel NA→B, thefollowing bound holds

log2 M ≤ n · Rmax(N) + log2

(1

1− ε

). (14.2.5)

PROOF: Consider an arbitrary (n, M, ε) LOCC-assisted quantum communi-cation protocol over the channel NA→B, as defined in Section 14.1. The max-Rains relative entropy is an entanglement measure (monotone under com-pletely PPT-preserving channels as shown in Proposition 5.28), and it is equalto zero for PPT states. Thus, Proposition 14.6 applies, and we find that

Rmax(MA; MB)ω ≤ n · RAmax(N) = n · Rmax(N), (14.2.6)

where the equality follows from Theorem 5.56. Applying Definition 14.5 leads

886

Page 898: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

toF(ΦMA MB , ωMA MB) ≥ 1− ε. (14.2.7)

As a consequence of Propositions 8.6 and 4.68, we find that

log2 M ≤ Rε(MA; MB)ω (14.2.8)

≤ Rmax(MA; MB)ω + log2

(1

1− ε

). (14.2.9)

Combining (14.2.6) and (14.2.9), we conclude the proof.

At this point, it is worthwhile to compare the squashed-entanglementbound in Theorem 14.4 with the max-Rains information bound in Theorem 14.8.First, both bounds hold for all quantum channels, and so this is an advan-tage that they both possess. The squashed entanglement and max-Rains in-formation are rather different quantities, and so the quantities on their owncan vary based on the channel for which they are evaluated. The squashed-entanglement bound in Theorem 14.4 is a weak-converse bound, whereas themax-Rains information bound in Theorem 14.8 is a strong-converse bound.The max-Rains information bound has the advantage that it is efficiently com-putable by a semi-definite program, whereas it is not known how to computethe squashed entanglement. However, one can apply Proposition 5.37 to seethat the squashed entanglement bound gives a whole host of upper boundsrelated to the choice of a squashing channel, and one can potentially obtaintight bounds by making a clever choice of a squashing channel.

For channels that are PPT-simulable with associated resource states, asrecalled in (14.2.3), we obtain upper bounds that can be even stronger:

Theorem 14.9 n-Shot Rényi–Rains Upper Bounds for PPT-Simulable Channels

Let NA→B be a quantum channel that is PPT-simulable with associatedresource state θSB′ , and let ε ∈ [0, 1). For all (n, M, ε) PPT-assisted quan-tum communication protocols over the channel NA→B, the followingbounds hold for all α > 1:

log2 M ≤ n · Rα(S; B′)θ +α

α− 1log2

(1

1− ε

), (14.2.10)

log2 M ≤ 11− ε

[n · R(S; B′)θ + h2(ε)

]. (14.2.11)

887

Page 899: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

PROOF: Consider an arbitrary (n, M, ε) LOCC-assisted quantum communi-cation protocol over the channel NA→B, as defined in Section 14.1. The Rényi–Rains relative entropy and Rains relative entropy are monotone non-increasingunder completely PPT-preserving channels (Proposition 5.25), equal to zerofor PPT states, and subadditive with respect to states (Proposition 5.25). Assuch, Corollary 14.7 applies, and we find for α > 1 that

Rα(MA; MB)ω ≤ n · Rα(S; B′)θ, (14.2.12)R(MA; MB)ω ≤ n · R(S; B′)θ. (14.2.13)

Applying Definition 14.5 leads to

F(ΦMA MB , ωMA MB) ≥ 1− ε. (14.2.14)

As a consequence of Proposition 8.6, we have that

log2 M ≤ Rε(MA; MB)ω. (14.2.15)

Applying Propositions 4.67 and 4.68, we find that

log2 M ≤ Rα(MA; MB)ω +α

α− 1log2

(1

1− ε

), (14.2.16)

log2 M ≤ 11− ε

[R(MA; MB)ω + h2(ε)] . (14.2.17)

Putting together (14.2.12), (14.2.13), (14.2.16), and (14.2.17) concludes the proof.

14.3 LOCC- and PPT-Assisted Quantum Capacitiesof Quantum Channels

In this section, we analyze the asymptotic capacities, and as before, the upperbounds for the asymptotic capacities are straightforward consequences of thenon-asymptotic bounds given in Sections 14.1.3 and 14.2.1. The definitions ofthese capacities are similar to what we have given previously, and so we onlystate them here briefly.

888

Page 900: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

Definition 14.10 Achievable Rate for LOCC-Assisted QuantumCommunication

Given a quantum channel N, a rate R ∈ R+ is called an achievable ratefor LOCC-assisted quantum communication over N if for all ε ∈ (0, 1],all δ > 0, and all sufficiently large n, there exists an (n, 2n(R−δ), ε) LOCC-assisted quantum communication protocol.

Definition 14.11 LOCC-Assisted Quantum Capacity of a QuantumChannel

The LOCC-assisted quantum capacity of a quantum channel N, denotedby Q↔(N), is defined as the supremum of all achievable rates, i.e.,

Q↔(N) := supR : R is an achievable rate for N. (14.3.1)

Definition 14.12 Weak Converse Rate for LOCC-Assisted QuantumCommunication

Given a quantum channel N, a rate R ∈ R+ is called a weak converserate for LOCC-assisted quantum communication over N if every R′ > Ris not an achievable rate for N.

Definition 14.13 Strong Converse Rate for LOCC-Assisted Quan-tum Communication

Given a quantum channel N, a rate R ∈ R+ is called a strong con-verse rate for LOCC-assisted quantum communication over N if for allε ∈ [0, 1), all δ > 0, and all sufficiently large n, there does not exist an(n, 2n(R+δ), ε) LOCC-assisted quantum communication protocol.

Definition 14.14 Strong Converse LOCC-Assisted Quantum Capac-ity of a Quantum Channel

The strong converse LOCC-assisted quantum capacity of a quantumchannel N, denoted by Q↔(N), is defined as the infimum of all strongconverse rates, i.e.,

Q↔(N) := infR : R is a strong converse rate for N. (14.3.2)

889

Page 901: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

We have the exact same definitions for PPT-assisted quantum communi-cation, and we use the notation Q↔PPT to refer to the PPT-assisted quantumcapacity and Q↔PPT for the strong converse PPT-assisted quantum capacity.

Recall that, by definition, the following bounds hold

Q↔(N) ≤ Q↔(N) ≤ Q↔PPT(N), (14.3.3)

Q↔(N) ≤ Q↔PPT(N) ≤ Q↔PPT(N). (14.3.4)

As a direct consequence of the bound in Theorem 14.4 and methods simi-lar to those given in the proof of Theorem 6.23, we find the following:

Theorem 14.15 Squashed-Entanglement Weak-Converse Bound

The squashed entanglement of a channel N is a weak converse rate forLOCC-assisted quantum communication:

Q↔(N) ≤ Esq(N). (14.3.5)

As a direct consequence of the bound in Theorem 14.8 and methods simi-lar to those given in Section 6.2.3, we find that

Theorem 14.16 Max-Rains Strong-Converse Bound

The max-Rains information of a channel N is a strong converse rate forPPT-assisted quantum communication:

Q↔PPT(N) ≤ Rmax(N). (14.3.6)

As a direct consequence of the bound in Theorem 14.9 and methods simi-lar to those given in Section 6.2.3, we find that

Theorem 14.17 Rains Strong-Converse Bound for PPT-SimulableChannels

Let N be a quantum channel that is PPT-simulable with associated re-source state θSB′ . Then the Rains information of N is a strong converserate for PPT-assisted quantum communication:

Q↔PPT(N) ≤ R(N). (14.3.7)

890

Page 902: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

14.4 Examples

[IN PROGRESS]

erasure channel - get Rains information as a strong converse rate - willmatch lower bound in terms of reverse coherent information (Hayashi calledthis pseudocoherent information)

covariant dephasing channels - get Rains information as a strong converserate and then coherent information matches this (will evaluate Rains informa-tion bound in unassisted quantum capacity chapter)

depolarizing channel - evaluate Rains information

use squashed entanglement to give upper bound for amplitude dampingchannel

14.5 Bibliographic Notes

The concept of LOCC-assisted quantum communication over a quantum chan-nel was presented in (Bennett et al., 1996c, Section V). The same Section V of(Bennett et al., 1996c) also showed how to use the notion of teleportation sim-ulation of a quantum channel and entanglement measures in order to boundthe LOCC-assisted quantum capacity from above by a resource state that canrealize the channel by teleportation simulation. Müller-Hermes (2012) pre-sented a more detailed analysis of this bounding technique. Other papersthat make use of the teleportation-simulation technique in this and other con-texts include those by Horodecki et al. (1999); Gottesman and Chuang (1999);Zhou et al. (2000); Bowen and Bose (2001); Takeoka et al. (2002); Giedke andIgnacio Cirac (2002); Wolf et al. (2007); Niset et al. (2009); Chiribella et al.(2009); Soeda et al. (2011); Leung and Matthews (2015); Pirandola et al. (2017);Takeoka et al. (2016); Wilde et al. (2017); Takeoka et al. (2017); Kaur and Wilde(2017).

A precise mathematical definition of an LOCC-assisted quantum com-munication protocol conducted over a quantum channel was presented in(Müller-Hermes, 2012, Definition 12) and (Takeoka et al., 2014, Section IV).

That the entanglement of the final state of an n-round LOCC-assistedquantum communication is bounded from above by n times the channel’s

891

Page 903: Principles of Quantum Communication Theory: A Modern Approach

Chapter 14: LOCC-Assisted Quantum Communication

amortized entanglement (Proposition 14.2) was anticipated by Bennett et al.(2003) and proven by Kaur and Wilde (2017). Corollary 14.3 was anticipatedby Bennett et al. (1996c) and presented in more detail in (Müller-Hermes,2012, Chapter 4), while the form in which we have presented it here is closelyrelated to the presentation by Kaur and Wilde (2017).

The n-round PPT-assisted quantum communication protocols presentedin Section 14.2 were considered by Kaur and Wilde (2017), with PPT-assistedquantum communication over a single or parallel use of a quantum channelconsidered by Leung and Matthews (2015); Wang and Duan (2016b); Wanget al. (2019b). The bound in Proposition 14.6 was established by Kaur andWilde (2017).

Theorem 14.4 is due to Takeoka et al. (2014).

Wang and Duan (2016a) defined a semi-definite programming upper boundon distillable entanglement of a bipartite state, and Wang and Duan (2016b)defined a semi-definite programming upper bound on the quantum capacityof a quantum channel. Wang et al. (2019b) observed that the quantity definedby Wang and Duan (2016a) is equal to the max-Rains relative entropy, whilealso observing that the quantity defined by Wang and Duan (2016b) is equalto the max-Rains information of a quantum channel. Berta and Wilde (2018)established the max-Rains information as an upper bound on the n-roundnon-asymptotic PPT-assisted quantum capacity (Theorem 14.8). The upperbounds in Theorem 14.9 are due to Kaur and Wilde (2017).

892

Page 904: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15

LOPC-Assisted PrivateCommunication

This chapter continues with the theme of feedback-assisted communication.Here, we consider secret-key-agreement protocols, where the goal is for thesender and receiver of a quantum channel NA→B to establish secret key, byusing the quantum channel NA→B along with the free use of public clas-sical communication. That is, between every channel use in a secret-key-agreement protocol, the sender and receiver are allowed to perform local op-erations and public classical communication (LOPC). The notion of capacitydeveloped in this chapter is known as secret-key-agreement capacity.

The secret key distilled in such a secret-key-agreement protocol should beprotected from an eavesdropper. The model we assume here is that the eaves-dropper is quite powerful, having access to the full environment of every useof the quantum channel NA→B, as well as a copy of all of the classical dataexchanged by the sender and receiver when they conduct a round of LOPC.To understand this model in a physical context, suppose that the quantumchannel connecting the sender and receiver is a fiber-optic cable, which wecan model as a bosonic loss channel, and suppose that the sender employsquantum states of light to distill a secret key with the receiver. Then, in thiseavesdropper model, we are assuming that all of the light that does not makeit to the receiver is collected by the eavesdropper in a quantum memory. Inthis way, the secret key distilled by such a protocol is guaranteed to be secureagainst a quantum-enabled eavesdropper.

The practical motivation for secret-key-agreement protocols is related to

893

Page 905: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

the motivation that we considered in the previous chapter on LOCC-assistedquantum communication. Classical communication is cheap and plentifulthese days, and so from a resource-theoretic perspective, it seems sensible toallow it for free in a theoretical model of communication. Once a secret keyhas been established, it can be used in conjuction with the well known one-time pad protocol as a scheme for private communication of an arbitrary mes-sage that has the same size as the key. Thus, as a consequence of the one-timepad protocol, it follows that secret key agreement and private communicationare equivalent information-processing tasks when public classical communi-cation is available for free. Furthermore, the model of LOPC-assisted secretkey agreement is essentially the same model considered in quantum key dis-tribution, which is one of the most famous applications in quantum informa-tion science. One of the main goals of this chapter is to place bounds on therate at which secret-key-agreement is possible. Due to the strong connectionbetween the model of secret-key-agreement and quantum key distribution,these bounds then place limitations on the rates at which it is possible to gen-erate secret key in a quantum key distribution protocol.

The main method for placing limitations on the rates of secret-key-agreementprotocols is similar to the approach that we took in the last chapter. In fact,there are many parallels. We again use the concept of amortization and en-tanglement measures, such as squashed entanglement and relative entropy ofentanglement (and several variants of the latter).

However, the main difference between this chapter and the previous oneis that the communication model is different. As discussed above, a secret-key-agreement protocol is a three-party protocol, consisting of the legitimatesender and receiver, as well as the eavesdropper. Thus, a priori, it is not ob-vious how to connect entanglement measures, which are used in two-partyprotocols, to secret-key-agreement protocols. To overcome this problem, weexploit the purification principle to establish a powerful equivalence betweentripartite secret-key-agreement protocols and bipartite private-state distilla-tion protocols. After doing so, we can apply entanglement measures to boundthe rate at which it is possible to distill bipartite private states in a bipartiteprivate-state distillation protocol, and then by appealing to the aforemen-tioned equivalence, we can bound the rate at which it is possible to distillsecret key in a tripartite secret-key-agreement protocol.

The main conclusion of this chapter is that entanglement measures suchas squashed entanglement and relative entropy of entanglement (and the lat-ter’s variations) are upper bounds on the secret-key-agreement capacity of

894

Page 906: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

quantum channels. At the end of the chapter, we evaluate these bounds forvarious channels of interest in order to determine the fundamental limitationson secret key agreement for these channels.

15.1 n-Shot Secret-Key-Agreement Protocol

We begin by discussing the most general form for a secret-key-agreement pro-tocol conducted over a quantum channel. The most important point to clar-ify before starting is the communication model, in particular, to address thequestion of who has access to what. First, we suppose that there is a quan-tum channel NA→B connecting the legitimate sender Alice to the legitimatereceiver Bob. Alice has exclusive access to the input system A, and Bob hasexclusive access to the output system B. As we know from Chapter 3, everyquantum channel has an isometric channel UA→BE extending it, such that theoriginal channel NA→B is recovered by tracing over the purifying or environ-ment system E:

NA→B = TrE UA→BE. (15.1.1)

Taking the same perspective as that in Chapter 11, with the idea that a power-ful, fully quantum eavesdropper could have access to every system to whichthe legitimate parties do not have access, we suppose that the quantum eaves-dropper has access to the environment system E. Furthermore, in a secret-key-agreement protocol, the legitimate parties are allowed to use a public,classical communication channel, in addition to the quantum channel NA→B,in order to generate a secret key. Since this channel is public, we suppose thatthe eavesdropper has access to all of the classical data exchanged between thelegitimate parties.

In more detail, an n-shot protocol for secret key agreement consists ofn calls to the quantum channel NA→B, interleaved by LOPC channels. Sinceall of the classical data exchanged between Alice and Bob is assumed to bepublic and available to the eavesdropper as well, we call these channels “LOPC”channels, which is an abbreviation of “local operations and public communi-cation.” In fact, a protocol for secret key agreement has essentially the samestructure as a protocol for LOCC-assisted quantum communication, as dis-cussed in Section 14.1, with the exception that the systems at the end shouldhold a secret key instead of a maximally entangled state.

A protocol for secret key agreement is depicted in Figure [REF - NEED

895

Page 907: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

FIGURE], and it consists of the following elements:

(ρA′1 A1B′1Y1, L(i)

A′i−1Bi−1B′i−1→A′i AiB′iYin

i=2,L(n+1)A′nBnB′n→KAKBYn+1

). (15.1.2)

All systems labeled by A belong to Alice, those labeled by B belong to Bob,and those labeled by Y are classical systems belonging to Eve, representing acopy of the classical data exchanged by Alice and Bob in a round of LOPC.In the above, ρA′1 A1B′1Y1

is a separable state, L(i)A′i−1Bi−1B′i−1→A′i AiB′iYi

is an LOPC

channel for i ∈ 2, . . . , n, and L(n+1)A′nBnB′n→KAKBYn+1

is a final LOPC channelthat generates the approximate secret key in systems KA and KB. Let C de-note all of these elements, which together constitute the secret-key-agreementprotocol. As with LOCC-assisted quantum communication, all systems withprimed labels should be understood as local quantum memory or scratch reg-isters that Alice and Bob can employ in this information-processing task. Wealso assume that they are finite-dimensional, yet arbitrarily large. The un-primed systems are the ones that are either input to or output from the quan-tum communication channel NA→B.

The secret-key-agreement protocol begins with Alice and Bob performingan LOPC channel L(1)

∅→A′1 A1B′1Y1, which leads to the separable state ρ

(1)A′1 A1B′1Y1

mentioned above, where A′1 and B′1 are systems that are finite-dimensional

yet arbitrarily large. In particular, the state ρ(1)A′1 A1B′1Y1

has the following form:

ρ(1)A′1 A1B′1Y1

:= ∑y1

pY1(y1)τy1A′1 A1

⊗ ζy1B′1⊗ |y1〉〈y1|Y1 , (15.1.3)

where Y1 is a classical random variable corresponding to the message ex-changed between Alice and Bob, which is needed to establish this state. Theclassical system Y1 belongs to the eavesdropper. Also, τy1

A′1 A1y1 and ζy1

B′1y1

are sets of quantum states and pY1 is a probability distribution. Note that thereduced state for Alice and Bob is a generic separable state of the followingform:

ρ(1)A′1 A1B′1

= ∑y1

pY1(y1)τy1A′1 A1

⊗ ζy1B′1

. (15.1.4)

The system A1 of ρ(1)A′1 A1B′1Y1

is such that it can be fed into the first channel use.Alice then sends system A1 through the first channel use, leading to a state

ω(1)A′1B1B′1E1Y1

:= UNA1→B1E1

(ρ(1)A′1 A1B′1Y1

). (15.1.5)

896

Page 908: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

Note that we write the channel use as the isometric channel UNA1→B1E1

thatextends NA1→B1 , since we would like to incorporate the eavesdropper’s sys-tem E1 explicitly into the description of the protocol. Alice and Bob thenperform the LOPC channel L(2)

A′1B1B′1→A′2 A2B′2Y2, which leads to the state

ρ(2)A′2 A2B′2E1Y1Y2

:= L(2)A′1B1B′1→A′2 A2B′2Y2

(ω(1)A′1B1B′1E1Y1

). (15.1.6)

The LOPC channel L(2)A′1B1B′1→A′2 A2B′2Y2

can be written as

L(2)A′1B1B′1→A′2 A2B′2Y2

:= ∑y2

Ey2A′1→A′2 A2

⊗ Fy2B1B′1→B′2

⊗ |y2〉〈y2|Y2 . (15.1.7)

In the above, Ey2A′1→A′2 A2

y2 and Fy2B1B′1→B′2

y2 are sets of completely positive

maps such that the sum map ∑y2E

y2A′1→A′2 A2

⊗ Fy2B1B′1→B′2

is trace preserving.The classical system Y2 represents the eavesdropper’s copy of the classicaldata exchanged by Alice and Bob in this round of LOPC. Note that the re-duced channel acting on Alice and Bob’s systems is as follows:

L(2)A′1B1B′1→A′2 A2B′2

= ∑y2

Ey2A′1→A′2 A2

⊗ Fy2B1B′1→B′2

. (15.1.8)

Alice sends system A2 through the second channel use UNA2→B2E2

, leading tothe state

ω(2)A′2B2B′2E1E2Y1Y2

:= UNA2→B2E2

(ρ(2)A′2 A2B′2E1Y1Y2

). (15.1.9)

This process iterates: the protocol uses the channel n times. In general, wehave the following states for all i ∈ 2, . . . , n:

ρ(i)A′i AiB′i E

i−11 Yi

1:= L

(i)A′i−1Bi−1B′i−1→A′i AiB′iYi

(ω(i−1)A′i−1Bi−1B′i−1Ei−1

1 Yi−11

), (15.1.10)

ω(i)A′iBiB′i E

i1Yi

1:= UN

Ai→BiEi(ρ

(i)A′i AiB′i E

i−11 Yi

1), (15.1.11)

where L(i)A′i−1Bi−1B′i−1→A′i AiB′iYi

is an LOPC channel that can be written as

L(i)A′i−1Bi−1B′i−1→A′i AiB′iYi

:= ∑yi

EyiA′i−1→A′i Ai

⊗ FyiBi−1B′i−1→B′i

⊗ |yi〉〈yi|Yi . (15.1.12)

In the above, EyiA′i−1→A′i Ai

yi and FyiBi−1B′i−1→B′i

yi are sets of completely pos-

itive maps such that the sum map ∑yiE

yiA′i−1→A′i Ai

⊗ FyiBi−1B′i−1→B′i

is trace pre-

serving. The classical system Yi represents the eavesdropper’s copy of the

897

Page 909: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

classical data exchanged by Alice and Bob in this round of LOPC. Note thatthe reduced channel acting on Alice and Bob’s systems is as follows:

L(i)A′i−1Bi−1B′i−1→A′i AiB′i

= ∑yi

EyiA′i−1→A′i Ai

⊗ FyiBi−1B′i−1→B′i

. (15.1.13)

In (15.1.10)–(15.1.11), we have employed the following shorthands: Ei1 ≡

E1 · · · Ei and Yi1 ≡ Y1 · · ·Yi. The final step of the protocol consists of an LOPC

channel L(n+1)A′nBnB′n→KAKBYn+1

, which generates the key systems KA and KB forAlice and Bob, respectively. The protocol’s final state is as follows:

ωKAKBEn1 Yn+1

1:= L

(n+1)A′nBnB′n→KAKBYn+1

(ω(n)A′nBnB′nEn

1 Yn1). (15.1.14)

Note that the final LOPC channel can be written as

L(n+1)A′nBnB′n→KAKBYn+1

:= ∑yn+1

Eyn+1A′n→KA

⊗ Fyn+1BnB′n→KB

⊗ |yn+1〉〈yn+1|Yn+1 , (15.1.15)

and the reduced final channel acting on Alice and Bob’s systems is as follows:

L(n+1)A′nBnB′n→KAKB

= ∑yn+1

Eyn+1A′n→KA

⊗ Fyn+1BnB′n→KB

. (15.1.16)

The goal of the protocol is for the final state ωKAKBEn1 Yn+1

1to be nearly in-

distinguishable from a tripartite secret-key state, and we define the privacyerror of the code to be as follows:

perr(C) := 1− F(ωKAKBEn1 Yn+1

1, ΦKAKB ⊗ σEn

1 Yn+11

), (15.1.17)

where σEn1 Yn+1

1is some state of the eavesdropper’s systems, F denotes the

quantum fidelity (Definition 3.55) and the maximally classically correlatedstate ΦKAKB is defined as

ΦKAKB :=1K

K

∑k=1|k〉〈k|KA ⊗ |k〉〈k|KB . (15.1.18)

Intuitively, the privacy error perr(C) quantifies how distinguishable the finalstate ωKAKBEn

1 Yn+11

is from an ideal tripartite secret-key state ΦKAKB ⊗ σEn1 Yn+1

1,

in which the key values in KA and KB are perfectly correlated and uniformlyrandom and in tensor product with the eavesdropper’s systems En

1 Yn+11 . For

898

Page 910: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

an ideal tripartite secret-key state, it is difficult for an eavesdropper to guessthe value of the key by observing the content of her quantum systems En

1 Yn+11 .

In fact, the chance for an eavesdropper to guess the key value of an idealsecret-key state is equal to 1/K, which is no better than random guessing.

Due to the isometric invariance of the fidelity and the fact that all isomet-ric extensions of a channel are related by an isometry acting on the environ-ment system, the privacy error in (15.1.17) is invariant under any choice of anisometric channel UN

A→BE that extends the original channel NA→B. Thus, therelevant performance parameters for a secret-key agreement protocol do notchange with the particular isometric extension chosen. This is to be expectedsince the actual information that the eavesdropper gains in the protocol doesnot depend on the particular isometric extension chosen.

Definition 15.1 (n, K, ε) Secret-Key-Agreement Protocol

Let (ρ(1)A′1 A1B′1Y1, L(i)

A′i−1Bi−1B′i−1→A′i AiB′iYin

i=2,L(n+1)A′nBnB′n→KAKBYn+1

) be the ele-

ments of an n-round LOPC-assisted secret-key-agreement protocol overthe channel NA→B. The protocol is called an (n, K, ε) protocol, withε ∈ [0, 1], if the privacy error perr(C) ≤ ε.

15.1.1 Equivalence between Secret Key Agreement and LOPC-Assisted Private Communication

The goal of a secret-key-agreement protocol is for Alice and Bob to estab-lish an approximation of an ideal secret key, the latter begin uniformly dis-tributed, perfectly correlated, and independent of Eve’s quantum systems.What is the use of this secret key? As it turns out, it can be used for privatecommunication by means of the one-time pad protocol. This in turn meansthat secret key agreement and private classical communication are equivalentinformation processing tasks when public classical communication is avail-able for free, and the goal of this section is to clarify this point.

An LOPC-assisted private communication protocol uses a quantum chan-nel n times along with public classical communication to transmit an arbitrarymessage of size K privately from Alice to Bob in such a way that the fidelity ofthe actual state at the end of the protocol and the ideal state is no smaller than1− ε. In more detail, let Φp

MA MBdenote the following state in which there is

899

Page 911: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

an arbitrary distribution p over the message:

ΦpMA MB

:=K

∑m=1

p(m)|m〉〈m|MA ⊗ |m〉〈m|MB . (15.1.19)

Let ωpMA MBEn

1 Yn+11

denote the final state of the protocol, which is defined in the

same way as (15.1.14), with the exception that the message distribution p isno longer uniform. Then an (n, K, ε) LOPC-assisted private communicationprotocol is defined similarly to an (n, K, ε) secret-key-agreement protocol asgiven above, except that the following inequality holds

maxp:M→[0,1]

1− F(ωpMA MBEn

1 Yn+11

, ΦpMA MB

⊗ σEn1 Yn+1

1) ≤ ε. (15.1.20)

where the maximization is over all message distributions p and σEn1 Yn+1

1is

some fixed state of the eavesdropper’s systems that is independent of themessage transmitted.

By the use of the one-time pad protocol, it follows that an (n, K, ε) secret-key-agreement protocol leads to an (n, K, ε) LOPC-assisted private communi-cation protocol. To see how the one-time pad protocol works in conjunctionwith secret key agreement, suppose that Alice and Bob have completed an(n, K, ε) secret-key-agreement protocol as described in the previous section,with the final state ωKAKBEn

1 Yn+11

satisfying perr(C) ≤ ε. Alice then brings inher local message registers MA and MA′ , so that the overall quantum state is

ΦpMA MA′

⊗ωKAKBEn1 Yn+1

1(15.1.21)

The one-time pad protocol is an LOPC protocol in which Alice then performsthe following classical computation, represented as a quantum channel, onher message register MA′ and her key register KA:

∑k,m|m⊕ k〉CA〈m|MA′ 〈k|KA(·)|m〉MA′ |k〉KA〈m⊕ k|CA , (15.1.22)

where the addition ⊕ is modulo K. She then transmits the classical registerCA over a public classical channel to Bob. Eve can make a copy CA′ of thisclassical register containing the value m⊕ k, but since Bob’s key register KB isnot available to her, the register CA′ is nearly independent of Alice’s messageregister MA (depending on how small ε is). Bob then performs the follow-ing classical computation, represented as a quantum channel, on his received

900

Page 912: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

register CA and his key register KB:

∑c,k|c k〉MB〈c|CA〈k|KB(·)|c〉CA |k〉KB〈c k|MB , (15.1.23)

where the subtraction is modulo K. Let ωpMA MBEn

1 Yn+11 CA′

denote the final

state of the protocol. By applying the data-processing inequality to (15.1.20),as well as the fact mentioned above that CA′ is independent of MA and MB inthe ideal case, the following inequality holds

maxp:M→[0,1]

1− F(ωpMA MBEn

1 Yn+11 CA′

, ΦpMA MB

⊗ σEn1 Yn+1

1 CA′) ≤ ε, (15.1.24)

where σEn1 Yn+1

1CA′ is a fixed state of the eavesdropper’s systems. Thus, an

(n, K, ε) secret-key-agreement protocol leads to an (n, K, ε) LOPC-assisted pri-vate communication protocol, as claimed.

The other implication is trivial: while employing an (n, K, ε) LOPC-assistedprivate communication protocol, Alice can choose the distribution of the mes-sage to be uniform, and then an (n, K, ε) LOPC-assisted private communi-cation protocol leads to an (n, K, ε) secret-key-agreement protocol. Thus, itfollows that LOPC-assisted private communication and secret key agreementare equivalent whenever public classical communication is available for free.

15.2 Equivalence between Secret Key Agreementand LOCC-Assisted Private-State Distillation

There is a deep and powerful equivalence between a secret-key-agreementprotocol as described above and a protocol that uses LOCC assistance to dis-till a bipartite private state (recall Definition 10.2). This equivalence is helpfulin the analysis of secret-key-agreement protocols, in the sense that one canuse tools from entanglement theory in order to establish bounds on the rateat which secret key agreement is possible.

The main idea behind this equivalence is to apply the purification princi-ple to a secret-key-agreement protocol and then examine the consequences.That is, we can purify each step of the secret-key-agreement protocol dis-cussed in the previous section, and then we can examine various reducedstates at each step. In what follows, we detail such a purified protocol.

901

Page 913: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

15.2.1 The Purified Protocol

To begin with, recall that the initial state ρ(1)A′1 A1B′1

of a secret-key-agreement

protocol is a separable state of the form in (15.1.3). The state ρ(1)A′1 A1B′1

can bepurified as follows

|ρ(1)〉A′1 A1SA1B′1SB1Y1

:= ∑y1

√pY1(y1)|τy1〉A′1 A1SA1

⊗ |ζy1〉B′1SB1⊗ |y1〉Y1 ,

(15.2.1)where the systems SA1 and SB1 are known as local “shield” systems. In prin-ciple, the shield systems SA1 and SB1 could be held by Alice and Bob, re-spectively, and the states |τy1〉A′1 A1SA1

and |ζy1〉B′1SB1purify τ

y1A′1 A1

and ζy1B′1

in(15.1.3), respectively. We assume without loss of generality that the shieldsystems contain a coherent classical copy of the classical random variable Y1,such that tracing over systems SA1 and SB1 recovers the original state in (15.1.3).As before, Eve possesses system Y1, which contains a coherent classical copyof the classical data exchanged.

Each LOPC channel L(i)A′i−1Bi−1B′i−1→A′i AiB′i

for i ∈ 2, . . . , n is of the form in

(15.1.12) and can be purified to an isometry in the following way:

UL(i)

A′i−1Bi−1B′i−1→A′i AiSAiB′iSBi

Yi:=

∑yi

UEyiA′i−1→A′i AiSAi

⊗UFyiBi−1B′i−1→B′iSBi

⊗ |yi〉Yi , (15.2.2)

where UEyiA′i−1→A′i AiSAi

yi and UFyiBi−1B′i−1→B′iSBi

yi are collections of linear op-

erators, each of which is a contraction, that is,

‖UEyiA′i−1→A′i AiSAi

‖∞, ‖UFyiBi−1B′i−1→B′iSBi

‖∞ ≤ 1, (15.2.3)

such that the linear operator in (15.2.2) is an isometry.

It is important to note here that the isometry UL(i)

A′i−1Bi−1B′i−1→A′i AiSAiB′iSBi

Yire-

sults from purifying each step of an LOPC channel. That is, an LOPC channelis implemented as a sequence of one-way LOPC channels, which each consistof a generalized measurement by one party, classical communication of themeasurement outcome to the other, and a channel by the other party, con-ditioned on the outcome of the measurement. So when purifying the LOPC

902

Page 914: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

channel, we purify each of these steps, and the resulting purified channel iswhat is represented in (15.2.2).

The systems SAi and SBi in (15.2.2) are shield systems belonging to Aliceand Bob, respectively, and we assume without loss of generality that theycontain a coherent classical copy of the classical random variable Yi, such thattracing over the systems SAi and SBi recovers the original LOPC channel in(15.1.12). As before, Yi is a system held by Eve, containing a coherent classicalcopy of the classical data exchanged in this round.

Thus, a purification of the state ρ(i)A′i AiB′i

after each LOPC channel is as fol-lows:

|ρ(i)〉A′i AiSAi1

B′iSBi1

Ei−11 Yi

1:=

UL(i)

A′i−1Bi−1B′i−1→A′i AiSAiB′iSBi

Yi|ω(i−1)〉A′i−1Bi−1B′i−1S

Ai−11

SBi−1

1Ei−1

1 Yi−11

, (15.2.4)

where we have employed the shorthands SAi1≡ SA1 · · · SAi and SBi

1≡ SB1 · · · SBi ,

with a similar shorthand for Ei−11 and Yi

1 as before. A purification of the state

ω(i)A′iBiB′i

after each use of the channel NA→B is

|ω(i)〉A′iBiSAi1

B′iSBi1

Ei1Yi

1:= UN

Ai→BiEi|ρ(i)〉A′i AiSAi

1B′iSBi

1Ei−1

1 Yi1, (15.2.5)

where UNAi→BiEi

is an isometric extension of the ith channel use NAi→Bi .

The final LOPC channel takes the form in (15.1.15), and it can be purifiedto an isometry similarly as

UL(n+1)

A′nBnB′n→KASAn+1KBSBn+1Yn+1

:=

∑yn+1

UEyn+1A′n→KASAn+1

⊗UFyn+1BnB′n→KBSBn+1

⊗ |yn+1〉Yn+1 . (15.2.6)

The systems SAn+1 and SBn+1 are again shield systems belonging to Alice andBob, respectively, and we assume again that they contain a coherent classicalcopy of the classical random variable Yn+1, such that tracing over SAn+1 andSBn+1 recovers the original LOPC channel in (15.1.15). As before, Yn+1 is asystem held by Eve, containing a coherent classical copy of the classical dataexchanged in this round.

903

Page 915: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

The final state at the end of the purified protocol is a pure state |ω〉KASAKBSBEnYn+1 ,given by

|ω〉KASAKBSBEnYn+1 :=

UL(n+1)

A′nBnB′n→KASAn+1KBSBn+1Yn+1

|ω(n)〉A′nBnSAn1

B′nSBn1

En1 Yn

1. (15.2.7)

Alice is in possession of the key system KA and the shield systems SA ≡SA1 · · · SAn+1 , Bob possesses the key system KB and the shield systems SB ≡SB1 · · · SBn+1 , and Eve holds the environment systems En ≡ E1 · · · En. Addi-tionally, Eve has coherent copies Yn+1 ≡ Y1 · · ·Yn+1 of all the classical dataexchanged.

15.2.2 LOCC-Assisted Bipartite Private-State Distillation

As a consequence of the purification principle, on the one hand, if we traceover the shield systems at every step, then we simply recover the originaltripartite secret-key-agreement protocol detailed in Section 15.1. On the otherhand, suppose that we instead trace over all of Eve’s systems at each step.Due to the fact that each state of the Y systems is a coherent classical copy,the resulting reduced states consist of a classical mixture of various statesof Alice and Bob’s systems, as would arise in an LOCC-assisted quantumcommunication protocol.

It is worthwhile to examine how each step changes after tracing over Eve’ssystems of the purified protocol. For the first step, the reduced state of Aliceand Bob’s systems is a separable state of the following form:

ρ(1)A′1 A1SA1

B′1SB1= ∑

y1

pY1(y1)τy1A′1 A1SA1

⊗ ζy1B′1SB1

, (15.2.8)

where τy1A′1 A1SA1

= |τy1〉〈τy1 |A′1 A1SA1and ζ

y1B′1SB1

= |ζy1〉〈ζy1 |B′1SB1. Tracing over

Eve’s system Yi of each isometry UL(i)

A′i−1Bi−1B′i−1→A′i AiSAiB′iSBi

Yileads to the fol-

lowing LOCC channel:

L(i)A′i−1Bi−1B′i−1→A′i AiSAi

B′iSBi= ∑

yi

UEyiA′i−1→A′i AiSAi

⊗UFyiBi−1B′i−1→B′iSBi

, (15.2.9)

where

UEyiA′i−1→A′i AiSAi

(·) := UEyiA′i−1→A′i AiSAi

(·)[UEyiA′i−1→A′i AiSAi

]†, (15.2.10)

904

Page 916: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

UFyiBi−1B′i−1→B′iSBi

(·) := UFyiBi−1B′i−1→B′iSBi

(·)[UFyiBi−1B′i−1→B′iSBi

]†. (15.2.11)

Tracing over Eve’s system Yn+1 of the final isometry leads to the followingLOCC channel:

L(n+1)A′nBnB′n→KASAn+1

KBSBn+1= ∑

yn+1

UEyn+1A′n→KASAn+1

⊗UFyn+1BnB′n→KBSBn+1

, (15.2.12)

with a similar convention as in (15.2.10)–(15.2.11) for the maps UEyn+1A′n→KASAn+1

and UFyn+1BnB′n→KBSBn+1

.

The states at every step of the protocol are then given by the following forall i ∈ 2, . . . , n:

ρ(i)A′i AiSAi

1B′iSBi

1

:= L(i)A′i−1Bi−1B′i−1→A′i AiSAi

B′iSBi(ω

(i−1)A′i−1S

Ai−11

Bi−1B′i−1SBi−1

1

), (15.2.13)

ω(i)A′iSAi

1BiB′iSBi

1

:= NAi→Bi(ρ(i)A′i AiSAi

1B′iSBi

1

), (15.2.14)

and the final state of the protocol is given by

ωKASAKBSB := L(n+1)A′nBnB′n→KASAn+1

KBSBn+1(ω

(n)A′nBnSAn

1B′nSBn

1

). (15.2.15)

Finally, by employing (15.1.17) and Proposition 10.5, the following condi-tion holds

perr(C) = 1− F(ωKASAKBSB , γKASAKBSB), (15.2.16)

where γKASAKBSB is a bipartite private state of the form in Theorem 10.3. Thus,applying Definition 15.1, it follows that

F(ωKASAKBSB , γKASAKBSB) ≥ 1− ε. (15.2.17)

We can now make a critical observation. By tracing over Eve’s systems En

and Yn+1 at every step of the purified protocol as we did above, it is clear thatthe resulting protocol is an LOCC-assisted protocol that distills an approx-imate bipartite private state on the systems KASAKBSB, with performanceparameter given by (15.2.17). Indeed, the initial state in (15.2.8) is a separa-ble state, and the channels in (15.2.9) and (15.2.12) are LOCC channels, andso the protocol has the same form as an LOCC-assisted quantum communi-cation protocol, as we studied in the previous chapter. However, the goal

905

Page 917: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

of this LOCC-assisted bipartite private-state distillation is not as stringent asit was in the previous chapter, for LOCC-assisted quantum communication.Namely, it is only necessary to distill an approximate bipartite private stateand not necessarily an approximate maximally entangled state; but keep inmind that a maximally entangled state is a particular kind of bipartite pri-vate state. Thus, starting with a tripartite secret-key-agreement protocol, wecan apply the purification principle, then trace over the systems of the eaves-dropper, and the result is a bipartite private-state distillation protocol assistedby LOCC.

Alternatively, this reasoning can go in the opposite direction. Suppose in-stead that we had started with an LOCC-assisted bipartite private-state dis-tillation protocol of the above form. Then we could purify it as we did in theprevious subsection, and after doing so, we could trace over Alice and Bob’sshield systems. If the protocol satisfies the condition in (15.2.17), then theresulting protocol would be an (n, K, ε) tripartite secret-key-agreement proto-col, which follows as a consequence of the equivalence between approximatebipartite private states and tripartite secret-key states, as given in Proposi-tion 10.5.

As a consequence of this equivalence between tripartite secret-key-agreementprotocols and bipartite private-state distillation protocols, we can employ thetools of entanglement theory in order to analyze bipartite private-state dis-tillation protocols, in a way similar to how we did in the last chapter. Forexample, if our goal is to determine upper bounds on the rate at which itis possible to generate secret key in a secret-key-agreement protocol, thenwe can employ an entanglement measure to analyze the equivalent bipar-tite private-state distillation protocol in order to accomplish the goal. In fact,this is exactly what we accomplish in this chapter, demonstrating that thesquashed entanglement of a channel serves as an upper bound on secret-keyrates, and that variations of the relative entropy of entanglement, similar inspirit to the Rains relative entropy, serve as upper bounds on secret-key ratesas well.

15.2.2.1 Unboundedness of Shield Systems in a Bipartite Private-State Dis-tillation Protocol

One observation that we make here is that the shield systems in a bipartiteprivate-state distillation protocol are finite-dimensional, yet arbitrarily large.That is, there is no bound that we can establish on their dimension for a

906

Page 918: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

generic private-state distillation protocol, and this unboundedness is a con-sequence of the fact that the shield systems result from purifying the localmemory or scratch registers of Alice and Bob, which in turn have no boundon their dimension. This unboundedness poses a challenge when trying toestablish upper bounds on the rate at which secret key agreement, or equiva-lently, bipartite private-state distillation is possible. However, there are meth-ods for handling this unboundedness that we detail later.

15.2.3 Relation between Secret Key Agreement and LOCC-Assisted Quantum Communication

Due to the fact that a maximally entangled state is a particular kind of bipar-tite private state and due to the equivalence between secret key agreementand LOCC-assisted bipartite private-state distillation, we arrive at the fol-lowing conclusion, which relates LOCC-assisted quantum communication tosecret key agreement:

Proposition 15.2

Let NA→B be a quantum channel, let n, K ∈ N, and let ε ∈ [0, 1]. Thenan (n, K, ε) LOCC-assisted quantum communication protocol is also an(n, K, ε) protocol for secret key agreement.

This statement is a rather simple observation, but it has consequences forcapacities related to these tasks. That is, from this statement, we can con-clude that the LOCC-assisted quantum capacity of a given quantum channelis bounded from above by its secret-key-agreement capacity. Thus, any upperbound established on the secret-key-agreement capacity of a quantum chan-nel is also an upper bound on its LOCC-assisted quantum capacity. Further-more, if a given quantity is a lower bound on the LOCC-assisted quantumcapacity of a quantum channel, then it is also a lower bound on its secret-key-agreement capacity.

907

Page 919: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

15.2.4 n-Shot Secret-Key-Agreement Protocol Assisted by Pub-lic Separable Channels

In Section 14.2, we generalized the notion of an LOCC-assisted quantum com-munication protocol to one that is assisted by PPT-preserving channels. Wenote here that we can consider a similar kind of generalization for secret-key-agreement protocols.

Recall from Section 3.2.11 that any LOCC channel LAB can be written as aseparable channel of the following form:

LAB→A′B′ = ∑yE

yA→A′ ⊗ F

yB→B′ , (15.2.18)

where EyA→A′y and Fy

B→B′y are sets of completely positive maps such thatthe sum map ∑y E

yA ⊗ F

yB is trace preserving. However, the converse state-

ment is not true. That is, it is not possible in general to implement an arbitraryseparable channel of the form above as an LOCC channel.

Thus, we can allow for a slight generalization of a secret-key-agreementprotocol to one that is assisted by public separable channels. Indeed, we de-fine a public separable channel to be the following generalization of an LOPCchannel:

LAB→A′B′Y = ∑yE

yA→A′ ⊗ F

yB→B′ ⊗ |y〉〈y|Y, (15.2.19)

where Alice and Bob have access to the A and B systems, respectively, andthe eavesdropper has access to the system Y. The only requirement for a pub-lic separable channel is that Ey

A→A′y and FyB→B′y are sets of completely

positive maps such that the sum map ∑y EyA ⊗ F

yB is trace preserving. Similar

to the distinction between LOCC and separable channels, it is not possiblein general to implement a public separable channel via local operations andpublic classical communication.

The main point that we make in this section is that we can generalize asecret-key-agreement protocol to be assisted by public separable channelsrather than just LOPC channels. For fixed privacy error, the resulting pro-tocol achieves a rate of communication that is either the same or higher thanthat achieved by an LOPC-assisted protocol, due to the fact that every LOPCchannel is a public separable channel. Such a protocol is defined in the sameway as we did in Section 15.1, and then we arrive at the following definition:

908

Page 920: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

Definition 15.3 (n, K, ε) Secret-Key-Agreement Protocol Assisted byPublic Separable Channels

Let C := (ρ(1)A′1 A1B′1Y1

, L(i)A′i−1Bi−1B′i−1→A′i AiB′iYi

ni=2,L(n+1)

A′nBnB′n→KAKBYn+1)

be the elements of an n-round public-separable-assisted secret-key-agreement protocol over the channel NA→B. The protocol is called an(n, K, ε) protocol, with ε ∈ [0, 1], if the privacy error perr(C) ≤ ε.

Furthermore, the equivalence between secret key agreement and bipartiteprivate-state distillation, as outlined in Sections 15.2.1 and 15.2.2, still holdsunder this generalization (one can check that all of the steps given in Sec-tions 15.2.1 and 15.2.2 still hold). However, the correspondence changes asfollows: to any tripartite (n, K, ε) secret-key-agreement protocol assisted bypublic separable channels, there exists an (n, K, ε) bipartite private-state dis-tillation protocol assisted by separable channels and vice versa. Thus, wecan again employ the tools of entanglement theory to analyze secret-key-agreement protocols assisted by public separable channels.

15.2.5 Amortized Entanglement Bound for Secret-Key-AgreementProtocols

Due to the equivalence between tripartite secret-key-agreement protocols andbipartite private-state distillation protocols, we can use the tools of entan-glement theory to establish upper bounds on the rate at which secret-key-agreement is possible. Namely, we can apply the idea of amortized entangle-ment from the previous chapter in order to establish a generic upper boundin terms of an amortized entanglement measure. In fact, by the same stepsused to arrive at Proposition 14.2, we find the following bound:

Proposition 15.4

Let NA→B be a quantum channel, let ε ∈ [0, 1], and let E be an entan-glement measure that is equal to zero for all separable states. For an(n, K, ε) secret-key-agreement protocol, the following bound holds

E(KASA; KBSB)ω ≤ n · EA(N), (15.2.20)

909

Page 921: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

where ωKASAKBSB is the final state resulting from the equivalent (n, K, ε)

bipartite private-state distillation protocol (see (15.2.15)) and EA(N) isthe amortized entanglement of the channel NA→B, as given in Defini-tion 5.41.

Just as the bound from Proposition 14.2 depends on the final state of theLOCC-assisted quantum communication protocol, the same is true for thebound in (15.2.20). The bound is thus not a universal bound (a universalbound would depend only on the protocol parameters n, K, and ε). Thus,one of the main goals of the forthcoming sections is to employ particular en-tanglement measures in order to arrive at universal bounds for secret-key-agreement protocols.

We should also observe that the quantity E(KASA; KBSB)ω in the boundin (15.2.20) can be understood as quantifying amount of entanglement be-tween the systems KASA and KBSB. As such, the shield systems SA andSB are involved, and they can have arbitrarily large dimension. Thus, onemust account for this in the analysis of the entanglement E(KASA; KBSB)ω.For example, in the previous chapter, we analyzed the analogous quantityE(MA; MB)ω by employing squashed entanglement. In particular, since thestate ωMA MB there was an approximate maximally entangled state, we ap-plied the uniform continuity of squashed entanglement from Proposition 5.38in order to evaluate E(MA; MB)ω. When we did so, the dimension of themaximally entangled state appeared in the continuity bound, and this wasacceptable there because the dimension of the maximally entangled state is di-rectly related to the rate of entanglement distillation. However, it is not clearwhether we can take such an approach, via uniform continuity of squashedentanglement, when analyzing bipartite private-state distillation, due to thefact that the shield systems do not necessarily have a bounded dimension. Assuch, we employ another method to analyze such protocols.

Just as the bound in Proposition 14.2 simplifies for teleportation-simulablechannels and particular entanglement measures, the same is true for the boundgiven in Proposition 15.4, by employing the same reasoning:

Corollary 15.5 Reduction by Teleportation

Let ES denote an entanglement measure that is subadditive with respectto states (Definition 5.1.9) and equal to zero for all separable states. Let

910

Page 922: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

NA→B be a channel that is LOCC-simulable with associated resourcestate θRB′ (Definition 3.40). Let ε ∈ [0, 1]. For an (n, K, ε) secret-key-agreement protocol, the following bound holds

ES(KASA; KBSB)ω ≤ n · ES(R; B′)θ, (15.2.21)

where ωKASAKBSB is the final state resulting from the equivalent (n, K, ε)bipartite private-state distillation protocol.

Just as we found bounds that apply to PPT-assisted quantum communi-cation in terms of entanglement measures that are monotone with respect toPPT-preserving channels, we can also find bounds that apply to secret-key-agreement protocols that are assisted by public separable channels:

Proposition 15.6

Let NA→B be a quantum channel, let ε ∈ [0, 1], and let E be an entan-glement measure that is monotone non-increasing with respect to sepa-rable channels and equal to zero for all separable states. For an (n, K, ε)secret-key-agreement protocol assisted by public separable channels, thefollowing bound holds

E(KASA; KBSB)ω ≤ n · EA(N), (15.2.22)

where ωKASAKBSB is the final state resulting from the equivalent (n, K, ε)bipartite private-state distillation protocol assisted by separable chan-nels and EA(N) is the amortized entanglement of the channel NA→B, asgiven in Definition 5.41.

Finally, this bound again simplifies for channels that are simulable by theaction of a separable channel on a resource state θRB′ (separable-simulablechannels):

Corollary 15.7

Let ES denote an entanglement measure that is that is monotone non-increasing with respect to separable channels, subadditive with respectto states (Definition 5.1.9), and equal to zero for separable states. LetNA→B be a channel that is separable-simulable with associated resource

911

Page 923: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

state θRB′ (Definition 3.41). Let ε ∈ [0, 1]. For an (n, K, ε) secret-key-agreement protocol assisted by public separable channels, the followingbound holds

ES(KASA; KBSB)ω ≤ n · ES(R; B′)θ, (15.2.23)

where ωKASAKBSB is the final state resulting from the equivalent (n, K, ε)bipartite private-state distillation protocol assisted by separable chan-nels.

15.3 Squashed Entanglement Upper Bound on theNumber of Transmitted Private Bits

We now employ the squashed entanglement in order to bound the numberof private bits that an n-shot secret-key-agreement protocol can generate. Wehave already shown in Section 5.4 that the squashed entanglement satisfies allof the requirements needed to apply it in Proposition 15.4. Namely, it is equalto zero for separable states, it is an entanglement measure (non-increasingunder the action of an LOCC channel), and the squashed entanglement ofa channel does not increase under amortization (Theorem 5.58). Putting allof these items together, we can already conclude the following bound for an(n, K, ε) secret-key-agreement protocol:

Esq(KASA; KBSB)ω ≤ n · Esq(N), (15.3.1)

where Esq(N) is the squashed entanglement of a channel (Definition 5.39)and ωKASAKBSB is the final state resulting from the equivalent (n, K, ε) bipar-tite private-state distillation protocol. Thus, what remains is to evaluate thesquashed entanglement of an approximate bipartite private state, and this isthe main technical problem that we consider in this section before concludingthat squashed entanglement is an upper bound on secret-key rates.

15.3.1 Squashed Entanglement and Approximate Private States

This subsection establishes Proposition 15.9, which is an upper bound on thelogarithm of the dimension K of a key system of an ε-approximate privatestate, as given in Definition 10.4, in terms of its squashed entanglement, plusanother term depending only on ε and log2 K.

912

Page 924: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

In what follows, we suppose that γAA′BB′ is a private state with key sys-tems AB and shield systems A′B′. Recall from Theorem 10.3 that a privatestate of log2 K private bits can be written in the following form:

γABA′B′ = UABA′B′ (ΦAB ⊗ σA′B′)U†ABA′B′ , (15.3.2)

where ΦAB is a maximally entangled state of Schmidt rank K

ΦAB :=1K ∑

i,j|i〉〈j|A ⊗ |i〉〈j|B, (15.3.3)

andUABA′B′ = ∑

i,j|i〉〈i|A ⊗ |j〉〈j|B ⊗Uij

A′B′ (15.3.4)

is a controlled unitary known as a “twisting unitary,” with each UijA′B′ a uni-

tary operator. Due to the fact that the maximally entangled state ΦAB is unex-tendible, any extension γAA′BB′E of a private state γAA′BB′ necessarily has thefollowing form:

γAA′BB′E = UAA′BB′ (ΦAB ⊗ σA′B′E)U†AA′BB′ , (15.3.5)

where σA′B′E is an extension of σA′B′ .

We start with the following lemma, which applies to any extension of abipartite private state:

Lemma 15.8

Let γAA′BB′ be a bipartite private state, and let γAA′BB′E be an extensionof it, as given above. Then the following identity holds for any suchextension:

2 log2 K = I(A; BB′|E)γ + I(A′; B|AB′E)γ. (15.3.6)

PROOF: First consider that the following identity holds as a consequence oftwo applications of the chain rule for conditional quantum mutual informa-tion:

I(AA′; BB′|E)γ = I(A; BB′|E)γ + I(A′; BB′|AE)γ

= I(A; BB′|E)γ + I(A′; B′|AE)γ + I(A′; B|B′AE)γ. (15.3.7)

Combined with the following identity, which holds for an extension γAA′BB′Eof a private state γAA′BB′ ,

I(AA′; BB′|E)γ = 2 log2 K + I(A′; B′|AE)γ, (15.3.8)

913

Page 925: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

we recover the statement in (15.3.6). So it remains to prove (15.3.8). By defi-nition, we have that

I(AA′; BB′|E)γ = H(AA′E)γ + H(BB′E)γ − H(E)γ − H(AA′BB′E)γ.(15.3.9)

By applying (15.3.3)–(15.3.5), we can write γAA′BB′E as follows:

γAA′BB′E =1K ∑

i,j|i〉〈j|A ⊗ |i〉〈j|B ⊗Uii

A′B′σA′B′E(UjjA′B′)

†. (15.3.10)

Tracing over system B leads to the following state:

γAA′B′E =1K ∑

i|i〉〈i|A ⊗ γi

A′B′E, (15.3.11)

whereγi

A′B′E := UiiA′B′σA′B′E(U

iiA′B′)

†. (15.3.12)

Similarly, tracing over system A of γAA′BB′E leads to

γBA′B′E =1K ∑

i|i〉〈i|B ⊗ γi

A′B′E. (15.3.13)

So these and the chain rule for conditional entropy imply that

H(AA′E)γ = H(A)γ + H(A′E|A)γ = log2 K + H(A′E|A)γ. (15.3.14)

Similarly, we have that

H(BB′E)γ = log2 K + H(B′E|B)γ = log2 K + H(B′E|A)γ, (15.3.15)

where we have used the symmetries in (15.3.11)–(15.3.13). Since γE = γiE for

all i (this is a consequence of γABA′B′ being an ideal private state), we find that

H(E)γ =1K ∑

iH(E)γi = H(E|A)γ. (15.3.16)

Finally, we have that

H(AA′BB′E)γ = H(ABA′B′E)Φ⊗σ (15.3.17)= H(AB)Φ + H(A′B′E)σ (15.3.18)

=1K ∑

iH(A′B′E)γi (15.3.19)

914

Page 926: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

= H(A′B′E|A)γ. (15.3.20)

The first equality follows from unitary invariance of quantum entropy. Thesecond equality follows because the entropy is additive for tensor-productstates. The third equality follows because H(AB)Φ = 0 since ΦAB is a purestate, and σA′B′E is related to γi

A′B′E by the unitary UiiA′B′ . The final equality

follows by applying (15.3.11), and the fact that conditional entropy is a convexcombination of entropies for a classical-quantum state where the conditioningsystem is classical.

Combining (15.3.9), (15.3.14), (15.3.15), (15.3.16), (15.3.20), and the fact that

I(A′; B′|AE)γ = H(A′E|A)γ + H(B′E|A)γ − H(E|A)γ − H(A′B′E|A)γ,(15.3.21)

we recover (15.3.8).

We can now establish the squashed entanglement bound for an approxi-mate bipartite private state:

Proposition 15.9

Let γAA′BB′ be a private state, with key systems AB and shield systemsA′B′, and let ωAA′BB′ be an ε-approximate private state, in the sense that

F(γAA′BB′ , ωAA′BB′) ≥ 1− ε (15.3.22)

for ε ∈ [0, 1]. Suppose that |A| = |B| = K. Then

(1− 2√

ε) log2 K ≤ Esq(AA′; BB′)ω + 2g2(√

ε), (15.3.23)

whereg2(δ) := (δ + 1) log2(δ + 1)− δ log2 δ. (15.3.24)

PROOF: By applying Uhlmann’s theorem for fidelity (Theorem 3.58) and theinequalities relating trace distance and fidelity from Theorem 3.64, for a givenextension ωAA′BB′E of ωAA′BB′ , there exists an extension γAA′BB′E of γAA′BB′such that

12‖γAA′BB′E −ωAA′BB′E‖1 ≤

√ε. (15.3.25)

Defining f1(δ, K) := 2δ log2 K + 2g2(δ), we then find that

2 log2 K = I(A; BB′|E)γ + I(A′; B|AB′E)γ (15.3.26)

915

Page 927: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

≤ I(A; BB′|E)ω + I(A′; B|AB′E)ω + 2 f1(√

ε, K) (15.3.27)≤ I(A; BB′|E)ω + I(A′; B|AB′E)ω

+ I(A′; B′|AE)ω + 2 f1(√

ε, K) (15.3.28)

= I(AA′; BB′|E)ω + 2 f1(√

ε, K). (15.3.29)

The first equality follows from Lemma 15.8. The first inequality follows fromtwo applications of Proposition 4.8 (uniform continuity of conditional mutualinformation). The second inequality follows because I(A′; B′|AE)ω ≥ 0 (thisis strong subadditivity from Theorem 4.6). The last equality is a consequenceof the chain rule for conditional mutual information, as used in (15.3.7). Sincethe inequality

2 log2 K ≤ I(AA′; BB′|E)ω + 2 f1(√

ε, K) (15.3.30)

holds for any extension of ωAA′BB′ , the statement of the proposition follows.

15.3.2 Squashed Entanglement Upper Bound

We now establish the squashed entanglement upper bound on the numberof private bits that a sender can transmit to a receiver by employing a secret-key-agreement protocol. The proof is similar to that of Theorem 14.4, but itinstead invokes Proposition 15.9.

Theorem 15.10 n-Shot Squashed Entanglement Upper Bound

Let NA→B be a quantum channel, and let ε ∈ [0, 1/4). For all (n, K, ε)secret-key-agreement protocols over the channel NA→B, the followingbound holds

log2 M ≤ 11− 2

√ε

[n · Esq(N) + 2g2(

√ε)]

. (15.3.31)

PROOF: Given an arbitrary (n, K, ε) secret-key-agreement protocol as outlinedin Section 15.1, we consider its equivalent (n, K, ε) LOCC-assisted bipartiteprivate-state distillation protocol, as outlined in Section 15.2.2. The squashedentanglement is an entanglement measure (monotone under LOCC as shownin Theorem 5.33) and it is equal to zero for separable states (Proposition 5.4.5).Thus, Proposition 15.4 applies, and we find that

Esq(KASA; KBSB)ω ≤ n · EAsq(N) = n · Esq(N), (15.3.32)

916

Page 928: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

where the equality follows from Theorem 5.58. Applying Definition 15.1 and(15.2.17) leads to

F(γKASAKBSB , ωKASAKBSB) ≥ 1− ε, (15.3.33)

where γKASAKBSB is an exact private state of log2 K private bits. As a conse-quence of Proposition 15.9, we find that

Esq(KASA; KBSB)ω ≥ (1− 2√

ε) log2 K− 2g2(√

ε). (15.3.34)

Putting together (15.3.32) and (15.3.34), we arrive at the statement of the the-orem.

15.4 Relative Entropy of Entanglement Upper Boundson the Number of Transmitted Private Bits

We now establish the max-relative entropy of entanglement bound on thenumber of private bits that a sender can transmit to a receiver by employinga secret-key-agreement protocol assisted by public separable channels:

Theorem 15.11 n-Shot Max-Relative Entropy of Entanglement Up-per Bound

Let NA→B be a quantum channel, and let ε ∈ [0, 1). For all (n, K, ε) secret-key-agreement protocols assisted by public separable channels, over thechannel NA→B, the following bound holds

log2 K ≤ n · Emax(N) + log2

(1

1− ε

). (15.4.1)

PROOF: Given an arbitrary (n, K, ε) secret-key-agreement protocol assistedby public separable channels as outlined in Section 15.2.4, we consider itsequivalent (n, K, ε) separable-assisted bipartite private-state distillation pro-tocol. The max-relative entropy of entanglement is an entanglement measure(monotone under separable channels channels as shown in Proposition 5.16)and it is equal to zero for separable states. Thus, Proposition 15.4 applies, andwe find that

Emax(KASA; KBSB)ω ≤ n · EAmax(N) = n · Emax(N), (15.4.2)

917

Page 929: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

where the equality follows from Theorem 5.54. Applying Definition 15.1 and(15.2.17) leads to

F(γKASAKBSB , ωKASAKBSB) ≥ 1− ε, (15.4.3)

where γKASAKBSB is an exact private state of log2 K private bits. As a conse-quence of Propositions 10.10 and 4.68, we find that

log2 K ≤ EεR(SAKA; SBKB)ω (15.4.4)

≤ Emax(SAKA; SBKB)ω + log2

(1

1− ε

). (15.4.5)

Combining (15.4.2) and (15.4.5), we conclude the proof.

For channels that are separable-simulable with associated resource states,as given in Definition 3.41, we obtain upper bounds that can be even stronger:

Theorem 15.12 n-Shot Rényi–REE Upper Bounds for Separable-Simulable Channels

Let NA→B be a quantum channel that is separable-simulable with as-sociated resource state θSB′ , and let ε ∈ [0, 1). For all (n, K, ε) secret-key-agreement protocols assisted by public separable channels, over thechannel NA→B, the following bounds hold for all α > 1:

log2 K ≤ n · Eα(S; B′)θ +α

α− 1log2

(1

1− ε

), (15.4.6)

log2 K ≤ 11− ε

[n · ER(S; B′)θ + h2(ε)

]. (15.4.7)

PROOF: Given an arbitrary (n, K, ε) secret-key-agreement protocol assistedby public separable channels as outlined in Section 15.2.4, we consider itsequivalent (n, K, ε) separable-assisted bipartite private-state distillation pro-tocol. The Rényi relative entropy of entanglement and relative entropy of en-tanglement are monotone non-increasing under separable channels (Propo-sition 5.16), equal to zero for separable states, and subadditive with respectto states (Proposition 5.16). As such, Corollary 15.7 applies, and we find forα > 1 that

Eα(SAKA; SBKB)ω ≤ n · Eα(S; B′)θ, (15.4.8)ER(SAKA; SBKB)ω ≤ n · ER(S; B′)θ. (15.4.9)

Applying Definition 15.1 and (15.2.17) leads to

F(γKASAKBSB , ωKASAKBSB) ≥ 1− ε, (15.4.10)

918

Page 930: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

where γKASAKBSB is an exact private state of log2 K private bits. As a conse-quence of Proposition 10.10, we have that

log2 K ≤ EεR(SAKA; SBKB)ω. (15.4.11)

Applying Propositions 4.67 and 4.68, we find that

log2 K ≤ Eα(SAKA; SBKB)ω +α

α− 1log2

(1

1− ε

), (15.4.12)

log2 K ≤ 11− ε

[ER(SAKA; SBKB)ω + h2(ε)] . (15.4.13)

Putting together (15.4.8), (15.4.9), (15.4.12), and (15.4.13) concludes the proof.

15.5 Secret-Key-Agreement Capacities of QuantumChannels

In this section, we analyze the asymptotic capacities, and as before, the upperbounds for the asymptotic capacities are straightforward consequences of thenon-asymptotic bounds given in Sections 15.3.2 and 15.4. The definitions ofthese capacities are similar to what we have given previously, and so we onlystate them here briefly.

Definition 15.13 Achievable Rate for Secret Key Agreement

Given a quantum channel N, a rate R ∈ R+ is called an achievable ratefor secret key agreement over N if for all ε ∈ (0, 1], all δ > 0, and allsufficiently large n, there exists an (n, 2n(R−δ), ε) secret-key-agreementprotocol.

Definition 15.14 Secret-Key-Agreement Capacity of a QuantumChannel

The secret-key-agreement capacity of a quantum channel N, denoted byP↔(N), is defined as the supremum of all achievable rates, i.e.,

P↔(N) := supR : R is an achievable rate for N. (15.5.1)

919

Page 931: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

Definition 15.15 Weak Converse Rate for Secret Key Agreement

Given a quantum channel N, a rate R ∈ R+ is called a weak converserate for secret key agreement over N if every R′ > R is not an achievablerate for N.

Definition 15.16 Strong Converse Rate for Secret Key Agreement

Given a quantum channel N, a rate R ∈ R+ is called a strong converserate for secret key agreement over N if for all ε ∈ [0, 1), all δ > 0, andall sufficiently large n, there does not exist an (n, 2n(R+δ), ε) secret-key-agreement protocol.

Definition 15.17 Strong Converse Secret-Key-Agreement Capacityof a Quantum Channel

The strong converse secret-key-agreement capacity of a quantum chan-nel N, denoted by P↔(N), is defined as the infimum of all strong con-verse rates, i.e.,

P↔(N) := infR : R is a strong converse rate for N. (15.5.2)

We have the exact same definitions for secret key agreement assisted bypublic separable channels, and we use the notation P↔SEP to refer to the public-separable-assisted secret-key-agreement capacity and P↔SEP for the strong con-verse secret-key-agreement capacity assisted by public separable channels.

Recall that, by definition, the following bounds hold

P↔(N) ≤ P↔(N) ≤ P↔SEP(N), (15.5.3)

P↔(N) ≤ P↔SEP(N) ≤ P↔SEP(N). (15.5.4)

As a direct consequence of the bound in Theorem 15.10 and methods sim-ilar to those given in the proof of Theorem 6.23, we find the following:

Theorem 15.18 Squashed-Entanglement Weak-Converse Bound

The squashed entanglement of a channel N is a weak converse rate for

920

Page 932: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

secret key agreement:P↔(N) ≤ Esq(N). (15.5.5)

As a direct consequence of the bound in Theorem 15.11 and methods sim-ilar to those given in Section 6.2.3, we find that

Theorem 15.19 Max-Relative Entropy of Entanglement Strong-Converse Bound

The max-relative entropy of entanglement of a channel N is a strong con-verse rate for secret key agreement assisted by public separable chan-nels:

P↔SEP(N) ≤ Emax(N). (15.5.6)

As a direct consequence of the bound in Theorem 15.12 and methods sim-ilar to those given in Section 6.2.3, we find that

Theorem 15.20 Relative Entropy of Entanglement Strong-ConverseBound for Separable-Simulable Channels

Let N be a quantum channel that is separable-simulable with associatedresource state θSB′ . Then the relative entropy of entanglement of θSB′ isa strong converse rate for secret key agreement assisted by public sepa-rable channels:

P↔SEP(N) ≤ ER(S; B′)θ. (15.5.7)

15.6 Examples

[IN PROGRESS]

15.7 Bibliographic Notes

Quantum key distribution is one of the first examples of a secret-key-agreementprotocol conducted over a quantum channel (Bennett and Brassard, 1984;Ekert, 1991). Secret key agreement was considered in classical information

921

Page 933: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

theory by Maurer (1993); Ahlswede and Csiszár (1993). Secret key distilla-tion from a bipartite quantum state was studied by a number of researchers,including Devetak and Winter (2005); Horodecki et al. (2005a); Christandl(2006); Horodecki et al. (2008, 2009a); Christandl et al. (2007, 2012), well be-fore secret key agreement over quantum channels was considered. One of theseminal insights in this domain, that a tripartite secret-key-distillation pro-tocol is equivalent to a bipartite private-state distillation protocol, was madeby Horodecki et al. (2005a, 2009a). These authors also established the relativeentropy of entanglement (Vedral and Plenio, 1998) as an upper bound on therate of a secret-key-distillation protocol. In earlier work, Curty et al. (2004)observed that a separable state has no distillable secret key.

Secret key agreement over a quantum channel was formally defined byTakeoka et al. (2014), who also observed that the aforementioned insight ex-tends to this setting, with more details being given by Kaur and Wilde (2017).The issue of unbounded shield systems resulting from secret-key-distillationor secret-key-agreement protocols was somewhat implicit in (Horodecki et al.,2005a, 2009a) and discussed in more detail by Christandl et al. (2012); Wilde(2016); Wilde et al. (2017). The relation between LOCC-assisted quantumcommunication and secret key agreement was discussed by Wilde et al. (2017).The notion of secret key agreement assisted by public separable channelsis original to this book (including the observation that various previouslyknown bounds apply to these more general protocols).

The use of teleportation simulation and relative entropy of entanglementas a method for bounding rates of secret-key-agreement protocols was pre-sented by Pirandola et al. (2017).

The amortized entanglement bound for secret-key-agreement protocols(Proposition 15.4) was contributed by Kaur and Wilde (2017). Corollary 15.5,as presented here, is due to Kaur and Wilde (2017).

The squashed entanglement upper bound on the rate of a secret-key-agreementprotocol was established by Takeoka et al. (2014); Wilde (2016). Lemma 15.8and Proposition 15.9 are due to Wilde (2016).

The use of sandwiched relative entropy of entanglement for bounding keyrates was contributed by Wilde et al. (2017). The max-relative entropy of en-tanglement of a state was introduced by Datta (2009b,a), and the generaliza-tion to channels by Christandl and Müller-Hermes (2017). Lemmas 5.21 and5.48 were established by Berta and Wilde (2018). Proposition 5.54 was provenby Christandl and Müller-Hermes (2017). The proof that we follow here was

922

Page 934: Principles of Quantum Communication Theory: A Modern Approach

Chapter 15: LOPC-Assisted Private Communication

given by Berta and Wilde (2018). The interpretation of Proposition 5.54 interms of “amortization collapse” was given by Berta and Wilde (2018). The-orem 15.11 was proven by Christandl and Müller-Hermes (2017). The pre-cise statement of Theorem 15.12 is due to Wilde et al. (2017); Kaur and Wilde(2017) (although it was stated for LOCC-simulable channels in these papers).

923

Page 935: Principles of Quantum Communication Theory: A Modern Approach

Summary

[IN PROGRESS]

924

Page 936: Principles of Quantum Communication Theory: A Modern Approach

Appendix A

Analyzing GeneralCommunication Scenarios

[IN PROGRESS]

925

Page 937: Principles of Quantum Communication Theory: A Modern Approach

Bibliography

Christoph Adami and Nicolas J. Cerf. von Neumann capacity of noisy quan-tum channels. Physical Review A, 56(5):3470–3483, November 1997. doi:10.1103/PhysRevA.56.3470. arXiv:quant-ph/9609024.

Rudolf Ahlswede and Imre Csiszár. Common randomness in informationtheory and cryptography. I. Secret sharing. IEEE Transactions on InformationTheory, 39(4):1121–1132, July 1993.

Robert Alicki and Mark Fannes. Continuity of quantum conditional informa-tion. Journal of Physics A: Mathematical and General, 37(5):L55–L57, February2004. arXiv:quant-ph/0312081.

Anurag Anshu, Mario Berta, Rahul Jain, and Marco Tomamichel. A minimaxapproach to one-shot entropy inequalities. Journal of Mathematical Physics,60(12):122201, December 2019. doi: 10.1063/1.5126723. URL https://doi.org/10.1063/1.5126723. arXiv:1906.00333.

Anurag Anshu, Rahul Jain, and Naqueeb A. Warsi. Building blocks for com-munication over noisy quantum networks. IEEE Transactions on Informa-tion Theory, 65(2):1287–1306, February 2019. doi: 10.1109/TIT.2018.2851297.arXiv:1702.01940.

Anurag Anshu, Rahul Jain, and Naqueeb A. Warsi. On the near-optimality ofone-shot classical communication over quantum channels. Journal of Math-ematical Physics, 60(1):012204, January 2019. doi: 10.1063/1.5039796. URLhttps://doi.org/10.1063/1.5039796. arXiv:1804.09644.

Huzihiro Araki. On an inequality of Lieb and Thirring. Letters in Mathemat-ical Physics, 19(2):167–170, February 1990. ISSN 1573-0530. doi: 10.1007/BF01045887. URL https://doi.org/10.1007/BF01045887.

926

Page 938: Principles of Quantum Communication Theory: A Modern Approach

Sanjeev Arora and Satyen Kale. A combinatorial, primal-dual approach tosemidefinite programs. In Proceedings of the Thirty-Ninth Annual ACM Sym-posium on Theory of Computing, pages 227–236, New York, NY, USA, June2007. Association for Computing Machinery. ISBN 9781595936318. doi: 10.1145/1250790.1250823. URL https://doi.org/10.1145/1250790.1250823.

Sanjeev Arora, Elad Hazan, and Satyen Kale. Fast algorithms for approxi-mate semidefinite programming using the multiplicative weights updatemethod. In 46th Annual IEEE Symposium on Foundations of Computer Science,pages 339–348, 2005.

Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weightsupdate method: a meta-algorithm and applications. Theory of Comput-ing, 8(6):121–164, 2012. doi: 10.4086/toc.2012.v008a006. URL http://www.theoryofcomputing.org/articles/v008a006.

Koenraad Audenaert, Bart De Moor, Karl Gerd H. Vollbrecht, and Reinhard F.Werner. Asymptotic relative entropy of entanglement for orthogonallyinvariant states. Physical Review A, 66(3):032310, September 2002. doi:10.1103/PhysRevA.66.032310. URL http://link.aps.org/doi/10.1103/PhysRevA.66.032310. arXiv:quant-ph/0204143.

Koenraad M. R. Audenaert. Comparisons between quantum state distin-guishability measures. Quantum Information and Computation, 14(1-2):31–38,January 2014. ISSN 1533-7146. URL http://dl.acm.org/citation.cfm?id=2600498.2600500. arXiv:1207.1197.

Koenraad M. R. Audenaert and Jens Eisert. Continuity bounds on the quan-tum relative entropy. Journal of Mathematical Physics, 46(10):102104, October2005. doi: 10.1063/1.2044667. URL https://doi.org/10.1063/1.2044667.arXiv:quant-ph/0503218.

Koenraad M. R. Audenaert, John Calsamiglia, Ramon Muñoz Tapia, EmilioBagan, Lluis Masanes, Antonio Acin, and Frank Verstraete. Discriminatingstates: The quantum Chernoff bound. Physical Review Letters, 98(16):160501,April 2007. doi: 10.1103/PhysRevLett.98.160501. URL http://link.aps.org/doi/10.1103/PhysRevLett.98.160501. arXiv:quant-ph/0610027.

Koenraad M. R. Audenaert, Milan Mosonyi, and Frank Verstraete. Quantumstate discrimination bounds for finite sample size. Journal of MathematicalPhysics, 53(12):122205, December 2012. doi: 10.1063/1.4768252. URL https://doi.org/10.1063/1.4768252. arXiv:1204.0711.

927

Page 939: Principles of Quantum Communication Theory: A Modern Approach

Howard Barnum, Michael A. Nielsen, and Benjamin Schumacher. Informa-tion transmission through a noisy quantum channel. Physical Review A, 57(6):4153–4175, June 1998. doi: 10.1103/PhysRevA.57.4153. URL https://link.aps.org/doi/10.1103/PhysRevA.57.4153. arXiv:quant-ph/9702049.

Howard Barnum, Emanuel Knill, and Michael A. Nielsen. On quantum fi-delities and channel capacities. IEEE Transactions on Information Theory, 46(4):1317–1329, July 2000. arXiv:quant-ph/9809010.

Salman Beigi. Sandwiched Rényi divergence satisfies data processing in-equality. Journal of Mathematical Physics, 54(12):122202, December 2013.arXiv:1306.5920.

Salman Beigi, Nilanjana Datta, and Felix Leditzky. Decoding quantum in-formation via the Petz recovery map. Journal of Mathematical Physics, 57(8):082203, August 2016. arXiv:1504.04449.

V. P. Belavkin and P. Staszewski. C*-algebraic generalization of relative en-tropy and entropy. Annales de l’I.H.P. Physique théorique, 37(1):51–58, 1982.URL http://eudml.org/doc/76163.

Ingemar Bengtsson and Karol Zyczkowski. Geometry of Quantum States: AnIntroduction to Quantum Entanglement. Cambridge University Press, secondedition, 2017. doi: 10.1017/9781139207010.

Charles H. Bennett and Gilles Brassard. Quantum cryptography: Public keydistribution and coin tossing. In International Conference on Computer Systemand Signal Processing, IEEE, 1984, pages 175–179, 1984.

Charles H. Bennett and Stephen J. Wiesner. Communication via one- andtwo-particle operators on Einstein-Podolsky-Rosen states. Physical ReviewLetters, 69(20):2881–2884, November 1992. doi: 10.1103/PhysRevLett.69.2881.

Charles H. Bennett, Gilles Brassard, Claude Crépeau, Richard Jozsa, AsherPeres, and William K. Wootters. Teleporting an unknown quantum statevia dual classical and Einstein-Podolsky-Rosen channels. Physical ReviewLetters, 70(13):1895–1899, March 1993. URL https://link.aps.org/doi/10.1103/PhysRevLett.70.1895.

Charles H. Bennett, Herbert J. Bernstein, Sandu Popescu, and Benjamin Schu-macher. Concentrating partial entanglement by local operations. Phys-ical Review A, 53(4):2046–2052, April 1996a. doi: 10.1103/PhysRevA.

928

Page 940: Principles of Quantum Communication Theory: A Modern Approach

53.2046. URL https://link.aps.org/doi/10.1103/PhysRevA.53.2046.arXiv:quant-ph/9511030.

Charles H. Bennett, Gilles Brassard, Sandu Popescu, Benjamin Schumacher,John A. Smolin, and William K. Wootters. Purification of noisy entan-glement and faithful teleportation via noisy channels. Physical ReviewLetters, 76(5):722–725, January 1996b. doi: 10.1103/PhysRevLett.76.722.arXiv:quant-ph/9511027.

Charles H. Bennett, David P. DiVincenzo, John A. Smolin, and William K.Wootters. Mixed-state entanglement and quantum error correction. PhysicalReview A, 54(5):3824–3851, November 1996c. doi: 10.1103/PhysRevA.54.3824. arXiv:quant-ph/9604024.

Charles H. Bennett, David P. DiVincenzo, and John A. Smolin. Capacities ofquantum erasure channels. Physical Review Letters, 78(16):3217–3220, April1997. doi: 10.1103/PhysRevLett.78.3217. arXiv:quant-ph/9701015.

Charles H. Bennett, David P. DiVincenzo, Christopher A. Fuchs, Tal Mor, EricRains, Peter W. Shor, John A. Smolin, and William K. Wootters. Quan-tum nonlocality without entanglement. Physical Review A, 59(2):1070–1091,February 1999a. doi: 10.1103/PhysRevA.59.1070. URL http://link.aps.org/doi/10.1103/PhysRevA.59.1070. arXiv:quant-ph/9804053.

Charles H. Bennett, Peter W. Shor, John A. Smolin, and Ashish V. Thap-liyal. Entanglement-assisted classical capacity of noisy quantum channels.Physical Review Letters, 83(15):3081–3084, October 1999b. doi: 10.1103/PhysRevLett.83.3081. arXiv:quant-ph/9904023.

Charles H. Bennett, Peter W. Shor, John A. Smolin, and Ashish V. Thap-liyal. Entanglement-assisted capacity of a quantum channel and the re-verse Shannon theorem. IEEE Transactions on Information Theory, 48(10):2637–2655, October 2002. arXiv:quant-ph/0106052.

Charles H. Bennett, Aram W. Harrow, Debbie W. Leung, and John A. Smolin.On the capacities of bipartite Hamiltonians and unitary gates. IEEE Transac-tions on Information Theory, 49(8):1895–1911, August 2003. ISSN 0018-9448.doi: 10.1109/TIT.2003.814935. arXiv:quant-ph/0205057.

Charles H. Bennett, Igor Devetak, Aram W. Harrow, Peter W. Shor, and An-dreas Winter. The quantum reverse Shannon theorem and resource trade-offs for simulating quantum channels. IEEE Transactions on Information The-ory, 60(5):2926–2959, May 2014. arXiv:0912.5537.

929

Page 941: Principles of Quantum Communication Theory: A Modern Approach

Dominic W. Berry. Qubit channels that achieve capacity with two states. Phys-ical Review A, 71:032334, March 2005. URL https://link.aps.org/doi/10.1103/PhysRevA.71.032334. arXiv:quant-ph/0412113.

Mario Berta. Single-shot quantum state merging. Diploma thesis, ETHZurich, February 2008. arXiv:0912.4495.

Mario Berta and Mark M. Wilde. Amortization does not enhance the max-Rains information of a quantum channel. New Journal of Physics, 20(5):053044, May 2018. doi: 10.1088/1367-2630/aac153. arXiv:1709.00200.

Mario Berta, Omar Fawzi, and Marco Tomamichel. On variational expres-sions for quantum relative entropies. Letters in Mathematical Physics, 107(12):2239–2265, December 2017. arXiv:1512.02615.

Reinhold A. Bertlmann and Philipp Krammer. Bloch vectors for qudits.Journal of Physics A: Mathematical and Theoretical, 41(23):235303, May 2008.doi: 10.1088/1751-8113/41/23/235303. URL https://doi.org/10.1088%2F1751-8113%2F41%2F23%2F235303. arXiv:0806.1174.

Rajendra Bhatia. Matrix Analysis. Springer New York, 1997. doi: 10.1007/978-1-4612-0653-8.

Igor Bjelakovic and Rainer Siegmund-Schultze. Quantum Stein’s lemma re-visited, inequalities for quantum entropies, and a concavity theorem ofLieb. July 2003. arXiv:quant-ph/0307170.

Garry Bowen. Quantum feedback channels. IEEE Transactions on InformationTheory, 50(10):2429–2434, October 2004. arXiv:quant-ph/0209076.

Garry Bowen and Sougato Bose. Teleportation as a depolarizing quan-tum channel, relative entropy, and classical capacity. Physical Re-view Letters, 87(26):267901, December 2001. doi: 10.1103/PhysRevLett.87.267901. URL https://link.aps.org/doi/10.1103/PhysRevLett.87.267901. arXiv:quant-ph/0107132.

Stephen Boyd and Lieven Vandenberghe. Convex Optimization. CambridgeUniversity Press, The Edinburgh Building, Cambridge, CB2 8RU, UK, 2004.

Fernando G. S. L. Brandao and Nilanjana Datta. One-shot rates for entan-glement manipulation under non-entangling maps. IEEE Transactions onInformation Theory, 57(3):1754–1760, March 2011. ISSN 0018-9448. doi:10.1109/TIT.2011.2104531. arXiv:0905.2673.

930

Page 942: Principles of Quantum Communication Theory: A Modern Approach

Fernando G.S.L. Brandao, Matthias Christandl, and Jon Yard. Faithfulsquashed entanglement. Communications in Mathematical Physics, 306(3):805–830, September 2011. ISSN 0010-3616. doi: 10.1007/s00220-011-1302-1.URL http://dx.doi.org/10.1007/s00220-011-1302-1. arXiv:1010.1750.

Samuel L. Braunstein and H. J. Kimble. Teleportation of continuous quantumvariables. Physical Review Letters, 80(4):869–872, January 1998.

Samuel L. Braunstein, Giacomo M. D’Ariano, Gerard J. Milburn, and Massi-miliano F. Sacchi. Universal teleportation with a twist. Physical Review Let-ters, 84(15):3486–3489, April 2000. doi: 10.1103/PhysRevLett.84.3486. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.84.3486. arXiv:quant-ph/9908036.

Francesco Buscemi and Nilanjana Datta. The quantum capacity of channelswith arbitrarily correlated noise. IEEE Transactions on Information Theory,56(3):1447–1460, March 2010a. ISSN 0018-9448. doi: 10.1109/TIT.2009.2039166. arXiv:0902.0158.

Francesco Buscemi and Nilanjana Datta. Distilling entanglement fromarbitrary resources. Journal of Mathematical Physics, 51(10):102201,October 2010b. doi: http://dx.doi.org/10.1063/1.3483717. URLhttp://scitation.aip.org/content/aip/journal/jmp/51/10/10.1063/1.3483717. arXiv:1006.1896.

Gianfranco Cariolaro and Tomaso Erseghe. Pulse Position Modulation. JohnWiley & Sons, Inc., 2003. ISBN 9780471219286. doi: 10.1002/0471219282.eot394. URL http://dx.doi.org/10.1002/0471219282.eot394.

Eric A. Carlen. Trace inequalities and quantum entropy: An introduc-tory course. Contemporary Mathematics, 529:73–140, 2010. Available atwww.ueltschi.org/AZschool/notes/EricCarlen.pdf.

Filippo Caruso and Vittorio Giovannetti. Degradability of bosonic Gaussianchannels. Physical Review A, 74(6):062307, December 2006. URL https://journals.aps.org/pra/abstract/10.1103/PhysRevA.74.062307.arXiv:quant-ph/0603257.

Nicholas J. Cerf and Christoph Adami. Negative entropy and informationin quantum mechanics. Physical Review Letters, 79(26):5194–5197, Decem-ber 1997. doi: 10.1103/PhysRevLett.79.5194. URL https://link.aps.org/doi/10.1103/PhysRevLett.79.5194. arXiv:quant-ph/9512022.

931

Page 943: Principles of Quantum Communication Theory: A Modern Approach

Nicolas J. Cerf and Chris Adami. Information theory of quantum entan-glement and measurement. Physica D: Nonlinear Phenomena, 120(1):62–81, September 1998. ISSN 0167-2789. doi: https://doi.org/10.1016/S0167-2789(98)00045-1. URL http://www.sciencedirect.com/science/article/pii/S0167278998000451. arXiv:quant-ph/9605039.

Herman Chernoff. A measure of asymptotic efficiency for tests of a hy-pothesis based on the sum of observations. The Annals of MathematicalStatistics, 23(4):493–507, 12 1952. doi: 10.1214/aoms/1177729330. URLhttps://doi.org/10.1214/aoms/1177729330.

Giulio Chiribella, Giacomo Mauro D’Ariano, and Paolo Perinotti. Realizationschemes for quantum instruments in finite dimensions. Journal of Math-ematical Physics, 50(4):042101, April 2009. doi: 10.1063/1.3105923. URLhttp://dx.doi.org/10.1063/1.3105923. arXiv:0810.3211.

Eric Chitambar, Debbie Leung, Laura Mancinska, Maris Ozols, and AndreasWinter. Everything you always wanted to know about LOCC (but wereafraid to ask). Communications in Mathematical Physics, 328(1):303–326, May2014. ISSN 1432-0916. doi: 10.1007/s00220-014-1953-9. URL http://dx.doi.org/10.1007/s00220-014-1953-9. arXiv:1210.4583.

Eric Chitambar, Julio I. de Vicente, Mark W. Girard, and Gilad Gour. Entangle-ment manipulation and distillability beyond LOCC. Journal of MathematicalPhysics, 61(4):042201, April 2020. arXiv:1711.03835.

Man-Duen Choi. Completely positive linear maps on complex matrices. Lin-ear Algebra and Its Applications, 10(3):285–290, 1975.

Matthias Christandl. The Structure of Bipartite Quantum States: Insights fromGroup Theory and Cryptography. PhD thesis, University of Cambridge, April2006. arXiv:quant-ph/0604183.

Matthias Christandl and Alexander Müller-Hermes. Relative entropy boundson quantum, private and repeater capacities. Communications in Mathemati-cal Physics, 353(2):821–852, July 2017. doi: 10.1007/s00220-017-2885-y. URLhttps://doi.org/10.1007/s00220-017-2885-y. arXiv:1604.03448.

Matthias Christandl and Andreas Winter. “Squashed entanglement”: An ad-ditive entanglement measure. Journal of Mathematical Physics, 45(3):829–840,March 2004. arXiv:quant-ph/0308088.

932

Page 944: Principles of Quantum Communication Theory: A Modern Approach

Matthias Christandl, Artur Ekert, Michal Horodecki, Pawel Horodecki,Jonathan Oppenheim, and Renato Renner. Unifying classical and quan-tum key distillation. Proceedings of the 4th Theory of Cryptography Conference,Lecture Notes in Computer Science, 4392:456–478, February 2007. arXiv:quant-ph/0608199.

Matthias Christandl, Norbert Schuch, and Andreas Winter. Entanglementof the antisymmetric state. Communications in Mathematical Physics, 311(2):397–422, March 2012. arXiv:0910.4151.

Benoît Collins and Piotr Sniady. Integration with respect to the Haar measureon unitary, orthogonal and symplectic group. Communications in Mathe-matical Physics, 264(3):773–795, June 2006. ISSN 1432-0916. doi: 10.1007/s00220-006-1554-3. URL https://doi.org/10.1007/s00220-006-1554-3.arXiv:math-ph/0402073.

Tom Cooney, Milan Mosonyi, and Mark M. Wilde. Strong converse exponentsfor a quantum channel discrimination problem and quantum-feedback-assisted communication. Communications in Mathematical Physics, 344(3):797–829, June 2016. arXiv:1408.3373.

John Cortese. Relative entropy and single qubit Holevo-Schumacher-Westmoreland channel capacity. July 2002. URL https://arxiv.org/abs/quant-ph/0207128. arXiv:quant-ph/0207128.

Toby Cubitt, David Elkouss, William Matthews, Maris Ozols, David Pérez-García, and Sergii Strelchuk. Unbounded number of channel uses may berequired to detect quantum capacity. Nature Communications, 6:6739, 2015.URL https://doi.org/10.1038/ncomms7739. arXiv:1408.5115.

Toby S. Cubitt, Mary Beth Ruskai, and Graeme Smith. The structure ofdegradable quantum channels. Journal of Mathematical Physics, 49(10):102104, 2008.

Marcos Curty, Maciej Lewenstein, and Norbert Lütkenhaus. Entangle-ment as a precondition for secure quantum key distribution. Physi-cal Review Letters, 92(21):217903, May 2004. doi: 10.1103/PhysRevLett.92.217903. URL https://link.aps.org/doi/10.1103/PhysRevLett.92.217903. arXiv:quant-ph/0307151.

Nilanjana Datta. Max-relative entropy of entanglement, alias log robustness.International Journal of Quantum Information, 7(02):475–491, January 2009a.arXiv:0807.2536.

933

Page 945: Principles of Quantum Communication Theory: A Modern Approach

Nilanjana Datta. Min- and max-relative entropies and a new entanglementmonotone. IEEE Transactions on Information Theory, 55(6):2816–2826, June2009b. arXiv:0803.2770.

Nilanjana Datta and Min-Hsiu Hsieh. One-shot entanglement-assisted quan-tum and classical communication. IEEE Transactions on Information Theory,59(3):1929–1939, March 2013. arXiv:1105.3321.

Nilanjana Datta and Felix Leditzky. A limit of the quantum Rényi diver-gence. Journal of Physics A: Mathematical and Theoretical, 47(4):045304, Jan-uary 2014. URL http://stacks.iop.org/1751-8121/47/i=4/a=045304.arXiv:1308.5961.

Nilanjana Datta, Milan Mosonyi, Min-Hsiu Hsieh, and Fernando G. S. L.Brandão. A smooth entropy approach to quantum hypothesis testing andthe classical capacity of quantum channels. IEEE Transactions on Infor-mation Theory, 59(12):8014–8026, December 2013. ISSN 0018-9448. doi:10.1109/TIT.2013.2282160. arXiv:1106.3089.

Nilanjana Datta, Marco Tomamichel, and Mark M. Wilde. On the second-order asymptotics for entanglement-assisted communication. Quantum In-formation Processing, 15(6):2569–2591, June 2016. arXiv:1405.1797.

Edward B. Davies and J. T. Lewis. An operational approach to quantum prob-ability. Communications in Mathematical Physics, 17:239–260, 1970.

Igor Devetak. The private classical capacity and quantum capacity of a quan-tum channel. IEEE Transactions on Information Theory, 51(1):44–55, January2005. doi: 10.1109/TIT.2004.839515. arXiv:quant-ph/0304127.

Igor Devetak and Peter W. Shor. The capacity of a quantum channel for si-multaneous transmission of classical and quantum information. Commu-nications in Mathematical Physics, 256(2):287–303, June 2005. arXiv:quant-ph/0311131.

Igor Devetak and Andreas Winter. Distillation of secret key and entanglementfrom quantum states. Proceedings of the Royal Society of London A: Mathemat-ical, Physical and Engineering Sciences, 461(2053):207–235, January 2005. doi:10.1098/rspa.2004.1372. arXiv:quant-ph/0306078.

Igor Devetak, Marius Junge, Christopher King, and Mary Beth Ruskai. Mul-tiplicativity of completely bounded p-norms implies a new additivity re-sult. Communications in Mathematical Physics, 266(1):37–63, August 2006.arXiv:quant-ph/0506196.

934

Page 946: Principles of Quantum Communication Theory: A Modern Approach

David P. DiVincenzo, Peter W. Shor, and John A. Smolin. Quantum-channelcapacity of very noisy channels. Physical Review A, 57:830–839, February1998. doi: 10.1103/PhysRevA.57.830. URL https://link.aps.org/doi/10.1103/PhysRevA.57.830. arXiv:quant-ph/9706061.

David P. DiVincenzo, Peter W. Shor, John A. Smolin, Barbara M. Terhal,and Ashish V. Thapliyal. Evidence for bound entangled states with neg-ative partial transpose. Physical Review A, 61(6), May 2000. doi: 10.1103/physreva.61.062312. arXiv:quant-ph/9910026.

Andrew C. Doherty, Pablo A. Parrilo, and Federico M. Spedalieri. Completefamily of separability criteria. Physical Review A, 69:022308, February 2004.doi: 10.1103/PhysRevA.69.022308. URL https://link.aps.org/doi/10.1103/PhysRevA.69.022308. arXiv:quant-ph/0308032.

Frederic Dupuis. The decoupling approach to quantum information theory. PhDthesis, University of Montreal, April 2010. arXiv:1004.1641.

Frédéric Dupuis and Mark M. Wilde. Swiveled Rényi entropies. Quan-tum Information Processing, 15(3):1309–1345, March 2016. ISSN 1573-1332. doi: 10.1007/s11128-015-1211-x. URL http://dx.doi.org/10.1007/s11128-015-1211-x. arXiv:1506.00981.

Frederic Dupuis, Lea Kraemer, Philippe Faist, Joseph M. Renes, and RenatoRenner. Generalized entropies. XVIIth International Congress on Mathemati-cal Physics, pages 134–153, 2013. arXiv:1211.3141.

Frédéric Dupuis, Mario Berta, Jürg Wullschleger, and Renato Renner. One-shot decoupling. Communications in Mathematical Physics, 328(1):251–284,May 2014. ISSN 1432-0916. doi: 10.1007/s00220-014-1990-4. URL http://dx.doi.org/10.1007/s00220-014-1990-4. arXiv:1012.6044.

Wolfgang Dür, J. Ignacio Cirac, Maciej Lewenstein, and Dagmar Bruß. Distil-lability and partial transposition in bipartite systems. Physical Review A, 61:062313, May 2000. doi: 10.1103/PhysRevA.61.062313. URL https://link.aps.org/doi/10.1103/PhysRevA.61.062313. arXiv:quant-ph/9910022.

Harold G. Eggleston. Convexity. Cambridge University Press, Cambridge,UK, 1958.

Artur K. Ekert. Quantum cryptography based on Bell’s theorem. PhysicalReview Letters, 67(6):661–663, August 1991.

935

Page 947: Principles of Quantum Communication Theory: A Modern Approach

David Elkouss and Sergii Strelchuk. Superadditivity of private informationfor any number of uses of the channel. Physical Review Letters, 115(4):040501,July 2015. doi: 10.1103/PhysRevLett.115.040501. URL http://link.aps.org/doi/10.1103/PhysRevLett.115.040501. arXiv:1502.05326.

Kun Fang and Hamza Fawzi. Geometric Rényi divergence and its applica-tions in quantum channel capacities. September 2019. URL http://arxiv.org/abs/1909.05758. arXiv:1909.05758v1.

William Feller. An Introduction to Probability Theory and Its Applications, vol-ume 1. Wiley, third edition, 1968.

Rupert L. Frank and Elliott H. Lieb. Monotonicity of a relative Rényientropy. Journal of Mathematical Physics, 54(12):122201, December 2013.arXiv:1306.5358.

Bert Fristedt and Lawrence Gray. A Modern Approach to Probability Theory.Birkhäuser, Boston, 1997. ISBN 978-1-4899-2839-9.

Christopher A. Fuchs and Carlton M. Caves. Mathematical techniques forquantum communication theory. Open Systems & Information Dynamics, 3(3):345–356, 1995. arXiv:quant-ph/9604001.

Christopher A. Fuchs and Jeroen van de Graaf. Cryptographic distinguisha-bility measures for quantum mechanical states. IEEE Transactions on Infor-mation Theory, 45(4):1216–1227, May 1998. URL https://ieeexplore.ieee.org/document/761271. arXiv:quant-ph/9712042.

Jun-Ichi Fujii, Masatoshi Fujii, and Ritsuo Nakamoto. Jensen’s operatorinequality and its application. Sûrikaisekikenkyûsho Kôkyûroku, 1396:85–93, March 2004. Available at http://www.kurims.kyoto-u.ac.jp/~kyodo/kokyuroku/contents/pdf/1396-10.pdf.

Jingliang Gao. Quantum union bounds for sequential projective mea-surements. Physical Review A, 92(5):052331, November 2015. doi: 10.1103/PhysRevA.92.052331. URL https://link.aps.org/doi/10.1103/PhysRevA.92.052331. arXiv:1410.5688.

Raúl García-Patrón, Stefano Pirandola, Seth Lloyd, and Jeffrey H. Shapiro.Reverse coherent information. Physical Review Letters, 102(21):210501, May2009. doi: 10.1103/PhysRevLett.102.210501. arXiv:0808.0210.

936

Page 948: Principles of Quantum Communication Theory: A Modern Approach

I. Gelfand and Mark Aronovich Naimark. On the imbedding of normed ringsinto the ring of operators in hilbert space. Rec. Math. [Mat. Sbornik] N.S.,12(54)(2):197–217, 1943.

Sevag Gharibian. Strong NP-hardness of the quantum separability problem.Quantum Information and Computation, 10(3):343–360, March 2010. ISSN1533-7146. arXiv:0810.4507.

Géza Giedke and J. Ignacio Cirac. Characterization of Gaussian operationsand distillation of Gaussian states. Physical Review A, 66(3):032316, Septem-ber 2002. doi: 10.1103/PhysRevA.66.032316. URL https://link.aps.org/doi/10.1103/PhysRevA.66.032316. arXiv:quant-ph/0204085.

Alexei Gilchrist, Nathan K. Langford, and Michael A. Nielsen. Distance mea-sures to compare real and ideal quantum processes. Physical Review A, 71(6):062310, June 2005. arXiv:quant-ph/0408063.

Vittorio Giovannetti and Rosario Fazio. Information-capacity description ofspin-chain correlations. Physical Review A, 71:032314, March 2005. doi:10.1103/PhysRevA.71.032314. URL https://link.aps.org/doi/10.1103/PhysRevA.71.032314. arXiv:quant-ph/0405110.

Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. Achieving theHolevo bound via sequential measurements. Physical Review A, 85(1):012302, January 2012. arXiv:1012.0386.

Daniel Gottesman and Isaac L. Chuang. Demonstrating the viability of uni-versal quantum computation using teleportation and single-qubit opera-tions. Nature, 402(6760):390–393, November 1999. doi: 10.1038/46503.arXiv:quant-ph/9908010.

Markus Grassl, Thomas Beth, and Thomas Pellizzari. Codes for the quantumerasure channel. Physical Review A, 56(1):33–38, July 1997. doi: 10.1103/PhysRevA.56.33. URL https://link.aps.org/doi/10.1103/PhysRevA.56.33. arXiv:quant-ph/9610042.

Manish Gupta and Mark M. Wilde. Multiplicativity of completely boundedp-norms implies a strong converse for entanglement-assisted capac-ity. Communications in Mathematical Physics, 334(2):867–887, March 2015.arXiv:1310.7028.

Leonid Gurvits. Classical complexity and quantum entanglement. Jour-nal of Computer and System Sciences, 69(3):448–484, 2004. ISSN 0022-0000. doi: https://doi.org/10.1016/j.jcss.2004.06.003. URL http:

937

Page 949: Principles of Quantum Communication Theory: A Modern Approach

//www.sciencedirect.com/science/article/pii/S0022000004000893.arXiv:quant-ph/0303055.

Brian C. Hall. Quantum Theory for Mathematicians. Graduate Texts in Mathe-matics. Springer New York, 2013. ISBN 9781461471165.

Frank Hansen and Gert K. Pedersen. Jensen’s operator inequality. Bulletinof the London Mathematical Society, 35(4):553–564, July 2003. ISSN 1469-2120. doi: 10.1112/S0024609303002200. URL http://dx.doi.org/10.1112/S0024609303002200. arXiv:math/0204049.

Aram W. Harrow. The church of the symmetric subspace. arXiv:1308.6595,2013. URL https://arxiv.org/abs/1308.6595.

Matthew B. Hastings. Superadditivity of communication capacity using en-tangled inputs. Nature Physics, 5:255–257, April 2009. arXiv:0809.3972.

Paul Hausladen, Richard Jozsa, Benjamin Schumacher, Michael Westmore-land, and William K. Wootters. Classical information capacity of a quan-tum channel. Physical Review A, 54(3):1869–1876, September 1996. doi:10.1103/PhysRevA.54.1869.

Masahito Hayashi. Error exponent in asymmetric quantum hypothesis test-ing and its application to classical-quantum channel coding. Physical Re-view A, 76(6):062301, December 2007. doi: 10.1103/PhysRevA.76.062301.arXiv:quant-ph/0611013.

Masahito Hayashi. Quantum Information Theory: Mathematical Foundation.Springer, second edition, 2017.

Masahito Hayashi and Hiroshi Nagaoka. General formulas for capacity ofclassical-quantum channels. IEEE Transactions on Information Theory, 49(7):1753–1768, July 2003. arXiv:quant-ph/0206186.

Patrick Hayden, Richard Jozsa, Dénes Petz, and Andreas Winter. Structure ofstates which satisfy strong subadditivity of quantum entropy with equal-ity. Communications in Mathematical Physics, 246(2):359–374, April 2004.arXiv:quant-ph/0304007.

Patrick Hayden, Michał Horodecki, Andreas Winter, and Jon Yard. A de-coupling approach to the quantum capacity. Open Systems & Informa-tion Dynamics, 15(01):7–19, 2008a. doi: 10.1142/S1230161208000043. URLhttps://doi.org/10.1142/S1230161208000043. arXiv:quant-ph/0702005.

938

Page 950: Principles of Quantum Communication Theory: A Modern Approach

Patrick Hayden, Peter W. Shor, and Andreas Winter. Random quantum codesfrom Gaussian ensembles and an uncertainty relation. Open Systems & In-formation Dynamics, 15(1):71–89, March 2008b. arXiv:0712.0975.

Teiko Heinosaari and Mário Ziman. The Mathematical Language of QuantumTheory: From Uncertainty to Entanglement. Cambridge University Press,2012.

Carl W. Helstrom. Detection theory and quantum mechanics. Information andControl, 10(3):254–291, 1967. ISSN 0019-9958. URL https://doi.org/10.1016/S0019-9958(67)90302-6.

Carl W. Helstrom. Quantum detection and estimation theory. Journal of Sta-tistical Physics, 1:231–252, 1969. ISSN 0022-4715.

Carl W. Helstrom. Quantum Detection and Estimation Theory. Academic Press,1976.

Fumio Hiai and Milán Mosonyi. Different quantum f -divergences and thereversibility of quantum operations. Reviews in Mathematical Physics, 29(7):1750023, August 2017. arXiv:1604.03089.

Fumio Hiai and Dénes Petz. The proper formula for relative entropy andits asymptotics in quantum probability. Communications in MathematicalPhysics, 143(1):99–114, December 1991.

Foek T. Hioe and Joseph H. Eberly. n-level coherence vector and higher con-servation laws in quantum optics and quantum mechanics. Physical ReviewLetters, 47(12):838–841, September 1981. doi: 10.1103/PhysRevLett.47.838.URL https://link.aps.org/doi/10.1103/PhysRevLett.47.838.

Alexander S. Holevo. An analogue of statistical decision theory and non-commutative probability theory. Trudy Moskovskogo Matematicheskogo Ob-shchestva, 26:133–149, 1972a. URL http://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=mmo&paperid=260&option_lang=eng.

Alexander S. Holevo. On quasiequivalence of locally normal states. Theo-retical and Mathematical Physics, 13(2):1071–1082, November 1972b. ISSN1573-9333.

Alexander S. Holevo. Bounds for the quantity of information transmitted bya quantum communication channel. Problems of Information Transmission, 9:177–183, 1973.

939

Page 951: Principles of Quantum Communication Theory: A Modern Approach

Alexander S. Holevo. The capacity of the quantum channel with general sig-nal states. IEEE Transactions on Information Theory, 44(1):269–273, January1998. arXiv:quant-ph/9611023.

Alexander S. Holevo. On entanglement assisted classical capacity. Journalof Mathematical Physics, 43(9):4326–4333, September 2002a. arXiv:quant-ph/0106075.

Alexander S. Holevo. Remarks on the classical capacity of quantum channel.December 2002b. quant-ph/0212025.

Alexander S. Holevo. Multiplicativity of p-norms of completely positivemaps and the additivity problem in quantum information theory. RussianMathematical Surveys, 61(2):301–339, 2006.

Alexander S. Holevo. Quantum Systems, Channels, Information: A MathematicalIntroduction, volume 16. Walter de Gruyter, 2013.

Roger A. Horn and Charles R. Johnson. Matrix Analysis. Matrix Analysis.Cambridge University Press, 2013. ISBN 9780521839402.

Karol Horodecki, Michał Horodecki, Paweł Horodecki, and JonathanOppenheim. Secure key from bound entanglement. Physical Re-view Letters, 94(16):160502, April 2005a. doi: 10.1103/PhysRevLett.94.160502. URL http://link.aps.org/doi/10.1103/PhysRevLett.94.160502. arXiv:quant-ph/0309110.

Karol Horodecki, Michał Horodecki, Paweł Horodecki, Debbie Leung, andJonathan Oppenheim. Quantum key distribution based on private states:Unconditional security over untrusted channels with zero quantum capac-ity. IEEE Transactions on Information Theory, 54(6):2604–2620, June 2008. ISSN0018-9448. doi: 10.1109/TIT.2008.921870. arXiv:quant-ph/0608195.

Karol Horodecki, Michal Horodecki, Pawel Horodecki, and Jonathan Op-penheim. General paradigm for distilling classical key from quantumstates. IEEE Transactions on Information Theory, 55(4):1898–1929, April 2009a.arXiv:quant-ph/0506189.

Michał Horodecki. Simplifying monotonicity conditions for entanglementmeasures. Open Systems & Information Dynamics, 12:231—-237, September2005. arXiv:quant-ph/0412210.

940

Page 952: Principles of Quantum Communication Theory: A Modern Approach

Michał Horodecki and Paweł Horodecki. Reduction criterion of separabilityand limits for a class of distillation protocols. Physical Review A, 59(6):4206–4216, June 1999. doi: 10.1103/PhysRevA.59.4206. URL http://link.aps.org/doi/10.1103/PhysRevA.59.4206. arXiv:quant-ph/9708015.

Michał Horodecki, Paweł Horodecki, and Ryszard Horodecki. Separability ofmixed states: necessary and sufficient conditions. Physics Letters A, 223(1-2):1–8, November 1996. doi: 10.1016/s0375-9601(96)00706-2. arXiv:quant-ph/9605038.

Michał Horodecki, Paweł Horodecki, and Ryszard Horodecki. Mixed-stateentanglement and distillation: Is there a “bound” entanglement in na-ture? Physical Review Letters, 80(24):5239–5242, June 1998. doi: 10.1103/physrevlett.80.5239. arXiv:quant-ph/9801069.

Michał Horodecki, Paweł Horodecki, and Ryszard Horodecki. Generalteleportation channel, singlet fraction, and quasidistillation. PhysicalReview A, 60(3):1888–1898, September 1999. doi: 10.1103/PhysRevA.60.1888. URL https://link.aps.org/doi/10.1103/PhysRevA.60.1888.arXiv:quant-ph/9807091.

Michał Horodecki, Paweł Horodecki, and Ryszard Horodecki. Unified ap-proach to quantum capacities: Towards quantum noisy coding theorem.Physical Review Letters, 85:433–436, July 2000. doi: 10.1103/PhysRevLett.85.433. URL https://link.aps.org/doi/10.1103/PhysRevLett.85.433.arXiv:quant-ph/0003040.

Michal Horodecki, Peter W. Shor, and Mary Beth Ruskai. Entanglementbreaking channels. Reviews in Mathematical Physics, 15(6):629–641, 2003.arXiv:quant-ph/0302031.

Michal Horodecki, Jonathan Oppenheim, and Andreas Winter. Partial quan-tum information. Nature, 436:673–676, August 2005b.

Michal Horodecki, Jonathan Oppenheim, and Andreas Winter. Quantumstate merging and negative information. Communications in MathematicalPhysics, 269(1):107–136, January 2007. arXiv:quant-ph/0512247.

Pawel Horodecki. Separability criterion and inseparable mixed states withpositive partial transposition. Physics Letters A, 232(5):333–339, August1997. ISSN 0375-9601. doi: https://doi.org/10.1016/S0375-9601(97)00416-7. URL http://www.sciencedirect.com/science/article/pii/S0375960197004167. arXiv:quant-ph/9703004.

941

Page 953: Principles of Quantum Communication Theory: A Modern Approach

Ryszard Horodecki, Paweł Horodecki, Michał Horodecki, and KarolHorodecki. Quantum entanglement. Reviews of Modern Physics, 81(2):865–942, June 2009b. doi: 10.1103/RevModPhys.81.865. URL https://link.aps.org/doi/10.1103/RevModPhys.81.865. arXiv:quant-ph/0702225.

Karol Zyczkowski, Paweł Horodecki, Anna Sanpera, and Maciej Lewenstein.Volume of the set of separable states. Physical Review A, 58:883–892, August1998. doi: 10.1103/PhysRevA.58.883. URL https://link.aps.org/doi/10.1103/PhysRevA.58.883. arXiv:quant-ph/9804024.

Lawrence M. Ioannou. Computational complexity of the quantum separabil-ity problem. Quantum Information and Computation, 7(4):335–370, May 2007.ISSN 1533-7146. arXiv:quant-ph/0603199.

Raban Iten, Joseph M. Renes, and David Sutter. Pretty good measuresin quantum information theory. IEEE Transactions on Information The-ory, 63(2):1270–1279, February 2017. doi: 10.1109/TIT.2016.2639521.arXiv:1608.08229.

Vojkan Jaksic, Yoshiko Ogata, Claude-Alain Pillet, and Robert Seiringer.Quantum hypothesis testing and non-equilibrium statistical mechanics. Re-views in Mathematical Physics, 24(06):1230002, 2012. arXiv:1109.3804.

Anna Jencova. A relation between completely bounded norms and conju-gate channels. Communications in Mathematical Physics, 266(1):65–70, August2006. arXiv:quant-ph/0601071.

Anna Jencova. Quantum hypothesis testing and sufficient subalgebras. Lettersin Mathematical Physics, 93:15–27, 2010. arXiv:0810.4045.

Vishal Katariya and Mark M. Wilde. Geometric distinguishability mea-sures limit quantum channel estimation and discrimination. April 2020.arXiv:2004.10708.

Eneet Kaur and Mark M. Wilde. Amortized entanglement of a quantum chan-nel and approximately teleportation-simulable channels. Journal of PhysicsA: Mathematical and Theoretical, July 2017. URL http://iopscience.iop.org/10.1088/1751-8121/aa9da7. arXiv:1707.07721.

Johannes Henricus Bernardus Kemperman. Strong converses for a generalmemoryless channel with feedback. In Transactions of the Sixth Prague Con-ference on Information Theory, Statistical Decision Functions, and Random Pro-cesses, 1971.

942

Page 954: Principles of Quantum Communication Theory: A Modern Approach

Leonid G Khachiyan. Polynomial algorithms in linear programming. USSRComputational Mathematics and Mathematical Physics, 20(1):53–72, 1980. ISSN0041-5553.

Sumeet Khatri, Kunal Sharma, and Mark M. Wilde. Information-theoreticaspects of the generalized amplitude damping channel. Physical Review A,102(1):012401, July 2020. arXiv:1903.07747.

Christopher King. Maximal p-norms of entanglement breaking channels.Quantum Information and Computation, 3(2):186–190, 2003a. arXiv:quant-ph/0212057.

Christopher King. The capacity of the quantum depolarizing channel.IEEE Transactions on Information Theory, 49(1):221–229, January 2003b.ISSN 0018-9448. URL https://ieeexplore.ieee.org/document/1159773.arXiv:quant-ph/0204172.

Christopher King, Keiji Matsumoto, Michael Nathanson, and Mary BethRuskai. Properties of conjugate channels with applications to additivityand multiplicativity. Markov Processes and Related Fields, 13(2):391–423, 2007.J. T. Lewis memorial issue, arXiv:quant-ph/0509126.

Alexei Kitaev. Quantum computations: algorithms and error correction. Rus-sian Mathematical Surveys, 52(6):1191–1249, 1997.

Oskar Klein. Zur Quantenmechanischen Begründung des zweiten Haupt-satzes der Wärmelehre. Z. Physik, 72:767–775, 1931.

Rochus Klesse. Approximate quantum error correction, random codes, andquantum channel capacity. Physical Review A, 75:062315, June 2007. doi:10.1103/PhysRevA.75.062315. URL https://link.aps.org/doi/10.1103/PhysRevA.75.062315. arXiv:quant-ph/0701102v2.

Rochus Klesse. A random coding based proof for the quantum codingtheorem. Open Systems & Information Dynamics, 15(1):21–45, March 2008.arXiv:0712.2558.

Robert Koenig and Stephanie Wehner. A strong converse for classical channelcoding using entangled inputs. Physical Review Letters, 103(7):070504, Au-gust 2009. URL https://link.aps.org/doi/10.1103/PhysRevLett.103.070504. arXiv:0903.2838.

943

Page 955: Principles of Quantum Communication Theory: A Modern Approach

Robert Koenig, Renato Renner, and Christian Schaffner. The operationalmeaning of min- and max-entropy. IEEE Transactions on Information Theory,55(9):4337–4347, September 2009. arXiv:0807.1338.

Pieter Kok, W. J. Munro, Kae Nemoto, T. C. Ralph, Jonathan P. Dowl-ing, and G. J. Milburn. Linear optical quantum computing with pho-tonic qubits. Reviews of Modern Physics, 79(1):135–174, January 2007. doi:10.1103/RevModPhys.79.135. URL https://link.aps.org/doi/10.1103/RevModPhys.79.135. arXiv:quant-ph/0512071.

Hidetoshi Komiya. Elementary proof for Sion’s minimax theorem. KodaiMathematical Journal, 11(1):5–7, 1988. URL https://doi.org/10.2996/kmj/1138038812.

Karl Kraus. States, Effects and Operations: Fundamental Notions of QuantumTheory,. Springer Verlag, 1983.

Dennis Kretschmann and Reinhard F. Werner. Tema con variazioni: quantumchannel capacity. New Journal of Physics, 6(1):26, 2004. URL http://stacks.iop.org/1367-2630/6/i=1/a=026. arXiv:quant-ph/0311037.

Erwin Kreyszig. Introductory Functional Analysis with Applications. Wiley Clas-sics Library. Wiley, 1989. ISBN 9780471504597.

Lev Landau. Das dämpfungsproblem in der wellenmechanik. Zeitschrift fürPhysik, 45(5):430–441, May 1927. ISSN 0044-3328. URL https://doi.org/10.1007/BF01343064.

Oscar E. Lanford, III and Derek W. Robinson. Mean entropy of states inquantum-statistical mechanics. Journal of Mathematical Physics, 9(7):1120–1125, July 1968. doi: 10.1063/1.1664685.

Jimmie D. Lawson and Yongdo Lim. The geometric mean, matrices, metrics,and more. The American Mathematical Monthly, 108(9):797–812, November2001. doi: 10.1080/00029890.2001.11919815. URL https://doi.org/10.1080/00029890.2001.11919815.

Felix Leditzky. Relative entropies and their use in quantum information theory.PhD thesis, University of Cambridge, November 2016. arXiv:1611.08802.

Felix Leditzky, Nilanjana Datta, and Graeme Smith. Useful states and en-tanglement distillation. IEEE Transactions on Information Theory, 64(7):4689–4708, July 2018. doi: 10.1109/TIT.2017.2776907. arXiv:1701.03081.

944

Page 956: Principles of Quantum Communication Theory: A Modern Approach

Felix Leditzky, Eneet Kaur, Nilanjana Datta, and Mark M. Wilde. Approachesfor approximate additivity of the Holevo information of quantum channels.Physical Review A, 97(1):012332, January 2018. doi: 10.1103/PhysRevA.97.012332. arXiv:1709.01111.

Yin Tat Lee, Aaron Sidford, and Sam Chiu Wai Wong. A faster cutting planemethod and its implications for combinatorial and convex optimization. InIEEE 56th Annual Symposium on the Foundations of Computer Science, pages1049–1065, October 2015. arXiv:1508.04874.

Matthew S. Leifer. Conditional density operators and the subjectivity of quan-tum operations. AIP Conference Proceedings, 889(1):172–186, February 2007.doi: 10.1063/1.2713456. URL https://aip.scitation.org/doi/abs/10.1063/1.2713456. arXiv:quant-ph/0611233.

Matthew S. Leifer and Robert W. Spekkens. Towards a formulation of quan-tum theory as a causally neutral theory of Bayesian inference. Physi-cal Review A, 88(5):052130, November 2013. doi: 10.1103/PhysRevA.88.052130. URL http://link.aps.org/doi/10.1103/PhysRevA.88.052130.arXiv:1107.5849.

Matthew S. Leifer, Leah Henderson, and Noah Linden. Optimal entangle-ment generation from quantum operations. Physical Review A, 67(1):012306,January 2003. doi: 10.1103/physreva.67.012306. arXiv:quant-ph/0205055.

Debbie Leung and William Matthews. On the power of PPT-preservingand non-signalling codes. IEEE Transactions on Information Theory, 61(8):4486–4499, August 2015. ISSN 0018-9448. doi: 10.1109/TIT.2015.2439953.arXiv:1406.7142.

Ke Li and Andreas Winter. Squashed entanglement, k-extendibility, quan-tum Markov chains, and recovery maps. Foundations of Physics, 48:910–924,February 2018. arXiv:1410.4184.

Hou Li-Zhen and Fang Mao-Fa. Entanglement-assisted classical capacity of ageneralized amplitude damping channel. Chinese Physics Letters, 24(9):2482,2007a. URL http://stacks.iop.org/0256-307X/24/i=9/a=006.

Hou Li-Zhen and Fang Mao-Fa. The Holevo capacity of a generalizedamplitude-damping channel. Chinese Physics, 16(7):1843, 2007b. URLhttp://stacks.iop.org/1009-1963/16/i=7/a=006.

Elliot H. Lieb. Convex trace functions and the Wigner-Yanase-Dyson conjec-ture. Advances in Mathematics, 11:267–288, December 1973.

945

Page 957: Principles of Quantum Communication Theory: A Modern Approach

Elliott H. Lieb and Mary Beth Ruskai. Proof of the strong subadditivity ofquantum-mechanical entropy. Journal of Mathematical Physics, 14:1938–1941,1973a.

Elliott H. Lieb and Mary Beth Ruskai. A fundamental property of quantum-mechanical entropy. Physical Review Letters, 30(10):434–436, March 1973b.doi: 10.1103/PhysRevLett.30.434.

Elliott H. Lieb and Walter Thirring. Studies in mathematical physics, chapterInequalities for the moments of the eigenvalues of the Schroedinger Hamil-tonian and their relation to Sobolev inequalities, pages 269–297. PrincetonUniversity Press, Princeton, 1976.

Göran Lindblad. Completely positive maps and entropy inequalities. Commu-nications in Mathematical Physics, 40(2):147–151, June 1975. ISSN 0010-3616.doi: 10.1007/BF01609396. URL http://dx.doi.org/10.1007/BF01609396.

Seth Lloyd. Capacity of the noisy quantum channel. Physical Review A, 55(3):1613–1622, March 1997. doi: 10.1103/PhysRevA.55.1613. arXiv:quant-ph/9604015.

Keiji Matsumoto. A new quantum version of f -divergence. 2013.arXiv:1311.4722.

Keiji Matsumoto. Quantum fidelities, their duals, and convex analysis. Au-gust 2014a. arXiv:1408.3462.

Keiji Matsumoto. On the condition of conversion of classical probability dis-tribution families into quantum families. December 2014b. arXiv:1412.3680.

Keiji Matsumoto. A new quantum version of f-divergence. In MasanaoOzawa, Jeremy Butterfield, Hans Halvorson, Miklós Rédei, Yuichiro Ki-tajima, and Francesco Buscemi, editors, Reality and Measurement in Al-gebraic Quantum Theory, volume 261, pages 229–273, Singapore, 2018.Springer Singapore. ISBN 9789811324864 9789811324871. doi: 10.1007/978-981-13-2487-1_10. URL http://link.springer.com/10.1007/978-981-13-2487-1_10. Series Title: Springer Proceedings in Mathemat-ics & Statistics.

William Matthews and Stephanie Wehner. Finite blocklength conversebounds for quantum channels. IEEE Transactions on Information Theory, 60(11):7317–7329, November 2014. arXiv:1210.4722.

946

Page 958: Principles of Quantum Communication Theory: A Modern Approach

Ueli M. Maurer. Secret key agreement by public discussion from commoninformation. IEEE Transactions on Information Theory, 39(3):733–742, May1993.

Adam Miranowicz and Satoshi Ishizaka. Closed formula for the relative en-tropy of entanglement. Physical Review A, 78:032310, September 2008. doi:10.1103/PhysRevA.78.032310. URL https://link.aps.org/doi/10.1103/PhysRevA.78.032310. arXiv:0805.3134.

Gert Molière and Max Delbrück. Statistische Quantenmechanik und Thermody-namik. Berlin: Verlag der Akademie der Wissenschaften, 1935.

Ciara Morgan and Andreas Winter. “Pretty strong” converse for the quantumcapacity of degradable channels. IEEE Transactions on Information Theory, 60(1):317–333, January 2014. arXiv:1301.4927.

Milan Mosonyi and Nilanjana Datta. Generalized relative entropies and thecapacity of classical-quantum channels. Journal of Mathematical Physics, 50(7):072104, July 2009. doi: 10.1063/1.3167288. URL http://dx.doi.org/10.1063/1.3167288. arXiv:0810.3478.

Milán Mosonyi and Fumio Hiai. On the quantum Rényi relative entropiesand related capacity formulas. IEEE Transactions on Information Theory, 57(4):2474–2487, April 2011. arXiv:0912.1286.

Milán Mosonyi and Tomohiro Ogawa. Quantum hypothesis testing andthe operational interpretation of the quantum Rényi relative entropies.Communications in Mathematical Physics, 334(3):1617–1648, March 2015.arXiv:1309.3228.

Milán Mosonyi and Dénes Petz. Structure of sufficient quantum coarse-grainings. Letters in Mathematical Physics, 68(1):19–30, April 2004.ISSN 1573-0530. URL https://doi.org/10.1007/s11005-004-4072-2.arXiv:quant-ph/0312221.

Alexander Müller-Hermes. Transposition in quantum information theory.Master’s thesis, Technical University of Munich, September 2012.

Martin Müller-Lennert, Frédéric Dupuis, Oleg Szehr, Serge Fehr, and MarcoTomamichel. On quantum Rényi entropies: a new generalization and someproperties. Journal of Mathematical Physics, 54(12):122203, December 2013.URL https://doi.org/10.1063/1.4838856. arXiv:1306.3142.

947

Page 959: Principles of Quantum Communication Theory: A Modern Approach

Hiroshi Nagaoka. The converse part of the theorem for quantum Hoeffdingbound. November 2006. arXiv:quant-ph/0611289.

Mark Aronovich Naimark. Spectral functions of a symmetric operator. Izv.Akad. Nauk SSSR Ser. Mat., 4(3):277–318, 1940.

Michael A. Nielsen. Continuity bounds for entanglement. Phys. Rev. A, 61(6):064301, April 2000. arXiv:quant-ph/9908086.

Michael A. Nielsen. A simple formula for the average gate fidelity of a quan-tum dynamical operation. Physics Letters A, 303(4):249 – 252, 2002. ISSN0375-9601. doi: DOI:10.1016/S0375-9601(02)01272-0.

Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and QuantumInformation. Cambridge University Press, 2000.

Julien Niset, Jaromír Fiurasek, and Nicolas J. Cerf. No-go theorem for Gaus-sian quantum error correction. Physical Review Letters, 102(12):120501,March 2009. doi: 10.1103/PhysRevLett.102.120501. URL http://link.aps.org/doi/10.1103/PhysRevLett.102.120501. arXiv:0811.3128.

Michael Nussbaum and Arleta Szkoła. The Chernoff lower bound for sym-metric quantum hypothesis testing. The Annals of Statistics, pages 1040–1057, 2009. arXiv:quant-ph/0607216.

Tomohiro Ogawa and Hiroshi Nagaoka. Strong converse and Stein’s lemmain quantum hypothesis testing. IEEE Transactions on Information Theory, 46(7):2428–2433, November 2000. arXiv:quant-ph/9906090.

Samad Khabbazi Oskouei, Stefano Mancini, and Mark M. Wilde. Unionbound for quantum information processing. Proceedings of the RoyalSociety A, 475(2221):20180612, January 2019. doi: 10.1098/rspa.2018.0612. URL https://royalsocietypublishing.org/doi/abs/10.1098/rspa.2018.0612. arXiv:1804.08144.

Masanao Ozawa. Quantum measuring processes of continuous observables.Journal of Mathematical Physics, 25(1):79–87, 1984.

Vern Paulsen. Completely Bounded Maps and Operator Algebras. CambridgeStudies in Advanced Mathematics. Cambridge University Press, 2003. doi:10.1017/CBO9780511546631.

948

Page 960: Principles of Quantum Communication Theory: A Modern Approach

Asher Peres. Separability criterion for density matrices. Physical Review Let-ters, 77(8):1413–1415, August 1996. doi: 10.1103/PhysRevLett.77.1413. URLhttp://link.aps.org/doi/10.1103/PhysRevLett.77.1413. arXiv:quant-ph/9604005.

Dénes Petz. Quasi-entropies for states of a von Neumann algebra. Publ. RIMS,Kyoto University, 21:787–800, 1985.

Dénes Petz. Quasi-entropies for finite quantum systems. Reports in Mathemat-ical Physics, 23:57–65, 1986a.

Dénes Petz. Sufficient subalgebras and the relative entropy of states of avon Neumann algebra. Communications in Mathematical Physics, 105(1):123–131, March 1986b. ISSN 1432-0916. URL https://doi.org/10.1007/BF01212345.

Dénes Petz. Sufficiency of channels over von Neumann algebras. QuarterlyJournal of Mathematics, 39(1):97–108, 1988. ISSN 1464-3847. URL https://doi.org/10.1093/qmath/39.1.97.

Dénes Petz. Monotonicity of quantum relative entropy revisited. Reviews inMathematical Physics, 15:79, March 2003. URL https://doi.org/10.1142/S0129055X03001576. arXiv:quant-ph/0209053.

Dénes Petz and Mary Beth Ruskai. Contraction of generalized relative en-tropy under stochastic mappings on matrices. Infinite Dimensional Analysis,Quantum Probability and Related Topics, 1(1):83–89, January 1998.

Stefano Pirandola, Riccardo Laurenza, Carlo Ottaviani, and Leonardo Banchi.Fundamental limits of repeaterless quantum communications. Nature Com-munications, 8:15043, 2017.

Martin B. Plenio. Logarithmic negativity: A full entanglement monotone thatis not convex. Physical Review Letters, 95(9):090503, August 2005. doi: 10.1103/PhysRevLett.95.090503. URL https://link.aps.org/doi/10.1103/PhysRevLett.95.090503. arXiv:quant-ph/0505071.

Martin B. Plenio and Shashank Virmani. An introduction to entanglementmeasures. Quantum Information & Compution, 7(1):1–51, January 2007. ISSN1533-7146. URL http://dl.acm.org/citation.cfm?id=2011706.2011707.arXiv:quant-ph/0504163.

949

Page 961: Principles of Quantum Communication Theory: A Modern Approach

Martin B. Plenio, Shashank Virmani, and P. Papadopoulos. Operator mono-tones, the reduction criterion and the relative entropy. Journal of Physics A:Mathematical and General, 33(22):L193, June 2000. URL http://stacks.iop.org/0305-4470/33/i=22/a=101. arXiv:quant-ph/0002075.

Yury Polyanskiy and Sergio Verdú. Arimoto channel coding converse andRényi divergence. In Proceedings of the 48th Annual Allerton Conference onCommunication, Control, and Computation, pages 1327–1333, September 2010.

Haoyu Qi, Qing-Le Wang, and Mark M. Wilde. Applications of position-based coding to classical communication over quantum channels. Journal ofPhysics A, 51(44):444002, November 2018. arXiv:1704.01361.

Lu-Feng Qiao, Alexander Streltsov, Jun Gao, Swapan Rana, Ruo-Jing Ren,Zhi-Qiang Jiao, Cheng-Qiu Hu, Xiao-Yun Xu, Ci-Yu Wang, Hao Tang, Ai-Lin Yang, Zhi-Hao Ma, Maciej Lewenstein, and Xian-Min Jin. Entan-glement activation from quantum coherence and superposition. Physi-cal Review A, 98(5):052351, November 2018. doi: 10.1103/PhysRevA.98.052351. URL https://link.aps.org/doi/10.1103/PhysRevA.98.052351.arXiv:1710.04447.

Eric M. Rains. Entanglement purification via separable superoperators.arXiv:quant-ph/9707002, 1998.

Eric M. Rains. Bound on distillable entanglement. Physical Review A, 60(1):179–184, July 1999a. doi: 10.1103/PhysRevA.60.179. URL http://link.aps.org/doi/10.1103/PhysRevA.60.179. arXiv:quant-ph/9809082.

Eric M. Rains. Rigorous treatment of distillable entanglement. Physical ReviewA, 60:173–178, July 1999b. doi: 10.1103/PhysRevA.60.173. URL https://link.aps.org/doi/10.1103/PhysRevA.60.173. arXiv:quant-ph/9809078.

Eric M. Rains. A semidefinite program for distillable entanglement.IEEE Transactions on Information Theory, 47(7):2921–2933, November 2001.arXiv:quant-ph/0008047.

Alexey E. Rastegin. Relative error of state-dependent cloning. Physical Re-view A, 66(4):042304, October 2002. doi: 10.1103/PhysRevA.66.042304. URLhttp://link.aps.org/doi/10.1103/PhysRevA.66.042304.

Alexey E. Rastegin. A lower bound on the relative error of mixed-statecloning and related operations. Journal of Optics B: Quantum and Semi-classical Optics, 5(6):S647, December 2003. URL http://stacks.iop.org/1464-4266/5/i=6/a=017. arXiv:quant-ph/0208159.

950

Page 962: Principles of Quantum Communication Theory: A Modern Approach

Alexey E. Rastegin. Sine distance for quantum states. February 2006.arXiv:quant-ph/0602112.

Michael Reed and Barry Simon. Methods of Modern Mathematical Physics, vol-ume I: Functional Analysis. Academic Press, 1981. ISBN 9780080570488.

Joseph M. Renes and Renato Renner. Noisy channel coding via privacyamplification and information reconciliation. IEEE Transactions on Infor-mation Theory, 57(11):7377–7385, November 2011. ISSN 0018-9448. doi:10.1109/TIT.2011.2162226. arXiv:1012.4814.

Renato Renner. Security of Quantum Key Distribution. PhD thesis, ETH Zürich,December 2005. arXiv:quant-ph/0512258.

Luca Rigovacca, Go Kato, Stefan Baeuml, Myungshik S. Kim, William J.Munro, and Koji Azuma. Versatile relative entropy bounds for quantumnetworks. New Journal of Physics, 20:013033, January 2018. arXiv:1707.05543.

Ralph Tyrrell Rockafellar. Convex Analysis. Princeton Landmarks in Mathe-matics and Physics. Princeton University Press, 1970. ISBN 9780691015866.

Bill Rosgen and John Watrous. On the hardness of distinguishing mixed-statequantum computations. Proceedings of the 20th IEEE Conference on Computa-tional Complexity, pages 344–354, June 2005. arXiv:cs/0407056.

Sheldon Ross. Introduction to Probability Models. Academic Press, 12 edition,2019. ISBN 978-0-12-814346-9.

Aidan Roy and A. J. Scott. Unitary designs and codes. Designs, Codes andCryptography, 53(1):13–31, October 2009. doi: 10.1007/s10623-009-9290-2.URL https://doi.org/10.1007/s10623-009-9290-2. arXiv:0809.3813.

Walter Rudin. Principles of Mathematical Analysis. International Series in Pureand Applied Mathematics. McGraw-Hill, 1976. ISBN 9780070856134.

Mary Beth Ruskai. Inequalities for quantum entropy: A review withconditions for equality. Journal of Mathematical Physics, 43(9):4358–4375,September 2002. doi: http://dx.doi.org/10.1063/1.1497701. arXiv:quant-ph/0205064.

Benjamin Schumacher. Sending entanglement through noisy quantum chan-nels. Physical Review A, 54:2614–2628, October 1996. doi: 10.1103/PhysRevA.54.2614. URL https://link.aps.org/doi/10.1103/PhysRevA.54.2614. arXiv:quant-ph/9604023.

951

Page 963: Principles of Quantum Communication Theory: A Modern Approach

Benjamin Schumacher and Michael A. Nielsen. Quantum data processingand error correction. Physical Review A, 54(4):2629–2635, October 1996. doi:10.1103/PhysRevA.54.2629. arXiv:quant-ph/9604022.

Benjamin Schumacher and Michael D. Westmoreland. Sending classical infor-mation via noisy quantum channels. Physical Review A, 56(1):131–138, July1997.

Benjamin Schumacher and Michael D. Westmoreland. Approximate quan-tum error correction. Quantum Information Processing, 1(1):5–12, April 2002.ISSN 1573-1332. doi: 10.1023/A:1019653202562. URL https://doi.org/10.1023/A:1019653202562. arXiv:quant-ph/0112106.

Pranab Sen. Achieving the Han-Kobayashi inner bound for the quan-tum interference channel by sequential decoding. September 2011.arXiv:1109.0802.

Claude Shannon. The zero error capacity of a noisy channel. IRE Transactionson Information Theory, IT-2:S8–S19, September 1956.

Naresh Sharma. Equality conditions for the quantum f -relative entropy andgeneralized data processing inequalities. In 2010 IEEE International Sympo-sium on Information Theory, pages 2698–2702, June 2010. doi: 10.1109/ISIT.2010.5513655. arXiv:0906.4755.

Naresh Sharma and Naqueeb A. Warsi. On the strong converses for the quan-tum channel capacity theorems. May 2012. arXiv:1205.1712.

Yaoyun Shi and Xiaodi Wu. Epsilon-net method for optimizations over sepa-rable states. In Artur Czumaj, Kurt Mehlhorn, Andrew Pitts, and RogerWattenhofer, editors, Automata, Languages, and Programming, pages 798–809, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg. ISBN 978-3-642-31594-7. arXiv:1112.0808.

Maksim E. Shirokov. Tight uniform continuity bounds for the quantum con-ditional mutual information, for the Holevo quantity, and for capacities ofquantum channels. Journal of Mathematical Physics, 58(10):102202, October2017. doi: 10.1063/1.4987135. URL https://doi.org/10.1063/1.4987135.arXiv:1512.09047.

Peter W. Shor. Scheme for reducing decoherence in quantum computer mem-ory. Physical Review A, 52(4):R2493–R2496, October 1995. doi: 10.1103/PhysRevA.52.R2493.

952

Page 964: Principles of Quantum Communication Theory: A Modern Approach

Peter W. Shor. Additivity of the classical capacity of entanglement-breakingquantum channels. Journal of Mathematical Physics, 43(9):4334–4340, 2002a.doi: 10.1063/1.1498000. arXiv:quant-ph/0201149.

Peter W. Shor. The quantum channel capacity and coherent information. InLecture Notes, MSRI Workshop on Quantum Computation, 2002b.

Peter W. Shor. Equivalence of additivity questions in quantum informa-tion theory. Communications in Mathematical Physics, 246(3):453–472, 2004.arXiv:quant-ph/0305035.

Maurice Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171–176, March 1958. URL https://msp.org/pjm/1958/8-1/p14.xhtml.

Graeme Smith and John A. Smolin. Degenerate quantum codes for Paulichannels. Physical Review Letters, 98:030501, January 2007. doi: 10.1103/PhysRevLett.98.030501. URL https://link.aps.org/doi/10.1103/PhysRevLett.98.030501. arXiv:quant-ph/0604107.

Graeme Smith and Jon Yard. Quantum communication with zero-capacitychannels. Science, 321(5897):1812–1815, September 2008. arXiv:0807.4935.

Graeme Smith, John A. Smolin, and Jon Yard. Quantum communication withGaussian channels of zero quantum capacity. Nature Photonics, 5:624–627,August 2011. arXiv:1102.4580.

Akihito Soeda, Peter S. Turner, and Mio Murao. Entanglement costof implementing controlled-unitary operations. Physical Review Let-ters, 107(18):180501, October 2011. doi: 10.1103/physrevlett.107.180501.arXiv:1008.1128.

Benjamin Steinberg. Representation Theory of Finite Groups: An IntroductoryApproach. Springer New York, 2011. ISBN 9781461407768.

William F. Stinespring. Positive functions on C*-algebras. Proceedings of theAmerican Mathematical Society, 6:211–216, 1955.

Gilbert Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press andSIAM, fifth edition, May 2016.

Ruslan L. Stratonovich. Information capacity of a quantum communicationschannel. i. Soviet Radiophysics, 8(1):82–91, January 1965. ISSN 1573-9120.doi: 10.1007/BF01038470. URL https://doi.org/10.1007/BF01038470.

953

Page 965: Principles of Quantum Communication Theory: A Modern Approach

David Sutter, Volkher B. Scholz, Andreas Winter, and Renato Renner. Ap-proximate degradable quantum channels. IEEE Transactions on InformationTheory, 63(12):7832–7844, December 2017. URL https://ieeexplore.ieee.org/document/8046086. arXiv:1412.0980.

Masahiro Takeoka, Masashi Ban, and Masahide Sasaki. Quantum chan-nel of continuous variable teleportation and nonclassicality of quantumstates. Journal of Optics B: Quantum and Semiclassical Optics, 4(2):114, April2002. URL http://stacks.iop.org/1464-4266/4/i=2/a=306. arXiv:quant-ph/0110031.

Masahiro Takeoka, Saikat Guha, and Mark M. Wilde. The squashed entangle-ment of a quantum channel. IEEE Transactions on Information Theory, 60(8):4987–4998, August 2014. ISSN 0018-9448. arXiv:1310.0129.

Masahiro Takeoka, Kaushik P. Seshadreesan, and Mark M. Wilde. Uncon-strained distillation capacities of a pure-loss bosonic broadcast channel. In2016 IEEE International Symposium on Information Theory (ISIT), pages 2484–2488, July 2016. doi: 10.1109/ISIT.2016.7541746.

Masahiro Takeoka, Kaushik P. Seshadreesan, and Mark M. Wilde. Uncon-strained capacities of quantum key distribution and entanglement distilla-tion for pure-loss bosonic broadcast channels. Physical Review Letters, 119(15):150501, October 2017. arXiv:1706.06746.

Marco Tomamichel. Quantum Information Processing with Finite Resources:Mathematical Foundations, volume 5. Springer, 2015. arXiv:1504.00233.

Marco Tomamichel and Masahito Hayashi. A hierarchy of information quan-tities for finite block length analysis of quantum tasks. IEEE Transactions onInformation Theory, 59(11):7693–7710, November 2013. ISSN 0018-9448. doi:10.1109/TIT.2013.2276628. arXiv:1208.1478.

Marco Tomamichel, Roger Colbeck, and Renato Renner. A fully quantumasymptotic equipartition property. IEEE Transactions on Information Theory,55(12):5840–5847, December 2009. arXiv:0811.1221.

Marco Tomamichel, Roger Colbeck, and Renato Renner. Duality betweensmooth min- and max-entropies. IEEE Transactions on Information Theory,56(9):4674–4681, September 2010. arXiv:0907.5238.

Marco Tomamichel, Mario Berta, and Joseph M. Renes. Quantum cod-ing with finite resources. Nature Communications, 7:11419, May 2016.arXiv:1504.04617.

954

Page 966: Principles of Quantum Communication Theory: A Modern Approach

Marco Tomamichel, Mark M. Wilde, and Andreas Winter. Strong converserates for quantum communication. IEEE Transactions on Information Theory,63(1):715–727, January 2017. doi: 10.1109/tit.2016.2615847. arXiv:1406.2946.

Robert R. Tucci. Quantum entanglement and conditional information trans-mission. September 1999. arXiv: quant-ph/9909041.

Robert R. Tucci. Entanglement of distillation and conditional mutual infor-mation. Februrary 2002. arXiv:quant-ph/0202144.

Armin Uhlmann. The “transition probability” in the state space of a *-algebra.Reports on Mathematical Physics, 9(2):273–279, April 1976.

Michael L. Ulrey. Sequential coding for channels with feedback. Informationand Control, 32(2):93–100, October 1976.

Hisaharu Umegaki. Conditional expectations in an operator algebra IV (en-tropy and information). Kodai Mathematical Seminar Reports, 14(2):59–85,1962.

Lieven Vandenberghe and Stephen Boyd. Semidefinite programming. SIAMReview, 38(1):49–95, 1996. doi: 10.1137/1038003. URL https://doi.org/10.1137/1038003.

Gonzalo Vazquez-Vilar. Multiple quantum hypothesis testing expressionsand classical-quantum channel converse bounds. In 2016 IEEE InternationalSymposium on Information Theory, pages 2854–2857, Barcelona, Spain, 2016.arXiv:1607.07625.

Vlatko Vedral and Martin B. Plenio. Entanglement measures and purificationprocedures. Physical Review A, 57(3):1619–1633, March 1998. doi: 10.1103/PhysRevA.57.1619. URL http://link.aps.org/doi/10.1103/PhysRevA.57.1619. arXiv:quant-ph/9707035.

Vlatko Vedral, Martin B. Plenio, M. A. Rippin, and Peter L. Knight. Quanti-fying entanglement. Physical Review Letters, 78(12):2275–2279, March 1997.doi: 10.1103/PhysRevLett.78.2275. URL https://link.aps.org/doi/10.1103/PhysRevLett.78.2275. arXiv:quant-ph/9702027.

Sergio Verdu. On channel capacity per unit cost. IEEE Transactions on Infor-mation Theory, 36(5):1019–1030, 1990. doi: 10.1109/18.57201.

955

Page 967: Principles of Quantum Communication Theory: A Modern Approach

Guifré Vidal. Entanglement monotones. Journal of Modern Optics, 47(2-3):355–376, 2000. doi: 10.1080/09500340008244048. URL https://www.tandfonline.com/doi/abs/10.1080/09500340008244048. arXiv:quant-ph/9807077.

Guifré Vidal and Reinhard F. Werner. Computable measure of entanglement.Physical Review A, 65:032314, February 2002. doi: 10.1103/PhysRevA.65.032314. URL https://link.aps.org/doi/10.1103/PhysRevA.65.032314.arXiv:quant-ph/0102117.

Johann von Neumann. Wahrscheinlichkeitstheoretischer aufbau der quan-tenmechanik. Nachrichten von der Gesellschaft der Wissenschaften zu Göt-tingen, Mathematisch-Physikalische Klasse, 1(11):245–272, 1927a. URL http://eudml.org/doc/59230.

Johann von Neumann. Thermodynamik quantenmechanischer gesamtheiten.Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen,Mathematisch-Physikalische Klasse, 102:273–291, 1927b.

Johann von Neumann. Zur theorie der gesellschaftsspiele. MathematischeAnnalen, 100(1):295–320, December 1928. ISSN 1432-1807. URL https://doi.org/10.1007/BF01448847.

Johann von Neumann. Mathematische grundlagen der quantenmechanik. Verlagvon Julius Springer Berlin, 1932.

Kun Wang, Xin Wang, and Mark M. Wilde. Quantifying the unextendibilityof entanglement. November 2019a. arXiv:1911.07433.

Ligong Wang and Renato Renner. One-shot classical-quantum capacity andhypothesis testing. Physical Review Letters, 108(20):200501, 2012. doi: 10.1103/PhysRevLett.108.200501. URL https://link.aps.org/doi/10.1103/PhysRevLett.108.200501. arXiv:1007.5456.

Xin Wang and Runyao Duan. Improved semidefinite programming upperbound on distillable entanglement. Physical Review A, 94(5):050301, Novem-ber 2016a. doi: 10.1103/physreva.94.050301. arXiv:1601.07940.

Xin Wang and Runyao Duan. A semidefinite programming upper boundof quantum capacity. In 2016 IEEE International Symposium on Infor-mation Theory (ISIT). IEEE, July 2016b. doi: 10.1109/isit.2016.7541587.arXiv:1601.06888.

956

Page 968: Principles of Quantum Communication Theory: A Modern Approach

Xin Wang and Mark M. Wilde. α-logarithmic negativity. April 2019a.arXiv:1904.10437.

Xin Wang and Mark M. Wilde. Resource theory of asymmetric distinguisha-bility. Physical Review Research, 1(3):033170, December 2019b. doi: 10.1103/PhysRevResearch.1.033170. URL http://arxiv.org/abs/1905.11629.

Xin Wang, Wei Xie, and Runyao Duan. Semidefinite programming strongconverse bounds for classical capacity. IEEE Transactions on Information The-ory, 64(1):640–653, January 2018. ISSN 0018-9448. doi: 10.1109/TIT.2017.2741101. arXiv:1610.06381.

Xin Wang, Kun Fang, and Runyao Duan. Semidefinite programming con-verse bounds for quantum communication. IEEE Transactions on InformationTheory, 65(4):2583–2592, April 2019b. arXiv:1709.00200.

Xin Wang, Kun Fang, and Marco Tomamichel. On converse bounds for clas-sical communication over quantum channels. IEEE Transactions on Infor-mation Theory, 65(7):4609–4619, July 2019c. doi: 10.1109/TIT.2019.2898656.URL https://arxiv.org/abs/1709.05258. arXiv:1709.05258.

John Watrous. Semidefinite programs for completely bounded norms. The-ory of Computing, 5(11):217–238, November 2009. doi: 10.4086/toc.2009.v005a011. arXiv:0901.4709.

John Watrous. Simpler semidefinite programs for completely boundednorms. Chicago Journal of Theoretical Computer Science, July 2013.arXiv:1207.5726.

John Watrous. The Theory of Quantum Information. Cambridge UniversityPress, 2018. doi: 10.1017/9781316848142.

Reinhard F. Werner. An application of Bell’s inequalities to a quantum stateextension problem. Letters in Mathematical Physics, 17(4):359–363, May1989a. doi: 10.1007/BF00399761.

Reinhard F. Werner. Quantum states with Einstein-Podolsky-Rosen correla-tions admitting a hidden-variable model. Physical Review A, 40(8):4277–4281, October 1989b. doi: 10.1103/PhysRevA.40.4277. URL http://link.aps.org/doi/10.1103/PhysRevA.40.4277.

Reinhard F. Werner. All teleportation and dense coding schemes. Jour-nal of Physics A: Mathematical and General, 34(35):7081, September 2001.

957

Page 969: Principles of Quantum Communication Theory: A Modern Approach

URL http://stacks.iop.org/0305-4470/34/i=35/a=332. arXiv:quant-ph/0003070.

Mark M. Wilde. Sequential decoding of a general classical-quantum chan-nel. Proceedings of the Royal Society of London A: Mathematical, Physical andEngineering Sciences, 469(2157), September 2013. ISSN 1364-5021. doi:10.1098/rspa.2013.0259. arXiv:1303.0808.

Mark M. Wilde. Squashed entanglement and approximate private states.Quantum Information Processing, 15(11):4563–4580, November 2016. ISSN1573-1332. doi: 10.1007/s11128-016-1432-7. URL http://dx.doi.org/10.1007/s11128-016-1432-7. arXiv:1606.08028.

Mark M. Wilde. Quantum Information Theory. Cambridge University Press,second edition, 2017. URL https://doi.org/10.1017/CBO9781139525343.arXiv:1106.1445v7.

Mark M. Wilde. Strong and uniform convergence in the teleportation simu-lation of bosonic Gaussian channels. Physical Review A, 97(6):062305, June2018a. doi: 10.1103/PhysRevA.97.062305. URL https://link.aps.org/doi/10.1103/PhysRevA.97.062305. arXiv:1712.00145.

Mark M. Wilde. Optimized quantum f -divergences and data processing.Journal of Physics A, 51(37):374002, September 2018b. arXiv:1710.10252.

Mark M. Wilde and Haoyu Qi. Energy-constrained private and quantum ca-pacities of quantum channels. IEEE Transactions on Information Theory, 64(12):7802–7827, December 2018. arXiv:1609.01997.

Mark M. Wilde and Andreas Winter. Strong converse for the quantum capac-ity of the erasure channel for almost all codes. Proceedings of the 9th Confer-ence on the Theory of Quantum Computation, Communication and Cryptography,May 2014. arXiv:1402.3626.

Mark M. Wilde, Andreas Winter, and Dong Yang. Strong converse forthe classical capacity of entanglement-breaking and Hadamard channelsvia a sandwiched Rényi relative entropy. Communications in Mathemati-cal Physics, 331(2):593–622, October 2014. URL https://doi.org/10.1007/s00220-014-2122-x. arXiv:1306.1586.

Mark M. Wilde, Marco Tomamichel, and Mario Berta. Converse bounds forprivate communication over quantum channels. IEEE Transactions on Infor-mation Theory, 63(3):1792–1817, March 2017. arXiv:1602.08898.

958

Page 970: Principles of Quantum Communication Theory: A Modern Approach

Andreas Winter. Tight uniform continuity bounds for quantum entropies:conditional entropy, relative entropy distance and energy constraints.Communications in Mathematical Physics, 347(1):291–313, October 2016.arXiv:1507.07775.

Michael M. Wolf, David Pérez-García, and Geza Giedke. Quantum capacitiesof bosonic channels. Physical Review Letters, 98(13):130501, March 2007. doi:10.1103/PhysRevLett.98.130501. arXiv:quant-ph/0606132.

Jacob Wolfowitz. Coding Theorems of Information Theory, volume 31. Springer,1964.

William K. Wootters. Entanglement of formation of an arbitrary state oftwo qubits. Physical Review Letters, 80:2245–2248, March 1998. doi:10.1103/PhysRevLett.80.2245. URL https://link.aps.org/doi/10.1103/PhysRevLett.80.2245. arXiv:quant-ph/9709029.

Dong Yang. A simple proof of monogamy of entanglement. Physics LettersA, 360(2):249–250, 2006. ISSN 0375-9601. doi: https://doi.org/10.1016/j.physleta.2006.08.027. URL http://www.sciencedirect.com/science/article/pii/S0375960106012801. arXiv:quant-ph/0604168.

Jon Yard, Patrick Hayden, and Igor Devetak. Capacity theorems for quan-tum multiple-access channels: Classical-quantum and quantum-quantumcapacity regions. IEEE Transactions on Information Theory, 54(7):3091–3113,July 2008. arXiv:quant-ph/0501045.

Haidong Yuan and Chi-Hang Fred Fung. Fidelity and Fisher information onquantum channels. New Journal of Physics, 19(11):113039, November 2017.doi: 10.1088/1367-2630/aa874c. URL http://stacks.iop.org/1367-2630/19/i=11/a=113039?key=crossref.c8abb94f653e6d572133885d9e0b86b0.arXiv:1506.00819.

Horace Yuen, Robert Kennedy, and Melvin Lax. Optimum testing of multiplehypotheses in quantum detection theory. IEEE Transactions on InformationTheory, 21(2):125–134, March 1975.

Sisi Zhou and Liang Jiang. An exact correspondence between the quantumFisher information and the Bures metric. October 2019. URL http://arxiv.org/abs/1910.08473. arXiv:1910.08473.

Xinlan Zhou, Debbie W. Leung, and Isaac L. Chuang. Methodology for quan-tum logic gate construction. Physical Review A, 62(5):052316, oct 2000. ISSN

959

Page 971: Principles of Quantum Communication Theory: A Modern Approach

1050-2947. doi: 10.1103/PhysRevA.62.052316. URL https://link.aps.org/doi/10.1103/PhysRevA.62.052316.

960