Reversible and Quantum Circuit Synthesis Chia-Chun Lin A Dissertation Presented to the Faculty of Princeton University in Candidacy for the Degree of Doctor of Philosophy Recommended for Acceptance by the Department of Electrical Engineering Adviser: Niraj K. Jha June 2014


  • Reversible and Quantum Circuit

    Synthesis

    Chia-Chun Lin

    A Dissertation

    Presented to the Faculty

    of Princeton University

    in Candidacy for the Degree

    of Doctor of Philosophy

    Recommended for Acceptance

    by the Department of

    Electrical Engineering

    Adviser: Niraj K. Jha

    June 2014

  • © Copyright by Chia-Chun Lin, 2014.

    All rights reserved.

  • Abstract

    This thesis presents five major tools for the synthesis of reversible and quantum cir-

    cuits. Quantum computation has the ability to solve several important problems sig-

    nificantly faster than its classical counterpart. Because of this promise, much research

    effort has been dedicated to discovering new quantum algorithms and technologies.

    Quantum mechanics postulates that the time-evolution of quantum states is re-

    versible. Thus, reversibility is a necessary condition for quantum computing. Hence,

    we propose an effective method and tool, called RMDDS, for synthesizing reversible

    circuits. Since the evolution of quantum states is determined by some primitive

    physical operations, quantum computers implemented in different physical systems

    have different costs. Therefore, we propose an optimized quantum gate library, called

    QGLVP, for various physical machine descriptions.

    To enhance synthesis efficiency, we introduce QLib, a quantum module library,

    which contains scripts to generate quantum modules for many well-known quantum

    algorithms.

    A quantum system inevitably interacts with the environment, which leads to

    errors and consequent failure of computation. To address this problem, we propose

    FTQLS, a tool that synthesizes and optimizes fault-tolerant quantum circuits by using

    logic identity rules for various physical machine descriptions.

    Finally, we present a tool, called PAQCS, for physical design-aware fault-tolerant

    quantum circuit synthesis. It effectively synthesizes quantum logic circuits into quan-

    tum physical circuits, targeting different physical machine descriptions and quantum

    error correction codes.


  • Acknowledgments

    I would like to express my sincere appreciation to all those who made this thesis

    possible.

    I am deeply grateful for the guidance of my adviser, Prof. Niraj K. Jha. His

    enthusiasm for innovative research and meticulous attention to detail greatly inspired

    and influenced me. He showed me the right direction at every critical moment during

    the development of this thesis and directed me all the way.

    I also received much advice and guidance from Prof. Sun-Yuan Kung, Prof. Sus-

    mita Sur-Kolay, and Prof. Amlan Chakrabarti. Discussions with them were always a

    great source for new ideas. Without their encouragement and inspiration, this thesis

    would not have been possible.

    Thesis review is demanding work, and doing it under time pressure only makes

    it harder. I would like to thank my dissertation readers, Prof. Niraj K. Jha, Prof.

    Susmita Sur-Kolay, and Prof. Kaushik Sengupta, for their extensive efforts in polish-

    ing this thesis. I am also thankful to Prof. Niraj K. Jha, Prof. Sun-Yuan Kung, and

    Prof. Stephen A. Lyon for agreeing to be on my final public oral committee.

    Profs. Bede Liu and Sharad Malik kindly offered me teaching opportunities in

    their ELE482 and ELE206 courses. Those were valuable and enjoyable experiences.

    I would like to thank my English tutor Sandra Richter and host family Anne

    Remillard for their help and encouragement during my PhD life, and for sharing American

    culture and traditions with me.

    I appreciate the help I received from Chun-Yi Lee, Chunxiao Li, Meng Zhang,

    Jun-Wei Chuah, Ting-Jung Lin, Sourindra Chaudhuri, Aoxiang Tang, Xianmin Chen,

    Yang Yang, and Debajit Bhattacharya over the past few years.

    Last, but not least, I am grateful to my father, mother, brother, and wife An

    Dai and all my friends for their support during my PhD study. They were unbelievably

    encouraging and forbearing. They are the reason I keep going.


  • List of Abbreviations

    BDD binary decision diagram

    CTL Clifford plus T library

    DDS decision diagram synthesis

    FT fault-tolerant

    FTQLS fault-tolerant quantum logic synthesis

    FTS fault-tolerant set

    IT ion trap

    KFDD Kronecker functional decision diagram

    LP linear photonics

    NA neutral atom

    NP nonlinear photonics

    PAQCS physical design-aware fault-tolerant quantum circuit synthesis

    PMD physical machine description

    QC quantum cost

    QD quantum dot

    QECC quantum error correction code

    QGLVP quantum gate library for various physical machine descriptions

    QLib quantum module library

    RMDDS Reed-Muller decision diagram synthesis

    RMRLS Reed-Muller reversible logic synthesis

    SC superconducting

    SKA Solovay-Kitaev algorithm

    STA skipping table algorithm

    ULB universal logic block


  • Contents

    Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

    Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv

    List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

    List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi

    List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii

    1 Introduction 1

    1.1 Quantum System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

    1.2 Quantum Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

    1.3 Quantum Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

    1.4 Reversible Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 8

    1.5 FT Quantum Computation . . . . . . . . . . . . . . . . . . . . . . . . 9

    1.6 Physical Machine Description . . . . . . . . . . . . . . . . . . . . . . 10

    1.7 Physical Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    1.8 Synthesis Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

    1.9 Contributions of the Thesis . . . . . . . . . . . . . . . . . . . . . . . 15

    1.10 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

    2 Related Work 17

    2.1 Reversible Logic Synthesis . . . . . . . . . . . . . . . . . . . . . . . . 17

    2.1.1 Exact Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . 18


  • 2.1.2 Heuristic Synthesis . . . . . . . . . . . . . . . . . . . . . . . . 19

    2.2 Quantum Logic Synthesis . . . . . . . . . . . . . . . . . . . . . . . . 21

    2.2.1 Logic Identities . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    2.2.2 Quantum Shannon Decomposition . . . . . . . . . . . . . . . . 22

    2.2.3 Quantum Compilation . . . . . . . . . . . . . . . . . . . . . . 22

    2.3 Physical Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

    2.3.1 Tiled Quantum Architecture . . . . . . . . . . . . . . . . . . . 23

    2.3.2 Physical Synthesis . . . . . . . . . . . . . . . . . . . . . . . . 25

    3 RMDDS: Reed-Muller Decision Diagram Synthesis for Reversible Logic 26

    3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

    3.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

    3.2.1 Reversible Functions . . . . . . . . . . . . . . . . . . . . . . . 29

    3.2.2 Reversible Gates . . . . . . . . . . . . . . . . . . . . . . . . . 29

    3.2.3 Reed-Muller Expansion . . . . . . . . . . . . . . . . . . . . . . 32

    3.2.4 Decision Diagrams . . . . . . . . . . . . . . . . . . . . . . . . 33

    3.3 Motivational Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 34

    3.3.1 Example 1: #qubits vs. QC . . . . . . . . . . . . . . . . . . . 34

    3.3.2 Example 2: Circuit Size vs. Synthesis Time . . . . . . . . . . 35

    3.4 RMDDS Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

    3.4.1 RMRLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

    3.4.2 DDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

    3.4.3 RMDDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

    3.4.4 Synthesis Example . . . . . . . . . . . . . . . . . . . . . . . . 44

    3.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    3.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55


  • 4 QLib: Quantum Module Library 59

    4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

    4.2 Quantum Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

    4.2.1 One-qubit Gates . . . . . . . . . . . . . . . . . . . . . . . . . 62

    4.2.2 Two-qubit Gates . . . . . . . . . . . . . . . . . . . . . . . . . 64

    4.2.3 Reversible Gates . . . . . . . . . . . . . . . . . . . . . . . . . 66

    4.3 File Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

    4.4 Quantum Module Library . . . . . . . . . . . . . . . . . . . . . . . . 68

    4.4.1 QFT/IQFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

    4.4.2 Bernstein-Vazirani Search (BVS) . . . . . . . . . . . . . . . . 69

    4.4.3 Grover’s Search . . . . . . . . . . . . . . . . . . . . . . . . . . 71

    4.4.4 Arithmetic Circuits . . . . . . . . . . . . . . . . . . . . . . . . 72

    4.5 Impact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

    4.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

    5 QGLVP: Optimized Quantum Gate Library for Various PMDs 84

    5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

    5.2 Primitive Quantum Gates . . . . . . . . . . . . . . . . . . . . . . . . 87

    5.2.1 One-qubit Gates . . . . . . . . . . . . . . . . . . . . . . . . . 88

    5.2.2 Two-qubit Gates . . . . . . . . . . . . . . . . . . . . . . . . . 91

    5.3 Identity Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

    5.3.1 One-qubit Identity Rules . . . . . . . . . . . . . . . . . . . . . 93

    5.3.2 Two-qubit Identity Rules . . . . . . . . . . . . . . . . . . . . . 96

    5.3.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

    5.4 Quantum Gate Library . . . . . . . . . . . . . . . . . . . . . . . . . . 101

    5.4.1 RY(θ) Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

    5.4.2 RZ(θ) Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

    5.4.3 H Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103


  • 5.4.4 CZ Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

    5.4.5 CNOT Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

    5.4.6 CP Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

    5.4.7 G Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

    5.4.8 iSW Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

    5.4.9 SWAP Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

    5.4.10 ZENO Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

    5.4.11 Toffoli Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

    5.4.12 Peres Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

    5.4.13 Fredkin Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

    5.4.14 Cost Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

    5.4.15 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

    5.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

    6 FTQLS: Fault-tolerant Quantum Logic Synthesis 118

    6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

    6.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

    6.2.1 FT Quantum Computation . . . . . . . . . . . . . . . . . . . 121

    6.2.2 FTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

    6.3 Synthesis Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

    6.3.1 Technology Mapping . . . . . . . . . . . . . . . . . . . . . . . 126

    6.3.2 Quantum Compilation . . . . . . . . . . . . . . . . . . . . . . 126

    6.4 Circuit Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

    6.4.1 Data Structure . . . . . . . . . . . . . . . . . . . . . . . . . . 129

    6.4.2 Simplify . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

    6.4.3 Interchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

    6.4.4 Commute . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

    6.4.5 Optimization Process . . . . . . . . . . . . . . . . . . . . . . . 133


  • 6.4.6 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

    6.4.7 Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

    6.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

    6.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

    7 PAQCS: Physical Design-aware Fault-tolerant Quantum Circuit Synthesis 147

    7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

    7.2 Swap Based Quantum Tile Architecture . . . . . . . . . . . . . . . . 150

    7.3 Motivational Example . . . . . . . . . . . . . . . . . . . . . . . . . . 152

    7.4 Physical Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

    7.4.1 Qubit Placement . . . . . . . . . . . . . . . . . . . . . . . . . 154

    7.4.2 Channel Routing . . . . . . . . . . . . . . . . . . . . . . . . . 158

    7.4.3 PAQCS: Synthesis Flow . . . . . . . . . . . . . . . . . . . . . 163

    7.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

    7.5.1 Placement and Routing . . . . . . . . . . . . . . . . . . . . . . 165

    7.5.2 PAQCS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

    7.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

    8 Conclusions and Future Research 180

    Bibliography 184


  • List of Tables

    3.1 Experimental results for reversible functions under #qubits minimization 48

    3.2 Experimental results for reversible functions under QC minimization . 49

    3.3 Percentage increase in #qubits and decrease in QC from the #qubits to QC optimization objective for reversible functions 51

    3.4 Comparison with the best results . . . . . . . . . . . . . . . . . . . . 52

    3.5 Experimental results for irreversible functions under #qubits minimization 56

    3.6 Experimental results for irreversible functions under QC minimization 57

    3.7 Percentage increase in #qubits and decrease in QC from the #qubits to QC optimization objective for irreversible functions 58

    4.1 Syntax for quantum gates . . . . . . . . . . . . . . . . . . . . . . . . 68

    4.2 Impact of synthesis scripts . . . . . . . . . . . . . . . . . . . . . . . . 82

    5.1 Supported operations in different PMDs . . . . . . . . . . . . . . . . 86

    5.2 One-qubit identity rules (length=2) . . . . . . . . . . . . . . . . . . . 95

    5.3 One-qubit identity rules (length=3) . . . . . . . . . . . . . . . . . . . 96

    5.4 H gate identity rules . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

    5.5 RXY gate identity rules . . . . . . . . . . . . . . . . . . . . . . . . . 97

    5.6 Asqu gate identity rules . . . . . . . . . . . . . . . . . . . . . . . . . 97

    5.7 Two-qubit identity rules . . . . . . . . . . . . . . . . . . . . . . . . . 100

    5.8 Execution cycles for each operation in QD . . . . . . . . . . . . . . . 102

    5.9 Execution cycles for each operation in SC . . . . . . . . . . . . . . . . 102

    5.10 Number of operations of each gate on every PMD . . . . . . . . . . . 114


  • 5.11 Number of execution cycles of each gate on every PMD . . . . . . . . 115

    5.12 Cost of Grover’s algorithm for search value |01⟩ . . . . . . . . . . . . 116

    6.1 Synthesis result for RZ(π/64) using SKA and STA . . . . . . . . . . . . 123

    6.2 Conversion between one-qubit FTS and CTL . . . . . . . . . . . . . . 124

    6.3 Interchange rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

    6.4 Experimental results for plain adder circuits . . . . . . . . . . . . . . 139

    6.5 Experimental results for QFT circuits with only OPT-3 optimization 140

    6.6 Experimental results for QFT circuits with three-stage optimization and the improvement 142

    6.7 Using CTL and FTS for QFT circuits in the LP system . . . . . . . 143

    6.8 Synthesis results part I . . . . . . . . . . . . . . . . . . . . . . . . . . 144

    6.9 Synthesis results part II . . . . . . . . . . . . . . . . . . . . . . . . . 145

    6.10 Average percentage reductions for different PMDs . . . . . . . . . . . 145

    6.11 Optimization results for a 4-bit modular adder based on SKA and STA 146

    7.1 Physical #ops for three QECCs and two PMDs . . . . . . . . . . . . 166

    7.2 Physical #cycles for three QECCs and two PMDs . . . . . . . . . . . 167

    7.3 Synthesis results for QLib benchmarks part I . . . . . . . . . . . . . . 169

    7.4 Synthesis results for QLib benchmarks part II . . . . . . . . . . . . . 170

    7.5 Synthesis results for RevLib benchmarks part I . . . . . . . . . . . . 171

    7.6 Synthesis results for RevLib benchmarks part II . . . . . . . . . . . . 172

    7.7 Comparisons with synthesis results presented in other work . . . . . . 175

    7.8 Average improvements due to PAQCS . . . . . . . . . . . . . . . . . 176


  • List of Figures

    1.1 Bloch sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    1.2 An example quantum circuit . . . . . . . . . . . . . . . . . . . . . . . 6

    1.3 High-level synthesis flow . . . . . . . . . . . . . . . . . . . . . . . . . 13

    2.1 Three-level tile hierarchy, each kth-level tile is composed of nine (k − 1)th-level tiles 24

    2.2 Two types of tiled architecture. (a) movement based and (b) swap based 24

    3.1 (a) A 2-to-1 irreversible OR gate/function, and (b) a 3-to-3 reversible OR gate obtained by adding a constant input (c1) and two garbage outputs (g1, g2) 30

    3.2 (a) n-bit MCT, (b) NOT, (c) CNOT, and (d) Toffoli gate . . . . . . . 30

    3.3 Two implementations of a five-bit MCT gate: (a) with [#qubits,QC] = [5,29] and (b) with [#qubits,QC] = [8,16] 34

    3.4 Truth table for Example 2 . . . . . . . . . . . . . . . . . . . . . . . . 35

    3.5 (a) Quick synthesis by direct placement of PPRM terms on ancillary lines, with [#qubits,QC] = [6,19] and (b) RMRLS requiring more synthesis time, with [#qubits,QC] = [3,11] 35

    3.6 Synthesis flow of RMRLS . . . . . . . . . . . . . . . . . . . . . . . . 36

    3.7 MCT cascades for various decompositions of a non-shared vertex in a DD 38

    3.8 MCT cascades for various decompositions of a shared vertex in a DD through the addition of an ancillary bit 39

    3.9 Synthesis flow of RMDDS . . . . . . . . . . . . . . . . . . . . . . . . 40

    3.10 Pseudocode of RMDDS . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    3.11 A synthesis example . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

    3.12 Distribution of solutions for 0410184 when using the NCTFP library. 50

    3.13 Solutions for hwb6 58 under relaxed synthesis times when using the NCTFP library 53

    4.1 Two-qubit quantum gates: (a) CNOT, (b) CP, (c) CZ, and (d) SWAP 65


  • 4.2 An example circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

    4.3 An example output file . . . . . . . . . . . . . . . . . . . . . . . . . . 67

    4.4 A four-qubit QFT circuit . . . . . . . . . . . . . . . . . . . . . . . . . 69

    4.5 An EWS module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

    4.6 The BVS circuit for n = 3 and a = 5 . . . . . . . . . . . . . . . . . . 71

    4.7 The circuit structure for Grover’s search . . . . . . . . . . . . . . . . 72

    4.8 The circuit for diffusion operator D . . . . . . . . . . . . . . . . . . . 72

    4.9 The circuit structure of Cuccaro’s adder (four-qubit) . . . . . . . . . 74

    4.10 (a) MAJ, and (b) UMA modules for Cuccaro’s adder . . . . . . . . . 74

    4.11 The circuit structure of Draper’s adder (four-qubit) . . . . . . . . . . 75

    4.12 (a) CMAJ, and (b) CUMA modules for controlled Cuccaro’s adder . . 75

    4.13 A four-qubit multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . 76

    4.14 Rotate left by 1 (≪ 1) . . . . . . . . . . . . . . . . . . . . . . . . . . 76

    4.15 A modular adder (a+b)%N . . . . . . . . . . . . . . . . . . . . . . . 77

    4.16 A modular adder for constant a and N . . . . . . . . . . . . . . . . . 78

    4.17 A modular subtracter for constant a and N . . . . . . . . . . . . . . 78

    4.18 Constant modular multiplier . . . . . . . . . . . . . . . . . . . . . . 79

    4.19 Restored modular multiplier . . . . . . . . . . . . . . . . . . . . . . . 80

    4.20 Modular exponentiation . . . . . . . . . . . . . . . . . . . . . . . . . 81

    5.1 Symbols of (a) iSW and (b) G gates . . . . . . . . . . . . . . . . . . . 91

    5.2 CZ implementation: (a) and (b) in the NP system, (c) in the IT system 104

    5.3 CNOT implementation: (a) from H gate, (b) SC, NA, (c) QD, and (d) IT systems 104

    5.4 CP gate construction: (a) from RZ and CRZ gates, and for the (b) LP, NP, (c) QD, (d) NA, and (e) IT systems 106

    5.5 G gate construction: (a) from RZ and CRZ gates, and for the (b) LP, NP, (c) QD, (d) NA, and (e) SC systems 106

    5.6 iSW gate construction: (a) from CRX, (b) from CZ, applicable to QD, NA, and LP, (c) NP, and (d) IT systems 107

    5.7 SWAP gate construction: (a) NP, (b) NA, (c) QD, (d) IT, and (e) SC systems 108

    5.8 ZENO gate construction: (a) from the SWAP gate, and (b) for the QD systems 109


  • 5.9 Several ways to construct a CCZ gate using five two-qubit gates . . . 110

    5.10 Two ways to construct a CS gate . . . . . . . . . . . . . . . . . . . . 111

    5.11 Construction of a CA gate . . . . . . . . . . . . . . . . . . . . . . . . 111

    5.12 Toffoli gate construction based on (a) CNOT and (b) CZ gates . . . . 111

    5.13 Toffoli gate construction: (a) basic implementation, (b) suitable for the SC system (the dashed box is a CV gate), (c) suitable for the IT system 112

    5.14 (a) Peres and (b) Fredkin gates . . . . . . . . . . . . . . . . . . . . . 112

    5.15 Peres gate construction: (a) basic implementation, (b) suitable for the SC system, (c) suitable for the IT system 113

    5.16 (a) Implementation of Grover’s algorithm for search value |01⟩, and (b) optimized circuit for the NP system (the gates in the dashed box can be merged into an Asqu operation; the dashed arrow indicates the critical path) 116

    6.1 FT implementations: (a) RZ(π/4), (b) RX(π/4), and (c) RY(π/4). “↗” indicates a measurement gate 125

    6.2 Synthesis flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

    6.3 Conversion of non-FT two-qubit gates to those in FTS(2): (a) CP, (b) G, and (c) iSW 127

    6.4 Synthesis flow of non-FT one-qubit gates based on the FT table . . . 128

    6.5 Application flow for the simplify rule . . . . . . . . . . . . . . . . . . 131

    6.6 Commute rules for (a) CNOT, (b) CP, and (c) G gates . . . . . . . . 132

    6.7 Pseudocode of the optimization process . . . . . . . . . . . . . . . . . 134

    6.8 An example of window selection . . . . . . . . . . . . . . . . . . . . . 135

    6.9 An optimization example . . . . . . . . . . . . . . . . . . . . . . . . . 136

    6.10 Verification of the optimization process . . . . . . . . . . . . . . . . 137

    7.1 The architecture creates communication channels through the use of swap chains, which exchange the states of adjacent qubits. Application of a swap chain, consisting of S(q8,q5) and S(q5,q2), to the three qubits is equivalent to the application of S(s8,s5) and S(s8,s2) to the corresponding qubit states. Note that though their qubit states are swapped, the qubit positions remain fixed 151

    7.2 A motivational example. (a) A logic circuit, where q0-q5 and s0-s5 refer to the name and state of qubits, respectively. (b) When qubits are placed trivially, #sw = (costB,costU,costR) = (8,6,8). (c) When channel routing is carefully taken into account, #sw = (8,3,4). (d) When both qubit placement and routing are carefully taken into account, #sw reduces to (4,1,2) 152

    7.3 A placement example. (a) Initial state. Placement of the (b) v0, (c) v2, (d) v1, (e) v3, (f) v4, and (g) v5 qubits 157

    7.4 A routing example. (a) Input circuit and its qubit and state layouts. (b)-(e) Routing for C(s0,s8). (f)-(h) Routing for C(s0,s5), C(s3,s6), and C(s5,s7), respectively. (i)-(j) Routing for C(s2,s0). (k) Routing for C(s5,s8). (l)-(m) Routing for C(s7,s6). (n) Recovery of the qubit states to their original position: pop ga stack with S(s0,s5), S(s4,s0), and S(s1,s0). (o) Recovered qubits states 161

    7.5 Synthesis flow of PAQCS . . . . . . . . . . . . . . . . . . . . . . . . . 164

    7.6 #ops for the Steane code . . . . . . . . . . . . . . . . . . . . . . . . . 176

    7.7 #cycles for the Steane code . . . . . . . . . . . . . . . . . . . . . . . 177

    7.8 #ops for the Bacon-Shor code . . . . . . . . . . . . . . . . . . . . . . 177


  • 7.9 #cycles for the Bacon-Shor code . . . . . . . . . . . . . . . . . . . . . 178

    7.10 #ops for the Knill code . . . . . . . . . . . . . . . . . . . . . . . . . . 178

    7.11 #cycles for the Knill code . . . . . . . . . . . . . . . . . . . . . . . . 179


  • Chapter 1

    Introduction

    The lure of quantum computing comes from the promise that it can significantly outperform its classical counterpart when solving some important problems. Large-scale quantum computers will be able to solve certain problems much more quickly than any classical computer using the best currently known algorithms. One example is integer factorization using Shor’s algorithm [1], which is capable of breaking RSA encryption [2] in polynomial time. There also exist quantum algorithms, such as Grover’s algorithm, that run faster than any possible classical algorithm for the same problem. Given sufficient computational resources, a classical computer could be made to simulate any quantum algorithm [3]. However, a general state of 100 quantum bits (qubits), for example, would already be too large to be represented on a classical computer because it would require $2^{100}$ complex values to be stored. Because of this computational power, quantum computation has attracted considerable research interest.
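    To give a feel for the storage argument above, the following minimal sketch estimates the memory needed to hold every amplitude of an n-qubit state. The 16-byte figure per complex value (two 64-bit floats) and the function name are assumptions made for this illustration, not part of the thesis.

```python
# Rough memory estimate for storing an n-qubit state vector classically.
# Assumption (not from the thesis): each complex amplitude occupies 16 bytes,
# i.e. two 64-bit floating-point numbers.

BYTES_PER_AMPLITUDE = 16  # assumed storage format


def state_vector_bytes(num_qubits: int) -> int:
    """Return the bytes needed to hold all 2**num_qubits amplitudes."""
    return (2 ** num_qubits) * BYTES_PER_AMPLITUDE


for n in (10, 30, 50, 100):
    print(f"{n:>3} qubits: {state_vector_bytes(n):.3e} bytes")
```

    Under this assumption, 30 qubits already need 16 GiB, and each additional qubit doubles the requirement.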

    Quantum computation is the study of the information processing that can be accomplished using quantum mechanical systems [3, 4]. It is tempting to say that a quantum computer is one whose operations are governed by the laws of quantum mechanics. However, all physical operations are governed by quantum mechanics, and we would not say that our desktop computers are quantum computers. In general, a quantum computer is one whose operations are governed by certain very special transformations of its internal states. The laws of quantum mechanics allow these peculiar transformations to take place under very carefully controlled conditions.

    In a quantum computer, the logical operations must involve no physical interactions whatsoever that are not under the complete control of the system. All other interactions may introduce potentially catastrophic disruptions into the operation of a quantum computer. Such interactions, between what matters for the computation and what does not, result in decoherence, which is fatal to quantum computation.

    To avoid decoherence, in general, quantum operations cannot be carried out in

    macroscopic physical systems, because most such systems cannot be isolated from the

    external environment. Such isolation can be achieved if the qubits are operated

    upon in microscopic physical systems. Such microscopic systems must be decoupled

    from their surroundings except for the completely controlled interactions that are

    associated with the computation process itself.

    Quantum mechanics postulates that all quantum operations (except measurement,

    as discussed later) are invertible, and thus all valid quantum gates are reversible. In

    fact, quantum computers display an important part of their magic through reversible

    operations, which transform the initial state of qubits into its final form, using only

    processes whose action can be inverted. There is only a single irreversible operation

    in quantum computation, the measurement, which is the only way to extract useful

    information from qubits after their state has acquired its final form. The extracted

    information is then processed by a classical computer.
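    The reversibility described above can be illustrated with a small sketch: a one-qubit gate is applied to a state and then undone by its conjugate transpose. The choice of the Hadamard gate, the input state, and the helper names `apply` and `dagger` are assumptions made for this example only.

```python
import math

# A 2x2 gate is a nested list; a qubit state is a list of two amplitudes.
S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]  # Hadamard gate, a standard one-qubit unitary


def apply(gate, state):
    """Multiply a 2x2 gate matrix by a 2-component state vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]


def dagger(gate):
    """Conjugate transpose, which is the inverse of a unitary gate."""
    return [[complex(gate[c][r]).conjugate() for c in range(2)] for r in range(2)]


state = [0.6, 0.8j]                 # arbitrary normalized input state
forward = apply(H, state)           # the gate acts on the state
recovered = apply(dagger(H), forward)  # the inverse gate undoes the action
print(recovered)                    # equals `state` up to rounding error
```

    Measurement has no such inverse, which is why it is treated separately from the gates.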

    In general, to implement quantum algorithms, we need both quantum and clas-

    sical computers. The quantum computer executes the desired sequence of quantum

    operations and the classical computer provides the control for these operations and

    also performs post-processing of the computed results.


  • A quantum algorithm is executed by a quantum circuit, which comprises a se-

    quence of quantum gates. These quantum gates may be decomposed into several

    primitive quantum operations, supported by corresponding quantum physical ma-

    chine descriptions (PMDs). In addition, because quantum systems are delicate and

    difficult to control, fault-tolerant (FT) quantum circuits are needed for practical

    implementation. Therefore, several quantum error correction codes (QECCs) have been

    proposed to facilitate FT computation. For physical quantum circuit realization, the

    physical address of qubits needs to be considered in order to honor the physical dis-

    tance constraint.

    The synthesis of quantum circuits is difficult because several factors must be considered: the number of primitive operations (#ops), the number of critical execution cycles (#cycles), the QECC used, and the target PMD. In this thesis, we provide several methodologies for quantum circuit synthesis and optimization targeting different metrics. These techniques are scalable to any quantum algorithm. Their integration yields optimized physical quantum circuits for various PMDs at the end of the synthesis process.

    In this chapter, some background knowledge about quantum computing is provided,

    followed by a high-level view of quantum circuit synthesis.

    1.1 Quantum System

    In quantum computation, a qubit refers to a unit of quantum information [3, 4]. It

    has very different characteristics from a classical bit. For example, a classical bit may

    only take on two distinct values: 0 or 1. However, a qubit does not suffer from this

    limitation.

    In a two-state quantum system, a qubit |ψ⟩ can be described by [3]:

    \[ |\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle \tag{1.1} \]

    where |·⟩ is the ket vector in Dirac notation, which indicates that |0⟩ and |1⟩ are

    column vectors corresponding to:

    \[ |0\rangle \equiv \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad |1\rangle \equiv \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{1.2} \]

    A qubit is a superposition of |0⟩ and |1⟩, which means that the qubit |ψ⟩ exists

    in the two states simultaneously. α0 and α1 are complex coefficients. They represent

    the amplitudes of |0⟩ and |1⟩, respectively, with the normalization constraint

    \(|\alpha_0|^2 + |\alpha_1|^2 = 1\).

    The qubit in Eq. (1.1) can also be written in terms of azimuth and elevation angles

    as:

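    \[ |\psi\rangle = e^{i\gamma}\left(\cos\frac{\theta}{2}|0\rangle + e^{i\phi}\sin\frac{\theta}{2}|1\rangle\right) \tag{1.3} \]

    where γ, θ, and ϕ are all real numbers. The global phase \(e^{i\gamma}\) is physically indistinguishable

    and thus can be ignored. Therefore, we can further simplify it to the

    form:

    \[ |\psi\rangle = \cos\frac{\theta}{2}|0\rangle + e^{i\phi}\sin\frac{\theta}{2}|1\rangle \tag{1.4} \]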

    We can visualize Eq. (1.4) on the surface of a three-dimensional sphere, known as the

    Bloch sphere, as shown in Fig. 1.1. The north and south poles represent states |0⟩

    and |1⟩, respectively. A point on the Bloch sphere represents a superposition of the

    two states. A valid single-qubit operation represents a rotation on the Bloch sphere.

    The elevation and azimuth angles (θ, ϕ) lie in the ranges 0 ≤ θ ≤ π and

    0 ≤ ϕ < 2π.
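
    As an illustration of Eq. (1.4), the following Python sketch converts a pair of Bloch-sphere angles into the amplitudes α0 and α1 and checks the normalization constraint. The function names are illustrative assumptions and are not part of any tool described in this thesis.

```python
import cmath
import math

def bloch_to_amplitudes(theta, phi):
    """Return (alpha0, alpha1) of Eq. (1.4) for elevation theta and azimuth phi."""
    alpha0 = complex(math.cos(theta / 2))
    alpha1 = cmath.exp(1j * phi) * math.sin(theta / 2)
    return alpha0, alpha1

def norm_squared(alpha0, alpha1):
    """|alpha0|^2 + |alpha1|^2, which should equal 1 for a valid qubit."""
    return abs(alpha0) ** 2 + abs(alpha1) ** 2

north = bloch_to_amplitudes(0.0, 0.0)             # north pole: |0>
south = bloch_to_amplitudes(math.pi, 0.0)         # south pole: |1> up to rounding
equator = bloch_to_amplitudes(math.pi / 2, 0.0)   # equal superposition of |0> and |1>
```

    Every pair of angles yields a normalized state, which is why two real angles suffice to describe a single qubit once the global phase is dropped.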

    [Figure 1.1: Bloch sphere (axes x, y, z; poles |0⟩ and |1⟩; state |ψ⟩ at angles θ and ϕ)]

    Since a physical system changes over time, a quantum state |ψ⟩ is actually a function

    of time: |ψ(t)⟩. Quantum mechanics postulates that the evolution of a quantum

    state of a closed quantum system can be described by Schrödinger’s equation [3]:

    \[ i\hbar\frac{\partial|\psi(t)\rangle}{\partial t} = \hat{H}|\psi(t)\rangle \tag{1.5} \]

    where ħ is Planck’s constant divided by 2π and Ĥ is a Hermitian operator, called

    the Hamiltonian, which represents the total observable energy of the system. For

    simplicity, if we consider Ĥ to be independent of time, the equation can be solved as:

    \[ |\psi(t)\rangle = e^{-i\hat{H}(t-t_0)/\hbar}|\psi(t_0)\rangle = \hat{U}|\psi(t_0)\rangle \tag{1.6} \]

    where Û is a unitary operator, i.e., Û†Û = I, where I is the identity operator. Hence, all

    valid quantum operations are unitary and thus reversible. Reversibility is a necessary

    condition for quantum computing.

    The evolution of an isolated quantum system with a finite number of states can

    be described by a unitary matrix. An n-qubit operation is represented by a \(2^n \times 2^n\)

    unitary matrix.
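
    The unitarity of Eq. (1.6) can be checked numerically on a small case. The sketch below assumes, purely for illustration, a Hamiltonian equal to the Pauli-X matrix and units in which ħ = 1, so that the evolution operator has the closed form cos(t) I − i sin(t) X. None of the names below come from the thesis.

```python
import math

def evolution_operator(t):
    """U(t) = exp(-i X t) = cos(t) I - i sin(t) X, for H = X and hbar = 1 (assumed)."""
    c, s = math.cos(t), math.sin(t)
    return [[complex(c, 0.0), complex(0.0, -s)],
            [complex(0.0, -s), complex(c, 0.0)]]

def dagger(m):
    """Conjugate transpose of a matrix given as nested lists."""
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, state):
    """Apply matrix m to a state vector."""
    return [sum(m[i][k] * state[k] for k in range(len(state))) for i in range(len(m))]

def is_identity(m, tol=1e-12):
    return all(abs(m[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(m)) for j in range(len(m)))

U = evolution_operator(0.7)
psi_t = apply(U, [1, 0])             # evolve |0> over an elapsed time of 0.7
U_back = evolution_operator(-0.7)    # running time backward undoes the evolution
```

    Here the product of `dagger(U)` and `U` is the identity, the norm of `psi_t` stays 1, and `U_back` reverses `U`, which is the reversibility property the text relies on.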

    [Figure 1.2: An example quantum circuit. With time flowing from left to right, gates A and B are followed by gate C, mapping |ψ⟩ to C(B⊗A)|ψ⟩.]

    1.2 Quantum Circuit

    A quantum circuit comprises a sequence of quantum gates. An n-qubit circuit is

    depicted with n horizontal lines, with time flowing from left to right. A quantum

    computation typically requires the application of several quantum gates, sequentially

    or in parallel, to various subsets of qubits. The net unitary transformation they

    perform can be expressed in a matrix product form, which is a cascade of the unitary

    matrices of the corresponding quantum gates, using two rules: dot product and tensor

    product [3, 4].

    Dot Product

    The dot product is the same as matrix multiplication. If several gates act on the

    same subset of qubits, then those gates must be applied in series and their overall

    effect computed by the dot product. In a quantum circuit, if operation A acts before

    B, their overall effect is computed by their dot product in reverse order, i.e., B · A.

    Tensor Product

    If adjacent gates within a quantum circuit act on independent subsets of qubits, then

    they can be applied simultaneously in parallel. The net effect of the parallel gates is

    evaluated with the tensor product, denoted by “⊗”.

    An example is shown in Fig. 1.2. The system computes C · [B ⊗A], where A and

    B are 2 × 2 matrices and C is a 4 × 4 matrix.

  • As indicated earlier, all quantum operations are unitary and thus invertible. The

    inverse of a unitary matrix U is U−1 = U †, which is its conjugate transpose. Therefore,

    obtaining the inverted circuits is straightforward, i.e., reverse the order of the gates

    and use the conjugate transpose of each gate. For example, the inverse circuit of the

    one in Fig. 1.2 is [B† ⊗ A†] · C†.
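
    The two composition rules and the inversion rule can be sketched in a few lines of Python. The concrete choices of A (NOT), B (Hadamard), and C (CNOT) are assumptions made only to obtain a runnable example; the thesis does not specify the gates in Fig. 1.2.

```python
import math

def matmul(a, b):
    """Dot product (matrix multiplication) of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Tensor (Kronecker) product of two nested-list matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def dagger(m):
    """Conjugate transpose."""
    return [[complex(m[j][i]).conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]

def is_identity(m, tol=1e-12):
    return all(abs(m[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(m)) for j in range(len(m)))

s = 1 / math.sqrt(2)
A = [[0, 1], [1, 0]]                       # NOT gate (assumed example)
B = [[s, s], [s, -s]]                      # Hadamard gate (assumed example)
C = [[1, 0, 0, 0], [0, 1, 0, 0],
     [0, 0, 0, 1], [0, 0, 1, 0]]           # CNOT gate (assumed example)

U = matmul(C, kron(B, A))                  # A and B in parallel, then C
U_inv = matmul(kron(dagger(B), dagger(A)), dagger(C))   # reversed order, each gate daggered
```

    Multiplying `U_inv` by `U` gives the 4 × 4 identity, confirming that reversing the gate order and taking conjugate transposes inverts the circuit.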

    When two physical systems are treated as one combined system, the state space

    of the combined physical system is the tensor product H = H1 ⊗ H2, where H1 and

    H2 are the state spaces of the component subsystems. It is important to note that the

    state of a many-qubit composite system cannot always be decomposed as a product of

    the states of its component subsystems. In this case, we say that the qubits are entangled.

    Entanglement is a physical phenomenon that occurs when groups of particles are generated

    or interact in such a way that the quantum state of each member cannot be described

    independently of the others.
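
    For two qubits, whether a pure state decomposes into a product can be tested directly: the state a00|00⟩ + a01|01⟩ + a10|10⟩ + a11|11⟩ is a product state exactly when a00·a11 − a01·a10 = 0. The following sketch applies this standard test; the function name and the example states are illustrative assumptions.

```python
import math

def is_product_state(amplitudes, tol=1e-12):
    """amplitudes = [a00, a01, a10, a11] of a two-qubit pure state.

    The state factors into (single qubit) x (single qubit) exactly when the
    2 x 2 matrix of amplitudes has zero determinant.
    """
    a00, a01, a10, a11 = amplitudes
    return abs(a00 * a11 - a01 * a10) < tol

r = 1 / math.sqrt(2)
bell = [r, 0, 0, r]                 # (|00> + |11>)/sqrt(2)
uniform = [0.5, 0.5, 0.5, 0.5]      # ((|0> + |1>)/sqrt(2)) tensor ((|0> + |1>)/sqrt(2))
```

    With these inputs, `uniform` is reported as a product state, whereas `bell` is not and is therefore entangled.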

    1.3 Quantum Algorithm

    A quantum algorithm is one that runs on a realistic model of a quantum computer [4].

    Just as a classical algorithm is a finite sequence of instructions, where each step

    or instruction can be performed on a classical computer, a quantum algorithm is a

    sequence of instructions, where each instruction is performed on a quantum computer.

    Although all classical algorithms can also be performed on a quantum computer, the

    term quantum algorithm is typically used for those algorithms that are inherently

    quantum or use some essential feature of quantum computation, such as quantum

    superposition or entanglement [5].

    Quantum algorithms are interesting because they might be able to solve some

    problems much faster than classical algorithms. The best-known algorithms are

    Shor’s algorithm for factoring and Grover’s algorithm for searching an unstructured

  • database or an unordered list. Shor’s algorithm runs exponentially faster than the

    best known classical algorithm for factoring. Grover’s algorithm runs quadratically

    faster than the best possible classical algorithm for the same task.

    In general, a quantum algorithm consists of both classical and quantum func-

    tions [3, 6]. Classical functions are described in the classical domain, i.e., the mapping

    of input to output states consists of binary values. Since quantum mechanics requires

    the time-evolution of quantum states to be reversible, the classical functions need to

    be reversible. Their circuit implementations consist of a sequence of reversible gates

    that implement reversible operations. Quantum functions are described in the quan-

    tum domain, i.e., the mapping of input to output states is described by a unitary

    transformation matrix with complex values. Their circuit realizations are composed

    of quantum gates.

    1.4 Reversible Computing

    A function is reversible if its input and output mapping is bijective and thus is in-

    formation lossless. Therefore, all reversible computations can be computed forward

    or backward (inversely) without loss of information. The study of reversible logic

    has historically been motivated by power consumption. Landauer’s principle (or the von

    Neumann-Landauer limit) states that, regardless of the technology chosen for implementing

    a circuit, when a bit of information is erased, at least KT ln 2 joules of energy

    is dissipated, where K is the Boltzmann constant and T is the operating temperature

    in kelvins [7]. In 1973, Bennett [8] showed that zero-energy computation is

    possible only if the computation is reversible, and a recent experiment has

    validated this perspective [9].
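
    To give a sense of scale for the Landauer bound, the short sketch below evaluates KT ln 2 at a given temperature. The choice of 300 K as an example temperature is an assumption made for illustration.

```python
import math

BOLTZMANN = 1.380649e-23   # Boltzmann constant in J/K (exact SI value)

def landauer_limit(temperature_kelvin):
    """Minimum energy in joules dissipated when one bit is erased at temperature T."""
    return BOLTZMANN * temperature_kelvin * math.log(2)

room_temperature_limit = landauer_limit(300.0)   # example: T = 300 K
```

    At 300 K the bound is roughly 2.9 × 10⁻²¹ J per erased bit, and it scales linearly with temperature.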

    Reversible computing not only attracts interest from the point of view of power

    consumption, but also finds many applications in other areas. One of the most impor-

  • tant applications of reversible computing is quantum computation [3]. In quantum

    computation, reversible functions are mostly used for generating oracles. An ora-

    cle can be seen as the description of the problem to solve. The reversible functions

    provide a high-level description of quantum algorithms. These functions are synthe-

    sized to reversible circuits. Then, these reversible gates are further decomposed into

    low-level quantum circuits for physical realization on a target PMD.
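
    A reversible function is simply a permutation of the input words, which is easy to test on a truth table. The sketch below checks this for a three-bit Toffoli gate, which embeds the irreversible AND function in a reversible one, as an oracle would. The bit ordering and function names are assumptions chosen for illustration.

```python
def is_reversible(truth_table):
    """truth_table[x] is the output word for input word x.

    The function is reversible exactly when the table is a permutation of
    0 .. len(truth_table) - 1.
    """
    return sorted(truth_table) == list(range(len(truth_table)))

def toffoli(word):
    """3-bit Toffoli: flip the target (bit 0) when both controls (bits 2 and 1) are 1."""
    a, b, c = (word >> 2) & 1, (word >> 1) & 1, word & 1
    return (a << 2) | (b << 1) | (c ^ (a & b))

toffoli_table = [toffoli(w) for w in range(8)]
and_table = [((w >> 1) & 1) & (w & 1) for w in range(4)]   # plain AND: two inputs, one output
```

    The plain AND table maps several inputs to the same output and is not reversible, while the Toffoli table is a permutation; with the target initialized to 0, the Toffoli target output equals the AND of the two controls.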

    1.5 FT Quantum Computation

    Quantum systems are fragile because when they inevitably interact with the environ-

    ment, the information stored in the system decoheres, thus resulting in the failure of

    computation. Hence, quantum circuits need to be FT in a practical implementation.

    FT quantum computation implies that quantum circuits perform correctly even when

    errors occur.

    Many QECCs have been proposed to facilitate FT quantum computation, such as

    Steane code [3, 10], Bacon-Shor code [11], Knill code [12], and surface code [13, 14].

    A QECC is employed to detect and correct errors. However, the QECC circuit may

    itself incur an error. Thus, the probability of an error for each qubit must be smaller

    than an error threshold, so that the overall circuit error probability is reduced after

    the QECC is applied [15].

    Assuming that the errors are independent, if the probability of a single error in a

    single physical gate is p, the error probability of an encoded logic gate is given by \(Cp^2\),

    where C is a constant that depends on the code and circuitry used [3]. This can be

    improved if a concatenation code is used, which entails another level of QECC applied

    on top of the previous level. Therefore, the error rate becomes \(C(Cp^2)^2 = C^3p^4\) for

    two-level concatenation. In general, a k-level concatenation code has a logical error

    rate of \(p_k = \frac{1}{C}(Cp)^{2^k}\). Therefore, when p < 1/C, the error rate can be made

    arbitrarily small.
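
    The effect of concatenation is easiest to see numerically. The sketch below evaluates the k-level logical error rate (1/C)(Cp)^(2^k); the values p = 10⁻⁴ and C = 10³ are hypothetical and chosen only so that p < 1/C.

```python
def logical_error_rate(p, c, k):
    """Logical error rate after k levels of concatenation: (1/C) * (C*p)**(2**k)."""
    return (c * p) ** (2 ** k) / c

p, c = 1e-4, 1e3   # hypothetical physical error rate and code constant (C*p = 0.1)
rates = [logical_error_rate(p, c, k) for k in range(4)]   # k = 0, 1, 2, 3
```

    For these assumed values, the rate falls from 10⁻⁴ with no encoding to about 10⁻⁵, 10⁻⁷, and 10⁻¹¹ after one, two, and three levels, whereas for p > 1/C each additional level would make the rate worse.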

    Implementations of these QECCs are based on some FT gates [16], as will be

    defined and explained in Chapter 6. Therefore, if the circuits contain non-FT gates,

    they should be converted to FT gates first. This conversion process is called quantum

    compilation. For physical implementation of FT quantum computation, tiled quan-

    tum architectures are used where each tile is able to do a set of FT computations.

    The configuration and size of the tiles depend on the type of QECC and PMD.

    1.6 Physical Machine Description

    Physical realization of quantum computers is very challenging [3]. It not only requires

    a robust physical representation of qubits, but also the enabling of their time-evolution

    as desired. In quantum mechanics, the time-evolution of a closed quantum system

    is described by a unitary operator determined by its Hamiltonian [3]. Since different

    quantum systems have different Hamiltonians, they also have different PMDs. To

    perform a quantum operation, one must be able to control the Hamiltonians of the

    system. However, an operation may be easily performed in one system but with

    difficulty in another. Hence, one PMD may be more suitable for implementing a

    quantum logic gate than another.

    Several quantum systems have been proposed that support different PMDs:

    • Quantum dot (QD): In this system, a qubit is represented by the spin states

    of two electrons in a double electrostatically defined quantum dot, which has

    two potential wells with a tunneling barrier between them [17].

    • Superconducting (SC): In this system, charged carriers are used to represent

    qubits [18, 19, 20, 21, 22, 23]. At low temperatures in certain metals, two

    electrons can bind together to form a Cooper pair. Such a pair can be confined

    within an electrostatic box and used to represent quantum information.

    • Ion trap (IT): This quantum system is based on a 2D lattice of confined ions,

    each of which represents a physical qubit that can be moved within the lattice

    to accommodate local interactions [3, 24, 25, 26].

    • Neutral atom (NA): This is a system of trapped neutral atoms that can be

    isolated from the environment and whose simple quantum-level structure can

    be exploited [27, 28, 29].

    • Linear photonics (LP): In this quantum system, a probabilistic two-photon

    gate is teleported into a quantum circuit with high probability [30, 31, 32, 33].

    • Nonlinear photonics (NP): This quantum system is based on weak cross-

    Kerr nonlinearities [30, 34, 35].

    A broad survey of quantum systems is available from the ARDA quantum com-

    puting roadmap [36].

    1.7 Physical Design

    A quantum logic circuit does not consider the physical address of qubits and logic

    gates in it are applied to qubits without considering the physical distance between

    these qubits. In a physical circuit realization, the physical address of qubits needs

    to be considered. In a practical implementation of quantum circuits, physical qubits

    have to be placed on a grid. The grid implements the architecture of the quantum

    computer. A physical distance constraint often imposed is that quantum gates can

    only operate on adjacent qubits on the grid.

    Current quantum technologies support a set of one- and two-qubit quantum gates.

    A two-qubit gate enables interaction between the two qubits. In a physical circuit

  • implementation, when the two qubits are far apart, a communication channel must

    be created between them. This communication overhead not only increases the com-

    putation latency but also the probability of a computation error.
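
    The physical distance constraint can be expressed as a simple check on a qubit placement. The sketch below assumes that two qubits are adjacent when their grid cells are at Manhattan distance 1; the placement and all names are hypothetical and do not reflect the PAQCS implementation.

```python
def manhattan_distance(a, b):
    """Grid distance between two (row, column) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def needs_channel(placement, q1, q2):
    """True when a two-qubit gate on q1 and q2 violates the adjacency constraint."""
    return manhattan_distance(placement[q1], placement[q2]) > 1

# Hypothetical placement of three qubits on a 2D grid.
placement = {"q0": (0, 0), "q1": (0, 1), "q2": (2, 1)}
gates = [("q0", "q1"), ("q1", "q2")]
channels = [gate for gate in gates if needs_channel(placement, *gate)]
```

    In this example only the gate on q1 and q2 requires a communication channel, so a placement that puts frequently interacting qubits next to each other reduces the overhead.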

    A complex multi-qubit reversible gate must be decomposed into a sequence of one-

    and two-qubit quantum gates. However, even a one- or two-qubit gate may not be

    directly implementable in a physical quantum machine. Therefore, it must be further

    decomposed using the set of supported primitive quantum operations in the PMD

    of the quantum machine. A quantum gate may require a very different number of

    primitive quantum operations for realization on different machines.

    1.8 Synthesis Flow

    There are many difficulties and challenges in implementing a quantum computer. A

    hierarchical architecture for constructing a quantum computer can be found in [37].

    From top to bottom, it includes quantum computing theory, quantum programming,

    quantum computer architecture, quantum computer micro-architecture, and technol-

    ogy building blocks. In this thesis, we concentrate on the synthesis of FT quantum

    circuits, i.e., we convert a given quantum algorithm into an FT circuit that is opti-

    mized for the given PMD.

    Quantum circuit synthesis is concerned with the ability to automatically generate

    an optimized quantum circuit from a given quantum algorithm. The synthesis of

    quantum circuits is generally difficult, but can be performed effectively by hierarchi-

    cally decomposing it into many stages. Fig. 1.3 shows a high-level view of a three-stage

    synthesis flow. It includes synthesis from the quantum algorithm to non-FT quantum

    logic, non-FT quantum logic circuit to FT quantum logic, and FT quantum logic to

    FT quantum physical circuit. Each synthesis stage is described next.

    [Figure 1.3: High-level synthesis flow. A quantum algorithm is split into quantum functions, handled by the quantum module library (QLib), and classical functions, handled by reversible logic synthesis (RMDDS), yielding non-FT quantum logic. FT quantum logic synthesis (FTQLS), supported by the optimized quantum gate library (QGLVP) and the PMD, produces FT quantum logic. Physical design-aware quantum circuit synthesis (PAQCS), given the QECC and PMD, produces the FT physical quantum circuit.]

  • The input of the flow is a quantum algorithm, which generally contains two parts:

    quantum functions and classical functions. Synthesizing an arbitrary quantum func-

    tion is very difficult and inefficient. Fortunately, quantum functions usually have

    regular structures and hence can be effectively synthesized with some quantum mod-

    ules. Therefore, a quantum module library (QLib) [38] is introduced to facilitate the

    synthesis of commonly used quantum modules. On the other hand, classical functions

    are reversible functions and can be synthesized by the Reed-Muller Decision Diagram

    Synthesis (RMDDS) tool [39], which is a flexible and efficient tool for reversible logic

    synthesis. RMDDS is able to optimize the number of qubits or the quantum cost of

    the circuit implementation, and the objective is influenced by the implementation of

    the underlying technology. After the synthesis of quantum and reversible functions, a

    non-FT quantum logic circuit, which implements the quantum algorithm, is derived.

    It is composed of high-level quantum logic gates.

    In order to convert the high-level logic gates into low-level quantum gates, an

    optimized quantum gate library for various PMDs (QGLVP) [40] is used. The gate

    library decomposes high-level gates into primitive quantum gates supported by various

    PMDs. With the decomposition library, the circuit is then optimized by the FT

    quantum logic synthesizer, FTQLS [41], which synthesizes and optimizes the non-FT

    logic to FT logic circuits. The focus of FTQLS is on optimizing circuits with FT

    primitive gates. A logical FT circuit is composed of a set of FT gates, such that the

    QECCs can be easily applied to the circuit.

    Next, the physical circuit is synthesized by using the physical design-aware quan-

    tum circuit synthesizer, PAQCS [42]. Based on the QECC and PMD chosen, the

    qubits are placed on a 2D grid, with some communication channels constructed. In

    this stage, the physical cost of QECC is considered. Thus, the optimization can be

    targeted to different QECCs and PMDs, which reflect true circuit cost.

  • 1.9 Contributions of the Thesis

    As indicated in the previous section, the synthesis of quantum circuits is divided into

    three stages. The thesis proposes several effective synthesis methodologies to facilitate

    the synthesis flow.

    The thesis first proposes RMDDS, which is a flexible and efficient reversible logic

    synthesizer. It is flexible in the sense that users can either optimize the number of

    qubits or the quantum cost in the circuit implementation. It is also efficient because

    the circuits can be synthesized within user-defined CPU times. This combination of

    flexibility and efficiency has been missing from synthesizers presented earlier.

    The thesis then discusses QLib, which contains scripts to generate quantum mod-

    ules of different sizes and specifications for well-known quantum algorithms. Because

    many quantum algorithms use similar subroutines that can be implemented with

    similar circuit modules, QLib is very helpful in quantum logic synthesis. In addition,

    QLib can also serve as a suite of benchmarks for quantum logic and physical synthesis.

    The thesis thereafter presents QGLVP. Quantum gates are themselves realized

    using primitive quantum operations that are supported by the PMDs. Thus, the

    quantum cost for implementing a quantum operation may differ from one PMD to

    another. Hence, the optimized quantum gate decompositions are different in different

    PMDs. To make our quantum gate library efficient in terms of the number of primitive

    quantum operations involved and the associated delay, we explore one-qubit and two-

    qubit quantum identity rules that can help remove redundancies in the quantum

    gate implementation. QGLVP thus provides technology mapping for quantum logic

    synthesis.

    Next, FTQLS is presented. The input to FTQLS is an unoptimized quantum

    logic realized using a set of commonly used gates and its output is an optimized FT

    quantum logic that only comprises primitive quantum operations supported by the

    given PMD. FTQLS does technology mapping for different PMDs and then converts

  • non-FT circuits to FT circuits. For technology mapping, it utilizes QGLVP. Efficient

    conversion to FT circuits is done by integrating two quantum compilers and an FT

    cache table into FTQLS. For improving synthesis results, an FT set of gates that is

    directly supported by each PMD is proposed. Quantum circuit optimization is done

    by utilizing quantum identity rules. A methodology such as FTQLS has not been

    attempted before, to the best of our knowledge.

    The last part of the thesis discusses PAQCS. It does physical design-aware quan-

    tum circuit synthesis considering different PMDs and QECCs. It contains two efficient

    and effective algorithms: one for physical qubit placement and another for routing of

    communications. Physical design-aware quantum circuit design that can be targeted

    at multiple PMDs and QECCs has not been attempted before, to the best of our

    knowledge.

    1.10 Thesis Outline

    The rest of this thesis is organized as follows. Chapter 2 describes related work for

    the thesis. Chapter 3 describes RMDDS for reversible logic synthesis. Chapter 4

    discusses QLib, which contains scripts to generate quantum modules. Chapter 5

    introduces QGLVP, which is an optimized quantum gate library for various PMDs.

    Chapter 6 presents FTQLS, which effectively synthesizes FT quantum logic circuits

    for various PMDs. Chapter 7 presents PAQCS. It synthesizes quantum logic circuits

    to quantum physical circuits. Chapter 8 concludes the thesis and presents ideas for

    future research.

  • Chapter 2

    Related Work

    This chapter surveys previous work in reversible and quantum logic synthesis. Re-

    versible logic synthesis generates a high-level description of quantum oracle functions.

    Hence, it is a very important part of the synthesis flow. Many different methodolo-

    gies for reversible logic synthesis have been proposed. Each has some advantages

    and disadvantages. Therefore, it is possible that a hybrid method may exploit the

    complementary advantages of such methodologies.

    For quantum circuit synthesis, some important issues must be considered that do

    not exist in traditional circuit synthesis. These issues include the use of fault-tolerant

    (FT) gates, the type of quantum error correction code (QECC) applied, physical

    machine description (PMD), and physical qubit placement and routing.

    In this chapter, we summarize several important techniques that are used in re-

    versible and quantum circuit synthesis and discuss their characteristics, performance,

    and applications.

    2.1 Reversible Logic Synthesis

    Reversible computation has been historically motivated by theoretical research in low-

    power electronics. However, reversible logic not only attracts interest from low-power

  • research, but also finds many applications in other areas. Given that reversible trans-

    formations are the bottleneck in many widely used algorithms, reversible instructions

    have been added to various microprocessor instruction sets [43]. In addition, the bit-

    permutation operation, which is reversible, has been proposed in [44] to increase the

    efficiency of cryptographic algorithms. Reversible debugging and program inversion

    allow us to undo a command in debugging environments and allow the reconstruction

    of decisions that lead to some particular outcomes. Therefore, program inversion and

    debugging [45, 46] and reversible programming languages [47, 48] have been proposed.

    Reversible adiabatic circuits have been proposed to recycle signal energy [49] and for

    supercomputing [50].

    Quantum computation is another important motivation for reversible computation

    because unitary transformations in quantum mechanics are reversible. Many quantum

    algorithms [51, 52, 53] have been introduced to solve several difficult problems in

    polynomial time. Because of these important applications, reversible circuit synthesis

    has become an important research topic.

    Reversible logic synthesis is concerned with the ability to automatically gener-

    ate a reversible circuit from a given reversible function. It synthesizes reversible

    functions into reversible circuits composed of multiple-controlled Toffoli (MCT) [54],

    Fredkin [55], and Peres [56] gates. Several reversible logic synthesis algorithms have

    been proposed. In general, they can be classified into two categories: exact and

    heuristic.

    2.1.1 Exact Synthesis

    Exact algorithms perform an exhaustive search for globally optimal solutions. An

    exact algorithm based on depth-first search with iterative deepening was used in [57]

    to find optimal solutions for all three-bit reversible functions. Another exact algorithm

    based on Boolean satisfiability was proposed in [58]. This algorithm guarantees a

  • reversible circuit with a minimum number of gates only for circuits with up to 10 gates.

    Although exact algorithms guarantee optimality, they consume excessive synthesis

    times since there are \((2^n)!\) permutations for an n-input reversible function. Thus, the

    synthesis time explodes with an increase in the number of input variables or the

    number of gates, limiting the usage of exact algorithms in practice.
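
    The growth of the search space can be made concrete by counting the permutations of the 2^n input words, i.e., (2^n)!. The function name below is an illustrative assumption.

```python
import math

def num_reversible_functions(n):
    """Number of n-input reversible functions: permutations of the 2**n input words."""
    return math.factorial(2 ** n)

counts = {n: num_reversible_functions(n) for n in range(1, 5)}
```

    The count is 2 for one input, 24 for two, 40320 for three, and already about 2 × 10¹³ for four, which is why exhaustive methods stop being practical after only a few variables.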

    2.1.2 Heuristic Synthesis

    In order to synthesize reversible circuits more efficiently, several heuristic methods

    have been proposed that lead to reasonably good solutions. According to a sur-

    vey [59], these methods can be categorized into four types: (1) search-based, (2)

    binary-decision-diagram-based, (3) transformation-based, and (4) cycle-based. Dif-

    ferent heuristic algorithms excel at different aspects of synthesis: number of qubits

    (#qubits), quantum cost (QC) or synthesis time.

    Search-based Method

    Gupta et al. [60] proposed a search-based method, called Reed-Muller reversible logic

    synthesis (RMRLS), which uses the positive-polarity Reed-Muller (PPRM) expansion

    to synthesize reversible circuits. The methodology in [61] extends the one in [60] for

    synthesis with more reversible gates, such as Fredkin and Peres gates. Given enough

    memory and time, these methods can find a minimal circuit. RMRLS is typically

    effective in synthesizing circuits with variable count up to 6. However, many reversible

    functions/circuits exceed these limits.
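
    The PPRM expansion of a single output can be computed from its truth table with the binary Möbius (Reed-Muller) transform. The sketch below shows only this expansion step, not the RMRLS search itself; the bit ordering and names are assumptions made for illustration.

```python
def pprm_coefficients(truth_table):
    """Positive-polarity Reed-Muller coefficients of a Boolean function.

    truth_table[x] is f(x) for the input word x. In the result, entry m is 1
    exactly when the product of the variables selected by bitmask m appears in
    the XOR-of-products expansion (entry 0 is the constant term).
    """
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1
    for i in range(n):
        step = 1 << i
        for x in range(len(coeffs)):
            if x & step:
                coeffs[x] ^= coeffs[x ^ step]
    return coeffs

# Target output of a Toffoli gate, c XOR (a AND b), with a = bit 2, b = bit 1, c = bit 0.
toffoli_target = [(x & 1) ^ (((x >> 2) & 1) & ((x >> 1) & 1)) for x in range(8)]
terms = pprm_coefficients(toffoli_target)
```

    For this function, only the entries for bitmask 1 (the variable c) and bitmask 6 (the product ab) are set, matching the expansion c ⊕ ab.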

    Transformation-based Method

    Transformation-based synthesis approaches [59, 62, 63, 64] rely on a truth table de-

    scription. This approach compares the identity function I with a given permutation

    function F , and uses a sequence of reversible gates to transform F into I. To conduct

  • the transformation, the optimization metric used is the sum of Hamming distances

    between the binary patterns of F and I in each row of the truth table. The method

    iterates through the rows of the truth table. Since the size of the truth table increases

    exponentially with the number of inputs, the method becomes impractical for large

    input functions.
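
    The optimization metric described above can be sketched as follows. The example permutation and the single Toffoli gate applied to its outputs are assumptions chosen to keep the example small; actual transformation-based tools select gates row by row.

```python
def toffoli(word):
    """3-bit Toffoli: flip bit 0 when bits 2 and 1 are both 1."""
    a, b, c = (word >> 2) & 1, (word >> 1) & 1, word & 1
    return (a << 2) | (b << 1) | (c ^ (a & b))

def distance_to_identity(table):
    """Sum over truth-table rows of the Hamming distance between F(row) and row."""
    return sum(bin(row ^ out).count("1") for row, out in enumerate(table))

F = [0, 1, 2, 3, 4, 5, 7, 6]                  # example permutation function
before = distance_to_identity(F)              # rows 6 and 7 each differ in one bit
transformed = [toffoli(out) for out in F]     # apply a Toffoli gate to the outputs
after = distance_to_identity(transformed)
```

    Here one gate reduces the metric from 2 to 0, at which point the transformed function equals the identity and the synthesis terminates.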

    Cycle-based Method

    Instead of synthesizing an entire permutation, one can factor it into a set of cycles and

    synthesize the resulting cycles separately. A cycle (a1, a2, . . . , an) is a permutation

    such that f(a1) = a2, f(a2) = a3, . . . , and f(an) = a1. Cycle-based methods [57, 65,

    66] use group theory to factor reversible functions into a set of cycles.
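
    Factoring a permutation into disjoint cycles is straightforward, as the following sketch shows. It covers only the factoring step, not the synthesis of each cycle, and the function name is an illustrative assumption.

```python
def disjoint_cycles(permutation):
    """Factor a permutation (permutation[i] is the image of i) into disjoint cycles.

    Fixed points are omitted from the result.
    """
    seen = set()
    result = []
    for start in range(len(permutation)):
        cycle = []
        x = start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = permutation[x]
        if len(cycle) > 1:
            result.append(tuple(cycle))
    return result

example = [1, 2, 0, 3, 5, 4, 6, 7]   # 0 -> 1 -> 2 -> 0 and 4 <-> 5
cycles = disjoint_cycles(example)
```

    The example permutation factors into the cycles (0, 1, 2) and (4, 5), each of which can then be synthesized separately.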

    Decision Diagram Method

    Since most of the heuristic algorithms are not scalable to a large number of inputs,

    Wille et al. [67] introduced a scalable heuristic algorithm based on Shannon decompo-

    sition and binary decision diagram (BDD) to efficiently synthesize large circuits. The

    average synthesis time is just a few seconds. Later, Soeken et al. [68] and Pang et al.

    [69, 70] used the Kronecker functional decision diagram (KFDD) to further improve

    circuit cost through Davio and Shannon expansions. Since both BDD and KFDD

    synthesis methods are based on the decision diagram data structure, we categorize

    them under the label: decision diagram synthesis (DDS). Although DDS is scalable

    and very efficient, it generates many ancillary input and output bits, which weakens

    its practical impact.
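
    The Shannon and Davio expansions used at decision diagram nodes are identities that can be verified on a truth table. With f0 and f1 denoting the cofactors of f for x = 0 and x = 1, the Shannon expansion is f = x̄·f0 + x·f1 and the positive Davio expansion is f = f0 ⊕ x·(f0 ⊕ f1). The sketch below checks both; it illustrates the expansions only, not the DDS algorithms, and all names are assumptions.

```python
def cofactors(f, n, i):
    """Truth tables of f with input bit i forced to 0 and to 1 (indexed by the full word)."""
    f0 = [f[x & ~(1 << i)] for x in range(1 << n)]
    f1 = [f[x | (1 << i)] for x in range(1 << n)]
    return f0, f1

def shannon(f, n, i):
    """Rebuild f as (not x_i AND f0) OR (x_i AND f1)."""
    f0, f1 = cofactors(f, n, i)
    return [((1 - ((x >> i) & 1)) & f0[x]) | (((x >> i) & 1) & f1[x])
            for x in range(1 << n)]

def positive_davio(f, n, i):
    """Rebuild f as f0 XOR (x_i AND (f0 XOR f1))."""
    f0, f1 = cofactors(f, n, i)
    return [f0[x] ^ (((x >> i) & 1) & (f0[x] ^ f1[x])) for x in range(1 << n)]

majority = [1 if bin(x).count("1") >= 2 else 0 for x in range(8)]   # 3-input majority
```

    Both expansions reproduce the original function for every choice of variable; they differ in the gates they imply, with the Davio form leading naturally to XOR-based (Toffoli-style) realizations.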

    Due to the characteristics of different methods, a well-designed hybrid synthesis

    method that leverages complementary advantages of various heuristics may provide an

    opportunity to perform effective tradeoffs. Reed-Muller Decision Diagram Synthesis

    (RMDDS), presented in this thesis, is such a hybrid method that exploits the advantages

    of search-based and decision-diagram-based methods. It is efficient in terms of

    synthesis time. In addition, it enables trade-offs between #qubits and quantum cost,

    as shown in Chapter 3.

    2.2 Quantum Logic Synthesis

    Quantum logic circuits are built using a cascade of quantum gates, which operate on

    qubits. Synthesis of quantum logic has many aspects, based on the input and output

    it generates. Some quantum logic synthesis problems have been tackled before, as

    discussed next.

    2.2.1 Logic Identities

    In circuit design, often gates in adjacent circuit blocks can be merged, since a cir-

    cuit may admit many different but equivalent decompositions. These decompositions

    constitute circuit identities or templates. They can be used to optimize the quantum

    cost or critical path of the circuit. Some quantum circuit identities and templates

    have been proposed in [3, 71, 72, 73, 74]. The methods in [3, 71] consider identities

    involving Controlled-NOT (CNOT) gates. The method in [72] only considers one-

    qubit gate identities. The methods in [73, 74] target the NCV library consisting of

    CNOT and Controlled-V (CV) gates, where V is the square root of NOT. Since dif-

    ferent quantum systems support different primitive quantum operations, i.e., physical

    machine descriptions (PMDs), the methods discussed above are only partially applicable

    and, in many cases, completely inapplicable to the targeted PMDs. We need

    identity rules that are generally applicable to every PMD.
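
    Two simple identities of the kind used in such templates can be checked numerically: two consecutive V gates equal a NOT gate, and two consecutive CNOT gates cancel. The sketch below verifies both with the standard matrix for V; it is an illustration only and does not reproduce the identity rules proposed in this thesis.

```python
def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matrices_equal(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(len(a)) for j in range(len(a[0])))

NOT = [[0, 1], [1, 0]]
V = [[(1 + 1j) / 2, (1 - 1j) / 2],
     [(1 - 1j) / 2, (1 + 1j) / 2]]           # square root of NOT
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0],
        [0, 0, 0, 1], [0, 0, 1, 0]]
IDENTITY_4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

v_squared = matmul(V, V)            # equals NOT
cnot_squared = matmul(CNOT, CNOT)   # equals the identity, so the pair can be removed
```

    An optimizer that recognizes such patterns can replace the gate pair by the cheaper equivalent, which is how identities reduce quantum cost or shorten the critical path.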

  • 2.2.2 Quantum Shannon Decomposition

    Quantum Shannon decomposition is used to synthesize generic quantum circuits.

    The input of the synthesizer is an arbitrary quantum matrix and its output is a

    cascade of controlled or uncontrolled rotation gates. Typically, it utilizes cosine-

    sine decomposition [75, 76] or optimized brute-force search [77] to synthesize optimal

    circuits. However, due to the size of the matrix representation (an n-qubit circuit

    is described by a \(2^n \times 2^n\) matrix) and high computational complexity, \(O(4^n)\), these

    methods cannot process large quantum circuits. Thus, they have limited use.

    2.2.3 Quantum Compilation

    Since an FT quantum circuit is only composed of FT gates, the conversion of non-FT

    gates to FT gates is an important procedure. The purpose of quantum compilation

    is to find a cascade of gates from a discrete universal gate set to approximate any

    arbitrary unitary gate within an error threshold. Currently, quantum compilation

    relies mainly on two methodologies to convert non-FT circuits into FT circuits. One is

    based on the Solovay-Kitaev algorithm (SKA) [16, 78] and the other is based on the

    skipping table algorithm (STA) [79, 80]. SKA and STA are introduced in detail in

    Section 6.2.1.

    2.3 Physical Design

    Quantum logic circuits cannot be directly implemented on a physical quantum system

    because they contain temporal, but no spatial, information of the circuit. A quantum

    physical circuit is derived by placing qubits on a grid, which denotes physical locations

    of the qubits. Adjacent qubits can interact with each other on the grid. If the two

    qubits that a two-qubit gate is applied to are not adjacent, a communication channel

    must be created.

    In general, a physical circuit is composed of many quantum tiles. Each tile contains

    a qubit and is able to do a set of FT operations. The implementation of a tile is based

    on the type of PMD and the QECC employed.

    2.3.1 Tiled Quantum Architecture

    Many quantum technologies and architectures have been proposed to enable various

    degrees of qubit interactions. The ion trap (IT) technology [24, 25, 26] uses one-

    dimensional (1D) interaction, in which only two adjacent qubits can interact with

    each other. This is, of course, highly restrictive. The quantum dot (QD) [17], su-

    perconducting (SC) [18, 19, 20, 21, 22, 23], neutral atom (NA) [27, 28, 29], linear

    photonics (LP) [30, 31, 32, 33], and non-linear photonics (NP) [30, 34, 35] technolo-

    gies use two-dimensional (2D) interaction, in which a qubit (except for those at the

    boundaries) can interact with four neighboring qubits. The technology presented in

    [81] uses three-dimensional (3D) interaction, in which a qubit can interact with six

    neighboring qubits. However, this makes quantum control very difficult. Currently,

    the 2D interaction technology is the most popular and most current research is based

    on the 2D structure.

    Various tiled quantum architectures have been proposed in order to implement FT

    quantum computation [82, 83, 84]. In such architectures, each tile contains a qubit

    register and implements a set of FT quantum gates. The architecture is typically

    hierarchical, which means that a kth-level tile comprises multiple (k− 1)th-level tiles.

    An example is shown in Fig. 2.1. It shows each level to be composed of nine sub-level

    tiles.

    Figure 2.1: Three-level tile hierarchy. Each kth-level tile is composed of nine (k−1)th-level tiles. [Diagram omitted: one level-2 tile T2 containing nine level-1 tiles T1, each of which contains nine level-0 tiles T0.]

    Two types of quantum tile architectures have been proposed based on the mobility of qubits. The first type is movement based [82, 84, 85], as shown in Fig. 2.2(a). The universal logic block (ULB) tile shown in the figure is analogous to a configurable logic block employed in conventional FPGAs. ULBs are separated by routing channels that are used to move qubits. NA, LP, and NP are three quantum technologies that support the movement of qubits and are thus compatible with this architecture. The second type is swap based, as shown in Fig. 2.2(b). Each tile Q contains a qubit and a set of FT gates. In QD and SC quantum technologies, the qubits are fixed, but the states of qubits move through the use of swap chains. A swap chain exchanges the states of adjacent qubits through a sequence of swap gates.

    Figure 2.2: Two types of tiled architecture: (a) movement based and (b) swap based. [Diagram omitted: 3 × 3 arrays of ULB tiles and of Q tiles, with classical control.]

    For physical implementation of the tiles, several designs have been proposed. The

    designs are based on different QECCs, each with different #ops and #cycles. Important tile implementations exist for three QECCs: the Steane code [86], the Bacon-Shor code [87], and the Knill code [88].

    2.3.2 Physical Synthesis

    Many techniques have been proposed to perform physical synthesis of quantum logic

    circuits to 1D quantum architectures [89, 90, 91, 92]. However, they are not suitable

    for 2D quantum architectures. A few hand-optimized 2D quantum circuits have been

    proposed in [93, 94, 95]. However, developing methodologies for physical circuit syn-

    thesis on 2D architectures is a nascent area of research. Recently, a methodology for

    qubit placement that minimizes communication overhead in a 2D quantum architec-

    ture has been proposed [96], which is the first attempt to address this problem. It

    uses mixed integer linear programming for qubit placement optimization.

    Chapter 3

    RMDDS: Reed-Muller Decision Diagram Synthesis for Reversible Logic

    In this chapter, we propose a flexible and efficient reversible logic synthesizer. It ex-

    ploits the complementary advantages of two methods: Reed-Muller Reversible Logic

    Synthesis (RMRLS) and Decision Diagram Synthesis (DDS), and is thus called Reed-

    Muller Decision Diagram Synthesis (RMDDS) [39]. RMRLS does not scale to a large

    number of qubits. DDS tools, even though efficient, add a large number of ancillary

    qubits and typically incur much higher quantum cost than necessary. RMDDS over-

    comes these obstacles. It is flexible in the sense that users can either optimize the

    number of qubits (#qubits) or the quantum cost (QC) in the circuit implementation.

    It is also efficient because the circuits can be synthesized within user-defined CPU

    times. This combination of flexibility and efficiency has been missing from synthesiz-

    ers presented earlier. When used to synthesize reversible functions, RMDDS reduces

    #qubits by up to 79.2% (average of 54.6%) when the synthesis objective is to min-

    imize #qubits and the QC by up to 71.5% (average of 35.7%) when the synthesis

    objective is to minimize QC, relative to DDS methods. For irreversible functions (which are automatically embedded in reversible functions), the corresponding best (average) reductions are 42.1% (22.5%) in #qubits when minimizing #qubits and 63.0% (25.9%) in QC when minimizing QC.

    3.1 Introduction

    As indicated in Section 2.1, many reversible logic synthesis algorithms have been proposed in prior work. Since exact algorithms are complex and require long synthesis times, heuristic methods that generate reasonably good solutions are usually used to synthesize reversible circuits more efficiently. RMRLS uses the positive-polarity Reed-Muller (PPRM) expansion to synthesize reversible circuits with optimized QC. RMRLS is typically effective in synthesizing circuits with a small number of qubits, but is not scalable to large circuits. On the other hand, DDS efficiently synthesizes large circuits with very low synthesis times, but adds a large number of ancillary qubits.

    Both QC and #qubits are important metrics for evaluating the performance of

    reversible circuits. The QC of a reversible gate is equal to the number of elementary

    quantum operations required to implement its functionality. Hence, QC is related

    to the computation time and defines the computational complexity of the circuit.

    #qubits is an important metric for evaluating the quantum hardware cost of the

    circuit. Since allowing higher #qubits in the implementation may make its realization

    impractical, reducing #qubits is very important.

    The aim of this chapter is to present a heuristic algorithm that is scalable (to a

    large number of inputs) and efficient (i.e., synthesis time is at most a few minutes) like

    the DDS methods, yet results in fewer #qubits and smaller QC than these methods, in

    general. Thus, a hybrid synthesis method that leverages complementary advantages

    of various heuristics may provide an opportunity to perform tradeoffs among these three aspects (synthesis time, #qubits, and QC). However, the integration of these heuristics needs to be performed carefully since a poorly designed integration interface can decrease efficiency and effectiveness.

    The integration of a transformation-based method with DDS is not easy because the

    truth table specification does not scale with the number of inputs. In order to form

    a hybrid of cycle-based and DDS methods, decomposition into cycles and conversion

    between the canonical cycle form, which is the input format used by cycle-based

    methods, and a decision diagram (DD) pose major hurdles.

    Due to the above drawbacks, we choose to integrate RMRLS with DDS. Because

    the conversion between the PPRM expansion and a DD is straightforward, it is much

    easier to form a hybrid of RMRLS and DDS. This combination can leverage the

    complementary advantages of RMRLS (in reducing #qubits and QC) and DDS (in

    terms of scalability and efficiency). We refer to this hybrid method as RMDDS.

    RMDDS is very flexible. It allows users to specify an objective (or impose constraints)

    on the circuit implementation, such as #qubits, QC, or synthesis time. Previous

    reversible synthesizers do not provide such a flexibility.

    The remainder of the chapter is organized as follows. Section 3.2 provides back-

    ground material on reversible circuits. Section 3.3 provides some simple examples to

    motivate our work. Section 3.4 discusses the synthesis procedure in detail. Section 3.5

    presents experimental results. Finally, Section 3.6 concludes.

    3.2 Background

    In this section, we present background material to facilitate understanding of re-

    versible logic synthesis.

    3.2.1 Reversible Functions

    A function is reversible if it performs a bijective (one-to-one and onto) mapping of all

    its inputs and outputs, else it is irreversible. Since reversible functions are bijective:

    (1) the operation of reversible functions is bidirectional, which means they can be

    operated in both the forward and backward directions, (2) reversible operations are

    information lossless, (3) the number of inputs and outputs of reversible functions are

    equal, (4) the outputs of reversible functions are permutations of the inputs, and (5)

    because information needs to be conserved, fan-out in reversible circuits is prohibited.

    The truth table of an irreversible function can be expanded to convert it to a

    reversible function. This can be done by adding garbage outputs and corresponding

    constant inputs to balance the number of inputs and outputs and then bijectively

    filling the augmented truth table. An example is shown in Fig. 3.1. The 2-to-1 OR

    gate/function is irreversible because its truth table is not bijective. There are three

    repeated outputs (1). Hence, its operation is information-lossy since we cannot always

    deduce the input vector from the corresponding output bit (i.e., information is erased

    during computation). This can be remedied by adding a constant input (c1) and two

    garbage outputs (g1 and g2) and filling the truth table bijectively.

    An irreversible function can be transformed into a reversible one with the addition of ⌈log2 p⌉ garbage outputs, where p is the maximum number of times an output pattern is repeated in the truth table [97]. However, there are many ways to perform the transformation since there are $(2^n)!$ permutations for an n-input reversible function.
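    As a worked check of the garbage-output bound on the OR example, the sketch below counts how often each output pattern occurs and derives the number of garbage outputs and constant inputs. The function names are hypothetical, and the constant-input count simply balances inputs against outputs, as described above.

```python
from collections import Counter
from itertools import product


def min_garbage_outputs(truth_table):
    """ceil(log2 p), where p is the largest multiplicity of any output pattern."""
    p = max(Counter(truth_table.values()).values())
    # (p - 1).bit_length() equals ceil(log2 p) for every integer p >= 1.
    return (p - 1).bit_length()


def constant_inputs(truth_table):
    """Constant inputs needed so that #inputs equals #outputs after embedding."""
    n_in = len(next(iter(truth_table)))
    n_out = len(next(iter(truth_table.values())))
    return max(0, n_out + min_garbage_outputs(truth_table) - n_in)


# 2-to-1 OR function: the output pattern (1,) occurs three times.
or_table = {(a, b): (a | b,) for a, b in product((0, 1), repeat=2)}

if __name__ == "__main__":
    print(min_garbage_outputs(or_table), constant_inputs(or_table))
```

    For the OR function, p = 3, so two garbage outputs and one constant input are needed, which matches the 3-to-3 embedding of Fig. 3.1(b).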

    3.2.2 Reversible Gates

    Reversible circuits are constructed by cascading reversible gates. A commonly used reversible gate is the Multiple Controlled Toffoli (MCT) gate: an n-bit MCT gate [54] has n inputs and outputs. It passes the first n − 1 inputs (referred to as control bits) to the output unaltered and inverts the nth input (referred to as the target bit) if the first n − 1 inputs are all 1's. That is:

    $$
    \begin{aligned}
    \text{control bits:} \quad & y_i = x_i, \quad 1 \le i \le n-1, \\
    \text{target bit:} \quad & y_n = x_n \oplus x_1 x_2 \cdots x_{n-1}
    \end{aligned}
    \tag{3.1}
    $$

    (a)  a b | f
         0 0 | 0
         0 1 | 1
         1 0 | 1
         1 1 | 1

    (b)  c1 a b | f g1 g2
         0  0 0 | 0 0  0
         0  0 1 | 1 0  0
         0  1 0 | 1 0  1
         0  1 1 | 1 1  0
         1  0 0 | 0 0  1
         1  0 1 | 0 1  0
         1  1 0 | 0 1  1
         1  1 1 | 1 1  1

    Figure 3.1: (a) A 2-to-1 irreversible OR gate/function, and (b) a 3-to-3 reversible OR gate obtained by adding a constant input (c1) and two garbage outputs (g1, g2)

    Figure 3.2: (a) n-bit MCT, (b) NOT, (c) CNOT, and (d) Toffoli gate. [Circuit diagrams omitted. Outputs shown: NOT: x1 ⊕ 1 = x1′; CNOT: x1, x2 ⊕ x1; Toffoli: x1, x2, x3 ⊕ x1x2.]

    An MCT gate is shown in Fig. 3.2(a). A 1-bit MCT gate inverts the input uncon-

    ditionally and is, hence, also called a NOT gate [Fig. 3.2(b)]. A 2-bit MCT gate is

    also called the Feynman or Controlled-NOT (CNOT) gate. It inverts the target bit

    if the control bit is 1 [Fig. 3.2(c)]. Fig. 3.2(d) depicts a 3-bit MCT gate. Conventionally, a 3-bit MCT gate is simply referred to as the Toffoli gate. It inverts the target

    bit if both the control bits are 1.
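    The MCT definition in Eq. (3.1) can be exercised with a few lines of code. The sketch below is illustrative only: the helper names and the tuple-of-bits encoding are assumptions, not part of any synthesis tool described here.

```python
from itertools import product


def mct(bits, controls, target):
    """Flip bits[target] iff every control bit is 1 (Eq. 3.1)."""
    out = list(bits)
    if all(bits[c] for c in controls):  # all(()) is True, so no controls gives NOT
        out[target] ^= 1
    return tuple(out)


def not_gate(v):
    return mct(v, controls=(), target=0)      # 1-bit MCT


def cnot(v):
    return mct(v, controls=(0,), target=1)    # 2-bit MCT


def toffoli(v):
    return mct(v, controls=(0, 1), target=2)  # 3-bit MCT


def is_reversible(gate, n):
    """A gate on n bits is reversible iff it permutes all 2^n input vectors."""
    return len({gate(v) for v in product((0, 1), repeat=n)}) == 2 ** n
```

    Each MCT gate is its own inverse, so applying the same gate twice restores the input; this is one quick way to sanity-check a gate model like the one above.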

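    Another popular reversible gate is the n-bit Multiple Controlled Fredkin (MCF) gate, defined as follows [55]:

    $$
    \begin{aligned}
    \text{control bits:} \quad & y_i = x_i, \quad 1 \le i \le n-2, \\
    \text{target bit 0:} \quad & y_{n-1} = x_{n-1}\,\overline{x_1 x_2 \cdots x_{n-2}} \oplus x_n\, x_1 x_2 \cdots x_{n-2}, \\
    \text{target bit 1:} \quad & y_n = x_n\,\overline{x_1 x_2 \cdots x_{n-2}} \oplus x_{n-1}\, x_1 x_2 \cdots x_{n-2}
    \end{aligned}
    \tag{3.2}
    $$

    where the overline denotes the complement of the product of the control bits. The MCF gate swaps the two target bits if all the control bits are 1's. Conventionally, a 3-bit MCF gate (n = 3) is simply referred to as the Fredkin gate.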
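    The swap behavior of the MCF gate can be checked against its XOR form. The sketch below reads the first product in each target equation as multiplied by the complement of the control product, which is what the swap description requires; the function names are hypothetical.

```python
from itertools import product


def mcf(bits, controls, t0, t1):
    """Swap bits[t0] and bits[t1] iff every control bit is 1."""
    out = list(bits)
    if all(bits[c] for c in controls):
        out[t0], out[t1] = bits[t1], bits[t0]
    return tuple(out)


def mcf_from_equation(bits):
    """Evaluate the target-bit XOR expressions, with the last two bits as targets."""
    n = len(bits)
    ctrl = int(all(bits[: n - 2]))   # product x1 x2 ... x(n-2)
    not_ctrl = 1 - ctrl              # complement of that product
    xa, xb = bits[n - 2], bits[n - 1]
    ya = (xa & not_ctrl) ^ (xb & ctrl)
    yb = (xb & not_ctrl) ^ (xa & ctrl)
    return tuple(bits[: n - 2]) + (ya, yb)


def fredkin(bits):
    """3-bit MCF gate: one control (bit 0) and two targets (bits 1 and 2)."""
    return mcf(bits, controls=(0,), t0=1, t1=2)
```

    Without the complemented control product, both targets would collapse to 0 whenever a control bit is 0, so the gate would not be reversible.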

    A third type of reversible gate is the three-bit Peres gate [56]. It accomplishes

    the operation of the cascade of a CNOT gate and a Toffoli gate, with the operation

    defined as follows:

    $$
    \begin{aligned}
    y_1 &= x_1 \oplus x_2 \\
    y_2 &= x_2 \\
    y_3 &= x_3 \oplus x_1 x_2
    \end{aligned}
    \tag{3.3}
    $$
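    The cascade reading of Eq. (3.3) can be verified exhaustively. In the sketch below, the gate order (Toffoli first, then a CNOT controlled by x2 and targeting x1) is inferred from the equations rather than stated in the text, so it should be taken as an assumption.

```python
from itertools import product


def toffoli(x1, x2, x3):
    """Controls x1 and x2, target x3."""
    return x1, x2, x3 ^ (x1 & x2)


def cnot_x2_to_x1(x1, x2, x3):
    """Control x2, target x1."""
    return x1 ^ x2, x2, x3


def peres(x1, x2, x3):
    """Direct evaluation of Eq. (3.3)."""
    return x1 ^ x2, x2, x3 ^ (x1 & x2)


def cascade(x1, x2, x3):
    """Toffoli followed by CNOT (assumed order)."""
    return cnot_x2_to_x1(*toffoli(x1, x2, x3))
```

    Reversing the order would feed the modified x1 into the Toffoli gate and produce x3 ⊕ x1x2 ⊕ x2 on the third line instead.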

    The QC of a reversible gate is equal to the number of elementary quantum op-

    erations required to implement its functionality. The method for deriving the QC of

    arbitrary quantum gates was first introduced by Barenco et al. [98]. An n-bit quan-

    tum gate is hierarchically decomposed into basic elementary reversible gates. NOT,

    CNOT, and Toffoli gate are defined to have a QC of 1, 1, and 5, respectively. The

    QC of an n-bit MCT gate is $2^n - 3$. However, if the gate contains some ancillary

    qubits, its QC can be reduced because the ancillary qubits can be used to hold some

    temporary values, thus reducing the computational complexity [73, 74].

    The QC of a reversible circuit is simply the sum of the QC of its constituent gates. A high QC leads to a long computation time, which raises the probability that the quantum system will be subjected to noise and collapse. Thus, minimizing the circuit QC is also a very important design objective.
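    The cost model above is easy to tabulate. The following sketch covers only ancilla-free MCT gates with the costs quoted in this section (1 for NOT and CNOT, 2^n − 3 for larger gates); the function names are hypothetical.

```python
def mct_qc(n_bits: int) -> int:
    """Quantum cost of an n-bit MCT gate without ancillary qubits."""
    if n_bits < 1:
        raise ValueError("an MCT gate acts on at least one bit")
    if n_bits <= 2:
        return 1              # NOT and CNOT
    return 2 ** n_bits - 3    # Toffoli (n = 3) gives 5


def circuit_qc(gate_sizes):
    """QC of a circuit given the bit-width of each MCT gate in the cascade."""
    return sum(mct_qc(n) for n in gate_sizes)


if __name__ == "__main__":
    print(mct_qc(5))                 # single five-bit MCT gate
    print(circuit_qc([3, 3, 3, 2]))  # three Toffoli gates and one CNOT gate
```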

    3.2.3 Reed-Muller Expansion

    Any Boolean function can be described by an exclusive-OR sum-of-products (ESOP)

    expression [99]. The ESOP expression can be easily converted to the PPRM expan-

    sion in which all the variables are uncomplemented (by replacing each complemented

    variable x′ by x⊕ 1). The PPRM expansion is canonical and has the form:

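    $$
    \begin{aligned}
    f(x_1, x_2, \ldots, x_n) = {} & a_0 \oplus a_1 x_1 \oplus \cdots \oplus a_n x_n \oplus a_{12} x_1 x_2 \oplus {} \\
    & a_{13} x_1 x_3 \oplus \cdots \oplus a_{n-1,n} x_{n-1} x_n \oplus \cdots \oplus {} \\
    & a_{12 \ldots n} x_1 x_2 \cdots x_n
    \end{aligned}
    \tag{3.4}
    $$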

    Each cube in the expansion is referred to as a Reed-Muller term. The fixed-polarity Reed-Muller (FPRM) expansion generalizes the PPRM expansion: it uses either the complemented or the uncomplemented form (but not both) of each variable. An n-input function has $2^n$ FPRM expansions. The PPRM expansion is the special case in which every variable is uncomplemented. The choice of polarity for each variable influences the number of terms in the resulting FPRM expansion. The following example shows four different expansions for an example Boolean function: y = a + b′c. Generally, the PPRM expansion has the largest number of terms since only positive polarity is used for each variable.

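    $$
    \begin{aligned}
    \text{SOP:} \quad & y = a + b'c \\
    \text{ESOP:} \quad & y = a \oplus a'b'c \\
    \text{PPRM}(a, b, c): \quad & y = abc \oplus ac \oplus bc \oplus a \oplus c \\
    \text{FPRM}(a', b', c): \quad & y = a'b'c \oplus a' \oplus 1
    \end{aligned}
    \tag{3.5}
    $$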
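    The PPRM and FPRM rows of Eq. (3.5) can be reproduced mechanically. The sketch below uses the binary Möbius (Reed-Muller) transform of the truth vector, a standard technique that is not described in the text; the function name and the string encoding of terms are assumptions made for illustration.

```python
def reed_muller_terms(f, names, polarity=None):
    """Terms of the fixed-polarity Reed-Muller expansion of f.

    polarity[i] = 1 means variable i appears only in complemented form;
    all zeros (the default) gives the PPRM expansion.
    """
    n = len(names)
    polarity = polarity or [0] * n
    coeffs = []
    for idx in range(2 ** n):
        literals = [(idx >> i) & 1 for i in range(n)]
        inputs = [lit ^ pol for lit, pol in zip(literals, polarity)]
        coeffs.append(f(*inputs) & 1)
    # In-place binary Moebius transform: truth vector -> XOR coefficients.
    for i in range(n):
        for idx in range(2 ** n):
            if idx & (1 << i):
                coeffs[idx] ^= coeffs[idx ^ (1 << i)]
    labels = [nm + ("'" if pol else "") for nm, pol in zip(names, polarity)]
    terms = set()
    for idx, c in enumerate(coeffs):
        if c:
            term = "".join(labels[i] for i in range(n) if idx & (1 << i))
            terms.add(term or "1")
    return terms


def y(a, b, c):
    """Example function y = a + b'c."""
    return a | ((1 - b) & c)
```

    For this function the positive-polarity expansion has five terms, whereas the polarity (a′, b′, c) yields three, which illustrates how the polarity choice changes the term count.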

    3.2.4 Decision Diagrams

    DDs are data structures that can efficiently represent Boolean functions. A DD is a

    directed acyclic graph defined as follows [100, 101]:

    Definition 1: A DD over variables X = {x1, x2, ..., xn} is a rooted directed acyclic

    graph G = (V, E) with vertex set V containing two types of vertices: non-terminal and

    terminal. A terminal vertex is labeled 0 or 1 and has no successors. A non-terminal

    vertex v is labeled with a variable x ∈ X, which is called the decision variable, and

    has exactly two successors denoted by low(v) and high(v) ∈ V .

    Definition 2: A DD is free if each variable is encountered at most once on each path

    in the DD from the root to a terminal vertex.

    Definition 3: A DD is ordered if it is free and the variables are encountered in the

    same order on every path from the root to a terminal vertex.
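    Definitions 1 to 3 translate directly into a small data structure. The sketch below is a hypothetical illustration: the class and function names are assumptions, path enumeration is exponential and meant only for tiny diagrams, and `is_ordered` checks a caller-supplied variable order rather than searching for one.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Terminal:
    value: int  # 0 or 1, no successors


@dataclass(frozen=True)
class Node:
    var: str                        # decision variable
    low: Union["Node", Terminal]    # successor low(v)
    high: Union["Node", Terminal]   # successor high(v)


def paths(v, prefix=()):
    """Yield the sequence of decision variables on each root-to-terminal path."""
    if isinstance(v, Terminal):
        yield prefix
    else:
        yield from paths(v.low, prefix + (v.var,))
        yield from paths(v.high, prefix + (v.var,))


def is_free(root):
    """Definition 2: each variable occurs at most once on every path."""
    return all(len(set(p)) == len(p) for p in paths(root))


def is_ordered(root, order):
    """Definition 3, checked against a given variable order."""
    rank = {x: k for k, x in enumerate(order)}
    return is_free(root) and all(
        list(p) == sorted(p, key=rank.__getitem__) for p in paths(root))


def evaluate(v, assignment):
    """Follow low/high successors according to the variable assignment."""
    while isinstance(v, Node):
        v = v.high if assignment[v.var] else v.low
    return v.value


ZERO, ONE = Terminal(0), Terminal(1)
# Diagram for y = a + b'c with variable order a, b, c.
root = Node("a", Node("b", Node("c", ZERO, ONE), ZERO), ONE)
```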

    A BDD decomposes Boolean functions into smaller sub-functions by only using

    Shannon decomposition on every variable, whereas a KFDD decomposes Boolean

    functions into sub-functions by using Davio or Shannon decompositions.

    Figure 3.3: Two implementations of a five-bit MCT gate: (a) with [#qubits, QC] = [5, 29] and (b) with [#qubits, QC] = [8, 16] (QC = 5 + 5 + 5 + 1). [Circuit diagrams omitted.]

    Shannon and Davio decompositions are defined as follows:

    $$
    \begin{aligned}
    f &= x_i' f_i^0 \oplus x_i f_i^1 && \text{Shannon (S)} \\
    f &= f_i^0 \oplus x_i f_i^2 && \text{positive Davio (pD)} \\
    f &= f_i^1 \oplus x_i' f_i^2 && \text{negative Davio (nD)}
    \end{aligned}
    \tag{3.6}
    $$

    where $f_i^0$ is the 0-cofactor of f with respect to $x_i$ [i.e., $f_i^0 = f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n)$], $f_i^1$ similarly is the 1-cofactor, and $f_i^2 = f_i^0 \oplus f_i^1$.
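    The three decompositions can be confirmed numerically for any small function. The following sketch evaluates the Shannon, positive Davio, and negative Davio forms on every input vector; the helper names are hypothetical.

```python
from itertools import product


def cofactor(f, i, value):
    """Return f with variable i fixed to value (other arguments unchanged)."""
    def g(*x):
        y = list(x)
        y[i] = value
        return f(*y)
    return g


def check_decompositions(f, n, i):
    """True iff S, pD, and nD with respect to variable i all reproduce f."""
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    for x in product((0, 1), repeat=n):
        xi = x[i]
        f2 = f0(*x) ^ f1(*x)                         # f2 = f0 XOR f1
        shannon = ((1 - xi) & f0(*x)) ^ (xi & f1(*x))
        pos_davio = f0(*x) ^ (xi & f2)
        neg_davio = f1(*x) ^ ((1 - xi) & f2)
        if not (f(*x) == shannon == pos_davio == neg_davio):
            return False
    return True


def example(a, b, c):
    """y = a + b'c, the running example of Section 3.2.3."""
    return a | ((1 - b) & c)
```

    The check works because, for x_i = 0, both Davio forms reduce to f_i^0, and for x_i = 1 they reduce to f_i^1.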

    3.3 Motivational Examples

    For reversible circuit synthesis, we must trade off among synthesis time, #qubits, and QC. In most cases, a circuit that simultaneously minimizes #qubits, QC, and synthesis time does not exist, as illustrated by the two examples that follow.

    3.3.1 Example 1: #qubits vs. QC

    Fig. 3.3(a) shows an example of a five-bit MCT gate. It contains five qubits and no ancillary qubits. Its QC is $2^5 - 3 = 29$. However, we can also implement the five-bit MCT gate as shown in Fig. 3.3(b). This requires eight qubits, including three ancillary qubits. Lines 5, 6 (garbage outputs e, cd) and 7 (garbage output bcd) realize intermediate values required for line 8 (the valid final output). The QC now reduces to only 16 (5 + 5 + 5 + 1). This shows how #qubits can be traded off for QC.
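    One gate sequence that is consistent with the line labels and the cost 5 + 5 + 5 + 1 of Fig. 3.3(b) is three Toffoli gates followed by a CNOT gate. The sketch below simulates that sequence; the exact gate placement is reconstructed from the description above and should be read as an assumption, with lines indexed from 0 in the code.

```python
from itertools import product


def mct(bits, controls, target):
    """Flip bits[target] iff every control bit is 1."""
    out = list(bits)
    if all(bits[c] for c in controls):
        out[target] ^= 1
    return out


def five_bit_mct_with_ancillas(a, b, c, d, e):
    # Indices 0-3 hold a, b, c, d; index 4 holds e; indices 5-7 are ancillas (0).
    s = [a, b, c, d, e, 0, 0, 0]
    s = mct(s, (2, 3), 5)   # index 5 <- cd            (Toffoli, QC 5)
    s = mct(s, (1, 5), 6)   # index 6 <- bcd           (Toffoli, QC 5)
    s = mct(s, (0, 6), 7)   # index 7 <- abcd          (Toffoli, QC 5)
    s = mct(s, (4,), 7)     # index 7 <- abcd XOR e    (CNOT,    QC 1)
    return s
```

    The last line carries abcd ⊕ e, the same value that the single five-bit MCT gate of Fig. 3.3(a) writes onto e, while e, cd, and bcd remain as garbage outputs.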

    (a)  c b a | co bo ao
         0 0 0 | 0  0  1
         0 0 1 | 0  0  0
         0 1 0 | 1  1  1
         0 1 1 | 0  1  0
         1 0 0 | 0  1  1
         1 0 1 | 1  0  0
         1 1 0 | 1  0  1
         1 1 1 | 1  1  0

    (b)  ao = a ⊕ 1
         bo = ac ⊕ b ⊕ c
         co = ab ⊕ ac ⊕ b

    Figure 3.4: Truth table for Example 2

    Figure 3.5: (a) Quick synthesis by direct placement of PPRM terms on ancillary lines, with [#qubits, QC] = [6, 19] and (b) RMRLS requiring more synthesis time, with [#qubits, QC] = [3, 11]. [Circuit diagrams omitted. In (a), the ancillary lines carry the constants 1, 0, 0; in (b), the intermediate line values are (a ⊕ 1)c ⊕ b = ac ⊕ b ⊕ c and (a ⊕ 1)(ac ⊕ b ⊕ c) ⊕ c = ac ⊕ ab ⊕ b.]

    3.3.2 Example 2: Circuit Size vs. Synthesis Time

    The circuit size is determined by #qubits and QC. Fig. 3.4(a) shows the truth table

    of a three-variable reversible function and Fig. 3.4(b) its PPRM expansion. We can

    trivially synthesize the circuit (in almost no synthesis time) by placing the PPRM

    terms on three ancillary lines, one for each output, yielding #qubits = 6 and QC =

    19, as shown in Fig. 3.5(a). However, RMRLS yields the much better circuit shown

    in Fig. 3.5(b), with #qubits = 3 and QC = 11. This requires more synthesis time.
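    Both circuits of Example 2 can be simulated against the PPRM specification. In the sketch below, the gate sequences are inferred from the PPRM expansions, the intermediate-value labels of Fig. 3.5, and the quoted QC values, so they are assumptions rather than a transcription of the figure.

```python
from itertools import product


def spec(a, b, c):
    """PPRM expansions of Fig. 3.4(b): returns (ao, bo, co)."""
    return a ^ 1, (a & c) ^ b ^ c, (a & b) ^ (a & c) ^ b


def rmrls_circuit(a, b, c):
    """Three gates on three lines, QC = 1 + 5 + 5 = 11."""
    a ^= 1        # NOT on a:                a <- a XOR 1
    b ^= a & c    # Toffoli, target b:       b <- (a XOR 1)c XOR b
    c ^= a & b    # Toffoli, target c:       c <- (a XOR 1)(new b) XOR c
    return a, b, c


def direct_circuit(a, b, c):
    """One ancillary line per output, QC = 1 + 7 + 11 = 19."""
    ao, bo, co = 1, 0, 0   # ancillary constants
    ao ^= a                # CNOT
    bo ^= a & c            # Toffoli
    bo ^= b                # CNOT
    bo ^= c                # CNOT
    co ^= a & b            # Toffoli
    co ^= a & c            # Toffoli
    co ^= b                # CNOT
    return ao, bo, co
```

    The direct placement needs no search, which is why it is fast, but it keeps the three input lines alive as garbage; the RMRLS-style circuit overwrites the inputs in place and therefore needs only three lines.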

    3.4 RMDDS Algorithm

    In order to synthesize reversible circuits under a user-defined optimization objective or constraints on #qubits, QC or synthesis time, we propose a hybrid method, called RMDDS, that is inspired by RMRLS and DDS and integrates their best features. It can effectively and efficiently explore the design space for circuit synthesis. In the following, we first briefly introduce the algorithms and concepts involved in RMRLS and DDS and then describe RMDDS.

    Figure 3.6: Synthesis flow of RMRLS. [Flowchart omitted. Boxes: Initialization; PQ empty?; L1: Pop search node from PQ; L2: Select factor, substitute vi ← vi ⊕ factor; Higher score?; Insert new search nodes; More factors available?; Synthesis complete or time out?; End.]

    3.4.1 RMRLS

    RMRLS [60] exploits the similarity in the functional expression of MCT gates and the

    PPRM form. This similarity is evident from Eqs. (3.1) and (3.4). The synthesis flow

    of RMRLS is shown in Fig. 3.6. The input to the algorithm is the PPRM expansion

    of each output of the reversible function. In the initialization stage, a search node

    that contains the PPRM expansions of all the outputs is pushed into a priority queue

    (PQ). The node is initialized as the root node of an N-ary search tree. PQ maintains

    a list of nodes sorted by their priorities and the search tree records the search space

    of the function.

    The algorithm then enters two hierarchical loops to synthesize MCT gates. The

    first loop L1 iterates over search nodes and the second loop L2 iterates over valid

    MCT gates within a search node. In each iteration of L1, the highest-priority node

    is popped from PQ (if PQ is not empty). In each iteration of L2, factors in the

    PPRM expansion are investigated. The rule for finding a suitable factor is as follows.

    For each output function vout, we search for the PPRM terms that have the format

    vout = v⊕ factor where the factor is a PPRM term that does not contain variable v.

    For example, for the PPRM expansion aout = a⊕ ab⊕ ac⊕ bc⊕ 1, bc and 1 are valid

    factors but ab and ac are not, because the output function is aout and both ab and

    ac contain variable a. Wh