


Application-Motivated, Holistic Benchmarking of a Full Quantum Computing Stack

Daniel Mills∗ 1,2, Seyon Sivarajah 2, Travis L. Scholten 3, and Ross Duncan 2,4

1 University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK
2 Cambridge Quantum Computing Ltd, 9a Bridge Street, Cambridge, CB2 1UB, UK
3 IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
4 University of Strathclyde, 26 Richmond Street, Glasgow, G1 1XH, UK

June 3, 2020

Abstract

Quantum computing systems need to be benchmarked in terms of practical tasks they would be expected to do. Here, we propose 3 “application-motivated” circuit classes for benchmarking: deep (relevant for state preparation in the variational quantum eigensolver algorithm), shallow (inspired by IQP-type circuits that might be useful for near-term quantum machine learning), and square (inspired by the quantum volume benchmark). We quantify the performance of a quantum computing system in running circuits from these classes using several figures of merit, all of which require exponential classical computing resources and a polynomial number of classical samples (bitstrings) from the system. We study how performance varies with the compilation strategy used and the device on which the circuit is run. Using systems made available by IBM Quantum, we examine their performance, showing that noise-aware compilation strategies may be beneficial, and that device connectivity and noise levels play a crucial role in the performance of the system according to our benchmarks.

∗ Corresponding author: [email protected]

arXiv:2006.01273v1 [quant-ph] 1 Jun 2020

Contents

1 Introduction
2 Circuit Classes
    2.1 Shallow Circuits: IQP
    2.2 Square Circuits: Random Circuit Sampling
    2.3 Deep Circuits: Pauli Gadgets
3 Figures of Merit
    3.1 Heavy Output Generation Benchmarking
    3.2 Cross-Entropy Difference
    3.3 $\ell_1$-Norm Distance
    3.4 Metric Comparison
4 Quantum Computing Stack
    4.1 Software Development Kits
    4.2 Compilers
    4.3 Devices
5 Experimental Results
    5.1 Full Stack Benchmarking
    5.2 Application Motivated Benchmarks
    5.3 Insights from Classical Simulation
6 Conclusion
A Exponential Distribution
    A.1 Square Circuits
    A.2 Deep Circuits
    A.3 Shallow Circuits
B Compilation Strategies
C Device Data
    C.1 Device Coupling Maps
    C.2 Device Calibration Information
D Empirical Relationship Between Heavy Output Generation Probability and L1 Distance

1 Introduction

As quantum computers evolve from bespoke laboratory experiments comprising a handful of qubits to more general-purpose, programmable, commercial-grade systems [1–5], new techniques for characterizing them are needed. Quantum characterization, validation, and verification (QCVV) protocols, which detect, diagnose, and quantify errors in quantum computers, originally focused on properties of one or several qubits (e.g., T1 and T2 times, gate error rates, and state preparation fidelity). As multi-qubit quantum computing systems develop, the scope of QCVV must expand. In particular, a need has arisen for “holistic” benchmarks: ones which stress test a quantum computing system in its entirety, not just individual components. Holistic benchmarks are desirable for two reasons: they enable comparison across different systems¹, and they allow for tracking the performance of a fixed system over time.

“Holistic benchmarking” of a quantum computing system could refer to benchmarking the physical implementation of a collection of qubits, without referring to the computational task these qubits would perform. This idea is most useful when testing physical properties of a collection of qubits². The complementary view (taken in this work) is that holistic benchmarks test the quantum computational capabilities of the complete system. Under this view, the entire compute stack (qubits, compilation strategy, classical control hardware, etc.) should be benchmarked collectively. Such “full-stack” benchmarking provides information that benchmarking individual stack components cannot, as it captures the performance of the system as an integrated unit.

¹ This should be compared to the benchmarking of classical computers, with the LINPACK benchmarks [6, 7] being used to build the TOP500 ranking of supercomputers [8].

² A simple example is crosstalk detection, where the output of the benchmarking could be a table of coupling values between all connected qubits.

At the same time, running a full-stack benchmark on a fixed computational system, while useful for tracking the performance of that system over time, provides little information on how different combinations of the stack’s components could change system performance. For this reason, full-stack benchmarking should, as much as possible, make explicit the variable components of the stack, and systematically vary those components to see how the inclusion of a particular component affects system-level performance.

Here, we will focus on benchmarking systems made available by IBM Quantum, and investigate two components of the stack: the compilation strategy used to map an abstract circuit onto one that is executable on a quantum computer, and the device used to run the compiled circuit and return the results. While the particular systems used here have other components (such as pulse synthesizers), we do not look at the impact of those pieces on full-stack performance.

The design of new compilers for quantum circuits is an active area of research, especially “noise-aware” compilation strategies which use knowledge of the physical properties of the system’s qubits to improve results [1, 9–11]. The proliferation of compilers necessitates understanding how the inclusion of particular compilation strategies in the stack affects performance. Problem instances requiring compilation, which are often more representative of real-world problems, typically show differing performance from those that do not [12]. In particular, noise-aware compilation strategies make assumptions about the influence of noise processes on overall system performance, so full-stack benchmarking is necessary to verify those assumptions.

The benchmarks defined here have two parts: a circuit class and a figure of merit. The circuit class describes the type of circuit to be run by the system, and the figure of merit quantifies how well the system did when running circuits from that class. This approach is inspired by volumetric benchmarking [13].

Because quantum computing systems are used for particular applications, the circuit classes should, in some way, test the performance of a system in those arenas [14]. At least two notions have been put forth as to how to define such classes. One proposes benchmarks based on often-used quantum algorithmic primitives [13], the examples given being primitives of Grover iterations and Trotterized Hamiltonian simulation.

An alternative is to pick a particular instance of an application and check the accuracy of the results returned by the system when running that instance. Naturally, to measure non-negligible accuracy on noisy near-term systems, the applications and instances must also be near-term by design. Such benchmarks have been defined in the context of quantum simulation [15–19], quantum machine learning [14, 20–23], discrete optimisation [12, 24–26], and quantum computational supremacy [3, 27–29]. This approach has the advantage that the definition of success is fairly straightforward. The downside is that performance as measured by one instance of an application may not be predictive of performance for the application generically.

The “application-motivated” circuit classes defined here draw inspiration both from [13] (looking at computational primitives) and from the literature above, by focusing on computational primitives of near-term quantum computing applications (chemistry and machine learning, in particular). A system which does well on an application-motivated benchmark should do well in running the application the benchmark was derived from. Three such circuit classes are introduced here. Drawing inspiration from the volumetric benchmarking approach, the classes cover varying depth regimes and are (somewhat) controllable in depth. In brief, the classes, as labelled by their depth regimes, are:

Deep: Inspired by product formula circuits, including state preparation circuits used in the variational quantum eigensolver (VQE) algorithm for quantum chemistry [30–32].

Shallow: Inspired by hardware-efficient ansatze [33, 34] which may be useful for near-term quantum machine learning and chemistry applications [35–37].

Square: Inspired by the circuits used to calculate a system’s quantum volume [38].

Section 2 provides details of these circuit classes, and presents algorithms for generating them.

How well a stack executes a circuit is assessed here via continuous figures of merit, rather than binary ones which may only verify correctness. This is because the outcomes from noisy devices will likely not be correct, while information about closeness to the correct answer is still highly valuable. Further, techniques for the verification of universal quantum computation require many qubits, qubit communication, or both, none of which are accessible using present-day noisy devices [39, 40]. Indeed, to reflect the current state of the art, where there exist few devices with limited networking between them [4, 41], we will focus on examples of how classical computers can be used to perform benchmarks, as opposed to using small quantum computers to benchmark each other [42, 43].

We use three figures of merit, calculated using classical computers. These are: heavy output generation probability [44], cross-entropy difference [29], and $\ell_1$-norm distance. Estimating each of these figures of merit requires knowledge about the ideal (noise-free) outcome probabilities of bitstrings the system could produce.

In practice, calculating the ideal outcome probabilities requires direct simulation of the circuit under consideration. Consequently, scaling to tens or hundreds of qubits will be challenging in general, particularly if the $\ell_1$-norm distance is used as the figure of merit. However, by considering circuits with few qubits we can simulate the circuits classically and gain insight into the behaviour of larger devices [3, 45].

We refer to a set of benchmarks as a benchmarking suite, each benchmark being defined by a unique combination of circuit class and figure of merit. Using a benchmarking suite enables the derivation of broad insights about the behaviour and performance of a quantum computing system across a wide variety of possible applications. The benchmarks’ varying demands on quantum computing resources (qubits, depth) allow for the exploration of the best routes to extract the most utility from near-term quantum computers. In sum, our benchmarking approach is both application-motivated and holistic.

The remainder of this paper is organised as follows: Section 2 details the circuit classes, including algorithms for generating the circuits; Section 3 explains the figures of merit we use; Section 4 introduces the software stack, as well as the hardware made available by IBM Quantum, that comprise the systems we benchmark; and Section 5 shows the results of our benchmarking. We conclude in Section 6.

2 Circuit Classes

This section presents the formal definitions of the circuits used in this work, while also identifying the motivations for their use in benchmarking. These motivations include both the class of applications they represent and the properties of the quantum computing stacks that they will probe. Collectively, this selection of circuit classes encompasses an array of potential applications of quantum computing, covering circuits of varied depth, connectivity, and gate types.

2.1 Shallow Circuits: IQP

Instantaneous Quantum Polytime (IQP) circuits [46] can be implemented using commuting gates. IQP circuits are simpler to implement than universal quantum circuits and, in addition, there are strong theoretical reasons to believe that, even in the presence of noise, they cannot be simulated efficiently using classical computers [47–49]. This has allowed for the application of noisy quantum technology in areas such as machine learning [35, 36] and interactive two-player games [43, 46]. The connection between IQP and a demonstration of quantum computational supremacy on near-term hardware makes their implementation a pertinent benchmark of the performance of these devices.

The shallow class of circuits, whose depth increases slowly with width, is a subclass of IQP circuits. These circuits probe the performance of a quantum computing stack in fine-grained detail by measuring the impact of including more qubits (quasi-)independently of increasing circuit depth. This is useful for understanding the performance of a device being utilised for applications whose qubit requirement grows more quickly than the circuit depth.

Definitions and Related Results. An n-qubit IQP circuit consists of gates that are diagonal in the Pauli-X basis, acting on the $|0\rangle^{\otimes n}$ state, with measurement taking place in the computational basis. For this class of circuits, Theorem 1 applies.

Theorem 1 (Informal [48]). Assuming either one of two conjectures, relating to the hardness of approximating the Ising partition function and the gap of degree 3 polynomials, and the stability of the Polynomial Hierarchy³, it is impossible to classically sample from the output probability distribution of any IQP circuit in polynomial time, up to an $\ell_1$-norm distance of 1/192.

This class is called “instantaneous” because these gates commute with one another, which in turn reduces the amount of time that the quantum state will need to be stored. In addition, the impossibility of simulating IQP circuits is shown to hold when restricted by physically motivated constraints such as limited connectivity and constant error rates on each qubit [49].

An equivalent, commonly considered definition is that IQP circuits consist of gates diagonal in the Pauli-Z basis, sandwiched between two layers of Hadamard gates acting on all qubits. Algorithm 1 is used to generate IQP circuits of this form. Note that Algorithm 1 limits the connectivity allowed between the qubits, so it does not generate all circuits in the IQP class.
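To make the Hadamard-sandwich form concrete, the following worked equation writes the circuit of Algorithm 1 in that form and expands its output amplitudes. The symbols $D_Z$, $d(y)$ and $E(G_n)$ (the edge set of $G_n$) are shorthand introduced for this sketch and are not notation from the original text.

```latex
\begin{align}
C &= H^{\otimes n} \, D_Z \, H^{\otimes n}, \qquad
D_Z = \prod_{i=1}^{n} RZ_i(\alpha_i) \prod_{\{i,j\} \in E(G_n)} CZ_{ij}, \\
\langle x | C | 0^n \rangle
  &= \frac{1}{2^{n}} \sum_{y \in \{0,1\}^n} (-1)^{x \cdot y} \, d(y),
  \qquad \text{where } D_Z |y\rangle = d(y)\, |y\rangle .
\end{align}
```

The second line follows because the first Hadamard layer maps $|0^n\rangle$ to the uniform superposition, $D_Z$ multiplies each basis state $|y\rangle$ by its diagonal entry $d(y)$, and $\langle x | H^{\otimes n} | y \rangle = 2^{-n/2} (-1)^{x \cdot y}$.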

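The depth of this circuit may be arrived at by observing that finding an optimal order of application of the CZ gates is equivalent to finding an edge colouring of the graph $G_n$. In this case a 4-colouring can be found in polynomial time [51], so the CZ gates fit into at most four layers; together with the two Hadamard layers and the RZ layer, this gives the worst case depth of 7 quoted in Algorithm 1. Algorithm 1 includes discrete randomness over the graphs, $G_n$, and continuous randomness over the rotation angles, $\alpha_i$.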

The design of circuits in Algorithm 1 may be compared to other sparse IQP circuits [49], IQP circuits on 2D lattices [49, 52], and random 3-regular graphs used for benchmarking [12]. For our purposes these require too high a connectivity, are too architecture-specific, and are too application-specific, respectively. There are sparse IQP circuits for which verification schemes exist [52, 53], although their connectivity is too architecture-specific for our purposes, and the verification schemes require limits on the measurement noise which we cannot guarantee.

³ The non-collapse of the Polynomial Hierarchy is widely conjectured to be true [50].

Algorithm 1 The pattern for building shallow circuits.

Input: Number of qubits, n ∈ Z
Worst case depth: 7
Output: Circuit, C_n

Initialise n qubits, labelled q_1, ..., q_n, in the state |0〉.

for all i ∈ {1, ..., n} do
    Act H on q_i
end for

Generate a random binomial graph, G_n, with n vertices and edge probability 0.5, post-selecting on those that are connected and have degree less than 4.

for all edges {i, j} in G_n do
    Act CZ between q_i and q_j
end for

for all i ∈ {1, ..., n} do
    Generate α_i ∈ [0, 2π] uniformly at random.
    Act RZ(α_i) on q_i.
end for

for all i ∈ {1, ..., n} do
    Act H on q_i
end for

Measure q_1, ..., q_n in the computational basis.
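The steps of Algorithm 1 can be sketched in plain Python as below. This is an illustrative sketch only, not the authors' implementation: the function names, the representation of a circuit as a list of gate tuples, and the use of simple rejection sampling for the post-selection step are all assumptions made here.

```python
import itertools
import math
import random


def is_connected(n, edges):
    """Return True if the graph on vertices 0..n-1 with these edges is connected."""
    adjacent = {v: set() for v in range(n)}
    for i, j in edges:
        adjacent[i].add(j)
        adjacent[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adjacent[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


def sample_graph(n, rng, edge_prob=0.5, max_degree=3):
    """Rejection-sample a binomial graph that is connected with all degrees < 4."""
    while True:
        edges = [e for e in itertools.combinations(range(n), 2)
                 if rng.random() < edge_prob]
        degree = [0] * n
        for i, j in edges:
            degree[i] += 1
            degree[j] += 1
        if max(degree) <= max_degree and is_connected(n, edges):
            return edges


def shallow_circuit(n, rng):
    """Gate list for one shallow circuit: H layer, CZ on edges, RZ layer, H layer."""
    gates = [("H", q) for q in range(n)]
    gates += [("CZ", i, j) for i, j in sample_graph(n, rng)]
    gates += [("RZ", rng.uniform(0.0, 2.0 * math.pi), q) for q in range(n)]
    gates += [("H", q) for q in range(n)]
    return gates


if __name__ == "__main__":
    for gate in shallow_circuit(4, random.Random(0)):
        print(gate)
```

With edge probability 0.5 the acceptance rate of this naive rejection loop falls quickly as n grows, so a practical generator would likely need a smarter sampling strategy; the sketch is only intended for the small widths discussed here.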

Discussion. The close connection, through Theorem 1, of quantum computational supremacy and shallow circuits⁴, explicitly measured in $\ell_1$-norm distance, provides a measure of a quantum computing stack’s quality; namely, by analysing the closeness of the distributions it produces to the ideal ones, as measured by the $\ell_1$-norm distance, and comparing this value to 1/192.

However, as the output probabilities of shallow circuits are not exponentially distributed, we cannot use Cross-Entropy Benchmarking. Similarly, the theoretical value of heavy output probability for circuits with exponentially distributed output probabilities, as discussed in Section 3.1, cannot be used here.

Instead, we use the empirical value of the ideal heavy output probability, in the place of a theoretically derived one, as a point of comparison with the behaviour of the quantum computing stack being benchmarked. This approach requires calculation of all output probabilities and summation of the probabilities of those that are heavy. This can be done for the small circuits investigated here, but allows for the benchmarking of fewer qubits than would be accessible if a theoretical value were known.
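As an illustration of this calculation, the sketch below simulates a small shallow circuit directly and sums the probabilities of its heavy outputs. It assumes NumPy is available; the function names, the little-endian bit ordering (bit q of the basis-state index is qubit q), and the convention RZ(α) = diag(e^{-iα/2}, e^{iα/2}) are choices made for this example rather than details taken from the paper.

```python
import numpy as np


def iqp_probabilities(n, edges, angles):
    """Ideal output probabilities of H^n . D_Z . H^n |0^n>, with D_Z from CZ and RZ gates."""
    dim = 2 ** n
    # bits[y, q] is the value of qubit q in computational basis state y.
    bits = (np.arange(dim)[:, None] >> np.arange(n)[None, :]) & 1
    phase = np.zeros(dim)
    for i, j in edges:
        # CZ contributes a phase of pi when both qubits are 1.
        phase += np.pi * bits[:, i] * bits[:, j]
    for q, alpha in enumerate(angles):
        # RZ(alpha) contributes -alpha/2 on bit 0 and +alpha/2 on bit 1.
        phase += alpha * (bits[:, q] - 0.5)
    # First Hadamard layer gives the uniform superposition; D_Z adds the phases.
    state = np.exp(1j * phase) / np.sqrt(dim)
    # Final Hadamard layer, applied one qubit axis at a time.
    hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    state = state.reshape([2] * n)
    for axis in range(n):
        state = np.moveaxis(
            np.tensordot(hadamard, state, axes=([1], [axis])), 0, axis
        )
    return np.abs(state.reshape(dim)) ** 2


def ideal_heavy_output_probability(probs):
    """Sum of the ideal probabilities that exceed the median ideal probability."""
    return float(probs[probs > np.median(probs)].sum())


if __name__ == "__main__":
    probs = iqp_probabilities(3, [(0, 1), (1, 2)], [0.3, 1.1, 2.0])
    print(ideal_heavy_output_probability(probs))
```

The state vector has 2^n entries, which is the exponential classical cost referred to above and the reason this approach is limited to small widths.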

Before compilation, shallow circuits have constant depth, allowing us to measure the impact of increasing circuit width independently of increasing circuit depth. Further, because Algorithm 1 limits the connectivity allowed between the qubits, the increase in circuit depth due to compilation onto limited-connectivity architectures is also minimised, while avoiding a choice of connectivity favouring one device in particular. By bounding connectivity, but allowing all connections in principle, we avoid biasing against architectures that allow all-to-all connectivity, which would still perform well.

⁴ While Theorem 1 is a worst case hardness result, and may not apply to shallow circuits, we regard their performance as indicative of that of those for which it does. Indeed, similar hardness results to Theorem 1 exist for other families of sparse, constant depth IQP circuits [52].

2.2 Square Circuits: Random Circuit Sampling

While circuits required for applications are typically not random, sampling from the output distributions of random circuits built from two-qubit gates has been suggested as a means to demonstrate quantum computational supremacy [29, 44, 54, 55]. Further, by utilising uniformly random two-qubit unitaries, the class we define here, which we refer to as square circuits, provides a benchmark at all layers of the quantum computing stack. In particular, it tests the ability of the device to implement a universal gate set, the diversity and quality of the gates available, and the compilation strategy’s ability to decompose these gates to the native architecture. Further, as quantum circuits can always be approximated up to arbitrary precision using two-qubit unitary gates [56], square circuits can help us understand the performance of quantum computing stacks when implementing computations requiring a universal gate set.

Definitions and Related Results. A random circuit, for a fixed number of qubits n and coupling map $G_n$, is generated by applying $m = \mathrm{poly}(n)$ uniformly random two-qubit SU(4) gates between qubits connected by edges of $G_n$. Here, “uniformly random” means according to the Haar measure.

Random Circuit Sampling (RCS) is the task of producing samples from the output distribution of random circuits. To perform RCS approximately is to sample from a distribution close to that produced by the random circuit. This task has been shown to be hard even in the average case [54, 55], as outlined in Theorem 2, which improves upon the worst case result for IQP circuits as seen in Theorem 1.

Theorem 2 (Informal [54]). There exists a collection of coupling maps $G_n$, with one for each n, and a procedure for generating random circuits respecting each $G_n$, for which there is no classical randomised algorithm that performs approximate RCS, to within inverse polynomial $\ell_1$-norm distance error, for a constant fraction of the random circuits.

The conditions imposed on which coupling maps and circuit generation procedures are covered by this theorem are quite mild, but in particular this can be done using circuits with depth O(n) acting on a 2D square lattice [44, 54]. While this is relevant for devices built using superconducting technology [3], we wish to avoid biasing in favour of this technology in particular.

The circuits used here, which are almost identical to those used for the quantum volume benchmark [38], are generated according to Algorithm 2. We refer to this class of circuits as square circuits, and note that they consist of n layers of two-qubit gates acting between a bipartition of the qubits. There is discrete randomness over the possible bipartitions of the qubits, and continuous randomness over the random two-qubit SU(4) gates.

Algorithm 2 The pattern for building square circuits.

Input: Number of qubits, n ∈ Z
Worst case depth: n
Output: Circuit, C_n

Initialise n qubits, labelled q_1, ..., q_n, in the state |0〉.

for each layer t up to depth n do
    ▷ The contents of this for loop constitute a layer. The choice of the number of layers used here is discussed in Appendix A.1.
    Divide the qubits into ⌊n/2⌋ pairs {q_{i,1}, q_{i,2}} at random.
    for all i ∈ Z, 1 ≤ i ≤ ⌊n/2⌋ do
        Generate U_{i,t} ∈ SU(4) uniformly at random according to the Haar measure.
        Act U_{i,t} on qubits q_{i,1} and q_{i,2}.
    end for
end for

Measure all qubits in the computational basis.
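A minimal sketch of Algorithm 2 follows, assuming NumPy. The Haar-random gate is drawn with a standard technique that is a choice of this example rather than something specified in the text: take the QR decomposition of a complex Gaussian matrix, fix the column phases using the diagonal of R, then divide out the determinant's fourth root so that the result lies in SU(4). The function names and the representation of a layer as a list of (qubit, qubit, matrix) tuples are hypothetical.

```python
import numpy as np


def haar_su4(rng):
    """Draw a 4x4 matrix from the Haar measure on SU(4)."""
    z = (rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    # Fix the phase of each column so that q is Haar-distributed on U(4).
    q = q * (d / np.abs(d))
    # Remove the global phase so that the determinant is 1.
    return q / np.linalg.det(q) ** 0.25


def square_circuit(n, rng):
    """n layers, each pairing the qubits at random and assigning a Haar-random SU(4) gate."""
    layers = []
    for _ in range(n):
        order = rng.permutation(n)
        layer = [
            (int(order[2 * i]), int(order[2 * i + 1]), haar_su4(rng))
            for i in range(n // 2)
        ]
        layers.append(layer)
    return layers


if __name__ == "__main__":
    circuit = square_circuit(4, np.random.default_rng(0))
    for t, layer in enumerate(circuit):
        print(t, [(a, b) for a, b, _ in layer])
```

When n is odd, one qubit is left idle in each layer, matching the ⌊n/2⌋ pairs of the listing.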

Discussion. By allowing two-qubit gates to act between any pair of qubits in the uncompiled circuit, square circuits avoid favouring any device in particular [3, 29, 44]. This choice adheres closely to our motivation of being hardware-agnostic. In addition, assuming all-to-all connectivity passes the burden of mapping the circuit onto the device to the compilation strategy, which is in line with our wish to benchmark the full quantum computing stack. That said, any architecture whose coupling map closely mirrors the uncompiled circuit will be advantaged, as even a naive compilation strategy will perform well in that case.

In [38] similar circuits are used, but with all-to-all connectivity restricted to nearest-neighbour connectivity on a line, and with the addition of permutation layers. As this disadvantages devices with a completely connected coupling map⁵ [5], a property which would typically be an advantage, we choose not to make this restriction here. Notice, however, that naively compiling square circuits onto an architecture with nearest-neighbour connectivity on a line would result in the circuits of [38]. This similarity makes a comparison between experiments involving these circuits relevant. As a result, compiling square circuits to superconducting devices (where connectivity is low) will generally result in a circuit similar to those used in the quantum volume benchmark, as many SWAP operations are required regardless.

In addition, square circuits fulfil the necessary conditions to apply HOG, as defined in Problem 1. Namely, the distribution $p_C$ is sufficiently far from uniform in the required sense, as introduced in Section 3.1, which we demonstrate in Appendix A.1.

⁵ Note that some compilation strategies may identify that the SWAP gates in the permutation layer may be removed for devices with all-to-all connectivity. We avoid this dependence on the compilation strategy by fixing the connectivity in the uncompiled circuit to be all-to-all.

2.3 Deep Circuits: Pauli Gadgets

Pauli gadgets [57] are quantum circuits implementing an operation corresponding to exponentiating a Pauli tensor. Sequences of Pauli gadgets acting on qubits form product formula circuits, most commonly used in Hamiltonian simulation [30]. Many algorithms employing these circuits require fault-tolerant devices, but they are also the basis of trial state preparation circuits in many variational algorithms, which are the most promising applications of noisy quantum computers. A notable example of this in quantum chemistry is the physically motivated UCC family of trial states used in the variational quantum eigensolver (VQE) [31, 58]. As near-term quantum computers hold promise as useful tools for studying quantum chemistry, we propose that the quality of an implementation of these gadgets is a useful benchmark, and use them to define the deep circuit class.

Note that the circuits in this class differ from running the VQE end-to-end. By focusing on the state preparation portion of a VQE circuit, we might deduce the performance of the quantum computing stack when running the VQE on a number of molecules⁶. The intuition is that if the state preparation sub-component is accurate, then the error in the expectation values of measured observables will be due to errors in implementing those observables, or the readout process itself.

Definitions and Related Results. These circuits are built as in Algorithm 3. They are constructed from several layers of Pauli gadgets, each acting on a random subset of the n qubits. In the worst case each Pauli gadget will demand 4n − 1 gates: 2n Pauli gates, 2(n − 1) CX gates, and one RZ gate. In the construction of deep circuits there is discrete randomness over the choice of Pauli string, s, and continuous randomness over a rotation angle α.
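The gate count above can be checked with a small sketch of the construction in Algorithm 3. The code is an illustration under stated assumptions, not the authors' implementation: gates are recorded as plain tuples, the single-qubit basis-change gates are recorded abstractly as "BASIS" and "BASIS_INV" entries (the specific gates are those given in Algorithm 3), and all function names are hypothetical.

```python
import math
import random


def phase_gadget(alpha, qubits):
    """Gates of PhaseGadget: a ladder of CX gates around a single RZ."""
    if len(qubits) == 1:
        return [("RZ", alpha, qubits[0])]
    first, second = qubits[0], qubits[1]
    return (
        [("CX", first, second)]
        + phase_gadget(alpha, qubits[1:])
        + [("CX", first, second)]
    )


def pauli_gadget(alpha, pauli_string):
    """Gates of one Pauli gadget acting on the non-identity entries of pauli_string."""
    active = [(q, p) for q, p in enumerate(pauli_string) if p != "I"]
    if not active:
        return []  # An all-identity string acts on no qubits.
    qubits = [q for q, _ in active]
    # One basis-change gate per X or Y entry; Z entries need none.
    change = [("BASIS", p, q) for q, p in active if p in ("X", "Y")]
    undo = [("BASIS_INV", p, q) for q, p in reversed(active) if p in ("X", "Y")]
    return change + phase_gadget(alpha, qubits) + undo


def deep_circuit(n, rng):
    """3n + 1 layers, each a Pauli gadget with a random string and random angle."""
    gates = []
    for _ in range(3 * n + 1):
        string = [rng.choice("IXYZ") for _ in range(n)]
        alpha = rng.uniform(0.0, 2.0 * math.pi)
        gates += pauli_gadget(alpha, string)
    return gates


if __name__ == "__main__":
    n = 4
    worst = pauli_gadget(0.3, "XYXY")
    print(len(worst), 4 * n - 1)
    print(len(deep_circuit(n, random.Random(0))))
```

A string with no I or Z entries reaches the worst case: n basis changes, 2(n − 1) CX gates, one RZ gate, and n inverse basis changes.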

Discussion. By establishing the exponential distribution of the output probabilities from deep circuits, as we do in Appendix A.2, we are able to use Heavy Output Generation Benchmarking and Cross-Entropy Benchmarking as introduced in Section 3. This constitutes a novel extension of those approaches to application-motivated benchmarking, and provides the unique ability to benchmark application-motivated circuits using polynomially many samples from a device. This provides novel insight into the capacity of near-term hardware to implement quantum chemistry circuits.

3 Figures of MeritSuppose a quantum computer is programmed torun a circuit C or a unitary U . Figures of meritcompare pU (pC), the ideal output probabilities forU (C), and DU (DC), the distributions producedby an implementation, which may be noisy. Theremainder of this section outlines three figures ofmerit and details: their definition, the continuousrange of values they can take, their dependence onnoise, and the procedure for calculating their value

6Here we do not explore the relationship between the per-formance of a quantum computing stack when implementingdeep circuits and when implementing VQE but regard it asimportant for future work.

7

Page 8: Application-Motivated, Holistic Benchmarking of a Full ...€¦ · Application-Motivated, Holistic Benchmarking of a Full Quantum Computing Stack Daniel Mills 1,2, Seyon Sivarajah2,

Algorithm 3 The pattern for building deep circuits.

Input: Number of qubits, n ∈ Z
Worst case depth: (4n − 1)(3n + 1)
Output: Circuit, C

    function PhaseGadget(α, q1, ..., qp)
        if p = 1 then
            Act RZ(α) on q1
        else
            Act CX between q1 and q2
            PhaseGadget(α, q2, ..., qp)
            Act CX between q1 and q2
        end if
    end function

    function Pauli(q1, ..., qp, s)
        if s1 = X then
            Act RX(π/2) on q1
        else if s1 = Y then
            Act H on the qp
        end if
        Pauli(q2, ..., qp, s)
    end function

    function PauliGadget(α, qubits, s)
        Pauli(qubits, s)
        PhaseGadget(α, qubits)
        Act the inverse of Pauli(qubits, s)
    end function

    Initialise n qubits, labelled q1, ..., qn, in the state |0⟩.

    for each layer t up to depth 3n + 1 do
        ▷ The contents of this for loop constitute a layer. The choice of the number of layers used here is discussed in Appendix A.2.
        Select a random string s ∈ {I, X, Y, Z}^n
        Generate random angle α ∈ [0, 2π]
        PauliGadget(α, {qi : si ≠ I}, s)
    end for

    Measure all qubits in the computational basis

from samples produced by an implementation. As noted in the introduction, we use continuous figures of merit which require classical resources to compute.

3.1 Heavy Output Generation Benchmarking

Heavy Output Generation [44] (HOG) is the problem which demands that, given a quantum circuit $C$ as input, strings $x_1, ..., x_k$ be generated which are predominantly those that are the most likely in the output distribution of $C$. That is to say, outputs with the highest probability in the ideal distribution should be produced most regularly.

If the ideal distribution is sufficiently far from uniform, this problem provides a means to distinguish between samples from the ideal distribution and a trivial attempt to mimic such a sampling procedure, namely producing uniformly random strings. Although a simple problem, this task is also conjectured to be hard for a classical computer to perform in general [44].

Importantly, a solution to HOG can be verified by a classical device using polynomially many samples from the real distribution. In combination, these properties make the study of the likely outputs of a distribution a useful tool in benchmarking near-term quantum devices.

Definitions and Related Results Let $p_C(x) = |\langle x|C|0^n\rangle|^2$ be the probability of measuring the output $x$ in the output probability distribution of an ideal implementation of a circuit $C$. An output $z \in \{0,1\}^n$ is heavy for a quantum circuit $C$ if $|\langle z|C|0^n\rangle|^2$ is greater than the median of the set $\{p_C(x) : x \in \{0,1\}^n\}$.

We can define the probability that samples drawn from a distribution $\mathcal{D}_C$ will be heavy outputs in the distribution $p_C$, called the heavy output generation probability of $\mathcal{D}_C$, as follows. Here $\delta_C(x) = 1$ if $x$ is heavy for $C$, and $0$ otherwise.

$$\mathrm{HOG}(\mathcal{D}_C, p_C) = \sum_{x \in \{0,1\}^n} \mathcal{D}_C(x)\, \delta_C(x) \tag{1}$$

For $\mathrm{HOG}(\mathcal{D}_C, p_C)$ to help us distinguish between an ideal implementation of $C$ and a trivial attempt to mimic it by generating random bit strings, $\mathrm{HOG}(p_C, p_C)$ should be greater than 0.5. In fact,


$\mathrm{HOG}(p_C, p_C)$ is expected to be $(1 + \log 2)/2 \approx 0.846574$ [44] for circuit classes whose distribution of measurement probabilities, $p$, is of the exponential form$^7$ $\Pr(p) = N e^{-Np}$, where $N = 2^n$. This is discussed at length in Appendix A. When the output distributions of a class of circuits are shown to take this form, it is meaningful to define the Heavy Output Generation problem.

Problem 1 (Heavy Output Generation [44]) Given a measure $\mu$ over a class of circuits, the family of distributions $\{\mathcal{D}_C\}$ is said to satisfy HOG if the following is true.

$$\mathbb{E}_{C \leftarrow \mu}\left[\mathrm{HOG}(\mathcal{D}_C, p_C)\right] \geq \frac{2}{3} \tag{2}$$

Indeed, the exponential distribution of the output probabilities of the random circuits defined in [38] allowed for the definition of the quantum volume of a device. This is the largest $n$ for which it is possible to sample from distributions $\mathcal{D}_{C_n}$ which solve the HOG problem introduced in Problem 1, where $C_n$ are the random circuits defined in [38].

The motivation for the introduction of quantum volume is the classical hardness of solving the HOG problem of Problem 1 for random circuits, under the QUATH assumption of Assumption 1.

Assumption 1 The QUAntum THreshold assumption (QUATH) [44] is that there is no polynomial time classical algorithm that takes as input the description of a random circuit $C \leftarrow \mu$ and which guesses whether $|\langle 0^n|C|0^n\rangle|^2$ is greater or less than the median value in $\{p_C(x) : x \in \{0,1\}^n\}$ with success probability at least $1/2 + \Omega(1/2^n)$ over the choices of $C$.

As opposed to the statement that HOG is hard, QUATH does not reference sampling, and concerns only the difficulty of approximating amplitudes. QUATH can be evidenced by observing the difficulties of calculating output probability amplitudes [44].

Ideal and Noisy Implementations HOG is solved efficiently by a quantum computer, simply by implementing the circuit $C$. In the case of extreme noise, where the real distribution $\mathcal{D}_C$ converges to the uniform distribution $\mathcal{U}$, $\mathrm{HOG}(\mathcal{D}_C, p_C) = 1/2$. This is compared to the case where the output probabilities are exponentially distributed and $\mathcal{D}_C = p_C$, when we would expect to have $\mathrm{HOG}(\mathcal{D}_C, p_C) = (1 + \log 2)/2$. The continuum of values in between provides a valuable figure of merit for a quantum computing stack, which we call Heavy Output Generation Benchmarking.

7 This is also commonly referred to as the Porter-Thomas distribution [59].

Calculation From Samples We approximate $\mathrm{HOG}(\mathcal{D}_C, p_C)$ using only a polynomial number of samples from the real distribution $\mathcal{D}_C$, although calculating the ideal probabilities $p_C(x)$ requires a number of operations which grows exponentially with the number of qubits. To do so we simply calculate the following expression, where $x_1, ..., x_k$ are samples drawn from $\mathcal{D}_C$.

$$\frac{1}{k} \sum_{i=1}^{k} \delta_C(x_i) \tag{3}$$

By the law of large numbers, this converges to $\mathrm{HOG}(\mathcal{D}_C, p_C)$ in the limit of increasing sample size.
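As an illustration of the estimator in equation (3), the following minimal Python sketch computes the heavy outputs from a table of ideal probabilities and then the fraction of samples which are heavy. The function names and the dictionary representation are hypothetical choices made for this sketch; it assumes `ideal_probs` contains an entry for every one of the $2^n$ bitstrings, since the median is taken over the full set.

```python
from statistics import median


def heavy_outputs(ideal_probs):
    """Outcomes whose ideal probability is strictly greater than the median."""
    med = median(ideal_probs.values())
    return {x for x, p in ideal_probs.items() if p > med}


def hog_estimate(samples, ideal_probs):
    """Fraction of the samples which are heavy outputs, as in equation (3)."""
    heavy = heavy_outputs(ideal_probs)
    return sum(x in heavy for x in samples) / len(samples)


# Toy two-qubit example: the median is 0.225, so "00" and "01" are heavy.
ideal = {"00": 0.5, "01": 0.3, "10": 0.15, "11": 0.05}
estimate = hog_estimate(["00", "01", "11", "00"], ideal)  # three of four samples are heavy
```

Building the table of ideal probabilities is the exponentially costly step; the estimate itself is linear in the number of samples.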

Discussion The connections between HOG and quantum computational supremacy allow us to extract valuable insights into the ability of a quantum computing stack to demonstrate quantum computational supremacy. It provides a minimal, single value with which to compare quantum computing stacks, with an intuitive interpretation. The HOG problem of Problem 1, in particular, is easy to solve on a fault tolerant quantum computer with overwhelming success probability.

As with quantum volume, we too will consider the largest $n$ for which solving the HOG problem of Problem 1 is possible for the circuit classes in Section 2 which have exponentially distributed output probabilities. This is not the case for all circuit classes used here, and for those for which it is not we will explicitly calculate the ideal heavy output probability as a point of comparison. Intuitively, the largest $n$ solving this problem verifies the largest Hilbert space accessible to a quantum computing stack.


3.2 Cross-Entropy Difference

Cross-entropy benchmarking [29] relates to the average probability, in the ideal distribution, $p_U$, of the outputs which are sampled from the real distribution, $\mathcal{D}_U$. For distributions which are far from uniform, and with a spread of probabilities of outcomes, this measure can be used to distinguish an ideal from a real implementation. Ideal implementations will regularly produce the higher probability outputs, obtaining a high benchmark value, while even a small shift in the distribution will lower the value.

The value of the cross-entropy difference can be calculated using exponential classical resources, from a polynomial number of samples from a quantum computer, which allows for its utilisation in benchmarking smaller quantum devices [3, 29, 60, 61]. There are also well-developed means of using this quantity to extrapolate from the behaviour of smaller devices to that of larger devices, which might demonstrate quantum computational supremacy [3].

Definitions and Related Results Intuitively, the entropy, $H(\mathcal{D})$, of a distribution, $\mathcal{D}$, as defined in equation (4), measures the expectation of one's 'surprise' at observing samples from $\mathcal{D}$. In this case, surprise is measured by $f_{\mathcal{D}}(x) = -\log(\mathcal{D}(x))$, which accordingly decreases with increasing probability of the outcome occurring.

$$H(\mathcal{D}) = \sum_{x \in \{0,1\}^n} \mathcal{D}(x) \log\left(\frac{1}{\mathcal{D}(x)}\right) \tag{4}$$

By extension, the cross-entropy measures one's surprise when sampling from $\mathcal{D}$ when expecting $\mathcal{D}'$. This may be restated as the additional information required to describe $\mathcal{D}$ given a description of $\mathcal{D}'$. Formally, cross-entropy is defined as in Definition 1.

Definition 1 (Cross-Entropy) The cross-entropy between two probability distributions $\mathcal{D}$ and $\mathcal{D}'$ is

$$\mathrm{CE}(\mathcal{D}, \mathcal{D}') = \sum_{x \in \{0,1\}^n} \mathcal{D}(x) \log\left(\frac{1}{\mathcal{D}'(x)}\right). \tag{5}$$

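Then the cross-entropy difference is simply $\mathrm{CE}(\mathcal{U}, \mathcal{D}') - \mathrm{CE}(\mathcal{D}, \mathcal{D}')$, where $\mathcal{U}$ is the uniform distribution.

Definition 2 (Cross-Entropy Difference) The cross-entropy difference between two probability distributions $\mathcal{D}$ and $\mathcal{D}'$ is

$$\mathrm{CED}(\mathcal{D}, \mathcal{D}') = \sum_{x \in \{0,1\}^n} \left(\frac{1}{2^n} - \mathcal{D}(x)\right) \log\left(\frac{1}{\mathcal{D}'(x)}\right). \tag{6}$$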

Therefore, the cross-entropy difference can be thought of intuitively as answering “is the distribution $\mathcal{D}'$ best predicted by $\mathcal{D}$ or by the uniform distribution?”.

A different but related definition sets $f_{\mathcal{D}}(x) = 2^{-n} - \mathcal{D}(x)$$^8$, in which case the related quantity is referred to as linear cross entropy [3]. In this case the connection to the average probability of the outputs sampled is clearer.

Ideal and Noisy Implementations The cross-entropy, $\mathrm{CE}(\mathcal{D}_U, p_U)$, between the output distribution, $p_U$, of a unitary, $U$, and the output distribution of an ideal implementation of $U$, $\mathcal{D}_U$, reduces to the entropy of $p_U$. In the case where the probabilities $p_U(x)$ are approximately independent and identically distributed according to the exponential distribution, we have that $H(p_U) = \log 2^n + \gamma - 1$ [29], where $\gamma$ is Euler's constant.

In the case where the probabilities $\mathcal{D}(x)$ are uncorrelated with those of $p_U(x)$ we arrive at the following prediction of the cross-entropy [29].

$$\mathbb{E}_U\left[\mathrm{CE}(\mathcal{D}_U, p_U)\right] = \log 2^n + \gamma \tag{7}$$

$\mathcal{D}(x)$ and $p_U(x)$ are uncorrelated if, for example, $\mathcal{D}$ is the uniform distribution, or, in the case of demonstrations of quantum computational supremacy, if $\mathcal{D}$ is the output of a polynomial cost classical algorithm [29].

These results allow us to identify the extreme values taken by the cross-entropy difference.

$\mathcal{D}_U = p_U$: When the unitary is implemented perfectly, $\mathrm{CED}(\mathcal{D}_U, p_U) = 1$.

$\mathcal{D}_U = \mathcal{U}$: When samples are generated uniformly at random, $\mathrm{CED}(\mathcal{D}_U, p_U) = 0$.
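As a short worked check, these two values follow from writing the cross-entropy difference of Definition 2 as a difference of cross-entropies, and substituting the entropy of $p_U$ and the prediction of equation (7). The uniform distribution is uncorrelated with $p_U$, so equation (7) gives $\mathrm{CE}(\mathcal{U}, p_U)$. The first case holds in expectation over $U$, under the same assumption of exponentially distributed output probabilities.

```latex
\begin{align*}
\mathrm{CED}(\mathcal{D}_U, p_U)
  &= \mathrm{CE}(\mathcal{U}, p_U) - \mathrm{CE}(\mathcal{D}_U, p_U), \\
\mathcal{D}_U = p_U: \quad \mathrm{CED}(p_U, p_U)
  &= (\log 2^n + \gamma) - (\log 2^n + \gamma - 1) = 1, \\
\mathcal{D}_U = \mathcal{U}: \quad \mathrm{CED}(\mathcal{U}, p_U)
  &= \mathrm{CE}(\mathcal{U}, p_U) - \mathrm{CE}(\mathcal{U}, p_U) = 0.
\end{align*}
```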

8 The function $f_{\mathcal{D}}(x)$ may be any which decreases with increasing outcome probability. The choice depends on the relationship between the fidelity of the resulting state, and the standard deviation of the estimator of the associated definition of cross-entropy difference.


As such, the cross-entropy difference gives a value between 0 and 1 which measures the accuracy of the implementation of a unitary, the calculation of which is called Cross-Entropy Benchmarking.

Calculation From Samples By the law of large numbers, the following expression converges to $\mathrm{CE}(\mathcal{D}_U, p_U)$, where $x_1, ..., x_k$ are samples drawn from $\mathcal{D}_U$.

$$\frac{1}{k} \sum_{i=1}^{k} \log\left(\frac{1}{p_U(x_i)}\right) \tag{8}$$

This can be used by a classical computer to approximate the value of $\mathrm{CED}(\mathcal{D}_U, p_U)$. While only a polynomial number of samples $x_i$ are required, the calculation of $p_U(x_i)$ takes exponential time.

In our case, to avoid requiring the inverse of 0 in this approximation, we chose to use an approximation to $p_U$. Namely, we approximate it by the larger of $p_U$ and an inverse exponential in the number of qubits, as inspired by the average case supremacy results related to random circuits [54, 55].
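The calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the experiments: the function name and dictionary representation are hypothetical, and the default floor of $2^{-2n}$ is a placeholder chosen here, since the exact inverse exponential used is not specified above. The sketch estimates $\mathrm{CE}(\mathcal{D}_U, p_U)$ by the sample average of equation (8), and subtracts it from $\mathrm{CE}(\mathcal{U}, p_U)$ computed exactly from the ideal probabilities.

```python
import math


def cross_entropy_difference(samples, ideal_probs, floor=None):
    """Estimate CED(D_U, p_U) from samples of D_U and the ideal probabilities p_U.

    ideal_probs maps every n-bit string to its ideal probability.
    floor is the lower bound applied to p_U to avoid dividing by zero.
    """
    n = len(next(iter(ideal_probs)))
    if floor is None:
        floor = 2.0 ** (-2 * n)  # hypothetical choice of inverse exponential

    def surprise(x):
        return math.log(1.0 / max(ideal_probs[x], floor))

    # CE(U, p_U): average surprise under the uniform distribution.
    ce_uniform = sum(surprise(x) for x in ideal_probs) / len(ideal_probs)
    # Sample average of equation (8), approximating CE(D_U, p_U).
    ce_samples = sum(surprise(x) for x in samples) / len(samples)
    return ce_uniform - ce_samples


ideal = {"0": 0.9, "1": 0.1}
value = cross_entropy_difference(["0"] * 9 + ["1"], ideal, floor=1e-9)
```

For small $n$ the placeholder floor can exceed genuine ideal probabilities, which is why the usage example passes a smaller `floor` explicitly.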

Discussion The comparison to the uniform distribution which the cross-entropy difference provides is valuable as, if an honest attempt is being made to recreate a distribution, at worst $\mathcal{U}$ could be produced. In addition, the cross-entropy gives an estimate for the average circuit fidelity [29], when the conditions of the above discussion are met, facilitating the characterisation of noise levels in implementations of quantum circuits. While Cross-Entropy Benchmarking on its own cannot be used to distinguish error channels, in combination with the techniques introduced here, it can provide insight into this information.

By approximating the fidelity of smaller circuits, Cross-Entropy Benchmarking allows us to characterise larger ones. This is achieved by combining the fidelities of the smaller circuits which themselves combine to give a larger one. This method has been introduced and employed to benchmark demonstrations of quantum computational supremacy [3]. In that domain, calculating the cross-entropy difference of the larger circuit would otherwise be too computationally costly.

The average circuit fidelity may be calculated by decoupling two halves of the device$^9$, performing Cross-Entropy Benchmarking of the circuit built from gates in the larger circuit which act only on each half respectively, and multiplying together the results of both. This approach is feasible when it can be justified, through numerical simulations and experimental implementations, that the average circuit fidelities do combine in this fashion. This is so when the errors on each output are uncorrelated with the amplitude of that output in the ideal probability distribution.

9 In the work of [3], decoupled, partially coupled, and fully coupled circuits are all investigated to ensure the accuracy of this method of combining fidelities.

3.3 $\ell_1$-Norm Distance

The $\ell_1$-norm distance between two probability distributions measures the total difference between the probabilities the distributions assign to elements of their sample space. Such a metric is sufficiently strong that for several classes of quantum circuits it is known that classical simulation of all circuits in the class to within some $\ell_1$-norm distance of the ideal distribution would contradict commonly held computational complexity theoretic conjectures [48, 54, 62].

Unlike the previous two figures of merit, approximating the $\ell_1$-norm distance requires a full characterisation of the ideal output distribution. In the cases where few qubits are considered, as is so here, it is possible to perform such characterisations. For larger qubit counts, cross-entropy benchmarking and heavy output generation are the preferred benchmarking schemes.

Definitions and Related Results In the case of distributions over the sample space $\{0,1\}^n$, the $\ell_1$-norm distance is defined as follows.

Definition 3 ($\ell_1$-norm distance) For distributions $\mathcal{D}$ and $\mathcal{D}'$ over the sample space $\{0,1\}^n$ the $\ell_1$-norm distance between them is defined as

$$\ell_1(\mathcal{D}, \mathcal{D}') = \sum_{x \in \{0,1\}^n} \left|\mathcal{D}(x) - \mathcal{D}'(x)\right|. \tag{9}$$

Ideal and Noisy Implementations An ideal implementation of a unitary would result in an $\ell_1$-norm distance of $\ell_1(\mathcal{D}_U, p_U) = 0$. However, noise will likely make it incredibly difficult for even fault tolerant quantum computers to achieve an $\ell_1$-norm distance of 0 and so bounds, such as that discussed in Theorem 1, are often put on the value instead. Indeed in that case it is sufficient for $\ell_1(\mathcal{D}_U, p_U)$ to be bounded for a demonstration of quantum computational supremacy to occur.

Once again, the $\ell_1$-norm distance takes a continuous range of values, allowing for comparison between implementations of circuits.

Calculation From Samples In this work we will approximate the $\ell_1$-norm distance between the ideal and real distributions using samples from the real distribution. Given samples $s = \{x_1, ..., x_m\}$ from $\mathcal{D}_U$, let $s_x$ be the number of times $x$ appears in $s$. Define $\widetilde{\mathcal{D}}_U$ by $\widetilde{\mathcal{D}}_U(x) = s_x/m$. Then the approximation we will use for $\ell_1(\mathcal{D}_U, p_U)$ is $\ell_1(\widetilde{\mathcal{D}}_U, p_U)$.
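The approximation described above, which replaces the real distribution by the empirical frequencies of the samples, can be sketched in a few lines of Python. The function name and dictionary representation are hypothetical, and `ideal_probs` is assumed to contain every element of the sample space, including outcomes with zero ideal probability.

```python
from collections import Counter


def l1_distance_estimate(samples, ideal_probs):
    """l1-norm distance between the empirical distribution of samples and ideal_probs."""
    m = len(samples)
    counts = Counter(samples)  # counts[x] is 0 for outcomes never sampled
    return sum(abs(counts[x] / m - p) for x, p in ideal_probs.items())


distance = l1_distance_estimate(["0", "0", "0", "1"], {"0": 0.5, "1": 0.5})
```

Unlike the two previous estimators, the sum here runs over the whole sample space, which is why a full characterisation of the ideal distribution is required.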

Discussion Because of its independence from probability values themselves, the $\ell_1$-norm distance is regarded as a fair measure of the closeness of distributions. That is to say, it is reasonable to require quantum computers to produce samples from distributions within some $\ell_1$-norm distance of the ideal distribution. This might not be true for measures of distance, such as multiplicative error, which require that zero probability outcomes are preserved in the presence of noise, but for which very strong connections to quantum computational supremacy also exist [47].

3.4 Metric Comparison

Unfortunately, Cross-Entropy Benchmarking and Heavy Output Generation Benchmarking cannot be used to bound the $\ell_1$-norm distance [54], which, as noted in Section 3.3, provides strong guarantees of demonstrations of quantum computational supremacy$^{10}$. That is, the $\ell_1$-norm distance provides uniquely (amongst the metrics studied here) strong assurances about quantum computational supremacy. This comes at the cost of requiring full state vector simulation to calculate it, consuming memory which grows exponentially in the number of qubits. As the circuit widths approach those large enough to demonstrate quantum computational supremacy, memory requirements become the bottleneck [63].

10 Interestingly, empirical results show a slight negative correlation between $\ell_1$-norm distance and experimental heavy output probability (normalized to the heavy output probability of an ideal device). Empirical details of this relationship are provided in Appendix D.

In the case of Heavy Output Generation Benchmarking and Cross-Entropy Benchmarking only polynomially many single output probabilities are required, allowing the utilisation of Feynman simulators [44]. These compute output bit string amplitudes by adding all Feynman path contributions. This extends the domain of classical simulation by overcoming the memory storage problem, establishing the frontier of what is possible on classical computers [3, 64, 65]. However, this method still requires exponential time to perform and so reaches its own limit for large numbers of qubits.

Since $\mathrm{HOG}(\mathcal{D}_U, p_U)$ and $\mathrm{CE}(\mathcal{D}_U, p_U)$ are expectations of different functions of ideal output probabilities, $\delta(p_U)$ and $-\log(p_U)$ respectively, over the experimental output distribution, they capture different features of the outputs [54]. In fact $\mathrm{HOG}(\mathcal{D}_U, p_U)$ can also be used to approximate circuit fidelity; however, the standard deviation of the estimator is larger than that for $\mathrm{CE}(\mathcal{D}_U, p_U)$ [3].

4 Quantum Computing Stack

Each component of a quantum computing stack exerts an influence on overall performance, and identifying the distinct impact of a particular component is often hard. To disentangle these factors, we must clearly identify the components used during benchmarking. Here we detail the components used to build the quantum computing stacks explored in Section 5. The diverse selection of components allows us to investigate a variety of ways of building a quantum computing stack.

4.1 Software Development Kits

We use a combination of tools available via pytket [10, 66] and Qiskit [1, 67]. pytket is a Python module which provides an environment for constructing and implementing quantum circuits, as well as for interfacing with CQC's t|ket⟩, a retargetable compiler for near-term quantum devices featuring hardware-agnostic optimisation. Qiskit is an open-source quantum computing software development framework for programming, simulating, and interacting with quantum processors, which also provides a compiler. Details of the versions of the software used are given in Table 2 of Appendix B.

We use three parts of Qiskit in this work. The first is the transpiler architecture, which enables users to define custom compilation strategies by executing a series of passes on the input circuit, as discussed in Section 4.2. The second is the library of predefined passes. Finally, a provider is used to access hardware made available over the cloud by IBM Quantum. The provider enables users to send circuits to hardware, retrieve results, and query the hardware for its properties$^{11}$.

Similarly, we use pytket to generate and manipulate circuits in several ways. Firstly, we use the t|ket⟩ compiler to construct compilation strategies which optimise the input circuit for the target hardware, utilising predefined passes available in t|ket⟩. Secondly, we use pytket to define abstract circuits and to convert t|ket⟩'s native representation of the circuit into a Qiskit QuantumCircuit object, which is then dispatched to IBM Quantum's systems for execution.

4.2 Compilers

Compilers provide tools to construct executable quantum circuits from abstract circuit models. This is done by defining passes which may manipulate a representation of a quantum circuit, often by taking account of limited connectivity architectures, or minimising quantities such as gate depth, but need not perform any manipulation$^{12}$. These passes are composed to form compilation strategies which should output executable quantum circuits. Quantum compiling is an active area of research [68–75], and there are many pieces of software available for quantum compiling. As noted above, in this work we use two: t|ket⟩ and the compiler available in Qiskit.

For the purposes of this work, the problem of quantum compilation is divided into three tasks.

Placement: Determine onto which physical qubits of a given device the virtual qubits in the circuit's representation should be initially mapped.

Routing: Modify a circuit to conform to the qubit layout of a specific architecture, for example, by inserting SWAP gates to allow non-adjacent qubits to interact [76]. Circuits are rarely designed with the device's coupling map in mind, so this step is important [12].

Optimisation: Work to minimise some property of a circuit. This may be gate count or depth, which is done to improve implementation accuracy by reducing the impact of noise.

Each of these tasks could consider such things as the trade-offs between the connectivity of a particular subgraph of the device and the amount of crosstalk present in that subgraph [77].

11 These properties include the graph connectivity, single- and two-qubit error rates, and qubit $T_1$ and $T_2$ times. Some of the noise-aware compilation strategies we use require knowledge of these properties.

12 An example of this is a pass which counts the gates in the circuit.

Both pytket and Qiskit have multiple placement, optimisation, and routing passes. We compare the performance of 5 compilation strategies built from these passes. Two of them, noise-unaware pytket and noise-unaware Qiskit, compile the circuit without knowledge of the device's noise properties. Another two, noise-aware pytket and noise-aware Qiskit, do take noise properties into account. As a baseline, we consider a simple compilation strategy from pytket using only routing, without optimisation or noise-awareness; we refer to this strategy as only pytket routing. We detail these schemes in Appendix B. The main difference between the noise-aware schemes is that noise-aware pytket prioritises the minimisation of gate errors during placement$^{13}$, whereas noise-aware Qiskit prioritises readout and CX errors [73].

4.3 Devices

We benchmark some of the devices made available over the cloud by IBM Quantum. The devices we use are referred to by the unique names ibmqx2, ibmq_16_melbourne, ibmq_singapore and ibmq_ourense. Each device has a set of native gates to which all gates in a given circuit must be decomposed. For all the devices considered here, the native gates are: an identity operation, I; 3 “u-gates” [78], as defined in equation (10); and a controlled-NOT (CX) gate.

$$\begin{aligned}
U_3(\theta, \phi, \lambda) &= \begin{pmatrix} \cos\left(\frac{\theta}{2}\right) & -e^{i\lambda} \sin\left(\frac{\theta}{2}\right) \\ e^{i\phi} \sin\left(\frac{\theta}{2}\right) & e^{i(\lambda + \phi)} \cos\left(\frac{\theta}{2}\right) \end{pmatrix} \\
U_2(\phi, \lambda) &= U_3\left(\frac{\pi}{2}, \phi, \lambda\right) \\
U_1(\lambda) &= U_3(0, 0, \lambda)
\end{aligned} \tag{10}$$

13 This is true of pytket 0.3.0; as a result of this work, later versions of pytket take into account readout error.

Two of the device properties used by the noise-aware compilation strategies are their connectivity and calibration data. The connectivity of a device refers to the connectivity of the graph representing how the qubits are coupled to one another. This information is contained in a device's coupling map. The coupling maps of the devices studied here are shown in Appendix C.1 and summarised in Table 1.
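The graph properties summarised in Table 1 (number of vertices, average degree, radius, and minimum cycle length) can be recomputed from a coupling map given as an edge list. The following Python sketch does so with breadth-first search, assuming a connected, simple, undirected graph. The function names are hypothetical, and the two edge lists in the usage example are assumptions based on the publicly documented layouts of ibmqx2 and ibmq_ourense; Appendix C.1 remains the authoritative source for the coupling maps.

```python
from collections import deque


def _bfs(adj, root):
    """Return (distances from root, shortest cycle length detected from root or None)."""
    dist = {root: 0}
    parent = {root: None}
    shortest = None
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                parent[w] = v
                queue.append(w)
            elif parent[v] != w:
                # Non-tree edge (v, w): it closes a cycle of at most this length.
                length = dist[v] + dist[w] + 1
                if shortest is None or length < shortest:
                    shortest = length
    return dist, shortest


def graph_properties(edges):
    """Vertices, average degree, radius and minimum cycle length of a connected graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    searches = [_bfs(adj, v) for v in adj]
    cycles = [c for _, c in searches if c is not None]
    return {
        "vertices": len(adj),
        "average_degree": 2 * len(edges) / len(adj),
        # Radius: the smallest eccentricity over all vertices.
        "radius": min(max(dist.values()) for dist, _ in searches),
        "min_cycle_length": min(cycles) if cycles else None,
    }


# Assumed edge lists (see the lead-in above).
ibmqx2 = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
ibmq_ourense = [(0, 1), (1, 2), (1, 3), (3, 4)]
props_x2 = graph_properties(ibmqx2)
props_ourense = graph_properties(ibmq_ourense)
```

Taking the minimum detected cycle length over all roots gives the length of the shortest cycle in the graph; a tree-like coupling map has no cycles, in which case the sketch reports `None`.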

Device calibration data includes information about single- and two-qubit error rates, readout error, and qubit frequency, $T_1$, and $T_2$ times. The noise-aware compilation strategies we investigate use the gate error rates and readout error. Full details of noise levels can be found in Appendix C.2, with average values given in Figure 1. This information is updated twice daily, with the data in Figure 1 averaged over the period 2020-01-29 to 2020-02-10, during which time our experiments were conducted.

The results of Section 5 depend heavily on the noise levels of the device at the time at which the computation is implemented. This is doubly true in the case of the noise-aware optimisation schemes, as a circuit optimised at one time may not perform as well later, once the noise levels of the device have changed. To reduce this effect we endeavoured to compile and run circuits within as short a time interval as possible.

5 Experimental Results

In this section we identify, using the benchmark suite defined in Section 2 and Section 3, which properties of the different levels of the quantum computing stacks of Section 4 result in the best performance. This allows us to suggest means to extract as much computing power as is possible from the devices available now, and in the near future. The benchmark suite is used here to probe the performance of quantum computing stacks in three ways:

Full Stack Benchmarking: In Section 5.1 we perform benchmarks of the full quantum computing stack. Incorporating and thoroughly investigating the compilation strategy, in particular, helps develop an understanding of how circuit compilation influences the performance of the quantum computing stack. In the case of noise-aware compilation strategies, this also highlights how the assumptions made by the strategy about the importance of different kinds of noise impact performance.

Application Motivated Benchmarks: In Section 5.2, by including three quite different circuit classes in our benchmark suite, we explore how a quantum computing stack may perform when implementing a wide array of applications.

Insights from Classical Simulation: In Section 5.3 we explore how benchmarks themselves can assist in the task of developing new noise models. By identifying when benchmark values for real implementations differ from those we expect from simulations using noise models, we can identify noise channels which should be added to the noise models to achieve greater agreement with real devices. This is of particular importance as noise-aware compilation strategies often utilise noise properties.

In the following subsections we present results on each of these topics. For each circuit class and fixed number of qubits, 200 circuits were generated according to the circuit generation algorithms of Section 2. Each circuit is compiled by a given compilation strategy onto a particular device. The compiled circuits were then run on the device, using 8192 repetitions (samples) from each compiled circuit, which generates 8192 bitstrings. The compiled circuits are also classically simulated using a noise model built from the device calibration information at the time of the device run. See Data Availability for access to the full experimental data set.

The resulting bitstrings are then processed according to the figures of merit given in Section 3. The distributions of the figures of merit are compared by their mean and their shape, which is aggregated into a box-and-whisker plot. Uncompiled circuits were also perfectly simulated without noise in order to calculate the ideal heavy output probability. These points are referred to as Noise-Free in the figures below.

Device              Vertices   Average Degree   Radius   Minimum Cycle Length
ibmqx2              5          2.4              1        3
ibmq_16_melbourne   15         2 2/3            4        4
ibmq_ourense        5          1.6              2        N/A
ibmq_singapore      20         2.3              4        6

Table 1: Selected graph properties of the coupling maps of devices studied in this work. This table displays: the number of vertices in the graph (corresponding to the number of qubits on the device); the average degree, which is the mean number of edges incident on each vertex; the radius, which is the minimax distance over all pairs of vertices; and the minimum cycle length, which is the smallest number of edges per cycle over all cycles of the graph. See Appendix C.1 for full details of the coupling maps of the devices explored here.

Figure 1: Average error rates across devices used in this work. Panels show (a) readout error, (b) error per U2 gate, and (c) error per CX gate. Bars show the mean error rates across the whole device, while error bars give the standard deviation. Devices shown here are: ibmqx2, ibmq_ourense, ibmq_singapore, and ibmq_16_melbourne. Further details can be found in Appendix C.2.

5.1 Full Stack Benchmarking

Impact of the Compilation Strategy

The two layers of the quantum computing stack we study are the compilation strategy and the device on which the compiled circuit is run. Using a fixed device and comparing multiple compilation strategies allows us to determine which strategy tends to perform well. Further, we aggregate performance over all compilation strategies as a way of estimating the performance of a “generic” strategy. Similarly, using a fixed strategy and comparing its performance on multiple devices enables a study of how the assumptions made by the strategy about the devices impact performance when those assumptions do not always hold.

Figure 2 displays experimental results making this comparison when implementing square circuits on ibmq_16_melbourne, using heavy output generation probability as the figure of merit. The noise-aware pytket compilation strategy performs somewhat better, on average, than a generic strategy. Because the aggregated information (“All Strategies” in Figure 2) includes aggregation over noise-aware pytket, these results indicate that other compilation strategies perform a bit worse, since the performance of the aggregate is generally lower than that of noise-aware pytket$^{14}$. This reveals both the potential for compilation strategy driven improvements in performance, and the insights into such improvements which full quantum computing stack benchmarking brings.

Aggregation over compilation strategies is not only useful for identifying strategies which are better in general. Doing so also provides a way of identifying devices which perform well, by “washing out” the effect of the compilation strategy on performance. That is, the strong performance (using a systems-level benchmark) of a given device might be caused by the compilation strategy; to reduce the effect of the strategy, aggregation over several can be done.

14 Note that because we aggregate over all 5 compilation strategies, the distribution of heavy output probabilities in the “All Strategies” category contains five times as many points as that for noise-aware pytket.

For example, Figure 3 shows that by consider-ing performance with a fixed compilation strategy(in this case, noise-aware pytket), ibmq_singaporewould be considered to perform similarly, if notslightly better than ibmq_ourense, as measuredby `1-norm distance. However, aggregating overall strategies, as is done in Figure 4, showsibmq_ourense to perform better. This suggeststhat ibmq_ourense might be a better device for a“generic” compilation strategy to compile to.

An instance-by-instance comparison of differentcompilation strategies also helps us understandtheir limitations. For example, Figure 5 revealsnoise-aware pytket works best at reproducing theideal distribution of heavy output probabilities ofsquare circuits on ibmq_16_melbourne. This islikely in part due to the routing scheme, as is re-vealed by the strong performance of only pytketrouting.

Similarly, Figure 6 shows that noise-aware pytket is amongst the worst-performing compilation strategies for lower numbers of qubits, while it is amongst the best-performing for higher numbers. This could be a result of the way in which noise-aware pytket prioritises noise in its routing scheme, with gate errors taking precedence.¹⁵

These results highlight the fact that full-stack benchmarking can help provide a more detailed understanding of how the components of a system affect performance.

Noise Level, Connectivity Trade Off

Another particularly important example of this is the examination of the connectedness of the device and its noise levels. More highly-connected architectures typically allow for shallower implementations of a given circuit as compared to less-connected ones, but the noise levels in a more highly-connected architecture may be higher due to crosstalk [79]. This creates a trade-off between connectivity and the total amount of noise incurred when running a computation.

¹⁵ For larger numbers of qubits and deeper circuits, gate errors become more impactful on the total noise, which gives noise-aware pytket an advantage for larger numbers of qubits.

[Box plot. x-axis: Number of Qubits (2 to 5); y-axis: Heavy Outputs Probability; legend (Strategy): noise-aware pytket, Noise-Free, All Strategies.]

Figure 2: Comparison of fixed compilation strategy to average of all strategies, using the heavy outputs probability metric, when running square circuits using the real ibmq_16_melbourne device. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the low and high quartiles. White circles give the mean.

[Box plots, two panels: “Classical Simulation Using Qiskit Noise Model” and “Implementation on Real Device”. x-axis: Number of Qubits (2 to 7); y-axis: ℓ₁-Norm Distance; legend (Device): ibmq_singapore, ibmqx2, ibmq_16_melbourne, ibmq_ourense.]

Figure 3: Comparison of devices, using the ℓ₁-norm distance metric, when running shallow circuits compiled using noise-aware pytket. Both simulations using Qiskit noise models, and implementations on real devices, are included. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.

[Box plot. x-axis: Number of Qubits (2 to 7); y-axis: ℓ₁-Norm Distance; legend (Device): ibmq_singapore, ibmqx2, ibmq_16_melbourne, ibmq_ourense.]

Figure 4: Comparison of real devices, using the ℓ₁-norm distance metric, when running shallow circuits compiled using all compilation strategies. Here we compile onto each device using all compilation strategies, including all compiled circuits in this plot. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.

[Box plot. x-axis: Number of Qubits (2 to 5); y-axis: Heavy Outputs Probability; legend (Strategy): noise-aware pytket, noise-aware Qiskit, noise-unaware pytket, only pytket routing, noise-unaware Qiskit, Noise-Free.]

Figure 5: Comparison of compilation strategies, using the heavy outputs probability metric, when square circuits are run on the real ibmq_16_melbourne device. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.

[Box plot. x-axis: Number of Qubits (2 to 5); y-axis: ℓ₁-Norm Distance; legend (Strategy): noise-aware pytket, noise-aware Qiskit, noise-unaware pytket, only pytket routing, noise-unaware Qiskit.]

Figure 6: Comparison of compilation strategies, using the ℓ₁-norm distance metric, when shallow circuits are run on the real ibmq_ourense device. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the low and high quartiles. White circles give the mean.

As noise affects the accuracy of the computation, this trade-off has practical implications for the performance of a device. Indeed, reducing the connectivity between superconducting qubits is used as a tool to reduce noise levels [79]. This can also be counteracted by decoupling qubits [3] but this is not utilised in the devices studied here.¹⁶

Figure 7 shows that devices with lower noise levels (ibmq_singapore and ibmq_ourense) typically outperform devices with higher noise levels (ibmqx2 and ibmq_16_melbourne) despite the latter’s higher connectivity. An interesting exception to this is for 4 qubits, where ibmq_16_melbourne performs best, likely because of the 4-qubit cycles in its connectivity graph. This reduces the SWAP operations necessary for implementing the circuit, reducing the overall circuit depth. This reveals the increase in performance that can be expected when the connectivity of the device and the problem instance are similar [12]. Similar results hold for Cross-Entropy Benchmarking, as shown in Figure 8.
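To illustrate why a 4-qubit cycle in the coupling map can reduce routing overhead, the sketch below compares two hypothetical 4-qubit coupling maps using a crude proxy: bringing two qubits at graph distance d next to each other takes d - 1 SWAPs when that pair is considered in isolation. This is only an illustrative heuristic under that assumption; it is not the routing procedure used by pytket or Qiskit, and the graphs are not the actual device coupling maps.

```python
from collections import deque
from itertools import combinations

def distances_from(graph, source):
    """Breadth-first search distances from source in an undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def total_swap_estimate(graph):
    """Sum over all qubit pairs of (distance - 1), a crude proxy for SWAPs."""
    total = 0
    for a, b in combinations(sorted(graph), 2):
        total += distances_from(graph, a)[b] - 1
    return total

# Two hypothetical 4-qubit coupling maps: a line and a 4-cycle.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

line_cost = total_swap_estimate(line)
cycle_cost = total_swap_estimate(cycle)
```

Under this proxy the cycle halves the estimated overhead of the line for a circuit that couples every pair of qubits, which is the kind of match between circuit structure and connectivity referred to above.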

In general, we expect that circuits whose structure can naturally be mapped to the connectivity of the device will perform well, whereas those which cannot will not. Overall, though, lower-noise devices will tend to perform best.

¹⁶ While we focus on the connectivity of superconducting architectures here, more generally the comparison between the limited connectivity of superconducting devices, and the completely connected coupling maps of ion trap devices is of interest [14, 21, 23].

Comparison with Previous Results

As discussed in Section 2.2, our definition of square circuits differs from previous experiments [38], by allowing for all-to-all connectivity before compilation, as opposed to utilising permutation layers. However, as a naive compilation of square circuits onto a one-dimensional, nearest-neighbour connectivity would recreate the same circuits as used in [38], we might expect the results from our experiments to be similar, and a comparison of the results from these experiments is warranted. Further, in the case of superconducting devices, as are explored here, we would expect the circuit after compilation to be similar, as many SWAP operations will need to take place in both cases.

Figure 7 shows that all quantum computing stacks explored here produce heavy outputs with probability greater than 2/3 on average for circuits acting on at most 3 qubits, with ibmq_ourense typically performing best. Previous results reported that ibmq_singapore could demonstrate a quantum volume, as defined in [38], of 2^4 [80]. That our experiments produce different results is surprising, given their aforementioned similarity. One uncontrollable variable is changes of the device over the time between previous experiments and this one, which may influence this discrepancy. Indeed, studying the change in the value of volumetric benchmarks over time would be interesting and useful, although no such effort is known to us.

[Box plots, two panels: “Classical Simulation Using Qiskit Noise Model” and “Implementation on Real Device”. x-axis: Number of Qubits (2 to 5); y-axis: Heavy Outputs Probability; legend (Device): ibmq_singapore, ibmq_16_melbourne, ibmq_ourense, ibmqx2, Noise-Free.]

Figure 7: Comparison of devices, using the heavy outputs probability metric, when running square circuits compiled using noise-aware pytket. Both simulations using Qiskit noise models, and implementations on real devices, are included. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.

[Box plots, two panels: “Classical Simulation Using Qiskit Noise Model” and “Implementation on Real Device”. x-axis: Number of Qubits (2 to 5); y-axis: Cross Entropy Difference; legend (Device): ibmq_ourense, ibmqx2.]

Figure 8: Comparison of devices, using the cross entropy difference metric, when running square circuits compiled using noise-aware pytket. Both simulations using Qiskit noise models, and implementations on real devices, are included. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.
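The comparison against the 2/3 threshold made for Figure 7 can be sketched as follows. The helper name and the data are hypothetical, and the check compares sample means only; a formal quantum volume test would typically also require a confidence bound on the estimate, which is omitted here.

```python
from statistics import mean

def largest_passing_width(hop_by_width, threshold=2 / 3):
    """Largest n such that the mean heavy output probability exceeds the
    threshold for every width up to and including n (None if none pass)."""
    best = None
    for n in sorted(hop_by_width):
        if mean(hop_by_width[n]) > threshold:
            best = n
        else:
            break
    return best

# Hypothetical per-circuit heavy output probabilities for each width.
hops = {
    2: [0.82, 0.79, 0.85],
    3: [0.71, 0.69, 0.72],
    4: [0.62, 0.60, 0.66],
    5: [0.55, 0.52, 0.57],
}
n_max = largest_passing_width(hops)
```

With these illustrative numbers the widest passing circuits act on 3 qubits, mirroring the situation described for the real devices.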

5.2 Application Motivated Benchmarks

The same quantum computing stack will perform differently when running different applications, as the structure of the circuits they require will generally be different. Differences in performance are seen in the context of our application-motivated benchmarks. For example, consider Figure 9, which shows performance when implementing sparsely connected circuits, and Figure 10, which shows performance when implementing chemistry-motivated circuits. In the case of Figure 9, the ibmqx2 device outperforms ibmq_singapore, while in the case of Figure 10 the reverse is true.

Quantum Chemistry

Figure 10 suggests ibmq_ourense is best for quantum chemistry applications, because it performs well when running deep circuits.¹⁷ In particular, Figure 10 indicates that the average circuit fidelity is highest for implementations on ibmq_ourense.

In Figure 10, all devices converge to the minimum value of cross-entropy difference at 4 qubits. To extend an investigation of this sort to more qubits would require lower noise levels or chemistry motivated circuits which generate exponentially distributed output probabilities at lower depth.
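For orientation, a minimal sketch of estimating a cross-entropy difference from samples is given below. It assumes the commonly used form log(2^n) + γ + mean(log p_ideal(x_j)), with γ the Euler-Mascheroni constant; the definition used in this work is the one given in Section 3.2, and the function name and data here are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def cross_entropy_difference(ideal_probs, samples, n_qubits):
    """Estimate the cross-entropy difference from measured bitstrings.

    Assumes the form log(2^n) + gamma + mean(log p_ideal(x_j)), which is
    approximately 1 for ideal sampling from an exponentially distributed
    ideal distribution and 0 for uniformly random samples.
    Samples whose ideal probability is zero would raise a ValueError.
    """
    mean_log_p = sum(math.log(ideal_probs[s]) for s in samples) / len(samples)
    return n_qubits * math.log(2) + EULER_GAMMA + mean_log_p

# Sanity check with a flat ideal distribution: every sample contributes
# log(1/4), so the two logarithmic terms cancel and only gamma remains.
flat = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
ced_flat = cross_entropy_difference(flat, ["00", "11", "01", "10"], 2)
```

The calibration to 0 and 1 relies on the ideal output probabilities being exponentially distributed, which is why the text above asks for circuits that reach such distributions at lower depth.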

Shallow Circuits as a Benchmark

Figure 11 demonstrates that shallow circuits allow us to benchmark the behaviour of a quantum computing stack for applications involving circuits with many qubits but low circuit depth [35, 43, 46]. In this case we are able to continue our analysis, beyond that of Figure 7, of those devices which perform sufficiently well for a smaller number of qubits, and which have architectures including more qubits to investigate.

The results show ibmq_singapore outperforms the comparably sized ibmq_16_melbourne and has comparable performance to ibmq_ourense for smaller numbers of qubits. ibmq_singapore outperforms ibmq_ourense by having more qubits available. This superior performance of ibmq_singapore is in contrast to the results of Figure 7, where ibmq_ourense was shown to perform well. This justifies our suggestion that shallow circuits should be included in benchmarking suites. Doing so allows for the exploration of higher qubit requirement computations. In this setting devices that perform poorly when implementing square circuits or deep circuits may perform well.

¹⁷ This comes with the caveat, as mentioned in Section 2.3, that the connection between the quality of an implementation of these computational primitives, as measured by this benchmark, and accurate ground state energy calculations in VQE has not been demonstrated experimentally.

Shallow Circuits and ℓ₁-Norm Distance

Theorem 1 provides a convenient criterion for success in implementing shallow circuits; namely an ℓ₁-norm distance of not more than 1/192 from the ideal distribution. Figure 4 explores the closeness to a successful implementation and reveals that, on average over all compilation strategies, the best performing device is ibmq_ourense.
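The success criterion above can be checked with a short sketch such as the following. It assumes the distance is the plain sum of absolute differences between the empirical and ideal probabilities (no factor of 1/2), which is consistent with the range of values plotted in the figures; the function name and toy data are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def l1_distance(ideal_probs, samples):
    """l1-norm distance between the empirical and ideal distributions."""
    counts = Counter(samples)
    total = len(samples)
    # Include outcomes that were sampled but have zero ideal probability.
    outcomes = set(ideal_probs) | set(counts)
    return sum(
        abs(Fraction(counts[b], total) - ideal_probs.get(b, Fraction(0)))
        for b in outcomes
    )

THRESHOLD = Fraction(1, 192)  # criterion quoted from Theorem 1

# Toy example with exact fractions (hypothetical numbers).
ideal = {"00": Fraction(1, 2), "11": Fraction(1, 2)}
shots = ["00"] * 5 + ["11"] * 4 + ["01"]
distance = l1_distance(ideal, shots)
passes = distance <= THRESHOLD
```

Exact fractions are used only to keep the toy arithmetic transparent; with real data, floating point probabilities and a much larger number of samples would be needed before a distance as small as 1/192 could be resolved.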

Figure 6 explores the best performing compilation strategies for ibmq_ourense. It shows that, compared to the other strategies, the mean ℓ₁-norm distance is marginally smaller for noise-aware pytket when the number of qubits is larger, while noise-unaware Qiskit and noise-aware Qiskit perform well for fewer qubits. We explore the performance of noise-aware pytket further, as the instances with higher qubit counts are relevant for use cases of quantum computers. Indeed, Figure 3 shows ibmq_ourense and ibmq_singapore perform similarly when the noise-aware pytket optimiser is used, despite ibmq_ourense performing better on average. This is likely because ibmq_singapore has a sub lattice with comparable noise levels to ibmq_ourense, which noise-aware pytket is able to isolate, while on average the levels are higher.

No device consistently brings the ℓ₁-norm distance to within 1/192 of the ideal. However, ibmq_ourense seems to slightly outperform the other devices, showing the benefit of lower noise levels over high connectivity or high numbers of qubits. While this methodology would be impossible to extend to the demonstrations of quantum computational supremacy, we hope that exploring it for these quantum computational supremacy related circuits will provide insights into the best quantum computing stack for such demonstrations.


[Box plot. x-axis: Number of Qubits (2 to 7); y-axis: Cross Entropy Difference; legend (Device): ibmq_singapore, ibmqx2, ibmq_16_melbourne, ibmq_ourense.]

Figure 9: Comparison of real devices, using the cross entropy difference metric, when running shallow circuits compiled using noise-aware Qiskit. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.

[Box plots, two panels: “Classical Simulation Using Qiskit Noise Model” and “Implementation on Real Device”. x-axis: Number of Qubits (2 to 4); y-axis: Cross Entropy Difference; legend (Device): ibmq_singapore, ibmq_16_melbourne, ibmq_ourense, ibmqx2.]

Figure 10: Comparison of devices, using the cross entropy difference metric, when running deep circuits compiled using noise-aware Qiskit. Both simulations using Qiskit noise models, and implementations on real devices, are included. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.


[Box plot. x-axis: Number of Qubits (2 to 7); y-axis: Heavy Outputs Probability; legend (Device): ibmq_singapore, ibmqx2, ibmq_16_melbourne, ibmq_ourense, Noise-Free.]

Figure 11: Comparison of real devices, using the heavy outputs probability metric, when running shallow circuits compiled using noise-aware pytket. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.
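The box plot conventions stated in the figure captions (quartiles, mean, and whiskers reaching 1.5 times the IQR beyond the quartiles) can be reproduced with the sketch below. It assumes linearly interpolated quartiles and reports the whisker limits themselves; plotting libraries commonly draw each whisker only as far as the most extreme data point inside these limits. The function name is hypothetical.

```python
from statistics import mean, quantiles

def box_summary(values):
    """Quartiles, mean and 1.5 * IQR whisker limits for one box."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "mean": mean(values),
        "lower_limit": q1 - 1.5 * iqr,
        "upper_limit": q3 + 1.5 * iqr,
    }

# Small illustrative dataset (not experimental data).
summary = box_summary([1, 2, 3, 4, 5])
```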

5.3 Insights from Classical Simulation

The noise present in a non-fault-tolerant quantum computer results in discrepancies between results obtained from running on real hardware and those that would be obtained from an ideal quantum computer. Often, noise models are utilised during classical simulation to investigate the effects of noise and help identify why these discrepancies occur [81]. However, a perfect model of the noise, which could reproduce the results of real hardware (up to statistical error), could require many parameters to completely specify it. Therefore, most noise models consider only a small handful of physical effects. Consequently, discrepancies between the results of noisy simulation and running experiments on real hardware always remain.

Historically, closing the gap between noisy simulation and real hardware required developing noise models of increasing sophistication. Developing them typically requires a great deal of physics expertise to identify new noise channels. Further, new experiments would have to be designed in order to estimate their parameters in the noise model.

Here, we suggest some of the benchmarks conducted in this work could be helpful in identifying whether new noise channels should be incorporated into a noise model. In particular, by isolating the circuit types and coupling maps for which the discrepancies are greatest, it is possible to speculate about the possible causes of the mismatch.

This investigation could also influence the performance of noise-aware compilation strategies, which use properties of the noise. Verifying the accuracy of these noise properties, through verification of the accuracy of the resulting noise models, could improve the performance of noise-aware strategies.

For the devices explored here, the noise models are built using Qiskit. They are derived from a device’s properties and include one- and two-qubit gate errors¹⁸ and single-qubit readout errors. We find these noise models are inadequate to explain some of the discrepancies observed in the data.

Noise Does Not Just Flatten Distributions

One discrepancy between experiments and noisy simulations is the spread of the data. For example, Figure 7 shows that only in the experimental case do the whiskers of the plot fall below the value 0.5, indicating the heavy outputs are less likely than they would be in the uniform distribution. Some noise type, in particular one which shifts the probability density, rather than uniformly flattening it, is not considered, or is under appreciated, by the noise models used. Identifying that noise channel is left to future work, though we speculate it may be related to a kind of thermal relaxation error.
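The argument that uniform flattening alone cannot push the heavy output probability below 0.5 can be illustrated numerically. The sketch below models flattening as mixing the ideal distribution with the uniform one, (1 - λ) p + λ / N, which is an assumed toy noise model and not the Qiskit noise model used in the simulations; the function names and the distribution are hypothetical.

```python
from statistics import median

def heavy_probability(ideal_probs, actual_probs):
    """Total probability that actual_probs assigns to the heavy outputs
    of ideal_probs (those above the median ideal probability)."""
    threshold = median(ideal_probs.values())
    return sum(actual_probs[b] for b, p in ideal_probs.items() if p > threshold)

def flatten(probs, lam):
    """Mix a distribution with the uniform one: (1 - lam) * p + lam / N."""
    size = len(probs)
    return {b: (1 - lam) * p + lam / size for b, p in probs.items()}

# Hypothetical ideal distribution on 2 qubits.
ideal = {"00": 0.5, "01": 0.3, "10": 0.15, "11": 0.05}

# Heavy output probability for no, partial and complete flattening.
hops = [heavy_probability(ideal, flatten(ideal, lam)) for lam in (0.0, 0.5, 1.0)]
```

Since half of the outputs are heavy, the fully flattened distribution gives exactly 0.5 and partial flattening interpolates towards it from above. A channel that moves probability onto particular outputs, rather than spreading it evenly, is not bounded in this way.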

¹⁸ These are modelled as a depolarising error followed by a thermal relaxation error.


Noise Models Under Represent Some Noise Channels

The classical simulations in Figure 7 suggest ibmqx2 should perform similarly to ibmq_ourense in most cases. In fact, it quite consistently performs worse. This is isolated in Figure 8, with the same phenomenon being observed in Figure 3 and Figure 10, showing the behavior is consistent across all circuit types and figures of merit.

This difference between simulated and experimental results is pronounced in the case of Figure 10, where deep circuits are used. This suggests the noise models may be underestimating the error from time-dependent noises such as depolarising and dephasing, or from two-qubit gates which are more prevalent in deep circuits.

Another such example of a two-qubit noise channel, which is explicitly not accounted for in the noise models, is crosstalk. The results in Figure 8 are consistent with the expectation that crosstalk should have the greatest impact on more highly connected devices [79]. As such, crosstalk may be the origin of the discrepancy. Of note is the fact this benchmark wasn’t explicitly designed to capture the effects of crosstalk, and yet those effects manifest themselves in its results. We anticipate that including crosstalk-aware passes in compilation strategies [77] would reduce the discrepancy.

6 Conclusion

The performance of quantum computing devices is highly dependent on several factors. Amongst them are the noise levels of the device, the software used to construct and manipulate the circuits implemented, and the applications for which the device is used. The impacts of these factors on the performance of a quantum computing stack are intertwined, making the task of predicting its holistic performance from knowledge of the performance of each component impossible. In order to understand and measure the performance of quantum computing stacks, benchmarks must take this into consideration.

In this work we have addressed this problem by introducing a methodology for performing application-motivated, holistic benchmarking of the full quantum computing stack. To do so we provide a benchmark suite utilising differing circuit classes and figures of merit to access a variety of properties of the device. This includes the use of three circuit classes: deep circuits and shallow circuits, which are novel to this paper; and square circuits, which resemble random circuits used in other benchmarking experiments [38]. In addition we make use of a diverse selection of figures of merit to measure the performance of the quantum computing stacks considered, namely: Heavy Output Generation Benchmarking, Cross-Entropy Benchmarking, and the ℓ₁-norm distance.

In particular, in the form of deep circuits we present an alternative to previous approaches to application-motivated benchmarking. This is by considering circuits inspired by one of the primitives utilised in VQE, namely Pauli gadgets employed for state preparation, rather than VQE itself. Further, while we have found that the performances of quantum computing stacks are indistinguishable when using square circuits and Heavy Output Generation Benchmarking for a large number of qubits, shallow circuits extend the number of qubits for which detail can be observed, while also being consistent with the philosophy of volumetric benchmarking.

We demonstrate this benchmark suite by employing it on ibmqx2, ibmq_16_melbourne, ibmq_ourense, and ibmq_singapore. In doing so we justified our thesis that the accuracy of a computation depends on several levels of the quantum computing stack, and that each layer should not be considered in isolation. For example, identifying that the increased connectivity of a device does not compensate for the increased noise, as we do in Section 5.1, shows the impact of this layer of the stack, and justifies investigating devices with a variety of coupling maps and noise levels. By showing the differing performance between five compilation strategies, we are able to identify, in Section 5.1, the dependence of the best compilation strategy to use on the device and the dimension of the circuit. This illustrates the dependence of the performance of the quantum computing stack on the compilation layer, and the interdependence between the compilation strategy, device and application on the overall performance of the quantum computing stack. In particular, noise-aware compilation strategies often perform well, when the noise model used by the strategy is accurate, as discussed in Section 5.3.


In Section 5.2, the wide selection of circuits within the proposed benchmark suite reveals that the same device, evaluated according to a fixed figure of merit, will perform differently when running different applications, whose circuits are compiled by the same compilation strategy. Indeed the comparative performance of (compilation strategy, device) pairs is shown to vary between our circuit classes. This justifies our inclusion of circuit classes which collectively cover a wide selection of applications in the benchmark suite proposed here, and our full quantum computing stack approach.

We foresee the benchmarks conducted in this work providing a means to select the best quantum computing stack, of those explored here, for a particular task, and vice versa. As such we also anticipate that a variety of new quantum computing stacks could be benchmarked in the way described in this work, empowering the user with knowledge about the performance of current quantum technologies for particular tasks.

These benchmarks may, in time, come to complement noise models and calibration information as a means to disseminate information about a device’s performance. This parallels the use of the LINPACK benchmarks [6] alongside FLOPS to compare diverse classical computers. Recently, quantum volume, as defined in [38], has started to be adopted as one such metric [82], and we hope the benchmark suite developed here will be incorporated similarly. Further, our benchmarks may facilitate an understanding of how new, or hard-to-characterize, noise affects the practical performance of quantum computers, as implied by the classical simulations of Section 5.3.

The work presented here could be extended in several directions. The first is to examine the impact of incorporating these benchmarks into a compilation strategy. While noise-aware compilation strategies currently use properties of qubits to decide how to compile a circuit, it would be interesting to explore if instead optimising for these benchmarks would change the compilation. The trade-off between the benefits of doing so against the increased compilation time resulting from the time taken to perform the benchmarks should then also be assessed. This information would help in the understanding of the interplay between the amount of classical circuit optimisation performed and the amount by which the performance of a quantum system can be increased.

Second, the philosophy of application-motivated benchmarking could be extended to circuits which are more easily classically simulable. Because of their reliance on classical simulation, the benchmarks introduced here may be used up to, but not after, the point of demonstrating quantum computational supremacy. Hence new circuit classes will need to be introduced which can be classically simulated in this regime. Alternatively, application-motivated benchmarks that are derived from combining benchmarks of smaller devices [3] could be developed.

Third, we envision a need to systematically study how properties of hardware, such as noise levels or connectivity, influence a given device’s performance. In this work, we were limited to the particular devices made available by IBM Quantum, which limits our ability to perform such a systematic inquiry. It is nevertheless vital to do so, as the results of Section 5.1 show that changing the hardware can dramatically influence performance. Indeed, this would allow us to understand if the observations made in Section 5.1 are typical, and to explore the existence of other relationships. This could be achieved by implementing this benchmark suite on more devices, or synthetic devices with tunable coupling maps and noise information.

Finally, there is a need to study the correlation between the results of an application-motivated benchmark and the performance of a quantum computing stack at running the application which motivated it. This would show whether benchmarking application subroutines provides reliable predictors of performance when running the application itself. While similar work has explored the correlation between the classification accuracy and circuit properties of parametrised quantum circuits [37], comparing the performance of the benchmarks defined here with their applications is a subject for future work. For example, comparing the performance of a stack at implementing deep circuits and running the VQE algorithm would show the extent to which quantum computing stacks that perform well at a particular kind of state preparation circuit also perform well in estimating properties of a wide range of molecules.


Data Availability

The data gathered during the experiments conducted for this work, as presented in Section 5, are available at http://doi.org/10.5281/zenodo.3832121. QASM files representing the circuits executed, the per sample bitstring outputs from device and simulator executions, and device calibration data gathered throughout the course of the experiments, are provided.

Acknowledgements

DM acknowledges support from the Engineering and Physical Sciences Research Council (grant EP/L01503X/1). This work was made possible, in part, by systems built by IBM as part of the IBM Quantum program, and made accessible via membership in the IBM Q Network.

IBM, IBM Q, and Qiskit are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product or service names may be trademarks or service marks of IBM or other companies.

References

[1] Gadi Aleksandrowicz, Thomas Alexander, Panagiotis Barkoutsos, Luciano Bello, et al. Qiskit: An Open-source Framework for Quantum Computing, January 2019. URL https://doi.org/10.5281/zenodo.2562111.

[2] Peter J Karalekas, Nikolas A Tezak, Eric C Peterson, Colm A Ryan, Marcus P da Silva, and Robert S Smith. A quantum-classical cloud platform optimized for variational hybrid algorithms. Quantum Science and Technology, 5(2):024003, April 2020. doi:10.1088/2058-9565/ab7559. URL https://doi.org/10.1088%2F2058-9565%2Fab7559.

[3] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, et al. Quantum supremacy using a programmable superconducting processor. Nature, 574(7779):505–510, 2019. ISSN 1476-4687. doi:10.1038/s41586-019-1666-5. URL https://doi.org/10.1038/s41586-019-1666-5.

[4] Simon J. Devitt. Performing quantum computing experiments in the cloud. Phys. Rev. A, 94:032329, September 2016. doi:10.1103/PhysRevA.94.032329. URL https://link.aps.org/doi/10.1103/PhysRevA.94.032329.

[5] S. Debnath, N. M. Linke, C. Figgatt, K. A. Landsman, K. Wright, and C. Monroe. Demonstration of a small programmable quantum computer with atomic qubits. Nature, 536(7614):63–66, August 2016. ISSN 1476-4687. doi:10.1038/nature18648. URL https://doi.org/10.1038/nature18648.

[6] A Petitet, R C Whaley, J Dongarra, and A Cleary. Hpl - a portable implementation of the high-performance linpack benchmark for distributed-memory computers. URL http://www.netlib.org/benchmark/hpl/.

[7] Jack J. Dongarra, Piotr Luszczek, and Antoine Petitet. The linpack benchmark: past, present and future. Concurrency and Computation: Practice and Experience, 15(9):803–820, 2003. doi:10.1002/cpe.728. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.728.

[8] Jack Dongarra and Piotr Luszczek. TOP500, pages 2055–2057. Springer US, Boston, MA, 2011. ISBN 978-0-387-09766-4. doi:10.1007/978-0-387-09766-4_157. URL https://doi.org/10.1007/978-0-387-09766-4_157.

[9] Ryan LaRose. Overview and Comparison of Gate Level Quantum Software Platforms. Quantum, 3:130, March 2019. ISSN 2521-327X. doi:10.22331/q-2019-03-25-130. URL https://doi.org/10.22331/q-2019-03-25-130.

[10] Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington, and Ross Duncan. t|ket〉: A retargetable compiler for nisq devices. Quantum Science and Technology, 2020. doi:10.1088/2058-9565/ab8e92. URL https://iopscience.iop.org/article/10.1088/2058-9565/ab8e92.


[11] Robert S. Smith, Michael J. Curtis, and William J. Zeng. A practical quantum instruction set architecture. 2016. URL https://arxiv.org/abs/1608.03355.

[12] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, et al. Quantum approximate optimization of non-planar graph problems on a planar superconducting processor, 2020. URL https://arxiv.org/abs/2004.04197.

[13] Robin Blume-Kohout and Kevin C. Young. A volumetric framework for quantum computer benchmarks, 2019. URL https://arxiv.org/abs/1904.05546.

[14] Norbert M. Linke, Dmitri Maslov, Martin Roetteler, Shantanu Debnath, Caroline Figgatt, Kevin A. Landsman, Kenneth Wright, and Christopher Monroe. Experimental comparison of two quantum computing architectures. Proceedings of the National Academy of Sciences, 114(13):3305–3310, 2017. ISSN 0027-8424. doi:10.1073/pnas.1618020114. URL https://www.pnas.org/content/114/13/3305.

[15] Sam McArdle, Suguru Endo, Alán Aspuru-Guzik, Simon C. Benjamin, and Xiao Yuan. Quantum computational chemistry. Rev. Mod. Phys., 92:015003, March 2020. doi:10.1103/RevModPhys.92.015003. URL https://link.aps.org/doi/10.1103/RevModPhys.92.015003.

[16] Alexander J. McCaskey, Zachary P. Parks, Jacek Jakowski, Shirley V. Moore, Titus D. Morris, Travis S. Humble, and Raphael C. Pooser. Quantum chemistry as a benchmark for near-term quantum computers. npj Quantum Information, 5(1):99, 2019. ISSN 2056-6387. doi:10.1038/s41534-019-0209-0. URL https://doi.org/10.1038/s41534-019-0209-0.

[17] Pierre-Luc Dallaire-Demers, Michał Stęchły, Jerome F. Gonthier, Ntwali Toussaint Bashige, Jonathan Romero, and Yudong Cao. An application benchmark for fermionic quantum simulations. 2020. URL https://arxiv.org/abs/2003.01862.

[18] E. F. Dumitrescu, A. J. McCaskey, G. Hagen, G. R. Jansen, T. D. Morris, T. Papenbrock, R. C. Pooser, D. J. Dean, and P. Lougovski. Cloud quantum computing of an atomic nucleus. Phys. Rev. Lett., 120:210501, May 2018. doi:10.1103/PhysRevLett.120.210501. URL https://link.aps.org/doi/10.1103/PhysRevLett.120.210501.

[19] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, et al. Hartree-fock on a superconducting qubit quantum computer, 2020. URL https://arxiv.org/abs/2004.04174.

[20] Vedran Dunjko and Hans J Briegel. Machine learning & artificial intelligence in the quantum domain: a review of recent progress. Reports on Progress in Physics, 81(7):074001, June 2018. doi:10.1088/1361-6633/aab406. URL https://doi.org/10.1088%2F1361-6633%2Faab406.

[21] Marcello Benedetti, Delfina Garcia-Pintos, Oscar Perdomo, Vicente Leyton-Ortega, Yunseong Nam, and Alejandro Perdomo-Ortiz. A generative modeling approach for benchmarking and training shallow quantum circuits. npj Quantum Information, 5(1):45, 2019. ISSN 2056-6387. doi:10.1038/s41534-019-0157-8. URL https://doi.org/10.1038/s41534-019-0157-8.

[22] Kathleen E. Hamilton and Raphael C. Pooser. Error-mitigated data-driven circuit learning on noisy quantum hardware. 2019. URL https://arxiv.org/abs/1911.13289.

[23] Kathleen E. Hamilton, Eugene F. Dumitrescu, and Raphael C. Pooser. Generative model benchmarks for superconducting qubits. Phys. Rev. A, 99:062323, June 2019. doi:10.1103/PhysRevA.99.062323. URL https://link.aps.org/doi/10.1103/PhysRevA.99.062323.

[24] Madita Willsch, Dennis Willsch, Fengping Jin, Hans De Raedt, and Kristel Michielsen. Benchmarking the quantum approximate optimization algorithm, 2019. URL https://arxiv.org/abs/1907.02359.


[25] Andreas Bengtsson, Pontus Vikstål, Christopher Warren, Marika Svensson, et al. Quantum approximate optimization of the exact-cover problem on a superconducting quantum processor, 2019. URL https://arxiv.org/abs/1912.10495.

[26] G. Pagano, A. Bapat, P. Becker, K. S. Collins, A. De, P. W. Hess, H. B. Kaplan, A. Kyprianidis, W. L. Tan, C. Baldwin, L. T. Brady, A. Deshpande, F. Liu, S. Jordan, A. V. Gorshkov, and C. Monroe. Quantum approximate optimization of the long-range ising model with a trapped-ion quantum simulator, 2019. URL https://arxiv.org/abs/1906.02700.

[27] John Preskill. Quantum computing and the entanglement frontier, 2012. URL https://arxiv.org/abs/1203.5813.

[28] Aram W. Harrow and Ashley Montanaro. Quantum computational supremacy. Nature, 549(7671):203–209, September 2017. ISSN 1476-4687. doi:10.1038/nature23458. URL http://dx.doi.org/10.1038/nature23458.

[29] Sergio Boixo, Sergei V. Isakov, Vadim N. Smelyanskiy, Ryan Babbush, Nan Ding, Zhang Jiang, Michael J. Bremner, John M. Martinis, and Hartmut Neven. Characterizing quantum supremacy in near-term devices. Nature Physics, 14(6):595–600, June 2018. ISSN 1745-2481. doi:10.1038/s41567-018-0124-x. URL https://doi.org/10.1038/s41567-018-0124-x.

[30] Dominic W. Berry, Graeme Ahokas, Richard Cleve, and Barry C. Sanders. Efficient quantum algorithms for simulating sparse hamiltonians. Communications in Mathematical Physics, 270(2):359–371, March 2007. ISSN 1432-0916. doi:10.1007/s00220-006-0150-x. URL https://doi.org/10.1007/s00220-006-0150-x.

[31] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Alán Aspuru-Guzik, and Jeremy L. O’Brien. A variational eigenvalue solver on a photonic quantum processor. Nature Communications, 5(1):4213, July 2014. ISSN 2041-1723. doi:10.1038/ncomms5213. URL https://doi.org/10.1038/ncomms5213.

[32] Jonathan Romero, Ryan Babbush, Jarrod R McClean, Cornelius Hempel, Peter J Love, and Alán Aspuru-Guzik. Strategies for quantum computing molecular energies using the unitary coupled cluster ansatz. Quantum Science and Technology, 4(1):014008, October 2018. doi:10.1088/2058-9565/aad3e4. URL https://iopscience.iop.org/article/10.1088/2058-9565/aad3e4.

[33] Sukin Sim, Peter D. Johnson, and Alán Aspuru-Guzik. Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms. Advanced Quantum Technologies, 2(12):1900070, 2019. doi:10.1002/qute.201900070. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/qute.201900070.

[34] Abhinav Kandala, Antonio Mezzacapo, Kristan Temme, Maika Takita, Markus Brink, Jerry M. Chow, and Jay M. Gambetta. Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature, 549(7671):242–246, September 2017. ISSN 1476-4687. doi:10.1038/nature23879. URL https://doi.org/10.1038/nature23879.

[35] Brian Coyle, Daniel Mills, Vincent Danos, and Elham Kashefi. The born supremacy: Quantum advantage and training of an ising born machine. 2019. URL https://arxiv.org/abs/1904.02214.

[36] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, and Dacheng Tao. The expressive power of parameterized quantum circuits, 2018. URL https://arxiv.org/abs/1810.11922.

[37] Thomas Hubregtsen, Josef Pichlmeier, and Koen Bertels. Evaluation of parameterized quantum circuits: on the design, and the relation between classification accuracy, expressibility and entangling capability. 2020. URL https://arxiv.org/abs/2003.09887.

[38] Andrew W. Cross, Lev S. Bishop, Sarah Sheldon, Paul D. Nation, and Jay M. Gambetta. Validating quantum computers using randomized model circuits. Phys. Rev. A, 100:032328, September 2019. doi:10.1103/PhysRevA.100.032328. URL https://link.aps.org/doi/10.1103/PhysRevA.100.032328.

[39] Alexandru Gheorghiu, Theodoros Kapourniotis, and Elham Kashefi. Verification of quantum computation: An overview of existing approaches. Theor. Comp. Sys., 63(4):715–808, May 2019. ISSN 1432-4350. doi:10.1007/s00224-018-9872-3. URL https://doi.org/10.1007/s00224-018-9872-3.

[40] Urmila Mahadev. Classical verification of quantum computations. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 259–267. IEEE, 2018. URL http://ieee-focs.org/FOCS-2018-Papers/pdfs/59f259.pdf.

[41] Wojciech Kozlowski and Stephanie Wehner. Towards large-scale quantum networks. In Proceedings of the Sixth Annual ACM International Conference on Nanoscale Computing and Communication, NANOCOM ’19, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450368971. doi:10.1145/3345312.3345497. URL https://doi.org/10.1145/3345312.3345497.

[42] C. Greganti, T. F. Demarie, M. Ringbauer, J. A. Jones, V. Saggio, I. A. Calafell, L. A. Rozema, A. Erhard, M. Meth, L. Postler, R. Stricker, P. Schindler, R. Blatt, T. Monz, P. Walther, and J. F. Fitzsimons. Cross-verification of independent quantum devices. 2019. URL https://arxiv.org/abs/1905.09790.

[43] Daniel Mills, Anna Pappa, Theodoros Kapourniotis, and Elham Kashefi. Information theoretically secure hypothesis test for temporally unstructured quantum computation (extended abstract). Electronic Proceedings in Theoretical Computer Science, 266:209–221, February 2018. ISSN 2075-2180. doi:10.4204/eptcs.266.14. URL http://dx.doi.org/10.4204/EPTCS.266.14.

[44] Scott Aaronson and Lijie Chen. Complexity-theoretic foundations of quantum supremacy experiments. 2016. URL https://arxiv.org/abs/1612.05903.

[45] Nathan Wiebe, Christopher Granade, and D G Cory. Quantum bootstrapping via compressed quantum hamiltonian learning. New Journal of Physics, 17(2):022005, February 2015. doi:10.1088/1367-2630/17/2/022005. URL https://doi.org/10.1088%2F1367-2630%2F17%2F2%2F022005.

[46] Dan Shepherd and Michael J. Bremner. Temporally unstructured quantum computation. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 465(2105):1413–1439, 2009. doi:10.1098/rspa.2008.0443. URL https://royalsocietypublishing.org/doi/abs/10.1098/rspa.2008.0443.

[47] Michael J. Bremner, Richard Jozsa, and Dan J. Shepherd. Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 467(2126):459–472, 2011. doi:10.1098/rspa.2010.0301. URL https://royalsocietypublishing.org/doi/abs/10.1098/rspa.2010.0301.

[48] Michael J. Bremner, Ashley Montanaro, and Dan J. Shepherd. Average-case complexity versus approximate simulation of commuting quantum computations. Phys. Rev. Lett., 117:080501, August 2016. doi:10.1103/PhysRevLett.117.080501. URL https://link.aps.org/doi/10.1103/PhysRevLett.117.080501.

[49] Michael J. Bremner, Ashley Montanaro, and Dan J. Shepherd. Achieving quantum supremacy with sparse and noisy commuting quantum computations. Quantum, 1:8, April 2017. ISSN 2521-327X. doi:10.22331/q-2017-04-25-8. URL https://doi.org/10.22331/q-2017-04-25-8.

[50] Richard M. Karp and Richard J. Lipton. Some connections between nonuniform and uniform complexity classes. In Proceedings of the Twelfth Annual ACM Symposium on Theory of Computing, STOC ’80, pages 302–309, New York, NY, USA, 1980. Association for Computing Machinery. ISBN 0897910176. doi:10.1145/800141.804678. URL https://doi.org/10.1145/800141.804678.

[51] J. Misra and David Gries. A constructive proof of vizing’s theorem. Information Processing Letters, 41(3):131–133, 1992. ISSN 0020-0190. doi:10.1016/0020-0190(92)90041-S. URL http://www.sciencedirect.com/science/article/pii/002001909290041S.

[52] Juan Bermejo-Vega, Dominik Hangleiter, Martin Schwarz, Robert Raussendorf, and Jens Eisert. Architectures for quantum simulation showing a quantum speedup. Phys. Rev. X, 8:021010, April 2018. doi:10.1103/PhysRevX.8.021010. URL https://link.aps.org/doi/10.1103/PhysRevX.8.021010.

[53] D Hangleiter, M Kliesch, M Schwarz, and J Eisert. Direct certification of a class of quantum simulations. Quantum Science and Technology, 2(1):015004, February 2017. doi:10.1088/2058-9565/2/1/015004. URL https://doi.org/10.1088%2F2058-9565%2F2%2F1%2F015004.

[54] Adam Bouland, Bill Fefferman, Chinmay Nirkhe, and Umesh Vazirani. Quantum supremacy and the complexity of random circuit sampling. 2018. URL https://arxiv.org/abs/1803.04402.

[55] Ramis Movassagh. Efficient unitary paths and quantum computational supremacy: A proof of average-case hardness of random circuit sampling. 2018. URL https://arxiv.org/abs/1810.04681.

[56] Michael A. Nielsen and Isaac L. Chuang. Quantum computation and quantum information: 10th anniversary edition, 2010.

[57] Alexander Cowtan, Silas Dilkes, Ross Duncan, Will Simmons, and Seyon Sivarajah. Phase gadget synthesis for shallow circuits. Electronic Proceedings in Theoretical Computer Science, 318:214–229, April 2020. ISSN 2075-2180. doi:10.4204/eptcs.318.13. URL http://dx.doi.org/10.4204/EPTCS.318.13.

[58] Panagiotis Kl. Barkoutsos, Jerome F. Gonthier, Igor Sokolov, Nikolaj Moll, Gian Salis, Andreas Fuhrer, Marc Ganzhorn, Daniel J. Egger, Matthias Troyer, Antonio Mezzacapo, Stefan Filipp, and Ivano Tavernelli. Quantum algorithms for electronic structure calculations: Particle-hole hamiltonian and optimized wave-function expansions. Phys. Rev. A, 98:022322, August 2018. doi:10.1103/PhysRevA.98.022322. URL https://link.aps.org/doi/10.1103/PhysRevA.98.022322.

[59] C. E. Porter and R. G. Thomas. Fluctuations of nuclear reaction widths. Phys. Rev., 104:483–491, October 1956. doi:10.1103/PhysRev.104.483. URL https://link.aps.org/doi/10.1103/PhysRev.104.483.

[60] C. Neill, P. Roushan, K. Kechedzhi, S. Boixo, et al. A blueprint for demonstrating quantum supremacy with superconducting qubits. Science, 360(6385):195–199, 2018. ISSN 0036-8075. doi:10.1126/science.aao4309. URL https://science.sciencemag.org/content/360/6385/195.

[61] Sergio Boixo, Vadim N. Smelyanskiy, and Hartmut Neven. Fourier analysis of sampling from noisy chaotic quantum circuits. 2017. URL https://arxiv.org/abs/1708.01875.

[62] Scott Aaronson and Alex Arkhipov. The computational complexity of linear optics. In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, STOC ’11, pages 333–342, New York, NY, USA, 2011. Association for Computing Machinery. ISBN 9781450306911. doi:10.1145/1993636.1993682. URL https://doi.org/10.1145/1993636.1993682.

[63] Edwin Pednault, John A. Gunnels, Giacomo Nannicini, Lior Horesh, and Robert Wisnieff. Leveraging secondary storage to simulate deep 54-qubit sycamore circuits. 2019. URL https://arxiv.org/abs/1910.09534.

[64] Benjamin Villalonga, Sergio Boixo, Bron Nelson, Christopher Henze, Eleanor Rieffel, Rupak Biswas, and Salvatore Mandrà. A flexible high-performance simulator for verifying and benchmarking quantum circuits implemented on real hardware. npj Quantum Information, 5(1):86, 2019. ISSN 2056-6387. doi:10.1038/s41534-019-0196-1. URL https://doi.org/10.1038/s41534-019-0196-1.

[65] Benjamin Villalonga, Dmitry Lyakh, Sergio Boixo, Hartmut Neven, Travis S Humble, Rupak Biswas, Eleanor G Rieffel, Alan Ho, and Salvatore Mandrà. Establishing the quantum supremacy frontier with a 281 pflop/s simulation. Quantum Science and Technology, 5(3):034003, April 2020. doi:10.1088/2058-9565/ab7eeb. URL https://doi.org/10.1088%2F2058-9565%2Fab7eeb.

[66] pytket documentation. URL https://cqcl.github.io/pytket/build/html/index.html.

[67] qiskit documentation. URL https://qiskit.org/documentation/.

[68] Sumeet Khatri, Ryan LaRose, Alexander Poremba, Lukasz Cincio, Andrew T. Sornborger, and Patrick J. Coles. Quantum-assisted quantum compiling. Quantum, 3:140, May 2019. ISSN 2521-327X. doi:10.22331/q-2019-05-13-140. URL https://doi.org/10.22331/q-2019-05-13-140.

[69] Ali JavadiAbhari, Shruti Patil, Daniel Kudrow, Jeff Heckey, Alexey Lvov, Frederic T. Chong, and Margaret Martonosi. Scaffcc: A framework for compilation and analysis of quantum computing programs. In Proceedings of the 11th ACM Conference on Computing Frontiers, CF ’14, New York, NY, USA, 2014. Association for Computing Machinery. ISBN 9781450328708. doi:10.1145/2597917.2597939. URL https://doi.org/10.1145/2597917.2597939.

[70] Aram W. Harrow, Benjamin Recht, and Isaac L. Chuang. Efficient discrete approximations of quantum gates. Journal of Mathematical Physics, 43(9):4445–4451, 2002. doi:10.1063/1.1495899. URL https://doi.org/10.1063/1.1495899.

[71] Joel J. Wallman and Joseph Emerson. Noise tailoring for scalable quantum computation via randomized compiling. Phys. Rev. A, 94:052325, November 2016. doi:10.1103/PhysRevA.94.052325. URL https://link.aps.org/doi/10.1103/PhysRevA.94.052325.

[72] Jeff Heckey, Shruti Patil, Ali JavadiAbhari, Adam Holmes, Daniel Kudrow, Kenneth R. Brown, Diana Franklin, Frederic T. Chong, and Margaret Martonosi. Compiler management of communication and parallelism for quantum computation. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’15, pages 445–456, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450328357. doi:10.1145/2694344.2694357. URL https://doi.org/10.1145/2694344.2694357.

[73] Prakash Murali, Jonathan M. Baker, Ali Javadi-Abhari, Frederic T. Chong, and Margaret Martonosi. Noise-adaptive compiler mappings for noisy intermediate-scale quantum computers. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’19, pages 1015–1029, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450362405. doi:10.1145/3297858.3304075. URL https://doi.org/10.1145/3297858.3304075.

[74] Ali J Abhari, Arvin Faruque, Mohammad J Dousti, Lukas Svec, Oana Catu, Amlan Chakrabati, Chen-Fu Chiang, Seth Vanderwilt, John Black, and Fred Chong. Scaffold: Quantum programming language. Technical report, PRINCETON UNIV NJ DEPT OF COMPUTER SCIENCE, 2012. URL https://www.cs.princeton.edu/research/techreps/TR-934-12.

[75] Alexander J McCaskey, Dmitry I Lyakh, Eugene F Dumitrescu, Sarah S Powers, and Travis S Humble. XACC: a system-level software infrastructure for heterogeneous quantum–classical computing. Quantum Science and Technology, 5(2):024002, February 2020. doi:10.1088/2058-9565/ab6bf6. URL https://doi.org/10.1088%2F2058-9565%2Fab6bf6.

[76] Alexander Cowtan, Silas Dilkes, Ross Duncan, Alexandre Krajenbrink, Will Simmons, and Seyon Sivarajah. On the Qubit Routing Problem. In Wim van Dam and Laura Mancinska, editors, 14th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2019), volume 135 of Leibniz International Proceedings in Informatics (LIPIcs), pages 5:1–5:32, Dagstuhl, Germany, 2019. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. ISBN 978-3-95977-112-2. doi:10.4230/LIPIcs.TQC.2019.5. URL http://drops.dagstuhl.de/opus/volltexte/2019/10397.

[77] Prakash Murali, David C. Mckay, Margaret Martonosi, and Ali Javadi-Abhari. Software mitigation of crosstalk on noisy intermediate-scale quantum computers. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’20, pages 1001–1016, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450371025. doi:10.1145/3373376.3378477. URL https://doi.org/10.1145/3373376.3378477.

[78] IBM Quantum Experience User Guide. Advanced single-qubit gates, 2019. URL https://quantum-computing.ibm.com/support/guides/user-guide.

[79] Christopher Chamberland, Guanyu Zhu, Theodore J. Yoder, Jared B. Hertzberg, and Andrew W. Cross. Topological and subsystem codes on low-degree graphs with flag qubits. Phys. Rev. X, 10:011022, January 2020. doi:10.1103/PhysRevX.10.011022. URL https://link.aps.org/doi/10.1103/PhysRevX.10.011022.

[80] Jerry Chow and Jay Gambetta. Quantum takes flight: Moving from laboratory demonstrations to building systems, 2020. URL https://www.ibm.com/blogs/research/2020/01/quantum-volume-32/.

[81] Iskren Vankov, Daniel Mills, Petros Wallden, and Elham Kashefi. Methods for classically simulating noisy networked quantum architectures. Quantum Science and Technology, 5(1):014001, November 2019. doi:10.1088/2058-9565/ab54a4. URL https://doi.org/10.1088%2F2058-9565%2Fab54a4.

[82] J. M. Pino, J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, M. S. Allman, C. H. Baldwin, M. Foss-Feig, D. Hayes, K. Mayer, C. Ryan-Anderson, and B. Neyenhuis. Demonstration of the qccd trapped-ion quantum computer architecture, 2020. URL https://arxiv.org/abs/2003.01293.

[83] Joseph Emerson, Yaakov S. Weinstein, Marcos Saraceno, Seth Lloyd, and David G. Cory. Pseudo-random unitary operators for quantum information processing. Science, 302(5653):2098–2100, 2003. ISSN 0036-8075. doi:10.1126/science.1090790. URL https://science.sciencemag.org/content/302/5653/2098.

[84] Joseph Emerson, Etera Livine, and Seth Lloyd. Convergence conditions for random quantum circuits. Phys. Rev. A, 72:060302, December 2005. doi:10.1103/PhysRevA.72.060302. URL https://link.aps.org/doi/10.1103/PhysRevA.72.060302.

[85] Fernando G. S. L. Brandão, Aram W. Harrow, and Michał Horodecki. Local random quantum circuits are approximate polynomial-designs. Communications in Mathematical Physics, 346(2):397–434, September 2016. ISSN 1432-0916. doi:10.1007/s00220-016-2706-8. URL https://doi.org/10.1007/s00220-016-2706-8.

[86] Andrew Fagan and Ross Duncan. Optimising clifford circuits with quantomatic. Electronic Proceedings in Theoretical Computer Science, 287:85–105, January 2019. ISSN 2075-2180. doi:10.4204/eptcs.287.5. URL http://dx.doi.org/10.4204/EPTCS.287.5.

[87] M Blaauboer and R L de Visser. An analytical decomposition protocol for optimal implementation of two-qubit entangling gates. Journal of Physics A: Mathematical and Theoretical, 41(39):395307, September 2008. doi:10.1088/1751-8113/41/39/395307. URL https://doi.org/10.1088%2F1751-8113%2F41%2F39%2F395307.

[88] Easwar Magesan and Jay M. Gambetta. Effective hamiltonian models of the cross-resonance gate. Phys. Rev. A, 101:052308, May 2020. doi:10.1103/PhysRevA.101.052308. URL https://link.aps.org/doi/10.1103/PhysRevA.101.052308.

[89] Timothy Proctor, Kenneth Rudinger, Kevin Young, Mohan Sarovar, and Robin Blume-Kohout. What randomized benchmarking actually measures. Phys. Rev. Lett., 119:130502, September 2017. doi:10.1103/PhysRevLett.119.130502. URL https://link.aps.org/doi/10.1103/PhysRevLett.119.130502.

[90] E. Knill, D. Leibfried, R. Reichle, J. Britton, R. B. Blakestad, J. D. Jost, C. Langer, R. Ozeri, S. Seidelin, and D. J. Wineland. Randomized benchmarking of quantum gates. Phys. Rev. A, 77:012307, January 2008. doi:10.1103/PhysRevA.77.012307. URL https://link.aps.org/doi/10.1103/PhysRevA.77.012307.

[91] Arnaud Carignan-Dugas, Kristine Boone, Joel J Wallman, and Joseph Emerson. From randomized benchmarking experiments to gate-set circuit fidelity: how to interpret randomized benchmarking decay parameters. New Journal of Physics, 20(9):092001, September 2018. doi:10.1088/1367-2630/aadcc7. URL https://doi.org/10.1088%2F1367-2630%2Faadcc7.

[92] Timothy J. Proctor, Arnaud Carignan-Dugas, Kenneth Rudinger, Erik Nielsen, Robin Blume-Kohout, and Kevin Young. Direct randomized benchmarking for multiqubit devices. Phys. Rev. Lett., 123:030503, July 2019. doi:10.1103/PhysRevLett.123.030503. URL https://link.aps.org/doi/10.1103/PhysRevLett.123.030503.

[93] Jay M. Gambetta, A. D. Córcoles, S. T. Merkel, B. R. Johnson, John A. Smolin, Jerry M. Chow, Colm A. Ryan, Chad Rigetti, S. Poletto, Thomas A. Ohki, Mark B. Ketchen, and M. Steffen. Characterization of addressability by simultaneous randomized benchmarking. Phys. Rev. Lett., 109:240504, December 2012. doi:10.1103/PhysRevLett.109.240504. URL https://link.aps.org/doi/10.1103/PhysRevLett.109.240504.

[94] David C. McKay, Andrew W. Cross, Christopher J. Wood, and Jay M. Gambetta. Correlated randomized benchmarking, 2020. URL https://arxiv.org/abs/2003.02354.

[95] Easwar Magesan, J. M. Gambetta, and Joseph Emerson. Scalable and robust randomized benchmarking of quantum processes. Phys. Rev. Lett., 106:180504, May 2011. doi:10.1103/PhysRevLett.106.180504. URL https://link.aps.org/doi/10.1103/PhysRevLett.106.180504.

[96] Easwar Magesan, Jay M. Gambetta, and Joseph Emerson. Characterizing quantum gates via randomized benchmarking. Phys. Rev. A, 85:042311, April 2012. doi:10.1103/PhysRevA.85.042311. URL https://link.aps.org/doi/10.1103/PhysRevA.85.042311.

[97] A. D. Córcoles, Jay M. Gambetta, Jerry M. Chow, John A. Smolin, Matthew Ware, Joel Strand, B. L. T. Plourde, and M. Steffen. Process verification of two-qubit quantum gates by randomized benchmarking. Phys. Rev. A, 87:030301, March 2013. doi:10.1103/PhysRevA.87.030301. URL https://link.aps.org/doi/10.1103/PhysRevA.87.030301.

[98] David C. McKay, Sarah Sheldon, John A. Smolin, Jerry M. Chow, and Jay M. Gambetta. Three-qubit randomized benchmarking. Phys. Rev. Lett., 122:200502, May 2019. doi:10.1103/PhysRevLett.122.200502. URL https://link.aps.org/doi/10.1103/PhysRevLett.122.200502.


A Exponential Distribution

The exponential distribution, with rate λ, is a probability distribution with the probability density function

$$\Pr(x) = \lambda e^{-\lambda x}.$$

This is the distribution of waiting times between events in a Poisson process. We are concerned with showing that the output probabilities of the circuit classes considered here are exponentially distributed. Such a property is a signature of quantum chaos, and indicates that a class of circuits is approximately Haar random [29, 83, 84]. It also allows for the calculation of both the ideal value of the cross-entropy discussed in Section 3.2, and the ideal heavy output probability discussed in Section 3.1. This in turn allows us to fully exploit Cross-Entropy Benchmarking and Heavy Output Generation Benchmarking. Here we argue numerically which of the circuit classes introduced in Section 2 generate output probabilities of this form (see footnote 19), and discuss the implications when they do not.
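As a worked illustration of the last point, the ideal heavy output probability can be derived directly from the exponential form. The sketch below assumes the rate $\lambda = 2^n$ for an $n$-qubit circuit, works in the limit of many outputs, and assumes that heavy outputs are those whose ideal probability exceeds the median, in line with Section 3.1. The symbols $N$, $p_{\mathrm{med}}$ and $h$ are introduced here only for illustration.

```latex
\begin{align}
N &= 2^n, \qquad \Pr(p) = N e^{-N p}, \\
\int_0^{p_{\mathrm{med}}} N e^{-N p}\,\mathrm{d}p &= \tfrac{1}{2}
  \;\Longrightarrow\; p_{\mathrm{med}} = \frac{\ln 2}{N}, \\
h &= N \int_{p_{\mathrm{med}}}^{\infty} p \, N e^{-N p}\,\mathrm{d}p
   = \int_{\ln 2}^{\infty} x e^{-x}\,\mathrm{d}x
   = \frac{1 + \ln 2}{2} \approx 0.85 .
\end{align}
```

The factor $N$ in the last line counts the outputs, and the substitution $x = N p$ removes the dependence on the number of qubits.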

We also demonstrate why the circuit depths used in Section 2 are necessary to generate output probabilities of this form. To do this we generate 100 circuits of each type and number of layers, where a layer is as defined in the respective Algorithms of Section 2. We then calculate the ideal output probabilities using classical simulation and compare the distribution of these output probabilities to the exponential distribution. In the case of square circuits and deep circuits, the distribution of output probabilities approximates the exponential distribution more closely, as measured by the $\ell_1$-norm distance between the two, as the number of layers increases. We can use this to isolate the number of layers at which the distance approaches its minimum.
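The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to produce the figures: Haar-random states stand in for the classically simulated circuits, the output probabilities of all circuits are pooled, and the $\ell_1$-norm distance is estimated from a histogram. The function names, the number of bins and the histogram range are all hypothetical choices.

```python
import numpy as np


def haar_random_probabilities(n_qubits, rng):
    """Output probabilities of a Haar-random state (a stand-in for one ideal simulated circuit)."""
    dim = 2 ** n_qubits
    amplitudes = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    probs = np.abs(amplitudes) ** 2
    return probs / probs.sum()


def l1_distance_to_exponential(probs, n_qubits, bins=50):
    """Histogram estimate of the l1 distance between the empirical distribution
    of output probabilities and the exponential density 2^n * exp(-2^n * x)."""
    dim = 2 ** n_qubits
    # Assumed histogram range: wide enough to hold every sample and most of the exponential mass.
    upper = max(probs.max(), 10.0 / dim)
    edges = np.linspace(0.0, upper, bins + 1)
    counts, _ = np.histogram(probs, bins=edges)
    empirical_mass = counts / counts.sum()
    # Exact exponential mass in each bin, from the cumulative distribution function.
    cdf = 1.0 - np.exp(-dim * edges)
    exponential_mass = np.diff(cdf)
    # Exponential mass beyond the last bin edge has no empirical counterpart.
    tail_mass = 1.0 - cdf[-1]
    return np.abs(empirical_mass - exponential_mass).sum() + tail_mass
```

Pooling the probabilities of 100 such states and passing them to `l1_distance_to_exponential` gives a small distance, while a flat distribution, such as that produced by a single layer of Hadamard gates, gives a distance close to the maximum value of 2.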

A.1 Square Circuits

The exponential form of the distribution of the output probabilities from random circuits similar to square circuits has been established [29, 44]. As the procedure we use to generate square circuits, seen in Algorithm 2, differs slightly from that used for other similar random circuits [29, 38, 44], we explore the distribution of its output probabilities here.

19 This numerical approach to demonstrating properties of distributions of output probabilities from particular circuit classes parallels that taken in other work on benchmarking [29, 44, 52, 62].

The relevant results are seen in Figure 12. In particular, it can be seen from Figure 12b that the minimum value of the $\ell_1$-norm distance between the distribution of output probabilities and the exponential distribution is approached at a number of layers equal to the number of qubits, justifying our choice of layer numbers in Algorithm 2. It may be that asymptotically the number of layers required is sub-linear [29], although for the circuit sizes used here a linear growth in depth is appropriate. Figure 12a illustrates the closeness of fit of the two distributions.

A.2 Deep Circuits

Unlike with square circuits, there is no precedent for utilising deep circuits to generate exponentially distributed output probabilities, as we do here. This allows us to use deep circuits as a uniquely insightful benchmark of the performance of quantum computing stacks, grounded both in the theoretical results of Section 3, and in pertinent applications.

The relevant results are seen in Figure 13. In particular, it can be seen from Figure 13b that the minimum value of the $\ell_1$-norm distance between the distribution of output probabilities and the exponential distribution is approached at a number of layers equal to three times the number of qubits, plus one, justifying our choice of layer numbers in Algorithm 3. Figure 13a illustrates the closeness of fit of the two distributions.

The depth required to achieve an exponential distribution of output probabilities with deep circuits is greater than is the case for square circuits. Indeed, random circuits were initially introduced as the shallowest circuits required to generate such output probabilities [29]. This sacrifice in depth is made to achieve a benchmark which is uniquely application-motivated, as discussed in Section 2.

Figure 12: Exponential distribution fitting data for square circuits. (a) The distribution of output probabilities from a circuit C, where C is a 5 qubit circuit, from the square circuits class as defined in Algorithm 2 (horizontal axis: $p_C$; vertical axis: probability density). (b) The $\ell_1$-norm distance between the distribution of output probabilities of square circuits and the exponential distribution $2^n e^{-2^n x}$, where n is the number of qubits, plotted against the number of layers. A layer is defined as in Algorithm 2. One curve is shown for each of 2, 3, 4 and 5 qubits.

Figure 13: Exponential distribution fitting data for deep circuits. (a) The distribution of output probabilities from a circuit C, where C is a 5 qubit circuit, from the deep circuits class as defined in Algorithm 3 (horizontal axis: $p_C$; vertical axis: probability density). (b) The $\ell_1$-norm distance between the distribution of output probabilities of deep circuits and the exponential distribution $2^n e^{-2^n x}$, where n is the number of qubits, plotted against the number of layers. A layer is defined as in Algorithm 3. One curve is shown for each of 2, 3, 4 and 5 qubits.

A.3 Shallow Circuits

Unlike in the case of square circuits and deep circuits, the output probabilities of shallow circuits are not exponentially distributed. This is unsurprising since random circuits with this limited connectivity are thought to require at least depth $O(\sqrt{n})$ to create such a feature [52, 54, 85]. This has the unfortunate side effect that the results of Section 3.2 do not apply, and so Cross-Entropy Benchmarking cannot be used.

While the predictions made about the ideal heavy output probability, as discussed in Section 3.1, also do not apply, a study of the heavy output probability is still of interest. In particular, while we cannot connect the benchmark to the HOG problem of Problem 1, we can compare the observed probability of generating heavy outputs to the ideal probability of producing heavy outputs, as calculated by classical simulation.
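The comparison just described can be sketched as follows. This is a minimal illustration, assuming that heavy outputs are those whose ideal probability is strictly greater than the median ideal probability, in line with Section 3.1, and that measured bitstrings are encoded as integer indices into the vector of ideal probabilities. The function names are hypothetical.

```python
import numpy as np


def heavy_outputs(ideal_probs):
    """Indices of outputs whose ideal probability exceeds the median (assumed definition)."""
    ideal_probs = np.asarray(ideal_probs)
    return np.flatnonzero(ideal_probs > np.median(ideal_probs))


def ideal_heavy_output_probability(ideal_probs):
    """Total ideal probability carried by the heavy outputs, from classical simulation."""
    ideal_probs = np.asarray(ideal_probs)
    return float(ideal_probs[heavy_outputs(ideal_probs)].sum())


def observed_heavy_output_probability(ideal_probs, samples):
    """Fraction of measured samples (integer-encoded bitstrings) that are heavy outputs."""
    heavy = set(heavy_outputs(ideal_probs).tolist())
    return sum(1 for s in samples if s in heavy) / len(samples)
```

The gap between the two returned values then indicates how far the device output has drifted from the ideal distribution, without relying on the exponential form.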

B Compilation Strategies

This section details the compilation strategies ex-plored in each of our experiments. For the circuitfamilies and figures of merit investigated here, thecompilation strategies we used were designed andempirically confirmed to perform well at the com-pilation tasks at hand. The version of each packageused are listed in Table 2.

noise-unaware pytket and noise-aware pytket

The noise-unaware pytket and noise-aware pytket compilation strategies are generated using Algorithm 4. noise-unaware pytket is generated by passing False as input to Algorithm 4, and noise-aware pytket by passing True.

Of particular interest are the following functions:

OptimiseCliffords: Simplifies Clifford gate sequences [86].

KAKDecomposition: Identifies two-qubit subcircuits with more than 3 CXs and reduces them via the KAK/Cartan decomposition [87].

route: Modifies the circuit to satisfy the architectural constraints [76]. This will introduce SWAP gates.

noise_aware_placement: Selects initial qubit placement taking into account reported device gate error rates [10].

line_placement: Attempts to place qubits next to those they interact with in the first few time slices. This does not take device error rates into account.

Algorithm 4 pytket compilation strategies. The passes listed here are named as in the documentation for pytket [66], where additional detail on their actions can be found.

Input: noise_aware ∈ {True, False}

OptimiseCliffords
KAKDecomposition

RebaseToRzRx ▷ Convert to IBM gate set
CommuteRzRxThroughCX

if noise_aware then
    noise_aware_placement
else
    line_placement
end if

route
decompose_SWAP_to_CX
redirect_CX_gates ▷ Orientate CX to coupling map

OptimisePostRouting ▷ Optimisation preserving placement and orientation
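As an illustration only, the ordering in Algorithm 4 can be written down as plain data. The sketch below does not call pytket; the function name `pytket_pass_sequence` is hypothetical, and the strings simply repeat the pass names listed in Algorithm 4.

```python
from typing import List


def pytket_pass_sequence(noise_aware: bool) -> List[str]:
    """Return the pass names of Algorithm 4, in order, as plain strings.

    Hypothetical helper for illustration; it does not invoke pytket.
    """
    placement = "noise_aware_placement" if noise_aware else "line_placement"
    return [
        "OptimiseCliffords",
        "KAKDecomposition",
        "RebaseToRzRx",          # convert to IBM gate set
        "CommuteRzRxThroughCX",
        placement,               # the only step that depends on the input flag
        "route",
        "decompose_SWAP_to_CX",
        "redirect_CX_gates",     # orientate CX to coupling map
        "OptimisePostRouting",   # preserves placement and orientation
    ]


unaware = pytket_pass_sequence(False)
aware = pytket_pass_sequence(True)
# Positions at which the two strategies differ.
differing = [i for i, (a, b) in enumerate(zip(unaware, aware)) if a != b]
print(differing)  # [4]
```

Written this way, it is apparent that the two pytket strategies share every optimisation and routing step and differ only in how the initial placement is chosen.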

noise-unaware Qiskit and noise-aware Qiskit

The noise-unaware Qiskit and noise-aware Qiskit compilation strategies, as defined in Algorithm 5, are heavily inspired by level_3_passmanager, a preconfigured compilation strategy made available in Qiskit. noise-unaware Qiskit is generated by passing noise_aware as False in Algorithm 5, and noise-aware Qiskit by passing True.

Where possible we passed stochastic as True in order to use StochasticSwap instead of BasicSwap during the swap mapping pass. In general, StochasticSwap generates circuits with lower depth; however, for the versions listed in Table 2, it proved faulty for some circuit sizes and device coupling maps used in this work. StochasticSwap may also result in repeated measurement of the same qubit, which cannot be implemented. Repeated compilation attempts may therefore be necessary, and if this fails the circuit is not included in the plots of Section 5.

Package           Version
Qiskit [1, 67]    0.12.0
pytket [10, 66]   0.3.0

Table 2: Packages used in this work, and their corresponding versions.

Of particular note are the following functions:

NoiseAdaptiveLayout: Selects initial qubit placement based on minimising readout error rates [73].

DenseLayout: Chooses placement by finding the most connected subset of qubits.

Unroller: Decomposes unitary operation to desired gate set.

StochasticSwap: Adds SWAP gates to adhere to coupling map using a randomised algorithm.

BasicSwap: Produces a circuit adhering to coupling map using a simple rule: CX gates in the circuit which are not supported by the hardware are preceded with necessary SWAP gates.

only pytket routing

In this case we perform, in the order as listed, the pytket operations: route, decompose_SWAP_to_CX, and redirect_CX_gates. We then account for the architecture gate set, without any further optimisation.

C Device Data

Two device properties leveraged by our compilation strategies are the coupling maps, describing the connectivity of the qubits and in which directions CX gates can be performed, and the calibration information, describing the noise levels of the device. These properties, and device noise levels in particular, are considered valuable benchmarks of the performance of the device in their own right.

These properties are collectively influential in noise-aware compiling, as detailed in Appendix B. There, circuits are compiled to adhere to the device’s coupling map, while also aiming to minimise some function of the calibration information. Because holistic benchmarking of the full quantum computing stack encompasses the circuit compilation strategies, it provides a novel way of using device information to benchmark an entire system, instead of simply the physical qubits which comprise it.

Algorithm 5 Qiskit compilation strategies. The passes listed here are named as in the documentation for Qiskit [67], where additional detail on their actions can be found.

Input: noise_aware ∈ {True, False}, stochastic ∈ {True, False}

Unroller

if noise_aware then
    NoiseAdaptiveLayout
else
    DenseLayout
end if
AncillaAllocation ▷ Assign idle qubits as ancillas

if stochastic then
    StochasticSwap
else
    BasicSwap
end if

Decompose(SwapGate) ▷ Decompose SWAP to CX
CXDirection ▷ Orientate CX to coupling map

▷ Gather 2 qubit blocks
Collect2qBlocks
ConsolidateBlocks

Unroller ▷ Unroll two-qubit blocks
Optimize1qGates ▷ Combine chains of one-qubit gates
CXDirection
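The two inputs of Algorithm 5, together with the repeated compilation attempts described in Appendix B, can be sketched as follows. This is a minimal illustration under stated assumptions: it does not call Qiskit, the names `qiskit_pass_sequence` and `compile_with_retries` are hypothetical, and the pass names are copied from Algorithm 5 rather than from any particular Qiskit release.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def qiskit_pass_sequence(noise_aware: bool, stochastic: bool) -> List[str]:
    """Return the pass names of Algorithm 5, in order, as plain strings."""
    layout = "NoiseAdaptiveLayout" if noise_aware else "DenseLayout"
    swap = "StochasticSwap" if stochastic else "BasicSwap"
    return [
        "Unroller",
        layout,
        "AncillaAllocation",     # assign idle qubits as ancillas
        swap,
        "Decompose(SwapGate)",   # decompose SWAP to CX
        "CXDirection",           # orientate CX to coupling map
        "Collect2qBlocks",
        "ConsolidateBlocks",
        "Unroller",              # unroll two-qubit blocks
        "Optimize1qGates",       # combine chains of one-qubit gates
        "CXDirection",
    ]


def compile_with_retries(compile_once: Callable[[], T], attempts: int) -> Optional[T]:
    """Call a (hypothetical) compilation routine up to `attempts` times.

    Returns the first successful result, or None if every attempt raises,
    in which case the circuit would be left out of the results.
    """
    for _ in range(attempts):
        try:
            return compile_once()
        except Exception:
            continue  # a randomised pass may succeed on a later attempt
    return None
```

Here `compile_once` stands in for whatever routine runs the chosen pass sequence on a circuit; a `None` result corresponds to a circuit that is excluded from the plots.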

C.1 Device Coupling Maps

A coupling map of a device is a graphical representation of how two-qubit gates can be applied across the device. In this representation, each qubit is represented by a vertex, with directed edges joining qubits between which a two-qubit gate can be applied. For the devices considered here, this two-qubit gate is a CX gate, implemented using the cross-resonance interaction of transmon qubits [88]. The direction of the edge is from the control to the target qubit of the CX gate, with bi-directional edges indicating that both qubits can be used as either the control or target. The coupling maps of the devices investigated in this work are shown in Figure 14. For those devices all edges are bi-directional, although this is not typical when the asymmetric CX is employed.

As discussed in Section 4, a trade-off exists between the connectivity of the device and the number of two-qubit gates necessary to implement a given circuit. More highly connected coupling maps typically require fewer two-qubit gates to implement a fixed unitary than less connected ones, owing to the reduced need for SWAP gates to account for discrepancies between the coupling maps of the uncompiled circuit and the device. While this reduced depth can reduce the impact of time-based noise channels, this is counterbalanced by the higher levels of cross-talk experienced by qubits corresponding to vertices with high degree in the device’s coupling map [79].
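To make the link between connectivity and SWAP overhead concrete, the following sketch counts how many SWAPs are needed to bring two qubits next to each other on a given coupling map. It is an illustration under simplifying assumptions: edge direction is ignored, only a single two-qubit gate is considered, and the two example maps (a five-qubit line and a five-qubit star) are hypothetical rather than the devices of Figure 14.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def neighbours(edges: List[Edge]) -> Dict[int, Set[int]]:
    """Build an undirected adjacency map from (control, target) edges."""
    nbrs: Dict[int, Set[int]] = {}
    for control, target in edges:
        nbrs.setdefault(control, set()).add(target)
        nbrs.setdefault(target, set()).add(control)
    return nbrs


def swaps_needed(edges: List[Edge], a: int, b: int) -> int:
    """SWAPs needed to make qubits a and b adjacent: path length minus one."""
    nbrs = neighbours(edges)
    dist = {a: 0}
    queue = deque([a])
    while queue:  # breadth-first search gives shortest path lengths
        q = queue.popleft()
        if q == b:
            return max(dist[q] - 1, 0)
        for r in nbrs.get(q, ()):
            if r not in dist:
                dist[r] = dist[q] + 1
                queue.append(r)
    raise ValueError("qubits are not connected")


line = [(0, 1), (1, 2), (2, 3), (3, 4)]   # hypothetical 5-qubit line
star = [(2, 0), (2, 1), (2, 3), (2, 4)]   # hypothetical 5-qubit star, centre 2
print(swaps_needed(line, 0, 4))  # 3
print(swaps_needed(star, 0, 4))  # 1
```

The star needs fewer SWAPs, but its central qubit has degree four, which is the kind of high-degree vertex the text associates with increased cross-talk.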

C.2 Device Calibration Information

The noise-aware tools employed by the compilation strategies explored in this work consider three kinds of errors which can occur, namely: readout error, single-qubit gate error, and two-qubit gate error. For the devices provided through IBM Quantum, this information is contained in calibration data which is accessible using tools in the Qiskit library, and is updated twice daily. The experiments in this paper were conducted between 2020-01-29 and 2020-02-10 with the calibration data in Figure 15 and Figure 16 aggregated over this time period.

An assignment or readout error corresponds to an incorrect reading of the state of the qubit; for example, returning “0” when the proper label is “1”, or vice-versa. The probability of incorrectly labelling the qubit is called the readout error, denoted $\varepsilon_a$, and is calculated as

$$\varepsilon_a = \frac{\Pr(\text{“0”} \mid |1\rangle) + \Pr(\text{“1”} \mid |0\rangle)}{2}. \tag{11}$$

$\varepsilon_a$ is estimated by repeatedly preparing a qubit in a known state, immediately measuring it, and then counting the number of times the measurement returns the wrong label. This value, for the devices explored in this paper, is reported in Figure 15a.
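The estimate in Equation (11) amounts to averaging two mislabelling frequencies. A minimal sketch, assuming the wrong-label counts for each prepared state are already available (the function and argument names are hypothetical, and the numbers are made up for illustration):

```python
def readout_error(wrong_when_1: int, shots_1: int,
                  wrong_when_0: int, shots_0: int) -> float:
    """Estimate the readout error of Equation (11) from counts.

    wrong_when_1: times "0" was returned after preparing |1>
    wrong_when_0: times "1" was returned after preparing |0>
    """
    pr_0_given_1 = wrong_when_1 / shots_1
    pr_1_given_0 = wrong_when_0 / shots_0
    return (pr_0_given_1 + pr_1_given_0) / 2


# Illustrative numbers only: 30 wrong labels in 1000 shots of |1>,
# and 10 wrong labels in 1000 shots of |0>.
estimate = readout_error(30, 1000, 10, 1000)
print(round(estimate, 4))
```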

Errors affecting the gates of the device correspond to an incorrect operation applied by the device. There are many ways to quantify the effect of this error, with IBM Quantum’s devices reporting randomized benchmarking (RB) numbers [89, 90]. The RB number, $\varepsilon_C$, is estimated by running many self-inverting Clifford circuits, consisting of $m$ layers of gates drawn from the $n$-qubit Clifford group, inverted at layer $m + 1$. The survival probability, which is the probability the input state is unchanged, can then be estimated. Under a broad set of noise models and assumptions [89, 91], this survival probability can be shown to decay exponentially with $m$. Consequently, the decay can be estimated by fitting the survival probability to a curve of the form $A p^m + B$. The RB number is related to $p \in [0, 1]$, called the depolarisation/decay rate, by

$$\varepsilon_C = (1 - p)(1 - 1/D), \tag{12}$$

where $D = 2^n$, and $n$ is the number of qubits acted on by the Clifford gates. $\varepsilon_C$, which is also referred to as the error per Clifford of the device, is minimised at $p = 1$, in which case the survival probability is constant and set by the state preparation and measurement errors.
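The fit and Equation (12) can be sketched in a few lines. This is not the fitting procedure used by IBM Quantum; it is a simple stand-in that scans a grid of candidate values of $p$ and, for each, solves for $A$ and $B$ by ordinary least squares. The function names and the synthetic data are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def fit_decay(ms: Sequence[int], ys: Sequence[float],
              grid: int = 10000) -> Tuple[float, float, float]:
    """Least-squares fit of ys to A * p**m + B; returns (A, p, B)."""
    n = len(ms)
    y_mean = sum(ys) / n
    best = None
    for k in range(1, grid):            # candidate p strictly between 0 and 1
        p = k / grid
        xs = [p ** m for m in ms]
        x_mean = sum(xs) / n
        sxx = sum((x - x_mean) ** 2 for x in xs)
        if sxx < 1e-18:                 # p**m is numerically constant; skip
            continue
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        a = sxy / sxx                   # closed-form linear regression in x
        b = y_mean - a * x_mean
        sse = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, a, p, b)
    if best is None:
        raise ValueError("no usable candidate for p")
    return best[1], best[2], best[3]


def error_per_clifford(p: float, n_qubits: int) -> float:
    """Equation (12): (1 - p)(1 - 1/D) with D = 2**n_qubits."""
    d = 2 ** n_qubits
    return (1 - p) * (1 - 1 / d)


# Synthetic, noiseless survival probabilities with A = B = 0.5 and p = 0.98.
lengths = [1, 5, 10, 20, 50, 100, 200]
survival = [0.5 * 0.98 ** m + 0.5 for m in lengths]
A, p, B = fit_decay(lengths, survival)
print(round(p, 3), round(error_per_clifford(p, 1), 3))
```

With real data the survival probabilities are noisy averages over many random sequences, so a weighted nonlinear fit would normally be preferred over this grid scan.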

(a) ibmqx2 (b) ibmq_singapore (c) ibmq_16_melbourne (d) ibmq_ourense

Figure 14: Coupling maps of the devices studied in this work. Vertices, represented by blue circles, correspond to qubits, while edges are directed from the control to the target qubits of permitted two-qubit gates.

The Clifford gates necessary for RB must be compiled to the native gate set of the device. Using an estimate of $\varepsilon_C$, an estimate of the error per gate, $\varepsilon^g_G$, for a gate $G$, can be obtained by multiplying $\varepsilon_C$ by a factor related to the average number of uses of $G$ when implementing a random Clifford operation:

$$\varepsilon^g_G \sim \varepsilon_C \times \#\,\text{uses of } G \text{ per Clifford}. \tag{13}$$

Values for $\varepsilon^g_{U2}$, the error per gate for U2 gates, can be found in Figure 15b, and $\varepsilon^g_{CX}$, that for CX gates, in Figure 16. The commonly reported average fidelity for U3 gates is $1 - (1 - \varepsilon^g_{U2})^2$.

There are many variants of randomized benchmarking, such as direct RB [92], simultaneous RB [93], and correlated RB [94]. For details on the randomized benchmarking protocol used by IBM Quantum, see [93, 95–98].

The experiments necessary for cross-entropy benchmarking may themselves also be used to estimate a depolarisation rate in a similar way to RB [3]. Instead of using random Clifford circuits, however, the random circuits are run. Under the assumption that the action of a random circuit can be described using a depolarising error model (with equal-probability Pauli errors), the Pauli error, $\varepsilon_P$, can be estimated as

$$\varepsilon_P = (1 - p)(1 - 1/D^2). \tag{14}$$

Here, $p$ is the depolarisation rate of the survival probability under the action of random circuits, estimated as above. Interestingly, $\varepsilon_P$ can be estimated using single and two-qubit RB information.
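As a short worked comparison of Equations (12) and (14): if one assumes, purely for illustration, that the same decay rate $p$ appears in both expressions (which need not hold, since the two rates are extracted from different circuit ensembles), then factorising $1 - 1/D^2$ gives a direct relation between the two error measures.

```latex
\varepsilon_P
  = (1 - p)\left(1 - \frac{1}{D^2}\right)
  = (1 - p)\left(1 - \frac{1}{D}\right)\left(1 + \frac{1}{D}\right)
  = \left(1 + \frac{1}{D}\right)\varepsilon_C .
```

For a single qubit, $D = 2$, so under this assumption $\varepsilon_P = \tfrac{3}{2}\,\varepsilon_C$, and the two quantities coincide in the limit of large $D$.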

Several important noise channels, most notably cross-talk, are not included in the device calibration data. As shown in Section 5, the effects of this noise can be inferred through the application-motivated benchmarks we introduce in this work, by showing the trade-off between connectivity of the device and cross-talk [79].

D Empirical Relationship Between Heavy Output Generation Probability and L1 Distance

As discussed in Section 3.4, the theoretical foundations for believing that implementing shallow circuits to within a fixed $\ell_1$-norm distance constitutes a demonstration of quantum computational supremacy are stronger than for implementations with high heavy output generation probability. That being said, Figure 3 and Figure 11 contain similar features. For example, ibmq_16_melbourne consistently performs the worst, with ibmq_singapore and ibmq_ourense performing the best in both figures of merit. An interesting question, then, is how these two figures of merit generally relate to one another.

If the $\ell_1$-norm distance was 0, the experimental outcome frequencies would equal the ideal outcome probabilities. Consequently, the heavy output probabilities would be the same between the device and an ideal quantum computer. Because the heavy output probability depends on the circuit in question, when examining the empirical relationship between $\ell_1$-norm distance and heavy output probability, it is useful to normalize the latter by the heavy output probability of an ideally-implemented circuit. We define the normalised heavy output generation probability as the ratio of the heavy output probability of the device and the heavy output probability from an ideal quantum computer. Therefore if the $\ell_1$-norm distance was 0, the normalised heavy output generation probability would be 1.
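Both quantities compared in this appendix can be computed from the ideal outcome probabilities and the measured counts. The sketch below rests on two assumptions that should be checked against Sections 3.1 and 3.3: heavy outputs are taken to be the bitstrings whose ideal probability exceeds the median ideal probability, and the $\ell_1$-norm distance is taken to be the plain sum of absolute differences (no factor of 1/2). The function names and the toy distribution are hypothetical.

```python
from statistics import median
from typing import Dict, Set


def l1_distance(ideal: Dict[str, float], counts: Dict[str, int]) -> float:
    """Sum of |ideal probability - observed frequency| over all outcomes."""
    shots = sum(counts.values())
    outcomes = set(ideal) | set(counts)
    return sum(abs(ideal.get(x, 0.0) - counts.get(x, 0) / shots)
               for x in outcomes)


def heavy_outputs(ideal: Dict[str, float]) -> Set[str]:
    """Outcomes whose ideal probability is above the median.

    `ideal` must list every bitstring, including zero-probability ones.
    """
    med = median(ideal.values())
    return {x for x, prob in ideal.items() if prob > med}


def normalised_hog(ideal: Dict[str, float], counts: Dict[str, int]) -> float:
    """Device heavy output probability divided by the ideal one."""
    heavy = heavy_outputs(ideal)
    shots = sum(counts.values())
    device_hop = sum(counts.get(x, 0) for x in heavy) / shots
    ideal_hop = sum(ideal[x] for x in heavy)
    return device_hop / ideal_hop


# Toy 2-qubit example (made-up numbers).
ideal = {"00": 0.4, "01": 0.1, "10": 0.3, "11": 0.2}
perfect = {"00": 400, "01": 100, "10": 300, "11": 200}
uniform = {"00": 250, "01": 250, "10": 250, "11": 250}
print(l1_distance(ideal, perfect), normalised_hog(ideal, perfect))
print(l1_distance(ideal, uniform), normalised_hog(ideal, uniform))
```

In the toy example, counts that match the ideal distribution give a distance of 0 and a normalised probability of 1, while uniform counts give a larger distance and a normalised probability below 1.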


[Figure 15a plot: error rate (vertical axis, logarithmic) against qubit (horizontal axis).]

(a) Average readout error. The readout error is the probability the state of a given qubit is incorrectly labelled.

[Figure 15b plot: error rate (vertical axis, logarithmic) against qubit (horizontal axis).]

(b) Average error per U2 gate. The error per gate is a measure of how accurately the U2 gate is applied.

Figure 15: Error per single qubit operations on the devices used in this work. Bars indicate the average error rates; error bars are one standard deviation. Data aggregated based on calibration data collected over the course of our experiments. Devices shown here are: ibmqx2 [ ], ibmq_ourense [ ], ibmq_singapore [ ], ibmq_16_melbourne [ ]. A logarithmic scale is used.


[Figure 16 plot: error rate (vertical axis, logarithmic) against qubit pairs (horizontal axis), shown in two panels.]

Figure 16: Average error per CX operation on the devices used in this work. The error per CX gate is a measure of how accurately the CX gate is applied. Bars indicate the average error rates; error bars are one standard deviation. Data aggregated based on calibration data collected over the course of our experiments. Devices shown here are: ibmqx2 [ ], ibmq_ourense [ ], ibmq_singapore [ ], ibmq_16_melbourne [ ]. A logarithmic scale is used.


As the $\ell_1$-norm distance increases, the experimental frequencies increasingly differ from the ideal outcome probabilities. Two things then may happen: heavy outputs are produced more regularly, in which case the normalised heavy output generation probability will grow above 1; or less regularly, in which case the normalised heavy output generation probability will fall below 1. In practice, we expect the distribution produced by the device to converge to the uniform one over all bit strings as the noise increases, so we expect the normalised heavy output generation probability to fall with increasing $\ell_1$-norm distance.

The empirical relationship between the normalised heavy output generation probability and $\ell_1$-norm distance is shown in Figure 17. For each circuit, Figure 17 plots the $\ell_1$-norm distance of the distribution produced by a real device against the normalised heavy output generation probability. As expected, a negative correlation exists between these two figures of merit. For the deepest circuits, and in particular the widest circuits from the deep circuits class, the cluster of points can be seen to indicate that the normalised heavy output generation probability falls more slowly as the $\ell_1$-norm distance becomes larger. This is because the minimum value of heavy output generation probability is being reached, which is to say that the output distribution from the real device has converged to the uniform one, while more detail can be extracted by considering the $\ell_1$-norm distance.
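The plateau described here can be made explicit with a short worked calculation. It assumes that heavy outputs are defined as the bitstrings with ideal probability above the median, so that (in the absence of ties) exactly half of the $2^n$ bitstrings are heavy; the symbols $h_{\mathrm{uniform}}$ and $h_{\mathrm{ideal}}$ are introduced only for this illustration.

```latex
h_{\mathrm{uniform}}
  = \sum_{x\ \mathrm{heavy}} \frac{1}{2^n}
  = \frac{2^{n-1}}{2^n}
  = \frac{1}{2},
\qquad
\frac{h_{\mathrm{uniform}}}{h_{\mathrm{ideal}}}
  = \frac{1}{2\,h_{\mathrm{ideal}}} .
```

For circuits whose ideal heavy output probability is close to the asymptotic value $(1 + \ln 2)/2 \approx 0.85$, this floor is roughly $0.59$. That asymptotic value is only indicative here, since for some circuit classes the ideal predictions do not apply and $h_{\mathrm{ideal}}$ must be obtained by classical simulation.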

This correlation is encouraging as, in the regime where it becomes impossible to calculate the $\ell_1$-norm distance, we can be justified in believing that the correlation between the features present in the plot throughout this section persists. This line of reasoning is similar to that used when Cross-Entropy Benchmarking is used to predict demonstrations of quantum computational supremacy in the regime when it too becomes impossible to calculate [3]. Note also that this correlation contrasts with the knowledge that, in general, the probability of producing heavy outputs does not provide an upper bound on the $\ell_1$-norm distance [54], and reveals that in practice it may be relied upon to do so.


[Figure 17 plot: grid of panels with one column per circuit class (Shallow Circuits, Square Circuits, Deep Circuits) and one row per device (ibmq_ourense, ibmq_16_melbourne, ibmqx2, ibmq_singapore); each panel shows normalised heavy output probability (vertical axis) against $\ell_1$-norm distance (horizontal axis).]

Figure 17: Scatter plot and linear regression line comparing the normalised heavy output generation probability and $\ell_1$-norm distance. Each point corresponds to one circuit of the class and width as labelled. Colours correspond to numbers of qubits in the following way: 2 [ ], 3 [ ], 4 [ ], 5 [ ], 6 [ ], 7 [ ].
