gate set tomography - arxiv

Gate Set TomographyErik Nielsen1, John King Gamble2, Kenneth Rudinger1, Travis Scholten3, Kevin Young1, and RobinBlume-Kohout1

1Quantum Performance Laboratory, Sandia National Laboratories2Microsoft Research3IBM Quantum, IBM T.J. Watson Research Center

Gate set tomography (GST) is a protocol fordetailed, predictive characterization of logic op-erations (gates) on quantum computing proces-sors. Early versions of GST emerged around2012-13, and since then it has been refined,demonstrated, and used in a large number ofexperiments. This paper presents the foun-dations of GST in comprehensive detail. Themost important feature of GST, compared toolder state and process tomography protocols,is that it is calibration-free. GST does not relyon pre-calibrated state preparations and mea-surements. Instead, it characterizes all the op-erations in a gate set simultaneously and self-consistently, relative to each other. Long se-quence GST can estimate gates with very highprecision and efficiency, achieving Heisenbergscaling in regimes of practical interest. In thispaper, we cover GST’s intellectual history, thetechniques and experiments used to achieve itsintended purpose, data analysis, gauge free-dom and fixing, error bars, and the interpreta-tion of gauge-fixed estimates of gate sets. Ourfocus is fundamental mathematical aspects ofGST, rather than implementation details, butwe touch on some of the foundational algorith-mic tricks used in the pyGSTi implementation.

Contents1 Introduction 2

1.1 Gate set tomography . . . . . . . . . . . 21.2 Section guide & expert summary . . . . 3

2 Background 42.1 Mathematical Preliminaries . . . . . . . 4

2.1.1 Quantum processors and quan-tum circuits . . . . . . . . . . . . 4

2.1.2 States, measurements, andHilbert-Schmidt space . . . . . . 5

2.1.3 Quantum logic gates . . . . . . . 62.1.4 Gate sets . . . . . . . . . . . . . 7

2.1.5 Circuits . . . . . . . . . . . . . . 82.2 State, Process, and Measurement To-

mography . . . . . . . . . . . . . . . . . 92.2.1 Quantum state tomography . . . 92.2.2 Quantum process tomography . . 102.2.3 Measurement tomography . . . . 10

2.3 The role of calibration in tomography . 11

3 Linear gate set tomography (LGST) 113.1 Introduction to LGST . . . . . . . . . . 113.2 The LGST algorithm . . . . . . . . . . . 123.3 Preparing the fiducial vectors . . . . . . 143.4 Improving LGST with maximum likeli-

hood estimation . . . . . . . . . . . . . 15

4 Long-sequence Gate Set Tomography 164.1 Historical background . . . . . . . . . . 164.2 Experiment design for long-sequence GST 17

4.2.1 Selecting germs (g) . . . . . . . . 194.2.2 Defining base circuits . . . . . . 194.2.3 Putting it all together . . . . . . 20

4.3 Estimating gate set parameters in long-sequence GST . . . . . . . . . . . . . . . 20

5 Advanced long-sequence GST 225.1 Gate set models . . . . . . . . . . . . . . 22

5.1.1 Gauge freedom in constrainedgate sets . . . . . . . . . . . . . . 23

5.1.2 TP and CPTP constrained models 235.2 Fiducial-pair reduction . . . . . . . . . . 24

6 Analyzing GST estimates 256.1 Goodness-of-fit and Markovianity . . . . 256.2 Gauge optimization . . . . . . . . . . . . 26

6.2.1 Why gauge optimization? . . . . 276.2.2 Metrics for gauge optimization . 28

6.3 Error bars . . . . . . . . . . . . . . . . . 306.3.1 General problems with region es-

timators . . . . . . . . . . . . . . 306.3.2 Bootstrapping . . . . . . . . . . 316.3.3 Confidence regions based on the

Hessian of the likelihood . . . . . 316.3.4 Dealing with gauge freedom . . . 32

Accepted in Quantum 2021-07-25, click title to verify. Published under CC-BY 4.0. 1

arX

iv:2

009.

0730

1v2

[qu

ant-

ph]

28

Sep

2021

https://quantum-journal.org/?s=Gate%20Set%20Tomography&reason=title-click

6.3.5 Gauge-fixing bootstrap regions . 326.3.6 Gauge-fixing Hessian-based regions 32

7 Summary 33

Appendices 34

A Gauge 34A.1 Gauge degrees of freedom . . . . . . . . 34A.2 Gauge considerations in error bars . . . 36

B The extended LGST (eLGST) protocol 37

C Addressing overcompleteness in LGST withpseudo-inverses 39

D Implementations in pyGSTi 40D.1 Circuit selection . . . . . . . . . . . . . 40D.2 Non-TP loglikelihood optimization . . . 44

E Numerical verification 45

F χ2 estimator SPAM bias 49

References 49

1 IntroductionGetting to useful quantum computation will requireengineering quantum bits (qubits) that support high-fidelity quantum logic operations, and then assemblingthem into increasingly large quantum processors, withmany coupled qubits, that can run quantum circuits.But qubits are susceptible to noise. High-fidelity logicoperations are hard to perform, and failure rates as lowas 10−4 are rare and remarkable. Useful quantum com-putation may require fault tolerant quantum error cor-rection, which distills a few high-fidelity logical qubitsfrom many noisy ones. Threshold theorems [2–4, 89]prove that this is possible in realistic quantum proces-sors – if the processor’s errors satisfy certain conditions.Each threshold theorem assumes different conditions onthe errors, but in general they must (1) be only of cer-tain types, and (2) occur at a rate below a certain value(generally around 10−3 – 10−4 for the most commontypes). So it is not an exaggeration to say that the en-tire program of practical quantum computing dependson understanding and characterizing the errors that oc-cur in as-built processors, controlling their form, andreducing their rates.

Assessing how qubits, logic operations, and entireprocessors behave is the central task of quantum de-vice characterization, a.k.a. “quantum characteriza-tion, verification, and validation” (QCVV). There aremany protocols for this task, all of which share the

same basic structure: a set of experiments describedby quantum circuits are performed on the processor,yielding data that is processed according to some al-gorithm, to produce an estimate of an aspect of theprocessor’s behavior. Some protocols produce highlydetailed predictive models of each logic operation (to-mography [5, 6, 8, 10, 12, 14–16, 19, 30, 32, 33, 40,42, 55, 59, 61, 64, 66, 75, 79, 84–86, 97, 114, 117,123, 126, 136, 152, 159, 169]). Others produce semi-predictive partial characterizations (component bench-marking [1, 9, 24, 27, 37, 39, 47, 48, 50, 56, 58, 60,68, 70, 76, 77, 87, 88, 92, 100–102, 105, 108, 115, 127,129, 135, 139, 151, 157, 160–164]). And some assess theperformance of entire processors (holistic benchmarking[7, 13, 18, 21, 38, 67, 71, 96, 106, 111, 118, 173, 174]).

All of these methods are valuable parts of the QCVVtoolbox, serving distinct (though overlapping) pur-poses. Some tasks demand simple holistic summariesof overall performance, while others are best served bypartial characterization. But for debugging, rigorousdevice modeling, and reliable prediction of complex cir-cuit performance (including the circuits that performquantum error correction) there is as yet no substitutefor detailed tomography of individual logic operations,a.k.a. quantum gates. The most powerful and com-prehensive technique for this task is currently gate settomography (GST). Developed around 2012-13, GSThas been used in a variety of experiments [14, 19, 42,72, 82, 85, 128, 153, 167, 176], discussed in the literature[25, 49, 87, 93, 104, 124, 130, 137, 139, 142, 146, 170],and implemented in a large open-source software project[120, 121]. But no comprehensive explanation of thetheory behind GST has appeared in the literature. Thispaper fills that gap.

1.1 Gate set tomographyThe term “gate-set tomography” first appeared in a2012 paper from IBM Research [110] that introducedseveral of GST’s core concepts, but the group did notpursue GST further. Around the same time, our groupat Sandia National Labs started pursuing an indepen-dent approach to the same problem [14, 85], which wehave developed extensively since then [19, 42, 121, 128].This approach to GST, which has been used in a num-ber of additional experiments over the past 5 years[34, 72, 82, 104, 137, 153, 167, 176], is the one thatwe discuss and present here. We discuss its relationshipto IBM’s original approach where appropriate.

“Tomography” means the reconstruction of a com-prehensive model (of something) from many partialcross-sections or slices, each of which provides onlya limited view that may be useless by itself. Tomo-graphic techniques are distinguished by their aspira-


tion to construct a comprehensive model for a sys-tem or component, by fitting that model to the datafrom many independent experiments. Quantum tomo-graphic methods include state tomography [6, 8, 15,16, 32, 59, 61, 66, 75, 79, 117, 152, 159], process to-mography [5, 10, 12, 30, 33, 40, 55, 84, 86, 97, 114,123, 126, 136, 169], detector or measurement tomog-raphy [20, 29, 54, 98, 99], Hamiltonian tomography[35, 36, 43, 53, 63, 73, 90, 138, 144, 165, 166, 175] andGST.

Like process tomography, GST is intended to char-acterize how logic operations (e.g. gates) impact theirtarget qubits. Those gates’ target subsystem and statespace needs to be specified up front, and GST recon-structs the gate-driven evolution of those targets only.GST is not designed to probe the holistic performance ofmany-qubit processors, and as most commonly used, itdoes not capture general crosstalk. (Although GST caneasily be adapted to focus on specific types of crosstalk,and characterize them in detail.)

GST differs from state and process tomography,which we discuss later in Section 2.2, in two fundamen-tal ways. First, it is almost entirely calibration-free (or“self-calibrating”). When GST reconstructs a modelof a quantum system, it does not depend on a priordescription of the measurements used (as does state to-mography) or the states that can be prepared (as doesprocess tomography). Second, GST reconstructs or es-timates not a single logic operation (e.g., state prepara-tion or logic gate), but an entire set of logic operations– a gate set including one or more state preparations,measurements, and logic gates. These two features areinextricably linked, and we discuss this connection laterin this paper.

GST’s independence from calibration is its originalraison d’etre. As IBM pointed out [110], state andprocess tomography become systematically unreliablewhen the “reference frame” operations on which theyrely (pre-calibrated measurements for state tomogra-phy, pre-calibrated state preparations and measure-ments for process tomography) are either unknown, ormisidentified. This limits their practical utility andmakes them unsuitable for rigorous characterization.

GST is not the only calibration-free method. Today,new characterization protocols are more or less expectedto be calibration-free. The oldest calibration-free pro-tocol is randomized benchmarking (RB) [48, 88], whichpredates GST by almost 10 years. Detailed compar-isons between RB and GST have been given elsewhere[19, 42], and they are complementary tools addressingdifferent needs.

This paper is not a GST “how-to” manual. We doprovide some incidental discussion of how to performGST in practice, and how it is implemented in our

pyGSTi software [120], but only inasmuch as it illus-trates aspects of the theory. Separately, we recentlypublished a concise guide to applying GST to character-ize hardware [121]. In contrast, this article is intendedas a theory paper presenting GST’s mathematical back-ground, justifications, and derivations.

1.2 Section guide & expert summaryBecause this paper attempts to be comprehensive, it israther long. So we begin here with a short expert-levelsummary of the major ideas and results, which mayhelp readers decide what parts to read. A reader whounderstands everything in this section may not need toread any further at all! Other readers may wish to skipto the first part that discusses a statement (from thissummary) that is not obvious.

We begin in Section 2 by establishing mathemati-cal and conceptual foundations. We introduce quan-tum logic operations (2.1.1); the mathematical repre-sentations of states and measurements (2.1.2); and su-peroperators as a model of noisy quantum logic gates,including transfer matrix and Choi matrix representa-tions, and introduce super-Dirac (superbra / superket)notation (2.1.3). We introduce gate sets, their represen-tation as a collection of superoperators and operators,and gauge freedom (2.1.4). Finally, we establish nota-tion and conventions for quantum circuits and the su-peroperator τ(g) caused by a quantum circuit g (2.1.5).

In the second part of Section 2, we review standardforms of tomography for states, processes, and measure-ments (2.2). We conclude this introduction by explain-ing the role of calibration in standard tomography (2.3)to set the stage for GST.

Section 3 is devoted to linear inversion GST (LGST).We start with the history of GST, describing how LGSTsolved some of the problems with the initial versions ofGST (3.1). We then derive LGST, emphasizing par-allels with process tomography (3.2), and introducingtechniques required for long-sequence GST. Readersinclined to skip over the mathematics in Section 3.2should feel free to do so, as these derivations are notprerequisites for subsequent material. In the last partsof Section 3, we address two issues left unanswered inthe derivation; how to prepare fiducial states and mea-surements (3.3), and how to fit LGST data better withmaximum likelihood estimation (3.4).

Section 4 introduces long-sequence GST, the stan-dard “GST” protocol. (LGST came first, and is eas-ier to analyze theoretically, but is rarely used alonein practice). We introduce the two main motivationsfor long-sequence GST, which are (1) much higher ac-curacy, and (2) the ability to impose constraints suchas complete positivity. After discussing early and ob-


solete versions of long-sequence GST (4.1), we explainhow to construct circuits that amplify errors and enableHeisenberg-limited accuracy (4.2). Then, we define theGST likelihood function and show how to maximize itto find the maximum likelihood estimate of a gate setmodel (4.3).

Section 5’s two subsections cover two “advanced” as-pects of long-sequence GST. The first (5.1) conceptual-izes gate sets as parameterized models, introducing theterm “gate set model”, and discusses how to imposeconstraints like trace preservation or complete positiv-ity. The second (5.2) describes “fiducial pair reduc-tion”, a method for eliminating redundancy among thestandard set of GST circuits and thereby reducing thenumber of circuits prescribed.

Section 6 addresses the question “What can you dowith a GST estimate once you have it?” We start withthe critical step of assessing goodness-of-fit and detect-ing model violation or non-Markovianity (6.1). We alsoexplain what we mean by “Markovian” in this con-text, and why model violation is associated with “non-Markovian” behavior. We then turn to the importantproblem of extracting simplified metrics (e.g., fidelity)from a gate set, explain why choosing a gauge is nec-essary, and explain how to do so judiciously using aprocess we call “gauge optimization” (6.2). Finally, wediscuss how to place error bars around that point es-timate, including how to deal with the complicationsinduced by gauge freedom (6.3).

Appendices address some of the most technical andtangential points. These include the gauge (A), thehistorically-important eLGST algorithm (B), how todeal with overcomplete data in LGST (C), specific im-plementation choices made in pyGSTi (D), numericalstudies that support claims made in the main text (E),and the bias we find in a particular estimator (F). Ap-pendices are referenced from the text where applicable.

2 Background2.1 Mathematical PreliminariesWe begin with a concise introduction to the mathemat-ics, formalism, and notation used to describe qubits,multiqubit processors, their logic operations, and theprograms (quantum circuits) that they perform.

2.1.1 Quantum processors and quantum circuits

An idealized quantum information processor comprisesone or more qubits that can be (i) initialized, (ii) trans-formed by quantum logic gates, and (iii) measured or“read out” to provide tangible classical data [45]. Us-ing such a processor is usually described by physicists

prepare all qubits

measure all qubits

a)

b)

n-qubit operation

Figure 1: (a) A fixed-input classical-output (FI/CO) quantumcircuit begins by preparing all the qubits and ends by measuringall the qubits. Individual 1- and 2-qubit elementary operationsa

(gray boxes) that occur at the same time are grouped into cir-cuit layers (dashed rectangles) which by construction act onall the qubits (horizontal black lines). (b) In this work, a gaterefers to a n-qubit operation and thus corresponds to a circuitlayer (downward arrow). FI/CO circuits are, for the purposesof this paper, composed of: a n-qubit state preparation, asequence of n-qubit gates (circuit layers), and a n-qubit mea-surement. The symbols ρ, Gγ and M = {Ej}j label theseoperations within a gate set (see Eq. 14). The particlar cir-cuit shown begins with preparation in the i-th available state,proceeds with execution of gates indexed by γi, . . . γ6, and con-cludes with performance of the m-th available measurement.

aOften called “gates” but not here since we use the term “gate”to mean a n-qubit operation.

as running experiments, and by computer scientists asrunning programs. These are the same thing. Experi-ments/programs are described by quantum circuits (seeFigure 1a), which specify a schedule of logic operations.

In this paper, the experiments we consider correspondto circuits that begin with initialization of all the qubits,conclude with measurement of all the qubits, and applyzero or more logic gates in the middle.

If a processor contains just one qubit, then only onelogic gate can be performed at a time, and every circuitcorresponds to a state preparation, a sequence of logicgates, and a measurement. Circuits on many qubitstypically divide each clock cycle into operations on justone or two qubits. We refer to these as elementary oper-ations, as we use the more common term “gate” to de-note a n-qubit operation (see below). The elementaryoperations of a single clock cycle constitute a circuitlayer, and so each circuit corresponds to a preparation,sequence of layers, and measurement. Models of many-qubit circuits must then have a means of describing theway a circuit layer propagates a n-qubit quantum statebased on the one- and two-qubit elementary operations


within that layer. Such modeling, when the layers arecomposed of imperfect operations, is nontrivial [143].

The implementation of GST described in this paperutilizes simple models where each circuit layer is an in-dependent operation. That is, the only models we con-sider describe only n-qubit operations that never occurin parallel. To simplify our language, we refer to thesen-qubit operations as gates, and never dissect them fur-ther. This means that in this paper, the term “gate” issynonymous with circuit layer, as every circuit layer ismodeled as an independent operation (a “gate”) thatacts on all the qubits. For example, a GST model as-sumes no a priori relationship between the behavior ofthe following three layers (labeled by G for “gate”):

• “GXI” – An X gate on qubit 1, in parallel with anidle gate on qubit 2.

• “GIX” – An idle gate on qubit 1, in parallel withan X gate on qubit 2.

• “GXX” – An X gate on qubit 1, in parallel withan X gate on qubit 2.

This approach rapidly becomes unwieldy as the num-ber of qubits grows, since exponentially many distinctlayers are generally possible. While this limits the scal-ability of the GST protocol (which was developed in thecontext of one- and two-qubit systems where modelingparallel gates wasn’t an issue), many of the ideas andtheory surrounding it can be extended to many-qubitmodels [119].

QCVV protocols, including tomography, seek to de-duce properties of a processor’s elementary logic oper-ations from the results of running circuits. Doing sorequires a model for all three kinds of logic operations(initialization, gates, and measurements) that has ad-justable parameters, and can be used to predict circuitoutcome probabilities. The most common model, andthe one used by GST, is based on Hilbert-Schmidt space.

2.1.2 States, measurements, and Hilbert-Schmidt space

A n-qubit processor is initialized into a particular quan-tum state. Knowledges of a quantum state enables pre-dicting probabilities of outcomes of any measurementsthat could be made on the system. After initialization,the state evolves as gates are applied. The final state,just before the processor is measured, determines theprobabilities of all possible measurement outcomes. Sowe model the three kinds of quantum logic operationsas follows: initialization is modeled by a quantum state;logic gates are modeled by dynamical maps that trans-form quantum states; measurements are described bylinear functionals that map quantum states to proba-bility distributions.

A quantum system is described with the help of a d-dimensional Hilbert space H = Cd, where d is the largestnumber of distinct outcomes of a repeatable measure-ment on that system (and is therefore somewhat sub-jective). For a single qubit, d = 2 by definition, and fora system of n qubits, d = 2n. 1

A quantum state is described by a d × d positivesemidefinite, trace-1 matrix ρ that acts on its Hilbertspace (ρ : H → H) [122]. This is called a density matrix.A matrix must be positive semidefinite and have trace1 in order to represent a physically valid state. Suchdensity matrices form a convex set, and it is very con-venient to embed this set in the complex d2-dimensionalvector space of d×dmatrices. This vector space is calledHilbert-Schmidt space and denoted by B(H). Hermitian(self-adjoint) matrices form a real d2-dimensional sub-space of Hilbert-Schmidt space. In this work, we onlyconsider density matrices that are Hermitian, but disre-gard the trace and positivity constraints, using the sym-bol ρ to describe any element of real Hilbert-Schmidtsubspace. We state explicitly if/when the trace andpositivity constraints are assumed.

Hilbert-Schmidt space is equipped with an innerproduct 〈A,B〉 ≡ Tr(A†B) that, as we will see, isvery useful in quantum tomography. In a generaliza-tion of Dirac notation on a Hilbert space we repre-sent an element B of Hilbert-Schmidt space by a col-umn vector denoted |B〉〉, and an element of its (iso-morphic) dual space by a row vector denoted 〈〈A|, sothat 〈〈A|B〉〉 = Tr(A†B). We refer to |B〉〉 as a superketand to 〈〈A| as a superbra, since operations on B(H) arecalled superoperators (see the next section).

Measuring a quantum system yields an outcome or“result” which we assume is drawn from a discrete setof k alternatives. Each outcome’s probability is a linearfunction of the state ρ. Therefore, the ith outcome canbe represented by a dual vector 〈〈Ei|, so that Pr(i|ρ) =〈〈Ei|ρ〉〉 = Tr(Eiρ). The laws of probability require thatEi ≥ 0 and

∑iEi = 1l. The Ei are called effects. The

set {Ei} completely describes the measurement and iscalled a positive operator-valued measure (POVM). Aswith states, we sometimes relax positivity constraintson POVM effects for convenience.

We find it useful to define a basis for Hilbert-Schmidtspace, {Bi}, with the following properties:

1. Hermiticity: Bi = B†i ,

2. Orthonormality: Tr(BiBj) = δij ,

3. Traceless for i > 0: B0 = 1l/√d and Tr(Bi) =

0 ∀ i > 0.

1Formal quantum theory is complicated by the possibility ofinfinite-dimensional Hilbert spaces, but these are rarely requiredfor quantum computing, so we will assume finite d.


Here, 1l is the d-dimensional identity map. Fora single qubit, the normalized Pauli matrices{

1l/√

2, σx/√

2, σy/√

2, σz/√

2}

constitute just such abasis. For n qubits, the n-fold tensor products of thePauli operators satisfy these conditions.

Since states and effects are both Hermitian, choosinga Hermitian basis makes it easy to represent states andeffects in the d2-dimensional real subspace of B(H) men-tioned above containing Hermitian operators. We abusenotation slightly and hereafter use “Hilbert-Schmidtspace” and B(H) to denote this real vector space, be-cause we never need its complex extension.

We conclude with a brief example to illustrate theseconcepts more concretely. Consider the single-qubitdensity matrix

ρ =(

3/4 (1 + i)/8(1− i)/8 1/4

). (1)

This matrix can be written as a linear combination ofthe normalized Pauli matrices,

ρ = 1√2

(1l√2

)+ 1

4√

2

(σx√

2

)− 1

4√

2

(σy√

2

)+ 1

2√

2

(σz√

2

),

(2)where each coefficient can be found by taking the innerproduct Tr(B†i ρ) (these coefficients will always be real).We then represent ρ as |ρ〉〉 ∈ B(H) by the real columnvector

|ρ〉〉 =

1/√

21/(4√

2)−1/(4

√2)

1/(2√

2)

. (3)

Similarly, the projectors onto the |0〉 and |1〉 states,given by the complex-valued matrices

E0 =(

1 00 0

)and E1 =

(0 00 1

)(4)

may be represented as the row (dual) vectors

〈〈E0| =(

1/√

2 0 0 1/√

2)

and (5)〈〈E1| =

(1/√

2 0 0 −1/√

2). (6)

A measurement with the POVM {E0, E1} gives theprobability 0 or 1 via the Born rule,

p0 = 〈〈E0|ρ〉〉 = 〈〈E0| · |ρ〉〉 = Tr(E0ρ) = 34 (7)

p1 = 〈〈E1|ρ〉〉 = 〈〈E1| · |ρ〉〉 = Tr(E1ρ) = 14 , (8)

where · is the standard vector dot product. This exam-ple illustrates the convenience of representing states andmeasurements as real vectors in Hilbert-Schmidt space,and the similarity with standard Dirac notation.

2.1.3 Quantum logic gates

What happens between the beginning of a quantum cir-cuit (initialization into a state ρ) and its end (a mea-surement {Ei}), is defined by a sequence of circuit lay-ers (Figure 1a). Because we treat each circuit layer as aunique and independent “gate” (see Section 2.1.1), themiddle portion of a circuit can be seen as a sequence ofquantum logic gates (Figure 1b). Gates are dynamicaltransformations on the set of quantum states. An idealquantum logic gate is reversible and corresponds to aunitary transformation of H. Such a gate would trans-form ρ as ρ→ UρU† for some d× d unitary matrix U .This is a linear transformation from B(H) to itself, andis called a superoperator. The superoperator describingan ideal (unitary) gate is orthogonal. 2

Real logic gates are noisy, and not perfectly re-versible. They still act linearly on states, so they canbe represented by superoperators on B(H). But the su-peroperators for noisy gates are not orthogonal, and aregenerally contractive. These superoperators are knownas quantum processes or quantum channels. Given anysuperoperator Λ, we can represent it as a d2 × d2 ma-trix that acts associatively (by left multiplication) on|ρ〉〉 ∈ B(H), by just choosing a basis for B(H). Thisrepresentation is often called the transfer matrix of Λ.We adopt this terminology, and denote it by SΛ. Thus,we write

Λ : |ρ〉〉 7→ SΛ|ρ〉〉, (9)where the column vector SΛ|ρ〉〉 is obtained by multiply-ing the column vector |ρ〉〉 from the left by the matrixSΛ. If ρ is prepared, and then Λ is performed, and fi-nally {Ei} is measured, then the probability of outcomei is

pi = 〈〈Ei|SΛ|ρ〉〉 = Tr(EiSΛρ). (10)Not every linear superoperator describes a physically

allowed operation. For example, the summed-up prob-ability of all measurement outcomes is given by Tr(ρ),so Tr(ρ) must equal 1. A superoperator Λ that changesTr(ρ) violates the law of total probability. So only trace-preserving superoperators are physically allowed. Fur-thermore, if ρ is not positive semidefinite, then someoutcome of some measurement will have negative prob-ability. So states must satisfy ρ ≥ 0. If there existsany ρ ≥ 0 such that Λ(ρ) is not ≥ 0, then that Λ isnot physically possible. A superoperator that preservesρ ≥ 0 is called positive.

However, a stronger condition is actually required:when Λ acts on part of a larger system, it must pre-serve the positivity of that extended system. This cor-responds to requiring Λ ⊗ 1lA to be positive for any

2To be clear, the d×d unitary matrix U is not a superoperator– it’s an operator. The linear transformation ρ → UρU† actingon Hilbert-Schmidt space is the superoperator.


auxiliary state space A, and is called complete positiv-ity [31, 91]. It is stronger than simple positivity; somepositive maps aren’t completely positive. Superopera-tors representing physically possible operations must becompletely positive and trace-preserving (CPTP). TheCPTP constraint turns out to be both necessary andsufficient; any CPTP superoperator can be physicallyimplemented given appropriate resources by the Stine-spring dilation construction [122].

The TP (trace-preserving) condition is easy to de-scribe and impose in the transfer matrix representation;it corresponds to 〈〈1l|SΛ = 〈〈1l|. If SΛ is written in a ba-sis whose elements are traceless except for the first one(as required earlier), then Λ is TP if and only if the firstrow of SΛ equals [1, 0 . . . 0].

The CP condition is more easily described in a dif-ferent representation, the operator sum representation[31, 91]:

Λ : ρ 7→∑ij

χΛijBiρB

†j . (11)

Here, {Bi} is a basis for B(H), and χΛij is a matrix of

coefficients called the “Choi process matrix” that rep-resents Λ. This Choi representation of Λ provides thesame information as the transfer matrix SΛ. The map-ping between them is known as the Choi-Jamiolkowskiisomorphism [31, 80]:

χΛ = d(SΛ ⊗ 1l)|ΠEPR〉〉, (12)

where ΠEPR = |ΨEPR〉〈ΨEPR| and |ΨEPR〉 is a max-imally entangled state on a space that is the tensorproduct of B(H) and a reference auxiliary space of equaldimension. Note that Eq. 12 is technically problematic:the right hand side is a super-ket, but the left hand sideis a matrix. Equality here means that, in a consistentbasis, these two objects are element-wise equal. Thiselement-wise equality is the Choi-Jamiolkowski isomor-phism. Krauss operators correspond to the eigenvectorsof χΛ, and provide an intuitive picture of many commonsuperoperators[122].

The CP condition is elegant and simple in the Choirepresentation: Λ is CP if and only if χ is positivesemidefinite. This condition is independent of the basis{Bi} used to define χ.

Hereafter, we use the transfer matrix representationalmost exclusively, and so we use the term “superopera-tor” to also refer to the superoperator’s transfer matrix.Likewise, when we refer to a superoperator’s matrix rep-resentation, this should be understood to be the transfermatrix. When we use the Choi matrix representation(only to test/impose complete positivity), we will stateit explicitly. We use the term “quantum operation” in-terchangeably with “gate” to refer to general quantumoperations (not necessarily CP or TP), and will explic-itly state when CP and/or TP conditions are assumed.

2.1.4 Gate sets

Prior to GST, tomographic protocols sought to recon-struct individual logic operations – states, gates, ormeasurements – in isolation. But in real qubits andquantum processors, this isolation isn’t possible. Everyprocessor has initial states, gates, and measurements,and characterizing any one of them requires making useof the others. Underlying GST, RB, robust phase esti-mation [87], and other modern QCVV protocols is thesimultaneous representation of all a processor’s opera-tions as a single entity – a gate set.

Most processors provide just one native state prepa-ration and one native measurement. Gates are used toprepare other states and perform other measurements.For completeness, we consider the most general situa-tion here. Consider a processor that can perform NGdistinct gates, Nρ distinct state preparations, and NMdistinct measurements, and where the m-th measure-

ment has N(m)E distinct outcomes. We define the fol-

lowing representations for those operations:

Gi : B(H)→ B(H) for i = 1 . . . NG,

|ρ(i)〉〉 ∈ B(H) for i = 1 . . . Nρ, and (13)

〈〈E(m)i | ∈ B(H)∗ form = 1 . . . NM, i = 1 . . . N (m)

E .

The collective set of symbols Gi, |ρ(i)〉〉, and 〈〈E(m)i |

serve two distinct roles. First, by simply naming a setof operations, they provide a specification of the quan-tum processor’s capabilities. Secondly, each symbol rep-resents a mathematical model for the behavior (ideal,estimated, or unknown, depending on context) of thephysical hardware device when the corresponding oper-ation is performed in a quantum circuit. The notationgiven above is used throughout this paper. We refer toall these matrices and vectors collectively as a gate set,denoted by G, and defined as

G ={{|ρ(i)〉〉

}Nρi=1

; {Gi}NGi=1 ;

{〈〈E(m)

i |}NM,N

(m)E

m=1,i=1

}.

(14)A gate set is a complete model and description of aquantum processor’s behavior when running quantumcircuits.

A gate set is more than just a convenient way to pack-age together independent gate, preparation, and mea-surement operations. The operations on a quantumprocessor that a gate set describes are relational andinter-dependent. As a consequence, the representationgiven above – where gates correspond to d2 × d2 su-peroperators, state preparations to d2-element columnvectors, and POVM effects to d2-element row vectors –is an over-specification of the physical gate set. To seethis, consider a transformation of the gate set that acts


as

〈〈E(m)i | → 〈〈E(m)

i |M−1

|ρ(i)〉〉 →M |ρ(i)〉〉Gi →MGiM

−1, (15)

where M is any invertible superoperator. This trans-formation changes the concrete representation of thegate set, but does not change any observable proba-bility (see Eq.18). Since absolutely nothing observablehas changed, these are equivalent descriptions or repre-sentations of the same physical system.

The transformations defined by Eq. 15 form a contin-uous group, so the standard representation of gate setsdescribed above contains entire “orbits” of equivalentgate sets. This degeneracy, referred to generically as“gauge freedom”, is a perennial irritant and obstacle tocharacterization of quantum processors [19, 44, 94, 127].More importantly, it shows that a gate set is not justthe sum of its parts, and that tomography on a gateset is not just tomography on its components. GST istomography of a novel entity.

2.1.5 Circuits

The term “quantum circuit” is used in many contexts tomean subtly different things. For clarity, we find it help-ful to distinguish two related but distinct types of quan-tum circuits: fixed-input, classical-output (FI/CO) andquantum-input, quantum-output (QI/QO) circuits.

Data for standard QCVV protocols – e.g., state/pro-cess tomography, RB, or GST – consists of the countedoutcomes of experiments. Each experiment is describedby a quantum circuit that begins by initializing andends by measuring of all the qubits. Because the inputto such a circuit is “fixed” by the state preparation andthe output of the circuit is classical measurement data,we call this type of circuit a fixed-input classical-outputcircuit. A FI/CO circuit represents and describes aprobability distribution over classical bit strings, anda more-or-less concrete procedure for generating it witha quantum processor.

A quantum circuit can also describe an arrangementof unitary logic gates, with no explicit initialization ormeasurement, that could be inserted into a larger circuitas a subroutine. We call this kind of circuit a quantum-input quantum-output circuit. It is a sequence of cir-cuit layer operations, none of which are preparation ormeasurement layers. A QI/QO circuit represents anddescribes a quantum channel that maps an input quan-tum state to an output quantum state.

Thus, each FI/CO circuit comprises (1) one of the Nρpossible state preparations, (2) a QI/QO circuit (a se-quence of layers), and (3) one of the Nm measurements,which generates a specific outcome. On a processor that

has just one state preparation and one measurementoperation, every FI/CO circuit can be described com-pletely by the QI/QO circuit composed of all its circuitlayers.

Let γ be an index (or label) for available circuit lay-ers (gates). When we define a QI/QO circuit S asS = (γ1, γ2, ...γL), it means “do layer γ1, then do gateγ2, etc.” When the layer indexed (labeled) by γ is per-formed, it transforms the quantum processor’s state.This transformation is represented by a superoperatorGγ . Performing an entire QI/QO circuit, e.g. S, alsoapplies a superoperator. We denote the transfer matrixfor QI/QO circuit S by τ(S). It denotes the map fromB(H)→ B(H) formed by composing the elements of S,i.e., τ(S) = GγL · · ·Gγ2Gγ1 . Because it is a commonsource of confusion, we emphasize that gates appear inchronological order in layer sequences (S), but in re-verse chronological order in matrix products τ(S), be-cause that’s how matrix multiplication works. For laterreference, the general action of τ on a layer sequencecan be written

τ ((γ1, γ2, ...γL)) = GγL · · ·Gγ2Gγ1 , (16)

where 1 ≤ γi ≤ NG is an integer index into the set{Gi}NG

i=1 of corresponding gates (quantum processes).We use integer exponentiation of a QI/QO circuit todenote repetition. So, if S = (γ1, γ2) then S2 = SS =(γ1, γ2, γ1, γ2). It follows that

τ(Sn) = τ(S)n. (17)

Data sets are generated by defining a set of FI/COcircuits, repeating each one N times, and recording theresults. Results from each specific circuit are summa-rized by observed frequencies, fk = nk/N , where nkis the number of times the kth measurement outcomewas observed, for k = 1 . . . N (m)

E . These frequencies areusually close to their corresponding probabilities,

fk ≈ 〈〈E(m)k |GγL · · ·Gγ2Gγ1 |ρ(i)〉〉 ∀k, (18)

and can be used to estimate them. More sophisticatedestimators exist, but Eq. 18 motivates the idea that

some or all of |ρ(i)〉〉, 〈〈E(m)k | and Gj can be inferred

from the observed frequencies. Doing so is tomogra-phy. Each kind of tomography treats some operationsas known, and uses them as a reference frame to esti-mate the others.

It is usually apparent from context whether a FI/COor QI/QO circuit is being referenced. In the remainderof this paper, we only specify the type of circuit in caseswhen it may be ambiguous.


state tomography

process tomography

measurement tomography

unknown entity

assumed-known informationally complete set

Figure 2: Structure of the circuits required for state, process,and measurement tomography. Each of these protocols recon-structs an unknown entity (a state ρ, process G, or measure-ment M) by placing that entity in circuits with an assumed-known reference frame formed by an informationally completeset of state preparations or measurements (or both). Primedsymbols (ρ′ and M′) are meant to connote effective state prepa-rations and measurements, which are often implemented byapplying gate operations after or before a native state prepa-ration or measurement. A critical problem with these standardtomographic techniques is that ρ′ and M′ are in practice neverknown exactly.

2.2 State, Process, and Measurement Tomogra-phyIn this section we review standard state and processtomography. We also briefly describe the (straight-forward) generalization to measurement tomography.These methods establish the context for GST, but theyalso introduce the mathematical machinery and nota-tion that we will use for GST.

2.2.1 Quantum state tomography

The goal of quantum state tomography [6, 8, 15, 16,32, 59, 61, 66, 75, 79, 117, 152, 159] is to construct afull description of a quantum state ρ, from experimen-tal data. It is assumed that some quantum system canbe repeatedly prepared in the unknown state ρ, thatvarious fiducial measurements can be performed on it,that those measurements are collectively information-ally complete, and that the POVMs representing thosemeasurements are known (see Figure 2).

“Fiducial” means “accepted as a fixed basis of refer-ence”, and the measurements used in state tomographyconstitute a frame of reference. A set of measurements{E(m)

i }m,i is informationally complete if and only if, forany state ρ, the probabilities

p(m)i (ρ) = Tr

[ρE

(m)i

]uniquely identify ρ – i.e., there is no other ρ consistentwith them. This implies a simple linear-algebraic con-

dition: the {E(m)i } must span the entire space of effects,

forming a complete dual basis for states.

To perform state tomography, many copies of ρ aremade available, and divided into M pools. The mthfiducial measurement is applied to all the copies in the

mth pool. The observed frequencies f(m)i are recorded,

and used to estimate the corresponding probabilities

p(m)i (ρ) = Tr

[ρE

(m)i

].

Then ρ is deduced from these probabilities.

Practical state tomography is more complicated (andinteresting) because only finitely many copies can bemeasured, so each observed frequency isn’t generallyequal to the corresponding probability. Those probabil-ities have to be estimated from the data. One option isto estimate pi = fi (where · over a variable indicates anestimate). This is called linear inversion. It’s not veryaccurate, and often yields an estimate ρ that isn’t pos-itive, but despite these practical drawbacks it’s a theo-retical cornerstone of tomography. Linear inversion un-derlies the concept of informational completeness, whichcaptures whether an experiment design (here, a set ofmeasurements) is sufficient for estimation. These ideasalso underly GST, and are used directly in linear GST.So we now present a concise overview of linear inversionstate tomography and informational completeness.

Let the Hilbert-Schmidt space vector |ρ〉〉 denote anunknown quantum state that we want to reconstruct.

Let {E(m)i }m,i represent the set of M known fiducial

measurements, indexed by m = 1 . . .M , with i =1 . . . N (m) indexing the outcomes of the mth measure-ment. We represent each of these effects as a dual vec-

tor 〈〈E(m)i | in Hilbert-Schmidt space. Only the list of

effects matters, not which measurement m a given ef-

fect E(m)i belongs to, so we simply list all the effects

as {Ej : j = 1 . . . Nf1}, where Nf1 is the total numberof distinct measurement outcomes for all measurementsperformed.

Now, let us assume that we have learned the ex-act probabilities of each measurement outcome in stateρ (ignoring the entire process of performing measure-ments on samples, collating the results, and estimatingprobabilities from frequencies). These probabilities aregiven by Born’s rule,

pj = Tr [Ejρ] . (19)

We can we write this as a Hilbert-Schmidt space innerproduct,

pj = 〈〈Ej |ρ〉〉, (20)

and then stack all the row vectors 〈〈Ej | atop each other


to form an Nf1 × d2 matrix

A =

〈〈E1|〈〈E2|

...〈〈ENf1 |

. (21)

Now, ~p = A|ρ〉〉 is a vector whose jth element is pj (theprobability of Ej , as defined above). In the context ofstate tomography, all the measurement effects Ej areassumed to be known a priori, so the entire matrix Ais known. A also has full column rank (d2), because

the {〈〈E(m)i |} are informationally complete and there-

fore span Hilbert-Schmidt space. Therefore, we can re-construct ρ using linear algebra. If A is square, it hasan inverse, and |ρ〉〉 = A−1~p. If Nf1 is greater than d2,

then the {〈〈E(m)i |} form an overcomplete basis and A is

not square. In this case, we can solve for |ρ〉〉 using apseudo-inverse:

|ρ〉〉 = (ATA)−1AT ~p. (22)

2.2.2 Quantum process tomography

The goal of quantum process tomography [5, 10, 12,30, 33, 40, 55, 84, 86, 97, 114, 123, 126, 136, 169] isto construct a full description of a quantum process(e.g., a gate) from experimental data. This is done byconstructing an informationally complete set of knownfiducial states, preparing many copies of each of them,passing those copies through the process of interest, andperforming state tomography on the output states (seeFigure 2). All the same caveats and complications men-tioned for state tomography apply here. As above, weignore them and focus on the underlying math prob-lem of reconstructing an unknown process from exactlymeasured probabilities.

Let G be the superoperator (transfer matrix) repre-senting the process we want to reconstruct. Let {〈〈Ej |}be the list of POVM effects representing all the out-comes of all the fiducial measurements, as in state to-mography. And let {|ρi〉〉} be a list of Nf2 fiducial quan-tum states that will be repeatedly prepared as inputsto G.

If state ρi is prepared, G is applied, and a measure-ment is performed with possible outcomes {Ej}, thenthe probability of observing outcome Ej is

Pj,i = Tr (EjG[ρi]) (23)= 〈〈Ej |G|ρi〉〉. (24)

In addition to the A matrix defined previously, we de-fine a new d2 ×Nf2 matrix B from the column vectorsrepresenting the fiducial states |ρi〉〉:

B =(|ρ1〉〉 |ρ2〉〉 · · · |ρNf2〉〉

). (25)

Now, we can write the Nf1 ×Nf2 matrix P whose ele-ments are Pj,i as

P = AGB. (26)

In the context of process tomography, both the fiducialstates and the fiducial measurement effects are known,so all the elements of both A and B are known. A musthave full column rank and B must have full row rank,because the {ρi} and {Ej} were assumed to be informa-tionally complete. If they are square (Nf1 = Nf2 = d2),then they are invertible, and we can immediately recon-struct the original process as

G = A−1PB−1. (27)

If there are more than d2 input states and/or measure-ment outcomes, then (as before) we use a pseudo-inverseto obtain a least-squares solution:

G = (ATA)−1ATPBT (BBT )−1. (28)

More sophisticated estimators – i.e., maps from {pi,j}data to estimates of G – exist. They are importantwhen finite-sample errors in the estimated probabilitiesare significant. But the linear inversion tomographydescribed here is the essence of process tomography.

2.2.3 Measurement tomography

Using tomographic techniques to characterize measure-ments gets much less attention than state and processtomography – partly, perhaps, because it’s a straight-forward generalization of state tomography.

Measurement tomography [20, 29, 54, 98, 99] usespre-calibrated fiducial states to reconstruct the POVMeffects for an unknown measurement (see Figure 2).This is easy to represent using the framework alreadydeveloped for process tomography. If {ρi} are knownfiducial states, and {Ej} is the unknown POVM, thenwe can define one vector of observable probabilities ~pjfor each unknown effect:

[pj ]i = Tr[ρiEj ] (29)= 〈〈Ej |ρi〉〉. (30)

Using the B matrix defined in the preceding discussionof process tomography, this is simply

~pjT = 〈〈Ej |B, (31)

which means that each effect Ej can be reconstructedby linear inversion as

〈〈Ej | = ~pjTB−1, (32)

where B−1 is a matrix inverse or pseudo-inverse, as ap-propriate.


2.3 The role of calibration in tomographyState, process, and measurement tomography are con-ceptually straightforward, but rely on a critical (andunrealistic) assumption. They reconstruct the unknownoperation relative to some reference operations – fidu-cial measurements and/or states – that are assumed tobe perfectly known. Often, the reference operations arealso assumed to be noiseless.

These assumptions are never satisfied in practice.Identifying the exact POVMs that describe the fidu-cial measurements for state tomography – i.e., calibrat-ing them – would require perfectly known states. Butidentifying those states, by state tomography, relies onknown measurements. This leads to an endless loopof self-referentiality. In the same way, process tomog-raphy relies on fiducial states and measurements thatare almost always produced by applying quantum logicgates (specific quantum processes) to just a few nativestates and measurements [156]. A process tomogra-pher’s knowledge of those fiducial objects can be nomore precise than their knowledge of the gates used toprepare them – which would have to be characterizedwith process tomography.

Ref. [110] articulated this problem clearly. It alsodemonstrated that this concern is real, and has practi-cal consequences. In realistic scenarios, errors in statepreparation and measurement (SPAM) dominate inac-curacy in process tomography.

Several forms of “calibration-free” tomography [14,19, 23, 26, 74, 78, 83, 86, 107, 109, 112, 113, 131–133, 147, 154, 155] were proposed either independentlyfrom, or in response to, Ref. [110]. Each sought tocharacterize quantum gates, state preparations, and/ormeasurements self-consistently without any prior cali-bration. Gate set tomography has emerged as the mostwidely adopted of these, and is now a de facto stan-dard for performing calibration-free tomography. How-ever, it comes with certain costs, which have motivatedthe continued use of state and process tomography insome experiments. While we recognize the costs anddrawbacks of GST, the risks of traditional state andprocess tomography are so clear that they should notbe used, except under remarkable circumstances (e.g.,where there are very good reasons to believe the refer-ence operations are known to higher precision than isneeded in the final estimate).

3 Linear gate set tomography (LGST)GST offers two benefits. It resolves the circularity in-herent to state and process tomography, and achieveshigher accuracy with lower experimental cost than pro-cess tomography. These features are only loosely linked.

Linear GST (LGST) resolves the pre-calibration prob-lem, but has the same accuracy/effort scaling as processtomography. In this section, we use LGST to introducethe basic concepts and methods of GST.

3.1 Introduction to LGSTLinear-inversion gate set tomography (LGST) is basi-cally a calibration-free amalgamation of state, process,and measurement tomography. It constructs a low-precision estimate of a gate set G using the data froma specific set of short circuits. LGST demonstrates thefeasibility of avoiding pre-calibrated reference frames,and also why doing so creates gauge freedom.

LGST was developed roughly contemporaneouslywith IBM’s “overkill” approach to gate set tomography[110]. IBM identified the pre-calibration problem, rec-ognized that circuit outcome probabilities are given byexpressions of the form pS = 〈〈E|G . . . |ρ〉〉, and observedthat a sufficiently large set of such probabilities shouldbe sufficient to reconstruct the gate set. They proposedperforming all circuits of 3 or fewer gates to generatedata, then fitting a gate set to that data by numeri-cal maximization of the likelihood function (maximumlikelihood estimation, or MLE)

L(G) = Pr(data|G). (33)

But this likelihood function is not well-behaved. It isnot quasi-convex (i.e., its level sets are not convex), be-cause (1) the probabilities for the experimental observa-tions are not linear functions of the parameters, and (2)the existence of the gauge makes the likelihood’s max-imum very degenerate. Plotting the likelihood wouldreveal, instead of a unimodal “hill”, an assortment ofwinding “ridges” with perfectly level crests. Standardoptimization methods [22] often fail to find global max-ima on such functions, getting stuck in local maxima.The IBM “overkill” algorithm therefore relied on start-ing with a good initial guess for the gate set, that wouldlie with high probability in a local neighborhood of thelikelihood’s maximum.

LGST avoids this problem by using a very different,structured, experiment design. Instead of an unstruc-tured set of short circuits, LGST prescribes a specificset of them. The resulting data can be analyzed usingonly linear algebra, and the analysis is very similar tothat used for process tomography.

As LGST looks very much like process tomography,it’s important to recognize that it is doing somethingsignificantly different. If LGST was trying to recon-struct transfer matrices for individual gates, it wouldfail, because reconstructing individual gates without apre-calibrated reference frame is not possible. LGST re-constructs gates up to gauge freedom. Actually, it does


more – it reconstructs the entire gate set up to the globalgauge freedom given in Eq. 15, recapitulated below:

〈〈E(m)i | → 〈〈E(m)

i |M−1

|ρ(i)〉〉 →M |ρ(i)〉〉 (34)Gi →MGiM

−1.

Such transformations change the elements of the gateset, but not any observable probability. So it’s not pos-sible to distinguish between gauge-equivalent gate sets,and reconstructing a gate set up to arbitrary M consti-tutes success.

Since a gate set comprises states, gates, and measure-ments, it’s tempting to say that LGST characterizes allof them simultaneously. But this is not quite right. Agate set is not a collection of unrelated quantum oper-ations. Quantum operations are usually described rel-ative to an implicit and absolute reference frame. Butin most experiments, no such reference frame is avail-able. So GST characterizes all these operations relativeto each other, and estimates every property of a gate setthat can be measured without a reference frame. Butsome properties of gate sets can’t be measured, evenin principle, and they correspond to gauge degrees offreedom.

Gauge freedom makes some familiar properties ofgates unmeasurable. Other properties of gates turnout to be not associated with a single operation, butpurely relational properties – i.e., they are properties ofthe gate set, but not of any individual gate within it.This awkwardness is the unavoidable price of avoidingpre-calibrated reference frames. GST outputs a self-consistent representation of the available states, pro-cesses, and measurements, but that representation isgenerally not unique. If finite-sample errors did not ex-ist, LGST would be a perfect estimator of the gate set,and this paper would be much shorter. But real experi-ments always suffer from finite sample error. N trials ofan event with probability p does not generally yield ex-actly pN successes, so estimating p from data generallyyields p = p ± O(1/

√N). As a result, LGST’s estima-

tion error scales as O(1/√N), just like process tomog-

raphy. So estimating a gate set to within ±10−5 withLGST would require repeating each circuit N ≈ 1010

times, which is impractical. The “long-sequence GST”protocol described in Section 4 is much more efficient,and makes standalone LGST preferable only when thereare severe resource constraints. But LGST remains im-portant both pedagogically and as a key first step inlong-sequence GST’s analysis pipeline.

3.2 The LGST algorithmIn this section, we present the core LGST algorithm.We focus on (1) what LGST seeks to estimate, (2) what

data is required for LGST, and (3) how to transformthat data into an estimate under ideal circumstances.At first, we make some idealized simplifying assump-tions in order to maximize clarity. After presenting thecore algorithm, we return to these assumptions, relaxthem, and show how to make this algorithm practicaland robust. The simplifying assumptions are:

1. We assume the ability to create informationallycomplete sets of fiducial states

{|ρ′j〉〉

}and mea-

surement effects {〈〈E′i|} (see Section 3.3).

2. We ignore finite sample error in estimated proba-bilities, and its effects (see Section 3.4).

3. We assume that the fiducial states and effectsare exactly informationally complete, not overcom-plete, so that Nf1 = Nf2 = d2 (see Appendix C).

We use the notation of Eq. 14 to denote the contentsof a generic gate set G:

G ={{|ρ(i)〉〉

}Nρi=1

; {Gi}NGi=1 ;

{〈〈E(m)

i |}NM,N

(m)E

m=1,i=1

}.

(35)As in Section 2.1.4, Nρ, NG and NM are the number ofdistinct state preparations, gates, and measurements,

and N(m)E is the number of possible outcomes for the

m-th distinct measurement.To perform process tomography on an operation G

in Section 2.2.2, we constructed a matrix P of observ-able probabilities using informationally complete fidu-cial states and effects. For LGST, we will do the samething. But although we assume the existence and im-plementability of fiducial sets, we do not assume thatwe know them. They still form a reference frame, butwe don’t know what that frame is.

To reconstruct a set of gates (processes) {Gk}, wewill need one such matrix Pk for each gate:

[Pk]i,j = 〈〈E′i|Gk|ρ′j〉〉. (36)

These probabilities are directly measurable – we don’tknow what ρ′j and E′i are, but we can prepare/measurethem. The first line of Figure 3 illustrates the circuitcorresponding to Eq. 36.

We can construct A and B matrices from the fiducialvectors exactly as for process tomography. The differ-ence, of course, is that although those matrices exist,their entries are not known to the tomographer. Justas before, we can write

Pk = AGkB. (37)

Since we do not know A or B, we cannot solve this equa-tion for Gk. Instead, to compensate for our ignorance


d)

e)

f)

a)

b)

c)

native operation informationally complete set

e.g. e.g.

e.g. e.g.

Figure 3: Structures of the two types of circuits required bythe LGST algorithm. Upper panel: Each native gate, Gk, issandwiched between the elements of informationally completesets of effective state preparations, {ρ′

i}, and of effective mea-surements, {M′

j}. These are the same circuits that processtomography requires to characterize Gk. Line (a) shows thesecircuits in their simplest form, with each informationally com-plete set displayed as a unit. Line (b) depicts the common casewhen the set of effective preparations (measurements) is im-plemented by following (preceding) a single native preparation(measurement) operation with a fiducial circuit Ff (Hh), seeEqs. 51 and 52. Line (c) exemplifies that the fiducial circuitsare composed of native gates, and gives the circuit entirely interms of native operations. Lower panel: Because LGST doesnot assume knowledge of the ρ′

i and M′j , it requires circuits

that sandwich nothing between pairs of fiducials in order to beself-calibrating. The circuit diagrams in lines (d), (e), and (f)parallel those in (a)-(c). LGST also requires the circuits thatperform state (measurement) tomography on ρ (M), but theseare not explicitly shown. They are similar to (d)-(f) (replacingρ′ with ρ or M′ with M), and are actually included as a sub-set of these circuits when the gate set contains only a singlenative state preparation (measurement) and one of the prepara-tion (measurement) fiducial circuits is the empty (do-nothing)cirucit.

about ρ′j and E′i, we measure some additional proba-bilities that correspond to process tomography on thenull operation. We arrange these into a Gram matrixfor the fiducial states/effects:

1li,j = 〈〈E′i|ρ′j〉〉. (38)

Figure 3d-f depicts the circuits whose outcome proba-bilities make up 1l. This matrix can also be written interms of the A and B matrices, as

1l = AB. (39)

We assumed that these matrices are square (Nf1 =Nf2 = d2) and invertible (which follows from informa-tional completeness). So we can invert the Gram matrix

to get 1l−1 = B−1A−1, multiply Eq. 37 on the left byit,

1l−1Pk = B−1GkB, (40)

and solve for Gk to get

Gk = B1l−1PkB

−1. (41)

This may not look like the solution – there’s still anunknown B involved – but it is. We’ve reconstructedGk up to a similarity transformation by the unknownB. Moreover, we can do this in exactly the same wayfor all the gates Gk, and get estimates of them all upto the same B.

We also need to reconstruct the native states ρ(l) and

measurement effects {E(m)l } in the gate set. To do so,

we construct the following vectors (denoted by~·) of ob-servable probabilities:[

~R(l)]j

= 〈〈E′j |ρ(l)〉〉 (42)[~Q

(m)l

]j

= 〈〈E(m)l |ρ′j〉〉. (43)

Measuring these probabilities corresponds to perform-ing state tomography on each native state ρ(l), and mea-

surement tomography on every native effect {E(m)l } –

in the unknown frame defined by the fiducial effects andstates. They can be written using A and B as

~R(l) = A|ρ(l)〉〉 (44)~Q

(m)Tl = 〈〈E(m)

l |B, (45)

and by using the identity 1l = AB in Eq. 44, we get thefollowing equations for all the elements of the gate set:

Gk = B1l−1PkB

−1 (46)

|ρ(l)〉〉 = B1l−1 ~R(l) (47)〈〈E(m)

l | = ~Q(m)Tl B−1. (48)

Perhaps surprisingly, this is the answer – we’ve recov-ered the original gate set up to a gauge. The unknown


B is now a gauge transformation (see Eq. 15). It’s notjust unknown, but unknowable, and irrelevant. Thetomographer can set B equal to any invertible matrixand get back the original gate set, up to gauge free-dom. Equivalently, B indexes different (but equivalent)representations of the gate set. The simplest choice isB = 1l, but this almost always selects a very differ-ent gauge from that of the target gate set, resulting inthe right hand sides of Eqs. 42-48 being significantlydifferentfrom their ideal values even when there is lit-tle or no error. The best a priori choice is to choosea B corresponding to the tomographer’s best a prioriguess for the fiducial states (see Eq. 25), because that(implicitly) defines the gauge in which the tomographeris expecting to work. But although this is the best apriori choice, it is almost never satisfactory for comput-ing gauge-dependent quantities like the process fidelity.Reliable analysis of the estimate requires a posteriorigauge-fixing, which we discuss in Section 6.2.

3.3 Preparing the fiducial vectorsIn Section 3.2 above, we assumed by fiat that informa-tionally complete sets of fiducial states {ρ′j} and effects{E′i} could be prepared. But most processors admit justone native state preparation, and one measurement. Sofiducial states and measurements are implemented us-ing gates from the gate set itself (as shown in Figure 3(b-c,e-f)).

To do this, we define two small sets of QI/QO “fidu-cial circuits”. Each fiducial state is prepared by apply-ing one of the preparation fiducial circuits to a nativestate preparation. Each fiducial measurement is per-formed by performing one of the measurement fiducialcircuits before a native measurement. In the commoncase where the native state preparation and measure-ment are unique, the {|ρ′j〉〉} and {〈〈E′i|} are entirelydetermined by the corresponding fiducial circuits:

〈〈E′i| = 〈〈E(0)bi/Mc|τ (Hi mod M ) , (49)

|ρ′j〉〉 = τ (Fj) |ρ0〉〉. (50)

Equation 49 uses M ≡ N(0)E as shorthand to avoid too

cumbersome a notation, and uses b·c and mod to denotethe floor function and modular division.

In the more general case of multiple state prepara-tions and measurements, these expressions become:

〈〈E′i| = 〈〈E(m(i))t(i) |τ

(Hh(i)

), (51)

|ρ′j〉〉 = τ(Ff(j)

)|ρr(j)〉〉, (52)

They must include simple functions that map eacheffective-item index (i or j) to the native preparation

index (r(j)), measurement index (m(i)), or fiducial in-dex (f(j) or h(i))) corresponding to that item. Equa-tions 51 and 52 (or 49 and 50) introduce and definemeasurement fiducial circuits, {Hk}, and preparationfiducial circuits, {Fk} that result in the sets {|ρ′j〉〉} and{〈〈E′i|} being informationally complete. Their essentialfunction is describe how effective preparations and mea-surements are constructed from native operations, andare referred for this purpose in remainder of this work.This is shown pictorially in Figure 3.

This construction has no consequences for the lin-ear inversion analysis described above – how the fidu-cial states/effects are produced doesn’t matter, as longas they are consistent. But it does have other conse-quences.

First, it ensures that every “observable probability”required by LGST can be obtained by running a specificcircuit, composed from operations in the gate set itself.

Second, it subtly reduces the number of free param-eters in the model, because (e.g.) ρ′1 and ρ′2 are notentirely independent, but are actually generated by thesame set of operations applied in a different order. TheLGST linear inversion algorithm is blind to this corre-lation, but more careful statistical analysis can extractit and provide a somewhat more accurate estimate (seebelow).

Third, this construction places the burden of ensuringinformational completeness of the fiducial states/effectsonto the choice of fiducial circuits. This choice is not“canonical” for GST – it depends on the gates availablein the gate set. In practice, “fiducial selection” is usu-ally done by (1) modeling the gates as error-free, andthen (2) using that model to find a sets of fiducial cir-cuits that produce informationally complete states/ef-fects. The goal is to have a Gram matrix 1l that is well-conditioned, so that its inverse doesn’t inflate finite-sample errors. Optimal condition number is achievedwhen the fiducial states/effects form a 2-design[148].

Of course, the gates are not necessarily error-free, andfiducial selection may fail if they are sufficiently erro-neous. Fortunately, post-hoc validation is straightfor-ward, by computing the singular values of the measured1l before inverting it. If the d2 largest singular values arenot all sufficiently large, then different (or additional)fiducials should be added (and more data taken). If nofiducials are found to produce informationally completesets, then the operations in the gate set are not capa-ble of exploring the system’s nominal state space, andtomography must be limited to a subspace thereof[11].


3.4 Improving LGST with maximum likelihoodestimationIn the preceding sections, we have described the goalof tomography (including GST, and specifically LGST)as “reconstructing” states, processes, or gate sets. Thislanguage has a long history in the tomography litera-ture. But admitting the existence of finite sample er-rors – observed frequencies of circuit outcomes will notexactly match their underlying probabilities – disruptsthis paradigm. The objects that we “reconstruct” arenot quite the true ones. Worse yet, in general there isno “true” state or process! Quantum states and quan-tum processes are mathematical models for real phys-ical systems. Those models rely on assumptions (e.g.,Markovianity), which are never quite true in practice.

“Reconstructing” is a flawed term, and at this pointwe leave it behind. It is more accurate to say that GSTestimates the gate set. The goal of GST is to fit a gateset to data – i.e., to find the gate set that fits the ob-served data best. This description is more consistentwith the least-squares generalization of linear inversionderived above. But, having rephrased the problem thisway, it is very reasonable to ask “Is least-squares theright way to fit the data?” The LGST algorithms de-rived above minimize a sum of squared differences be-tween predicted probabilities and observed frequencies:

f(gate set) =∑j

(p

(gate set)j − f (observed)

j

)2. (53)

But while minimizing this sum of squared residuals isvery convenient, it’s not especially well-motivated. Fur-thermore, the LGST algorithms given above don’t actu-ally minimize the average squared error over the entiredataset – i.e., all the circuits performed to obtain LGSTdata – because the projector Π is chosen specifically tocapture the support of the Gram matrix 1l, without re-gard for the other directly observed probabilities (e.g.,those in the Pk).

The best way to fit the observed data optimally isto set aside traditional tomography and linear algebra,and focus directly on (1) the actual data (outcome fre-quencies of quantum circuits), and (2) the model beingfit to it (gate sets). The set of all possible gate sets canbe viewed as a parameterized manifold. Each point onthat manifold is a gate set that predicts specific, com-putable probabilities for each quantum circuit. It istherefore possible to write the objective function fromEq. 53 precisely – with j ranging over every circuit per-formed – as a function of the gate set, and minimize itnumerically to find a gate set that really does minimizethe total squared error.

There’s no particularly compelling reason to minimizethe total squared error, though. An objective function

that is very well motivated in statistics is the likelihoodfunction:

L(gate set) = Pr(data|gate set). (54)

Although not immediately obvious, the sum-of-squaresobjective function in Eq. 53 can be seen as an approx-imation to the likelihood. Maximizing the likelihoodfunction is the apotheosis of linear inversion, and yieldsa more accurate estimate when the likelihood functionis nonlinear, as it is here. If we numerically vary overgate sets to find the one that maximizes L, that is max-imum likelihood estimation (MLE), and it is a strictlybetter and more accurate way of analyzing LGST datathan the linear inversion methods defined above.

In the rest of this paper, we will adopt this approach.We will view the data as a list of circuits with observedoutcome frequencies. We will view gate sets as statis-tical models that predict circuit probabilities. And wewill find good estimates of real-world gate sets by fittingthat model’s parameters to observed data, using MLEor approximations to it.

A natural question, then, is “Why develop the wholelinear-inversion LGST framework, if having done so weimmediately move beyond it?” There are several goodreasons. The most practical is that the linear-inversionLGST algorithm developed above is actually a criticalfirst step in any numerical MLE algorithm! Becausecircuit probabilities are nonlinear functions of the gateset parameters, both Eq. 53 and the likelihood functionitself have unpleasant global behavior, including localextrema. Linear-inversion LGST is a very fast algo-rithm to obtain a pretty good estimate, which can serveas a starting point for (and be refined by) numericalMLE.

But LGST also serves as a critical conceptual foun-dation. Regardless of what estimator we use (linearinversion, MLE, or something else), we can’t estimatethe gate set’s parameters unless the experiments thatgenerated the data are sensitive to them. LGST con-structs a set of experiments guaranteed to be sensitiveto all the parameters in the gate set. Deeper circuits,introduced in the next section, amplify those parame-ters and allow greater sensitivity – but the experimentsthat we use in long-sequence GST are clearly derivativeof the ones in LGST, and rely on the same structure toguarantee sensitivity to all amplified parameters.


4 Long-sequence Gate Set TomographyTraditional process tomography on a gate Gk uses cir-cuits in which Gk appears just once. Observed proba-bilities depend linearly on Gk, making it very easy toinvert Born’s rule to estimate Gk. LGST bends thisrule – a given Gk may appear both in the middle of thecircuit and in the fiducial circuits – but the circuits areshort, and each gate appears only O(1) times. This en-ables the relatively straightforward analysis presentedin the previous section, but it also limits precision. If Aand B (Eqs. 21 and 25) are well-conditioned [conditionnumber O(1)] then each matrix element of Gk is veryclose to a linear combination of observed probabilities.

If each circuit is performed N times, then finite sam-ple fluctuations cause estimation errors in each proba-bility,

p = p± O(1)√N, (55)

and the accuracy of Gk is also limited to ±O(1)/√N .

Under certain circumstances, some of these probabilities(with p ≈ 0) can be estimated to within O(1/N) [103],but the resulting improvement in overall accuracy islimited by SPAM noise and by the fact that only a fewof the circuits can be chosen to have p ≈ 0.

A completely different way to break the 1/√N

boundary is to incorporate data from deep circuits, inwhich a gate appears many times. The circuits mustbe designed so that their outcome probabilities dependmore sensitively on the elements of Gk, with sensitivityproportional to the number of times Gk appears in thecircuit. For example, the outcome probabilities for

Pr = 〈〈E|GkGkGkGk|ρ〉〉, (56)

may be four times as sensitive to changes in Gk as theoutcome probabilities for

Pr = 〈〈E|Gk|ρ〉〉, (57)

and therefore allow four times the precision in estimat-ing some aspects of Gk. Long-sequence GST turns thissimple idea into a practical algorithm for characterizinggate sets.

The original motivation for long-sequence GST wasprecision and accuracy, but another advantage emergedlater. Unlike traditional tomography and LGST, long-sequence GST is easily adaptable to a wide range ofnoise models. In process tomography and LGST, eachnoisy gate is specifically described by an arbitrary trans-fer matrix. Even small extensions – e.g., focusing on lowrank transfer matrices – require conceptual and practi-cal modifications of tomography [57, 65, 69, 149]. Butlong-sequence GST only requires a parameterized modelfor the noisy gates that can be embedded into transfer

matrices. The gate set of Eq. 14 is such a model, with

one parameter per element of |ρ(i)〉〉, Gi, and 〈〈E(m)i | ex-

pressed in a chosen basis. But a gate set can, more gen-erally, be described by any mapping between a param-eter space and the aforementioned operations. Long-sequence GST can be used to estimate a gate set thatis parameterized in any reasonable way. This flexibilitycan be used to model various features and constraintson the gate set, including complete positivity (CP) andtrace preservation (TP). We defer a discussion of thisflexibility until Section 5.1, as these details are not es-sential to understanding long-sequence GST. Presently,it is sufficient to note that a gate set is a mapping froma parameter space (e.g., the direct sum of the vectorspaces for each operator) to a set of operations capableof predicting circuit outcomes.

The core long-sequence GST protocol used by ourresearch group has remained largely unchanged sinceabout 2017. But between 2013 and 2017, it morphedrepeatedly as we discovered that certain things didn’twork, or found better ways to solve certain problems.Its current stable form is the result of a nontrivial evo-lution process. Like most products of evolution, it con-tains both historical artifacts and necessary solutions tonon-obvious problems. Rather than present the state ofthe art as a fait accompli, we introduce it by outliningthe historical path that led to it (Section 4.1), empha-sizing some of the techniques that didn’t work. Next,we explain how to construct a good long-sequence GSTexperiment design, a set of circuits of depth at mostO(L) that enables estimating gate set parameters toprecision O(1/L) (Section 4.2). We conclude this pre-sentation of long-sequence GST by showing how to es-timate those parameters by efficiently maximizing thelikelihood function, and discuss a variety of technicalinsights that make the numerical methods reliable inpractice (Section 4.3).

4.1 Historical backgroundOur first attempt to achieve higher accuracy involvedperforming random, unstructured circuits and fittinggate set models to the resulting count data [14]. Weaugmented the set of circuits prescribed by LGST witha variety of random circuits (much like those used in di-rect randomized benchmarking [129]) of various depths,and then used numerical MLE to fit a gate set to the re-sulting data. This approach produces a likelihood func-tion that is generically messy – it is not a quasi-convexfunction of the gate set, usually has local maxima, andhas no particular structure. However, it is straightfor-ward to evaluate, and its derivative can be computedefficiently from the data. Moreover, a subset of the dataenable LGST, which provides a “pretty close” starting


point for local gradient maximization of L(G).Investigation of this approach revealed two problems.

1. Random circuits provided surprisingly low preci-sion. Although estimation error decreased with L,it declined as O(1/

√L), not O(1/L).

2. The lack of structure in the likelihood functionmade numerical MLE problematic. Local gradi-ent optimization worked only unreliably, and washighly dependent on starting location. We achievedreasonable success in simulations by starting withLGST, and then refining this estimate by addingsuccessively deeper circuits to the likelihood func-tion. However, this technique proved less reliable inexperiments, where the underlying model was lessvalid. Running the optimizer repeatedly with dif-ferent starting conditions, and incorporating moreglobal-optimization techniques, often revealed bet-ter local maxima of the likelihood. This suggestedthat even the best estimates we found might notbe global maxima.

We concluded that (1) as the IBM group had observedearlier, MLE on unstructured GST data was not a sat-isfactory solution, and (2) we needed to choose circuitsmore cleverly (as with LGST) to make the analysis eas-ier and more reliable.

We developed an approach called extended LGST(eLGST). It relied on two critical modifications to theoriginal “unstructured MLE” approach, which imposeadditional structure to the experiment design and dataanalysis (respectively).

First, instead of performing random circuits, we con-structed a set of circuits corresponding to performingLGST – not on the gates Gk themselves, but on asmall set of “base” circuits, {Sl}. We took each Sl andsandwiched it between fiducial circuits (just as we didwith each Gk for LGST). Originally, the base circuitswere simple repetitions of individual gates, e.g. Gpk forp = 1, 2, 4, 8, . . .. We found that this amplified some,but not all, parameters of the gate set.

This extended-LGST (eLGST), experiment designeventually evolved into one where the base circuits werechosen to be a set of short “germ” circuits (gi) repeatedp times (gpi ). This spawned the “germ selection” pro-cedure used today in long-sequence GST, which we dis-cuss in detail in Section 4.2. This structure ensures thateach base circuit amplifies some set of deviations fromthe target gates, so that these deviations change the ob-served probabilities for the circuits based on that basecircuit by O(L). The germs are chosen so that they am-plify different deviations, and so that collectively theyamplify all deviations. This careful, non-random ex-periment design allowed eLGST to achieve consistent,reliable, and predictable accuracy that scales as 1/L.

a)

b)

c)

native operation informationally complete set

amplificationally complete set

Figure 4: The structure of circuits in the standard GST exper-iment design, shown in increasing detail. (a) Each GST circuitconsists of an effective state preparation ρ′ (Eq. 52), followedby a germ circuit g repeated p times, followed by an effectivemeasurement M′ = {E′

i} (Eq. 51). (b) Effective preparationsare often implemented by a native state preparation ρ followedby a preparation fiducial circuit F , and similarly effective mea-surements are often implemented by measurement fiducial cir-cuit H followed by a native measurement M. (c) Writing thefiducials and germ in terms of native gate operations revealshow the native operations of a gate set compose to form a GSTcircuit.

Second, instead of using MLE to fit a gate set directlyto the data, eLGST fits the gates indirectly via a 2-stepprocess. In the first step, we estimated the transfer ma-

trix for each base circuit, τ(gpi ). Then, those estimateswere used to back out a transfer matrix for each gate.The eLGST protocol is a precursor to the long-sequenceGST we now describe. Appendix B gives a full descrip-tion of eLGST.

4.2 Experiment design for long-sequence GSTA long-sequence GST experiment is designed to enablehigh accuracy with minimal experimental effort. LGSTcan estimate a gate set to arbitrary accuracy, but be-cause uncertainty in the estimated parameters scales asO(1/

√N) if each circuit is repeated N times, achieving

precision ε requires O(1/ε2) repetitions. This makesprecisions of ε ≈ 10−5, as demonstrated in [19], practi-cally impossible to reach.

Long-sequence GST overcomes this barrier by speci-fying a different experiment design – a set of quantumcircuits to be run – containing circuits that amplify er-rors in the gate set. This experiment design retains thebasic structure of LGST: each of a list of “operations ofinterest” is probed by constructing circuits that sand-wich it between pre- and post-operation fiducial circuits.But instead of a single gate, the middle of each sand-wich is a more complicated base circuit that amplifiescertain errors so they can be measured more preciselyby tomography. In this section, we present the long-


sequence GST experiment design. We use the term ex-periment to mean an experiment design along with thedata obtained by repeating each of its circuits manytimes.

Each long-sequence GST circuit constitutes threeconsecutive parts (see Figure 4):

1. Prepare a particular state, |ρ′k〉〉 by performing anative preparation followed by a fiducial circuit.

2. Perform p repetitions of a short circuit g that wecall a germ.

3. Perform a particular measurement {〈〈E′(m)i |}

(Eq. 51) by performing a fiducial circuit and thena native POVM measurement.

The outcome statistics from repeating such a circuitmany times allow estimating probabilities like

p = 〈〈E′(m)i |τ(gpj )|ρ′k〉〉. (58)

If the states and measurements are each information-ally complete, these probabilities are sufficient for to-mography on τ(gpj ). All GST experiments are derivedfrom this basic structure, and every circuit performedfor GST is specified by an (j, k, p,m) tuple.

We call the short circuit g a germ because, like a germof wheat or the germ of an idea, it serves as the tem-plate for something larger – here, an arbitrarily deepcircuit built by repetition. We refer to gp, the object oftomography, as a base circuit and p as a germ power.Repeating g amplifies errors in g. For example, if g isa single unitary gate G, and G over-rotates by angle θ,then τ(gp) will over-rotate by pθ. By varying the inputstate and final measurement (k and m) over informa-tionally complete sets, we probe the operation betweenthem, just as in process tomography or LGST. If thistomography on gp measures pθ to precision ±ε, thenwe’ve measured θ itself to precision ±ε/p.

...

...

...

...

...

...

...

...

...

...

...

...

Nf1 state preparations

Nf2

mea

sure

men

ts

6~L=8

4 8 16L (maximum base circuit depth)

Ger

m

1. Germs

2. Base circuits

3. Sandwich with fiducials 4. Fiducial pair reduction (optional)

5. Final GST experiment design

3~L=4 15~L=16

Figure 5: Overview of how to design a GST experiment. Step1. Choose germ circuits that amplify all gauge-invariant pa-rameters in the processor’s gate set. Step 2. Define basecircuits by raising each germ to powers chosen so that the basecircuits’ depths are as close as possible to being logarithmicallyspaced. Step 3. Sandwich each base circuit between each ofNf1 ×Nf2 fiducial pairs defined by Nf1 preparation and Nf2measurement fiducial circuits. Step 4. Optionally apply fidu-cial pair reduction (FPR) to eliminate redundant circuits fromthe experiment design, leaving just enough to ensure sensitivityto each base circuit’s amplified parameters. Final result 5. AGST experiment design can be visualized as a grid of plaque-ttes. The plaquettes correspond to base circuits, are arrangedby germ and L, the maximum depth used to construct the basecircuit from the germ. Within a plaquette, squares indicate dif-ferent fiducial pairs, and the plaquette will be “sparse” whenFPR has been applied.

Simple germs comprising a single gate G are not suf-ficient to amplify every error. Some errors can only beamplified by first constructing a multi-gate circuit, e.g.,


g = G1G2, and then repeating it. Repeating g manytimes amplifies errors that commute with τ(g). In theexample above, an over-rotation error in G is an errorthat commutes with G, so it is amplified. But supposethat G rotates by the correct angle, but around thewrong axis, e.g., instead of a rotation around z, G per-forms a rotation around z cosφ+ x sinφ. This tilt erroris not amplified by repeating G, but it can be amplifiedby a more complex germ that includes G together withother gates.

To achieve high precision estimates efficiently, weneed to amplify every parameter in the gate set that canbe amplified. Two kinds of parameters cannot be ampli-fied. Gauge parameters cannot be measured at all, andproperties of the SPAM operations cannot be amplifiedbecause SPAM operations only appear once in each cir-cuit. So germs amplify errors in gates. Therefore, as wediscuss selection of germs, we will focus exclusively onthe gates and ignore SPAM operations.

In the following sections we describe how to select aset of base circuits – germs and powers to raise them tothat amplify all of the amplifiable parameters of a given“target” gate set. This target gate set is typically takento be the ideal gates a device is designed to implement.More details may be found in Appendix D.1.

4.2.1 Selecting germs (g)

To estimate a gate set efficiently and accurately, everyvariation in the gate set, except those correspondingto SPAM and gauge directions, must be amplified bysome germ. We call a set of germs that achieves thisgoal amplificationally complete, in direct analogy to “in-formationally complete” sets of states or measurements.

To evaluate (and ensure) amplificational complete-ness, we model errors in the gate set as small perturba-tions to the target gate set. Each germ g, when re-peated, will amplify certain errors. Specifically, anysmall perturbation to a gate set’s parameters thatcauses a change in τ(g) that commutes with τ(g) willbe amplified by g. The set of all such perturbations iseasily shown to be closed under linear combination, soeach germ g amplifies error vectors that lie in a sub-space of the gate set’s parameter space. This subspacecan be easily constructed from the target gate set andthe germ.

It is also straightforward to construct the tangentsubspace of gauge variations. It contains all small per-turbations around the target gate set that do not changeany observable probability at all. Its complement isthe subspace of physical errors that need to be ampli-fied. Now we can define amplificational completenessprecisely: a set of germs {gj} is amplificationally com-plete for a target gate set G if and only if the union of

the error subspaces amplified by each gj span the com-plement of the subspace of gauge variations.

Amplificational completeness defines a concrete con-dition that germs must satisfy for GST. However, it de-pends on the target gate set, so each new target gate setrequires finding a new set of germs. Furthermore, differ-ent desiderata may motivate different amplificationallycomplete sets of germs (e.g., it is reasonable to priori-tize the smallest set of germs, or the shortest possiblegerms, or the set that achieves the highest precision for agiven maximum depth). In the pyGSTi implementationof GST [120, 121], we use a particular search algorithmto evaluate sets of short germs and find the smallestamplificationally complete set. Algorithmic and math-ematical details can be found in Appendix D.1. We re-quire that a chosen germ set always includes each gateGi in the gate set as an independent germ.

4.2.2 Defining base circuits

Once a set of germs is selected, a set of base circuits isconstructed by raising each germ to several powers p.We begin by selecting p = 1. Since each gate Gi is alsogerm, this ensures that each individual gate appears asa base circuit. What remains is to choose all the othervalues of p that will appear in the experiment.

Large values of p amplify errors more. But includingonly large powers p would create an aliasing problem. Ifτ(g) over-rotates by θ = π/16, then repeating it p = 32times creates an over-rotation by 2π, which appears tobe no error at all! So, exactly as in phase estimation,smaller powers p are needed too. Since this is essentiallya binary search, logarithmically spaced values of p areoptimal.

More generally, a base circuit’s p is less relevant thanits overall depth. If germ g1 has depth 1, while g2 hasdepth 5, then g10

1 and g22 both have depth 10. So in

describing GST experiments, we typically organize basecircuits by their approximate depth l rather than p. Ifwe denote the depth of germ g by |g|, then the depth-l base circuit based on g is gbl/|g|c. Our intuition touse logarithmically spaced p above leads us to chooselogarithmically spaced l – i.e., l = 1,m,m2,m3, . . . – tochoose the base circuits.

Making m larger reduces the total number of circuitsto be run, but requires higher precision (more repeti-tions) at each value of l. Empirically, we have foundm = 2 to work reliably – i.e., l = 1, 2, 4, 8, 16 . . . – butother choices are possible. Germs of depth d > 1 do nothave realizations for depths l < d, so a germ of depth5 would appear first at l = 8 (with 1 repetition), thenl = 16 (with 3 repetitions), etc. Thus, the depth of a“depth l” base circuit is not necessarily equal to l, butis always in the interval (l/m, l].


Any given experiment can only include finitely manycircuits, so for each germ there must be a maximumdepth L at which it appears. This is configurable. Se-lecting L should be guided by three facts. First, in-creasing L amplifies errors more and yields more preci-sion (all else being equal). Second, increasing L makesthe experiment larger (more circuits), which requiresmore time to obtain and analyze. Third, there is es-sentially no benefit to increasing L beyond the pointwhere decoherence and stochastic errors dominate. Ifthe rate of decoherence in each gate is η, then circuitsof depth � 1/η generally all produce the same equilib-rium state, and little or nothing will be learned fromcircuits of depth L > O(1)/η. If different gates havevery different rates of stochastic errors, then it is usefulto let L vary from germ to germ.

4.2.3 Putting it all together

The circuits for a full GST experiment are constructedas follows. First a set of amplificationally completegerms is selected (see 4.2.1, Fig. 5 step 1). Next a setof base circuits is chosen (see 4.2.2, Fig. 5 step 2). Letthese be given by O =

{gpi,ji

}i,j

, where i indexes a

germ and pi,j is the j-th power that we apply to thei-th germ. As explained above, to amplify the desirederrors it is sufficient that the GST circuits estimate theprobabilities

pabij = 〈〈E′a|τ(gi)pi,j |ρ′b〉〉, (59)

where a = 1 . . . Nf1 and b = 1 . . . Nf2 range over theindices of the effective preparations and effects (Fig. 5step 3 and 5). Writing the effective state preparationsand POVM effects in terms of their constituent fiducialcircuits (Eqs. 51 and 52) gives the probabilities entirelyin terms of native operations, i.e. elements of the gateset:

pabij = 〈〈Em(a)t(a) |τ

(Hh(a)

)τ (gi)pi,j τ

(Ff(b)

)|ρr(b)〉〉.

(60)The corresponding circuit (which can be read off di-rectly, from right to left because the matrix-ordering ofEq. 60 is reversed from time-ordering) is repeated Ntimes to approximate pabij . The steps of this circuitare:

1. prepare the r(b)-th state

2. perform the (QI/QO) circuit Ff(b)gpi,ji Hh(a)

3. measure using the m(a)-th type of measurement

The frequency ft(a) = nt(a)/N , where nt(a) is the num-ber of times the t(a)th outcome was observed, estimatespabij . In this way, a circuit’s outcome data can estimate

all of the pabij that differ only in their t(a) (POVM ef-fect) index.

Letting a, b, i and j range over their allowed valuesin the steps above defines all the circuits needed to runlong-sequence GST. These circuits constitute a GST ex-periment design. Their structure is shown schematicallyin Figure 4, where again we omit the many indices (seeEq. 60) for clarity. For later analysis of the circuits’outcome data, it is helpful to separately keep track ofthe fiducial and germ circuits, and the germ powers. InAppendix E we show that the circuits of a GST exper-iment design (using the particular pyGSTi algorithmsdiscussed in Appendix D.1) result in an accuracy thatscales as the inverse of the circuit depth and total num-ber of circuits (Heisenberg-like scaling).

Finally, we note that arriving at a definite list of cir-cuits (an experiment design) required information basedon the experimental circumstances. The sets of fidu-cials and germs were based on the available native gates(specified by a target gate set), and the maximum depthof the circuits was based on a tradeoff between resourcesand accuracy (longer experiments are more resource-intensive, because they require more and deeper cir-cuits, but they give more accuracy – although only upto a maximum useful depth that is roughly the inverseof the decoherence rate). GST experiments are tailoredto a hardware’s capabilities.

4.3 Estimating gate set parameters in long-sequence GSTA gate set is a parameterized statistical model. The pa-rameterization may be simple (e.g., every gate elementis a parameter) or nontrivial (see Section 5.1). Fitting agate set model to data from the experiments describedin Section 4.2 is an example of parameter estimation.There are many ways to do this (see Section 3.4). Inthis paper, we focus on maximum likelihood estimation(MLE), which requires varying the gate set’s parame-ters to maximize the probability of the data (or, forpractical simplicity, its logarithm).

The loglikelihood function can be constructed as fol-lows. Let G denote the model, s index a circuit of theexperiment design, and Ns be the total number of timescircuit s was repeated in the GST experiment. Further-more, let ms be the number of outcomes of s, and Ns,βsdenote the number of times outcome βs was observed.The contribution to the total loglikelihood from s issimply the multinomial likelihood function for an ms-outcome Bernoulli scheme,

logLs = Ns∑βs

fs,βs log(ps,βs), (61)

where ps,βs is the probability predicted by G of getting


outcome βs from circuit s, and fs,βs = Ns,βs/Ns is thecorresponding observed frequency. The total loglikeli-hood for the entire GST experiment is just the sum

logL =∑s

logLs =∑s,βs

Nsfs,βs log(ps,βs), (62)

where s ranges over all of the circuits in the experi-ment design. This derivation assumes that G is trace-preserving (TP), so that ∀s

∑βsps,βs = 1. An exten-

sion to non-TP gate sets requires some technical tricksand is explained in Appendix D.2.

Maximizing logL is made nontrivial by the structureof the problem. This contrasts with state and processtomography, where the parameterized model is a den-sity matrix ρ or a superoperator G, each observableprobability is a linear function of the parameter, andthe loglikelihood is a sum of logarithms of linear func-tions. This means MLE state/process tomography is astraightforward convex optimization problem.

In contrast, the GST likelihood function (Eq. 62) isextremely non-convex. Because gates appear repeatedlyin circuits, the probabilities ps,βs are nonlinear func-tions of gate elements (and, more generally, the param-eters of G). The construction outlined previously causesthe ps,βs to oscillate, creating a loglikelihood functionthat looks like an egg crate or optical lattice, with manylocal maxima. Finding global maxima of such functionsis generally hard.

The gauge freedom creates more problems, by turn-ing unique maxima into “ridges” that trace out gaugeorbits. Constraining the optimization to CP gate setscreates complexity, because the CP condition does notrespect gauge symmetry and can create local maxima.Ignoring the CP constraint makes optimization easier,but allows unphysical gate sets that can have zero ornegative likelihood.

We have developed a particular pipeline of MLE andrelated estimators that reliably maximize Eq. 62 when Gis parameterized in one of several common ways (includ-ing with TP and CPTP constraints). We present thismethod, not as the only way to perform the parameter-estimation step of long-sequence GST, but as a way ofimplementing this crucial step of the protocol.

Our approach is outlined in Algorithm 1. Here ~θ isthe vector of G’s parameters being optimized, D is adata set, Truncate(D, L) is the subset of D’s data corre-sponding to circuits whose germ-power has depth ≤ L,and Argmin(S,G,D, ~θ1) yields the ~θ at which statistic

S(G(~θ),D) is minimal using a local optimizer seeded at~θ1. D0 is the full GST data set, and ~θ0 is the initialvector of model parameters, often provided by LGST(Section 3). We have found m = 2 to work well inpractice.

Algorithm 1 Long-sequence GST~θ ← ~θ0for L ∈ 1,m,m2,m3, . . . do

D ← Truncate(D0, L)~θ ← Argmin(χ2,G,D, ~θ)

end for~θ ← Argmin(− logL,G,D0, ~θ)

We highlight three important elements to this ap-proach that we believe are particularly important toits success:

Optimization Stages. Our approach proceeds in multi-ple “stages”. At each stage, we run a traditional localoptimization method (we find the Levenberg-Marquardttechnique to give good results) on a subset of all thedata. Each stage sets a maximum base circuit depthL, and at that stage only data from base sequenceswith depth less than or equal to L are incorporatedinto the likelihood function (Eq. 62). We choose L =1,m,m2,m3, . . ., so that the stage corresponding to Lcontains the circuits whose base circuits are gbl/|g|c forl = 1,m,m2 . . . , L. Successive stages add deeper cir-cuits, while keeping all the shorter circuits. By usingthe best-fit output of a stage as the starting point ofthe next we create a daisy chain of estimations thatavoids local minima. As long as finite sample fluctua-tions are kept small enough (by repeating each circuitenough times) the previous stage’s best-fit estimate lieswith high probability in the correct basin of the nextstage’s objective function, rendering its oscillatory na-ture benign.

Use of χ2 as a logL proxy. At every stage except thelast one, the χ2 statistic is optimized instead of loglike-lihood. The χ2 is a weighted-sum-of-squares functionthat (like likelihood) quantifies goodness-of-fit. Usingour definitions above, it can be written as

χ2 =∑s,βs

Ns(ps,βs − fs,βs)2

ps,βs=∑s

χ2s, (63)

where

χ2s =

∑βs

Ns(ps,βs − fs,βs)2

ps,βs(64)

is the contribution from a single circuit s. The weights(1/ps,βs) ensure that χ2 is a local quadratic approxima-tion to the negative loglikelihood. Compared with theloglikelihood, χ2 can be computed faster, and is morewell-behaved as an objective function. It has one sig-nificant disadvantage: its minimum is a slightly biasedestimator (see Appendix F). Optimizing the loglikeli-hood in the final stage renders this a non-issue, and we


find minimizing χ2 to be more robust at seeding thefinal logL maximization than performing logL maxi-mizations all the time. One notable exception to thisoccurs when the number of circuit repetitions is low andχ2 becomes a poor proxy for logL.

Regularization. We increase the reliability of optimiza-tion by regularizing both logL and χ2. Both functionshave poles when probabilities are zero, which can leadto numerical instabilities when probabilities are nearzero. By slightly altering each function, we make themmore amenable to optimization. A simple and effectiveway to regularize Eq. 63 is to limit the least-squaresweights to a maximum cutoff, e.g. 1/pmin. Since χ2 isalready just a proxy for the negative loglikelihood, thishas no effect on the final fit. The logL function needsto be tweaked more carefully, by replacing Eq. 61 withits second-order Taylor series when ps,βs < pmin, wherepmin is chosen to be much less than the smallest possiblenon-zero frequency (e.g., 10−4 if each circuit is repeated1000 times). Because this alters the logL function onlywhen a probability is much different than its observedfrequency, it distorts the objective’s value (and therebythe rigor of its interpretation) only for particularly badfits. We have evidence that these small adjustmentsto the standard logL and χ2 cause local optimizationmethods to be more robust, presumably because theyavoid regions with very large gradient and widen basinsof convergence.

The method outlined above is not guaranteed to finda global maximum of logL, but does so impressivelyoften in practice. Improvements to the algorithm are adeserving area for future work, but the algorithm de-scribed above demonstrates that the parameter estima-tion problem lying at the heart of long-sequence GSTis tractable, enabling its practical use.

The long-sequence GST protocol described above –designing an experiment, running that experiment toproduce data, and finally analyzing the data via multi-stage MLE – produces a best-fit gate set in an unknowngauge. The gauge is irrelevant when predicting circuitoutcomes, so the estimated gate set can be used imme-diately for this purpose. (For example, it is possible topredict the success probabilities of RB circuits and ex-tract the RB number.) However, computing standardmetrics such as fidelity and diamond-distance requiresan additional step called gauge optimization. We dis-cuss this, and other common post-processing, in Section6. Readers should feel free to jump to that section, ifdesired. Section 5 covers several advanced topics thatwere omitted from our presentation thus far of long-sequence GST.

5 Advanced long-sequence GSTThis section discusses two additional topics related tolong-sequence GST. Although they are not essential tothe protocol as a whole, they infuse it with significantadditional capability. First, we formalize the notion of agate set model, and suggest a natural path to extendinglong-sequence GST by creating additional models. Sec-ond, we describe how the number of circuits in the GSTexperiment design (Section 4.2) can be sizeably reducedby taking advantage of overcompleteness present in thestandard design. The material here is not required byany of the remaining main text.

5.1 Gate set modelsGate sets, as we have defined them, contain generalstate, measurements, and superoperators. A gate set G(Eq. 14) represents each gate as a d2×d2 real-valued ma-trix, and each state preparation or measurement effectas a d2-dimensional real-valued vector or dual vector,respectively. Let us define matrix space as M ∼ RNe ,where

Ne = d4NG + d2

(Nρ +

NM∑m=1

N(m)E

)(65)

is the total number of (real) elements in a gate set. Mis thus isomorphic to the direct sum of the vector spacescontaining each gate set operation’s matrix, vector, ordual vector. Any gate set (as defined by Eq. 14) is apoint in matrix space, G ∈ M, with coordinates givenby the elements of each gate set operation.

A gate set is a statistical model that assigns proba-bilities to the outcomes of quantum circuits.3

We use the term gate set model to mean a mappingbetween a parameter space P ∼ RNp and M. Thedimension of parameter space, Np, is typically less thanNe. Formally, a gate set model corresponds to a choiceof P and a map

W : P →M. (66)

A gate set model is a parameterized statistical modelthat associates with every point in parameter space agate set. Long-sequence GST finds an optimal gate setby searching over this parameter space, optimizing over~θ ∈ P. So by using different gate set models, we canconstrain this optimization to subsets of the entire ma-trix space. Informally, a basis for P defines “knobs”that can be adjusted – e.g., by the MLE optimization,

3Technically, a gate set is a factory, which can produce a sta-tistical model for any set of circuits. Since such a set of circuitsis always implied by context in which gate sets are used, we allowourselves to abuse terminology slightly.


but also by hand if so desired – to vary a gate set andmake it more consistent with the observed data.

If we set P = M and W (x) = x then each elementof every operation in the gate set is an independent pa-rameter. We call this the fully parameterized gate setmodel, and we can use it for GST (as in Section 4).However, it includes all gate sets, even wildly nonphys-ical ones that violate complete positivity and/or tracepreservation. So it is useful to define smaller gate setmodels that parameterize strict subsets ofM (e.g., onlyCPTP gate sets).

5.1.1 Gauge freedom in constrained gate sets

Gate set models may have gauge freedoms. A gaugefreedom exists if there is a transformation that can beapplied to ~θ ∈ P that changes ~θ but does not changeany observable probability that can be computed fromthe corresponding gate set W (~θ). We have already dis-cussed the gauge freedoms of the fully parameterizedgate set model; they correspond to similarity transfor-mations by invertible matrices M that act on each gatematrix as Gk →MGkM

−1 (see Eq. 15). For any givenG, the action of all possible M on G traces out a gaugeorbit containing all the gate sets G′ ∈M that are equiv-alent to G. For general gate set models, one must con-sider the pullback of Eq. 15 to P. As we discuss be-low, this can make the analysis more complicated. InM, gauge freedoms correspond to gauge orbits – sets ofequivalent gate sets. But in P, these freedoms can maponto a different construct. Still, it is helpful to picturethe parameter space as being foliated into gauge orbits,like an onion or layered pastry. Appendix A.1 describesgauge transformations in more detail.

It can be useful to count the gauge degrees of freedomfor a given gate set model. All the gauge orbits, exceptfor a set of measure zero, are manifolds of this fixed di-mension. (If one or more gates in a gate set have degen-erate spectra, then they remain invariant under certaingauge transformations; the corresponding orbits havelower dimension, like the center of an onion). Since

P has dimension Np, then any point ~θ can be variedin Np linearly independent directions. These define alocal tangent space at θ, which we can partition intoorthogonal subspaces of Ngauge

p gauge directions that

are tangent to the gauge orbit on which ~θ lies, andNnongauge

p non-gauge directions that are normal to thegauge orbit (see Appendix A.1). It follows trivially thatNgauge

p +Nnongaugep = Np. This partition of Np is same

at almost all points ~θ (the exceptions are the singu-lar points corresponding to gate sets with degenerategates), allowing us to view P mathematically as a fiberbundle.

The fully parameterized gate set model has a param-

eter space with dimension

N fullp = NGd

4 +(Nρ +

NM∑m=1

N(m)E

)d2, (67)

and it generally has d2 gauge degrees of freedom excep-tions occur when one or more operators commute withevery operation in the gate set.

5.1.2 TP and CPTP constrained models

Quantum theory requires density matrices to have unittrace, and the operations acting on them to be trace-preserving (TP). These are constraints on G which canbe used to define a smaller gate set model. The TPconstraint corresponds to locking the first row of ev-ery superoperator to be [1, 0, . . . 0] and the first elementof every state preparation vector to equal 1/

√d (since

Tr(B0) = d/√d =

√d, see Section 2.1.2). Enforcing

these constraints defines a TP parameterized gate setmodel with Np = NTP

p parameters, where

NTPp = NGd

2(d2 − 1) +Nρ(d2 − 1) +(NM∑m=1

N(m)E

)d2.

(68)The mapping W is trivial to construct in this case: itleaves some of the elements of G fixed.

When running long-sequence GST, the TP parame-terized gate set model is generally superior to the fullyparameterized model, and presents no extra complica-tions. The gauge freedoms for the TP parameterizedmodel are easy to derive; gauge transformations corre-spond exactly to matrices M that are also TP, so in-stead of d2 gauge parameters, the TP parameterizedmodel has d2 − d gauge parameters.

Quantum theory also requires operations to be com-pletely positive (CP). Imposing this constraint definesa CP parameterized gate set model. However, the CPconstraint is harder to define (and impose) than the TPconstraint. Whereas the TP constraint is a constantequality and defines a subspace of the full parameterspace, the CP constraint is an nonlinear inequality.

A mapping function W whose range is constrained toCP gate sets can be constructed in several ways. One isto represent each gate by the Cholesky factorization Tof its Choi matrix. In this way, each gate is representedby a lower-triangular matrix Tk with real diagonals, andthe gate (transfer) matrix Gk is obtained by applyingthe Choi-Jamiolkowski isomorphism to the Choi matrixχk = T †kTk. It is usually desirable to apply the TPconstraint as well, to get a CPTP parameterized gateset model ; this is done by additional constraints on eachTk (unfortunately, the TP constraint is not as simple inthe Choi representation).


An alternative way of constraining a gate to be CP isto write it in terms of an error generator, which we de-note ξ. Let Gk be the transfer matrix of a gate, and thecorresponding ideal (error-free) CPTP operation be G0

k.We define Gk’s error generator, ξ, to be the logarithmof the quotient Gk(G0

k)−1, so that

Gk = eξG0k. (69)

The error generator ξ is itself a superoperator, and whenξ = 0, Gk = G0

k and the gate has no error. By restrict-ing ξ to be a Lindbladian [95], we can guarantee that eξ

is CPTP. Because CPTP maps form a semigroup, Gkis CPTP as well. The Lindblad form that ensures eξ isCPTP is

ξ =d2∑i=1

αiHi +d2∑

j,k=2βjkSjk, (70)

where operators Hi and Sjk act as

Hi : ρ→ i[ρ,Bi] and (71)

Sjk : ρ→ BjρBk −12 (BkBjρ+ ρBkBj) (72)

on density matrices ρ, αi is real, β is a positive semidefi-nite (Hermitian) matrix. Here, {Bi} is a basis for B(H)with the properties given in Section 2.1.2. Indices j andk sum only over the non-identity elements, so they be-gin at 2. The condition on β can be implemented byparameterizing β’s Cholesky factorization as describedabove. We have found that this parameterization fa-cilitates optimization better than the Choi-matrix pa-rameterization. (This practical advantage makes thisthe CPTP parameterization of choice in pyGSTi.) Adownside to the error generator approach is that not allCPTP maps can be put into the form given by Eq. 69- only those that can be infinitesimally generated. Wehave not observed this restriction to have significantconsequences in any experiment to date.

Using either parameterization, the number of param-eters Np for the CP and CPTP gate set models are thesame as those for the full and TP models, respectively.

In the literature on state and process tomography,estimators that respect CP are common, and generallyregarded as superior to unconstrained estimators. Theargument is simple: nature and quantum theory onlypermit CP processes, so we should not consider non-CPestimates. Furthermore, the CP constraint constitutesuniversally valid prior information, and must thereforeimprove estimation accuracy (at the cost of some bias).

The value of CP is more ambiguous in the context ofGST. The CP constraint can be imposed, as describedabove, but doing so has some practical consequences. Itmakes MLE estimation more complex [145], and com-plicates the construction of error bars (see Section 6.3).It also removes a valuable diagnostic: the data should

be consistent with a CP gate set whether or not the con-straint is imposed, so when GST returns a non-CP gateset (possible when the constraint isn’t imposed), this isa sign that something else is wrong – usually some formof non-Markovian dynamics in the experiment.

Most importantly, the CP constraint interacts badlywith the gauge freedom. Every gate set on a gauge or-bit is equivalent – gauge transformations do not changeany physically observable property. But non-unitarygauge transformations do change complete positivity!This may seem paradoxical, since non-CP gates can pro-duce negative circuit outcome probabilities. But thisoperational interpretation requires the ability to createarbitrary input states, including states entangled witha reference frame, and apply the gate to them. Suchstates constitute an absolute reference frame, and thegauge freedom arises precisely because no such refer-ence frame is available. A gate set may be CP in onegauge, and non-CP in another gauge, because those twogauges imply different sets of allowable (non-negative)input states to the gates. We explore these issues fur-ther in Appendix A.1.

In practice, we have found it useful to perform long-sequence GST using both the CPTP and TP gate setmodels, and compare the results. If a significantly bet-ter non-CP fit is found, then the two estimates shouldbe examined carefully: either the CPTP fit got trappedin a local maximum, or significant non-Markovian dy-namics occurred during the experiment. If differencebetween the two fits is not statistically significant, thenthe CPTP fit is often preferable for later analysis be-cause it is better behaved within post-processing (e.g.,each gate’s eigenvalues cannot be greater than 1.0).

We conclude this discussion of gate set models by not-ing that we have only scratched the surface of possiblemodels here. The long-sequence GST framework (Sec-tion 4) is agnostic to which gate set model is used. Wehave experimented with models ranging from the fullyparameterized model to radically simplified models likea single-parameter model where each gate experiencesdepolarizing error with the same rate. In this paper wedo not dig more deeply into these possible variationsor their applications: we generally assume that one ofthe fully-, TP-, or CPTP-parameterized models is beingused.

5.2 Fiducial-pair reductionIn the “standard long-sequence GST” experiment de-sign of Section 4.2, every base circuit is sandwichedbetween every possible pair of preparation and mea-surement fiducial circuits to generate the full exper-iment design. This enables complete tomography ofeach base circuit, which is clearly sufficient to estimate


the gate set. But it also yields very large GST experi-ments. Information completeness requires O(d2) prepa-ration fiducial circuits and O(d) measurement fiducialcircuits (because each one provides d − 1 independentoutcomes), giving O(d3) fiducial pairs. In the 2-qubitcase this means ≈ 43 = 64 circuits are required for eachbase circuit.)

However, many of these circuits are redundant. Elim-inating redundant circuits yields much smaller and moreefficient experiments. The redundancy stems from thefact that (as observed previously) each germ g only am-plifies certain “directions” in error space. Because thissubspace corresponds to commuting deviations in τg(the operation produced by g), its dimension is that ofτg’s commutant. This can be much smaller than thespace defining τg itself – e.g., if τg is a nondegener-ate d2 × d2 matrix, then although perturbations to τgform an d4-dimensional space, the commutant is onlyd2-dimensional. So measuring all the fiducial pairs is re-dundant. We only need to ensure that each base circuitis probed so that every amplified perturbation impactsthe observed outcome probabilities. Each amplified di-rection will affect the outcome probabilities correspond-ing to at least one fiducial pair. So if a given germ am-plifies m directions in parameter space, it is sufficientto select m fiducial pairs that measure linearly indepen-dent probabilities sensitive to those m variations. Step4 in Figure 5 depicts how FPR functions during theconstruction of a GST experiment design.

The pyGSTi implementation performs “fiducial pairreduction” via an algorithm, described in AppendixD.1, that starts by constructing an informationally com-plete set of effective SPAM pairs, then selects germs,and finally eliminates redundant fiducial pairs for eachbase circuit, one by one, until no further fiducial pairscan be eliminated without losing sensensitivity to someamplified perturbation of the gate set’s parameters.

A systematic study of the robustness of fiducial pairreduction is left as a topic for future work. Anecdo-tal evidence suggests that fiducial pair reduction canreduce the size of GST experiments significantly, by 50-90% in many cases. However, eliminating redundanciesin the data also tends to increase the risk that GST’sparameter estimation step will get stuck at a local max-imum, and we have observed this particularly in caseswhere the GST model does a poor job at explaining thedata. At this point we see the fiducial pair reductiontechnique as useful and sometimes necessary, but notone that should be applied in every case.

6 Analyzing GST estimatesWe now turn from describing the GST protocols to in-terpreting their output. GST is a characterization pro-

tocol, and its purpose is not just to provide overall per-formance estimation (like benchmarking) but to revealhow an as-built component (e.g., gate or gate set) dif-fers from its ideal. This requires nontrivial analysis. Inthis section, we give several ways to analyze or post-process the results of a GST experiment. We will as-sume throughout that the GST optimization success-fully finds a global optimum, i.e., it doesn’t get stuck ata local maximum. This assumption is motivated by theexperiment design and iterative optimizaiton algorithmgiven in previous sections, as well as by our experience.We first discuss how to assess the validity of the GSTestimate (Section 6.1). If a GST estimate is valid, itmay be helpful to gauge-optimize the estimate in orderto compute common gauge-dependent metrics like pro-cess fidelity or diamond-distance (Section 6.2). Finally,we describe how to put error bars on quantities derivedfrom a GST estimate (Section 6.3).

6.1 Goodness-of-fit and MarkovianityThe first diagnostic from a GST experiment has nothingto do with the estimated gate set itself. It is the good-ness of fit – how well does that estimate (whatever itis) match the data? GST’s gate set model is capable ofdescribing any set of Markovian gates in d-dimensionalHilbert space, along with state preparations and POVMeffects (modulo optional TP and/or CP constraints). Soif it fails to fit the data, this is strong evidence that someassumption of the model was violated. We describe allsuch violations as “non-Markovianity”, meaning thatthe observed behavior was influenced by some internalor external context variable that was not included inthe model [141, 158]. Common phenomena of this typeinclude slow drift [128], leakage [164, 172], persistentenvironments [28, 81], pulse spillover [41, 60, 116, 125],and gate-induced heating [134, 168]. We do not attemptto diagnose specific phenomena here. Instead we focuson how to quantitatively detect when the GST modelhas “failed to fit the data” as well as it should. Incases where such pheomena are known to dominate thenoise, it would be more appropriate to use techniquestailored to the corresponding type of non-Markoviannoise, e.g. Ref. [128].

Consider a GST experiment containing Nexp distinctcircuits. It will produce a dataset described by No freeparameters, where No is simply the number of inde-pendent circuit outcomes that can be observed. In thetypical case where there is just one native measurement

(NM = 1) that has N(1)E outcomes,

No = Nexp

(N

(1)E − 1

). (73)

For a single qubit supporting just one native 2-outcome


measurement, N(1)E = 2 and each circuit’s data has 2−

1 = 1 independent degree of freedom.Any data set like this can be explained perfectly by a

trivial “maximal model” with No parameters – the onethat assigns one probability for each independent out-come. Fitting this maximal model to the data achieveslogLmax (see Section 4.3).

If GST’s estimate has a likelihood of L, then we canmeasure how well GST fit the data by comparing logLto the maximal model’s logLmax. If the data wereproduced by some Markovian gate set, then both theGST model and the maximal model are valid, and bothshould fit the data equally well if we account for extrafree parameters in the maximal model. Wilks’ theorem[171] tells us how to do this accounting, and we canuse it to quantify the fit quality of the GST estimaterelative to that of the maximal model. Wilks’ theoremstates that if the gate set model is valid, then the log-likelihood ratio statistic between them is a χ2

k randomvariable:

2(logLmax − logL) ∼ χ2k, (74)

where the maximal model has k = No−Nnongaugep more

parameters than the gate set model has non-gauge pa-rameters.

If the loglikelihood ratio appears to have been sam-pled from a χ2

k distribution, then the GST model fitsthe data as well as can be expected, and captures thebehavior of the device exposed by this data. On theother hand, if the observed loglikelihood ratio is so highthat a χ2

k distribution would be very unlikely to haveproduced it, then we have evidence that the data weregenerated by a non-Markovian process. The χ2

k distri-

bution has mean k and standard deviation√

2k. Wequantify the observed model violation by the numberof standard deviations by which the loglikelihood ratioexceeds its expected value under the χ2

k hypothesis:

Nσ ≡2(logLmax − logL)− k

2√k

(75)

Nσ ≤ 1 indicates an extremely good fit that appearscompletely trustworthy. (We have rarely seen this ex-cept on artificially simulated data). Conversely, Nσ � 1indicates significant model violation, meaning that nogate set can describe all of the data.4 Since the gate setonly assumes Markovianity (and sometimes the physi-cal TP and CP constraints), model violation indicatesthe presence of some kind of non-Markovian noise (asdefined above).

We emphasize, however, that a large Nσ value doesnot necessarily mean that the observed behavior is

4This statement relies on the stated assumption that the globallikelihood optimization was successful

strongly non-Markovian. It indicates high confidencein the conclusion “the Markovian model was violated”.This does not quantify how much the model would haveto be expanded to fit the data well, nor the probabil-ity of observing a surprising event, and it doesn’t implythat the GST estimate is useless or untrustworthy. Ifthere is any model violating behavior at all, then Nσgenerally increases linearly with the number of timeseach circuit is repeated. So more sensitive experimentswill yield higherNσ, even with exactly the same physics.Quantifying model violation in more operational anduseful ways is a compelling topic for future research.

The Nσ statistic is sensitive to the total model vio-lation, added up across all circuits. It’s also useful toidentify individual circuits whose behavior is inconsis-tent with the GST estimate. We can do this by ex-amining each circuit’s contribution to the loglikelihood,logLs (Eq. 61), and comparing it to the maximal model.Wilks’ theorem predicts 2(logLmax,s−logLs) should beχ2k-distributed, with k approximately equal to the num-

ber of independent outcomes of a single circuit. 5 Theseper-circuit tests are almost independent from the “to-tal” test described above – either test may reveal modelviolation when the other does not, although most formsof model violation trigger both.

If many circuits are tested simultaneously for modelviolation (as is usually the case), it is important to raisethe threshold for detecting a violation using extremevalue theory. If K tests are performed in parallel, thento keep the probability of any false positive below α,each test should be performed at the α1/K confidencelevel.

We have found a plot of the per-circuit goodness-of-fitvalues, represented as a grid of colored boxes arrangedby germ and base circuit depth (see Figure 6), to bea useful diagnostic tool. The color scale in Figure 6changes from a linear grayscale to a logarithmic red-scale when the logLs value of a box exceeds the 95thpercentile of the expected χ2 distribution (but see noteabove about adjusting for multiple tests). Deeper cir-cuits are more sensitive to most forms of non-Markoviannoise. If model violation (red boxes) appear preferen-tially in circuits containing a particular germ, this usu-ally indicates that a gate appearing in that germ is thecause.

6.2 Gauge optimizationWe now turn to analysis of estimated gate sets. Theoutput of “doing GST” on a quantum processor – con-structing an experiment, performing the circuits, and

5Technically it would be more accurate to subtract from thisvalue the number of gate set parameters “per circuit”, but this isusually much less than one and has virtually no impact.


1 2 4 8L

Gx

Gy

Gi

Gx ·Gy

germ

0 4 8 0 7 0 4 0 4 7 1 27 7 1 1 1 13 1 7 63 4 7 16 1

1 10 6 1 5 2 10 1 2 5 8 7 5 8 13 2 21 25 0 12 3 1 39 9

9 1 0 10 1 7 1 10 6 1 5 2 1 5 10 8 2 3 21 0 3 12 1 0

0 1 1 0 0 20 1 0 1 0 3 0 0 3 0 8 2 6 1 5 0 4 0 5

3 9 6 1 10 2 9 1 0 10 1 7 10 1 2 5 8 7 2 21 1 0 12 10

3 3 0 9 1 1 3 9 6 1 10 2 1 10 6 1 5 2 8 2 4 21 0 0

3 0 7 11 0 16 7 0 1 4 1 7 16 0 7 7 0 11 0 4 14 2 1 64

0 4 3 10 1 1 3 3 7 0 3 9 1 2 9 0 1 26 0 5 25 1 1 16

6 0 2 0 3 4 2 1 2 25 0 1 4 1 1 7 1 0 1 3 0 3 1 0

4 15 9 6 2 7 9 0 3 4 0 1 7 0 1 4 1 7 11 4 11 0 0 14

0 1 4 0 0 0 4 5 1 1 14 0 0 0 0 0 6 0 4 0 5 0 1 3

5 0 4 1 0 3 4 15 9 6 2 7 3 0 7 11 0 16 7 6 11 1 1 0

16 0 1 9 0 0 20 0 3 3 0 1 4 0 1 4 13 0 0 1 18 6 14 17

4 2 5 1 0 7 4 2 1 6 10 0 3 3 0 2 7 20 1 32 14 0 43 37

6 5 0 1 3 8 0 13 1 2 8 1 0 0 6 2 0 2 37 8 12 36 6 1

12 0 0 0 4 0 9 6 1 1 0 2 4 3 5 0 0 7 8 0 32 22 15 43

2 10 17 7 3 0 2 6 8 6 2 0 4 1 15 0 4 4 2 31 6 0 6 19

18 8 1 3 20 1 18 2 1 8 24 0 0 1 4 0 6 0 28 7 3 31 6 0

9 0 3 4 0 1 0 11 1 0 8 0 1 0 2 3 0 16 16 12 2 14 15 0

9 1 0 10 1 7 4 10 8 1 0 4 0 0 4 9 3 28 17 7 1 20 7 2

3 9 6 1 10 2 0 0 1 3 0 5 1 0 0 1 0 2 0 17 30 1 44 42

5 0 4 1 0 3 15 6 1 2 12 10 3 1 0 0 0 1 3 6 1 1 0 1

3 3 0 9 1 1 1 0 3 0 1 1 0 1 16 4 2 16 22 2 9 12 0 11

4 3 5 3 9 9 0 1 1 0 0 20 3 0 5 10 2 1 0 7 23 0 9 43

Figure 6: A sample model violation plot displaying the valuesof 2(logLmax,s − logLs) for circuits s. Each 6 × 6 plaque-tte of colored squares represents a set of circuits based on thesame base circuit gp with (maximum) depth L increasing alongthe horizontal axis and the germ g varying along the verticalaxis. The 36 distinct boxes within a plaquette represent distinctcircuits, formed by sandwiching gp between 6 different prepara-tion fiducials (indexed by column) and 6 measurement fiducials(indexed by row). In this particular case the single-qubit gateset ideally comprised of X(π/2), Y (π/2) and I (the identity)was used, and both the preparation and measurement fiducialsare the set {∅, X, Y,XX,XXX, Y Y Y }, where X and Y areshorthand for X(π/2) and Y (π/2).

fitting a model to the data – is a gate set that representshow that processor’s gates act. The catch, as noted re-peatedly above, is that this estimate is only unique upto gauge transformations. And while gauge transforma-tions have no effect on outcome probabilities of circuits,they can have a huge effect on the individual transfermatrices representing the gates, and derived quantitiesof interest like fidelity.

The ideal solution to this problem would be to onlycompute and report gauge-invariant properties. Butmost of the metrics commonly used to assess quan-tum operations are not gauge-invariant. They vary –usually quite a lot – over gauge orbits. This meansthey don’t correspond to physically observable quanti-ties [127]. These metrics originated as ways to quantifythe performance of a single gate in the context of other,perfect operations that form a reference frame. Gaugefreedom emerges from the absence of a reference frame,and the motivation for GST is that, despite common as-sumptions, experiments on quantum processors usuallydo not feature an absolutely reliable reference frame.But even if they are flawed, gauge-variant metrics likefidelity, diamond distance, or entropy are not currentlydispensable; no gauge-invariant replacements exist. Soin order to compute well-studied gauge-variant metricssuch as process fidelity and diamond distance for GSTestimates, we choose a particular gauge using what wecall gauge optimization.

Gauge optimization means reporting the GST esti-mate, G, in a gauge that optimizes some gauge-variantmetric of “closeness” to the ideal target gate set. Inother words, we choose the gauge to make the gateslook as good as possible. The best metric is not uniqueor obvious, and can be highly situation-dependent. Theimplementation of GST in pyGSTi [121] supports gaugeoptimization using a variety of metrics. Actually per-forming the gauge optimization, once a metric is chosen,is straightforward and uninteresting (pyGSTi uses lo-cal gradient minimization, with occasional long randomjumps to avoid rare local minimum traps). We focushere instead on (1) the rationale and justification forgauge optimization, and (2) the rationale for choosinga metric to optimize.

6.2.1 Why gauge optimization?

A common and fundamental objection to GST andgauge optimization goes something like this: “Intention-ally seeking out and constructing the gate set closest tothe desired target seems like cheating, if not activelycircular.”

No truly satisfactory answer to this can exist, becausein the absence of a privileged reference frame, gauge-variant metrics have limited meaning and shouldn’t


be computed. Gauge optimization is fundamentallya hack. But the quantum computing community hasbeen reluctant to abandon metrics like process fidelity,diamond distance, and trace distance that are deeplyrooted in the theory and literature of quantum informa-tion – not least because no good alternatives are knownyet.

But, with that disclaimer, there are good arguments(which we find compelling) that gauge optimization isat least a good hack, contrary to the objection proposedabove.

First: Gauge optimization is not an all-powerful toolfor reducing errors. It cannot alter any predicted cir-cuit outcome probabilities, and is therefore powerless toimprove any circuit-outcome-based performance metric.Gauge optimization does seek out the gate set closest tothe desired targets – but it searches over a very limitedset of candidates! Each gate’s eigenvalues are gauge-invariant. Target gates are almost always unitary, sothey have unit-modulus eigenvalues. Any errors thatchange a gate’s eigenvalues – e.g., stochastic errors thatshrink a gate’s eigenvalues toward zero – cannot be dis-guised or eliminated by gauge transformations. Almostevery known physical error in gates cannot be elimi-nated by gauge transformations. Even coherent tilt er-rors, which manifest as relational mismatches betweenthe rotation axes of different gates, can only be pushedfrom one gate to the other by gauge transformations.Physical errors that do correspond to gauge transfor-mations can originate from an error that is global to thegate set being analyzed but not to the entire lab. Forexample, a misalignment of one qubit such that all itsgates and SPAM operations are rotated corresponds toa gauge transformation of the 1-qubit system (but thiswould not correspond to a gauge transformation in a2-qubit system where the other qubit was not rotated).

Second: While gauge transformations cannot removemost physically plausible errors, they can create arbi-trarily large unphysical errors. Applying a large unitarygauge transformation to the target gate set itself willmake every gate appear to have large coherent errors,and thus large infidelities and diamond distances. Butof course, this is an illusion – this gate set has no er-rors at all, because it’s just another representation ofthe target. As long as we are stuck with gauge-variantmetrics, this example demands gauge optimization oran equivalent gauge-fixing procedure.

Third: It’s tempting to seek a different gauge-fixingprocedure that produces a standard form for gate sets,without explicitly trying to make them look like thetarget gate set. But, as the previous example shows,any gauge-fixing procedure that works actually has todo this anyway – just in disguise. If we construct a gateset that’s equivalent to the target, and then feed it into

a gauge-fixing procedure, it absolutely must return thetarget gate set, regardless of what the target is. If itdoesn’t, then it will return a gate set that appears tohave errors, which is not true (by construction).

6.2.2 Metrics for gauge optimization

Given an estimated gate set G and a target G0, gaugeoptimization finds the gauge that minimizes some func-tion

f(G,G0). (76)

Some general guidelines for choosing f include: itshould be non-negative, it should be uniquely zero ifG = G0, and it should facilitate minimization (e.g.,smoothness and convexity are desirable). However, itdoes not need to satisfy all the properties of a mathe-matical metric (e.g., the triangle inequality).

We examined variations on three different metrics, allderived from summing up some well-known function oftwo matrices over all the logic operations in the gateset6:

1. Infidelity,

2. Trace/diamond distance,

3. Squared Frobenius (Euclidian) distance.

Infidelity and trace/diamond distance have operationalinterpretations in quantum information. The Frobeniusdistance between transfer matrices and density matricesgenerally does not.

Infidelity is not a suitable metric for gauge optimiza-tion for an interesting reason: when applied to generalmatrices (with no positivity constraint), it is neitherstrictly positive nor uniquely minimized when G = G0.This is related to the fact that infidelity is not actually ametric in the mathematical sense. If G is a unitary gate,then although G has fidelity 1 with itself – F (G,G) = 1– it’s actually possible to achieve fidelity greater than1 by performing a nonunitary gauge transformation, soF (G,MGM−1) > 1 for some M . We emphasize thatin these circumstances, M is not unitary, and MGM−1

is not completely positive. But nonetheless, this meansthat infidelity only satisfies the required desiderata ifthe CP constraint is imposed during gauge optimiza-tion. This is inelegant, and poses practical problems ingauge optimization.

We found trace/diamond distance to be expensiveto compute (because there is no closed form for dia-mond distance), hard to work with (because it is notsmooth), and to yield results not significantly differentfrom squared Frobenius distance. In principle, diamond

6SPAM operations are included in these “logical operations”,and are incorporated in various ways


distance is an appealing metric – it has deep opera-tionally meaning, and there is a nice elegance in saying“We have chosen the gauge that minimizes the diamonddistance to the target gates.” However, we found it tooinconvenient in practice to justify this advantage.

We find the squared Frobenius distance to be themost useful gauge optimization metric. It is well-behaved and fast to compute, and we have found fewdisadvantages to its use. In particular, we utilize theweighted sum of squared Frobenius distances betweenthe estimated and target gate set elements,

f(G,G) =Nρ∑i=1

αi|ρ(i) − ρ(i)|2 +NG∑i=1

βi|Gi −Gi|2

+NM∑m=1

N(m)E∑i=1

γm,i|E(m)i − E(m)

i |2, (77)

where the notation of Eq. 35 is used to denote the el-ements of the estimated (G) and target (G) gate sets,and | · | represents the Frobenius norm. The weights αi,βi and γm,i are real numbers.

The weighting turns out to be important, for the oddand interesting reason that SPAM errors cannot be am-plified. Long-sequence GST with very deep circuits canresolve errors in gates to very high precision, poten-tially below 10−5. But achieving this sort of accuracyin SPAM operations is virtually impossible. An errorthat rotates the initial state relative to the measurementaxis by a small angle θ cannot be detected using fewerthan O(1/θ2) shots. In an experiment where the longestcircuit is L gates deep, and each circuit is repeated Ntimes, estimation error in the SPAM operations is typ-ically O(1/

√N), while estimation error in the gates is

O(1/L√N), which can be vastly smaller.

But gauge transformations can “slosh” certain errors– coherent errors, in particular – between SPAM op-erations and gate operations. As a specific example,consider a GST experiment with N = 104 and L = 100,performed on a perfect experimental system. In thisidealized case, all estimated errors are due entirely tofinite-sample noise in the experiment. We would ex-pect the estimated errors in the gates to be around±1/(L

√N) ≈ 10−4, but estimated errors in the SPAM

to be around ±1/√N ≈ 10−2. But if we perform gauge

optimization and seek to minimize the equally weightedsum of Eq. 77, the optimal solution will “split the dif-ference,” finding a gauge in which both the SPAM andgate operations have about 0.5 × 10−2 coherent error.This is a poor and misleading choice of gauge, becausethe highly precise estimate of the gates is polluted byestimation error in the SPAM.

One solution is to adjust the weights in Eq. 77, as-signing different weights to SPAM operations (αi and

γm,i) and gate operations (βi). We have found that agood rule of thumb is to assume that the finite-sampleestimation error in each operation is proportional tothe largest number of times it appears in any circuit –typically L for gates and 1 for SPAM – and then givethe corresponding term in the metric a weight propor-tional to the inverse square of its estimation error. So,in the example above, we would set αi = γm,i = 1 andβi = 104 and minimize Eq. 77. This would tend to splitrelational error between SPAM and gates unequally, as-signing 99% of the error to the SPAM.

Another heuristic for avoiding poor gauge choices,which we find works even better in practice, is to per-form several sequential “stages” of gauge optimization.Each stage uses the Frobenius distance metric (Eq. 77)with different weights and a different optimization do-main. In the case of a fully parameterized gate set thesestages are:

1. Optimize over the full gauge group (over all invert-ible M in Eq. 15) using uniform weights. This findsa decent gauge choice - one that may need sometweaking (see following steps) but that doesn’t as-sign objectively unnecessary error, e.g., by rotatingall of the gates and SPAM with respect to their tar-gets. Starting from this point, we next:

2. Optimize over the unitary group (over all unitaryM) using 100% weight on the gates (αi = γm,i = 0,βi = 1). This rotates the gates into the best ver-sions of themselves possible, and is justified in thecommon case where we know the gates to far higheraccuracy than the SPAM. By restricting to the uni-tary group, we disallow transformations that makethe gates slightly better by drastically stretchingor skewing the notion of distance.

3. Optimize over the SPAM gauge group using 100%weight on the SPAM operations. We define the“SPAM gauge group” as the 2-parameter family{diag(α, β, . . . β) : α, β ∈ R}, where diag(. . .) isa diagonal matrix with the given values on its di-agonal. Matrices of this form slosh error betweenSPAM operations and the non-unital part of thegates (recall that the 0-th basis element is the iden-tity) in a gate set. Thus, this optimization reduceserrors on the SPAM operations at the expense of in-creasing non-unital gate errors. Because the SPAMgauge group doesn’t necessarily preserve completepositivity, we include this as an optimization con-straint.

In the case of a TP gate set, the initial stage ismodified to optimize only over TP matrices M andthe SPAM gauge group of the final stage is re-stricted to the 1-parameter family of diagonal matrices


{diag(1, β, . . . β) : β ∈ R} whose top-left element isone. We treat CPTP gate sets by simply omitting thefirst stage of the procedure followed in the TP case.

Gauge optimization yields a gate set GGO that isgauge-equivalent to G, but effectively in standard form.That is, any distinct yet equivalent gate set would bebrought to the same form by gauge optimization, pro-vided that the same metric was used for gauge opti-mization. From GGO, gauge-variant quantities can nowbe computed, and assigned meaning on the groundsthat at least they’ve been computed in a fixed and well-motivated gauge. Computing gate metrics such as thefidelity or diamond distance requires some type of in-formed gauge-fixing is required, and gauge optimizationas described here is the best method we have found.

6.3 Error barsThe GST analysis pipleine described so far produces apoint estimate of the gate set – a single gate set that fitsthe data well. But estimation is like throwing darts at aboard: no dart ever hits the exact center of the target.Even if we assume there exists a “true” gate set, GMLEwill at best be close to it. We need to estimate howclose. We need what physicists tend to call “error bars”,and what statisticians refer to as a region estimator.

At least three kinds of region estimators are com-monly used: confidence regions [17], credible regions[52, 62], and standard errors [46]. The details of whateach means, and their relative merits, are rather techni-cal, and good discussions relevant to tomography can befound in [17, 32]. Here, we mostly follow the confidenceregion approach, but we do not attempt rigor. Our goalis to demonstrate that putting plausible, reasonable er-ror bars around GST estimates, by adapting well-knowntechniques, is possible. We welcome rigorous critiquesand improvements.

We have successfully used two techniques to equipGST estimates with error bars. The first is bootstrap-ping. The second is likelihood-ratio confidence regions.Our limited testing has found both to reliably producereasonable error bars. However, our testing also makesit clear that GST is a “messy” statistical problem, andfurther research and development are needed. To putthis more bluntly: we would not attempt to publishthe error bar algorithms reported here as a standalonepaper at this time, because they are insufficiently devel-oped. But, at the same time, we believe it is irresponsi-ble to publish or deploy a QCVV protocol without anymention of error bars. Since the overall theory of GSTis mature and overdue for publication, our imperfect so-lution to these frustrated constraints is to discuss thebest available error bar technology for GST, includingthis clear disclaimer.

6.3.1 General problems with region estimators

There is no universal agreement on the “correct” wayto report uncertainty about any tomographic estimate.However, the most principled approach appears tobe to construct and report either a confidence region[17, 32, 51] or credible region [61, 150]. In either ap-proach, the tomographer describes their uncertainty by(1) choosing a probability α (e.g., α = 95%), then (2)reporting a subset R of the parameter space, of moreor less arbitrary shape, for which a statement like “Rprobably contains the true parameter, with probabilityα.” can be made.7

Both approaches pose some significant practical chal-lenges.

1. There is no standard or efficient way to describe anarbitrary region in a high-dimensional space.

2. This construction requires the tomographer to picka particular α in advance. But this value is rarelymeaningful to end users, and there is absolutely noguarantee that a region for α′ 6= α can be extrapo-lated from the α region.

3. Researchers and end users are almost always moreconcerned with some particular scalar functionf(~θ) of the high-dimensional tomographic param-

eter ~θ than with ~θ itself. Extracting an intervalestimate for f(~θ) from a region estimate for ~θ isnontrivial, and if done crudely can dramaticallyoverestimate the uncertainty in f .

None of these problems invalidate the theoretical va-lidity of optimal confidence or credible regions. Butbecause of them, most experiments either ignore er-ror bars, or simplify theoretically rigorous techniques soruthlessly that their provable properties (optimality orrigorous coverage) are lost. For example, the first issuecan be mitigated by specializing to regions of a partic-ular shape – e.g., ellipsoids, or spheres in a particularmetric – but if coverage probability is maintained, thisusually requires regions that are far from optimal andoverestimate uncertainty.

The two methods we discuss here can both be moti-vate by a fairly common ansatz: we assume that localasymptotic normality (LAN) applies. When LAN holds,

1. the likelihood function for the current data is Gaus-sian, and

2. the likelihood function for other hypothetical datasets that could have been observed (but weren’t)is almost surely Gaussian with almost the samecovariance matrix.

7We emphasize that this statement’s precise phrasing andmeaning are tricky, and vary between the two approaches.


These consequences mean that we can treat the underly-ing statistical problem like a Gaussian shift model. TheMLE will be N (~θtrue, σ) distributed around the trueparameter value, and the likelihood function will be aGaussian whose covariance matrix is that same σ.

Inasmuch as this assumption holds, it resolves all ofthe practical problems above:

1. Every optimal confidence region is an ellipsoid cen-tered at the MLE with covariance σ.

2. Changing α just requires scaling that ellipsoid upor down.

3. Confidence intervals for any linear function f(~θ)can be extracted very straightforwardly from theconfidence region (by marginalizing the confidenceellipsoid and applying a Bonferroni correction),

and the same procedure can be used when f(~θ) isnonlinear as long as it’s approximately linear overthe extent of the confidence ellipsoid.

Of course, LAN will never hold exactly, and we can con-struct GST examples where it is a pretty bad approxi-mation. In these cases, ellipsoid confidence regions will(usually) overestimate uncertainty, and (occasionally)have insufficient coverage probability. More research isrequired to identify and map out the regimes where theLAN ansatz used here is unreliable.

6.3.2 Bootstrapping

The basic idea of the bootstrap is very simple: repeat-edly simulate the experiment that generated the realdata to generate a large number of “fake” datasets, an-alyze each of them to produce a point estimate, anduse their scatter (variance) to construct a region. Thisworks very well for Gaussian shift models, as long asa large enough sample of fake datasets is constructed.It is known to break down when the estimator is bi-ased, when it is significantly heteroskedastic, or if thebootstrap distribution is undersampled. Both bias andheteroskedasticity are known problems for state andprocess tomography due to positivity constraints. Wefind these to be less important for GST for two rea-sons. First, we do not usually impose CPTP constraintsexplicitly. Second, long-sequence GST can achievesuch high accuracy that the data bound the estimatecomfortably away from the zero-probability boundarieswhere heteroskedasticity becomes large. When we havecross-validated bootstrapping against the alternativemethod described below, we have observed consistentresults.

The two basic types of bootstrap are parametric (inwhich the “fake” datasets are simulated from scratchusing the MLE gate set) and non-parametric (in which

each circuit’s outcomes are resampled directly from thedata, with replacement). We have performed bothtypes, and have not seen significant differences. How-ever, on theory grounds, the parametric version shouldbe more reliable because it is less affected by constraint-induced bias, and by certain small-sample-size effects.

Applying standard bootstrap methods to GST en-counters two specific problems. First, the GST modelhas a lot of parameters. One qubit gate sets usuallyhave at least n = 30 free parameters, while 2-qubit gatesets usually have at least n = 1000. Unless at leastn2 samples are generated, the sample covariance willnecessarily be rank-deficient, so estimating the popula-tion covariance by bootstrapping requires at least O(n2)samples. Since each bootstrap sample requires runningan MLE analysis on a fake dataset, which can take min-utes or hours, the time required to generate 106 (or even1000) samples is excessive.

So, instead of trying to construct a confidence regionfor the entire gate set (which, as noted above, is almostnever directly useful), we use the bootstrap to constructconfidence intervals for scalar parameters directly. Es-timating the uncertainty in a single parameter this wayrequires only O(1) samples, and k distinct parameters(e.g., all the matrix elements of the transfer matricesin the gate set) can be reliably bootstrapped with onlyO(log k) samples.

The second issue is more subtle, and stems from thegauge freedom in gate sets. We discuss this in Section6.3.5 below.

6.3.3 Confidence regions based on the Hessian of the like-lihood

Likelihood-ratio (LR) confidence regions are an alterna-tive and logically independent approach to quantifyinguncertainty. They are based on the intuition that thetrue ~θ will probably have a high likelihood – not muchless than the maximum – and so the set of all ~θ withlikelihood above some threshold constitutes an efficientconfidence region. The basic theory for LR confidenceregions in tomography can be found in Ref. [17].

If we assume that LAN applies, then the likelihoodfunction is Gaussian, its logarithm is quadratic, and soit is completely determined by its second derivatives orHessian evaluated at the MLE. Under these assump-tions, a LR confidence region can be computed simplyby

1. Calculating the Hessian H of the loglikelihood atthe MLE, which is somewhat tedious but straight-forward.

2. Inverting H and multiplying H−1 by an appropri-ate scaling factor to define an ellipsoid that coin-


cides with the a specific contour of the likelihoodfunction.

The desired contour is defined by χ2 theory, and corre-sponds to gate sets G whose loglikelihood satisfies

−2 log(L(G)L(G)

)= λthresh(k, α),

where k = Nnongaugep is the number of non-gauge param-

eters in the gate set, L(G) is the maximum likelihoodfor any gate set (e.g., GST’s point estimate), α is thedesired confidence level, and λthresh is the α quantile ofa χ2

k distribution.

Confidence intervals for linear functions f(~θ) = ~v ·~θ ofthe gate set can be derived easily from this procedure.We simply project the covariance matrix H−1 onto thesubspace spanned by ~v to get the variance of a scalar,and then set λthresh based on k = 1 free parameter.However, we have to deal with the gauge freedom first.

6.3.4 Dealing with gauge freedom

GST’s gauge freedom creates problems for both thebootstrapped and the Hessian-based regions describedabove. All the gate sets on a gauge orbit are completelyequivalent. Imagine that we could parameterize the setof all gate sets as G = (X,Y ), where X describes whichgauge orbit G is on and Y specifies its location on thegauge orbit. X would include all the gauge-invariant pa-rameters, and Y would constitute pure gauge parame-ters. In this scenario, how would we describe our uncer-tainty about (i.e., put error bars on) Y ? On one hand,the data provide absolutely no information about Y ,suggesting total uncertainty and large error bars. But,on the other hand, we may simply choose a gauge byfiat, suggesting absolute certainty and vanishing errorbars.

The answer is made clear by considering how “gaugeparameters” (Y , above) affect the variables that sci-entists wish to learn from GST – e.g., superoperatormatrix elements, fidelities, or diamond norms. Theseare all functions of the gate set, but they are not gaugeinvariant. So if we know X exactly, but allow Y tovary wildly, then these desirable variables will also vary.Thus, if we assign large error bars to gauge parameters,we necessarily get large uncertainty about fidelities andother gate properties – even if we have learned every-thing that can be learned about the gate set (X) to highprecision!

So we can only assign consistent error bars if we fixthe gauge. Every point estimate necessarily has a fixedgauge, by its very nature – G = (X, Y ), so G has afixed value of Y (gauge). But a region is a set of points{G}. It is easy to construct a confidence region contain-ing gate sets with different gauges – say, G1 = (X1, Y1)

and G2 = (X2, Y2), where Y1 6= Y2. If we allow thisto happen, then the error bars on interesting but non-gauge-invariant quantities will be artificially inflated.So fixing the gauge for a region requires ensuring thatevery element of the region has the same Y coordinates.

This is harder than it sounds, and is actually some-thing of a category mistake. Above, we said “Imaginethat we could” separate G into gauge-invariant param-eters and pure gauge parameters. This is not generallypossible in gauge theories, because it would require aglobal definition of parallel transport that does not gen-erally exist. Different gauge orbits are typically not thesame, or even isomorphic. For example, consider twogate sets G1,G2 that each contain just a single gate Gi– but in G1, Gi is the identity matrix, while in G2, Giis nondegenerate. One Gi is invariant under all gaugetransformations, while the other is not. So the gauge or-bit of G1 is a zero-dimensional point (the entire gate setG1 = {Gi} is invariant under gauge transformations),while that of G2 is much larger and more complex. Nocoordinate system (Y ) can describe them both, and ifa confidence region includes both G1 and G2 there is noway to demand that they have “the same Y coordinate”.

Instead, we gauge-fix a region by first identifying eachorbit (equivalence class of gate sets) in the region, as-signing it a single representative gate set, and selectingthose representatives to minimize the size (variance) ofthe resulting set of gate sets. This approach can be usedfor both bootstrapped and Hessian-based regions, butplays out differently in practice for the two construc-tions.

6.3.5 Gauge-fixing bootstrap regions

We have found the following procedure effective at elim-inating effects of gauge freedom from boostrapped errorbars. Each gate set that is sampled via the bootstrap is“gauge optimized” using a fixed procedure, which setsits gauge to make it as close as possible to the tar-get gate set. This is a good operational approximationto requiring every element of the region to be “in thesame gauge”. By making every representative as closeas possible to a fixed target, it minimizes the radius andvariance of the entire region.

6.3.6 Gauge-fixing Hessian-based regions

This is a bit trickier, because in normal statistical sce-narios, the Hessian H of the loglikelihood is full rank.But each gauge degree of freedom creates a direction inparameter space, at the MLE, along which the loglike-lihood is locally constant. It has no curvature in thesedirections, and thus its second derivative vanishes, giv-ing H a kernel. Before we can invert H, we must getrid of these zero eigenvalues by projecting H onto its


support. This projection corresponds to “slicing” anuncertainty ellipsoid that is infinitely extended alongthe tangent plane of the gauge orbit at the MLE, whichyields a finite and bounded ellipse. However, minimiz-ing the size of the region requires choosing the rightplane in which to slice. This is somewhat technical,and details are given in Appendix A.2. The resultingprojected Hessian has full rank and can be inverted andanalyzed as given above. It defines a covariance tensorin (non-gauge) gate set space that, when scaled by anappropriate factor, defines an ellipsoid that is a validα confidence region. Writing down this ellipsoid ex-plicitly (as a tensor), while possible, is not very practi-cal. Instead, we use it to define error bars (confidenceintervals) for any and all interesting scalar quantities(e.g., fidelities, diamond norms, superoperator matrixelements, etc).

We compute such confidence intervals as follows. Letf(G) be a scalar function of a gate set parameters withlinearization f(G) ≈ f0 + ∇f · (G − GMLE). We definean α confidence interval around the best-estimate valuef0 = f(GMLE) by computing

δf =√

(P (∇f))† · (P (H)/C1)−1 · P (∇f) (78)

where P indicates projection onto the (correctly chosen)support of H, P (H) is the projected Hessian and P (∇f)is the similarly projected gradient of f . C1 is a scalarconstant which satisfies CDF1(C1) = α, where CDFkis the cumulative density function of the χ2

k probabil-ity distribution. With δf so defined, f0 ± δf specifiesa good α confidence interval for f . Within the linearapproximation to f , which is valid for small δf , thisinterval corresponds to minimizing and maximizing thevalue of f over the contour of the loglikelihood corre-sponding to a α confidence interval if the loglikelihoodhad a single parameter.

We emphasize that this does not construct a α con-fidence region. Equation 78 can be used to construct αconfidence intervals for each of the Nnongauge

p parame-ters, but the region formed by taking the product of allthese intervals contains the truth only if every one ofthe intervals contains its parameter. This occurs withprobability at least (α)N

nongaugep , not (α). A true α re-

gion can be readily constructed by simply replacing C1with Ck in Eq. 78, where k = Nnongauge

p .

When error bars on a scalar quantity are needed, webelieve the confidence intervals constructed above aremore meaningful than projections of the full α confi-dence region, for two reasons. First, these intervals areconsistent with the error bars reported by the paramet-ric bootstrap, which yields standard errors for each pa-rameter independently. Those intervals would have tobe expanded significantly to construct a joint confidence

region. Secondly, we are often interested in the uncer-tainty of single scalar quantity (e.g., diamond norms),independently of others. The confidence intervals con-structed this way correctly measure this uncertainty.

7 SummaryWe have presented gate set tomography as a method forcharacterizing a quantum logic device. GST is in someways a successor to existing tomographic methods suchas quantum state and process tomography, but it is alsofundamentally different: a gate set is a entity unto itself,not just a collection of unrelated components. So GSTis really a new type of tomography that probes a dif-ferent object (a gate set) with unique properties. Thereis still considerable overlap between GST and other to-mographic methods, so we conclude by recalling andsummarizing GST’s primary distinguishing (and novel)features.

• Calibration free: GST neither requires nor evenconsiders a pre-calibrated reference frame, nor re-lies on any priori assumptions about the accuracyof any operations.

• Self consistent: GST produces self-consistent de-scriptions of all the operations (state preparations,gates, and measurements) exposed by a quantumprocessor. This is different from estimating eachquantum state and superoperator independently,both because it requires no reference frame, andbecause (as a result) it exposes gauge degrees offreedom that make some errors relational betweenoperations, rather than endemic to a specific gate.

• Hyperaccurate: The use of deep, periodic cir-cuits enables GST estimates’ precision to achieveHeisenberg-like scaling (modulo gauge freedom; seeAppendix E).

• Constraint-friendly: GST is not restricted to es-timating arbitrary quantum transfer matrices, butcan naturally incorporate almost any constraint orphysics-inspired model, by treating a gate set asa parameterized model. Immediately useful exam-ples includ restricting the estimate to be trace pre-serving (TP) or completely positive and trace pre-serving (CPTP).

• Model validation: It is very natural within GSTto detect and quantify violation of the underlyingmodel that is being fit, using standard statisticaltechniques. This is a useful byproduct of the over-complete experiments used by GST, and can beused as a built-in warning system to identify sys-tematic errors in characterization.


Although this paper is intended as the first compre-hensive description of gate set tomography, includingthe details of the long-sequence variant, GST has beenused in a variety of published experiments [14, 19, 42,72, 82, 85, 104, 128, 137, 153, 167, 170, 176] in severalqubit technologies. Like all protocols, it has limitations,which we’ve tried to expose and discuss here. But GSTis also a mature protocol with extensive real-world val-idation in experiments. A reference implementation ofGST is available in the free, open-source Python pack-age pyGSTi [120, 121]. Extensions of GST to include ad-ditional constraints and to analyze time-dependent dataare currently under investigation, as are variants of theprotocol which scale more favorably with the number ofqubits. But even if these extensions should fail to pro-duce useful practical tools, the development of GST todate has contributed something we believe is of lastingvalue: it demonstrated the principle of self-consistentlycharacterizing all the operations on a quantum logic de-vice, and revealed that this problem is both more andless than the sum of its parts, thanks to the unexpectedrole of gauge freedom.

Acknowledgements. This work was supported by the U.S.

Department of Energy, Office of Science, Office of Advanced

Scientific Computing Research Quantum Testbed Program,

Intelligence Advanced Research Projects Activity, and the

Laboratory Directed Research and Development program

at Sandia National Laboratories. Sandia National Labo-

ratories is a multi-program laboratory managed and oper-

ated by National Technology and Engineering Solutions of

Sandia, LLC., a wholly owned subsidiary of Honeywell In-

ternational, Inc., for the U.S. Department of Energy’s Na-

tional Nuclear Security Administration under contract DE-

NA-0003525. All statements of fact, opinion or conclusions

contained herein are those of the authors and should not be

construed as representing the official views or policies of the

U.S. Department of Energy, or the U.S. Government.

A Gauge

A.1 Gauge degrees of freedomGST estimates gate set parameters by varying theseparameters to better fit a given set of data. A naivegoal would be to estimate every parameter with highaccuracy – i.e., to estimate a best-fit point within theparameter space P. However, it is usually the case (inGST analyses) that at every point in parameter spacethere are directions along which none of the predictedprobabilities change. Variations along these directionsdo not affect the goodness-of-fit (likelihood) at all. Wecall them gauge directions.

Gauge directions arise from the class of gauge trans-

formations TM :M→M acting on the space of explic-itly represented gate sets. They were defined by Eq. 15and we repeat them here:

〈〈E(m)i | → 〈〈E(m)

i |M−1

|ρ(i)〉〉 →M |ρ(i)〉〉Gi →MGiM

−1, (79)

where M is any invertible superoperator. In this ap-pendix, we refer to these as matrix-space gauge trans-formations. Such transformations map one gate set inM to another without changing any observable prob-ability (see Eq.18). The set of transformations TMforms a group isomorphic to the Lie group GL(d2),as TM · TM ′ = TMM ′ and (TM )−1 = TM−1 . We callM ∈ GL(d2) an element of the “gauge group” GL(d2).The action of TM can be pulled back by the gate setmodel’s mapping (W ) to act on the parameter spaceP. This pullback is not always a well-defined functionas TM does not, in general, respect the constraints onM imposed by W . When it is well-defined, we callthe resulting pulled back transformation on parameterspace a parameter-space gauge transformation. In thisappendix, the unqualified term “gauge transformation”will always refer to a parameter-space transformationand TM : M → M will be referred to as a “matrix-space gauge transformation”.

In the case of the fully parameterized model, P isisomorphic to M, and the gauge transformations areidentical to the matrix-space gauge transformations ofEq. 15.

In the case of a TP parameterized model, the gaugetransformations are given by restricting Eq. 15 to theparameterized elements of M (i.e. excluding the firstrow of Gi and the first element of |ρ(i)〉〉) and con-straining M to be TP (i.e., its first row should be[1, 0, 0, . . . 0]). The gauge transformations of the TPmodel form a subgroup of the matrix-space gauge trans-formations, which is isomorphic to the subgroup ofGL(d2) comprised of all d× d matrices whose first rowis [1, 0, 0, . . . 0]. This nice structure results in a well-defined gauge group on the parameter space.

When dealing with a CP (or CPTP) parameterizedmodel, there is no such structure and the set of gaugetransformations that respect the model’s constraintsvaries from point to point in parameter space. TheCP constraint does not correspond to a constraint ofM ∈ GL(d2) to one of its subgroups, and so there isno well-defined gauge group. In this sense, we say theCP constraint “does not play nicely with the gauge”. Inthis scenario, it may be possible to identify a gauge sub-group that does respect the constraints. For example,consider restricting the set of all matrix-space gaugetransformations to those for which M corresponds to


a unitary action on H. Since these transformationsrespect the CP and TP constraints and have a groupstructure isomorphic to the unitary group, then form aproper gauge subgroup.

Each gauge direction at point p ∈ P corresponds toa degree of freedom within the class of gauge transfor-mations. For example, in a single-qubit fully parame-terized gate set model, the gauge transformations arein one-to-one correspondence with the invertible 4 × 4matrix M of Eq. 15. So there are 16 independent gaugedirections at every point in parameter space, defined byvarying each of the elements of M . In a single-qubitTP-parameterized model there are only 12 such gaugedirections because of the restriction that M be TP. Inthe CP (CPTP) case there are 16 (12) gauge directionsat every point in parameter space except those corre-sponding to gate sets where at least one gate lies onthe boundary of CP space. At such points, the gaugefreedom is constrained, and there are fewer gauge direc-tions.8

As we stated in the main text, the number of gaugedirections is fixed almost everywhere by the gate setmodel, but the gauge directions themselves vary fromone parameter-space point to another. The gauge free-dom thus gives rise to curved “gauge manifolds” (or or-bits) within parameter space whose points all yield iden-tical probabilities and thus identical fits to any data. IfS : P → P is a gauge transformation and ~θ is a vec-tor of gate set parameters, then the gate sets W (~θ) and

W (S(~θ)) are physically indistinguishable (W : P → Mis the gate set model’s mapping). This situation is simi-lar to that found in electromagnetism and quantum fieldtheory, where the natural mathematical description ofan entity is over-complete or redundant.

The parameter space, because it is foliated into gaugemanifolds, takes the structure of a fiber bundle. Fibersare given by the local gauge directions, and the basemanifold given by the perpendicular non-gauge direc-tions. Importantly, the base manifold depends on themetric chosen for P. In essence (see Figure 7), P canbe viewed as a space of physically distinguishable fibersof equivalent gate sets (or, more precisely, their pre-images). In this picture, “fixing a gauge” means choos-ing a representative point from each of the fibers to de-fine a set (which may be a manifold, if the choice is per-formed in a smooth way) of physically distinguishablegate sets. When the gate set model yields gauge trans-formations with a group structure – i.e., when there isa well-defined gauge group – the fibers correspond toorbits of the gauge group.

Let us consider the tangent space Tq at a given point

8There may fewer than 16 and 12 gauge parameters in specialcircumstances when there are classes of would-be gauge transfor-mations which commute with all of the gates and SPAM elements.

𝒫”parameter space”

ℳ”matrix space” W

𝒢mle = 𝑊(𝜃mle )

𝜃mle

TM

S

(if pullback exists)

Figure 7: Matrix and Parameter spaces. The matrix spaceMof gate sets has an inherent fiber bundle structure as describedin the text. The gate set model’s mapping W : P → M canbe used to pull back this structure onto the parameter spaceP. Each fiber in M consists of a gauge-equivalent set of gatesets, and movement within these fibers is accomplished by thematrix-space gauge transformations TM (Eq. 15). Movementwithin the fibers of P are accomplished by “gauge transfor-mations” as referred to by the text. Since the goodness-of-fitbetween a gate set and data is invariant under gauge trans-formations, the “best-fit” gate set, GMLE is really a best-fitfiber.


q ∈M. Recall that the dimension of M is given by Ne(Eq. 65). Tq is spanned by the derivatives with respectto some basis forM. The derivatives of the infinitesimalgauge transformations will span a subspace of Tq, whichwe call the gauge subspace at q. Let us write an elementof the gauge group GL(d2) (see Eq. 15) as

M = exp(K), (80)

where K is a d2×d2 real matrix. The space of infinitesi-mal gauge transformations is spanned by the operations

〈〈E(m)i | → 〈〈E(m)

i |(I −K)|ρ(i)〉〉 → (I +K)|ρ(i)〉〉Gi → Gi + [K,Gi], (81)

which can be found by inserting Eq. 80 into Eq. 15 andkeeping only first order terms in K. Square brackets inEq. 81 denote the commutator. Taking a derivative withrespect to each element Kjk yields the gauge subspacedirections

−〈〈E(m)i |ujk,

ujk|ρ(i)〉〉, and

[ujk, Gi] , (82)

where uij is the matrix whose i, j-th element equals 1and whose other elements are zero. These gauge direc-tions can be viewed as length-Ne vectors, ~dQij = dTM

dKij ∈Tq, where i and j range between 1 and d2. Let dQ be

the Ne × d4 matrix with the ~dQij as its columns, andlet dP be the Ne × Np Jacobian of W at point p ∈ P.The first Np components of each vector in a basis forthe nullspace of [dP |dQ] give a basis for the pullback of

the gauge subspace to T~θ, the tangent space at ~θ ∈ P.That is, the columns of P, where[

PQ

]≡ nullspace([dP |dQ]), (83)

form a basis for the gauge space at ~θ and projectiononto this space is given by the action of the projector

Pgauge = P(PTP

)−1 PT . (84)

We call the number of gauge directions at each point(equivalently the dimension of the fibers) the “num-ber of gauge parameters”, and denote it Ngauge

p . Thenumber of remaining dimensions in parameter space,Nnongauge

p = Np − Ngaugep (also the dimension of the

base manifold) is called the “number of non-gauge pa-rameters”. Note that while the gauge directions gener-ally vary from point to point, the number of total (Np),gauge (Ngauge

p ), and non-gauge (Nnongaugep ) parameters

are globally fixed almost everywhere by our common

gate set models (full, TP, and CP) and can be treatedas constants.

Gauge degrees of freedom are very relevant to the cen-tral task of parameter estimation because at any pointin parameter space there is no difference in the good-ness of fit along the gauge directions (i.e. along thegauge fiber). This means that though we naively desirea best-fit point in parameter space (i.e. knowledge ofall the parameters), the data can at most determine abest-fit fiber (a point of the base manifold). GST dataanalysis (e.g., MLE) seeks to find a gate set whose pa-rameters mark a point within the best-fit fiber: the datacannot distinguish between points within a fiber. Anyspecific best-fit gate set necessarily assigns values to allNp parameters, but only Nnongauge

p linear combinationsof those parameters are actually physically meaningful,or can have finite error bars, as discussed in Section A.2.Gauge transformations, which act as automorphisms ofeach gauge fiber, may be used to map the initial best-fit gate set to another physically identical one. We saythat two gate sets mapped to each other under gaugetransformations are gauge-equivalent.

This gauge ambiguity – the inability to pin downNgauge

p parameters of a gate set – is unavoidable andstems from a relativism intrinsic to gate sets (as dis-cussed in the main text). For example, there is nounique matrix representation for a X(π/2) gate. In-stead an X(π/2) gate is defined by its behavior relativeto the other gates, state preparations, and effects whichcomprise the gate set. This is related to the standardfreedom to choose a basis in linear algebra; the funda-mental issue here is that the data do not and cannotchoose a single preferred basis (i.e. reference frame).

Gauge ambiguity makes it difficult to compare twogate sets, since two distinct gate sets corresponding todifferent points in P may actually be physically equiv-alent.

Example. Consider the gate set G = {ρ = |0〉〈0| , E =|1〉〈1| , G1 = GX}, where GX : ρ → σxρσx. Any uni-tary change of basis is a valid gauge transformation,so G is gauge-equivalent to G′ = {ρ′ = |+〉〈+| , E =|−〉〈−| , G1 = GZ}, where GZ : ρ → σzρσz. If an ex-perimentalist intended to produce G, but tomographyreports the (equivalent) gate set G′, then by all the ob-vious metrics the device is wildly out of spec. Of course,this is merely a misunderstanding resulting from differ-ent choices of reference frame. To avoid such communi-cation failures, it is necessary that all parties agree ona shared gauge.

A.2 Gauge considerations in error barsFollowing Section 6.3.6, suppose we compute the Hes-sian of the loglikelihood at a given point in parameter


space. In the absence of gauge freedoms the Hessianwould be full rank. In our case, however, the presenceof Ngauge

p gauge degrees of freedom mean that at theMLE there are Ngauge

p directions in parameter spacealong which the loglikelihood has no (zero) curvature.Stated equivalently, the Np × Np Hessian matrix willhave Ngauge

p zero eigenvalues. If one were to proceedwith the Hessian-based confidence region formalism, ig-noring the gauge freedoms, trouble would arise whenneeding to invert a rank-deficient Hessian.

As usual, the gauge freedom gives us the opportunityto make a choice – namely how to fix the gauge. Fromthe point of view of uncertainty analysis, we don’t needto have any uncertainty about the position along thegauge directions because we can fix it. The data tellus nothing about the gauge directions because they arephysically meaningless, yet because they are meaning-less we may choose them however we want, and there-fore know them completely.

Practically this means we may ignore the the Hes-sian components along the gauge directions and onlyconsider those along the “non-gauge” directions, de-fined as those orthogonal to the gauge directions. How-ever, orthogonality is metric-dependent. Our choice ofa parameter-space metric will determine which direc-tions are considered to be the non-gauge directions, andtherefore will influence the Hessian and consequentlyany region estimates and error bars. Recognizing this,and dealing with it, is critical for assigning sensible errorbars.

To visualize how error bars are affected by the choiceof a metric, let us return to the fiber-bundle picture ofparameter space presented in Section A.1 and Figure 7above. The MLE, a single point in parameter space,is an arbitrary point within the fiber corresponding tothe best-fit equivalence class of gate sets. The gauge di-rections are well-defined: they are the directions givenby Eq. 82 along which the loglikelihood is invariant,and they form a Ngauge

p -dimensional space. However,there are many different Nnongauge

p -dimensional non-gauge spaces which are linearly independent from thegauge space, and would be considered orthogonal to thegauge space in some metric. As Figure 8 illustrates,choosing this space is like choosing a specific angle/wayof cutting through the fiber bundle. by choosing a spe-cific non-gauge space, we are essentially choosing howmuch movement along the fibers (the gauge directions)should be coupled to movement between the fibers.

Almost any set of Nnongaugep independent directions,

or equivalently almost any metric, will yield a valid non-gauge space. Moreover, there is no “correct” metric:we are free to choose whichever we please. But ourchoice will affect the error bars. So, to choose wisely, wemay perform an optimization over metrics, minimizing

Non-gauge direction A

Figure 8: Diagram representing different choices of the non-gauge directions used to define the space on which error bars arecomputed. Lines represent different gauge orbits (fibers), withthe central red line indicating the maximum-likelihood (ML)orbit. Computing Hessian-based error bars requires that we fixthe gauge not just within the ML orbit but in a region aroundthis orbit (so we can project the Hessian and remove its zeroeigenvalues). Performing this regional gauge fixing can be donein many ways, as it amounts to selecting a point within eachfiber; the arrows give two such examples.

a weighted sum of the size of the error bars on 1) thematrix elements of each superoperator representation,and 2) the vector elements of each state preparation andmeasurement effect.

An alternative way of choosing a metric involvespartitioning the parameters into “gate” and “SPAM”groups. We then define each group’s “intrinsic error”as the minimum average size of the error bars on pa-rameters in that group when other groups’ error barsare ignored (allowed to be large). We then choose themetric which weights the parameters of the differentpartitions according to the ratios of their intrinsic errorrates.

Once a metric is selected, the Hessian can be pro-jected onto the corresponding non-gauge space by con-structing a projector using an expression identical toEq. 84, but where the columns of P are non-gauge ratherthan gauge directions.

B The extended LGST (eLGST) proto-colThe extended-LGST (eLGST) protocol was an impor-tant milestone on the way to the long-sequence GSTprotocol described in the main text. As we state in Sec-tion 4.1, it utilizes structured circuits based on repeatedgerms (similar to long-sequence GST), but uses LGSTin combination with a least-squares objective functionto arrive at its final estimate (instead of MLE used inlong-sequence GST). As such, it constitutes an interest-


ing fossil - an extinct intermediate step - that, despitebeing of little practical use, is worth outlining here.

The eLGST analysis proceeds as follows. Let {gi}be the set of germ circuits, and let the data constitute

LGST experiments on base circuits of the form glji –

that is, l repetitions of each germ circuit, for some setof integers {lj}. Each circuit is repeated N times toyield data.

First, LGST is used for each base circuit, to obtaina reasonably accurate estimate of τ(gi)lj for that base

circuit. We denote this estimate by τ(gi)lj , and assumethat

τ(gi)lj = τ(gi)lj ±O(1)√N. (85)

Now, if the gi were just numbers (not matrices), thenwe could get a high-precision estimate by simply taking

the low-precision estimate τ(gi)l, for some very large l,and computing its lth root,

τ(gi) =(τ(gi)l

)1/l= τ(gi)±

O(1)l√N. (86)

Getting this same result when gi is a matrix is a bitmore complicated, because matrix roots are very un-stable. To make this concept work for matrices, weneed to 1) iteratively increase l and 2) avoid explicitlytaking lth roots of matrices. These technical points arediscussed below. The following iterative algorithm isused in eLGST to get a high-precision estimate of gi:

1. Perform LGST on base circuits of the form gli forl = 1, 2, 4, 8, 16 . . . lmax to obtain approximate esti-

mates τ(gi)lj , where i indexes the germ circuit.

2. For each gi, set the initial τ(gi) equal to τ(gi)1.

3. Use least-squares optimization to find the τ(gi)that minimizes

δ2 = |τ(gi)− τ(gi)1|22 + |τ(gi)2− τ(gi)2|22. (87)

4. Repeat Step 3 with a new cost function δ4 = δ2 +|τ(gi)

4− τ(gi)4|22.

5. Continue repeating Step 3 with cost functionsδ8, δ16, . . . δlmax until all data has been incorpo-rated.

6. Use the same approach to extract high-precisionestimates of the gates (Gk) from the estimates ofthe germs (τ(gi)).

This algorithm was found to be reliable and accuratewhen tested on simulated data, and thus inspired thecurrent stable algorithm used for long-sequence GST.

The problems with matrix roots mentioned aroundEq. 86 are worth a brief discussion. The first problemis demonstrated even by the simple case where τ(gi) is acomplex scalar, τ(gi) = eiθ. Now τ(gi)l = ei(lθ mod2π),and so the lth root is multi-valued (i.e., θ → θ + n 2π

lleaves all observable probabilities invariant). We haveto choose the right branch. This is impossible withoutfurther information.

We solve this problem by iteratively increasing l from1 to its final value logarithmically. For example, if ini-tially l = 32, we would include the base circuits forl = 1, 2, 4, 8, 16, 32. The additional cost is only logarith-mic in maximum l, and as long as N is large enough,it completely solves the problem of choosing the correctbranch. We begin by using the l = 1 data to get a de-cent estimate of τ(gi) ± O(1)/

√N . Then, we use the

l = 2 data to deduce that

τ(gi) ≈ ±√g2i , (88)

and use the l = 1 estimate to identify unambiguouslywhich of the two branches indicated by the ± symbolis correct. We repeat this process recursively for eachsuccessively larger l.

The second problem is peculiar to matrices. The pro-cedure given above works very well for scalar τ(gi) =eiθ, but not for matrix-valued τ(gi). To see this, con-sider the example of

τ(gi) = σZ . (89)

Clearly, τ(gi)2 = 1l. If we perform LGST on τ(gi)2,

we will generally obtain g2i = 1l±O(1)/

√N , where the

error term is a small, random matrix. Suppose (without

loss of generality) that we get g2i = 1l+εσx = εσx, where

σ· denotes a standard Pauli matrix. There are multiple

square roots of g2i , but this isn’t the main problem; the

main problem is that none of them are even close to thetrue value of τ(gi) = σz! Instead, every square root ofεσx is diagonal in the σx basis.

The root of the problem is that the matrix square-root function is highly non-smooth, and can rip apartthe topology of matrices. To fix it, we observe that we

should be looking for a gi such that gi2 ≈ g2

i – not such

that gi ≈√g2i . These two equations are not equivalent

due to the topological violence that can be hiding in thematrix root.

These tricks are implemented in the final version ofeLGST given above, and were pivotal to the success ofeLGST as a protocol.


C Addressing overcompleteness inLGST with pseudo-inversesAlthough using exactly Nf1 = Nf2 = d2 fiducial statesand effects makes the mathematics of LGST convenient,there are practical advantages to using more than d2

fiducials. They include

• Redundancy: overcomplete fiducials provide moreinformation and allow self-consistency checking,

• Precision: with some gate sets, such as Cliffordgates, optimal fiducial sets (2-designs) can only begenerated with more than d2 fiducials,

• Feasibility: if only projective measurements with deffects are used, then at least d+ 1 of them are re-quired for informational completeness, which yieldsd2 + d effects.

Overcomplete fiducials yield vector spaces of ob-servable probabilities whose dimension exceeds that ofHilbert-Schmidt space. The A, B, and (generally) Pkmatrices are not square, and therefore not invertible.However, all the steps from the previous derivation stillhold – they just require slightly more complicated linearalgebra.

Let us return to the derivation at Eq. 37. We havemeasured all the Nf1 ×Nf2 elements of Pk and 1l, andwe want to solve for Gk in these two equations:

Pk = AGkB

1l = AB.

The elements of Pk and 1l will experience small ran-dom perturbations because of finite sample errors inestimates of probabilities. So in general, these equa-tions will have no exact solutions – the rank of Pk and1l will be higher than d2, but AB can only have rank d2

(because we demand reconstructed gates of this dimen-sion). Therefore, we seek the best approximate solution.A simple and elegant criterion for “best” (though notactually the best criterion; see “Maximum likelihoodestimation” below) is least-squares. We seek the Gk(and also A and B) that minimize |Pk − AGkB|2 and|1l−AB|2.

To find A and B, we start with the identity

|1l−AB|2 = Tr[1lT 1l− 1lTAB −BTAT 1l +BTATAB

],

(90)differentiate with respect to A (using the rule that∇ATr[AX] = XT and ∇ATr[ATX] = X), set the re-sulting gradient to zero and solve. This yields

A = 1lBT (BBT )−1. (91)

Some useful observations about this expression:

• This is a valid generalization of the expres-sion derived previously (A = 1lB−1), becauseBT (BBT )−1 is a right inverse of B – i.e., multi-plying it on the left by B yields 1l.

• Furthermore, if B is square and invertible, thenthis inverse is unique, so this expression reduces toA = 1lB−1 in that case.

• This inverse is well-conditioned. Although B isgenerally singular, BBT is square and full-rank (byinformational completeness).

Next, we write out |Pk −AGkB|2 the same way, takethe gradient with respect to Gk, set it to zero, and solvefor Gk. This yields

Gk = (ATA)−1ATPkBT (BBT )−1, (92)

and substituting A from Eq. 91 yields

Gk = B

[BT

(B1lT 1l

)−1B

]1lTPk

[BT (BBT )−1] .

(93)This, again, is a valid generalization of the expression

derived earlier (Gk = B1l−1PkB

−1). If B is invert-ible, then this entire expression reduces to the earlierone. But this expression only requires inverting d2× d2

matrices that have full rank if the fiducials are informa-tionally complete.

One problem remains; B (which is still unknown) ap-pears in the estimate. In the previous derivation, whereB was assumed to be square and invertible, we showedthat although the final expression for Gk involved B, itonly determined the gauge, and had no impact on anypredicted probability.

In Eq. 93, this is not quite true. B does affect theestimate. . . but only through its support. Recall thatB’s dimensions are now d2 ×Nf1. If the fiducial statesare informationally complete, then B has rank d2. Itscolumns span the entire d2-dimensional state space, butits rows only span a d2-dimensional subspace of theNf1-dimensional space of observable probabilities.

We can write B as B = B0Π, where Π is a d2 ×Nf1projector onto some d2-dimensional subspace, and B0is an arbitrary full-rank d2 × d2 matrix. Substitutingthis into Eq. 93 yields a remarkable simplification:

Gk = B0

(Π1lT 1lΠT

)−1 (Π1lTPkΠT

)B−1

0 . (94)

Here, B0 clearly determines the gauge and only thegauge. Π, on the other hand, has a real effect. What’shappening here is simple: the high-dimensional infor-mation contained in the data (1l and Pk) has to becompressed into the lower-dimensional space of the es-timate Gk. The estimator is a linear map, so the only


way to do this is by projection, and Π (the projectoronto the support of B) is that projection. So we arefree to choose B more or less arbitrarily (as long as ithas rank d2) – but its support will affect the estimate,while its form on that support affects only the gauge.

To choose Π optimally, we recall that A and B arechosen to minimize |1l−AB|2. If we write out AB usingEq. 91 and the identity B = B0Π, we get

AB = 1lBT (BBT )−1B = 1lΠTΠ, (95)

and if we define the complement projector Πc =1lNf1×Nf1 −ΠTΠ, then

|1l−AB|2 = Tr[Πc1l

T 1lΠc

]. (96)

This is uniquely minimized by choosing Π to be theprojector onto the d2 right singular vectors of 1l withthe largest singular values. With this specification forΠ, Eq. 94 is a completely specified LGST estimator forGk.

A more or less identical (but easier) calculation yieldslinear inversion estimates of the native states and ef-fects, in terms of the directly observable probability

vectors ~R(l) and ~Q(m)l :

|ρ(l)〉〉 = B0

(Π1lT 1lΠT

)−1Π1lT ~R(l), (97)

〈〈E(m)l | =

[~Q

(m)l

]TΠTB−1

0 . (98)

D Implementations in pyGSTiThis appendix provides descriptions of several algo-rithms used in pyGSTi’s implementation of gate set to-mography. These are not included in the main text be-cause they describe implementation choices that are notfundamental to GST as a protocol. These implementa-tion details demonstrate that various steps of the GSTprotocol are tractable in practice, and may be helpfulto readers seeking to implement GST in computer code.

D.1 Circuit selectionRecall that the primary goal when selecting circuits forlong-sequence GST is to amplify all possible gate er-rors. More specifically, the germs must amplify all ofa gate set’s parameters with the exception of those lin-ear combinations that correspond to gauge freedoms –we want to amplify Nnongauge

p independent parameterdirections as defined in Appendix A.1. By imposinga periodic structure on circuits (see Section 4.2), thechoice of circuits becomes one of choosing (together)a set of germs gi, germ-powers p, and effective SPAMpairs (when fiducial-pair reduction is used, see 5.2). The

native set of SPAM operations is a property of the targetgate set (input by the user). The selection of (effective-preparation, effective-measurement) pairs means select-ing appropriate fiducial circuits to precede or follow thesome or all of the native state preparations or measur-ments. In the typical case when there is just a singlenative state preparation and single native POVM, thispair selection amounts to just selecting pairs of fiducialcircuits.

In pyGSTi’s most tested (and trusted) implementa-tion of circuit selection, the tasks of selecting fiducials,germs, and germ-powers are separated for simplicity.After these tasks complete, one may optionally performfiducial pair reduction. We consider each task in turnbelow.

Throughout this appendix, G refers to a gate setmodel for the quantum processor at hand. So, in addi-tion to using the notation of Eq. 14 to describe membersof G, we may also speak of G’s parameters. G should ap-proximate the true gate set (i.e. the to-be-determinedbest-fit estimate) so that circuits that amplify G’s pa-rameters also amplify the parameters of the best-fit esti-mate, and likewise for informationally complete fiducialcircuits. Initially, we set G to be a TP-parameterizedgate set containing the ideal (target) gates. Usually thebest-fit gate set is close enough to the ideal one thatthe experiment design resulting from this initial choiceof G is all that is ever needed. However it is possible,if necessary, to iterate this process, using, e.g., the ini-tial GST estimate to generate a new experiment designwhich gives rise to a second GST estimate, etc.

D.1.1 Fiducial selection

Fiducial selection refers to the process of (indepen-dently) choosing informationally complete (IC) sets ofeffective state preparations and effective measurements.That is, fiducial selection is the process that determines{Hk} and {Fk} in Eqs. 51 and 52, respectively.

A set of matrices is IC if and only if it spans thevector space B(H). This requires at least d2 linearlyindependent elements. While choosing d2 random cir-cuits would almost certainly result in an IC set, ele-ments could likely be chosen that are “more” linearlyindependent. This notion of an amount of linear in-dependence is quantified by the spectrum of the Grammatrix 1l, defined (as in Eq. 38) by

1li,j = 〈〈Ei|ρj〉〉. (99)

If either{|ρ′j〉〉

}or {〈〈E′i|} fails to be IC, the Gram ma-

trix will fail to have d2 non-zero (to machine precision)singular values. As the d2-th largest magnitude singu-lar value approaches zero this indicates that one of thesets is close to being linearly dependent, which is to be


avoided. Thus, an optimal set of fiducial circuits (onewhose elements span B(H) as uniformly as possible) canbe found in practice by maximizing the smallest of thetop d2 singular values of the Gram matrix over manycandidate sets. We typically consider as a candidateset all the circuits with depth below some cutoff, andhave only a single native state preparation and mea-surement (Nρ = NM = 1). The precise details of howfiducial selection is done are non-essential to achievingthe desired asymptotic accuracy scaling, as this onlyrequires informational completeness. More uniformlyIC sets are beneficial because they decrease the prefac-tor in the the accuracy scaling (i.e. the y-offset of thediamond-distance vs. depth series in Figure 10).

For fiducial selection to succeed, one must computethe Gram matrix using good estimates of the actualgates. When the true (or best-fit) gate set is onlyslightly different from G, then the actual gates will pre-pare states (and effects) that are close to the expectedones – and therefore close to uniformly IC. If the truegates are sufficiently far from those used in fiducial se-lection, this is readily detected in the singular values ofthe empirical Gram matrix (Eq. 38), and remedied byiterating over multiple experiment-generation steps asdescribed at the beginning of this section. From thispoint forward, let us assume fiducial selection has suc-ceeded in identifying sets of Nf1 effective preparationsand Nf2 effective POVM effects.D.1.2 Germ Selection

Germ selection seeks to identify a set of low-depthcircuits whose powers make useful “operations of inter-est” for tomography. Let us begin by considering moregenerally a completely arbitrary set of “operations ofinterest”, O, and asking: “What needs to be in this setto amplify all possible gate set parameters?”.

By performing the circuits that sandwich each O ∈O between all pairs of effective state preparations andmeasurements, we are able to estimate the probabilities

{〈〈Ej |τ(O)|ρi〉〉} for i = 1 . . . Nf1, j = 1 . . . Nf2.(100)

Naively we could simply take O to be the set of sin-gle gates (each a depth-1 circuit) in the gate set, andrepeatedly measure the prescribed circuits many times.These are the same circuit used by process tomogra-phy, and are, up to some additional short circuits (seeEqs. 38-43), sufficient to characterize the gates usingLGST. However, by repeatedly measuring each circuitN times we are only able estimate each gate’s elements(and thereby the gate set parameters, assuming theseparameters are linear in the elements) to an accuracyproportional to 1/

√N (see Section 4). This is just the

standard stochastic error scaling arising from the 1/√N

scaling of the standard deviation of the mean in a Gaus-

sian or binomial distribution. To achieve 1/N scaling,we must include deeper circuits. Intuitively, we wantto take each parameter θ (an element of ~θ after fixinga basis for parameter space) in the gate set model andmap it to a probability that depends on it as p ≈ pθ,where p is (or is proportional to) the germ-power.

Consider the case of a single-qubit gate Gx thatis intended to be an x-rotation by π/2 radians butis actually an x-rotation by θ = π/2 + ε, where εis small. Let us suppose that every circuit is mea-sured N times. If τ(O) = Gx then we would esti-mate θ mod 2π = π/2 + ε ± α/

√N for some constant

α. (The mod 2π arises because rotations by 2π areundetectable.) If τ(O) = G2

x, which is a rotation by2θ = π + 2ε, then we would estimate 2θ mod 2π =π+2ε±α/

√N and thus θ mod π = π/2+ε±α/(2

√N).

More generally, if τ(O) = Gpx, then we would esti-mate pθ mod 2π = pπ/2 + pε ± α/

√N and thus θ

mod 2π/p = π/2 + ε ± α/(p√N). While it may be

initially concerning that as p increases the “branch”ambiguity in θ increases (as one is unable to discrimi-nate between angles separated by 2π/p), this issue canbe circumvented by probing Gpx for multiple values ofp by choosing logarithmically-spaced depths l and let-ting p = bl/|g|c (see Section 4.2.2)). One only needslogarithmically-spaced p (or l) values to determine thecorrect “branch” for θ, as each p can be used to ruleout a constant factor of the possible branches. Thesame reasoning lies behind the Robust Phase Estima-tion protocol [140].

In the end, we see that by increasing l the uncer-tainty in θ decreases as 1/l. Since the total number ofcircuits scales at most linearly with l (and more oftenthan not logarithmically since l-values are logarithmi-cally spaced), this is at worst a Heisenberg-like scalingin the total circuit number and markedly better thanthe 1/

√N scaling that would be obtained by just in-

creasing N for the single τ(O) = Gx.This simple example suggests that we should include

in O logarithmically spaced powers of each of the gates.These circuits amplify some but not all types of gateerrors (i.e. parameters of the gate set). Let us returnto our single-qubit example, but now suppose Gx is anexact π/2 rotation around a slightly wrong axis. Thiscorresponds to the unitary map

Gx = e−i(π/4)(cos εσx+sin εσy), (101)

where σx and σy are the Pauli operators and ε is as-sumed small. This “tilt error”, is not amplified byGpx, which is easily seen by observing that G4

x = 1l,so the error cancels, or “echos”, itself out after just fourrepetitions. More sophisticated circuits are needed toamplify tilt errors. In this example, it is sufficient toprobe GxGy, where Gy is a perfect single-qubit π/2 ro-


tation around the y-axis. GxGy is then a rotation by2π/3 + ε/

√3. Therefore, performing (GxGy)p ampli-

fies the deviation ε by a factor of p and, as before, al-lows estimation of ε to an accuracy scaling as 1/(p

√N).

This illustrates the need for circuits other than the gatesthemselves which must be repeated to amplify all of agate set’s parameters. We call each of the circuits whichis repeated, O ∈ O, a “germ” (consistent with Section4.2).

The general situation gets rapidly complicated – e.g.,if Gy were not perfect in the above example, then GxGyalone could not distinguish between y-axis tilt in Gx andx-axis tilt in Gy. In general, every circuit is sensitiveto some nontrivial linear combination of gate set pa-rameters. To choose a set of germs, we create a list ofcandidate germs (e.g., every circuit shorter than somecutoff depth), and for each candidate germ g identifywhat linear combination of parameters it amplifies. Wedo this by computing the Jacobian,

∇(p)g ≡

1p

∂ [τ(g)p]∂~θ

∣∣∣∣W (~θ)=G

, (102)

where τ(g) is the B(H) → B(H) map formed by com-

posing the elements of g (see Eq. 16), and ~θ ∈ P is avector of gate set parameters. Note that we are justi-fied in assuming we have access to this entire Jacobian –i.e. the derivatives of all the elements of τ(g)p – becausewe have committed to sandwiching the base circuit be-tween informationally complete sets of fiducials. As wewill see in Section D.1.4, inclusion of all of these fiducialpairs is overkill and we can remove many of them. Notealso that, as in the main text, since the germs are onlyable to amplify errors in the gates and not in the SPAM,we will assume a gate set model whose parameters onlyvary its gates.

In general, τ(g) is a d2 × d2 matrix and the gate sethas Np parameters (Np = NGd

4 in the case of fully

parameterized gate set model). This means ∇(p)g is a

d4 × Np matrix. Its d4 right singular vectors indicatelinear combinations of model parameters that τ(g) am-plifies, and the corresponding singular values quantifyhow much they are amplified. A zero singular value in-dicates a parameter combination that is not amplifiedat all (like the tilt error discussed above). We expecteach germ to amplify at least the d2 linear combina-tions of parameters given by its eigenvectors. It mayamplify more than d2 if its spectrum is degenerate andremains so for all unitary perturbations (e.g., a single-qubit unitary gate will always have two unit eigenval-ues). The amplification properties of a set of Ng germs

{g1 . . . gNg

}is described by the Ne ×Np Jacobian

J (p) =

∇(p)g1

∇(p)g2...

∇(p)gNg

. (103)

Our goal is to choose germs that provide high sen-sitivity at “large” values of p (or equivalently l, ifp = bl/|g|c). In practice, it is not useful to make llarger than 1/(|g|η), where η is the rate of stochasticor depolarizing noise. To select germs, however, we ig-nore this effect and make the simplifying assumptionthat the gates (and therefore τ(g)) are reversible. Inpractice, we project the gates of G onto their closestunitary and consider random unitary variations of G toavoid the effects of accidental degeneracies. Under theassumption of unitary gates, it is possible to define thel → ∞ limit of the Jacobian in Eq. 102. Using theproduct rule and that τ(g)−1 = τ(g)†,

∇(p)g = 1

p

p−1∑n=0

τ(g)n ∂τ(g)∂~θ

τ(g)p−1−n (104)

=[

1p

p−1∑n=0

τ(g)n∇(1)g (τ(g)†)n

]τ(g)−(p−1)(105)

As p→∞, the average over all powers n of τ(g) twirls

∇(1)g . By Schur’s lemma, the effect of twirling is to

project ∇(1)g onto the commutant of τ(g) – i.e., onto

the subspace of matrices that commute with τ(g). Fur-thermore, multiplication by the unitary τ(g)−(p−1) ismerely a change of basis, and has no effect on the rightsingular vectors or the singular values of ∇g. So, up toan irrelevant change of basis:

limp→∞

∇(p)g = Πτ(g)

[∇(1)g

], (106)

where Πτ(g) is the projection onto the commutant ofτ(g).

This framework defines a notion of completeness for

germs. The set of germs {gi}Ngi=1 is amplificationally

complete (AC) if and only if the right singular rank ofits Jacobian, the p → ∞ limit of Eq. 103, equals thetotal number of physically accessible parameters in thegate set, Nnongauge

p (see Appendix A.1). To build an ACset of germs, it is sufficient to add germs, one by one,to a set until its Jacobian has rank equal to Nnongauge

p .One may continue to optimize the set of germs numer-ically by adding and removing germs (taken from somemaster candidate list), and only keeping modificationsthat lower a certain score function. The primary scorefunction used in pyGSTi is


f({g1 . . . gNg

}) =

Tr[(J†J)−1]Ng

. (107)

This score estimates the mean squared error of estima-tion if a fixed number of counts are spread over the Ngdistinct germs.D.1.3 Base circuit selection

It remains to specify the number of repetitions(i.e. the values of l and therefore p) for each germ used toconstruct the base circuits of a GST experiment design.We have motivated the use of logarithmically spacedgerm powers (above, and in Appendix B), and as de-scribed in the main text (Section 4.2.2) the maximumnumber of repetitions should be chosen so that the depthof the repeated germ does not exceed Lη, a rough maxi-mum circuit depth before the output probabilities ceaseto yield useful information.

The main idea is to repeat each germ a logarithmi-cally increasing number of times until the depth of therepeated germ circuit reaches Lη. We define the set of

“maximum depths”,{li = mi−1}Nl

i=1 (with m = 2 usu-ally) such that lNl ≤ Lη, and define lmax = lNl = maxi lifor later convenience. Let us write the germ power pformally as a function of germ and max-depth:

p(g, l) = bl/|g|c, (108)

so that p(g, l) is the maximum number of times g maybe repeated without exceeding depth l (if |g| > l thenp(g, l) = 0). Germ g is then repeated p(g, li) times foreach i = 1 . . . Nl. Note that this yields slightly differentbase circuits from just choosing logarithmically-spacedpowers (p) - see Figure 5 step 2. In particular, when|g| > 1 we may end up including additional circuits thatwe would not have if p was itself logarithmically spaced.These additional circuits add to the over-completenessof the data, which we discuss later. The reason forchoosing logarithmically-spaced maximum depths (l)rather than exponents (p) is primarily so that we can, inmost cases, use a single set {li} for all germs. If a com-mon set of powers (p) was used for all the germs, thiswould either result in wasted circuits of depth greaterthan lmax or the failure to repeat short germs as manytimes as one could. A more flexible (but more com-plicated) approach is to vary lmax from germ to germbased on prior knowledge of the device’s stochastic noisebehavior (e.g., when two-qubit gates have more stochas-tic error than one-qubit gates). pyGSTi allows users tospecify per-germ maximum depths, but this is not thedefault behavior.

In the end, the complete set of pyGSTi-generatedGST circuits is as follows. With all the repeated germs

given by O ={gp(gi,lj)i

}i,j

, where {lj}Nlj=1 is the set of

Nl maximum depths, the probabilities corresponding toEq. 59 are given by (using pi,j = p(gi, lj)):

〈〈Ea|τ(gi)p(gi,lj)|ρb〉〉, (109)

where a = 1 . . . Nf1, b = 1 . . . Nf2, i = 1 . . . Ng, andj = 1 . . . Nl. Expanding the fiducial circuits gives theprobabilities in terms of just native operations (ele-ments of the gate set),

〈〈Em(a)t(a) |τ

(Hh(a)

)τ (gi)p(gi,lj) τ

(Ff(b)

)|ρr(b)〉〉. (110)

D.1.4 Fiducial pair reduction.

While the set of circuits corresponding to Eq. 110achieves the goal of amplifying all of the gate set model’sparameters, it isn’t as efficient as it could be. This isbecause we have included a complete set of fiducial pairsfor each base circuit. In this section we expand uponthe description of fiducial pair reduction (FPR) in themain text (Section 5.2).

FPR is especially needed when we consider scalingGST beyond a single qubit. Moving to more qubits nec-essarily brings dramatic increases in the dimensionalityof the parameter manifold, but the standard GST ex-periment design increases more rapidly than is requiredby the greater number of parameters. An ambitious 1-qubit GST experiment uses 6 fiducial circuits, 11 germs,and up to 13 powers (L = 1, 2, 4, . . . 8192). This adds upto approximately 6× 6× 11× 13 = 5148 circuits. Thissounds like a lot – but a direct generalization of thismethod to 2 qubits would require 36 fiducial circuitsand 80 germs, for roughly 1.3 × 106 distinct circuits.This is not practical.

Thankfully, this many circuits is not necessary. Asdiscussed in the main text, the standard experiment de-sign contains redundancies stemming from the fact thatwe don’t need full tomographic information for each

repeated germ (gp(gi,lj)i ) circuit. We only need to be

able to extract the amplified linear combinations of pa-rameters for each germ, which are typically much fewer(≈ d2) than the ≈ d3 circuits that pinpoint the ≈ d4

elements in τ(gp(gi,lj)i

). As such, we roughly expect

that we can reduce the number of circuits by a factorof d3/d2 = d and still maintain the desired Heisenbergscaling.

One way of doing this is to simply randomly removesome fraction of the Nf1 ·Nf2 circuits corresponding toeach repeated-germ. This technique seems to work inpractice but has not been studied extensively. An alter-nate technique is to remove a specific set of the Nf1 ·Nf2“fiducial pairs” (corresponding to (a, b) pairs of indicesin Eqs. 59-60) for each germ. The same set of fiducialpairs is removed for every power of the same germ. Toconstruct a reduced set of M fiducial pairs for germ g


we construct the projectors {Pk} where Pk = JkJTk and

Jk is the M ×Np matrix whose m-th row is the deriva-tive of 〈〈Eam |τ(g)|ρbm〉〉 with respect to k-th amplifiedparameter of g. The “amplified parameters” of g arethe non-zero elements of τ(g)’s Jordan normal form, asthese are the generalization of τ(g)’s eigenvalues whichinclude off-diagonal elements of degenerate blocks. Ifthe rank of

∑k Pk is equal to the maximal number of

parameters g can amplify then the set of fiducial pairs{(am, bm)}Mm=1 is sufficient and may be used as the re-duced set of fiducial pairs for g. (The maximal num-ber of parameters g can amplify can easily be found bycomputing the rank when including all possible fiducialpairs.) For each germ we find a minimal set of fiducialpairs which amplifies the maximum number of param-eters, and this collection of per-germ fiducial pairs dic-tates a reduced set of circuits which should maintain thesame desired accuracy scaling. We find this to be truein practice, but that the FPR process also makes theparameter estimation less robust to data that does notfit the gate set model. The redundancy, as one mightexpect, serves to stabilize the numerical optimizationused in the parameter estimation portion of GST.

It is also possible to consider different maximum-depth sets {li} for different germs, so that the numberof times each each germ is repeated is strictly a logarith-mic sequence. We expect that this would similarly leadto GST maintaining the desired scaling on data thatcan be fit well but that it could lead to an overall lackof robustness to data which cannot be fit well. We havenot performed simulations to verify this hypothesis, andleave this as a subject of future work.

D.2 Non-TP loglikelihood optimizationIf the gate set is not TP, then the expression for the log-likelihood in Eq. 61 above must be modified. In doingso, we will make use of a general relationship betweenthe Poisson and Multinomial distributions. Consider amultinomial distribution with event probabilities {nj}and number of samples N . Now suppose the numberof samples is no longer fixed but instead is Poisson dis-tributed with mean Λ = N . This results in the previ-ously multinomially-distributed nj becoming indepen-dently Poisson-distributed with rates λj = njΛ = njN .This is easier to see in the reverse direction, start-ing from independent Poisson-distributed nj , and post-selecting on N = N0 where N0 is a fixed number. Thisintroduces correlations among the nj , and the nj be-come multinomially distributed.

To see this, suppose we have K Poisson-distributedrandom variables nj , each with its own rate parameterλj :

Pr(nj) =e−λjλ

njj

nj !. (111)

In turn, the random variable X ≡∑j nj is Poisson

distributed, with rate parameter λ ≡∑j λj :

Pr(X = N) = e−λλN

N ! . (112)

Further, since the nj are independent, the probabilityof observing a particular set of values {nj}Kj=1 is simplythe product of the individual probabilities:

Pr({nj}) =∏j

Pr(nj) = e−λ∏j

λnjj

nj !. (113)

What is the probability of observing this particular setof values, conditioned on X = N0? By the definition ofconditional probability, this is given by

Pr({nj} | X = N0) = Pr({nj} ∩ {X = N0})Pr(X = N0) . (114)

The denominator of the above expression has alreadybeen calculated (Eq. 112); what remains is to calculatethe numerator. Notice that if the sum of the {nj} isnot N0, then the numerator is zero, while if their sumis N0, then the numerator is Pr({nj}):

Pr({nj}∩ {X = N0}) ={

Pr({nj})∑j nj = N0

0∑j nj 6= N0

.

(115)Therefore, the conditional probability is

Pr({nj} | X = N0) =

e−λ∏j

λnjj

nj !

( N0!e−λλN0

)

= N0!n1!n2! · · ·nK !

(∏j λ

njj

λN0

)

= N0!n1!n2! · · ·nK !

∏j

(λjλ

)nj= N0!n1!n2! · · ·nK !

∏j

(λj∑k λk

)nj,

which is a multinomial distribution over N0 trials, andwith event probabilities pj = λj∑

kλk

. Note that when

moving to the third line in this derivation we used thefact N0 =

∑j nj .

When a gate set is not constrained to be TP, it meansthat the predicted outcome probabilities for a circuitneed not sum to one, and equivalently that the pre-dicted total number of counts need not equal the ac-tual (observed) number of counts. Assuming the pre-dicted number of counts will be (Poisson-) distributedaround the actual number, we find ourselves in the sit-uation just described with {nj} corresponding to the


outcome probabilities {ps,βs}βs . We may thus think ofλs,βs ≡ Nsps,βs as the rate of seeing outcome βs aftercircuit s and write the likelihood for a single circuit asthe likelihood for independent Poisson distributions:

Ls =∏βs

λNs,βss,βs

e−λs,βs

Ns,βs !

logLs =∑βs

Ns,βs log(Nsps,βs)−Nsps,βs − log(Ns,βs !)

logL′s =∑βs

Ns,βs log(ps,βs)−Nsps,βs (116)

where in the final line we have for convenience discardedthe ps,βs-independent Ns,βs log(Ns) and log(Ns,βs !)terms and defined logL′s. The total loglikelihood is thesum of each logL′s, giving

logL =∑s

logL′s =∑s,βsNs

Ns,βs log(ps,βs)−Nsps,βs

(117)The second term in Eq. 117 can be viewed as a Lagrangemultiplier (or penalty term) which softly enforces theTP constraint that

∑βsps,βs = 1 for all s. Further-

more, we note that this “Poisson-picture” form of theloglikelihood is identical (up to an unimportant con-stant shift) to Eq. 62 for TP gate sets, and so Eq. 117can be interpreted as a more general formulation whichallows for both TP and non-TP gate sets.

pyGSTi uses this Poisson-picture formulation of logLall the time, both because it is more general (can han-dle cases when the gates are not TP) and because itmakes the function more amenable to least-squares op-timization techniques such as the Levenberg-Marquardtmethod. As noted in the text (Section 4.3), pyGSTialso regularizes the logL functions so that evaluatingthe objective function at negative probabilities (as canbe predicted by non-CP-constrained gate set models) ispossible.

E Numerical verificationThis appendix presents a numerical verification of theaccuracy scaling claims derived in the main text. Herewe present evidence that

• the accuracy with which LGST reconstructs a gateset scales as O(1/

√N), where N is the number of

times each circuit is run on the device (so there areN sampled outcomes per circuit), and

• the accuracy with which long-sequence GST recon-structs a gate set scales as O(1/L), where L is themaximum depth of the base circuits used in long-sequence GST.

Both points can be handled similarly, as they bothinvolve testing whether the slope of a particular plot isnear a target value. Our results consist of verifying thevalues of such slopes over many trials on 1- and 2-qubitsystems. Each trial consists of the following steps:

1. Choose a “true” gate set. This identifies the trial,and is a slightly perturbed ideal/target gate set.The true gate set contains imperfect operations.

2. Simulate circuits using the true gate set. Circuitoutcomes are generated by sampling N times themultinomial distribution corresponding to the out-come probabilities predicted by the true gate set.For LGST trials, the number of circuits is fixedand N is varied to form multiple data sets eachcorresponding to a different value of N . For long-sequence GST trials, N is fixed, and data setscorresponding to different maximum base-circuitdepth L are created.

3. Perform LGST or GST on the simulated data sets.This leads to a series of estimated gate sets – onefor each value of N or L, respectively.

4. Plot the accuracy of the LGST (GST) estimates vs.N (L) on a log-log plot, and compute its best-fitslope. Accuracy is computed as a distance to thetrue gate set (see below), and quantifies how wellthe estimate matches the true gate set.

In the end, each trial leads to a slope that we comparewith the target value of −1/2 (for LGST trials) or −1(for GST trials).

Trials are constructed by first selecting a perfect gateset that has unitary gates. We consider 6 1-qubit per-fect gate sets and 5 2-qubit perfect gate sets that wecommonly see in hardware. Each 1-qubit (2-qubit) per-fect gate set is perturbed 100 (10) times by applying(a) independently random depolarization to each gate,with a maximum strength of 10−3, (b) a constant SPAMerror of 10−2 (1%), and (c) independently random rota-tions of up to 10−3 radians about each axis (x, y, and zfor 1-qubit and axes corresponding to the 15 nontrivialPauli matrices in 2-qubit cases). These noise strengthswere chosen to be realistic values for current state-of-the-art devices. Each perturbed gate set is then usedto generate a series of data sets by varying N or L asdescribed above.

Quantifying the accuracy of LGST/GST is done bycomputing the average diamond-norm distance, de-noted �avg(G,G0), between the gates of the true gateset G0 and those of the estimate G, where

�avg (G1,G2) = 1NG

∑i

||G(1)i −G

(2)i ||�. (118)


In Eq. 118, G(1)i are the gates belonging to G1, G

(2)i

are the gates belonging to G2, and the diamond norm isgiven by

||G−G0||� = supρ||(G⊗1d)[ρ]− (G0⊗1d)[ρ]||1, (119)

where ρ ranges over all valid quantum states.

Since the diamond distance is not gauge invariant(see Appendix A.1) we gauge-optimize the estimatedgate sets to the true gate set before computing it. If wedidn’t, the results wouldn’t be meaningful and we wouldnot expect to observe �avg to decrease in the expectedway. The choice of average diamond distance here issomewhat arbitrary. If we instead used the infidelity,the Frobenius distance, or another similar well-behavedquantity measuring the difference between two gate sets,

we would expect to see the same scaling behavior.Figure 9 shows the average diamond distance vs. N

for estimates obtained from LGST, and confirms the ex-pected 1/

√N scaling (slopes are near -0.5). This is true

for all the single- and two-qubit gate sets we tested. Inthe figure legends, single-qubit rotation gates are givenby their axis and angle, e.g., X(π/2) denotes a x-axisrotation by π/2 radians. The Nφ gate is a π/2-rotationgate about the tilted (x, y, z) = (

√3/2, 0,−1/2) axis.

Two-qubit gates are given as either two letters thatindicate a tensor product of single-qubit π/2 rotationgates (e.g., XI means a X(π/2) gate on qubit 1 andan idle on qubit 2), or as a commonly known CNOT orCPHASE gate.

The O(1/√N) scaling leads to an overall O(1/

√N ′)

scaling, where N ′ is the total number of samples, sincethe number of circuits required by LGST is fixed andN is proportional to N ′.


10 100 1000 10000 100000 1000000N (number of samples)

103

102

101

100

101

102

103

104

105

106

avg(

,0)

[dat

a sc

aled

by

10k ,

see

lege

nd fo

r k]

1-qubit

Gate set [scaling expondent k]

X(2 ), Y(2 ) [0]

X(2 ), Z(2 ) [1]

X(2 ), Y(2 ), I [2]

X(2 ), Y(2 ), Z(2 ), I [3]

Z(2 ), N [4]

X(4 ), Z(2 ) [5]

slope=-1/2

10 100 1000 10000 100000 1000000N (number of samples)

103

102

101

100

101

102

103

104

105

106

avg(

,0)

[dat

a sc

aled

by

10k ,

see

lege

nd fo

r k]

2-qubit

Gate set [scaling expondent k]

XI, YI, IX, IY, CNOT [0]XI, YI, IX, IY, CPHASE [1]XI, YI, IX, IY, II, CNOT [2]XI, YI, IX, IY, II, CPHASE [3]XI, YI, IX, IY [4]slope=-1/2

0.54 0.52 0.50 0.48 0.46 0.44 0.42Slope

0

25

50

75

100

125

150

175

Cou

nt

0.47 0.46 0.45 0.44 0.43Slope

0

2

4

6

8

10

12

Cou

nt

Figure 9: 1/√N dependence of LGST accuracy. The average diamond norm distance between the gates of a LGST estimate

and the data-generating “true” gate set is computed as a function of the number of data samples per circuit, N . The dashedline with slope -0.5 indicates the expected O(1/

√N) scaling. The left and right columns of plots correspond to 1- and 2-qubit

results, respectively. At each value of N , 100 (1-qubit cases) or 10 (2-qubit cases) perturbations of the original perfect gate setwere used (color-coded). A line is plotted for each perturbed gate set. Y-axis values are multiplied 10k to aid in visibility (thisshifts lines upward by k grid cells). The integer k is different for each perfect gate set, and is given in the legend. The legend’sk-value indicates the numbr of grid cells the data must be shifted down to make the y-axis values are correct. Below each plot isa histogram of the best-fit slopes of all the lines within it.


1 2 4 8 16 32 64 128 256 512 1024L (maximum base circuit depth)

104

103

102

101

100

101

102

103

104

avg(

,0)

[dat

a sc

aled

by

10k ,

see

lege

nd fo

r k]

1-qubit

Gate set [scaling exponent k]

X(2 ), Y(2 ) [0]

X(2 ), Z(2 ) [1]

X(2 ), Y(2 ), I [2]

X(2 ), Y(2 ), Z(2 ), I [3]

Z(2 ), N [4]

X(4 ), Z(2 ) [5]

slope = -1

1 2 4 8 16 32 64 128 256L (maximum base circuit depth)

103

102

101

100

101

102

103

avg(

,0)

[dat

a sc

aled

by

10k ,

see

lege

nd fo

r k]

2-qubit

Gate set [scaling exponent k]

XI, YI, IX, IY, CNOT [0]XI, YI, IX, IY, CPHASE [1]XI, YI, IX, IY, II, CNOT [2]XI, YI, IX, IY, II, CPHASE [3]XI, YI, IX, IY [4]slope = -1

1.0 0.9 0.8 0.7 0.6Slope

0

20

40

60

80

100

120

140

160

Cou

nt

1.00 0.95 0.90 0.85 0.80Slope

0

2

4

6

8

Cou

nt

Figure 10: 1/L dependence of long-sequence GST accuracy. Upper plots show the average diamond norm distance, �avg(G,G0),between each long-sequence GST estimate G and the corresponding data-generating gate set G0 is displayed as a function of themaximum base-circuit depth L. The dashed line has slope equal to -1.0, indicating the expected O(1/L) scaling. Left and rightcolumns of plots correspond to 1- and 2-qubit results, respectively. At each value of L, 100 (1-qubit cases) or 10 (2-qubit cases)perturbations of the original perfect gate set were used (color-coded). A line is plotted for each perturbed gate set. Y-axis valuesare multiplied 10k to aid in visibility (this shifts lines upward by k grid cells). The integer k is different for each perfect gate set,and is given in the legend. The legend’s k-value indicates the numbr of grid cells the data must be shifted down to make the y-axisvalues are correct. Below each upper plot is a histogram of the best-fit slopes of all the lines within it.

Long-sequence GST utilizes deep circuits to amplifygate errors and thereby make them more visible. Bydesign, the sensitivity of GST to gate errors shouldscale as 1/L, where L is the maximum depth of the re-peated germ circuits. Thus, we expect GST to achievea O(1/L) accuracy scaling. Figure 10 shows the aver-age diamond distance vs. L for estimates obtained fromLGST, and confirms this 1/L scaling (slopes are closeto −1).

The long-sequence GST simulations were run with

fixed N = 1000 and the maximum base-circuit lengthschosen to be logarithmically-spaced powers of 2 from 1to L. There is some expected “roll-off” at larger valuesof L that is believed to be due to L approaching Lη (thedepth at which circuit outcomes no longer provide usefulinformation because the quantum state is completelydecohered). This slight bend is more visible in the 2-qubit case.

This O(1/L) scaling easily translates to an overallO(1/N ′) scaling, where N ′ is the total number of sam-


ples, since by fixingN and pessimistically assuming thatthere are L different maximum-base-circuit-depth val-ues (usually there are log(L)), N ′ is proportional to L.

All of the analysis shown here used the LGST andlong-sequence GST implementations in pyGSTi version0.9.9.2.

F χ2 estimator SPAM biasIn the course of our analysis, we discovered a subtleproblem with the minimum-χ2 estimation: it can besignificantly biased. To see this, consider a simple exam-ple: estimating a coin’s bias. Suppose that two circuitsare done. In each circuit the coin is flipped 100 times.The first circuit yields 0 heads, and the second yields1. In this simple case the data could just be combinedinto a single data set with N = 200 and n = 1, in whichcase any reasonable estimator yields p = 0.005. If, in-stead, we combine the circuits into a single likelihoodfunction,

L(p) = p1(1− p)99 × p0(1− p)100,

then its maximum is still at pMLE = 0.005. But if wecombine the circuit into a single χ2 function,

χ2(p) = (p− 0.01)2

p(1− p) + (p− 0)2

p(1− p) ,

then the minimum is at

pmχ2 =√

19801− 119800 ≈ 0.007.

This is not in itself bias, but when this estimator isaveraged over all data sets, we find that minimum-χ2

estimation overestimates the probability of rare events.If the true underlying probability were 0.01, then 〈p〉would be 0.0118. As p gets smaller, the bias gets worse:

limp→0〈p〉 =

√19801− 1

99 p ≈ 1.411p.

This example is remarkably relevant, because the esti-mation of SPAM error in GST proceeds almost iden-tically to this. Many distinct circuits – each typicallywith N ≈ 100 counts – are combined into a single χ2 orlikelihood function, whose optimum gives the estimatedSPAM error.

Indeed, we observe that the using the minimum-χ2

estimation can overestimate SPAM error by as muchas 100%. While the bias usually doesn’t have a directeffect on gate estimates (deep circuit data dominatesthem, and the deep circuit data is not dominated byrare events), it can indirectly induce a noticeable un-derestimate of the RB error rate (by causing the SPAM

error rate to be overestimated). Using the full MLEinstead resolves these issues. The ML estimate is gen-erally quite accurate, and the “residual likelihood” (or,more precisely, the loglikelihood ratio statistic w/r.t. amaximal model) is χ2 distributed when the model isvalid, and forms an excellent test statistic for modelvalidation.

References[1] Scott Aaronson. Shadow tomography of quan-

tum states. In Proceedings of the 50th AnnualACM SIGACT Symposium on Theory of Com-puting, pages 325–338. dl.acm.org, 2018.

[2] Panos Aliferis and Andrew W. Cross. Subsys-tem fault tolerance with the bacon-shor code.Phys. Rev. Lett., 98:220502, May 2007. DOI:10.1103/PhysRevLett.98.220502.

[3] Panos Aliferis and John Preskill. Fibonaccischeme for fault-tolerant quantum computation.Phys. Rev. A, 79:012332, Jan 2009. DOI:10.1103/PhysRevA.79.012332.

[4] Panos Aliferis, Daniel Gottesman, and JohnPreskill. Quantum accuracy threshold for con-catenated distance-3 codes. Quantum Info. Com-put., 6(2):97–165, March 2006. ISSN 1533-7146.

[5] J B Altepeter, D Branning, E Jeffrey, T C Wei,P G Kwiat, R T Thew, J L O’Brien, M A Nielsen,and A G White. Ancilla-assisted quantum pro-cess tomography. Phys. Rev. Lett., 90(19):193601,May 2003. ISSN 0031-9007. DOI: 10.1103/Phys-RevLett.90.193601.

[6] L M Artiles, R D Gill, and M I Guta. An invi-tation to quantum tomography. J. R. Stat. Soc.Series B Stat. Methodol., 67(1):109–134, Febru-ary 2005. ISSN 1369-7412, 1467-9868. DOI:10.1111/j.1467-9868.2005.00491.x.

[7] Frank Arute, Kunal Arya, Ryan Babbush, DaveBacon, Joseph C Bardin, Rami Barends, RupakBiswas, Sergio Boixo, Fernando G S L Bran-dao, David A Buell, Brian Burkett, Yu Chen, Zi-jun Chen, Ben Chiaro, Roberto Collins, WilliamCourtney, Andrew Dunsworth, Edward Farhi,Brooks Foxen, Austin Fowler, Craig Gidney,Marissa Giustina, Rob Graff, Keith Guerin, SteveHabegger, Matthew P Harrigan, Michael J Hart-mann, Alan Ho, Markus Hoffmann, Trent Huang,Travis S Humble, Sergei V Isakov, Evan Jeffrey,Zhang Jiang, Dvir Kafri, Kostyantyn Kechedzhi,Julian Kelly, Paul V Klimov, Sergey Knysh,Alexander Korotkov, Fedor Kostritsa, DavidLandhuis, Mike Lindmark, Erik Lucero, DmitryLyakh, Salvatore Mandra, Jarrod R McClean,Matthew McEwen, Anthony Megrant, Xiao Mi,


https://doi.org/10.1103/PhysRevLett.98.220502


https://doi.org/10.1103/PhysRevA.79.012332




https://doi.org/10.1111/j.1467-9868.2005.00491.x

https://doi.org/10.1111/j.1467-9868.2005.00491.x

Kristel Michielsen, Masoud Mohseni, Josh Mu-tus, Ofer Naaman, Matthew Neeley, CharlesNeill, Murphy Yuezhen Niu, Eric Ostby, An-dre Petukhov, John C Platt, Chris Quintana,Eleanor G Rieffel, Pedram Roushan, Nicholas CRubin, Daniel Sank, Kevin J Satzinger, VadimSmelyanskiy, Kevin J Sung, Matthew D Tre-vithick, Amit Vainsencher, Benjamin Villalonga,Theodore White, Z Jamie Yao, Ping Yeh, AdamZalcman, Hartmut Neven, and John M Martinis.Quantum supremacy using a programmable su-perconducting processor. Nature, 574(7779):505–510, October 2019. ISSN 0028-0836, 1476-4687.DOI: 10.1038/s41586-019-1666-5.

[8] Konrad Banaszek, Marcus Cramer, and DavidGross. Focus on quantum tomography. NewJ. Phys., 15(12):125020, 2013. ISSN 1367-2630.DOI: 10.1088/1367-2630/15/12/125020. URLhttps://doi.org/10.1088/1367-2630/15/12/125020.

[9] R Barends, J Kelly, A Megrant, A Veitia,D Sank, E Jeffrey, T C White, J Mutus, A GFowler, B Campbell, Y Chen, Z Chen, B Chiaro,A Dunsworth, C Neill, P O’Malley, P Roushan,A Vainsencher, J Wenner, A N Korotkov, A NCleland, and John M Martinis. Superconductingquantum circuits at the surface code thresholdfor fault tolerance. Nature, 508(7497):500–503,April 2014. ISSN 0028-0836, 1476-4687. DOI:10.1038/nature13171.

[10] Ariel Bendersky, Fernando Pastawski, andJuan Pablo Paz. Selective and efficient estima-tion of parameters for quantum process tomog-raphy. Phys. Rev. Lett., 100(19):190403, May2008. ISSN 0031-9007. DOI: 10.1103/Phys-RevLett.100.190403.

[11] Ingemar Bengtsson and Karol Zyczkowski. Ondiscrete structures in finite hilbert spaces, 2017.

[12] R C Bialczak, M Ansmann, M Hofheinz,E Lucero, M Neeley, A D O’Connell, D Sank,H Wang, J Wenner, M Steffen, A N Cleland,and J M Martinis. Quantum process tomogra-phy of a universal entangling gate implementedwith josephson phase qubits. Nat. Phys., 6(6):409–413, June 2010. ISSN 1745-2473, 1745-2481.DOI: 10.1038/nphys1639.

[13] Lev S Bishop, Sergey Bravyi, Andrew Cross,Jay M Gambetta, and John Smolin. Quantumvolume. Quantum Volume. Technical Report,2017.

[14] R. Blume-Kohout, J.K. Gamble, E. Nielsen,J. Mizrahi, J.D. Sterk, and P. Maunz. Robust,self-consistent, closed-form tomography of quan-

tum logic gates on a trapped ion qubit. arXivpreprint arXiv:1310.4492, 2013.

[15] Robin Blume-Kohout. Hedged maximum like-lihood quantum state estimation. Phys. Rev.Lett., 105(20):200504, November 2010. ISSN0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.105.200504.

[16] Robin Blume-Kohout. Optimal, reliable esti-mation of quantum states. New J. Phys., 12(4):043034, April 2010. ISSN 1367-2630. DOI:10.1088/1367-2630/12/4/043034.

[17] Robin Blume-Kohout. Robust error barsfor quantum tomography. arXiv preprintarXiv:1202.5270, February 2012.

[18] Robin Blume-Kohout and Kevin C. Young.A volumetric framework for quantum com-puter benchmarks. Quantum, 4:362, November2020. ISSN 2521-327X. DOI: 10.22331/q-2020-11-15-362. URL https://doi.org/10.22331/q-2020-11-15-362.

[19] Robin Blume-Kohout, John King Gamble, ErikNielsen, Kenneth Rudinger, Jonathan Mizrahi,Kevin Fortier, and Peter Maunz. Demonstrationof qubit operations below a rigorous fault toler-ance threshold with gate set tomography. Nat.Commun., 8, February 2017. ISSN 2041-1723.DOI: 10.1038/ncomms14485.

[20] J Z Blumoff, K Chou, C Shen, M Reagor,C Axline, R T Brierley, M P Silveri, C Wang,B Vlastakis, S E Nigg, and Others. Imple-menting and characterizing precise multiqubitmeasurements. Physical Review X, 6(3):031041,2016. DOI: 10.1103/PhysRevX.6.031041.URL https://link.aps.org/doi/10.1103/PhysRevX.6.031041.

[21] Sergio Boixo, Sergei V Isakov, Vadim N Smelyan-skiy, Ryan Babbush, Nan Ding, Zhang Jiang,Michael J Bremner, John M Martinis, and Hart-mut Neven. Characterizing quantum supremacyin near-term devices. Nat. Phys., 14(6):595–600, April 2018. ISSN 1745-2473. DOI:10.1038/s41567-018-0124-x.

[22] S. Boyd and L. Vandenberghe. Convex optimiza-tion. Cambridge university press, 2004.

[23] Agata M Branczyk, Dylan H Mahler, Lee ARozema, Ardavan Darabi, Aephraim M Stein-berg, and Daniel FV James. Self-calibratingquantum state tomography. New Journal ofPhysics, 14(8):085003, 2012. DOI: 10.1088/1367-2630/14/8/085003. URL https://doi.org/10.1088/1367-2630/14/8/085003.

[24] A Carignan-Dugas, J J Wallman, and J Emerson.Characterizing universal gate sets via dihedralbenchmarking. Phys. Rev. A, 2015. ISSN


https://doi.org/10.1038/s41586-019-1666-5

https://doi.org/10.1088/1367-2630/15/12/125020

https://doi.org/10.1088/1367-2630/15/12/125020

https://doi.org/10.1088/1367-2630/15/12/125020

https://doi.org/10.1038/nature13171




https://doi.org/10.1038/nphys1639



https://doi.org/10.1088/1367-2630/12/4/043034

https://doi.org/10.1088/1367-2630/12/4/043034

https://doi.org/10.22331/q-2020-11-15-362

https://doi.org/10.22331/q-2020-11-15-362

https://doi.org/10.22331/q-2020-11-15-362

https://doi.org/10.22331/q-2020-11-15-362

https://doi.org/10.1038/ncomms14485

https://doi.org/10.1103/PhysRevX.6.031041

https://link.aps.org/doi/10.1103/PhysRevX.6.031041


https://doi.org/10.1038/s41567-018-0124-x

https://doi.org/10.1038/s41567-018-0124-x

https://doi.org/10.1088/1367-2630/14/8/085003

https://doi.org/10.1088/1367-2630/14/8/085003

https://doi.org/10.1088/1367-2630/14/8/085003

https://doi.org/10.1088/1367-2630/14/8/085003

1050-2947. DOI: 10.1103/PhysRevA.92.060302.URL https://link.aps.org/doi/10.1103/PhysRevA.92.060302.

[25] Pascal Cerfontaine, Rene Otten, and Hen-drik Bluhm. Self-consistent calibration ofquantum-gate sets. Phys. Rev. Applied, 13:044071, Apr 2020. DOI: 10.1103/PhysRevAp-plied.13.044071. URL https://link.aps.org/doi/10.1103/PhysRevApplied.13.044071.

[26] Robert J. Chapman, Christopher Ferrie, and Al-berto Peruzzo. Experimental demonstration ofself-guided quantum tomography. Phys. Rev.Lett., 117:040402, Jul 2016. DOI: 10.1103/Phys-RevLett.117.040402. URL https://link.aps.org/doi/10.1103/PhysRevLett.117.040402.

[27] T Chasseur and F K Wilhelm. Complete random-ized benchmarking protocol accounting for leak-age errors. Phys. Rev. A, 92(4):042333, Octo-ber 2015. ISSN 1050-2947. DOI: 10.1103/Phys-RevA.92.042333.

[28] E A Chekhovich, M N Makhonin, A I Tar-takovskii, A Yacoby, H Bluhm, K C Nowack, andL M K Vandersypen. Nuclear spin effects in semi-conductor quantum dots. Nat. Mater., 12(6):494–504, June 2013. DOI: 10.1038/nmat3652. URLhttps://doi.org/10.1038/nmat3652.

[29] Yanzhu Chen, Maziar Farahzad, Shinjae Yoo, andTzu-Chieh Wei. Detector tomography on IBMquantum computers and mitigation of an imper-fect measurement. Phys. Rev. A, 100(5):052315,2019. ISSN 1050-2947. DOI: 10.1103/Phys-RevA.100.052315. URL https://link.aps.org/doi/10.1103/PhysRevA.100.052315.

[30] Andrew M Childs, Isaac L Chuang, and Deb-bie W Leung. Realization of quantum process to-mography in NMR. Phys. Rev. A, 64(1):012314,June 2001. ISSN 1050-2947. DOI: 10.1103/Phys-RevA.64.012314.

[31] Man-Duen Choi. Completely positive linearmaps on complex matrices. Linear Algebra Appl.,10(3):285–290, 1975. ISSN 0024-3795. DOI:https://doi.org/10.1016/0024-3795(75)90075-0. URL https://www.sciencedirect.com/science/article/pii/0024379575900750.

[32] Matthias Christandl and Renato Renner. Re-liable quantum state tomography. Phys. Rev.Lett., 109:120403, Sep 2012. DOI: 10.1103/Phys-RevLett.109.120403. URL https://link.aps.org/doi/10.1103/PhysRevLett.109.120403.

[33] Isaac L Chuang and M A Nielsen. Prescriptionfor experimental determination of the dynamicsof a quantum black box. J. Mod. Opt., 44(11-12):2455–2467, November 1997. ISSN 0950-0340.DOI: 10.1080/09500349708231894.

[34] Lukasz Cincio, Kenneth Rudinger, MohanSarovar, and Patrick J. Coles. Machine learning ofnoise-resilient quantum circuits. PRX Quantum,2:010324, Feb 2021. DOI: 10.1103/PRXQuan-tum.2.010324. URL https://link.aps.org/doi/10.1103/PRXQuantum.2.010324.

[35] Jared H Cole. Hamiltonian tomography:the quantum (system) measurement prob-lem. New J. Phys., 17(10):101001, September2015. ISSN 1367-2630. DOI: 10.1088/1367-2630/17/10/101001.

[36] Jared H Cole, Sonia G Schirmer, Andrew DGreentree, Cameron J Wellard, Daniel K L Oi,and Lloyd C L Hollenberg. Identifying an ex-perimental two-state hamiltonian to arbitraryaccuracy. Phys. Rev. A, 71(6):062312, June2005. ISSN 1050-2947. DOI: 10.1103/Phys-RevA.71.062312.

[37] A D Corcoles, Jay M Gambetta, Jerry M Chow,John A Smolin, Matthew Ware, Joel Strand,B L T Plourde, and M Steffen. Process ver-ification of two-qubit quantum gates by ran-domized benchmarking. Phys. Rev. A, 87(3):030301, March 2013. ISSN 1050-2947. DOI:10.1103/PhysRevA.87.030301.

[38] Andrew W Cross, Lev S Bishop, Sarah Sheldon,Paul D Nation, and Jay M Gambetta. Validat-ing quantum computers using randomized modelcircuits. Phys. Rev. A, 100(3):032328, Septem-ber 2019. ISSN 1050-2947. DOI: 10.1103/Phys-RevA.100.032328.

[39] Marcus P da Silva, Olivier Landon-Cardinal, andDavid Poulin. Practical characterization of quan-tum devices without tomography. Phys. Rev.Lett., 107(21):210404, November 2011. ISSN0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.107.210404.

[40] G M D’Ariano and P Lo Presti. Quantum tomog-raphy for measuring experimentally the matrix el-ements of an arbitrary quantum operation. Phys.Rev. Lett., 86(19):4195–4198, May 2001. ISSN0031-9007. DOI: 10.1103/PhysRevLett.86.4195.

[41] S Debnath, N M Linke, C Figgatt, K A Lands-man, K Wright, and C Monroe. Demonstrationof a small programmable quantum computer withatomic qubits. Nature, 536(7614):63–66, August2016. DOI: 10.1038/nature18648. URL https://doi.org/10.1038/nature18648.

[42] Juan P Dehollain, Juha T Muhonen, RobinBlume-Kohout, Kenneth M Rudinger, John KingGamble, Erik Nielsen, Arne Laucht, StephanieSimmons, Rachpon Kalra, Andrew S Dzu-rak, and Andrea Morello. Optimization ofa solid-state electron spin qubit using gate



https://link.aps.org/doi/10.1103/PhysRevA.92.060302


https://doi.org/10.1103/PhysRevApplied.13.044071


https://link.aps.org/doi/10.1103/PhysRevApplied.13.044071




https://link.aps.org/doi/10.1103/PhysRevLett.117.040402




https://doi.org/10.1038/nmat3652

https://doi.org/10.1038/nmat3652







https://doi.org/https://doi.org/10.1016/0024-3795(75)90075-0



https://www.sciencedirect.com/science/article/pii/0024379575900750

https://www.sciencedirect.com/science/article/pii/0024379575900750





https://doi.org/10.1080/09500349708231894

https://doi.org/10.1103/PRXQuantum.2.010324

https://doi.org/10.1103/PRXQuantum.2.010324

https://link.aps.org/doi/10.1103/PRXQuantum.2.010324

https://link.aps.org/doi/10.1103/PRXQuantum.2.010324

https://doi.org/10.1088/1367-2630/17/10/101001

https://doi.org/10.1088/1367-2630/17/10/101001













set tomography. New Journal of Physics,18(10):103018, 2016. DOI: 10.1088/1367-2630/18/10/103018. URL https://doi.org/10.1088/1367-2630/18/10/103018.

[43] C Di Franco, M Paternostro, and M S Kim.Hamiltonian tomography in an access-limited set-ting without state initialization. Phys. Rev. Lett.,102(18):187203, May 2009. ISSN 0031-9007. DOI:10.1103/PhysRevLett.102.187203.

[44] Olivia Di Matteo, John Gamble, Chris Granade,Kenneth Rudinger, and Nathan Wiebe. Opera-tional, gauge-free quantum tomography, Novem-ber 2020. ISSN 2521-327X. URL https://doi.org/10.22331/q-2020-11-17-364.

[45] David P DiVincenzo. The physical im-plementation of quantum computation.Fortschritte der Physik: Progress of Physics,48(9-11):771–783, 2000. URL https://doi.org/10.1002/1521-3978(200009)48:9/11%3C771::AID-PROP771%3E3.0.CO;2-E.

[46] BRADLEY EFRON. Nonparametric estimatesof standard error: The jackknife, the boot-strap and other methods. Biometrika, 68(3):589–599, 12 1981. ISSN 0006-3444. DOI:10.1093/biomet/68.3.589. URL https://doi.org/10.1093/biomet/68.3.589.

[47] Artur K Ekert, Carolina Moura Alves, Daniel K LOi, Micha l Horodecki, Pawe l Horodecki, and L CKwek. Direct estimations of linear and nonlinearfunctionals of a quantum state. Phys. Rev. Lett.,88(21):217901, May 2002. ISSN 0031-9007. DOI:10.1103/PhysRevLett.88.217901.

[48] Joseph Emerson, Marcus Silva, Osama Moussa,Colm Ryan, Martin Laforest, Jonathan Baugh,David G. Cory, and Raymond Laflamme. Sym-metrized characterization of noisy quantumprocesses. Science, 317(5846):1893–1896, 2007.ISSN 0036-8075. DOI: 10.1126/science.1145699.URL https://science.sciencemag.org/content/317/5846/1893.

[49] Suguru Endo, Simon C Benjamin, and YingLi. Practical quantum error mitigation for Near-Future applications. Phys. Rev. X, 8(3):031027,July 2018. DOI: 10.1103/PhysRevX.8.031027.

[50] Alexander Erhard, Joel J Wallman, LukasPostler, Michael Meth, Roman Stricker, Es-teban A Martinez, Philipp Schindler, ThomasMonz, Joseph Emerson, and Rainer Blatt. Char-acterizing large-scale quantum computers viacycle benchmarking. Nat. Commun., 10(1):5347, November 2019. ISSN 2041-1723. DOI:10.1038/s41467-019-13068-7.

[51] Philippe Faist and Renato Renner. Practicaland reliable error bars in quantum tomogra-

phy. Phys. Rev. Lett., 117(1):010404, July 2016.ISSN 0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.117.010404.

[52] Christopher Ferrie. High posterior density ellip-soids of quantum states. New Journal of Physics,16(2):023006, feb 2014. DOI: 10.1088/1367-2630/16/2/023006.

[53] Christopher Ferrie, Christopher E Granade, andD G Cory. How to best sample a periodic proba-bility distribution, or on the accuracy of hamilto-nian finding strategies. Quantum Inf. Process., 12(1):611–623, January 2013. ISSN 1573-1332. DOI:10.1007/s11128-012-0407-6.

[54] Jaromır Fiurasek. Maximum-likelihood estima-tion of quantum measurement. Phys. Rev.A, 64:024102, Jul 2001. DOI: 10.1103/Phys-RevA.64.024102. URL https://link.aps.org/doi/10.1103/PhysRevA.64.024102.

[55] Jaromır Fiurasek and Zden ek Hradil. Maximum-likelihood estimation of quantum processes. Phys.Rev. A, 63:020101, Jan 2001. DOI: 10.1103/Phys-RevA.63.020101. URL https://link.aps.org/doi/10.1103/PhysRevA.63.020101.

[56] Steven T Flammia and Yi-Kai Liu. Direct fi-delity estimation from few pauli measurements.Phys. Rev. Lett., 106(23):230501, June 2011.ISSN 0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.106.230501.

[57] Steven T Flammia, David Gross, Yi-Kai Liu,and Jens Eisert. Quantum tomography via com-pressed sensing: error bounds, sample complexityand efficient estimators. New Journal of Physics,14(9):095022, sep 2012. ISSN 1367-2630. DOI:10.1088/1367-2630/14/9/095022. URL https://doi.org/10.1088/1367-2630/14/9/095022.

[58] J P Gaebler, A M Meier, T R Tan, R Bowler,Y Lin, D Hanneke, J D Jost, J P Home,E Knill, D Leibfried, and D J Wineland.Randomized benchmarking of multiqubit gates.Phys. Rev. Lett., 108(26):260503, June 2012.ISSN 0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.108.260503.

[59] W Gale, E Guth, and G T Trammell. Determina-tion of the quantum state by measurements. Phys.Rev., 165(5):1434–1436, January 1968. ISSN0959-8472. DOI: 10.1103/PhysRev.165.1434.

[60] Jay M Gambetta, A D Corcoles, S T Merkel, B RJohnson, John A Smolin, Jerry M Chow, Colm ARyan, Chad Rigetti, S Poletto, Thomas A Ohki,Mark B Ketchen, and M Steffen. Characteriza-tion of addressability by simultaneous randomizedbenchmarking. Phys. Rev. Lett., 109(24):240504,December 2012. ISSN 0031-9007, 1079-7114. DOI:10.1103/PhysRevLett.109.240504.


https://doi.org/10.1088/1367-2630/18/10/103018

https://doi.org/10.1088/1367-2630/18/10/103018

https://doi.org/10.1088/1367-2630/18/10/103018

https://doi.org/10.1088/1367-2630/18/10/103018



https://doi.org/10.22331/q-2020-11-17-364

https://doi.org/10.22331/q-2020-11-17-364

https://doi.org/10.1002/1521-3978(200009)48:9/11%3C771::AID-PROP771%3E3.0.CO;2-E



https://doi.org/10.1093/biomet/68.3.589






https://doi.org/10.1126/science.1145699

https://science.sciencemag.org/content/317/5846/1893

https://science.sciencemag.org/content/317/5846/1893


https://doi.org/10.1038/s41467-019-13068-7

https://doi.org/10.1038/s41467-019-13068-7



https://doi.org/10.1088/1367-2630/16/2/023006

https://doi.org/10.1088/1367-2630/16/2/023006

https://doi.org/10.1007/s11128-012-0407-6

https://doi.org/10.1007/s11128-012-0407-6











https://doi.org/10.1088/1367-2630/14/9/095022

https://doi.org/10.1088/1367-2630/14/9/095022

https://doi.org/10.1088/1367-2630/14/9/095022

https://doi.org/10.1088/1367-2630/14/9/095022



https://doi.org/10.1103/PhysRev.165.1434



[61] Christopher Granade, Joshua Combes, andD G Cory. Practical bayesian tomography.New Journal of Physics, 18(3):033024, mar2016. ISSN 1367-2630. DOI: 10.1088/1367-2630/18/3/033024. URL https://doi.org/10.1088/1367-2630/18/3/033024.

[62] Christopher Granade, Christopher Ferrie, IanHincks, Steven Casagrande, Thomas Alexander,Jonathan Gross, Michal Kononenko, and YuvalSanders. QInfer: Statistical inference softwarefor quantum applications. Quantum, 1:5, April2017. ISSN 2521-327X. DOI: 10.22331/q-2017-04-25-5. URL https://doi.org/10.22331/q-2017-04-25-5.

[63] Christopher E Granade, Christopher Ferrie,Nathan Wiebe, and D G Cory. Robust on-line hamiltonian learning. New J. Phys., 14(10):103013, October 2012. ISSN 1367-2630. DOI:10.1088/1367-2630/14/10/103013.

[64] Daniel Greenbaum. Introduction to quantum gateset tomography. arXiv:1509.02921, 2015.

[65] David Gross, Yi-Kai Liu, Steven T Flam-mia, Stephen Becker, and Jens Eisert. Quan-tum state tomography via compressed sensing.Phys. Rev. Lett., 105(15):150401, October 2010.ISSN 0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.105.150401.

[66] J Haah, A W Harrow, Z Ji, X Wu, andN Yu. Sample-Optimal tomography of quantumstates. IEEE Trans. Inf. Theory, 63(9):5628–5641, September 2017. ISSN 0018-9448, 1557-9654. DOI: 10.1109/TIT.2017.2719044.

[67] Kathleen E Hamilton, Eugene F Dumitrescu, andRaphael C Pooser. Generative model benchmarksfor superconducting qubits. Phys. Rev. A, 99(6):062323, June 2019. ISSN 1050-2947. DOI:10.1103/PhysRevA.99.062323.

[68] Robin Harper and Steven T Flammia. Estimatingthe fidelity of T gates using standard interleavedrandomized benchmarking. Quantum Sci. Tech-nol., 2(1):015008, March 2017. ISSN 2058-9565.DOI: 10.1088/2058-9565/aa5f8d.

[69] Teiko Heinosaari, Luca Mazzarella, andMichael M Wolf. Quantum tomography un-der prior information. Commun. Math. Phys.,318(2):355–374, March 2013. ISSN 1432-0916.DOI: 10.1007/s00220-013-1671-8.

[70] Jonas Helsen, Francesco Battistel, and Barbara MTerhal. Spectral quantum tomography. npj Quan-tum Information, 5(1):74, September 2019. ISSN2056-6387. DOI: 10.1038/s41534-019-0189-0.

[71] Cornelius Hempel, Christine Maier, JonathanRomero, Jarrod McClean, Thomas Monz, HengShen, Petar Jurcevic, Ben P Lanyon, Peter

Love, Ryan Babbush, Alan Aspuru-Guzik, RainerBlatt, and Christian F Roos. Quantum chemistrycalculations on a Trapped-Ion quantum simula-tor. Phys. Rev. X, 8(3):031022, July 2018. DOI:10.1103/PhysRevX.8.031022.

[72] Sabrina S Hong, Alexander T Papageorge,Prasahnt Sivarajah, Genya Crossman, Nicolas Di-dier, Anthony M Polloreno, Eyob A Sete, Ste-fan W Turkowski, Marcus P da Silva, and Blake RJohnson. Demonstration of a parametrically ac-tivated entangling gate protected from flux noise.Phys. Rev. A, 101(1):012302, January 2020. ISSN1050-2947. DOI: 10.1103/PhysRevA.101.012302.

[73] P-Y Hou, L He, F Wang, X-Z Huang, W-G Zhang,X-L Ouyang, X Wang, W-Q Lian, X-Y Chang,and L-M Duan. Experimental hamiltonian learn-ing of an 11-qubit Solid-State quantum spin reg-ister. Chin. Physics Lett., 36(10):100303, Octo-ber 2019. ISSN 0256-307X. DOI: 10.1088/0256-307X/36/10/100303.

[74] Zhibo Hou, Huangjun Zhu, Guo-Yong Xiang,Chuan-Feng Li, and Guang-Can Guo. Error-compensation measurements on polarizationqubits. J. Opt. Soc. Am. B, 33(6):1256–1265,Jun 2016. DOI: 10.1364/JOSAB.33.001256. URLhttp://josab.osa.org/abstract.cfm?URI=josab-33-6-1256.

[75] Z. Hradil. Quantum-state estimation.Phys. Rev. A, 55:R1561–R1564, Mar 1997.DOI: 10.1103/PhysRevA.55.R1561. URLhttps://link.aps.org/doi/10.1103/PhysRevA.55.R1561.

[76] Hsin-Yuan Huang, Richard Kueng, and JohnPreskill. Predicting many properties of aquantum system from very few measurements.Nature Physics, 16(10):1050–1057, Oct 2020.ISSN 1745-2481. DOI: 10.1038/s41567-020-0932-7. URL https://doi.org/10.1038/s41567-020-0932-7.

[77] W Huang, C H Yang, K W Chan, T Tanttu,B Hensen, R C C Leon, M A Fogarty, J C CHwang, F E Hudson, K M Itoh, A Morello,A Laucht, and A S Dzurak. Fidelity benchmarksfor two-qubit gates in silicon. Nature, 569(7757):532–536, May 2019. ISSN 0028-0836, 1476-4687.DOI: 10.1038/s41586-019-1197-0.

[78] Christopher Jackson and S. J. van Enk. De-tecting correlated errors in state-preparation-and-measurement tomography. Phys. Rev. A,92:042312, Oct 2015. DOI: 10.1103/Phys-RevA.92.042312. URL https://link.aps.org/doi/10.1103/PhysRevA.92.042312.

[79] Daniel F V James, Paul G Kwiat, William JMunro, and Andrew G White. Measurement


https://doi.org/10.1088/1367-2630/18/3/033024

https://doi.org/10.1088/1367-2630/18/3/033024

https://doi.org/10.1088/1367-2630/18/3/033024

https://doi.org/10.1088/1367-2630/18/3/033024

https://doi.org/10.22331/q-2017-04-25-5

https://doi.org/10.22331/q-2017-04-25-5

https://doi.org/10.22331/q-2017-04-25-5

https://doi.org/10.22331/q-2017-04-25-5

https://doi.org/10.1088/1367-2630/14/10/103013

https://doi.org/10.1088/1367-2630/14/10/103013



https://doi.org/10.1109/TIT.2017.2719044



https://doi.org/10.1088/2058-9565/aa5f8d

https://doi.org/10.1007/s00220-013-1671-8

https://doi.org/10.1038/s41534-019-0189-0




https://doi.org/10.1088/0256-307X/36/10/100303

https://doi.org/10.1088/0256-307X/36/10/100303

https://doi.org/10.1364/JOSAB.33.001256

http://josab.osa.org/abstract.cfm?URI=josab-33-6-1256

http://josab.osa.org/abstract.cfm?URI=josab-33-6-1256

https://doi.org/10.1103/PhysRevA.55.R1561

https://link.aps.org/doi/10.1103/PhysRevA.55.R1561

https://link.aps.org/doi/10.1103/PhysRevA.55.R1561

https://doi.org/10.1038/s41567-020-0932-7

https://doi.org/10.1038/s41567-020-0932-7

https://doi.org/10.1038/s41567-020-0932-7

https://doi.org/10.1038/s41567-020-0932-7

https://doi.org/10.1038/s41586-019-1197-0





of qubits. Phys. Rev. A, 64(5):052312, Octo-ber 2001. ISSN 1050-2947. DOI: 10.1103/Phys-RevA.64.052312.

[80] A. Jamio lkowski. Linear transformations whichpreserve trace and positive semidefiniteness of op-erators, 1972. ISSN 0034-4877. URL https://doi.org/10.1016/0034-4877(72)90011-0.

[81] Jun Jing and Lian-Ao Wu. Decoherence andcontrol of a qubit in spin baths: an exactmaster equation study. Scientific Reports, 8(1):1471, January 2018. DOI: 10.1038/s41598-018-19977-9. URL https://doi.org/10.1038/s41598-018-19977-9.

[82] Manoj K. Joshi, Andreas Elben, Benoıt Ver-mersch, Tiff Brydges, Christine Maier, Pe-ter Zoller, Rainer Blatt, and Christian F.Roos. Quantum information scrambling ina trapped-ion quantum simulator with tun-able range interactions. Phys. Rev. Lett.,124:240505, Jun 2020. DOI: 10.1103/Phys-RevLett.124.240505. URL https://link.aps.org/doi/10.1103/PhysRevLett.124.240505.

[83] Adam C Keith, Charles H Baldwin, ScottGlancy, and E Knill. Joint quantum-stateand measurement tomography with incompletemeasurements. Physical Review A, 98(4):042318,2018. DOI: 10.1103/PhysRevA.98.042318.URL https://link.aps.org/doi/10.1103/PhysRevA.98.042318.

[84] Dohun Kim, Zhan Shi, C B Simmons, D R Ward,J R Prance, Teck Seng Koh, John King Gamble,D E Savage, M G Lagally, Mark Friesen, S N Cop-persmith, and Mark A Eriksson. Quantum con-trol and process tomography of a semiconductorquantum dot hybrid qubit. Nature, 511(7507):70–74, July 2014. ISSN 0028-0836, 1476-4687. DOI:10.1038/nature13407.

[85] Dohun Kim, D. R. Ward, C. B. Simmons,John King Gamble, Robin Blume-Kohout, ErikNielsen, D. E. Savage, M. G. Lagally, MarkFriesen, S. N. Coppersmith, and M. A. Eriks-son. Microwave-driven coherent operation of asemiconductor quantum dot charge qubit. Nat.Nanotechnol., 10(3):243–247, 03 2015. DOI:10.1038/nnano.2014.336. URL https://doi.org/10.1038/nnano.2014.336.

[86] Shelby Kimmel, Marcus P. da Silva, Colm A.Ryan, Blake R. Johnson, and Thomas Ohki.Robust extraction of tomographic informationvia randomized benchmarking. Phys. Rev.X, 4:011050, Mar 2014. DOI: 10.1103/Phys-RevX.4.011050. URL https://link.aps.org/doi/10.1103/PhysRevX.4.011050.

[87] Shelby Kimmel, Guang Hao Low, and Theodore J

Yoder. Robust calibration of a universal single-qubit gate set via robust phase estimation. Phys.Rev. A, 92(6):062315, December 2015. ISSN 1050-2947. DOI: 10.1103/PhysRevA.92.062315.

[88] E. Knill, D. Leibfried, R. Reichle, J. Britton,R. B. Blakestad, J. D. Jost, C. Langer, R. Oz-eri, S. Seidelin, and D. J. Wineland. Random-ized benchmarking of quantum gates. Phys. Rev.A, 77:012307, Jan 2008. DOI: 10.1103/Phys-RevA.77.012307.

[89] Emanuel Knill. Quantum computing with real-istically noisy devices. Nature, 434(7029):39–44,2005. DOI: 10.1038/nature03350. URL https://doi.org/10.1038/nature03350.

[90] Stefan Krastanov, Sisi Zhou, Steven T Flam-mia, and Liang Jiang. Stochastic estimation ofdynamical variables. Quantum Sci. Technol., 4(3):035003, May 2019. ISSN 2058-9565. DOI:10.1088/2058-9565/ab18d5.

[91] K Kraus. States, effects, and operations: funda-mental notions of quantum theory, volume 190 ofLecture Notes in Physics. Springer-Verlag, 1983.

[92] Benjamin Levi, Cecilia C Lopez, Joseph Emerson,and D G Cory. Efficient error characterizationin quantum information processing. Phys. Rev.A, 75(2):022314, February 2007. ISSN 1050-2947.DOI: 10.1103/PhysRevA.75.022314.

[93] Junan Lin, Brandon Buonacorsi, RaymondLaflamme, and Joel J Wallman. On the free-dom in representing quantum operations. New J.Phys., 21(2):023006, February 2019. ISSN 1367-2630. DOI: 10.1088/1367-2630/ab075a.

[94] Junan Lin, Brandon Buonacorsi, RaymondLaflamme, and Joel J Wallman. On the free-dom in representing quantum operations. NewJournal of Physics, 21(2):023006, feb 2019. DOI:10.1088/1367-2630/ab075a. URL https://doi.org/10.1088%2F1367-2630%2Fab075a.

[95] G. Lindblad. On the generators of quantum dy-namical semigroups. Comm. Math. Phys., 48(2):119–130, 1976. DOI: 10.1007/BF01608499. URLhttps://doi.org/10.1007/BF01608499.

[96] Norbert M Linke, Dmitri Maslov, Martin Roet-teler, Shantanu Debnath, Caroline Figgatt,Kevin A Landsman, Kenneth Wright, andChristopher Monroe. Experimental comparisonof two quantum computing architectures. Proc.Natl. Acad. Sci. U. S. A., 114(13):3305–3310,March 2017. ISSN 0027-8424, 1091-6490. DOI:10.1073/pnas.1618020114.

[97] Mirko Lobino, Dmitry Korystov, ConnorKupchak, Eden Figueroa, Barry C Sanders,and A I Lvovsky. Complete characterizationof quantum-optical processes. Science, 322




https://doi.org/10.1016/0034-4877(72)90011-0

https://doi.org/10.1016/0034-4877(72)90011-0

https://doi.org/10.1038/s41598-018-19977-9

https://doi.org/10.1038/s41598-018-19977-9

https://doi.org/10.1038/s41598-018-19977-9

https://doi.org/10.1038/s41598-018-19977-9










https://doi.org/10.1038/nnano.2014.336














https://doi.org/10.1088/2058-9565/ab18d5

https://doi.org/10.1088/2058-9565/ab18d5


https://doi.org/10.1088/1367-2630/ab075a

https://doi.org/10.1088/1367-2630/ab075a

https://doi.org/10.1088/1367-2630/ab075a

https://doi.org/10.1088%2F1367-2630%2Fab075a

https://doi.org/10.1088%2F1367-2630%2Fab075a

https://doi.org/10.1007/BF01608499

https://doi.org/10.1007/BF01608499

https://doi.org/10.1073/pnas.1618020114

https://doi.org/10.1073/pnas.1618020114

(5901):563–566, October 2008. ISSN 0036-8075,1095-9203. DOI: 10.1126/science.1162086.

[98] A. Luis and L. L. Sanchez-Soto. Com-plete characterization of arbitrary quantum mea-surement processes. Phys. Rev. Lett., 83:3573–3576, Nov 1999. DOI: 10.1103/Phys-RevLett.83.3573. URL https://link.aps.org/doi/10.1103/PhysRevLett.83.3573.

[99] J S Lundeen, A Feito, H Coldenstrodt-Ronge, K LPregnell, Ch Silberhorn, T C Ralph, J Eisert,M B Plenio, and I A Walmsley. Tomography ofquantum detectors. Nat. Phys., 5(1):27–30, Jan-uary 2009. ISSN 1745-2473, 1745-2481. DOI:10.1038/nphys1133.

[100] Easwar Magesan, J M Gambetta, and JosephEmerson. Scalable and robust randomized bench-marking of quantum processes. Phys. Rev. Lett.,106(18):180504, May 2011. ISSN 0031-9007. DOI:10.1103/PhysRevLett.106.180504.

[101] Easwar Magesan, Jay M Gambetta, and JosephEmerson. Characterizing quantum gates via ran-domized benchmarking. Phys. Rev. A, 85(4):042311, April 2012. ISSN 1050-2947. DOI:10.1103/PhysRevA.85.042311.

[102] Easwar Magesan, Jay M Gambetta, B R Johnson,Colm A Ryan, Jerry M Chow, Seth T Merkel,Marcus P da Silva, George A Keefe, Mary BRothwell, Thomas A Ohki, Mark B Ketchen, andM Steffen. Efficient measurement of quantumgate error by interleaved randomized benchmark-ing. Phys. Rev. Lett., 109(8):080505, August 2012.ISSN 0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.109.080505.

[103] D. H. Mahler, Lee A. Rozema, Ardavan Darabi,Christopher Ferrie, Robin Blume-Kohout, andA. M. Steinberg. Adaptive quantum statetomography improves accuracy quadratically.Phys. Rev. Lett., 111:183601, Oct 2013. DOI:10.1103/PhysRevLett.111.183601.

[104] S Mavadia, C L Edmunds, C Hempel, H Ball,F Roy, T M Stace, and M J Biercuk. Exper-imental quantum verification in the presence oftemporally correlated noise. npj Quantum Infor-mation, 4(1):7, February 2018. ISSN 2056-6387.DOI: 10.1038/s41534-017-0052-0.

[105] Karl Mayer and Emanuel Knill. Quantum processfidelity bounds from sets of input states. Phys.Rev. A, 98(5):052326, November 2018. ISSN 1050-2947. DOI: 10.1103/PhysRevA.98.052326.

[106] Alexander J McCaskey, Zachary P Parks, JacekJakowski, Shirley V Moore, Titus D Morris,Travis S Humble, and Raphael C Pooser. Quan-tum chemistry as a benchmark for near-termquantum computers. npj Quantum Information,

5(1):99, November 2019. ISSN 2056-6387. DOI:10.1038/s41534-019-0209-0.

[107] A. F. McCormick, S. J. van Enk, and M. Beck.Experimental demonstration of loop state-preparation-and-measurement tomography. Phys.Rev. A, 95:042329, Apr 2017. DOI: 10.1103/Phys-RevA.95.042329. URL https://link.aps.org/doi/10.1103/PhysRevA.95.042329.

[108] David C McKay, Sarah Sheldon, John ASmolin, Jerry M Chow, and Jay M Gam-betta. Three-Qubit randomized benchmarking.Phys. Rev. Lett., 122(20):200502, May 2019.ISSN 0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.122.200502.

[109] J Medford, Johannes Beil, JM Taylor,SD Bartlett, AC Doherty, EI Rashba, DP Di-Vincenzo, H Lu, AC Gossard, and Charles MMarcus. Self-consistent measurement andstate tomography of an exchange-only spinqubit. Nature nanotechnology, 8(9):654,2013. DOI: 10.1038/nnano.2013.168. URLhttps://doi.org/10.1038/nnano.2013.168.

[110] Seth T. Merkel, Jay M. Gambetta, John A.Smolin, Stefano Poletto, Antonio D. Corcoles,Blake R. Johnson, Colm A. Ryan, and MatthiasSteffen. Self-consistent quantum process tomog-raphy. Phys. Rev. A, 87:062119, Jun 2013. DOI:10.1103/PhysRevA.87.062119.

[111] Kristel Michielsen, Madita Nocon, DennisWillsch, Fengping Jin, Thomas Lippert, and HansDe Raedt. Benchmarking gate-based quantumcomputers. Comput. Phys. Commun., 220:44–55, November 2017. ISSN 0010-4655. DOI:10.1016/j.cpc.2017.06.011.

[112] D Mogilevtsev, J Rehacek, and Z Hradil. Self-calibration for self-consistent tomography. NewJournal of Physics, 14(9):095001, sep 2012. DOI:10.1088/1367-2630/14/9/095001. URL https://doi.org/10.1088/1367-2630/14/9/095001.

[113] Mohammadreza Mohammadi and Agata M.Branczyk. Optimization of quantum statetomography in the presence of experimentalconstraints. Phys. Rev. A, 89:012113, Jan2014. DOI: 10.1103/PhysRevA.89.012113.URL https://link.aps.org/doi/10.1103/PhysRevA.89.012113.

[114] M Mohseni and D A Lidar. Direct characteriza-tion of quantum dynamics. Phys. Rev. Lett., 97(17):170501, October 2006. ISSN 0031-9007. DOI:10.1103/PhysRevLett.97.170501.

[115] Osama Moussa, Marcus P da Silva, Colm A Ryan,and Raymond Laflamme. Practical experimentalcertification of computational quantum gates us-ing a twirling procedure. Phys. Rev. Lett., 109(7):


https://doi.org/10.1126/science.1162086















https://doi.org/10.1038/s41534-017-0052-0


https://doi.org/10.1038/s41534-019-0209-0

https://doi.org/10.1038/s41534-019-0209-0











https://doi.org/10.1016/j.cpc.2017.06.011

https://doi.org/10.1016/j.cpc.2017.06.011

https://doi.org/10.1088/1367-2630/14/9/095001

https://doi.org/10.1088/1367-2630/14/9/095001

https://doi.org/10.1088/1367-2630/14/9/095001

https://doi.org/10.1088/1367-2630/14/9/095001






070504, August 2012. ISSN 0031-9007, 1079-7114.DOI: 10.1103/PhysRevLett.109.070504.

[116] Pranav Mundada, Gengyan Zhang, ThomasHazard, and Andrew Houck. Suppression ofqubit crosstalk in a tunable coupling super-conducting circuit. Phys. Rev. Applied, 12:054023, Nov 2019. DOI: 10.1103/PhysRevAp-plied.12.054023. URL https://link.aps.org/doi/10.1103/PhysRevApplied.12.054023.

[117] W J Munro, D F V James, A G White, andPaul G Kwiat. Tomography and its role in quan-tum computation. HP Laboratories Technical Re-port, 53, 2001.

[118] C Neill, P Roushan, K Kechedzhi, S Boixo, S VIsakov, V Smelyanskiy, A Megrant, B Chiaro,A Dunsworth, K Arya, R Barends, B Burkett,Y Chen, Z Chen, A Fowler, B Foxen, M Giustina,R Graff, E Jeffrey, T Huang, J Kelly, P Klimov,E Lucero, J Mutus, M Neeley, C Quintana,D Sank, A Vainsencher, J Wenner, T C White,H Neven, and J M Martinis. A blueprint fordemonstrating quantum supremacy with super-conducting qubits. Science, 360(6385):195–199,April 2018. ISSN 0036-8075, 1095-9203. DOI:10.1126/science.aao4309.

[119] Erik Nielsen. Gate set tomography on manyqubits. in preparation.

[120] Erik Nielsen, Kenneth Rudinger, John King Gam-ble, and Robin Blume-Kohout. pyGSTi: Apython implementation of gate set tomography,2016. URL http://github.com/pyGSTio.

[121] Erik Nielsen, Kenneth Rudinger, Timothy Proc-tor, Antonio Russo, Kevin Young, and RobinBlume-Kohout. Probing quantum processorperformance with pyGSTi. Quantum Scienceand Technology, 5(4):044002, jul 2020. DOI:10.1088/2058-9565/ab8aa4. URL https://doi.org/10.1088/2058-9565/ab8aa4.

[122] MA Nielsen and Isaac L Chuang. Quantum Com-putation and Quantum Information. CambridgeUniversity Press, 2000.

[123] J. L. O’Brien, G. J. Pryde, A. Gilchrist, D. F. V.James, N. K. Langford, T. C. Ralph, andA. G. White. Quantum process tomographyof a controlled-not gate. Phys. Rev. Lett.,93:080502, Aug 2004. DOI: 10.1103/Phys-RevLett.93.080502. URL https://link.aps.org/doi/10.1103/PhysRevLett.93.080502.

[124] T E O’Brien, B Tarasinski, and L DiCarlo.Density-matrix simulation of small surface codesunder current and projected experimental noise.npj Quantum Information, 3(1):1–8, Septem-ber 2017. ISSN 2056-6387, 2056-6387. DOI:10.1038/s41534-017-0039-x.

[125] C Piltz, T Sriarunothai, A F Varon, and C Wun-derlich. A trapped-ion-based quantum byte with10(-5) next-neighbour cross-talk. Nat. Commun.,5:4679, August 2014. DOI: 10.1038/ncomms5679.URL https://doi.org/10.1038/ncomms5679.

[126] J F Poyatos, J I Cirac, and P Zoller. Com-plete characterization of a quantum process: TheTwo-Bit quantum gate. Phys. Rev. Lett., 78(2):390–393, January 1997. ISSN 0031-9007. DOI:10.1103/PhysRevLett.78.390.

[127] Timothy Proctor, Kenneth Rudinger, KevinYoung, Mohan Sarovar, and Robin Blume-Kohout. What randomized benchmarking ac-tually measures. Phys. Rev. Lett., 119:130502, Sep 2017. DOI: 10.1103/Phys-RevLett.119.130502. URL https://link.aps.org/doi/10.1103/PhysRevLett.119.130502.

[128] Timothy Proctor, Melissa Revelle, Erik Nielsen,Kenneth Rudinger, Daniel Lobser, Peter Maunz,Robin Blume-Kohout, and Kevin Young. Detect-ing and tracking drift in quantum informationprocessors. Nature Communications, 11(1):5396,Oct 2020. ISSN 2041-1723. DOI: 10.1038/s41467-020-19074-4. URL https://doi.org/10.1038/s41467-020-19074-4.

[129] Timothy J Proctor, Arnaud Carignan-Dugas,Kenneth Rudinger, Erik Nielsen, Robin Blume-Kohout, and Kevin Young. Direct random-ized benchmarking for multiqubit devices. Phys.Rev. Lett., 123(3):030503, July 2019. ISSN0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.123.030503.

[130] Zbigniew Pucha la, Lukasz Rudnicki, and KarolZyczkowski. Pauli semigroups and unistochas-tic quantum channels. Phys. Lett. A, 383(20):2376–2381, July 2019. ISSN 0375-9601. DOI:10.1016/j.physleta.2019.04.057.

[131] Nicolas Quesada, Agata M. Branczyk, andDaniel F. V. James. Self-calibrating tomogra-phy for multidimensional systems. Phys. Rev.A, 87:062118, Jun 2013. DOI: 10.1103/Phys-RevA.87.062118. URL https://link.aps.org/doi/10.1103/PhysRevA.87.062118.

[132] Nicolas Quesada, Agata M. Branczyk, andDaniel F.V. James. Holistic quantum state andprocess tomography. In Frontiers in Optics2013, page FW1C.6. Optical Society of America,2013. DOI: 10.1364/FIO.2013.FW1C.6. URLhttp://www.osapublishing.org/abstract.cfm?URI=FiO-2013-FW1C.6.

[133] Nicolas Quesada, Agata M. Branczyk, andDaniel F.V. James. Self-calibrating tomographyfor non-unitary processes. In The RochesterConferences on Coherence and Quantum Optics







https://doi.org/10.1126/science.aao4309

https://doi.org/10.1126/science.aao4309

http://github.com/pyGSTio

https://doi.org/10.1088/2058-9565/ab8aa4








https://doi.org/10.1038/s41534-017-0039-x

https://doi.org/10.1038/s41534-017-0039-x









https://doi.org/10.1038/s41467-020-19074-4

https://doi.org/10.1038/s41467-020-19074-4

https://doi.org/10.1038/s41467-020-19074-4

https://doi.org/10.1038/s41467-020-19074-4



https://doi.org/10.1016/j.physleta.2019.04.057

https://doi.org/10.1016/j.physleta.2019.04.057





https://doi.org/10.1364/FIO.2013.FW1C.6

http://www.osapublishing.org/abstract.cfm?URI=FiO-2013-FW1C.6

http://www.osapublishing.org/abstract.cfm?URI=FiO-2013-FW1C.6

and the Quantum Information and Measurementmeeting, page W6.38. Optical Society of Amer-ica, 2013. DOI: 10.1364/QIM.2013.W6.38. URLhttp://www.osapublishing.org/abstract.cfm?URI=QIM-2013-W6.38.

[134] P. Rebentrost, I. Serban, T. Schulte-Herbruggen,and F. K. Wilhelm. Optimal control of aqubit coupled to a non-markovian environ-ment. Phys. Rev. Lett., 102:090401, Mar2009. DOI: 10.1103/PhysRevLett.102.090401.URL https://link.aps.org/doi/10.1103/PhysRevLett.102.090401.

[135] Daniel M Reich, Giulia Gualdi, and Chris-tiane P Koch. Optimal strategies for esti-mating the average fidelity of quantum gates.Phys. Rev. Lett., 111(20):200401, November 2013.ISSN 0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.111.200401.

[136] M Riebe, K Kim, P Schindler, T Monz, P OSchmidt, T K Korber, W Hansel, H Haffner, C FRoos, and R Blatt. Process tomography of iontrap quantum gates. Phys. Rev. Lett., 97(22):220407, December 2006. ISSN 0031-9007. DOI:10.1103/PhysRevLett.97.220407.

[137] M A Rol, C C Bultink, T E O’Brien, S R de Jong,L S Theis, X Fu, F Luthi, R F L Vermeulen, J Cde Sterke, A Bruno, D Deurloo, R N Schouten,F K Wilhelm, and L DiCarlo. Restless tuneupof High-Fidelity qubit gates. Phys. Rev. Applied,7(4):041001, April 2017. DOI: 10.1103/PhysRe-vApplied.7.041001.

[138] Kenneth Rudinger and Robert Joynt. Com-pressed sensing for hamiltonian recon-struction. Phys. Rev. A, 92:052322, Nov2015. DOI: 10.1103/PhysRevA.92.052322.URL https://link.aps.org/doi/10.1103/PhysRevA.92.052322.

[139] Kenneth Rudinger, Shelby Kimmel, Daniel Lob-ser, and Peter Maunz. Experimental demon-stration of a cheap and accurate phase estima-tion. Phys. Rev. Lett., 118(19):190502, May 2017.ISSN 0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.118.190502.

[140] Kenneth Rudinger, Shelby Kimmel, Daniel Lob-ser, and Peter Maunz. Experimental demonstra-tion of a cheap and accurate phase estimation.Phys. Rev. Lett., 118:190502, May 2017. DOI:10.1103/PhysRevLett.118.190502.

[141] Kenneth Rudinger, Timothy Proctor, Dy-lan Langharst, Mohan Sarovar, Kevin Young,and Robin Blume-Kohout. Probing context-dependent errors in quantum processors. Phys.Rev. X, 9:021045, Jun 2019. DOI: 10.1103/Phys-

RevX.9.021045. URL https://link.aps.org/doi/10.1103/PhysRevX.9.021045.

[142] Lukasz Rudnicki, Zbigniew Pucha la, and KarolZyczkowski. Gauge invariant information con-cerning quantum channels. Quantum, 2:60, April2018. ISSN 2521-327X. DOI: 10.22331/q-2018-04-11-60. URL https://doi.org/10.22331/q-2018-04-11-60.

[143] Mohan Sarovar, Timothy Proctor, KennethRudinger, Kevin Young, Erik Nielsen, and RobinBlume-Kohout. Detecting crosstalk errors inquantum information processors. Quantum, 4:321, September 2020. ISSN 2521-327X. DOI:10.22331/q-2020-09-11-321. URL https://doi.org/10.22331/q-2020-09-11-321.

[144] S G Schirmer, A Kolli, and D K L Oi. Exper-imental hamiltonian identification for controlledtwo-level systems. Phys. Rev. A, 69(5):050306,May 2004. ISSN 1050-2947. DOI: 10.1103/Phys-RevA.69.050306.

[145] Travis L Scholten and Robin Blume-Kohout.Behavior of the maximum likelihood in quan-tum state tomography. New Journal of Physics,20(2):023050, feb 2018. DOI: 10.1088/1367-2630/aaa7e2. URL https://doi.org/10.1088/1367-2630/aaa7e2.

[146] Travis L Scholten, Yi-Kai Liu, Kevin Young, andRobin Blume-Kohout. Classifying single-qubitnoise using machine learning. arXiv preprintarXiv:1908.11762, August 2019.

[147] Christian Schwemmer, Lukas Knips, DanielRichart, Harald Weinfurter, Tobias Moroder,Matthias Kleinmann, and Otfried Guhne. Sys-tematic errors in current quantum state to-mography tools. Phys. Rev. Lett., 114:080403, Feb 2015. DOI: 10.1103/Phys-RevLett.114.080403. URL https://link.aps.org/doi/10.1103/PhysRevLett.114.080403.

[148] A.J. Scott. Tight informationally complete quan-tum measurements. Journal of Physics A:Mathematical and General, 39, 10 2006. DOI:10.1088/0305-4470/39/43/009.

[149] A Shabani, R L Kosut, M Mohseni, H Rabitz,M A Broome, M P Almeida, A Fedrizzi, andA G White. Efficient measurement of quan-tum dynamics via compressive sensing. Phys.Rev. Lett., 106(10):100401, March 2011. ISSN0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.106.100401.

[150] Jiangwei Shang, Hui Khoon Ng, Arun Sehrawat,Xikun Li, and Berthold-Georg Englert. Op-timal error regions for quantum state estima-tion. New J. Phys., 15(12):123026, December


https://doi.org/10.1364/QIM.2013.W6.38

http://www.osapublishing.org/abstract.cfm?URI=QIM-2013-W6.38

http://www.osapublishing.org/abstract.cfm?URI=QIM-2013-W6.38





















https://doi.org/10.22331/q-2018-04-11-60

https://doi.org/10.22331/q-2018-04-11-60

https://doi.org/10.22331/q-2018-04-11-60

https://doi.org/10.22331/q-2018-04-11-60

https://doi.org/10.22331/q-2020-09-11-321

https://doi.org/10.22331/q-2020-09-11-321

https://doi.org/10.22331/q-2020-09-11-321

https://doi.org/10.22331/q-2020-09-11-321



https://doi.org/10.1088/1367-2630/aaa7e2








https://doi.org/10.1088/0305-4470/39/43/009

https://doi.org/10.1088/0305-4470/39/43/009



2013. ISSN 1367-2630. DOI: 10.1088/1367-2630/15/12/123026.

[151] Sarah Sheldon, Lev S Bishop, Easwar Magesan,Stefan Filipp, Jerry M Chow, and Jay M Gam-betta. Characterizing errors on qubit operationsvia iterative randomized benchmarking. Phys.Rev. A, 93(1):012301, January 2016. ISSN 1050-2947. DOI: 10.1103/PhysRevA.93.012301.

[152] John A. Smolin, Jay M. Gambetta, andGraeme Smith. Efficient method for comput-ing the maximum-likelihood quantum statefrom measurements with additive gaussiannoise. Phys. Rev. Lett., 108:070502, Feb2012. DOI: 10.1103/PhysRevLett.108.070502.URL https://link.aps.org/doi/10.1103/PhysRevLett.108.070502.

[153] Chao Song, Jing Cui, H Wang, J Hao, H Feng,and Ying Li. Quantum computation with univer-sal error mitigation on a superconducting quan-tum processor. Sci Adv, 5(9):eaaw5686, Septem-ber 2019. ISSN 2375-2548. DOI: 10.1126/sci-adv.aaw5686.

[154] Cyril Stark. Self-consistent tomography of thestate-measurement gram matrix. Phys. Rev.A, 89:052109, May 2014. DOI: 10.1103/Phys-RevA.89.052109. URL https://link.aps.org/doi/10.1103/PhysRevA.89.052109.

[155] S. S. Straupe, D. P. Ivanov, A. A. Kalinkin,I. B. Bobrov, S. P. Kulik, and D. Mogilevt-sev. Self-calibrating tomography for angu-lar schmidt modes in spontaneous paramet-ric down-conversion. Phys. Rev. A, 87:042109, Apr 2013. DOI: 10.1103/Phys-RevA.87.042109. URL https://link.aps.org/doi/10.1103/PhysRevA.87.042109.

[156] Maki Takahashi, Stephen D. Bartlett, and An-drew C. Doherty. Tomography of a spinqubit in a double quantum dot. Phys. Rev.A, 88:022120, Aug 2013. DOI: 10.1103/Phys-RevA.88.022120. URL https://link.aps.org/doi/10.1103/PhysRevA.88.022120.

[157] G Toth, W Wieczorek, D Gross, R Krischek,C Schwemmer, and H Weinfurter. Permuta-tionally invariant quantum tomography. Phys.Rev. Lett., 105(25):250403, December 2010. ISSN0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.105.250403.

[158] Andrzej Veitia and Steven J. van Enk. Testing thecontext-independence of quantum gates. arXivpreprint arXiv:1810.05945, 2020.

[159] K Vogel and H Risken. Determination ofquasiprobability distributions in terms of prob-ability distributions for the rotated quadraturephase. Phys. Rev. A Gen. Phys., 40(5):2847–

2849, September 1989. ISSN 0556-2791. DOI:10.1103/physreva.40.2847.

[160] Joel Wallman, Chris Granade, Robin Harper,and Steven T Flammia. Estimating the co-herence of noise. New Journal of Physics, 17(11):113020, nov 2015. DOI: 10.1088/1367-2630/17/11/113020. URL https://doi.org/10.1088/1367-2630/17/11/113020.

[161] Joel J. Wallman. Randomized benchmarking withgate-dependent noise. Quantum, 2:47, January2018. ISSN 2521-327X. DOI: 10.22331/q-2018-01-29-47. URL https://doi.org/10.22331/q-2018-01-29-47.

[162] Joel J. Wallman and Steven T. Flammia. Ran-domized benchmarking with confidence. NewJ. Phys., 16(10):103032, 2014. ISSN 1367-2630.DOI: 10.1088/1367-2630/16/10/103032.

[163] Joel J Wallman, Marie Barnhill, and JosephEmerson. Robust characterization of loss rates.Phys. Rev. Lett., 115(6):060501, August 2015.ISSN 0031-9007, 1079-7114. DOI: 10.1103/Phys-RevLett.115.060501.

[164] Joel J Wallman, Marie Barnhill, and JosephEmerson. Robust characterization of leakageerrors. New J. Phys., 18(4):043021, April2016. ISSN 1367-2630. DOI: 10.1088/1367-2630/18/4/043021.

[165] Jianwei Wang, Stefano Paesani, Raffaele San-tagati, Sebastian Knauer, Antonio A Gentile,Nathan Wiebe, Maurangelo Petruzzella, Jeremy LO’Brien, John G Rarity, Anthony Laing, andMark G Thompson. Experimental quantumhamiltonian learning. Nat. Phys., 13(6):551–555,June 2017. ISSN 1745-2473, 1745-2481. DOI:10.1038/nphys4074.

[166] Sheng-Tao Wang, Dong-Ling Deng, and L-M Duan. Hamiltonian tomography for quan-tum many-body systems with arbitrary cou-plings. New J. Phys., 17(9):093017, Septem-ber 2015. ISSN 1367-2630. DOI: 10.1088/1367-2630/17/9/093017.

[167] Matthew Ware, Guilhem Ribeill, Diego Riste,Colm A. Ryan, Blake Johnson, and Marcus P.da Silva. Experimental pauli-frame randomiza-tion on a superconducting qubit. Phys. Rev.A, 103:042604, Apr 2021. DOI: 10.1103/Phys-RevA.103.042604. URL https://link.aps.org/doi/10.1103/PhysRevA.103.042604.

[168] A. E. Webb, S. C. Webster, S. Colling-bourne, D. Bretaud, A. M. Lawrence, S. Weidt,F. Mintert, and W. K. Hensinger. Resiliententangling gates for trapped ions. Phys. Rev.Lett., 121:180501, Nov 2018. DOI: 10.1103/Phys-


https://doi.org/10.1088/1367-2630/15/12/123026

https://doi.org/10.1088/1367-2630/15/12/123026





https://doi.org/10.1126/sciadv.aaw5686

https://doi.org/10.1126/sciadv.aaw5686















https://doi.org/10.1103/physreva.40.2847

https://doi.org/10.1103/physreva.40.2847

https://doi.org/10.1088/1367-2630/17/11/113020

https://doi.org/10.1088/1367-2630/17/11/113020

https://doi.org/10.1088/1367-2630/17/11/113020

https://doi.org/10.1088/1367-2630/17/11/113020

https://doi.org/10.22331/q-2018-01-29-47

https://doi.org/10.22331/q-2018-01-29-47

https://doi.org/10.22331/q-2018-01-29-47

https://doi.org/10.22331/q-2018-01-29-47

https://doi.org/10.1088/1367-2630/16/10/103032



https://doi.org/10.1088/1367-2630/18/4/043021

https://doi.org/10.1088/1367-2630/18/4/043021



https://doi.org/10.1088/1367-2630/17/9/093017

https://doi.org/10.1088/1367-2630/17/9/093017







RevLett.121.180501. URL https://link.aps.org/doi/10.1103/PhysRevLett.121.180501.

[169] Yaakov S. Weinstein, Timothy F. Havel, JosephEmerson, Nicolas Boulant, Marcos Saraceno, SethLloyd, and David G. Cory. Quantum process to-mography of the quantum fourier transform. TheJournal of Chemical Physics, 121(13):6117–6133,2004. DOI: 10.1063/1.1785151. URL https://doi.org/10.1063/1.1785151.

[170] G.A.L. White, C.D. Hill, and L.C.L. Hol-lenberg. Performance optimization for drift-robust fidelity improvement of two-qubitgates. Phys. Rev. Applied, 15:014023, Jan2021. DOI: 10.1103/PhysRevApplied.15.014023.URL https://link.aps.org/doi/10.1103/PhysRevApplied.15.014023.

[171] S. S. Wilks. The large-sample distribution of thelikelihood ratio for testing composite hypotheses.Ann. Math. Statist., 9(1):60–62, 03 1938. DOI:10.1214/aoms/1177732360. URL https://doi.org/10.1214/aoms/1177732360.

[172] Christopher J. Wood and Jay M. Gambetta.Quantification and characterization of leak-age errors. Phys. Rev. A, 97:032306, Mar2018. DOI: 10.1103/PhysRevA.97.032306.URL https://link.aps.org/doi/10.1103/PhysRevA.97.032306.

[173] K Wright, K M Beck, S Debnath, J MAmini, Y Nam, N Grzesiak, J-S Chen, N CPisenti, M Chmielewski, C Collins, K M Hudek,J Mizrahi, J D Wong-Campos, S Allen, J Apis-dorf, P Solomon, M Williams, A M Ducore, A Bli-nov, S M Kreikemeier, V Chaplin, M Keesan,C Monroe, and J Kim. Benchmarking an 11-qubit quantum computer. Nat. Commun., 10(1):5464, November 2019. ISSN 2041-1723. DOI:10.1038/s41467-019-13534-2.

[174] Kubra Yeter-Aydeniz, Eugene F Dumitrescu,Alex J McCaskey, Ryan S Bennink, Raphael CPooser, and George Siopsis. Scalar quantum fieldtheories as a benchmark for near-term quantumcomputers. Phys. Rev. A, 99(3):032306, March2019. ISSN 1050-2947. DOI: 10.1103/Phys-RevA.99.032306.

[175] Jun Zhang and Mohan Sarovar. Quantum hamil-tonian identification from measurement timetraces. Phys. Rev. Lett., 113(8):080401, Au-gust 2014. ISSN 0031-9007, 1079-7114. DOI:10.1103/PhysRevLett.113.080401.

[176] Shuaining Zhang, Yao Lu, Kuan Zhang, WentaoChen, Ying Li, Jing-Ning Zhang, and KihwanKim. Error-mitigated quantum gates exceedingphysical fidelities in a trapped-ion system. Nat.

Commun., 11(1):587, January 2020. ISSN 2041-1723. DOI: 10.1038/s41467-020-14376-z.





https://doi.org/10.1063/1.1785151

https://doi.org/10.1063/1.1785151

https://doi.org/10.1063/1.1785151




https://doi.org/10.1214/aoms/1177732360







https://doi.org/10.1038/s41467-019-13534-2

https://doi.org/10.1038/s41467-019-13534-2





https://doi.org/10.1038/s41467-020-14376-z

gate set tomography - arxiv

Documents