Helmut G. Katzgraber - microsoft.com


Page 1: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did
Page 2:

Helmut G. Katzgraber - https://intractable.lol

Quantum vs Classical Optimization: A status update on the arms race



Page 7:

• Some questions we would like answers to…

• Current status of quantum vs classical optimization?

• What about quantum approaches for machine learning?

• If QA fails to deliver, can we still benefit? Think quantum inspired…

• Texas A&M team:

Outline

S. Mandrà @ , F. Hamze @ , C. Thomas @ .

C. Fang, Dr. W. Wang, H. Munoz-B., J. Chancellor, Dr. Z. Zhu, A. Barzegar, A. Ochoa, C. Pattison (missing)

as well as…


Page 11:


Why quantum annealing? Optimization!

• Selected problems of interest:

• Constraint satisfaction (SAT)

• Number partitioning

• Minimum vertex covers

• Traveling salesman problem, …

• What do all these have in common?

• Rough cost function landscapes.

• They are problems in NP (and typically hard).

• All map onto Quadratic Unconstrained Binary Optimization (QUBO) problems.

(x_11 OR x_12) AND (x_21 OR x_22) ...   [SAT]

min vertex cover

NPP

min_{S_i} H(S_i) = min_{S_i} \sum_{i \neq j}^{N} Q_{ij} S_i S_j,   S_i \in {\pm 1}

[Slide figure: complexity classes: the P problems sit inside NP, with the NP-complete problems marked as a subset of NP.]

Good QUBO solvers & fast architectures needed!
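As an illustrative sketch (not from the slides) of what a QUBO solver must do, here is a brute-force minimizer for the slide's Hamiltonian H(S) = Σ_{i≠j} Q_ij S_i S_j with S_i ∈ {±1}. Real solvers replace the exhaustive search with heuristics such as simulated annealing; the toy matrix below is a hypothetical frustrated antiferromagnetic triangle chosen only to show the interface.

```python
import itertools

def qubo_energy(Q, s):
    """Energy H(s) = sum over ordered pairs i != j of Q[i][j] * s[i] * s[j]."""
    n = len(s)
    return sum(Q[i][j] * s[i] * s[j]
               for i in range(n) for j in range(n) if i != j)

def brute_force_ground_state(Q):
    """Exhaustively minimize H over all 2^n spin configurations (n small only)."""
    n = len(Q)
    best_s, best_e = None, float("inf")
    for bits in itertools.product([-1, 1], repeat=n):
        e = qubo_energy(Q, bits)
        if e < best_e:
            best_s, best_e = bits, e
    return best_s, best_e

# Toy antiferromagnetic triangle: frustrated, so one bond must stay unsatisfied.
Q = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
s, e = brute_force_ground_state(Q)  # ground state has one spin opposing the other two
```

The exhaustive loop scales as 2^N, which is exactly why the slide calls for good QUBO solvers and fast architectures.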

Page 12:

Moore’s Law is coming to an end…

• Four possible ways to overcome the end of Moore’s law:

• Build larger silicon-based computers. (not scalable)

• Develop faster silicon-based technologies. (already close to fab limits)

• Focus on faster algorithms.

• Go beyond standard silicon architectures. (potentially disruptive)

• Here, deep synergy between physics, quantum information, and computer science.

“The road map was an incredibly interesting experiment,” says Flamm. “So far as I know, there is no example of anything like this in any other industry, where every manufacturer and supplier gets together and figures out what they are going to do.” In effect, it converted Moore’s law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did.

And it all worked beautifully, says Flamm — right up until it didn’t.

HEAT DEATH

The first stumbling block was not unexpected. Gargini and others had warned about it as far back as 1989. But it hit hard nonetheless: things got too small.

“It used to be that whenever we would scale to smaller feature size, good things happened automatically,” says Bill Bottoms, president of Third Millennium Test Solutions, an equipment manufacturer in Santa Clara. “The chips would go faster and consume less power.”

But in the early 2000s, when the features began to shrink below about 90 nanometres, that automatic benefit began to fail. As electrons had to move faster and faster through silicon circuits that were smaller and smaller, the chips began to get too hot.

That was a fundamental problem. Heat is hard to get rid of, and no one wants to buy a mobile phone that burns their hand. So manufacturers seized on the only solutions they had, says Gargini. First, they stopped trying to increase ‘clock rates’ — how fast microprocessors execute instructions. This effectively put a speed limit on the chip’s electrons and limited their ability to generate heat. The maximum clock rate hasn’t budged since 2004.

Second, to keep the chips moving along the Moore’s law performance curve despite the speed limit, they redesigned the internal circuitry so that each chip contained not one processor, or ‘core’, but two, four or more. (Four and eight are common in today’s desktop computers and smartphones.) In principle, says Gargini, “you can have the same output with four cores going at 250 megahertz as one going at 1 gigahertz”. In practice, exploiting eight processors means that a problem has to be broken down into eight pieces — which for many algorithms is difficult to impossible. “The piece that can’t be parallelized will limit your improvement,” says Gargini.
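The limit Gargini describes is captured by Amdahl's law: if a fraction p of the work parallelizes perfectly, n cores give a speedup of 1/((1-p) + p/n). A quick sanity check with illustrative numbers (not from the article):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when only a fraction of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even if 90% of the work parallelizes, 8 cores deliver well under 8x:
s8 = amdahl_speedup(0.9, 8)          # about 4.7x
# and no core count can beat the 1/(1-p) = 10x ceiling:
s_many = amdahl_speedup(0.9, 10**9)  # just below 10x
```

The serial remainder, not the core count, sets the ceiling, which is why multicore scaling could only partially compensate for the frozen clock rates.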

Even so, when combined with creative redesigns to compensate for electron leakage and other effects, these two solutions have enabled chip manufacturers to continue shrinking their circuits and keeping their transistor counts on track with Moore’s law. The question now is what will happen in the early 2020s, when continued scaling is no longer possible with silicon because quantum effects have come into play. What comes next? “We’re still struggling,” says An Chen, an electrical engineer who works for the international chipmaker GlobalFoundries in Santa Clara, California, and who chairs a committee of the new road map that is looking into the question.

That is not for a lack of ideas. One possibility is to embrace a completely new paradigm — something like quantum computing, which promises exponential speed-up for certain calculations, or neuromorphic computing, which aims to model processing elements on neurons in the brain. But none of these alternative paradigms has made it very far out of the laboratory. And many researchers think that quantum computing will offer advantages only for niche applications, rather than for the everyday tasks at which digital computing excels. “What does it mean to quantum-balance a chequebook?” wonders John Shalf, head of computer-science research at the Lawrence Berkeley National Laboratory in Berkeley, California.

MATERIAL DIFFERENCES

A different approach, which does stay in the digital realm, is the quest to find a ‘millivolt switch’: a material that could be used for devices at least as fast as their silicon counterparts, but that would generate much less heat. There are many candidates, ranging from 2D graphene-like compounds to spintronic materials that would compute by flipping electron spins rather than by moving electrons. “There is an enormous research space to be explored once you step outside the confines of the established technology,” says Thomas Theis, a physicist who directs the nanoelectronics initiative at the Semiconductor Research Corporation (SRC), a research-funding consortium in Durham, North Carolina.

Unfortunately, no millivolt switch has made it out of the laboratory either. That leaves the architectural approach: stick with silicon, but configure it in entirely new ways. One popular option is to go 3D. Instead of etching flat circuits onto the surface of a silicon wafer, build skyscrapers: stack many thin layers of silicon with microcircuitry etched into each. In principle, this should make it possible to pack more computational power into the same space. In practice, however, this currently works only with memory chips, which do not have a heat problem: they use circuits that consume power only when a memory cell is accessed, which is not that often. One example is the Hybrid Memory Cube design, a stack of as many as eight memory layers that is being pursued by an industry consortium originally

[Figure: MOORE’S LORE. For the past five decades, the number of transistors per microprocessor chip — a rough measure of processing power — has doubled about every two years, in step with Moore’s law (top). Chips also increased their ‘clock speed’, or rate of executing instructions, until 2004, when speeds were capped to limit heat. As computers increase in power and shrink in size, a new class of machines has emerged roughly every ten years (bottom). Top panel: transistors per chip and clock speeds (MHz), 1960–2016. Bottom panel: size (mm^3) of machine classes from mainframe to minicomputer, personal computer, laptop, smartphone, and embedded processors, 1950–2020. Source: top, Intel; bottom, SIA/SRC. Nature, vol. 530, p. 146, 11 February 2016. © 2016 Macmillan Publishers Limited.]

[Slide plot: transistor count and clock speed (MHz) versus year, 1970–2010, with G. Moore (1965) marked; adapted from Nature (2016).]
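The doubling rate quoted in the figure caption compounds dramatically over five decades; a back-of-the-envelope check (assuming the caption's two-year doubling period):

```python
def transistor_growth(years, doubling_period=2.0):
    """Multiplicative growth factor after `years` of Moore's-law doubling."""
    return 2.0 ** (years / doubling_period)

# 50 years at one doubling every 2 years = 25 doublings,
# roughly a 33-million-fold increase in transistors per chip.
factor_50y = transistor_growth(50)
```

That 2^25 ≈ 3.4 × 10^7 factor is consistent with the figure's axis, which spans transistor counts from roughly 10^2 to beyond 10^8.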



Page 15: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did

Moore’s Law is coming to an end…

• Four possible ways to overcome the end of Moore’s law:

• Build larger silicon-based computers.

• Develop faster silicon-based technologies.

• Focus on faster algorithms.

• Go beyond standard silicon architectures.

• Here, deep synergy between…

• Physics,…

• …quantum information, … … and computer science.

not scalable

already close to fab limits

“The road map was an incredibly interesting experiment,” says Flamm. “So far as I know, there is no example of anything like this in any other industry, where every manufacturer and supplier gets together and figures out what they are going to do.” In effect, it converted Moore’s law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did.

And it all worked beautifully, says Flamm — right up until it didn’t.

HEAT DEATHThe first stumbling block was not unexpected. Gargini and others had warned about it as far back as 1989. But it hit hard nonetheless: things got too small.

“It used to be that whenever we would scale to smaller feature size, good things happened automatically,” says Bill Bottoms, president of Third Millennium Test Solutions, an equipment manufacturer in Santa Clara. “The chips would go faster and consume less power.”

But in the early 2000s, when the features began to shrink below about 90 nanometres, that automatic benefit began to fail. As electrons had to move faster and faster through silicon circuits that were smaller and smaller, the chips began to get too hot.

That was a fundamental problem. Heat is hard to get rid of, and no one wants to buy a mobile phone that burns their hand. So manufac-turers seized on the only solutions they had, says Gargini. First, they stopped trying to increase ‘clock rates’ — how fast microprocessors execute instructions. This effectively put a speed limit on the chip’s electrons and limited their ability to generate heat. The maximum clock rate hasn’t budged since 2004.

Second, to keep the chips moving along the Moore’s law performance curve despite the speed limit, they redesigned the internal circuitry so that each chip contained not one processor, or ‘core’, but two, four or more. (Four and eight are common in today’s desktop computers and smartphones.) In principle, says Gargini, “you can have the same output with four cores going at 250 megahertz as one going at 1 gigahertz”. In practice, exploiting eight processors means that a problem has to be broken down into eight pieces — which for many algorithms is dif-ficult to impossible. “The piece that can’t be parallelized will limit your improvement,” says Gargini.

Even so, when combined with creative redesigns to compensate for electron leakage and other effects, these two solutions have enabled chip manufacturers to continue shrinking their circuits and keeping their transistor counts on track with Moore’s law. The question now is what will happen in the early 2020s, when continued scaling is no longer possible with silicon because quantum effects have come into play. What comes next? “We’re still struggling,” says An Chen, an electrical engineer who works for the international chipmaker GlobalFoundries in Santa Clara, California, and who chairs a committee of the new road map that is looking into the question.

That is not for a lack of ideas. One possibility is to embrace a completely new paradigm — something like quantum computing, which promises exponential speed-up for certain calculations, or neuro morphic computing, which aims to model processing elements on neurons in the brain. But none of these alternative paradigms has made it very far out of the laboratory. And many researchers think that quantum computing will offer advantages only for niche applications, rather than for the everyday tasks at which digital computing excels. “What does it mean to quantum-balance a chequebook?” wonders John Shalf, head of computer-science research at the Lawrence Berkeley National Laboratory in Berkeley, California.

MATERIAL DIFFERENCESA different approach, which does stay in the digital realm, is the quest to find a ‘millivolt switch’: a material that could be used for devices at least as fast as their silicon counterparts, but that would generate much less heat. There are many candidates, ranging from 2D graphene-like compounds to spintronic materials that would compute by flipping electron spins rather than by moving electrons. “There is an enormous research space to be explored once you step outside the confines of the established technology,” says Thomas Theis, a physicist who directs the nanoelectronics initiative at the Semiconductor Research Corporation (SRC), a research-funding consortium in Durham, North Carolina.

Unfortunately, no millivolt switch has made it out of the laboratory either. That leaves the architectural approach: stick with silicon, but configure it in entirely new ways. One popular option is to go 3D. Instead of etching flat circuits onto the surface of a silicon wafer, build skyscrapers: stack many thin layers of silicon with microcircuitry etched into each. In principle, this should make it possible to pack more computational power into the same space. In practice, however, this currently works only with memory chips, which do not have a heat problem: they use circuits that consume power only when a memory cell is accessed, which is not that often. One example is the Hybrid Memory Cube design, a stack of as many as eight memory layers that is being pursued by an industry consortium originally

[Figure: MOORE’S LORE — For the past five decades, the number of transistors per microprocessor chip — a rough measure of processing power — has doubled about every two years, in step with Moore’s law (top). Chips also increased their ‘clock speed’, or rate of executing instructions, until 2004, when speeds were capped to limit heat. As computers increase in power and shrink in size, a new class of machines has emerged roughly every ten years (bottom). Axes: transistor count and clock speed (MHz) vs year (top, after G. Moore, 1965); size (mm^3) vs year for mainframe, minicomputer, personal computer, laptop, smartphone, and embedded processors (bottom). Source: top, Intel; bottom, SIA/SRC. Nature 530, 146 (11 February 2016); adapted from Nature (2016).]

Page 16

Moore’s Law is coming to an end…

• Four possible ways to overcome the end of Moore’s law:

• Build larger silicon-based computers. (not scalable)

• Develop faster silicon-based technologies. (already close to fab limits)

• Focus on faster algorithms.

• Go beyond standard silicon architectures. (potentially disruptive)

• Here, deep synergy between physics, quantum information, and computer science.

“The road map was an incredibly interesting experiment,” says Flamm. “So far as I know, there is no example of anything like this in any other industry, where every manufacturer and supplier gets together and figures out what they are going to do.” In effect, it converted Moore’s law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did.

And it all worked beautifully, says Flamm — right up until it didn’t.

HEAT DEATH

The first stumbling block was not unexpected. Gargini and others had warned about it as far back as 1989. But it hit hard nonetheless: things got too small.

“It used to be that whenever we would scale to smaller feature size, good things happened automatically,” says Bill Bottoms, president of Third Millennium Test Solutions, an equipment manufacturer in Santa Clara. “The chips would go faster and consume less power.”

But in the early 2000s, when the features began to shrink below about 90 nanometres, that automatic benefit began to fail. As electrons had to move faster and faster through silicon circuits that were smaller and smaller, the chips began to get too hot.

That was a fundamental problem. Heat is hard to get rid of, and no one wants to buy a mobile phone that burns their hand. So manufacturers seized on the only solutions they had, says Gargini. First, they stopped trying to increase ‘clock rates’ — how fast microprocessors execute instructions. This effectively put a speed limit on the chip’s electrons and limited their ability to generate heat. The maximum clock rate hasn’t budged since 2004.

Second, to keep the chips moving along the Moore’s law performance curve despite the speed limit, they redesigned the internal circuitry so that each chip contained not one processor, or ‘core’, but two, four or more. (Four and eight are common in today’s desktop computers and smartphones.) In principle, says Gargini, “you can have the same output with four cores going at 250 megahertz as one going at 1 gigahertz”. In practice, exploiting eight processors means that a problem has to be broken down into eight pieces — which for many algorithms is difficult to impossible. “The piece that can’t be parallelized will limit your improvement,” says Gargini.

Even so, when combined with creative redesigns to compensate for electron leakage and other effects, these two solutions have enabled chip manufacturers to continue shrinking their circuits and keeping their transistor counts on track with Moore’s law. The question now is what will happen in the early 2020s, when continued scaling is no longer possible with silicon because quantum effects have come into play. What comes next? “We’re still struggling,” says An Chen, an electrical engineer who works for the international chipmaker GlobalFoundries in Santa Clara, California, and who chairs a committee of the new road map that is looking into the question.
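Gargini's point about the non-parallelizable piece is the familiar Amdahl's law; a quick illustrative sketch (not from the article, numbers chosen for illustration only):

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's law: overall speedup when only part of a task parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# Even if 90% of an algorithm parallelizes perfectly, eight cores give
# well under an 8x speedup, and infinitely many cores cap out at 10x.
print(round(amdahl_speedup(0.9, 8), 2))       # 4.71
print(round(amdahl_speedup(0.9, 10**9), 2))
```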


Page 18

Current state of the art: Special-purpose analog quantum annealers

Antikythera mechanism, ~80 BC

Page 19

Page 20

• What is it?

• Semi-programmable analog annealer.

• 2000 superconducting flux qubits.

• Controversial performance.

• Still, huge technological feat…

• What can it do?

• It can minimize QUBOs after embedding them onto the machine’s hardwired Chimera topology.

• Limitations:

• Low connectivity (Chimera graph built from K_{4,4} unit cells).

• Analog noise.

• …
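For concreteness, “minimizing a QUBO” means finding the binary vector x that minimizes x^T Q x. A brute-force sketch for toy sizes (plain Python, not the D-Wave API; the instance below is made up):

```python
from itertools import product

def qubo_energy(x, Q):
    """Energy x^T Q x of a binary tuple x for a QUBO given as {(i, j): weight}."""
    return sum(w * x[i] * x[j] for (i, j), w in Q.items())

def brute_force_qubo(Q, n):
    """Exhaustively minimize a QUBO over all 2^n binary assignments."""
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(x, Q))

# Toy instance: diagonal entries act as linear biases, off-diagonal as couplers.
Q = {(0, 0): -1.0, (1, 1): -1.0, (0, 1): 2.0}  # penalize x0 = x1 = 1
best = brute_force_qubo(Q, 2)
print(best, qubo_energy(best, Q))  # a degenerate optimum with energy -1.0
```

Real instances are far too large for brute force, which is exactly why heuristic hardware and software solvers compete on this problem class.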


Page 23

How do quantum annealers optimize?


Sequentially.

Page 25

Classical Analog: Simulated Annealing (SA)

• Annealing:

• 7000-year-old Neolithic technology (e.g., a German copper axe).

• Slowly cool to remove imperfections.

• Simulated Annealing (SA) [Kirkpatrick et al., Science (83)]:

• Stochastically sample the cost function $\mathcal{H}(\{S\})$ using Monte Carlo.

• If the system is thermalized, cool it.

• The slower the cooling, the better [Geman & Geman], e.g., $T(t) = a - bt$.

• Problem: SA is inefficient for complex systems.

• Solution: Multiple restarts & statistics gathering.
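The SA recipe above fits in a few lines; a minimal single-spin-flip Metropolis sketch with the linear schedule T(t) = a - b*t (the toy ferromagnetic instance and all parameter values are illustrative, not from the slides):

```python
import math
import random

def simulated_annealing(J, n, t_max=20000, a=3.0, b=None, seed=0):
    """Minimize an Ising cost H = sum_{i<j} J[(i,j)] s_i s_j with single-spin-flip
    Metropolis moves under a linear cooling schedule T(t) = a - b*t."""
    rng = random.Random(seed)
    b = b if b is not None else a / t_max          # reach T ~ 0 at t_max
    s = [rng.choice((-1, 1)) for _ in range(n)]

    def local_field(i):
        return sum(J.get((min(i, j), max(i, j)), 0.0) * s[j]
                   for j in range(n) if j != i)

    for t in range(t_max):
        T = max(a - b * t, 1e-9)
        i = rng.randrange(n)
        dE = -2.0 * s[i] * local_field(i)          # energy change if spin i flips
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            s[i] = -s[i]
    return s

# Toy ferromagnet: all couplings -1 favor aligned spins.
n = 8
J = {(i, j): -1.0 for i in range(n) for j in range(i + 1, n)}
s = simulated_annealing(J, n)
print(all(x == s[0] for x in s))  # True: fully aligned ground state
```

Multiple restarts with different seeds, as the slide suggests, turn one noisy run into a success-probability estimate.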


Page 29

Quantum Annealing (QA) [Kadowaki & Nishimori (98); Farhi et al. (00)]

• Idea:

• Use quantum fluctuations instead of thermal ones.

• Sequential algorithm like SA [Morita & Nishimori (06)].

• Theoretical advantages over SA:

• Fluctuations determine the “tunneling radius.”

• Not limited to a local search.

• Implementation in DW device (transverse-field QA):

• Apply a transverse field that does not commute with the cost function, $[S^x, S^z] \neq 0$:

$$\mathcal{H}(\{S_i\}) = \sum_{i \neq j}^{N} Q_{ij} S_i S_j \;\longrightarrow\; \mathcal{H} = \sum_{i \neq j}^{N} Q_{ij} S^z_i S^z_j \;-\; D \sum_{i}^{N} S^x_i$$

• Reduce the fluctuation amplitude $D$ via a given annealing protocol.
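The transverse-field Hamiltonian can be made concrete by exact diagonalization of a toy instance (a numpy sketch; the 3-spin antiferromagnetic triangle and the values of D are illustrative only, and this is of course not how the analog hardware computes):

```python
import numpy as np

sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

def op_on(site_op, i, n):
    """Embed a single-qubit operator on site i of an n-qubit register."""
    ops = [I2] * n
    ops[i] = site_op
    out = ops[0]
    for o in ops[1:]:
        out = np.kron(out, o)
    return out

def hamiltonian(Q, n, D):
    """H = sum_{i<j} Q_ij s^z_i s^z_j - D sum_i s^x_i (transverse-field Ising)."""
    H = np.zeros((2**n, 2**n))
    for (i, j), w in Q.items():
        H += w * op_on(sz, i, n) @ op_on(sz, j, n)
    for i in range(n):
        H -= D * op_on(sx, i, n)
    return H

# Toy antiferromagnetic triangle; sweep the fluctuation amplitude D toward zero.
n, Q = 3, {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
for D in (5.0, 1.0, 0.0):
    E0 = np.linalg.eigvalsh(hamiltonian(Q, n, D)).min()
    print(D, round(E0, 3))
# At D = 0 the ground-state energy equals the classical optimum, -1.0.
```

An adiabatic sweep of D from large to zero ideally keeps the system in the instantaneous ground state, ending in the classical minimum.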


Page 31

Promising signs of quantum speedup…?

see Mandrà, Zhu, Perdomo-O. & Katzgraber (PRA, arXiv:1604.01746)

Denchev et al. (Dec. 2015)

Page 32

FIG. 4. Time to find the optimal solution with 99% probability for different problem sizes. We compare Simulated Annealing (SA), Quantum Monte Carlo (QMC) and the D-Wave 2X. To assign a runtime for the classical algorithms we take the number of spin updates (for SA) or worldline updates (for QMC) that are required to reach a 99% success probability and multiply that with the time to perform one update on a single state-of-the-art core. Shown are the 50th, 75th and 85th percentiles over a set of 100 instances. It occupied millions of processor cores for several days to tune and run the classical algorithms for these benchmarks. The runtimes for the higher quantiles for the largest problem size for QMC were not computed due to the high computational cost. For a similar comparison with QMC with different parameters please see Fig. 13.

The corresponding classical Hamiltonian is

$$\mathcal{H}_{\rm cl} = -\sum_{\tau=1}^{M} \left( \sum_{jk} \frac{J_{jk}}{M}\, \sigma_j(\tau)\,\sigma_k(\tau) + J_\perp(s) \sum_j \sigma_j(\tau)\,\sigma_j(\tau+1) \right), \qquad (9)$$

where $\sigma_j(\tau) = \pm 1$ are classical spins, $j$ and $k$ are site indices, $\tau$ is a replica index, and $M$ is the number of replicas. The coupling between replicas is given by

$$J_\perp(s) = -\frac{1}{2\beta} \ln \tanh \frac{\beta A(s)}{M}, \qquad (10)$$

where $\beta$ is the inverse temperature. The configuration of a given spin $j$ across all replicas $\tau$ is called the worldline of spin $j$. Periodic boundary conditions are imposed between $\sigma_j(M)$ and $\sigma_j(1)$. We used continuous path integral QMC, which corresponds to the limit $\Delta\tau \to 0$ [46], and, unlike discrete path integral QMC, does not suffer from discretization errors of order $1/M$.
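Eq. (10) is easy to evaluate numerically; a short sketch (the values of A(s), beta and M below are illustrative, not taken from the paper):

```python
import math

def j_perp(A, beta, M):
    """Suzuki-Trotter replica coupling J_perp = -(1/(2*beta)) * ln tanh(beta*A/M).

    For beta*A/M > 0, tanh(x) < 1, so the log is negative and J_perp > 0:
    neighboring imaginary-time slices are coupled ferromagnetically."""
    return -math.log(math.tanh(beta * A / M)) / (2.0 * beta)

# As the transverse field A(s) -> 0 at the end of the anneal, the coupling
# grows without bound and the worldlines lock into a classical configuration.
for A in (2.0, 0.5, 0.05):
    print(A, round(j_perp(A, beta=1.0, M=32), 4))
```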

We numerically compute the number of sweeps $n_{\rm sweeps}$ required for QMC to find the ground state with 99% probability at different quantiles. In our case, a sweep corresponds to two update attempts for each worldline. The computational effort is $n_{\rm sweeps} \times N \times T_{\rm worldline}$, where $N$ is the number of qubits and $T_{\rm worldline}$ is the time to update a worldline. We average $T_{\rm worldline}$ over all the steps in the quantum annealing schedule; however, the value of $T_{\rm worldline}$ depends on the particular schedule chosen. As explained above for SA, we report the total computational effort of QMC in standard units of time per single core. For the annealing schedule used in the current D-Wave 2X processor, we find

$$T_{\rm worldline} = \beta \times 870~\mathrm{ns} \qquad (11)$$

using an Intel(R) Xeon(R) CPU E5-1650 @ 3.20GHz.

This study is designed to explore the utility of QMC as a classical optimization routine. Accordingly, we optimize QMC by running at a low temperature, 4.8 mK. We also observe that QMC with open boundary conditions (OBC) performs better than standard QMC with periodic boundary conditions in this case [38]; therefore, OBC is used in this comparison. We further optimize the number of sweeps per run which, for a given quantile, results in the lowest total computational effort. We find that the optimal number of sweeps is $10^6$ at the largest problem size. This enhances the ability of QMC to simulate quantum tunneling, and gives a very high probability of success per run in the median case, $p_{\rm success} = 0.16$.

All the qubits in a cluster have approximately the same orientation in each local minimum of the effective mean-field potential. Neighboring local minima typically correspond to different orientations of a single cluster. Here, tunneling time is dominated by a single purely imaginary instanton and is described by Eq. (35) below. It was recently demonstrated that, in this situation, the exponent $a_{\rm min}/\hbar$ for physical tunneling is identical to that of QMC [38]. As seen in Fig. 4, we do not find a substantial difference in the scaling of QMC and D-Wave (QA). However, we find a very substantial computational overhead associated with the prefactor $B$ in the expression $T = B\, e^{D a_{\rm min}/\hbar}$ for the runtime. In other words, $B_{\rm QMC}$ can exceed $B_{\rm QA}$ by many orders of magnitude. The role of the prefactor becomes essential in situations where the number of cotunneling qubits $D$ is finite, i.e., is independent of the problem size $N$ (or depends on $N$ very weakly). Between some quantiles and system sizes we observe a prefactor advantage as high as $10^8$.
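The "99% success probability" runtimes reported in Fig. 4 follow standard time-to-solution bookkeeping: multiply a single-run time by the number of repetitions needed to reach the target confidence. A minimal sketch (the 1000 µs runtime is a made-up placeholder; only the median success probability 0.16 is quoted in the text):

```python
import math

def tts(t_run_us, p_success, target=0.99):
    """Time to solution: runtime times the repetitions needed so that at least
    one run finds the optimum with probability >= target."""
    if p_success >= target:
        return t_run_us
    repetitions = math.ceil(math.log(1.0 - target) / math.log(1.0 - p_success))
    return repetitions * t_run_us

# Median QMC per-run success probability quoted in the text; placeholder runtime.
print(tts(1000.0, 0.16))  # 27 repetitions -> 27000.0 (microseconds)
```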

C. D-Wave versus other Classical Solvers

Based on the results presented here, one cannot claim a quantum speedup for D-Wave 2X, as this would require that the quantum processor in question outperforms the best known classical algorithm. This is not the case for the weak-strong cluster networks. This is because a variety of heuristic classical algorithms can solve most instances of Chimera structured problems much faster than SA, QMC, and the D-Wave 2X [47–49] (for a possible

Google’s “10^8 results” – slope vs offset

[Plot annotations: time to solution (TTS) in µs and DW2 annealing time in µs vs N (problem size).]

Denchev et al. (15)

Page 33: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did

5

100 200 300 400 500 600 700 800 900 1000Problem size (bits)

102

104

106

108

1010

1012

1014Q

MC

and

SA

sing

le-c

ore

anne

alin

gtim

e(µ

s)180 296 489 681 945

85th75th50th

102

104

106

108

1010

1012

1014

D-W

ave

anne

alin

gtim

e(µ

s)

QMC

SA

D-Wave

FIG. 4. Time to find the optimal solution with 99% proba-bility for di↵erent problem sizes. We compare Simulated An-nealing (SA), Quantum Monte Carlo (QMC) and the D-Wave2X. To assign a runtime for the classical algorithms we takethe number of spin updates (for SA) or worldline updates (forQMC) that are required to reach a 99% success probabilityand multiply that with the time to perform one update ona single state-of-the-art core. Shown are the 50th, 75th and85th percentiles over a set of 100 instances. It occupied mil-lions of processor cores for several days to tune and run theclassical algorithms for these benchmarks. The runtimes forthe higher quantiles for the largest problem size for QMC werenot computed due to the high computational cost. For a sim-ilar comparison with QMC with di↵erent parameters pleasesee Fig. 13

tonian is

Hcl

= �MX⌧=1

0@Xjk

Jjk

M�j

(⌧)�k

(⌧)

+J?(s)Xj

�j

(⌧)�j

(⌧ + 1)

1A , (9)

where �j

(⌧) = ±1 are classical spins, j and k are siteindices, ⌧ is a replica index, and M is the number ofreplicas. The coupling between replicas is given by

J?(s) = � 1

2�ln tanh

A(s)�

M, (10)

where � is the inverse temperature. The configurationsfor a given spin j across all replicas ⌧ is called the world-line of spin j. Periodic boundary conditions are imposedbetween �

j

(M) and �j

(1). We used continuous path in-tegral QMC, which corresponds to the limit �⌧ ! 0 [46],and, unlike discrete path integral QMC, does not su↵erfrom discretization errors of order 1/M .

We numerically compute the number of sweeps nsweeps

required for QMC to find the ground state with 99%

probability at di↵erent quantiles. In our case, a sweepcorresponds to two update attempts for each worldline.The computational e↵ort is n

sweeps

⇥N⇥Tworldline

, whereN is the number of qubits and T

worldline

is the time to up-date a worldline. We average T

worldline

over all the stepsin the quantum annealing schedule; however the valueof T

worldline

depends on the particular schedule chosen.As explained above for SA, we report the total computa-tional e↵ort of QMC in standard units of time per singlecore. For the annealing schedule used in the current D-Wave 2X processor, we find

Tworldline

= � ⇥ 870 ns (11)

using an Intel(R) Xeon(R) CPU E5-1650 @ 3.20GHz.This study is designed to explore the utility of QMC

as a classical optimization routine. Accordingly, we op-timize QMC by running at a low temperature, 4.8 mK.We also observe that QMC with open boundary condi-tions (OBC) performs better than standard QMC withperiodic boundary conditions in this case [38]; therefore,OBC is used in this comparison. We further optimize thenumber of sweeps per run which, for a given quantile, re-sults in the lowest total computational e↵ort. We findthat the optimal number of sweeps is 106 at the largestproblem size. This enhances the ability of QMC to simu-late quantum tunneling, and gives a very high probabilityof success per run in the median case, p

success

= 0.16.All the qubits in a cluster have approximately the same

orientation in each local minima of the e↵ective meanfield potential. Neighboring local minima typically cor-respond to di↵erent orientations of a single cluster. Here,tunneling time is dominated by a single purely imaginaryinstanton and is described by Eq. (35) below. It wasrecently demonstrated that, in this situation, the expo-nent a

min

/~ for physical tunneling is identical to that ofQMC [38]. As seen in Fig. 4, we do not find a substan-tial di↵erence in the scaling of QMC and D-Wave (QA).However, we find a very substantial computational over-head associated with the prefactor B in the expressionT = BeDamin/~ for the runtime. In other words, B

QMC

can exceed BQA

by many orders of magnitude. The roleof the prefactor becomes essential in situations where thenumber of cotunneling qubits D is finite, i.e., is inde-pendent of the problem size N (or depends on N veryweakly). Between some quantiles and system sizes weobserve a prefactor advantage as high as 108.

C. D-Wave versus other Classical Solvers

Based on the results presented here, one cannot claima quantum speedup for D-Wave 2X, as this would requirethat the quantum processor in question outperforms thebest known classical algorithm. This is not the case forthe weak-strong cluster networks. This is because a va-riety of heuristic classical algorithms can solve most in-stances of Chimera structured problems much faster thanSA, QMC, and the D-Wave 2X [47–49] (for a possible

Google's "10^8 results" – slope vs offset

[Figure: DW2X annealing time (in µs) vs. N (problem size); vertical axis T_TS in µs. Denchev et al. (15)]

Page 34: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did

[Figure 4 plot: problem size 100–1000 bits (instances at 180, 296, 489, 681, 945 qubits); QMC and SA single-core annealing time and D-Wave annealing time on the vertical axes, 10^2–10^14 µs; curves for QMC, SA, and D-Wave at the 50th, 75th, and 85th percentiles]

FIG. 4. Time to find the optimal solution with 99% probability for different problem sizes. We compare Simulated Annealing (SA), Quantum Monte Carlo (QMC) and the D-Wave 2X. To assign a runtime for the classical algorithms we take the number of spin updates (for SA) or worldline updates (for QMC) that are required to reach a 99% success probability and multiply that with the time to perform one update on a single state-of-the-art core. Shown are the 50th, 75th and 85th percentiles over a set of 100 instances. It occupied millions of processor cores for several days to tune and run the classical algorithms for these benchmarks. The runtimes for the higher quantiles for the largest problem size for QMC were not computed due to the high computational cost. For a similar comparison with QMC with different parameters please see Fig. 13.
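The 99% time-to-solution plotted in Fig. 4 follows from the per-run success probability via the standard repetition formula; a minimal sketch (the formula is the conventional TTS definition, and the only value taken from the text is the median p = 0.16):

```python
import math

def time_to_solution(t_run, p_success, target=0.99):
    """Standard TTS definition: repeat independent runs until the target
    success probability is reached, TTS = t_run * ln(1-target)/ln(1-p)."""
    if p_success >= target:
        return t_run  # a single run already suffices
    return t_run * math.log(1.0 - target) / math.log(1.0 - p_success)

# With the median per-run success probability p = 0.16 quoted in the text,
# about 27 repetitions (rounding up) are needed to reach 99%.
print(math.ceil(time_to_solution(1.0, 0.16)))  # -> 27
```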

The corresponding classical Hamiltonian is

H_cl = -\sum_{\tau=1}^{M} \left( \sum_{jk} \frac{J_{jk}}{M}\, \sigma_j(\tau)\,\sigma_k(\tau) + J_\perp(s) \sum_j \sigma_j(\tau)\,\sigma_j(\tau+1) \right),   (9)

where σ_j(τ) = ±1 are classical spins, j and k are site indices, τ is a replica index, and M is the number of replicas. The coupling between replicas is given by

J_\perp(s) = -\frac{1}{2\beta} \ln \tanh\!\left( \frac{A(s)\,\beta}{M} \right),   (10)

where β is the inverse temperature. The configuration for a given spin j across all replicas τ is called the worldline of spin j. Periodic boundary conditions are imposed between σ_j(M) and σ_j(1). We used continuous path-integral QMC, which corresponds to the limit Δτ → 0 [46], and, unlike discrete path-integral QMC, does not suffer from discretization errors of order 1/M.
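The replica coupling of Eq. (10) can be evaluated directly; a small sketch, where the values of A(s), β, and M are placeholder assumptions rather than the actual annealing schedule:

```python
import math

def j_perp(A, beta, M):
    """Replica coupling of Eq. (10): J_perp = -1/(2*beta) * ln tanh(A*beta/M).
    Since tanh(x) < 1 for finite x, the log is negative and J_perp > 0:
    a ferromagnetic coupling along imaginary time."""
    return -math.log(math.tanh(A * beta / M)) / (2.0 * beta)

# Illustrative values: as the transverse field A(s) is turned off along the
# anneal, the worldline couples more and more strongly.
beta, M = 10.0, 64
for A in (2.0, 1.0, 0.1):
    print(A, j_perp(A, beta, M))
```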

We numerically compute the number of sweeps n_sweeps required for QMC to find the ground state with 99% probability at different quantiles. In our case, a sweep corresponds to two update attempts for each worldline. The computational effort is n_sweeps × N × T_worldline, where N is the number of qubits and T_worldline is the time to update a worldline. We average T_worldline over all the steps in the quantum annealing schedule; however, the value of T_worldline depends on the particular schedule chosen. As explained above for SA, we report the total computational effort of QMC in standard units of time per single core. For the annealing schedule used in the current D-Wave 2X processor, we find

T_worldline = β × 870 ns   (11)

using an Intel(R) Xeon(R) CPU E5-1650 @ 3.20 GHz.
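The total single-core effort defined above, n_sweeps × N × T_worldline, is plain arithmetic; a sketch using the values quoted in the text (10^6 sweeps, 945 qubits), with β in Eq. (11) left as an assumed placeholder:

```python
def qmc_core_time_us(n_sweeps, n_qubits, t_worldline_ns):
    """Total single-core effort n_sweeps * N * T_worldline, in microseconds."""
    return n_sweeps * n_qubits * t_worldline_ns * 1e-3  # ns -> us

# Values quoted in the text: 1e6 sweeps at the largest size, 945 qubits.
# T_worldline = beta * 870 ns; beta = 1.0 is an illustrative assumption.
beta = 1.0
total_us = qmc_core_time_us(1e6, 945, beta * 870.0)
print(total_us)  # single-core time in microseconds for one full QMC run
```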


Figure 1: Sketch of the weak-strong clusters and networks. (a) Structure of a weak-strong cluster. Two K_{4,4} cells of the Chimera lattice are connected ferromagnetically (blue lines, J = 1), as are all spins within each K_{4,4} cell. Black dots correspond to qubits in the strong cluster with a biasing magnetic field h_1 = −1. The white dots represent the weak cluster, where each site is coupled to a weaker field h_2 = −κ h_1 with κ = 0.44 < 0.5 in the opposite direction. The white lines represent the connections from the strong cluster to neighboring strong clusters of a weak-strong pair. (b) Weak-strong cluster network: each rectangle represents a weak-strong cluster. The different weak-strong clusters are connected via a spin-glass backbone where the interactions can take values {±1}. Here, red lines represent J = −1. Note that the connections between clusters only occur between the strong clusters.

where V represents the 8 vertices in one K_{4,4} unit cell of the Chimera graph. The subset V′ ⊂ V represents the vertices of the right-hand side of the strong and weak clusters that are linked by a ferromagnetic interaction J = 1.

A weak-strong cluster network Hamiltonian H is then constructed by connecting the sites of each strong cluster with neighboring strong clusters [white lines in Figure 1(a)] using a spin-glass backbone with random couplings J_C ∈ {±1}, i.e.,

H = \sum_C J_C\, H_{ws}^{C}.   (4)

Note that the weak clusters only couple to the strong cluster within a given weak-strong cluster. Because of imperfections in the DW2X device, the embedding of the weak-strong cluster network in the Chimera topology is nontrivial. However, systems of up to n = 945 qubits have been studied.
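The network Hamiltonian of Eq. (4) is a signed sum of cluster terms; a minimal sketch of evaluating it, where the per-cluster energies H_ws^C are abstracted into plain numbers (an assumption for illustration):

```python
def network_energy(J_C, cluster_energies):
    """Eq. (4): H = sum_C J_C * H_ws^C with backbone couplings J_C in {+1, -1}."""
    assert all(j in (+1, -1) for j in J_C)
    return sum(j * h for j, h in zip(J_C, cluster_energies))

# Toy example: three weak-strong clusters with assumed cluster energies.
print(network_energy([+1, -1, +1], [-2.0, -3.0, -1.5]))  # -> -0.5
```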

The main result of Ref. [49] is to show, both experimentally (by using the DW2X quantum optimizer) and numerically (by using quantum Monte Carlo simulations), that quantum cotunneling effects play a fundamental role in adiabatic optimization. Note that quantum Monte Carlo is the closest classical algorithm to quantum annealing on the DW2X. The results of Ref. [49] on the DW2X chip are approximately 10^8 times faster than simulated annealing [15] and considerably faster than quantum Monte Carlo, despite both the DW2X quantum annealer and quantum Monte Carlo having a similar scaling (similar slope of the curves in Figure 4 of Ref. [49] for quantum Monte Carlo and the DW2X). While this indeed represents the first solid evidence that the DW2X machine might have capabilities that classical optimization approaches do not possess, it is important to perform a comprehensive comparison to a wide variety of state-of-the-art optimization methods. Within the categories defined in Section II, the results of Ref. [49] for the DW2X clearly outperform any sequential optimization methods, but fall short of outperforming tailored and nontailored optimization methods. We feel, however, that knowingly exploiting the structure of a problem does not amount to a fair comparison. Nevertheless, our results shown below clearly suggest that generic optimization methods still outperform the DW2X. One might thus question the importance of the results of Ref. [49]. We emphasize that this is the first study that unambiguously shows that the DW2X machine exhibits finite-range tunneling, and it gives clear hints towards the class of problems where analog quantum annealing machines might excel.

In addition to showing here that a variety of classical heuristics, either tailored to the weak-strong cluster structure or more generic, can achieve performance similar to that of the DW2X chip, we also study the energy landscape of the weak-strong cluster networks. The latter provides valuable insights into the limitations of finite-range tunneling for this class of problems. Our analysis suggests that the scaling advantage of finite-range cotunneling over sequential algorithms could be lost for instances with problem sizes beyond the ones considered in Ref. [49].

In the next section we discuss in detail the performance of the DW2X compared to tailored and nontailored classical heuristics.

IV. RESULTS

In this Section, we present our main results. In the first part, we compare the performance of the DW2X device against general (nontailored) and tailored classical algorithms. The algorithms used are described in the Appendix. In the second part, we analyze in depth the scaling behavior of the DW2X device by varying the number of qubits used. The aim is to better understand the role of non-optimal annealing times for a noisy analog device in the asymptotic scaling of the computational time. Finally, we study the energy landscape, as proposed in Ref. [32], and show that with increasing problem size the spin-glass backbone of the weak-strong cluster network dominates and the advantages of finite-range tunneling diminish.

Q = −1
Q = +1

Denchev et al. (15)

Page 35


H(S_i) = \sum_{i \neq j}^{N} Q_{ij} S_i S_j - \sum_i h_i S_i
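The annotated cost function can be evaluated directly; a small sketch with purely illustrative couplings and fields:

```python
import numpy as np

def ising_energy(Q, h, S):
    """H(S) = sum_{i != j} Q_ij S_i S_j - sum_i h_i S_i for spins S_i = +/-1.
    Q is assumed symmetric with zero diagonal, so S @ Q @ S counts each
    pair twice, matching the unrestricted i != j double sum."""
    return float(S @ Q @ S - h @ S)

# Toy instance: two coupled spins with local fields (illustrative values).
Q = np.array([[0.0, -1.0], [-1.0, 0.0]])
h = np.array([1.0, -0.44])
S = np.array([1.0, 1.0])
print(ising_energy(Q, h, S))
```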

Page 36


spin-glass backbone


Page 37

5

100 200 300 400 500 600 700 800 900 1000Problem size (bits)

102

104

106

108

1010

1012

1014Q

MC

and

SA

sing

le-c

ore

anne

alin

gtim

e(µ

s)180 296 489 681 945

85th75th50th

102

104

106

108

1010

1012

1014

D-W

ave

anne

alin

gtim

e(µ

s)

QMC

SA

D-Wave

FIG. 4. Time to find the optimal solution with 99% proba-bility for di↵erent problem sizes. We compare Simulated An-nealing (SA), Quantum Monte Carlo (QMC) and the D-Wave2X. To assign a runtime for the classical algorithms we takethe number of spin updates (for SA) or worldline updates (forQMC) that are required to reach a 99% success probabilityand multiply that with the time to perform one update ona single state-of-the-art core. Shown are the 50th, 75th and85th percentiles over a set of 100 instances. It occupied mil-lions of processor cores for several days to tune and run theclassical algorithms for these benchmarks. The runtimes forthe higher quantiles for the largest problem size for QMC werenot computed due to the high computational cost. For a sim-ilar comparison with QMC with di↵erent parameters pleasesee Fig. 13

The classical Hamiltonian is

H_{cl} = -\sum_{\tau=1}^{M} \left( \sum_{jk} \frac{J_{jk}}{M}\, \sigma_j(\tau)\,\sigma_k(\tau) + J_\perp(s) \sum_j \sigma_j(\tau)\,\sigma_j(\tau+1) \right),   (9)

where \sigma_j(\tau) = \pm 1 are classical spins, j and k are site indices, \tau is a replica index, and M is the number of replicas. The coupling between replicas is given by

J_\perp(s) = -\frac{1}{2\beta} \ln \tanh\!\left( \frac{\beta A(s)}{M} \right),   (10)

where \beta is the inverse temperature. The configuration for a given spin j across all replicas \tau is called the worldline of spin j. Periodic boundary conditions are imposed between \sigma_j(M) and \sigma_j(1). We used continuous path-integral QMC, which corresponds to the limit \Delta\tau \to 0 [46], and, unlike discrete path-integral QMC, does not suffer from discretization errors of order 1/M.
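The replica coupling of Eq. (10) above is simple to evaluate. A minimal sketch; the values of beta, A(s), and M are illustrative, not the DW2X schedule:

```python
import math

def j_perp(beta, A_s, M):
    """Replica coupling of Eq. (10):
    J_perp(s) = -(1/(2*beta)) * ln tanh(beta * A(s) / M).
    Positive for finite arguments, since tanh(x) < 1 there,
    and it weakens as the transverse field A(s) grows."""
    return -math.log(math.tanh(beta * A_s / M)) / (2.0 * beta)

# illustrative numbers only
print(j_perp(beta=10.0, A_s=1.0, M=32))
```

The divergence of J_perp as A(s) -> 0 is what freezes the worldlines into a classical configuration at the end of the anneal.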

We numerically compute the number of sweeps n_sweeps required for QMC to find the ground state with 99% probability at different quantiles. In our case, a sweep corresponds to two update attempts for each worldline. The computational effort is n_sweeps × N × T_worldline, where N is the number of qubits and T_worldline is the time to update a worldline. We average T_worldline over all the steps in the quantum annealing schedule; however, the value of T_worldline depends on the particular schedule chosen. As explained above for SA, we report the total computational effort of QMC in standard units of time per single core. For the annealing schedule used in the current D-Wave 2X processor, we find

T_worldline = \beta × 870 ns   (11)

using an Intel(R) Xeon(R) CPU E5-1650 @ 3.20GHz.

This study is designed to explore the utility of QMC as a classical optimization routine. Accordingly, we optimize QMC by running at a low temperature, 4.8 mK. We also observe that QMC with open boundary conditions (OBC) performs better than standard QMC with periodic boundary conditions in this case [38]; therefore, OBC is used in this comparison. We further optimize the number of sweeps per run which, for a given quantile, results in the lowest total computational effort. We find that the optimal number of sweeps is 10^6 at the largest problem size. This enhances the ability of QMC to simulate quantum tunneling, and gives a very high probability of success per run in the median case, p_success = 0.16.

All the qubits in a cluster have approximately the same orientation in each local minimum of the effective mean-field potential. Neighboring local minima typically correspond to different orientations of a single cluster. Here, tunneling time is dominated by a single purely imaginary instanton and is described by Eq. (35) below. It was recently demonstrated that, in this situation, the exponent a_min/\hbar for physical tunneling is identical to that of QMC [38]. As seen in Fig. 4, we do not find a substantial difference in the scaling of QMC and D-Wave (QA). However, we find a very substantial computational overhead associated with the prefactor B in the expression T = B e^{D a_min/\hbar} for the runtime. In other words, B_QMC can exceed B_QA by many orders of magnitude. The role of the prefactor becomes essential in situations where the number of cotunneling qubits D is finite, i.e., is independent of the problem size N (or depends on N very weakly). Between some quantiles and system sizes we observe a prefactor advantage as high as 10^8.

C. D-Wave versus other Classical Solvers

Based on the results presented here, one cannot claim a quantum speedup for the D-Wave 2X, as this would require that the quantum processor in question outperforms the best known classical algorithm. This is not the case for the weak-strong cluster networks. This is because a variety of heuristic classical algorithms can solve most instances of Chimera-structured problems much faster than SA, QMC, and the D-Wave 2X [47–49] (for a possible

Google’s “10^8 results” – slope vs offset

[Figure residue: axes are N (problem size), TTS in µs, and DW2 annealing time in µs.]

Denchev et al. (15)

Page 38:

[Duplicate of the Fig. 4 slide from page 37, with the added annotation:]

Catapult + QMC

Page 39:

[Duplicate of the Fig. 4 slide from page 37, with the added annotations:]

Better scaling of DW and quantum inspired.

Catapult + QMC

Page 40:

ℏ = 0 vs ℏ > 0 scoreboard: 0 : 0

Page 41:

ℏ = 0 vs ℏ > 0 scoreboard: 0 : 1

Page 42:

What if we use better algorithms?

• Tailored to the problems and/or underlying graph:

• Hamze-de Freitas-Selby algorithm (HFS).

• Hybrid cluster methods (HCM).

• Super-spin approximation (SS).

• Not tailored to the problems and/or underlying graph:

• Population annealing (particle swarm) sequential Monte Carlo (PA).

• Parallel tempering & isoenergetic cluster optimizer (PT+ICM).

• Reminder – Sequential methods used in the Google study:

• Simulated annealing (SA).

• Quantum Monte Carlo (QMC).

• D-Wave 2X (DW2).

Zhu, Ochoa, Katzgraber PRL (15)

Zhu (16); Venturelli et al. (15)

Wang et al., PRE (15)

Kirkpatrick et al. (83)

Hamze et al. (12)

Denchev et al. (15)
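Of the sequential baselines listed above, SA is the simplest to sketch. A minimal single-spin-flip version for an Ising cost function; the couplings, linear schedule, and parameters below are illustrative, not those tuned in the benchmarks:

```python
import math
import random

def simulated_annealing(J, h, sweeps=500, beta0=0.1, beta1=3.0, seed=42):
    """Minimal single-spin-flip simulated annealing for
    H = sum_{i<j} J[i][j]*s_i*s_j - sum_i h[i]*s_i,
    with J a symmetric matrix (zero diagonal) and s_i in {-1, +1}."""
    n = len(h)
    rng = random.Random(seed)
    s = [rng.choice([-1, 1]) for _ in range(n)]

    def energy(state):
        e = -sum(h[i] * state[i] for i in range(n))
        for i in range(n):
            for j in range(i + 1, n):
                e += J[i][j] * state[i] * state[j]
        return e

    e = energy(s)
    best_e, best_s = e, s[:]
    for t in range(sweeps):
        # linear inverse-temperature schedule from beta0 to beta1
        beta = beta0 + (beta1 - beta0) * t / max(sweeps - 1, 1)
        for i in range(n):
            # energy change of flipping spin i (Metropolis criterion)
            field = sum(J[i][j] * s[j] for j in range(n) if j != i)
            dE = -2 * s[i] * (field - h[i])
            if dE <= 0 or rng.random() < math.exp(-beta * dE):
                s[i] = -s[i]
                e += dE
                if e < best_e:
                    best_e, best_s = e, s[:]
    return best_e, best_s

# illustrative: 6-spin ferromagnetic chain, ground-state energy -5
J = [[0] * 6 for _ in range(6)]
for i in range(5):
    J[i][i + 1] = J[i + 1][i] = -1
print(simulated_annealing(J, [0.0] * 6))
```

The tailored and cluster-based solvers in the list differ precisely in replacing this single-spin update with larger, problem-aware moves.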

Page 43:

[Duplicate of the "What if we use better algorithms?" slide from page 42.]

Page 44:

[Duplicate of the "What if we use better algorithms?" slide from page 42, with the added annotation:]

MaxSAT 2016 winner

Page 45:

[Duplicate of the "What if we use better algorithms?" slide from page 42, with the added annotations:]

MaxSAT 2016 winner    Catapult?

Page 46:

Asymptotic scaling exponent b (slope)

[Figure residue: median (50th-percentile) scaling exponent b, 0.0 to 0.8, for SA, PA, DW2, QMC, HCM, RMC+ICM, PT+ICM, HFS, and SS; runtime fits of the form T ~ 10^{a + b√n}, comparing (a + b√n) and (a + b√n + c log10(√n)) fits.]

Page 47:

[Duplicate of the scaling-exponent figure from page 46, with the added annotation: smaller means better scaling.]

Page 48:

[Duplicate of the scaling-exponent figure from page 46, with the solvers grouped by the annotations "sequential", "tailored", "not tailored", and "tailored".]

Page 49:

[Duplicate of the scaling-exponent figure from page 46, with the added annotation:]

Only “sequential” quantum speedup.

Page 50:

ℏ = 0 vs ℏ > 0 scoreboard: 0 : 1

Page 51:

ℏ = 0 vs ℏ > 0 scoreboard: 1 : 1

Page 52:

Most recent D-Wave benchmarks

see Mandrà, Katzgraber & Thomas (QST, arXiv:1703.00622)

King et al. (17)

Page 53:

D-Wave’s frustrated cluster loop problems

[Figure residue: TTS (µs), 10^0 to 10^7, vs n (number of logical variables), 0 to 20; α = 0.80, ρ = 5; curves for MWPM (no broken qubits), MWPM, DW2000Q (TTS1 and TTS2), and ICM (logical, TTS2); annotations DW2000Q, SA, QMC.]

King et al. (17)

Page 54:

[Duplicate of the frustrated-cluster-loop TTS figure from page 53, with the added annotation:]

Catapult + QMC

Page 55:

[Duplicate of the frustrated-cluster-loop TTS figure from page 53.]

• Ruggedness of FCLs (spin-glass backbone) fools codes.

• The logical problem is defined on K4,4 cells and is therefore planar.

King et al. (17)


Catapult + QMC

Page 56:

[Duplicate of the frustrated-cluster-loop TTS figure from page 53.]

• Ruggedness of FCLs (spin-glass backbone) fools codes.

• The logical problem is defined on K4,4 cells and is therefore planar.

King et al. (17)

Why is this a problem?

• Planar problems are polynomial (P class).

• Exact algorithms exist.

Catapult + QMC
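The point above is that ground states of planar Ising problems reduce to minimum-weight perfect matching, which Edmonds’ blossom algorithm solves in polynomial time. For intuition only, here is a brute-force matching sketch (exponential, illustrative; the Ising-to-matching reduction itself is not shown, and the graph is a made-up toy):

```python
def min_weight_perfect_matching(nodes, weight):
    """Brute-force minimum-weight perfect matching: try every way of
    pairing up the nodes (edges given as weight[(u, v)] with u < v).
    Exponential, for tiny illustrations only; Edmonds' blossom
    algorithm solves the same problem in polynomial time."""
    nodes = list(nodes)
    best = (float("inf"), None)

    def rec(remaining, acc, total):
        nonlocal best
        if not remaining:
            if total < best[0]:
                best = (total, acc)
            return
        u = remaining[0]
        for v in remaining[1:]:
            if (u, v) in weight:
                rest = [x for x in remaining if x not in (u, v)]
                rec(rest, acc + [(u, v)], total + weight[(u, v)])

    rec(nodes, [], 0.0)
    return best

# 4-cycle with two cheap and two expensive edges: the minimum-weight
# perfect matching picks the two weight-1 edges
weight = {(0, 1): 1.0, (1, 2): 5.0, (2, 3): 1.0, (0, 3): 5.0}
print(min_weight_perfect_matching(range(4), weight))
```

If no perfect matching exists (e.g. an odd node count), the function returns (inf, None).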

Page 57:

Using minimum-weight perfect matching…

[Figure residue: TTS (µs), 10^0 to 10^7, vs n (number of logical variables), 1 to 1000; α = 0.80, ρ = 5; curves for MWPM (full Chimera), MWPM, and DW2000Q with both the 1/p and the log(0.01)/log(1−p) TTS definitions.]

King et al. (17); Mandrà et al. (17); Edmonds (61)
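The two TTS definitions in the legend above, the naive 1/p repetition count and the standard 99%-confidence form, are easy to compare directly. A sketch; the anneal time and success probability are illustrative:

```python
import math

def tts_99(t_anneal_us, p):
    """Time to solution at 99% confidence:
    TTS = t_anneal * ln(0.01) / ln(1 - p),
    for per-run success probability p."""
    if p >= 1.0:
        return t_anneal_us
    return t_anneal_us * math.log(0.01) / math.log(1.0 - p)

def tts_simple(t_anneal_us, p):
    """Naive repetition estimate: TTS = t_anneal / p."""
    return t_anneal_us / p

# illustrative: 20 µs anneal, 30% per-run success probability
print(tts_99(20.0, 0.3), tts_simple(20.0, 0.3))
```

For small p the 99%-confidence form is roughly 4.6 times the naive estimate (since ln(0.01) ≈ −4.6), which is why the two DW2000Q curves in the figure differ by a constant offset.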

Page 58:

[Duplicate of the minimum-weight perfect matching TTS figure from page 57, with curve labels DW2000Q and MWPM added.]

Page 59:

[Duplicate of the minimum-weight perfect matching TTS figure from page 57, with the added annotation:]

Exponentially faster than DW2000Q…

Page 60:

[Duplicate of page 59.]

Page 61:

ℏ = 0 vs ℏ > 0 scoreboard: 1 : 1

Page 62:

ℏ = 0 vs ℏ > 0 scoreboard: 1 : 2

Page 63:

Fair sampling – A key ingredient in ML

see also Mandrà, Zhu & Katzgraber (PRL, arXiv:1606.07146)

Page 64:

What is fair sampling?

• Definition (fair sampling):

• Ability of an algorithm to find uncorrelated solutions to a problem with (almost) the same probability.

• Why is this important?

• Sometimes solutions are more important than the optimum (SAT filters, #SAT, machine learning,…).

• Some solutions might be more “convenient” due to additional constraints.

• Algorithm benchmarking:

• Standard – Find the optimum fast and reliably.

• Stringent – Find all minimizing configurations equiprobably.
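The stringent benchmark above can be quantified with hit counts: sample repeatedly, keep the minimum-energy states, and compare their empirical frequencies to the uniform distribution. A sketch using a deliberately biased toy sampler (the sampler, states, and bias are illustrative):

```python
from collections import Counter
import random

def sampling_bias(samples):
    """Ratio of most- to least-frequent solution among the sampled
    ground states; 1.0 means perfectly fair sampling."""
    counts = Counter(samples)
    return max(counts.values()) / min(counts.values())

# toy sampler over three degenerate "ground states" with a built-in bias
rng = random.Random(0)
biased = rng.choices(["A", "B", "C"], weights=[0.5, 0.3, 0.2], k=10_000)
fair = rng.choices(["A", "B", "C"], weights=[1, 1, 1], k=10_000)
print(sampling_bias(biased), sampling_bias(fair))
```

More refined diagnostics (e.g. a chi-square test against uniformity) follow the same recipe; the ratio is just the simplest summary.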

Page 65:

[Duplicate of the "What is fair sampling?" slide from page 64.]

Page 66:

[Duplicate of the "What is fair sampling?" slide from page 64, with the added annotation:]

current state of the art is PT+ICM

Page 67: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did

• 5-variable toy model suggests bias:

• What about quantum annealers?

• Design problems with known degeneracy:

• Study the distribution of ground states for fixed NGS.

fair sampling

Can transverse-field QA sample fairly? Matsuda, Nishimori, Katzgraber (NJP 2009)

Jij = +1Jij = �1

annealing time

P [p

roba

bilit

y of

sta

tes]

Ground-state statistics from annealing algorithms 3

The present paper is organized as follows: Section 2 describes the solution of a small

system by direct diagonalization and numerical integration of the Schrodinger equation.

Section 3 is devoted to the studies of larger degenerate systems via quantum Monte

Carlo simulations, followed by concluding remarks in section 4.

2. Schrodinger dynamics for a small system

It is instructive to first study a small-size system by a direct solution of the Schrodinger

equation, both in stationary and nonstationary contexts. The classical optimization

problem for this purpose is chosen to be a five-spin system with interactions as shown

in figure 1.

Figure 1. Five-spin toy model studied. Full lines denoteferromagnetic interactions (Jij = 1) while dashed lines standfor antiferromagnetic interactions (Jij = −1). Because of thegeometry of the problem the system has a degenerate groundstate by construction.

The Hamiltonian of this system is given by

H0 = −!

⟨ij⟩

Jijσzi σ

zj , (1)

where the sum is over all nearest-neighbour interactions Jij = ±1 and σzi denote Ising

spins parallel to the z-axis. The system has six degenerate ground states, three of which

are shown in figure 2. We apply a transverse field

H1 = −!

i

σxi (2)

|1⟩ |2⟩ |3⟩

Figure 2. Nontrivial degenerate ground states of the toy model shown in figure 1.Filled and open circles denote up and down spins, respectively. The other three groundstates |1⟩, |2⟩, and |3⟩ are obtained from |1⟩, |2⟩, and |3⟩ by reversing all spins.

|1i |2i |3i

Ground-state statistics from annealing algorithms 3

The present paper is organized as follows: Section 2 describes the solution of a small

system by direct diagonalization and numerical integration of the Schrodinger equation.

Section 3 is devoted to the studies of larger degenerate systems via quantum Monte

Carlo simulations, followed by concluding remarks in section 4.

2. Schrodinger dynamics for a small system

It is instructive to first study a small-size system by a direct solution of the Schrodinger

equation, both in stationary and nonstationary contexts. The classical optimization

problem for this purpose is chosen to be a five-spin system with interactions as shown

in figure 1.

Figure 1. Five-spin toy model studied. Full lines denoteferromagnetic interactions (Jij = 1) while dashed lines standfor antiferromagnetic interactions (Jij = −1). Because of thegeometry of the problem the system has a degenerate groundstate by construction.

The Hamiltonian of this system is given by

H0 = −!

⟨ij⟩

Jijσzi σ

zj , (1)

where the sum is over all nearest-neighbour interactions Jij = ±1 and σzi denote Ising

spins parallel to the z-axis. The system has six degenerate ground states, three of which

are shown in figure 2. We apply a transverse field

H1 = −!

i

σxi (2)

|1⟩ |2⟩ |3⟩

Figure 2. Nontrivial degenerate ground states of the toy model shown in figure 1.Filled and open circles denote up and down spins, respectively. The other three groundstates |1⟩, |2⟩, and |3⟩ are obtained from |1⟩, |2⟩, and |3⟩ by reversing all spins.

Ground-state statistics from annealing algorithms
H = Σ_{⟨ij⟩} J_ij S_i S_j ,   J_ij ∈ {±5, ±6, ±7}

N_GS = 3 · 2^k ∈ {6, 12, 24, 48, 96, …},   k ∈ ℕ

[Axes: log(hits) vs rank_GS]
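The N_GS = 3 · 2^k counting can be illustrated numerically: start from a cluster with a few degenerate minima and add spins that cost no energy in any minimum, each of which doubles the degeneracy. The sketch below uses completely decoupled extra spins as the crudest such device — the actual benchmark instances plant the degeneracy with couplings J_ij ∈ {±5, ±6, ±7}, which is not reproduced here.

```python
from itertools import product

def count_ground_states(n, bonds):
    """Number of minima of H = -sum_{<ij>} J_ij s_i s_j by exhaustive search."""
    energies = [
        -sum(j * s[i] * s[k] for i, k, j in bonds)
        for s in product([-1, 1], repeat=n)
    ]
    e0 = min(energies)
    return sum(e == e0 for e in energies)

# Antiferromagnetic triangle: 6 degenerate minima.  Each added spin whose
# couplings cancel in every minimum (here simply left uncoupled, as an
# illustration only) doubles N_GS: 6, 12, 24, 48 = 3 * 2^k for k = 1..4.
triangle = [(0, 1, -1), (1, 2, -1), (0, 2, -1)]
counts = [count_ground_states(3 + extra, triangle) for extra in range(4)]
print(counts)  # [6, 12, 24, 48]
```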

Page 68

Can transverse-field QA sample fairly? — Matsuda, Nishimori, Katzgraber (NJP 2009)

• 5-variable toy model suggests bias:
• What about quantum annealers?
• Design problems with known degeneracy:
• Study the distribution of ground states for fixed N_GS (fair sampling).

[Plot: P (probability of states) vs annealing time for the degenerate ground states |2⟩ and |3⟩; full/dashed lines denote J_ij = +1 / J_ij = −1]
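For contrast, a classical thermal sampler is fair by construction: at low temperature the Boltzmann distribution puts exactly equal weight on every degenerate minimum, since they share one energy. A quick sketch on the stand-in antiferromagnetic triangle (not the paper's five-spin instance):

```python
import math
from itertools import product

def boltzmann(n, bonds, T):
    """Boltzmann distribution of H = -sum_{<ij>} J_ij s_i s_j at temperature T."""
    weights = {}
    for s in product([-1, 1], repeat=n):
        e = -sum(j * s[i] * s[k] for i, k, j in bonds)
        weights[s] = math.exp(-e / T)
    Z = sum(weights.values())
    return {s: w / Z for s, w in weights.items()}

# Antiferromagnetic triangle: 6 degenerate minima at E = -1, excited at E = +3.
bonds = [(0, 1, -1), (1, 2, -1), (0, 2, -1)]
p = boltzmann(3, bonds, T=0.05)
# At this temperature essentially all weight sits, uniformly, on the 6 minima.
```

The question on the slide is whether a transverse-field quantum annealer reproduces this uniform distribution — the results that follow say it does not.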



Page 70

• Study the distribution of ground states for fixed N_GS → unfair.


Page 71

Transverse-field QA is exponentially biased

[Plot: number of hits (averaged over samples) vs rank_GS/(3 · 2^k), log scale 10^−3–10^2; sample data for N = 684, c = 9; k = 2 (N_GS = 12), k = 3 (N_GS = 24), k = 4 (N_GS = 48), k = 5 (N_GS = 96)]
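On instances small enough for exact diagonalization, the bias can be probed directly: build H(Γ) = H_0 − Γ Σ_i σ_i^x, take its ground state, and read off the weight on each classical minimum. A numpy sketch — note that on the fully symmetric stand-in triangle used here the weights come out exactly uniform; the exponential nonuniformity appears on asymmetric degenerate instances such as the paper's five-spin model.

```python
import numpy as np

def tfim_hamiltonian(n, bonds, gamma):
    """Dense 2^n x 2^n matrix of H = -sum J_ij sz_i sz_j - gamma * sum sx_i."""
    dim = 2 ** n
    H = np.zeros((dim, dim))
    spin = lambda b, i: 1 - 2 * ((b >> i) & 1)   # bit i of b -> +/-1
    for b in range(dim):
        H[b, b] = -sum(j * spin(b, i) * spin(b, k) for i, k, j in bonds)
        for i in range(n):
            H[b ^ (1 << i), b] += -gamma         # sx_i flips bit i
    return H

# Antiferromagnetic triangle: classical minima are the 6 mixed bit patterns.
bonds = [(0, 1, -1), (1, 2, -1), (0, 2, -1)]
H = tfim_hamiltonian(3, bonds, gamma=0.1)
w, v = np.linalg.eigh(H)         # eigenvalues ascending
p = v[:, 0] ** 2                 # ground-state weight per classical state
```

Replacing the bond list with a degenerate instance whose minima are not symmetry-equivalent makes `p` nonuniform over the minima, which is exactly the statistic plotted on this slide.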

Page 72

Standard QA will need tweaks for fair sampling.

Page 73

ℏ = 0 vs ℏ > 0 — 2 : 1


Page 77

ℏ = 0 vs ℏ > 0 — 1 : 3

Analog QA: look out for IARPA's QEO report on QA. However… soon superseded by digital?

Page 78

Quantum vs Classical Optimization: A status update on the arms race

• Classical optimization pushes quantum technology.
• Quantum developments fuel classical, quantum-inspired methods.
• ML could benefit from quantum samplers… if these can sample fairly.
• To date, quantum annealing has shown no application speedup or better scaling.

[email protected]

Page 79

Thank you.
