Helmut G. Katzgraber - microsoft.com
Helmut G. Katzgraber
https://intractable.lol
Quantum vs Classical Optimization: A status update on the arms race
ħ = 0 vs. ħ > 0
Outline
• Some questions we would like answers to…
  • Current status of quantum vs classical optimization?
  • What about quantum approaches for machine learning?
  • If QA fails to deliver, can we still benefit? Think quantum inspired…
• Texas A&M team: C. Fang, Dr. W. Wang, H. Munoz-B., J. Chancellor, Dr. Z. Zhu, A. Barzegar, A. Ochoa, C. Pattison (missing)
• as well as… S. Mandrà @ [logo], F. Hamze @ [logo], C. Thomas @ [logo]
Why quantum annealing? Optimization!
• Selected problems of interest:
  • Constraint satisfaction (SAT)
  • Number partitioning
  • Minimum vertex covers
  • Traveling salesman problem, …
• What do all these have in common?
  • Rough cost function landscapes.
  • They are problems in NP (and typically hard).
  • All map onto Quadratic Unconstrained Binary Optimization (QUBO) problems:
    $$\min_{S_i} H(S_i), \qquad H(S_i) = \sum_{i \neq j}^{N} Q_{ij} S_i S_j, \qquad S_i \in \{\pm 1\}$$
[Figure labels: "(x11 OR x12) AND (x21 OR x22) ...", "min vertex cover", "NPP"; diagram of complexity classes: "P Problems", "NP-complete", "NP Problems".]
Good QUBO solvers & fast architectures needed!
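As a rough illustration of the mapping claimed on this slide, the sketch below casts number partitioning into the cost function H = Σ_{i≠j} Q_ij S_i S_j with S_i = ±1 and minimizes it by exhaustive search. The choice Q_ij = a_i a_j follows from expanding (Σ_i a_i S_i)², which is a standard textbook construction rather than something spelled out in the talk. The function names and the example numbers are hypothetical, chosen only for illustration.

```python
from itertools import product


def npp_couplings(numbers):
    """Couplings Q_ij = a_i * a_j (i != j) for number partitioning.

    Assumption (not from the slides): we use the identity
    (sum_i a_i S_i)^2 = sum_i a_i^2 + sum_{i != j} a_i a_j S_i S_j,
    so minimizing H also minimizes the partition residue.
    """
    n = len(numbers)
    return [[numbers[i] * numbers[j] if i != j else 0 for j in range(n)]
            for i in range(n)]


def energy(Q, spins):
    """H(S) = sum over i != j of Q_ij * S_i * S_j, with S_i in {-1, +1}."""
    n = len(spins)
    return sum(Q[i][j] * spins[i] * spins[j]
               for i in range(n) for j in range(n) if i != j)


def brute_force_minimum(Q):
    """Try all 2^N spin configurations; only feasible for small N."""
    n = len(Q)
    best = min(product((-1, 1), repeat=n), key=lambda s: energy(Q, s))
    return best, energy(Q, best)


# Hypothetical example instance.
numbers = [4, 5, 6, 7, 8]
Q = npp_couplings(numbers)
spins, e_min = brute_force_minimum(Q)

# Constant dropped from H, and the resulting difference between the two subsets.
offset = sum(a * a for a in numbers)
residue = abs(sum(a * s for a, s in zip(numbers, spins)))
print("spins:", spins, "energy:", e_min, "residue:", residue)
```

The exhaustive search visits 2^N configurations, which is exactly why the slide asks for good QUBO solvers and fast architectures: this approach stops being practical after a few dozen variables.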
Moore’s Law is coming to an end…
• Four possible ways to overcome the end of Moore’s law:
  • Build larger silicon-based computers. (not scalable)
  • Develop faster silicon-based technologies. (already close to fab limits)
  • Focus on faster algorithms.
  • Go beyond standard silicon architectures.
• Here, deep synergy between…
  • Physics, …
  • … quantum information, …
  • … and computer science.
“The road map was an incredibly interesting experiment,” says Flamm. “So far as I know, there is no example of anything like this in any other industry, where every manufacturer and supplier gets together and figures out what they are going to do.” In effect, it converted Moore’s law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did.
And it all worked beautifully, says Flamm — right up until it didn’t.
HEAT DEATH
The first stumbling block was not unexpected. Gargini and others had warned about it as far back as 1989. But it hit hard nonetheless: things got too small.
“It used to be that whenever we would scale to smaller feature size, good things happened automatically,” says Bill Bottoms, president of Third Millennium Test Solutions, an equipment manufacturer in Santa Clara. “The chips would go faster and consume less power.”
But in the early 2000s, when the features began to shrink below about 90 nanometres, that automatic benefit began to fail. As electrons had to move faster and faster through silicon circuits that were smaller and smaller, the chips began to get too hot.
That was a fundamental problem. Heat is hard to get rid of, and no one wants to buy a mobile phone that burns their hand. So manufacturers seized on the only solutions they had, says Gargini. First, they stopped trying to increase ‘clock rates’ (how fast microprocessors execute instructions). This effectively put a speed limit on the chip’s electrons and limited their ability to generate heat. The maximum clock rate hasn’t budged since 2004.
Second, to keep the chips moving along the Moore’s law performance curve despite the speed limit, they redesigned the internal circuitry so that each chip contained not one processor, or ‘core’, but two, four or more. (Four and eight are common in today’s desktop computers and smartphones.) In principle, says Gargini, “you can have the same output with four cores going at 250 megahertz as one going at 1 gigahertz”. In practice, exploiting eight processors means that a problem has to be broken down into eight pieces, which for many algorithms is difficult to impossible. “The piece that can’t be parallelized will limit your improvement,” says Gargini.
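The remark that the piece that cannot be parallelized limits the improvement is commonly formalized as Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n the number of cores. The clipping does not name this law; the sketch below is added only as an illustration, and the function name and the sample fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction`
    of the work can be split up (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Illustrative fractions only; real workloads have to be measured.
for p in (1.0, 0.9, 0.5):
    speedups = [round(amdahl_speedup(p, n), 2) for n in (1, 4, 8)]
    print(f"parallel fraction {p}: speedup on 1, 4, 8 cores = {speedups}")
```

With p = 1 the speedup on four cores is exactly 4, which is the "four cores at 250 megahertz versus one at 1 gigahertz" case from the quote. For any p below 1 the speedup stays under 1 / (1 - p), no matter how many cores are added.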
Even so, when combined with creative redesigns to compensate for electron leakage and other effects, these two solutions have enabled chip manufacturers to continue shrinking their circuits and keeping their transistor counts on track with Moore’s law. The question now is what will happen in the early 2020s, when continued scaling is no longer possible with silicon because quantum effects have come into play. What comes next? “We’re still struggling,” says An Chen, an electrical engineer who works for the international chipmaker GlobalFoundries in Santa Clara, California, and who chairs a committee of the new road map that is looking into the question.
That is not for a lack of ideas. One possibility is to embrace a completely new paradigm: something like quantum computing, which promises exponential speed-up for certain calculations, or neuromorphic computing, which aims to model processing elements on neurons in the brain. But none of these alternative paradigms has made it very far out of the laboratory. And many researchers think that quantum computing will offer advantages only for niche applications, rather than for the everyday tasks at which digital computing excels. “What does it mean to quantum-balance a chequebook?” wonders John Shalf, head of computer-science research at the Lawrence Berkeley National Laboratory in Berkeley, California.
MATERIAL DIFFERENCES
A different approach, which does stay in the digital realm, is the quest to find a ‘millivolt switch’: a material that could be used for devices at least as fast as their silicon counterparts, but that would generate much less heat. There are many candidates, ranging from 2D graphene-like compounds to spintronic materials that would compute by flipping electron spins rather than by moving electrons. “There is an enormous research space to be explored once you step outside the confines of the established technology,” says Thomas Theis, a physicist who directs the nanoelectronics initiative at the Semiconductor Research Corporation (SRC), a research-funding consortium in Durham, North Carolina.
Unfortunately, no millivolt switch has made it out of the laboratory either. That leaves the architectural approach: stick with silicon, but configure it in entirely new ways. One popular option is to go 3D. Instead of etching flat circuits onto the surface of a silicon wafer, build skyscrapers: stack many thin layers of silicon with microcircuitry etched into each. In principle, this should make it possible to pack more computational power into the same space. In practice, however, this currently works only with memory chips, which do not have a heat problem: they use circuits that consume power only when a memory cell is accessed, which is not that often. One example is the Hybrid Memory Cube design, a stack of as many as eight memory layers that is being pursued by an industry consortium originally
[Figure: top panel, transistors per chip and clock speeds (MHz) versus year, 1960 to 2016, on a logarithmic axis from 10^-2 to 10^10. Bottom panel, computer size (mm³) versus year, 1950 to 2020, on a logarithmic axis from 0.1 to 10^13, for mainframe, minicomputer, personal computer, laptop, smartphone and embedded processors. Source: top, Intel; bottom, SIA/SRC.]
MOORE’S LORE
For the past five decades, the number of transistors per microprocessor chip, a rough measure of processing power, has doubled about every two years, in step with Moore’s law (top). Chips also increased their ‘clock speed’, or rate of executing instructions, until 2004, when speeds were capped to limit heat. As computers increase in power and shrink in size, a new class of machines has emerged roughly every ten years (bottom).
146 | NATURE | VOL 530 | 11 FEBRUARY 2016, NEWS FEATURE. © 2016 Macmillan Publishers Limited. All rights reserved.
[Slide overlay: transistor count and clock speed (MHz) versus year, 1970 to 2010, axis from 10^-2 to 10^8; G. Moore (1965); adapted from Nature (2016).]
![Page 16: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/16.jpg)
Moore’s Law is coming to an end…
• Four possible ways to overcome the end of Moore’s law:
• Build larger silicon-based computers.
• Develop faster silicon-based technologies.
• Focus on faster algorithms.
• Go beyond standard silicon architectures.
• Here, deep synergy between…
• Physics,…
• …quantum information, … … and computer science.
not scalable
already close to fab limits
potentially disruptive
“The road map was an incredibly interesting experiment,” says Flamm. “So far as I know, there is no example of anything like this in any other industry, where every manufacturer and supplier gets together and figures out what they are going to do.” In effect, it converted Moore’s law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did.
And it all worked beautifully, says Flamm — right up until it didn’t.
HEAT DEATH
The first stumbling block was not unexpected. Gargini and others had warned about it as far back as 1989. But it hit hard nonetheless: things got too small.
“It used to be that whenever we would scale to smaller feature size, good things happened automatically,” says Bill Bottoms, president of Third Millennium Test Solutions, an equipment manufacturer in Santa Clara. “The chips would go faster and consume less power.”
But in the early 2000s, when the features began to shrink below about 90 nanometres, that automatic benefit began to fail. As electrons had to move faster and faster through silicon circuits that were smaller and smaller, the chips began to get too hot.
That was a fundamental problem. Heat is hard to get rid of, and no one wants to buy a mobile phone that burns their hand. So manufacturers seized on the only solutions they had, says Gargini. First, they stopped trying to increase ‘clock rates’, that is, how fast microprocessors execute instructions. This effectively put a speed limit on the chip’s electrons and limited their ability to generate heat. The maximum clock rate hasn’t budged since 2004.
Second, to keep the chips moving along the Moore’s law performance curve despite the speed limit, they redesigned the internal circuitry so that each chip contained not one processor, or ‘core’, but two, four or more. (Four and eight are common in today’s desktop computers and smartphones.) In principle, says Gargini, “you can have the same output with four cores going at 250 megahertz as one going at 1 gigahertz”. In practice, exploiting eight processors means that a problem has to be broken down into eight pieces, which for many algorithms is difficult to impossible. “The piece that can’t be parallelized will limit your improvement,” says Gargini.
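Gargini's remark that the non-parallelizable piece limits the improvement is the content of Amdahl's law. The following is a minimal sketch of that arithmetic, not something taken from the article: the function names and the sample parallel fractions are assumptions chosen for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def effective_rate_mhz(clock_mhz: float, parallel_fraction: float, cores: int) -> float:
    """Single-core-equivalent clock rate of `cores` cores running at `clock_mhz`."""
    return clock_mhz * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions; 1.0 reproduces the "four cores at
    # 250 MHz equal one core at 1 GHz" case quoted in the text.
    for fraction in (1.0, 0.9, 0.5):
        print(fraction, effective_rate_mhz(250.0, fraction, 4))
```

With a fully parallel workload the four 250 MHz cores match one 1 GHz core; as soon as part of the work is serial, the equivalent rate drops, and adding cores cannot push the speedup past 1 / (serial fraction).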
![Page 17: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/17.jpg)
![Page 18: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/18.jpg)
Current state of the art: Special-purpose analog quantum annealers
Antikythera ~ 80BC
![Page 19: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/19.jpg)
![Page 20: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/20.jpg)
• What is it?
• Semi-programmable analog annealer.
• 2000 superconducting flux qubits.
• Controversial performance.
• Still, huge technological feat…
• What can it do?
• It can minimize QUBOs (quadratic unconstrained binary optimization problems) once they are embedded onto the machine’s hardwired Chimera topology.
• Limitations:
• Low connectivity.
• Analog noise.
• …
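As a minimal sketch of what "minimizing a QUBO" means, the block below evaluates a quadratic cost over binary variables and finds its minimum by exhaustive enumeration. The matrix `Q_EXAMPLE` is a made-up 3-variable instance, and brute force stands in for the annealer here; the sketch ignores the embedding onto the hardware graph entirely.

```python
from itertools import product


def qubo_energy(Q, x):
    """Return sum_{i,j} Q[i][j] * x[i] * x[j] for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_minimum(Q):
    """Enumerate all 2^n binary vectors; feasible only for small n."""
    n = len(Q)
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
    return best, qubo_energy(Q, best)


# Hypothetical instance (upper-triangular convention): each variable is
# rewarded for being 1, but neighbouring variables are penalized for
# being 1 together.
Q_EXAMPLE = [
    [-1.0, 2.0, 0.0],
    [0.0, -1.0, 2.0],
    [0.0, 0.0, -1.0],
]

if __name__ == "__main__":
    print(brute_force_minimum(Q_EXAMPLE))
```

The enumeration cost grows as 2^n, which is why heuristics such as the annealing methods on the following slides are used for anything beyond toy sizes.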
![Page 21: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/21.jpg)
![Page 22: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/22.jpg)
K4,4 unit cell (complete bipartite graph) of the Chimera topology
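The "low connectivity" limitation can be made concrete by building the graph. The sketch below assumes the usual description of Chimera: a grid of K4,4 unit cells, where the four qubits on one side of each cell couple to their counterparts in the vertically adjacent cell and the four on the other side couple horizontally. The labelling scheme and function names are choices made for this sketch, not the vendor's API.

```python
def chimera_edges(rows: int, cols: int, shore: int = 4):
    """Edge set of a Chimera graph made of rows x cols K_{shore,shore} cells.

    Qubit label: (row, col, side, k). Side 0 holds the "vertical" qubits and
    side 1 the "horizontal" ones (a labelling chosen for this sketch).
    """
    edges = set()
    for r in range(rows):
        for c in range(cols):
            # Intra-cell couplers: complete bipartite graph between the sides.
            for a in range(shore):
                for b in range(shore):
                    edges.add(((r, c, 0, a), (r, c, 1, b)))
            # Inter-cell couplers: side-0 qubits link to the cell below,
            # side-1 qubits link to the cell to the right.
            for k in range(shore):
                if r + 1 < rows:
                    edges.add(((r, c, 0, k), (r + 1, c, 0, k)))
                if c + 1 < cols:
                    edges.add(((r, c, 1, k), (r, c + 1, 1, k)))
    return edges


def max_degree(edges):
    """Largest number of couplers attached to any single qubit."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return max(degree.values())


if __name__ == "__main__":
    grid = chimera_edges(16, 16)
    qubits = {q for edge in grid for q in edge}
    print(len(qubits), len(grid), max_degree(grid))
```

Under these assumptions a 16 × 16 grid has 2048 qubits, in line with the roughly 2000 qubits quoted on the slide, yet no qubit couples to more than six others. A problem whose variables interact densely therefore has to be embedded by chaining several physical qubits per logical variable.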
![Page 23: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/23.jpg)
How do quantum annealers optimize?
![Page 24: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/24.jpg)
Sequentially.
![Page 25: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/25.jpg)
Classical Analog: Simulated Annealing (SA)
• Annealing:
• 7000-year-old Neolithic technology.
• Slowly cool to remove imperfections.
• Simulated Annealing (SA):
• Stochastically sample using Monte Carlo.
• If the system is thermalized, cool it.
• The slower the cooling, the better, e.g., $T(t) = a - bt$.
• Problem: SA is inefficient for complex systems.
• Solution: Multiple restarts & statistics gathering.
[Figures: German copper axe; energy landscape $H(\{S\})$.]
Kirkpatrick et al., Science (83); Geman & Geman
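The recipe on this slide (Metropolis Monte Carlo sampling, the linear schedule T(t) = a − bt, and multiple restarts with statistics) can be sketched as follows. This is an illustrative toy, not the implementation benchmarked in the talk: the cost function is taken to be an Ising energy with upper-triangular couplings, and the schedule parameters, the restart count and the ferromagnetic instance `J_FERRO` are assumptions picked for the example.

```python
import math
import random


def energy(J, spins):
    """H({S}) = sum_{i<j} J[i][j] * S_i * S_j for spins S_i = +/-1."""
    n = len(spins)
    return sum(J[i][j] * spins[i] * spins[j]
               for i in range(n) for j in range(i + 1, n))


def anneal_once(J, a, b, steps, rng):
    """One SA run with the linear schedule T(t) = a - b*t (kept positive)."""
    n = len(J)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    current = energy(J, spins)
    best_e, best_spins = current, spins[:]
    for t in range(steps):
        temperature = max(a - b * t, 1e-9)
        for _ in range(n):  # one Monte Carlo sweep at this temperature
            i = rng.randrange(n)
            # Field on spin i from all other spins (J is upper triangular).
            local = sum((J[i][j] if i < j else J[j][i]) * spins[j]
                        for j in range(n) if j != i)
            delta = -2.0 * spins[i] * local  # energy change if spin i flips
            # Metropolis rule: accept downhill moves, uphill with exp(-dE/T).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                spins[i] = -spins[i]
                current += delta
        if current < best_e:
            best_e, best_spins = current, spins[:]
    return best_e, best_spins


def anneal_with_restarts(J, restarts=20, a=2.0, b=0.02, steps=100, seed=0):
    """Repeat SA from random starts; report the best run and its hit rate."""
    rng = random.Random(seed)
    results = [anneal_once(J, a, b, steps, rng) for _ in range(restarts)]
    best_e, best_spins = min(results, key=lambda r: r[0])
    hits = sum(1 for e, _ in results if abs(e - best_e) < 1e-12)
    return best_e, best_spins, hits / restarts


# Hypothetical 5-spin ferromagnet: every pair prefers to align.
J_FERRO = [[-1.0 if i < j else 0.0 for j in range(5)] for i in range(5)]

if __name__ == "__main__":
    print(anneal_with_restarts(J_FERRO))
```

The hit rate returned by `anneal_with_restarts` is the kind of statistic the last bullet refers to: on hard instances a single run rarely reaches the lowest energy, so the fraction of successful restarts is what gets reported.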
![Page 26: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/26.jpg)
![Page 27: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/27.jpg)
![Page 28: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/28.jpg)
![Page 29: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/29.jpg)
Quantum Annealing (QA)
• Idea:
• Use quantum fluctuations instead of thermal.
• Sequential algorithm like SA.
• Theoretical advantages over SA:
• Fluctuations determine the “tunneling radius.”
• Not limited to a local search.
• Implementation in DW device (transverse-field QA):
• Apply a transverse field that does not commute, $[S^x, S^z] \neq 0$:
Classical cost function: $H(S_i) = \sum_{i \neq j}^{N} Q_{ij} S_i S_j$
With transverse field: $H(S_i) = \sum_{i \neq j}^{N} Q_{ij} S^z_i S^z_j - D \sum_{i}^{N} S^x_i$
• Reduce the fluctuation amplitude D via a given annealing protocol.
Kadowaki & Nishimori (98); Farhi et al. (00); Morita & Nishimori (06)
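To make the transverse-field Hamiltonian on this slide concrete, the sketch below builds its matrix in the S^z basis for a few spins and checks that S^x and S^z do not commute. It assumes S^x and S^z are represented by Pauli matrices (eigenvalues ±1) and uses plain nested lists; the helper names and the 2-spin couplings in the usage lines are illustrative assumptions.

```python
def matmul(A, B):
    """Product of two small dense matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


SX = [[0, 1], [1, 0]]   # Pauli x: flips a spin
SZ = [[1, 0], [0, -1]]  # Pauli z: diagonal, eigenvalues +1 and -1


def commutator(A, B):
    """[A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(len(AB))]
            for i in range(len(AB))]


def spin(state, i):
    """Eigenvalue of S^z_i (+1 or -1) in the basis state with bit pattern `state`."""
    return 1 - 2 * ((state >> i) & 1)


def transverse_field_hamiltonian(Q, D):
    """Matrix of H = sum_{i != j} Q_ij S^z_i S^z_j - D sum_i S^x_i."""
    n = len(Q)
    dim = 1 << n
    H = [[0.0] * dim for _ in range(dim)]
    for s in range(dim):
        # Diagonal entry: classical cost of the spin configuration s.
        H[s][s] = sum(Q[i][j] * spin(s, i) * spin(s, j)
                      for i in range(n) for j in range(n) if i != j)
        # Off-diagonal entries: S^x_i flips spin i with amplitude -D.
        for i in range(n):
            H[s ^ (1 << i)][s] += -D
    return H


if __name__ == "__main__":
    print(commutator(SX, SZ))
    for row in transverse_field_hamiltonian([[0.0, 1.0], [1.0, 0.0]], 0.5):
        print(row)
```

At D = 0 the matrix is diagonal and its smallest entry is the classical optimum; for D > 0 the off-diagonal terms connect configurations that differ by one spin flip, which is how the quantum fluctuations let the system move between classical states as D is reduced.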
![Page 30: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/30.jpg)
![Page 31: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/31.jpg)
Promising signs of quantum speedup…?
see Mandrà, Zhu, Perdomo-O. & Katzgraber (PRA, arXiv:1604.01746)
Denchev et al. (Dec. 2015)
![Page 32: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/32.jpg)
[Figure: annealing time versus problem size (bits), 100 to 1000, with sizes 180, 296, 489, 681 and 945 marked. Left axis: QMC and SA single-core annealing time (µs); right axis: D-Wave annealing time (µs); both span 10^2 to 10^14. Curves: QMC, SA and D-Wave at the 50th, 75th and 85th percentiles.]
FIG. 4. Time to find the optimal solution with 99% probability for different problem sizes. We compare Simulated Annealing (SA), Quantum Monte Carlo (QMC) and the D-Wave 2X. To assign a runtime for the classical algorithms we take the number of spin updates (for SA) or worldline updates (for QMC) that are required to reach a 99% success probability and multiply that with the time to perform one update on a single state-of-the-art core. Shown are the 50th, 75th and 85th percentiles over a set of 100 instances. It occupied millions of processor cores for several days to tune and run the classical algorithms for these benchmarks. The runtimes for the higher quantiles for the largest problem size for QMC were not computed due to the high computational cost. For a similar comparison with QMC with different parameters please see Fig. 13
tonian is
$$H_{cl} = -\sum_{\tau=1}^{M} \left( \sum_{jk} \frac{J_{jk}}{M} \sigma_j(\tau) \sigma_k(\tau) + J_\perp(s) \sum_{j} \sigma_j(\tau) \sigma_j(\tau + 1) \right), \qquad (9)$$
where $\sigma_j(\tau) = \pm 1$ are classical spins, j and k are site indices, $\tau$ is a replica index, and M is the number of replicas. The coupling between replicas is given by
$$J_\perp(s) = -\frac{1}{2\beta} \ln \tanh \frac{A(s)\beta}{M}, \qquad (10)$$
where $\beta$ is the inverse temperature. The configurations for a given spin j across all replicas $\tau$ is called the worldline of spin j. Periodic boundary conditions are imposed between $\sigma_j(M)$ and $\sigma_j(1)$. We used continuous path integral QMC, which corresponds to the limit $\Delta\tau \to 0$ [46], and, unlike discrete path integral QMC, does not suffer from discretization errors of order 1/M.
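Eqs. (9) and (10) can be evaluated directly for a finite number of replicas. The sketch below does that for a discrete-replica configuration with periodic boundary conditions in τ; note that the excerpt states the authors used the continuous limit, so this discrete version only illustrates the formulas. The data layout (`sigma[tau][j]`, a dict of couplings) and the numerical values in the usage lines are assumptions.

```python
import math


def j_perp(a_s: float, beta: float, m: int) -> float:
    """Eq. (10): J_perp(s) = -(1 / (2 beta)) * ln tanh(A(s) * beta / M)."""
    return -math.log(math.tanh(a_s * beta / m)) / (2.0 * beta)


def h_cl(J, sigma, a_s: float, beta: float) -> float:
    """Eq. (9) for replicas sigma[tau][j] = +/-1, periodic in tau.

    J maps site pairs (j, k) to couplings J_jk.
    """
    m = len(sigma)
    jp = j_perp(a_s, beta, m)
    total = 0.0
    for tau in range(m):
        nxt = (tau + 1) % m  # replica M + 1 is identified with replica 1
        intra = sum(Jjk / m * sigma[tau][j] * sigma[tau][k]
                    for (j, k), Jjk in J.items())
        inter = jp * sum(sigma[tau][j] * sigma[nxt][j]
                         for j in range(len(sigma[tau])))
        total += intra + inter
    return -total


if __name__ == "__main__":
    couplings = {(0, 1): 1.0}   # hypothetical two-site problem
    replicas = [[1, 1], [1, 1]]  # M = 2, all spins up
    print(j_perp(1.0, 1.0, 2), h_cl(couplings, replicas, 1.0, 1.0))
```

A large transverse-field amplitude A(s) drives tanh towards 1 and J_perp towards 0, so the replicas decouple; a small A(s) makes J_perp large and locks each worldline into a single classical value.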
We numerically compute the number of sweeps $n_{sweeps}$ required for QMC to find the ground state with 99% probability at different quantiles. In our case, a sweep corresponds to two update attempts for each worldline. The computational effort is $n_{sweeps} \times N \times T_{worldline}$, where N is the number of qubits and $T_{worldline}$ is the time to update a worldline. We average $T_{worldline}$ over all the steps in the quantum annealing schedule; however the value of $T_{worldline}$ depends on the particular schedule chosen. As explained above for SA, we report the total computational effort of QMC in standard units of time per single core. For the annealing schedule used in the current D-Wave 2X processor, we find
$$T_{worldline} = \beta \times 870\ \mathrm{ns} \qquad (11)$$
using an Intel(R) Xeon(R) CPU E5-1650 @ 3.20GHz.
This study is designed to explore the utility of QMC as a classical optimization routine. Accordingly, we optimize QMC by running at a low temperature, 4.8 mK. We also observe that QMC with open boundary conditions (OBC) performs better than standard QMC with periodic boundary conditions in this case [38]; therefore, OBC is used in this comparison. We further optimize the number of sweeps per run which, for a given quantile, results in the lowest total computational effort. We find that the optimal number of sweeps is $10^6$ at the largest problem size. This enhances the ability of QMC to simulate quantum tunneling, and gives a very high probability of success per run in the median case, $p_{success} = 0.16$.
All the qubits in a cluster have approximately the same orientation in each local minima of the effective mean field potential. Neighboring local minima typically correspond to different orientations of a single cluster. Here, tunneling time is dominated by a single purely imaginary instanton and is described by Eq. (35) below. It was recently demonstrated that, in this situation, the exponent $a_{min}/\hbar$ for physical tunneling is identical to that of QMC [38]. As seen in Fig. 4, we do not find a substantial difference in the scaling of QMC and D-Wave (QA). However, we find a very substantial computational overhead associated with the prefactor B in the expression $T = B e^{D a_{min}/\hbar}$ for the runtime. In other words, $B_{QMC}$ can exceed $B_{QA}$ by many orders of magnitude. The role of the prefactor becomes essential in situations where the number of cotunneling qubits D is finite, i.e., is independent of the problem size N (or depends on N very weakly). Between some quantiles and system sizes we observe a prefactor advantage as high as $10^8$.
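The runtime bookkeeping in this excerpt can be sketched numerically. Two assumptions go beyond the text: the number of repeated runs needed for a 99% overall success probability is computed with the standard independent-runs formula ln(1 − 0.99) / ln(1 − p), which may or may not be exactly what the authors used, and the factor β in Eq. (11) is set to 1 as a placeholder because its value in the relevant units is not given here.

```python
import math


def repetitions_for_target(p_success: float, target: float = 0.99) -> float:
    """Independent runs needed so at least one succeeds with probability `target`."""
    if not 0.0 < p_success < 1.0:
        raise ValueError("p_success must lie strictly between 0 and 1")
    return math.log(1.0 - target) / math.log(1.0 - p_success)


def qmc_effort_seconds(n_sweeps: float, n_qubits: int, t_worldline_ns: float) -> float:
    """Single-core effort n_sweeps * N * T_worldline, converted to seconds."""
    return n_sweeps * n_qubits * t_worldline_ns * 1e-9


if __name__ == "__main__":
    # Values quoted in the excerpt: 10^6 sweeps, p_success = 0.16, and the
    # largest problem size on the figure axis (945 bits).
    # Placeholder: beta = 1, so T_worldline = 870 ns.
    per_run = qmc_effort_seconds(1e6, 945, 870.0)
    runs = repetitions_for_target(0.16)
    print(per_run, runs, per_run * runs)
```

Under these assumptions a single run costs on the order of hundreds of core-seconds and about 26 runs are needed, which illustrates why the prefactor, rather than the scaling exponent, dominates the comparison.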
C. D-Wave versus other Classical Solvers
Based on the results presented here, one cannot claim a quantum speedup for D-Wave 2X, as this would require that the quantum processor in question outperforms the best known classical algorithm. This is not the case for the weak-strong cluster networks. This is because a variety of heuristic classical algorithms can solve most instances of Chimera structured problems much faster than SA, QMC, and the D-Wave 2X [47–49] (for a possible
Google’s “10^8 results”: slope vs offset
[Plot axes: N (problem size); TTS in µs; DW2 annealing time in µs.]
Denchev et al. (15)
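The "slope vs offset" distinction can be illustrated with a log-linear fit: on a semi-logarithmic plot, the slope carries the scaling exponent and the offset carries the prefactor. The data below are synthetic and made up for this sketch (two solvers with identical exponent and prefactors differing by 10^8); they are not the measured runtimes, and only the problem sizes are borrowed from the figure axis.

```python
import math


def fit_log_linear(sizes, times):
    """Least-squares fit of log10(T) = offset + slope * N; returns (slope, offset)."""
    ys = [math.log10(t) for t in times]
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic, made-up runtimes in microseconds: same exponent, but solver A
# carries a prefactor 10^8 times larger than solver B.
SIZES = [180, 296, 489, 681, 945]
SOLVER_A = [1e10 * 10 ** (0.004 * n) for n in SIZES]
SOLVER_B = [1e2 * 10 ** (0.004 * n) for n in SIZES]

if __name__ == "__main__":
    print(fit_log_linear(SIZES, SOLVER_A))
    print(fit_log_linear(SIZES, SOLVER_B))
```

Both fits return the same slope, so neither solver scales better than the other; the eight-orders-of-magnitude gap lives entirely in the offset. A constant-factor advantage of this kind does not grow with problem size, which is the distinction the slide title draws.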
![Page 33: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/33.jpg)
5
100 200 300 400 500 600 700 800 900 1000Problem size (bits)
102
104
106
108
1010
1012
1014Q
MC
and
SA
sing
le-c
ore
anne
alin
gtim
e(µ
s)180 296 489 681 945
85th75th50th
102
104
106
108
1010
1012
1014
D-W
ave
anne
alin
gtim
e(µ
s)
QMC
SA
D-Wave
FIG. 4. Time to find the optimal solution with 99% proba-bility for di↵erent problem sizes. We compare Simulated An-nealing (SA), Quantum Monte Carlo (QMC) and the D-Wave2X. To assign a runtime for the classical algorithms we takethe number of spin updates (for SA) or worldline updates (forQMC) that are required to reach a 99% success probabilityand multiply that with the time to perform one update ona single state-of-the-art core. Shown are the 50th, 75th and85th percentiles over a set of 100 instances. It occupied mil-lions of processor cores for several days to tune and run theclassical algorithms for these benchmarks. The runtimes forthe higher quantiles for the largest problem size for QMC werenot computed due to the high computational cost. For a sim-ilar comparison with QMC with di↵erent parameters pleasesee Fig. 13
tonian is
Hcl
= �MX⌧=1
0@Xjk
Jjk
M�j
(⌧)�k
(⌧)
+J?(s)Xj
�j
(⌧)�j
(⌧ + 1)
1A , (9)
where �j
(⌧) = ±1 are classical spins, j and k are siteindices, ⌧ is a replica index, and M is the number ofreplicas. The coupling between replicas is given by
J?(s) = � 1
2�ln tanh
A(s)�
M, (10)
where � is the inverse temperature. The configurationsfor a given spin j across all replicas ⌧ is called the world-line of spin j. Periodic boundary conditions are imposedbetween �
j
(M) and �j
(1). We used continuous path in-tegral QMC, which corresponds to the limit �⌧ ! 0 [46],and, unlike discrete path integral QMC, does not su↵erfrom discretization errors of order 1/M .
We numerically compute the number of sweeps $n_{\mathrm{sweeps}}$ required for QMC to find the ground state with 99% probability at different quantiles. In our case, a sweep corresponds to two update attempts for each worldline. The computational effort is $n_{\mathrm{sweeps}} \times N \times T_{\mathrm{worldline}}$, where $N$ is the number of qubits and $T_{\mathrm{worldline}}$ is the time to update a worldline. We average $T_{\mathrm{worldline}}$ over all the steps in the quantum annealing schedule; however, the value of $T_{\mathrm{worldline}}$ depends on the particular schedule chosen. As explained above for SA, we report the total computational effort of QMC in standard units of time per single core. For the annealing schedule used in the current D-Wave 2X processor, we find

$$T_{\mathrm{worldline}} = \beta \times 870\ \mathrm{ns} \qquad (11)$$

using an Intel(R) Xeon(R) CPU E5-1650 @ 3.20GHz.

This study is designed to explore the utility of QMC as a classical optimization routine. Accordingly, we optimize QMC by running at a low temperature, 4.8 mK. We also observe that QMC with open boundary conditions (OBC) performs better than standard QMC with periodic boundary conditions in this case [38]; therefore, OBC is used in this comparison. We further optimize the number of sweeps per run which, for a given quantile, results in the lowest total computational effort. We find that the optimal number of sweeps is $10^6$ at the largest problem size. This enhances the ability of QMC to simulate quantum tunneling, and gives a very high probability of success per run in the median case, $p_{\mathrm{success}} = 0.16$.

All the qubits in a cluster have approximately the same orientation in each local minimum of the effective mean field potential. Neighboring local minima typically correspond to different orientations of a single cluster. Here, tunneling time is dominated by a single purely imaginary instanton and is described by Eq. (35) below. It was recently demonstrated that, in this situation, the exponent $a_{\min}/\hbar$ for physical tunneling is identical to that of QMC [38]. As seen in Fig. 4, we do not find a substantial difference in the scaling of QMC and D-Wave (QA). However, we find a very substantial computational overhead associated with the prefactor $B$ in the expression $T = B e^{D a_{\min}/\hbar}$ for the runtime. In other words, $B_{\mathrm{QMC}}$ can exceed $B_{\mathrm{QA}}$ by many orders of magnitude. The role of the prefactor becomes essential in situations where the number of cotunneling qubits $D$ is finite, i.e., is independent of the problem size $N$ (or depends on $N$ very weakly). Between some quantiles and system sizes we observe a prefactor advantage as high as $10^8$.
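The effort estimate $n_{\mathrm{sweeps}} \times N \times T_{\mathrm{worldline}}$ can be sketched as follows. The repetition count uses the usual time-to-solution convention $R = \lceil \ln(1 - 0.99) / \ln(1 - p_{\mathrm{success}}) \rceil$. That convention is an assumption here: the excerpt does not spell out how the 99% target is reached. The function names are hypothetical, and since the excerpt does not give $\beta$ in the units needed for Eq. (11), the worldline time in the usage line is a placeholder.

```python
import math


def repetitions(p_success, target=0.99):
    """Independent runs needed so that at least one succeeds with probability `target`."""
    if p_success >= target:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_success))


def qmc_effort_seconds(n_sweeps, n_qubits, t_worldline_s, p_success, target=0.99):
    """Single-core effort: repetitions x sweeps per run x N x T_worldline."""
    return repetitions(p_success, target) * n_sweeps * n_qubits * t_worldline_s


# Median case quoted in the text: 10^6 sweeps per run, p_success = 0.16, N = 945.
# 870e-9 s is a placeholder for T_worldline (it would correspond to beta = 1 in Eq. (11)).
print(repetitions(0.16))
print(qmc_effort_seconds(10**6, 945, 870e-9, 0.16))
```

With $p_{\mathrm{success}} = 0.16$ the sketch asks for 27 runs, since $0.84^{27} < 0.01 < 0.84^{26}$. The absolute effort it prints depends entirely on the placeholder worldline time and should not be read as a result of the study.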
C. D-Wave versus other Classical Solvers
Based on the results presented here, one cannot claim a quantum speedup for D-Wave 2X, as this would require that the quantum processor in question outperforms the best known classical algorithm. This is not the case for the weak-strong cluster networks. This is because a variety of heuristic classical algorithms can solve most instances of Chimera structured problems much faster than SA, QMC, and the D-Wave 2X [47–49] (for a possible
Google’s “$10^8$ results” – slope vs offset
[Plot axes: N (problem size); TTS in µs; DW2 annealing time in µs]
Denchev et al. (15)
![Page 34: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/34.jpg)
[Plot for FIG. 4: x-axis, problem size (bits), 100 to 1000, with sizes 180, 296, 489, 681 and 945 marked; left y-axis, QMC and SA single-core annealing time (µs); right y-axis, D-Wave annealing time (µs); both y-axes logarithmic, $10^2$ to $10^{14}$; curves for QMC, SA and D-Wave at the 50th, 75th and 85th percentiles.]
Figure 1: Sketch of the weak-strong clusters and networks. (a) Structure of a weak-strong cluster. Two $K_{4,4}$ cells of the Chimera lattice are connected ferromagnetically (blue lines, $J = 1$), as well as all spins within each $K_{4,4}$ cell. Black dots correspond to qubits in the strong cluster with a biasing magnetic field $h_1 = -1$. The white dots represent the weak cluster, where each site is coupled to a weaker field $h_2 = -\lambda h_1$ with $\lambda = 0.44 < 0.5$ in the opposite direction. The white lines represent the connections from the strong cluster to neighboring strong clusters of a weak-strong pair. (b) Weak-strong cluster network: each rectangle represents a weak-strong cluster. The different weak-strong clusters are connected via a spin-glass backbone where the interactions can take values $\{\pm 1\}$. Here, red lines represent $J = -1$. Note that the connections between clusters only occur between the strong clusters.
where $V$ represents the 8 vertices in one $K_{4,4}$ unit cell of the Chimera graph. A subset of $V$ represents the vertices on the right-hand side of the strong and weak clusters that are linked by a ferromagnetic interaction $J = 1$.

A weak-strong cluster network Hamiltonian $H$ is then constructed by connecting the sites of each strong cluster with neighboring strong clusters [white lines in Figure 1(a)] using a spin-glass backbone with random couplings $J_C \in \{\pm 1\}$, i.e.,

$$H = \sum_{C} J_C\, H^{C}_{\mathrm{ws}}. \qquad (4)$$
Note that the weak clusters only couple to the strong cluster within a given weak-strong cluster. Because of imperfections in the DW2X device, the embedding of the weak-strong cluster network in the Chimera topology is nontrivial. However, systems of up to $n = 945$ qubits have been studied.
The main result of Ref. [49] is to show, either experimentally (by using the DW2X quantum optimizer) or numerically (by using quantum Monte Carlo simulations), that quantum co-tunneling effects play a fundamental role in adiabatic optimization. Note that quantum Monte Carlo is the closest classical algorithm to quantum annealing on the DW2X. The results of Ref. [49] on the DW2X chip are approximately $10^8$ times faster than simulated annealing [15] and considerably faster than quantum Monte Carlo, despite both the DW2X quantum annealer and quantum Monte Carlo having a similar scaling (similar slope of the curves in Figure 4 of Ref. [49] for quantum Monte Carlo and the DW2X). While this, indeed, represents the first solid evidence that the DW2X machine might have capabilities that classical optimization approaches do not possess, it is important to perform a comprehensive comparison to a wide variety of state-of-the-art optimization methods. Within the categories defined in Section II, the results of Ref. [49] for the DW2X clearly outperform any sequential optimization methods, but fall short of outperforming tailored and nontailored optimization methods. We feel that knowingly exploiting the structure of a problem does not amount to a fair comparison. However, our results shown below clearly suggest that generic optimization methods still outperform the DW2X. One might thus question the importance of the results of Ref. [49]. We emphasize that this is the first study that undoubtedly shows that the DW2X machine has finite-range tunneling and gives clear hints towards the class of problems where analog quantum annealing machines might excel.
In addition to showing here that a variety of classical heuristics, either “tailored” to the weak-strong cluster structure or more “generic”, can achieve performance similar to that of the DW2X chip, we also study the energy landscape of the weak-strong cluster networks. The latter provides valuable insights about the limitations of finite-range tunneling for this class of problems. Our analysis suggests that the scaling advantage of finite-range cotunneling over sequential algorithms could be lost for instances with problem sizes beyond the ones considered in Ref. [49].
In the next paragraph we further discuss the performance of DW2X compared to tailored and nontailored classical heuristics in detail.
IV. RESULTS
In this Section, we present our main results. In the first part, we compare the performance of the DW2X device against general (nontailored) and tailored classical algorithms. The algorithms used are described in the Appendix. In the second part, we analyze in depth the scaling behavior of the DW2X device by varying the number of qubits used. The aim is to better understand the effect of non-optimal annealing times for a noisy analog device on the asymptotic scaling of the computational time. Finally, we study the energy landscape, as proposed in Ref. [32], and show that with increasing problem size the spin-glass backbone of the weak-strong cluster network dominates and the advantages of finite-range tunneling diminish.
Q = -1
Q = +1
![Page 35: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/35.jpg)
$$H(S_i) = \sum_{i \neq j}^{N} Q_{ij}\, S_i S_j - \sum_{i} h_i S_i$$
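To make the slide's cost function concrete, here is a minimal sketch that evaluates $H(S_i)$ for a given spin assignment. The function name and the nested-list representation of $Q$ are choices made for this example. The sum runs over all ordered pairs $i \neq j$ as written on the slide, so a symmetric $Q$ counts each bond twice.

```python
def ising_energy(spins, q, h):
    """H(S) = sum_{i != j} Q_ij * S_i * S_j - sum_i h_i * S_i for spins S_i = +/-1."""
    n = len(spins)
    # Pair term over all ordered pairs (i, j) with i != j
    pair = sum(q[i][j] * spins[i] * spins[j]
               for i in range(n) for j in range(n) if i != j)
    # Field term
    field = sum(h[i] * spins[i] for i in range(n))
    return pair - field


# Two spins joined by a Q = +1 bond with equal fields on both sites.
q = [[0, 1], [1, 0]]
h = [0.5, 0.5]
print(ising_energy([1, -1], q, h), ising_energy([1, 1], q, h))
```

With the sign convention as written, a $Q = +1$ bond favours anti-aligned spins, which is why the first configuration has the lower energy.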
![Page 36: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/36.jpg)
5
100 200 300 400 500 600 700 800 900 1000Problem size (bits)
102
104
106
108
1010
1012
1014Q
MC
and
SA
sing
le-c
ore
anne
alin
gtim
e(µ
s)180 296 489 681 945
85th75th50th
102
104
106
108
1010
1012
1014
D-W
ave
anne
alin
gtim
e(µ
s)
QMC
SA
D-Wave
FIG. 4. Time to find the optimal solution with 99% proba-bility for di↵erent problem sizes. We compare Simulated An-nealing (SA), Quantum Monte Carlo (QMC) and the D-Wave2X. To assign a runtime for the classical algorithms we takethe number of spin updates (for SA) or worldline updates (forQMC) that are required to reach a 99% success probabilityand multiply that with the time to perform one update ona single state-of-the-art core. Shown are the 50th, 75th and85th percentiles over a set of 100 instances. It occupied mil-lions of processor cores for several days to tune and run theclassical algorithms for these benchmarks. The runtimes forthe higher quantiles for the largest problem size for QMC werenot computed due to the high computational cost. For a sim-ilar comparison with QMC with di↵erent parameters pleasesee Fig. 13
tonian is
Hcl
= �MX⌧=1
0@Xjk
Jjk
M�j
(⌧)�k
(⌧)
+J?(s)Xj
�j
(⌧)�j
(⌧ + 1)
1A , (9)
where �j
(⌧) = ±1 are classical spins, j and k are siteindices, ⌧ is a replica index, and M is the number ofreplicas. The coupling between replicas is given by
J?(s) = � 1
2�ln tanh
A(s)�
M, (10)
where � is the inverse temperature. The configurationsfor a given spin j across all replicas ⌧ is called the world-line of spin j. Periodic boundary conditions are imposedbetween �
j
(M) and �j
(1). We used continuous path in-tegral QMC, which corresponds to the limit �⌧ ! 0 [46],and, unlike discrete path integral QMC, does not su↵erfrom discretization errors of order 1/M .
We numerically compute the number of sweeps nsweeps
required for QMC to find the ground state with 99%
probability at di↵erent quantiles. In our case, a sweepcorresponds to two update attempts for each worldline.The computational e↵ort is n
sweeps
⇥N⇥Tworldline
, whereN is the number of qubits and T
worldline
is the time to up-date a worldline. We average T
worldline
over all the stepsin the quantum annealing schedule; however the valueof T
worldline
depends on the particular schedule chosen.As explained above for SA, we report the total computa-tional e↵ort of QMC in standard units of time per singlecore. For the annealing schedule used in the current D-Wave 2X processor, we find
Tworldline
= � ⇥ 870 ns (11)
using an Intel(R) Xeon(R) CPU E5-1650 @ 3.20GHz.This study is designed to explore the utility of QMC
as a classical optimization routine. Accordingly, we op-timize QMC by running at a low temperature, 4.8 mK.We also observe that QMC with open boundary condi-tions (OBC) performs better than standard QMC withperiodic boundary conditions in this case [38]; therefore,OBC is used in this comparison. We further optimize thenumber of sweeps per run which, for a given quantile, re-sults in the lowest total computational e↵ort. We findthat the optimal number of sweeps is 106 at the largestproblem size. This enhances the ability of QMC to simu-late quantum tunneling, and gives a very high probabilityof success per run in the median case, p
success
= 0.16.All the qubits in a cluster have approximately the same
orientation in each local minima of the e↵ective meanfield potential. Neighboring local minima typically cor-respond to di↵erent orientations of a single cluster. Here,tunneling time is dominated by a single purely imaginaryinstanton and is described by Eq. (35) below. It wasrecently demonstrated that, in this situation, the expo-nent a
min
/~ for physical tunneling is identical to that ofQMC [38]. As seen in Fig. 4, we do not find a substan-tial di↵erence in the scaling of QMC and D-Wave (QA).However, we find a very substantial computational over-head associated with the prefactor B in the expressionT = BeDamin/~ for the runtime. In other words, B
QMC
can exceed BQA
by many orders of magnitude. The roleof the prefactor becomes essential in situations where thenumber of cotunneling qubits D is finite, i.e., is inde-pendent of the problem size N (or depends on N veryweakly). Between some quantiles and system sizes weobserve a prefactor advantage as high as 108.
C. D-Wave versus other Classical Solvers
Based on the results presented here, one cannot claima quantum speedup for D-Wave 2X, as this would requirethat the quantum processor in question outperforms thebest known classical algorithm. This is not the case forthe weak-strong cluster networks. This is because a va-riety of heuristic classical algorithms can solve most in-stances of Chimera structured problems much faster thanSA, QMC, and the D-Wave 2X [47–49] (for a possible
Google’s “108 results” – slope vs offset
N [problem size]
T
TS
in µ
s
DW
2 an
neal
ing
time
in µ
s
4
(a)
J = +1
� < 0.5
J = �1(b)
h
1
=�1
h
2
=��
h
1
Figure 1: Sketch of the weak-strong clusters and networks.(a) Structure of a weak-strong cluster. Two K
4,4 cells of theChimera lattice are connected ferromagnetically (blue lines,J = 1), as well as all spins within each K
4,4 cell. Blackdots correspond to qubits in the strong cluster with a biasingmagnetic field h
1
= �1. The white dots represent the weakcluster, where each site is coupled to a weaker field h
2
= ��h1
with � = 0.44 < 0.5 in the opposite direction. The whitelines represent the connections from the strong cluster toneighboring strong clusters of a weak-strong pair. (b) Weak-strong cluster network: each rectangle represents a weak-strongcluster. The di↵erent weak-strong clusters are connected viaa spin-glass backbone where the interactions can take values{±1}. Here, red lines represent J = �1. Note that theconnections between clusters only occur between the strongclusters.
where V represents the 8 vertices in one K4,4 unit cell
of the Chimera graph. The subset V ⇢ V represents thevertices of the right-hand-side of the strong and weakclusters that are linked by a ferromagnetic interactionJ = 1.
A weak-strong cluster network Hamiltonian H is thenconstructed by connecting the sites of each strong clus-ter with neighboring strong clusters [white lines in Fig-ure 1(a)] using a spin-glass backbone with random cou-plings J
C
2 {±1}, i.e.,
H =X
C
J
C
HC
ws
. (4)
Note that the weak clusters only couple to the strongcluster within a given weak-strong cluster. Because ofimperfections in the DW2X device, the embedding of theweak-strong cluster network in the Chimera topology isnontrivial. However, systems of up to n = 945 qubitshave been studied.
The main result of Ref. [49] is to show, either experimen-tally (by using the DW2X quantum optimizer) or numer-ically (by using quantum Monte Carlo simulations), thatquantum co-tunneling e↵ects play a fundamental role inadiabatic optimization. Note that quantum Monte Carlo
is the closest classical algorithm to quantum annealing onthe DW2X. The results of Ref. [49] on the DW2X chip areapproximately 108 times faster than simulated annealing[15] and considerably faster than quantum Monte Carlodespite both the DW2X quantum annealer and quantumMonte Carlo having a similar scaling (similar slope ofthe curves in Figure 4 of Ref. [49] for quantum MonteCarlo and the DW2X). While this, indeed, represents thefirst solid evidence that the DW2X machine might havecapabilities that classical optimization approaches do notpossess, it is important to perform a comprehensive com-parison to a wide variety of state-of-the-art optimizationmethods. Within the categories defined in Section II,the results of Ref. [49] for the DW2X clearly outperformany sequential optimization methods, however fall shortof outperforming tailored and nontailored optimizationmethods. We feel, however, that knowingly exploiting thestructure of a problem does not amount to a fair compar-ison. However, our results shown below clearly suggestthat generic optimization methods still outperform theDW2X. One might thus question the importance of theresults of Ref. [49]. We emphasize that this is the firststudy that undoubtedly shows that the DW2X machinehas finite-range tunneling and gives clear hints towardsthe class of problems where analog quantum annealingmachines might excel.
In addition to showing here that a variety of either“tailored” to the weak-strong cluster structure or more“generic” classical heuristics can achieve similar perfor-mances of the DW2X chip, we also study the energylandscape of the weak-strong cluster networks. The lat-ter provides valuable insights about the limitations offinite-range tunneling for this class of problems. Ouranalysis suggest that the scaling advantage of finite-rangecotunneling over sequential algorithms could be lost forinstances with problem sizes beyond the ones consideredin Ref. [49].
In the next paragraph we further discuss the perfor-mance of DW2X compared to tailored and nontailoredclassical heuristics in detail.
IV. RESULTS
In this Section, we present our main results. In the firstpart, we compare the performance of the DW2X deviceagainst general (nontailored) and tailored classical algo-rithms. The description of the used algorithms is in theAppendix. In the second part, we analyze in depth thescaling behavior of the DW2X device by varying the num-ber of used qubits. The aim is to better understand therole of a non-optimal annealing times for a noisy analogdevice to the asymptotic scaling of the computationaltime. Finally, we study the energy landscape, as proposedin Ref. [32], and show that for increasing problem size thespin-glass backbone of the weak-strong cluster networkdominates and the advantages of finite-range tunnelingdiminish for increasing system sizes.
Q = −1
Q = +1
Denchev et al. (15)
spin-glass backbone
$$H(S_i) = \sum_{i \neq j}^{N} Q_{ij} S_i S_j - \sum_{i} h_i S_i$$
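As a minimal sketch of how this cost function is evaluated, the snippet below computes H for a given spin configuration, following the sign convention of the expression above. The function name `ising_energy` and the two-spin couplings are illustrative assumptions, not taken from the slides.

```python
def ising_energy(spins, Q, h):
    """Evaluate H(S) = sum_{i != j} Q[i][j] S_i S_j - sum_i h[i] S_i."""
    n = len(spins)
    # Pair term: the sum over i != j visits every unordered pair twice.
    pair = sum(
        Q[i][j] * spins[i] * spins[j]
        for i in range(n)
        for j in range(n)
        if i != j
    )
    # Field term enters with a minus sign.
    field = sum(h[i] * spins[i] for i in range(n))
    return pair - field


# Hypothetical two-spin example: Q[0][1] = Q[1][0] = -0.5 gives a total
# coupling of -1, so aligned spins have the lower energy.
Q = [[0.0, -0.5], [-0.5, 0.0]]
h = [0.0, 0.0]
aligned = ising_energy([+1, +1], Q, h)
opposed = ising_energy([+1, -1], Q, h)
```

Because the double sum counts each pair twice, a symmetric matrix Q carries half of the effective coupling in each entry; an upper-triangular Q would be an equally valid convention.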
![Page 37: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/37.jpg)
[Figure 4 plot: QMC and SA single-core annealing time (µs) and D-Wave annealing time (µs), both on axes from 10^2 to 10^14, versus problem size (bits) from 100 to 1000; data at 180, 296, 489, 681 and 945 bits; 50th, 75th and 85th percentiles; curves QMC, SA and D-Wave.]
FIG. 4. Time to find the optimal solution with 99% probability for different problem sizes. We compare Simulated Annealing (SA), Quantum Monte Carlo (QMC) and the D-Wave 2X. To assign a runtime for the classical algorithms we take the number of spin updates (for SA) or worldline updates (for QMC) that are required to reach a 99% success probability and multiply that with the time to perform one update on a single state-of-the-art core. Shown are the 50th, 75th and 85th percentiles over a set of 100 instances. It occupied millions of processor cores for several days to tune and run the classical algorithms for these benchmarks. The runtimes for the higher quantiles for the largest problem size for QMC were not computed due to the high computational cost. For a similar comparison with QMC with different parameters please see Fig. 13.
…Hamiltonian is
$$H_{\mathrm{cl}} = -\sum_{\tau=1}^{M} \left( \sum_{jk} \frac{J_{jk}}{M}\, \sigma_j(\tau)\, \sigma_k(\tau) + J_\perp(s) \sum_j \sigma_j(\tau)\, \sigma_j(\tau+1) \right), \qquad (9)$$
where $\sigma_j(\tau) = \pm 1$ are classical spins, j and k are site indices, τ is a replica index, and M is the number of replicas. The coupling between replicas is given by
$$J_\perp(s) = -\frac{1}{2\beta} \ln \tanh \frac{A(s)\,\beta}{M}, \qquad (10)$$
where β is the inverse temperature. The configuration of a given spin j across all replicas τ is called the worldline of spin j. Periodic boundary conditions are imposed between $\sigma_j(M)$ and $\sigma_j(1)$. We used continuous path integral QMC, which corresponds to the limit $\Delta\tau \to 0$ [46], and, unlike discrete path integral QMC, does not suffer from discretization errors of order 1/M.
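To make the replica coupling of Eq. (10) concrete, here is a small sketch that evaluates it numerically. The function name `j_perp` and the sample values of A(s), β and M are assumptions chosen only for illustration; units are left abstract.

```python
import math


def j_perp(a_s, beta, m):
    """Replica coupling of Eq. (10): -1/(2*beta) * ln(tanh(A(s)*beta/M)).

    a_s  : transverse-field amplitude A(s) at the current schedule point (> 0)
    beta : inverse temperature (> 0)
    m    : number of replicas (Trotter slices)
    """
    return -0.5 / beta * math.log(math.tanh(a_s * beta / m))


# Hypothetical values: the coupling is positive (ferromagnetic between
# neighboring replicas) and grows as the number of replicas increases.
coarse = j_perp(a_s=1.0, beta=1.0, m=10)
fine = j_perp(a_s=1.0, beta=1.0, m=100)
```

Since tanh of a positive argument lies between 0 and 1, its logarithm is negative and the coupling comes out positive; as M grows the argument shrinks and the coupling between adjacent replicas becomes stronger.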
We numerically compute the number of sweeps $n_{\mathrm{sweeps}}$ required for QMC to find the ground state with 99% probability at different quantiles. In our case, a sweep corresponds to two update attempts for each worldline. The computational effort is $n_{\mathrm{sweeps}} \times N \times T_{\mathrm{worldline}}$, where N is the number of qubits and $T_{\mathrm{worldline}}$ is the time to update a worldline. We average $T_{\mathrm{worldline}}$ over all the steps in the quantum annealing schedule; however, the value of $T_{\mathrm{worldline}}$ depends on the particular schedule chosen. As explained above for SA, we report the total computational effort of QMC in standard units of time per single core. For the annealing schedule used in the current D-Wave 2X processor, we find
$$T_{\mathrm{worldline}} = \beta \times 870\ \mathrm{ns} \qquad (11)$$
using an Intel(R) Xeon(R) CPU E5-1650 @ 3.20GHz.
This study is designed to explore the utility of QMC as a classical optimization routine. Accordingly, we optimize QMC by running at a low temperature, 4.8 mK. We also observe that QMC with open boundary conditions (OBC) performs better than standard QMC with periodic boundary conditions in this case [38]; therefore, OBC is used in this comparison. We further optimize the number of sweeps per run which, for a given quantile, results in the lowest total computational effort. We find that the optimal number of sweeps is $10^6$ at the largest problem size. This enhances the ability of QMC to simulate quantum tunneling, and gives a very high probability of success per run in the median case, $p_{\mathrm{success}} = 0.16$.
All the qubits in a cluster have approximately the same orientation in each local minimum of the effective mean field potential. Neighboring local minima typically correspond to different orientations of a single cluster. Here, tunneling time is dominated by a single purely imaginary instanton and is described by Eq. (35) below. It was recently demonstrated that, in this situation, the exponent $a_{\min}/\hbar$ for physical tunneling is identical to that of QMC [38]. As seen in Fig. 4, we do not find a substantial difference in the scaling of QMC and D-Wave (QA). However, we find a very substantial computational overhead associated with the prefactor B in the expression $T = B e^{D a_{\min}/\hbar}$ for the runtime. In other words, $B_{\mathrm{QMC}}$ can exceed $B_{\mathrm{QA}}$ by many orders of magnitude. The role of the prefactor becomes essential in situations where the number of cotunneling qubits D is finite, i.e., is independent of the problem size N (or depends on N very weakly). Between some quantiles and system sizes we observe a prefactor advantage as high as $10^8$.
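The difference between a prefactor (offset) and an exponent (slope) can be made explicit by taking the logarithm of the runtime expression. The step below is a short worked sketch that assumes, as the text states, that QMC and QA share the same exponent; the subscripted runtimes $T_{\mathrm{QMC}}$ and $T_{\mathrm{QA}}$ are notation introduced here for illustration, and the value 8 corresponds to the largest prefactor advantage quoted above.

```latex
\begin{align}
\log_{10} T &= \log_{10} B + \frac{D\, a_{\min}}{\hbar}\, \log_{10} e, \\
\log_{10} T_{\mathrm{QMC}} - \log_{10} T_{\mathrm{QA}}
  &= \log_{10} \frac{B_{\mathrm{QMC}}}{B_{\mathrm{QA}}} \approx 8 .
\end{align}
```

On a semilogarithmic plot the two curves are then parallel and separated by a constant vertical shift: a large prefactor changes the offset but not the slope.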
C. D-Wave versus other Classical Solvers
Based on the results presented here, one cannot claim a quantum speedup for D-Wave 2X, as this would require that the quantum processor in question outperforms the best known classical algorithm. This is not the case for the weak-strong cluster networks. This is because a variety of heuristic classical algorithms can solve most instances of Chimera structured problems much faster than SA, QMC, and the D-Wave 2X [47–49] (for a possible
Google’s “10^8 results” – slope vs offset
[Plot axes: TTS in µs and DW2 annealing time in µs versus N (problem size)]
Denchev et al. (15)
![Page 38: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/38.jpg)
Catapult + QMC
![Page 39: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/39.jpg)
Better scaling of DW and quantum inspired.
Catapult + QMC
![Page 40: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/40.jpg)
ħ = 0   ħ > 0
0 : 0
![Page 41: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/41.jpg)
ħ = 0   ħ > 0
0 : 1
![Page 42: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/42.jpg)
What if we use better algorithms?
• Tailored to the problems and/or underlying graph:
• Hamze-de Freitas-Selby algorithm (HFS).
• Hybrid cluster methods (HCM).
• Super-spin approximation (SS).
• Not tailored to the problems and/or underlying graph:
• Population annealing (particle swarm) sequential Monte Carlo (PA).
• Parallel tempering & isoenergetic cluster optimizer (PT+ICM).
• Reminder – Sequential methods used in the Google study:
• Simulated annealing (SA).
• Quantum Monte Carlo (QMC).
• D-Wave 2X (DW2).
Zhu, Ochoa, Katzgraber PRL (15)
Zhu (16)
Venturelli et al. (15)
Wang et al., PRE (15)
Kirkpatrick et al. (83)
Hamze et al. (12)
Denchev et al. (15)
![Page 43: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/43.jpg)
![Page 44: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/44.jpg)
MaxSAT 2016 winner
![Page 45: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/45.jpg)
MaxSAT 2016 winner Catapult?
![Page 46: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/46.jpg)
Asymptotic scaling exponent b (slope)
[Bar chart: main scaling exponent b at the 50th percentile (axis from 0.0 to 0.8) for SA, PA, DW2, QMC, HCM, RMC+ICM, PT+ICM, HFS and SS. Legend: (a + b√n + c log10(√n)) fit and (a + b√n) fit, with T ∼ poly(√n) 10^(a + b√n).]
![Page 47: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/47.jpg)
[Chart annotation: smaller means better scaling]
![Page 48: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/48.jpg)
[Chart group labels: sequential, tailored, not tailored, tailored]
![Page 49: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/49.jpg)
Only “sequential” quantum speedup.
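The exponent b shown in the chart comes from fitting runtimes to the form T ∼ 10^(a + b√n). A minimal sketch of the simpler two-parameter fit is given below, using an ordinary least-squares line through log10(T) versus √n. The function name `fit_scaling` and the synthetic runtimes are assumptions for illustration; they are not data from the talk, and the polynomial correction term c log10(√n) is left out.

```python
import math


def fit_scaling(sizes, times):
    """Least-squares fit of log10(T) = a + b * sqrt(n); returns (a, b)."""
    xs = [math.sqrt(n) for n in sizes]
    ys = [math.log10(t) for t in times]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: the scaling exponent
    a = y_mean - b * x_mean  # intercept: log10 of the prefactor
    return a, b


# Problem sizes as in Fig. 4; the runtimes are synthetic, generated from
# a = 1.0 and b = 0.3 so that the fit can be checked against known values.
sizes = [180, 296, 489, 681, 945]
times = [10 ** (1.0 + 0.3 * math.sqrt(n)) for n in sizes]
a_fit, b_fit = fit_scaling(sizes, times)
```

A smaller fitted b means a gentler growth of the runtime with problem size, whereas a only shifts the whole curve up or down.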
![Page 50: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/50.jpg)
ħ = 0   ħ > 0
0 : 1
![Page 51: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/51.jpg)
ħ = 0   ħ > 0
1 : 1
![Page 52: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/52.jpg)
Most recent D-Wave benchmarks
…
see Mandrà, Katzgraber & Thomas (QST, arXiv:1703.00622)
King et al. (17)
![Page 53: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/53.jpg)
D-Wave’s frustrated cluster loop problems
[Plot: TTS (µs), from 10^0 to 10^7, versus number of logical variables n (0 to 20) for α = 0.80, ρ = 5. Legend: MWPM (no broken qubits); MWPM; DW2000Q, TTS1; DW2000Q, TTS2; ICM (logical), TTS2. Curve labels: DW2000Q, SA, QMC.]
King et al. (17)
![Page 54: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/54.jpg)
Catapult + QMC
![Page 55: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/55.jpg)
• Ruggedness of FCLs (spin-glass backbone) fools codes.
• The logical problem is defined on K_{4,4} cells and is therefore planar.
![Page 56: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/56.jpg)
Why is this a problem?
• Planar problems are polynomial (P class).
• Exact algorithms exist.
![Page 57: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/57.jpg)
Using minimum-weight perfect matching…
[Plot: TTS (µs), from 10^0 to 10^7, versus number of logical variables n (1 to 1000) for α = 0.80, ρ = 5. Legend: mwpm (fully-chimera); mwpm; DW2kQ, 1/p tts; DW2kQ, log(0.01)/log(1-p) tts.]
King et al. (17)
Mandrà et al. (17)
Edmonds (61)
![Page 58: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/58.jpg)
[Curve labels: DW2000Q, MWPM]
![Page 59: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/59.jpg)
Exponentially faster than DW2000Q…
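The plot legend lists two time-to-solution conventions for the annealer, 1/p and log(0.01)/log(1-p). The sketch below shows how the two differ for a single-run success probability p. Multiplying by a per-run time, the function names, and the sample numbers are assumptions made for illustration.

```python
import math


def tts_mean(t_run_us, p):
    """Expected time until the first success: t_run / p."""
    return t_run_us / p


def tts_target(t_run_us, p, target=0.99):
    """Time needed to succeed at least once with probability `target`:
    t_run * log(1 - target) / log(1 - p), valid for 0 < p < 1."""
    return t_run_us * math.log(1.0 - target) / math.log(1.0 - p)


# Hypothetical run: 20 microseconds per anneal, success probability 0.5.
mean_us = tts_mean(20.0, 0.5)
target_us = tts_target(20.0, 0.5)
```

The second convention always gives the larger value at a 99% target, because it asks for near-certain success instead of the average waiting time; the two agree on scaling only when p changes with problem size in the same way in both.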
![Page 60: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/60.jpg)
![Page 61: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/61.jpg)
ħ = 0   ħ > 0
1 : 1
![Page 62: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/62.jpg)
ħ = 0   ħ > 0
: 12
![Page 63: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/63.jpg)
Fair sampling – A key ingredient in ML
see also Mandrà, Zhu & Katzgraber (PRL, arXiv:1606.07146)
![Page 64: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/64.jpg)
What is fair sampling?
• Definition (fair sampling):
• Ability of an algorithm to find uncorrelated solutions to a problem with (almost) the same probability.
• Why is this important?
• Sometimes solutions are more important than the optimum (SAT filters, #SAT, machine learning,…).
• Some solutions might be more “convenient” due to additional constraints.
• Algorithm benchmarking:
• Standard – Find the optimum fast and reliably.
• Stringent – Find all minimizing configurations equiprobably.
![Page 65: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/65.jpg)
![Page 66: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/66.jpg)
Current state of the art: PT+ICM (parallel tempering with isoenergetic cluster moves).
![Page 67: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/67.jpg)
Can transverse-field QA sample fairly?
Matsuda, Nishimori, Katzgraber (NJP 2009)
• 5-variable toy model suggests bias:
[Figure: five-spin toy model with couplings $J_{ij} = +1$ and $J_{ij} = -1$; plot of P (probability of states) vs. annealing time for the ground states $|1\rangle$, $|2\rangle$, $|3\rangle$.]
Excerpt from the paper (running header "Ground-state statistics from annealing algorithms", p. 3):
The present paper is organized as follows: Section 2 describes the solution of a small system by direct diagonalization and numerical integration of the Schrödinger equation. Section 3 is devoted to the studies of larger degenerate systems via quantum Monte Carlo simulations, followed by concluding remarks in section 4.
2. Schrödinger dynamics for a small system
It is instructive to first study a small-size system by a direct solution of the Schrödinger equation, both in stationary and nonstationary contexts. The classical optimization problem for this purpose is chosen to be a five-spin system with interactions as shown in figure 1.
Figure 1. Five-spin toy model studied. Full lines denote ferromagnetic interactions ($J_{ij} = 1$) while dashed lines stand for antiferromagnetic interactions ($J_{ij} = -1$). Because of the geometry of the problem the system has a degenerate ground state by construction.
The Hamiltonian of this system is given by
$$H_0 = -\sum_{\langle ij \rangle} J_{ij} \sigma_i^z \sigma_j^z , \qquad (1)$$
where the sum is over all nearest-neighbour interactions $J_{ij} = \pm 1$ and $\sigma_i^z$ denote Ising spins parallel to the z-axis. The system has six degenerate ground states, three of which are shown in figure 2. We apply a transverse field
$$H_1 = -\sum_i \sigma_i^x \qquad (2)$$
Figure 2. Nontrivial degenerate ground states of the toy model shown in figure 1. Filled and open circles denote up and down spins, respectively. The other three ground states $|\bar{1}\rangle$, $|\bar{2}\rangle$, and $|\bar{3}\rangle$ are obtained from $|1\rangle$, $|2\rangle$, and $|3\rangle$ by reversing all spins.
• What about quantum annealers?
• Design problems with known degeneracy:
$$H = \sum_{\langle ij \rangle} J_{ij} S_i S_j , \qquad J_{ij} \in \{\pm 5, \pm 6, \pm 7\}, \qquad N_{GS} = 3 \cdot 2^k = \{6, 12, 24, 48, 96, \ldots\}, \quad k \in \mathbb{N}$$
• Study the distribution of ground states for fixed $N_{GS}$.
[Figure: log(hits) vs. rank_GS; annotation "fair sampling".]
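For small systems, the degeneracy of a designed instance can be verified by brute-force enumeration. The sketch below is an illustration under stated assumptions: the three-spin frustrated triangle with every coupling set to +5 is a hypothetical instance, not one of the designed problems from the talk, and the helper names `energy` and `ground_states` are made up for this example. It uses the sign convention H = Σ J_ij S_i S_j as written on the slide.

```python
from itertools import product

def energy(spins, couplings):
    """H = sum over bonds of J_ij * S_i * S_j (sign convention of the slide)."""
    return sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())

def ground_states(n_spins, couplings):
    """Enumerate all 2^n spin configurations; return (E_min, list of minimizers)."""
    best_energy, best = None, []
    for spins in product((-1, +1), repeat=n_spins):
        e = energy(spins, couplings)
        if best_energy is None or e < best_energy:
            best_energy, best = e, [spins]
        elif e == best_energy:
            best.append(spins)
    return best_energy, best

# Hypothetical instance: frustrated triangle, every bond J = +5.
triangle = {(0, 1): 5, (1, 2): 5, (0, 2): 5}
e_min, states = ground_states(3, triangle)
print(e_min, len(states))  # prints: -5 6
```

The six minimizers are the configurations with exactly one spin opposite to the other two, so this toy instance has N_GS = 6 = 3·2^1. Exhaustive enumeration scales as 2^n and is only practical for a few dozen spins.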
![Page 68: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/68.jpg)
![Page 69: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/69.jpg)
![Page 70: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/70.jpg)
[Figure: log(hits) vs. rank_GS for the designed problems; annotation "unfair".]
![Page 71: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/71.jpg)
Transverse-field QA is exponentially biased
[Figure: number of hits (averaged over samples), log scale from 10^-3 to 10^2, vs. Rank_GS/(3·2^k) from 0.0 to 1.0; sample data for N = 684, c = 9; curves for k = 2 (NGS = 12), k = 3 (NGS = 24), k = 4 (NGS = 48), k = 5 (NGS = 96).]
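A rank-ordered hit profile like the one plotted here can be built from raw hit counts as follows. This is a minimal sketch under assumptions: the function names `rank_profile` and `log_slope` are hypothetical, and the least-squares slope of log(hits) against normalized rank is one illustrative way to quantify an exponential bias, not necessarily the analysis used in the talk.

```python
import math

def rank_profile(hits_per_state, n_ground_states):
    """Sort hit counts in descending order and pair them with rank / N_GS."""
    hits = sorted(hits_per_state, reverse=True)
    hits += [0] * (n_ground_states - len(hits))  # unobserved ground states
    return [((rank + 1) / n_ground_states, h) for rank, h in enumerate(hits)]

def log_slope(profile):
    """Least-squares slope of log(hits) vs. normalized rank (zero hits skipped)."""
    pts = [(x, math.log(h)) for x, h in profile if h > 0]
    if len(pts) < 2:
        raise ValueError("need at least two states with nonzero hits")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx

# Hypothetical counts: each state is hit half as often as the previous one.
profile = rank_profile([125, 1000, 250, 500], n_ground_states=4)
slope = log_slope(profile)
```

A fair sampler yields a slope close to zero, while a straight line with negative slope on the log scale corresponds to hit counts that decay exponentially with rank.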
![Page 72: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/72.jpg)
Standard QA will need tweaks for fair sampling.
![Page 73: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/73.jpg)
ħ = 0   ħ > 0
2 : 1
![Page 74: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/74.jpg)
ħ = 0   ħ > 0
3 : 1
![Page 75: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/75.jpg)
analog QA
![Page 76: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/76.jpg)
Look out for IARPA’s QEO report on QA.
![Page 77: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/77.jpg)
However… Soon superseded by digital?
![Page 78: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/78.jpg)
Quantum vs Classical Optimization: A status update on the arms race
• Classical optimization pushes quantum technology.
• Quantum developments leverage classical, quantum-inspired methods.
• ML could benefit from quantum samplers… if these can sample fairly.
• To date, quantum annealing has shown no application speedup or better scaling.
![Page 79: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/79.jpg)
Thank you.
![Page 80: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/80.jpg)
![Page 81: Helmut G. Katzgraber - microsoft.com · law from an empirical observation into a self-fulfilling prophecy: new chips followed the law because the industry made sure that they did](https://reader031.vdocument.in/reader031/viewer/2022041223/5e0e688e5e9b445bed0e23c4/html5/thumbnails/81.jpg)