
A Quantum Hopfield Neural Network

Patrick Rebentrost,1 Thomas R. Bromley,1 Christian Weedbrook,1 and Seth Lloyd2

1 Xanadu, 372 Richmond Street W, Toronto, Ontario M5V 1X6, Canada
2 Massachusetts Institute of Technology, Department of Mechanical Engineering, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA

(Dated: February 8, 2018)

Quantum computing allows for the potential of significant advancements in both the speed and the capacity of widely-used machine learning techniques. Here we employ quantum algorithms for the Hopfield network, which can be used for pattern recognition, reconstruction, and optimization as a realization of a content addressable memory system. We show that an exponentially large network can be stored in a polynomial number of quantum bits by encoding the network into the amplitudes of quantum states. By introducing a new classical technique for operating the Hopfield network, we can leverage quantum algorithms to obtain a quantum computational complexity that is logarithmic in the dimension of the data. This potentially yields an exponential speed-up in comparison to classical approaches. We also present an application of our method as a genetic sequence recognizer.

INTRODUCTION

Machine learning is an interdisciplinary approach that brings together the fields of computer science, mathematics, statistics, and neuroscience with the objective of giving computers the ability to make predictions and generalizations from data [1]. A typical machine learning problem falls into one of three main categories: supervised learning, where the computer learns from a set of training data; unsupervised learning, with the objective of identifying underlying patterns in data; and reinforcement learning, where the computer evolves its approach based on real-time feedback. Machine learning is changing how we interact with technology in areas such as autonomous vehicles, the internet of things, and e-commerce.

Quantum information science has developed from the idea that quantum mechanics can provide improvements in information processing and communication [2]. The promises of quantum information are manifold, ranging from exponentially fast quantum computers and information-theoretically secure quantum communication networks to high precision measurements useful in science and technology. Over the past few decades, quantum information science has transitioned from scientific theory to a viable form of technology.

Given the encouraging technological implications of both machine learning and quantum information science, it was inevitable that their paths would cross over to form quantum machine learning [3–6]. Quantum-enhanced machine learning approaches use a toolbox of quantum subroutines to achieve computational speed-ups for established machine learning algorithms. This toolbox includes fundamentals like quantum basic linear algebra subroutines (qBLAS), such as eigenvalue finding [2], matrix multiplication [7] and matrix inversion [8]. One can also build on quantum techniques, such as amplitude amplification and quantum annealing. These elements have been put together in recent works on quantum machine learning [9–15], including nearest-neighbor clustering [16], the quantum support vector machine [17], and quantum principal component analysis [18, 19].

Artificial neural networks are highly successful in machine learning and are hence of special interest for quantum adaptation [9, 12, 20–22]. Here, a collection of binary valued neurons are connected and evolve in such a way that each neuron decides its state based upon a weighted function of the neurons connecting to it. The neurons can be organized into layers and may be configured to allow for backflow of information (known as a recurrent network, often constructed from building blocks of long short-term memory [23]). We focus on the Hopfield network, which is a single layer, recurrent and fully connected neural network with undirected connections between neurons. Such networks can be trained using the Hebbian learning rule [24], based on the notion that the connection weight between two neurons is stronger when the neurons regularly fire together in the training data. The Hopfield network can act as a non-sequential associative memory, with technological application in image processing and optimization [25] and wider interest in neuroscience and medicine.

We present in this article a method to construct a quantum version of the Hopfield network (qHop), resulting from a new adaptation of the classical Hopfield network. Our approach differs from previous generalizations of the Hopfield network: Refs. [26, 27] focussed on the condensed matter/biology setting, Ref. [28] encoded neurons directly into qubits, Ref. [29] used a quantum search, and Ref. [30] harnessed quantum annealing. Here, the network state is embedded into the amplitudes of a quantum system composed of a register of quantum bits (qubits). The training of qHop is then addressed by introducing quantum Hebbian learning, whereby the symmetric graph weighting matrix can be associated to a density matrix stored in a qubit register. We show how this density matrix can be used operationally to imprint relevant training information onto the system. The next step is to operate qHop efficiently. To this end, we propose a new approach to optimizing the classical Hopfield network using matrix inversion. Matrix inversion can typically be performed efficiently using quantum algorithms with a run time $O(\mathrm{poly}(\log d))$ in the size of the matrix $d$ [8]. By combining these algorithms with the quantum Hebbian learning subroutine and sparse Hamiltonian simulation [31], we formalize our algorithm qHop. Using qHop can therefore provide speedups in the application of the Hopfield network as a content addressable memory system. As an example application, we consider the problem of RNA sequence pattern recognition of the influenza A virus in genetics. We use this scenario to compare the recovery performances of both approaches to operating the Hopfield network.

arXiv:1710.03599v2 [quant-ph] 7 Feb 2018

Neural networks

Let us first outline some basic features of neural networks. Consider a collection of $d$ artificial binary-valued neurons $x_i \in \{1, -1\}$ with $i \in \{1, 2, \ldots, d\}$, that are together described by the activation pattern vector $x = \{x_1, x_2, \ldots, x_d\}^{\mathsf T}$, with $x^{\mathsf T}$ denoting the transpose of $x$. The neurons are formed into a (potentially multilayer) network by wiring them to create a connected graph, which can be specified by a real and square $(d \times d)$-dimensional weighting matrix $W$. Its elements $w_{ij}$ specify the neuronal connection strength between neurons $i$ and $j$ [32]. We note that each neuron is not typically self-connected, so that $w_{ii} = 0$. Furthermore, for an undirected network, $W$ is symmetric.

Setting the weight matrix $W$ is achieved by teaching the network a set of training data. This training data can consist of known activation patterns for the visible neurons, i.e. the input and output neurons, with the learning achieved using tools such as backpropagation, gradient descent and Hebbian learning. A network can be fully visible, so that every neuron acts as both an input and an output.

The Hopfield network is a single layered, fully visible, and undirected neural network. Here, one can teach the network using the Hebbian learning rule [24]. This rule sets the weighting matrix elements $w_{ij}$ according to the number of occasions in the training set that the neurons $i$ and $j$ fire together. Consider a training set of $M$ activation patterns $x^{(m)}$, with $m \in \{1, 2, \ldots, M\}$. The (normalized) weighting matrix is given by

\[ W = \frac{1}{Md} \left[ \sum_{m=1}^{M} x^{(m)} \left( x^{(m)} \right)^{\mathsf T} \right] - \frac{I_d}{d}, \tag{1} \]

with $I_d$ the $d$-dimensional identity matrix.
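As a minimal numerical sketch of the Hebbian rule in Eq. (1), the following Python snippet builds $W$ from a small set of patterns. The function name `hebbian_weights` and the toy patterns are illustrative assumptions and do not appear in the original work.

```python
import numpy as np

def hebbian_weights(patterns):
    """Return W = (1/(M d)) * sum_m x^(m) x^(m)^T - I_d / d, as in Eq. (1)."""
    X = np.asarray(patterns, dtype=float)   # shape (M, d), entries +1 or -1
    M, d = X.shape
    # X.T @ X equals the sum of outer products x^(m) x^(m)^T over all patterns.
    return X.T @ X / (M * d) - np.eye(d) / d

# Two illustrative patterns on d = 4 neurons (hypothetical toy data).
patterns = [[1, -1, 1, -1],
            [1, 1, -1, -1]]
W = hebbian_weights(patterns)
```

For patterns with entries $\pm 1$ the subtraction of $I_d/d$ removes the diagonal exactly, so the resulting matrix is symmetric with $w_{ii} = 0$, in line with the properties stated above.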

RESULTS

Quantum neural networks

Now we consider the task of using multi-qubit quantum systems to construct quantum neural networks. One established method is to have a direct association between neurons and qubits [20], unlocking access to quantum properties of entanglement and coherence. We instead encode the neural network into the amplitudes of a quantum state. This is achieved by introducing an association rule between activation patterns of the neural network and pure states of a quantum system. Consider any $d$-dimensional vector $x := \{x_1, x_2, \ldots, x_d\}^{\mathsf T}$. We associate it to the pure state $|x\rangle$ of a $d$-level quantum system according to $x \to |x|\,|x\rangle$, with $|x| = \sqrt{\sum_{i=1}^{d} x_i^2}$ the $l_2$-norm of $x$ and $|x\rangle := \frac{1}{|x|} \sum_{i=1}^{d} x_i |i\rangle$ written with respect to the standard basis such that $\langle x|x\rangle = 1$. Note that for activation pattern vectors with $x_i = \pm 1$, the normalization is $|x|^2 = d$. The $d$-level quantum system can be implemented by a register of $N = \lceil \log_2 d \rceil$ qubits, so that the qubit overhead of representing such a network scales logarithmically with the number of neurons. We discuss in the following section how the weighting matrix $W$ can be understood in the quantum setting by using quantum Hebbian learning.
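The association rule $x \to |x|\,|x\rangle$ can be illustrated classically as follows. This is only a sketch: the helper name `amplitude_encode` is hypothetical, and zero-padding the amplitudes up to $2^N$ entries when $d$ is not a power of two is an assumption made here for illustration.

```python
import math
import numpy as np

def amplitude_encode(x):
    """Return (|x|, amplitudes of |x>, N) for a real vector x.

    The amplitudes are zero-padded to 2**N entries (an assumption of this sketch),
    with N = ceil(log2(d)) the number of qubits.
    """
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)                 # l2-norm |x|
    if norm == 0:
        raise ValueError("the zero vector has no associated quantum state")
    n_qubits = max(1, math.ceil(math.log2(len(x))))
    amplitudes = np.zeros(2 ** n_qubits)
    amplitudes[:len(x)] = x / norm           # x_i / |x| in the standard basis
    return norm, amplitudes, n_qubits

norm, amplitudes, n_qubits = amplitude_encode([1, -1, 1, -1, 1, -1])
```

For this six-neuron pattern the sketch uses three qubits, and $|x|^2 = d = 6$ as expected for entries $\pm 1$.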

Crucial for quantum adaptations of neural networks is the classical-to-quantum read-in of activation patterns. In our setting, reading in an activation pattern $x$ amounts to preparing the quantum state $|x\rangle$. This could in principle be achieved using the developing techniques of quantum random access memory (qRAM) [33] or efficient quantum state preparation, for which restricted, oracle based, results exist [34]. In both cases, the computational overhead is logarithmic in terms of $d$. One can alternatively adopt a fully quantum perspective and take the activation patterns $|x\rangle$ directly from a quantum device or as the output of a quantum channel. For the former, our preparation run time is efficient whenever the quantum device is composed of a number of gates scaling at most polynomially with the number of qubits. For the latter, we typically view the channel as some form of fixed system-environment interaction that does not require a computational overhead to implement.

Quantum Hebbian learning

Using our association rule, the training set of activation patterns $x^{(m)}$ can be associated with an ensemble of pure quantum states $|x^{(m)}\rangle$. Let us now focus on the Hopfield network, with a Hamiltonian weighting matrix $W$. We first introduce the quantum Hebbian learning algorithm (qHeb), which relies on two important insights: (i) that one can associate the weighting matrix $W$ directly to a mixed state $\rho$ of a memory register of $N$ qubits according to

\[ \rho := W + \frac{I_d}{d} = \frac{1}{M} \sum_{m=1}^{M} |x^{(m)}\rangle\langle x^{(m)}|, \tag{2} \]

and (ii), one can efficiently perform quantum algorithms that harness the information contained in $W$.
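The identity in Eq. (2) can be checked numerically on a small example. The sketch below is an illustration with hypothetical toy patterns and a hypothetical helper name; it builds $\rho$ as a classical matrix rather than preparing it on a quantum device.

```python
import numpy as np

def training_density_matrix(patterns):
    """rho = (1/M) * sum_m |x^(m)><x^(m)| with |x> = x / |x|, as in Eq. (2)."""
    X = np.asarray(patterns, dtype=float)
    states = X / np.linalg.norm(X, axis=1, keepdims=True)   # normalized rows
    return states.T @ states / len(states)

# Hypothetical toy data: two patterns on d = 4 neurons.
patterns = np.array([[1, -1, 1, -1],
                     [1, 1, -1, -1]])
M, d = patterns.shape
rho = training_density_matrix(patterns)
W = patterns.T @ patterns / (M * d) - np.eye(d) / d          # Eq. (1)
eigenvalues = np.linalg.eigvalsh(rho)                        # spectrum of rho
```

In this example `rho` coincides with `W + I_d / d`, has unit trace, and has no negative eigenvalues, which are the defining properties of a density matrix.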

To comment on (i), the problem of efficient preparation of $|x^{(m)}\rangle$ can be addressed using any of the techniques discussed in the previous section. We denote by $T_{\mathrm{in}}$ the required run time to prepare each $|x^{(m)}\rangle$. In the situations discussed above $T_{\mathrm{in}} \in O(\mathrm{poly}(\log d))$.

Regarding (ii), now suppose that we have prepared $\rho$ in the laboratory and want to harness the training information contained within. If $\rho$ is the direct output of an unknown quantum device, then we cannot recover the training states $|x^{(m)}\rangle$, since the decomposition of $\rho$ into pure states is not unique. On the other hand, we can still obtain useful information about $\rho$, such as its eigenvalues and eigenstates. One approach to do this could be to perform a full quantum state tomography of $\rho$. For states with low rank $r$, there exist tomographic techniques with a run time $O(\mathrm{poly}(d \log d, r))$ [35], although for some cases the required run time for full state tomography can grow polynomially with the number of qubits [36].

We show that one can use $\rho$ as a "quantum software state" [19]. That is, it is possible to efficiently simulate $e^{i\rho t}$ for time $t$ to precision $\epsilon$ with a required run time approximately $T_{\mathrm{qHeb}} \in O\left(\mathrm{poly}\left(\log d, t, \frac{1}{\epsilon}\right)\right)$. One can then utilize this ability to estimate the eigenvalues and eigenstates of $\rho$ to precision $\epsilon$ through the quantum phase estimation algorithm [2], requiring an overall run time $T_{\mathrm{eigenvalues}} \in O\left(\mathrm{poly}\left(\log d, \frac{1}{\epsilon}\right)\right)$.

Let us define the set of $M$ unitary operators $\{U_k\}_{k=1}^{M}$ acting on a register of $N + 1$ qubits according to

\[ U_k := |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes e^{-i |x^{(k)}\rangle\langle x^{(k)}| \Delta t} \tag{3} \]

for a small time $\Delta t$. We show in the Materials and Methods how to simulate these unitaries and that one can simulate a conditional $e^{-i\rho t}$ by applying them for a suitably large number of times. Specifically, if we perform $n$ sequential repetitions of each of the $M$ unitaries, we have

\[ U_t := \left( \prod_{k=1}^{M} U_k \right)^{n} = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes e^{-i\rho t} + O\left( \frac{t^2}{nM} \right), \tag{4} \]

with $t = nM\Delta t$. In such a way, we can simulate $\rho$ conditionally to a precision $\epsilon$ with a number of applications of $U_k$ of order $O(t^2/\epsilon)$. Each $U_k$ can be realized with logarithmic run time using sparse Hamiltonian simulation [31], resulting in the overall run time of $T_{\mathrm{qHeb}} \in O\left(\mathrm{poly}\left(\log d, t, \frac{1}{\epsilon}\right)\right)$.
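The product formula of Eq. (4) can be explored with a small classical simulation. The sketch below only treats the block acting when the control qubit is $|1\rangle$, uses dense matrices in place of sparse Hamiltonian simulation, and relies on hypothetical toy patterns; it is an illustration of the approximation, not of the quantum run time.

```python
import numpy as np

def projector_unitary(state, dt):
    """exp(-i |x><x| dt) = I + (exp(-i dt) - 1) |x><x| for a normalized real |x>."""
    d = len(state)
    return np.eye(d, dtype=complex) + (np.exp(-1j * dt) - 1) * np.outer(state, state)

def exact_evolution(rho, t):
    """exp(-i rho t) through the eigendecomposition of the Hermitian matrix rho."""
    vals, vecs = np.linalg.eigh(rho)
    return (vecs * np.exp(-1j * vals * t)) @ vecs.conj().T

def product_formula(states, t, n):
    """(prod_k exp(-i |x_k><x_k| dt))^n with t = n * M * dt, cf. Eq. (4)."""
    M = len(states)
    dt = t / (n * M)
    step = np.eye(states.shape[1], dtype=complex)
    for s in states:
        step = projector_unitary(s, dt) @ step
    return np.linalg.matrix_power(step, n)

# Hypothetical toy data: three non-orthogonal patterns on d = 4 neurons.
patterns = np.array([[1, 1, 1, 1], [1, 1, 1, -1], [1, -1, 1, 1]], dtype=float)
states = patterns / np.linalg.norm(patterns, axis=1, keepdims=True)
rho = states.T @ states / len(states)

t = 2.0
target = exact_evolution(rho, t)
# Spectral-norm error of the product formula for increasing repetition counts n.
errors = {n: np.linalg.norm(product_formula(states, t, n) - target, 2)
          for n in (1, 10, 100)}
```

Inspecting `errors` shows the approximation improving as the number of repetitions $n$ grows, which is the behaviour that the error term in Eq. (4) describes.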

The quantum phase estimation algorithm [2, 8] can then be implemented to find the eigenvalues $\mu_j(\rho)$ and corresponding eigenstates $|v_j(\rho)\rangle$ of $\rho$. Here we prepare a register of $T$ qubits additional to our register of $N$ qubits in the composite state $\sum_{t=1}^{2^T} |t\rangle \otimes |\psi\rangle$ for some arbitrary $|\psi\rangle$. The size of $T$ is set by the precision with which we wish to estimate the eigenvalues. Applying the controlled unitaries $U_t$ results in the state $\sum_j \beta_j |\tilde{\mu}_j(\rho)\rangle \otimes |v_j(\rho)\rangle$. Each $|\tilde{\mu}_j(\rho)\rangle$ contains an approximation of the eigenvalue $\mu_j(\rho)$ [2], and $\beta_j := \langle v_j(\rho)|\psi\rangle$. If we take $T \in O(1/\epsilon)$, we can estimate the eigenvalues of $\rho$ to precision $\epsilon$ with a number of copies of the memory states $|x^{(m)}\rangle$ of the order $O(1/\epsilon^3)$. This results in an overall run time $T_{\mathrm{eigenvalues}} \in O\left(\mathrm{poly}\left(\log d, \frac{1}{\epsilon}\right)\right)$. Our quantum Hebbian learning method thus shows how to prepare the weight matrix from the training data as a mixed quantum state and then specifies how that density matrix can be used in a quantum algorithm for higher-level machine-cognitive function, specifically to learn eigenvalues and eigenvectors.

The Hopfield network

We return to the classical Hopfield network and discuss its operation, having already used the Hebbian learning rule to store $M$ activation patterns in the weighting matrix $W$ (the maximum capacity of the Hopfield network is approximately $d/(2 \log d)$ [37]); see also Fig. 1 for a diagram. Suppose that we are supplied with a new activation pattern, $x$, that may be a noise-degraded version of one from the training set or alternatively a similar pattern that is to be compared to the training set. In the following, we show the standard way of operating the network and then develop a new method based on matrix inversion.

The standard method of operating the Hopfield network proceeds by initializing it in the activation $x$ and then running an iterative process whereby neuron $i$ is selected at random and updated according to the rule

\[ x_i \to \begin{cases} +1 & \text{if } \sum_{j=1}^{d} w_{ij} x_j \geq \theta_i, \\ -1 & \text{otherwise,} \end{cases} \tag{5} \]

with $\theta := \{\theta_i\}_{i=1}^{d} \in \mathbb{R}^d$ a user-specified neuronal threshold vector that determines the switching threshold for each neuron. Each element $\theta_i$ should be set so that its magnitude is of order at most 1. The result of every update is a non-increase of the network energy

\[ E = -\frac{1}{2} x^{\mathsf T} W x + \theta^{\mathsf T} x, \tag{6} \]

with the network eventually converging to a local minimum of $E$ after a large number of iterations.
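The update rule of Eq. (5) and the energy of Eq. (6) can be sketched in a few lines of Python. The function names, the fixed random seed, and the single stored pattern are illustrative assumptions.

```python
import numpy as np

def energy(W, theta, x):
    """Network energy E = -1/2 x^T W x + theta^T x, Eq. (6)."""
    return -0.5 * x @ W @ x + theta @ x

def run_hopfield(W, theta, x0, n_updates, seed=0):
    """Asynchronous updates following Eq. (5); returns final state and energy trace."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    energies = [energy(W, theta, x)]
    for _ in range(n_updates):
        i = rng.integers(len(x))                       # pick a neuron at random
        x[i] = 1.0 if W[i] @ x >= theta[i] else -1.0   # threshold rule of Eq. (5)
        energies.append(energy(W, theta, x))
    return x, energies

# Hypothetical toy example: store one pattern and start from a corrupted copy.
stored = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
d = len(stored)
W = np.outer(stored, stored) / d - np.eye(d) / d       # Eq. (1) with M = 1
theta = np.zeros(d)
noisy = stored.copy()
noisy[:2] *= -1                                        # flip two neurons
x_final, energies = run_hopfield(W, theta, noisy, n_updates=200)
```

Because $W$ is symmetric with zero diagonal, the recorded energies never increase from one update to the next, and in this toy case the corrupted pattern relaxes back to the stored one.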

Since $W$ has been fixed by the Hebbian learning rule so that each $x^{(m)}$ is a local minimum of the energy, the output of the Hopfield network is ideally one of the trained activation patterns. The utility of such a memory system is clear and the Hopfield network has been directly employed, for example, in imaging [25].

We now introduce a new approach to operating the classical Hopfield network, see Fig. 1. Suppose that we are supplied with incomplete data on a neuronal activation pattern such that we only know the values of $l < d$ neurons with labels $L \subset \{1, 2, \ldots, d\}$. We can initialize our activation pattern to be $x^{(\mathrm{inc})} := \{x_1^{\mathrm{inc}}, x_2^{\mathrm{inc}}, \ldots, x_d^{\mathrm{inc}}\}^{\mathsf T}$ with $x_i^{\mathrm{inc}} = \pm 1$ if $i \in L$ and $x_i^{\mathrm{inc}} = 0$ otherwise. Our objective is to use the trained Hopfield network to recover the original activation pattern $x$.

Let us first define the projector $P$ onto the subspace of known neurons, such that $P$ is diagonal with respect to the standard basis. We proceed by minimizing the energy $E$ in Eq. (6) subject to the constraint that $Px = x^{(\mathrm{inc})}$.

[Figure 1: schematic comparing the classical, matrix inversion, and quantum approaches across the stages Training, Read-In, Operation, and Read-Out.]

FIG. 1: The Hopfield network. We discuss three approaches to operating the network. The standard classical approach is to iteratively update the neurons based on the connections to neighboring neurons. Our newly-developed classical approach solves a relaxation of the problem posed as a linear equation system and solvable through matrix inversion. Hebbian learning is employed to set the weighting matrix $W$ from $d$-length training data $\{x^{(m)}\}_{m=1}^{M}$. The third approach uses qHop, encoding data in order $\log_2 d$ qubits. Here, the pure state $|w\rangle$ is first prepared which contains user-defined neuron thresholds and a partial memory pattern. Our qHop algorithm proceeds to calculate $|v\rangle = A^{-1}|w\rangle$, with the matrix $A$ containing information on the training data and regularization $\gamma$. To achieve this, we introduce the quantum Hebbian learning algorithm qHeb for density matrix exponentiation of the mixture $\rho$ detailing training data $|x^{(m)}\rangle$. The output pure state $|v\rangle$ contains information on the reconstructed state $|x\rangle$ and Lagrange multipliers, which are post-selected out. The result $|x\rangle$ can be accessed through global properties such as the swap test, which uses multiple copies of $|x\rangle$ to measure the fidelity $|\langle x|x'\rangle|^2$ with another state $|x'\rangle$. The required run time for each step is given by the subscripted $T$.

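The Lagrangian for this optimization is

\[ \mathcal{L} = -\frac{1}{2} x^{\mathsf T} W x + \theta^{\mathsf T} x - \lambda^{\mathsf T} \left( Px - x^{(\mathrm{inc})} \right) + \frac{\gamma}{2} x^{\mathsf T} x, \tag{7} \]

where we introduce a Lagrange multiplier vector $\lambda \in \mathbb{R}^d$ with support only on $P$ and a fixed regularization parameter $\gamma \geq 1$. The first-order derivative conditions for optimization are evaluated as

\[ \frac{\partial \mathcal{L}}{\partial x} = (\gamma I_d - W) x + \theta - P\lambda \overset{!}{=} 0, \qquad \frac{\partial \mathcal{L}}{\partial \lambda} = -Px + x^{(\mathrm{inc})} \overset{!}{=} 0. \tag{8} \]

One can equivalently consider this as a system of linear equations $Av = w$ with

\[ A := \begin{pmatrix} W - \gamma I_d & P \\ P & 0 \end{pmatrix}, \qquad v := \begin{pmatrix} x \\ \lambda \end{pmatrix}, \qquad w := \begin{pmatrix} \theta \\ x^{(\mathrm{inc})} \end{pmatrix}. \tag{9} \]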

The solution of this system then provides a vector $v$ which extremizes the energy $E$ subject to $Px = x^{(\mathrm{inc})}$. With $\|X\|$ the spectral norm (largest absolute eigenvalue) of a matrix $X$, note from the definition in Eq. (1) for the weight matrix $W$ that $\|W\| \leq 1$. In addition, $\|\sigma_x \otimes P\| \leq 1$ and hence $\|A\| \in O(\gamma)$. A reasonable choice for the regularization parameter is $\gamma \in O(1)$. It is shown in the Supplemental Material that the result of the optimization is necessarily a constrained local minimum of the energy whenever $\gamma$ is chosen such that $\gamma > \|W\|$. Hence, it suffices to choose $\gamma > 1$. As the matrix $A$ is rank-deficient, we solve the system of equations by applying the pseudoinverse $A^{+}$ to $w$, recovering a least-squares solution to $v$.

We find that the unconstrained elements of the resultant vector $x$ are continuous valued, i.e., $x_i \in \mathbb{R}$ for $i \notin L$. This can be interpreted as a larger positive/negative value indicating a stronger confidence for the activation $\pm 1$, respectively. For a particular neuron, the value can then be projected to the nearest element $\pm 1$ to obtain a prediction for the activation of that neuron. The regularization term in the Lagrangian furthermore serves to minimize the $l_2$-norm $|x|^2$ of $x$, and can be adapted by the user to prevent the optimization returning overly-large unconstrained elements, see the Supplemental Material for further details. Our approach to operating the Hopfield network through matrix inversion is tested in the Application section, using the example of RNA sequencing in genetics.
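A minimal classical sketch of the matrix inversion based approach is given below. It assembles $A$ and $w$ from Eq. (9), applies the pseudoinverse with NumPy, and projects the result onto $\pm 1$. The function name `recover_pattern`, the toy patterns, and the choice $\theta = 0$ are assumptions made for illustration.

```python
import numpy as np

def recover_pattern(W, x_inc, gamma=1.0, theta=None):
    """Solve A v = w from Eq. (9) with the pseudoinverse and return (x, lambda)."""
    x_inc = np.asarray(x_inc, dtype=float)
    d = len(x_inc)
    theta = np.zeros(d) if theta is None else np.asarray(theta, dtype=float)
    P = np.diag((x_inc != 0).astype(float))            # projector onto known neurons
    A = np.block([[W - gamma * np.eye(d), P],
                  [P, np.zeros((d, d))]])
    w = np.concatenate([theta, x_inc])
    v = np.linalg.pinv(A) @ w                          # least-squares solution
    return v[:d], v[d:]

# Hypothetical toy data: two orthogonal patterns on d = 8 neurons.
x1 = np.ones(8)
x2 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
W = (np.outer(x1, x1) + np.outer(x2, x2)) / (2 * 8) - np.eye(8) / 8   # Eq. (1)

x_inc = np.zeros(8)
x_inc[:4] = x2[:4]                                     # only the first four neurons are known
x, lam = recover_pattern(W, x_inc, gamma=1.0)
prediction = np.where(x >= 0, 1.0, -1.0)               # project onto +1 / -1
```

In this example the constrained entries of `x` reproduce the known neurons, the unconstrained entries are continuous valued with the sign of the stored pattern, and `prediction` recovers the second pattern. With more patterns or fewer known neurons, recovery is not guaranteed.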

The quantum Hopfield network

We now show how the Hopfield network can be run efficiently as a combination of quantum algorithms that we call qHop to perform the matrix inversion based approach. Utilizing the embedding method for quantum neural networks already discussed, the system of linear equations specified in (9) can be written in terms of pure quantum states as $A\,|v|\,|v\rangle = |w|\,|w\rangle$, with $A$ as before, $P = \sum_{i \in L} |i\rangle\langle i|$, and

\[ |v\rangle := \frac{1}{|v|} \left( |x|\,|0\rangle \otimes |x\rangle + |\lambda|\,|1\rangle \otimes |\lambda\rangle \right), \qquad |w\rangle := \frac{1}{|w|} \left( |\theta|\,|0\rangle \otimes |\theta\rangle + |x^{(\mathrm{inc})}|\,|1\rangle \otimes |x^{(\mathrm{inc})}\rangle \right), \tag{10} \]

being pure states of $N + 1$ qubits. The objective is to optimize the energy function $E$ in Eq. (6) by solving for $v = |v|\,|v\rangle = A^{-1} |w|\,|w\rangle$, with $A^{-1}$ the pseudoinverse of $A$.

Let us see that it is possible to efficiently calculate $A^{-1}|w\rangle$ with a run time logarithmic in the dimension of $A$ by utilizing a combination of quantum subroutines. The objective is to use the quantum matrix inversion algorithm in Ref. [8]. This algorithm requires the ability to perform quantum phase estimation using efficient Hamiltonian simulation of $A$. In the Materials and Methods, we show that one can simulate $e^{-iAt}$ by concurrently executing the simulation of a sparse Hamiltonian linked to the projector $P$ as well as qHeb. To achieve efficiency, certain conditions must be met. These conditions are outlined in the following subsections.

The essential steps of the algorithm are as follows and also summarized in Fig. 1. Let the spectral decomposition of $A$ be given by

\[ A = \sum_{j:\, |\mu_j(A)| \geq \mu} \mu_j(A)\, |v_j(A)\rangle\langle v_j(A)| + \sum_{j:\, |\mu_j(A)| < \mu} \mu_j(A)\, |v_j(A)\rangle\langle v_j(A)|, \tag{11} \]

where we have split into two separate sums dependent upon the size of the eigenvalues $\mu_j(A)$ in comparison to a fixed user-defined number $\mu > 0$. As we see in the following, as well as the Supplemental Material, the chosen value of $\mu$ is a trade-off between the run time and the error in calculating the pseudoinverse. The primary matrix inversion algorithm returns (up to normalization) [8]

\[ A^{-1} |w\rangle = \sum_{j:\, |\mu_j(A)| \geq \mu} \frac{\beta_j}{\mu_j(A)}\, |v_j(A)\rangle, \tag{12} \]

where $\beta_j = \langle v_j(A)|w\rangle$.

To begin, we first prepare the input state $|w\rangle$ (which contains the threshold data and incomplete activation pattern) and consider it in the eigenbasis of $A$, i.e. so that $|w\rangle = \sum_j \beta_j |v_j(A)\rangle$. Our qHeb algorithm is then initialized along with sparse Hamiltonian simulation [31] to perform quantum phase estimation (see the following section and the Materials and Methods for more details), allowing us to obtain $\sum_j \beta_j |\tilde{\mu}_j(A)\rangle \otimes |v_j(A)\rangle$ with $\tilde{\mu}_j(A)$ an approximation of the eigenvalue $\mu_j(A)$ to precision $\epsilon$. We then use a conditional rotation of an ancilla and a filtering process discussed in Ref. [8] to select only the eigenvalues larger than or equal to $\mu$. This is followed by an uncomputing of the first register of $T$ qubits by reversing the quantum phase estimation protocol. After measurement of the ancilla qubit, our result is (up to normalization) the pure state $A^{-1}|w\rangle$.
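The eigenvalue filtering of Eqs. (11) and (12) has a direct classical analogue, sketched below with a dense eigendecomposition. This is an illustration of what the filtered inverse computes, not of the quantum algorithm; the function name and the small test matrix are assumptions.

```python
import numpy as np

def filtered_inverse_apply(A, w, mu):
    """Apply sum_{|mu_j| >= mu} (beta_j / mu_j) |v_j> with beta_j = <v_j|w>, cf. Eq. (12)."""
    eigenvalues, eigenvectors = np.linalg.eigh(A)      # A is assumed real symmetric
    beta = eigenvectors.T @ w                          # components of w in the eigenbasis
    keep = np.abs(eigenvalues) >= mu                   # well-conditioned subspace
    coefficients = np.zeros_like(beta)
    coefficients[keep] = beta[keep] / eigenvalues[keep]
    return eigenvectors @ coefficients                 # unnormalized result

A = np.diag([2.0, 0.5, 0.0])
w = np.array([1.0, 1.0, 1.0])
loose = filtered_inverse_apply(A, w, mu=0.1)   # keeps eigenvalues 2.0 and 0.5
strict = filtered_inverse_apply(A, w, mu=1.0)  # keeps only eigenvalue 2.0
```

With the small threshold the result agrees with the pseudoinverse applied to `w`, while the larger threshold discards the contribution of the eigenvalue 0.5, illustrating the trade-off controlled by $\mu$.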

We can efficiently access the state $A^{-1}|w\rangle$ in a number of different ways by measuring its global properties. One can, for example, perform a swap test with a comparison state, find the expectation value of $|x\rangle$ with respect to relevant observables, or resort to state discrimination tests. Alternatively, one can adopt a fully quantum perspective and view the state $A^{-1}|w\rangle$ (or the post-selected activation pattern state $|x\rangle$, see below), as the final output of the algorithm. Our qHop algorithm then acts as an element of a given quantum toolchain, whose action is to reconstruct a quantum state from an incomplete superposition based on the memory stored in $\rho$, and then to output to the next element in the chain.

Efficiency

We now turn to addressing the efficiency of qHop. The overall efficiency is not just dependent upon the run time of our primary algorithm, and we must also consider the read-in efficiency of inputting $|w\rangle$ as well as the read-out efficiency of extracting useful information from the output state $|v\rangle$. Here we review each of these facets and conclude with a comparison to the efficiencies of the discussed classical approaches to operating the Hopfield network.

The input pure state $|w\rangle$ contains data on the user-specified neuronal thresholds $\theta$, along with the incomplete activation pattern $x^{(\mathrm{inc})}$. As we have discussed, the read-in of activation patterns can add a computational overhead to quantum neural network algorithms, potentially cancelling any speed-ups yielded by the algorithm itself. This can be addressed using, e.g., qRAM [33] or efficient state preparation techniques [34], or alternatively by directly accessing the output of a quantum device. Let us denote by $T_{\mathrm{in}}$ the run time of inputting $|w\rangle$, which we take to be $O(\mathrm{poly}(\log d))$ using any of the discussed techniques.

Following similar calculations to those discussed in Ref. [8], we see that our algorithm proceeds by a combination of phase estimation of $A$ with run time $T_{\mathrm{phase}}$ along with filtering and amplification operations to select the eigenvalues $|\mu_j(A)| \geq \mu$ [8], requiring a run time $T_{\mathrm{filter}}$. Let us consider first phase estimation, which requires us to perform $O\left(\frac{1}{\epsilon^3}\right)$ calls to $e^{iAt}$. One can decompose $A$ into three block matrices $B$, $C$, and $D$, corresponding to the off-diagonal projector $P$, an on-diagonal identity $I_d$, and, when using Hebbian learning, the on-diagonal mixed training state $\rho$, see Eq. (9). We show in the Materials and Methods that $e^{iAt}$ is well approximated by applying for $n$ short times $\Delta t$ the unitaries $U_{B/C/D}$ generated by these block matrices, resulting in an error

\[ \epsilon := \left\| e^{iAt} - \left( U_B(\Delta t)\, U_C(\Delta t)\, U_D(\Delta t) \right)^{n} \right\| \in O\left( \frac{t^2}{n} \right), \tag{13} \]

or equivalently requiring a number of steps $n = O(t^2/\epsilon)$. Here, we take the trace norm distance as defined in [19].

Since both $B$ and $C$ are 1-sparse matrices, we can use efficient sparse Hamiltonian simulation techniques [31] to evaluate $U_{B/C}(\Delta t)$ with run time $T_{B/C} \in O\left(\mathrm{poly}\left(\Delta t, \log d, \log\left(\frac{1}{\epsilon}\right)\right)\right)$. For the matrix $D$, we can use the quantum Hebbian learning techniques discussed earlier (and in the Materials and Methods) to simulate for a time $\Delta t$, requiring a run time $T_D \in O\left(\mathrm{poly}\left(\Delta t, \frac{1}{\epsilon}, \log d\right)\right)$. Note that the dependence on $\epsilon$ means that $T_D$ is the dominant run time compared to $T_{B/C}$. Hence, overall we have $T_{\mathrm{phase}} \in O\left(\mathrm{poly}\left(\log d, \frac{1}{\epsilon}\right)\right)$. The run time for filtering and amplification adds an additional overhead $T_{\mathrm{filter}} \in O\left(\frac{1}{\mu}\right)$ [8], meaning that the user should set $1/\mu \in O(\mathrm{poly}(\log d))$ to maintain efficiency. We hence achieve an overall algorithm run time of

\[ T_{\mathrm{qHop}} \in O\left( \mathrm{poly}\left( \log d, \frac{1}{\epsilon}, \frac{1}{\mu} \right) \right). \tag{14} \]

The output of our algorithm is the pure state $|v\rangle$ given in Eq. (10). We can then measure the first qubit in our $N + 1$ qubit register and post-select on $|0\rangle$ to obtain $|x\rangle$. This succeeds with probability $|x|^2 / \left(|x|^2 + |\lambda|^2\right)$, adding a processing overhead $T_{\mathrm{ps}} \in O\left(\frac{|\lambda|^2}{|x|^2}\right)$. One can see from Eq. (8) that $x_i \in O(1)$ for the constrained neurons $i \in L$ and $x_i \in O\left(\frac{1}{\gamma}\right)$ for the unconstrained neurons, so that $|x|^2 \in O(d)$ whenever the number of constrained neurons $l$ is of the order $d$. On the other hand, since $\lambda_i \in O(\gamma)$ for $i \in L$ and $\lambda_i = 0$ otherwise, we have $|\lambda|^2 \in O(d\gamma^2)$. Hence, overall our processing overhead is $T_{\mathrm{ps}} \in O(\gamma^2)$. This means that our choice of $\gamma$ is in fact a compromise: one must pick $\gamma \geq \|W\|$ to guarantee a local minimum, but if $\gamma$ is too large then we add a run-time overhead to qHop.
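The post-selection probability can be worked through numerically. In the sketch below, the function name and the example vectors are hypothetical, and the expected number of repetitions $1/p$ assumes plain repeat-until-success without amplitude amplification.

```python
import numpy as np

def postselection_probability(x, lam):
    """Probability |x|^2 / (|x|^2 + |lambda|^2) of finding the first qubit in |0>."""
    x2 = float(np.dot(x, x))
    lam2 = float(np.dot(lam, lam))
    return x2 / (x2 + lam2)

# Hypothetical example with |x|^2 = 25 and |lambda|^2 = 25.
p = postselection_probability([3.0, 4.0], [0.0, 5.0])
expected_repetitions = 1.0 / p   # equals 1 + |lambda|^2 / |x|^2
```

Since $1/p = 1 + |\lambda|^2/|x|^2$, the expected number of repetitions grows with the ratio $|\lambda|^2/|x|^2$, consistent with the overhead $T_{\mathrm{ps}}$ quoted above.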

The next step would naturally be to read-out the amplitudes of $|x\rangle$ by performing tomography. However, even for pure states, tomographical techniques can introduce an overhead that scales polynomially with the dimension $d$ [35]. Instead, one has to extract useful information from $|x\rangle$ using other approaches, which typically act globally on $|x\rangle$ rather than directly accessing each of the $d$ amplitudes. As discussed above, one option could be to measure the fidelity with another state $|x'\rangle$, such as one of the training states, which can be achieved by performing a swap test with success probability $P_{\mathrm{swap}} = \frac{1}{2}\left(1 - |\langle x|x'\rangle|^2\right)$ [38]. We can then determine the fidelity to a precision $\epsilon$ by performing $O\left(\frac{P_{\mathrm{swap}}(1 - P_{\mathrm{swap}})}{\epsilon^2}\right)$ swap tests between copies of $|x\rangle$ and $|x'\rangle$, with each swap test requiring $O(\log d)$ qubit swaps and hence giving an additional run time to qHop of $T_{\mathrm{out}} \in O\left(\mathrm{poly}\left(\log d, \frac{1}{\epsilon}\right)\right)$.
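The statistics of repeated swap tests can be mimicked classically by sampling outcomes with the probability $P_{\mathrm{swap}}$ given above. The sketch below is only a simulation of the measurement statistics under that formula; the function name, seed, and example vectors are assumptions.

```python
import numpy as np

def swap_test_fidelity(x, y, shots, seed=0):
    """Estimate |<x|y>|^2 from simulated outcomes with P_swap = (1 - |<x|y>|^2) / 2."""
    x = np.asarray(x, dtype=float) / np.linalg.norm(x)
    y = np.asarray(y, dtype=float) / np.linalg.norm(y)
    fidelity = float(np.dot(x, y) ** 2)                # exact value, for comparison
    p_swap = 0.5 * (1.0 - fidelity)
    rng = np.random.default_rng(seed)
    p_hat = rng.binomial(shots, p_swap) / shots        # observed outcome frequency
    return 1.0 - 2.0 * p_hat, fidelity

estimate, exact = swap_test_fidelity([1, 1, 1, 1], [1, 1, 1, -1], shots=200_000)
```

The statistical error of `estimate` shrinks as the inverse square root of the number of shots, which is the origin of the $1/\epsilon^2$ scaling in the number of swap tests.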

Alternatively, following the spirit of supervised learning, one may have access to a set of $p$ binary valued observables, corresponding to membership of some classification categories. Measuring the expectation values of these observables with respect to $|x\rangle$ then allows for a classification of $|x\rangle$ with respect to such categories. For a given precision $\epsilon$, each expectation value can be measured with $O\left(\frac{1}{\epsilon^2}\right)$ repetitions, resulting in a run-time overhead to qHop of $T_{\mathrm{out}} \in O\left(\mathrm{poly}\left(\frac{1}{\epsilon}, p, T_{\mathrm{obs}}\right)\right)$, with $T_{\mathrm{obs}}$ the time of the observable measurement.

To summarize, the full operation of qHop can be achieved with a run time $O\left(\mathrm{poly}\left(\log d, \frac{1}{\epsilon}, \frac{1}{\mu}\right)\right)$. Fig. 1 visualizes the individual run time contributions. We now compare this efficiency with both of the classical approaches: the original Hopfield procedure [32], as well as the new matrix inversion based approach introduced here. It is clear that the original Hopfield procedure has a run time polynomial in the number of neurons, since one must typically sample every one of the $d$ neurons at least once. On the other hand, the best sparse classical matrix inversion techniques have a run time $O\left(\mathrm{poly}\left(d, \frac{1}{\sqrt{\mu}}, \log\left(\frac{1}{\epsilon}\right), s\right)\right)$ [39] where $s$ is the sparsity, and it has been shown in Ref. [8] that this run time cannot be improved even if one needs access only to the expectation values of $A$. We hence see that qHop is potentially able to operate with lower computational demands for a suitably large $d$. However, let us emphasize that this analysis does not constitute a comprehensive benchmark of qHop against other possible quantum approaches to running the Hopfield network [26–30].

Application

Here we outline an application of the Hopfield network in RNA sequencing. Consider the H1N1 strain of the influenza A virus, which has 8 RNA segments that code for different functions in the virus. The segments are composed of a string of RNA-bases: A, C, G, and U. Each segment can in turn be converted to a double sized binary string, as shown in Fig. 2, which can be stored in the weighting matrix of a Hopfield network. Suppose that we are provided with partial information on a new RNA sequence and would like to verify whether it belongs to the H1N1 virus. For example, our sequence could be from a recently collected sample originating in an area with a new influenza outbreak. This scenario can be addressed by resorting to the Hopfield network.
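The conversion of an RNA string into a double sized binary string can be sketched as follows, using the base-to-pair mapping given in Fig. 2. The function names are illustrative assumptions.

```python
# Mapping taken from the binary encoding of Fig. 2.
BASE_TO_BITS = {"A": (1, 1), "C": (1, -1), "G": (-1, 1), "U": (-1, -1)}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def encode_rna(sequence):
    """Convert a string of RNA bases into a +1/-1 activation pattern of twice the length."""
    pattern = []
    for base in sequence:
        pattern.extend(BASE_TO_BITS[base])
    return pattern

def decode_rna(pattern):
    """Convert a +1/-1 activation pattern back into a string of RNA bases."""
    pairs = zip(pattern[0::2], pattern[1::2])
    return "".join(BITS_TO_BASE[(int(a), int(b))] for a, b in pairs)

pattern = encode_rna("ACGU")
```

Under this encoding, a segment of 50 RNA-bases becomes an activation pattern over 100 neurons.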

We use this setting as a motivation for our numerics presented in Fig. 2, which contains a comparison of the performance of the standard classical approach to operating the Hopfield network with our new matrix inversion based approach. Here, we store the first 50 RNA-bases from each of the 8 segments of the influenza A H1N1 strain (i.e. so that $d = 100$, $M = 8$) in the weighting matrix $W$ using the Hebbian learning rule (Data source [40]). The weighting matrix is filled to classical capacity, i.e., $M = 8 \approx d/(2 \log d)$ [37], so that imperfect recoveries are more easily identified. We then generate incomplete data from the first segment of H1N1 by randomly selecting $l/2$ RNA-bases for $l/2 \in \{1, 2, \ldots, 50\}$. Both approaches to operating the Hopfield network are then implemented to reconstruct the full activation pattern, with the Hamming distance measured between the result and the original pattern. This is averaged over 1000 repetitions of random choices of $l/2$ RNA-bases, with the resultant data plotted in Fig. 2. We see that both the conventional approach to the Hopfield network and the new matrix inversion based approach have comparable performances, with each able to recover the input segment for a suitably large $l/2$. Yet, by using qHop to perform the matrix inversion based approach, we could operate with a run time logarithmic in the system dimension and hence increase the dimension far beyond $d = 100$, see the previous section and Fig. 1 for a comparison of run times. Note that for the matrix inversion based approach, we set $\gamma = 1$ to guarantee a local minimum since $\|W\| \approx 0.185$.

[Figure 2: (a) binary encoding of the RNA-bases, A ⇔ (1, 1), C ⇔ (1, −1), G ⇔ (−1, 1), U ⇔ (−1, −1); (b) plot of Hamming distance against number of known RNA-bases.]

FIG. 2: RNA recognition. (a) The Hopfield network can be used as a content addressable memory system for RNA-recognition (Data source [40]). We encode 50 RNA-bases of the $M = 8$ strands originating from the H1N1 influenza A virus in $W$, and then run the Hopfield network on partial information from a limited number of randomly selected RNA-bases from the first strand. (b) The result of operating the Hopfield network on this example using the standard classical approach (dotted line) and the matrix inversion based approach (solid line). The resultant Hamming distance to the true data is averaged over 1000 repetitions for varying amounts of partial information.

DISCUSSION

Quantum effects have a profound potential to yield advancements in machine learning over the coming decade. We have presented a quantum implementation (qHop) for the Hopfield network that encodes an exponential number of neurons within the amplitudes of only a polynomially large register of qubits. This complements alternative encodings focussing on a one-to-one correspondence between neurons and qubits. Crucially, the learning and operation steps of the quantum Hopfield network can be exponentially quicker in run time when compared to classical approaches. We have also introduced a method of training a quantum neural network via quantum Hebbian learning (qHeb).

As with many quantum algorithms, the efficient operation of qHop is subject to some important considerations. One must first be able to efficiently read the classical initialization data of the neural network into our quantum device, which can be achieved using efficient pure state preparation techniques [34] or qRAM [33], or alternatively by directly using the output of a quantum device. Next, it must be possible to efficiently operate qHeb and matrix inversion [8]. This relies on efficient Hamiltonian simulation of the system matrix, which we show to be possible by resorting to sparse Hamiltonian simulation techniques [31] and density matrix exponentiation [18, 19]. The matrix inversion algorithm then outputs the inverse only on a well-conditioned subspace with (absolute) eigenvalues larger than a chosen fixed value µ whose inverse controls the algorithm efficiency. It is crucial to note that classical sparse matrix inversion algorithms also have a similar efficiency-dependence on µ. Finally, it must be possible to efficiently access the output of qHop, which is a pure quantum state representing a continuous-valued neuronal activation pattern. Since quantum state tomography is typically resource intensive, one can instead access global information such as the fidelity with previously trained activation patterns or the expectation values with respect to observables.

We have introduced the subroutine qHeb, which adapts the standard Hebbian learning approach [24] to the quantum setting, a new addition to studies on quantum learning. Our subroutine relies on the important observation that the weight matrix W describing a neural network can be alternatively represented by a mixed quantum state (or more generally, a Hamiltonian). Using density matrix exponentiation [18, 19], this quantum state can then be used operationally for the extraction of, e.g., eigenvalues and eigenvectors of the weight matrix. We have shown that quantum Hebbian learning can be implemented by performing a sequential imprinting of memory patterns, represented as pure quantum states, onto a register of memory qubits. Although introduced here within the context of the quantum Hopfield network, quantum Hebbian learning can be of wider interest as a quantum subroutine within other quantum neural networks.
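The correspondence between the weight matrix and a mixed state can be made concrete with a short classical check. The sketch below assumes NumPy, synthetic ±1 patterns, and the Hebbian rule W = (1/(Md)) Σ_k x^(k) x^(k)ᵀ − I_d/d, which is consistent with the top-left block of A in Eq. (19); the function names are hypothetical. It builds ρ = (1/M) Σ_k |x^(k)⟩⟨x^(k)| from normalized patterns and compares it with W.

```python
import numpy as np


def memory_state(patterns):
    """Mixed state rho = (1/M) sum_k |x_k><x_k| built from normalized +/-1 patterns."""
    M, d = patterns.shape
    kets = patterns / np.sqrt(d)  # |x_k> = x_k / sqrt(d) has unit norm
    return kets.T @ kets / M


def hebbian_weights(patterns):
    """Classical Hebbian weights, W = (1/(M d)) sum_k x_k x_k^T - I/d."""
    M, d = patterns.shape
    return patterns.T @ patterns / (M * d) - np.eye(d) / d


rng = np.random.default_rng(1)
patterns = rng.choice([-1.0, 1.0], size=(3, 8))  # M = 3 synthetic patterns, d = 8
rho = memory_state(patterns)
W = hebbian_weights(patterns)

print("trace of rho:", np.trace(rho))
print("smallest eigenvalue of rho:", np.linalg.eigvalsh(rho).min())
print("max |W - (rho - I/d)|:", np.abs(W - (rho - np.eye(8) / 8)).max())
```

Under these assumptions ρ has unit trace and no negative eigenvalues, so it is a valid density matrix, and it differs from W only by the multiple I_d/d of the identity. The eigenvectors of W and ρ therefore coincide.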

Our findings, along with similar works in machine learning [20], promise advancements of genuine technological relevance. The approach we use encodes an exponential number of neurons into a polynomial number of qubits. We have discussed a specific neural network, the Hopfield network, which is a content addressable memory system. As an application, we have shown how the matrix inversion-based Hopfield network can be utilized for identifying genetic segments of RNA in viruses. Future developments may focus on the nature of quantum neural networks themselves, identifying entirely new applications that harness purely quantum properties without being based upon previous classical networks. The natural next step to benefit from the fruits of quantum neural networks, and developments in quantum machine learning more generally, is to implement these algorithms on viable quantum devices.

MATERIALS AND METHODS

Quantum Hebbian Learning

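As discussed in the main part, the unitaries applying the different memory pattern projectors $|x^{(k)}\rangle\langle x^{(k)}|$ conditionally and for a small time $\Delta t$ are given by

$$U_k := |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes e^{-i |x^{(k)}\rangle\langle x^{(k)}| \Delta t}. \tag{15}$$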

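We now show how to enact this operator. Let $\sigma$ be an arbitrary state and $|x^{(k)}\rangle$ the memory pattern. Also assume a control qubit in state $|q\rangle$. Let $S$ be the swap matrix between the subsystems for $\sigma$ and $|x^{(k)}\rangle$. Note that

$$U_S := e^{-i |1\rangle\langle 1| \otimes S \Delta t} = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes e^{-i S \Delta t}, \tag{16}$$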

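where $|1\rangle\langle 1| \otimes S$ is 1-sparse and efficiently simulatable. For sparse Hamiltonian simulation, the methods in Ref. [31] can be used with a constant number of oracle calls and run time $\tilde{O}(\log d)$ (see the following section for further details), where the symbol $\tilde{O}$ indicates that polylogarithmic factors are omitted. Note that

$$\mathrm{tr}_2 \left\{ U_S \, |q\rangle\langle q| \otimes |x^{(k)}\rangle\langle x^{(k)}| \otimes \sigma \, U_S^{\dagger} \right\} = U_k \, |q\rangle\langle q| \otimes \sigma \, U_k^{\dagger} + O(\Delta t^2). \tag{17}$$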

The trace is over the second subsystem containing the state $|x^{(k)}\rangle$. Thus the subsystem of ancilla qubit and $\sigma$ effectively undergoes time evolution with $U_k$.
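The swap construction can be checked numerically for small dimensions. The following Python sketch makes several simplifying assumptions: NumPy is available, the control qubit is omitted (the unconditioned evolution is simulated), the pattern register is placed first in the tensor product and traced out, and all function names are hypothetical. It compares the reduced state after applying $e^{-iS\Delta t}$ with the target evolution $e^{-i|x\rangle\langle x|\Delta t}\,\sigma\,e^{i|x\rangle\langle x|\Delta t}$.

```python
import numpy as np


def swap_matrix(d):
    """Swap operator S on C^d (x) C^d, defined by S |i>|j> = |j>|i>."""
    S = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            S[j * d + i, i * d + j] = 1.0
    return S


def swap_step(x, sigma, dt):
    """Apply exp(-i S dt) to |x><x| (x) sigma and trace out the pattern register."""
    d = len(x)
    S = swap_matrix(d)
    U = np.cos(dt) * np.eye(d * d) - 1j * np.sin(dt) * S  # exact, because S @ S = I
    joint = U @ np.kron(np.outer(x, x.conj()), sigma) @ U.conj().T
    return np.einsum("ijil->jl", joint.reshape(d, d, d, d))  # partial trace over factor 1


def projector_step(x, sigma, dt):
    """Target evolution exp(-i |x><x| dt) sigma exp(+i |x><x| dt)."""
    d = len(x)
    proj = np.outer(x, x.conj())
    U = np.eye(d) + (np.exp(-1j * dt) - 1.0) * proj  # exact, because proj @ proj = proj
    return U @ sigma @ U.conj().T


def swap_error(x, sigma, dt):
    """Spectral-norm distance between the swap-based step and the target evolution."""
    return np.linalg.norm(swap_step(x, sigma, dt) - projector_step(x, sigma, dt), 2)


x = np.array([1.0, 1.0, 1.0, -1.0]) / 2.0  # normalized pattern state
sigma = np.diag([0.5, 0.3, 0.2, 0.0])      # arbitrary density matrix
for dt in (0.1, 0.05, 0.025):
    print(f"dt = {dt}: error = {swap_error(x, sigma, dt):.2e}")
```

Halving Δt should reduce the printed error by roughly a factor of four, which is the $O(\Delta t^2)$ behaviour stated in Eq. (17).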

We now simulate the $M$ unitaries $U_k$ sequentially for $n$ repetitions, i.e. performing $U_t$ as defined in Eq. (3) in the main text, taking $\Delta t = t/(nM)$. Consider for the sake of simplicity the unconditioned evolution. Using the standard Suzuki-Trotter method we have that

$$\varepsilon := \left\| \left( e^{-i |x^{(1)}\rangle\langle x^{(1)}| t/(nM)} \cdots e^{-i |x^{(M)}\rangle\langle x^{(M)}| t/(nM)} \right)^{n} - e^{-i \rho t} \right\| \in O\left( \frac{t^2}{nM} \right). \tag{18}$$

Hence, we require $n \in O\left( \frac{t^2}{\varepsilon M} \right)$ repetitions, with each repetition requiring $M$ sparse Hamiltonian simulations. This results in a run time $O\left( \frac{t^2}{\varepsilon} \right)$. The advantages of this approach are that we can use copies of the training states $|x^{(m)}\rangle$ as “quantum software states” [19] and, in addition, that we do not require superpositions of the training states.
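The decay of the Suzuki-Trotter error with the number of repetitions n can be illustrated classically. The sketch below assumes NumPy, synthetic patterns, and the unconditioned evolution; the function names are hypothetical. It forms the product of the M projector unitaries with step t/(nM), raises it to the n-th power, and compares it with $e^{-i\rho t}$, computed through an eigendecomposition.

```python
import numpy as np


def projector_unitary(x, tau):
    """exp(-i |x><x| tau) for a normalized real vector x."""
    return np.eye(len(x), dtype=complex) + (np.exp(-1j * tau) - 1.0) * np.outer(x, x)


def exact_unitary(rho, t):
    """exp(-i rho t) through the eigendecomposition of the Hermitian matrix rho."""
    vals, vecs = np.linalg.eigh(rho)
    return (vecs * np.exp(-1j * vals * t)) @ vecs.conj().T


def trotter_error(kets, t, n):
    """Spectral-norm distance between n Trotter repetitions and exp(-i rho t)."""
    M, d = kets.shape
    rho = kets.T @ kets / M
    step = np.eye(d, dtype=complex)
    for x in kets:  # one repetition: product over the M memory patterns
        step = step @ projector_unitary(x, t / (n * M))
    return np.linalg.norm(np.linalg.matrix_power(step, n) - exact_unitary(rho, t), 2)


rng = np.random.default_rng(2)
patterns = rng.choice([-1.0, 1.0], size=(3, 8))  # M = 3 synthetic patterns, d = 8
kets = patterns / np.sqrt(8)                     # normalized pattern states
for n in (10, 100, 1000):
    print(f"n = {n}: error = {trotter_error(kets, 1.0, n):.2e}")
```

For non-commuting pattern projectors the printed error is expected to fall roughly in proportion to 1/n, in line with the scaling in n of Eq. (18).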

Efficient Hamiltonian Simulation of A

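We want to simulate the unitary $e^{iAt}$ to a fixed error $\varepsilon$ for arbitrary $t$. Let us first write

$$A = \begin{pmatrix} \rho - \left( \gamma + \frac{1}{d} \right) I_d & P \\ P & 0 \end{pmatrix} = \begin{pmatrix} 0 & P \\ P & 0 \end{pmatrix} + \begin{pmatrix} -\gamma' I_d & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} \rho & 0 \\ 0 & 0 \end{pmatrix} =: B + C + D, \tag{19}$$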

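where we introduce the $(2d \times 2d)$-dimensional block matrices

$$B = \begin{pmatrix} 0 & P \\ P & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} -\gamma' I_d & 0 \\ 0 & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} \rho & 0 \\ 0 & 0 \end{pmatrix} \tag{20}$$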

with $\gamma' = \gamma + \frac{1}{d}$. We now split the simulation time $t$ into $n$ small time steps $\Delta t$, i.e. so that $t = n \Delta t$, and consider $e^{iA\Delta t}$. The time evolution $e^{iA\Delta t}$ can be simulated by using applications of $e^{iB\Delta t}$, $e^{iC\Delta t}$, and $e^{iD\Delta t}$ via the standard Suzuki-Trotter method. Suppose that one has operators $U_B(\Delta t)$, $U_C(\Delta t)$, and $U_D(\Delta t)$ that simulate $e^{iB\Delta t}$, $e^{iC\Delta t}$, and $e^{iD\Delta t}$ to errors at most $O(\Delta t^2)$, respectively. In many cases much better error scalings exist. Then, $e^{iB\Delta t} e^{iC\Delta t} e^{iD\Delta t}$ is simulated to error also $O(\Delta t^2)$. By simply using the Taylor expansion, we see that the error $\varepsilon_{\Delta t}$ of simulating $e^{iA\Delta t}$ is

$$\varepsilon_{\Delta t} := \left\| e^{iA\Delta t} - U_B(\Delta t) U_C(\Delta t) U_D(\Delta t) \right\| \in O\left( \Delta t^2 \right). \tag{21}$$

This means that by using $n$ repetitions of $U_B(\Delta t) U_C(\Delta t) U_D(\Delta t)$ we can simulate $e^{iAt}$ to an error of $\varepsilon \in O\left( n \Delta t^2 \right)$. Hence, for a fixed error $\varepsilon$ and time $t$, one needs to perform $n \in O\left( \frac{t^2}{\varepsilon} \right)$ repetitions of $U_B(\Delta t) U_C(\Delta t) U_D(\Delta t)$.

We now evaluate the run time of performing one such repetition. Consider the block matrix $B$. Because $P$ is a diagonal projector, $B$ is a 1-sparse self-adjoint matrix, where the sparsity $s$ is the maximum number of non-zero elements in any column or row. A large series of works has addressed the efficient Hamiltonian simulation of sparse matrices. Ref. [31] shows that sparse Hamiltonian simulation for a simulation time $t$ to error $\varepsilon$ can be performed with a run time $T_B \in O(s t \|B\|_{\max} \log d)$. In our case, for the maximum matrix element of $B$ we have $\|B\|_{\max} = 1$. The operator $U_C(\Delta t)$ is treated in a similar way. Turning these operators $U$ into their conditional versions and extending into a larger space as in Eq. (20) is in principle straightforward with the sparse matrix methods. Simulating the operator $U_D(\Delta t)$ is achieved using Hebbian learning, see the previous Materials and Methods section, and including a conditioning on an additional ancilla qubit in state $|0\rangle$.
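The splitting A = B + C + D and the per-step error of Eq. (21) can be examined with dense matrices for small d. The Python sketch below is only an illustration under stated assumptions: NumPy is available, the exponentials are evaluated exactly by eigendecomposition instead of by sparse Hamiltonian simulation or Hebbian learning, the data are synthetic, and the function names are hypothetical.

```python
import numpy as np


def expm_hermitian(H, t):
    """exp(i H t) for a Hermitian matrix H via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(1j * vals * t)) @ vecs.conj().T


def split_system_matrix(rho, known, gamma):
    """Return the blocks B, C, D of Eq. (20) for a diagonal projector P."""
    d = rho.shape[0]
    P = np.diag(known.astype(float))
    Z = np.zeros((d, d))
    B = np.block([[Z, P], [P, Z]])
    C = np.block([[-(gamma + 1.0 / d) * np.eye(d), Z], [Z, Z]])
    D = np.block([[rho, Z], [Z, Z]])
    return B, C, D


def step_error(B, C, D, dt):
    """Spectral-norm error of one product step against exp(i A dt), A = B + C + D."""
    product = expm_hermitian(B, dt) @ expm_hermitian(C, dt) @ expm_hermitian(D, dt)
    return np.linalg.norm(expm_hermitian(B + C + D, dt) - product, 2)


rng = np.random.default_rng(3)
patterns = rng.choice([-1.0, 1.0], size=(3, 8))  # synthetic patterns, d = 8
kets = patterns / np.sqrt(8)
rho = kets.T @ kets / 3
known = np.array([True, True, True, False, False, False, False, False])
B, C, D = split_system_matrix(rho, known, gamma=1.0)

print("max non-zero entries per row of B:", int(np.count_nonzero(B, axis=1).max()))
for dt in (0.1, 0.05, 0.025):
    print(f"dt = {dt}: one-step error = {step_error(B, C, D, dt):.2e}")
```

The first printed line reports the sparsity of B, and the one-step errors are expected to shrink roughly fourfold each time Δt is halved, matching the $O(\Delta t^2)$ scaling of Eq. (21).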

[1] C. M. Bishop, Pattern recognition and machine learning (Springer, 2006).

[2] M. A. Nielsen, I. Chuang, Quantum computation and quantum information (Cambridge University Press, Cambridge, 2002).

[3] M. Schuld, I. Sinayskiy, F. Petruccione, An introduction to quantum machine learning, Contemporary Physics 56, 172 (2015).

[4] J. Biamonte, et al., Quantum machine learning, Nature 549, 195 (2017).

[5] C. Ciliberto, et al., Quantum machine learning: a classical perspective, arXiv preprint arXiv:1707.08561 (2017).

[6] V. Dunjko, H. J. Briegel, Machine learning & artificial intelligence in the quantum domain, arXiv preprint arXiv:1709.02779 (2017).

[7] N. Wiebe, D. Braun, S. Lloyd, Quantum Data Fitting, Physical Review Letters 109, 050505 (2012).

[8] A. W. Harrow, A. Hassidim, S. Lloyd, Quantum algorithm for linear systems of equations, Physical Review Letters 103, 150502 (2009).

[9] N. Wiebe, A. Kapoor, K. M. Svore, Quantum deep learning, arXiv preprint arXiv:1412.3489 (2014).

[10] V. Dunjko, J. M. Taylor, H. J. Briegel, Quantum-enhanced machine learning, Physical Review Letters 117, 130501 (2016).

[11] M. Benedetti, J. Realpe-Gomez, R. Biswas, A. Perdomo-Ortiz, Estimation of effective temperatures in quantum annealers for sampling applications: A case study with possible applications in deep learning, Physical Review A 94, 022308 (2016).

[12] J. Romero, J. Olson, A. Aspuru-Guzik, Quantum autoencoders for efficient compression of quantum data, Quantum Science and Technology 2, 045001 (2017).

[13] M. Schuld, I. Sinayskiy, F. Petruccione, Quantum computing for pattern classification, Pacific Rim International Conference on Artificial Intelligence (Springer, Berlin, 2014), pp. 208–220.

[14] Z. Zhao, J. K. Fitzsimons, J. F. Fitzsimons, Quantum assisted Gaussian process regression, arXiv preprint arXiv:1512.03929 (2015).

[15] L. Wossnig, Z. Zhao, A. Prakash, A quantum linear system algorithm for dense matrices, arXiv preprint arXiv:1704.06174 (2017).

[16] N. Wiebe, A. Kapoor, K. M. Svore, Quantum nearest-neighbor algorithms for machine learning, Quantum Information and Computation 15 (2015).

[17] P. Rebentrost, M. Mohseni, S. Lloyd, Quantum support vector machine for big data classification, Physical Review Letters 113, 130503 (2014).

[18] S. Lloyd, M. Mohseni, P. Rebentrost, Quantum principal component analysis, Nature Physics 10, 631 (2014).

[19] S. Kimmel, C. Y.-Y. Lin, G. H. Low, M. Ozols, T. J. Yoder, Hamiltonian simulation with optimal sample complexity, npj Quantum Information 3, 13 (2017).

[20] M. Schuld, I. Sinayskiy, F. Petruccione, The quest for a quantum neural network, Quantum Information Processing 13, 2567 (2014).

[21] M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, R. Melko, Quantum Boltzmann machine, arXiv preprint arXiv:1601.02036 (2016).

[22] M. Benedetti, J. Realpe-Gomez, A. Perdomo-Ortiz, Quantum-assisted Helmholtz machines: A quantum-classical deep learning framework for industrial datasets in near-term devices, arXiv preprint arXiv:1708.09784 (2017).

[23] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9, 1735 (1997).

[24] D. O. Hebb, The Organization of Behavior (Wiley, Hoboken, 1949).

[25] K.-S. Cheng, J.-S. Lin, C.-W. Mao, The application of competitive Hopfield neural network to medical image segmentation, IEEE Transactions on Medical Imaging 15, 560 (1996).

[26] M. Akazawa, E. Tokuda, N. Asahi, Y. Amemiya, Quantum Hopfield network using single-electron circuits: A novel Hopfield network free from the local-minimum difficulty, Analog Integrated Circuits and Signal Processing 24, 51 (2000).

[27] E. C. Behrman, K. Gaddam, J. Steck, S. Skinner, Microtubules as a quantum Hopfield network, The Emerging Physics of Consciousness pp. 351–370 (2006).

[28] P. Rotondo, M. Marcuzzi, J. Garrahan, I. Lesanovsky, M. Muller, Open quantum generalisation of Hopfield neural networks, arXiv preprint arXiv:1701.01727 (2017).

[29] D. Ventura, T. Martinez, Quantum associative memory with exponential capacity, Neural Networks Proceedings, 1998. IEEE World Congress on Computational Intelligence. The 1998 IEEE International Joint Conference on (IEEE, 1998), vol. 1, pp. 509–513.

[30] H. Seddiqi, T. S. Humble, Adiabatic quantum optimization for associative memory recall, Frontiers in Physics 22, 79 (2014).

[31] D. W. Berry, A. M. Childs, R. Kothari, Hamiltonian simulation with nearly optimal dependence on all parameters, Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on (IEEE, New York, 2015), pp. 792–809.

[32] J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences 79, 2554 (1982).

[33] V. Giovannetti, S. Lloyd, L. Maccone, Quantum random access memory, Physical Review Letters 100, 160501 (2008).

[34] A. N. Soklakov, R. Schack, Efficient state preparation for a register of quantum bits, Physical Review A 73, 012307 (2006).

[35] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, J. Eisert, Quantum state tomography via compressed sensing, Physical Review Letters 105, 150401 (2010).

[36] M. Cramer, et al., Efficient quantum state tomography, arXiv preprint arXiv:1101.4366 (2011).

[37] R. McEliece, E. Posner, E. Rodemich, S. Venkatesh, The capacity of the Hopfield associative memory, IEEE Transactions on Information Theory 33, 461 (1987).

[38] D. Gottesman, I. Chuang, Quantum digital signatures, arXiv preprint quant-ph/0105032 (2001).

[39] J. R. Shewchuk, et al., An introduction to the conjugate gradient method without the agonizing pain (Department of Computer Science, Carnegie-Mellon University, 1994).

[40] H. Zaraket, et al., Genetic makeup of amantadine-resistant and oseltamivir-resistant human influenza A/H1N1 viruses, Journal of Clinical Microbiology 48, 1085 (2010).

[41] A. Ghosh, D. Wassermann, R. Deriche, A polynomial approach for maxima extraction and its application to tractography in HARDI, Information Processing in Medical Imaging (Springer, 2011), pp. 723–734.

[42] G. Shilov, Linear Algebra (Dover Publications, 1977).

Acknowledgements: We thank Juan Miguel Arrazola, Mayank Bhatia and Nathan Killoran for fruitful discussions. Funding: S. Lloyd was supported by OSD/ARO under the Blue Sky Initiative. Author contributions: P.R. and T.R.B. developed the quantum Hopfield network and quantum Hebbian learning, performed simulations and wrote the manuscript. P.R. and S.L. conceived the initial idea. C.W. supervised the project. Competing interests: P.R., T.R.B., and C.W. are employed within the quantum technology company Xanadu.


SUPPLEMENTAL MATERIAL

I. CONSTRAINED MINIMIZATION OF THE ENERGY FUNCTION

Here we show that the result of the constrained optimization outlined in the Hopfield network section of the main manuscript is necessarily a local minimum. Suppose that we are to optimize a real-valued scalar function $f(\mathbf{x})$ of a vector $\mathbf{x} \in \mathbb{R}^d$ subject to $l < d$ constraints composed into a real-valued vector function $g(\mathbf{x}) = 0$. The corresponding Lagrangian is $\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) - \boldsymbol{\lambda}^{\mathsf{T}} g(\mathbf{x})$ with Lagrange multiplier vector $\boldsymbol{\lambda}$. Optimization can be achieved by identifying vectors $(\mathbf{x}, \boldsymbol{\lambda})$ satisfying $\partial_{\mathbf{x}} \mathcal{L} = 0$ and $\partial_{\boldsymbol{\lambda}} \mathcal{L} = 0$. To classify these optimal vectors we must consider the $((l + d) \times (l + d))$-dimensional bordered Hessian matrix [41]

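$$H(\mathbf{x}, \boldsymbol{\lambda}) := \begin{pmatrix} 0_l & \nabla g(\mathbf{x}) \\ \nabla g(\mathbf{x})^{\mathsf{T}} & \frac{\partial^2 \mathcal{L}}{\partial \mathbf{x}^2} \end{pmatrix}. \tag{22}$$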

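In particular, $(\mathbf{x}, \boldsymbol{\lambda})$ is a local minimum if

$$(-1)^{l} \det\left( H_k(\mathbf{x}, \boldsymbol{\lambda}) \right) > 0 \tag{23}$$

for all $k \in \{2l+1, 2l+2, \ldots, l+d\}$, where $H_k(\mathbf{x}, \boldsymbol{\lambda})$ is the $k$-th order leading principal submatrix of $H(\mathbf{x}, \boldsymbol{\lambda})$, composed of its first $k$ rows and first $k$ columns.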

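We now show that this condition is satisfied when $f(\mathbf{x})$ is the energy $E$ in Eq. (6) of the main text and $g(\mathbf{x}) = P\mathbf{x} - \mathbf{x}^{(\mathrm{inc})}$. The bordered Hessian matrix is then

$$H(\mathbf{x}, \boldsymbol{\lambda}) = \begin{pmatrix} 0_l & -P \\ -P^{\mathsf{T}} & \gamma I_d - W \end{pmatrix}, \tag{24}$$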

with $P$ a rectangular $(l \times d)$-dimensional matrix of rows of unit vectors $e_i$ for $i \in L$, or equivalently the projector $P$ with all zero rows removed. We note that in our setting the bordered Hessian matrix is in fact independent of $\mathbf{x}$ and $\boldsymbol{\lambda}$, meaning that we can classify any extremum found. We therefore drop the arguments of $H$ from here on. Now consider the leading principal submatrix $H_k$ for any $k \in \{2l+1, 2l+2, \ldots, l+d\}$, given by

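$$H_k = \begin{pmatrix} 0_l & -P_{l \times (k-l)} \\ -(P_{l \times (k-l)})^{\mathsf{T}} & (\gamma I_d - W)_{(k-l)} \end{pmatrix}, \tag{25}$$

with $P_{l \times (k-l)}$ composed of the first $k - l$ columns of $P$ and $(\gamma I_d - W)_{(k-l)}$ the $(k-l)$-th order leading principal submatrix of $\gamma I_d - W$.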

Let us consider $\gamma > \|W\|$ with $\|W\|$ the largest eigenvalue of $W$, so that $\gamma I_d - W > 0$. Sylvester's criterion tells us that $(\gamma I_d - W)_{(k-l)} > 0$ and is hence invertible. Using the Schur complement, we have that

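$$\det(H_k) = (-1)^{l} \det\left( (\gamma I_d - W)_{(k-l)} \right) \det\left( P_{l \times (k-l)} \left( (\gamma I_d - W)_{(k-l)} \right)^{-1} (P_{l \times (k-l)})^{\mathsf{T}} \right), \tag{26}$$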

with $X^{-1}$ the inverse of $X$. On the other hand, we know that $\left( (\gamma I_d - W)_{(k-l)} \right)^{-1} > 0$. The action of $P_{l \times (k-l)} \left( (\gamma I_d - W)_{(k-l)} \right)^{-1} (P_{l \times (k-l)})^{\mathsf{T}}$ is to select an $l$-th order principal submatrix of $\left( (\gamma I_d - W)_{(k-l)} \right)^{-1}$. It is a well known result in linear algebra that any principal submatrix of a positive definite matrix is itself positive definite [42], so that we know $P_{l \times (k-l)} \left( (\gamma I_d - W)_{(k-l)} \right)^{-1} (P_{l \times (k-l)})^{\mathsf{T}} > 0$ for any $k$. Since the determinant of a positive definite matrix is positive, we hence know that

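$$\det\left( (\gamma I_d - W)_{(k-l)} \right) > 0, \qquad \det\left( P_{l \times (k-l)} \left( (\gamma I_d - W)_{(k-l)} \right)^{-1} (P_{l \times (k-l)})^{\mathsf{T}} \right) > 0. \tag{27}$$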

This means that the sign of $\det(H_k)$ is given by $(-1)^{l}$, and that overall

$$(-1)^{l} \det(H_k) > 0, \tag{28}$$

satisfying the condition for a minimum given above.

FIG. 3: The average Hamming distance (vertical axis) between one of the reconstructed memory patterns and the original when l = 50 neurons are known a priori, given as a function of the regularization parameter γ (horizontal axis). The maximum eigenvalue ‖W‖ ≈ 0.185 of W, which γ must exceed to guarantee a local minimum, is shown as the vertical dashed line. Note that no increase of the Hamming distance is observed for γ > 0.3.
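The determinant condition of Eq. (23) for the bordered Hessian of Eq. (24) can be verified numerically on small instances. The Python sketch below assumes NumPy, synthetic patterns, the Hebbian rule W = (1/(Md)) Σ_k x^(k) x^(k)ᵀ − I_d/d, and, as an additional assumption made here, that the known neurons occupy the first l indices so that $P_{l \times (k-l)}$ has full row rank. The function names are hypothetical.

```python
import numpy as np


def bordered_hessian(W, known_idx, gamma):
    """Bordered Hessian of Eq. (24) with P the (l x d) matrix of unit-vector rows."""
    d = W.shape[0]
    l = len(known_idx)
    P = np.eye(d)[known_idx]  # rows e_i for the known neurons
    return np.block([[np.zeros((l, l)), -P], [-P.T, gamma * np.eye(d) - W]])


def is_local_minimum(W, known_idx, gamma):
    """Check (-1)^l det(H_k) > 0 for k = 2l + 1, ..., l + d, as in Eq. (23)."""
    d = W.shape[0]
    l = len(known_idx)
    H = bordered_hessian(W, known_idx, gamma)
    return all((-1) ** l * np.linalg.det(H[:k, :k]) > 0
               for k in range(2 * l + 1, l + d + 1))


rng = np.random.default_rng(4)
patterns = rng.choice([-1.0, 1.0], size=(2, 8))  # M = 2 synthetic patterns, d = 8
W = patterns.T @ patterns / (2 * 8) - np.eye(8) / 8
largest = np.linalg.eigvalsh(W).max()

print("largest eigenvalue of W:", largest)
print("condition holds for gamma = 1:", is_local_minimum(W, [0, 1, 2], 1.0))
```

Because ρ has eigenvalues at most one, the largest eigenvalue of this W is below one, so γ = 1 satisfies the assumption γ > ‖W‖ used in the argument above.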

II. SETTING THE REGULARIZATION PARAMETER

From the previous section, we see that introducing the regularization parameter provides a sufficient condition for our constrained optimization to reach a local minimum. From the perspective of machine learning, the regularization parameter also functions to penalize large values of |x|² in the minimization to prevent over-fitting. In Fig. 3, following the example outlined in the main text, we plot the average Hamming distance between the reconstructed pattern (using our matrix inversion based approach with discretized post processing) and the original pattern for increasing values of the regularization parameter and a constant number of known neurons l = 50. Here, the average Hamming distance drops off dramatically to zero for a sufficiently high regularization parameter γ > ‖W‖ ≈ 0.185. However, if one chooses an arbitrarily large γ then this adds a polynomial run time onto qHop (see the efficiency discussion in the main text). In the numerics of the main part, we set γ = 1.

III. SETTING THE VALUE OF µ

Our algorithm finds the inverse of

$$\tilde{A} := \sum_{j:\, |\mu_j(A)| \geq \mu} \mu_j(A) \, |v_j(A)\rangle\langle v_j(A)|, \tag{29}$$

see Eq. (11) of the main text for comparison to $A$. It holds that $\tilde{A}^{-1}$ is equal to the pseudoinverse $A^{-1}$ whenever $\mu$ does not exceed the smallest non-zero singular value $|\mu_{\min}|$ of $A$. Otherwise, $\tilde{A}^{-1} |w\rangle$ approximates $A^{-1} |w\rangle$ to an error

$$\eta := \left\| \tilde{A}^{-1} |w\rangle - A^{-1} |w\rangle \right\| \in O\left( \frac{\alpha}{|\mu_{\min}|} \right), \tag{30}$$

with $\alpha$ the number of non-zero singular values not exceeding $\mu$.
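The effect of the cutoff µ can be illustrated with a classical eigenvalue filter. The following Python sketch assumes NumPy and a small Hermitian test matrix chosen only for illustration; `filtered_inverse` is a hypothetical name. It inverts only the eigenvalues whose magnitude is at least µ, in the spirit of Eq. (29), and compares the result with the pseudoinverse.

```python
import numpy as np


def filtered_inverse(A, mu):
    """Invert a Hermitian matrix A only on eigenvalues with |mu_j| >= mu."""
    vals, vecs = np.linalg.eigh(A)
    keep = np.abs(vals) >= mu
    inv_vals = np.zeros_like(vals)
    inv_vals[keep] = 1.0 / vals[keep]  # discarded eigenvalues contribute nothing
    return (vecs * inv_vals) @ vecs.conj().T


rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))     # random orthogonal eigenbasis
A = Q @ np.diag([2.0, -0.5, 0.05, 0.0]) @ Q.T    # one exactly singular direction
w = rng.normal(size=4)
pinv_w = np.linalg.pinv(A, rcond=1e-10, hermitian=True) @ w

for mu in (0.01, 0.1, 1.0):
    eta = np.linalg.norm(filtered_inverse(A, mu) @ w - pinv_w)
    print(f"mu = {mu}: eta = {eta:.3e}")
```

In this example the smallest non-zero eigenvalue has magnitude 0.05, so the filtered inverse should agree with the pseudoinverse for µ = 0.01 up to rounding, while the larger cutoffs discard one or two non-zero eigenvalues and produce a non-zero error η.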

From Eq. (14) of the main text, it can be seen that qHop maintains the polylogarithmic efficiency in run time whenever $\mu$ is such that $\mu \in O\left( \mathrm{poly}\left( \frac{1}{\log d} \right) \right)$. Hence, for the matrix inversion to be effective, we require $A$ to be such that either (1): $|\mu_{\min}| \geq \mu \in \Theta\left( \mathrm{poly}\left( \frac{1}{\log d} \right) \right)$, with no errors in finding the pseudoinverse, or (2): $|\mu_{\min}| < \mu$ but with $\frac{\alpha}{|\mu_{\min}|} \in O\left( \mathrm{poly}(\log d) \right)$ so that the errors $\eta$ in finding the pseudoinverse accumulate negligibly with increasing system dimension $d$.
