introductiontoquantuminformation - university of glasgow · in these ﬁve lectures i shall give a...

Introduction to Quantum Information

Stephen M. BarnettSchool of Physics and Astronomy, University of Glasgow, Glasgow G12 8QQ, UK

1

Contents

1 Classical Information Theory 11.1 A very short history 11.2 Probabilities and conditional probabilities 21.3 Entropy and information 31.4 Information and thermodynamics 41.5 Communications Theory 8

2 Quantum Communications and Quantum Key Distribution 112.1 Qubits 112.2 Information security 122.3 Quantum copying? 152.4 Optical polarization 162.5 Quantum cryptography 16

3 Generalized Measurements 213.1 Ideal von Neumann measurements 213.2 Non-ideal measurements 233.3 Probability operator measures 233.4 Optimized measurements 263.5 Operations and the post-measurement state 31

4 Entanglement and its Applications 334.1 Entangled states and non-locality 334.2 Quantum “magic tricks” 374.3 Ebits and shared entanglement 384.4 Quantum dense coding 394.5 Teleportation 41

5 Quantum Computation 445.1 Digital electronics 445.2 Quantum gates 455.3 Principles of quantum computation 485.4 Quantum algorithms 525.5 Errors and decoherence 55

References 57

1

Classical Information Theory

In these five lectures I shall give a short introduction to the field of quantuminformation. The lectures are drawn from my book Quantum Information (Barnett2009). My aim is to make these notes self-contained, but cannot hope to cover thetotality of what is now a large and active field of research. In these notes I aim, rather,to give a taster of the subject to whet the appetite of the reader. A more completeintroduction may be found in (Barnett 2009) or in any of a now large collection ofbooks and review papers devoted to the topic (far too may for me to attempt to makea list and to risk offence by innocent omission). One text that needs to be mentioned,however, is that by Nielsen and Chuang (2000), which did so much both to popularisethe field and to lay out its founding principles.

I am grateful to Oxford University Press for their permission and, indeed, encour-agement to reproduce some of the material from (Barnett 2009). I wish to express alsomy gratitude to Allison Yao for her help in preparing these notes.

1.1 A very short history

Our field starts with the work of the Reverend Thomas Bayes (1702–1761) and thecelebrated theorem that bears his name (of which more below) (Bayes 1763). Hiskey idea was that probabilities depend on what you know; if we acquire additionalinformation then this modifies the probabilities. Today such reasoning is uncontentiousand forms part of the prevailing paradigm in much of probability theory (Jeffreys, 1939;Box and Tiao 1973; Bretthorst 1988; Lee 1989; Jaynes 2003). This was not the case,however, for most of the 350 years of its history. An entertaining and informativepresentation of its troubled history may be found in (Mcgrayne 2011).

The picture is completed by identifying, or formulating the quantity of information.It was Claude Shannon (1916–2001) who solved this problem and, by using it to devisehis two coding theorems, founded information theory (Shannon 1948). Perhaps I cangive an indication of the magnitude of Shannon’s achievement by relating that the titleof his paper was A Mathematical Theory of Communication, but a year later the paperwas republished as a book (Shannon and Weaver 1949); apart from correcting a fewtypographical errors, there are only two changes, the inclusion of a short introductoryarticle by Weaver and a change of title to The Mathematical Theory of Communication.The theory was born complete, the numerous textbooks on the topic have greatlybroadened the scope and application of Shannon’s ideas but have not departed fromthe fundamentals as explained by Shannon in his first paper (Brillouin 1956; Khinchin1957; Kullback 1959; Hamming 1980; Cover and Thomas 1991; Goldie and Pinch 1991).

2 Classical Information Theory

Shannon’s information has a simple and familiar form. For a given set of probabil-ities {pi}, the information is

H = −∑

i

pi log pi . (1.1)

Remarkably, this is simply Boltzmann’s formula for the entropy (of which more later).We can sum up the fundamental lessons learned from Bayes and from Shannon

as follows: Bayes taught us that probabilities are not absolute but rather depend onavailable information. Shannon showed that information itself is a precisely definedfunction of the probabilities.

Shannon’s work was aimed, initially, at the problem of providing a quantitativetheory of communications, but any set of probabilities can be associated with a quan-tity of information and it follows that any probabilistic phenomenon has an associatedinformation theory. This idea has been applied, for example, to statistical mechanics(Jaynes 1957a, 1957b). Quantum theory is a probabilistic theory, of course, and so itwas inevitable that a quantum information theory would be developed.

1.2 Probabilities and conditional probabilities

Consider an event A with possible outcomes {ai}. Everything we know is specifiedby the probabilities for the possible outcomes: {P (ai)}. For tossing a fair coin, forexample, the outcomes are ‘heads’ and ‘tails’ with P (heads) = 1

2 = P (tails). In generalthe probabilities satisfy the two conditions

0 ≤ P (ai) ≤ 1∑

i

P (ai) = 1 . (1.2)

If we have two events A and B with outcomes {ai} and {bj} then the completedescription is given by the joint probabilities, P (ai, bj). Here the comma is read as‘and’ so that P (ai, bj) is the probability that both A = ai and B = bj. If the twoevents are independent then P (ai, bj) = P (ai)P (bj) but this is not true in general.More generally we have

P (ai) =∑

j

P (ai, bj)

P (bj) =∑

i

P (ai, bj) . (1.3)

If the events are created then what does learning the value of A tell us about thevalue of B? If we learn, for example, that A = a0 then P (bj) is replaced by

P (bj |a0) ∝ P (a0, bj) . (1.4)

Here the vertical line is read as ‘given that’ so that P (bj |a0) is the probability thatB = bj given that A = a0. We can find the constant of proportionality by noting thatthe sum over j of P (bj|a0) must be unity and this leads to Bayes’ rule:

P (ai, bj) = P (bj |ai)P (ai)

Entropy and information 3

= P (ai|bj)P (bj) . (1.5)

Bayes’ theorem utilises this rule to relate the two conditional probabilities:

P (ai|bj) =P (bj|ai)P (ai)

P (bj). (1.6)

1.3 Entropy and information

The link between information and entropy is a subtle and beautiful one. Shannonhimself gave a simple derivation of this as a mathematical theorem using only a fewsimple axioms he argued that information must satisfy (Shannon and Weaver 1949).We shall not reproduce this here bust rather present a simple argument to make theassociation plausible.

Suppose that we have a single event A with possible outcomes ai. If one, say a0, iscertain to occur then we acquire no information by observing A. By simple extension,if A = a0 is very likely to occur then we might confidently expect it and so when ithappens we learn very little. If, however, A = a0 is very unlikely then when it occurswe might need to drastically modify our actions. A very simple example is a two-statecommunication system that you may not have considered: a fire alarm. A fire alarmis either ringing or not ringing. Its most common state (hopefully not ringing) is soinnocuous that we give it no thought. When it does ring, however, we stop what weare doing and leave the building (if we are sensible).

It seems reasonable to conclude that learning the value of A provides a quantityof information, h, that increases as the corresponding probability decreases :

h[P (ai)] ⇑ as P (ai) ⇓ .

We think of learning something new as adding to the available information. Thereforefor independent probabilities for a pair of events, P (ai, bj) = P (ai)P (bj), it is naturalto require that

h[P (ai, bj)] = h[P (ai)P (bj)]

= h[P (ai)] + h[P (bj) , (1.7)

which immediately suggests logarithms. Hence we set

h[P (ai)] = −K logP (ai) . (1.8)

Here K is a positive constant, yet to be determined, and the minus sign ensures boththat h is positive and that it increases as the probability decreases (recall that thelogarithm of a number less than unity is negative).

It is convenient to define the information as the average of h, which means weightingh for each outcome by its probability of occurring and then summing. This procedureleads us, of course, to the entropy:

H = −K∑

i

P (ai) logP (ai) . (1.9)


We can absorb the prefactor K into the choice of base for the logarithm. A convenientchoice is to use base 2 so that

H = −∑

i

P (ai) log2 P (ai) bits , (1.10)

which is Shannon’s formula for the information. Henceforth we shall drop the subscript2. It is sometimes convenient, especially in analytical calculations, to use the naturalbase of logarithms, base e, which gives the entropy in nats:

He = −∑

i

P (ai) lnP (ai) nats . (1.11)

It is straightforward to show that He = H ln 2, so that 1 nat = ln 2 bits.For two events A and B we can write informations for the joint events or for the

single events:

H(A,B) = −∑

i,j

P (ai, bj) logP (ai, bj)

H(A) = −∑

i,j

P (ai, bj) log∑

k

P (ai, bk)

H(B) = −∑

i,j

P (ai, bj) log∑

l

P (al, bj) . (1.12)

We can also define information based in the conditional probabilities. An especiallyuseful measure of correlation between the two events is the mutual information:

H(A : B) = H(A) +H(B)−H(A,B) . (1.13)

This has the important properties that it is positive or zero and that it takes the valuezero if and only if the events are independent:

H(A : B) ≥ 0

H(A : B) = 0 iff P (ai, bj) = P (ai)P (bj) ∀i, j . (1.14)

It is the mutual information that provides an upper bound on the rate at whichwe can communicate. A more detailed (but gentle) discussion of these entropies andexploration of their properties may be found in (Barnett 2009).

1.4 Information and thermodynamics

The fact that the mathematical form of the information is also the entropy begs thequestion as to whether information entropy is the same quantity that appears instatistical mechanics. It is!

An important and simple example is the way in which we can obtain the Boltzmanndistribution by maximising the information (what we have yet to discover) subject onlyto a constraint on the average energy. This is such a nice calculation that I cannot

Information and thermodynamics 5

resist the temptation to include it here, even though there was not time to presentit in my lectures. Let a quantum system have a set of discrete energy levels Ei andbe in thermodynamic equilibrium with its environment. Our task is to determine theprobability, pi, that the system is to be found in any one of its given energy levels,Ei. Of course we know how to do this by maximising the entropy, but informationtheory gives us a rather different take on the problem; we maximise the informationof the system subject to the constraint only that its mean energy is fixed. In this waywe maximise, in an unbiased manner, our uncertainty about the state. The derivationis readily performed as a variational calculation using Lagrange’s method of unde-termined multipliers (Boas 1983). It is convenient to work with the natural base oflogarithms in this case and so we define the information to be

He = −∑

i

pi ln pi . (1.15)

Our task is to maximise this quantity subject to a fixed mean energy,∑

i piEi = E,and the probabilities summing to unity,

∑

i pi = 1. We can achieve this by varying,independently, the probabilities in the function

H = He + λ

(

1−∑

i

pi

)

+ β

(

E −∑

i

piEi

)

. (1.16)

We find

dH =∑

i

(− ln pi − 1− λ− βEi) dpi . (1.17)

We require this to be stationary (zero) for all dpi, which leads to the solution

pi = e−1−λe−βEi . (1.18)

We can fix the value of the undetermined multipliers by enforcing the normalisation ofthe probabilities and the value of the average energy to arrive at the familiar Boltzmanndistribution:

pi =exp (−Ei/(kBT ))

∑

j exp (−Ej/(kBT )). (1.19)

A more dramatic example was provided by Szilard in his paper On the decrease of

entropy in a thermodynamic system by the intervention of intelligent beings (Szilard1929). This paper is all the more remarkable in that it precedes Shannon’s work bynearly 20 years. Here is Szilard’s argument. Consider a box, of volume V0 with amovable partition dividing the box into two equal halves. There is a single molecule inone of the two sides, as depicted in Fig. 1.1. An intelligent being can look in the box anddetermine which side the molecule is in. By attaching a small weight to a pulley we canextract work from heat in an apparent violation of the second law of thermodynamics,an interesting paradox in the style of Maxwell’s famous demon (Maxwell 1871).


Q=W

W

(e)

(a)

(b)

(c)

(d)

Fig. 1.1 Szilard’s illustration of the connection between information and (thermodynamic)

entropy. Reproduced, with permission, from Barnett(2009).

Applying the ideal gas law,

PV = kBT , (1.20)

to our single-molecule gas allows us to calculate the work extracted from the expandinggas in an isothermal expansion:

W =

∫ V0

V0/2

PdV = kBT ln 2 . (1.21)

We have extracted useful work from the reservoir in apparent conflict with the secondlaw of thermodynamics. The second law can be saved, however, if the process of mea-suring and recording the position of the molecule produces an entropy change of notless than

∆S = kB ln 2 . (1.22)

Szilard concludes, in this way, a direct link between information and thermodynamicentropy.

Information and thermodynamics 7

(a)

(b)

(c)

(d)

Q=W

W

0 1

0

Fig. 1.2 Illustration of Landauer’s derivation of the thermodynamic cost of information

erasure. Reproduced, with permission, from Barnett(2009).

A more precise statement is that to complete the thermodynamic cycle, we need toinclude the process of the observer forgetting in which side the molecule was found. Theprocess of forgetting was studied by Landauer (1961, Leff and Rex 1994, Plenio andVitteli 2001). He considered a single-molecule gas, like Szilard’s, and proposed usingthe position of the molecule to represent a logical bit: the molecule being to the leftof the partion corresponding to a logical 0 and to the right corresponding to a logical1. Landauer showed that erasing and resetting the bit to 0 requires the dissipationof at least kBT ln 2 worth of energy as heat. To see this we can simply remove themembrane (a reversible process) and then push in, slowly, a fresh partition from theright, as depicted in Fig. 1.2. When the partition reached halfway the ‘memory’ hasbeen reset to the bit value 0 and, of course, all trace of the original bit value has beenlost. The work that needs to be done to achieve this is

W = −∫ V0/2

V0

PdV = kBT ln 2 , (1.23)

and this is dissipated, of course, as heat. We can view this as a resolution of the paradoximplicit in Szilard’s model. This is not the end of the story, however. It has recentlybeen shown that Landauer’s limit can be beaten if we pay a cost, not in energy, butin some other quantity such as angular momentum (Vaccaro and Barnett 2011). Thekey idea to take from these models, however, is unaffected: information is physical.


1.5 Communications Theory

Shannon first introduced his information theory as a/the mathematical theory of com-munication (Shannon 1948, Shannon and Weaver 1949). In doing so he introduced thevery general model of a communications system depicted in Fig. 1.3. Let us analysethis by introducing two characters, Alice and Bob, who have become very popular inquantum communications. Alice wishes to send a message to Bob. Alice’s event, A, isthe selection of the message from a set of possible messages {ai}. Bob’s event, B, isdetecting the message from the possible set {bj}. On Alice’s side there is an informa-tion source, which produces the choice of message to be sent (this may be Alice herself,of course) and a transmitter which produces the signal (electrical, optical, acousticalor even a piece of paper) ready for transmission. The signal propagates from Alice toBob and, whilst en route, is subject to noise. Bob receives the noisy signal and hisreceiving device turns the signal into a message.

BobAlice

SignalInformation

sourceTransmitter

Received

signalReceiver Destination

Noise

source

Fig. 1.3 Shannon’s model of a communications channel. Reproduced, with permission, from

Barnett(2009).

The limiting performance of such a communications channel is governed by twotheorems derived by Shannon (Shannon 1948, Shannon and Weaver 1949): his noiselesscoding theorem, which tells us how much a message can be compressed and still beread, and his noisy channel coding theorem, which tells us how much redundancy isneeded to correct errors.

1.5.1 Noiseless coding theorem

Most messages have an element of redundancy and can be compressed but still bereadable. As an example, you might like to try this one1:

THS LS HCHS SCHL HS NTRSTNG LCTRS

I compressed the message by removing the vowels. Shannon’s noiseless coding theoremquantifies the redundancy in a message. Two simple examples are that in English wemostly don’t need to put a ‘u’ after ‘q’ and a three-letter word beginning with ‘th’ willalmost certainly be ‘the’.

1If you are having difficulty, the message is THiS LeS HouCHeS SCHooL HaS iNTeReSTiNGLeCTuReS.

Communications Theory 9

Shannon’s theorem shows us the typical number of bits we need to encode a messageefficiently. It is the entropy associated with the probabilities for the messages that isthe key quantity. If we have one of a set of messages that is N bits long, for example

010 · · ·111︸︷︷︸

N bits

,

and each of the possible possible messages {ai} is selected with the probability associ-ated probability P (ai), then Shannon’s noiseless coding theorem tells us that we needonly an average of H(A) bits, if we can find an optimum coding scheme.

We shall not prove Shannon’s noiseless coding theorem, but rather present a simpleexample from Shannon’s original paper (Shannon 1948, Shannon and Weaver 1949).A simple derivation, both of the noiseless and noisy coding theorems may be foundin (Stenholm and Suominen 2005, Barnett 2009). Consider a message formed from analphabet of four letters, A, B, C and D and let each entry in the message be one ofthese four with the probabilities

P (A) =1

2

P (B) =1

4

P (C) =1

8= P (D) . (1.24)

The information associated with this set of probabilities is

H = −(1

2log

1

2+

1

4log

1

4+ 2× 1

8log

1

8

)

=7

4bits . (1.25)

Hence Shannon’s noiseless coding theorem says that we should be able (on average) toreduce a message of N characters, or 2N bits, to 1.75 bits. The key to this reductionis to use short sequences for common elements of the message (here the letter A) andlonger ones for the less likely ones (like C and D in our example). Here is a codingscheme that achieves this:

A = 0

B = 10

C = 110

D = 111 . (1.26)

A bit of thought will confirm that any message so encoded can be decoded uniquelyto recover the original sequence of letters. The average number of bits used to encodea sequence of N letters is then

N

(1

2× 1 +

1

4× 2 + 2× 1

8× 3

)

=7

4N = HN , (1.27)


which is the bound provided by Shannon’s theorem. The fact that Shannon’s value isreached in this case tells us, moreover, that no better coding is possible: this is theshortest possible length of the message.

1.5.2 Noisy coding theorem

The presence of noise on the communications channel will introduce errors in thereceived signal. We can combat these errors by introducing some redundancy, indeedthis is undoubtedly the reason why language evolved already containing redundancy.As a simple example, let us suppose that any given bit in the message is ‘flipped’ withprobability q and so produces an error. How much redundancy do we need to be ableto detect and correct these errors?

Shannon’s noisy coding theorem tells us that, on average, we require at least

N0

1−H(q)bits (1.28)

to encode, faithfully, one of 2N0 equiprobable messages. Here

H(q) = − [q log q + (1− q) log(1− q)] (1.29)

is the entropy associated with the single-bit error probability. In other words if wefirst remove all the redundancy to get 2N0 possible optimally compressed messages,we need to put back this much redundancy to combat errors.

The general statement is based on the mutual information. It says that the greatestnumber of messages that can be sent from Alice to Bob on a noisy channel, using Nbits, and be reconstructed by Bob is

2NH(A:B) . (1.30)

Any more is impossible in that an attempt to do so will inevitably produce ambiguitiesin Bob’s decoding process.

We conclude with a simple illustration of the principle of using redundancy tocombat errors. Try to read the following message:

WNTM NARMQN THRS S FN

You probably didn’t manage to do so. The reason for this is that I first compressedthe message by removing the vowels and then added in errors. Because much of theredundancy was removed, the message has become unreadable. If I had left the fullmessage (complete with vowels) and then added the errors, we might have:

WUANTFM INAORMAQION THEORS US FUN

Hopefully, after a bit of thought, you should be able to read the message2. Note thatdecoding the message is possible even though the errors affect both the characters inthe compressed message and in the redundant characters added to combat the errors.

2If you are struggling, the message is qUANTuM INfORMAtION THEORy iS FUN.

2

Quantum Communications andQuantum Key Distribution

Quantum communications differ from their classical counterpart in that the trans-mitted signal is carried by a quantum system. At its simplest, classical informationis expressed in bits ; and is carried by a physical system with two distinct states (forexample a high or low voltage or the presence or absence of an optical pulse). Onestate is used to represent the logical 0 and the other a logical 1.

We can take the same approach in quantum communications and use two orthogo-nal quantum states to represent 0 and 1. We label these, naturally enough, as |0〉 and|1〉. The additional element brought in by using a two-state quantum system is thatwe can also prepare any superposition state of the form

|ψ〉 = α|0〉+ β|1〉 , (2.1)

where α and β are complex probability amplitudes. A quantum bit, or qubit, is sucha two-state quantum system.

2.1 Qubits

Before getting into the theory of quantum communications, we pause to elaborate onthe idea of a qubit and the mathematical tools used to describe it. A qubit can beany quantum system with two orthogonal states. We choose a basis, often called thecomputational basis in quantum information, and label the two states in this basisas |0〉 and |1〉. It is convenient to think of this as a spin-half particle; this is notliterally true in most cases, but it has the benefit that we can use the Pauli-operatorsto describe the properties of the qubit. There are four Pauli operators, which are σz,σx, σy and the identity operator I. We can define each of these by their action on thestates in the computational basis:

σz |0〉 = |0〉 σz |1〉 = −|1〉σx|0〉 = |1〉 σx|1〉 = |0〉σy|0〉 = i|1〉 σy|1〉 = −i|0〉I|0〉 = |0〉 I|1〉 = |1〉 . (2.2)

The first three Pauli operators do not mutually commute. We can write the productrule for these operators in an appealingly simple form if we introduce, by means of a

12 Quantum Communications and Quantum Key Distribution

scalar product, the Pauli operator for an arbitrary direction associated with the unitvector a:

a · σ = axσx + ayσy + azσz . (2.3)

The product rule for two such operators is then

(a · σ)(b · σ) = (a · b)I + i(a× b) · σ . (2.4)

as may readily be verified.

2.2 Information security

The use of a quantum communications channel makes three essential modifications oradditions to Shannon’s model. The first is that the signal sent by Alice is encoded onthe quantum state of a physical system sent to Bob. This means that each signal ai isassociated with a quantum state with corresponding density operator ρi. In additionto any intrinsic noise in the channel, we have a special role for any eavesdropperwho might be listening in. This is because in order to extract any information, theeavesdropper needs to perform a measurement and, as we know, a measurement will,in general, modify the state of the observed system. Finally, Bob cannot determinethe state of the system but rather has to settle for measuring one observable. HenceBob must choose what to measure. All of these elements are essential in quantumcommunications and it is the combination that, in particular, makes quantum keydistribution possible. Before I get to that, let us set the scene by looking at securecommunications is general.

In the information age we are all aware of the importance and difficulty of keep-ing information secure. The science that today underpins this effort is cryptography(Piper and Murphy 2002). The history of secure communications and keeping informa-tion secure, however, is a long and interesting one (Singh 1999, 2000). For importantcommunications we might try enciphering a message to keep it safe. The simplest suchscheme, at least conceptually, is the single-key cryptosystem. The principal idea isthat Alice and Bob share a secret key (a number) which Alice can use to generate thecipher-text and Bob can use to decipher it. The general scheme is depicted in Fig.2.1.

We can write the transformation, formally, involving the plaintext P , the key Kand the ciphertext C. The ciphertext is a function of the plaintext and the key, to becalculated by Alice, and the plaintext may be recovered by Bob as a function of theciphertext and the key:

C = C(P ,K)P = P(C,K) . (2.5)

For example in a substitution cipher we replace each letter with a symbol. In principlethere are

26! ≈ 4× 1026

possible substitution ciphers, so an exhaustive search is completely impractical. Asubstitution cipher is easily cracked, however, using the known letter frequencies. For

Information security 13

E avesdropping Tampering

unprotected

crypto-key

(secret)

crypto-key

(secret)

inverse

mathematical

transformation

BobAlice

mathematical

transformation

dataunprotected

datadata

unprotected

data

Hostile

Network

Protected

data

Fig. 2.1 Elements of a secret communications channel. Reproduced, with permission, from

Barnett(2009).

example the most common symbol will be E which occurs 12.7% of the time, whileQ makes up only about 0.1% of the symbols (or perhaps a bit more in these notes!).Sherlock Holmes makes use of precisely this technique in one of his cases (Conan Doyle1903).

It was Shannon who gave us an objective criterion for perfect secrecy (Shannon1949). Let {pi} be the set of possible plaintexts (messages) and {cj} be the set ofpossible cipher texts. Shannon’s criterion for perfect secrecy is

P (pi|cj) = P (pi) ∀ i, j . (2.6)

A straightforward application of Bayes’ theorem shows that this is equivalent to

P (cj |pi) = P (cj) ∀ i, j . (2.7)

This means that discovering or intercepting the ciphertext does not provide any in-formation on the plaintext. The second condition states that any given ciphertext isequally likely to have been generated by any plaintext. A question that you might liketo ponder is how many possible keys does this require?

The simplest perfectly secure cipher is the Vernam cipher, or one-time pad. It usesa key in the form of a random sequence of bits, · · · 101110 · · ·, that is at least as long asthe plaintext (which is also a sequence of bits, of course). The ciphertext is producesby bit-wise modulo addition of the plaintext and the key:

0⊕ 0 = 0 0⊕ 1 = 1

1⊕ 0 = 1 1⊕ 1 = 0 . (2.8)

A simple example is

P · · · 0011010100 · · ·K · · · 1011101000 · · ·


C = P ⊕K · · · 1000111100 · · · .

The cipertext C is random because the key K is random. Deciphering is performedby Bob by a second modulo addition of the key. This works because K ⊕ K =· · · 0000000 · · ·:

C · · · 1000111100 · · ·K · · · 1011101000 · · ·

P = C ⊕ K · · · 0011010100 · · · .

Clearly the secrecy of the key is crucial, as anyone in possession of it can decipher themessage.

The remaining problem, of course, is how can Alice and Bob exchange the key K?To see how this and other secure communications are realised in practice we need tointroduce public-key cryptography, which is based on the fact that some mathematicaloperations are easy but the inverse inverse operation is very difficult and, hopefully,effectively impossible in practice (Buchmann 2001, Loepp and Wootters 2006, Barnett2009). The classic example is multiplying and factoring.

In public-key cryptography Bob generates two keys, an enciphering key e and adeciphering key d. He publishes e but keeps d secret. Alice can use e to encode hermessage which should be all but impossible (she hopes!) for anyone other than Bob todecipher. This is the RSA cryptosystem (Buchmann 2001, Loepp and Wootters 2006,Barnett 2009).

RSA scheme

1. Bob finds two large prime numbers, p and q, and calculates m = p× q. This is easy.2. He then finds two numbers e and d such that de = 1 modulo (p− 1)(q − 1). Thereare efficient algorithms for doing this if you know p and q.

3. The public key is m and e. The private key is d.

4. Alice takes her message, which is a number x, and turns it into a ciphertext by thetransformation

x→ xe modulo m

5. By the wonders of pure mathematics (actually it is not so hard to prove this)

(xemodulo m)d

modulo m = x moludo m,

which is the original message. So Bob can recover the message using his secret key d.

The security of the RSA scheme is believed to be equivalent to the difficulty of factoringm into its constituent primes p and q. It is certainly true that if p and q are knownthen finding d from e and m is straightforward.

How big does m have to be? Numbers of order 1090 can be factored in lessthan a day, so much larger numbers are needed. There is an RSA factoring chal-lenge, which you might like to try. The number RSA-212, which is ≈ 7 × 10212 was

Quantum copying? 15

factored in July 2012 to claim a $30,000 prize. There is currently a $75,000 prizeto anyone who can produce the two prime factors of the 270 decimal digit number(http://en.wikipedia.org/wiki/RSA accessed October 21 (2013)):

RSA-896 = 412023436986659543855531365332575948179811699844327982845455626433876445565248426198098870423161841879261420247188869492560931776375033421130982397485150944909106910269861031862704114880866970564902903653658867433731720813104105190864254793282601391257624033946373269391

Public key cryptography is routinely used to distribute keys and for proof of identityvia so-called digital signatures.

2.3 Quantum copying?

Quantum key distribution is a radically different approach to secure communications.It relies on the difficulty for anyone eavesdropping in determining the quantum signalgenerated by Alice. Before describing this in detail, we show that it is impossible tocopy an unknown quantum state. This is the famous no-cloning theorem of Woottersand Zurek (1982) and of Dieks (1982). It works by exposing a contradiction.

Suppose that you are given a qubit in some unknown (to you) quantum state:

|ψ〉 = α|0〉+ β|1〉 , (2.9)

that is, you do not know the amplitudes α and β. Clearly you could do a measurementbut that will not tell you |ψ〉. What you would like to achieve is a transformation onthis quit and a ‘blank’ prepared in the state |B〉 such that

|ψ〉 ⊗ |B〉 → |ψ〉 ⊗ |ψ〉 ∀ |ψ〉 . (2.10)

Let us suppose that this works if |ψ〉 = |0〉 or |1〉, so that

|0〉 ⊗ |B〉 → |0〉 ⊗ |0〉|1〉 ⊗ |B〉 → |1〉 ⊗ |1〉 . (2.11)

The superposition principle then tells us that for a general state |ψ〉 the correspondingtransformation is

|ψ〉 ⊗ |B〉 = (α|0〉+ β|1〉)⊗ |B〉→ α|0〉 ⊗ |0〉+ β|1〉 ⊗ |1〉6= (α|0〉+ β|1〉)⊗ (α|0〉+ β|1〉) . (2.12)

So it necessarily follows from the superposition principle that a quantum cloner cannotwork perfectly for all quantum states. Having established that strictly exact cloningof an unknown state is not possible, we might very reasonably ask what is the bestthat is possible. There has been a great deal of work on this topic, but probably themost important is the universal quantum-copying machine the operation of which isthe best possible with the same performance for all possible input qubit states (Buzekand Hillery 1996, Scarani et al 2005).


2.4 Optical polarization

Recall that for a plane wave, the electric field ~E, the magnetic field ~H and the wave-vector ~k are all mutually orthogonal and are oriented such that ~E × ~H points in thedirection of ~k see Fig. 2.2 . This fixes ~E and ~H to be in the plane perpendicular to ~k.

E

H

k

Fig. 2.2 Relative orientations of the electric and magnetic fields and the wavevector. Repro-

duced, with permission, from Barnett(2009).

Consider a complex electric field for a plane wave propagating in the z-direction:

~E = ~E0ei(kz−ωt) . (2.13)

We can characterise the polarization by the direction of the complex vector ~E0 in thex− y plane

~E0 = E0x~ı+ E0y~ (2.14)

or as a column vector, the Jones vector,

~E0 =

[E0x

E0y

]

. (2.15)

Only the relative sizes of E0x and E0y matter, so we can use a normalized Jonesvector. The Jones vectors corresponding to horizontal/vertical, diagonal and circularpolarizations are depicted in Fig. 2.3.

For a single photon we can associate each of the Jones vectors with a qubit stateaccording to the mapping

[αβ

]

→ α|0〉+ β|1〉 , (2.16)

which leads to the identifications listed in Fig. 2.4. This simple idea has been usedwidely in experimental demonstrations of a variety of quantum information and com-munications protocols, most especially in quantum cryptography.

2.5 Quantum cryptography

Quantum cryptography, or perhaps more precisely quantum key distribution, has be-come an advanced experimental technique and is on the verge, perhaps, of becoming a

Quantum cryptography 17

Horizontal

Vertical

Diagonal up

Diagonal down

Left circular

Right circular

1

0

0

1

1

2

1

1

1

2

1

−1

1

2

1

i

1

2

1

i

−i

Fig. 2.3 Linear and circular polarizations, together with their Jones vectors. Reproduced,

with permission, from Barnett(2009).

Horizontal

Vertical

Diagonal up

Diagonal down

Left circular

Right circular

0>

1>

2

1 ( (0> 1>+

2

1 ( (0> 1>-

2

1 ( (0> 1>+ i

2

1 ( (0> 1>- i

Fig. 2.4 Linear and circular polarizations for single photons, together with their associated

qubit states. Reproduced, with permission, from Barnett(2009).


practical technology. In these lectures I can give only a very brief overview of the topicand, for a more complete description, refer the reader to some of the introductoryliterature on the subject (Phoenix and Townsend 1995, Bouwmeester et al 2000, Gisinet al 2002, Loepp and Wootters 2006, Van Assche 2006, Barnett 2009).

The idea of using quantum effects for security is due to Stephen Wiesner (1983)who suggested (in a paper written in about 1970 but published only much later) theidea of unforgeable money. He supposed a high-value banknote worth, say, £10M .Each ‘note’ would have traps for 20 single photons, each of which was prepared in oneof the polarization states |0〉, |1〉, 1√

2(|0〉 + |1〉) and 1√

2(|0〉 − |1〉). The note also has

a serial number which identifies to the back the states of these photon polarizations.The problem face by a would-be forger is how to copy these unknown quantum states?The no-cloning theorem tells us, of course that this is impossible. The situation isillustrated in Fig. 2.5. The counterfeiter needs to work out the polarization of each ofthe trapped photons and do so only by performing measurements, but he/she has noidea which measurement to perform. Let us focus our attention on one of the photonsthat happens to be prepared in a state of vertical polarization. If the counterfeitermeasures this in the horizontal/vertical basis then the result will be a bank notewith the correct horizontally polarised photon. If, however, the counterfeiter measuresin the diagonal basis then either of the two possible results will occur with equalprobability and, importantly, there is no indication possible that the measurement hasbeen performed in the incorrect basis. If the polarization is checked in the bank thenan error will be produced with a probability 1

2 . It follows that for a bank note producedin this way, anyone checking in the bank will see an error (indicating a forgery) witha probability of 1

4 for each of the twenty photons. The probability that all 20 photons

in the forged note pass a test in the bank is(34

)20 ≈ 0.003. If this isn’t considered lowenough, then one simply has to add a few more light traps.

Bank

1>

Counterfeiter

1>

0 >

1 >

Bank

1>

1>

0>

1/2

1/4

1/4

1

1/2

1/2

1/2

1/2

1/2

1/4

1/4

Fig. 2.5 Possible outcomes due to the intervention of a counterfeiter, together with their

associated probabilities of occurrence. Reproduced, with permission, from Barnett(2009).

Quantum cryptography 19

Quantummoney is impractical but Bennett and Brassard (1984) proposed a varianton this idea: quantum key distribution. The idea is that Alice associates the bit values0 and 1 with a linear polarization state (either horizontal/vertical or diagonal/anti-diagonal):

H or D ←→ 0

V or A ←→ 1 . (2.17)

Suppose that Alice prepares a vertically polarised photon and that Bob measures thestate in the horizontal/vertical basis. An eavesdropper, Eve, does not know that thephoton was prepared in this basis and so can only make a choice of what to measure. Inthis way she will introduce, for 1

4 of the photons intercepted, a disagreement betweenthe values obtained by Alice and by Bob. This 25% error rate indicates to Alice andto Bob the activity of the eavesdropper.

All that is needed is a sequence of instructions (a protocol) to ensure that Eveis trapped in this way. One such protocol is depicted in Fig. 2.6. Alice prepares arandom sequence of bits and randomly encodes each onto the polarization as either ahorizontally/vertically or diagonally/anti-diagonally polarised photon, which she sendsto Bob. For each photon, Bob randomly (and independently of Alice, of course) choosesto make a polarization measurement in one of the two bases. Roughly half the timehe will guess correctly and half incorrectly. Bob then uses a classical channel to tellAlice which basis he used in each time slot for which he detected a photon (but not,of course, the measured value). Alice tells Bob the slots in which he made the correctchoice (time-slots 2, 4, 6, 8, 9, 11, 14, 15, 16, and 19, in the case depicted in the Fig.2.6). These should comprise a secret shared random bit string (0111010100 in thiscase) which can be used as a secret key.

00 00 111 10 1

5

0

2

0

0

19

0

0

16

0

0

14

0

0

9

0

0

15

1

1

11

1

1

8

1

1

6

1

1

4

1

1

1

0

0

X

18

0

1

X

17

1

1

X

13

1

0

X

12

1

0

X

10

0

1

X

7

1

0

X

3

0

1

X

20

1

1

X

Fig. 2.6 An example of a quantum cryptographic protocol. Alice prepares single photons in

one of two polarization bases and these are measured by Bob. Reproduced, with permission,

from Barnett(2009).


It remains only to determine whether an eavesdropper has been listening in. Todetermine whether this has happened or not, Alice and Bob can publicly reveal asubset of their shared data and look for errors. An example of what might be expectedto happen is depicted in Fig. 2.7. Alice and Bob delete those bits associated withtime-bins 1, 3, 7, 12, 13, 17, 18 and 20, in which the preparation and measurementwere performed using different bases. In time slots 2, 4, 11, 14, 15, and 19, Eve used adifferent basis to Alice and Bob and of these, this led to a difference in the recordedbit values for Alice and Bob in time-bins 4, 11 and 14. The probability that Eve has

been active and does not cause an error is then(34

)N, where N is the number of bits

tested. Naturally, those bits used in the open discussion between Alice and Bob needto be discarded.

5

0

11

1

0

E

1

6

1

1

1

20

1

1

D

1

14

0

1

E

0

9

0

0

0

10

0

0

8

1

1

1

3

0

1

D

0

7

1

0

D

1

1

0

0

D

0

18

0

1

D

1

17

1

1

D

1

12

1

1

D

1

13

1

0

D

1

4

1

0

E

1

2

0

0

1

19

0

0

1

16

0

0

0

15

1

1

1

Alice

Bob

Eve

Fig. 2.7 An example of a quantum cryptographic protocol in which an eavesdropper has

been active. Reproduced, with permission, from Barnett(2009).

Naturally, there are many subtleties that need to be accounted for and whichcomplicate, to a greater or lesser extent, the above simple picture. There will alwaysbe errors and, to be safe, we need to assume that these are due to eavesdropper activity.If we simply discard the communication when we find an error then communication of akey of any useful size becomes impossible. We need to detect and correct errors, whichis achieved by parity checks which shorten the key. The error rate is then used to placea bound on the information an eavesdropper might have and this can then be reducedby taking as key bits the parity of a number of bits (this is called privacy amplification).Practical systems, moreover, may not have access to single-photon sources and therewill usually be other departures from the ideal. Proving and assessing the security ofreal-world systems has become a research topic in its own right (Scarani et al 2009).

3

Generalized Measurements

The extraction of information from a quantum system requires us to perform mea-surements. Our task in this lecture is to set up a general description of this process1.We seek two things: (i) the probability for any given measurement result to occurand (ii) the state of the system after the measurement has been made, that is thepost-measurement state conditioned on the measurement outcome.

3.1 Ideal von Neumann measurements

Let us start with the measurement process as it is usually encountered in quantummechanics courses. This formulation is essentially that given by von Neumann in hisfamous and early book on quantum mechanics (von Neumann 1955). We start byrepresenting each observable A by a Hermitian operator2, A. This operator will havea complete set of eigenvectors |λn〉 and associated eigenvalues λn:

A|λn〉 = λn|λn〉 , (3.1)

which means that we can write A in the form

A =∑

n

λn|λn〉〈λn| . (3.2)

Let us assume, for the moment, that each of the eigenvalues is distinct from the others.The von Neumann description then states that if we perform a measurement of A thenwe will find the measurement result to be one of the eigenvalues and, moreover, theprobability for finding any one of these is

P (λn) = |〈λn|ψ〉|2 , (3.3)

where |ψ〉 is the pre-measurement state. More generally, for a mixed state with densityoperator ρ, we have

P (λn) = 〈λn|ρ|λn〉= Tr (ρ|λn〉〈λn|) . (3.4)

Immediately after the measurement, the von Neumann description has the system leftin the eigenstate corresponding to the measurement outcome. Hence if we make a

1We should note that we do not include in this, measurements involving post-selection and so willnot cover the topic of ‘weak measurements’ (Aharonov et al 1988).

2The distinction between Hermitian and self-adjoint operators will not concern us.

22 Generalized Measurements

measurement of A and find the value λn, then we know that the post-measurementstate is |λn〉 and repeating the measurement, if it is done quickly enough, should givethe same result.

There is one further point that we need to consider, which is that the eigenvaluesof A might be degenerate, which means that a set of orthonormal eigenvectors, |λjn〉,will correspond to the same measurement outcome, λn. To incorporate this case, it isuseful to introduce a projector onto the set of states with a common eigenvalue:

Pn =∑

j

|λjn〉〈λjn| . (3.5)

The probability that the measurement will give the result λn is then

P (λn) =∑

j

〈λjn|ρ|λjn〉

= Tr(

ρPn

)

. (3.6)

The post-measurement state will simply be that part of the pre-measurement statethat was in the subspace spanned by the corresponding eigenstates or, in other words,that part of the state selected by the projector. Hence if our measurement gives theresult λn then the density matrix changes as

ρ→ PnρPn

Tr(ρPn), (3.7)

where the denominator, which is the prior probability for the observed measurementresult, ensures correct normalisation of the post-measurement state.

Let us conclude this brief review of von Neumann measurements with a summaryof the properties of projectors.

Properties of projectors

1. They are Hermitian, Pn = P †n.

2. They are positive operators, Pn ≥ I.

3. They are complete,∑

n Pn = I.

4. They are orthonormal, PnPm = Pnδnm.

Here I is the identity operator. We note that the first three of these properties havephysical significance in that they are required in order that the probability rule,

P (λn) = Tr(

ρPn

)

, be true. They correspond, respectively, to the requirements that

the projectors are observables, that they give positive probabilities and that the proba-bilities for the complete set of possible outcomes must sum to unity. The final property,however, does not seem to have a similar physical significance and, indeed, we shallsee that generalised measurements correspond to dropping this requirement.

Non-ideal measurements 23

3.2 Non-ideal measurements

Most measurements are not of the form described in the preceding section. Consider,for example, the operation of a photodetector. Real photodetectors have a finite ef-ficiency and so do note resolve perfectly the number of photons. In detecting thephotons, moreover, the detector absorbs the light and so leaves the field in its zero-photon (or vacuum) state. Hence neither the von Neumann forms for the detectionprobabilities nor for the post-measurement state are appropriate and something moregeneral is required.

As a simple example, both of the problem and of how we might proceed, let usconsider a measurement of a spin-component for a single quit. The ideal measurementwould correspond to the pair of projectors

P0 = |0〉〈0|P1 = |1〉〈1| . (3.8)

Suppose that our detector gives the wrong answer with probability p so that the twopossible measurement results occur with probabilities

P (0) = (1 − p)Tr(

ρP0

)

+ pTr(

ρP1

)

P (1) = (1 − p)Tr(

ρP1

)

+ pTr(

ρP0

)

, (3.9)

the sum of which is clearly unity, as it should be. We can write these in a formreminiscent of the von Neumann expressions,

P (0) = Tr (ρπ0)

P (1) = Tr (ρπ0) , (3.10)

by introducing the probability operators

π0 = (1− p)P0 + pP1

π1 = (1− p)P1 + pP0 . (3.11)

These operators satisfy the first three properties that we listed for the projectors, theyare Hermitian, positive and complete. They are not orthonormal, however:

π0π1 = p(1− p)I . (3.12)

This is our first indication of a more general description of the measurement process,which we develop further below.

3.3 Probability operator measures

To calculate the portability for any given result of a generalized measurement we needa probability rule and, in particular, a set of operators that characterise the measure-ment, one for each of the possible measurement outcomes. The required operators are


the probability operators, the set of which is a probability operator measure or a posi-tive operator-valued measure (Helstrom 1976, Holevo 1982, 2001, Peres 1993, Barnett2009).

That the first three of the properties we listed for the projectors have physicalsignificance, but the third does not, tells us in what way we are allowed to generalisethe von Neumann description; we simply drop the fourth and final property from ourlist. Hence we describe any measurement by a set of probability operators {πm} suchthat the probability for getting measurement outcome m is

Pm = Tr (ρπm) , (3.13)

where the probability operators have the following three properties

Properties of probability operators

1. They are Hermitian, πn = π†n.

2. They are positive operators, πn ≥ I.

3. They are complete,∑

n πn = I.

We should emphasise that there is no restriction on the number of probability oper-ators; the number can be greater or less than the dimension of the state space. Forexample, we shall analyse an example in which a generalised measurement on a qubitwhich has three outcomes, even though the dimension of the state space is only 2.There is no need, moreover, for the probability operators to commute with each other.

The set of probability operators characterising a measurement is called a proba-bility operator measure (or POM) which you will also find called a positive operator-valued measure (POVM). The latter has become the commonly used expression, butI prefer the former3. The description of a measurement in terms of a POM is veryuseful because of two points:

1. All measurements can be described in this way. (This is at least reasonable giventhe physical significance of the three properties used to define a POM.)

2. Any set of the probability operators (that is any POM) is realisable, at least inprinciple, in the laboratory.

This combination is very useful because if we seek to find the best possible measure-ment in any situation, we can separate out the purely mathematical optimisation ofthe POM from the experimental question of how to achieve it. That any POM is re-alisable as a measurement is a consequence of Naimark’s theorem. We shall not provethis here but rather give an indication of how it works. A more complete presentationmay be found in (Barnett 2009). The key idea is to map the generalized measurementonto a projective, or von Neumann, measurement in an extended state space. To see

3POM or POVM? The term ‘probability operator measure’ tells us that the elements formingthe measure are the probability operators. The more popular expression ‘positive operator-valuedmeasure’ expresses the fact that the elements of the measure, the probability operators, are positiveoperators. Calling the set of operators a POM reminds us of their physical significance, while theterm POVM recalls their mathematical properties.

Probability operator measures 25

how this might be done, consider as quantum system s, which we wish to measure, andlet us prepare also an ancillary quantum system a, so that we know its initial state.The state of the combined system and ancilla is then |ψ〉s ⊗ |A〉a. Next we engineeran interaction between the system and the ancilla, so that a unitary transformationof our choosing occurs:

|ψ〉s ⊗ |A〉a → U |ψ〉s ⊗ |A〉a . (3.14)

Finally, we perform a von Neumann measurement on both the system and the ancilla,so that the probability for a given outcome will be

P (m, l) =∣∣∣s〈m| ⊗ a〈l|U |ψ〉s ⊗ |A〉a

∣∣∣

2

= s〈ψ|πml|ψ〉s , (3.15)

whereπml = a〈A|U †|m〉s ⊗ |l〉aa〈l| ⊗ s〈m|U |A〉a , (3.16)

which is an operator acting only on the system state space. It is clearly Hermitian andpositive. That it sums to the identity follows, moreover, from the completeness of theorthonomal states {|m〉s} and {|l〉a}. It follows that the operators πml are probabilityoperators.

Let us give a simple but important example of a generalised measurement; thesimultaneous measurement of the position and momentum of a particle. This is some-thing that we do all the time in our predominantly classical world but, because positionand momentum are incompatible observables, cannot be described as a von Neumannmeasurement. Let us write the probability density for a joint measurement to give aposition between xm and xm + dxm and also a momentum between pm and pm + dpmas

P(xm, pm) = Tr [ρπ(xm, pm)] , (3.17)

where the π(xm, pm) are positive operators satisfying the continuum condition

∫ ∞

−∞dxm

∫ ∞

−∞dpmπ(xm, pm) = I

⇒∫ ∞

−∞dxm

∫ ∞

−∞dpmP(xm, pm) = 1 . (3.18)

A good simultaneous measurement of the position and momentum of a body willlocalise rather well both quantities. In order to produce a plausible measurement op-erator, let us introduce a Gaussian state:

|xm, pm〉 = (2πσ2)−1/4

∫ ∞

−∞dx exp

[

− (x− xm)2

4σ2+ i

pmx

h

]

|x〉 , (3.19)

where |x〉 is a position eigenstate. We note that these states form an over-complete setin that

1

2πh

∫ ∞

−∞dxm

∫ ∞

−∞dpm|xm, pm〉〈xm, pm| = I . (3.20)


Comparing this with our normalisation condition leads us to identify our probabilityoperators as

π(xm, pm) =1

2πh|xm, pm〉〈xm, pm| . (3.21)

It follows, for example, that the probability distribution for the measured position is

P(xm) =

∫ ∞

−∞dx 〈x|ρ|x〉 exp

[

− (x− xm)2

2σ2

]

, (3.22)

which is a convolution of the true momentum distribution, 〈x|ρ|x〉, with a Gaussianof width associated with the state |xm, pm〉. A similar expression applies also for themeasured momentum distribution. The variances found for the measurement resultsare

Var(xm) = ∆x2 + σ2

Var(pm) = ∆p2 +h2

4σ2. (3.23)

The additional contributions, over and above ∆x2 and ∆p2, are a consequence ofthe fact that we have performed a simultaneous measurement of two incompatibleobservables.

3.4 Optimized measurements

We can seek the best possible measurement in any given situation in two stages:first we have the mathematical optimization by finding the best POM and secondlydesign an experiment that gives the corresponding probabilities. The mathematicaloptimization is simply a search over all possible sets of probability operators forminga POM. The optimal measurements to perform have been determined in a number ofcases and some of these have also been realised in the laboratory (Chefles 2000, Parisand Rehacek 2004, Bergou 2007, Barnett 2009, Barnett and Croke 2009b).

Let us begin by considering a simple example motivated by our treatment of quan-tum communications. Suppose we have a qubit which we know to have been preparedin one of two non-orthogonal states |ψ1〉 or |ψ2〉:

〈ψ1|ψ2〉 6= 0 . (3.24)

We can use our probability operators to ask whether a measurement exists that willdetermine which state has been prepared with certainty. Not surprisingly, the answeris no, but we can prove this. To proceed, we seek two probability operators, π1 andπ2, such that

〈ψ1|π1|ψ1〉 = 1 = 〈ψ2|π2|ψ2〉〈ψ2|π1|ψ2〉 = 0 = 〈ψ1|π2|ψ1〉

Optimized measurements 27

π1 + π2 = I . (3.25)

From our first condition, 〈ψ1|π1|ψ1〉 = 1, we can write

π1 = |ψ1〉〈ψ1|+ A , (3.26)

where A is a Hermitian, positive operator such that 〈ψ1|A|ψ1〉 = 0, which implies thatA|ψ1〉 = 0. From this it follows that

〈ψ2|π1|ψ2〉 = |〈ψ2|ψ1〉|2 + 〈ψ2|A|ψ2〉 (3.27)

which, being the sum of a positive quantity and a quantity greater than or equal tozero, must be greater than zero. Hence we cannot discriminate between these statesperfectly. A natural question then is “what is the best we can do?”. The answer de-pends, of course, on what we mean by ‘best’. Here we consider only two such strategies:measurement with the minimum probability of error and unambiguous state discrim-ination.

3.4.1 Maximum probability of being correct

Consider the following task, motivated by our discussion of quantum communications.We are given a quantum system which we know to have been prepared in one ofa set of possible states {ρj} and also the probabilities, {pj}, that each state wasprepared. We seek to identify the state with the maximum probability of being corrector, equivalently, with the minimum value of the error probability. Hence we constructa POM with probability operators {πj} and associate each measurement outcome, πj ,with the corresponding state ρj and then minimise the quantity

Perror = 1−∑

j

pjTr (ρjπj) . (3.28)

Remarkably, the necessary and sufficient conditions for reaching this minimum areknown (Holevo 1973, Helstrom 1976, Barnett 2009):

πj (pj ρj − pkρk) πk = 0∑

i

piρiπi − pj ρj ≥ 0 . (3.29)

A simple derivation of these is given in (Barnett and Croke 2009a). These conditions donot provide a direct means to construct the minimum error measurement and, indeed,there are many cases in which the strategy for measuring with minimum probabilityof error is not unique. Our strategy, therefore, is to try a measurement strategy andif it satisfies these conditions then we know it is optimal. The problem is usually atractable one where there is some symmetry amongst the set of possible states andthe minimum-error strategy has been derived in a variety of cases (Chefles 2000, Parisand Rehacek 2004, Bergou 2007, Barnett 2009, Barnett and Croke 2009b).

If we have just two possible states, ρ1 and ρ1, then the minimum-error measurementis known: we need only perform a projective (or von Neumann) measurement in which


the two measurement outcomes correspond to projectors onto the positive and negativeeigenvalues of p1ρ1 − p2ρ2, corresponding, respectively, to identifying the state as ρ1and ρ1. This gives as the minimum error probability the Helstrom bound:

Pminerror =

1

2[1− Tr |p1ρ1 − p2ρ2|] . (3.30)

It is interesting to note that sometimes a measurement does not help and the minimum-error strategy is simply to guess that the most likely state was prepared (Hunter 2003).As an example, let us suppose that we have a single qubit which we know to have beenprepared with equal probability (p1 = 1

2 = p2) in one of the two non-orthogonal states

|ψ1〉 = cos θ|0〉+ sin θ|1〉|ψ2〉 = cos θ|0〉 − sin θ|1〉 , (3.31)

so that the overlap of the states is

〈ψ1|ψ2〉 = cos 2θ . (3.32)

It is straightforward to confirm that the minimum error strategy corresponds to thetwo probability operators

π1 =1

2(|0〉+ |1〉)(〈0|+ 〈1|) = |π1〉〈π1|

π2 =1

2(|0〉 − |1〉)(〈0| − 〈1|) = |π2〉〈π2| . (3.33)

The arrangement of these states is depicted in Fig. 3.1. We see that the optimalmeasurement in this case corresponds to projection onto one of a pair of states |π1〉or |π2〉 which are ‘close’ to the states |ψ1〉 and |ψ2〉. The error probability arises fromthe fact that |π1(2)〉 is not perpendicular to |ψ2(1)〉. A simple experiment has beenperformed at this limit to discriminate between two states of photon linear polarization(Barnett and Riis 1997).

By no means all optimal measurements are von Neumann measurements. Indeeda generalised measurement is normally required. As a simple illustration, consider thetrine ensemble in which a quit is prepared (with equal probability) in one of the threestates

|ψ1〉 = |0〉

|ψ2〉 =1

2

(

−|0〉+√3|1〉

)

|ψ3〉 =1

2

(

|0〉+√3|1〉

)

. (3.34)

These correspond to states separated by 120◦ on a great circle of the Bloch sphere, asdepicted in Fig. 3.2. The minimum-error strategy in this case corresponds to the threeprobability operators

πi =2

3|ψi〉〈ψi| (3.35)

and gives an error probability of 13 so that this measurement strategy determines

the correct state with probability 23 . The required generalized measurement has been

Optimized measurements 29

ψ >2

ψ >1

π >1^

π >2^

>1

>0

Fig. 3.1 Orientation in state space of the two non-orthogonal states to be discriminated

and the directions along which it is best to measure. Reproduced, with permission, from

Barnett(2009).

demonstrated using photon polarization qubits; the device is an interferometer in whichthe path taken through the device provides the ancillary degree of freedom (Clarke etal 2001b).

ψ >3

ψ >2

ψ >1

Fig. 3.2 The trine states depicted on the Bloch sphere. Reproduced, with permission, from

Barnett(2009).


3.4.2 Unambiguous discrimination

We have found the minimum-error strategy for discriminating between a pair ofequiprobable qubit states, |ψ1〉 and |ψ2〉, but can we determine the state for cer-

tain? This might sound like a contradiction but it is not if we allow for the possibilityof an ambiguous measurement outcome. To see this, consider the von Neumann mea-surement in which the projectors are

P1 = |ψ1〉〈ψ1|P1 = |ψ⊥

1 〉〈ψ⊥1 | , (3.36)

where |ψ⊥1 〉 is the state orthogonal to |ψ1〉. If we get the result 1 then we know for

certain that the state was not |ψ1〉 and hence that it must have been |ψ2〉. If weget the result 1, however, then the state could have been either |ψ1〉 or |ψ2〉 and theoutcome is ambiguous. Hence this simple measurement strategy determines the stateunambiguously but only some of the time.

It is natural to ask if we can find an optimal unambiguous discrimination strategy(Ivanovic 1987, Dieks 1988, Peres 1988, Chefles 2000, Barnett 2009, Barnett and Croke2009b). By this we mean a measurement strategy that identifies the state unambigu-ously with the highest probability or, equivalently, produces the smallest probabilityfor an ambiguous outcome. The optimal strategy in this case is the three-element POMwith probability operators

π1 =1

1 + |〈ψ1|ψ2〉||ψ⊥

2 〉〈ψ⊥2 |

π2 =1

1 + |〈ψ1|ψ2〉||ψ⊥

1 〉〈ψ⊥1 |

π3 = I− π1 − π2 . (3.37)

Note that the first two probability operators, π1(2), are proportional to projectorsonto the states perpendicular to |ψ2(1)〉. this means that the measurement outcomescorresponding to these two operators unambiguously identify the state as |ψ1(2)〉. Thestate will be correctly identified with probability 1−|〈ψ1|ψ2〉| and an ambiguous resultoccurs with probability |〈ψ1|ψ2〉|. It is noteworthy that it is the modulus of the overlapof the states and not the modulus squared that appears in these probabilities.

To realise this unambiguous detection we require a three-element POM and unam-biguous discrimination between two non-orthogonal photon polarisation states hasbeen demonstrated experimentally, using optical fibre with polarisation-dependentlosses (Huttner et al 1996) and in an interferometer similar to that used for mini-mum error discrimination between the trine states (Clarke et al 2001a).

We have presented here only two of a variety of optimal measurements. Others thathave been investigated include maximising the mutual information and determiningthe state with maximum confidence. These, the relationships between them and theexperiments that have been performed are discussed further in (Barnett and Croke2009b).

Operations and the post-measurement state 31

3.5 Operations and the post-measurement state

We have not as yet addressed the question of how the measurement process modifiesthe quantum state in a generalized measurement4. There are two pressing reasons forproceeding beyond the von Neumann ideal in which the quantum system is left inan eigenstate corresponding to the measurement outcome. The first is that most realmeasurements are more destructive than this, and the second is that it gives us no ideahow to describe the post measurement state for a generalised, that is non-projective,measurement.

A rigorous treatment takes us into the mathematical world of effects and operations(Kraus K 1983). Rather than this, we present only an indication of what is required.For a more complete treatment we refer the reader to (Croke et al 2008, Barnett2009). We start by noting that quantum theory is linear in the density operator andthis suggests that an allowed transformation of the density operator should be of theform

ρ→ ρ′ =∑

i

AiρBi . (3.38)

Not every set of operators {Ai, Bi} will be allowed, however, as the transformed densityoperator must, itself, be a density operator. This means that it must be Hermitian,ρ′ = ρ′†, it must be positive, 〈ψ|ρ′|ψ〉 ≥ 0, and it must have unit trace, Trρ′ = 1. The

first of these conditions suggests that we should set Bi = A†i and doing so automatically

ensures that the second is fulfilled. The final one, the preservation of the unit trace, issatisfied if we set

∑

i A†i Ai = I.

The operator A†i Ai is positive and also Hermitian. The fact that we require the

sum of these products to equal the identity operator, moreover, suggests the naturalidentification

πi = A†i Ai , (3.39)

and this allows us to complete the required description of the change of state after ameasurement. If we know that a measurement has been performed but do not knowthe measurement outcome then the density operator is transformed as

ρ→∑

i

AiρA†i . (3.40)

If we know that measurement result i was recorded, however, then the state is trans-formed as

ρ→ AiρA†i

Tr(AiρA†i )

=AiρA

†i

Tr(πiρ). (3.41)

In order to arrive at a unit-trace density operator in this case, we have divided bythe a priori probability for the measurement result. This is directly analogous to theprocedure in the von Neumann scheme, in which the transformation is

4Time did not permit me to address this question at the School, but these notes would be incom-plete without at least a brief account of it.


ρ→ PnρPn

Tr(Pnρ). (3.42)

It should be noted that a set of Kraus operators {Ai} determine uniquely a corre-sponding set of probability operators {πi}, but the converse is not true; assigning theprobability operators does not determine a unique transformation on the state.

4

Entanglement and its Applications

More than any other feature, it is entanglement that gives quantum theory its distinc-tive character. It is the source of paradoxes, the most famous of which is the muchdiscussed EPR paradox (Einstein et al 1935, Bohr 1935, Wheeler and Zurek 1983,Whitaker 2006), of tests of distinctive ‘quantumness’, such as the violation of Bell’sfamous inequality (Bell 1964, 1987), and it underlies some of the more unexpected (andheadline-grabbing) features of quantum information such as teleportation (Bennett etal 1993).

4.1 Entangled states and non-locality

We should start by stating what entanglement is. The simplest answer is that it is thedistinctive property of entangled states, which leads us to ask “what are entangledstates?”. This is, in fact, a surprisingly difficult question to answer, at least in fullgenerality. For the purposes of these all too brief lectures we shall avoid completelythe tricky issue of entanglement for mixed states and discuss only pure states. Forpure states there is a clear definition but it is the definition of states that are not

entangled. A state that is not entangled is a state of two or more quantum systemsthat can be factorized into a product of single-system states. States that do not havethis property are entangled. For two quantum systems, A and B, the combined state,|Ψ〉AB, is entangled if1

|Ψ〉AB 6= |ψ〉A ⊗ |ψ〉B . (4.1)

Consider, for example, the two-qubit state

|Ψ〉 = cos θ|0〉 ⊗ |1〉 − sin θ|1〉 ⊗ |0〉 . (4.2)

This state will not be entangled if θ = nπ/2, but will be entangled for all other values ofθ. For entangled states, the properties of the A and B systems are quantum correlated.We can see this in the above example in that determining whether the first qubit isin the state |0〉 or the state |1〉 also determines the, previously undetermined, state ofthe second qubit.

The implications of the existence of entanglement are profound and many. In theselectures we shall address only five:

1. EPR and related non-locality paradoxes.

1For those not familiar with the notation, it is often convenient to use the symbol ⊗ to separatequantum states and especially operators. Thus σxσy denotes acting first with σy then with σx on asingle qubit, but σx ⊗ σy means act with σx on qubit 1 and with σy on qubit 2 (Barnett 2009).

34 Entanglement and its Applications

2. The abiliy to perform “magic”.

3. Quantum dense coding.

4. Teleportation.

5. Dramatic speed up in quantum computing.

We shall treat the first four of these in this lecture and leave the final one for the nextand final lecture.

Let us begin with the EPR, in the form given by Bohm (1951). To this end, considertwo qubits in the entangled state

|Ψ−〉AB =1√2(|0〉A ⊗ |1〉B − |1〉A ⊗ |0〉B) . (4.3)

Let us suppose that qubit A is held by Alice and qubit B by Bob and that they areat a considerable distance from each other. If either party measures their qubit inany basis then they find either of two possible results, each occurring with probability1/2. If they measure the same observable as each other then they find opposite oranti-correlated results. This is a simple consequence of the eigenvalue equation:

a · σ ⊗ a · σ|Ψ−〉 = −|Ψ−〉 , (4.4)

which holds for all unit vectors a. The paradox is that a measurement by Alice ofher particle will instantaneously project the state of its partner into an eigenstate ofthe observable selected by Alice. If we assume that Alice has a free choice of whatto measure and that any influence of her measurement cannot travel arbitrarily fast,then Bob’s qubit must have carried values for all possible observables, something thatis clearly at odds with complementarity. We give, here, only a partial resolution of thisin the form of the no-signalling theorem.

4.1.1 Bell’s theorem

Correlations are also common in the classical world and, indeed, underlie classical com-munications. Are not the quantum correlations associated with entanglement simplythe same thing? It was Bell’s theorem, in the form of the violation of his celebratedinequality, that gave the definitive answer “no”!

We present here a derivation of Bell’s inequality, in its most common form (Clauseret al 1969, Bell 1987). Let us start by thinking of our qubit as a spin-half particle. Ameasurement of the spin along a direction in space, given by the unit vector a, willreveal the spin to be aligned or anti-aligned with this direction. We write the first ofthese as +1 and the second as −1, so that in any given measurement our measurementresult A with be ±1. We can do the same thing for the second qubit (entangled withthe first) and write the measurement result as B = ±1. If we take the product ofthese two values and average over many experimental realisations then we obtain thecorrelation function

E(a,b) = 〈AB〉 , (4.5)

which is clearly bounded in magnitude:

Entangled states and non-locality 35

−1 ≤ E(a,b) ≤ 1 . (4.6)

In order to arrive at Bell’s inequality we need to suspend disbelief and wonder whata theory would look like if the indeterminacy of quantum theory arose from hiddenvariables or, more precisely, local hidden variables which we denote by λ and take tobe governed by a probability distribution P (λ). To this end we write the correlationfunction in the form

E(a,b) =

∫

dλP (λ)A(a, λ)B(b, λ) . (4.7)

Some words of explanation are in order. We say that this is a form based on a localhidden-variable theory because each measured result A and B depends only on (i) thechoice of observable made at the observation site (a and b respectively) and (ii) thehidden variable λ, presumed to have been determined in the source of the particles.The measurement results do not depend on the choice of measurement made at thedistant site. These considerations lead us to Bell’s inequality as follows. First we write

E(a,b)− E(a,b′) =

∫

dλP (λ)[A(a, λ)B(b, λ) −A(a, λ)B(b′, λ)]

=

∫

dλP (λ)A(a, λ)B(b, λ)[1 ±A(a′, λ)B(b′]

−∫

dλP (λ)A(a, λ)B(b′ , λ)[1 ±A(a′, λ)B(b] . (4.8)

In the second line we have added and subtracted the same expression. In doing so, wehave assumed it is meaningful for quantities such as A(a, λ) and A(a′, λ) to coexist.This is the realism part of local-realism; it states that properties exist even if we donot measure them. We recall that the product AB lies between −1 and +1 and hencewe can obtain from this equality an inequality of the form

|E(a,b) − E(a,b′)| ≤∫

dλP (λ)[1 ±A(a′, λ)B(b′, λ)]

+

∫

dλP (λ)[1 ±A(a′, λ)B(b, λ)]

= 2± [E(a′,b′) + E(a′,b)] , (4.9)

which implies that

|E(a,b)− E(a,b′)|+ |E(a′,b′) + E(a′,b)| ≤ 2 . (4.10)

This is Bell’s inequality; it states that any correlations of a local-realistic nature mustsatisfy this inequality. What do we find for entangled quantum states? Consider againthe EPR spin state:

|Ψ−〉AB =1√2(|0〉A ⊗ |1〉B − |1〉A ⊗ |0〉B) . (4.11)


It is straightforward to confirm that for spin measurements on the two qubits preparedin this state we find

E(a,b) = 〈a · σ ⊗ a · σ〉 = −a · b . (4.12)

If we put this into our Bell inequality we find

|E(a,b)−E(a,b′)|+ |E(a′,b′)+E(a′,b)| = |−a · (b−b′)|+ |−a′ · (b+b′)| . (4.13)

To find the maximum value we simply choose a and a′ to be parallel, respectively, tob−b′ and b+b′ and b and b′ to be mutually perpendicular, as depicted in Fig. 4.1.With this choice we find

|E(a,b)− E(a,b′)|+ |E(a′,b′) + E(a′,b)| = 2√2 , (4.14)

which clearly exceeds 2 and hence violates the inequality. It is this, perhaps morethan anything else, that signified the exceptional nature of entanglement. In quantuminformation it is often by violating a Bell inequality that we demonstrate most clearlythe presence of entanglement: all entangled pure states violate a Bell inequality (Gisin1991).

a

b

a `

b `

Fig. 4.1 Orientation of the optimal measurement directions for maximal violation of Bell’s

inequality. Reproduced, with permission, from Barnett(2009).

4.1.2 No-signalling theorem

A resolution, of sorts, to the EPR paradox is that no operation carried out on one ofan entangled pair of quantum systems is detectable by an observation on its entangledpartner. This means, of course, that no signal can be transmitted in this way. Theno signalling theorem was stated rigorously by Ghirardi et al (1980), but rather thanfollow their analysis, it is instructive to make use of what we have learned alreadyabout generalised measurements (Barnett 2009).

Let us suppose that Alice and Bob each have one of a pair of entangled systems andthat Alice attempts to communicate with Bob by making a measurement on hers. Shecannot, of course, control the measurement result but can decide what to measure.We shall allow Alice to choose between any two measurements, which we allow tobe generalised. Let Alice’s measurements have the probability operators {πA1

j } for

Quantum “magic tricks” 37

measurement choice 1 and {πA2j } for choice 2. In an attempt to detect the effect of

Alice’s choice, Bob makes a measurement characterised by the probability operators{πB

k }. The joint probabilities for Alice’s and Bob’s measurement results are

P (j, k) = 〈Ψ|πA1,A2j ⊗ πB

k |Ψ〉 , (4.15)

where |Ψ〉 is the joint state of Alice’s and Bob’s system. If Alice’s choice is to affectthe Bob’s observation, then we need the probabilities for Bob’s measurement resultsto depend on Alice’s choice of measurement. This is clearly not the case, however, as

P (k) =∑

j

P (j, k) = 〈Ψ|πBk |Ψ〉 , (4.16)

where we have used the fact that our probability operators sum to the identity oper-ator: ∑

j

πA1,A2j = I . (4.17)

The probabilities of any of Bob’s possible measurement results is independent of Alice’schoice of measurement and, indeed, are the same whether Alice performs a measure-ment at all. Nothing Alice does to her quantum system is detectable by Bob and henceno signalling is possible by quantum measurement of a distant system.

The no-signalling theorem is a rigorous consequence of quantum theory and can beused, in quantum information theory, to place bounds on what is and is not possible.As a simple example, we can derive a lower bound on the probability for an incon-clusive outcome in unambiguous state-discrimination using the no-signalling condition(Barnett and Andersson 2002).

4.2 Quantum “magic tricks”

The Bell inequality is only one of an impressive array of startling and testable conse-quences of entanglement (Redhead 1987, Greenberger et al 1990, Mermin 1990). Werethese lectures about the mysteries of quantum theory, then we might enjoy exploringthese, but our purpose is not to test entanglement and its consequences, but rather toexploit it. We shall present one example, however, of a testable consequence of entan-glement as it points to a necessary distinction between logical reasoning in quantumtheory and the classical world.

Hardy (1993, Goldstein 1994) presented a simple demonstration of non-localitywithout an inequality. This is instructive in that it provides a warning against reasoningon the basis of what might have been measured rather than what actually was observed.The following short presentation is reproduced, essentially verbatim, from Barnett(2009).

Consider a pair of qubits, one held by Alice and the other by Bob, prepared in thepure state

|Hardy〉 = 1√3(|0〉A ⊗ |0〉B + |1〉A ⊗ |0〉B + |0〉A ⊗ |1〉B) . (4.18)

We proceed by noting that this state can also be written in the form


|Hardy〉 =√

2

3|0′〉A ⊗ |0〉B +

1√3|0〉A ⊗ |1〉B

=1√3|1〉A ⊗ |0〉B +

√

2

3|0〉A ⊗ |0′〉B , (4.19)

where |0′〉 = 2−1/2(|0〉+ |1〉), is the eigenstate of σx with eigenvalue +1. The followingstatements follow directly from the form of |Hardy〉:

(i) If both Alice and Bob measure the observable correspondding to σz , with eigenstates|0〉 and |1〉, then at least one of them will get the result +1, corresponding to the state|0〉.(ii) If Alice measures σz and gets the value +1 then a measurement by Bob of σx will,with certainty, find the value +1, corresponding to the state |0′〉.(iii) If Bob measures σz and gets the value +1 then a measurement by Alice of σx will,with certainty, find the value +1, corresponding to the state |0′〉.

Local realistic ideas lead us to treat as simultaneously real the values ±1 of the ob-servables corresponding to the operators σz and σx. The values of these, which wedenote σz and σx, respectively, should be independent of any choice of an observationcarried out on the other qubit. This leads us to express the above three properties asthe following probabilities:

P (σAz = −1, σB

z = −1) = 0

P (σBx = +1|σA

z = +1) = 1

P (σAx = +1|σB

z = +1) = 1 . (4.20)

The first of these tells us that at least one of the properties σAz and σB

z must take thevalue +1 and the following two then tell us that at least one of the properties σA

x andσBz must take the value +1. It is a prediction of local realism, therefore, that σA

x andσBz cannot both take the value −1:

P (σAx = −1, σB

x = −1) = 0 . (4.21)

A quantum mechanical treatment, however, shows that measurement by both Aliceand Bob of σx can both give the result +1 and that this occurs with probability 1

12 .This is clearly at odds with the local-realistic reasoning presented above.

As with all the best magic tricks, it is what the audience assumes to have happenedthat makes the trick appear to be impossible.

4.3 Ebits and shared entanglement

In quantum information we think of entanglement not as a mystery, but rather as aresource to be exploited. The first question to be addressed, as with any resource, is todecide how to quantify it. This is a subtle question but we can give, at least, a simplepreliminary answer in terms of “ebits”. If two spatially separated individuals share a

Quantum dense coding 39

pair of maximally entangled qubits then we say that they share 1 ebit. By maximallyentangled we mean pure state such that the reduced density operator for each of thecomponent qubits is of the form I/2. This means that we can identify the state onlyby bringing together the component qubits and that a measurement of just a singlequbit is maximally uncertain.

It is important to ask how we can get to a situation in which two parties sharean ebit. One way, of course, is to prepare an entangled state at one site and thentransport (carefully) one of the entangled qubits to the other site. We might veryreasonably ask, however, if it is not possible to achieve the same result by means ofclassical communications, by exchanging instructions between the distant sites. Thatthis is not possible follows directly from what we already know. Let us suppose thatAlice prepares the entangled state

|Ψ−〉 = 1√2(|0〉 ⊗ |1〉 − |1〉 ⊗ |0〉) . (4.22)

To share this state with Bob she can send the second qubit to Bob by means of asuitable quantum communications channel. The resulting quantum correlations be-tween the qubits (at least for a sufficient number of entangled pairs) could be used todemonstrate a violation a Bell inequality. Such a violation is not possible, of course,for any classical or classically-mediated correlations. Hence we can conclude, withoutthe need for further analysis, that classical and quantum communications are radicallydifferent in nature.

The antisymmetric state, |Ψ−〉 is not the only maximally entangled state of twoqubits. Indeed, we can create a complete basis of four orthonomal states for two qubits,each of which is similarly maximally entangled. One simple choice is the set of states,now referred to as the Bell states or the Bell basis. These four states are

|Ψ±〉 = 1√2(|0〉 ⊗ |1〉 ± |1〉 ⊗ |0〉)

|Φ±〉 = 1√2(|0〉 ⊗ |0〉 ± |1〉 ⊗ |1〉) . (4.23)

As these states form an orthonormal basis, we can perform a von Neumann measure-ment on the two qubits with each of these four corresponding to a different outcome.This “Bell measurement” cannot be performed, however, simply combining a von Neu-mann measurement on each individual qubit.

4.4 Quantum dense coding

A very simple application of the Bell states, perhaps the simplest, is in quantum densecoding (Bennett and Wiesner 1992). The starting point is the observation that a (vonNeumann) measurement of a qubit can provide at most one bit of information. Tosee how this works, we need only consider a qubit that has been prepared with equalprobability in either of the two orthogonal states, |0〉 and |1〉. A simple projectivemeasurement in this basis will reveal the state that has been prepared and provide asingle bit.


By using an ebit we can, in a sense, double this rate by communicating two bitsof information for every qubit sent from Alice to Bob. Let us start with a single ebitshared by Alice and Bob, prepared in the state

|Ψ−〉AB =1√2(|0〉 ⊗ |1〉 − |1〉 ⊗ |0〉) . (4.24)

Bob selects one of four unitary transformations to perform on his qubit. These trans-formations, and the effect of the corresponding entangled state, are listed below:

U1|0〉B|1〉B ⇒

|0〉B|1〉B |Ψ−〉AB ⇒ |Ψ−〉AB

U2|0〉B|1〉B ⇒

|1〉B|0〉B |Ψ−〉AB ⇒ |Φ−〉AB

U3|0〉B|1〉B ⇒

−|0〉B|1〉B |Ψ−〉AB ⇒ |Ψ+〉AB

U4|0〉B|1〉B ⇒

−|1〉B|0〉B |Ψ−〉AB ⇒ |Φ+〉AB . (4.25)

We see that the effect of Bob’s choice is to transform the state of the entangled pair ofqubits into one of the four orthogonal Bell states. Hence by sending just his single qubitto Alice he can send two bits of information (recall that 4 = 22 states corresponds to2 bits). In order to retrieve the information Alice simply performs a Bell measurementon the two qubits comprising the ebit and extracts the two bits2.

It is interesting to reflect on the nature of a Bell measurement and, in particular,to try to understand it in terms of spin measurements. We know, of course, that thePauli operators σx, σy and σz do not mutually commute. It may come as somethingof a surprise, therefore, that the products of these cartesian components of the spin,σx ⊗ σx, σy ⊗ σy and σz ⊗ σz , commute for any pair of qubits. The Bell states are thesimultaneous eigenstates of these three product operators:

σx ⊗ σx|Ψ−〉 = −|Ψ−〉 σy ⊗ σy|Ψ−〉 = −|Ψ−〉 σz ⊗ σz|Ψ−〉 = −|Ψ−〉σx ⊗ σx|Ψ+〉 = |Ψ+〉 σy ⊗ σy|Ψ+〉 = |Ψ+〉 σz ⊗ σz|Ψ+〉 = −|Ψ+〉σx ⊗ σx|Φ−〉 = −|Φ−〉 σy ⊗ σy|Φ−〉 = |Φ−〉 σz ⊗ σz |Φ−〉 = |Φ−〉σx ⊗ σx|Φ+〉 = |Φ+〉 σy ⊗ σy|Φ+〉 = −|Φ+〉 σz ⊗ σz|Φ+〉 = |Φ+〉 . (4.26)

Thus the Bell states are those in which the products of the cartesian components ofthe spin are defined and we can view a Bell measurement as a comparison of the threecomponents of spin for the two qubits. You may have noticed, in fact, that it suffices tocompare only two components of the spin as the final one is then defined. The reasonfor this has its origins in the fact that the product of any two Pauli spin componentsis proportional to the third and, in particular σxσy = iσz. It follows that

(σx ⊗ σx) (σy ⊗ σy) = −σz ⊗ σz (4.27)

2Although it sounds easy to perform a Bell measurement, achieving this in practice is, perhapsnot surprisingly rather more of a challenge (Mattle et al 1996).

Teleportation 41

and hence that the product of the eigenvalues of σx ⊗ σx, σy ⊗ σy and σz ⊗ σz mustbe −1. This means that there can be no state of two qubits, for example, in which thex-, y- and z- components of the spin are all the same.

4.5 Teleportation

Our final application is quantum teleportation. This is an important application ofentanglement and came as something of a surprise when it was first proposed (Ben-nett et al 1993). We should be very clear before proceeding however, that quantumteleportation is not matter transportation like something out of the popular sciencefiction series Star Trek. A better understanding may be reached, I feel, by adopting thedescription proposed by Haroche, which is quantum teleportation is a “fax machinefor quantum information”. It is the quantum information, or rather the quantum statethat is transferred and not the physical system on which the qubit state is stored.

Let us suppose that Alice wishes to send a qubit to Bob. She may know the state ofthe qubit or she may not. We have seen that quantum communications and classicalcommunications are very different and so should not be surprised that there is, ingeneral, no classical way to achieve this. Alice has two ways to achieve the desiredresult (i) she can send the physical qubit carrying the quantum information to Bob(we might call this “qmail”), or (ii) she can teleport the quantum information to Bobusing an ebit of shared entanglement (which we might call a “qfax”).

Alice can teleport the qubit state

|ψ〉1 = α|0〉1 + β|1〉1 (4.28)

by making use of an ebit of the form

|Ψ−〉AB =1√2(|0〉A ⊗ |1〉B − |1〉A ⊗ |0〉B) , (4.29)

shared with Bob. We can write the combined state of the three qubits as

|Ψ〉1AB = |ψ〉1 ⊗ |Ψ−〉AB

=1

2

[|Ψ−〉1A (−α|0〉B − β|1〉B)

+|Ψ+〉1A (−α|0〉B + β|1〉B)+|Φ−〉1A (α|1〉B + β|0〉B)+|Φ+〉1A (α|1〉B − β|0〉B)

]. (4.30)

In arriving at the final expression, we have only rewritten the state, but the quantuminformation, in the form of the coefficients α and β, have appeared connected withBob’s qubit. We know, however, that Bob cannot yet access this information, for todo so would violate the no-signalling theorem.

If Alice performs a Bell measurement on her two qubits (qubits 1 and A) she canthen tell Bob which of four possible transformations to perform on qubit B in orderto change its state into |ψ〉B. The idea is best understood in terms of an example. Letus suppose that Alice’s Bell measurement gives a result corresponding to the state


|Ψ+〉1A, then she knows that Bob’s qubit is in the state −α|0〉B+ β|1〉B. She then hasonly to use a two-bit classical channel to tell Bob to perform unitary transformationnumber 3 so that:

U3|0〉B|1〉B ⇒

−|0〉B|1〉B − α|0〉B + β|1〉B ⇒ α|0〉B + β|1〉B (4.31)

and the teleportation is complete in that Bob’s qubit is in the same state as that inwhich the original qubit started, |ψ〉1. Each of the other three possible Bell-measurementoutcomes corresponds to a different unitary transformation that Bob is required toperform.

We can understand why teleportation works by recalling that a Bell measurementis one in which we compare the cartesian components of the two spins. In particularthe four possible measurement results tell us that

|Ψ−〉 ⇒ σx different, σy different, σz different

|Ψ+〉 ⇒ σx same, σy same, σz different

|Φ−〉 ⇒ σx different, σy same, σz same

|Φ+〉 ⇒ σx same, σy different, σz same . (4.32)

We know, also, that in the shared entangled state, |Ψ−〉AB, has the three componentsof the spins all opposite to each other. Hence we can reason as follows: (i) If the Bellmeasurement gives the state |Ψ−〉1A then the spin components for the qubits 1 and Aare anti-aligned but we know also that the qubits A and B are anti-aligned and hencewe are left with Bob’s qubit in initial state of qubit 1. (ii) If the Bell measurementgives the state |Ψ+〉1A then the x- and y-components of the spins for the qubits 1and A are the same but the z-component is different. It follows that qubit B is leftin a state in which the x- and y-components of the spin are opposite to that for theinitial state of qubit 1, but the z-component is the same. Instructing Bob to perform arotation, through π about the z-axis will leave his qubit in the initial state of qubit 1.Similarly, (iii) if the Bell measurement gives the state |Φ−〉 then Bob needs to performa qubit-rotation about the x-axis and (iv) a Bell measurement giving the state |Φ+〉means that Bob needs to perform a rotation about the y-axis.

There is much more that could be said about teleportation, but we conclude thisall too brief introduction by describing a few features worthy of further thought:

1. We note that there is a sense in which Bob already has the information after Alice’smeasurement. This is a direct consequence of the projective nature of Alice’s Bellmeasurement. What saves us from violating the no-signalling theorem is simply thatBob does not know where to look for it.

2. Alice’s copy of the original qubit is destroyed in the teleportation process. Thisis inevitable and not simply a consequence of the scheme we have adopted. Were itotherwise, then we would be violating the no-cloning theorem.

3. Alice does not have to know the state that is to be teleported. We have chosen ageneral qubit state parametrized by the amplitudes α and β, but no process undertaken

Teleportation 43

by either Alice or Bob is dependent on these amplitudes and it follows that Alice doesnot need to know the state to be teleported.

4. The qubit to be teleported may itself be part of an entangled state and in thisway we can teleport entanglement. In this way, if Alice shares an ebit with Bob andalso one with Claire then she can teleport the entanglement to leave Bob and Clairesharing an ebit:

|Ψ−〉C1 ⊗ |Ψ−〉AB ⇒ |Ψ−〉CB . (4.33)

This process of transferring entanglement by teleportation is commonly referred to asentanglement-swapping.

5

Quantum Computation

No course on quantum information would be complete without a treatment of quantumcomputation. Yet covering such a diverse topic in a single lecture presents, if anything,an even greater challenge than the areas addressed in the preceding lectures. Thematerial selected for this lecture can only provide a modest foretaste of the topic thatspans a number of disciplines, including physics and computer science, and for a moresatiating introduction I can only recommend some texts for further reading (Nielsenand Chuang 2000, Stenholm and Suominen 2005, Vedral 2006, Mermin 2007, Kaye et

al 2007, Barnett 2009, Gay and Mackie 2010, Pachos 2012).We should start by asking why the idea of quantum computers seems to have be-

come so prominent. There are, I think, two very good reasons. The first is in responseto a very pressing need. The speed of development in computers has been truly re-markable and follows an exponential increase in performance first described by Moorein 1965 (Moore 1965). There are many versions of this law, but perhaps the simplest isthat the number of transistors on chip doubles roughly every two years. The way thisis achieved is by making the individual transistors ever smaller. As we make compo-nents ever smaller, however, we must inevitably run into quantum effects. Rather thanfight against quantum mechanics (a battle we must lose at some size scale) perhapsit would be better to embrace the new possibilities provided by quantum informationprocessing. The second reason is the distinctively new possibilities offered by quantumalgorithms (which can only run on a quantum computer). These address problems thatwill always remain intractable for a computer based on classical logic and include sim-ulations of complex quantum systems (like large molecules) and the headline-grabbingShor’s algorithm for factoring, of which more later.

5.1 Digital electronics

We begin our discussion by describing the way in which logical bits and logic operationsare implemented classically. In the simplest form, the logical bits 1 and 0 are encodedas voltages - a high voltage for 1 and ground or zero volts for 0. These values aremanipulated by transistor-based devices called gates (Smith 1983). The most commonof these act either on a single bit value or couple two together. There is only onesingle-bit gate, the NOT gate, which simply changes a 0 to 1 and a 1 to 0. The symbolfor the NOT gate and its truth table are given in Fig 5.1. The simplest two-bit gatesare the AND and the OR gate, depicted in Fig 5.2. Combinations of these togetherwith a NOT gate to form NAND and NOR gates are also common. These two bitgates, together with the NOT gate allow us to perform any logical operation (strictly

Quantum gates 45

a Boolean operation) on a string of bits. Such an operation is, at a fundamental level,what we mean by a computation.

A

NOT

A

1 0

0 1

A A

Fig. 5.1 The NOT gate and its truth table. Reproduced, with permission, from Bar-

nett(2009).

A

AND

A

B

A B.A B.

0

0

1

1

1

0

1

0

0

0

1

0

B

A

OR

A

B

A B+

0

0

1

1

1

0

1

0

1

0

1

1

B A B+

Fig. 5.2 The AND and OR gates and their truth tables. Reproduced, with permission, from

Barnett(2009).

5.2 Quantum gates

The simplest way in which to introduce quantum elements into information processingis to replace each classical bit with a qubit and to design devices that operate uponthem. We are not completely free in the operations we can devise, however, as wemust respect the laws of quantum mechanics. This means, in particular, that thetransformations we can perform are necessarily limited to the unitary transformations.Even at the single qubit level, however, there is already a great deal more we cando that for a classical bit, as we are allowed to generate superpositions of the twoqubit states |0〉 and |1〉. A single qubit-gate may perform any single-qubit unitary

46 Quantum Computation

transformation, that is any rotation on the Bloch sphere. Six of the more commonlyoccurring one-qubit gates, together with the unitary transformations they enact arepresented in Fig 5.3.

XPauli-X X = = σ^

YPauli-Y Y = = σ ^

Tπ/8 T = ^

SPhase S = ^

ZPauli-Z Z = = σ^

HHadamard ^H =

x^

y^

z^

Fig. 5.3 Some of the more common one-qubit gates together with their associated unitary

transformations. Reproduced, with permission, from Barnett(2009).

We require in addition gates that couple qubits. Unlike their classical counterparts,these gates have two output qubits as well as two input ones; where this not the casethen we would not respect unitarity. Principal among the vast array of possible two-bit gates is the controlled NOT gate, or CNOT gate, depicted in Fig 5.4. The twoqubits entering the gate are designated the control qubit, C and the target qubit, T.The gate enacts the transformation |1〉 → |0〉, |0〉 → |1〉 if the control qubit is inthe state |1〉 but leaves it unchanged if the if the control qubit is in the state |0〉.Despite its innocuous looking truth table, it is straightforward to show that this gateis intrinsically quantum-mechanical in nature. To see this, let the control bit enteringthe gate be in a superposition of computational basis states. The transformation isthen

1√2(|0〉C + |1〉C) |0〉T →

1√2(|0〉C |0〉T + |1〉C |1〉T ) , (5.1)

which we recognise as one of the entangled Bell states. The fact that the CNOT gatehas generated an entangled state from an unentangled one (and therefore introduced

Quantum gates 47

the possibility of non-locality) suffices to establish the intrinsically quantum nature ofthe CNOT gate.

Control

Target

=

X

C CT T

0 00 0> > > >0 01 1> > > >1 10 1> > > >1 11 0> > > >

Fig. 5.4 The CNOT gate and its effect on the computational basis states. Reproduced, with

permission, from Barnett(2009).

In digital electronics, a small number of gates suffices to allow all possible Booleanoperations and, remarkably, something very similar holds for quantum gates. We canperform any unitary evolution of a set of qubits by acting on them only with single-qubit gates and a suitable two-qubit gate; the CNOT gate suffices for this purpose.We do not prove this assertion here as the required demonstration would take toolong and occupy too much space. The proof is not particularly difficult, however, andmay be find in the (Nielsen and Chuang 2000, Barnett 2009). As an example we canconstruct the three-qubit Toffoli gate, or controlled, controlled NOT gate using single-qubit and two-bit gates. The operation of the Toffoli gate is most readily understoodin the computational basis, |0〉, |1〉 for the three qubits:

|A〉 ⊗ |B〉 ⊗ |C〉 → |A〉 ⊗ |B〉 ⊗ |C⊕ (A · B)〉 , (5.2)

where A, B, and C take the values 0 or 1. In words, the gate leaves the computational-basis of the first and second qubits unchanged but flips the state, |0〉 ↔ |1〉, of thethird qubit if both the first and second qubits are in the state |1〉. The Toffoli gateand its implementation in terms of two-qubit gates is given in Fig 5.5. This involvestwo CNOT gates and three controlled unitary gates, in which the designated unitarytransformation,

W =1− i2

(

I + iσx

)

W † =1 + i

2

(

I− iσx)

, (5.3)

is performed on the target qubit if the control qubit is in the state |1〉 and the iden-tity is applied if it is in the state |0〉. The controlled unitary operation may also beimplemented using CNOT gates and single-qubit gates if desired (Barnett 2009).


=

W W W

A>

B>

C>Fig. 5.5 The three-qubit Toffoli gate constructed from two-qubit gates. Reproduced, with


If we can combine enough qubits and produce enough gates then we can produceany desired unitary transformation. This would then constitute a quantum processor,designed to take input data in the form of an input multi-qubit state and generateoutput in the form of another state. The information could then be extracted bymeans of a measurement on the qubits. What is necessary in order to achieve thisgoal? The answer was provided by DiVincenzo (1996) in the form of five critteriafor implementing a quantum computer. These have been modified somewhat over theintervening period, but the original five serve to convey the key requirements:

1) We need well-defined extendible qubit array that is stable.

2) The qubit array should be preparable in a suitable starting state, such as that inwhich all the qubits are in the state |0〉.3) We need good isolation from the environment, i.e. long coherence times.

4) It must be possible to perform a universal set of gate operations, such as single-qubitrotations and CNOT operations for any chosen pair of qubits.

5) Finally, we need to be able to perform something close to ideal von Neumannmeasurements on each of the qubits.

These demands present a significant technical challenge and, although great advanceshave been made, I think it safe to say that, at present, not proposed implementationof a quantum processor has managed to achieve them all.

We should note that the gate model of a quantum processor, described above,is not the only possible one. An alternative is provided by the use of cluster states(Raussendorf et al 2003). In this approach we first prepare a highly entangled state ofour qubits and then proceed by performing a sequence of single-qubit measurementsfollowed by single-qubit unitary transformations the form of which depends on thepreceding measurement results.

5.3 Principles of quantum computation

The basic idea of a quantum computation is quite simple. We start with an inputstring of bits, which we encode onto the initial states of our by preparing each in oneof the states |0〉 or |1〉. We then use our collection of gates to perform on this state a

Principles of quantum computation 49

unitary transformation:

101101001→ |1〉 ⊗ |0〉 ⊗ |1〉 ⊗ |1〉 ⊗ |0〉 ⊗ |1〉 ⊗ |0〉 ⊗ |0〉 ⊗ |1〉 ⊗ |1〉→ U |1〉 ⊗ |0〉 ⊗ |1〉 ⊗ |1〉 ⊗ |0〉 ⊗ |1〉 ⊗ |0〉 ⊗ |0〉 ⊗ |1〉 ⊗ |1〉 . (5.4)

The computation is completed by measuring each qubit to give back a (classical) bitstring which, hopefully, is the desired output:

U |1〉 ⊗ |0〉 ⊗ |1〉 ⊗ |1〉 ⊗ |0〉 ⊗ |1〉 ⊗ |0〉 ⊗ |0〉 ⊗ |1〉 ⊗ |1〉 → 000111010 . (5.5)

This is not quite all there is to it. There is a problem if the function we wishto calculate gives the same value for two distinct inputs. Consider, for example, thetwo-bit transformation in two bits, A and B, are transformed into A and (A AND B):

00→ 00 01→ 00 10→ 10 11→ 11 . (5.6)

The difficulty in realising this computation on a quantum processor in the mannerindicated above is that the transformation must be unitary and that unitary transfor-mation maintain the overlap of the initial states and our computation requires

|0〉 ⊗ |0〉 → |0〉 ⊗ |0〉 |0〉 ⊗ |1〉 → |0〉 ⊗ |0〉

〈0, 0|0, 1〉 = 0 −→ 〈0, 0|0, 0〉 = 1 , (5.7)

which clearly violates this requirement.In order to be able to compute any function, we input into our quantum processor

not one string of qubits but two. Let the input states of the two strings be |a〉 and |b〉.The first of these is the input data, with the sequence of bits encoded onto the qubitsas in Eq (5.4). The second string is to act as our output and is often prepared in thestate |b〉 = |0〉⊗N , so that every qubit is in the state |0〉. If the processor is to calculatethe Boolean function f(a) then we require it to realise the transformation

|a〉 ⊗ |b〉 → |a〉 ⊗ |b ⊕ f(a)〉 , (5.8)

where ⊕ denotes the bit-wise modulo addition. (See in Fig 5.6.) It is straightforward to

u^

f

>a >a

>b >b + f(a)

Fig. 5.6 A quantum processor designed to compute the function f(a). Reproduced, with



confirm, by explicit construction, that such a unitary transformation is always possible:

Uf =∑

a

|a〉〈a| ⊗

|f(a)〉〈a|+ |a〉〈f(a)|+∑

b6=a,f(a)

|b〉〈b|

. (5.9)

Of course this is not the only unitary operator that performs this task, but this simpleoperator suffices to demonstrate that a suitable unitary operator exists.

The remarkable properties of a quantum computer derive largely from the fact thatwe can put into the processor not just a single number but rather a superposition ofmany. If each qubit in the first qubit string is prepared in the state 2−1/2(|0〉 + |1〉)then the string of N qubits is in a superposition of every number between 0 and 2N−1:

2−N/2 (|0〉+ |1〉)⊗N = 2−N/22N−1∑

a=0

|a〉 . (5.10)

The linearity of quantum mechanics means that the output state is an entangled onein which the function f(a) has been calculated for every possible input number, as inFig 5.7.

>a

>b

Σa >a >b + f(a)Σ

axu

^

f }

Fig. 5.7 If a superposition of possible input numbers is prepared then the output state is

an entangled one in which values of the function for each of the inputs appear. Reproduced,

with permission, from Barnett(2009).

A simple example may serve to illustrate the potential of a quantum processor.With this aim in mind we present Deutsch’s algorithm and its many-bit extension, theDeutsch-Jozsa algorithm (Deutsch 1985, Deutsch and Jozsa 1992, Cleve et al 1998).Although somewhat contrived in nature, these serve to illustrate in a very simple waythe potential for speed-up offered by a quantum computer. These are examples of aclass of challenges called oracle-problems.

We start with Deutsch’s algorithm (Deutsch 1985). Let us suppose that we have a‘black box’ (an oracle) and that this device evaluates a one-bit function of a one-bitinput. This means that we input a single bit, with value either 0 or 1, and that thisgenerates an output value of either 0 or 1. There are four possible such functions, twoconstant functions:

f(0) = 0 f(1) = 0 ,

and f(0) = 1 f(1) = 1 , (5.11)

and two balanced functions (balanced in the sense that both possible bit values occurin the output, one for each input):

f(0) = 0 f(1) = 1 ,

Principles of quantum computation 51

and f(0) = 1 f(1) = 0 . (5.12)

If we want to know which function the oracle calculates then we need to input bothbits values and examine the corresponding outputs. Let us suppose, however, that ourtask is simply to determine whether the function calculated is a constant function ora balanced one. Classically, of course, this requires us to input both possible bit valuesand so also requires two computations. A suitable quantum processor, however, cando this in a single step. Consider the general transformation given in Eq. (5.8), butwith the two subit strings each replaced by a single qubit so that the transformationenacted by the processor is

|A〉 ⊗ |B〉 → |A〉 ⊗ |B⊕ f(A)〉 . (5.13)

We can make use of the superposition principle to prepare both of our qubits in asuperposition state and so produce the transformation

1

2(|0〉+ |1〉)⊗ (|0〉 − |1〉)→ 1

2

(|0〉 ⊗ |f(0)〉 − |0〉 ⊗ |f(0)〉+ |1〉 ⊗ |f(1)〉 − |1〉 ⊗ |f(1)〉

)

=1

2

[|0〉 ⊗

(|f(0)〉 − |f(0)〉

)+ |1〉 ⊗

(|f(1)〉 − |f(1)〉

)]

=1

2

(

(−1)f(0)|0〉+ (−1)f(1)|1〉)

⊗ (|0〉 − |1〉) . (5.14)

A simple measurement carried out on the first qubit then tells us what we need toknow: if it is found to be in the state 2−1/2(|0〉 + |1〉) then the function is constantand if we find it in the orthogonal state 2−1/2(|0〉− |1〉) then the function is balanced.We have found out what we need to know by addressing the oracle only once, ratherthan twice as would be required classically.

The Deutsch algorithm may seem, at least at first sight, not to be an especiallyconvincing illustration, after all we do input two qubits into the processor. The realpower appears when we consider an extension of it to the Deutsch-Jozsa algorithm. Inthis algorithm, our input is an n-bit number a and our output is again a single bit, 0or 1. The function is either constant, giving either 0 or 1 for all inputs, or balanced,giving 0 for half of the inputs and 1 for the other half. Our task is to determine analgorithm that determines with certainty whether the function is constant or balanced.We can start putting in different strings and has soon as we get two different outputvalues then we know for certain that the function compute is balanced. To know itsnature for certain, however, we may have to input over half of the possible inputswhich means addressing the oracle 2n−1 + 1 times. A quantum processor can achievethis task in a single shot, however. To see how this works, we let the first string consistof n qubits, each prepared in the superposition state 2−1/2(|0〉+ |1〉) with the secondstring being just a single qubit prepared in the state 2−1/2(|0〉 − |1〉). The oracle thenperforms the transformation

2−(n+1)/2(|0〉+ |1〉⊗n)⊗ (|0〉 − |1〉)→ 2−(n+1)/22n−1∑

a=0

(−1)f(a)|a〉 ⊗ (|0〉 − |1〉) . (5.15)

If the function is constant then the first string of n qubits remain in their input state,as equally-weighted superposition of the states |a〉, but if the function is balanced then


the qubit-string will be in a state orthogonal to this. Hence only a single computationis required to determine whether the function is constant or balanced. This representsan improvement over the classical requirement of 2(n−1) + 1 that is exponential in n,the number of bits. Such dramatic improvements are characteristic of a number ofquantum algorithms, but not all.

5.4 Quantum algorithms

Let us suppose that we have a suitable quantum processor, what might we do with it?One important thing that we might do is to use it to simulate a complicated quantumprocess (Feynman 1982). As quantum systems get larger it becomes ever more difficultto simulate them; were this not the case, then we could emulate a large quantum com-puter on a classical one. We might also use a quantum computer to speed-up searchingin an unstructured database using Grover’s algorithm, which provides a dramatic im-provement (albeit not an exponential one) over classical methods (Grover 1998). Themost dramatic possibility to date, however, is Shor’s algorithm for determining thetwo prime factors of a large number. You will recall that it is the difficulty in perform-ing this task that underlies to security of the RSA public-key cryptosystem and withit, much of the world’s secure communications (Shor 1997). It was Shor’s proposalmore than any other single factor that truly changed quantum computation, and withit quantum information, from a small-scale research field into a major internationalendeavour. There is space, in these lectures, to present briefly only one quantum al-gorithm and, because of its significance, we choose Shor’s algorithm. Several others,including Grover’s algorithm, and a more complete presentation of Shor’s algorithmmay be found in (Barnett 2009).

At the heart of Shor’s algorithm is the remarkable ability of a quantum computer toperform, highly efficiently, a quantum Fourier transform and hence to find the periodof a function. To see why this helps in factoring, we need to consider an idea fromnumber theory. We start with three bits of input data: N , the number to be factorized,m, a small integer chosen at random and the non-negative integers, n = 0, 1, 2, · · ·. Wefirst make the series FN (n) = mnmodN . If we can then find the period of this function,r, such that FN (n+ r) = FN (n), then the greatest common divisor of mr/2± 1 and Ndivides N . In other words, one of the factors of each of mr/2± 1 is also a prime factorof N . There are, of course, some additional subtleties, but the number theory to provethis is by no means difficult, and may be found in Barnett (2009). Let us considertwo examples to demonstrate the idea. Let us consider the problem of factoring 15 bymeans of this process. Let us first select m = 2, so that our series is

20mod15 = 1

21mod15 = 2

22mod15 = 4

23mod15 = 8

24mod15 = 1...

... . (5.16)

We see that the period is 4 and hence

Quantum algorithms 53

mr/2 + 1 = 5

mr/2 − 1 = 3 (5.17)

and these are the required prime factors. Let us try another example, m = 11, forwhich our series is

110mod15 = 1

111mod15 = 11

112mod15 = 1...

... , (5.18)

so the period is 2. This gives

mr/2 + 1 = 12

mr/2 − 1 = 10 . (5.19)

The greatest common divisor of 12 and 15 is 3, which is one of the prime factors weseek and the creates common divisor of 10 and 15 is 5, which is the other factor.

So why do we need a quantum computer to run this seemingly simple algorithm?The answer is that each step can be run efficiently, even for very large numbers, ona classical computer with the single exception of finding the period of the functionFN (n). This is a very difficult task and it is no exaggeration to state that it is thisperiod finding problem that underlies the security of the RSA cryptosysem.

Central to Shor’s algorithm for factoring on a quantum computer is the quantumFourier transform (Nielsen and Chuang 2000, Mermin 2007, Barnett 2009). We do notattempt a detailed account of the quantum Fourier transform, but present instead onlyan outline of the steps involved in Shor’s algorithm. It is, perhaps, clearest to give alist of the five major steps:

Shor’s algorithm to find the two prime factors of N

1. We start by finding two integers, q and M , such that

q = 2M > N2 (5.20)

and prepare two registers, each containing M qubits.

2. We set each on the qubits in the first register to the state 2−1/2(|0〉+ |1〉) and each inthe second register in the state |0〉, so that the state input into our quantum processoris

|ψ〉 = 1√q

q−1∑

n=0

|n〉1 ⊗ |0〉2 , (5.21)

where we have used the short-hand notation

|0〉 = |00 · · · 000〉


|1〉 = |00 · · · 001〉|2〉 = |00 · · · 010〉|3〉 = |00 · · · 011〉...

... . (5.22)

3. Next we chose an integer m at random and use the quantum processor to entanglethe two registers so that

|ψ〉 → 1√q

q−1∑

n=0

|n〉1 ⊗ |mnmodN〉2 . (5.23)

This can be achieved efficiently by means of a unitary transformation on a suitablyprogrammed quantum computer.

4. Next comes the crucial quantum Fourier transform, which again can be performedefficiently as a unitary transform on a quantum computer. We use our quantum com-puter to perform a quantum Fourier transform on the first register to generate thetransformation

1√q

q−1∑

n=0

|n〉1 ⊗ |mnmodN〉2 →1

q

q−1∑

n=0

q−1∑

k=0

|k〉1 ⊗ |mnmodN〉2 exp(

i2πkn

q

)

. (5.24)

Let us pause to think and see what has been achieved by all of this. The numbermnmodN has period r, which means, of course, that

mn+srmodN = mnmodN , (5.25)

so that we have some identical states in the second register:

|mnmodN〉 = |mn+rmodN〉 = |mn+2rmodN〉 = · · · . (5.26)

The coefficients of these states in our state will have the phases are

exp

(

i2πk(n+ sr)

q

)

= exp

(

i2πkn

q

)

× exp

(

i2πksr

q

)

, (5.27)

which will be in phase (or nearly in phase) for k ≈ q/r.

5. Finally, we need only measure the first register in the computational basis and wewill find, because of constructive interference, with high probability a value k for whichthe amplitude is large, one for which k ≈ q/r, or r ≈ q/k. This allows us to greatlynarrow the range of allowed values of r and hence greatly restrict the range of possiblefactors. All possible candidate factors can be checked efficiently and simply by dividingthem into N .

This is just one example, albeit the most prominent and newsworthy, of an effi-cient quantum algorithm. There are others and the field of quantum algorithm designpromises to gain in importance as the first quantum processors become available.

Errors and decoherence 55

5.5 Errors and decoherence

So what stops us building a quantum processor, revolutionising computation andbreaking into RSA? The short answer is that we don’t yet have any suitably scal-able implementation of quantum logic elements such as qubits and gates.

Many technical challenges remain to be solved before we can build a quantumcomputer to challenge current devices based on classical logic. I mention here only oneproblem, that of decoherence. In a quantum computer our information is encoded ontotwo-state quantum systems, our qubits and these are made to interact with each otherby the implementation of desired Hamiltonians. Yet the quantum natures, both of thequbits and of the Hamiltonian, makes them extremely sensitive to their environmentand even weak interactions can have a disastrous effect. As an illustration, let usconsider the effect of a single-qubit error in an implementation of Deutsch’s algorithm.To keep things simple, let us consider only a phase error:

|0〉 → |0〉 |1〉 → −|1〉 , (5.28)

and a bit-flip error|0〉 → |1〉 |1〉 → |0〉 . (5.29)

Recall that we obtain the desired information, as to whether the computed function isconstant or balanced, by measuring the first qubit to be in the state 2−1/2(|0〉+ |1〉) or2−1/2(|0〉 − |1〉), respectively. If a bit-flip error occurs for this qubit then we are safe,but if a phase-error occurs then we get the wrong answer!

Clearly we need to work hard to suppress all sources of errors and decoherence,but this is not the end of the problem. For large-scale computations we need a largenumber of qubits and the scaling is not at all favourable. To see this let us supposethat the probability that a single qubit has no error in a time t is

exp (−Γt) .If we have n qubits then the probability that none of these experiences an error inthis time is

exp (−nΓt) ,which is exponential in n. But worse is to come. Let us suppose that t is the typicalamount of time it takes to perform a single gate operation. If we need a sequence ofm of these then we find

exp (−nmΓt) .

Typically we might need each qubit to interact with every other qubit so m may be ofthe same rode as n. We do not need to perform the gate operations one at a time, ofcourse, and by suitably optimising the order of operations, we might have m ≈ lognto give a final zero-error probability of

exp (−n lognΓt) . (5.30)

For 300 qubits, the exponent is about 2,000 times smaller than the single-qubit andsingle-gate error rate, e−Γt, that we started with. The zero-error probability is the2, 000th power of the no error probability for a single qubit and a single-gate operation.


Clearly we need to control sources of decoherence to an extraordinary degree butultimately this will always be a losing battle. Is it all hopeless? By no means; thesolution is essentially the same as we encountered in the first lecture. We can combaterrors in a quantum computer by redundancy, just as we do to combat classical errors.This is a more subtle problem than in the classical regime, however. Firstly there is theno-cloning theorem which tells us that we cannot literally make multiple copies of ourqubits and there is also the problem that quantum measurements, if not performedcarefully, will modify the very qubit states that we are trying to protect. Nevertheless,there are protocols detecting and correcting qubit errors. Perhaps the most importantof these is the Steane code, in which each logical qubit is ended onto seven logicalqubits (Steane 1996, Barnett 2009), but a description of how this works will have towait for next time.

References

Aharonov, Y., Albert, D. Z. and Vaidman, L. (1988) How the result of a measurementof a component of spin of a spin− 1

2 particle can turn out to be 100. Phys. Rev. Lett.60, 1351–1354.Barnett, S. M. and Riis, E. (1997) Experimental demonstration of polarization dis-crimination at the Helstrom bound. J. Mod. Opt. 44, 1061–1064.Barnett, S. M. and Andersson, E. (2002) Bound on measurement based on the no-signalling condition. Phys. Rev. A 65, 044307.Barnett, S. M. (2009) Quantum information. Oxford University Press, Oxford.Barnett, S. M. and Croke, S. (2009a) On the conditions for discrimination betweenquantum states with minimum error. J. Phys. A: Math. Theor. 42, 062001.Barnett, S. M. and Croke, S. (2009b) Quantum state discrimination. Adv. Opt. Pho-

ton. 1, 238–278.Bayes, T. (1763) An essay towards solving a problem in the doctrine of chances. Phil.Trans. R. Soc. 53, 370–418.Bell, J. S. (1964) On the Einstein-Podolsky-Rosen paradox. Physics 1, 195–200.Reprinted in (Wheeler and Zurek 1983; Bell 1987).Bell, J. S. (1987) Speakable and unspeakable in quantum mecahnics. CambridgeUniversity Press, Cambridge.Bennett, C. H. and Brassard, G. (1984) Quantum cryptography: public key distribu-tion and coin tossing. Proceedings of IEEE international conference on computers

systems and signal processing, Bangalore India 175–179.Bennett, C. H. and Wiesner, S. J. (1992) Communication via one- and two-particleoperators on Einsetin-Podolsky-Rosen states. Phys. Rev. Lett. 69, 2881–2884.Bennett, C. H., Brassard, G., Crepeau, C., Jozsa, R., Peres, A. and Wootters, W. K.(1993) Teleporting an unknown quantum state via dual Einstein-Podolsky-Rosenchannels. Phys. Rev. Lett. 70, 1895–1899.Bergou, J. (2007) Quantum state discrimination and selected applications. J. Phys.Conf. Ser. 84, 012002.Boas, M. L. (1983) Mathematical methods in the physical sciences. Wiley, New York.Bohm, D. (1951) Quantum theory . Prentice-Hall, New Jersey. Reprinted by Dover,New York (1989).Bohr, N. (1935) Can quantum-mechanical description of reality be considered com-plete?. Phys. Rev. 47, 696–702.Bouwmeester, D., Ekert, A. and Zeilinger, A. (eds) (2000) The physics of quantum

information. Springer-Verglag, Berlin.Box, G. E. P. and Tiao, G. C. (1973) Bayesian inference in statistical analysis. Wiley,New York.

58 References

Bretthorst, G. L. (1988)Bayesian spectrum analysis and parameter estimation. Springer-Verlag, New York.Brillouin, L. (1956) Science and information theory . Academic Press, New York.Buchmann, J. A. (2001) Introduction to cryptography . Spring, New York.Buzek, V. and Hillery, M. (1996) Quantum copying: beyond the no-cloning theorem.Phys. Rev. A 54, 1844–1852.Chefles, A. (2000) Quantum state discrimination. Contemp. Phys. 41, 401–424.Clarke, R. B. M., Chefles, A., Barnett, S. M. and Riis, E. (2001a) Experimen-tal demonstration of optimal unambiguous state discrimination. Phys. Rev. A 63,040305(R).Clarke, R. B. M., Kendon, V. M., Chefles, A., Barnett, S. M. and Riis, E. (2001b)Experimental realization of optimal detection strategies for overcomplete states.Phys. Rev. A 64, 012303.Clauser, J. F., Horne, M. A., Shimony, A. and Holt, R. A. (1969) Proposed experimentto test local hidden-variable theories. Phys. Rev. Lett. 23, 880–884.Cleve, R., Ekert, A., Macchiavello, C. and Mosca, M. (1998) Quantum algorithmsrevisited Proc. R. Soc. Lond. A 454, 339–354.Conan Doyle, A. (1903) The adventure of the dancing men. Strand Magazine 26

December issue. Reprinted in The original illustrated Sherlock Holmes, Castle, NewJersey.Cover, T. M. and Thomas, J. A. (1991) Elements of information theory. Wiley, NewYork.Croke, S., Barnett, S. M. and Stenholm, S. (2008) Linear transformation of quantumstates. Ann. Phys. 323, 893–906.Deutsch, D. (1985) Quantum theory, the Church-Turing principle and the universalquantum computer. Proc. R. Soc. Lond. A 400, 97–117.Deutsch, D. and Jozsa, R. (1992) Rapid solution of problems by quantum computa-tion. Proc. Roy. Soc. Lond. A 439, 553–558.Dieks, D. (1982) Communication by EPR devices. Phys. Lett. A 92, 271–272.Dieks, D. (1988) Overlap and distinguishability of quantum states. Phys. Lett. A126, 303–306.DiVincenzo, D. (1996) Topics in quantum computers. http://arxiv.org/abs/cond-mat/9612126.Einstein, A., Podolsky, B. and Rosen, N. (1935) Can quantum-mechanical descriptionof reality be considered complete?. Phys. Rev. 47, 777–780.Feynman, R. P. (1982) Simulating physics with computers. Int. J. Theo. Phys. B2,467–488.Gay, S. and Mackie, I. (eds) (2010) Semantic techniques in quantum computation

Cambridge University Press, Cambridge.Ghirardi, G. C., Rimini, A. and Weber, T. (1980) A general argument against super-luminal transmissions through the quantum mechanical measurement process. Lett.al. Nuovo Cim. 27, 293–298.Gisin, N. (1991) Bell’s inequality holds for all non-product states. Phys. Lett. A 154,201–202. It should be noted that the paper contains some typographical errors, themost serious of which is the title; the paper proves the exact opposite of what is

References 59

stated in the title!Gisin, N., Ribordy, G., Tittel, W. and Zbinden, H. (2002) Quantum cryptography.Rev. Mod. Phys. 74, 145–195.Goldie, C. M. and Pinch, R. G. E. (1991) Communications theory. Cambridge Uni-versity Press, Cambridge.Goldstein, S. (1994) Nonlocality without inequalities for almost all entangled statesof two particles. Pys. Rev. Lett. 72, 1951–1951.Greenberger, D. M., Horne, M. A., Shimony, A. and Zeilinger, A. (1990) Bell’s theo-rem without inequalities. Am. J. Phys. 58, 1131–1143.Grover, L. (1998) Quantum computers can search rapidly by using almost any trans-formation. Phys. Rev. Lett. 80, 4329–4332.Hamming, R. W., (1980) Coding and information theory . Prentice-Hall, London.Hardy, L. (1993) Nonlocality for two particles without inequalities for almost allentangled states. Phys. Rev. Lett. 71, 1665–1668.Helstrom, C. W. (1976) Quantum detection and estimation theory . Academic Press,New York.Holevo, A. S. (1973) Statistical decision theory for quantum systems. J. Multivariate

Anal. 3, 337–394.Holevo, A. S. (1982) Probabilistic and statistical aspects of quantum theory . NorthHolland, Amsterdam.Holevo, A. S. (2001) Statistical structure of quantum theory . Springer-Verlag, Berlin.Hunter, K. (2003) Measurement does not always aid state discrimination. Phys. Rev.A 68, 012306.Huttner, B., Muller, A. Gautier, G., Zbinden, H. and Gisin, N. (1996) Unambiguousquantum measurement of nonorthogonal quantum states. Phys. Rev. A 54, 3783–3789.Ivanovic, I. D. (1987) How to differentiate between non-orthogonal states. Phys. Lett.A 123. 257–259.Jaynes, E. T. (1957a) Information theory and statistical mechanics I. Phys. Rev. 106,620–630.Jaynes, E. T. (1957b) Information theory and statistical mechanics II. Phys. Rev.108, 171–190.Jaynes, E. T. (2003) Probability theory the logic of science. Cambridge UniversityPress, Cambridge.Jeffreys, H. (1939) The theory of probability . Clarendon Press, Oxford.Kaye, P., Laflamme, R. and Mosca, M. (2007)An Introduction to quantum computing

Oxford University Press, Oxford.Khinchin, A. I. (1957) Mathematical foundations of information theory. Dover, NewYork.Kullback, S. (1959) Information theory and statistics. Dover, New York.Kraus, K. (1983) States, effects and operations. Springer-Verlag, Berlin.Landauer, R. (1961) Irreversibility and heat generation in the computing process.IBM J. Res. Dev. 5, 183–191. Reprinted in (Leff and Rex 1990, 2003).Lee, P. M. (1989) Bayesian statistics: an introduction. Edward Arnold, London.

60 References

Leff, H. S. and Rex, A. F. eds. (1990) Maxwell’s demon: entropy, information, com-

puting . Adam Hilger, Bristol.Leff, H. S. and Rex, A. F. (1994) Entropy of measurement and erasure: Szilard’smembrane model revisited. Am. J. Phys. 52, 3495–3499. Reprinted in (Leff andRex 2003).Leff, H. S. and Rex, A. F. (2003) Maxwell’s demon 2: entropy, classical and quantum

information, computing. Institute of Physics Publishing, Bristol.Loepp, S and Wootters, W. K. (2006) Protecting information: from classical error

correction to quantum cryptography. (Cambridge University Press, Cambridge).Mattle, K., Weinfurter, H., Kwiat, P. G. and Zeilinger, A. (1996) Dense coding inexperimental quantum communication. Phys. Rev. Lett. 76, 4656–4659.Maxwell, J. C. (1871) Theory of heat. Longmans, Green, and Co., London.Mcgrayne, S. B. (2011) The theory that would not die. Yale University Press, NewHaven.Mermin, D. (1990) What’s wrong with these elements of reality? Physics Today Juneissue 9–11Mermin, D. (2007) Quantum computer science Cambridge University Press, Cam-bridge.Moore, G. E. (1965) Cramming More Components onto Integrated Circuits Proc.

IEEE 86 82–85.Nielsen, M. A. and Chuang, I. L. (2000) Quantum computation and quantum infor-

mation Cambridge University Press, Cambridge.Pachos, J. K. (2012) Introduction to topological quantum computation CambridgeUniversity Press, Cambridge.Peres, A. (1988) How to differentiate between two non-orthogonal states. Phys. Lett.A 128, 19–19.Peres, A. (1993) Quantum theory: concepts and methods. Academic, Dordrecht.Plenio, M. and Vitelli, V. (2001) The physics of forgetting: Landauer’s erasure prin-ciple and information theory. Contemp. Phys. 42, 25–60.Phoenix, S. J. D. and Townsend, P. D. (1995) Quantum cryptography: how to beatthe code breakers using quantum mechanics. Contemp. Phys. 36, 165–195.Piper, F. and Murphy, S. (2002) Cryptography: a very short introduction. OxfordUniversity Press, Oxford.Raussendorf, R., Browne, D. E. and Briegel, H. J. (2003) Measurement-based quan-tum computation on cluster states. Phys. Rev. A 68, 022312.Redhead, M. (1987) Incompleteness, Nonlocality and Realism Clarendon Press, Ox-ford.Scarani, V., Iblisdir, S., Gisin, N. and Acın, A. (2005) Quantum cloning. Rev. Mod.

Phys. 77, 1225–1256.Scarani, V., Bechmann-Pasquinucci, H., Cerf, N. J., Dusek, M., Lutknehaus, N. andPeev, M. (2009) The security of practical quantum key distribution. Rev. Mod.

Phys. 81, 1301–1350.Shannon, C. E. (1948) A mathematical theory of communication. Bell Sys. Tech. J.27, 379–423 and 623–656.

References 61

Shannon, C. E. (1949) Communication theory of secrecy systems. Bell Sys. Tech. J.28, 656–715.Shannon, C. E. and Weaver, W. (1949) The mathematical theory of communication.University of Illinois Press, Urbana.Shor, P. W. (1997), Polynomial-time algorithms for prime factorization and discretelogarithms on a quantum computer. SIAM J. Comput. 26, 1484–1509, arXiv:quant-ph/9508027v2Singh, S. (1999) The code book. Fourth Estate, London.Singh, S. (2000) The science of secrecy . Fourth Estate, London.Smith, R. J. (1983) Circuits, devices and systems (4th edn). Wiley, New York.Steane, A. M. Error correcting codes in quantum theory. Phys. Rev. Lett. 77 793–797.Stenholm, S. and Suominen, K.-A. (2005) Quantum approach to informatics. Wiley,Hoboken.Szillard, L. (1929) Uber die Entropieveminderung in einem thermodynamikschen Sys-tem bei eingriffen intelligenter Wesen. Zeits. Phys. 53, 840–856. Reprinted in trans-lation in (Wheeler and Zurek 1983; Leff and Rex 1990, 2003).Vaccaro, J. A. and Barnett, S. M. (2011) Information erasure without an energy cost.Proc. R. Soc. A 467, 1770–1778.Van Assche, G. (2006) Quantum cryptography and secret-key distillation. CambridgeUniversity Press, Cambridge.Vedral, V. (2006) Introduction to quantum information science Oxford UniversityPress, Oxford.von Neumann, J. (1955) Mathematical foundations of quantum mechanics. PrincetonUniversity Press, New Jersey.Wheeler, J. A. and Zurek, W. H. (eds) (1983) Quantum theory and measurement.Princeton University Press, New Jersey.Whitaker, A. (2006) Einstein, Bohr and the quantum dilemma (2nd edn). CambridgeUniversity Press, Cambridge.Wiesner, S. (1983) Conjugate coding. SIGACT News 15, 78–88.Wootters, W. K. and Zurek, W. H. (1982) A single quantum cannot be cloned. Nature299, 802–803.

introductiontoquantuminformation - university of glasgow · in these ﬁve lectures i shall give a...

Documents