arXiv:1401.1219v1 [quant-ph] 6 Jan 2014

Consciousness as a State of Matter

Max Tegmark
Dept. of Physics & MIT Kavli Institute, Massachusetts Institute of Technology, Cambridge, MA 02139
(Dated: January 8, 2014)

We examine the hypothesis that consciousness can be understood as a state of matter, "perceptronium", with distinctive information processing abilities. We explore five basic principles that may distinguish conscious matter from other physical systems such as solids, liquids and gases: the information, integration, independence, dynamics and utility principles. If such principles can identify conscious entities, then they can help solve the quantum factorization problem: why do conscious observers like us perceive the particular Hilbert space factorization corresponding to classical space (rather than Fourier space, say), and more generally, why do we perceive the world around us as a dynamic hierarchy of objects that are strongly integrated and relatively independent? Tensor factorization of matrices is found to play a central role, and our technical results include a theorem about Hamiltonian separability (defined using Hilbert-Schmidt superoperators) being maximized in the energy eigenbasis. Our approach generalizes Giulio Tononi's integrated information framework for neural-network-based consciousness to arbitrary quantum systems, and we find interesting links to error-correcting codes, condensed matter criticality, and the Quantum Darwinism program, as well as an interesting connection between the emergence of consciousness and the emergence of time.

I. INTRODUCTION

What is the relation between the internal reality of your mind and the external reality described by the equations of physics? The fact that no consensus answer to this question has emerged in the physics community lies at the heart of many of the most hotly debated issues in physics today. For example, how does quantum field theory with weak-field gravity explain the appearance of an approximately classical spacetime where experiments appear to have definite outcomes? Out of all of the possible factorizations of Hilbert space, why is the particular factorization corresponding to classical space so special? Does the quantum wavefunction undergo a non-unitary collapse when an observation is made, or are there Everettian parallel universes? Does the non-observability of spacetime regions beyond horizons imply that they in some sense do not exist independently of the regions that we can observe? If we understood consciousness as a physical phenomenon, we could in principle answer all of these questions by studying the equations of physics: we could identify all conscious entities in any physical system, and calculate what they would perceive. However, this approach is typically not pursued by physicists, with the argument that we do not understand consciousness well enough.

In this paper, I argue that recent progress in neuroscience has fundamentally changed this situation, and that we physicists can no longer blame neuroscientists for our own lack of progress. I have long contended that consciousness is the way information feels when being processed in certain complex ways [1, 2], i.e., that it corresponds to certain complex patterns in spacetime that obey the same laws of physics as other complex systems, with no "secret sauce" required.
In the seminal paper "Consciousness as Integrated Information: a Provisional Manifesto" [3], Giulio Tononi made this idea more specific and useful, making a compelling argument that for an information processing system to be conscious, it needs to have two separate traits:

1. Information: It has to have a large repertoire of accessible states, i.e., the ability to store a large amount of information.

2. Integration: This information must be integrated into a unified whole, i.e., it must be impossible to decompose the system into nearly independent parts.

Tononi's work has generated a flurry of activity in the neuroscience community, spanning the spectrum from theory to experiment (see [4, 5] for recent reviews), making it timely to investigate its implications for physics as well. This is the goal of the present paper, a goal whose pursuit may ultimately provide additional tools for the neuroscience community as well.

A. Consciousness as a state of matter

Generations of physicists and chemists have studied what happens when you group together vast numbers of atoms, finding that their collective behavior depends on the pattern in which they are arranged: the key difference between a solid, a liquid and a gas lies not in the types of atoms, but in their arrangement. In this paper, I conjecture that consciousness can be understood as yet another state of matter. Just as there are many types of liquids, there are many types of consciousness. However, this should not preclude us from identifying, quantifying, modeling and ultimately understanding the characteristic properties that all liquid forms of matter (or all conscious forms of matter) share.



State of matter   Many long-lived states?   Information integrated?   Easily writable?   Complex dynamics?
Gas               N                         N                         N                  Y
Liquid            N                         N                         N                  Y
Solid             Y                         N                         N                  N
Memory            Y                         N                         Y                  N
Computer          Y                         ?                         Y                  Y
Consciousness     Y                         Y                         Y                  Y

TABLE I: Substances that store or process information can be viewed as novel states of matter and investigated with traditional physics tools.

To classify the traditionally studied states of matter, we need to measure only a small number of physical parameters: viscosity, compressibility, electrical conductivity and (optionally) diffusivity. We call a substance a solid if its viscosity is effectively infinite (producing structural stiffness), and call it a fluid otherwise. We call a fluid a liquid if its compressibility and diffusivity are small, and otherwise call it either a gas or a plasma, depending on its electrical conductivity.

What are the corresponding physical parameters that can help us identify conscious matter, and what are the key physical features that characterize it? If such parameters can be identified, understood and measured, this will help us identify (or at least rule out) consciousness "from the outside", without access to subjective introspection. This could be important for reaching consensus on many currently controversial topics, ranging from the future of artificial intelligence to determining when an animal, fetus or unresponsive patient can feel pain. It would also be important for fundamental theoretical physics, by allowing us to identify conscious observers in our universe by using the equations of physics, and thereby answer thorny observation-related questions such as those mentioned in the introductory paragraph.

    B. Memory

As a first warmup step toward consciousness, let us consider a state of matter that we would characterize as memory: what physical features does it have? For a substance to be useful for storing information, it clearly needs to have a large repertoire of possible long-lived states or attractors (see Table I). Physically, this means that its potential energy function has a large number of well-separated minima. The information storage capacity (in bits) is simply the base-2 logarithm of the number of minima. This equals the entropy (in bits) of the degenerate ground state if all minima are equally deep. For example, solids have many long-lived states, whereas liquids and gases do not: if you engrave someone's name on a gold ring, the information will still be there years later, but if you engrave it in the surface of a pond, it will be lost within a second as the water surface changes its shape. Another desirable trait of a memory substance, distinguishing it from generic solids, is that it is not only easy to read from (as a gold ring is), but also easy to write to: altering the state of your hard drive or your synapses requires less energy than engraving gold.

    C. Computronium

As a second warmup step, what properties should we ascribe to what Margolus and Toffoli have termed "computronium" [6], the most general substance that can process information as a computer? Rather than just remain immobile as a gold ring, it must exhibit complex dynamics, so that its future state depends in some complicated (and hopefully controllable/programmable) way on the present state. Its atom arrangement must be less ordered than a rigid solid where nothing interesting changes, but more ordered than a liquid or gas. At the microscopic level, computronium need not be particularly complicated, because computer scientists have long known that as long as a device can perform certain elementary logic operations, it is universal: it can be programmed to perform the same computation as any other computer with enough time and memory. Computer vendors often parametrize computing power in FLOPS, floating-point operations per second for 64-bit numbers; more generically, we can parametrize computronium capable of universal computation by FLIPS: the number of elementary logical operations such as bit flips that it can perform per second. It has been shown by Lloyd [7] that a system with average energy E can perform a maximum of 4E/h elementary logical operations per second, where h is Planck's constant. The performance of today's best computers is about 38 orders of magnitude lower than this, because they use huge numbers of particles to store each bit and because most of their energy is tied up in a computationally passive form, as rest mass.
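To get a feel for this bound, here is a quick numeric check (my own illustration, not a calculation from the paper), for one kilogram of matter whose energy is dominated by rest mass:

    # Numeric check of Lloyd's bound of 4E/h elementary operations per second,
    # for 1 kg of matter whose energy E is dominated by rest mass (E = mc^2).
    h = 6.62607015e-34   # Planck's constant in J*s
    c = 2.99792458e8     # speed of light in m/s
    E = 1.0 * c**2       # rest energy of 1 kg in J
    print(f"Lloyd bound: {4 * E / h:.2e} ops/s")  # ~5.4e50 ops/s
    # A kilogram-scale ~1e12 FLOPS computer is indeed roughly 38 orders of
    # magnitude below this limit.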

    D. Perceptronium

What about "perceptronium", the most general substance that feels subjectively self-aware? If Tononi is right, then it should not merely be able to store and process information like computronium does, but it should also satisfy the principle that its information is integrated, forming a unified and indivisible whole.

Let us also conjecture another principle that conscious systems must satisfy: that of autonomy, i.e., that information can be processed with relative freedom from external influence. Autonomy is thus the combination of two separate properties: dynamics and independence. Here dynamics means time dependence (hence information processing capacity) and independence means that the dynamics is dominated by forces from within rather than outside the system. Just like integration, autonomy is postulated to be a necessary but not sufficient condition for a system to be conscious: for example, clocks and diesel generators tend to exhibit high autonomy, but lack substantial information storage capacity.

Principle                Definition
Information principle    A conscious system has substantial information storage capacity.
Dynamics principle       A conscious system has substantial information processing capacity.
Independence principle   A conscious system has substantial independence from the rest of the world.
Integration principle    A conscious system cannot consist of nearly independent parts.
Utility principle        A conscious system records mainly information that is useful for it.
Autonomy principle       A conscious system has substantial dynamics and independence.

TABLE II: Conjectured necessary conditions for consciousness that we explore in this paper. The last one simply combines the second and third.

E. Consciousness and the quantum factorization problem

Table II summarizes the candidate principles that we will explore as necessary conditions for consciousness. Our goal in isolating and studying these principles is not merely to strengthen our understanding of consciousness as a physical process, but also to identify simple traits of conscious matter that can help us tackle other open problems in physics. For example, the only property of consciousness that Hugh Everett needed to assume for his work on quantum measurement was that of the information principle: by applying the Schrödinger equation to systems that could record and store information, he inferred that they would perceive subjective randomness in accordance with the Born rule. In this spirit, we might hope that adding further simple requirements such as the integration principle, the independence principle and the dynamics principle might suffice to solve currently open problems related to observation.

In this paper, we will pay particular attention to what I will refer to as the quantum factorization problem: why do conscious observers like us perceive the particular Hilbert space factorization corresponding to classical space (rather than Fourier space, say), and more generally, why do we perceive the world around us as a dynamic hierarchy of objects that are strongly integrated and relatively independent? This fundamental problem has received almost no attention in the literature [9]. We will see that this problem is very closely related to the one Tononi confronted for the brain, merely on a larger scale. Solving it would also help solve the physics-from-scratch problem [2]: if the Hamiltonian H and the total density matrix ρ fully specify our physical world, how do we extract 3D space and the rest of our semiclassical world from nothing more than two Hermitean matrices, which come without any a priori physical interpretation or additional structure such as a physical space, quantum observables, quantum field definitions, an outside system, etc.? Can some of this information be extracted even from H alone, which is fully specified by nothing more than its eigenvalue spectrum? We will see that a generic Hamiltonian cannot be decomposed using tensor products, which would correspond to a decomposition of the cosmos into non-interacting parts; instead, there is an optimal factorization of our universe into integrated and relatively independent parts. Based on Tononi's work, we might expect that this factorization, or some generalization thereof, is what conscious observers perceive, because an integrated and relatively autonomous information complex is fundamentally what a conscious observer is!

The rest of this paper is organized as follows. In Section II, we explore the integration principle by quantifying integrated information in physical systems, finding encouraging results for classical systems and interesting challenges introduced by quantum mechanics. In Section III, we explore the independence principle, finding that at least one additional principle is required to account for the observed factorization of our physical world into an object hierarchy in three-dimensional space. In Section IV, we explore the dynamics principle and other possibilities for reconciling quantum-mechanical theory with our observation of a semiclassical world. We discuss our conclusions in Section V, including applications of the utility principle, and cover various mathematical details in the three appendices. Throughout the paper, we mainly consider finite Hilbert spaces that can be viewed as collections of qubits; as explained in Appendix C, this appears to cover standard quantum field theory with its infinite Hilbert space as well.

    II. INTEGRATION

    A. Our physical world as an object hierarchy

One of the most striking features of our physical world is that we perceive it as an object hierarchy, as illustrated in Figure 1. If you are enjoying a cold drink, you perceive ice cubes in your glass as separate objects because they are both fairly integrated and fairly independent, e.g., their parts are more strongly connected to one another than to the outside. The same can be said about each of their constituents, ranging from water molecules all the way down to electrons and quarks. Let us quantify this by defining the robustness of an object as the ratio of the integration temperature (the energy per part needed to separate them) to the independence temperature (the energy per part needed to separate the parent object in the hierarchy). Figure 1 illustrates that all of the ten types of objects shown have robustness of ten or more.

FIG. 1 (figure not reproduced): We perceive the external world as a hierarchy of objects, whose parts are more strongly connected to one another than to the outside. The robustness of an object is defined as the ratio of the integration temperature (the energy per part needed to separate them) to the independence temperature (the energy per part needed to separate the parent object in the hierarchy). The ten objects shown are labeled as follows (for the ice cube, mgh/kB ~ 3 mK per molecule):

Object           Independence T   Integration T   Robustness
Ice cube         ~3 mK            300 K           10^5
Water molecule   300 K            1 eV            40
Hydrogen atom    1 eV             10 eV           10
Oxygen atom      1 eV             10 eV           10
Oxygen nucleus   10 eV            1 MeV           10^5
Proton           1 MeV            200 MeV         200
Neutron          1 MeV            200 MeV         200
Electron         10 eV            10^16 GeV?      10^22?
Up quark         200 MeV          10^16 GeV?      10^17?
Down quark       200 MeV          10^16 GeV?      10^17?

A highly robust object preserves its identity (its integration and independence) over a wide range of temperatures/energies/situations. The more robust an object is, the more useful it is for us humans to perceive it as an object and coin a name for it, as per the above-mentioned utility principle.

Returning to the physics-from-scratch problem, how can we identify this object hierarchy if all we have to start with are two Hermitean matrices: the density matrix ρ encoding the state of our world, and the Hamiltonian H determining its time-evolution? Imagine that we know only these mathematical objects ρ and H, and have no information whatsoever about how to interpret the various degrees of freedom or anything else about them. A good beginning is to study integration.

Consider, for example, ρ and H for a single deuterium atom, whose Hamiltonian is (ignoring spin interactions for simplicity)

H(r_p, p_p, r_n, p_n, r_e, p_e) = H_1(r_p, p_p, r_n, p_n) + H_2(p_e) + H_3(r_p, p_p, r_n, p_n, r_e, p_e),   (1)

where r and p are position and momentum vectors, and the subscripts p, n and e refer to the proton, the neutron and the electron. Here we have decomposed H into three terms: the internal energy of the proton-neutron nucleus, the internal (kinetic) energy of the electron, and the electromagnetic electron-nucleus interaction. This interaction is tiny, on average involving

much less energy than those within the nucleus:

tr H_3 / tr H_1 ~ 10^-5,   (2)

which we recognize as the inverse robustness for a typical nucleus in Figure 1. We can therefore fruitfully approximate the nucleus and the electron as separate objects that are almost independent, interacting only weakly with one another. The key point here is that we could have performed this object-finding exercise of dividing the variables into two groups to find the greatest independence (analogous to what Tononi calls "the cruelest cut") based on the functional form of H alone, without even having heard of electrons or nuclei, thereby identifying their degrees of freedom through a purely mathematical exercise.

    B. Integration and mutual information

If the interaction energy H_3 were so small that we could neglect it altogether, then H would be decomposable into two parts H_1 and H_2, each one acting on only one of the two sub-systems (in our case the nucleus and the electron). This means that any thermal state would be factorizable:

ρ ∝ e^{-H/kT} = e^{-H_1/kT} ⊗ e^{-H_2/kT} ∝ ρ_1 ⊗ ρ_2,   (3)

so the total state ρ can be factored into a product of the subsystem states ρ_1 and ρ_2. In this case, the mutual information

I ≡ S(ρ_1) + S(ρ_2) − S(ρ)   (4)

vanishes, where

S(ρ) ≡ −tr ρ log_2 ρ   (5)

is the Shannon entropy (in bits). Even for non-thermal states, the time-evolution operator U becomes separable:

U ≡ e^{iHt/ℏ} = e^{iH_1t/ℏ} ⊗ e^{iH_2t/ℏ} = U_1 ⊗ U_2,   (6)

which (as we will discuss in detail in Section III) implies that the mutual information stays constant over time and no information is ever exchanged between the objects. In summary, if a Hamiltonian can be decomposed without an interaction term (with H_3 = 0), then it describes two perfectly independent systems.
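As a concrete illustration of equations (4) and (5), the following sketch (my own; the helper names are not from the paper) computes the mutual information of a two-qubit density matrix numerically:

    import numpy as np

    def entropy_bits(rho):
        """S(rho) = -tr(rho log2 rho), computed from the eigenvalues."""
        p = np.linalg.eigvalsh(rho)
        p = p[p > 1e-12]               # discard numerical zeros
        return float(-np.sum(p * np.log2(p)))

    def partial_trace(rho, keep_first):
        """Reduce a 4x4 two-qubit density matrix to one qubit."""
        r = rho.reshape(2, 2, 2, 2)    # indices (i1, i2, j1, j2)
        return np.trace(r, axis1=1, axis2=3) if keep_first \
            else np.trace(r, axis1=0, axis2=2)

    def mutual_information(rho):
        """Equation (4): I = S(rho_1) + S(rho_2) - S(rho)."""
        return (entropy_bits(partial_trace(rho, True)) +
                entropy_bits(partial_trace(rho, False)) - entropy_bits(rho))

    # Two perfectly correlated classical bits: I = 1 bit.
    print(mutual_information(np.diag([0.5, 0.0, 0.0, 0.5])))  # 1.0
    # A Bell pair (pure and maximally entangled): I = 2 bits.
    psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
    print(mutual_information(np.outer(psi, psi)))             # 2.0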

Let us now consider the opposite case, when a system cannot be decomposed into independent parts. Let us define the integrated information Φ as the mutual information I for the "cruelest cut" (the cut minimizing I) in some class of cuts that subdivide the system into two (we will discuss many different classes of cuts below). Although our Φ-definition is slightly different from Tononi's [3],¹ it is similar in spirit, and we are reusing his Φ-symbol for its elegant symbolism (unifying the shapes of I for information and O for integration).
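For small classical systems, this Φ can be computed by brute force. The sketch below (again my own illustration, with hypothetical function names) minimizes the mutual information over bipartitions of n classical bits, given a probability distribution over the 2^n bit strings:

    from itertools import combinations
    import numpy as np

    def shannon_bits(p):
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    def integrated_information(p, n, sizes=None):
        """Minimize I over bipartitions of n bits; p has 2^n entries, indexed
        by the integer value of the bit string. `sizes` optionally restricts
        the number of bits cut off into the first subsystem."""
        states = np.arange(2 ** n)
        S_total = shannon_bits(p)
        best = np.inf
        for k in (sizes if sizes is not None else range(1, n // 2 + 1)):
            for cut in combinations(range(n), k):   # bits in subsystem 1
                mask = sum(1 << b for b in cut)
                s1, s2 = states & mask, states & ~mask
                p1 = np.array([p[s1 == v].sum() for v in set(s1)])
                p2 = np.array([p[s2 == v].sum() for v in set(s2)])
                best = min(best, shannon_bits(p1) + shannon_bits(p2) - S_total)
        return best

    # Two perfectly correlated bits give Phi = 1 bit:
    print(integrated_information(np.array([0.5, 0.0, 0.0, 0.5]), 2))  # 1.0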

    C. Maximizing integration

We just saw that if two systems are dynamically independent (H_3 = 0), then Φ = 0 at all times, both for thermal states and for states that were independent (Φ = 0) at some point in time. Let us now consider the opposite extreme. How large can the integrated information Φ get? As a warmup example, let us consider the familiar 2D Ising model in Figure 2, where n = 2500 magnetic dipoles (or spins) that can point up or down are placed on a square lattice, and H is such that they prefer aligning with their nearest neighbors. When T → ∞, ρ ∝ e^{-H/kT} tends to the identity matrix, so all 2^n states are equally likely, all n bits are statistically independent, and Φ = 0. When T → 0, all states freeze out except the two degenerate ground states (all spin up or all spin down), so all spins are perfectly correlated and Φ = 1 bit. For intermediate temperatures, long-range correlations are seen to exist such that typical states have contiguous spin-up or spin-down patches. On average, we get about one bit of mutual information for each such patch crossing our cut (since a spin on one side knows about a spin on the other side), so for bipartitions that cut the system into two equally large halves, the mutual information will be proportional to the length of the cutting curve. The cruelest cut is therefore a vertical or horizontal straight line of length n^{1/2}, giving Φ ~ n^{1/2} at the temperature where typical patches are only a few pixels wide. We would similarly get a maximum integration Φ ~ n^{2/3} for a 3D Ising system and Φ ~ 1 bit for a 1D Ising system.
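For readers who want to experiment, here is a minimal Metropolis sampler for this 2D Ising model (my own sketch, in units where the coupling and Boltzmann's constant are 1; not code from the paper):

    import numpy as np

    rng = np.random.default_rng(0)

    def ising_sample(L=50, T=2.27, sweeps=200):
        """Single-spin-flip Metropolis updates on an L x L periodic lattice."""
        s = rng.choice([-1, 1], size=(L, L))
        for _ in range(sweeps * L * L):
            i, j = rng.integers(L, size=2)
            nb = (s[(i + 1) % L, j] + s[(i - 1) % L, j] +
                  s[i, (j + 1) % L] + s[i, (j - 1) % L])
            dE = 2 * s[i, j] * nb          # energy cost of flipping spin (i, j)
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                s[i, j] *= -1
        return s

    # Near the critical temperature T_c ~ 2.27, correlated patches span many
    # scales; it is these patches crossing a cut that carry the mutual
    # information discussed above.
    print(ising_sample()[:5, :5])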

Since it is the spatial correlations that provide the integration, it is interesting to speculate about whether the conscious subsystem of our brain is a system near its critical temperature, close to a phase transition. Indeed, Damasio has argued that to be in homeostasis, a number of physical parameters of our brain need to be kept within a narrow range of values [10]: this is precisely what is required of any condensed matter system to be near-critical, exhibiting correlations that are long-range (providing integration) but not so strong that the whole system becomes correlated, like in the right panel of Figure 2 or in a brain experiencing an epileptic seizure.

¹ Tononi's definition of Φ [3] applies only for classical systems, whereas we wish to study the quantum case as well. Our Φ is measured in bits and can grow with system size like an extrinsic variable, whereas his is an intrinsic variable, akin to representing a sort of average integration per bit.

FIG. 2 (figure not reproduced; panels run from "random/too little correlation" via "optimum" to "uniform/too much correlation"): The panels show simulations of the 2D Ising model on a 50 x 50 lattice, with the temperature progressively decreasing from left to right. The integrated information Φ drops to zero bits at T → ∞ (leftmost panel) and to one bit as T → 0 (rightmost panel), taking a maximum at an intermediate temperature near the phase transition temperature.

    D. Integration, coding theory and error correction

FIG. 3 (figure not reproduced; integrated information in bits is plotted against the number of bits cut off): For various 8-bit systems, the integrated information Φ is plotted as a function of the number of bits cut off into a sub-system with the cruelest cut. The Hamming (8,4)-code (16 8-bit strings) is seen to give classically optimal integration except for a bipartition into 4+4 bits: an arbitrary subset containing no more than three bits is completely determined by the remaining bits. The code consisting of the half of all 8-bit strings whose bit sum is even (i.e., each of the 128 7-bit strings followed by a parity checksum bit) has Hamming distance d = 2 and gives Φ = 1 however many bits are cut off. A random set of 16 8-bit strings is seen to outperform the Hamming (8,4)-code for 4+4 bipartitions, but not when fewer bits are cut off.

Even when we tuned the temperature to the most favorable value in our 2D Ising model example, the integrated information never exceeded Φ ≈ n^{1/2} bits, which is merely a fraction n^{-1/2} of the n bits of information that n spins can potentially store. So can we do better? Fortunately, a closely related question has been carefully studied in the branch of mathematics known as coding theory, with the aim of optimizing error correcting codes. Consider, for example, the following set of m = 16 bit strings, each written as a column vector of length n = 8:

M =
[0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1]
[0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1]
[0 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1]
[0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0]
[0 1 0 1 1 0 1 0 0 1 0 1 1 0 1 0]
[0 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0]
[0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0]
[0 1 1 0 0 1 1 0 1 0 0 1 1 0 0 1]

This is known as the Hamming(8,4)-code, and has Hamming distance d = 4, which means that at least 4 bit flips are required to change one string into another [11]. It is easy to see that for a code with Hamming distance d, any (d − 1) bits can always be reconstructed from the others: you can always reconstruct b bits as long as erasing them does not make two bit strings identical, which would cause ambiguity about which bit string is the correct one. This implies that reconstruction works when the Hamming distance d > b.
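Both claims are easy to verify numerically; the sketch below (my own illustration) takes the 16 columns of M as codewords, confirms d = 4, and checks that erasing any 3 bits never makes two codewords collide:

    from itertools import combinations

    rows = ["0000111100001111",
            "0000000011111111",
            "0110100101101001",
            "0101010110101010",
            "0101101001011010",
            "0011001111001100",
            "0011110000111100",
            "0110011010011001"]
    codewords = ["".join(r[j] for r in rows) for j in range(16)]  # columns of M

    d = min(sum(a != b for a, b in zip(u, v))
            for u, v in combinations(codewords, 2))
    print("minimum Hamming distance:", d)   # 4

    # Erasing any b = 3 bit positions keeps all 16 codewords distinct:
    ok = all(len({tuple(c for i, c in enumerate(w) if i not in erased)
                  for w in codewords}) == 16
             for erased in combinations(range(8), 3))
    print("any 3 erased bits reconstructible:", ok)  # True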

To translate such codes of m bit strings of length n into physical systems, we simply create a state space with n bits (interpretable as n spins or other two-state systems) and construct a Hamiltonian which has an m-fold degenerate ground state, with one minimum corresponding to each of the m bit strings in the code. In the low-temperature limit, all bit strings will receive the same probability weight 1/m, giving an entropy S = log_2 m. The corresponding integrated information Φ of the ground state is plotted in Figure 3 for a few examples, as a function of cut size k (the number of bits assigned to the first subsystem). To calculate Φ for a cut size k in practice, we simply minimize the mutual information I over all (n choose k) ways of partitioning the n bits into k and (n − k) bits. We see that, as advertised, the Hamming(8,4)-code gives Φ = 3 when 3 bits are cut off. However, it gives only Φ = 2 for bipartitions.
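Combining the two previous sketches (and their hypothetical helpers), the ground state is the uniform distribution over the 16 codewords, and the brute-force cruelest cut reproduces these Φ-values:

    import numpy as np

    p = np.zeros(2 ** 8)
    for w in codewords:        # from the Hamming(8,4) sketch above
        p[int(w, 2)] = 1 / 16  # uniform ground state: S = log2(16) = 4 bits
    print(integrated_information(p, 8, sizes=[3]))  # 3.0 (3 bits cut off)
    print(integrated_information(p, 8, sizes=[4]))  # 2.0 (4+4 bipartition)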

FIG. 4 (figure not reproduced; integrated information is plotted against the number of bits cut off, for random codes of 4, 6, 8, 10, 12, 14 and 16 bits): Same as for the previous figure, but for random codes with progressively longer bit strings, consisting of a random subset containing 2^{n/2} of the 2^n possible bit strings. For better legibility, the vertical axis has been re-centered for the shorter codes.

The Φ-value for bipartitions is not simply related to the Hamming distance, and is not a quantity that most popular bit string codes are optimized for. Indeed, Figure 3 shows that for bipartitions, it underperforms a code consisting of 16 random unique bit strings of the same length. A rich and diverse set of codes has been published in the literature, and the state-of-the-art in terms of maximal Hamming distance for a given n is continually updated [12]. Although codes with arbitrarily large Hamming distance d exist, there is (just as for our Hamming(8,4)-example above) no guarantee that Φ will be as large as d − 1 when the smaller of the two subsystems contains more than d bits. Moreover, although Reed-Solomon codes are sometimes billed as classically optimal erasure codes (maximizing d for a given n), their fundamental units are generally not bits but groups of bits (generally numbers modulo some prime number), and the optimality is violated if we make cuts that do not respect the boundaries of these bit groups.

Although further research on codes maximizing Φ would be of interest, it is worth noting that simple random codes appear to give Φ-values within a couple of bits of the theoretical maximum in the limit of large n, as illustrated in Figure 4. When cutting off k out of n bits, the mutual information in classical physics clearly cannot exceed the number of bits in either subsystem, i.e., k and n − k, so the Φ-curve for a code must lie within the shaded triangle in the figure. (The quantum-mechanical case is more complicated, and we will see in the next section that it in a sense integrates both better and worse.) The codes for which the integrated information is plotted simply consist of a random subset containing 2^{n/2} of the 2^n possible bit strings, so roughly speaking, half the bits encode fresh information and the other half provide the redundancy giving near-perfect integration.

Just as we saw for the Ising model example, these random codes show a tradeoff between entropy and redundancy, as illustrated in Figure 5.

FIG. 5 (figure not reproduced; integrated information in bits is plotted against the base-2 logarithm of the number of patterns used): The integrated information Φ is shown for random codes using progressively larger random subsets of the 2^14 possible strings of 14 bits. The optimal choice is seen to be using about 2^7 bit strings, i.e., using about half the bits to encode information and the other half to integrate it.

When there are n bits, how many of the 2^n possible bit strings should we use to maximize the integrated information Φ? If we use m of them, we clearly have Φ ≤ log_2 m, since in classical physics, Φ cannot exceed the entropy of the system (the mutual information is I = S_1 + S_2 − S, where S_1 ≤ S and S_2 ≤ S, so I ≤ S). Using very few bit strings is therefore a bad idea. On the other hand, if we use all 2^n of them, we lose all redundancy, the bits become independent, and Φ = 0, so being greedy and using too many bit strings in an attempt to store more information is also a bad idea. Figure 5 shows that the optimal tradeoff is to use m ≈ 2^{n/2} of the codewords, i.e., to use half the bits to encode information and the other half to integrate it. Taken together, the last two figures therefore suggest that n physical bits can be used to provide about n/2 bits of integrated information in the large-n limit.
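This tradeoff is easy to reproduce numerically. The sketch below (my own illustration, using the hypothetical integrated_information helper from above, and n = 10 rather than the figure's n = 14 to keep the brute force fast) sweeps the number of codewords m:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10
    for log2_m in range(1, n + 1):
        m = 2 ** log2_m
        words = rng.choice(2 ** n, size=m, replace=False)  # a random code
        p = np.zeros(2 ** n)
        p[words] = 1.0 / m
        print(log2_m, round(integrated_information(p, n), 2))
    # Phi first grows with log2(m) (more entropy to integrate), then falls as
    # the redundancy disappears, peaking near m = 2^(n/2) as in Figure 5.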

    E. Integration in physical systems

Let us explore the consequences of these results for physical systems described by a Hamiltonian H and a state ρ. As emphasized by Hopfield [13], any physical system with multiple attractors can be viewed as an information storage device, since its state permanently encodes information about which attractor it belongs to. Figure 6 shows two examples of H interpretable as potential energy functions for a single particle in two dimensions. They can both be used as information storage devices, by placing the particle in a potential well and keeping the system cool enough that the particle stays in the same well indefinitely. The egg-crate potential V(x, y) = sin^2(πx) sin^2(πy) (top) has 256 minima and hence a ground state entropy (information storage capacity) S = 8 bits, whereas the lower potential has only 16 minima and S = 4 bits.

The basins of attraction in the top panel are seen to be the squares shown in the bottom panel.

FIG. 6 (figure not reproduced): A particle in the egg-crate potential energy landscape (top panel) stably encodes 8 bits of information that are completely independent of one another and therefore not integrated. In contrast, a particle in a Hamming(8,4) potential (bottom panel) encodes only 4 bits of information, but with excellent integration. Qualitatively, a hard drive is more like the top panel, while a neural network is more like the bottom panel.

If we write the x and y coordinates as binary numbers with b bits each, then the first 4 bits of x and y encode which square (x, y) is in. The information in the remaining bits encodes the location within this square; these bits are not useful for information storage because they can vary over time, as the particle oscillates around a minimum. If the system is actively cooled, these oscillations are gradually damped out and the particle settles toward the attractor solution at the minimum, at the center of its basin. This example illustrates that cooling is a physical example of error correction: if thermal noise adds small perturbations to the particle position, altering the least significant bits, then cooling will remove these perturbations and push the particle back towards the minimum it came from. As long as cooling keeps the perturbations small enough that the particle never rolls out of its basin of attraction, all the 8 bits of information encoding its basin number are perfectly preserved. Instead of interpreting our n = 8 data bits as positions in two dimensions, we can interpret them as positions in n dimensions, where each possible state corresponds to a corner of the n-dimensional hypercube. This captures the essence of many computer memory devices, where each bit is stored in a system with two degenerate minima; the least significant and redundant bits that can be error-corrected via cooling now get equally distributed among all the dimensions.

How integrated is the information S? For the top panel of Figure 6, not at all: H can be factored as a tensor product of 8 two-state systems, so Φ = 0, just as for typical computer memory. In other words, if the particle is in a particular egg-crate basin, knowing any one of the bits specifying the basin position tells us nothing about the other bits. The potential in the lower panel, on the other hand, gives good integration. This potential retains only 16 of the 256 minima, corresponding to the 16 bit strings of the Hamming(8,4)-code, which as we saw gives Φ = 3 for any 3 bits cut off and Φ = 2 bits for symmetric bipartitions. Since the Hamming distance d = 4 for this code, at least 4 bits must be flipped to reach another minimum, which among other things implies that no two basins can share a row or column.

    F. The pros and cons of integration

Natural selection suggests that self-reproducing information-processing systems will evolve integration if it is useful to them, regardless of whether they are conscious or not. Error correction can obviously be useful, both to correct errors caused by thermal noise and to provide redundancy that improves robustness toward failure of individual physical components such as neurons. Indeed, such utility explains the preponderance of error correction built into human-developed devices, from RAID storage to bar codes to forward error correction in telecommunications. If Tononi is correct and consciousness requires integration, then this raises an interesting possibility: our human consciousness may have evolved as an accidental by-product of error correction. There is also empirical evidence that integration is useful for problem-solving: artificial life simulations of vehicles that have to traverse mazes and whose brains evolve by natural selection show that the more adapted they are to their environment, the higher the integrated information of the main complex in their brain [14].

However, integration comes at a cost, and as we will now see, near-maximal integration appears to be prohibitively expensive. Let us distinguish between the maximum amount of information that can be stored in a state defined by ρ and the maximum amount of information that can be stored in a physical system defined by H. The former is simply S(ρ) for the perfectly mixed (T = ∞) state, i.e., log_2 of the number of possible states (the number of bits characterizing the system). The latter can be much larger, corresponding to log_2 of the number of Hamiltonians that you could distinguish between given your time and energy available for experimentation. Let us consider potential energy functions whose k different minima can be encoded as bit strings (as in Figure 6), and let us limit our experimentation to finding all the minima. Then H encodes not a single string of n bits, but a subset consisting of k out of all 2^n such strings, one for each minimum. There are (2^n choose k) such subsets, so the information contained in H is

S(H) = log_2 (2^n choose k) = log_2 [(2^n)! / (k! (2^n − k)!)] ≈ log_2 [(2^n)^k / k^k] = k (n − log_2 k)   (7)

for k ≪ 2^n, where we used Stirling's approximation k! ≈ k^k. So crudely speaking, H encodes not n bits but kn bits. For the near-maximal integration given by the random codes from the previous section, we had k = 2^{n/2}, which gives S(H) ≈ 2^{n/2} × n/2 bits. For example, if the n ~ 10^11 neurons in your brain were maximally integrated in this way, then your neural network would require a dizzying 10^{10^10} bits to describe, vastly more information than can be encoded by all the 10^89 particles in our universe combined.
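As a sanity check of equation (7) (my own illustration), we can compare the exact log_2 of the binomial coefficient with the approximation k(n − log_2 k):

    from math import comb, log2

    n, k = 16, 2 ** 8                   # k = 2^(n/2), the random-code case
    exact = log2(comb(2 ** n, k))       # exact log2 of (2^n choose k)
    approx = k * (n - log2(k))          # equation (7)
    print(round(exact), round(approx))  # ~2411 vs 2048: same order of magnitude
    # Most of the gap is the e^k factor dropped by the crude k! ~ k^k.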

The neuronal mechanisms of human memory are still unclear despite intensive experimental and theoretical explorations [15], but there is significant evidence that the brain uses attractor dynamics in its integration and memory functions, where discrete attractors may be used to represent discrete items [16]. The classic implementation of such dynamics as a simple symmetric and asynchronous Hopfield neural network [13] can be conveniently interpreted in terms of potential energy functions: the equations of the continuous Hopfield network are identical to a set of mean-field equations that minimize a potential energy function, so this network always converges to a basin of attraction [17]. Such a Hopfield network gives a dramatically lower information content S(H) of only about 0.25 bits per synapse [17], and we have only about 10^14 synapses, suggesting that our brains can store only on the order of a few Terabytes of information.

The integrated information Φ of a Hopfield network is even lower. For a Hopfield network of n neurons, the total number of attractors is bounded by 0.14n [17], so the maximum information capacity is merely S ≈ log_2 0.14n < log_2 n ≈ 37 bits for n = 10^11 neurons. Even in the most favorable case where these bits are maximally integrated, our 10^11 neurons thus provide a measly Φ ≈ 37 bits of integrated information, as opposed to about 5 × 10^10 bits for a random coding.

    G. The integration paradox

This leaves us with an integration paradox: why does the information content of our conscious experience appear to be vastly larger than 37 bits? If Tononi's information and integration principles from Section I are correct, the integration paradox forces us to draw at least one of the following three conclusions:

1. Our brains use some more clever scheme for encoding our conscious bits of information, which allows dramatically larger Φ than Hopfield networks.

2. These conscious bits are much fewer than we might naively have thought from introspection, implying that we are only able to pay attention to a very modest amount of information at any instant.

3. To be relevant for consciousness, the definition of integrated information that we have used must be modified or supplemented by at least one additional principle.

We will see that the quantum results in the next section bolster the case for conclusion 3.

The fundamental reason why a Hopfield network is specified by much less information than a near-maximally integrated network is that it involves only pairwise couplings between neurons, thus requiring only ~n^2 coupling parameters to be specified, as opposed to 2^n parameters giving the energy for each of the 2^n possible states. It is striking how H is similarly simple for the standard model of particle physics, with the energy involving only sums of pairwise interactions between particles, supplemented with occasional 3-way and 4-way couplings. H for the brain and H for fundamental physics thus both appear to belong to an extremely simple subclass of all Hamiltonians that requires an unusually small amount of information to describe. Just as a system implementing near-maximal integration via random coding is too complicated to fit inside the brain, it is also too complicated to work in fundamental physics: since the information storage capacity S of a physical system is approximately bounded by its number of particles [7] or by its area in Planck units by the holographic principle [8], it cannot be integrated by physical dynamics that itself requires storage of the exponentially larger information quantity S(H) ≈ 2^{S/2} × S/2, unless the Standard Model Hamiltonian is replaced by something dramatically more complicated.

An interesting theoretical direction for further research (pursuing resolution 1 to the integration paradox) is therefore to investigate what maximum amount of integrated information can be feasibly stored in a physical system using codes that are algorithmic (such as RS-codes) rather than random. An interesting experimental direction would be to search for concrete implementations of error-correction algorithms in the brain.

In summary, we have explored the integration principle by quantifying integrated information in physical systems. We have found that although excellent integration is possible in principle, it is more difficult in practice. In theory, random codes provide nearly maximal integration, with about half of all n bits coding for data and the other half providing Φ ≈ n/2 bits of integration, but in practice, the dynamics required for implementing them is too complex for our brain or our universe. Most of our exploration has focused on classical physics, where cuts into subsystems have corresponded to partitions of classical bits. As we will see in the next section, finding systems encoding large amounts of integrated information is even more challenging when we turn to the quantum-mechanical case.

FIG. 7 (figure not reproduced; mutual information I in bits is plotted against entropy S in bits, with annotated example states such as the Bell pair, probability vectors like (1/2, 1/2, 0, 0), (1/3, 1/3, 1/3, 0) and (1/4, 1/4, 1/4, 1/4), and arrows indicating unitary transformations): Mutual information versus entropy for various 2-bit systems. The different dots, squares and stars correspond to different states, which in the classical cases are defined by the probabilities for the four basis states 00, 01, 10 and 11. Classical states can lie only in the pyramid below the upper black star with (S, I) = (1, 1), whereas entanglement allows quantum states to extend all the way up to the upper black square at (0, 2). However, the integrated information Φ for a quantum state cannot lie above the shaded green/grey region, into which any other quantum state can be brought by a unitary transformation. Along the upper boundary of this region, either three of the four probabilities are equal, or two of them are equal while one vanishes.

    III. INDEPENDENCE

    A. Classical versus quantum independence

How cruel is what Tononi calls "the cruelest cut", dividing a system into two parts that are maximally independent? The situation is quite different in classical physics and quantum physics, as Figure 7 illustrates for a simple 2-bit system. In classical physics, the state is specified by a 2 x 2 matrix giving the probabilities for the four states 00, 01, 10 and 11, which define an entropy S and mutual information I. Since there is only one possible cut, the integrated information Φ = I. The point defined by the pair (S, Φ) can lie anywhere in the "pyramid" in the figure, whose top at (S, Φ) = (1, 1) (black star) gives maximum integration, and corresponds to perfect correlation between the two bits: 50% probability for 00 and 11. Perfect anti-correlation gives the same point. The other two vertices of the classically allowed region are seen to be (S, Φ) = (0, 0) (100% probability for a single outcome) and (S, Φ) = (2, 0) (equal probability for all four outcomes).

In quantum mechanics, where the 2-qubit state is defined by a 4 x 4 density matrix, the available area in the (S, I)-plane doubles to include the entire shaded triangle, with the classically unattainable region opened up because of entanglement. The extreme case is a Bell pair state such as

|ψ⟩ = (1/√2)(|↑⟩|↑⟩ + |↓⟩|↓⟩),   (8)

which gives (S, I) = (0, 2). However, whereas there was only one possible cut for 2 classical bits, there are now infinitely many possible cuts, because in quantum mechanics, all Hilbert space bases are equally valid, and we can choose to perform the factorization in any of them. Since Φ is defined as I after the cruelest cut, it is the I-value minimized over all possible factorizations. For simplicity, we use the notation where ⊗ denotes factorization in the coordinate basis, so the integrated information is

Φ ≡ min_U I(U ρ U†),   (9)

i.e., the mutual information minimized over all possible unitary transformations U. Since the Bell pair of equation (8) is a pure state ρ = |ψ⟩⟨ψ|, we can unitarily transform it into a basis where the first basis vector is |ψ⟩, making it factorizable:

U [1/2 0 0 1/2; 0 0 0 0; 0 0 0 0; 1/2 0 0 1/2] U† = [1 0 0 0; 0 0 0 0; 0 0 0 0; 0 0 0 0] = [1 0; 0 0] ⊗ [1 0; 0 0].   (10)

This means that Φ = 0, so in quantum mechanics, the cruelest cut can be very cruel indeed: the most entangled states possible in quantum mechanics have no integrated information at all!
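Equation (9) is easy to explore numerically; the sketch below (my own illustration, reusing the hypothetical mutual_information helper from the earlier snippet) searches over random unitaries and also applies the exact basis change of equation (10):

    import numpy as np
    from scipy.stats import unitary_group

    rng = np.random.default_rng(2)

    def phi_upper_bound(rho, trials=5000):
        """Estimate eq. (9) by random search over 4x4 unitaries U."""
        best = mutual_information(rho)
        for _ in range(trials):
            U = unitary_group.rvs(4, random_state=rng)
            best = min(best, mutual_information(U @ rho @ U.conj().T))
        return best

    psi = np.array([1.0, 0, 0, 1.0]) / np.sqrt(2)   # Bell pair, I = 2 bits
    rho = np.outer(psi, psi)
    print(phi_upper_bound(rho))   # random search drives I far below 2

    # Rotating the Bell basis into the computational basis gives exactly 0:
    B = np.array([[1, 0, 0, 1], [1, 0, 0, -1],
                  [0, 1, 1, 0], [0, 1, -1, 0]]) / np.sqrt(2)
    print(mutual_information(B @ rho @ B.conj().T))  # 0.0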

The same cruel fate awaits the most integrated 2-bit state from classical physics: the perfectly correlated mixed state ρ = (1/2)|↑↑⟩⟨↑↑| + (1/2)|↓↓⟩⟨↓↓|. It gave Φ = 1 bit classically above (upper black star in the figure), but a unitary transformation permuting its diagonal elements makes it factorable:

U [1/2 0 0 0; 0 0 0 0; 0 0 0 0; 0 0 0 1/2] U† = [1/2 0 0 0; 0 1/2 0 0; 0 0 0 0; 0 0 0 0] = [1 0; 0 0] ⊗ [1/2 0; 0 1/2],   (11)

so Φ = 0 quantum-mechanically (lower black star in the figure).

B. Canonical transformations, independence and relativity

The fundamental reason that these states are more separable quantum-mechanically is clearly that more cuts are available, making the cruelest one crueler. Interestingly, the same thing can happen also in classical physics. Consider, for example, our example of the deuterium atom from equation (1). When we restricted our cuts to simply separating different degrees of freedom, we found that the group (r_p, p_p, r_n, p_n) was quite (but not completely) independent of the group (r_e, p_e), and that there was no cut splitting things into perfectly independent pieces. In other words, the nucleus was fairly independent of the electron, but none of the three particles was completely independent of the other two. However, if we allow our degrees of freedom to be transformed before the cut, then things can be split into two perfectly independent parts! The classical equivalent of a unitary transformation is of course a canonical transformation (one that preserves phase-space volume). If we perform the canonical transformation where the new coordinates are the center-of-mass position r_M and the relative displacements r'_p ≡ r_p − r_M and r'_e ≡ r_e − r_M, and correspondingly define p_M as the total momentum of the whole system, etc., then we find that (r_M, p_M) is completely independent of the rest. In other words, the average motion of the entire deuterium atom is completely decoupled from the internal motions around its center of mass.

Thanks to relativity theory, this well-known decomposition into average and relative motions is of course possible for any isolated system. If two systems are completely independent, then they can gain no knowledge of each other, so a conscious observer in one will be unaware of the other. Conversely, we can view relativity as a special case of this idea: an observer in an isolated system has no way of knowing whether she is at rest or in uniform motion, because these are simply two different allowed states for the center-of-mass system, which is completely independent from the internal-motions system of which her consciousness is a part.

    C. How integrated can quantum states be?

We saw in Figure 7 that some seemingly integrated states, such as a Bell pair or a pair of classically perfectly correlated bits, are in fact not integrated at all. But the figure also shows that some states are truly integrated even quantum-mechanically, with I > 0 even for the cruelest cut. How integrated can a quantum state be? I have a conjecture which, if true, enables the answer to be straightforwardly calculated.

Φ-Diagonality Conjecture (ΦDC): The mutual information always takes its minimum in a basis where ρ is diagonal.

Although I do not have a proof of this conjecture, a rigorous proof of a closely related conjecture will be provided below for the special case of n = 2.² Moreover, there is substantial numerical support for the conjecture. For example, Figure 8 plots the mutual information I for 15,000 random unitary transformations of a single random 2^2 x 2^2 density matrix, as a function of the off-diagonality, defined as the sum of the squared moduli of the off-diagonal components. It is seen that the lower envelope forms a monotonically increasing curve, taking a minimum when the matrix is diagonal, i.e., rotated into its eigenbasis. Similar numerical experiments with a variety of n x n density matrices up to n = 2^5 showed the same qualitative behavior. In the rest of this section, we will assume that the ΦDC is in fact true and explore the consequences; if it is false, then Φ(ρ) will generally be even smaller than for the diagonal cases we explore.

FIG. 8 (figure not reproduced): When performing random unitary transformations of a density matrix ρ, the mutual information appears to always be minimized when rotating into its eigenbasis, so that ρ becomes diagonal.

Assuming that the ΦDC is true, the first step in computing the integrated information Φ(ρ) is thus to diagonalize the n x n density matrix ρ. If all n eigenvalues are different, then there are n! possible ways of doing this, corresponding to the n! ways of permuting the eigenvalues, so the ΦDC simplifies the continuous minimization problem of equation (9) to a discrete minimization problem over these n! permutations. Suppose that n = lm, and that we wish to factor the n-dimensional Hilbert space into factor spaces of dimensionality l and m, so that Φ = 0. It is easy to see that this is possible if the n eigenvalues of ρ can be arranged into an l x m matrix that is multiplicatively separable (rank 1), i.e., the product of a column vector and a row vector.

² The converse of the ΦDC is straightforward to prove: if Φ = 0 (which is equivalent to the state being factorizable, ρ = ρ_1 ⊗ ρ_2), then it is factorizable also in its eigenbasis, where both ρ_1 and ρ_2 are diagonal.


l = m = 2 and n = 4, we see that

( 1/2  1/2 )
(  0    0  )   is separable, but

( 1/2   0  )
(  0   1/2 )   is not,

and the only difference is that the order of the four numbers has been permuted. More generally, we see that to find the cruelest cut that defines the integrated information Φ, we want to find the permutation that makes the matrix of eigenvalues as separable as possible. It is easy to see that when seeking the permutation giving maximum separability, we can without loss of generality place the largest eigenvalue first (in the upper left corner) and the smallest one last (in the lower right corner). If there are only 4 eigenvalues (as in the above example), the ordering of the remaining two has no effect on I.
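For small n, this discrete minimization can be brute-forced directly. The sketch below (illustrative only; it assumes the DC, so that only diagonal density matrices, i.e., eigenvalue grids, need to be searched) computes the mutual information of an eigenvalue arrangement and minimizes it over all orderings:

import numpy as np
from itertools import permutations

def entropy(p):
    # Shannon entropy in bits of a probability vector (or grid).
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def mutual_information_diag(eigs, l, m):
    # Mutual information of a diagonal density matrix whose eigenvalues are
    # arranged into an l x m grid: the marginals are the row and column sums.
    grid = np.asarray(eigs, dtype=float).reshape(l, m)
    return entropy(grid.sum(axis=1)) + entropy(grid.sum(axis=0)) - entropy(grid)

def integrated_information(eigs, l, m):
    # Brute-force the discrete minimization over the n! eigenvalue orderings.
    return min(mutual_information_diag(p, l, m) for p in permutations(eigs))

print(mutual_information_diag([0.5, 0.5, 0.0, 0.0], 2, 2))   # rank-1 grid: 0.0
print(mutual_information_diag([0.5, 0.0, 0.0, 0.5], 2, 2))   # same values: 1.0
print(integrated_information([0.5, 0.0, 0.0, 0.5], 2, 2))    # 0.0 after permuting
print(integrated_information([1/3, 1/3, 1/3, 0.0], 2, 2))    # ~0.2516 bits

The last line reproduces the maximally integrated n = 4 value Φ = log(27/16)/log(8) ≈ 0.2516 bits discussed in the next subsection.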

    D. The quantum integration paradox

We now have the tools in hand to answer the key question from the last section: which state maximizes the integrated information Φ? Numerical search suggests that the most integrated state is a rescaled projection matrix, satisfying ρ² ∝ ρ. This means that some number k of the n eigenvalues equal 1/k and the remaining ones vanish.³ For the n = 4 example from Figure 7, k = 3 is seen to give the best integration, with eigenvalues (probabilities) 1/3, 1/3, 1/3 and 0, giving Φ = log(27/16)/log(8) ≈ 0.2516.

For classical physics, we saw that the maximal attainable Φ grows roughly linearly with n. Quantum-mechanically, however, it decreases as n increases!⁴

In summary, no matter how large a quantum system we create, its state can never contain more than about a quarter of a bit of integrated information! This exacerbates the integration paradox from Section II G, eliminating both of the first two resolutions: you are clearly aware of more than 0.25 bits of information right now, and this

3 A heuristic way of understanding why having many equal eigenvalues is advantageous is that it helps eliminate the effect of the eigenvalue permutations that we are minimizing over. If the optimal state has two distinct eigenvalues, then if swapping them changes I, it must by definition increase I by some finite amount. This suggests that we can increase the integration by bringing the eigenvalues infinitesimally closer or further apart, and repeating this procedure lets us further increase Φ until all eigenvalues are either zero or equal to some positive value.

4 One finds that Φ is maximized when the k nonzero eigenvalues are arranged in a Young tableau, which corresponds to a partition of k as a sum of positive integers k₁ + k₂ + ..., giving Φ = S(p) + S(p′) − log₂ k, where the probability vectors p and p′ are defined by pᵢ = kᵢ/k and p′ᵢ = k′ᵢ/k. Here k′ᵢ denotes the conjugate partition. For example, if we cut an even number of qubits into two parts with n/2 qubits each, then n = 2, 4, 6, ..., 20 gives Φ ≈ 0.252, 0.171, 0.128, 0.085, 0.085, 0.073, 0.056, 0.056, 0.051 and 0.042 bits, respectively. This is if the diagonality conjecture is true; if it is not, then Φ is even smaller.

quarter-bit maximum applies not merely to states of Hopfield networks, but to any quantum states of any system. Let us therefore begin exploring the third resolution: that our definition of integrated information must be modified or supplemented by at least one additional principle.

    E. How integrated is the Hamiltonian?

An obvious way to begin this exploration is to consider the state ρ not merely at a single fixed time t, but as a function of time. After all, it is widely assumed that consciousness is related to information processing, not mere information storage. Indeed, Tononi's original Φ-definition [3] (which applies to classical neural networks rather than general quantum systems) involves time, depending on the extent to which current events affect future ones.

Because the time-evolution of the state ρ is determined by the Hamiltonian H via the Schrödinger equation

ρ̇ = i[H, ρ],   (12)

whose solution is

ρ(t) = e^{iHt} ρ(0) e^{−iHt},   (13)

we need to investigate the extent to which the cruelest cut can decompose not merely ρ but the pair (ρ, H) into independent parts. (Here and throughout, we often use units where ℏ = 1 for simplicity.)

    F. Evolution with separable Hamiltonian

As we saw above, the key question for ρ is whether it is factorizable (expressible as a product ρ = ρ₁ ⊗ ρ₂ of matrices acting on the two subsystems), whereas the key question for H is whether it is what we will call additively separable, being a sum of matrices acting on the two subsystems, i.e., expressible in the form

H = H₁ ⊗ I + I ⊗ H₂   (14)

for some matrices H₁ and H₂. For brevity, we will often write simply separable instead of additively separable. As mentioned in Section II B, a separable Hamiltonian H implies that both the thermal state ρ ∝ e^{−H/kT} and the time-evolution operator U ≡ e^{iHt/ℏ} are factorizable. An important property of density matrices, which was pointed out already by von Neumann when he invented them [18], is that if H is separable, then

ρ̇₁ = i[H₁, ρ₁],   (15)

i.e., the time-evolution of the state of the first subsystem, ρ₁ ≡ tr₂ ρ, is independent of the other subsystem and of any entanglement with it that may exist. This is easy to prove: using the identities (A16) and (A18) shows that

tr₂ [H₁ ⊗ I, ρ] = tr₂ [(H₁ ⊗ I)ρ] − tr₂ [ρ(H₁ ⊗ I)]
              = H₁ tr₂ [(I ⊗ I)ρ] − tr₂ [(I ⊗ I)ρ] H₁
              = H₁ρ₁ − ρ₁H₁ = [H₁, ρ₁].   (16)

Using the identity (A10) shows that

tr₂ [I ⊗ H₂, ρ] = 0.   (17)

Summing equations (16) and (17) completes the proof.
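This identity is also easy to verify numerically. The following sketch (an illustration with arbitrary dimensions and seed, not code from the paper) checks that tr₂ [H, ρ] = [H₁, ρ₁] for a separable H of the form (14), even when ρ is entangled:

import numpy as np

rng = np.random.default_rng(1)
l = m = 3   # subsystem dimensions (arbitrary choice)

def random_hermitian(n):
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (a + a.conj().T) / 2

def ptrace2(rho):
    # tr_2: trace out the second (m-dimensional) factor.
    return np.trace(rho.reshape(l, m, l, m), axis1=1, axis2=3)

H1, H2 = random_hermitian(l), random_hermitian(m)
H = np.kron(H1, np.eye(m)) + np.kron(np.eye(l), H2)   # separable, equation (14)

# A random (generally entangled) density matrix on the full space:
a = rng.normal(size=(l*m, l*m)) + 1j * rng.normal(size=(l*m, l*m))
rho = a @ a.conj().T
rho /= np.trace(rho).real

rho1 = ptrace2(rho)
lhs = ptrace2(H @ rho - rho @ H)      # tr_2 [H, rho]
rhs = H1 @ rho1 - rho1 @ H1           # [H1, rho1]
print(np.max(np.abs(lhs - rhs)))      # ~1e-15: subsystem 1 evolves on its own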

G. The cruelest cut as the maximization of separability

Since a general Hamiltonian H cannot be written in the separable form of equation (14), it will also include a third term H₃ that is non-separable. The independence principle from Section I therefore suggests an interesting mathematical approach to the physics-from-scratch problem of analyzing the total Hamiltonian H for our physical world:

1. Find the Hilbert space factorization giving the cruelest cut, decomposing H into parts with the smallest interaction Hamiltonian H₃ possible.

2. Keep repeating this subdivision procedure for each part until only relatively integrated parts remain that cannot be further decomposed with a small interaction Hamiltonian.

The hope would be that applying this procedure to the Hamiltonian of our standard model would reproduce the full observed object hierarchy from Figure 1, with the factorization corresponding to the objects, and the various non-separable terms H₃ describing the interactions between these objects. Any decomposition with H₃ = 0 would correspond to two parallel universes unable to communicate with one another.

We will now formulate this as a rigorous mathematics problem, solve it, and derive the observational consequences. We will find that this approach fails catastrophically when confronted with observation, giving interesting hints regarding further physical principles needed for understanding why we perceive our world as an object hierarchy.

    H. The Hilbert-Schmidt vector space

To enable a rigorous formulation of our problem, let us first briefly review the Hilbert-Schmidt vector space, a convenient inner-product space where the vectors are not wave functions |ψ⟩ but matrices such as H and ρ. For any two matrices A and B, the Hilbert-Schmidt inner product is defined by

(A, B) ≡ tr A†B.   (18)

For example, the trace operator can be written as an inner product with the identity matrix:

    tr A = (I,A). (19)

This inner product defines the Hilbert-Schmidt norm (also known as the Frobenius norm)

||A|| ≡ (A, A)^{1/2} = (tr A†A)^{1/2} = (Σᵢⱼ |Aᵢⱼ|²)^{1/2}.   (20)

If A is Hermitean (A† = A), then ||A||² is simply the sum of the squares of its eigenvalues.

Real symmetric and antisymmetric matrices form orthogonal subspaces under the Hilbert-Schmidt inner product, since (S, A) = 0 for any symmetric matrix S (satisfying Sᵗ = S) and any antisymmetric matrix A (satisfying Aᵗ = −A). Because a Hermitean matrix (satisfying H† = H) can be written in terms of real symmetric and antisymmetric matrices as H = S + iA, we have

(H₁, H₂) = (S₁, S₂) + (A₁, A₂),

which means that the inner product of two Hermitean matrices is purely real.
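As a quick numerical illustration (not from the paper; dimensions and seed are arbitrary), the following lines verify that (H₁, H₂) is real for Hermitean matrices and that the Hilbert-Schmidt norm of a Hermitean matrix equals the root-sum-square of its eigenvalues:

import numpy as np

rng = np.random.default_rng(2)

def random_hermitian(n):
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (a + a.conj().T) / 2

def hs_inner(a, b):
    # Hilbert-Schmidt inner product (A, B) = tr A^dagger B, equation (18).
    return np.trace(a.conj().T @ b)

h1, h2 = random_hermitian(4), random_hermitian(4)
print(hs_inner(h1, h2))                  # imaginary part vanishes up to roundoff
norm = np.sqrt(hs_inner(h1, h1).real)    # Frobenius norm, equation (20)
print(norm, np.sqrt(np.sum(np.linalg.eigvalsh(h1)**2)))   # equal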

    I. Separating H with orthogonal projectors

By viewing H as a vector in the Hilbert-Schmidt vector space, we can rigorously define a decomposition of it into orthogonal components, two of which are the separable terms from equation (14). Given a factorization of the Hilbert space where the matrix H operates, we define four linear superoperators⁵ Πᵢ as follows:

Π₀H ≡ (tr H / n) I,   (21)
Π₁H ≡ ((tr₂ H) / n₂) ⊗ I₂ − Π₀H,   (22)
Π₂H ≡ I₁ ⊗ ((tr₁ H) / n₁) − Π₀H,   (23)
Π₃H ≡ (I − Π₀ − Π₁ − Π₂) H.   (24)

It is straightforward to show that these four linear operators Πᵢ form a complete set of orthogonal projectors, i.e., that

Σᵢ₌₀³ Πᵢ = I,   (25)
ΠᵢΠⱼ = Πᵢ δᵢⱼ,   (26)
(ΠᵢH, ΠⱼH) = ||ΠᵢH||² δᵢⱼ.   (27)

5 Operators on the Hilbert-Schmidt space are usually called superoperators in the literature, to avoid confusion with operators on the underlying Hilbert space, which are mere vectors in the Hilbert-Schmidt space.


This means that any Hermitean matrix H can be decomposed as a sum of four orthogonal components Hᵢ ≡ ΠᵢH, so that its squared Hilbert-Schmidt norm can be decomposed as a sum of contributions from the four components:

H = H₀ + H₁ + H₂ + H₃,   (28)
Hᵢ ≡ ΠᵢH,   (29)
(Hᵢ, Hⱼ) = ||Hᵢ||² δᵢⱼ,   (30)
||H||² = ||H₀||² + ||H₁||² + ||H₂||² + ||H₃||².   (31)

We see that H₀ ∝ I picks out the trace of H, whereas the other three matrices are trace-free. This trace term is of course physically uninteresting, since it can be eliminated by simply adding an unobservable constant zero-point energy to the Hamiltonian. H₁ and H₂ correspond to the two separable terms in equation (14) (without the trace term, which could have been arbitrarily assigned to either), and H₃ corresponds to the non-separable residual. A Hermitean matrix H is therefore separable if and only if Π₃H = 0. Just as it is customary to write the norm of a vector r as r ≡ |r| (without boldface), we will denote the Hilbert-Schmidt norm of a matrix H by H ≡ ||H||. For example, with this notation we can rewrite equation (31) as simply H² = H₀² + H₁² + H₂² + H₃².
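The following sketch (illustrative NumPy code assuming a 2 × 3 factorization; not from the paper) implements the four superoperators via partial traces and verifies the projector properties (25)-(27) and the Pythagorean decomposition (31):

import numpy as np

n1, n2 = 2, 3
n = n1 * n2
rng = np.random.default_rng(3)
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (a + a.conj().T) / 2                 # a random Hermitean matrix

def tr1(h):   # trace out the first (n1-dimensional) factor
    return np.trace(h.reshape(n1, n2, n1, n2), axis1=0, axis2=2)

def tr2(h):   # trace out the second (n2-dimensional) factor
    return np.trace(h.reshape(n1, n2, n1, n2), axis1=1, axis2=3)

def pi0(h): return np.trace(h) / n * np.eye(n)                 # equation (21)
def pi1(h): return np.kron(tr2(h) / n2, np.eye(n2)) - pi0(h)   # equation (22)
def pi2(h): return np.kron(np.eye(n1), tr1(h) / n1) - pi0(h)   # equation (23)
def pi3(h): return h - pi0(h) - pi1(h) - pi2(h)                # equation (24)

ops = (pi0, pi1, pi2, pi3)
parts = [p(H) for p in ops]

print(np.allclose(sum(parts), H))                              # eq. (25)
print(all(np.allclose(p(p(H)), p(H)) for p in ops))            # idempotence
print(np.allclose(pi1(pi2(H)), 0), np.allclose(pi2(pi1(H)), 0))
# Orthogonality in the Hilbert-Schmidt inner product, eq. (27):
gram = np.array([[np.trace(x.conj().T @ y).real for y in parts] for x in parts])
print(np.allclose(gram, np.diag(np.diagonal(gram))))
# Pythagorean decomposition of the squared norm, eq. (31):
print(np.isclose(np.sum(np.abs(H)**2), np.sum(np.diagonal(gram))))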

Geometrically, we can think of n × n Hermitean matrices H as points in the N-dimensional vector space R^N, where N = n² (Hermitean matrices have n real numbers on the diagonal and n(n−1)/2 complex numbers off the diagonal, constituting a total of n + 2 × n(n−1)/2 = n² real parameters). Diagonal matrices form a hyperplane of dimension n in this space. The projection operators Π₀, Π₁, Π₂ and Π₃ project onto hyperplanes of dimension 1, (n₁² − 1), (n₂² − 1) and (n₁² − 1)(n₂² − 1), respectively, so separable matrices form a hyperplane in this space of dimension n₁² + n₂² − 1. For example, a general 4 × 4 Hermitean matrix can be parametrized by 10 numbers (4 real for the diagonal part and 6 complex for the off-diagonal part), and its decomposition from equation (28) can be written as follows:

H = [ t+a+b+v   d+w       c+x       y       ]
    [ d*+w*     t+a−b−v   z         c−x     ]
    [ c*+x*     z*        t−a+b−v   d−w     ]
    [ y*        c*−x*     d*−w*     t−a−b+v ]

  = [ t 0 0 0 ]   [ a   0   c   0  ]   [ b   d   0   0  ]   [ v    w    x    y   ]
    [ 0 t 0 0 ] + [ 0   a   0   c  ] + [ d*  −b  0   0  ] + [ w*   −v   z    −x  ]
    [ 0 0 t 0 ]   [ c*  0   −a  0  ]   [ 0   0   b   d  ]   [ x*   z*   −v   −w  ]
    [ 0 0 0 t ]   [ 0   c*  0   −a ]   [ 0   0   d*  −b ]   [ y*   −x*  −w*  v   ]   (32)

(where * denotes complex conjugation). We see that t contributes to the trace (and H₀) while the other three components Hᵢ are traceless. We also see that tr₁H₁ = tr₂H₂ = 0, and that both partial traces vanish for H₃.

    J. Maximizing separability

We now have all the tools we need to rigorously maximize separability and test the physics-from-scratch approach described in Section III G. Given a Hamiltonian H, we simply wish to minimize the norm of its non-separable component H₃ over all possible Hilbert space factorizations, i.e., over all possible unitary transformations. In other words, we wish to compute

E ≡ min_U ||Π₃H||,   (33)

where we have defined the integration energy E by analogy with the integrated information Φ. If E = 0, then there is a basis where our system separates into two parallel universes; otherwise E quantifies the coupling between the two parts of the system under the cruelest cut.
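In low dimensions, equation (33) can be estimated by direct numerical optimization. The sketch below is illustrative only: it assumes a 2 ⊗ 2 factorization, parametrizes U = e^{iB} by a Hermitean generator B, and uses a generic SciPy optimizer rather than anything from the paper. It also compares the result with the best eigenvalue ordering in the eigenbasis, anticipating the diagonality conjecture of Section III K:

import numpy as np
from itertools import permutations
from scipy.linalg import expm
from scipy.optimize import minimize

n1 = n2 = 2
n = n1 * n2
rng = np.random.default_rng(4)
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (a + a.conj().T) / 2                 # a random Hermitean Hamiltonian

def pi3(h):
    # Non-separable component Pi_3 H, equations (21)-(24).
    r = h.reshape(n1, n2, n1, n2)
    t2 = np.trace(r, axis1=1, axis2=3)
    t1 = np.trace(r, axis1=0, axis2=2)
    h0 = np.trace(h) / n * np.eye(n)
    h1 = np.kron(t2 / n2, np.eye(n2)) - h0
    h2 = np.kron(np.eye(n1), t1 / n1) - h0
    return h - h0 - h1 - h2

def generator(theta):
    # A Hermitean matrix B built from n^2 real parameters, so that
    # U = e^{iB} ranges over all unitary transformations.
    b = np.diag(theta[:n].astype(complex))
    iu = np.triu_indices(n, 1)
    k = n * (n - 1) // 2
    b[iu] = theta[n:n + k] + 1j * theta[n + k:]
    return b + b.conj().T - np.diag(np.diagonal(b).real)

def cost(theta):
    u = expm(1j * generator(theta))
    return np.linalg.norm(pi3(u @ H @ u.conj().T))

results = [minimize(cost, 0.1 * rng.normal(size=n * n), method="Powell")
           for _ in range(5)]            # several restarts against local minima
E = min(r.fun for r in results)          # integration energy, equation (33)

# Compare with the energy eigenbasis (best ordering of the eigenvalues):
eigs = np.linalg.eigvalsh(H)
E_diag = min(np.linalg.norm(pi3(np.diag(np.array(p)))) for p in permutations(eigs))
print(E, E_diag)   # the diagonality conjecture predicts that these agree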

FIG. 9: Geometrically, we can view the integration energy E as the shortest distance (in Hilbert-Schmidt norm) between the hyperplane of separable Hamiltonians (Π₃H = 0) and a subsphere of Hamiltonians H = UH*U† that can be unitarily transformed into one another. The most separable Hamiltonian H on the subsphere is such that its non-separable component Π₃H is orthogonal to all subsphere tangent vectors [A, H] generated by anti-Hermitean matrices A.

The Hilbert-Schmidt space allows us to interpret the minimization problem of equation (33) geometrically, as illustrated in Figure 9. Let H* denote the Hamiltonian in some given basis, and consider its orbit H = UH*U† under all unitary transformations U. This is a curved hypersurface whose dimensionality is generically n(n−1), i.e., n lower than that of the full space of Hermitean matrices, since unitary transformations leave all n eigenvalues invariant.⁶ We will refer to this curved hypersurface as a subsphere, because it is a subset of the full n²-dimensional sphere: the radius H (the Hilbert-Schmidt norm ||H||) is invariant under unitary transformations, but the subsphere may have a more complicated topology than a hypersphere; for example, the 3-sphere is known to topologically be the double cover of SO(3), the matrix group of 3 × 3 orthonormal transformations.

We are interested in finding the most separable point H on this subsphere, i.e., the point on the subsphere that is closest to the (n₁² + n₂² − 1)-dimensional separable hyperplane. In our notation, this means that we want to find the point H on the subsphere that minimizes ||Π₃H||, the Hilbert-Schmidt norm of the non-separable component. If we perform infinitesimal displacements along the subsphere, ||Π₃H|| thus remains constant to first order (the gradient vanishes at the minimum), so all tangent vectors of the subsphere are orthogonal to Π₃H, the vector from the separable hyperplane to the subsphere.

Unitary transformations are generated by anti-Hermitean matrices, so the most general tangent vector δH is of the form

δH = [A, H] ≡ AH − HA   (34)

for some anti-Hermitean n × n matrix A (any matrix satisfying A† = −A). We thus obtain the following simple condition for maximal separability:

(Π₃H, [A, H]) = 0   (35)

for any anti-Hermitean matrix A. Because the most general anti-Hermitean matrix can be written as A = iB for a Hermitean matrix B, equation (35) is equivalent to the condition (Π₃H, [B, H]) = 0 for all Hermitean matrices B. Since there are n² linearly independent anti-Hermitean matrices, equation (35) is a system of n² coupled quadratic equations that the components of H must obey.

    K. The Hamiltonian diagonality conjecture

By analogy with our ρ-diagonality conjecture above, I once again conjecture that maximal separability is attained in the eigenbasis.

H-Diagonality Conjecture (HDC):
The Hamiltonian is always maximally separable (minimizing ||H₃||) in the energy eigenbasis where it is diagonal.

6 n × n unitary matrices U are known to form an n²-dimensional manifold: they can always be written as U = e^{iH} for some Hermitean matrix H, so they are parametrized by the same number of real parameters (n²) as Hermitean matrices.

Numerical tests of this conjecture produce encouraging support that is visually quite analogous to Figure 8, but with ||H₃|| rather than I on the vertical axis. For the HDC, however, the mathematical formalism above also permits rigorous analytic tests. Let us consider the simplest case, when n = 4, so that H can be parametrized as in equation (32). Since ||H₃|| is invariant under unitary transformations acting only on either of the two factor spaces, we can without loss of generality select a basis where H₁ and H₂ are diagonal, so that the parameters c = d = 0. The optimality condition of equation (35) gives a separate constraint equation for each of n² linearly independent anti-Hermitean matrices A. If we choose these matrices to be of the simplest possible form, each with all except one or two elements vanishing, we find that the resulting n² real-valued equations produce merely 8 linearly independent ones, which can be combined into the following four complex equations:

bw = 0,   (36)
ax = 0,   (37)
(a + b)y = 0,   (38)
(a − b)z = 0.   (39)

The HDC states that H₃ ≡ ||H₃|| takes its minimum value when H is diagonal, i.e., when w = x = y = z = 0. To complete the proof of the HDC for n = 4, we thus need to find all solutions to these equations and verify that for each of them, H₃ is greater than or equal to its value in the energy eigenbasis. There are only 6 cases to consider for a and b:

1. a = b = 0, solving the equations and giving H₁ = H₂ = 0. This clearly maximizes rather than minimizes H₃ (since H₃² = H² − H₀² − H₁² − H₂², and H₀ is unitarily invariant), so H₃ cannot be smaller than in the eigenbasis.

2. a = 0, b ≠ 0, so w = y = z = 0, giving H₃ = 32^{1/2}(|v|² + |x|²)^{1/2}. This is not the H₃-minimum, because H is separable (H₃ = 0) in its eigenbasis.

3. b = 0, a ≠ 0, so x = y = z = 0, analogously giving H₃ = 32^{1/2}(|v|² + |w|²)^{1/2} ≥ 0, the value in the eigenbasis.

4. a = b ≠ 0, so w = x = y = 0, giving H₃ = (32|v|² + 16|z|²)^{1/2} ≥ 32^{1/2}|v|, the value in the eigenbasis.

5. a = −b ≠ 0, so w = x = z = 0, giving H₃ = (32|v|² + 16|y|²)^{1/2} ≥ 32^{1/2}|v|, the value in the eigenbasis.

6. a ≠ 0, b ≠ 0, |a| ≠ |b|, so w = x = y = z = 0, giving a diagonal H.

In summary, when applying all unitary transformations, the non-separable part H₃ takes its minimum when H is diagonal, which completes the proof. It also takes its maximum when H₁ = H₂ = 0, and has saddle points when H₁ or H₂ vanish or when H₁ = ±H₂. Any extremum where H₁ and H₂ are generic (non-zero and non-equal) has H diagonal.


Although it appears likely that the HDC can be analogously proven for any given n by using equation (35) and examining all solutions to the resulting system of coupled quadratic equations, a general proof for arbitrary n would obviously be more interesting. Any diagonal n × n Hamiltonian H is a solution to equation (35), so what remains to be established is that this is the global minimum.⁷

If the HDC is correct, then separability is always maximized in the energy eigenbasis, where the n × n matrix H is diagonal and the projection operators Πᵢ defined by equations (21)-(24) greatly simplify. If we arrange the n = lm diagonal elements of H into an l × m matrix H, then the action of the linear operators Πᵢ is given by simple matrix operations:

H₀ ≅ Q_l H Q_m,   (40)
H₁ ≅ P_l H Q_m,   (41)
H₂ ≅ Q_l H P_m,   (42)
H₃ ≅ P_l H P_m,   (43)

where

P_m ≡ I − Q_m,   (44)
(Q_m)ᵢⱼ ≡ 1/m   (45)

are m × m projection matrices satisfying P_m² = P_m, Q_m² = Q_m, P_mQ_m = Q_mP_m = 0, P_m + Q_m = I. (To avoid confusion, we are using boldface for n × n matrices and plain font for smaller matrices involving only the eigenvalues.)
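The sketch below (illustrative code assuming an l = 2, m = 3 factorization; not from the paper) checks this shortcut against the full superoperator definitions for a diagonal H:

import numpy as np

l, m = 2, 3
n = l * m
rng = np.random.default_rng(5)
eigs = rng.normal(size=n)      # spectrum of a diagonal Hamiltonian H
grid = eigs.reshape(l, m)      # eigenvalues arranged into an l x m matrix

def Q(k):
    # (Q_k)_ij = 1/k: projection onto the uniform vector, equation (45).
    return np.full((k, k), 1.0 / k)

def P(k):
    return np.eye(k) - Q(k)    # equation (44)

# Equations (40)-(43): the superoperators reduce to matrix sandwiches.
g0, g1 = Q(l) @ grid @ Q(m), P(l) @ grid @ Q(m)
g2, g3 = Q(l) @ grid @ P(m), P(l) @ grid @ P(m)

# Cross-check against the full definitions (21)-(24) applied to diag(eigs):
H = np.diag(eigs)
t2 = np.trace(H.reshape(l, m, l, m), axis1=1, axis2=3)
t1 = np.trace(H.reshape(l, m, l, m), axis1=0, axis2=2)
h0 = np.trace(H) / n * np.eye(n)
h1 = np.kron(t2 / m, np.eye(m)) - h0
h2 = np.kron(np.eye(l), t1 / l) - h0
h3 = H - h0 - h1 - h2
for g, h in [(g0, h0), (g1, h1), (g2, h2), (g3, h3)]:
    print(np.allclose(np.diag(g.ravel()), h))   # True, True, True, True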

L. Ultimate independence and the Quantum Zeno paradox

In Section III G, we began exploring the idea that if we divide the world into maximally independent parts (with minimal interaction Hamiltonians), then the observed object hierarchy from Figure 1 would emerge. Assuming that the HDC is correct, we have now found that this decomposition (factorization) into maximally independent parts can be performed in the energy eigenbasis of the total Hamiltonian. This means that all subsystem Hamiltonians and all interaction Hamiltonians commute with one another, corresponding to an essentially classical world where none of the quantum effects

7 In the large-n limit, it is rather intuitive that the HDC should be at least approximately true, because almost all of the n(n−1) ≈ n² off-diagonal degrees of freedom for H belong to H₃, which is being minimized. Since we can restrict ourselves to bases where H₁ and H₂ are diagonal as above, there are only of order n off-diagonal degrees of freedom not contributing to H₃ (contributing instead to H₁ and H₂). If we begin in a basis where H is diagonal and apply a unitary transformation adding off-diagonal elements, H₃ thus generically increases.

FIG. 10: If the Hamiltonian of a system commutes with the interaction Hamiltonian ([H₁, H₃] = 0), then decoherence drives the system toward a time-independent state ρ where nothing ever changes. The figure illustrates this for the Bloch sphere of a single qubit starting in a pure state and ending up in a fully mixed state ρ = I/2. More general initial states end up somewhere along the z-axis. Here H₁ ∝ σ_z, generating a simple precession around the z-axis.

associated with non-commutativity manifest themselves! In contrast, many systems that we customarily refer to as objects in our classical world do not commute with their interaction Hamiltonians: for example, the Hamiltonian governing the dynamics of a baseball involves its momentum, which does not commute with the position-dependent potential energy due to external forces.

As emphasized by Zurek [19], states commuting with the interaction Hamiltonian form a pointer basis of classically observable states, playing an important role in understanding the emergence of a classical world. The fact that the independence principle automatically leads to commutativity with interaction Hamiltonians might therefore be taken as an encouraging indication that we are on the right track. However, whereas the pointer states in Zurek's examples evolve over time due to the system's own Hamiltonian H₁, those in our independence-maximizing decomposition do not, because they commute also with H₁. Indeed, the situation is even worse, as illustrated in Figure 10: any time-dependent system will evolve into a time-independent one, as environment-induced decoherence [20-23, 25, 27] drives it towards an eigenstate of the interaction Hamiltonian, i.e., an energy eigenstate.⁸

8 For a system with a finite environment, the entropy will eventually decrease again, causing the resumption of time-dependence, but this Poincaré recurrence time grows exponentially with environment size and is normally large enough that decoherence can be approximated as permanent.


The famous Quantum Zeno effect, whereby a system can cease to evolve in the limit where it is arbitrarily strongly coupled to its environment [28], thus has a stronger and more pernicious cousin, which we will term the Quantum Zeno Paradox or the Independence Paradox.

Quantum Zeno Paradox:
If we decompose our universe into maximally independent objects, then all change grinds to a halt.

In summary, we have tried to understand the emergence of our observed semiclassical world, with its hierarchy of moving objects, by decomposing the world into maximally independent parts, but our attempts have failed dismally, producing merely a timeless world reminiscent of heat death. In Section II G, we saw that using the integration principle alone led to a similarly embarrassing failure, with no more than a quarter of a bit of integrated information possible. At least one more principle is therefore needed.

    IV. DYNAMICS AND AUTONOMY

Let us now explore the implications of the dynamics principle from Table II, according to which a conscious system has the capacity not only to store information, but also to process it. As we just saw above, there is an interesting tension between this principle and the independence principle, whose Quantum Zeno Paradox gives the exact opposite: no dynamics and no information processing at all.

We will term the synthesis of these two competing principles the autonomy principle: a conscious system has substantial dynamics and independence. When exploring autonomous systems below, we can no longer study the state ρ and the Hamiltonian H separately, since their interplay is crucial. Indeed, we will see that there are interesting classes of states ρ that provide substantial dynamics and near-perfect independence even when the interaction Hamiltonian H₃ is not small. In other words, for certain preferred classes of states ρ, the independence principle no longer pushes us to simply minimize H₃ and face the Quantum Zeno Paradox.

    A. Probability velocity and energy coherence

To obtain a quantitative measure of dynamics, let us first define the probability velocity v ≡ ṗ, where the probability vector p is given by pᵢ ≡ ρᵢᵢ. In other words,

v_k = ρ̇_kk = i[H, ρ]_kk.   (46)

Since v is basis-dependent, we are interested in finding the basis where

v² ≡ Σ_k v_k² = Σ_k (ρ̇_kk)²   (47)

is maximized, i.e., the basis where the sum of the squares of the diagonal elements of ρ̇ is maximal. It is easy to see that this basis is the eigenbasis of ρ̇:

v² = Σ_k (ρ̇_kk)² = Σ_{jk} |ρ̇_jk|² − Σ_{j≠k} |ρ̇_jk|² = ||ρ̇||² − Σ_{j≠k} |ρ̇_jk|²   (48)

is clearly maximized in the eigenbasis where all off-diagonal elements in the last term vanish, since the Hilbert-Schmidt norm ||ρ̇|| is the same in every basis; ||ρ̇||² = tr ρ̇², which is simply the sum of the squares of the eigenvalues of ρ̇.

Let us define the energy coherence

δH ≡ (1/√2) ||ρ̇|| = (1/√2) ||[H, ρ]|| = (−tr {[H, ρ]²} / 2)^{1/2} = (tr [H²ρ² − HρHρ])^{1/2}.   (49)

For a pure state ρ = |ψ⟩⟨ψ|, this definition implies that δH = ΔH, where ΔH is the energy uncertainty

ΔH = [⟨ψ|H²|ψ⟩ − ⟨ψ|H|ψ⟩²]^{1/2},   (50)

so we can think of δH as the coherent part of the energy uncertainty, i.e., as the part that is due to quantum rather than classical uncertainty.

Since ||ρ̇|| = ||[H, ρ]|| = √2 δH, we see that the maximum possible probability velocity v is simply

v_max = √2 δH,   (51)

so we can equivalently use either of v_max or δH as convenient measures of quantum dynamics.⁹ Whimsically speaking, the dynamics principle thus implies that energy eigenstates are as unconscious as things come, and that if you know your own energy exactly, you're dead.
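These relations are easy to confirm numerically. The sketch below (illustrative only; a random Hamiltonian and pure state with arbitrary dimension and seed) checks that δH equals the energy uncertainty for a pure state and that v_max = √2 δH:

import numpy as np

rng = np.random.default_rng(6)
n = 4
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (a + a.conj().T) / 2
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())            # a pure state rho = |psi><psi|

rhodot = 1j * (H @ rho - rho @ H)          # equation (12); a Hermitean matrix
dH = np.linalg.norm(rhodot) / np.sqrt(2)   # energy coherence, equation (49)

# For a pure state, dH equals the energy uncertainty of equation (50):
EH = (psi.conj() @ H @ psi).real
EH2 = (psi.conj() @ H @ H @ psi).real
print(dH, np.sqrt(EH2 - EH**2))

# v^2 of equation (47) is maximized in the eigenbasis of rhodot, where it
# equals ||rhodot||^2, so v_max = sqrt(2) * dH, equation (51):
v_max = np.sqrt(np.sum(np.linalg.eigvalsh(rhodot)**2))
print(v_max, np.sqrt(2) * dH)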

9 The fidelity between the state ρ(t) and the initial state ρ₀ is defined as

F(t) ≡ |⟨ψ₀|ψ(t)⟩|,   (52)

and it is easy to show that Ḟ(0) = 0 and F̈(0) = −(ΔH)², so the energy uncertainty is a good measure of dynamics in that it also determines the fidelity evolution to lowest order, for pure states. For a detailed review of related measures of dynamics/information processing capacity, see [7].


FIG. 11: Time-evolution of the Bloch vector of ρ₁ ≡ tr₂ ρ for a single qubit subsystem. We saw how minimizing H₃ leads to a static state with no dynamics, such as the left example. Maximizing δH, on the other hand, produces extremely simple dynamics, such as the right example. Reducing δH by a modest factor of order unity can allow complex and chaotic dynamics (center); shown here is a 2-qubit system where the second qubit is traced out.

Although it is not obvious from their definitions, these quantities v_max and δH are independent of time (even though ρ generally evolves). This is easily seen in the energy eigenbasis, where

ρ̇_mn = i[H, ρ]_mn = iρ_mn(E_m − E_n),   (53)

where the energies E_n are the eigenvalues of H. In this basis, ρ(t) = e^{iHt} ρ(0) e^{−iHt} simplifies to

ρ(t)_mn = ρ(0)_mn e^{i(E_m − E_n)t}.   (54)

This means that in the energy eigenbasis, the probabilities p_n ≡ ρ_nn are invariant over time. These quantities constitute the energy spectral density for the state:

p_n = ⟨E_n|ρ|E_n⟩.   (55)

In the energy eigenbasis, equation (50) reduces to

ΔH² = Σ_n p_n E_n² − (Σ_n p_n E_n)²,   (56)

which is time-invariant because the spectral density p_n is. For general states, equation (49) simplifies to

δH² = Σ_{mn} |ρ_mn|² E_n(E_n − E_m).   (57)

This is time-independent because equation (54) shows that ρ_mn changes merely by a phase factor, leaving |ρ_mn| invariant. In other words, when a quantum state evolves unitarily in the Hilbert-Schmidt vector space, both the position vector ρ and the velocity vector ρ̇ retain their lengths: both ||ρ|| and ||ρ̇|| remain invariant over time.

    B. Dynamics versus complexity

Our results above show that if all we are interested in is maximizing the maximal probability velocity v_max, then we should find the two most widely separated eigenvalues of H, E_min and E_max, and choose a pure state that involves a coherent superposition of the two:

|ψ⟩ = c₁|E_min⟩ + c₂|E_max⟩,   (58)

where |c₁| = |c₂| = 1/√2. This gives δH = (E_max − E_min)/2, the largest possible value, but produces an extremely simple and boring solution ρ(t). Since the spectral density p_n = 0 except for these two energies, the dynamics is effectively that of a 2-state system (a single qubit) no matter how large the dimensionality of H is, corresponding to a simple periodic solution with frequency ω = E_max − E_min (a circular trajectory in the Bloch sphere as in the right panel of Figure 11). This violates the dynamics principle as defined in Table II, since no substantial information processing capacity exists: the system is simply performing the trivial computation that flips a single bit repeatedly.
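A quick numerical check of this extremal state (an illustrative sketch with an arbitrary random Hamiltonian, not from the paper):

import numpy as np

rng = np.random.default_rng(7)
n = 6
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (a + a.conj().T) / 2
E, V = np.linalg.eigh(H)                   # eigenvalues in ascending order

psi = (V[:, 0] + V[:, -1]) / np.sqrt(2)    # equation (58)
rho = np.outer(psi, psi.conj())
dH = np.linalg.norm(1j * (H @ rho - rho @ H)) / np.sqrt(2)
print(dH, (E[-1] - E[0]) / 2)              # equal: the largest possible value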

To perform interesting computations, the system clearly needs to exploit a significant part of its energy spectrum. As can be seen from equation (54), if the eigenvalue differences are irrational multiples of one another, then the time evolution will never repeat, and ρ will eventually evolve through all parts of Hilbert space allowed by the invariants |⟨E_m|ρ|E_n⟩|. The reduction of δH required to transition from simple periodic motion


FIG. 12: Schematic representation of the time-evolution of the density matrix ρᵢⱼ for a highly autonomous subsystem. ρᵢⱼ ≈ 0 except for a single region around the diagonal (red/grey dot), and this region slides along the diagonal under the influence of the subsystem Hamiltonian H₁. Any ρᵢⱼ-elements far from the diagonal rapidly approach zero because of environment-decoherence caused by the interaction Hamiltonian H₃.

to such complex aperiodic motion is quite modest. For example, if the eigenvalues are roughly equispaced, then changing the spectral density p_n from having all weight at the two endpoints to having approximately equal weight for all eigenvalues will only reduce the energy coherence δH by about a factor of √3, since the standard deviation of a uniform distribution is √3 times smaller than its half-width.

C. Highly autonomous systems: sliding along the diagonal

What combinations of H, ρ and factorization produce highly autonomous systems? A broad and interesting class corresponds to the macroscopic objects around us that move classically to an excellent approximation.

The states that are most robust toward environment-induced decoherence are those that approximately commute with the interaction Hamiltonian [22]. As a simple but important example, let us consider an interaction Hamiltonian of the factorizable form

H₃ = A ⊗ B,   (59)

and work in a basis where the interaction term A is diagonal. If ρ₁ is approximately diagonal in this basis,

then H₃ has little effect on the dynamics, which becomes dominated by the internal subsystem Hamiltonian H₁. The Quantum Zeno Paradox we encountered in Section III L involved a situation where H₁ was also diagonal in this same basis, so that we ended up with no dynamics. As we will illustrate with examples below, classically moving objects in a sense constitute the opposite limit: the commutator ρ̇₁ = i[H₁, ρ₁] is essentially as large as possible instead of as small as possible, ρ continually evading decoherence by concentrating around a single point that continually slides along the diagonal, as illustrated in Figure 12. Decoherence rapidly suppresses off-diagonal elements far from this diagonal, but leaves the diagonal elements completely unaffected, so there exists a low-decoherence band around the diagonal. Suppose, for instance, that our subsystem is the center-of-mass position x of a macroscopic object experiencing a position-dependent potential V(x) caused by coupling to the environment, so that Figure 12 represents the density matrix ρ₁(x, x′) in the position basis. If the potential V(x) has a flat (V = 0) bottom of width L, then ρ₁(x, x′) will be completely unaffected by decoherence for the band |x − x′| < L. For a generic smooth potential V, the decoherence suppression of off-diagonal elements grows only quadratically with the distance |x − x′| from the diagonal [21, 26], again making decoherence much slower than the internal dynamics in a narrow diagonal band.

As a specific example of this highly autonomous type, let us consider a subsystem with a uniformly spaced energy spectrum. Specifically, consider an n-dimensional Hilbert space and a Hamiltonian with spectrum

E_k = (k − (n − 1)/2) ℏω = kℏω + E₀,   (60)

k = 0, 1, ..., n − 1. We will often set ℏ = 1 for simplicity. For example, n = 2 gives the spectrum {−1/2, 1/2} like the Pauli matrices divided by two, n = 5 gives {−2, −1, 0, 1, 2}, and n → ∞ gives the simple harmonic oscillator (since the zero-point energy is physically irrelevant, we have chosen it so that tr H = Σ E_k = 0, whereas the customary choice for the harmonic oscillator is such that the ground state energy is E₀ = ℏω/2).

If we want to, we can define the familiar position and momentum operators x and p, and interpret this system as a harmonic oscillator. However, the probability velocity v is not maximized in either the position or the momentum basis, except twice per oscillation: when the oscillator has only kinetic energy, v is maximized in the x-basis, and when it has only potential energy, v is maximized in the p-basis. If we consider the Wigner function W(x, p), which simply rotates uniformly with frequency ω, it becomes clear that the observable which is always changing with the maximal probability velocity is instead the phase, the Fourier dual of the energy. Let us therefore define the phase operator

φ ≡ FHF†,   (61)


    where F is the unitary Fourier matrix.

Please remember that none of the systems H that we consider have any a priori physical interpretation; rather, the ultimate goal of the physics-from-scratch program is to derive any interpretation from the mathematics alone. Generally, any thus-emergent interpretation of a subsystem will depend on its interactions with other systems. Since we have not yet introduced any interactions for our subsystem, we are free to interpret it in whichever way is convenient. In this spirit, an equivalent and sometimes more convenient way to interpret our Hamiltonian from equation (60) is as a massless one-dimensional scalar particle, for which the momentum equals the energy, so the momentum operator is p = H. If we interpret the particle as existing in a discrete space with n points and a toroidal topology (which we can think of as n equispaced points on a ring), then the position operator is related to the momentum operator by a discrete Fourier transform:

x = FpF†,   F_jk ≡ (1/√n) e^{2πijk/n}.   (62)

Comparing equations (61) and (62), we see that x = φ. Since F is unitary, the operators H, p, x and φ all have the same spectrum: the evenly spaced grid of equation (60).

As illustrated in Figure 13, the time-evolution generated by H has a very simple geometric interpretation in the space spanned by the position eigenstates |x_k⟩, k = 1, ..., n: the space is simply rotating with frequency ω around a vector that is the sum of all the position eigenvectors, so after a time t = 2π/nω, a state |ψ(0)⟩ = |x_k⟩ has been rotated such that it equals the next eigenvector: |ψ(t)⟩ = |x_{k+1}⟩, where the addition is modulo n. This means that the system has period T ≡ 2π/ω, and that |ψ⟩ rotates through each of the n basis vectors during each period.

Let us now quantify the autonomy of this system, starting with the dynamics. Since a position eigenstate is a Dirac delta function in position space, it is a plane wave in momentum space and in energy space, since H = p. This means that the spectral density is p_n = 1/n for a position eigenstate. Substituting equation (60) into equation (56) gives an energy coherence

δH = ℏω ((n² − 1)/12)^{1/2}.   (63)

For comparison,

||H|| = (Σ_{k=0}^{n−1} E_k²)^{1/2} = ℏω (n(n² − 1)/12)^{1/2} = √n δH.   (64)
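Both the sliding behavior and equation (63) are easy to verify numerically. The sketch below (illustrative only; it sets ℏ = 1 and picks n = 8 and ω = 1) builds the equispaced Hamiltonian and the Fourier matrix, checks that a position eigenstate steps to the next one after t = 2π/nω, and confirms the energy coherence of a position eigenstate:

import numpy as np

n, omega = 8, 1.0
k = np.arange(n)
E = (k - (n - 1) / 2) * omega                              # equation (60)
F = np.exp(2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)   # equation (62)

# Position eigenstates are the columns of F (eigenvectors of x = F p F†).
# After t = 2*pi/(n*omega), |x_0> should rotate into |x_1> up to a phase:
t = 2 * np.pi / (n * omega)
psi_t = np.exp(1j * E * t) * F[:, 0]       # e^{iHt} acts diagonally (H = p)
print(np.abs(F[:, 1].conj() @ psi_t))      # ~1.0

# A position eigenstate has uniform spectral density p_n = 1/n, so its
# energy coherence should match equation (63):
rho = np.outer(F[:, 0], F[:, 0].conj())
p = np.diag(E)
dH = np.linalg.norm(1j * (p @ rho - rho @ p)) / np.sqrt(2)
print(dH, omega * np.sqrt((n**2 - 1) / 12))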

Let us now turn to quantifying independence and decoherence. The inner product between the unit vector |ψ(0)⟩ and the vector |ψ(t)⟩ ≡ e^{iHt}|ψ(0)⟩ into which it evolves after a time t is

f_n(θ) ≡ ⟨ψ|e^{iHt}|ψ⟩ = (1/n) Σ_{k=0}^{n−1} e^{iE_k t} = e^{−i(n−1)θ/2} (1/n) Σ_{k=0}^{n−1} e^{ikθ}
       = (1/n) e^{−i(n−1)θ/2} (1 − e^{inθ})/(1 − e^{iθ}) = sin(nθ/2) / (n sin(θ/2)),   (65)

where θ ≡ ωt. This inner product f_n is plotted in Figure 14, and is seen to be a sharply peaked even function satisfying f_n(0) = 1, f_n(2πk/n) = 0 for k = 1, ..., n − 1, and exhibiting one small oscillation between each of these zeros. The angle cos⁻¹ f_n(θ) between an initial vector and its time evolution thus grows rapidly from 0° to 90°, then oscillates close to 90° until returning to 0° after a full period T. An initial state |ψ(0)⟩ = |x_k⟩ therefore evolves as

ψ_j(t) = f_n(ωt − 2π[j − k]/n)