
Three lectures on probability in physics

Miklos Redei

Department of History and Philosophy of Science

Eotvos University, Budapest, Hungary

Prepared for

Philosophy, Probability and the Special Sciences

International Summer School

Konstanz, Germany, July 27-August 4, 2003

(Thanks Gabriella, Stephan and Luc for the invitation!!)

1

Lecture 1

Probability in Classical Statistical Mechanics

Lecture 2

Probability in Hilbert Space Quantum Mechanics

Lecture 3

Probability in Algebraic Quantum Mechanics

2

Preliminary remarks

Each lecture

• will begin with a short review of the main issues to be discussed

in the talk

• will close with a brief summary of the main points

• will formulate one single sentence intended to be remembered

The pdf file of the lectures will be available from my home page

http://hps.elte.hu/~redei

Some of the issues/points to be discussed are very technical but I

have suppressed the technicalities as much as I could

3

Probability in Classical Statistical Mechanics

Structure:

• Elementary mathematical comments

Probability theory in measure theoretic form, distributions, densities

• Informal overview of the basic problem and main idea of CSM

Averaging using Principle of Equal Apriori Probability

• Application of Central Limit Theorem in CSM

Physical assumptions making Central Limit Theorem applicable,

Central Limit Theorem, universal form of state density as

consequence of Central Limit Theorem

• Ergodic theory as an attempt to justify the Principle of Equal

Apriori Probability

The ergodic problem, Birkhoff’s theorem, metric transitivity

4

The one sentence to be remembered

Standard Equilibrium Classical Statistical Mechanics is based on

the difficult-to-justify Principle of Equal Apriori Probability and on

the physical assumptions that make possible the application of

Central Limit Theorem to such many body systems

5

Mathematical comments

$(X, S, \mu)$: classical measure space

$X$: set

$S$: Boolean algebra

$\mu: S \to \mathbb{R}_+ \cup \{\infty\}$: σ-additive measure (the "counting measure")

$L^1(X, \mu)$: integrable functions

$p_g(A) = \int \chi_A\, g\, d\mu$: probability measure given by the density $g \in L^1(X, \mu)$ w.r.t. the counting measure $\mu$

$(X, S, p_g)$: probability measure space

6

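$f: X \to \mathbb{R}$: random variable, $(S, B(\mathbb{R}))$-measurable

$p \circ f^{-1}: B(\mathbb{R}) \to \mathbb{R}$: distribution of $f$

$F(r) = p(\{x \in X : f(x) \le r\})$: distribution function of $f$

$F(x) = \int_{-\infty}^{x} \varphi(t)\,dt$: $\varphi$ = density of the distribution function

$\hat{\varphi}(t) = \int e^{ixt} \varphi(x)\,dx$: characteristic function of the distribution function $F$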

7

Proposition: If the $f_i$ are independent random variables then

• the density $\varphi$ of the distribution function of their sum is the convolution of the densities $\varphi_i$ of their distribution functions:

$$\varphi(x) = \int \varphi_N\Big(x - \sum_{i=1}^{N-1} x_i\Big) \prod_{i=1}^{N-1} \varphi_i(x_i)\, dx_1\, dx_2 \ldots dx_{N-1}$$

• the characteristic function $\hat{\varphi}$ of the distribution function of their sum is the product of the characteristic functions $\hat{\varphi}_i$ of their distribution functions: $\hat{\varphi}(t) = \prod_i \hat{\varphi}_i(t)$
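The two rules of the proposition can be checked numerically in a discrete setting. The sketch below is only an illustration under stated assumptions: the random variables take values on the integers 0, 1, 2, ... (so densities become probability mass functions and integrals become sums), and the names `convolve`, `char_fn` and the two example distributions are hypothetical, not part of the lecture.

```python
import cmath

def convolve(p, q):
    """pmf of the sum of two independent integer-valued variables (discrete convolution)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def char_fn(p, t):
    """Characteristic function sum_k p[k] * exp(i*t*k) of a pmf on 0..len(p)-1."""
    return sum(w * cmath.exp(1j * t * k) for k, w in enumerate(p))

# Two made-up distributions (assumption: any pmfs would do).
p = [0.2, 0.5, 0.3]
q = [0.6, 0.4]

s = convolve(p, q)              # distribution of the sum (convolution rule)
t = 0.7                         # arbitrary test point
lhs = char_fn(s, t)             # characteristic function of the sum
rhs = char_fn(p, t) * char_fn(q, t)   # product of the characteristic functions
print(abs(lhs - rhs))           # difference at floating-point rounding level
```

The product rule is what makes the characteristic function the convenient tool for sums of many independent variables.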

8

Main problem of CSM

Macroscopically large system S in “equilibrium”

F physical quantity of large system, constant in time

si (i = 1, . . . N) identical subsystems of S such that

– si’s interact via known law

– si’s states change in time according to known law

– N is “large” ($\sim 10^{23}$)

How can F be viewed/expressed/computed as the average

of a (suitable) function f that depends on the states of the si’s?

Example:

Ideal gas

9

Sketch of Standard Solution of Main Problem of CSM

One macroscopic equilibrium state ↔ many microstates

Ω(E1, E2): # of (different) microstates between energies E1, E2

$F = \langle f \rangle = \int f\, dp$

f = appropriate phase function representing F

p = probability measure on phase space of large system expressing

that all microstates compatible with a given macrostate are equally

likely: the probability of a set of microstates is proportional to the

size (measure) of the set:

p(microstates between energies E1, E2) ≈ Ω(E1, E2)

Principle of Equal Apriori Probability

Ω typically has a density ω:

$$\Omega(E_1, E_2) = \int_{E_1}^{E_2} \omega(x)\, dx$$

10

Principle of Equal Apriori Probability in the case when

macrostate = energy of the system is fixed (F = E = const.):

The microstates compatible with F = E = const.

lie on the constant energy (hyper)surface ΓE ⊂ X

the average of phase function f defined on ΓE :

$$\langle f \rangle = \int f\, d\mu$$

where µ is the microcanonical probability measure:

$$\mu(A) = \frac{1}{v(\Gamma_E)}\, v(A)$$

with v being the surface measure on ΓE

11

By the Principle of Equal Apriori Probability probabilities are

proportional to the “number” (measure) of states

⇓

The state density ω is a crucial quantity:

ω determines the probabilistic behavior of the large system

⇓

Statistical mechanics is possible as a general theory of macroscopic

systems consisting of subsystems to the extent the density of states

of macroscopic systems has universal characteristics

12

Density of states does have universal features

Universal features are consequences of the following assumptions:

• The measure space describing a macroscopic system is the

product of measure spaces describing the subsystems

• The energy of the large system is the sum of energies of

subsystems

• The number of subsystems is very large (N →∞)

The force of these assumptions is that they make it possible to

apply the Central Limit Theorem to derive universal characteristics

of the state density function in the N →∞ limit

13

Central limit theorem:

Let (X,S, µ) be a probability space and fi: X → IR be independent

random variables such that

• $M_\mu(f_i) = M_i$, $M_\mu(|f_i - M_i|^2) = D_i^2$, $M_\mu(|f_i - M_i|^3) = H_i^3$ exist

• $$\lim_{n\to\infty} \frac{\sqrt[3]{\sum_i^n H_i^3}}{\sqrt{\sum_i^n D_i^2}} = 0$$

Let $g_n = \frac{f_1 + f_2 + \ldots + f_n}{n}$ be the average and $g_n^* = \frac{g_n - M(g_n)}{D(g_n)}$ be the standardized average. If $G_n$ is the distribution function of $g_n^*$, then

$$\lim_{n\to\infty} G_n(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy$$

The density of the distribution function of the (standardized)

average of n independent random variables tends to the Gaussian

as n→∞

The point is that the proposition is true for any µ!
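A small deterministic check of the theorem, as a sketch only: it assumes the $f_i$ are independent fair dice (a hypothetical choice), builds the exact distribution of their sum by repeated convolution, standardizes it, and measures the largest gap between its distribution function and the Gaussian one. Standardizing the sum and standardizing the average give the same variable. The helper names are illustrative.

```python
import math

def convolve(p, q):
    """Discrete convolution: pmf of the sum of two independent integer-valued variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def sum_pmf(pmf, n):
    """Exact pmf of the sum of n independent copies (list index = value)."""
    out = [1.0]
    for _ in range(n):
        out = convolve(out, pmf)
    return out

def max_cdf_error(pmf, n):
    """Largest gap between the cdf of the standardized sum and the Gaussian cdf."""
    total = sum_pmf(pmf, n)
    mean = sum(k * w for k, w in enumerate(total))
    sd = math.sqrt(sum((k - mean) ** 2 * w for k, w in enumerate(total)))
    cdf, worst = 0.0, 0.0
    for k, w in enumerate(total):
        cdf += w
        gauss = 0.5 * (1.0 + math.erf((k - mean) / (sd * math.sqrt(2.0))))
        worst = max(worst, abs(cdf - gauss))
    return worst

die = [0.0] + [1.0 / 6.0] * 6   # fair die: values 1..6 (index 0 carries no mass)

for n in (2, 10, 30):
    print(n, max_cdf_error(die, n))   # the gap shrinks as n grows
```

Replacing `die` by any other pmf with finite moments should show the same trend, which is the sense in which the result does not depend on µ.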

14

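$X = \times_i^N X_i$: phase space of the large system ($X_i$: of a subsystem)

$v = \times_i^N v_i$: volume measure on $B(X)$ ($v_i$: on $B(X_i)$)

$f: X \to \mathbb{R}$: physical quantity of the large system ($f: X_i \to \mathbb{R}$: of a subsystem), e.g. the energy $f_E$, $f_{E_i}$

$\Omega(x) \equiv v(\{s \in X : f_E(s) \le x\})$, $\Omega_i(x) \equiv v_i(\{s \in X_i : f_{E_i}(s) \le x\})$: measure (volume) of the states of the large (small) system in which the value of the energy is less than $x$

$\omega(x) \equiv \frac{d}{dx}\Omega(x)$: density of states, structure function; measure of the energy hypersurface $\Gamma_x$

(circle: area $\Omega(x) = x^2\pi$, circumference $\omega(x) = 2x\pi$)

$\langle f \rangle = \frac{1}{\omega(x)} \frac{d}{dx} \int_{f_E^{-1}([0,x])} f\, dv$: expectation value of $f$

$\langle f \rangle = \int f|_{\Gamma_x}\, d\mu$: on the energy surface $\Gamma_x$

$\mu(A) = \frac{1}{\Gamma(x)} \int_A \frac{v^{N-1}}{|\mathrm{grad}\, f_E|_{f_E(y)=x}}$: $\mu$ = microcanonical measure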

15

$(X, B(X), v) = \times_i^N (X_i, B(X_i), v_i)$, $v = \times_i^N v_i$, $f_E = \sum_i f_{E_i}$, the $f_{E_i}$ are independent in $v$

⇓

$$\omega(x) = \int_{X \setminus s_n} \omega_n\Big(x - \sum_{i=1}^{n-1} x_i\Big) \Big[\prod_{i=1}^{n-1} \omega_i(x_i)\Big]\, dx_1 \ldots dx_{n-1}$$

The state density of the large system is the convolution of the state

densities of the component systems

⇓

If the densities $\omega_i$ were probability densities (they are not because

they are not normalized), then the state density $\omega$ of the large

system would be the density of the distribution function of the sum

$f_E = \sum_i f_{E_i}$ of the independent random variables $f_{E_i}$ representing

the energy of the subsystems, and we could apply the Central Limit

Theorem to approximate $\omega$. Yet, with a formally easy

normalization trick one can do this.

16

The normalization trick:

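Define $\Phi(\alpha)$, $u^\alpha$ for the large system and

$\Phi_i(\alpha)$, $u_i^\alpha$ for the subsystems by

$$\Phi(\alpha) := \int e^{-\alpha x} \omega(x)\, dx \qquad (\alpha \in \mathbb{R}_+ \text{ parameter})$$

$$u^\alpha(x) := \begin{cases} \frac{1}{\Phi(\alpha)} e^{-\alpha x} \omega(x) & x \ge 0 \\ 0 & x < 0 \end{cases}$$

$$\Phi_i(\alpha) := \int e^{-\alpha x} \omega_i(x)\, dx \qquad (\alpha \in \mathbb{R}_+ \text{ parameter})$$

$$u_i^\alpha(x) := \begin{cases} \frac{1}{\Phi_i(\alpha)} e^{-\alpha x} \omega_i(x) & x \ge 0 \\ 0 & x < 0 \end{cases}$$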

17

Proposition:

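• $u^\alpha(x), u_i^\alpha(x) \ge 0$ and $\int u^\alpha = \int u_i^\alpha = 1$ (for all i)

(i.e. $u^\alpha$, $u_i^\alpha$ are probability densities)

• $f_{E_i}$ are independent w.r.t. the probability measure defined by

the renormalized densities

• The convolution rule for $\omega$ entails

the product rule for $\Phi(\alpha)$

$$\Phi(\alpha) = \prod_i^N \Phi_i(\alpha)$$

and the convolution rule for $u^\alpha$:

$$u^\alpha(x) = \int_{X \setminus s_n} u_n^\alpha\Big(x - \sum_{i=1}^{n-1} x_i\Big) \Big[\prod_{i=1}^{n-1} u_i^\alpha(x_i)\Big]\, dx_1 \ldots dx_{n-1}$$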
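The normalization trick can be tried out on a toy model. The following is a sketch under explicit assumptions: energies live on the grid 0, 1, 2, ... so that integrals become sums, and the two "state densities" `omega_1`, `omega_2` are made-up unnormalized weights. It checks the product rule for Φ and the convolution rule for the renormalized densities.

```python
import math

def convolve(a, b):
    """Discrete convolution of two weight lists (list index = energy value)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def phi(omega, alpha):
    """Discrete stand-in for Phi(alpha) = integral of exp(-alpha*x) * omega(x)."""
    return sum(w * math.exp(-alpha * k) for k, w in enumerate(omega))

def tilt(omega, alpha):
    """Renormalized density u^alpha(x) = exp(-alpha*x) * omega(x) / Phi(alpha)."""
    z = phi(omega, alpha)
    return [w * math.exp(-alpha * k) / z for k, w in enumerate(omega)]

# Hypothetical unnormalized state densities of two subsystems.
omega_1 = [1.0, 2.0, 3.0]
omega_2 = [2.0, 1.0, 4.0, 1.0]
omega = convolve(omega_1, omega_2)   # state density of the composite system
alpha = 0.5

# Product rule: Phi(alpha) = Phi_1(alpha) * Phi_2(alpha)
product_rule_gap = abs(phi(omega, alpha) - phi(omega_1, alpha) * phi(omega_2, alpha))

# Convolution rule: u^alpha is the convolution of u_1^alpha and u_2^alpha
u = tilt(omega, alpha)
u_conv = convolve(tilt(omega_1, alpha), tilt(omega_2, alpha))
convolution_gap = max(abs(x - y) for x, y in zip(u, u_conv))

print(sum(u), product_rule_gap, convolution_gap)
```

Both gaps vanish up to rounding for any α > 0, which is why the choice of α can be postponed until the Gaussian approximation is applied.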

18

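The Proposition, and in particular the convolution rule for $u^\alpha$, shows

that $u^\alpha$ behaves like the density of the distribution function of the

sum of N independent random variables having the densities $u_i^\alpha$ for

their distribution functions.

⇓ (Central Limit Theorem)

$$\lim_{N\to\infty} u^\alpha(x) = \frac{1}{\sqrt{2\pi D_{u^\alpha}}} \exp\Big(-\frac{(x - M_{u^\alpha})^2}{2 D_{u^\alpha}}\Big)$$

(for this to be a normalized Gaussian, $M_{u^\alpha}$ is the mean and $D_{u^\alpha}$ the variance of $u^\alpha$)

⇓ (since $\omega(x) = \Phi(\alpha) e^{\alpha x} u^\alpha(x)$)

$$\lim_{N\to\infty} \omega(x) = \Phi(\alpha)\, e^{\alpha x}\, \frac{1}{\sqrt{2\pi D_{u^\alpha}}} \exp\Big(-\frac{(x - M_{u^\alpha})^2}{2 D_{u^\alpha}}\Big)$$

In the limit N →∞ the density of states has a universal form

irrespective of the precise density of states $\omega_i$ of the constituent

subsystems!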

19

Example

ideal gas of N identical classical particles

enclosed in volume V , with particle mass m

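Exact density function $\omega_{IG}$ and approximate density function $\tilde{\omega}_{IG}$:

$$\omega_{IG}(x) = \frac{V^N (2\pi)^{3N/2}}{\Gamma[(3N/2) + 1]}\, m^{3N/2}\, \frac{3N}{2}\, x^{3N/2 - 1}$$

$$\tilde{\omega}_{IG}(x) \approx \frac{V^N (2\pi)^{3N/2}}{(3N/2)^{3N/2}\, e^{-3N/2}\, [2\pi(3N/2)]^{1/2}}\, m^{3N/2}\, \frac{3N}{2}\, x^{3N/2 - 1}$$

Difference between $\omega_{IG}(x)$ and $\tilde{\omega}_{IG}(x)$: $\Gamma[(3N/2) + 1]$ is replaced by its

approximation by Stirling’s formula $N! \approx \sqrt{2\pi N}\, N^N e^{-N}$ ($N \gg 1$),

where the gamma function is $\Gamma(t) = \int_0^\infty e^{-x} x^{t-1}\, dx$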
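How good the Stirling replacement is can be seen on a logarithmic scale. The sketch below compares $\ln \Gamma[(3N/2)+1]$, computed with the standard library function `math.lgamma`, with the logarithm of the Stirling expression used in the approximate density; the function names and the sample values of N are illustrative choices.

```python
import math

def log_gamma_exact(n_particles):
    """ln Gamma[(3N/2) + 1], using the standard library log-gamma function."""
    return math.lgamma(1.5 * n_particles + 1.0)

def log_gamma_stirling(n_particles):
    """ln of the Stirling expression t^t e^(-t) (2 pi t)^(1/2) with t = 3N/2."""
    t = 1.5 * n_particles
    return t * math.log(t) - t + 0.5 * math.log(2.0 * math.pi * t)

def stirling_gap(n_particles):
    """Absolute error of the Stirling approximation on the log scale."""
    return log_gamma_exact(n_particles) - log_gamma_stirling(n_particles)

for n in (10, 100, 1000):
    print(n, log_gamma_exact(n), stirling_gap(n))   # the gap shrinks as N grows
```

Since the exact logarithm itself grows with N, the relative error of the approximation becomes negligible long before N reaches macroscopic values.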

20

Remark (Gibbs paradox)

$\omega_{IG}$ and $\tilde{\omega}_{IG}$ are the densities of the distribution of energy of ideal

gas of N particles. The Principle of Equal Apriori Probability

states that

p(microstates between energies E1, E2) ≈ Ω(E1, E2)

where Ω(E1, E2) is the number of different microstates. Computing

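averages with $\Omega(E_1, E_2) = \int_{E_1}^{E_2} \omega_{IG}$ ($\tilde{\Omega}(E_1, E_2) = \int_{E_1}^{E_2} \tilde{\omega}_{IG}$) one

would get into contradiction with thermodynamics (the entropy

would not be additive = Gibbs paradox). One avoids the paradox

by defining

$$\omega_{IG}^{OK} := \frac{1}{N!}\, \omega_{IG} \qquad \tilde{\omega}_{IG}^{OK} := \frac{1}{N!}\, \tilde{\omega}_{IG}$$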

Interpretation: the particles must not be considered

distinguishable when counting the number of microstates – not

even if they are considered distinguishable in classical mechanics.

21

Immediate, most important

corollary of universal form of density function

Boltzmann’s law:

The density of the probability distribution of a small subsystem of

a large system (=heat bath) is given by

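$$\rho_{sm}(x) = \mathrm{const.}\; e^{-\beta f_{E_{sm}}(x)}$$

β = (inverse) temperature, $Z = \mathrm{const.} \int_{X_{sm}} e^{-\beta f_{E_{sm}}(x)}\, dx$ partition

function, $f_{E_{sm}}: X_{sm} \to \mathbb{R}$ energy function of small system

$$\langle g \rangle = \frac{1}{\mathrm{const.}\, Z} \int_{X_{sm}} g(x)\, e^{-\beta f_{E_{sm}}(x)}\, dx$$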

expectation value of quantity (phase function) g of the small system
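A minimal numerical illustration of Boltzmann's law, under simplifying assumptions that are not in the lecture: the small system has finitely many states (so the integral over $X_{sm}$ becomes a sum), the constant is set to 1, and the two-level system with energies 0 and 1 is a made-up example.

```python
import math

def partition_function(energies, beta):
    """Z = sum over states of exp(-beta * E), the discrete analogue of the integral."""
    return sum(math.exp(-beta * e) for e in energies)

def boltzmann_average(g_values, energies, beta):
    """<g> = (1/Z) * sum over states of g * exp(-beta * E)."""
    z = partition_function(energies, beta)
    return sum(g * math.exp(-beta * e) for g, e in zip(g_values, energies)) / z

# Hypothetical two-level small system in contact with a heat bath.
energies = [0.0, 1.0]
beta = 1.0

z = partition_function(energies, beta)
mean_energy = boltzmann_average(energies, energies, beta)   # g = energy itself
print(z, mean_energy)   # for this system <E> = 1 / (1 + e)
```

Lowering the temperature (raising `beta`) moves the weight toward the ground state, so the mean energy tends to 0.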

22

[Boltzmann’s law + partition function]

⇓

All relations of statistical mechanics

E.g.:

PV = NkT

state equation of classical ideal gas

of N particles in container of volume V

23

Provocatively formulated conclusion:

One can inflate bicycle tires because Central Limit Theorem is true

24

OK, let’s be more modest (and more precise):

One can inflate bicycle tires because Central Limit Theorem is true

AND

is applicable to physical systems

But why is it applicable?

Because the conditions of its applicability hold

in (some) physical systems

25

Conditions ensuring applicability of Central Limit Theorem were:

• The measure space describing a macroscopic system is the

product of measure spaces describing the subsystems

product assumption (OK )

• The energy of the large system is the sum of energies of

subsystems

sum assumption (OK – more or less )

• The number of subsystems is very large (N →∞)

size assumption (OK )

• Principle of Equal Apriori Probability

sounds metaphysical

26

Can one justify the Principle of Equal Apriori Probability ?

Possible attitudes:

• The success of Principle justifies it, further justification is not

needed/possible

success:

(Principle + Assumptions) ⇒ empirically correct predictions

• The Principle should/can be established by linking it to the

properties of the dynamic of large system (determined by the

dynamic of the subsystems)

ergodic-type theorems

27

Main idea of justifying microcanonical measure by ergodic theory :

Separation of

microscopic (= short) and macroscopic (=long)

time scales

Macroscopic measurements yielding macroscopic quantity F

take place on macroscopic time scale

⇓

F = long time average of microscopically evolving phase function f :

using microcanonical averaging would be justified if

phase average of f = F = long time average of f

Can the equality of phase average and time average be proved?

⇕

The ergodic problem

28

Classical results on the ergodic problem:

Proposition (Birkhoff’s theorem): Given a dynamical system

$\langle X, S, \mu, T_t \rangle$ the limit

$$f^*(x) := \lim_{\tau\to\infty} \frac{1}{2\tau} \int_{-\tau}^{\tau} f(T_t x)\, dt$$

exists for µ-almost every x ∈ X and for all $f \in L^1(X, \mu)$

Definition: The dynamical system is metrically transitive

(synonym: ergodic) if the $T_t$-invariant sets have measure 0 or 1, i.e.

if [($A \subseteq X$ and $T_t[A] \subseteq A$) imply $\mu(A) = 0$ or $\mu(A) = 1$]

29

Proposition: If the dynamical system is metrically transitive then

$$f^*(x) = \lim_{\tau\to\infty} \frac{1}{2\tau} \int_{-\tau}^{\tau} f(T_t x)\, dt = \int f\, d\mu$$

for µ-almost every x ∈ X

⇓

The time and phase averages are equal (almost everywhere) if the

dynamical system is metrically transitive

⇓

If the dynamical system (ΓE ,B(ΓE), µ) on the constant energy

hypersurface ΓE defined by the dynamics of the subsystems in

CSM is metrically transitive (ergodic) then the

Principle of Equal Apriori Probability is justified

and the microcanonical probabilities can be given a

dynamical interpretation

30

Dynamical interpretation of microcanonical probabilities

of ergodic systems:

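$$\mu(A) = \int \chi_A\, d\mu = \lim_{\tau\to\infty} \frac{1}{2\tau} \int_{-\tau}^{\tau} \chi_A(T_t x)\, dt$$

$$\chi_A(T_t x) = \begin{cases} 1 & \text{if } T_t x \in A \\ 0 & \text{if } T_t x \notin A \end{cases}$$

⇓

µ(A) = average fraction of time the phase point of the system can be found in

set A during its time evolution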
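The dynamical reading of µ(A) can be illustrated with a textbook toy system rather than a system from CSM. The sketch assumes discrete time steps and uses the rotation of the unit circle x ↦ x + step (mod 1) with Lebesgue measure: for an irrational step the rotation is ergodic and the fraction of time spent in an interval approaches its length, while for a rational step the orbit is periodic and the time average depends on the starting point. All names and parameter values are hypothetical.

```python
import math

def time_fraction_in_set(x0, step, a, b, n_steps):
    """Fraction of the first n_steps orbit points x0 + n*step (mod 1) lying in [a, b)."""
    hits = 0
    for n in range(n_steps):
        x = (x0 + n * step) % 1.0
        if a <= x < b:
            hits += 1
    return hits / n_steps

a, b = 0.2, 0.5                      # the set A = [0.2, 0.5), of measure 0.3
golden = (math.sqrt(5.0) - 1.0) / 2.0

# Irrational rotation (ergodic): time average is close to the measure of A.
frac_ergodic = time_fraction_in_set(0.1, golden, a, b, 100_000)

# Rational rotation (not ergodic): the orbit {0.1, 0.6} never enters A.
frac_periodic = time_fraction_in_set(0.1, 0.5, a, b, 100_000)

print(frac_ergodic, frac_periodic)
```

The second case is a reminder that the equality of time and phase averages is a property of the dynamics, not of the measure alone.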

31

Are the dynamical systems (ΓE ,B(ΓE), µ)

metrically transitive (ergodic)?

More generally:

On what conditions is a dynamical system ergodic?

Extremely difficult problem , interesting in its own right, important

in many branches of mathematics (not only in CSM)

32

The status of ergodicity of dynamical systems occurring in CSM is

(to the best of my knowledge) still to a large extent an open

problem

• It has been claimed (Y. Sinai) that the system of hard spheres in a box with

elastic collisions as the only interaction is ergodic, but the full

proof of this claim has never been published (Wightman 1985)

and the claim is considered as (yet) unproven

• Results on the general theory of dynamical systems indicate

that ergodicity is not a property dynamical systems typically

possess

33

Even if ergodicity could be established for dynamical systems in

physics, the explanation of the Principle of Equal Apriori

Probability would only be an explanation

up to measure zero set

⇓

we seem to be forced to assume that

the system is not in fact in the measure zero set

“Today’s formulation of the ergodic theorem is that, except for a set of

measure 0, the time average exists and = phase average. All the

hypothesis of ‘disorderliness’ are contained in the assumption that we are

not in that measure zero set.”

J. von Neumann to R. Ortvay (February 2, 1939)

“Revenge of the measure zero set”

34

Attempts have been made to weaken the notion of ergodicity:

• Requiring ε-ergodicity only

The dynamical system is ε-ergodic (0 < ε) if it is metrically

transitive on an invariant set of measure (1− ε): there is a set

X ′ ⊆ X of measure (1− ε) such that if A ⊆ X ′ and Tt[A] ⊆ A

then µ(A) = 0 or µ(A) = 1− ε (Vranas, 1998)

The status of epsilon ergodicity is unclear

• Requiring equality of phase and time averages for the special

phase functions that are sums of phase functions of subsystems

(Khinchin, 1949)

35

Khinchin’s weakening of ergodicity

(using asymptotic form of energy density/microcanonical measure)

Proposition

$$\mu\Big(\Big\{ s \in X : \Big| \frac{\langle f(s) \rangle_T - \langle f \rangle_\mu}{\langle f \rangle_\mu} \Big| \ge K_1 N^{-1/4} \Big\}\Big) \le K_2 N^{-1/4}$$

µ = microcanonical measure

$K_1, K_2 > 0$ constants

$f = \sum_{i=1}^{N} f_i$ phase function = sum of phase functions of subsystems

The set of points on the constant energy hypersurface where the time average of a sum

function differs (relatively) from its microcanonical average by more than some

amount that goes to zero as N →∞ has a measure that also goes

to zero as N →∞.

Revenge of the measure zero set ⇒ Terror of the small measure set

36

Quotations

“This fundamental postulate [Principle of Equal Apriori Probability ] is

eminently reasonable and certainly does not contradict any of the laws of

mechanics. Whether the postulate is actually valid can, of course, only

be decided by making theoretical predictions based on it and by checking

whether these predictions are confirmed by experimental observations. A

large body of calculations based on this postulate have indeed yielded

results in very good agreement with observations. The validity of this

postulate can therefore be accepted with great confidence as the basis of

our theory.”

F. Reif: Fundamentals of statistical and thermal physics (McGraw-Hill,

1965) p. 55

37

“... in all expositions of the statistical mechanics, this phase average is

taken as a theoretical interpretation of any physical quantity. In doing so

either no arguments at all are given in favor of such a choice, or a special

hypothesis is constructed in order to justify this choice, or, finally,

various reasons are cited in favor of such an interpretation, indicating at

the same time that these reasons are not logically obligatory and that

the interpretation was generally accepted in view of the successful results

to which the theory based on this interpretation leads.”

A.I. Khinchin: Mathematical Foundations of Statistical Mechanics

(Dover Publications, 1948) p. 46

38

“...the task of a mathematical justification of the statistical mechanics

reduces essentially to two problems. The first of these two problems, to

investigate as exhaustively as possible, under what conditions and to

what degree the time averages of phase functions, which, as we have

seen, appear as a natural interpretation of experimental measurements,

can be replaced by the phase averages of the same functions. ... The

second problem ... is to create a general method for approximate

computation of phase averages on surfaces of constant energy.”

A.I. Khinchin: Mathematical Foundations of Statistical Mechanics

(Dover Publications) 1948, p. 47

39

“We emphasize once more that this distribution [the microcanonical

distribution ] is not the genuine statistical distribution of a closed

system. If it were, then this would be equivalent to the claim that,

during a sufficiently long time, the phase trajectory of a closed system

would come arbitrary close to any point of the manifold [of the constant

energy surface ]. This claim (under the name ergodic hypothesis ) is

however false in general.”

L.D. Landau, E.M. Lifschitz: Lehrbuch der Theoretischen Physik V.

Statistische Physik (Akademie Verlag, 1971) p. 13 [my translation ]

40

“The physical importance of ergodicity is that it can be used to justify

the use of the microcanonical ensemble for calculating equilibrium values

and fluctuations. Suppose f is some macroscopic observable and the

system is started at time zero from a dynamical state x, for which f(x)

has a value that is very far from its equilibrium value. As time proceeds,

we expect that the current value of f , which is f(Ttx), will approach and

mostly stay very close to an equilibrium value with only very rare large

fluctuations away from this value. This equilibrium value should

therefore be equal to the time average because the initial period during

which equilibrium is established contributes only negligibly to the

formula defining f∗(x). The [ergodic ] theorem tells us that this

equilibrium value is almost equal to 〈f〉, the average value of f in the

microcanonical ensemble, provided the system is ergodic.”

J. L. Lebowitz and O. Penrose: Modern ergodic theory, Physics Today, vol. 26 (1973)

41

“Strict ergodicity has turned out to be surprisingly difficult to prove

even for relatively simple dynamical systems. Contrary to what is

sometimes asserted, the system of N elastic hard balls moving in a

cubical box with hard reflecting walls has not yet been proven to be

ergodic for arbitrary N – only for N ≤ 4 [references] Nevertheless it

seems that mathematicians are coming increasingly closer to a proof

[references], and computational evidence suggests that this system is

indeed ergodic [references].”

“... epsilon-ergodicity needs to be investigated in more detail. It would

be nice to have theoretical results showing both that ergodicity is

approached when the number of degrees of freedom increases and that ε

decreases faster than exponentially with the number of degrees of

freedom.”

P. Vranas: Epsilon-ergodicity and the success of equilibrium statistical mechanics Philosophy of

Science vol. 65 (1998)

42

Summary

• Equilibrium CSM is possible as a general theory because the

assumptions (product, sum, size, Principle of Equal Apriori

Probability ) made on equilibrium systems make it possible to

apply the Central Limit Theorem, which implies that the

energy density has a universal form

• The Principle of Equal Apriori probability has a status

different from the other assumptions, and remains unmotivated

physically

• Attempts to justify the Principle of Equal Apriori Probability

via ergodic-type theorems are inconclusive, with unsolved

difficult problems

• “Revenge” of measure zero sets remains a problem even if

ergodicity obtains

43

Lecture 2

Structure

• Hilbert space quantum mechanics as non-commutative

probability theory

analogy between classical and quantum concepts, Hilbert lattice,

quantum state, Gleason’s theorem

• Interpretational difficulties related to Hilbert space QM

Kochen-Specker theorem, violation of subadditivity by quantum

states

• Von Neumann’s attitude towards Hilbert space QM

giving up Hilbert space quantum theory, preferring von Neumann

algebras

44

The single sentence one should take away from lecture 2

Hilbert space quantum mechanics is the non-commutative analogue

of classical, Kolmogorovian probability theory but Hilbert space

probability theory cannot be interpreted as the analogy suggests

45

Hilbert space Quantum Mechanics

||

non-commutative probability theory

classical probability theory ⇒ quantum probability theory

replace

Boolean algebra S by Hilbert lattice P(H)

probability measure p by quantum state φ on P(H)

46

$(B, \vee, \wedge, \perp)$ is a Boolean algebra if it is an orthocomplemented

distributive lattice with respect to the lattice operations ∨, ∧ and

the orthocomplementation $A \mapsto A^\perp$

Distributivity:

A ∨ (B ∧ C) = (A ∨B) ∧ (A ∨ C) for all A, B, C

A Boolean algebra is always isomorphic with a Boolean algebra of

subsets of a set X

with respect to the set theoretical operations

$A \wedge B = A \cap B$

$A \vee B = A \cup B$

$A^\perp = X \setminus A$

47

Hilbert lattice

(P(H),∨,∧,⊥)

P(H) = set of all closed linear subspaces of a Hilbert space H

||

P(H) = set of all projections on a Hilbert space H

Lattice operations ∨, ∧, ⊥ defined by:

$A \wedge B = A \cap B$

$A \vee B$ = closure of $(A + B)$, where $A + B = \{\xi + \eta : \xi \in A, \eta \in B\}$

$A^\perp = \{\xi \in H : \langle \xi, \eta \rangle = 0 \ \forall \eta \in A\}$

48

Crucial difference between Boolean algebra and Hilbert lattice

(between classical physics and quantum physics):

A Hilbert lattice is not distributive, only orthomodular:

Orthomodularity:

If A ≤ B and A⊥ ≤ C then A ∨ (B ∧ C) = (A ∨B) ∧ (A ∨ C)

Failure of distributivity

⇕

non-commutativity of product of projections

Proposition : If A, B are projections from a distributive sublattice

of P(H) then AB=BA

Hence the terminology:

non-commutative = non-distributive = non-classical = quantum
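Both halves of the slogan can be seen already in the real plane. The sketch below is an illustration under assumptions: H is taken to be $\mathbb{R}^2$, whose subspaces are the zero space, the lines through the origin and the whole plane, so meet and join can be written down by hand; the encoding of subspaces and all helper names are hypothetical.

```python
import math

ZERO, PLANE = "zero", "plane"        # the two trivial subspaces of R^2

def line(theta):
    """Line through the origin at angle theta (angles taken modulo pi)."""
    return ("line", round(theta % math.pi, 12))

def join(a, b):
    """Smallest subspace containing both a and b."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    if a == b:
        return a
    return PLANE                     # two different lines (or a line and the plane) span R^2

def meet(a, b):
    """Intersection of the subspaces a and b."""
    if a == PLANE:
        return b
    if b == PLANE:
        return a
    if a == b:
        return a
    return ZERO                      # two different lines meet only in the origin

def projection(theta):
    """Matrix of the orthogonal projection onto the line at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

A, B, C = line(0.0), line(math.pi / 4), line(math.pi / 2)

lhs = join(A, meet(B, C))            # A v (B ^ C) = A v 0 = A
rhs = meet(join(A, B), join(A, C))   # (A v B) ^ (A v C) = plane
print(lhs == rhs)                    # distributivity fails for these three lines

PA, PB = projection(0.0), projection(math.pi / 4)
print(matmul(PA, PB) == matmul(PB, PA))   # the two projections do not commute
```

Any three pairwise different lines give the same failure of distributivity, so the effect does not depend on the particular angles chosen.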

49

Definition: $\phi: P(H) \to [0, 1]$ is a quantum probability measure

(or quantum state) if

(1) $\phi(0) = 0$, $\phi(I) = 1$

(2) $\phi(\vee_i A_i) = \sum_i \phi(A_i)$ if $A_i \perp A_j$ ($\Leftrightarrow A_i \le A_j^\perp$) for $i \ne j$

A quantum state is a σ-additive map from P(H) into [0, 1]

φ is a complete analogue of a classical probability measure

50

Theorem (Gleason): If φ is a quantum state (and dim H ≥ 3) then there exists a

positive, trace class operator ρ with Tr(ρ) = 1 such that

$$\phi(A) = Tr(\rho A) = \sum_i \langle \xi_i, \rho A \xi_i \rangle \qquad (1)$$

and conversely, if ρ is a positive, trace class operator such that

Tr(ρ) = 1 then (1) defines a quantum state φ

Tr is defined by

$Tr(Q) = \sum_i \langle \xi_i, Q \xi_i \rangle$, with $\xi_i$ an orthonormal basis in H

ρ is the analogue of the probability density function

Tr is the analogue of the counting measure

51

Gleason’s theorem shows: φ can be extended

from projections P(H) to bounded operators B(H)

Analogy: classical measure can be extended

from characteristic functions to integrable functions

The extension process is called: theory of integration

Conclusion: Gleason’s theorem is a theorem in non-commutative

integration

Recovering the standard notion of (vector) state as used in physics:

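If $\rho = P_\xi$ = projection to the unit vector $\xi \in H$ (state vector) then, choosing the orthonormal basis so that $\xi_1 = \xi$,

$$Tr(\rho Q) = \sum_i \langle \xi_i, Q P_\xi \xi_i \rangle = \langle \xi, Q P_\xi \xi \rangle = \langle \xi, Q \xi \rangle$$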

〈ξ, Qξ〉 = the usual expectation value of observable Q in state ξ
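The trace formula can be tried on small matrices. This is a sketch under assumptions not made in the lecture: H is the real space $\mathbb{R}^3$ (so no complex conjugation is needed), and the density matrix `rho`, the observable `Q` and the unit vector `xi` are made-up examples. It checks additivity of $A \mapsto Tr(\rho A)$ on orthogonal projections and the reduction to $\langle \xi, Q\xi \rangle$ for a vector state.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def projector(xi):
    """Projection onto the span of a real unit vector xi."""
    return [[x * y for y in xi] for x in xi]

# Hypothetical density matrix: positive, trace 1.
rho = [[0.5, 0.0, 0.0],
       [0.0, 0.3, 0.0],
       [0.0, 0.0, 0.2]]

P1 = projector([1.0, 0.0, 0.0])
P2 = projector([0.0, 1.0, 0.0])      # orthogonal to P1
P12 = [[P1[i][j] + P2[i][j] for j in range(3)] for i in range(3)]   # P1 v P2

phi_P1 = trace(matmul(rho, P1))
phi_P2 = trace(matmul(rho, P2))
phi_join = trace(matmul(rho, P12))   # additivity on orthogonal projections

# Vector state: rho = projection onto the unit vector xi.
xi = [0.6, 0.8, 0.0]
Q = [[1.0, 2.0, 0.0],
     [2.0, -1.0, 0.0],
     [0.0, 0.0, 3.0]]                # a symmetric "observable"
trace_value = trace(matmul(projector(xi), Q))
expectation = sum(xi[i] * Q[i][j] * xi[j] for i in range(3) for j in range(3))

print(phi_P1, phi_P2, phi_join, trace_value, expectation)
```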

52

There is a very tight formal correspondence between concepts in

classical probability theory and quantum probability theory, an

analogy that goes beyond the key correspondence

p ←→ φ

µ counting measure ←→ Tr

probability density ←→ ρ

The correspondence is summarized in the next 3 slides

53

| Classical probability theory | Quantum probability theory |
|---|---|
| $(X, S, \mu)$: classical measure space | $(H, P(H), Tr)$: Hilbert space QM |
| $S$: Boolean algebra | $P(H)$: orthomodular lattice |
| $\mu$: counting measure | $Tr$: functional |
| $L^1(X, \mu)$: integrable functions | $T(H)$: trace class operators |

54

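| Classical probability theory | Quantum probability theory |
|---|---|
| $L^\infty(X, \mu)$: essentially bounded functions, (bounded) random variables | $B(H)$: bounded operators, (bounded) observables |
| $g \in L^1(X, \mu)$, $g \ge 0$, $\int g\, d\mu = 1$: probability density | $\rho \in T(H)$, $\rho \ge 0$, $Tr(\rho) = 1$: density matrix ((normal) state) |
| $S \ni A \mapsto p_g(A) = \int \chi_A\, g\, d\mu \in [0, 1]$ | $P(H) \ni A \mapsto Tr(\rho A) \in [0, 1]$ |
| $\int g f\, d\mu$, $g \in L^1(X, \mu)$: expectation value of $f \in L^\infty(X, \mu)$ with respect to $p_g$ | $Tr(\rho A)$, $\rho \in T(H)$: expectation value of $A \in B(H)$ in state $\rho$ |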

55

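| Classical probability theory | Quantum probability theory |
|---|---|
| $L^1(X, \mu)$ Banach space, $\lVert g \rVert_1 = \int \lvert g \rvert\, d\mu$ | $T(H)$ Banach space, $\lVert \rho \rVert_{Tr} = Tr(\lvert \rho \rvert)$ |
| $L^\infty(X, \mu)$ Banach space, $\lVert f \rVert_\infty = \mathrm{ess\,sup}\, \lvert f \rvert$ | $B(H)$ Banach space, $\lVert A \rVert = \sup_{\lVert \xi \rVert \le 1} \lVert A \xi \rVert$ |
| $L^1(X, \mu)^* = L^\infty(X, \mu)$ duality: $\phi \in L^1(X, \mu)^*$ means $\phi(g) = \int f g\, d\mu$ for some $f \in L^\infty(X, \mu)$ | $T(H)^* = B(H)$ duality: $\phi \in T(H)^*$ means $\phi(\rho) = Tr(\rho A)$ for some $A \in B(H)$ |
| $L^\infty(X, \mu)^* \supset L^1(X, \mu)$: $L^\infty(X, \mu) \ni f \mapsto \int g f\, d\mu$, $g \in L^1(X, \mu)$, is a $\lVert \cdot \rVert_\infty$-continuous functional | $B(H)^* \supset T(H)$: $B(H) \ni A \mapsto Tr(\rho A)$ is a $\lVert \cdot \rVert$ (operator norm) continuous functional |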

56

The major conceptual problem related to quantum mechanics:

How can a quantum probability space (H,P(H), φ) be interpreted

as a probability space ?

Specifically:

• What does P(H) stand for ?

• What interpretation can be given to the non-commutative

probability measure φ?

(relative frequency ?)

57

Interpretation of the classical probability space:

Classical propositional logic

||

Boolean algebra

||

Random event structure

Classical probability

||

normalized measure µ on Boolean algebra with

subadditivity property:

$\mu(A) + \mu(B) = \mu(A \cup B) + \mu(A \cap B)$

probability = (possibly) relative frequency

58

The non-commutative version of the classical interpretation would be:

quantum (propositional) logic

||

Hilbert lattice P(H)

||

quantum (random) event structure

quantum probability

||

normalized measure φ on Hilbert lattice

probability = relative frequency

Is such an interpretation possible?

59

Claim: The non-commutative version of a classical

interpretation is NOT possible

Arguments

• P(H) is not an event structure (under a natural conceptual

understanding of the notion of event )

• The quantum probability cannot be given a frequency

interpretation (a la von Mises)

60

(P(H),∨,∧,⊥) is event structure

⇕

A ∧ B = [A and B both happen]

A ∨B = [either A or B happens]

A⊥ = [A does not happen]

If A, B are events then

(i) every A either happens or does not happen

(ii) A happens ⇒ A⊥ does not happen

(iii) A happens and B happens ⇒ A ∧B happens

(iv) A ∨B happens ⇒ either A or B happens

(i)-(iv)

⇕

There exists an evaluation $h: P(H) \to \{0, 1\}$

||

Boolean algebra homomorphism

61

Proposition : There exists no Boolean algebra homomorphism from

the Hilbert lattice P(H) into a Boolean algebra

because

Proposition (Kochen-Specker Theorem): There exists no partial

Boolean algebra homomorphism from a Hilbert lattice P(H) (with dim H ≥ 3) into

any Boolean algebra

$h: P(H) \to B$ is a partial Boolean algebra homomorphism

⇕

h is a Boolean algebra homomorphism

on every sub Boolean algebra of P(H)

62

Relative frequency interpretation of probability

(von Mises)

(X,S, p) has a relative frequency interpretation if there exists a

fixed statistical ensemble e1, e2, . . . such that

• For every attribute (event) A, presence/absence of A on every

element ei of the ensemble can be decided unambiguously

without changing ei/the ensemble

• For every A the number p(A) = (limit of) relative frequency of

event A in e1, e2, . . .

Von Mises: the ensemble is supposed to be random

Randomness is tricky and problematic but this is not the reason

why a frequency interpretation of quantum probability spaces is

not possible

63

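Proposition: subadditivity of a probability measure,

$\mu(A) + \mu(B) = \mu(A \cup B) + \mu(A \cap B)$, is necessary for a relative

frequency interpretation:

$$\frac{\#(A \cup B)}{N} + \frac{\#(A \cap B)}{N} = \frac{\#\big((A \setminus A \cap B) \cup (B \setminus A \cap B) \cup (A \cap B)\big)}{N} + \frac{\#(A \cap B)}{N}$$

$$= \frac{\#(A \setminus A \cap B) + \#(B \setminus A \cap B) + \#(A \cap B) + \#(A \cap B)}{N} = \frac{\#(A) + \#(B)}{N}$$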

64

Proposition : A quantum probability measure is NOT subadditive

because

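Proposition: A countably additive map

$\phi: P(H) \to \mathbb{R}_+ \cup \{\infty\}$

is subadditive iff

$\phi(A) = \mathrm{const.}\, Tr(A)$

and

$$Tr(A) = \sum_{\xi_i \in A} \langle \xi_i, A \xi_i \rangle + \underbrace{\sum_{\eta_j \in A^\perp} \langle \eta_j, A \eta_j \rangle}_{0} = \dim(A) = \infty$$

for infinite dimensional linear subspaces A ∈ P(H)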

65

Another version of failure of subadditivity of quantum states

(after Szabo, 2001):

Proposition: For any projections A, B ∈ P(H) there exists a

quantum state φ such that

$$\underbrace{\phi(A)}_{1} + \underbrace{\phi(B)}_{>0} - \underbrace{\phi(A \wedge B)}_{0} > 1$$

If A, B are events, how can A happen with certainty and B without

A?

Subadditivity excludes such a “strange” situation
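The contrast between counting and quantum states can be made concrete. The sketch below rests on assumptions chosen for illustration only: the classical part uses a made-up finite ensemble of 12 elements with two attributes, and the quantum part works in the real plane $\mathbb{R}^2$ with A and B two different lines through the origin (so that A ∧ B is the zero subspace and A ∨ B the whole plane) and a state vector lying in A.

```python
import math

# Classical case: counts on a finite ensemble always satisfy subadditivity.
ensemble = range(12)
A_set = {e for e in ensemble if e % 2 == 0}     # attribute "even"
B_set = {e for e in ensemble if e % 3 == 0}     # attribute "divisible by 3"
classical_lhs = len(A_set) + len(B_set)
classical_rhs = len(A_set | B_set) + len(A_set & B_set)
print(classical_lhs, classical_rhs)             # equal counts

# Quantum case in R^2: A = line at angle 0, B = line at angle pi/4.
def prob_of_line(state, theta):
    """phi(P) = <state, P state> for the projection P onto the line at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    overlap = state[0] * c + state[1] * s
    return overlap * overlap

state = (1.0, 0.0)                    # unit vector lying in A
phi_A = prob_of_line(state, 0.0)      # A happens with certainty
phi_B = prob_of_line(state, math.pi / 4)
phi_meet = 0.0                        # A ^ B is the zero subspace
phi_join = 1.0                        # A v B is the whole plane

quantum_gap = phi_A + phi_B - phi_join - phi_meet
print(phi_A, phi_B, quantum_gap)      # subadditivity fails by phi_B
```

The gap equals φ(B), which is positive for every line B not orthogonal to A; this is the "strange" situation described on the slide.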

66

Options to save interpretational consistency

quantum (propositional) logic

|| ←− give up ! - doesn’t help

Hilbert lattice P(H) ←− give up !? - very radical!

|| ←− give up !

quantum (random) event structure

quantum probability

|| ←− give up !

normalized measure on Hilbert lattice

|| ←− give up !

relative frequency interpretation

67

John von Neumann’s choice (1935-1936):

Give up Hilbert space probability theory!

“I would like to make a confession which may seem immoral: I do not

believe absolutely in Hilbert space any more. After all Hilbert-space (as

far as quantum-mechanical things are concerned) was obtained by

generalizing Euclidean space, footing on the principle of “conserving the

validity of all formal rules”. This is very clear, if you consider the

axiomatic-geometric definition of Hilbert-space, where one simply takes

Weyl’s axioms for a unitary-Euclidean-space, drops the condition on the

existence of a finite linear basis, and replaces it by a minimum of

topological assumptions (completeness + separability). Thus

Hilbert-space is the straightforward generalization of Euclidean space, if

one considers the vectors as the essential notions.

68

Now we [with F.J. Murray, von Neumann’s coauthor ] begin to believe,

that it is not the vectors which matter but the lattice of all linear

(closed) subspaces. Because:

1. The vectors ought to represent the physical states, but they do it

redundantly, up to a complex factor, only.

2. And besides the states are merely a derived notion, the primitive

(phenomenologically given) notion being the qualities, which

correspond to the linear closed subspaces.

But if we wish to generalize the lattice of all linear closed subspaces from

a Euclidean space to infinitely many dimensions, then one does not

obtain Hilbert space ...”

J. von Neumann to G. Birkhoff (Nov. 13., Wednesday [1935])

“I, for one, do not even believe, that the right formal frame for quantum

mechanics is already found.”

J. von Neumann to G. Birkhoff (Nov. 27, [1935])

69

What to replace Hilbert space quantum mechanics by?

John von Neumann’s answer: (hopefully) by the theory of

“rings of operators of type $II_1$”

⇕

type $II_1$ von Neumann algebras

Why? What are these “type $II_1$ von Neumann algebras”?

This is the subject of the next lecture

70

Summary of Lecture 2.

• Hilbert space quantum mechanics is the non-commutative

(non-distributive) analogue of classical measure/probability

theory – the analogy is detailed and strong, the key elements

being

Boolean algebra ↔ Hilbert lattice

p ↔ quantum state φ

• The elements of non-commutative probability theory cannot be

interpreted as the analogy would suggest:

Hilbert lattice ≠ random event structure (Kochen-Specker)

quantum probability φ(A) ≠ relative frequency (violation of

subadditivity)

• Von Neumann’s suggestion: Hilbert space probability theory is

pathological, replace it by well-behaved operator algebraic QM

71

Lecture 3

Structure

• Limits of Hilbert space probability theory

restricted type, not capable of describing large quantum systems

• Von Neumann algebras

notion, dimension function, classification theory, the five types

• Rise and fall of the type II1 case

probabilistic reasons for von Neumann’s preference of the type II1

case, the interpretational problems remain for the type II1 case

• Re-interpretation of quantum probabilities as classical

conditional probabilities

Kolmogorovian Censorship hypothesis

72

The single sentence one should take away from Lecture 3

Von Neumann algebra theory is the generalization of Hilbert space

probability theory that yields non-commutative counterparts of all

the types of probability spaces that occur in classical probability

theory, but the interpretational difficulties present in Hilbert

space probability theory are not solved by passing to the theory of

von Neumann algebras

73

The analogy displayed in the tables between classical

measure/probability theory (reproduced in next slide) and the

non-commutative, Hilbert space measure/probability theory is not

perfect:

• Given a general (X,S) there is no canonical counting measure

µ on S. In contrast, given H, the non-commutative “counting measure”,

the trace functional, is uniquely determined, and only those

probability measures are present in the Hilbert space formalism

which can be given by densities with respect to the trace

• The types of classical measure/probability space and the type

of the Hilbert space measure/probability space may not match:

S can be non-atomic, P(H) is always atomic

µ can take on values in a continuum, Tr is discrete.

74

Classical probability theory            Quantum probability theory

(X, S, µ) classical measure space       (H, P(H), Tr) Hilbert space QM

S Boolean algebra                       P(H) orthomodular lattice

µ counting measure                      Tr functional

L1(X, µ) integrable functions           T(H) trace class operators

75

Von Neumann algebra theory is precisely the generalization of

Hilbert space measure/probability theory that yields non-commutative

counterparts of all the types of measure/probability spaces that

occur in classical measure/probability theory.

These typical classical measure/probability spaces are:

$X = \{x_1, x_2, \ldots, x_N\}$, $p(x_i) = \frac{1}{N}$ $(i = 1, \ldots, N)$        discrete, finite

$X = \{x_1, x_2, \ldots, x_N, \ldots\}$, $p(x_i) = 1$ $(i = 1, 2, \ldots)$        discrete, infinite

$X = [0, 1]$, $p$ = Lebesgue measure on $[0, 1]$        continuous, finite

$X = \mathbb{R}$, $p$ = Lebesgue measure on $\mathbb{R}$        continuous, infinite

76

Definition : N ⊆ B(H) is a von Neumann algebra if

• I ∈ N (I= identity operator )

• If Q ∈ N then Q∗ ∈ N (*-closed)

• If Q1, Q2 ∈ N then (λ1Q1 + λ2Q2) ∈ N and Q1Q2 ∈ N (N is an algebra)

• N is closed in the sense that

if Qn ∈ N and for some Q we have φ(Qn) → φ(Q) for all states φ, then Q ∈ N

B(H) itself is (obviously) a von Neumann algebra

Are there any other examples?

(Other = non-isomorphic to B(H))

⇕

Classification problem

77

Classification of von Neumann algebras

The set of projections P(N ) of a von Neumann algebra is an

orthomodular lattice (= sublattice of the Hilbert lattice P(H))

Definition : d: P(N) → IR+ ∪ {∞} is a dimension function if

d(A) + d(B) = d(A ∪ B) + d(A ∩ B)

⇕

subadditivity!

The classification is in terms of the type of the range of the

dimension function defined on the projection lattice P(N ): the

type of the range of the dimension function coincides with the

notion of type used in classifying the classical probability spaces.

The ranges/types are shown on the next slide.
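The additivity condition on d can be checked by hand in the finite, discrete case, where d is the dimension of a subspace (the trace of its projection). The following is only an illustrative sketch, not part of the lectures: it works in the rational vector space Q^3 instead of a complex Hilbert space, represents a subspace by a spanning set, and the helper names rref, d, complement, join and meet are introduced here for illustration.

```python
from fractions import Fraction

def rref(rows, n):
    """Reduced row echelon form over the rationals.

    Returns the nonzero reduced rows and the list of pivot columns."""
    m = [[Fraction(x) for x in r] for r in rows]
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m[:r], pivots

def d(rows, n):
    """Dimension of the subspace spanned by rows (trace of its projection)."""
    return len(rref(rows, n)[1])

def complement(rows, n):
    """Basis of the orthogonal complement of the span of rows."""
    reduced, pivots = rref(rows, n)
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, p in zip(reduced, pivots):
            v[p] = -row[free]
        basis.append(v)
    return basis

def join(a, b):
    """Lattice join: the span of both spanning sets."""
    return a + b

def meet(a, b, n):
    """Lattice meet, computed as the complement of the join of the complements."""
    return complement(complement(a, n) + complement(b, n), n)

N = 3
A = [[1, 0, 0], [0, 1, 0]]   # a plane in Q^3
B = [[0, 1, 0], [1, 1, 1]]   # a second plane
lhs = d(A, N) + d(B, N)
rhs = d(join(A, B), N) + d(meet(A, B, N), N)
print(lhs, rhs)
```

Here the two planes span the whole space and meet in a line, so both sides of d(A) + d(B) = d(A ∪ B) + d(A ∩ B) are equal to 4.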

78

Hilbert space / algebra                      Type       Range of d                   Character                Physical theory

HN, dim HN = N; N = B(HN), P(N) = P(HN)      type IN    d (= Tr): {1, 2, ..., N}     finite, discrete         finite dimensional QM

H, dim H = ∞; N = B(H), P(N) = P(H)          type I∞    d (= Tr): {1, 2, ...}        non-finite, discrete     standard Hilbert space QM

N, P(N)                                      type II1   [0, 1]                       finite, continuous       quantum stat. phys.

N, P(N)                                      type II∞   IR                           non-finite, continuous   quantum stat. phys.

N, P(N)                                      type III   {0, ∞}                       very non-finite          quantum field theory

79

Why do we need the

“esoteric” types of non-commutative probability theories?

Generally: large quantum systems cannot be described

probabilistically within the Hilbert space probability theory

large = infinite (in size or in degrees of freedom)

Examples of such systems

• lattice gases

(mathematically precise models of discrete quantum statistical

mechanical systems in the thermodynamic limit)

• non-relativistic quantum field theory

(non-discrete quantum statistical mechanical systems in

the thermodynamic limit)

• relativistic quantum field theory (type III)

80

Infinite, discrete quantum system

occurring in quantum statistical mechanics

not describable in Hilbert space QM

(one dimensional lattice gas, Ising model)

Small quantum system sitting at each point i on a one dimensional

lattice infinite in both directions:

. . .   B(Hi−2)   B(Hi−1)   B(Hi)   B(Hi+1)   B(Hi+2)   . . .

           •         •        •        •         •

         i − 2     i − 1      i      i + 1     i + 2

Hi = H identical copies, dim(H) finite

Large quantum system: analogue of the Cartesian product of classical

probability spaces: (infinite) union of (finite) tensor products:

$A = \bigcup_N \bigotimes_i^N B(H_i)$

This system cannot be described within the Hilbert space

formalism because A ≠ B(H) (there is a finite trace on A)
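Why there is a finite trace on A can be seen in a small computation. The sketch below is an illustration under stated assumptions, not a construction taken from the lectures: it takes dim(H) = 2, and it assumes that a finite block of sites is embedded into a larger block by X ↦ X ⊗ I, which the slides do not spell out. The helper names kron and normalized_trace are hypothetical.

```python
from fractions import Fraction

def kron(x, y):
    """Kronecker (tensor) product of two square matrices given as nested lists."""
    return [[a * b for a in row_x for b in row_y] for row_x in x for row_y in y]

def normalized_trace(x):
    """tau(X) = Tr(X) / dim, so that tau(I) = 1 on every finite block."""
    return Fraction(sum(x[i][i] for i in range(len(x))), len(x))

I2 = [[1, 0], [0, 1]]
P = [[1, 0], [0, 0]]      # rank-one projection on a single site

X = P
values = []
for _ in range(4):
    values.append(normalized_trace(X))
    X = kron(X, I2)       # assumed embedding into the algebra of one more site

Q = kron(kron(P, P), I2)  # a projection on two sites, embedded in three
print(values, normalized_trace(Q))
```

The normalized trace of P does not change under the embedding (it stays 1/2), so it is well defined on the union, and the identity always has trace 1. Projections such as Q take the values k/2^N (here 1/4), in contrast with the trace Tr on B(H) for infinite dimensional H, which is infinite on the identity.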

81

Types IN and II1 are distinguished:

• the dimension function is normalized and

• satisfies subadditivity ⇒ d(A) can be interpreted as relative

frequency

• the projection lattices P(HN ) and P(N ) are not only

orthomodular but modular :

If A ≤ B, then A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C)

distributivity ⇎ modularity ⇎ orthomodularity

(HN, P(HN), d) and (H, P(N), d) seem to be well-behaved

non-commutative (quantum) probability theories whose

probabilities can in principle be interpreted as relative frequencies

82

John von Neumann:

• The type II1 (finite, continuous) non-commutative probability

space (H,P(N ), d) is the proper infinite dimensional

generalization of the (HN ,P(HN ), d) finite dimensional

non-commutative probability space

• (H, P(N), d) is the proper probabilistic framework for QM

and not Hilbert space quantum mechanics

83

“I would like to make a confession which may seem immoral: I do not

believe absolutely in Hilbert space any more. After all Hilbert-space (as

far as quantum-mechanical things are concerned) was obtained by

generalizing Euclidean space, footing on the principle of “conserving the

validity of all formal rules”. This is very clear, if you consider the

axiomatic-geometric definition of Hilbert-space, where one simply takes

Weyl’s axioms for a unitary-Euclidean-space, drops the condition on the

existence of a finite linear basis, and replaces it by a minimum of

topological assumptions (completeness + separability). Thus

Hilbert-space is the straightforward generalization of Euclidean space, if

one considers the vectors as the essential notions.

84

Now we [with F.J. Murray, von Neumann’s coauthor ] begin to believe,

that it is not the vectors which matter but the lattice of all linear

(closed) subspaces. Because:

1. The vectors ought to represent the physical states, but they do it

redundantly, up to a complex factor, only.

2. And besides the states are merely a derived notion, the primitive

(phenomenologically given) notion being the qualities, which

correspond to the linear closed subspaces.

But if we wish to generalize the lattice of all linear closed subspaces from

a Euclidean space to infinitely many dimensions, then one does not

obtain Hilbert space, but that configuration, which Murray and I called

“case II1.” (The lattice of all linear closed subspaces of Hilbert-space is

our “case I∞”.) And this is chiefly due to the presence of the rule

a ≤ c → a ∪ (b ∩ c) = (a ∪ b) ∩ c

This “formal rule” would be lost, by passing to Hilbert space!”

J. von Neumann to G. Birkhoff (Nov. 13., Wednesday [1935])

85

Von Neumann’s dream (around 1934-1935)

classical                      quantum

classical logic                quantum logic

     ||                             ||

Boolean algebra                type II1 v. Neumann lattice

distributive                   modular

     ||                             ||

random event structure         quantum (random) event structure

classical probability          quantum probability

     ||                             ||

normalized measure             dimension function d

     ||                             ||

relative frequency             relative frequency

86

Is (P(N ), d) with a type II1 v. Neumann lattice

REALLY a NON-COMMUTATIVE probability space

whose probabilities can be interpreted as relative frequencies?

NO

because

Proposition (Murray-von Neumann) : d is the restriction to P(N)

of (or can be extended to) a tracial state τ on N

τ is a tracial state iff

τ(XY) = τ(YX) for all X, Y ∈ N

“τ is insensitive for the non-commutativity”

87

Proposition : A linear functional on a von Neumann algebra is

subadditive if and only if it is a trace

Consequently : d (τ) is the ONLY subadditive measure on P(N)

⇒ Only those states satisfy a necessary condition for a relative

frequency interpretation which disregard the non-commutative

(non-classical) character of the “random event structure”

Therefore

if one wants to have a genuinely non-commutative probability space

then the frequency interpretation has to go!

It did: von Neumann gave up the frequency view in 1937:

“This view, the so-called ‘frequency theory of probability’ has been very

brilliantly upheld and expounded by R. von Mises. This view, however,

is not acceptable to us, at least not in the present ‘logical’ context.”

“Quantum logic (strict- and probability logics)” unfinished, unpublished manuscript from 1937
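The contrast between the trace and a non-tracial state in the subadditivity proposition can be seen already for 2 × 2 matrices. The following is a minimal sketch with a hand-picked example, not taken from the lectures: the projections A and B, the density matrices, and the helper names matmul and state are assumptions introduced for illustration.

```python
from fractions import Fraction as F

def matmul(x, y):
    """Product of two matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def state(rho):
    """The state X -> Tr(rho X) given by a density matrix rho."""
    def phi(x):
        prod = matmul(rho, x)
        return sum(prod[i][i] for i in range(len(prod)))
    return phi

ZERO = [[F(0), F(0)], [F(0), F(0)]]
ONE = [[F(1), F(0)], [F(0), F(1)]]
A = [[F(1), F(0)], [F(0), F(0)]]              # projection onto the line through (1, 0)
B = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]]  # projection onto the line through (1, 1)
# Two distinct lines in the plane meet in {0} and span the plane,
# so the meet of A and B is ZERO and their join is ONE (stated, not computed).
MEET, JOIN = ZERO, ONE

tau = state([[F(1, 2), F(0)], [F(0), F(1, 2)]])   # normalized trace
phi = state(A)                                    # pure state of the vector (1, 0)

for name, s in (("tau", tau), ("phi", phi)):
    print(name, s(A) + s(B), s(JOIN) + s(MEET))
```

For the trace both sides equal 1. For the pure state the left side is 3/2 and the right side is 1, so this state violates the condition that a relative frequency interpretation requires.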

88

How to interpret non-commutative probability

if not by relative frequency?

von Neumann: “logical interpretation”:

“However, all quantum mechanical probabilities are defined by inner

products of vectors. Essentially if a state of a system is given by one

vector, the transition probability in another state is the inner product of

the two which is the square of the cosine of the angle between them. In

other words, probability corresponds precisely to introducing the angles

geometrically. Furthermore, there is only one way to introduce it. The

more so because in the quantum mechanical machinery the negation of a

statement, so the negation of a statement which is represented by a

linear set of vectors, corresponds to the orthogonal complement of this

linear space.

And therefore, as soon as you have introduced into the projective

geometry the ordinary machinery of logics, you must have introduced the

concept of orthogonality. This actually is rigorously true and any

89

axiomatic elaboration of the subject bears it out. So in order to have

logics you need in this set a projective geometry with a concept of

orthogonality in it.

In order to have probability all you need is a concept of all angles, I mean

angles other than 90°. Now it is perfectly quite true that in a geometry,

as soon as you can define the right angle, you can define all angles.

Another way to put it is that if you take the case of an orthogonal space,

those mappings of this space on itself, which leave orthogonality intact,

leave all angles intact, in other words, in those systems which can be

used as models of the logical background for quantum theory, it is true

that as soon as all the ordinary concepts of logics are fixed under some

isomorphic transformation, all of probability theory is already fixed.

90

What I now say is not more profound than saying that the concept of a

priori probability in quantum mechanics is uniquely given from the start.

You can derive it by counting states and all the ambiguities which are

attached to it in classical theories have disappeared. This means,

however, that one has a formal mechanism, in which logics and

probability theory arise simultaneously and are derived simultaneously.”

J. von Neumann: “Unsolved problems in mathematics”, talk delivered to the International Congress

of Mathematicians, September 2-9, Amsterdam, 1954

91

What von Neumann says:

Assume: Quantum Logic = P(N ) of a type II1 von Neumann

algebra; U ∈ N unitary element

• U leaves angles between Hilbert space vectors invariant:

〈Uξ, Uη〉 = 〈ξ, η〉

• every U leaves P(N) invariant in the sense:

UAU∗ is a projection if A is (U(·)U∗ = symmetry/isomorphism

of logic)

• τ is a tracial state iff [τ(UXU∗) = τ(X) for all U] ⇒ the set

of all U’s determines the trace uniquely ⇒ the set of U’s

determines the dimension function d = subadditive probability

Probability (=d) is determined by logic

This situation only obtains in the type II1 and IN cases (=finite)
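The three properties of the unitaries can be checked in the simplest finite case, the 2 × 2 matrices (type IN with N = 2). This is only an illustrative sketch: the rotation U, the vectors ξ and η, the comparison state phi, and all helper names are assumptions chosen here, and exact rational arithmetic is used so that the equalities hold without rounding.

```python
from fractions import Fraction as F

def matmul(x, y):
    """Product of two matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(col) for col in zip(*x)]

def apply(m, v):
    """Matrix times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def inner(v, w):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(v, w))

def tau(x):
    """Normalized trace."""
    return sum((x[i][i] for i in range(len(x))), F(0)) / len(x)

def phi(x):
    """Pure state of the vector (1, 0): phi(X) = <e1, X e1>."""
    return x[0][0]

U = [[F(3, 5), F(-4, 5)], [F(4, 5), F(3, 5)]]   # a rotation, hence unitary
A = [[F(1), F(0)], [F(0), F(0)]]                # a projection
UAU = matmul(matmul(U, A), transpose(U))        # U A U* (U is real, so U* is its transpose)

xi, eta = [F(1), F(2)], [F(3), F(-1)]
angles_preserved = inner(apply(U, xi), apply(U, eta)) == inner(xi, eta)
is_projection = matmul(UAU, UAU) == UAU and transpose(UAU) == UAU
print(angles_preserved, is_projection, tau(UAU) == tau(A), phi(UAU) == phi(A))
```

The trace assigns the same value to A and to UAU∗, while the non-tracial state phi does not; this is the sense in which invariance under the symmetries of the logic singles out the trace.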

92

Von Neumann did not regard this “logical interpretation of

probability” as well understood and well articulated:

“I think that it is quite important and will probably shade [shed] a great

deal of new light on logics and probably alter the whole formal structure

of logics considerably, if one succeeds in deriving this system from first

principles, in other words from a suitable set of axioms. All the existing

axiomatisations of this system are unsatisfactory in this sense, that they

bring in quite arbitrarily algebraical laws which are not clearly related to

anything that one believes to be true or that one has observed in

quantum theory to be true. So, while one has very satisfactorily

formalistic foundations of projective geometry of some infinite

generalizations of it, including orthogonality, including angles, none of

them are derived from intuitively plausible first principles in the manner

in which axiomatisations in other areas are.

93

Now I think that at this point lies a very important complex of open

problems, about which one does not know well of how to formulate them

now, but which are likely to give logics and the whole dependent system

of probability a new slam.”

J. von Neumann: “Unsolved problems in mathematics”, talk delivered to the International Congress

of Mathematicians, September 2-9, Amsterdam, 1954

Von Neumann never worked out the indicated “joint theory of

(quantum) logic and (quantum) probability” axiomatically, and not

because he did not try:

94

“Dear Doctor Silsbee,

It is with great regret that I am writing these lines to you, but I simply

cannot help myself. In spite of very serious attempts to write the article

on the “Logics of quantum mechanics” I find it completely impossible to

do it at this time. As you may know, I wrote a paper on this subject

with Garrett Birkhoff in 1936 ([reference]), and I have thought a good

deal on the subject since. My work on continuous geometries, on which I

gave the Amer.Math.Soc. Colloquium lectures in 1937, comes to a

considerable extent from this source. Also a good deal concerning the

relationship between strict and probability logics (upon which I touched

briefly in the Henry Joseph Lecture) and the extension of this

“Propositional calculus” work to “logics with quantifiers” (which I never

so far discussed in public). All these things should be presented as a

connected whole ...

When I offered to give the Henry Joseph Lecture on this subject, I

thought (and I hope that I was not too far wrong in this) that I could

give a reasonable general survey of at least part of the subject in a talk,

95

which might have some interest to the audience. I did not realize the

importance nor the difficulties of reducing this to writing.

I have now learned – after a considerable number of serious but very

unsuccessful efforts – that they are exceedingly great. I must, of course,

accept a good part of the responsibility for my method of writing – I

write rather freely and fast if a subject is “mature” in my mind, but

develop the worst traits of pedantism and inefficiency if I attempt to give

a preliminary account of a subject which I do not have yet in what I can

believe in its final form.

I have tried to live up to my promise and to force myself to write this

article, and spent much more time on it than on many comparable ones

which I wrote with no difficulty at all – and it just didn’t work.”

Von Neumann’s letter to Dr. Silsbee, July 2, 1945

96

Radical solution of the interpretational inconsistency

related to non-commutative probability theory:

Kolmogorovian censorship hypothesis (L. Szabo)

⇕

we never observe “quantum probabilities”

quantum probabilities = classical conditional probabilities

conditioning events: setting up measurements

Non-commutative (quantum) probability theory on this view is just

a mathematical framework that makes it possible to handle very efficiently all

conceivable conditional probabilities one encounters in

experimental situations.

97

To maintain this interpretation in full generality one should be able

to prove that a non-commutative probability space (X,P(N ), φ)

can always be conditionally represented by a classical probability

space (SP(N ), pφ), where the “conditional representation” means

Proposition: for any set Z of mutually incompatible projections in

P(N) there exists a set E ⊆ S × S of pairs (A, a) of events and a

function A ↦ (A, a) ∈ E such that if for all Aλ, Aν ∈ Z we have

$$a_\lambda \cap a_\nu = \emptyset \quad (\text{if } \lambda \neq \nu) \qquad (2)$$

then

$$\phi(A) = \frac{p_\phi(A \cap a)}{p_\phi(a)} \qquad (3)$$

(2) expresses that incompatible observables can never be

simultaneously measured, (3) expresses that quantum

“probabilities” are in fact classical conditional probabilities, the

conditions being the events of setting up measurement.
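For a finite Z the idea of a conditional representation can be made concrete with a toy model. The sketch below is not the construction of Bana or Szabo; it is a hand-made illustration under stated assumptions: a real two dimensional Hilbert space, the state vector psi = (1, 0), three pairwise non-commuting rank-one projections given by the hypothetical directions listed in the code, and measurement setups chosen with equal probability 1/n (an arbitrary choice).

```python
from fractions import Fraction as F
from itertools import combinations

psi = (1, 0)                            # state vector (assumed for illustration)
directions = [(1, 1), (3, 4), (1, 2)]   # lines defining three incompatible projections

def quantum_prob(v, psi):
    """phi(A) = <v, psi>^2 / (<v, v><psi, psi>) for the projection A onto the line through v."""
    dot = sum(a * b for a, b in zip(v, psi))
    return F(dot * dot, sum(a * a for a in v) * sum(b * b for b in psi))

# Classical sample space: pairs (setup k, outcome), with a classical measure p.
n = len(directions)
p = {}
for k, v in enumerate(directions):
    q = quantum_prob(v, psi)
    p[(k, "yes")] = F(1, n) * q         # setup k chosen and outcome "yes"
    p[(k, "no")] = F(1, n) * (1 - q)    # setup k chosen and outcome "no"

def prob(event):
    """Classical probability of a set of sample points."""
    return sum(p[w] for w in event)

setup = [{(k, "yes"), (k, "no")} for k in range(n)]   # the conditioning events a
outcome = [{(k, "yes")} for k in range(n)]            # the classical events A

# Condition (2): different measurement setups are disjoint events.
disjoint = all(not (setup[i] & setup[j]) for i, j in combinations(range(n), 2))
# Condition (3): the quantum probability is a classical conditional probability.
conditional = [prob(outcome[k] & setup[k]) / prob(setup[k]) for k in range(n)]
quantum = [quantum_prob(v, psi) for v in directions]
print(disjoint, conditional == quantum, sum(p.values()))
```

The classical measure is normalized, the setup events are pairwise disjoint, and conditioning on the setup recovers the quantum values 1/2, 9/25 and 1/5. Any other strictly positive weights for the setups would work equally well, since they cancel in the quotient.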

98

Such representation theorems can be proved for finite Z (Bana 1997) and for countably infinite Z (Szabo 2001)

Comments on Kolmogorovian Censorship

• Conditional representation theorems cannot hold for a Z containing a continuum number of mutually incompatible

projections (since there does not exist a measure space with a

σ-additive normalized measure and a continuum number of

mutually disjoint measurable sets each having a non-zero

measure); on the other hand, there does exist a continuum

number of mutually incompatible projections in a von

Neumann lattice ⇒ constraint on the generality of this

interpretation

• Kolmogorovian Censorship = strong instrumentalism, difficult

to accept philosophically

99

Summary of Lecture 3.

• Large (infinite degrees of freedom) quantum systems cannot be

described within the usual Hilbert space quantum

mechanics/probability theory

• The classification theory of von Neumann algebras shows that von

Neumann algebra theory is the non-commutative probability theory

that provides all the typical types of probability measure spaces

• Contrary to von Neumann’s early expectation, the interpretational

difficulties present in Hilbert space probability theory cannot be

solved by passing to the specific type II1 non-commutative

probability theory

• To avoid the difficulties von Neumann gave up the frequency

interpretation of quantum probability

• The Kolmogorovian censorship hypothesis re-interprets quantum

probabilities as classical conditional probabilities, thereby saving the

frequency view; but the interpretation lacks full generality

100
