Three lectures on probability in physics
Miklos Redei
Department of History and Philosophy of Science
Eotvos University, Budapest, Hungary
Prepared for
Philosophy, Probability and the Special Sciences
International Summer School
Konstanz, Germany, July 27-August 4, 2003
(Thanks Gabriella, Stephan and Luc for the invitation!!)
1
Lecture 1
Probability in Classical Statistical Mechanics
Lecture 2
Probability in Hilbert Space Quantum Mechanics
Lecture 3
Probability in Algebraic Quantum Mechanics
2
Preliminary remarks
Each lecture
• will begin with a short review of the main issues to be discussed
in the talk
• will close with a brief summary of the main points
• will formulate one single sentence intended to be remembered
The pdf file of the lectures will be available from my home page
http://hps.elte.hu/~redei
Some of the issues/points to be discussed are very technical but I
have suppressed the technicalities as much as I could
3
Probability in Classical Statistical Mechanics
Structure:
• Elementary mathematical comments
Probability theory in measure theoretic form, distributions, densities
• Informal overview of the basic problem and main idea of CSM
Averaging using Principle of Equal Apriori Probability
• Application of Central Limit Theorem in CSM
Physical assumptions making Central Limit Theorem applicable,
Central Limit Theorem, universal form of state density as
consequence of Central Limit Theorem
• Ergodic theory as an attempt to justify the Principle of Equal
Apriori Probability
The ergodic problem, Birkhoff’s theorem, metric transitivity
4
The one sentence to be remembered
Standard Equilibrium Classical Statistical Mechanics is based on
the difficult-to-justify Principle of Equal Apriori Probability and on
the physical assumptions that make possible the application of
Central Limit Theorem to such many body systems
5
Mathematical comments
$(X, S, \mu)$: classical measure space
$X$: set
$S$: Boolean algebra of subsets of $X$
$\mu: S \to \mathbb{R}_+ \cup \{\infty\}$: σ-additive measure (e.g. the counting measure)
$L^1(X, \mu)$: integrable functions
$p_g(A) = \int \chi_A\, g\, d\mu$: probability measure given by the density $g \in L^1(X, \mu)$ w.r.t. the counting measure $\mu$
$(X, S, p_g)$: probability measure space
6
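$f: X \to \mathbb{R}$: random variable, $(S, B(\mathbb{R}))$-measurable
$p \circ f^{-1}: B(\mathbb{R}) \to \mathbb{R}$: distribution of $f$
$F(r) = p(\{x \in X : f(x) \le r\})$: distribution function of $f$
$F(x) = \int_{-\infty}^{x} \varphi(t)\, dt$: $\varphi$ = density of the distribution function
$\hat{\varphi}(t) = \int e^{ixt} \varphi(x)\, dx$: characteristic function of the distribution function $F$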
7
Proposition: If the $f_i$ ($i = 1, \dots, N$) are independent random variables then
• the density $\varphi$ of the distribution function of their sum is the
convolution of the densities $\varphi_i$ of their distribution functions:
$$\varphi(x) = \int \varphi_N\Big(x - \sum_{i=1}^{N-1} x_i\Big) \prod_{i=1}^{N-1} \varphi_i(x_i)\, dx_1\, dx_2 \dots dx_{N-1}$$
• the characteristic function $\hat{\varphi}$ of the distribution function of
their sum is the product of the characteristic functions $\hat{\varphi}_i$ of their
distribution functions: $\hat{\varphi}(t) = \prod_i \hat{\varphi}_i(t)$
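The proposition can be checked in a discrete setting. The following is a minimal sketch, assuming two fair dice as the independent random variables (an illustrative choice, not taken from the slides); the helper names `convolve` and `characteristic` are hypothetical. Probability mass functions stand in for the densities, and sums replace the integrals.

```python
import cmath
from fractions import Fraction
from itertools import product


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q.

    p and q are dicts mapping a value to its probability.
    """
    out = {}
    for (x, px), (y, qy) in product(p.items(), q.items()):
        out[x + y] = out.get(x + y, 0) + px * qy
    return out


def characteristic(p, t):
    """Characteristic function sum_x p(x) exp(i t x) of a discrete distribution."""
    return sum(float(px) * cmath.exp(1j * t * x) for x, px in p.items())


# Assumed example: one fair die, and the sum of two independent fair dice.
die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = convolve(die, die)
```

For any real `t`, `characteristic(two_dice, t)` should agree with `characteristic(die, t) ** 2` up to rounding, which is the product rule of the second bullet.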
8
Main problem of CSM
Macroscopically large system S in “equilibrium”
F physical quantity of large system, constant in time
si (i = 1, . . . N) identical subsystems of S such that
– si’s interact via known law
– si’s states change in time according to known law
– N is “large” ($\sim 10^{23}$)
How can F be viewed/expressed/computed as the average
of a (suitable) function f that depends on the states of the si’s?
Example:
Ideal gas
9
Sketch of Standard Solution of Main Problem of CSM
One macroscopic equilibrium state ↔ many microstates
Ω(E1, E2): # of (different) microstates between energies E1, E2
$$F = \langle f \rangle = \int f\, dp$$
f = appropriate phase function representing F
p = probability measure on phase space of large system expressing
that all microstates compatible with a given macrostate are equally
likely: the probability of a set of microstates is proportional to the
size (measure) of the set:
p(microstates between energies E1, E2) ≈ Ω(E1, E2)
Principle of Equal Apriori Probability
Ω typically has a density ω:
$$\Omega(E_1, E_2) = \int_{E_1}^{E_2} \omega(x)\, dx$$
10
Principle of Equal Apriori Probability in the case when
macrostate = energy of the system is fixed (F = E = const.):
The microstates compatible with F = E = const.
lie on the constant energy (hyper)surface ΓE ⊂ X
the average of a phase function f defined on $\Gamma_E$:
$$\langle f \rangle = \int f\, d\mu$$
where µ is the microcanonical probability measure:
$$\mu(A) = \frac{1}{v(\Gamma_E)}\, v(A)$$
with v being the surface measure on $\Gamma_E$
11
By the Principle of Equal Apriori Probability probabilities are
proportional to the “number” (measure) of states
⇓
The state density ω is a crucial quantity:
ω determines the probabilistic behavior of the large system
⇓
Statistical mechanics is possible as a general theory of macroscopic
systems consisting of subsystems to the extent that the density of states
of macroscopic systems has universal characteristics
12
Density of states does have universal features
Universal features are consequences of the following assumptions:
• The measure space describing a macroscopic system is the
product of measure spaces describing the subsystems
• The energy of the large system is the sum of energies of
subsystems
• The number of subsystems is very large (N →∞)
The force of these assumptions is that they make it possible to
apply the Central Limit Theorem to derive universal characteristics
of the state density function in the N →∞ limit
13
Central limit theorem:
Let $(X, S, \mu)$ be a probability space and $f_i: X \to \mathbb{R}$ be independent
random variables such that
• $M_\mu(f_i) = M_i$, $M_\mu(|f_i - M_i|^2) = D_i^2$, $M_\mu(|f_i - M_i|^3) = H_i^3$
exist
• $\lim_{n\to\infty} \dfrac{\sqrt[3]{\sum_i^n H_i^3}}{\sqrt{\sum_i^n D_i^2}} = 0$
Let $g_n = \frac{f_1 + f_2 + \dots + f_n}{n}$ be the average and $g_n^* = \frac{g_n - M(g_n)}{D(g_n)}$ be the
standardized average. If $G_n$ is the distribution function of $g_n^*$, then
$$\lim_{n\to\infty} G_n(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy$$
The density of the distribution function of the (standardized)
average of n independent random variables tends to the Gaussian
as n → ∞
The point is that the proposition is true for any µ!
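A small simulation illustrates the statement. This is a sketch under stated assumptions: the $f_i$ are taken to be independent and uniform on [0, 1] (an illustrative choice), and the function names are hypothetical. Note that standardizing the average $g_n$ gives the same variable as standardizing the sum, so the code works with the sum.

```python
import math
import random


def standardized_average_cdf(n, x, trials=20000, seed=0):
    """Empirical value of G_n(x) for n independent uniform(0, 1) variables."""
    rng = random.Random(seed)
    mean, variance = 0.5, 1.0 / 12.0  # mean and variance of uniform(0, 1)
    hits = 0
    for _ in range(trials):
        total = sum(rng.random() for _ in range(n))
        # Standardized sum; identical to the standardized average g_n^*.
        z = (total - n * mean) / math.sqrt(n * variance)
        if z <= x:
            hits += 1
    return hits / trials


def gaussian_cdf(x):
    """Standard normal distribution function, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Comparing `standardized_average_cdf(30, 1.0)` with `gaussian_cdf(1.0)` shows agreement up to sampling noise; replacing the uniform distribution by another one with finite third moments should not change the limit, which is the point made on the slide.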
14
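$X = \times_i^N X_i$: phase space of the large system ($X_i$: of a subsystem)
$v = \times_i^N v_i$: volume measure on $B(X)$ ($v_i$: on $B(X_i)$)
$f: X \to \mathbb{R}$: physical quantity of the large system ($f_i: X_i \to \mathbb{R}$: of a subsystem), e.g. the energy $f^E$, $f^E_i$
$\Omega(x) \equiv v(\{s \in X : f^E(s) \le x\})$: measure (volume) of the states of the large system in which the value of the energy is less than $x$
$\Omega_i(x) \equiv v_i(\{s \in X_i : f^E_i(s) \le x\})$: the same for the small system
$\omega(x) \equiv \frac{d}{dx} \Omega(x)$: density of states, structure function, measure of the energy hypersurface $\Gamma_x$
(circle: area $\Omega(x) = x^2 \pi$, circumference $\omega(x) = 2x\pi$)
$\langle f \rangle = \frac{1}{\omega(x)} \frac{d}{dx} \int_{(f^E)^{-1}([0,x])} f\, dv$: expectation value of $f$
$\langle f \rangle = \int f|_{\Gamma_x}\, d\mu$: on the energy surface $\Gamma_x$
$\mu(A) = \frac{1}{\Gamma(x)} \int_A \frac{dv_{N-1}}{|\mathrm{grad}\, f^E|_{f^E(y) = x}}$: $\mu$ = microcanonical measure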
15
$(X, B(X), v) = \times_i^N (X_i, B(X_i), v_i)$
$v = \times_i^N v_i$, $f^E = \sum_i f^E_i$, and the $f^E_i$ are independent in $v$
⇓
$$\omega(x) = \int \omega_N\Big(x - \sum_{i=1}^{N-1} x_i\Big) \Big[\prod_{i=1}^{N-1} \omega_i(x_i)\Big]\, dx_1 \dots dx_{N-1}$$
The state density of the large system is the convolution of the state
densities of the component systems
⇓
If the densities $\omega_i$ were probability densities (they are not, because
they are not normalized), then the state density $\omega$ of the large
system would be the density of the distribution function of the sum
$f^E = \sum_i f^E_i$ of the independent random variables $f^E_i$ representing
the energy of the subsystems, and we could apply the Central Limit
Theorem to approximate $\omega$. Yet, with a formally easy
normalization trick one can do this.
16
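The normalization trick:
Define $\Phi(\alpha)$, $u^\alpha$ for the large system and
$\Phi_i(\alpha)$, $u^\alpha_i$ for the subsystems by
$$\Phi(\alpha) := \int e^{-\alpha x} \omega(x)\, dx \qquad (\alpha \in \mathbb{R}_+ \text{ parameter})$$
$$u^\alpha(x) := \begin{cases} \frac{1}{\Phi(\alpha)} e^{-\alpha x} \omega(x) & x \ge 0 \\ 0 & x < 0 \end{cases}$$
$$\Phi_i(\alpha) := \int e^{-\alpha x} \omega_i(x)\, dx \qquad (\alpha \in \mathbb{R}_+ \text{ parameter})$$
$$u^\alpha_i(x) := \begin{cases} \frac{1}{\Phi_i(\alpha)} e^{-\alpha x} \omega_i(x) & x \ge 0 \\ 0 & x < 0 \end{cases}$$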
17
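Proposition:
• $u^\alpha(x) \ge 0$, $u^\alpha_i(x) \ge 0$ and $\int u^\alpha = \int u^\alpha_i = 1$ (for all i)
(i.e. $u^\alpha$, $u^\alpha_i$ are probability densities)
• the $f^E_i$ are independent w.r.t. the probability measure defined by
the renormalized densities
• The convolution rule for ω entails
the product rule for Φ(α):
$$\Phi(\alpha) = \prod_i^N \Phi_i(\alpha)$$
and the convolution rule for $u^\alpha$:
$$u^\alpha(x) = \int u^\alpha_N\Big(x - \sum_{i=1}^{N-1} x_i\Big) \Big[\prod_{i=1}^{N-1} u^\alpha_i(x_i)\Big]\, dx_1 \dots dx_{N-1}$$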
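The product rule can be verified numerically on a toy case. The sketch below assumes two hypothetical subsystems with state densities ω₁(x) = 1 and ω₂(x) = x for x ≥ 0 (chosen only because their convolution, x²/2, is known in closed form); `laplace_transform` is an illustrative helper using a plain midpoint rule on a truncated interval.

```python
import math


def laplace_transform(omega, alpha, upper=60.0, steps=60000):
    """Midpoint-rule approximation of the integral of exp(-alpha x) omega(x) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += math.exp(-alpha * x) * omega(x)
    return total * h


# Assumed toy state densities of two subsystems (not from the slides).
def omega_1(x):
    return 1.0


def omega_2(x):
    return x


def omega(x):
    # Convolution of omega_1 and omega_2: integral of (x - y) dy over [0, x].
    return 0.5 * x * x


alpha = 1.5
phi_1 = laplace_transform(omega_1, alpha)
phi_2 = laplace_transform(omega_2, alpha)
phi = laplace_transform(omega, alpha)


def u_alpha(x):
    """Normalized density u^alpha of the composite system."""
    return math.exp(-alpha * x) * omega(x) / phi if x >= 0 else 0.0
```

Here `phi` should agree with `phi_1 * phi_2` (both approximate 1/α³), and `u_alpha` is non-negative and integrates to one by construction, as the first bullet states.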
18
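The Proposition, and in particular the convolution rule for $u^\alpha$, shows
that $u^\alpha$ behaves like the density of the distribution function of the
sum of N independent random variables having the densities $u^\alpha_i$ for
their distribution functions.
⇓ (Central Limit Theorem)
$$\lim_{N\to\infty} u^\alpha(x) = \frac{1}{\sqrt{2\pi D_{u^\alpha}}} \exp\Big(-\frac{(x - M_{u^\alpha})^2}{2 D_{u^\alpha}}\Big)$$
(reading $M_{u^\alpha}$ and $D_{u^\alpha}$ as the mean and the variance of $u^\alpha$)
⇓ (since $\omega(x) = \Phi(\alpha) e^{\alpha x} u^\alpha(x)$)
$$\lim_{N\to\infty} \omega(x) = \Phi(\alpha)\, e^{\alpha x}\, \frac{1}{\sqrt{2\pi D_{u^\alpha}}} \exp\Big(-\frac{(x - M_{u^\alpha})^2}{2 D_{u^\alpha}}\Big)$$
In the limit N → ∞ the density of states has a universal form
irrespective of the precise density of states $\omega_i$ of the constituent
subsystems!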
19
Example
ideal gas of N identical classical particles
enclosed in volume V , with particle mass m
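Exact density function $\omega^{IG}$ and approximate density function $\tilde{\omega}^{IG}$:
$$\omega^{IG}(x) = \frac{V^N (2\pi)^{3N/2}}{\Gamma[(3N/2) + 1]}\, m^{3N/2}\, \frac{3N}{2}\, x^{3N/2 - 1}$$
$$\tilde{\omega}^{IG}(x) = \frac{V^N (2\pi)^{3N/2}}{(3N/2)^{3N/2}\, e^{-3N/2}\, [2\pi (3N/2)]^{1/2}}\, m^{3N/2}\, \frac{3N}{2}\, x^{3N/2 - 1}$$
Difference between $\omega^{IG}(x)$ and $\tilde{\omega}^{IG}(x)$: $\Gamma[(3N/2) + 1]$ is replaced by its
approximation by Stirling’s formula $N! \approx \sqrt{2\pi N}\, N^N e^{-N}$ ($N \gg 1$),
where the gamma function is $\Gamma(t) = \int_0^\infty e^{-x} x^{t-1}\, dx$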
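How good the Stirling replacement is can be checked directly on logarithms, which avoids overflow for large N. A minimal sketch using only the standard library; the function names are illustrative.

```python
import math


def log_gamma_exact(n_particles):
    """log Gamma(3N/2 + 1), computed with the standard-library lgamma."""
    return math.lgamma(1.5 * n_particles + 1.0)


def log_gamma_stirling(n_particles):
    """Logarithm of Stirling's approximation t^t e^(-t) sqrt(2 pi t) with t = 3N/2."""
    t = 1.5 * n_particles
    return t * math.log(t) - t + 0.5 * math.log(2.0 * math.pi * t)


def stirling_log_error(n_particles):
    """Absolute difference between exact and approximate log Gamma(3N/2 + 1)."""
    return abs(log_gamma_exact(n_particles) - log_gamma_stirling(n_particles))
```

The absolute error of the logarithm shrinks roughly like 1/(12t) with t = 3N/2, so for N of the order of 10²³ the two densities are indistinguishable for practical purposes.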
20
Remark (Gibbs paradox)
$\omega^{IG}$ and $\tilde{\omega}^{IG}$ are the densities of the distribution of energy of an ideal
gas of N particles. The Principle of Equal Apriori Probability
states that
p(microstates between energies E1, E2) ≈ Ω(E1, E2)
where Ω(E1, E2) is the number of different microstates. Computing
averages with $\Omega(E_1, E_2) = \int_{E_1}^{E_2} \omega^{IG}$ (or with $\Omega(E_1, E_2) = \int_{E_1}^{E_2} \tilde{\omega}^{IG}$) one
would get into contradiction with thermodynamics (the entropy
would not be additive = Gibbs paradox). One avoids the paradox
by defining
$$\omega^{IG}_{OK} := \frac{1}{N!}\, \omega^{IG} \qquad \tilde{\omega}^{IG}_{OK} := \frac{1}{N!}\, \tilde{\omega}^{IG}$$
Interpretation: the particles must not be considered
distinguishable when counting the number of microstates – not
even if they are considered distinguishable in classical mechanics.
21
Immediate, most important
corollary of universal form of density function
Boltzmann’s law:
The density of the probability distribution of a small subsystem of
a large system (= heat bath) is given by
$$\rho_{sm}(x) = \mathrm{const.}\; e^{-\beta f^E_{sm}(x)}$$
β = (inverse) temperature, $Z = \mathrm{const.} \int_{X_{sm}} e^{-\beta f^E_{sm}(x)}\, dx$ partition
function, $f^E_{sm}: X_{sm} \to \mathbb{R}$ energy function of the small system
$$\langle g \rangle = \frac{1}{\mathrm{const.}\, Z} \int_{X_{sm}} g(x)\, e^{-\beta f^E_{sm}(x)}\, dx$$
expectation value of the quantity (phase function) g of the small system
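A Boltzmann average can be computed numerically once an energy function is fixed. The sketch below assumes a one-dimensional phase space and the hypothetical energy function f(x) = x²/2, neither of which comes from the slides. It normalizes by the integral of the Boltzmann weight directly, so the unspecified constants play no role.

```python
import math


def boltzmann_average(g, energy, beta, lo=-20.0, hi=20.0, steps=40000):
    """Expectation of g under the density proportional to exp(-beta * energy(x)).

    Uses a midpoint rule on [lo, hi]; the interval is an assumed truncation
    of the real line, adequate when the weight is negligible outside it.
    """
    h = (hi - lo) / steps
    points = [lo + (k + 0.5) * h for k in range(steps)]
    weights = [math.exp(-beta * energy(x)) for x in points]
    partition = sum(weights) * h
    return sum(g(x) * w for x, w in zip(points, weights)) * h / partition


def quadratic_energy(x):
    # Assumed toy energy function of the small system.
    return 0.5 * x * x


mean_energy = boltzmann_average(quadratic_energy, quadratic_energy, beta=2.0)
```

For this quadratic energy the exact mean energy is 1/(2β), so with β = 2 the numerical value should be close to 0.25.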
22
[Boltzmann’s law + partition function]
⇓
All relations of statistical mechanics
E.g.:
PV = NkT
state equation of classical ideal gas
of N particles in container of volume V
23
Provocatively formulated conclusion:
One can inflate bicycle tires because Central Limit Theorem is true
24
OK, let’s be more modest (and more precise):
One can inflate bicycle tires because Central Limit Theorem is true
AND
is applicable to physical systems
But why is it applicable?
Because the conditions of its applicability hold
in (some) physical systems
25
Conditions ensuring applicability of Central Limit Theorem were:
• The measure space describing a macroscopic system is the
product of measure spaces describing the subsystems
product assumption (OK )
• The energy of the large system is the sum of energies of
subsystems
sum assumption (OK – more or less )
• The number of subsystems is very large (N →∞)
size assumption (OK )
• Principle of Equal Apriori Probability
sounds metaphysical
26
Can one justify the Principle of Equal Apriori Probability ?
Possible attitudes:
• The success of Principle justifies it, further justification is not
needed/possible
success:
(Principle + Assumptions) ⇒ empirically correct predictions
• The Principle should/can be established by linking it to the
properties of the dynamic of large system (determined by the
dynamic of the subsystems)
ergodic-type theorems
27
Main idea of justifying microcanonical measure by ergodic theory :
Separation of
microscopic (= short) and macroscopic (=long)
time scales
Macroscopic measurements yielding macroscopic quantity F
take place on macroscopic time scale
⇓
F = long time average of the microscopically evolving phase function f:
using microcanonical averaging would be justified if
phase average of f = F = long time average of f
Can the equality of phase average and time average be proved?
⇕
The ergodic problem
28
Classical results on the ergodic problem:
Proposition (Birkhoff’s theorem): Given a dynamical system
$\langle X, S, \mu, T_t \rangle$, the limit
$$f^*(x) := \lim_{\tau\to\infty} \frac{1}{2\tau} \int_{-\tau}^{\tau} f(T_t x)\, dt$$
exists for µ-almost every x ∈ X and for all $f \in L^1(X, \mu)$
Definition: The dynamical system is metrically transitive
(synonym: ergodic) if the T-invariant sets have measure 0 or 1, i.e.
if [(A ⊆ X and $T_t[A] \subseteq A$) imply µ(A) = 0 or µ(A) = 1]
29
Proposition: If the dynamical system is metrically transitive then
$$f^*(x) = \lim_{\tau\to\infty} \frac{1}{2\tau} \int_{-\tau}^{\tau} f(T_t x)\, dt = \int f\, d\mu$$
for µ-almost every x ∈ X
⇓
The time and phase averages are equal (almost everywhere) if the
dynamical system is metrically transitive
⇓
If the dynamical system ($\Gamma_E$, B($\Gamma_E$), µ) on the constant energy
hypersurface $\Gamma_E$ defined by the dynamics of the subsystems in
CSM is metrically transitive (ergodic) then the
Principle of Equal Apriori Probability is justified
and the microcanonical probabilities can be given a
dynamical interpretation
30
Dynamical interpretation of microcanonical probabilities
of ergodic systems:
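$$\mu(A) = \int \chi_A\, d\mu = \lim_{\tau\to\infty} \frac{1}{2\tau} \int_{-\tau}^{\tau} \chi_A(T_t x)\, dt$$
$$\chi_A(T_t x) = \begin{cases} 1 & \text{if } T_t x \in A \\ 0 & \text{if } T_t x \notin A \end{cases}$$
⇓
µ(A) = average fraction of time the phase point of the system can be found in
set A during its time evolution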
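The dynamical reading of µ(A) can be illustrated on a standard textbook example rather than on a system from CSM: the rotation of the unit circle by an irrational angle, which is ergodic with respect to Lebesgue measure. The sketch below is a discrete-time analogue (steps instead of a continuous flow, forward time only), and the function name is hypothetical.

```python
import math


def time_fraction_in_set(x0, a, b, steps=200000, alpha=math.sqrt(2.0) - 1.0):
    """Fraction of time steps the orbit x -> (x + alpha) mod 1 spends in [a, b).

    alpha is assumed irrational (up to floating-point representation), which
    makes the rotation ergodic for Lebesgue measure on [0, 1).
    """
    x = x0
    hits = 0
    for _ in range(steps):
        if a <= x < b:
            hits += 1
        x = (x + alpha) % 1.0
    return hits / steps
```

For the set A = [0.2, 0.5) the returned fraction should be close to its Lebesgue measure 0.3, whatever the starting point; with a rational `alpha` the orbit would be periodic and the equality would generally fail.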
31
Are the dynamical systems (ΓE ,B(ΓE), µ)
metrically transitive (ergodic)?
More generally:
On what conditions is a dynamical system ergodic?
Extremely difficult problem , interesting in its own right, important
in many branches of mathematics (not only in CSM)
32
The status of ergodicity of dynamical systems occurring in CSM is
(to the best of my knowledge) still to a large extent an open
problem
• It has been claimed (Y. Sinai) that the system of hard spheres in a box with
elastic collisions as the only interaction is ergodic, but the full
proof of this claim has never been published (Wightman 1985)
and the claim is considered as (yet) unproven
• Results on the general theory of dynamical systems indicate
that ergodicity is not a property dynamical systems typically
possess
33
Even if ergodicity could be established for dynamical systems in
physics, the explanation of the Principle of Equal Apriori
Probability would only be an explanation
up to measure zero set
⇓
we seem to be forced to assume that
the system is not in fact in the measure zero set
“Today’s formulation of the ergodic theorem is that, except for a set of
measure 0, the time average exists and = phase average. All the
hypothesis of ‘disorderliness’ are contained in the assumption that we are
not in that measure zero set.”
J. von Neumann to R. Ortvay (February 2, 1939)
“Revenge of the measure zero set”
34
Attempts have been made to weaken the notion of ergodicity:
• Requiring ε-ergodicity only
The dynamical system is ε-ergodic (ε > 0) if it is metrically
transitive on an invariant set of measure (1 − ε): there is a set
X′ ⊆ X of measure (1 − ε) such that if A ⊆ X′ and $T_t[A] \subseteq A$
then µ(A) = 0 or µ(A) = 1 − ε (Vranas, 1998)
The status of epsilon ergodicity is unclear
• Requiring equality of phase and time averages for the special
phase functions that are sums of phase functions of subsystems
(Khinchin, 1949)
35
Khinchin’s weakening of ergodicity
(using asymptotic form of energy density/microcanonical measure)
Proposition
$$\mu\Big(\Big\{ s \in X : \Big| \frac{\langle f(s) \rangle_T - \langle f \rangle_\mu}{\langle f \rangle_\mu} \Big| \ge K_1 N^{-1/4} \Big\}\Big) \le K_2 N^{-1/4}$$
µ = microcanonical measure
$K_1, K_2 > 0$ constants
$f = \sum_{i=1}^{N} f_i$ phase function = sum of phase functions of subsystems
The set of points on the constant energy hypersurface where the time average of a sum
function differs from its microcanonical average by more than some
(relative) amount that goes to zero as N → ∞ has a measure that also goes
to zero as N → ∞.
Revenge of the measure zero set ⇒ Terror of the small measure set
36
Quotations
“This fundamental postulate [Principle of Equal Apriori Probability ] is
eminently reasonable and certainly does not contradict any of the laws of
mechanics. Whether the postulate is actually valid can, of course, only
be decided by making theoretical predictions based on it and by checking
whether these predictions are confirmed by experimental observations. A
large body of calculations based on this postulate have indeed yielded
results in very good agreement with observations. The validity of this
postulate can therefore be accepted with great confidence as the basis of
our theory.”
F. Reif: Fundamentals of statistical and thermal physics (McGraw-Hill,
1965) p. 55
37
“... in all expositions of the statistical mechanics, this phase average is
taken as a theoretical interpretation of any physical quantity. In doing so
either no arguments at all are given in favor of such a choice, or a special
hypothesis is constructed in order to justify this choice, or, finally,
various reasons are cited in favor of such an interpretation, indicating at
the same time that these reasons are not logically obligatory and that
the interpretation was generally accepted in view of the successful results
to which the theory based on this interpretation leads.”
A.I. Khinchin: Mathematical Foundations of Statistical Mechanics
(Dover Publications, 1948) p. 46
38
“...the task of a mathematical justification of the statistical mechanics
reduces essentially to two problems. The first of these two problems, to
investigate as exhaustively as possible, under what conditions and to
what degree the time averages of phase functions, which, as we have
seen, appear as a natural interpretation of experimental measurements,
can be replaced by the phase averages of the same functions. ... The
second problem ... is to create a general method for approximate
computation of phase averages or surfaces of constant energy.”
A.I. Khinchin: Mathematical Foundations of Statistical Mechanics
(Dover Publications, 1948) p. 47
39
“We emphasize once more that this distribution [the microcanonical
distribution ] is not the genuine statistical distribution of a closed
system. If it were, then this would be equivalent to the claim that,
during a sufficiently long time, the phase trajectory of a closed system
would come arbitrarily close to any point of the manifold [of the constant
energy surface ]. This claim (under the name ergodic hypothesis ) is
however false in general.”
L.D. Landau, E.M. Lifschitz: Lehrbuch der Theoretischen Physik V.
Statistische Physik (Akademie Verlag, 1971) p. 13 [my translation ]
40
“The physical importance of ergodicity is that it can be used to justify
the use of the microcanonical ensemble for calculating equilibrium values
and fluctuations. Suppose f is some macroscopic observable and the
system is started at time zero from a dynamical state x, for which f(x)
has a value that is very far from its equilibrium value. As time proceeds,
we expect that the current value of f , which is f(Ttx), will approach and
mostly stay very close to an equilibrium value with only very rare large
fluctuations away from this value. This equilibrium value should
therefore be equal to the time average because the initial period during
which equilibrium is established contributes only negligibly to the
formula defining f∗(x). The [ergodic ] theorem tells us that this
equilibrium value is almost equal to 〈f〉, the average value of f in the
microcanonical ensemble, provided the system is ergodic.”
J. L. Lebowitz and O. Penrose: Modern ergodic theory, Physics Today, vol. 26 (1973)
41
“Strict ergodicity has turned out to be surprisingly difficult to prove
even for relatively simple dynamical systems. Contrary to what is
sometimes asserted, the system of N elastic hard balls moving in a
cubical box with hard reflecting walls has not yet been proven to be
ergodic for arbitrary N – only for N ≤ 4 [references]. Nevertheless it
seems that mathematicians are coming increasingly closer to a proof
[references], and computational evidence suggests that this system is
indeed ergodic [references].”
“... epsilon-ergodicity needs to be investigated in more detail. It would
be nice to have theoretical results showing both that ergodicity is
approached when the number of degrees of freedom increases and that ε
decreases faster than exponentially with the number of degrees of
freedom.”
P. Vranas: Epsilon-ergodicity and the success of equilibrium statistical mechanics Philosophy of
Science vol. 65 (1998)
42
Summary
• Equilibrium CSM is possible as a general theory because the
assumptions (product, sum, size, Principle of Equal Apriori
Probability ) made on equilibrium systems make it possible to
apply the Central Limit Theorem, which implies that the
energy density has a universal form
• The Principle of Equal Apriori probability has a status
different from the other assumptions, and remains unmotivated
physically
• Attempts to justify the Principle of Equal Apriori Probability
via ergodic-type theorems are inconclusive, with unsolved
difficult problems
• “Revenge” of measure zero sets remains a problem even if
ergodicity obtains
43
Lecture 2
Structure
• Hilbert space quantum mechanics as non-commutative
probability theory
analogy between classical and quantum concepts, Hilbert lattice,
quantum state, Gleason’s theorem
• Interpretational difficulties related to Hilbert space QM
Kochen-Specker theorem, violation of subadditivity by quantum
states
• Von Neumann’s attitude towards Hilbert space QM
giving up Hilbert space quantum theory, preferring von Neumann
algebras
44
The single sentence one should take away from lecture 2
Hilbert space quantum mechanics is the non-commutative analogue
of classical, Kolmogorovian probability theory but Hilbert space
probability theory cannot be interpreted as the analogy suggests
45
Hilbert space Quantum Mechanics
||
non-commutative probability theory
classical probability theory ⇒ quantum probability theory
replace
Boolean algebra S by Hilbert lattice P(H)
probability measure p by quantum state φ on P(H)
46
(B, ∨, ∧, ⊥) is a Boolean algebra if it is an orthocomplemented
distributive lattice with respect to the lattice operations ∨, ∧ and
the orthocomplementation A ↦ A⊥
Distributivity:
A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C) for all A, B, C
A Boolean algebra is always isomorphic with a Boolean algebra of
subsets of a set X
with respect to the set theoretical operations
A ∧ B = A ∩ B
A ∨ B = A ∪ B
A⊥ = X \ A
47
Hilbert lattice
(P(H),∨,∧,⊥)
P(H) = set of all closed linear subspaces of a Hilbert space H
||
P(H) = set of all projections on a Hilbert space H
Lattice operations ∨, ∧, ⊥ defined by:
A ∧ B = A ∩ B
A ∨ B = closure of A + B, where A + B = {ξ + η : ξ ∈ A, η ∈ B}
A⊥ = {ξ ∈ H : ⟨ξ, η⟩ = 0 ∀η ∈ A}
48
Crucial difference between Boolean algebra and Hilbert lattice
(between classical physics and quantum physics):
A Hilbert lattice is not distributive, only orthomodular:
Orthomodularity:
If A ≤ B and A⊥ ≤ C then A ∨ (B ∧ C) = (A ∨B) ∧ (A ∨ C)
Failure of distributivity
⇕
non-commutativity of the product of projections
Proposition : If A, B are projections from a distributive sublattice
of P(H) then AB=BA
Hence the terminology:
non-commutative = non-distributive = non-classical = quantum
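The link between distributivity and commutativity can be seen already in a two-dimensional real Hilbert space. For three distinct lines A, B, C through the origin, B ∨ C is the whole plane, so A ∧ (B ∨ C) = A, while A ∧ B and A ∧ C are both the zero subspace, so (A ∧ B) ∨ (A ∧ C) = 0: distributivity fails. The sketch below, with hypothetical helper names and angles chosen only for illustration, shows the matching algebraic fact that projections onto non-orthogonal lines do not commute.

```python
import math


def projection(theta):
    """2x2 matrix of the orthogonal projection onto the line at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]


def matmul(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutator_size(p, q):
    """Largest absolute entry of PQ - QP."""
    pq, qp = matmul(p, q), matmul(q, p)
    return max(abs(pq[i][j] - qp[i][j]) for i in range(2) for j in range(2))


A = projection(0.0)            # horizontal line
B = projection(math.pi / 4.0)  # diagonal line, not orthogonal to A
C = projection(math.pi / 2.0)  # vertical line, orthogonal to A
```

`commutator_size(A, C)` vanishes up to rounding because the two lines are orthogonal, whereas `commutator_size(A, B)` is 0.5.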
49
Definition: φ: P(H) → [0, 1] is a quantum probability measure
(or quantum state) if
(1) φ(0) = 0, φ(I) = 1
(2) $\phi(\vee_i A_i) = \sum_i \phi(A_i)$ if $A_i \perp A_j$ (⇔ $A_i \le A_j^\perp$) for i ≠ j
A quantum state is a σ-additive map from P(H) into [0, 1]
φ is a complete analogue of a classical probability measure
50
Theorem (Gleason): If φ is a quantum state on P(H) and dim(H) ≥ 3, then there exists a
positive, trace class operator ρ with Tr(ρ) = 1 such that
$$\phi(A) = \mathrm{Tr}(\rho A) = \sum_i \langle \xi_i, \rho A \xi_i \rangle \qquad (1)$$
and conversely, if ρ is a positive, trace class operator such that
Tr(ρ) = 1 then (1) defines a quantum state φ
Tr is defined by
$$\mathrm{Tr}(Q) = \sum_i \langle \xi_i, Q \xi_i \rangle, \qquad \{\xi_i\} \text{ an orthonormal basis in } H$$
ρ is the analogue of the probability density function
Tr is the analogue of the counting measure
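The converse direction of the theorem, that a density operator defines a state through formula (1), is easy to check by hand in small dimension. The sketch below assumes a real two-dimensional space and an arbitrarily chosen density matrix `rho` (both illustrative; Gleason's theorem itself needs dimension at least three for the direct statement), and the helper names are hypothetical.

```python
import math


def projection(theta):
    """2x2 matrix of the orthogonal projection onto the line at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]


def trace_of_product(rho, a):
    """Tr(rho a) for 2x2 matrices given as nested lists."""
    return sum(rho[i][k] * a[k][i] for i in range(2) for k in range(2))


# Assumed density matrix: symmetric, positive (determinant 0.17 > 0), trace 1.
rho = [[0.7, 0.2], [0.2, 0.3]]
identity = [[1.0, 0.0], [0.0, 1.0]]

A = projection(0.4)
A_perp = [[identity[i][j] - A[i][j] for j in range(2)] for i in range(2)]

phi_A = trace_of_product(rho, A)
phi_A_perp = trace_of_product(rho, A_perp)
```

The values satisfy φ(I) = 1 and φ(A) + φ(A⊥) = 1, which is condition (2) of the definition of a quantum state for the orthogonal pair A, A⊥.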
51
Gleason’s theorem shows: φ can be extended
from projections P(H) to bounded operators B(H)
Analogy: classical measure can be extended
from characteristic functions to integrable functions
The extension process is called: theory of integration
Conclusion: Gleason’s theorem is a theorem in non-commutative
integration
Recovering the standard notion of (vector) state as used in physics:
If ρ = Pξ = projection to ξ ∈ H (state vector) then
$$\mathrm{Tr}(\rho Q) = \sum_i \langle \xi_i, Q P_\xi \xi_i \rangle = \langle \xi, Q P_\xi \xi \rangle = \langle \xi, Q \xi \rangle$$
〈ξ, Qξ〉 = the usual expectation value of observable Q in state ξ
52
There is a very tight formal correspondence between concepts in
classical probability theory and quantum probability theory, an
analogy that goes beyond the key correspondence
p ←→ φ
µ counting measure ←→ Tr
probability density ←→ ρ
The correspondence is summarized in the next 3 slides
53
Classical probability theory        Quantum probability theory
(X, S, µ)                           (H, P(H), Tr)
classical measure space             Hilbert space QM
S: Boolean algebra                  P(H): orthomodular lattice
µ: counting measure                 Tr: functional
L¹(X, µ)                            T(H)
integrable functions                trace class operators
54
L∞(X, µ)                                    B(H)
essentially bounded functions               bounded operators
(bounded) random variables                  (bounded) observables
g ∈ L¹(X, µ), g ≥ 0, ∫ g dµ = 1             ρ ∈ T(H), ρ ≥ 0, Tr(ρ) = 1
probability density                         density matrix ((normal) state)
S ∋ A ↦ p_g(A) = ∫ χ_A g dµ ∈ [0, 1]        P(H) ∋ A ↦ Tr(ρA) ∈ [0, 1]
∫ g f dµ, g ∈ L¹(X, µ)                      Tr(ρA), ρ ∈ T(H)
expectation value of f ∈ L∞(X, µ)           expectation value of A ∈ B(H)
with respect to p_g                         in state ρ
55
L¹(X, µ) Banach space                       T(H) Banach space
‖g‖₁ = ∫ |g| dµ                             ‖ρ‖_Tr = Tr(|ρ|)
L∞(X, µ) Banach space                       B(H) Banach space
‖f‖∞ = ess sup |f|                          ‖A‖ = sup_{‖ξ‖≤1} ‖Aξ‖
L¹(X, µ)* = L∞(X, µ) duality                T(H)* = B(H) duality
φ ∈ L¹(X, µ)*                               φ ∈ T(H)*
φ(g) = ∫ f g dµ                             φ(ρ) = Tr(ρA)
for some f ∈ L∞(X, µ)                       for some A ∈ B(H)
L∞(X, µ)* ⊃ L¹(X, µ)                        B(H)* ⊃ T(H)
L∞(X, µ) ∋ f ↦ ∫ g f dµ, g ∈ L¹(X, µ)       B(H) ∋ A ↦ Tr(ρA)
‖·‖∞-cont. functional                       ‖·‖ (op. norm) cont. functional
56
The major conceptual problem related to quantum mechanics:
How can a quantum probability space (H,P(H), φ) be interpreted
as a probability space ?
Specifically:
• What does P(H) stand for ?
• What interpretation can be given to the non-commutative
probability measure φ?
(relative frequency ?)
57
Interpretation of the classical probability space:
Classical propositional logic
||
Boolean algebra
||
Random event structure
Classical probability
||
normalized measure µ on Boolean algebra with
subadditivity property:
µ(A) + µ(B) = µ(A ∪ B) + µ(A ∩ B)
probability = (possibly) relative frequency
58
The non-commutative version of the classical interpretation would be:
quantum (propositional) logic
||
Hilbert lattice P(H)
||
quantum (random) event structure
quantum probability
||
normalized measure φ on Hilbert lattice
probability = relative frequency
Is such an interpretation possible?
59
Claim: The non-commutative version of a classical
interpretation is NOT possible
Arguments
• P(H) is not an event structure (under a natural conceptual
understanding of the notion of event )
• The quantum probability cannot be given a frequency
interpretation (a la von Mises)
60
(P(H), ∨, ∧, ⊥) is an event structure
⇕
A ∧ B = [A and B both happen]
A ∨ B = [either A or B happens]
A⊥ = [A does not happen]
If A, B are events then
(i) every A either happens or does not happen
(ii) A happens ⇒ A⊥ does not happen
(iii) A happens and B happens ⇒ A ∧ B happens
(iv) A ∨ B happens ⇒ either A or B happens
(i)-(iv)
⇕
There exists an evaluation h: P(H) → {0, 1}
||
Boolean algebra homomorphism
61
Proposition : There exists no Boolean algebra homomorphism from
the Hilbert lattice P(H) into a Boolean algebra
because
Proposition (Kochen-Specker Theorem): There exists no partial
Boolean algebra homomorphism from a Hilbert lattice P(H) into
any Boolean algebra
h: P(H) → B is a partial Boolean algebra homomorphism
⇕
h is a Boolean algebra homomorphism
on every Boolean subalgebra of P(H)
62
Relative frequency interpretation of probability
(von Mises)
(X,S, p) has a relative frequency interpretation if there exists a
fixed statistical ensemble e1, e2, . . . such that
• For every attribute (event) A, presence/absence of A on every
element ei of the ensemble can be decided unambiguously
without changing ei/the ensemble
• For every A the number p(A) = (limit of) relative frequency of
event A in e1, e2, . . .
Von Mises: the ensemble is supposed to be random
Randomness is tricky and problematic but this is not the reason
why a frequency interpretation of quantum probability spaces is
not possible
63
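Proposition: subadditivity of a probability measure,
µ(A) + µ(B) = µ(A ∪ B) + µ(A ∩ B), is necessary for a relative
frequency interpretation:
$$\frac{\#(A \cup B)}{N} + \frac{\#(A \cap B)}{N} = \frac{\#\big((A \setminus (A \cap B)) \cup (B \setminus (A \cap B)) \cup (A \cap B)\big)}{N} + \frac{\#(A \cap B)}{N}$$
$$= \frac{\#(A \setminus (A \cap B)) + \#(B \setminus (A \cap B)) + \#(A \cap B) + \#(A \cap B)}{N} = \frac{\#(A) + \#(B)}{N}$$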
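The counting identity behind this proposition holds for any finite ensemble and any two attributes. A minimal sketch, assuming an ensemble of integers with two randomly assigned attributes (purely illustrative; the function name and the probabilities 0.4 and 0.6 are hypothetical):

```python
import random


def frequency_identity_sides(n_items=1000, seed=1):
    """Return (#(A u B) + #(A n B), #A + #B) for two random attributes."""
    rng = random.Random(seed)
    ensemble = range(n_items)
    # Attribute A is present on an element with probability 0.4, B with 0.6.
    has_a = {e for e in ensemble if rng.random() < 0.4}
    has_b = {e for e in ensemble if rng.random() < 0.6}
    left = len(has_a | has_b) + len(has_a & has_b)
    right = len(has_a) + len(has_b)
    return left, right
```

Both returned numbers coincide for every seed and ensemble size, so dividing by N gives the subadditivity equation for relative frequencies.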
64
Proposition : A quantum probability measure is NOT subadditive
because
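Proposition: A countably additive map
$\phi: P(H) \to \mathbb{R}_+ \cup \{\infty\}$
is subadditive iff
$\phi(A) = \mathrm{const.}\, \mathrm{Tr}(A)$
and
$$\mathrm{Tr}(A) = \sum_{\xi_i \in A} \langle \xi_i, A \xi_i \rangle + \underbrace{\sum_{\eta_j \in A^\perp} \langle \eta_j, A \eta_j \rangle}_{0} = \dim(A) = \infty$$
for infinite dimensional linear subspaces A ∈ P(H)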
65
Another version of the failure of subadditivity of quantum states
(after Szabo, 2001):
Proposition: For any projections A, B ∈ P(H) there exists a
quantum state φ such that
$$\underbrace{\phi(A)}_{1} + \underbrace{\phi(B)}_{>0} - \underbrace{\phi(A \wedge B)}_{0} > 1$$
If A, B are events, how can A happen with certainty and B without
A?
Subadditivity excludes such a “strange” situation
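A concrete instance of this situation can be written down in two real dimensions. The sketch assumes A is the horizontal line, B a line at angle π/3 to it, and φ the vector state given by a unit vector in A (all illustrative choices; the function name is hypothetical). Since A and B are distinct lines, A ∧ B is the zero subspace and φ(A ∧ B) = 0.

```python
import math


def vector_state_probability(xi, direction):
    """<xi, P xi> for the projection P onto the line spanned by the unit vector `direction`."""
    overlap = sum(a * b for a, b in zip(xi, direction))
    return overlap ** 2


theta = math.pi / 3.0
xi = (1.0, 0.0)  # unit state vector lying in A

phi_A = vector_state_probability(xi, (1.0, 0.0))
phi_B = vector_state_probability(xi, (math.cos(theta), math.sin(theta)))
phi_meet = 0.0  # A and B intersect only in the zero vector

excess = phi_A + phi_B - phi_meet
```

Here `excess` equals 1 + cos²(π/3) = 1.25. For a classical, subadditive measure the same expression equals µ(A ∪ B) and can never exceed 1.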
66
Options to save interpretational consistency
quantum (propositional) logic
|| ←− give up ! - doesn’t help
Hilbert lattice P(H) ←− give up !? - very radical!
|| ←− give up !
quantum (random) event structure
quantum probability
|| ←− give up !
normalized measure on Hilbert lattice
|| ←− give up !
relative frequency interpretation
67
John von Neumann’s choice (1935-1936):
Give up Hilbert space probability theory!
“I would like to make a confession which may seem immoral: I do not
believe absolutely in Hilbert space any more. After all Hilbert-space (as
far as quantum-mechanical things are concerned) was obtained by
generalizing Euclidean space, footing on the principle of “conserving the
validity of all formal rules”. This is very clear, if you consider the
axiomatic-geometric definition of Hilbert-space, where one simply takes
Weyl’s axioms for a unitary-Euclidean-space, drops the condition on the
existence of a finite linear basis, and replaces it by a minimum of
topological assumptions (completeness + separability). Thus
Hilbert-space is the straightforward generalization of Euclidean space, if
one considers the vectors as the essential notions.
68
Now we [with F.J. Murray, von Neumann’s coauthor ] begin to believe,
that it is not the vectors which matter but the lattice of all linear
(closed) subspaces. Because:
1. The vectors ought to represent the physical states, but they do it
redundantly, up to a complex factor, only.
2. And besides the states are merely a derived notion, the primitive
(phenomenologically given) notion being the qualities, which
correspond to the linear closed subspaces.
But if we wish to generalize the lattice of all linear closed subspaces from
a Euclidean space to infinitely many dimensions, then one does not
obtain Hilbert space ...”
J. von Neumann to G. Birkhoff (Nov. 13., Wednesday [1935])
“I, for one, do not even believe, that the right formal frame for quantum
mechanics is already found.”
J. von Neumann to G. Birkhoff (Nov. 27, [1935])
69
What to replace Hilbert space quantum mechanics by?
John von Neumann’s answer: (hopefully) by the theory of
“rings of operators of type II₁”
⇕
type II₁ von Neumann algebras
Why? What are these “type II₁ von Neumann algebras”?
This is the subject of the next lecture
70
Summary of Lecture 2.
• Hilbert space quantum mechanics is the non-commutative
(non-distributive) analogue of classical measure/probability
theory – the analogy is detailed and strong, the key elements
being
Boolean algebra ↔ Hilbert lattice
p ↔ quantum state φ
• The elements of non-commutative probability theory cannot be
interpreted as the analogy would suggest:
Hilbert lattice ≠ random event structure (Kochen-Specker)
quantum probability φ(A) ≠ relative frequency (violation of
subadditivity)
• Von Neumann’s suggestion: Hilbert space probability theory is
pathological, replace it by well behaving operator algebraic QM
71
Lecture 3
Structure
• Limits of Hilbert space probability theory
restricted type, not capable of describing large quantum systems
• Von Neumann algebras
notion, dimension function, classification theory, the five types
• Rise and fall of the type II₁ case
probabilistic reasons for von Neumann’s preference of the type II₁
case, the interpretational problems remain for the type II₁ case
• Re-interpretation of quantum probabilities as classical
conditional probabilities
Kolmogorovian Censorship hypothesis
72
The single sentence one should take away from Lecture 3
Von Neumann algebra theory is the non-commutative
generalization of Hilbert space probability theory that yields all the
types of non-commutative probability theories that occur in
classical probability theory but the interpretational difficulties
present in Hilbert space probability theory are not solved by
passing to the theory of von Neumann algebras
73
The analogy displayed in the tables between classical
measure/probability theory (reproduced in next slide) and the
non-commutative, Hilbert space measure/probability theory is not
perfect:
• Given a general (X,S) there is no canonical counting measure
µ on S
In contrast, given H, the non-commutative “counting measure”,
the trace functional, is uniquely determined, and only those
probability measures are present in the Hilbert space formalism
which can be given by densities with respect to the trace
• The types of classical measure/probability space and the type
of the Hilbert space measure/probability space may not match:
S can be non-atomic, P(H) is always atomic
µ can take on values in a continuum, Tr is discrete.
74
Classical probability theory        Quantum probability theory
(X, S, µ)                           (H, P(H), Tr)
classical measure space             Hilbert space QM
S: Boolean algebra                  P(H): orthomodular lattice
µ: counting measure                 Tr: functional
L1(X, µ): integrable functions      T(H): trace class operators
75
Von Neumann algebra theory is precisely the non-commutative
generalization of Hilbert space measure/probability that yields all
the types of non-commutative measure/probability theories that
occur in classical measure/probability theory.
These typical classical measure/probability spaces are:
X = {x1, x2, . . . , xN}, p(xi) = 1/N (i = 1, . . . , N): discrete, finite
X = {x1, x2, . . . , xN, . . .}, p(xi) = 1 (i = 1, 2, . . .): discrete, infinite
X = [0, 1], p = Lebesgue measure on [0, 1]: continuous, finite
X = ℝ, p = Lebesgue measure on ℝ: continuous, infinite
76
Definition : N ⊆ B(H) is a von Neumann algebra if
• I ∈ N (I = identity operator)
• If Q ∈ N then Q∗ ∈ N (N is *-closed)
• If Q1, Q2 ∈ N then (λ1Q1 + λ2Q2) ∈ N and Q1Q2 ∈ N (N is an algebra)
• N is closed in the sense that
if Qn ∈ N and for some Q ∈ B(H) we have φ(Qn) → φ(Q) for all states φ, then Q ∈ N
B(H) itself is (obviously) a von Neumann algebra
Are there any other examples?
(Other = non-isomorphic to B(H))
⇕
Classification problem
77
Classification of von Neumann algebras
The set of projections P(N ) of a von Neumann algebra is an
orthomodular lattice (= sublattice of the Hilbert lattice P(H))
Definition : d : P(N ) → ℝ+ ∪ {∞} is a dimension function if
d(A) + d(B) = d(A ∪ B) + d(A ∩ B)
⇕
subadditivity!
The classification is in terms of the type of the range of the
dimension function defined on the projection lattice P(N ): the
type of the range of the dimension function coincides with the
notion of type used in classifying the classical probability spaces.
The ranges/types are shown on the next slide.
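The dimension identity above can be checked by hand in the simplest case. The following is a minimal sketch, assuming a real (not complex) N-dimensional space with rational coordinates so that the arithmetic is exact; the helper names (rref, dim, join, perp, meet) and the two example subspaces are illustrative choices, not part of the lecture. Subspaces are given by spanning vectors, the join is the span of the union, and the meet is computed through orthocomplements as A ∩ B = (A⊥ ∪ B⊥)⊥.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over the rationals: (nonzero rows, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        if r == len(m):
            break
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        p = m[r][c]
        m[r] = [x / p for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m[:r], pivots

def dim(A):
    """Dimension of the subspace spanned by the rows of A."""
    return len(rref(A)[0])

def join(A, B):
    """Basis of the span of A and B (lattice join)."""
    return rref(A + B)[0]

def perp(A, n):
    """Basis of the orthogonal complement of span(A) in the n-dimensional space."""
    basis, pivots = rref(A)
    out = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for row, p in zip(basis, pivots):
            v[p] = -row[f]
        out.append(v)
    return out

def meet(A, B, n):
    """Basis of the intersection, computed as the complement of the join of complements."""
    return perp(join(perp(A, n), perp(B, n)), n)

N = 3
A = [[1, 0, 0], [0, 1, 0]]   # a plane
B = [[0, 1, 0], [0, 0, 1]]   # another plane, sharing one line with A
lhs = dim(A) + dim(B)
rhs = dim(join(A, B)) + dim(meet(A, B, N))
print(lhs, rhs)
```

Dividing every dimension by N gives the normalized dimension function of the type IN case.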
78
Type   Algebra and projection lattice            Range of d                    Character                Theory
IN     N = B(HN ), P(N ) = P(HN ), dim HN = N    d (= Tr): {1, 2, . . . , N}   finite, discrete         finite dimensional QM
I∞     N = B(H), P(N ) = P(H), dim H = ∞         d (= Tr): {1, 2, . . .}       non-finite, discrete     standard Hilbert space QM
II1    N , P(N )                                 [0, 1]                        finite, continuous       quantum stat. phys.
II∞    N , P(N )                                 ℝ                             non-finite, continuous   quantum stat. phys.
III    N , P(N )                                 {0, ∞}                        very non-finite          quantum field theory
79
Why do we need the
“esoteric” types of non-commutative probability theories?
Generally: large quantum systems cannot be described
probabilistically within the Hilbert space probability theory
large = infinite (in size or in degrees of freedom)
Examples of such systems
• lattice gases
(mathematically precise models of discrete quantum statistical
mechanical systems in the thermodynamic limit)
• non-relativistic quantum field theory
(non-discrete quantum statistical mechanical systems in the
thermodynamic limit)
• relativistic quantum field theory (type III)
80
Infinite, discrete quantum system
occurring in quantum statistical mechanics
not describable in Hilbert space QM
(one dimensional lattice gas, Ising model)
Small quantum system sitting at each point i on a one dimensional
lattice infinite in both directions:
. . .   B(Hi−2)   B(Hi−1)   B(Hi)   B(Hi+1)   B(Hi+2)   . . .
. . .      •         •        •        •         •      . . .
. . .    i − 2     i − 1      i      i + 1     i + 2    . . .
Hi = H identical copies, dim(H) = finite
Large quantum system: analogue of the Cartesian product of classical
probability spaces: (infinite) union of (finite) tensor products:
A = ∪_N ⊗_i^N B(H_i)
This system cannot be described within the Hilbert space
formalism because A ≠ B(H) (there is a finite trace on A)
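The finite trace on A can be made concrete with a small sketch, assuming dim(H) = 2 at every site and matrices stored as nested lists; the names kron, tau and diag_projection are illustrative. Embedding an operator on N sites into N + 1 sites as X ⊗ I does not change its normalized trace, so the traces on the finite tensor products fit together into one trace on the union, and its values on projections are the dyadic fractions k/2^N, which fill out [0, 1] ever more densely as N grows. This only suggests, and does not prove, the continuous range of the type II1 dimension function.

```python
from fractions import Fraction

def kron(X, Y):
    """Kronecker (tensor) product of two square matrices given as nested lists."""
    return [[x * y for x in xrow for y in yrow] for xrow in X for yrow in Y]

def tau(X):
    """Normalized trace Tr(X) / dimension, so that tau(identity) = 1."""
    n = len(X)
    return Fraction(sum(X[i][i] for i in range(n)), n)

def diag_projection(bits):
    """Diagonal projection with the given 0/1 entries on the diagonal."""
    n = len(bits)
    return [[bits[i] if i == j else 0 for j in range(n)] for i in range(n)]

I2 = [[1, 0], [0, 1]]
P = [[1, 0], [0, 0]]          # rank-one projection at a single site

one_site = tau(P)
two_sites = tau(kron(P, I2))  # the same projection viewed in the two-site algebra
both = tau(kron(P, P))        # projection that is "on" at both sites

N = 3                         # number of sites in this illustration
values = sorted({tau(diag_projection([1] * k + [0] * (2 ** N - k)))
                 for k in range(2 ** N + 1)})
print(one_site, two_sites, both)
print(values)
```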
81
Types IN and II1 are distinguished:
• the dimension function is normalized and
• satisfies subadditivity ⇒ d(A) can be interpreted as relative
frequency
• the projection lattices P(HN ) and P(N ) are not only
orthomodular but modular :
If A ≤ B, then A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C)
distributivity ⇒ modularity ⇒ orthomodularity,
and neither implication can be reversed
(HN ,P(HN ), d) and (H,P(N ), d) seem to be well-behaved
non-commutative (quantum) probability theories whose
probabilities can in principle be interpreted as relative frequencies
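A worked example may help to separate modularity from distributivity. The sketch below assumes H3 = ℂ³ with an orthonormal basis e1, e2, e3; the particular subspaces are illustrative choices. The first computation is an instance of the modular law with A ≤ B, the second shows that distributivity already fails for three distinct lines in a plane.

```latex
% Modular law, with A \le B:
\begin{align*}
A &= \operatorname{span}\{e_1\}, \qquad
B = \operatorname{span}\{e_1, e_2\}, \qquad
C = \operatorname{span}\{e_2 + e_3\},\\
A \vee (B \wedge C) &= A \vee \{0\} = A,\\
(A \vee B) \wedge (A \vee C) &= B \wedge \operatorname{span}\{e_1,\, e_2 + e_3\} = A .
\end{align*}
% Failure of distributivity, with three distinct lines in span{e_1, e_2}:
\begin{align*}
D &= \operatorname{span}\{e_1 + e_2\}, \qquad
E = \operatorname{span}\{e_1\}, \qquad
F = \operatorname{span}\{e_2\},\\
D \wedge (E \vee F) &= D \wedge \operatorname{span}\{e_1, e_2\} = D,\\
(D \wedge E) \vee (D \wedge F) &= \{0\} \vee \{0\} = \{0\} \neq D .
\end{align*}
```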
82
John von Neumann:
• The type II1 (finite, continuous) non-commutative probability
space (H,P(N ), d) is the proper infinite dimensional
generalization of the (HN ,P(HN ), d) finite dimensional
non-commutative probability space
• (H,P(N ), d) is the proper probabilistic framework for QM
and not Hilbert space quantum mechanics
83
“I would like to make a confession which may seem immoral: I do not
believe absolutely in Hilbert space any more. After all Hilbert-space (as
far as quantum-mechanical things are concerned) was obtained by
generalizing Euclidean space, footing on the principle of “conserving the
validity of all formal rules”. This is very clear, if you consider the
axiomatic-geometric definition of Hilbert-space, where one simply takes
Weyl’s axioms for a unitary-Euclidean-space, drops the condition on the
existence of a finite linear basis, and replaces it by a minimum of
topological assumptions (completeness + separability). Thus
Hilbert-space is the straightforward generalization of Euclidean space, if
one considers the vectors as the essential notions.
84
Now we [with F.J. Murray, von Neumann’s coauthor ] begin to believe,
that it is not the vectors which matter but the lattice of all linear
(closed) subspaces. Because:
1. The vectors ought to represent the physical states, but they do it
redundantly, up to a complex factor, only.
2. And besides the states are merely a derived notion, the primitive
(phenomenologically given) notion being the qualities, which
correspond to the linear closed subspaces.
But if we wish to generalize the lattice of all linear closed subspaces from
a Euclidean space to infinitely many dimensions, then one does not
obtain Hilbert space, but that configuration, which Murray and I called
“case II1.” (The lattice of all linear closed subspaces of Hilbert-space is
our “case I∞”.) And this is chiefly due to the presence of the rule
a ≤ c→ a ∪ (b ∩ c) = (a ∪ b) ∩ c
This “formal rule” would be lost, by passing to Hilbert space!”
J. von Neumann to G. Birkhoff (Nov. 13., Wednesday [1935])
85
Von Neumann’s dream (around 1934-1935)
classical                     quantum
classical logic               quantum logic
      ||                            ||
Boolean algebra               type II1 v. Neumann lattice
distributive                  modular
      ||                            ||
random event structure        quantum (random) event structure
classical probability         quantum probability
      ||                            ||
normalized measure            dimension function d
      ||                            ||
relative frequency            relative frequency
86
Is (P(N ), d) with a type II1 v. Neumann lattice
REALLY a NON-COMMUTATIVE probability space
whose probabilities can be interpreted as relative frequencies?
NO
because
Proposition (Murray-von Neumann) : d is the restriction to P(N )
of (or can be extended to) a tracial state τ on N
τ is a tracial state iff
τ(XY ) = τ(Y X) for all X, Y ∈ N
“τ is insensitive to the non-commutativity”
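The trace property can be illustrated in the finite dimensional (type IN, N = 2) case. This is a minimal sketch, assuming 2 × 2 integer matrices; the names tau and vector_state and the matrices X, Y are illustrative. The normalized trace gives the same value on XY and YX although XY ≠ YX, whereas a vector state does see the difference.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tau(X):
    """Normalized trace: the tracial state on 2 x 2 matrices."""
    return Fraction(sum(X[i][i] for i in range(len(X))), len(X))

def vector_state(X):
    """State given by the unit vector e1 = (1, 0): phi(X) = <e1, X e1>."""
    return Fraction(X[0][0])

X = [[0, 1], [0, 0]]
Y = [[0, 0], [1, 0]]
XY = matmul(X, Y)
YX = matmul(Y, X)

print(XY != YX)                              # X and Y do not commute
print(tau(XY), tau(YX))                      # the trace does not notice
print(vector_state(XY), vector_state(YX))    # the vector state does
```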
87
Proposition : A linear functional on a von Neumann algebra is
subadditive if and only if it is a trace
Consequently : d (τ) is the ONLY subadditive measure on P(N )
⇒ Only those states satisfy a necessary condition for a relative
frequency interpretation which disregard the non-commutative
(non-classical) character of the “random event structure”
Therefore
if one wants to have a genuinely non-commutative probability space
then the frequency interpretation has to go!
It did: von Neumann gave up the frequency view in 1937:
“This view, the so-called ‘frequency theory of probability’ has been very
brilliantly upheld and expounded by R. von Mises. This view, however,
is not acceptable to us, at least not in the present ‘logical’ context.”
“Quantum logic (strict- and probability logics)” unfinished, unpublished manuscript from 1937
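The failure of subadditivity for non-tracial states, which forces the choice above, can be seen in a two dimensional example. The sketch assumes a real plane, two non-orthogonal lines A and B, and an unnormalized state vector psi; all three are illustrative choices. The lines span the plane, so A ∨ B is the identity and A ∧ B is zero.

```python
from fractions import Fraction

def dot(a, b):
    """Real inner product with exact rational arithmetic."""
    return sum(Fraction(x) * y for x, y in zip(a, b))

def prob(psi, v):
    """Probability of the rank-one projection onto v in the pure state psi."""
    return dot(psi, v) ** 2 / (dot(psi, psi) * dot(v, v))

psi = (1, -2)     # state vector, normalized inside prob
a = (1, 0)        # A = line spanned by a
b = (1, 1)        # B = line spanned by b, not orthogonal to A

phi_A = prob(psi, a)
phi_B = prob(psi, b)
phi_join = Fraction(1)    # A v B is the whole plane
phi_meet = Fraction(0)    # A ^ B is the zero subspace

# Normalized trace (dimension function) on the same projections
d_A, d_B, d_join, d_meet = Fraction(1, 2), Fraction(1, 2), Fraction(1), Fraction(0)

print(phi_A + phi_B, phi_join + phi_meet)   # differ: the vector state is not subadditive
print(d_A + d_B, d_join + d_meet)           # agree: the trace is
```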
88
How to interpret non-commutative probability
if not by relative frequency?
von Neumann: “logical interpretation”:
“However, all quantum mechanical probabilities are defined by inner
products of vectors. Essentially if a state of a system is given by one
vector, the transition probability in another state is the inner product of
the two which is the square of the cosine of the angle between them. In
other words, probability corresponds precisely to introducing the angles
geometrically. Furthermore, there is only one way to introduce it. The
more so because in the quantum mechanical machinery the negation of a
statement, so the negation of a statement which is represented by a
linear set of vectors, corresponds to the orthogonal complement of this
linear space.
And therefore, as soon as you have introduced into the projective
geometry the ordinary machinery of logics, you must have introduced the
concept of orthogonality. This actually is rigorously true and any
89
axiomatic elaboration of the subject bears it out. So in order to have
logics you need in this set a projective geometry with a concept of
orthogonality in it.
In order to have probability all you need is a concept of all angles, I mean
angles other than 90°. Now it is perfectly quite true that in a geometry,
as soon as you can define the right angle, you can define all angles.
Another way to put it is that if you take the case of an orthogonal space,
those mappings of this space on itself, which leave orthogonality intact,
leave all angles intact, in other words, in those systems which can be
used as models of the logical background for quantum theory, it is true
that as soon as all the ordinary concepts of logics are fixed under some
isomorphic transformation, all of probability theory is already fixed.
90
What I now say is not more profound than saying that the concept of a
priori probability in quantum mechanics is uniquely given from the start.
You can derive it by counting states and all the ambiguities which are
attached to it in classical theories have disappeared. This means,
however, that one has a formal mechanism, in which logics and
probability theory arise simultaneously and are derived simultaneously.”
J. von Neumann: “Unsolved problems in mathematics”, talk delivered to the International Congress
of Mathematicians, September 2-9, Amsterdam, 1954
91
What von Neumann says:
Assume: Quantum Logic = P(N ) of a type II1 von Neumann
algebra; U ∈ N unitary element
• U leaves angles between Hilbert space vectors invariant:
〈Uξ, Uη〉 = 〈ξ, η〉
• every U leaves P(N ) invariant in the sense:
UAU∗ is a projection if A is (U(·)U∗ = symmetry/isomorphism
of logic)
• τ is a tracial state iff [τ(UXU∗) = τ(X) for all U]
⇒ the set of all U’s determines the trace uniquely
⇒ the set of U’s determines the dimension function d = subadditive probability
Probability (=d) is determined by logic
This situation only obtains in the type II1 and IN cases (=finite)
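The three properties of U listed above can be verified in the type IN case with N = 2. This is a sketch under stated assumptions: U is a real rotation with rational entries (so U∗ is the transpose), and the vectors ξ, η and the projection A are arbitrary illustrative choices. The last two lines contrast the trace, which is invariant under the symmetry, with a vector state, which is not.

```python
from fractions import Fraction as F

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def apply(X, v):
    return [sum(X[i][k] * v[k] for k in range(len(v))) for i in range(len(X))]

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def tau(X):
    """Normalized trace."""
    return F(sum(X[i][i] for i in range(len(X))), len(X))

U = [[F(3, 5), F(-4, 5)], [F(4, 5), F(3, 5)]]   # rotation, hence unitary
A = [[F(1), F(0)], [F(0), F(0)]]                # projection onto e1
xi, eta = [F(1), F(2)], [F(3), F(-1)]

UAU = matmul(matmul(U, A), transpose(U))        # U A U*

angles_kept = inner(apply(U, xi), apply(U, eta)) == inner(xi, eta)
still_projection = matmul(UAU, UAU) == UAU and transpose(UAU) == UAU
print(angles_kept, still_projection)
print(tau(A), tau(UAU))        # trace: invariant under the symmetry
print(A[0][0], UAU[0][0])      # vector state <e1, . e1>: not invariant
```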
92
Von Neumann did not regard this “logical interpretation of
probability” as well understood and well articulated:
“I think that it is quite important and will probably shade [shed] a great
deal of new light on logics and probably alter the whole formal structure
of logics considerably, if one succeeds in deriving this system from first
principles, in other words from a suitable set of axioms. All the existing
axiomatisations of this system are unsatisfactory in this sense, that they
bring in quite arbitrarily algebraical laws which are not clearly related to
anything that one believes to be true or that one has observed in
quantum theory to be true. So, while one has very satisfactorily
formalistic foundations of projective geometry of some infinite
generalizations of it, including orthogonality, including angles, none of
them are derived from intuitively plausible first principles in the manner
in which axiomatisations in other areas are.
93
Now I think that at this point lies a very important complex of open
problems, about which one does not know well of how to formulate them
now, but which are likely to give logics and the whole dependent system
of probability a new slam.”
J. von Neumann: “Unsolved problems in mathematics”, talk delivered to the International Congress
of Mathematicians, September 2-9, Amsterdam, 1954
Von Neumann never worked out the indicated “joint theory of
(quantum) logic and (quantum) probability” axiomatically, and not
because he did not try:
94
“Dear Doctor Silsbee,
It is with great regret that I am writing these lines to you, but I simply
cannot help myself. In spite of very serious attempts to write the article
on the “Logics of quantum mechanics” I find it completely impossible to
do it at this time. As you may know, I wrote a paper on this subject
with Garrett Birkhoff in 1936 ([reference]), and I have thought a good
deal on the subject since. My work on continuous geometries, on which I
gave the Amer.Math.Soc. Colloquium lectures in 1937, comes to a
considerable extent from this source. Also a good deal concerning the
relationship between strict and probability logics (upon which I touched
briefly in the Henry Joseph Lecture) and the extension of this
“Propositional calculus” work to “logics with quantifiers” (which I never
so far discussed in public). All these things should be presented as a
connected whole ...
When I offered to give the Henry Joseph Lecture on this subject, I
thought (and I hope that I was not too far wrong in this) that I could
give a reasonable general survey of at least part of the subject in a talk,
95
which might have some interest to the audience. I did not realize the
importance nor the difficulties of reducing this to writing.
I have now learned – after a considerable number of serious but very
unsuccessful efforts – that they are exceedingly great. I must, of course,
accept a good part of the responsibility for my method of writing – I
write rather freely and fast if a subject is “mature” in my mind, but
develop the worst traits of pedantism and inefficiency if I attempt to give
a preliminary account of a subject which I do not have yet in what I can
believe in its final form.
I have tried to live up to my promise and to force myself to write this
article, and spent much more time on it than on many comparable ones
which I wrote with no difficulty at all – and it just didn’t work.”
Von Neumann’s letter to Dr. Silsbee, July 2, 1945
96
Radical solution of the interpretational inconsistency
related to non-commutative probability theory:
Kolmogorovian censorship hypothesis (L. Szabo)
⇕
we never observe “quantum probabilities”
quantum probabilities = classical conditional probabilities
conditioning events: setting up measurements
Non-commutative (quantum) probability theory on this view is just
a mathematical framework that makes it possible to handle very
efficiently all conceivable conditional probabilities one encounters
in experimental situations.
97
To maintain this interpretation in full generality one should be able
to prove that a non-commutative probability space (X,P(N ), φ)
can always be conditionally represented by a classical probability
space (S_{P(N)}, p_φ), where the “conditional representation” means
Proposition: for any set Z of mutually incompatible projections in
P(N ) there exists a set E ⊆ S × S of pairs (A, a) of events and a
function A ↦ (A, a) ∈ E such that for all A_λ, A_ν ∈ Z we have
a_λ ∩ a_ν = ∅ (if λ ≠ ν)    (2)
φ(A) = p_φ(A ∩ a) / p_φ(a)    (3)
(2) expresses that incompatible observables can never be
simultaneously measured, (3) expresses that quantum
“probabilities” are in fact classical conditional probabilities, the
conditions being the events of setting up measurement.
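A toy version of such a conditional representation can be written down for two incompatible projections. The following is a sketch under stated assumptions and not the construction used in the cited proofs: the state psi, the two projections, the labels "A_z" and "A_x", and the probabilities 1/2 of choosing each measurement are all illustrative. The classical sample space consists of pairs (measurement set up, outcome); the set-up events are disjoint as in (2), and conditioning on them returns the quantum probabilities as in (3).

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def born(psi, v):
    """Quantum probability of the rank-one projection onto v in the state psi."""
    return dot(psi, v) ** 2 / (dot(psi, psi) * dot(v, v))

psi = (Fraction(3, 5), Fraction(4, 5))           # unit vector, illustrative
projections = {"A_z": (1, 0), "A_x": (1, 1)}     # non-commuting rank-one projections
setting_prob = {"A_z": Fraction(1, 2), "A_x": Fraction(1, 2)}   # assumed

# Classical probability on pairs (measurement set up, outcome)
p = {}
for name, v in projections.items():
    q = born(psi, v)
    p[(name, "yes")] = setting_prob[name] * q
    p[(name, "no")] = setting_prob[name] * (1 - q)

def P(event):
    """Classical probability of a set of sample points."""
    return sum(p[w] for w in event)

# a: the event "this measurement was set up"; A: "it was set up and gave yes"
a = {name: {(name, "yes"), (name, "no")} for name in projections}
A = {name: {(name, "yes")} for name in projections}

conditional = {name: P(A[name] & a[name]) / P(a[name]) for name in projections}
print(conditional)
```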
98
Such representation theorems can be proved for finite Z (Bana 1997) and for countably infinite Z (Szabo 2001)
Comments on Kolmogorovian Censorship
• Conditional representation theorems cannot hold for a Z containing a continuum number of mutually incompatible
projections (since there does not exist a measure space with a
σ-additive normalized measure and a continuum number of
mutually disjoint measurable sets each having a non-zero
measure); on the other hand, there does exist a continuum
number of mutually incompatible projections in a von
Neumann lattice ⇒ constraint on the generality of this
interpretation
• Kolmogorovian Censorship = strong instrumentalism, difficult
to accept philosophically
99
Summary of Lecture 3.
• Large (infinite degrees of freedom) quantum systems cannot be
described within the usual Hilbert space quantum
mechanics/probability theory
• The classification theory of von Neumann algebras shows that von
Neumann algebra theory is the non-commutative probability theory
that provides all the typical types of probability measure spaces
• Contrary to von Neumann’s early expectation, the interpretational
difficulties present in Hilbert space probability theory cannot be
solved by passing to the specific type II1 non-commutative
probability theory
• To avoid the difficulties von Neumann gave up the frequency
interpretation of quantum probability
• The Kolmogorovian censorship hypothesis re-interprets quantum
probabilities as classical conditional probabilities, thereby saving the
frequency view; but the interpretation lacks full generality
100