probability and measure - aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1...
TRANSCRIPT
University ofCambridge
Mathematics Tripos
Part II
Probability and Measure
Michaelmas, 2018
Lectures byE. Breuillard
Notes byQiangru Kuang
Contents
Contents
1 Lebesgue measure 21.1 Boolean algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Jordan measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.3 Lebesgue measure . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Abstract measure theory 9
3 Integration and measurable functions 17
4 Product measures 25
5 Foundations of probability theory 28
6 Independence 326.1 Useful inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7 Convergence of random variables 38
8 πΏπ spaces 43
9 Hilbert space and πΏ2-methods 48
10 Fourier transform 55
11 Gaussians 63
12 Ergodic theory 6612.1 The canonical model . . . . . . . . . . . . . . . . . . . . . . . . . 67
Index 73
1
1 Lebesgue measure
1 Lebesgue measure
1.1 Boolean algebra
Definition (Boolean algebra). Let π be a set. A Boolean algebra on π isa family of subsets of π which
1. contains β ,
2. is stable under finite unions and complementation.
Example.
β’ The trivial Boolean algebra β¬ = {β , π}.
β’ The discrete Boolean algebra β¬ = 2π, the family of all subsets of π.
β’ Less trivially, if π is a topological space, the family of constructible setsforms a Boolean algebra, where a constructible set is the finite union oflocally closed set, i.e. a set πΈ = π β© πΉ where π is open and πΉ is closed.
Definition (finitely additive measure). Let π be a set and β¬ a Booleanalgebra on π. A finitely additive measure on (π, β¬) is a function π βΆ β¬ β[0, +β] such that
1. π(β ) = 0,
2. π(πΈ βͺ πΉ) = π(πΈ) + π(πΉ) where πΈ β© πΉ = β .
Example.
1. Counting measure: π(πΈ) = #πΈ, the cardinality of πΈ where β¬ is thediscrete Boolean algebra of π.
2. More generally, given π βΆ π β [0, +β], define for πΈ β π,
π(πΈ) = βπβπΈ
π(π).
3. Suppose π = βππ=1 ππ, then define β¬(π) to be the unions of ππβs. Assign
a weight ππ β₯ 0 to each ππ and define π(πΈ) = βπβΆππβπΈ ππ for πΈ β β¬.
1.2 Jordan measureThis section is a historic review and provides intuition for Lebesgue measuretheory. Weβll gloss over details of proofs in this section.
Definition. A subset of Rπ is called elementary if it is a finite union ofboxes, where a box is a set π΅ = πΌ1 Γ β― Γ πΌπ where each πΌπ is a finite intervalof R.
2
1 Lebesgue measure
Proposition 1.1. Let π΅ β Rπ be a box. Let β°(π΅) be the family of elementarysubsets of π΅. Then
1. β°(π΅) is a Boolean algebra on π΅,
2. every πΈ β β°(π΅) is a disjoint finite union of boxes,
3. if πΈ β β°(π΅) can be written as disjoint finite union in two ways,πΈ = βπ
π=1 π΅π = βππ=1 π΅β²
π, then βππ=1 |π΅π| = βπ
π=1 |π΅β²π| where |π΅| =
βππ=1 |ππ β ππ| if π΅ = πΌ1 Γ β― Γ πΌπ and πΌπ has endpoints ππ, ππ.
Following this, we can define a finitely additive measure correponding to ourintuition of length, area, volume etc:
Proposition 1.2. Define π(πΈ) = βππ=1 |π΅π| if πΈ is any elementary set and
is the disjoint union of boxes π΅π β Rπ. Then π is a finitely additive measureon β°(π΅) for any box π΅.
Definition. A subset πΈ β Rπ is Jordan measurable if for any π > 0 thereare elementary sets π΄, π΅, π΄ β πΈ β π΅ and π(π΅ \ π΄) < π.
Remark. Jordan measurable sets are bounded.
Proposition 1.3. If a set πΈ β Rπ is Jordan measurable, then
supπ΄βπΈ elementary
{π(π΄)} = infπ΅βπΈ elementary
{π(π΅)}.
In which case we define the Jordan measure of πΈ as
π(πΈ) = supπ΄βπΈ
{π(π΄)}.
Proof. Take π΄π β πΈ such that π(π΄π) β sup and π΅π β πΈ such that π(π΅π) β inf.Note that
inf β€ π(π΅π) = π(π΄π) + π(π΅π \ π΄π) β€ sup +π(π΅π \ π΄π) β€ sup +π
for arbitrary π > 0 so they are equal.
Exercise.
1. If π΅ is a box, the family π₯(π΅) of Jordan measurable subsets of π΅ is aBoolean algebra.
2. A subset πΈ β [0, 1] is Jordan measurable if and only if 1πΈ, the indicatorfunction on πΈ, is Riemann integrable.
3
1 Lebesgue measure
1.3 Lebesgue measureAlthough Jordan measure corresponds to the intuition of length, area and vol-ume, it suffer from a few severe problems and issues:
1. unbounded sets in Rπ are not Jordan measurable.
2. 1Qβ©[0,1] is not Riemann integrable, and therefore Q β© [0, 1] is not Jordanmeasurable.
3. pointwise limits of Riemann integrable function ππ βΆ= 1 1π!Zβ©[0,1] β 1Qβ©[0,1]
is not Riemann integrable.
The idea of Lebesgue is to use countable covers by boxes.
Definition. A subset πΈ β Rπ is Lebesgue measurable if for all π > 0, thereexists a countable union of boxes πΆ with πΈ β πΆ and πβ(πΆ \ πΈ) < π, whereπβ, the Lebesgue outer measure, is defined as
πβ(πΈ) = inf{βπβ₯1
|π΅π| βΆ πΈ β βπβ₯1
π΅π, π΅π boxes}
for every subset πΈ β Rπ.
Remark. wlog in these definitions we may assume that boxes are open.
Proposition 1.4. The family β of Lebesgue measurable subsets of Rπ is aBoolean algebra stable under countable unions.
Lemma 1.5.
1. πβ is monotone: if πΈ β πΉ then πβ(πΈ) β πβ(πΉ).
2. πβ is countably subadditive: if πΈ = βπβ₯1 πΈπ where πΈπ β Rπ then
πβ(πΈ) β€ βπβ₯1
πβ(πΈπ).
Proof. Monotonicity is obvious. For countable subadditivity, pick π > 0 and letπΆπ = βπβ₯1 πΆπ,π where πΆπ,π are boxes such that πΈπ β πΆπ and
βπβ₯1
|πΆπ,π| β€ πβ(πΈπ) + π2π .
Thenβπβ₯1
βπβ₯1
|πΆπ,π| β€ βπβ₯1
(πβ(πΈπ) + π2π ) = π + β
πβ₯1πβ(πΈπ)
and πΈ β βπβ₯1 πΆπ = βπβ₯1 βπβ₯1 πΆπ,π so
πβ(πΈ) β€ π + βπβ₯1
πβ(πΈπ)
for all π > 0.
4
1 Lebesgue measure
Remark. Note that πβ is not additive on the family of all subsets of Rπ.However, it will be on β, as we will show later.
Lemma 1.6. If π΄, π΅ are disjoint compact subsets of Rπ then
πβ(π΄ βͺ π΅) = πβ(π΄) + πβ(π΅).
Proof. β€ by the previous lemma so need to show β₯. Pick π > 0. Let π΄ βͺ π΅ ββπβ₯1 π΅π where π΅π are open boxes such that
βπβ₯1
|π΅π| β€ πβ(π΄ βͺ π΅) + π.
wlog we may assume that the side lengths of each π΅π are < πΌ2 , where
πΌ = inf{βπ₯ β π¦β1 βΆ π₯ β π΄, π¦ β π΅} > 0.
where the inequality comes from the fact that π΄ and π΅ are compact and thusclosed. wlog we may discard the π΅πβs that do not interesect π΄ βͺ π΅. Then byconstruction
βπβ₯1
|π΅π| = βπβ₯1,π΅πβ©π΄=β
|π΅π| + βπβ₯1,π΅πβ©π΅=β
|π΅π| β₯ πβ(π΄) + πβ(π΅)
soπ + πβ(π΄ βͺ π΅) β₯ πβ(π΄) + πβ(π΅)
for all π.
Lemma 1.7. If πΈ β Rπ has πβ(πΈ) = 0 then πΈ β β.
Definition (null set). A set πΈ β Rπ such that πβ(πΈ) = 0 is called a nullset.
Proof. For all π > 0, there exist πΆ = βπβ₯1 π΅π where π΅π are boxes such thatπΈ β πΆ and βπβ₯1 |π΅π| β€ π. But
πβ(πΆ \ πΈ) β€ πβ(πΆ) β€ π.
Lemma 1.8. Every open subset of Rπ and every closed subset of Rπ is inβ.
We will prove the lemma using the fact that the family of Lebesgue mea-surable subsets is stable under countable union, which itself does not use thislemma. This lemma, however, will be used to show the stability under comple-mentation. Since the proof is quite technical (it has more to do with generaltopology than measure theory), for brevity and fluency of ideas we present theproof the main proposition first.
5
1 Lebesgue measure
Proof of Proposition 1.4. It is obvious that β β β. To show it is stable undercountable unions, start with πΈπ β β for π β₯ 1. Need to show πΈ βΆ= βπβ₯1 πΈπ ββ.
Pick π > 0. By assumption there exist πΆπ = βπβ₯1 π΅π,π where π΅π,π are boxessuch that πΈπ β πΆπ and
πβ(πΆπ \ πΈπ) < π2π .
NowπΈ = β
πβ₯1πΈπ β β
πβ₯1πΆπ =βΆ πΆ
so πΆ is again a countable union of boxes and πΆ \ πΈ β βπβ₯1 πΆπ \ πΈπ. so
πβ(πΆ \ πΈ) β€ βπβ₯1
πβ(πΆπ \ πΈπ) β€ βπβ₯1
π2π = π
by countable subadditivity so πΈ β β.To show it is stable under complementation, suppose πΈ β β. By assumption
there exist πΆπ a countable union of boxes with πΈ β πΆπ and πβ(πΆπ \ πΈ) β€ 1π .
wlog we may assume the boxes are open so πΆπ is open, πΆππ is closed so πΆπ
π β β.Thus βπβ₯1 πΆπ
π β β by first part of the proof.But
πβ(πΈπ \ βπβ₯1
πΆππ) β€ πβ(πΈπ \ πΆπ
π) = πβ(πΆπ \ πΈ) β€ 1π
so πβ(πΈπ \ βπβ₯1 πΆππ) = 0 so πΈπ \ βπβ₯1 πΆπ
π β β since it is a null set. But
πΈπ = (πΈπ \ βπβ₯1
πΆππ) βͺ β
πβ₯1πΆπ
π,
both of which are in β so πΈπ β β.
Proof of Lemma 1.8. Every open set in Rπ is a countable union of boxes so isin β. It is more subtle for closed sets. The key observation is that every closedset is the countable union of compact subsets so we are left to show compactsets of Rπ are in β.
Let πΉ β Rπ be compact. For all π β₯ 1, there exist ππ a countable union ofopen sets such that πΉ β ππ βΆ= βπβ₯1 ππ,π where ππ,π are open boxes such that
βπβ₯1
|ππ,π| β€ πβ(πΉ) + 12π .
By compactness there exist a finite subcover so we can assume ππ is a finiteunion of open boxes. Moreover, wlog assume that
1. the side lengths of ππ,π are β€ 12π .
2. for each π, ππ,π intersects πΉ.
3. ππ+1 β ππ (by replacing ππ+1 with ππ+1 β© ππ iteratively).
6
1 Lebesgue measure
Then πΉ = βπβ₯1 ππ and we are left to show πβ(ππ \ πΉ) β 0. By additivity ondisjoint compact sets,
πβ(πΉ) + πβ(ππ \ ππ+1) = πβ(πΉ βͺ (ππ \ ππ+1))
soπβ(πΉ) + πβ(ππ \ ππ+1) β€ πβ(ππ) β€ β
πβ₯1|ππ,π| β€ πβ(πΉ) + 1
2π
so πβ(ππ \ ππ+1) β€ 12π . Finally,
πβ(ππ \ πΉ) = πβ(βπβ₯π
(ππ \ ππ+1)) β€ βπβ₯π
πβ(ππ \ ππ+1) β€ βπβ₯π
12π = 1
2πβ1 .
The result weβre working towards is
Proposition 1.9. πβ is countably additive on β, i.e. if (πΈπ)πβ₯1 whereπΈπ β β are pairwise disjoint then
πβ( βπβ₯1
πΈπ) = βπβ₯1
πβ(πΈπ).
Lemma 1.10. If πΈ β β then for all π > 0 there exists π open, πΉ closed,πΉ β πΈ β π such that πβ(π \ πΈ) < π and πβ(πΈ \ πΉ) < π.
Proof. By definition of β, there exists a countable union of open boxes πΈ ββπβ₯1 π΅π such that πβ(βπβ₯1 π΅π \ πΈ) < π. Just take π = βπβ₯1 π΅π which isopen.
For πΉ do the same with πΈπ = Rπ \ πΈ in place of πΈ.
Proof of Proposition 1.9. First we assume each πΈπ is compact. By a previouslemma πβ is additive on compact sets so for all π β N,
πβ(πβπ=1
πΈπ) =π
βπ=1
πβ(πΈπ).
In particularπ
βπ=1
πβ(πΈπ) β€ πβ( βπβ₯1
πΈπ)
since πβ is monotone. Take π β β to get one inequality. The other directionholds by countable subadditivity of πβ.
Now assume that each πΈπ is a bounded subset in β. By the lemma thereexists πΎπ β πΈπ closed, so compact, such that πβ(πΈπ \ πΎπ) β€ π
2π . Since πΎπβsare disjoint, by the previous case
πβ( βπβ₯1
πΎπ) = βπβ₯1
πβ(πΎπ)
7
1 Lebesgue measure
then
βπβ₯1
πβ(πΈπ)
β€ βπβ₯1
πβ(πΎπ) + πβ(πΈπ \ πΎπ)
β€πβ( βπβ₯1
πΎπ) + βπβ₯1
π2π
β€πβ( βπβ₯1
πΈπ) + π
so one direction of inequality. Similarly the other direction holds by countablesubadditivity of πβ.
For the general case, note that Rπ = βπβZπ π΄π where π΄π is bounded andin β, for example by taking π΄π to be product of half open intervals of unitlength. Write πΈπ as βπβZπ πΈπ β© π΄π so just apply the previous results to(πΈπ β© π΄π)πβ₯1,πβZπ .
Definition (Lebesgue measure). πβ when restricted to β is called theLebesgue measure and is simply denoted by π.
Example (Vitali counterexample). Althought β is pretty big (it includes allopen and closed sets, countable unions and intersections of them, and has car-dinality at least 2π where π is the continuum, by considering a null set withcardinality π , and each subset thereof), it does not include every subset of Rπ.
Consider (Q, +), the additive subgroup of (R, +). Pick a set of representativeπΈ of the cosets of (Q, +). Choose it inside [0, 1]. For each π₯ β R, there existsa unique π β πΈ such that π₯ β π β Q (here we require axiom of choice). Claimthat πΈ β β and πβ is not additive on the family of all subsets of Rπ.
Proof. Pick distinct rationals π1, β¦ , ππ in [0, 1]. The sets ππ + πΈ are pairwisedisjoint so if πβ were additive then we would have
πβ(πβπ=1
ππ + πΈ) =π
βπ=1
πβ(ππ + πΈ) = ππβ(πΈ)
by translation invariance of πβ. But then
πβπ=1
ππ + πΈ β [0, 2]
since πΈ β [0, 1] so by monotonicity of πβ have
πβ(πβπ=1
ππ + πΈ) β€ 2
so for all ππβ(πΈ) β€ 2 so πβ(πΈ) = 0. But
[0, 1] β βπβQ
πΈ + π = R,
8
1 Lebesgue measure
by countable subadditivity of πβ,
1 = πβ([0, 1]) β€ βπβQ
πβ(πΈ + π) = 0.
Absurd.In particular πΈ β β as πβ is additive on β.
9
2 Abstract measure theory
2 Abstract measure theoryIn this chapter we extend measure theory to arbitrary set. Most part of thetheory is developed by FrΓ©chet and CarathΓ©odory.
Definition (π-algebra). A π-algebra on a set π is a Boolean algebra stableunder countable unions.
Definition (measurable space). A measurable space is a couple (π, π)where π is a set and π is a π-algebra on π.
Definition (measure). A measure on (π, π) is a map π βΆ π β [0, β] suchthat
1. π(β ) = 0,
2. π is countably additive (also known as π-additive), i.e. for every family(π΄π)πβ₯1 of disjoint subsets in π, have
π( βπβ₯1
π΄π) = βπβ₯1
π(π΄π).
The triple (π, π, π) is called a measure space.
Example.
1. (Rπ, β, π) is a measure space.
2. (π, 2π, #) where # is the counting measure.
Proposition 2.1. Let (π, π, π) be a measure space. Then
1. π is monotone: π΄ β π΅ implies π(π΄) β π(π΅),
2. π is countably subadditive: π(βπβ₯1 π΄π) β€ βπβ₯1 π(π΄π),
3. upward monotone convergence: if
πΈ1 β πΈ2 β β― β πΈπ β β¦
thenπ( β
πβ₯1πΈπ) = lim
πββπ(πΈπ) = sup
πβ₯1π(πΈπ).
4. downard monotone convergence: if
πΈ1 β πΈ2 β β― β πΈπ β β¦
10
2 Abstract measure theory
and π(πΈ1) < β then
π( βπβ₯1
πΈπ) = limπββ
π(πΈπ) = infπβ₯1
π(πΈπ).
Proof.
1.π(π΅) = π(π΄) + π(π΅ \ π΄)β
β₯0
by additivity of π.
2. See example sheet. The idea is that every countable union βπβ₯1 π΄π is adisjoint countable union βπβ₯1 π΅π where for each π, π΅π β π΄π. It thenfollows by π-additivity.
3. Let πΈ0 = β soβπβ₯1
πΈπ = βπβ₯1
(πΈπ \ πΈπβ1),
a disjoint union. By π-additivity,
π( βπβ₯1
πΈπ) = βπβ₯1
π(πΈπ \ πΈπβ1)
but for all π, by additivity of π,
πβπ=1
π(πΈπ \ πΈπβ1) = π(πΈπ)
so take limit. The supremum part is obvious.
4. Apply the previous result to πΈ1 \ πΈπ.
Remark. Note the π(πΈ1) < β condition in the last part. Counterexample:πΈπ = [π, β) β R.
Definition (π-algebra generated by a family). Let π be a set and β± besome family of subsets of π. The the intersection of all π-algebras on πcontaining β± is a π-algebra, called the π-algebra generated by β± and isdenoted by π(β±).
Proof. Easy check. See example sheet.
Example.
1. Suppose π = βππ=1 ππ, i.e. π admits a finite partition. Let β± = {π1, β¦ , ππ},
then π(β±) consists of all subsets that are unions of ππβs.
2. Suppose π is countable and let β± be the collection of all singletons. Thenπ(β±) = 2π.
11
2 Abstract measure theory
Definition (Borel π-algebra). Let π be a topological space. The π-algebragenerated by open subsets of π is called the Borel π-algebra of π, denotedby β¬(π).
Proposition 2.2. If π = Rπ then β¬(π) β β. Moreover every π΄ β β canbe written as a disjoint union π΄ = π΅ βͺ π where π΅ β β¬(π) and π is a nullset.
Proof. Weβve shown that β is a π-algebra and contains all open sets so β¬(π) ββ. Given π΄ β β, π΄π β β so for all π β₯ 1 there exists πΆπ countable unions of(open) boxes such that π΄π β πΆπ and πβ(πΆπ \ π΄π) β€ 1
π . Take πΆ = βπβ₯1 πΆπ ββ¬(π). Thus π΅ βΆ= πΆπ β β¬(π) and π(π΄ \ π΅) = 0 because π΄ \ π΅ = πΆ \ π΄π.
Remark.
1. It can be shown that β¬(Rπ) β β. In fact |β| β₯ 2π and |β¬(Rπ)| = π .
2. If β± is a family of subsets of a set π, the Boolean algebra generated byβ± can be explicitly described as
β¬(β±) = {finite unions of πΉ1 β© β― β© πΉπ βΆ πΉπ β β± or πΉ ππ β β±}.
3. However, this is not so for π(β±). There is no βsimpleβ description of π-algebra generated by β±. (c.f. Borel hierarchy in descriptive set theory andtransfinite induction)
Definition (π-system). A family β± of subsets of a set π is called a π-systemif it contains β and it is closed under finite intersection.
Proposition 2.3 (measure uniqueness). Let (π, π) be a measurable space.Assume π1 and π2 are two finite measures (i.e. ππ(π) < β) such thatπ1(πΉ) = π2(πΉ) for every πΉ β β± where β± is a π-system with π(β±) = π.Then π1 = π2.
For Rπ, we only have to check open boxes.
Proof. We state first the following lemma:
Lemma 2.4 (Dynkin lemma). If β± is a π-system on π and π is a familyof subsets of π such that β± β π and π is stable under complementation anddisjoint countable unions. Then π(β±) β π.
Let π = {π΄ β π βΆ π1(π΄) = π2(π΄)}. Then π is clearly stable under comple-mentation as
ππ(π΄π) = ππ(π \ π΄) = ππ(π) β ππ(π΄).
π is also clearly stable under countable disjoint unions by π-additivity. Thus byDynkin lemma, π β π(β±) = π.
12
2 Abstract measure theory
Proof of Dynkin lemma. Let β³ be the smallest family of subsets of π contain-ing β± and stable under complementation and countable disjoint union (2π issuch a family and taking intersection). Sufficient to show that β³ is a π-algebra,as then β³ β π implies π(β±) β π.
It suffices to show β³ is a Boolean algebra. Let
β³β² = {π΄ β β³ βΆ π΄ β© π΅ β β³ for all π΅ β β±}.
β³β² again is stable under countable disjoint unions and complementation because
π΄π β© π΅ = (π΅π βͺ (π΄ β© π΅))π
as a disjoint union so is in β³.As β³β² β β±, by minimality of β³, have β³ = β³β². Now let
β³β³ = {π΄ β β³β² βΆ π΄ β© π΅ β β³ for all π΅ β β³}.
The same argument shows that β³β³ = β³. Thus β³ is a Boolean algebra and aπ-algebra.
Proposition 2.5 (uniqueness of Lebesgue measure). Lebesgue measure isthe unique translation invariant measure π on (Rπ, β¬(Rπ)) such that
π([0, 1]π) = 1.
Proof. Exercise. Hint: use the π-system β± made of all boxes in Rπ and dissecta cube into dyadic pieces. Then approximate and use monotone.
Remark.
1. There is no countably additive translation invariant measure on R definedon all subsets of R. (c.f. Vitaliβs counterexample).
2. However, the Lebesgue measure can be extended to a finitely additivemeasure on all subsets of R (proof requires Hahn-Banach theorem. SeeIID Linear Analysis).
Recall the construction of Lebesgue measure: we take boxes in Rπ, and defineelementary sets, which is the Boolean algebra generated by boxes. Then we candefine Jordan measure which is finitely additive. However, this is not countablyadditive but analysis craves limits so we define Lebesgue measurable sets, byintroducing the outer measure πβ, which is built from the Jordan measure.Finally we restrict this outer measure to β. We also define the Borel π-algebra,which is the same as the π-algebra generated by the boxes. We show that theBorel π-algebra is contained in β, and every element in β can be written as adisjoint union of an element in the Borel π-algebra and a measure zero set.
Suppose β¬ is a Boolean algebra on a set π. Let π be a finitely additivemeasure on β¬. We are going to construct a measure on π(β¬).
13
2 Abstract measure theory
Theorem 2.6 (CarathΓ©odory extension theorem). Assume that π is count-ably additive on β¬, i.e. if π΅π β β¬ disjoint is such that βπβ₯1 π΅π β β¬ thenπ(βπβ₯1 π΅π) = βπβ₯1 π(π΅π) and assume that π is π-finite, i.e. there existsππ β β¬ such that π = βπβ₯1 ππ and π(ππ) < β, then π extends uniquelyto a measure on π(β¬).
Proof. For any πΈ β π, let
πβ(πΈ) = inf{βπβ₯1
π(π΅π) βΆ πΈ β βπβ₯1
π΅π, π΅π β β¬}
and call it the outer measure associated to π. Define a subset πΈ β π to beπβ-measurable if for all π > 0 there exists πΆ = βπβ₯1 π΅π with π΅π β β¬ such thatπΈ β πΆ and
πβ(πΆ \ πΈ) β€ π.
We denote by β¬β the set of πβ-measurable subsets. Claim that
1. πβ is countably subadditive and monotone.
2. πβ(π΅) = π(π΅) for all π΅ β β¬.
3. β¬β is a π-algebra and contains all πβ-null sets and β¬.
4. πβ is π-additive on β¬β.
Then existence follows from the proposition as β¬β β π(β¬): πβ will be ameasure on β¬β and thus on π(β¬). Uniqueness follows from a similar proof forLebesgue measure via Dynkin lemma.
Proof. This will be very easy as we only need to adapt our previous work tothe general case. Note that in a few occassion we used properties of Rπ, suchas openness of some sets, so be careful.
1. Same.
2. πβ(π΅) β€ π(π΅) for all π΅ β β¬ by definition of πβ. For the other direction, forall π > 0, there exist π΅π β β¬ such that π΅ β βπβ₯1 π΅π and βπβ₯1 π(π΅π) β€πβ(π΅) + π. But
π΅ = βπβ₯1
π΅π β© π΅ = βπβ₯1
πΆπ
where πΆπ βΆ= π΅π β© π΅ \ βπ<π π΅ β© π΅π and so πΆπ β β¬. Thus by countableadditivity
π(π΅) = βπβ₯1
π(πΆπ) β€ βπβ₯1
π(π΅π) β€ πβ(π΅) + π
3. πβ-null sets and β¬ are obviously in β¬β. Thus it is left to show that β¬β
is a π-algebra. Stability under countable union is exactly the same andthen we claim that β¬β is stable under complementation. This is the bitwhere we used closed/open sets in Rπ in the original proof. Here we usea lemma as a substitute.
14
2 Abstract measure theory
Lemma 2.7. Suppose π΅π β β¬ then βπβ₯1 π΅π β β¬β.
Proof. First claim that if πΈ = βπβ₯1 πΌπ where πΌπ+1 β πΌπ and πΌπ β β¬ suchthat π(πΌ1) < β then πβ(πΈ) = limπββ π(πΌπ) and πΈ β β¬β: by additivity ofπ on β¬,
πβπ=1
π(πΌπ \ πΌπ+1) = π(πΌ1) β π(πΌπ)
which converges as π β β (because π(πΌπ+1) β€ π(πΌπ)), so
βπβ₯π
π(πΌπ \ πΌπ+1) β 0
as π β β. But LHS is greater than πβ(πΌπ \ πΈ) because πΌπ \ πΈ =βπβ₯π πΌπ \ πΌπ+1. Therefore πΈ β β¬β and
π(πΌπ) β€ πβ(πΌπ \ πΈ)ββββββ0
+ πβ(πΈ)ββ€π(πΌπ)
solim
πββπ(πΌπ) = πβ(πΈ).
Now for the actual lemma, let πΈ = βπβ₯1 πΌπ where πΌπ β β¬. wlog we mayassume πΌπ+1 β πΌπ. By π-finiteness assumption, π = βπβ₯1 ππ whereππ β β¬ with π(ππ) < β so
πΈ = βπβ₯1
πΈ β© ππ.
By the claim for all π, πΈ β© ππ β β¬β so πΈ β β¬β.
From the lemma we can derive that β¬β is also stable under complementa-tion: given πΈ β β¬β, for all π there exist πΆπ = βπβ₯1 π΅π,π where π΅π,π β β¬such that πΈ β πΆπ and πβ(πΆπ \ πΈ) β€ 1
π . Now
πΈπ = ( βπβ₯1
πΆππ) βͺ (πΈπ \ β
πβ₯1πΆπ
π)
but πΆππ is a countable intersection βπβ₯1 π΅π
π,π and πΈπ \ βπβ₯1 πΆππ is πβ-null
so by the lemma, πΆππ β β¬β. Therefore their union is also in β¬β. Since
weβve shown that null sets are in β¬β, πΈπ β β¬β.
4. We want to show πβ is countably additive on β¬β. Recall that π is π-finite:there exists ππ β β¬ such that π = βπβ₯1 ππ, π(ππ) < β. We sayπΈ β π is bounded if there exists π such that πΈ β ππ. It is then enoughto show countable additivity for bounded sets by the same argument asbefore: write π = βπβ₯1 οΏ½οΏ½π where οΏ½οΏ½π = ππ \ βπ<π ππ β β¬ so this is adisjoint union. Then if πΈ = βπβ₯1 πΈπ as a disjoint union then
πΈ = βπβ₯1
βπβ₯1
(πΈπ β© οΏ½οΏ½π)
15
2 Abstract measure theory
which is also a countable disjoint union.Given πΈ, if we can show finite additivity then
πβπ=1
πβ(πΈπ) = πβ(πβπ=1
πΈπ) β€ πβ(πΈ) β€ βπβ₯1
πβ(πΈπ)
take limit as π β β to have equality throughout.It suffices to prove finite additivity when πΈ and πΉ are countable intersec-tions of sets from β¬: πΈ, πΉ β β¬β so for π > 0 there exists πΆ, π· countableintersections of sets from β¬ such that πΆ β πΈ, π· β πΉ and
πβ(πΈ) β€ πβ(πΆ) + ππβ(πΉ) β€ πβ(π·) + π
As πΈ β© πΉ = β and πΆ β πΈ, π· β πΉ, πΆ β© π· = β so by finite additivity,
πβ(πΈ) + πβ(πΉ) β€ 2π + πβ(πΆ βͺ π·) β€ 2π + πβ(πΈ βͺ πΉ).
As usual, reverse holds by subadditivity.Finally for πΈ = βπβ₯1 πΌπ, πΉ = βπβ₯1 π½π bounded, wlog assume πΌπ+1 βπΌπ, π½π+1 β π½π. π(πΌπ), π(π½π) < β. Now use claim 3,
πβ(πΈ) = limπββ
πβ(πΌπ)
πβ(πΉ) = limπββ
πβ(π½π)
so
πβ(πΈ) + πβ(πΉ) = limπββ
π(πΌπ) + π(π½π) = limπββ
(π(πΌπ βͺ π½π) + π(πΌπ β© π½π))
But
βπβ₯1
(πΌπ β© π½π) = πΈ β© πΉ = β
βπβ₯1
(πΌπ βͺ π½π) = πΈ βͺ πΉ
so by claim 3
limπββ
π(πΌπ β© π½π) = 0
limπββ
π(πΌπ βͺ π½π) = πβ(πΈ βͺ πΉ)
which finishes the proof.
Remark. We prove that every set in β¬β is a disjoint union πΈ = πΉ βͺ π whereπΉ β π(β¬) and π is πβ-null.
16
2 Abstract measure theory
Definition (completion). We say that β¬β is the completion of π(β¬) withrespect to π.
Example. β is the completion of β¬(Rπ) in Rπ.
17
3 Integration and measurable functions
3 Integration and measurable functions
Definition (measurable function). Let (π, π) be a measurable space. Afunction π β R is called measurable or π-measurable if for all π‘ β R,
{π₯ β π βΆ π(π₯) < π‘} β π.
Remark. The π-algebra generated by intervals (ββ, π‘) where π‘ β R is the Borelπ-algebra of R, denote β¬(R). Thus for every measurable function π βΆ π β R,the preimage πβ1(π΅) β π for all π΅ β β¬(R). However, it is not true thatπβ1(πΏ) β π for any πΏ β β.
Remark. If π is allowed to take the values +β and ββ we will say that π ismeasurable if additionally πβ1({+β}) β π and πβ1({ββ}) β π.
More generally,
Definition (measurable map). Suppose (π, π) and (π , β¬) are measurablespaces. A map π βΆ π β π is measurable if for all π΅ β β¬, πβ1(π΅) β π.
Proposition 3.1.
1. The composition of measurable maps is measurable.
2. If π, π βΆ (π, π) β R are measurable functions then π + π, ππ and ππfor π β R are also measurable.
3. If (ππ)πβ₯1 is a sequence of measurable functions on (π, π) then so aresupπ ππ, infπ ππ, lim supπ ππ and lim infπ ππ.
Proof.
1. Obvious.
2. Follow from 1 once itβs shown that + βΆ R2 β R and Γ βΆ R2 β R aremeasurable (with respect to Borel sets). The sets
{(π₯, π¦) βΆ π₯ + π¦ < π‘}{(π₯, π¦) βΆ π₯π¦ < π‘}
are open in R2 and hence Borel.
3. infπ ππ(π₯) < π‘ if and only if
π₯ β βπ
{π₯ βΆ ππ(π₯) < π‘}
and similar for sup. Similarly lim supπ ππ(π₯) < π‘ if and only if
π₯ β βπβ₯1
βπβ₯1
βπβ₯π
{π₯ βΆ ππ(π₯) < π‘ β 1π
}.
18
3 Integration and measurable functions
Proposition 3.2. π = (π1, β¦ , ππ) βΆ (π, π) β (Rπ, β¬(Rπ)) where π β₯ 1 ismeasurable if and only if each ππ βΆ π β R is measurable.
Proof. One direction is easy: suppose π is measurable then
{π₯ βΆ ππ(π₯) < π‘} = πβ1({π¦ β Rπ βΆ π¦π < π‘}),
which is open so ππ is measurable.Conversely, suppose ππ is measurable. Then
πβ1(π
βπ=1
[ππ, ππ]) =π
βπ=1
{π₯ βΆ ππ β€ ππ(π₯) β€ ππ}
As the boxes generate the Borel sets, done.
Example.
1. Let (π, π) be a measurable space and πΈ β π. Then πΈ β π if and only if1πΈ, the indicator function on πΈ, is π-measurable.
2. If π = βππ=1 ππ and π is the Boolean algebra generated by the ππβs. A
function π βΆ (π, π) β R is measurable if and only if π is constant on eachππ. In this case the vector space of measurable functions has dimensionπ.
3. Every continuous function π βΆ Rπ β R is measurable.
Definition (Borel measurable). If π is a topological space, π βΆ π β R isBorel or Borel measurable if it is β¬(π)-measurable.
Definition (simple function). A function π on (π, π) is called simple if
π =π
βπ=1
ππ1π΄π
for some ππ β₯ 0 and π΄π β π.
Of course simple functions are measurable.
Lemma 3.3. If a simple function can be written in two ways
π =π
βπ=1
ππ1π΄π=
π βπ=1
ππ1π΅π
then πβπ=1
πππ(π΄π) =π
βπ=1
πππ(π΅π)
for any measure π on (π, π).
Proof. Example sheet 1.
19
3 Integration and measurable functions
Definition (integral of a simple function with respect to a measure). Theπ-integral of π is defined by
π(π) βΆ=π
βπ=1
πππ(π΄π).
Remark.
1. The lemma says that the integral is well-defined.
2. We also use the notation β«π
πππ to denote π(π).
Proposition 3.4. π-integral satisfies, for all simple functions π and π,
1. linearity: for all πΌ, π½ β₯ 0, π(πΌπ + π½π) = πΌπ(π) + π½π(π).
2. positivity: if π β€ π then π(π) β€ π(π).
3. if π(π) = 0 then π = 0 π-almost everywhere, i.e. {π₯ β π βΆ π(π₯) β 0}is a π-null set.
Proof. Obvious from definition and lemma.
Definition. If π β₯ 0 and measurable on (π, π), define
π(π) = sup{π(π) βΆ π simple , π β€ π} β [0, +β].
Remark. This is consistent with the definition for π simple, due to positivity.
Definition (integrable). If π is an arbitrary measurable function on (π, π)we say π is π-integrable if
π(|π|) < β.
Definition (integral with respect to a measure). If π is π-integrable, thenwe define its π-integral by
π(π) = π(π+) β π(πβ)
where π+ = max{0, π} and πβ = (βπ)+.
Note.
|π| = π+ + πβ
π = π+ β πβ
Theorem 3.5 (monotone convergence theorem). Let (ππ)πβ₯1 be a sequenceof measurable functions on a measure space (π, π, π) such that
0 β€ π1 β€ π2 β€ β― β€ ππ β€ β¦
20
3 Integration and measurable functions
Let π = limπββ ππ. Then
π(π) = limπββ
π(ππ).
Lemma 3.6. If π is a simple function on (π, π, π), the map
ππ βΆ π β [0, β]πΈ β¦ π(1πΈπ)
is a measure on (π, π).
Proof. Write π = βππ=1 ππ1π΄π
so π1πΈ = βππ=1 ππ1π΄πβ©πΈ so
π(1πΈπ) =π
βπ=1
πππ(π΄π β© πΈ).
By a question on example sheet this is well-defined. Then π-additivity followsimmediately from π-additivity of π.
Proof of monotone convergence theorem. ππ β€ ππ+1 β€ π by assumption so
π(ππ) β€ π(ππ+1) β€ π(π)
by definition of integral so
limπββ
π(ππ) β€ π(π),
although RHS may be infinite.Let π be any simple function with π β€ π. Need to show that π(π) β€
limπββ π(ππ). Pick π > 0. Let
πΈπ = {π₯ β π βΆ ππ(π₯) β₯ (1 β π)π(π₯)}.
Then π = βπβ₯1 πΈπ and πΈπ β πΈπ+1. So we may apply upward monotoneconvergence for sets to measure ππ and get
limπββ
ππ(πΈπ) = ππ(π) = π(π1π) = π(π).
But(1 β π)ππ(πΈπ) = π((1 β π)π1πΈπ
)) β€ π(ππ)
because (1 β π)π1πΈπis a simple function smaller than ππ. Taking limit,
(1 β π)π(π) β€ limπββ
π(ππ)
which holds for all π. Soπ(π) β€ lim
πββπ(ππ).
21
3 Integration and measurable functions
Lemma 3.7. If π β₯ 0 is a measurable function on (π, π) then there is asequence of simple functions (ππ)πβ₯1
0 β€ ππ β€ ππ+1 β€ π
such that for all π₯ β π, ππ(π₯) β π(π₯).
Notation. ππ β π means that limπββ ππ(π₯) = π(π₯) and ππ+1 β₯ ππ.
Proof. We can takeππ = 1
2π β2π min{π, π}β
pointwise. Check that β2π¦β β₯ 2βπ¦β for all π¦ β₯ 0.
Proposition 3.8. Basic properties of the integral (for positive functions):suppose π, π β₯ 0 are measurable on (π, π, π).
1. linearity: for all πΌ, π½ β₯ 0, π(πΌπ + π½π¦) = πΌπ(π) + π½π(π).
2. positivity: if 0 β€ π β€ π then π(π) β€ π(π).
3. if π(π) = 0 then π = 0 π-almost everywhere.
4. if π = π π-almost everywhere then π(π) = π(π).
Proof.
1. Follows from the same property for simple functions and from Lemma 3.7combined with monotone convergence theorem.
2. Obvious from definition.
3.{π₯ β π βΆ π(π₯) β 0} = β
πβ₯0{π₯ β π βΆ π(π₯) > 1
π}
set ππ = 1π 1{π₯βπβΆπ(π₯)>1/π} which is simple and ππ β€ π so by definition of
integral π(ππ) β€ π(π) so π(ππ) = 0, i.e. π({π₯ βΆ π(π₯) > 1π }) = 0.
4. Note that if πΈ β π, π(πΈπ) = 0 then
π(β1πΈ) = π(β)
for all β simple. Thus it holds for all β β₯ 0 measurable. Now takeπΈ = {π₯ βΆ π(π₯) = π(π₯)}.
Proposition 3.9 (linearity of integral). Suppose π, π are π-integrable func-tions and πΌ, π½ β R. Then πΌπ + π½π is π-integrable and
π(πΌπ + π½π) = πΌπ(π) + π½π(π).
Proof. We have shown the case when πΌ, π½ β₯ 0 and π, π β₯ 0. In the general case,use the positive and negative parts.
22
3 Integration and measurable functions
Lemma 3.10 (Fatouβs lemma). Suppose (ππ)πβ₯1 is a sequence of measurablefunctions on (π, π, π) such that ππ β₯ 0 for all π. Then
π(lim infπββ
ππ) β€ lim infπββ
π(ππ).
Remark. We may not have equality: let ππ = 1[π,π+1] on (R, β, π). Thenπ(ππ) = 1 but limπββ ππ = 0.
Proof. Let ππ βΆ= infπβ₯π ππ. Then ππ+1 β₯ ππ β₯ 0 so by monotone convergencetheorem, π(ππ) β π(π) as π β β where π = limπββ π = lim infπββ ππ andππ β€ ππ so π(ππ) β€ π(ππ) for all π. Take π β β,
π(π) β€ lim infπββ
π(ππ).
In both monotone convergence theorem and Fatouβs lemma we assumed thatthe sequence of functions is nonnegative. There is another version of convergencetheorem where we replace nonnegativity by domination:
Theorem 3.11 (Lebesgueβs dominated convergence theorem). Let (ππ)πβ₯1be a sequence of measurable functions on (π, π, π) and π a π-integrablefunction on π. Assume |ππ| β€ π for all π (domination assumption) andassume for all π₯ β π, limπββ ππ(π₯) = π(π₯). Then π is π-integrable and
π(π) = limπββ
π(ππ).
This allows us to swap limit and integral.
Proof. |ππ| β€ π so |π| β€ π so π(|π|) β€ π(π) < β and π is integrable. Note thatπ + ππ β₯ 0 so by Fatouβs lemma,
π(lim infπββ
(π + ππ)) β€ lim infπββ
π(π + ππ).
But lim infπββ(π + ππ) = π + π and by linearity π(π + ππ) = π(π) + π(ππ), so
π(π) + π(π) β€ π(π) + lim infπββ
π(ππ),
i.e.π(π) β€ lim inf
πββπ(ππ).
Do the same with π β ππ in place of π + ππ, get
π(βπ) β€ lim infπββ
π(βππ) = β lim supπββ
π(ππ)
soπ(π) = lim
πββπ(ππ).
23
3 Integration and measurable functions
Corollary 3.12 (exchanging integral and summation). Let (π, π, π) be ameasure space and let (ππ)πβ₯1 be a sequence of measurable functions on π.
1. If ππ β₯ 0 thenπ(β
πβ₯1ππ) = β
πβ₯1π(ππ).
2. If βπβ₯1 |ππ| is π-integrable then βπβ₯1 ππ is π-integrable and
π(βπβ₯1
ππ) = βπβ₯1
π(ππ).
Proof.
1. Let ππ = βππ=1 ππ, then ππ β βπβ₯1 ππ as π β β so the result follows
from monotone convergence theorem.
2. Let π = βπβ₯1 |ππ| and ππ as above. Then |ππ| β€ π for all π so thedomination assumption holds. The result thus follows from dominatedconvergence theorem.
Corollary 3.13 (differentiation under integral sign). Let (π, π, π) be ameasure space. Let π β R be an open set and let π βΆ π Γ π β R be suchthat
1. π₯ β¦ π(π‘, π₯) is π-integrable for all π‘ β π,
2. π‘ β¦ π(π‘, π₯) is differentiable for all π₯ β π,
3. domination: there exists π βΆ π β R π-integrable such that for allπ‘ β π, π₯ β π,
ππππ‘
(π‘, π₯) β€ π(π₯).
Then π₯ β¦ ππππ‘ (π‘, π₯) is π-integrable for all π‘ β π and if we set πΉ(π‘) =
β«π
π(π‘, π₯)ππ then πΉ is differentiable and
πΉ β²(π‘) = β«π
ππππ‘
(π‘, π₯)ππ.
Proof. Pick βπ > 0, βπ β 0 and define
ππ(π‘, π₯) βΆ= 1βπ
(π(π‘ + βπ, π₯) β π(π‘, π₯)).
Thenlim
πββππ(π‘, π₯) = ππ
ππ‘(π‘, π₯).
By mean value theorem, there exists ππ‘,π,π₯ β [π‘, π‘ + βπ] such that
ππ(π‘, π₯) = ππππ‘
(π, π₯)
24
3 Integration and measurable functions
so|ππ(π‘, π₯)| β€ π(π₯)
by domination assumption. Now apply dominated convergence theorem.
Remark.
1. If π βΆ [π, π] β R is continuous where π < π in R, then π is π-integrable(where π is the Lebesgue measure) and π(π) = β«π
ππ(π₯)ππ₯ is the Riemann
integral. In general if π is only assumed to be bounded, then π will beRiemann integrable if and only if the points of discontinuity of π is anπ-null set. See example sheet 2.
2. If π β GLπ(R) and π β₯ 0 is Borel measurable on Rπ, then
π(π β π) = 1| det π|
π(π).
See example sheet 2. In particular π is invariant under linear transfor-mation whose determinant has absolute value 1, e.g. rotation.
Remark. In each of monotone convergence theorem, Fatouβs lemma and dom-inated convergence theorem, we can replace pointwise assumption by the corre-sponding π-almost everywhere. The same conclusion holds. Indeed, let
πΈ = {π₯ β π βΆ assumptions hold at π₯}
so πΈπ is a π-null set. Replace each ππ (and similarly π etc) by 1πΈππ. Thenassumptions then hold everywhere as π(π1πΈ) = π(π) for all π measurable.
25
4 Product measures
4 Product measures
Definition (product π-algebra). Let (π, π) and (π , β¬) be measurable spaces.The π-algebra of subsets of π Γπ generated by the product sets πΈ ΓπΉ whereπΈ β π, πΉ β β¬ is called the product π-algebra of π and β¬ and is denoted byπ β π΅.
Remark.
1. By analogy with the notion of product topology, π β β¬ is the smallestπ-algebra of subsets of π Γπ making the two projection maps measurable.
2. β¬(Rπ1) β β¬(Rπ2) = β¬(Rπ1+π2). See example sheet. However this is not sofor β(Rπ).
Lemma 4.1. If πΈ β π Γ π is π β β¬-measurable then for all π₯ β π, theslice
πΈπ₯ = {π¦ β π βΆ (π₯, π¦) β πΈ}
is in β¬.
Proof. Letβ° = {πΈ β π Γ π βΆ πΈπ₯ β β¬ for all π₯ β π}.
Note that β° contains all product sets π΄ Γ π΅ where π΄ β π, π΅ β β¬. β° is a π-algebra: if πΈ β β° then πΈπ β β° and if πΈπ β β° then β πΈπ β β° since (πΈπ)π₯ = (πΈπ₯)π
and (β πΈπ)π₯ = β(πΈπ)π₯.
Lemma 4.2. Assume (π, π, π) and (π , β¬, π) are π-finite measure spaces.Let π βΆ π Γ π β [0, +β] be π β β¬-measurable. Then
1. for all π₯ β π, the function π¦ β¦ π(π₯, π¦) is β¬-measurable.
2. for all π₯ β π, the map π₯ β¦ β«π
π(π₯, π¦)ππ(π¦) is π-measurable.
Proof.
1. In case π = 1πΈ for πΈ β πββ¬ the function π¦ β¦ π(π₯, π¦) is just π¦ β¦ 1πΈπ₯(π¦),
which is measurable by the previous lemma.More generally, the result is true for simple functions and thus for allmeasurable functions by taking pointwise limit.
2. By the same reduction we may assume π = 1πΈ for some πΈ β π β β¬. Nowlet π = βπβ₯1 ππ with π(ππ) < β. Let
β° = {πΈ β π β β¬ βΆ π₯ β¦ π(πΈπ₯ β© ππ) is π-measurable for all π}.
β° contains all product sets πΈ = π΄ Γ π΅ where π΄ β π, π΅ β β¬ becauseπ(πΈπ₯ β© ππ) = 1π₯βππ(π΅ β© ππ). β° is stable under complementation:
π((πΈπ)π₯ β© ππ) = π(ππ) β π(ππ β© πΈπ₯)
26
4 Product measures
where LHS is π-measurable. β° is stable under disjoint countable union:let πΈ = βπβ₯1 πΈπ where πΈπ β β° disjoint. Then by π-additivity
π(πΈπ₯ β© ππ) = βπβ₯1
π((πΈπ)π₯ β© ππ)
which is π-measurable.The product sets form a π-system and generates the product measure soby Dynkin lemma β° = π β β¬.
Definition (product measure). Let (π, π, π) and (π , β¬, π) be measurespaces and π, π π-finite. Then there exists a unique product measure, denotedby π β π, on π β β¬ such that for all π΄ β π, π΅ β β¬,
(π β π)(π΄ Γ π΅) = π(π΄)π(π΅).
Proof. Uniqueness follows from Dynkin lemma. For existence, set
π(πΈ) = β«π
π(πΈπ₯)ππ(π₯).
π is well-defined because π₯ β¦ π(πΈπ₯) is π-measurable by lemma 2. π is countably-additive: suppose πΈ = βπβ₯1 πΈπ where πΈπ β π β β¬ disjoint, then
π(πΈ) = β«π
π(πΈπ₯)ππ(π₯) = β«π
βπβ₯1
π((πΈπ)π₯)πππ₯ = βπβ₯1
β«π
π((πΈπ)π₯)ππ(π₯) = βπβ₯2
π(πΈπ)
by a corollary of MCT.
Theorem 4.3 (Tonelli-Fubini). Let (π, π, π) and (π , β¬, π) be π-finite mea-sure spaces.
1. Let π βΆ π Γ π β [0, +β] be π β β¬-measurable. Then
β«πΓπ
π(π₯, π¦)π(πβπ) = β«π
β«π
π(π₯, π¦)ππ(π¦)ππ(π₯) = β«π
β«π
π(π₯, π¦)ππ(π₯)ππ(π¦).
2. If π βΆ π Γ π β R is π β π-integrable then for π-almost everywhere π₯,π¦ β¦ π(π₯, π¦) is π-integrable, and for π-almost everywhere π¦, π₯ β¦ π(π₯, π¦)is π-integrable and
β«πΓπ
π(π₯, π¦)π(πβπ) = β«π
β«π
π(π₯, π¦)ππ(π¦)ππ(π₯) = β«π
β«π
π(π₯, π¦)ππ(π₯)ππ(π¦).
Without the nonnegativity or integrability assumption, the result is false ingeneral. For example for π = π = N, let π = β¬ be discrete π-algebras and
27
4 Product measures
π = π counting measure. Let π(π, π) = 1π=π β 1π=π+1. Check that
βπβ₯1
π(π, π) = 0
βπβ₯1
π(π, π) = {0 π β₯ 21 π = 1
soβπβ₯1
βπβ₯1
π(π, π) β βπβ₯1
βπβ₯1
π(π, π).
Proof.
1. The result holds for π = 1πΈ where πΈ β πββ¬ by the definition of productmeasure and lemma 2, so it holds for all simple functions. Now take limitsand apply MCT.
2. Write π = π+ β πβ and apply 1.
Note.
1. The Lebesgue measure ππ on Rπ is equal to π1 β β― β π1, because it istrue on boxes and extend by uniqueness of measure.
2. πΈ β π β β¬ if is π β π-null if and only if for π-almost every π₯, π(πΈπ₯) = 0.
28
5 Foundations of probability theory
5 Foundations of probability theoryModern probability theory was founded by Kolmogorov, who formulated theaxioms of probability theory in 1933 in his thesis Foundations on the Theory ofProbability. He defined a probability space to be a measure space (Ξ©, β±,P). Theinterpretation is as follow: Ξ© is the universe of possible outcomes. However, wewouldnβt be able to assign probability to every single outcome unless the spaceis discrete. Instead we are interested in studying some subsets of Ξ©, whichare called events and contained in β±. Finally P is a probability measure withP(Ξ©) = 1. Thus for π΄ β β±, P(π΄ occurs) β [0, 1]. Thus finite additivity of P saysthat if π΄ and π΅ never occurs simultaneously then P(π΄ or π΅ = P(π΄) + P(π΅).π-additivity is slightly more difficult to justify and it is perhaps to see theequivalent notion continuity: if π΄π+1 β π΄π and βπβ₯1 π΄π = β then P(π΄π) β 0as π β β.
Definition (probability measure, probability space). Let Ξ© be a set and β±a π-algebra on Ξ©. A measure π on (Ξ©, β±) is called a probability measure ifπ(Ξ©) = 1 and the measure space (Ξ©, β±, π) is called a probability space.
Definition (random variable). A measurable function π βΆ Ξ© β R is calleda random variable.
We usually use a capital letter to denote a random variable.
Definition (expectation). If (Ξ©, β±,P) is a probability space then the P-integral is called expectation, denoted E.
Definition (distribution/law). A random variable π βΆ Ξ© β R on a proba-bility space (Ξ©, β±,P) determines a Borel measure ππ on R defined by
ππ((ββ, π‘]) = P(π β€ π‘) = P({π β Ξ© βΆ π(π) β€ π‘})
and ππ is called the distribution of π, or the law of π.
Note. ππ is the image of P under
Ξ© β Rπ β¦ π(π)
Definition (distribution function). The function
πΉπ βΆ R β [0, 1]π‘ β¦ P(π β€ π‘)
is called the distribution function of π.
29
5 Foundations of probability theory
Proposition 5.1. If (Ξ©, β±,P) is a probability space and π βΆ Ξ© β R is a ran-dom variable then πΉπ is non-decreasing, right-continuous and it determinesππ uniquely.
Proof. Given π‘π β π‘,
πΉπ(π‘π) = P(π β€ π‘π) β P( βπβ₯1
{π β€ π‘π}) = P({π β€ π‘}) = πΉπ(π‘)
by downward monotone convergence for sets. Uniqueness follows from Dynkinlemma applied to the π-system {β } βͺ {(ββ, π‘]}π‘βR.
Conversely,
Proposition 5.2. If πΉ βΆ R β [0, 1] is a non-decreasing right-continuousfunction with
limπ‘βββ
πΉ(π‘) = 0
limπ‘β+β
πΉ(π‘) = 1
then there exists a unique probability measure π on R such that
πΉ(π‘) = π((ββ, π‘])
for all π‘ β R.
Remark. The measure π is called the Lebesgue-Stieltjes measure on R associ-ated to πΉ. Furthermore for all π, π β R,
π((π, π]) = πΉ(π) β πΉ(π).
We can also construct Lebesgue measure this way.
Proof. Uniqueness is the same as above. For existence, we use the lemma
Lemma 5.3. Let
π βΆ (0, 1) β Rπ¦ β¦ inf{π₯ β R βΆ πΉ (π₯) β₯ π¦}
then π is non-decreasing, left-continuous and for all π₯ β R, π¦ β (0, 1), π(π¦) β€π₯ if and only if πΉ(π₯) β₯ π¦.
Proof. LetπΌπ¦ = {π₯ β R βΆ πΉ (π₯) β₯ π¦}.
Clearly if π¦1 β₯ π¦2 then πΌπ¦1β πΌπ¦2
so π(π¦2) β€ π(π¦1) so π is non-decreasing. πΌπ¦ isan interval of R because if π₯ > π₯1 and π₯1 β πΌπ¦ then πΉ(π₯) β₯ πΉ(π₯1) β₯ π¦ so π₯ β πΌπ¦.So πΌπ¦ is an interval with endpoints π(π¦) and +β. But πΉ is right-continuous soπ(π¦) = min πΌπ¦ and the minimum is obtained. Thus πΌπ¦ = [π(π¦), +β).
This means that π₯ β₯ π(π¦) if and only if π₯ β πΌπ¦ if and only if πΉ(π₯) β₯ π¦.Finally for left-continuity, suppose π¦π β π¦ then βπβ₯1 πΌπ¦π
= πΌπ¦ by definitionof πΌπ¦ so π(π¦π) β π(π¦).
30
5 Foundations of probability theory
Remark. If πΉ is continuous and strictly increasing then π = πΉ β1.
Now back to the proposition. Set π = πβπ where π is the Lebesgue measureon (0, 1). π is a probability measure as π is Borel-measurable. By the lemma
π((π, π]) = π(πβ1(π, π]) = π((πΉ(π), πΉ (π))) = πΉ(π) β πΉ(π).
Proposition 5.4. If π is a Borel probability measure on R then there existssome probability space (Ξ©, β±,P) and a random variable π on Ξ© such thatππ = π.
In fact, one can even pick Ξ© = (0, 1), β± = β¬(0, 1) and P = π, theLebesgue measure.
Proof. For the first claim set Ξ© = R, β± = β¬(R), P = π and π(π₯) = π₯.For the second claim, set πΉ(π‘) = π((ββ, π‘]) and take π = π where π is the
auxillary function defined in the previous lemma, namely
π(π) = inf{π₯ βΆ πΉ(π₯) β₯ π}.
Check that ππ = π:
ππ((π, π]) = P(π β (π, π])= π({π β (0, 1) βΆ π < π(π€) β€ π})= π({π β (0, 1) βΆ πΉ (π) < π < πΉ(π)})
Remark. If π is a Borel probability measure on R such that π = πππ‘ forsome π β₯ 0 measurable, we say that π has a density (with respect to Lebesguemeasure) and π is called the density of π. Here π = πππ‘ means that π((π, π]) =β«ππ
π(π‘)ππ‘.
Example.1. uniform distribution on [0, 1]:
π(π‘) = 1[0,1](π‘)πΉ(π‘) = π((ββ, π‘] β© [0, 1])
2. exponential distribution of rate π:
ππ(π‘) = ππβππ‘1π‘β₯0
πΉπ(π‘) = β«π‘
ββππ(π )ππ = 1π‘β₯0(1 β πβππ‘)
3. Gaussian distribution with standard deviation π and mean π:
ππ,π(π‘) = 1β2ππ2
exp(β(π‘ β π)2
2π2 )
πΉπ,π(π‘) = β«π‘
ββ
1β2ππ2
exp(β(π β π)2
2π2 )ππ
31
5 Foundations of probability theory
Definition (mean, moment, variance). If π is a random variable then
1. E(π) is called the mean,
2. E(ππ) is called the πth-moment of π,
3. Var(π) = E((π β Eπ)2) = E(π2) β E(π)2 is called the variance.
Remark. Suppose π β₯ 0 is measurable and π is a random variable. Then
E(π(π)) = β«R
π(π₯)πππ(π₯)
where by definition of ππ = πβP.
32
6 Independence
6 IndependenceIndependence is the key notion that makes probability theory distinct from(abstract) measure theory.
Definition (independence). Let (Ξ©, β±,P) be a probability space. A se-quence of events (π΄π)πβ₯1 is called independent or mutually independent iffor all πΉ β N finite,
P(βπβπΉ
π΄π) = βπβπΉ
P(π΄π).
Definition (independent π-algebra). A sequence of π-algebras (ππ)πβ₯1where ππ β β± is called independent if for all π΄π β ππ, the family (π΄π)πβ₯1is independent.
Remark.
1. To prove that (ππ)πβ₯1 is an independent family, it is enough to checkthe independence condition for all π΄πβs with π΄π β Ξ π where Ξ π is a π-system generating ππ. The proof is an application of Dynkin lemma. Forexample for π-algebras π1, π2, suffices to check
P(π΄1 β© π΄2) = P(π΄1)P(π΄2)
for all π΄1 β Ξ 1, π΄2 β Ξ 2. Fix π΄2 β Ξ 2, look at the measures
π΄1 β¦ P(π΄1 β© π΄2)π΄1 β¦ P(π΄1)P(π΄2)
on π1. They coincide on Ξ 1 by assumption and hence everywhere on π1.Subsequently consider π2.
Notation. Suppose π is a random variable. Denote by π(π) the smallestπ-subalgebra π of β± such that π is π-measurable, i.e.
π(π) = π({π β Ξ© βΆ π(π) β€ π‘}π‘βR).
Definition (independence). A sequence of random variables (ππ)πβ₯1 iscalled independent if the sequence of π-subalgebras (π(ππ))πβ₯1 is indepen-dent.
Remark. This is equivalent to the condition that for all (π‘π)πβ₯1, for all π,
P((π1 β€ π‘1) β© β― β© (ππ β€ π‘π)) =π
βπ=1
P(ππ β€ π‘π).
Yet another equivalent formulation is
π(π1,β¦,ππ) =π
β¨π=1
πππ
as Borel probability measures on Rπ. βThe joint law is the same as the productof individual lawsβ.
33
6 Independence
Note. Note that independence is a property of a family so pairwise indepen-dence is necessary but not sufficient for independence. A famous counterexampleis Bersteinβs example: take π and π to be random variables for two independentfair coins flips. Set π = |π β π |. Then π = 0 if and only if π = π. Check that
P(π = 0) = P(π = 1) = 12
and each pair (π, π ), (π, π) and (π , π) is independent. But (π, π , π) is notindependent.
Proposition 6.1. If π and π are independent random variables, π β₯0, π β₯ 0 then
E(ππ ) = E(π)E(π ).
Proof. Essentially Tonelli-Fubini:
E(ππ ) = β«R2
π₯π¦πππ,π(π₯, π¦) = β«R2
πππ(π₯)πππ(π¦)
= (β«R
π₯πππ(π₯)) (β«R
π¦πππ(π¦))
= E(π)E(π )
Remark. As in Tonelli-Fubini, we may require ππ to be integrable instead andthe same conclusion holds.
Example. Let Ξ© = (0, 1), β± = β¬(0, 1),P = π the Lebesgue measure. Writethe decimal expansion of π β (0, 1) as
π = 0.π1π2 β¦
where ππ(π) β {0, β¦ , 9}. Choose a convention so that each π has a well-definedexpansion (to avoid things like 0.099 β― = 0.100 β¦). Now let ππ(π) = ππ(π).Claim that the (ππ)πβ₯1 are iid. random variables uniformly distributed on{0, β¦ , 9}, where βiid.β stands for independently and identically distributed.
Proof. Easy check. For example π1(π) = β10πβ so
P(π1 = π1) = 110
.
Similarly for all π
P(π1 = π1, β¦ , ππ = ππ) = 110π , P(ππ = ππ) = 1
10so
P(π1 = π1, β¦ , ππ = ππ) =π
βπ=1
P(ππ = ππ).
34
6 Independence
Remark.π = β
πβ₯1
ππ(π)10π
is distributed according to Lebesgue measure so if we want we can constructLebesgue measure as the law of this random variable.
Proposition 6.2 (infinite product of product measure). Let (Ξ©π, β±π, ππ)πβ₯1be a sequence of probability spaces, Ξ© = βπβ₯1 Ξ©π and β° be the Boolean algebraof cylinder sets, i.e. sets of the form
π΄ Γ βπβ₯π
Ξ©π
for some π΄ β β¨ππ=1 β±π. Set β± = π(β°), the infinite product π-algebra. Then
there is a unique probability measure π on (Ξ©, β±) such that it agrees withproduct measures on all cylinder sets, i.e.
π(π΄ Γ βπ>π
Ξ©π) = (π
β¨π=1
ππ)(π΄)
for all π΄ β β¨ππ=1 β±π.
Proof. Omitted. See example sheet 3.
Lemma 6.3 (Borel-Cantelli). Let (Ξ©, β±,P) be a probability space and (π΄π)πβ₯1a sequence of events.
1. If βπβ₯1 P(π΄π) < β then
P(lim supπ
π΄π) = 0.
2. Conversely, if (π΄π)πβ₯1 are independent and βπβ₯1 P(π΄π) = β then
P(lim supπ
π΄π) = 1.
Note that lim supπ π΄π is also called π΄π io. meaning βinfinitely oftenβ.
Proof.
1. Let π = βπβ₯1 1π΄πbe a random variable. Then
E(π ) = βπβ₯1
E(1π΄π) = β
πβ₯1P(π΄π).
Since π β₯ 0, recall that we prove that E(π ) < β implies that π < βalmost surely, i.e. P-almost everywhere.
2. Note that(lim sup
ππ΄π)π = β
πβ
πβ₯ππ΄π
π
35
6 Independence
so
P( βπβ₯π
π΄ππ) β€ P(
πβ
π=ππ΄π
π)
=πβ
π=πP(π΄π
π) =πβ
π=π(1 β P(π΄π))
β€πβ
π=πexp(βP(π΄π))
β€ exp(βπ
βπ=π
P(π΄π))
β 0
as π β β. ThusP( β
πβ₯ππ΄π
π) = 0
for all π soP(β
πβ
πβ₯ππ΄π
π) = 0.
Definition (random/stochastic process, filtration, tail π-algebra, tail event).Let (Ξ©, β±,P) be a probability space and (ππ)πβ₯1 a sequence of random vari-ables.
1. (ππ)πβ₯1 is sometimes called a random process or stochastic process.
2.β±π = π(π1, β¦ , ππ) β β±
is called the associated filtration. β±π β β±π+1.
3.π = β
πβ₯1π(ππ, ππ+1, β¦ )
is called the tail π-algebra of the process. Its elements are called tailevents.
Example. Tail events are those not affected by the first few terms in the se-quence of random variables. For example,
{π β Ξ© βΆ limπ
ππ(π) exists}
is a tail event, so is{π β Ξ© βΆ lim sup
πππ(π) β₯ π }.
Theorem 6.4 (Kolmogorov 0β1 law). If (ππ)πβ₯1 is a sequence of mutually
36
6 Independence
independent random variables then for all π΄ β π,
P(π΄) β {0, 1}.Proof. Pick π΄ β π. Fix π. For all π΅ β π(π1, β¦ , ππ),
P(π΄ β© π΅) = P(π΄)P(π΅)
as π is independent of π(π1, β¦ , ππ). The measures π΅ β¦ P(π΄)P(π΅) andπ΅ β¦ P(π΄ β© π΅) coincide on each β±π so on βπβ₯1 β±π. Hence they coincideon π(βπβ₯1 β±π) β π so
P(π΄) = P(π΄ β© π΄) = P(π΄)P(π΄)
soP(π΄) β {0, 1}.
6.1 Useful inequalities
Proposition 6.5 (Cauchy-Schwarz). Suppose π, π are random variablesthen
E(|ππ |) β€ βE(π2) β E(π 2).
Proof. For all π‘ β R,
0 β€ E((|π| + π‘|π |)2) = E(π2) + 2π‘E(|ππ |) + π‘2E(π 2)
so viewed as a quadratic in π‘, the discriminant is nonpositive, i.e.
(E(|ππ |)2 β E(π)2 β E(π )2 β€ 0.
Proposition 6.6 (Markov). Let π β₯ 0 be a random variable. Then for allπ‘ β₯ 0,
π‘P(π β₯ π‘) β€ E(π).
Proof.E(π) β₯ E(π1πβ₯π‘) β₯ E(π‘1πβ₯π‘) = π‘P(π β₯ π‘)
Proposition 6.7 (Chebyshev). Let π be a random variable with E(π 2) < β,then for all π‘ β R,
π‘2P(|π β E(π )| β₯ π‘) β€ Var π .
E(π 2) < β implies that E(|π |) < β by Cauchy-Schwarz, so Var π < β.The converse is more subtle.
Proof. Apply Markov to π = |π β E(π )|2.
37
6 Independence
Theorem 6.8 (strong law of large numbers). Let (ππ)πβ₯1 be a sequence ofiid. random variables. Assume E(|π1|) < β. Let
ππ =π
βπ=1
ππ,
then 1π ππ converges almost surely to E(π1).
Proof. We prove the theorem under a stonger condition: we assume E(π41) <
β. This implies, by Cauchy-Schwarz, E(π21),E(|π1|) < β. Subsequently
E(|π1|3) < β. The full proof is much harder but will be given later whenwe have developed enough machinery.
wlog we may assume E(π1) = 0 by replacing ππ with ππ β E(π1). Have
E(π4π) = β
π,π,π,βE(πππππππβ).
All terms vanish because E(ππ) = 0 and (ππ)πβ₯1 are independent, except forE(π4
π ) and E(π2π π2
π ) for π β π. For example,
E(πππ3π ) = E(ππ) β E(π3
π ) = 0
for π β π. Thus
E(π4π) =
πβπ=1
E(π4π ) + 6 β
π<πE(π2
π π2π ).
By Cauchy-Schwarz,
E(π2π π2
π ) β€ βEπ4π β Eπ4
π = Eπ41
soE(π4
π) β€ (π + 6 β π(π β 1)2
)Eπ41
and asymptotically,E(ππ
π)4 = π( 1
π2 )
soE(β
πβ₯1(ππ
π)4) = β
πβ₯1E(ππ
π)4 < β.
Hence β( πππ )4 < β almost surely and it follows that
limπββ
πππ
= 0
almost surely.
Strong law of large numbers has a very important statistical implication: wecan sample the mean of larger number of iid. to detect an unknown law, at leastthe mean.
38
7 Convergence of random variables
7 Convergence of random variables
Definition (weak convergence). A sequence of probability measures (ππ)πβ₯1on (Rπ, β¬(Rπ)) is said to converge weakly to a measure π if for all π β πΆπ(Rπ),the set of continuous bounded functions on Rπ,
limπββ
ππ(π) = π(π).
Example.
1. Let ππ = πΏ1/π be the Dirac mass on Rπ, i.e. for π₯ β Rπ, πΏπ₯ is the Borelprobability measure on Rπ such that
πΏπ₯(π΄) = {1 π₯ β π΄0 π₯ β π΄
then ππ β πΏ0.
2. Let ππ = π©(0, π2π), Gaussian distribution with standard deviation ππ,
where ππ β 0, then again ππ β πΏ0. Indeed,
ππ(π) = β« π(π₯)πππ(π₯)
= β« π(π₯) 1β2ππ2
πexp(β π₯2
2π2π
)ππ₯
= β« π(π₯ππ) 1β2π
exp(βπ₯2
2)ππ₯
ππ β 0 so π(π₯ππ) β π(0) so by dominated convergence theorem, ππ(π) βπ(0) = πΏ0(π).
Definition (convergence of random variable). A sequence (ππ)πβ₯1 of Rπ-valued random variables on (Ξ©, β±,P) is said to converge to a random variableπ
1. almost surely iflim
πββππ(π) = π(π)
for P-almost every π.
2. in probability or in measure if for all π > 0,
limπββ
P(βππ β πβ > π) = 0.
Note that all norms on Rπ are equivalent so we donβt have to specifyone.
39
7 Convergence of random variables
3. in distribution or in law if πππβ ππ weakly, where ππ = πβP is
the law of π, a Borel probability measure on Rπ. Equivalently, for allπ β πΆπ(Rπ), E(π(ππ)) β E(π(π)).
Proposition 7.1. 1 βΉ 2 βΉ 3.
Proof.
1. 1 βΉ 2:P(βππ β πβ > π) = E(1βππβπβ>π)
so if ππ β π almost surely then
1βππβπβ>π β 0
P-almost everywhere so by dominated convergence theorem P(βππ βπβ >π) β 0.
2. 2 βΉ 3: given π β πΆπ(Rπ), need to show that πππ(π) β ππ(π). But
πππ(π) β ππ(π) = E(π(ππ) β π(π)).
To bound this, note that π is continuous and Rπ is locally compact so it islocally uniformly continuous. In particular for all π > 0 exists πΏ > 0 suchthat if βπ₯β < 1
π and βπ¦ β π₯β < πΏ then |π(π¦) β π(π₯)| < π. Thus
|E(π(ππ) β π(π))| β€ E(1βππβπβ<πΏ1βπβ<1/π |π(ππ) β π(π)|βββββββ<π
)
+ 2 βπβββ<β
(P(βππ β πβ β₯ πΏ) + P(βπβ β₯ 1π
))
solim sup
πββ|E(π(ππ) β π(π))| β€ π + 2βπββ P(βπβ > 1
π)
ββββββ0 as πβ0
.
which is 0 as π is arbitrary.
Remark. When π = 1, 3 is equivalent to πΉππ(π₯) β πΉπ(π₯) for all π₯ as π β β
where πΉπ is continuous. See example sheet 3.
The strict converses do not hold but we can say something weaker.
Proposition 7.2. If ππ β π in probability then there is a subsequence(ππ)π such that πππ
β π almost surely as π β β.
Proof. We know for all π > 0, P(βππ β πβ > π) β 0 as π β β. So for all πexists ππ such that
P(βπππβ πβ > 1
π) β€ 1
2π
soβπβ₯1
P(βπππβ πβ > 1
π) < β
40
7 Convergence of random variables
so by the first Borel-Cantelli lemma
P(βπππβ πβ > 1
πio.) = 0.
This means that with probability 1, βπππβ πβ β 0 as π β β.
Definition (convergence in mean). Let (ππ)πβ₯1 and π be Rπ-valued inte-grable random variables. We say that (ππ)π converges in mean or in πΏ1 toπ if
limπββ
E(βππ β πβ) = 0.
Remark.
1. If ππ β π in mean then ππ β π in probability by Markov inequality:
π β P(βππ β πβ > π) β€ E(βππ β πβ).
2. The converse is false. For example take Ξ© = (0, 1), β± = β¬(Ξ£) and PLebesgue measure. Let ππ = π1[0, 1
π ]. ππ β 0 almost surely but Eππ = 1.
When does convergence in probability imply convergence in mean? We needsome kind of domination assumption.
Definition (uniformly integrable). A sequence of random variables (ππ)πβ₯1is uniformly integrable if
limπββ
lim supπββ
E(βππβ1βππββ₯π) = 0.
Remark. If (ππ)πβ₯1 are dominated, namely exists an integrable random vari-able π β₯ 0 such that βππβ β€ π for all π then (ππ)π is uniformly integrable bydominated convergence theorem:
E(βππβ1βππββ₯π) β€ E(π1π β₯π) β 0
as π β β.
Theorem 7.3. Let (ππ)πβ₯1 be a sequence of Rπ-valued integrable randomvariable. Let π be another random variable. Then TFAE:
1. π is integrable and ππ β π in mean,
2. (ππ)πβ₯1 is uniformly integrable and ππ β π in probability.
Proof.
β’ 1 βΉ 2: Left to show uniform integrability:
E(βππβ1βππββ₯π) β€ E(βππ β πβ1βππββ₯π) + E(βπβ1βππββ₯π)β€ E(βππ β πβ) + E(βπβ1βππββ₯π(1βπββ€ π
2+ 1βπβ> π
2))
β€ E(βππ β πβ) + E(βπβ1βππβπββ₯ π2
1βπββ€ π2
) + E(βπβ1βπββ₯ π2
)
β€ E(βππ β πβ) + π2P(βππ β πβ β₯ π
2) + E(βπβ1βπββ₯ π
2)
41
7 Convergence of random variables
Take lim sup,
lim supπββ
E(βππβ1βππββ₯π) β€ 0 + 0 + E(βπβ1βπββ₯ π2
) β 0
by dominated convergence theorem.
β’ 2 βΉ 1: Prove first that π is integrable. By the previous proposition,we can find a subsequence (ππ)π such that πππ
β π almost surely. ByFatouβs lemma,
E(βπβ1βπββ₯π) β€ lim infπβ0
E(βπππβ1βπππ
ββ₯π)
which goes to 0 as π β β by uniform integrability assumption. Thus
E(βπβ) β€ π + E(βπβ1βπββ₯π) < β
for π sufficiently large. Thus π is integrable.To show convergence in mean, we use the same trick of spliting into smalland big parts.
E(βππ β πβ) = E((1βππβπββ€π + 1βππβπβ>π)βππ β πβ)β€ π + E(1βππβπβ>πβππ β πβ(1βππββ€π + 1βππβ>π))β€ π + β + β‘
where
β = E(βππ β πβ1βππβπβ>π1βππββ€π)β€ E((π + βπβ)1βππβπβ>π(1βπββ€π + 1βπβ>π)β€ 2πP(βππ β πβ > π) + 2E(βπβ1βπβ>π)
solim sup
πβββ β€ 2E(βπβ1βπβ>π) β 0
as π β β. On the other hand
β‘ = E(βππ β πβ1βππβπβ>π1βππβ>π)β€ E((βππβ + βπβ)1βππββ₯π)β€ E(βππβ1βππββ₯π + βπβ1βπβ>π + βπβ1βππββ₯π1βπββ€π)β€ 2E(βππβ1βππββ₯π) + E(βπβ1βπβ>π)
taking lim sup,
lim supπββ
β‘ β€ 2 lim supπββ
E(βππβ1βππββ₯π) + E(βπβ1βπβ>π) β 0 + 0
as π β β.
42
7 Convergence of random variables
Definition. We say that a sequence of random variables (ππ)πβ₯1 is boundedin πΏπ if there exists πΆ > 0 such that E(βππβπ) β€ πΆ for all π.
Proposition 7.4. If π > 1 and (ππ)πβ₯1 is bounded in πΏπ then (ππ)πβ₯1 isuniformly integrable.
Proof.
ππβ1E(βππβ1βππββ₯π) β€ E(βππβπ1βππββ₯π) β€ E(βππβπ) β€ πΆ
solim sup
πββE(βππβ1βππββ₯π) β€ πΆ
ππβ1 β 0
as π β β.
This provides a sufficient condition for uniform integrability and thus con-vergnce in mean.
43
8 πΏπ spaces
8 πΏπ spacesRecall that π βΆ πΌ β R is convex means that for all π₯, π¦ β πΌ, for all π‘ β [0, 1],
π(π‘π₯ + (1 β π‘)π¦) β€ π‘π(π₯) + (1 β π‘)π(π¦).
Proposition 8.1 (Jensen inequality). Let πΌ be an open interval of R andπ βΆ πΌ β R a convex function. Let π be a random variable (Ξ©, β±,P). Assumeπ is integrable and takes values in πΌ. Then
E(π(π)) β₯ π(E(π)).
Remark.
1. As π β πΌ almost surely and πΌ is an interval, have E(π) β πΌ.
2. Weβll show that π(π)β is integrable so
E(π(π)) = E(π(π)+) β E(π(π)β)
with the possibility that both sides are infinity.
Lemma 8.2. TFAE:
1. π is convex,
2. there exists a family β± of affine functions (π₯ β¦ ππ₯ + π) such thatπ = supβββ± β on πΌ.
Proof.
1. 2 βΉ 1: every β is convex and the supremum of β is convex:
β(π‘π₯ + (1 β π‘)π¦) = π‘β(π₯) + (1 β π‘)β(π¦) β€ π‘ supβββ±
β(π₯) + (1 β π‘) supβββ±
β(π¦)
so
π(π‘π₯ + (1 β π‘)π¦) = supβββ±
β(π‘π₯ + (1 β π‘)π¦)
β€ π‘ supβββ±
β(π₯) + (1 β π‘) supβββ±
β(π¦)
= π‘π(π₯) + (1 β π‘)π(π¦)
2. We need to show that for all π₯0 β πΌ we can find an affine function
βπ₯0(π₯) = ππ₯0
(π₯ β π₯0) + π(π₯0),
where ππ₯0is morally the slope at π₯0, such that π(π₯) β₯ βπ₯0
(π₯) for all π₯ β πΌ.Then have π = supπ₯0βπΌ βπ₯0
.
To find ππ₯0observe that for all π₯ < π₯0 < π¦ where π₯, π¦ β πΌ, have
π(π₯0) β π(π₯)π₯0 β π₯
β€ π(π¦) β π(π₯0)π¦ β π₯0
.
44
8 πΏπ spaces
Indeed this is the convexity of π on [π₯, π¦] with π‘ = π₯0βπ₯π¦βπ₯ . This holds for
all π₯ < π₯0, π¦ > π₯0 so there exists π β R such that
π(π₯0) β π(π₯)π₯0 β π₯
β€ π β€ π(π¦) β π(π₯0)π¦ β π₯0
.
Then just set βπ₯0(π₯) = π(π₯ β π₯0) + π(π₯0). By construction π(π₯) β₯ βπ₯0
(π₯)for all π₯ β πΌ.
Proof of Jensen inequality. Let π(π₯) = supβββ± β(π₯) where β affine. then
E(π(π)) β₯ E(β(π)) = β(E(π))
for all β β β± so take supremum,
E(π(π)) β₯ supβββ±
β(E(π)) = π(E(π)).
Also for the remark,
βπ = β supβββ±
β = infβββ±
(ββ)
so πβ = (βπ)+ β€ |β| for all β β β±. Then
π(π)β β€ |β(π)| β€ |π||π| + |π|.
As π is integrable, π(π)β is integrable.
Jensen inequality is for probability space only. The following applies to allmeasure spaces.
Proposition 8.3 (Minkowski inquality). Let (π, π, π) be a measure spaceand π, π measurable functions on π. Let π β [1, β) and define the π-norm
βπβπ = (β«π
|π|πππ)1/π
.
Thenβπ + πβπ β€ βπβπ + βπβπ.
Proof. wlog assume βπβπ, βπβπ β 0. Need to show
β₯βπβπ
βπβπ + βπβπ
πβπβ π
+βπβπ
βπβπ + βπβπ
πβπβπ
β₯π
β€ 1.
Suffice to show for all π‘ β [0, 1], for all πΉ, πΊ measurable such that βπΉβπ = βπΊβπ =1, have
βπ‘πΉ + (1 β π‘)πΊβπ β€ 1βthe unit ball is convexβ. For this note that
[0, +β) β [0, +β)π₯ β¦ π₯π
45
8 πΏπ spaces
is convex if π β₯ 1 so
|π‘πΉ + (1 β π‘)πΊ|π β€ π‘|πΉ |π + (1 β π‘)|πΊ|π
andβ«
π|π‘πΉ + (1 β π‘)πΊ|πππ β€ π‘ β«
π|πΉ |πππ
βββββ=1
+(1 β π‘) β«π
|πΊ|πππβββββ
=1
= 1.
Proposition 8.4 (HΓΆlder inequality). Suppose (π, π, π) is a measure spaceand let π, π be measurable functions on π. Given π, π β (1, β) such that1π + 1
π = 1,
β«π
|ππ|ππ β€ (β«π
|π|πππ)1/π
(β«π
|π|πππ)1/π
with equality if and only if there exists (πΌ, π½) β (0, 0) such that πΌ|π|π = π½|π|ππ-almost everywhere.
Lemma 8.5 (Young inequality). For all π, π β (1, β) such that 1π + 1
π = 1,for all π, π β₯ 0, have
ππ β€ ππ
π+ ππ
π.
Proof of HΓΆlder inequality. wlog assume βπβπ, βπβπ β 0. By scaling by factors(πΌ, π½) β (0, 0) wlog βπβπ = βπβπ = 1. Then by Young inequality,
|ππ| β€ 1π
|π|π + 1π
|π|π
soβ«
π|ππ|ππ β€ 1
πβ«
π|π|πππ + 1
πβ«
π|π|πππ = 1
π+ 1
π= 1.
Remark. Apply Jensen inequality to π(π₯) = π₯πβ²/π for πβ² > π, we have
E(|π|π)1/π β€ E(|π|πβ²)1/πβ² ,
so the function π β¦ E(|π|π)1/π is non-decreasing. This can be used, for example,to show that if π has finite πβ²th moment then it has finite πth moment for πβ² β₯ π.
Definition. Let (π, π, π) be a measure space.
β’ For π β₯ 1,
βπ(π, π, π) = {π βΆ π β R measurable such that |π|π is π-integrable}.
β’ For π = β,
ββ(π, π, π) = {π βΆ π β R measurable such that essup |π| < β}
46
8 πΏπ spaces
where
essup |π| = inf{π‘ βΆ |π(π₯)| β€ π‘ for π-almost every π₯}.
Lemma 8.6. βπ(π, π, π) is an R-vector space.
Proof. For π < β use Minkowski inequality. Similar we can check that βπ +πββ β€ βπββ + βπββ for all π, π.
Definition. We say π and π are π-equivalent and write π β‘π π if for π-almost every π₯, π(π₯) = π(π₯).
Check that this is an equvalence relation stable under addition and multi-plication.
Definition (πΏπ-space). Define
πΏπ(π, π, π) = βπ(π, π, π)/ β‘π
and if [π] denotes the equivalence class of π under β‘π we define
β[π]βπ = βπβπ.
Proposition 8.7. For π β [1, β], πΏπ(π, π, π) is a normed vector spaceunder ββ βπ and it is complete, i.e. it is a Banach space.
Proof. If π β‘π π then βπβπ = βπβπ so ββ βπ on πΏπ is well-defined. Triangleinequality follows from Minkowski inequality and linearity is obvious so ββ βπis indeed a norm.
For completeness, pick (ππ)π a Cauchy sequence in βπ(π, π, π). Need toshow that there exists π β βπ such that βππ β πβπ β 0 as π β β. This thenimplies that [ππ] β [π] in πΏπ.
We can extract a subsequence ππ β β such that βπππ+1β πππ
βπ β€ 2βπ. Let
ππΎ =πΎ
βπ=1
|πππ+1β πππ
|
then
βππΎβπ β€πΎ
βπ=1
βπππ+1β πππ
βπ β€πΎ
βπ=1
2βπ β€ 1
so by monotone convergence,
limπΎββ
β«π
|ππΎ|πππ = β«π
|πβ|πππ,
i.e. πβ β βπ. In particular for π-almost everywhere π₯, |πβ(π₯)| < β, i.e.
βπβ₯1
|πππ+1(π₯) β πππ
(π₯)| < β.
47
8 πΏπ spaces
Hence (πππ(π₯))π is a Cauchy sequence in R. By completeness of R, the limit
exists and set π(π₯) to be it. When this limit does not exist set π(π₯) = 0.We then have, in case π < β, by Fatouβs lemma
βππ β πβπ β€ lim infπββ
βππ β πππβπ β€ π
for any π for π sufficiently large. Thus
limπββ
βππ β πβπ = 0.
When π = β, we use the fact that if ππ β π π-almost everywhere then
βπββ β€ lim infπββ
βππββ.
Proof. Let π‘ > lim supπβββππββ. Then exists ππ β β such that
βπππββ = essup |πππ
| = inf{π β₯ 0 βΆ |πππ(π₯)| β€ π for π-almost every π₯} < π‘
for all π. Thus for all π, for π-almost every π₯, |πππ(π₯)| < π‘. But by π-additivity
of π we can swap the quantifiers, i.e. for π-almost every π₯, for all π₯, |πππ(π₯)| < π‘.
Thus for π-almost every π₯, βπ(π₯)ββ β€ π‘.
Proposition 8.8 (approximation by simple functions). Let π β [1, β). Letπ be the linear span of all simple functions on π. Then π β© βπ is dense inβπ.
Proof. Note that π β βπ implies π+, πβ β βπ. Thus by writing π = π+ β πβ
and using Minkowski inequality, suffice to show π β₯ 0 is the limit of a sequenceof simple functions.
Recall there exist simple functions 0 β€ ππ β€ π such that ππ(π₯) β π(π₯) forπ-almost every π₯. Then
limπββ
βππ β πβππ = lim
πβββ«
π|ππ β π|πππ = 0
by dominated convergence theorem (|ππ β π| β€ ππ + π β€ 2π so ππ β π is π-integrable).
Remark. When π = Rπ, π = β¬(Rπ) and π is the Lebesgue measure, πΆπ(Rπ),the space of continuous functions with compact support is dense in βπ(π, π, π)when π β [1, β) (this does not hold for π = β: a constant nonzero function hasno noncompact support). See example sheet. In fact, πΆβ
π (Rπ) suffices.
48
9 Hilbert space and πΏ2-methods
9 Hilbert space and πΏ2-methods
Definition (inner product). Let π be a complex vector space. A Hermitianinner product on π is a map
π Γ π β C(π₯, π¦) β¦ β¨π₯, π¦β©
such that
1. β¨πΌπ₯ + π½π¦, π§β© = πΌβ¨π₯, π§β© + π½β¨π¦, π§β© for all πΌ, π½ β C, for all π₯, π¦, π§ β π.
2. β¨π¦, π₯β© = β¨π₯, π¦β©.
3. β¨π₯, π₯β© β R and β¨π₯, π₯β© β₯ 0, with equality if and only if π₯ = 0.
Definition (Hermitian norm). The Hermitian norm is defined as βπ₯β =ββ¨π₯, π₯β©.
Lemma 9.1. Properties of norm:
1. linearity: βππ₯β = |π|βπ₯β for all π β C, π₯ β π,
2. Cauchy-Schwarz: |β¨π₯, π¦β©| β€ βπ₯β β βπβ,
3. triangle inequality: βπ₯ + π¦β β€ βπ₯β + βπ¦β
4. parallelogram identity: βπ₯ + π¦β2 + βπ₯ β π¦β2 = 2(βπ₯β2 + βπ¦β2)
Proof. Exercise. For reference see authorβs notes on IID Linear Analysis.
Corollary 9.2. (π , ββ β) is a normed vector space.
Definition (Hilbert space). We say (π , ββ β) is a Hilbert space if it is com-plete.
Example. Let π = πΏ2(π, π, π) where (π, π, π) is a measure space. Then wecan define
β¨π, πβ© = β«π
ππππ
which is well-defined (i.e. finite) by Cauchy-Schwarz. The axioms are easy tocheck, with positive-definiteness given by
0 = β¨π, πβ© = β«π
|π|2ππ
if and only if π = 0 π-almost everywhere so π = 0.
49
9 Hilbert space and πΏ2-methods
Proposition 9.3. Let π» be a Hilbert space and let π be a closed convexsubset of π». Then for all π₯ β π», there exists unique π¦ β π such that
βπ₯ β π¦β = π(π₯, π)
where by definitionπ(π₯, π) = inf
πβπβπ₯ β πβ.
This π¦ is called the orthogonal projection of π₯ on π.
Proof. Let ππ β π be a sequence such that βπ₯ β ππβ β π(π₯, π). Letβs show that(ππ)π is a Cauchy sequence. By parallelogram identity,
β₯π₯ β ππ2
+ π₯ β ππ2
β₯2
+ β₯π₯ β ππ2
β π₯ β ππ2
β₯2
= 12
(βπ₯ β ππβ2 + βπ₯ β ππβ2)
soβ₯β₯β₯β₯π₯ β ππ + ππ
2ββπ
β₯β₯β₯β₯
2
ββββββββ₯π(π₯,π)2
+14
βππ β ππβ2 = 12
(βπ₯ β ππβ2 + βπ₯ β ππβ2)βββββββββββ
βπ(π₯,π)2
solim
π,πβββππ β ππβ = 0
i.e. (ππ)π is Cauchy. By completeness exist limπββ ππ = π¦ β π». As π is closed,π¦ β π. As βπ₯ β ππβ β π(π₯, π), βπ₯ β π¦β β π(π₯, π). This shows existence of π¦.
For uniqueness, use parallelogram identity
β₯π₯ β π¦ + π¦β²
2β₯2
ββββββ₯π(π₯,π)
+14
βπ¦ β π¦β²β2 = 12
(βπ₯ β π¦β2 + βπ₯ β π¦β²β2) = π(π₯, π)2
so βπ¦ β π¦β²β = 0.
Corollary 9.4. Suppose π β€ π» is a closed subspace of a Hilbert space π».Then
π» = π β π β
whereπ β = {π₯ β π» βΆ β¨π₯, π£β© = 0 for all π£ β π }
is the orthogonal of π.
Proof. π β© π β = 0 by positivity of inner product. If π₯ β π» then there exists aunique π¦ β π such that βπ₯ β π¦β = π(π₯, π¦). Need to show that π₯ β π¦ β π β.
For all π§ β π,βπ₯ β π¦ β π§β β₯ βπ₯ β π¦β
as π¦ + π§ β π. Thus
βπ₯ β π¦β2 + βπ§β β 2 Reβ¨π₯ β π¦, π§β© β₯ βπ₯ β π¦β2.
50
9 Hilbert space and πΏ2-methods
Rearrange,2 Reβ¨π₯ β π¦, π§β© β€ βπ§β2
for all π§ β π. Now substitute π‘π§ for π§ where π‘ β R+, have
π‘ β 2 Re π₯ β π¦, π§ β€ π‘2βπ§β2.
For π‘ = 0,Reβ¨π₯ β π¦, π§β© β€ 0
Similarly replace π§ by βπ§ to conclude Reβ¨π₯βπ¦, π§β© = 0. Finally replace π§ by ππππ§to have β¨π₯ β π¦, π§β© = 0 for all π§. Thus π₯ β π¦ β π β.
Definition (bounded linear form). A linear form β βΆ π» β C is bounded ifthere exists πΆ > 0 such that |β(π₯)| β€ πΆβπ₯β for all π₯ β π».
Remark. β bounded is equivalent to β continuous.
Theorem 9.5 (Riesz representation theorem). Let π» be a Hilbert space.For every bounded linear form β there exists π£0 β π» such that
β(π₯) = β¨π₯, π£0β©
for all π₯ β π».
Proof. By boundedness of β, ker β is closed so write
π» = ker β β (ker β)β.
If β = 0 then just pick π£0 = 0. Otherwise pick π₯0 β (ker β)β \ {0}. But (ker β)β
is spanned by π₯0: indeed for any π₯ β (ker β)β,
β(π₯) = β(π₯)β(π₯0)
β(π₯0)
soβ(π₯ β β(π₯)
β(π₯0)π₯0) = 0
so π₯ β β(π₯)β(π₯0) β ker β β© (ker β)β = 0. Now let
π£0 = β(π₯0)βπ₯0β2 π₯0
and observe that β(π₯) β β¨π₯, π£0β© vanishes on ker β and on (ker β)β = Cπ₯0. Thusit is identically zero.
Definition (absolutely continuous, singular measure). Let (π, π) be a mea-surable space and let π, π be two measures on (π, π).
1. π is absolutely continuous with respect to π, denoted π βͺ π, if forevery π΄ β π, π(π΄) = 0 implies π(π΄) = 0.
2. π is singular, denoted π β π, if exists Ξ© β π such that π(Ξ©) = 0 and
51
9 Hilbert space and πΏ2-methods
π(Ξ©π) = 0.Example.
1. Let π be the Lebesgue measure on (R, β¬(R)) and ππ = πππ where π β₯ 0is a Borel function then π βͺ π.
2. If π = πΏπ₯0, the Dirac mass at π₯0 β R, then π β π£.
Non-examinable theorem and proof:
Theorem 9.6 (Radon-Nikodym). Assume π and π are π-finite measureson (π, π).
1. If π βͺ π then there exists π β₯ 0 measurable such that ππ = πππ,namely
π(π΄) = β«π΄
π(π₯)ππ(π₯)
for all π΄ β π. π is called the density of π with respect to π or Radon-Nikodym derivative, sometimes denoted by π = ππ
ππ .
2. For any π, π π-finite, π decomposes as
π = ππ + ππ
where ππ βͺ π and ππ β π. Moreover this decomposition is unique.
Proof. Consider π» = πΏ2(π, π, π + π), which is a Hilbert space. First assume πand π are finite. Consider the linear form
β βΆ π» β R
π β¦ π(π) = β«π
πππ
β is bounded by Cauchy-Schwarz and finiteness of the measures:
|π(π)| β€ π(|π|) β€ (π + π)(|π|) β€ β(π + π)(π) β ββ«π
|π|2π(π + π) = πΆ β βπβπ»
so by Riesz representation theorem, there exists π0 β πΏ2(π, π, π + π) such that
π(π) = β«π
ππ0π(π + π). (β)
Claim that for (π+π)-almost every π₯, 0 β€ π0(π₯) β€ 1: take π = 1{π0<0} and plugit into (β),
0 β€ π({π0 < 0}) = β«π
1{π0<0}π0ββ€0
π(π + π) β€ 0
so equality throughout. Thus π0 β₯ 0 (π + π)-almost everywhere. Similarly takeπ = 1{π0>1+π} for π > 0 and plug it into (β),
(π + π)({π0 > 1 + π}) β₯ π({π0 > 1 + π})
= β«π
1{π0>1+π}π0π(π + π)
β₯ (1 + π)(π + π)({π0 > 1 + π})
52
9 Hilbert space and πΏ2-methods
so must have(π + π)({π0 > 1 + π}) = 0
i.e. π0 β€ 1 (π + π)-almost everywhere.Now set Ξ© = {π₯ β π βΆ π0 β [0, 1)} so on Ξ©π, π0 = 1 (π + π)-almost
everywhere. Then (β) is equivalent to
β«π
π(1 β π0)ππ = β«π
ππ0ππ
for all π β πΏ2(π, π, π + π). Hence this holds for all π β₯ 0. Now π be π1βπ0
1Ξ©,get
β«Ξ©
πππ = β«Ξ©
π π01 β π0
ππ. (ββ)
Set
ππ(π΄) = π(π΄ β© Ξ©)ππ (π΄) = π(π΄ β© Ξ©π)
Clearly π = ππ + ππ . Claim that this is the required result, i.e.
1. ππ βͺ π,
2. ππ β π,
3. πππ = πππ where π = π01βπ0
1Ξ©.
Proof.
1. If π(π΄) = 0 set π = 1π΄ and plug into (ββ) to get π(π΄ β© Ξ©) = 0, namelyππ(π΄) = 0.
2. Set π = 1Ξ©π . On Ξ©π, π0 = 1 (π + π)-almost everywhere. Plug this into (β)to get π(Ξ©π) = 0. But ππ (Ξ©) = 0 so ππ β π.
3. (ββ) is equivalent to πππ = πππ where π = π01βπ0
1Ξ©.
This settles part 2 of the theorem, and also part 1 as if π βͺ π then π = ππ.If π are π are not finite but only π-finite, use the old trick of partition π
into countably many π- and π-finite sets and take their intersections. Supposewe get a disjoint countable union π = βπ ππ. Then π = βπ π|ππ
where foreach π we can write
π|ππ= (π|ππ
)π + (π|ππ)π .
Then set
ππ = βπ
(π|ππ)π
ππ = βπ
(π|ππ)π
53
9 Hilbert space and πΏ2-methods
Remains to check uniqueness of decomposition. Suppose π can be decom-posed in two ways
π = ππ + ππ = πβ²π + πβ²
π .
As ππ , πβ²π β π there exists Ξ©0, Ξ©β²
0 β π such that
ππ (Ξ©0) = 0, π(Ξ©π0) = 0
πβ²π (Ξ©β²
0) = 0, π((Ξ©β²0)π) = 0
Set Ξ©1 = Ξ©0 β© Ξ©β²0. Check that
ππ (Ξ©1) = πβ²π (Ξ©1) = 0
π(Ξ©π1) = π(Ξ©π
0 βͺ Ξ©β²π0 ) = 0
Now ππ, πβ²π βͺ π so
ππ(Ξ©π1) = πβ²
π(Ξ©π1) = 0.
Hence for all π΄ β π,
ππ(π΄) = ππ(π΄ β© Ξ©1) = π(π΄ β© Ξ©1) = πβ²π(π΄ β© Ξ©1) = πβ²
π(π΄)
so ππ = πβ²π and hence ππ = πβ²
π .
Proposition 9.7. Let (Ξ©, β±,P) be a probability space. Let π’ be a π-subalgbebra of β± and π a random variable on (Ξ©, β±,P). Assume π isintegrable, then there exists a random variable π on (Ξ©, π’,P) such that
E(1π΄π) = E(1π΄π )
for all π΄ β π’. Moreover π is unique almost surely.
If you are perplexed by why it is nontrivial to recover a random variable on π’from one on β±, as π’ β β± so it seems that we can easily restrict to a βsub-randomvariableβ. But this reasoning makes absolutely no sense as random variablesare functions from the space. In other words, id βΆ (Ξ©, β±,P) β (Ξ©, π’,P) ismeasurable but its inverse is not, and it is not obvious that π has a pushforward.
Definition (conditional expectation). π as above is called the conditionalexpectation of π with respect to π’, denote by
π = E(π|π’).
Proof. wlog assume π β₯ 0. Set π(π΄) = E(1π΄π) for all π΄ β π’. π is finiteby integrability of π and is a measure on (Ξ©, π’). Moreover π βͺ P. Thus byRadon-Nikodym there exists π β₯ 0 π’-measurable such that
π(π΄) = β«π΄
ππP = E(1π΄π).
Set π = π.Uniqueness is shown in example sheet 3.
54
9 Hilbert space and πΏ2-methods
Remark. In case π β πΏ2(Ξ©, β±,P) then π is the orthogonal projection of πonto πΏ2(Ξ©, π’,P). It is well-defined since πΏ2(Ξ©, β±,P) is a Hilbert space andπΏ2(Ξ©, π’,P) is a closed subspace. In this case TFAE:
1. E((π β π )1π΄) = 0 for all π΄ β π’,
2. E((π β π )β) = 0 for all β simple on (Ξ©, π’,P),
3. E((π β π )β) = 0 for all β β₯ 0 π’-measurable,
4. E((π β π )β) = 0 for all β β πΏ2(Ξ©, π’,P).
Remark. Special case when π’ = {β , π΅, π΅π, Ξ©} where π΅ β β±:
E(1π΄|π’)(π) = {P(π΄|π΅) π β π΅P(π΄|π΅π) π β π΅π
whereP(π΄|π΅) = P(π΄ β© π΅)
P(π΅).
Proposition 9.8 (non-examinable). Properties of conditional expectation:
1. linearity: E(πΌπ + π½π |π’) = πΌE(π|π’) + π½E(π |π’).
2. if π is π’-measurable then E(π|π’) = E(π).
3. positivity: if π β₯ 0 then E(π|π’) β₯ 0.
4. E(E(π|π’)|β) = E(π|β) if β β π’.
5. if π is π’-measurable and bounded then E(ππ|π’) = π β E(π|π’).
55
10 Fourier transform
10 Fourier transform
Definition (Fourier transform). Let π β πΏ1(Rπ, β¬(Rπ), ππ₯) where ππ₯ is theLebesgue measure. The function
π βΆ Rπ β C
π’ β¦ β«Rπ
π(π₯)ππβ¨π’,π₯β©ππ₯
where β¨π’, π₯β© = π’1π₯1 + β― + π’ππ₯π, is called the Fourier transform of π.
Proposition 10.1.
1. | π(π’)| β€ βπβ1.
2. π is continuous.
Proof. 1 is clear. 2 follows from dominated convergence theorem.
Definition. Given a finite Borel measure π on Rπ, its Fourier transform isgiven by
π(π’) = β«Rπ
ππβ¨π’,π₯β©ππ(π₯).
Again | π(π’)| β€ π(Rπ) and π is continuous.
Remark. If π is an Rπ-valued random variable with law ππ then ππ is calledthe characteristic function of π.
Example.
1. Normalised Gaussian distribution on R: π = π©(0, 1). ππ = πππ₯ where
π(π₯) = πβπ₯2/2β
2π.
Claim thatπ(π’) = π(π’) = πβπ’2/2,
i.e. π =β
2ππ. This is the defining characteristic of Gaussian distribution.
Proof. Since (π’ β¦ |ππ’ β πππ’π₯πβπ₯2/2|) β πΏ1(R), we can differentiate under
56
10 Fourier transform
integral sign to get
πππ’
π(π’) = πππ’
β«R
πππ’π₯πβπ₯2/2 ππ₯β2π
= β«R
ππ₯πππ’π₯πβπ₯2/2 ππ₯β2π
= β β«R
ππππ’π₯πβ²(π₯)ππ₯
= π β«R
(πππ’π₯)β²π(π₯)ππ₯
= βπ’ β«R
πππ’π₯π(π₯)ππ₯
= βπ’ π(π’)
Thusπ
ππ’( π(π’)ππ’2/2) = 0
soπ(π’) = π(0)πβπ’2/2.
Butπ(0) = β«
Rπ(π₯)ππ₯ = 1
so π(π’) = πβπ’2/2 as required.
2. π-dimensional version: π = π©(0, πΌπ). ππ(π₯) = πΊ(π₯)ππ₯ where ππ₯ =ππ₯1 β― ππ₯π and
πΊ(π₯) =π
βπ=1
π(π₯π) = πββπ₯β2/2
(β
2π)π.
Then
πΊ(π’) = β«Rπ
ππβ¨π’,π₯β©πΊ(π₯)ππ₯
=π
βπ=1
β«R
π(π₯π)πππ’ππ₯πππ₯π
=π
βπ=1
π(π’π)
= πββπ’β2/2
Theorem 10.2 (Fourier inversion formula).
1. If π β πΏ1(Rπ) is such that π β πΏ1(Rπ) then π is continuous (i.e. πequals to a continuous function almost everywhere) and
π(π₯) = 1(2π)π
π(βπ₯).
57
10 Fourier transform
2. If π is a finite Borel measure on Rπ such that π β πΏ1(Rπ) then π hasa continuous density with respect to Lebesgue measure, i.e. ππ = πππ₯with
π(π₯) = 1(2π)π
π(βπ₯).
Remark. In other words
π(π₯) = 1(2π)π β«
Rπ
π(π’)πβπβ¨π’,π₯β©ππ’
where π(π’) are Fourier coefficients and πβπβ¨π’,π₯β© are called Fourier modes, whichare characters ππβ¨π’,ββ© βΆ Rπ β {π§ β C βΆ |π§| = 1}. Informally this says that everyπ can be written as an βinfinite linear combinationβ of Fourier modes.
Proof. (!UPDATE: 1 does not quite reduce to 2 as π = π+ β πβ does not quitehold. Instead write π = π ππ β π ππ where π = βπ+β1, πππ₯ = π
βπ+β1ππ₯)
1 reduces to 2 by considering π = π+ β πβ. In 2 we may assume wlog πis a probability measure so is the law of some random variable π. Let π(π₯) =
1(2π)π
π(βπ₯). We need to show π = πππ₯, which is equivalent to for all π΄ β β¬(Rπ),
π(π΄) = β«Rπ
π1π΄ππ₯.
Let β = 1π΄ and wlog assume π΄ is a bounded Borel set. The trick is to introducean independent Gaussian random variable π βΌ π©(0, πΌπ) with law πΊππ₯. We have
β«Rπ
β(π₯)ππ(π₯) = E(β(π))) = limπβ0
E(β(π + ππ))
by dominated convergence theorem. But
E(β(π + ππ)) = E(β«Rπ
β(π + ππ₯)πΊ(π₯)ππ₯)
= E(β«Rπ
β«Rπ
β(π + ππ₯)ππβ¨π’,π₯β©πΊ(π’) ππ’ππ₯(β
2π)π)
asπΊ(π₯) = 1
(β
2π)ππΊ(π₯) = 1
(β
2π)πβ«Rπ
πΊ(π’)ππβ¨π’,π₯β©ππ’.
So by a change of variable π¦ = ππ₯,
E(β(π + ππ)) = E(β« β« β(π + π¦)ππβ¨π’,π¦/πβ©πΊ(π’) ππ’(β
2π)πππ¦ππ )
= E(β« β« β(π§)ππβ¨π’/π,π§βπ₯β©πΊ(π’) ππ’(β
2ππ2)πππ§)
= β« β« β(π§)ππβ¨π’/π,π§β© ππ(βπ’π
)πΊ(π’) ππ’(β
2ππ2)πππ§ Tonelli-Fubini
= 1(2π)π β« β« ππ(π’)πβπβ¨π’,π§β©β(π§)πβπ2βπ’β2/2ππ’ππ§
58
10 Fourier transform
We want a condition to ensure π β πΏ1(Rπ). Clearly continuity is necessary.Here we use a generally principle in Fourier analysis: Fourier transform convertsdecay at infinity to smoothness. In Fourier inversion formula, if π’ is large thenthe Fourier character has fast oscillation. Thus if π decays fast at infinity thenthe Fourier coefficients also decays fast, and the resulting transfrom is smoother.
Proposition 10.3. If π, π β² and πβ³ exists (for example if π is πΆ2) and arein πΏ1 then π β πΏ1.
Proof. We prove the case π = 1. The general case follows from Tonelli-Fubini.We show first that π, π β² β πΏ1 implies that π(π’) = π
π’π β²(π’). This easily follows
from integration by parts:
π(π’) = β« π(π₯)πππ’π₯ππ₯
= 1ππ’
β« π(π₯)(πππ’π₯)β²ππ₯
= β 1ππ’
β« π β²(π₯)πππ’π₯ππ₯
so in particular | π(π’)| β€ 1|π’| βπ
β²β1.Thus if π, π β², πβ³ β πΏ1 then π(π’) = β 1
π’2πβ³(π’) so
| π(π’)| β€ βπβ³β1|π’2|
.
As β«β1
1|π’|2 ππ’ < β, π β πΏ1.
Definition (convolution). Given two Borel measures π and π on Rπ, wedefine their convolution π β π as the image of π β π under the addition map
Ξ¦ βΆ Rπ Γ Rπ β Rπ
(π₯, π¦) β¦ π₯ + π¦
i.e. π β π = Ξ¦β(π β π)
Thus given π΄ β β¬(Rπ),
Ξ¦β(π β π£)(π΄) = π β π({(π₯, π¦) βΆ π₯ + π¦ β π΄}).
Example. Given π, π independent random variables and π, π be laws of π, πrespectively, then π β π is the law of π + π.
Definition (convolution). If π, π β πΏ1(Rπ) define their convolution π β π by
(π β π)(π₯) = β«Rπ
π(π₯ β π‘)π(π‘)ππ‘.
59
10 Fourier transform
This is well defined by Fubini: π, π β πΏ1 so
β«Rπ
β«Rπ
|π(π₯ β π‘)π(π‘)|ππ‘ππ₯ < β
and
βπ β πβ1 = β« β£ β« π(π₯ β π‘)π(π‘)ππ‘β£ππ₯ β€ β« β« |π(π₯ β π‘)π(π‘)|ππ‘ππ₯ β€ βπβ1 β βπβ1
Therefore (πΏ1(Rπ), β) forms a Banach algebra.
Remark. If π, π are two finite Borel measures on Rπ and if π, π βͺ ππ₯, i.e.absolutely continuous, then by Radon-Nikodym there exist π, π β πΏ1(Rπ) suchthat
ππ = πππ₯ππ = πππ₯
then π β π βͺ ππ₯ andπ(π β π) = (π β π)ππ₯.
Proposition 10.4 (Gaussian approximation). If π β πΏπ(Rπ) where π β[1, β) then
limπβ0
βπ β πΊπ β πβπ = 0
where πΊπ = π©(0, π2πΌπ), i.e.
πΊπ(π₯) = 1(β
2ππ2)ππβ βπ₯β2
2π2 .
Lemma 10.5 (continuity of translation in πΏπ). Suppose π β [1, β) andπ β πΏπ. Then
limπ‘β0
βππ‘(π) β πβπ = 0
where ππ‘(π)(π₯) = π(π₯ + π‘), π‘ β Rπ.
Proof. Example sheet.
Proof of Gaussian approximation.
(π β πΊπ β π)(π₯) = β«Rπ
πΊπ(π‘)(π(π₯ β π‘) β π(π₯))ππ‘ = E(π(π₯ β ππ) β π(π₯))
where π βΌ π©(0, πΌπ) is Gaussian with density πΊ1. Then
βπ β πΊπ β πβππ β€ E(βπ(π₯ + ππ) β π(π₯)βπ
π) = E(βπππ(π) β πβππ)
by Jensenβs inequality and convexity of π₯ β¦ π₯π. By the lemma,
limπβ0
βπππ(π) β πβπ = 0.
As βπππ(π) β πβπ β€ 2βπβπ, apply dominated convergence theorem to get therequired result.
60
10 Fourier transform
Proposition 10.6.
β’ If π, π β πΏ1(Rπ) thenπ β π = π β π.
β’ If π, π are finite Borel measure then
π β π = π β π.
Proof. 1 reduces to 2 by writing
π(π₯)ππ₯ = π+(π₯)ππ₯ β πβ(π₯)ππ₯ = πππ β πππ
for some probability measure π, π.wlog we may assume π and π are laws of independent random variables π
and π. Then by a previous result π β π is just the law of π + π so
π β π(π’) = β« ππβ¨π’,π₯+π¦β©ππ(π₯)ππ(π¦)
= E(ππβ¨π’,π+π β©) = E(ππβ¨π’,πβ©ππβ¨π’,π β©) homomorphism= E(ππβ¨π’,πβ©)E(ππβ¨π’,π β©) as π, π are independent= π(π’) β π(π’).
In short, this is precisely because ππβ¨π’,ββ© are characters.
Theorem 10.7 (LΓ©vy criterion). Let (ππ)πβ₯1 and π be Rπ-valued randomvariables. Then TFAE:
1. ππ β π in law,
2. For all π’ β Rπ, πππ(π’) β ππ(π’).
In particular if ππ = ππ for two random variables π and π then π = π inlaw, i.e. ππ = ππ.
Thus Fourier transform is an injection from Borel measure to certain functionspace.
Proof.
β’ 1 βΉ 2: Clear by defintion as π(π₯) = ππβ¨π’,π₯β© is continuous and boundedfor all π’ β Rπ.
β’ 2 βΉ 1: Need to show that for all π β πΆπ(Rπ),
E(π(ππ)) β E(π(π)).
wlog itβs enough to check this for all π β πΆβπ (Rπ). For the sufficiency see
example sheet.Note that for all π β πΆβ
π (Rπ), π β πΏ1 so by Fourier inversion formula
π(π₯) = β« π(π’)πβπβ¨π’,π₯β© ππ’(2π)π .
61
10 Fourier transform
Hence
E(π(ππ)) = β« π(π’)E(πβπβ¨π’,ππβ©)ππ’
= β« π(π’) πππ(βπ’) ππ’
(2π)π
β β« π(π’)οΏ½οΏ½π(βπ’) ππ’(2π)π
= E(π(π))
by dominated convergence theorem.
Theorem 10.8 (Plancherel formula).
1. If π β πΏ1(Rπ) β© πΏ2(Rπ) then π β πΏ2(Rπ) and
β πβ22 = (2π)πβπβ2
2.
2. If π, π β πΏ1(Rπ) β© πΏ2(Rπ) then
β¨ π, πβ©πΏ2 = (2π)πβ¨π, πβ©πΏ2 .
3. The Fourier transform
β± βΆ πΏ1(Rπ) β© πΏ2(Rπ) β πΏ2(Rπ)
π β¦ 1(β
2π)ππ
extends uniquely to a linear operator on πΏ2(Rπ) which is an isometry.Moreover
β± β β±(π) = π
where π(π₯) = π(βπ₯), for all π β πΏ2(Rπ).
Proof. First we prove 1 and 2 assuming π , π β πΏ1(Rπ). By Fourier inversionformula,
β πβ22 = β«
Rπ
| π(π’)|2ππ’
= β« π(π’) π(π’)ππ’
= β« (β« π(π₯)ππβ¨π’,π₯β©ππ₯) π(π’)ππ’
= β« β« π(π₯) π(π’)πβπβ¨π’,π₯β©ππ’ππ₯
= β« π(π₯)π(π₯)(2π)πππ₯
= (2π)πβπβ22
62
10 Fourier transform
and in particular π β πΏ2(Rπ).Similarly for 2,
β¨ π, πβ©πΏ2 = β« π(π’) π(π’)ππ’
= β« (β« π(π₯)ππβ¨π’,π₯β©ππ₯) π(π’)ππ’
= β« β« π(π₯) π(π’)πβπβ¨π’,π₯β©ππ’ππ₯
= β« π(π₯)π(π₯)ππ₯(2π)π
= (2π)πβ¨π, πβ©πΏ2
Now for the general case we use Gaussian as a mollifier. Consider
ππ = π β πΊπ
ππ = π β πΊπ
and based on results and computations before,ππ = π β πΊπ = ππβπβπ’β2/2.
As β πββ β€ βπβ1, ππ β πΏ1(Rπ). Thus ππ β πΏ2(Rπ) and β ππβ22 = (2π)πβππβ2
2. Butby Gaussian approximation we know that ππ β π in πΏ2(Rπ) as π β 0. Henceβππβ2 β βπβ2. Then
β ππβ22 = β π β πΊπβ2
2 = β« | π(π’)|2πβπβπ’β2/2ππ’ β βπβ22
as π β 0 by monotone convergence theorem. Thus
β πβ22 = (2π)πβπβ2
2.
For 2,β¨ ππ, ππβ© = β« π ππβπβπ’β2ππ’ β β« π πππ’
as π β 0 by dominated convergence theorem as π π β πΏ1. The result followsfrom Gaussian approximation.
For 3, πΏ1(Rπ)β©πΏ2(Rπ) is dense in πΏ2(Rπ) because it contains πΆπ(Rπ). Thenextend by completeness: given π β πΏ2(Rπ), pick a sequence ππ β πΏ1(Rπ) β©πΏ2(Rπ) such that ππ β π in πΏ2(Rπ). Then define
β±π = limπββ
β±ππ.
The limit exists as πΏ2(Rπ) is complete. β± is well-defined as
ββ±ππ β β±ππβ2 = βππ β ππβ2
by 1. Finally,ββ±πβ2 = βπβ2
for all π β πΏ2(Rπ) andβ± β β±(π) = π
for all π such that π, π β πΏ1(Rπ). Thus by continuity this holds for πΏ2(Rπ).
63
11 Gaussians
11 Gaussians
Definition (Gaussian). An Rπ-valued random variable π is called Gaussianif for all π’ β Rπ, β¨π, π’β© is Gaussian, namely its law has the form π©(π, π2)for some π β R, π > 0.
Proposition 11.1. The law of a Gaussian vector π = (π1, β¦ , ππ) β Rπ
is uniquely determined by
1. its mean Eπ = (Eπ1, β¦ ,Eππ),
2. its covariance matrix (Cov(ππ, ππ))ππ where
Cov(ππ, ππ) = E((ππ β Eππ)(ππ β Eππ)).
Proof. If π = 1 then this just says that it is determined by its mean π andcovariance π2, which is obviously true. For π > 1, compute the characteristicfunction
ππ(π’) = E(ππβ¨π,π’β©)
but by assumption β¨π, π’β© is Gaussian in π = 1 so its law is determined
1. the mean E(β¨π, π’β©) = β¨Eπ, π’β©,
2. the variance Varβ¨π, π’β©. But
Varβ¨π, π’β© = E((β¨π, π’β© β Eβ¨π, π’β©)2) = βπ,π
π’ππ’π Cov(ππ, ππ).
In particular this shows that (Cov(ππ, ππ))ππ is a non-negative semidefinitesymmetric matrix.
Proposition 11.2. If π is a Gaussian vector then exists π΄ β β³π(R), π β Rπ
such that π has the same law as π΄π + π where π = (π1, β¦ , ππ), (ππ)ππ=1
are iid. π©(0, 1).
Proof. Take π΄ such that
π΄π΄β = (Cov(ππ, ππ))ππ
where π΄β is the adjoint/transpose of π΄, and
π = (Eπ1, β¦ ,Eππ).
Check that for all π’ β Rπ,
E(β¨π, π’β©) = β¨π, π’β©Var(β¨π, π’β©) = β¨π΄π΄βπ’, π’β© = βπ΄βπ’β2
2 = Var(β¨π΄π + π, π’β©)
64
11 Gaussians
Proposition 11.3. If (π1, β¦ , ππ) is a Gaussian vector then TFAE:
1. ππβs are independent.
2. ππβs are pairwise independent.
3. (Cov(ππ, ππ))ππ is a diagonal matrix.
Proof. 1 βΉ 2 βΉ 3 is obvious. 3 βΉ 1 as we can choose π΄ to be diagonal.Thus π has the same law as (π1π1, β¦ , ππππ) + π.
Theorem 11.4 (central limit theorem). Let (ππ)πβ₯1 be Rπ-valued iid. ran-dom variables with law π. Assume they have second moment, i.e. E(βπ1β2) <β. Let π¦ = E(π1) β Rπ and
ππ = π1 + β― + ππ β π β π¦βπ
.
Then ππ converges in law to a central Gaussian on Rπ with law π©(0, πΎ)where
πΎππ = (Cov(π1))ππ = [β«Rπ
(π₯π β E(π1))(π₯π β E(π1))ππ(π₯)]ππ
.
Proof. The proof is an application of LΓ©vy criterion. Need to show πππ(π’) β
ππ(π’) as π β β for all π’, where π βΌ π©(0, πΎ). As
πππ(π’) = E(ππβ¨ππ,π’β©),
this is equivalent to show that for all π’, β¨ππ, π’β© converges in law to β¨π , π’β©. But
β¨ππ, π’β© = β¨π1, π’β© + β― + β¨ππ, π’β© β πβ¨π¦, π’β©βπ
so we reduce the problem to 1-dimension case. By rescaling wlog E(π1) =0,E(π2
1) = 1.Now
πππ(π’) = E(πππ’ππ)
= E(exp(ππ’π1 + β― + ππβπ
))
=π
βπ=1
E(exp(ππ’ ππβπ
))
= (E(exp(ππ’ ππβπ
)))π
= ( π( π’βπ
))π
65
11 Gaussians
But E(π1) = 0,E(π21) = 1 so we can differentiate π under the integral sign
π(π’) = β«R
πππ’π₯ππ(π₯)
πππ’
π(π’) = β«R
ππ₯πππ’π₯ππ(π₯) = πE(π1)
π2
ππ’2 π(π’) = β«R
βπ₯2πππ’π₯ππ(π₯) = βE(π21)
Taylor expand π around 0 to 2nd order,
π(π’) = π(0) + π’ πβ²(0) + π’2
2πβ³(π’) + π(π’2)
= 1 + 0 β π’ β π’2
2+ π(π’2)
soπππ
(π’) = (1 β π’2
2π+ π(π’2
π))π β πβπ’2/2 = π(π’)
as π β β where π is the law of π.
66
12 Ergodic theory
12 Ergodic theoryLet (π, π, π) be a measure space. Let π βΆ π β π be an π-measurable map. Weare interested in the trajectories of π ππ₯ for π β₯ 0 and their statistical behaviour.In particular we are interested in those π preserving measure π.
Definition (measure-preserving). π βΆ π β π is measure-preserving ifπβπ = π. (π, π, π, π ) is called a measure-preserving dynamical system.
Definition (invariant function, invariant set, invariant π-algebra).
β’ A measurable function π βΆ π β R is called π-invariant if π = π β π.
β’ A set π΄ β π is π-invariant if 1π΄ is π-invariant.
β’π― = {π΄ β π βΆ π΄ is π-invariant}
is called the π-invariant π-algebra.
Lemma 12.1. π is π-invariant if and only if π is π―-measurable.
Proof. Indeed for all π‘ β R,
{π₯ β π βΆ π(π₯) < π‘} = {π₯ β π βΆ π β π (π₯) < π‘} = π β1({π₯ β π βΆ π(π₯) < π‘}).
Definition (ergodic). π is ergodic with respect to π, or that π is ergodicwith respect to π, if for all π΄ β π―, π(π΄) = 0 or π(π΄π) = 0.
This condition asserts that π― is trivial, i.e. its elements are either null orconull.
Lemma 12.2. π is ergodic with respect to π if and only if every invariantfunction π is almost everywhere constant.
Proof. Exercise.
Example.
1. Let π be a finite space, π βΆ π β π a map and π = # the countingmeasure, then π is measure preserving is equivalent to π being a bijection,and π is ergodic is equivalent to there does not exists a partition π =π1 βͺ π2 such that both π1 and π2 are π-invariant, which is equivalent tofor all π₯, π¦ β π, there exists π such that π ππ₯ = π¦.
2. Let π = Rπ/Zπ, π the Borel π-algebra and π the Lebesgue measure.Given π β Rπ, translation ππ βΆ π₯ β¦ π₯ + π is measure-preserving. ππis ergodic with respect to π if and only if (1, π1, β¦ , ππ), where ππβs arecoordinates of π, are linearly independent. See example sheet. (hint:Fourier transform)
67
12 Ergodic theory
3. Let π = R/Z and again π Borel π-algebra and π the Lebesgue mea-sure. The doubling map π βΆ π₯ β¦ 2π₯ β β2π₯β is ergodic with respect toπ. (hint: again consider Fourier coefficeints). Intuitively in the graphof this function the preimage π is two segments each of length π/2, someasure-preserving.
4. Furstenberg conjecture: every ergodic measure π on R/Z invariant underπ2, π3 must be either Lebesgue or finitely supported.
12.1 The canonical modelLet (ππ)πβ₯1 be an Rπ-valued stochastic process on (Ξ©, β±,P). Let π = (Rπ)Nand define the sample path map
Ξ¦ βΆ Ξ© β ππ β¦ (ππ(π))πβ₯1
Let
π βΆ π β π(π₯π)πβ₯1 β¦ (π₯π+1)πβ₯1
be the shift map. Let π₯π βΆ π β Rπ be the πth coordinate function and letπ = π(π₯π βΆ π β₯ 1).
Note. π is the infinite product π-algebra β¬(Rπ)βN of β¬(Rπ)N.
Let π = Ξ¦βP, a probability measure on (π, π). This π is called the law ofthe process (ππ)πβ₯1. Now (π, π, π, π ) is called the canonical model associatedto (ππ)πβ₯1.
Proposition 12.3 (stationary process). TFAE:
1. (π, π, π, π ) is measure-preserving.
2. For all π β₯ 1, the law of (ππ, ππ+1, β¦ , ππ+π) on (Rπ)π is independentof π.
In this case we say that (ππ)πβ₯1 is a stationary process.
Proof.
β’ 1 βΉ 2: π = πβπ implies π = π πβ π for all π and this says law of (ππ)πβ₯1
is the same as that of (ππ+π)πβ₯1.
β’ 1 βΈ 2: π and π πβ π agree on cylinders π΄ Γ (Rπ)N\πΉ for πΉ β N finite,
π΄ β β¬((Rπ)πΉ).
In some sense ergodic system is the study of stationary process.
68
12 Ergodic theory
Proposition 12.4 (Bernoulli shift). If (ππ)πβ₯1 are iid. then (π, π, π, π )is ergodic. It is called the Bernoulli shift associated to the law π of π1. Wehave
π = πβN.
Proof. Claim that Ξ¦β1(π―) β π, the tail π-algebra of (ππ)πβ₯1. But Kolmogorov0-1 law says that if π΄ β π― then P(Ξ¦β1(π΄)) = 0 or 1, so π(π΄) = 0 or 1, thus πis π-ergodic.
Given π΄ β π―, π β1π΄ = π΄ so
Ξ¦β1(π΄) = {π β Ξ© βΆ (ππ(π))πβ₯1 β π΄}= {π β Ξ© βΆ (ππ(π))πβ₯1 β π β1π΄}= {π β Ξ© βΆ (ππ+1(π))πβ₯1 β π΄}= {π β Ξ© βΆ (ππ+π(π))πβ₯1 β π΄} for all πβ π(ππ, ππ+1, β¦ )
for π-almost every π₯.
Theorem 12.5 (von Neumann mean ergodic theorem). Let (π, π, π, π )be a measure-preserving system. Let π β πΏ2(π, π, π). Then the ergodicaverage
πππ = 1π
πβ1βπ=0
π β π π
converges in πΏ2 to π, a π-invariant function. In fact π is the orthogonalprojection of π onto πΏ2(π, π―, π).
The intuition is as follow: if π is the indicator function of a set, then πππ isexactly the average of the time each orbit spending in π΄.
Proof. Hilbert space argument. Let π» = πΏ2(π, π, π) and define
π βΆ π» β π»π β¦ π β π
which is an isometry: because π is π-invariant, β« |π β π |2ππ = β« |π|2ππ. Thenby Riesz representation theorem it has an adjoint
π β βΆ π» β π»π₯ β¦ π βπ₯
which satisfies β¨πβπ₯, π¦β© = β¨π₯, ππ¦β© for all π¦ β π». Let
π = {π β π β π βΆ π β π»}
be the coboundaries. Let π β π. Then
πππ = 1π
πβ1βπ=0
(π β π π β π β π π+1) = π β π β π π
πβ 0
69
12 Ergodic theory
as π β β.Let π β π then again πππ β 0 because for all π exists π β π such that
βπ β πβ < π. Then
βπππ β πππβ = βππ(π β π)β β€ βπ β πβ β€ π
so lim supπβπππβ β€ π.Have π» = π β πβ and πβ = π β. Claim π β is exactly the π-invariant
functions. The theorem then follows because if π β π = π then πππ = π for all π.
Proof of claim.
π β = {π β π» βΆ β¨π, π β ππβ© = 0 for all π β π»}= {π βΆ β¨π, πβ© = β¨π, ππβ© for all π}= {π βΆ β¨π, πβ© = β¨π βπ, πβ© for all π}= {π βΆ π βπ = π}= {π βΆ ππ = π}
where the last equality is by
βππ β πβ2 = 2βπβ2 β 2 Reβ¨π, ππβ© = 2βπβ2 β 2 Reβ¨πβπ, πβ©,
and this shows that π β are exactly π-invariant functions.
In fact we can do better:
Theorem 12.6 (Birkhoff (pointwise) ergodic theorem). Let (π, π, π, π ) bea measure-preserving system. Assume π is finite (actually π-finite suffices)and let π β πΏ1(π, π, π). Then
πππ = 1π
πβ1βπ=0
π β π π
converges π-almost everywhere to a π-invariant function π β πΏ1. Moreoverπππ β π in πΏ1.
Corollary 12.7 (strong law of large numbers). Let (ππ)πβ₯1 be a sequenceof iid. random variables. Assume E(|π1|) < β. Let
ππ =π
βπ=1
ππ,
then 1π ππ converges almost surely to E(π1).
Proof. Let (π, π, π, π ) be the canonical model associated to (ππ)πβ₯1, whereπ = RN, π = β¬(R)βN, π the shift operator and π = πβN where π is the law ofπ1. It is a Bernoulli shift. Let
π βΆ π β Rπ₯ β¦ π₯1
70
12 Ergodic theory
the first coodinate. Then π β π π(π₯) = π₯π+1 so
1π
(π1 + β― + ππ)(π) = πππ(π₯)
where π₯ = (ππ(π))πβ₯1. Hence by Birkhoff ergodic theorem
1π
(π1 + β― + ππ) β π = β« πππ = β« π₯1ππ = E(π1)
almost surely.
Remark. If π is ergodic then π is almost everywhere constant. Hence π =β« πππ.
Lemma 12.8 (maximal ergodic lemma). Let π β πΏ1(π, π, π) and πΌ β R.Let
πΈπΌ = {π₯ β π βΆ supπβ₯1
πππ(π₯) > πΌ}
thenπΌπ(πΈπΌ) β€ β«
πΈπΌ
πππ.
Lemma 12.9 (maximal inequality). Let
π0 = 0
ππ = ππππ =πβ1βπ=0
π β π π π β₯ 1
Letππ = {π₯ β π βΆ max
0β€πβ€πππ(π₯) > 0}.
Thenβ«
ππ
πππ β₯ 0.
Proof of maximal inequality. Set πΉπ = max0β€πβ€π ππ. Observe that for all π β€π, πΉπ β₯ ππ and hence
πΉπ β π + π β₯ ππ β π + π = ππ+1.
Now if π₯ β ππ then
πΉπ(π₯) β€ max0β€πβ€π
ππ+1 β€ πΉπ β π + π
Integrate to getβ«
ππ
πΉπππ β€ β«ππ
πΉπ β π ππ + β«ππ
πππ.
Note that πΉπ(π₯) = 0 if π₯ β ππ because π0 = 0 so
β«ππ
πΉπ = β«π
πΉπ β€ β«π
πΉπ β π + β«ππ
π.
71
12 Ergodic theory
As π is π-invariant, β« πΉπ β π = β« πΉπ so
β«ππ
πππ β₯ 0.
Proof of maximal ergodic lemma. Apply the maximal inequality to π = π β πΌ.Observe that
πΈπΌ(π) = βπβ₯1
ππ(π)
and πππ = πππ β πΌ. Thus
β«πΈπΌ(π)
(π β πΌ)ππ β₯ 0,
which is equivalent toπΌπ(πΈπΌ) β€ β«
πΈπΌ
πππ.
Proof of Birkhoff (pointwise) ergodic theorem. Let
π = lim supπ
πππ
π = lim infπ
πππ
Observe that π = π β π , π β π = π: indeed
πππ β π = 1π
(π β π + β― + π β π π) = 1π
((π + 1)ππ+1π β π).
Need to show that π = π π-almost everywhere. This is equivalent to for allπΌ, π½ β Q, πΌ > π½, the set
πΈπΌ,π½(π) = {π₯ β π βΆ π(π₯) < π½, π(π₯) > πΌ}
is π-null, as then{π₯ βΆ π(π₯) β π(π₯)} = β
πΌ>π½πΈπΌ,π½
is π-null by subadditivity. Observe that πΈπΌ,π½ is π-invariant. Apply the maximalergodic theorem to πΈπΌ,π½ to get
πΌπ(πΈπΌ,π½(π)) β€ β«πΈπΌ,π½
πππ.
Duallyβπ½π(πΈπΌ,π½(π)) β€ β«
πΈπΌ,π½
βπππ
72
12 Ergodic theory
soπΌπ(πΈπΌ,π½) β€ π½π(πΈπΌ,π½).
But πΌ > π½ so π(πΈπΌ,π½) = 0.We have proved that the limit limπ πππ exists almost everywhere, which we
now define to be π, and left to show π β πΏ1 and limπβπππ β πβ1 = 0. This is anapplication of Fatouβs lemma:
β« |π|ππ = β« lim infπ
|πππ|ππ
β€ lim infπ
β« |πππ|ππ
β€ lim infπ
βπππβ1
β€ βπβ1
so π β πΏ1 where the last inequality is because
βπππβ β€ 1π
(βπβ1 + β― + βπ β π πβ1β1) = βπβ1.
Now to show βπππ β πβ1 β 0, we truncate π. Let π > 0 and set ππ = π1|π|<π.Note that
β’ |ππ| β€ π so |ππππ| β€ π. Hence by dominated convergence theoremβππππ β ππβ1 β 0.
β’ ππ β π π-almost everywhere and also in πΏ1 by dominated convergencetheorem.
Thus by Fatouβs lemma,
βππ β πβ1 β€ lim infπ
βππππ β πππβ1 β€ βππ β πβ1
Finally
βπππ β πβ1 β€ βπππ β ππππβ1 + βππππ β ππβ1 + βππ β πβ1
β€ βπ β ππβ1 + βππππ β ππβ1 + βπ β ππβ1
solim supβπππ β πβ1 β€ 2βπ β ππβ1
for all π, so goes to 0 as π β β.
Remark.
1. The theorem holds if π is only assumed to be π-finite.
2. The theorem holds if π β πΏπ for π β [1, β). The πππ β π in πΏπ.
73
Index
πΏπ-space, 47π-system, 12π-algebra, 10
independence, 33invariant, 67tail, 36
Bernoulli shift, 69Birkhoff ergodic theorem, 70Boolean algebra, 2Borel π-algebra, 12Borel measurable, 19Borel-Cantelli lemma, 35
canonical model, 68CarathΓ©odory extension theorem,
14central limit theorem, 65characteristic function, 56coboundary, 69completion, 17conditional expectation, 54convergence
almost surely, 39in πΏ1, 41in distribution, 39in mean, 41in measure, 39in probability, 39
convolution, 59covariance matrix, 64
density, 31, 52Dirac mass, 39distribution, 29distribution function, 29Dynkin lemma, 12
ergodic, 67expectation, 29
Fatouβs lemma, 23filtration, 36Fourier transform, 56Furstenberg conjecture, 68
Gaussian, 64Gaussian approximation, 60
Hermitian norm, 49Hilbert space, 49HΓΆlder inequality, 46
iid., 34independence, 33infinite product π-algebra, 35, 68inner product, 49integrable, 20integral with respect to a measure,
20invariant π-algebra, 67invariant function, 67invariant set, 67io., 35
Jensen inequality, 44
Kolmogorov 0 β 1 law, 36, 69
law, 29law of large numbers, 38law of larger numbers, 70Lebesgue measure, 8Lebesgueβs dominated convergence
theorem, 23LΓ©vy criterion, 61
maximal ergodic lemma, 71maximal inequality, 71mean, 32, 64measurable, 18measurable space, 10measure, 10
absolutely continuous, 51finitely additive, 2singular, 51
measure-preserving, 67measure-preserving dynamical
system, 67Minkowski inequality, 45moment, 32monotone convergence theorem, 20
null set, 5
orthogonal projection, 50
Plancherel formula, 62probability measure, 29
74
Index
probability space, 29product π-algebra, 26
infinite, 35, 68product measure, 27
infinite, 35
Radon-Nikodym derivative, 52Radon-Nikodym theorem, 52random process, 36random variable, 29
independence, 33Riesz representation theorem, 51,
69
sample path map, 68simple function, 19
stationary process, 68stochastic process, 68strong law of large numbers, 38, 70
tail event, 36, 69Tonelli-Fubini theorem, 27
uniformly integrable, 41
variance, 32von Neumann mean ergodic
theorem, 69
weak convergence, 39
Young inequality, 46
75