probability and measure - aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1...

76
University of Cambridge Mathematics Tripos Part II Probability and Measure Michaelmas, 2018 Lectures by E. Breuillard Notes by Qiangru Kuang

Upload: others

Post on 30-Apr-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

University ofCambridge

Mathematics Tripos

Part II

Probability and Measure

Michaelmas, 2018

Lectures byE. Breuillard

Notes byQiangru Kuang

Page 2: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

Contents

Contents

1 Lebesgue measure 21.1 Boolean algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Jordan measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.3 Lebesgue measure . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Abstract measure theory 9

3 Integration and measurable functions 17

4 Product measures 25

5 Foundations of probability theory 28

6 Independence 326.1 Useful inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . 36

7 Convergence of random variables 38

8 𝐿𝑝 spaces 43

9 Hilbert space and 𝐿2-methods 48

10 Fourier transform 55

11 Gaussians 63

12 Ergodic theory 6612.1 The canonical model . . . . . . . . . . . . . . . . . . . . . . . . . 67

Index 73

1

Page 3: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

1 Lebesgue measure

1 Lebesgue measure

1.1 Boolean algebra

Definition (Boolean algebra). Let 𝑋 be a set. A Boolean algebra on 𝑋 isa family of subsets of 𝑋 which

1. contains βˆ…,

2. is stable under finite unions and complementation.

Example.

β€’ The trivial Boolean algebra ℬ = {βˆ…, 𝑋}.

β€’ The discrete Boolean algebra ℬ = 2𝑋, the family of all subsets of 𝑋.

β€’ Less trivially, if 𝑋 is a topological space, the family of constructible setsforms a Boolean algebra, where a constructible set is the finite union oflocally closed set, i.e. a set 𝐸 = π‘ˆ ∩ 𝐹 where π‘ˆ is open and 𝐹 is closed.

Definition (finitely additive measure). Let 𝑋 be a set and ℬ a Booleanalgebra on 𝑋. A finitely additive measure on (𝑋, ℬ) is a function π‘š ∢ ℬ β†’[0, +∞] such that

1. π‘š(βˆ…) = 0,

2. π‘š(𝐸 βˆͺ 𝐹) = π‘š(𝐸) + π‘š(𝐹) where 𝐸 ∩ 𝐹 = βˆ….

Example.

1. Counting measure: π‘š(𝐸) = #𝐸, the cardinality of 𝐸 where ℬ is thediscrete Boolean algebra of 𝑋.

2. More generally, given 𝑓 ∢ 𝑋 β†’ [0, +∞], define for 𝐸 βŠ† 𝑋,

π‘š(𝐸) = βˆ‘π‘’βˆˆπΈ

𝑓(𝑒).

3. Suppose 𝑋 = βˆπ‘π‘–=1 𝑋𝑖, then define ℬ(𝑋) to be the unions of 𝑋𝑖’s. Assign

a weight π‘Žπ‘– β‰₯ 0 to each 𝑋𝑖 and define π‘š(𝐸) = βˆ‘π‘–βˆΆπ‘‹π‘–βŠ†πΈ π‘Žπ‘– for 𝐸 ∈ ℬ.

1.2 Jordan measureThis section is a historic review and provides intuition for Lebesgue measuretheory. We’ll gloss over details of proofs in this section.

Definition. A subset of R𝑑 is called elementary if it is a finite union ofboxes, where a box is a set 𝐡 = 𝐼1 Γ— β‹― Γ— 𝐼𝑑 where each 𝐼𝑖 is a finite intervalof R.

2

Page 4: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

1 Lebesgue measure

Proposition 1.1. Let 𝐡 βŠ† R𝑑 be a box. Let β„°(𝐡) be the family of elementarysubsets of 𝐡. Then

1. β„°(𝐡) is a Boolean algebra on 𝐡,

2. every 𝐸 ∈ β„°(𝐡) is a disjoint finite union of boxes,

3. if 𝐸 ∈ β„°(𝐡) can be written as disjoint finite union in two ways,𝐸 = ⋃𝑛

𝑖=1 𝐡𝑖 = β‹ƒπ‘šπ‘—=1 𝐡′

𝑗, then βˆ‘π‘›π‘–=1 |𝐡𝑖| = βˆ‘π‘š

𝑗=1 |𝐡′𝑗| where |𝐡| =

βˆπ‘‘π‘–=1 |𝑏𝑖 βˆ’ π‘Žπ‘–| if 𝐡 = 𝐼1 Γ— β‹― Γ— 𝐼𝑑 and 𝐼𝑖 has endpoints π‘Žπ‘–, 𝑏𝑖.

Following this, we can define a finitely additive measure correponding to ourintuition of length, area, volume etc:

Proposition 1.2. Define π‘š(𝐸) = βˆ‘π‘›π‘–=1 |𝐡𝑖| if 𝐸 is any elementary set and

is the disjoint union of boxes 𝐡𝑖 βŠ† R𝑑. Then π‘š is a finitely additive measureon β„°(𝐡) for any box 𝐡.

Definition. A subset 𝐸 βŠ† R𝑑 is Jordan measurable if for any πœ€ > 0 thereare elementary sets 𝐴, 𝐡, 𝐴 βŠ† 𝐸 βŠ† 𝐡 and π‘š(𝐡 \ 𝐴) < πœ€.

Remark. Jordan measurable sets are bounded.

Proposition 1.3. If a set 𝐸 βŠ† R𝑑 is Jordan measurable, then

supπ΄βŠ†πΈ elementary

{π‘š(𝐴)} = infπ΅βŠ‡πΈ elementary

{π‘š(𝐡)}.

In which case we define the Jordan measure of 𝐸 as

π‘š(𝐸) = supπ΄βŠ†πΈ

{π‘š(𝐴)}.

Proof. Take 𝐴𝑛 βŠ† 𝐸 such that π‘š(𝐴𝑛) ↑ sup and 𝐡𝑛 βŠ‡ 𝐸 such that π‘š(𝐡𝑛) ↓ inf.Note that

inf ≀ π‘š(𝐡𝑛) = π‘š(𝐴𝑛) + π‘š(𝐡𝑛 \ 𝐴𝑛) ≀ sup +π‘š(𝐡𝑛 \ 𝐴𝑛) ≀ sup +πœ€

for arbitrary πœ€ > 0 so they are equal.

Exercise.

1. If 𝐡 is a box, the family π’₯(𝐡) of Jordan measurable subsets of 𝐡 is aBoolean algebra.

2. A subset 𝐸 βŠ† [0, 1] is Jordan measurable if and only if 1𝐸, the indicatorfunction on 𝐸, is Riemann integrable.

3

Page 5: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

1 Lebesgue measure

1.3 Lebesgue measureAlthough Jordan measure corresponds to the intuition of length, area and vol-ume, it suffer from a few severe problems and issues:

1. unbounded sets in R𝑑 are not Jordan measurable.

2. 1Q∩[0,1] is not Riemann integrable, and therefore Q ∩ [0, 1] is not Jordanmeasurable.

3. pointwise limits of Riemann integrable function 𝑓𝑛 ∢= 1 1𝑛!Z∩[0,1] β†’ 1Q∩[0,1]

is not Riemann integrable.

The idea of Lebesgue is to use countable covers by boxes.

Definition. A subset 𝐸 βŠ† R𝑑 is Lebesgue measurable if for all πœ€ > 0, thereexists a countable union of boxes 𝐢 with 𝐸 βŠ† 𝐢 and π‘šβˆ—(𝐢 \ 𝐸) < πœ€, whereπ‘šβˆ—, the Lebesgue outer measure, is defined as

π‘šβˆ—(𝐸) = inf{βˆ‘π‘–β‰₯1

|𝐡𝑖| ∢ 𝐸 βŠ† ⋃𝑖β‰₯1

𝐡𝑖, 𝐡𝑖 boxes}

for every subset 𝐸 βŠ† R𝑑.

Remark. wlog in these definitions we may assume that boxes are open.

Proposition 1.4. The family β„’ of Lebesgue measurable subsets of R𝑑 is aBoolean algebra stable under countable unions.

Lemma 1.5.

1. π‘šβˆ— is monotone: if 𝐸 βŠ† 𝐹 then π‘šβˆ—(𝐸) βŠ† π‘šβˆ—(𝐹).

2. π‘šβˆ— is countably subadditive: if 𝐸 = ⋃𝑛β‰₯1 𝐸𝑛 where 𝐸𝑛 βŠ† R𝑑 then

π‘šβˆ—(𝐸) ≀ βˆ‘π‘›β‰₯1

π‘šβˆ—(𝐸𝑛).

Proof. Monotonicity is obvious. For countable subadditivity, pick πœ€ > 0 and let𝐢𝑛 = ⋃𝑖β‰₯1 𝐢𝑛,𝑖 where 𝐢𝑛,𝑖 are boxes such that 𝐸𝑛 βŠ† 𝐢𝑛 and

βˆ‘π‘–β‰₯1

|𝐢𝑛,𝑖| ≀ π‘šβˆ—(𝐸𝑛) + πœ€2𝑛 .

Thenβˆ‘π‘›β‰₯1

βˆ‘π‘–β‰₯1

|𝐢𝑛,𝑖| ≀ βˆ‘π‘›β‰₯1

(π‘šβˆ—(𝐸𝑛) + πœ€2𝑛 ) = πœ€ + βˆ‘

𝑛β‰₯1π‘šβˆ—(𝐸𝑛)

and 𝐸 βŠ† ⋃𝑛β‰₯1 𝐢𝑛 = ⋃𝑛β‰₯1 ⋃𝑖β‰₯1 𝐢𝑛,𝑖 so

π‘šβˆ—(𝐸) ≀ πœ€ + βˆ‘π‘›β‰₯1

π‘šβˆ—(𝐸𝑛)

for all πœ€ > 0.

4

Page 6: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

1 Lebesgue measure

Remark. Note that π‘šβˆ— is not additive on the family of all subsets of R𝑑.However, it will be on β„’, as we will show later.

Lemma 1.6. If 𝐴, 𝐡 are disjoint compact subsets of R𝑑 then

π‘šβˆ—(𝐴 βˆͺ 𝐡) = π‘šβˆ—(𝐴) + π‘šβˆ—(𝐡).

Proof. ≀ by the previous lemma so need to show β‰₯. Pick πœ€ > 0. Let 𝐴 βˆͺ 𝐡 βŠ†β‹ƒπ‘›β‰₯1 𝐡𝑛 where 𝐡𝑛 are open boxes such that

βˆ‘π‘›β‰₯1

|𝐡𝑛| ≀ π‘šβˆ—(𝐴 βˆͺ 𝐡) + πœ€.

wlog we may assume that the side lengths of each 𝐡𝑛 are < 𝛼2 , where

𝛼 = inf{β€–π‘₯ βˆ’ 𝑦‖1 ∢ π‘₯ ∈ 𝐴, 𝑦 ∈ 𝐡} > 0.

where the inequality comes from the fact that 𝐴 and 𝐡 are compact and thusclosed. wlog we may discard the 𝐡𝑛’s that do not interesect 𝐴 βˆͺ 𝐡. Then byconstruction

βˆ‘π‘›β‰₯1

|𝐡𝑛| = βˆ‘π‘›β‰₯1,π΅π‘›βˆ©π΄=βˆ…

|𝐡𝑛| + βˆ‘π‘›β‰₯1,π΅π‘›βˆ©π΅=βˆ…

|𝐡𝑛| β‰₯ π‘šβˆ—(𝐴) + π‘šβˆ—(𝐡)

soπœ€ + π‘šβˆ—(𝐴 βˆͺ 𝐡) β‰₯ π‘šβˆ—(𝐴) + π‘šβˆ—(𝐡)

for all πœ€.

Lemma 1.7. If 𝐸 βŠ† R𝑑 has π‘šβˆ—(𝐸) = 0 then 𝐸 ∈ β„’.

Definition (null set). A set 𝐸 βŠ† R𝑑 such that π‘šβˆ—(𝐸) = 0 is called a nullset.

Proof. For all πœ€ > 0, there exist 𝐢 = ⋃𝑛β‰₯1 𝐡𝑛 where 𝐡𝑛 are boxes such that𝐸 βŠ† 𝐢 and βˆ‘π‘›β‰₯1 |𝐡𝑛| ≀ πœ€. But

π‘šβˆ—(𝐢 \ 𝐸) ≀ π‘šβˆ—(𝐢) ≀ πœ€.

Lemma 1.8. Every open subset of R𝑑 and every closed subset of R𝑑 is inβ„’.

We will prove the lemma using the fact that the family of Lebesgue mea-surable subsets is stable under countable union, which itself does not use thislemma. This lemma, however, will be used to show the stability under comple-mentation. Since the proof is quite technical (it has more to do with generaltopology than measure theory), for brevity and fluency of ideas we present theproof the main proposition first.

5

Page 7: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

1 Lebesgue measure

Proof of Proposition 1.4. It is obvious that βˆ… ∈ β„’. To show it is stable undercountable unions, start with 𝐸𝑛 ∈ β„’ for 𝑛 β‰₯ 1. Need to show 𝐸 ∢= ⋃𝑛β‰₯1 𝐸𝑛 βˆˆβ„’.

Pick πœ€ > 0. By assumption there exist 𝐢𝑛 = ⋃𝑖β‰₯1 𝐡𝑛,𝑖 where 𝐡𝑛,𝑖 are boxessuch that 𝐸𝑛 βŠ† 𝐢𝑛 and

π‘šβˆ—(𝐢𝑛 \ 𝐸𝑛) < πœ€2𝑛 .

Now𝐸 = ⋃

𝑛β‰₯1𝐸𝑛 βŠ† ⋃

𝑛β‰₯1𝐢𝑛 =∢ 𝐢

so 𝐢 is again a countable union of boxes and 𝐢 \ 𝐸 βŠ† ⋃𝑛β‰₯1 𝐢𝑛 \ 𝐸𝑛. so

π‘šβˆ—(𝐢 \ 𝐸) ≀ βˆ‘π‘›β‰₯1

π‘šβˆ—(𝐢𝑛 \ 𝐸𝑛) ≀ βˆ‘π‘›β‰₯1

πœ€2𝑛 = πœ€

by countable subadditivity so 𝐸 ∈ β„’.To show it is stable under complementation, suppose 𝐸 ∈ β„’. By assumption

there exist 𝐢𝑛 a countable union of boxes with 𝐸 βŠ† 𝐢𝑛 and π‘šβˆ—(𝐢𝑛 \ 𝐸) ≀ 1𝑛 .

wlog we may assume the boxes are open so 𝐢𝑛 is open, 𝐢𝑐𝑛 is closed so 𝐢𝑐

𝑛 ∈ β„’.Thus ⋃𝑛β‰₯1 𝐢𝑐

𝑛 ∈ β„’ by first part of the proof.But

π‘šβˆ—(𝐸𝑐 \ ⋃𝑛β‰₯1

𝐢𝑐𝑛) ≀ π‘šβˆ—(𝐸𝑐 \ 𝐢𝑐

𝑛) = π‘šβˆ—(𝐢𝑛 \ 𝐸) ≀ 1𝑛

so π‘šβˆ—(𝐸𝑐 \ ⋃𝑛β‰₯1 𝐢𝑐𝑛) = 0 so 𝐸𝑐 \ ⋃𝑛β‰₯1 𝐢𝑐

𝑛 ∈ β„’ since it is a null set. But

𝐸𝑐 = (𝐸𝑐 \ ⋃𝑛β‰₯1

𝐢𝑐𝑛) βˆͺ ⋃

𝑛β‰₯1𝐢𝑐

𝑛,

both of which are in β„’ so 𝐸𝑐 ∈ β„’.

Proof of Lemma 1.8. Every open set in R𝑑 is a countable union of boxes so isin β„’. It is more subtle for closed sets. The key observation is that every closedset is the countable union of compact subsets so we are left to show compactsets of R𝑑 are in β„’.

Let 𝐹 βŠ† R𝑑 be compact. For all π‘˜ β‰₯ 1, there exist π‘‚π‘˜ a countable union ofopen sets such that 𝐹 βŠ† π‘‚π‘˜ ∢= ⋃𝑖β‰₯1 π‘‚π‘˜,𝑖 where π‘‚π‘˜,𝑖 are open boxes such that

βˆ‘π‘–β‰₯1

|π‘‚π‘˜,𝑖| ≀ π‘šβˆ—(𝐹) + 12π‘˜ .

By compactness there exist a finite subcover so we can assume π‘‚π‘˜ is a finiteunion of open boxes. Moreover, wlog assume that

1. the side lengths of π‘‚π‘˜,𝑖 are ≀ 12π‘˜ .

2. for each 𝑖, π‘‚π‘˜,𝑖 intersects 𝐹.

3. π‘‚π‘˜+1 βŠ† π‘‚π‘˜ (by replacing π‘‚π‘˜+1 with π‘‚π‘˜+1 ∩ π‘‚π‘˜ iteratively).

6

Page 8: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

1 Lebesgue measure

Then 𝐹 = β‹‚π‘˜β‰₯1 π‘‚π‘˜ and we are left to show π‘šβˆ—(π‘‚π‘˜ \ 𝐹) β†’ 0. By additivity ondisjoint compact sets,

π‘šβˆ—(𝐹) + π‘šβˆ—(𝑂𝑖 \ 𝑂𝑖+1) = π‘šβˆ—(𝐹 βˆͺ (𝑂𝑖 \ 𝑂𝑖+1))

soπ‘šβˆ—(𝐹) + π‘šβˆ—(𝑂𝑖 \ 𝑂𝑖+1) ≀ π‘šβˆ—(𝑂𝑖) ≀ βˆ‘

𝑗β‰₯1|𝑂𝑖,𝑗| ≀ π‘šβˆ—(𝐹) + 1

2𝑖

so π‘šβˆ—(𝑂𝑖 \ 𝑂𝑖+1) ≀ 12𝑖 . Finally,

π‘šβˆ—(π‘‚π‘˜ \ 𝐹) = π‘šβˆ—(⋃𝑖β‰₯π‘˜

(𝑂𝑖 \ 𝑂𝑖+1)) ≀ βˆ‘π‘–β‰₯π‘˜

π‘šβˆ—(𝑂𝑖 \ 𝑂𝑖+1) ≀ βˆ‘π‘–β‰₯π‘˜

12𝑖 = 1

2π‘˜βˆ’1 .

The result we’re working towards is

Proposition 1.9. π‘šβˆ— is countably additive on β„’, i.e. if (𝐸𝑛)𝑛β‰₯1 where𝐸𝑛 ∈ β„’ are pairwise disjoint then

π‘šβˆ—( ⋃𝑛β‰₯1

𝐸𝑛) = βˆ‘π‘›β‰₯1

π‘šβˆ—(𝐸𝑛).

Lemma 1.10. If 𝐸 ∈ β„’ then for all πœ€ > 0 there exists π‘ˆ open, 𝐹 closed,𝐹 βŠ† 𝐸 βŠ† π‘ˆ such that π‘šβˆ—(π‘ˆ \ 𝐸) < πœ€ and π‘šβˆ—(𝐸 \ 𝐹) < πœ€.

Proof. By definition of β„’, there exists a countable union of open boxes 𝐸 βŠ†β‹ƒπ‘›β‰₯1 𝐡𝑛 such that π‘šβˆ—(⋃𝑛β‰₯1 𝐡𝑛 \ 𝐸) < πœ€. Just take π‘ˆ = ⋃𝑛β‰₯1 𝐡𝑛 which isopen.

For 𝐹 do the same with 𝐸𝑐 = R𝑑 \ 𝐸 in place of 𝐸.

Proof of Proposition 1.9. First we assume each 𝐸𝑛 is compact. By a previouslemma π‘šβˆ— is additive on compact sets so for all 𝑁 ∈ N,

π‘šβˆ—(𝑁⋃𝑛=1

𝐸𝑛) =𝑁

βˆ‘π‘›=1

π‘šβˆ—(𝐸𝑛).

In particular𝑁

βˆ‘π‘›=1

π‘šβˆ—(𝐸𝑛) ≀ π‘šβˆ—( ⋃𝑛β‰₯1

𝐸𝑛)

since π‘šβˆ— is monotone. Take 𝑁 β†’ ∞ to get one inequality. The other directionholds by countable subadditivity of π‘šβˆ—.

Now assume that each 𝐸𝑛 is a bounded subset in β„’. By the lemma thereexists 𝐾𝑛 βŠ† 𝐸𝑛 closed, so compact, such that π‘šβˆ—(𝐸𝑛 \ 𝐾𝑛) ≀ πœ€

2𝑛 . Since 𝐾𝑛’sare disjoint, by the previous case

π‘šβˆ—( ⋃𝑛β‰₯1

𝐾𝑛) = βˆ‘π‘›β‰₯1

π‘šβˆ—(𝐾𝑛)

7

Page 9: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

1 Lebesgue measure

then

βˆ‘π‘›β‰₯1

π‘šβˆ—(𝐸𝑛)

≀ βˆ‘π‘›β‰₯1

π‘šβˆ—(𝐾𝑛) + π‘šβˆ—(𝐸𝑛 \ 𝐾𝑛)

β‰€π‘šβˆ—( ⋃𝑛β‰₯1

𝐾𝑛) + βˆ‘π‘›β‰₯1

πœ€2𝑛

β‰€π‘šβˆ—( ⋃𝑛β‰₯1

𝐸𝑛) + πœ€

so one direction of inequality. Similarly the other direction holds by countablesubadditivity of π‘šβˆ—.

For the general case, note that R𝑑 = β‹ƒπ‘›βˆˆZ𝑑 𝐴𝑛 where 𝐴𝑛 is bounded andin β„’, for example by taking 𝐴𝑛 to be product of half open intervals of unitlength. Write 𝐸𝑛 as β‹ƒπ‘šβˆˆZ𝑑 𝐸𝑛 ∩ π΄π‘š so just apply the previous results to(𝐸𝑛 ∩ π΄π‘š)𝑛β‰₯1,π‘šβˆˆZ𝑑 .

Definition (Lebesgue measure). π‘šβˆ— when restricted to β„’ is called theLebesgue measure and is simply denoted by π‘š.

Example (Vitali counterexample). Althought β„’ is pretty big (it includes allopen and closed sets, countable unions and intersections of them, and has car-dinality at least 2𝔠 where 𝔠 is the continuum, by considering a null set withcardinality 𝔠, and each subset thereof), it does not include every subset of R𝑑.

Consider (Q, +), the additive subgroup of (R, +). Pick a set of representative𝐸 of the cosets of (Q, +). Choose it inside [0, 1]. For each π‘₯ ∈ R, there existsa unique 𝑒 ∈ 𝐸 such that π‘₯ βˆ’ 𝑒 ∈ Q (here we require axiom of choice). Claimthat 𝐸 βˆ‰ β„’ and π‘šβˆ— is not additive on the family of all subsets of R𝑑.

Proof. Pick distinct rationals 𝑝1, … , 𝑝𝑁 in [0, 1]. The sets 𝑝𝑖 + 𝐸 are pairwisedisjoint so if π‘šβˆ— were additive then we would have

π‘šβˆ—(𝑁⋃𝑖=1

𝑝𝑖 + 𝐸) =𝑁

βˆ‘π‘–=1

π‘šβˆ—(𝑝𝑖 + 𝐸) = π‘π‘šβˆ—(𝐸)

by translation invariance of π‘šβˆ—. But then

𝑁⋃𝑖=1

𝑝𝑖 + 𝐸 βŠ† [0, 2]

since 𝐸 βŠ† [0, 1] so by monotonicity of π‘šβˆ— have

π‘šβˆ—(𝑁⋃𝑖=1

𝑝𝑖 + 𝐸) ≀ 2

so for all π‘π‘šβˆ—(𝐸) ≀ 2 so π‘šβˆ—(𝐸) = 0. But

[0, 1] βŠ† β‹ƒπ‘žβˆˆQ

𝐸 + π‘ž = R,

8

Page 10: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

1 Lebesgue measure

by countable subadditivity of π‘šβˆ—,

1 = π‘šβˆ—([0, 1]) ≀ βˆ‘π‘žβˆˆQ

π‘šβˆ—(𝐸 + π‘ž) = 0.

Absurd.In particular 𝐸 βˆ‰ β„’ as π‘šβˆ— is additive on β„’.

9

Page 11: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

2 Abstract measure theory

2 Abstract measure theoryIn this chapter we extend measure theory to arbitrary set. Most part of thetheory is developed by FrΓ©chet and CarathΓ©odory.

Definition (𝜎-algebra). A 𝜎-algebra on a set 𝑋 is a Boolean algebra stableunder countable unions.

Definition (measurable space). A measurable space is a couple (𝑋, π’œ)where 𝑋 is a set and π’œ is a 𝜎-algebra on 𝑋.

Definition (measure). A measure on (𝑋, π’œ) is a map πœ‡ ∢ π’œ β†’ [0, ∞] suchthat

1. πœ‡(βˆ…) = 0,

2. πœ‡ is countably additive (also known as 𝜎-additive), i.e. for every family(𝐴𝑛)𝑛β‰₯1 of disjoint subsets in π’œ, have

πœ‡( ⋃𝑛β‰₯1

𝐴𝑛) = βˆ‘π‘›β‰₯1

πœ‡(𝐴𝑛).

The triple (𝑋, π’œ, πœ‡) is called a measure space.

Example.

1. (R𝑑, β„’, π‘š) is a measure space.

2. (𝑋, 2𝑋, #) where # is the counting measure.

Proposition 2.1. Let (𝑋, π’œ, πœ‡) be a measure space. Then

1. πœ‡ is monotone: 𝐴 βŠ† 𝐡 implies πœ‡(𝐴) βŠ† πœ‡(𝐡),

2. πœ‡ is countably subadditive: πœ‡(⋃𝑛β‰₯1 𝐴𝑛) ≀ βˆ‘π‘›β‰₯1 πœ‡(𝐴𝑛),

3. upward monotone convergence: if

𝐸1 βŠ† 𝐸2 βŠ† β‹― βŠ† 𝐸𝑛 βŠ† …

thenπœ‡( ⋃

𝑛β‰₯1𝐸𝑛) = lim

π‘›β†’βˆžπœ‡(𝐸𝑛) = sup

𝑛β‰₯1πœ‡(𝐸𝑛).

4. downard monotone convergence: if

𝐸1 βŠ‡ 𝐸2 βŠ‡ β‹― βŠ‡ 𝐸𝑛 βŠ‡ …

10

Page 12: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

2 Abstract measure theory

and πœ‡(𝐸1) < ∞ then

πœ‡( ⋂𝑛β‰₯1

𝐸𝑛) = limπ‘›β†’βˆž

πœ‡(𝐸𝑛) = inf𝑛β‰₯1

πœ‡(𝐸𝑛).

Proof.

1.πœ‡(𝐡) = πœ‡(𝐴) + πœ‡(𝐡 \ 𝐴)⏟

β‰₯0

by additivity of πœ‡.

2. See example sheet. The idea is that every countable union ⋃𝑛β‰₯1 𝐴𝑛 is adisjoint countable union ⋃𝑛β‰₯1 𝐡𝑛 where for each 𝑛, 𝐡𝑛 βŠ† 𝐴𝑛. It thenfollows by 𝜎-additivity.

3. Let 𝐸0 = βˆ… so⋃𝑛β‰₯1

𝐸𝑛 = ⋃𝑛β‰₯1

(𝐸𝑛 \ πΈπ‘›βˆ’1),

a disjoint union. By 𝜎-additivity,

πœ‡( ⋃𝑛β‰₯1

𝐸𝑛) = βˆ‘π‘›β‰₯1

πœ‡(𝐸𝑛 \ πΈπ‘›βˆ’1)

but for all 𝑁, by additivity of πœ‡,

π‘βˆ‘π‘›=1

πœ‡(𝐸𝑛 \ πΈπ‘›βˆ’1) = πœ‡(𝐸𝑁)

so take limit. The supremum part is obvious.

4. Apply the previous result to 𝐸1 \ 𝐸𝑛.

Remark. Note the πœ‡(𝐸1) < ∞ condition in the last part. Counterexample:𝐸𝑛 = [𝑛, ∞) βŠ† R.

Definition (𝜎-algebra generated by a family). Let 𝑋 be a set and β„± besome family of subsets of 𝑋. The the intersection of all 𝜎-algebras on 𝑋containing β„± is a 𝜎-algebra, called the 𝜎-algebra generated by β„± and isdenoted by 𝜎(β„±).

Proof. Easy check. See example sheet.

Example.

1. Suppose 𝑋 = βˆπ‘π‘–=1 𝑋𝑖, i.e. 𝑋 admits a finite partition. Let β„± = {𝑋1, … , 𝑋𝑛},

then 𝜎(β„±) consists of all subsets that are unions of 𝑋𝑖’s.

2. Suppose 𝑋 is countable and let β„± be the collection of all singletons. Then𝜎(β„±) = 2𝑋.

11

Page 13: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

2 Abstract measure theory

Definition (Borel 𝜎-algebra). Let 𝑋 be a topological space. The 𝜎-algebragenerated by open subsets of 𝑋 is called the Borel 𝜎-algebra of 𝑋, denotedby ℬ(𝑋).

Proposition 2.2. If 𝑋 = R𝑑 then ℬ(𝑋) βŠ† β„’. Moreover every 𝐴 ∈ β„’ canbe written as a disjoint union 𝐴 = 𝐡 βˆͺ 𝑁 where 𝐡 ∈ ℬ(𝑋) and 𝑁 is a nullset.

Proof. We’ve shown that β„’ is a 𝜎-algebra and contains all open sets so ℬ(𝑋) βŠ†β„’. Given 𝐴 ∈ β„’, 𝐴𝑐 ∈ β„’ so for all 𝑛 β‰₯ 1 there exists 𝐢𝑛 countable unions of(open) boxes such that 𝐴𝑐 βŠ† 𝐢𝑛 and π‘šβˆ—(𝐢𝑛 \ 𝐴𝑐) ≀ 1

𝑛 . Take 𝐢 = ⋂𝑛β‰₯1 𝐢𝑛 βˆˆβ„¬(𝑋). Thus 𝐡 ∢= 𝐢𝑐 ∈ ℬ(𝑋) and π‘š(𝐴 \ 𝐡) = 0 because 𝐴 \ 𝐡 = 𝐢 \ 𝐴𝑐.

Remark.

1. It can be shown that ℬ(R𝑑) ⊊ β„’. In fact |β„’| β‰₯ 2𝔠 and |ℬ(R𝑑)| = 𝔠.

2. If β„± is a family of subsets of a set 𝑋, the Boolean algebra generated byβ„± can be explicitly described as

ℬ(β„±) = {finite unions of 𝐹1 ∩ β‹― ∩ 𝐹𝑁 ∢ 𝐹𝑖 ∈ β„± or 𝐹 𝑐𝑖 ∈ β„±}.

3. However, this is not so for 𝜎(β„±). There is no β€œsimple” description of 𝜎-algebra generated by β„±. (c.f. Borel hierarchy in descriptive set theory andtransfinite induction)

Definition (πœ‹-system). A family β„± of subsets of a set 𝑋 is called a πœ‹-systemif it contains βˆ… and it is closed under finite intersection.

Proposition 2.3 (measure uniqueness). Let (𝑋, π’œ) be a measurable space.Assume πœ‡1 and πœ‡2 are two finite measures (i.e. πœ‡π‘–(𝑋) < ∞) such thatπœ‡1(𝐹) = πœ‡2(𝐹) for every 𝐹 ∈ β„± where β„± is a πœ‹-system with 𝜎(β„±) = π’œ.Then πœ‡1 = πœ‡2.

For R𝑑, we only have to check open boxes.

Proof. We state first the following lemma:

Lemma 2.4 (Dynkin lemma). If β„± is a πœ‹-system on 𝑋 and π’ž is a familyof subsets of 𝑋 such that β„± βŠ† π’ž and π’ž is stable under complementation anddisjoint countable unions. Then 𝜎(β„±) βŠ† π’ž.

Let π’ž = {𝐴 ∈ π’œ ∢ πœ‡1(𝐴) = πœ‡2(𝐴)}. Then π’ž is clearly stable under comple-mentation as

πœ‡π‘–(𝐴𝑐) = πœ‡π‘–(𝑋 \ 𝐴) = πœ‡π‘–(𝑋) βˆ’ πœ‡π‘–(𝐴).

π’ž is also clearly stable under countable disjoint unions by 𝜎-additivity. Thus byDynkin lemma, π’ž βŠ‡ 𝜎(β„±) = π’œ.

12

Page 14: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

2 Abstract measure theory

Proof of Dynkin lemma. Let β„³ be the smallest family of subsets of 𝑋 contain-ing β„± and stable under complementation and countable disjoint union (2𝑋 issuch a family and taking intersection). Sufficient to show that β„³ is a 𝜎-algebra,as then β„³ βŠ† π’ž implies 𝜎(β„±) βŠ† π’ž.

It suffices to show β„³ is a Boolean algebra. Let

β„³β€² = {𝐴 ∈ β„³ ∢ 𝐴 ∩ 𝐡 ∈ β„³ for all 𝐡 ∈ β„±}.

β„³β€² again is stable under countable disjoint unions and complementation because

𝐴𝑐 ∩ 𝐡 = (𝐡𝑐 βˆͺ (𝐴 ∩ 𝐡))𝑐

as a disjoint union so is in β„³.As β„³β€² βŠ‡ β„±, by minimality of β„³, have β„³ = β„³β€². Now let

β„³β€³ = {𝐴 ∈ β„³β€² ∢ 𝐴 ∩ 𝐡 ∈ β„³ for all 𝐡 ∈ β„³}.

The same argument shows that β„³β€³ = β„³. Thus β„³ is a Boolean algebra and a𝜎-algebra.

Proposition 2.5 (uniqueness of Lebesgue measure). Lebesgue measure isthe unique translation invariant measure πœ‡ on (R𝑑, ℬ(R𝑑)) such that

πœ‡([0, 1]𝑑) = 1.

Proof. Exercise. Hint: use the πœ‹-system β„± made of all boxes in R𝑑 and dissecta cube into dyadic pieces. Then approximate and use monotone.

Remark.

1. There is no countably additive translation invariant measure on R definedon all subsets of R. (c.f. Vitali’s counterexample).

2. However, the Lebesgue measure can be extended to a finitely additivemeasure on all subsets of R (proof requires Hahn-Banach theorem. SeeIID Linear Analysis).

Recall the construction of Lebesgue measure: we take boxes in R𝑑, and defineelementary sets, which is the Boolean algebra generated by boxes. Then we candefine Jordan measure which is finitely additive. However, this is not countablyadditive but analysis craves limits so we define Lebesgue measurable sets, byintroducing the outer measure π‘šβˆ—, which is built from the Jordan measure.Finally we restrict this outer measure to β„’. We also define the Borel 𝜎-algebra,which is the same as the 𝜎-algebra generated by the boxes. We show that theBorel 𝜎-algebra is contained in β„’, and every element in β„’ can be written as adisjoint union of an element in the Borel 𝜎-algebra and a measure zero set.

Suppose ℬ is a Boolean algebra on a set 𝑋. Let πœ‡ be a finitely additivemeasure on ℬ. We are going to construct a measure on 𝜎(ℬ).

13

Page 15: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

2 Abstract measure theory

Theorem 2.6 (CarathΓ©odory extension theorem). Assume that πœ‡ is count-ably additive on ℬ, i.e. if 𝐡𝑛 ∈ ℬ disjoint is such that ⋃𝑛β‰₯1 𝐡𝑛 ∈ ℬ thenπœ‡(⋃𝑛β‰₯1 𝐡𝑛) = βˆ‘π‘›β‰₯1 πœ‡(𝐡𝑛) and assume that πœ‡ is 𝜎-finite, i.e. there existsπ‘‹π‘š ∈ ℬ such that 𝑋 = β‹ƒπ‘šβ‰₯1 π‘‹π‘š and πœ‡(π‘‹π‘š) < ∞, then πœ‡ extends uniquelyto a measure on 𝜎(ℬ).

Proof. For any 𝐸 βŠ† 𝑋, let

πœ‡βˆ—(𝐸) = inf{βˆ‘π‘›β‰₯1

πœ‡(𝐡𝑛) ∢ 𝐸 βŠ† ⋃𝑛β‰₯1

𝐡𝑛, 𝐡𝑛 ∈ ℬ}

and call it the outer measure associated to πœ‡. Define a subset 𝐸 βŠ† 𝑋 to beπœ‡βˆ—-measurable if for all πœ€ > 0 there exists 𝐢 = ⋃𝑛β‰₯1 𝐡𝑛 with 𝐡𝑛 ∈ ℬ such that𝐸 βŠ† 𝐢 and

πœ‡βˆ—(𝐢 \ 𝐸) ≀ πœ€.

We denote by β„¬βˆ— the set of πœ‡βˆ—-measurable subsets. Claim that

1. πœ‡βˆ— is countably subadditive and monotone.

2. πœ‡βˆ—(𝐡) = πœ‡(𝐡) for all 𝐡 ∈ ℬ.

3. β„¬βˆ— is a 𝜎-algebra and contains all πœ‡βˆ—-null sets and ℬ.

4. πœ‡βˆ— is 𝜎-additive on β„¬βˆ—.

Then existence follows from the proposition as β„¬βˆ— βŠ‡ 𝜎(ℬ): πœ‡βˆ— will be ameasure on β„¬βˆ— and thus on 𝜎(ℬ). Uniqueness follows from a similar proof forLebesgue measure via Dynkin lemma.

Proof. This will be very easy as we only need to adapt our previous work tothe general case. Note that in a few occassion we used properties of R𝑑, suchas openness of some sets, so be careful.

1. Same.

2. πœ‡βˆ—(𝐡) ≀ πœ‡(𝐡) for all 𝐡 ∈ ℬ by definition of πœ‡βˆ—. For the other direction, forall πœ€ > 0, there exist 𝐡𝑛 ∈ ℬ such that 𝐡 βŠ† ⋃𝑛β‰₯1 𝐡𝑛 and βˆ‘π‘›β‰₯1 πœ‡(𝐡𝑛) β‰€πœ‡βˆ—(𝐡) + πœ€. But

𝐡 = ⋃𝑛β‰₯1

𝐡𝑛 ∩ 𝐡 = ⋃𝑛β‰₯1

𝐢𝑛

where 𝐢𝑛 ∢= 𝐡𝑛 ∩ 𝐡 \ ⋃𝑖<𝑛 𝐡 ∩ 𝐡𝑖 and so 𝐢𝑛 ∈ ℬ. Thus by countableadditivity

πœ‡(𝐡) = βˆ‘π‘›β‰₯1

πœ‡(𝐢𝑛) ≀ βˆ‘π‘›β‰₯1

πœ‡(𝐡𝑛) ≀ πœ‡βˆ—(𝐡) + πœ€

3. πœ‡βˆ—-null sets and ℬ are obviously in β„¬βˆ—. Thus it is left to show that β„¬βˆ—

is a 𝜎-algebra. Stability under countable union is exactly the same andthen we claim that β„¬βˆ— is stable under complementation. This is the bitwhere we used closed/open sets in R𝑑 in the original proof. Here we usea lemma as a substitute.

14

Page 16: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

2 Abstract measure theory

Lemma 2.7. Suppose 𝐡𝑛 ∈ ℬ then ⋂𝑛β‰₯1 𝐡𝑛 ∈ β„¬βˆ—.

Proof. First claim that if 𝐸 = ⋂𝑛β‰₯1 𝐼𝑛 where 𝐼𝑛+1 βŠ† 𝐼𝑛 and 𝐼𝑛 ∈ ℬ suchthat πœ‡(𝐼1) < ∞ then πœ‡βˆ—(𝐸) = limπ‘›β†’βˆž πœ‡(𝐼𝑛) and 𝐸 ∈ β„¬βˆ—: by additivity ofπœ‡ on ℬ,

π‘βˆ‘π‘›=1

πœ‡(𝐼𝑛 \ 𝐼𝑛+1) = πœ‡(𝐼1) βˆ’ πœ‡(𝐼𝑁)

which converges as 𝑁 β†’ ∞ (because πœ‡(𝐼𝑛+1) ≀ πœ‡(𝐼𝑛)), so

βˆ‘π‘›β‰₯𝑁

πœ‡(𝐼𝑛 \ 𝐼𝑛+1) β†’ 0

as 𝑁 β†’ ∞. But LHS is greater than πœ‡βˆ—(𝐼𝑁 \ 𝐸) because 𝐼𝑁 \ 𝐸 =⋃𝑛β‰₯𝑁 𝐼𝑛 \ 𝐼𝑛+1. Therefore 𝐸 ∈ β„¬βˆ— and

πœ‡(𝐼𝑛) ≀ πœ‡βˆ—(𝐼𝑛 \ 𝐸)βŸβŸβŸβŸβŸβ†’0

+ πœ‡βˆ—(𝐸)βŸβ‰€πœ‡(𝐼𝑛)

solim

π‘›β†’βˆžπœ‡(𝐼𝑛) = πœ‡βˆ—(𝐸).

Now for the actual lemma, let 𝐸 = ⋂𝑛β‰₯1 𝐼𝑛 where 𝐼𝑛 ∈ ℬ. wlog we mayassume 𝐼𝑛+1 βŠ† 𝐼𝑛. By 𝜎-finiteness assumption, 𝑋 = β‹ƒπ‘šβ‰₯1 π‘‹π‘š whereπ‘‹π‘š ∈ ℬ with πœ‡(π‘‹π‘š) < ∞ so

𝐸 = β‹ƒπ‘šβ‰₯1

𝐸 ∩ π‘‹π‘š.

By the claim for all π‘š, 𝐸 ∩ π‘‹π‘š ∈ β„¬βˆ— so 𝐸 ∈ β„¬βˆ—.

From the lemma we can derive that β„¬βˆ— is also stable under complementa-tion: given 𝐸 ∈ β„¬βˆ—, for all 𝑛 there exist 𝐢𝑛 = ⋃𝑖β‰₯1 𝐡𝑛,𝑖 where 𝐡𝑛,𝑖 ∈ ℬsuch that 𝐸 βŠ† 𝐢𝑛 and πœ‡βˆ—(𝐢𝑛 \ 𝐸) ≀ 1

𝑛 . Now

𝐸𝑐 = ( ⋃𝑛β‰₯1

𝐢𝑐𝑛) βˆͺ (𝐸𝑐 \ ⋃

𝑛β‰₯1𝐢𝑐

𝑛)

but 𝐢𝑐𝑛 is a countable intersection ⋂𝑖β‰₯1 𝐡𝑐

𝑛,𝑖 and 𝐸𝑐 \ ⋃𝑛β‰₯1 𝐢𝑐𝑛 is πœ‡βˆ—-null

so by the lemma, 𝐢𝑐𝑛 ∈ β„¬βˆ—. Therefore their union is also in β„¬βˆ—. Since

we’ve shown that null sets are in β„¬βˆ—, 𝐸𝑐 ∈ β„¬βˆ—.

4. We want to show πœ‡βˆ— is countably additive on β„¬βˆ—. Recall that πœ‡ is 𝜎-finite:there exists π‘‹π‘š ∈ ℬ such that 𝑋 = β‹ƒπ‘šβ‰₯1 π‘‹π‘š, πœ‡(π‘‹π‘š) < ∞. We say𝐸 βŠ† 𝑋 is bounded if there exists π‘š such that 𝐸 βŠ† π‘‹π‘š. It is then enoughto show countable additivity for bounded sets by the same argument asbefore: write 𝑋 = β‹ƒπ‘šβ‰₯1 οΏ½οΏ½π‘š where οΏ½οΏ½π‘š = π‘‹π‘š \ ⋃𝑖<π‘š 𝑋𝑖 ∈ ℬ so this is adisjoint union. Then if 𝐸 = ⋃𝑛β‰₯1 𝐸𝑛 as a disjoint union then

𝐸 = ⋃𝑛β‰₯1

β‹ƒπ‘šβ‰₯1

(𝐸𝑛 ∩ οΏ½οΏ½π‘š)

15

Page 17: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

2 Abstract measure theory

which is also a countable disjoint union.Given 𝐸, if we can show finite additivity then

π‘βˆ‘π‘›=1

πœ‡βˆ—(𝐸𝑛) = πœ‡βˆ—(𝑁⋃𝑛=1

𝐸𝑛) ≀ πœ‡βˆ—(𝐸) ≀ βˆ‘π‘›β‰₯1

πœ‡βˆ—(𝐸𝑛)

take limit as 𝑁 β†’ ∞ to have equality throughout.It suffices to prove finite additivity when 𝐸 and 𝐹 are countable intersec-tions of sets from ℬ: 𝐸, 𝐹 ∈ β„¬βˆ— so for πœ€ > 0 there exists 𝐢, 𝐷 countableintersections of sets from ℬ such that 𝐢 βŠ† 𝐸, 𝐷 βŠ† 𝐹 and

πœ‡βˆ—(𝐸) ≀ πœ‡βˆ—(𝐢) + πœ€πœ‡βˆ—(𝐹) ≀ πœ‡βˆ—(𝐷) + πœ€

As 𝐸 ∩ 𝐹 = βˆ… and 𝐢 βŠ† 𝐸, 𝐷 βŠ† 𝐹, 𝐢 ∩ 𝐷 = βˆ… so by finite additivity,

πœ‡βˆ—(𝐸) + πœ‡βˆ—(𝐹) ≀ 2πœ€ + πœ‡βˆ—(𝐢 βˆͺ 𝐷) ≀ 2πœ€ + πœ‡βˆ—(𝐸 βˆͺ 𝐹).

As usual, reverse holds by subadditivity.Finally for 𝐸 = ⋂𝑛β‰₯1 𝐼𝑛, 𝐹 = ⋂𝑛β‰₯1 𝐽𝑛 bounded, wlog assume 𝐼𝑛+1 βŠ†πΌπ‘›, 𝐽𝑛+1 βŠ† 𝐽𝑛. πœ‡(𝐼𝑛), πœ‡(𝐽𝑛) < ∞. Now use claim 3,

πœ‡βˆ—(𝐸) = limπ‘›β†’βˆž

πœ‡βˆ—(𝐼𝑛)

πœ‡βˆ—(𝐹) = limπ‘›β†’βˆž

πœ‡βˆ—(𝐽𝑛)

so

πœ‡βˆ—(𝐸) + πœ‡βˆ—(𝐹) = limπ‘›β†’βˆž

πœ‡(𝐼𝑛) + πœ‡(𝐽𝑛) = limπ‘›β†’βˆž

(πœ‡(𝐼𝑛 βˆͺ 𝐽𝑛) + πœ‡(𝐼𝑛 ∩ 𝐽𝑛))

But

⋂𝑛β‰₯1

(𝐼𝑛 ∩ 𝐽𝑛) = 𝐸 ∩ 𝐹 = βˆ…

⋂𝑛β‰₯1

(𝐼𝑛 βˆͺ 𝐽𝑛) = 𝐸 βˆͺ 𝐹

so by claim 3

limπ‘›β†’βˆž

πœ‡(𝐼𝑛 ∩ 𝐽𝑛) = 0

limπ‘›β†’βˆž

πœ‡(𝐼𝑛 βˆͺ 𝐽𝑛) = πœ‡βˆ—(𝐸 βˆͺ 𝐹)

which finishes the proof.

Remark. We prove that every set in β„¬βˆ— is a disjoint union 𝐸 = 𝐹 βˆͺ 𝑁 where𝐹 ∈ 𝜎(ℬ) and 𝑁 is πœ‡βˆ—-null.

16

Page 18: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

2 Abstract measure theory

Definition (completion). We say that β„¬βˆ— is the completion of 𝜎(ℬ) withrespect to πœ‡.

Example. β„’ is the completion of ℬ(R𝑑) in R𝑑.

17

Page 19: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

3 Integration and measurable functions

3 Integration and measurable functions

Definition (measurable function). Let (𝑋, π’œ) be a measurable space. Afunction 𝑋 β†’ R is called measurable or π’œ-measurable if for all 𝑑 ∈ R,

{π‘₯ ∈ 𝑋 ∢ 𝑓(π‘₯) < 𝑑} ∈ π’œ.

Remark. The 𝜎-algebra generated by intervals (βˆ’βˆž, 𝑑) where 𝑑 ∈ R is the Borel𝜎-algebra of R, denote ℬ(R). Thus for every measurable function 𝑓 ∢ 𝑋 β†’ R,the preimage π‘“βˆ’1(𝐡) ∈ π’œ for all 𝐡 ∈ ℬ(R). However, it is not true thatπ‘“βˆ’1(𝐿) ∈ π’œ for any 𝐿 ∈ β„’.

Remark. If 𝑓 is allowed to take the values +∞ and βˆ’βˆž we will say that 𝑓 ismeasurable if additionally π‘“βˆ’1({+∞}) ∈ π’œ and π‘“βˆ’1({βˆ’βˆž}) ∈ π’œ.

More generally,

Definition (measurable map). Suppose (𝑋, π’œ) and (π‘Œ , ℬ) are measurablespaces. A map 𝑓 ∢ 𝑋 β†’ π‘Œ is measurable if for all 𝐡 ∈ ℬ, π‘“βˆ’1(𝐡) ∈ π’œ.

Proposition 3.1.

1. The composition of measurable maps is measurable.

2. If 𝑓, 𝑔 ∢ (𝑋, π’œ) β†’ R are measurable functions then 𝑓 + 𝑔, 𝑓𝑔 and πœ†π‘“for πœ† ∈ R are also measurable.

3. If (𝑓𝑛)𝑛β‰₯1 is a sequence of measurable functions on (𝑋, π’œ) then so aresup𝑛 𝑓𝑛, inf𝑛 𝑓𝑛, lim sup𝑛 𝑓𝑛 and lim inf𝑛 𝑓𝑛.

Proof.

1. Obvious.

2. Follow from 1 once it’s shown that + ∢ R2 β†’ R and Γ— ∢ R2 β†’ R aremeasurable (with respect to Borel sets). The sets

{(π‘₯, 𝑦) ∢ π‘₯ + 𝑦 < 𝑑}{(π‘₯, 𝑦) ∢ π‘₯𝑦 < 𝑑}

are open in R2 and hence Borel.

3. inf𝑛 𝑓𝑛(π‘₯) < 𝑑 if and only if

π‘₯ ∈ ⋃𝑛

{π‘₯ ∢ 𝑓𝑛(π‘₯) < 𝑑}

and similar for sup. Similarly lim sup𝑛 𝑓𝑛(π‘₯) < 𝑑 if and only if

π‘₯ ∈ β‹ƒπ‘šβ‰₯1

β‹‚π‘˜β‰₯1

⋃𝑛β‰₯π‘˜

{π‘₯ ∢ 𝑓𝑛(π‘₯) < 𝑑 βˆ’ 1π‘š

}.

18

Page 20: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

3 Integration and measurable functions

Proposition 3.2. 𝑓 = (𝑓1, … , 𝑓𝑑) ∢ (𝑋, π’œ) β†’ (R𝑑, ℬ(R𝑑)) where 𝑑 β‰₯ 1 ismeasurable if and only if each 𝑓𝑖 ∢ 𝑋 β†’ R is measurable.

Proof. One direction is easy: suppose 𝑓 is measurable then

{π‘₯ ∢ 𝑓𝑖(π‘₯) < 𝑑} = π‘“βˆ’1({𝑦 ∈ R𝑑 ∢ 𝑦𝑖 < 𝑑}),

which is open so 𝑓𝑖 is measurable.Conversely, suppose 𝑓𝑖 is measurable. Then

π‘“βˆ’1(𝑑

βˆπ‘–=1

[π‘Žπ‘–, 𝑏𝑖]) =𝑑

⋂𝑖=1

{π‘₯ ∢ π‘Žπ‘– ≀ 𝑓𝑖(π‘₯) ≀ 𝑏𝑖}

As the boxes generate the Borel sets, done.

Example.

1. Let (𝑋, π’œ) be a measurable space and 𝐸 βŠ† 𝑋. Then 𝐸 ∈ π’œ if and only if1𝐸, the indicator function on 𝐸, is π’œ-measurable.

2. If 𝑋 = βˆπ‘π‘–=1 𝑋𝑖 and π’œ is the Boolean algebra generated by the 𝑋𝑖’s. A

function 𝑓 ∢ (𝑋, π’œ) β†’ R is measurable if and only if 𝑓 is constant on each𝑋𝑖. In this case the vector space of measurable functions has dimension𝑁.

3. Every continuous function 𝑓 ∢ R𝑑 β†’ R is measurable.

Definition (Borel measurable). If 𝑋 is a topological space, 𝑓 ∢ 𝑋 β†’ R isBorel or Borel measurable if it is ℬ(𝑋)-measurable.

Definition (simple function). A function 𝑓 on (𝑋, π’œ) is called simple if

𝑓 =𝑛

βˆ‘π‘–=1

π‘Žπ‘–1𝐴𝑖

for some π‘Žπ‘– β‰₯ 0 and 𝐴𝑖 ∈ π’œ.

Of course simple functions are measurable.

Lemma 3.3. If a simple function can be written in two ways

𝑓 =𝑛

βˆ‘π‘–=1

π‘Žπ‘–1𝐴𝑖=

π‘ βˆ‘π‘—=1

𝑏𝑗1𝐡𝑗

then π‘›βˆ‘π‘–=1

π‘Žπ‘–πœ‡(𝐴𝑖) =𝑠

βˆ‘π‘—=1

π‘π‘—πœ‡(𝐡𝑗)

for any measure πœ‡ on (𝑋, π’œ).

Proof. Example sheet 1.

19

Page 21: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

3 Integration and measurable functions

Definition (integral of a simple function with respect to a measure). Theπœ‡-integral of 𝑓 is defined by

πœ‡(𝑓) ∢=𝑛

βˆ‘π‘–=1

π‘Žπ‘–πœ‡(𝐴𝑖).

Remark.

1. The lemma says that the integral is well-defined.

2. We also use the notation βˆ«π‘‹

π‘“π‘‘πœ‡ to denote πœ‡(𝑓).

Proposition 3.4. πœ‡-integral satisfies, for all simple functions 𝑓 and 𝑔,

1. linearity: for all 𝛼, 𝛽 β‰₯ 0, πœ‡(𝛼𝑓 + 𝛽𝑔) = π›Όπœ‡(𝑓) + π›½πœ‡(𝑔).

2. positivity: if 𝑔 ≀ 𝑓 then πœ‡(𝑔) ≀ πœ‡(𝑓).

3. if πœ‡(𝑓) = 0 then 𝑓 = 0 πœ‡-almost everywhere, i.e. {π‘₯ ∈ 𝑋 ∢ 𝑓(π‘₯) β‰  0}is a πœ‡-null set.

Proof. Obvious from definition and lemma.

Definition. If 𝑓 β‰₯ 0 and measurable on (𝑋, π’œ), define

πœ‡(𝑓) = sup{πœ‡(𝑔) ∢ 𝑔 simple , 𝑔 ≀ 𝑓} ∈ [0, +∞].

Remark. This is consistent with the definition for 𝑓 simple, due to positivity.

Definition (integrable). If 𝑓 is an arbitrary measurable function on (𝑋, π’œ)we say 𝑓 is πœ‡-integrable if

πœ‡(|𝑓|) < ∞.

Definition (integral with respect to a measure). If 𝑓 is πœ‡-integrable, thenwe define its πœ‡-integral by

πœ‡(𝑓) = πœ‡(𝑓+) βˆ’ πœ‡(π‘“βˆ’)

where 𝑓+ = max{0, 𝑓} and π‘“βˆ’ = (βˆ’π‘“)+.

Note.

|𝑓| = 𝑓+ + π‘“βˆ’

𝑓 = 𝑓+ βˆ’ π‘“βˆ’

Theorem 3.5 (monotone convergence theorem). Let (𝑓𝑛)𝑛β‰₯1 be a sequenceof measurable functions on a measure space (𝑋, π’œ, πœ‡) such that

0 ≀ 𝑓1 ≀ 𝑓2 ≀ β‹― ≀ 𝑓𝑛 ≀ …

20

Page 22: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

3 Integration and measurable functions

Let 𝑓 = limπ‘›β†’βˆž 𝑓𝑛. Then

πœ‡(𝑓) = limπ‘›β†’βˆž

πœ‡(𝑓𝑛).

Lemma 3.6. If 𝑔 is a simple function on (𝑋, π’œ, πœ‡), the map

π‘šπ‘” ∢ π’œ β†’ [0, ∞]𝐸 ↦ πœ‡(1𝐸𝑔)

is a measure on (𝑋, π’œ).

Proof. Write 𝑔 = βˆ‘π‘Ÿπ‘–=1 π‘Žπ‘–1𝐴𝑖

so 𝑔1𝐸 = βˆ‘π‘Ÿπ‘–=1 π‘Žπ‘–1π΄π‘–βˆ©πΈ so

πœ‡(1𝐸𝑔) =π‘Ÿ

βˆ‘π‘–=1

π‘Žπ‘–πœ‡(𝐴𝑖 ∩ 𝐸).

By a question on example sheet this is well-defined. Then 𝜎-additivity followsimmediately from 𝜎-additivity of πœ‡.

Proof of monotone convergence theorem. 𝑓𝑛 ≀ 𝑓𝑛+1 ≀ 𝑓 by assumption so

πœ‡(𝑓𝑛) ≀ πœ‡(𝑓𝑛+1) ≀ πœ‡(𝑓)

by definition of integral so

limπ‘›β†’βˆž

πœ‡(𝑓𝑛) ≀ πœ‡(𝑓),

although RHS may be infinite.Let 𝑔 be any simple function with 𝑔 ≀ 𝑓. Need to show that πœ‡(𝑔) ≀

limπ‘›β†’βˆž πœ‡(𝑓𝑛). Pick πœ€ > 0. Let

𝐸𝑛 = {π‘₯ ∈ 𝑋 ∢ 𝑓𝑛(π‘₯) β‰₯ (1 βˆ’ πœ€)𝑔(π‘₯)}.

Then 𝑋 = ⋃𝑛β‰₯1 𝐸𝑛 and 𝐸𝑛 βŠ† 𝐸𝑛+1. So we may apply upward monotoneconvergence for sets to measure π‘šπ‘” and get

limπ‘›β†’βˆž

π‘šπ‘”(𝐸𝑛) = π‘šπ‘”(𝑋) = πœ‡(𝑔1𝑋) = πœ‡(𝑔).

But(1 βˆ’ πœ€)π‘šπ‘”(𝐸𝑛) = πœ‡((1 βˆ’ πœ€)𝑔1𝐸𝑛

)) ≀ πœ‡(𝑓𝑛)

because (1 βˆ’ πœ€)𝑔1𝐸𝑛is a simple function smaller than 𝑓𝑛. Taking limit,

(1 βˆ’ πœ€)πœ‡(𝑔) ≀ limπ‘›β†’βˆž

πœ‡(𝑓𝑛)

which holds for all πœ€. Soπœ‡(𝑔) ≀ lim

π‘›β†’βˆžπœ‡(𝑓𝑛).

21

Page 23: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

3 Integration and measurable functions

Lemma 3.7. If 𝑓 β‰₯ 0 is a measurable function on (𝑋, π’œ) then there is asequence of simple functions (𝑔𝑛)𝑛β‰₯1

0 ≀ 𝑔𝑛 ≀ 𝑔𝑛+1 ≀ 𝑓

such that for all π‘₯ ∈ 𝑋, 𝑔𝑛(π‘₯) ↑ 𝑓(π‘₯).

Notation. 𝑔𝑛 ↑ 𝑓 means that limπ‘›β†’βˆž 𝑔𝑛(π‘₯) = 𝑓(π‘₯) and 𝑔𝑛+1 β‰₯ 𝑔𝑛.

Proof. We can take𝑔𝑛 = 1

2𝑛 ⌊2𝑛 min{𝑓, 𝑛}βŒ‹

pointwise. Check that ⌊2π‘¦βŒ‹ β‰₯ 2βŒŠπ‘¦βŒ‹ for all 𝑦 β‰₯ 0.

Proposition 3.8. Basic properties of the integral (for positive functions):suppose 𝑓, 𝑔 β‰₯ 0 are measurable on (𝑋, π’œ, πœ‡).

1. linearity: for all 𝛼, 𝛽 β‰₯ 0, πœ‡(𝛼𝑓 + 𝛽𝑦) = π›Όπœ‡(𝑓) + π›½πœ‡(𝑔).

2. positivity: if 0 ≀ 𝑓 ≀ 𝑔 then πœ‡(𝑓) ≀ πœ‡(𝑔).

3. if πœ‡(𝑓) = 0 then 𝑓 = 0 πœ‡-almost everywhere.

4. if 𝑓 = 𝑔 πœ‡-almost everywhere then πœ‡(𝑓) = πœ‡(𝑔).

Proof.

1. Follows from the same property for simple functions and from Lemma 3.7combined with monotone convergence theorem.

2. Obvious from definition.

3.{π‘₯ ∈ 𝑋 ∢ 𝑓(π‘₯) β‰  0} = ⋃

𝑛β‰₯0{π‘₯ ∈ 𝑋 ∢ 𝑓(π‘₯) > 1

𝑛}

set 𝑔𝑛 = 1𝑛 1{π‘₯βˆˆπ‘‹βˆΆπ‘“(π‘₯)>1/𝑛} which is simple and 𝑔𝑛 ≀ 𝑓 so by definition of

integral πœ‡(𝑔𝑛) ≀ πœ‡(𝑓) so πœ‡(𝑔𝑛) = 0, i.e. πœ‡({π‘₯ ∢ 𝑓(π‘₯) > 1𝑛 }) = 0.

4. Note that if 𝐸 ∈ π’œ, πœ‡(𝐸𝑐) = 0 then

πœ‡(β„Ž1𝐸) = πœ‡(β„Ž)

for all β„Ž simple. Thus it holds for all β„Ž β‰₯ 0 measurable. Now take𝐸 = {π‘₯ ∢ 𝑓(π‘₯) = 𝑔(π‘₯)}.

Proposition 3.9 (linearity of integral). Suppose 𝑓, 𝑔 are πœ‡-integrable func-tions and 𝛼, 𝛽 ∈ R. Then 𝛼𝑓 + 𝛽𝑔 is πœ‡-integrable and

πœ‡(𝛼𝑓 + 𝛽𝑔) = π›Όπœ‡(𝑓) + π›½πœ‡(𝑔).

Proof. We have shown the case when 𝛼, 𝛽 β‰₯ 0 and 𝑓, 𝑔 β‰₯ 0. In the general case,use the positive and negative parts.

22

Page 24: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

3 Integration and measurable functions

Lemma 3.10 (Fatou’s lemma). Suppose (𝑓𝑛)𝑛β‰₯1 is a sequence of measurablefunctions on (𝑋, π’œ, πœ‡) such that 𝑓𝑛 β‰₯ 0 for all 𝑛. Then

πœ‡(lim infπ‘›β†’βˆž

𝑓𝑛) ≀ lim infπ‘›β†’βˆž

πœ‡(𝑓𝑛).

Remark. We may not have equality: let 𝑓𝑛 = 1[𝑛,𝑛+1] on (R, β„’, π‘š). Thenπœ‡(𝑓𝑛) = 1 but limπ‘›β†’βˆž 𝑓𝑛 = 0.

Proof. Let 𝑔𝑛 ∢= infπ‘˜β‰₯𝑛 π‘“π‘˜. Then 𝑔𝑛+1 β‰₯ 𝑔𝑛 β‰₯ 0 so by monotone convergencetheorem, πœ‡(𝑔𝑛) ↑ πœ‡(𝑔) as 𝑛 β†’ ∞ where 𝑔 = limπ‘›β†’βˆž 𝑔 = lim infπ‘›β†’βˆž 𝑓𝑛 and𝑔𝑛 ≀ 𝑓𝑛 so πœ‡(𝑔𝑛) ≀ πœ‡(𝑓𝑛) for all 𝑛. Take 𝑛 β†’ ∞,

πœ‡(𝑔) ≀ lim infπ‘›β†’βˆž

πœ‡(𝑓𝑛).

In both monotone convergence theorem and Fatou’s lemma we assumed thatthe sequence of functions is nonnegative. There is another version of convergencetheorem where we replace nonnegativity by domination:

Theorem 3.11 (Lebesgue’s dominated convergence theorem). Let (𝑓𝑛)𝑛β‰₯1be a sequence of measurable functions on (𝑋, π’œ, πœ‡) and 𝑔 a πœ‡-integrablefunction on 𝑋. Assume |𝑓𝑛| ≀ 𝑔 for all 𝑛 (domination assumption) andassume for all π‘₯ ∈ 𝑋, limπ‘›β†’βˆž 𝑓𝑛(π‘₯) = 𝑓(π‘₯). Then 𝑓 is πœ‡-integrable and

πœ‡(𝑓) = limπ‘›β†’βˆž

πœ‡(𝑓𝑛).

This allows us to swap limit and integral.

Proof. |𝑓𝑛| ≀ 𝑔 so |𝑓| ≀ 𝑔 so πœ‡(|𝑓|) ≀ πœ‡(𝑔) < ∞ and 𝑓 is integrable. Note that𝑔 + 𝑓𝑛 β‰₯ 0 so by Fatou’s lemma,

πœ‡(lim infπ‘›β†’βˆž

(𝑔 + 𝑓𝑛)) ≀ lim infπ‘›β†’βˆž

πœ‡(𝑔 + 𝑓𝑛).

But lim infπ‘›β†’βˆž(𝑔 + 𝑓𝑛) = 𝑔 + 𝑓 and by linearity πœ‡(𝑔 + 𝑓𝑛) = πœ‡(𝑔) + πœ‡(𝑓𝑛), so

πœ‡(𝑔) + πœ‡(𝑓) ≀ πœ‡(𝑔) + lim infπ‘›β†’βˆž

πœ‡(𝑓𝑛),

i.e.πœ‡(𝑓) ≀ lim inf

π‘›β†’βˆžπœ‡(𝑓𝑛).

Do the same with 𝑔 βˆ’ 𝑓𝑛 in place of 𝑔 + 𝑓𝑛, get

πœ‡(βˆ’π‘“) ≀ lim infπ‘›β†’βˆž

πœ‡(βˆ’π‘“π‘›) = βˆ’ lim supπ‘›β†’βˆž

πœ‡(𝑓𝑛)

soπœ‡(𝑓) = lim

π‘›β†’βˆžπœ‡(𝑓𝑛).

23

Page 25: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

3 Integration and measurable functions

Corollary 3.12 (exchanging integral and summation). Let (𝑋, π’œ, πœ‡) be ameasure space and let (𝑓𝑛)𝑛β‰₯1 be a sequence of measurable functions on 𝑋.

1. If 𝑓𝑛 β‰₯ 0 thenπœ‡(βˆ‘

𝑛β‰₯1𝑓𝑛) = βˆ‘

𝑛β‰₯1πœ‡(𝑓𝑛).

2. If βˆ‘π‘›β‰₯1 |𝑓𝑛| is πœ‡-integrable then βˆ‘π‘›β‰₯1 𝑓𝑛 is πœ‡-integrable and

πœ‡(βˆ‘π‘›β‰₯1

𝑓𝑛) = βˆ‘π‘›β‰₯1

πœ‡(𝑓𝑛).

Proof.

1. Let 𝑔𝑁 = βˆ‘π‘π‘›=1 𝑓𝑛, then 𝑔𝑁 ↑ βˆ‘π‘›β‰₯1 𝑓𝑛 as 𝑁 β†’ ∞ so the result follows

from monotone convergence theorem.

2. Let 𝑔 = βˆ‘π‘›β‰₯1 |𝑓𝑛| and 𝑔𝑁 as above. Then |𝑔𝑁| ≀ 𝑔 for all 𝑁 so thedomination assumption holds. The result thus follows from dominatedconvergence theorem.

Corollary 3.13 (differentiation under integral sign). Let (𝑋, π’œ, πœ‡) be ameasure space. Let π‘ˆ βŠ† R be an open set and let 𝑓 ∢ π‘ˆ Γ— 𝑋 β†’ R be suchthat

1. π‘₯ ↦ 𝑓(𝑑, π‘₯) is πœ‡-integrable for all 𝑑 ∈ π‘ˆ,

2. 𝑑 ↦ 𝑓(𝑑, π‘₯) is differentiable for all π‘₯ ∈ 𝑋,

3. domination: there exists 𝑔 ∢ 𝑋 β†’ R πœ‡-integrable such that for all𝑑 ∈ π‘ˆ, π‘₯ ∈ 𝑋,

πœ•π‘“πœ•π‘‘

(𝑑, π‘₯) ≀ 𝑔(π‘₯).

Then π‘₯ ↦ πœ•π‘“πœ•π‘‘ (𝑑, π‘₯) is πœ‡-integrable for all 𝑑 ∈ π‘ˆ and if we set 𝐹(𝑑) =

βˆ«π‘‹

𝑓(𝑑, π‘₯)π‘‘πœ‡ then 𝐹 is differentiable and

𝐹 β€²(𝑑) = βˆ«π‘‹

πœ•π‘“πœ•π‘‘

(𝑑, π‘₯)π‘‘πœ‡.

Proof. Pick β„Žπ‘› > 0, β„Žπ‘› β†’ 0 and define

𝑔𝑛(𝑑, π‘₯) ∢= 1β„Žπ‘›

(𝑓(𝑑 + β„Žπ‘›, π‘₯) βˆ’ 𝑓(𝑑, π‘₯)).

Thenlim

π‘›β†’βˆžπ‘”π‘›(𝑑, π‘₯) = πœ•π‘“

πœ•π‘‘(𝑑, π‘₯).

By mean value theorem, there exists πœƒπ‘‘,𝑛,π‘₯ ∈ [𝑑, 𝑑 + β„Žπ‘›] such that

𝑔𝑛(𝑑, π‘₯) = πœ•π‘“πœ•π‘‘

(πœƒ, π‘₯)

24

Page 26: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

3 Integration and measurable functions

so|𝑔𝑛(𝑑, π‘₯)| ≀ 𝑔(π‘₯)

by domination assumption. Now apply dominated convergence theorem.

Remark.

1. If 𝑓 ∢ [π‘Ž, 𝑏] β†’ R is continuous where π‘Ž < 𝑏 in R, then 𝑓 is π‘š-integrable(where π‘š is the Lebesgue measure) and π‘š(𝑓) = βˆ«π‘

π‘Žπ‘“(π‘₯)𝑑π‘₯ is the Riemann

integral. In general if 𝑓 is only assumed to be bounded, then 𝑓 will beRiemann integrable if and only if the points of discontinuity of 𝑓 is anπ‘š-null set. See example sheet 2.

2. If 𝑔 ∈ GL𝑑(R) and 𝑓 β‰₯ 0 is Borel measurable on R𝑑, then

π‘š(𝑓 ∘ 𝑔) = 1| det 𝑔|

π‘š(𝑓).

See example sheet 2. In particular π‘š is invariant under linear transfor-mation whose determinant has absolute value 1, e.g. rotation.

Remark. In each of monotone convergence theorem, Fatou’s lemma and dom-inated convergence theorem, we can replace pointwise assumption by the corre-sponding πœ‡-almost everywhere. The same conclusion holds. Indeed, let

𝐸 = {π‘₯ ∈ 𝑋 ∢ assumptions hold at π‘₯}

so 𝐸𝑐 is a πœ‡-null set. Replace each 𝑓𝑛 (and similarly 𝑔 etc) by 1𝐸𝑓𝑛. Thenassumptions then hold everywhere as πœ‡(𝑓1𝐸) = πœ‡(𝑓) for all 𝑓 measurable.

25

Page 27: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

4 Product measures

4 Product measures

Definition (product 𝜎-algebra). Let (𝑋, π’œ) and (π‘Œ , ℬ) be measurable spaces.The 𝜎-algebra of subsets of 𝑋 Γ—π‘Œ generated by the product sets 𝐸 ×𝐹 where𝐸 ∈ π’œ, 𝐹 ∈ ℬ is called the product 𝜎-algebra of π’œ and ℬ and is denoted byπ’œ βŠ— 𝐡.

Remark.

1. By analogy with the notion of product topology, π’œ βŠ— ℬ is the smallest𝜎-algebra of subsets of 𝑋 Γ—π‘Œ making the two projection maps measurable.

2. ℬ(R𝑑1) βŠ— ℬ(R𝑑2) = ℬ(R𝑑1+𝑑2). See example sheet. However this is not sofor β„’(R𝑑).

Lemma 4.1. If 𝐸 βŠ† 𝑋 Γ— π‘Œ is π’œ βŠ— ℬ-measurable then for all π‘₯ ∈ 𝑋, theslice

𝐸π‘₯ = {𝑦 ∈ π‘Œ ∢ (π‘₯, 𝑦) ∈ 𝐸}

is in ℬ.

Proof. Letβ„° = {𝐸 βŠ† 𝑋 Γ— π‘Œ ∢ 𝐸π‘₯ ∈ ℬ for all π‘₯ ∈ 𝑋}.

Note that β„° contains all product sets 𝐴 Γ— 𝐡 where 𝐴 ∈ π’œ, 𝐡 ∈ ℬ. β„° is a 𝜎-algebra: if 𝐸 ∈ β„° then 𝐸𝑐 ∈ β„° and if 𝐸𝑛 ∈ β„° then ⋃ 𝐸𝑛 ∈ β„° since (𝐸𝑐)π‘₯ = (𝐸π‘₯)𝑐

and (⋃ 𝐸𝑛)π‘₯ = ⋃(𝐸𝑛)π‘₯.

Lemma 4.2. Assume (𝑋, π’œ, πœ‡) and (π‘Œ , ℬ, 𝜈) are 𝜎-finite measure spaces.Let 𝑓 ∢ 𝑋 Γ— π‘Œ β†’ [0, +∞] be π’œ βŠ— ℬ-measurable. Then

1. for all π‘₯ ∈ 𝑋, the function 𝑦 ↦ 𝑓(π‘₯, 𝑦) is ℬ-measurable.

2. for all π‘₯ ∈ 𝑋, the map π‘₯ ↦ βˆ«π‘Œ

𝑓(π‘₯, 𝑦)π‘‘πœˆ(𝑦) is π’œ-measurable.

Proof.

1. In case 𝑓 = 1𝐸 for 𝐸 ∈ π’œβŠ—β„¬ the function 𝑦 ↦ 𝑓(π‘₯, 𝑦) is just 𝑦 ↦ 1𝐸π‘₯(𝑦),

which is measurable by the previous lemma.More generally, the result is true for simple functions and thus for allmeasurable functions by taking pointwise limit.

2. By the same reduction we may assume 𝑓 = 1𝐸 for some 𝐸 ∈ π’œ βŠ— ℬ. Nowlet π‘Œ = β‹ƒπ‘šβ‰₯1 π‘Œπ‘š with 𝜈(π‘Œπ‘š) < ∞. Let

β„° = {𝐸 ∈ π’œ βŠ— ℬ ∢ π‘₯ ↦ 𝜈(𝐸π‘₯ ∩ π‘Œπ‘š) is π’œ-measurable for all π‘š}.

β„° contains all product sets 𝐸 = 𝐴 Γ— 𝐡 where 𝐴 ∈ π’œ, 𝐡 ∈ ℬ because𝜈(𝐸π‘₯ ∩ π‘Œπ‘š) = 1π‘₯βˆˆπ’œπœˆ(𝐡 ∩ π‘Œπ‘š). β„° is stable under complementation:

𝜈((𝐸𝑐)π‘₯ ∩ π‘Œπ‘š) = 𝜈(π‘Œπ‘š) βˆ’ 𝜈(π‘Œπ‘š ∩ 𝐸π‘₯)

26

Page 28: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

4 Product measures

where LHS is 𝜈-measurable. β„° is stable under disjoint countable union:let 𝐸 = ⋃𝑛β‰₯1 𝐸𝑛 where 𝐸𝑛 ∈ β„° disjoint. Then by 𝜎-additivity

𝜈(𝐸π‘₯ ∩ π‘Œπ‘š) = βˆ‘π‘›β‰₯1

𝜈((𝐸𝑛)π‘₯ ∩ π‘Œπ‘š)

which is π’œ-measurable.The product sets form a πœ‹-system and generates the product measure soby Dynkin lemma β„° = π’œ βŠ— ℬ.

Definition (product measure). Let (𝑋, π’œ, πœ‡) and (π‘Œ , ℬ, 𝜈) be measurespaces and πœ‡, 𝜈 𝜎-finite. Then there exists a unique product measure, denotedby πœ‡ βŠ— 𝜈, on π’œ βŠ— ℬ such that for all 𝐴 ∈ π’œ, 𝐡 ∈ ℬ,

(πœ‡ βŠ— 𝜈)(𝐴 Γ— 𝐡) = πœ‡(𝐴)𝜈(𝐡).

Proof. Uniqueness follows from Dynkin lemma. For existence, set

𝜎(𝐸) = βˆ«π‘‹

𝜈(𝐸π‘₯)π‘‘πœ‡(π‘₯).

𝜎 is well-defined because π‘₯ ↦ 𝜈(𝐸π‘₯) is π’œ-measurable by lemma 2. 𝜎 is countably-additive: suppose 𝐸 = ⋃𝑛β‰₯1 𝐸𝑛 where 𝐸𝑛 ∈ π’œ βŠ— ℬ disjoint, then

𝜎(𝐸) = βˆ«π‘‹

𝜈(𝐸π‘₯)π‘‘πœ‡(π‘₯) = βˆ«π‘‹

βˆ‘π‘›β‰₯1

𝜈((𝐸𝑛)π‘₯)π‘‘πœ‡π‘₯ = βˆ‘π‘›β‰₯1

βˆ«π‘‹

𝜈((𝐸𝑛)π‘₯)π‘‘πœ‡(π‘₯) = βˆ‘π‘›β‰₯2

𝜎(𝐸𝑛)

by a corollary of MCT.

Theorem 4.3 (Tonelli-Fubini). Let (𝑋, π’œ, πœ‡) and (π‘Œ , ℬ, 𝜈) be 𝜎-finite mea-sure spaces.

1. Let 𝑓 ∢ 𝑋 Γ— π‘Œ β†’ [0, +∞] be π’œ βŠ— ℬ-measurable. Then

βˆ«π‘‹Γ—π‘Œ

𝑓(π‘₯, 𝑦)𝑑(πœ‡βŠ—πœˆ) = βˆ«π‘‹

βˆ«π‘Œ

𝑓(π‘₯, 𝑦)π‘‘πœˆ(𝑦)π‘‘πœ‡(π‘₯) = βˆ«π‘‹

βˆ«π‘‹

𝑓(π‘₯, 𝑦)π‘‘πœ‡(π‘₯)π‘‘πœˆ(𝑦).

2. If 𝑓 ∢ 𝑋 Γ— π‘Œ β†’ R is πœ‡ βŠ— 𝜈-integrable then for πœ‡-almost everywhere π‘₯,𝑦 ↦ 𝑓(π‘₯, 𝑦) is 𝜈-integrable, and for 𝜈-almost everywhere 𝑦, π‘₯ ↦ 𝑓(π‘₯, 𝑦)is πœ‡-integrable and

βˆ«π‘‹Γ—π‘Œ

𝑓(π‘₯, 𝑦)𝑑(πœ‡βŠ—πœˆ) = βˆ«π‘‹

βˆ«π‘Œ

𝑓(π‘₯, 𝑦)π‘‘πœˆ(𝑦)π‘‘πœ‡(π‘₯) = βˆ«π‘‹

βˆ«π‘‹

𝑓(π‘₯, 𝑦)π‘‘πœ‡(π‘₯)π‘‘πœˆ(𝑦).

Without the nonnegativity or integrability assumption, the result is false ingeneral. For example for 𝑋 = π‘Œ = N, let π’œ = ℬ be discrete 𝜎-algebras and

27

Page 29: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

4 Product measures

πœ‡ = 𝜈 counting measure. Let 𝑓(𝑛, π‘š) = 1𝑛=π‘š βˆ’ 1𝑛=π‘š+1. Check that

βˆ‘π‘›β‰₯1

𝑓(𝑛, π‘š) = 0

βˆ‘π‘šβ‰₯1

𝑓(𝑛, π‘š) = {0 𝑛 β‰₯ 21 𝑛 = 1

soβˆ‘π‘›β‰₯1

βˆ‘π‘šβ‰₯1

𝑓(𝑛, π‘š) β‰  βˆ‘π‘šβ‰₯1

βˆ‘π‘›β‰₯1

𝑓(𝑛, π‘š).

Proof.

1. The result holds for 𝑓 = 1𝐸 where 𝐸 ∈ π’œβŠ—β„¬ by the definition of productmeasure and lemma 2, so it holds for all simple functions. Now take limitsand apply MCT.

2. Write 𝑓 = 𝑓+ βˆ’ π‘“βˆ’ and apply 1.

Note.

1. The Lebesgue measure π‘šπ‘‘ on R𝑑 is equal to π‘š1 βŠ— β‹― βŠ— π‘š1, because it istrue on boxes and extend by uniqueness of measure.

2. 𝐸 ∈ π’œ βŠ— ℬ if is πœ‡ βŠ— 𝜈-null if and only if for πœ‡-almost every π‘₯, 𝜈(𝐸π‘₯) = 0.

28

Page 30: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

5 Foundations of probability theory

5 Foundations of probability theoryModern probability theory was founded by Kolmogorov, who formulated theaxioms of probability theory in 1933 in his thesis Foundations on the Theory ofProbability. He defined a probability space to be a measure space (Ξ©, β„±,P). Theinterpretation is as follow: Ξ© is the universe of possible outcomes. However, wewouldn’t be able to assign probability to every single outcome unless the spaceis discrete. Instead we are interested in studying some subsets of Ξ©, whichare called events and contained in β„±. Finally P is a probability measure withP(Ξ©) = 1. Thus for 𝐴 ∈ β„±, P(𝐴 occurs) ∈ [0, 1]. Thus finite additivity of P saysthat if 𝐴 and 𝐡 never occurs simultaneously then P(𝐴 or 𝐡 = P(𝐴) + P(𝐡).𝜎-additivity is slightly more difficult to justify and it is perhaps to see theequivalent notion continuity: if 𝐴𝑛+1 βŠ‡ 𝐴𝑛 and ⋂𝑛β‰₯1 𝐴𝑛 = βˆ… then P(𝐴𝑛) β†’ 0as 𝑛 β†’ ∞.

Definition (probability measure, probability space). Let Ξ© be a set and β„±a 𝜎-algebra on Ξ©. A measure πœ‡ on (Ξ©, β„±) is called a probability measure ifπœ‡(Ξ©) = 1 and the measure space (Ξ©, β„±, πœ‡) is called a probability space.

Definition (random variable). A measurable function 𝑋 ∢ Ξ© β†’ R is calleda random variable.

We usually use a capital letter to denote a random variable.

Definition (expectation). If (Ξ©, β„±,P) is a probability space then the P-integral is called expectation, denoted E.

Definition (distribution/law). A random variable 𝑋 ∢ Ξ© β†’ R on a proba-bility space (Ξ©, β„±,P) determines a Borel measure πœ‡π‘‹ on R defined by

πœ‡π‘‹((βˆ’βˆž, 𝑑]) = P(𝑋 ≀ 𝑑) = P({πœ” ∈ Ξ© ∢ 𝑋(πœ”) ≀ 𝑑})

and πœ‡π‘‹ is called the distribution of 𝑋, or the law of 𝑋.

Note. πœ‡π‘‹ is the image of P under

Ξ© β†’ Rπœ” ↦ 𝑋(πœ”)

Definition (distribution function). The function

𝐹𝑋 ∢ R β†’ [0, 1]𝑑 ↦ P(𝑋 ≀ 𝑑)

is called the distribution function of 𝑋.

29

Page 31: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

5 Foundations of probability theory

Proposition 5.1. If (Ξ©, β„±,P) is a probability space and 𝑋 ∢ Ξ© β†’ R is a ran-dom variable then 𝐹𝑋 is non-decreasing, right-continuous and it determinesπœ‡π‘‹ uniquely.

Proof. Given 𝑑𝑛 ↓ 𝑑,

𝐹𝑋(𝑑𝑛) = P(𝑋 ≀ 𝑑𝑛) β†’ P( ⋂𝑛β‰₯1

{𝑋 ≀ 𝑑𝑛}) = P({𝑋 ≀ 𝑑}) = 𝐹𝑋(𝑑)

by downward monotone convergence for sets. Uniqueness follows from Dynkinlemma applied to the πœ‹-system {βˆ…} βˆͺ {(βˆ’βˆž, 𝑑]}π‘‘βˆˆR.

Conversely,

Proposition 5.2. If 𝐹 ∢ R β†’ [0, 1] is a non-decreasing right-continuousfunction with

limπ‘‘β†’βˆ’βˆž

𝐹(𝑑) = 0

lim𝑑→+∞

𝐹(𝑑) = 1

then there exists a unique probability measure πœ‡ on R such that

𝐹(𝑑) = πœ‡((βˆ’βˆž, 𝑑])

for all 𝑑 ∈ R.

Remark. The measure πœ‡ is called the Lebesgue-Stieltjes measure on R associ-ated to 𝐹. Furthermore for all π‘Ž, 𝑏 ∈ R,

πœ‡((π‘Ž, 𝑏]) = 𝐹(𝑏) βˆ’ 𝐹(π‘Ž).

We can also construct Lebesgue measure this way.

Proof. Uniqueness is the same as above. For existence, we use the lemma

Lemma 5.3. Let

𝑔 ∢ (0, 1) β†’ R𝑦 ↦ inf{π‘₯ ∈ R ∢ 𝐹 (π‘₯) β‰₯ 𝑦}

then 𝑔 is non-decreasing, left-continuous and for all π‘₯ ∈ R, 𝑦 ∈ (0, 1), 𝑔(𝑦) ≀π‘₯ if and only if 𝐹(π‘₯) β‰₯ 𝑦.

Proof. Let𝐼𝑦 = {π‘₯ ∈ R ∢ 𝐹 (π‘₯) β‰₯ 𝑦}.

Clearly if 𝑦1 β‰₯ 𝑦2 then 𝐼𝑦1βŠ† 𝐼𝑦2

so 𝑔(𝑦2) ≀ 𝑔(𝑦1) so 𝑔 is non-decreasing. 𝐼𝑦 isan interval of R because if π‘₯ > π‘₯1 and π‘₯1 ∈ 𝐼𝑦 then 𝐹(π‘₯) β‰₯ 𝐹(π‘₯1) β‰₯ 𝑦 so π‘₯ ∈ 𝐼𝑦.So 𝐼𝑦 is an interval with endpoints 𝑔(𝑦) and +∞. But 𝐹 is right-continuous so𝑔(𝑦) = min 𝐼𝑦 and the minimum is obtained. Thus 𝐼𝑦 = [𝑔(𝑦), +∞).

This means that π‘₯ β‰₯ 𝑔(𝑦) if and only if π‘₯ ∈ 𝐼𝑦 if and only if 𝐹(π‘₯) β‰₯ 𝑦.Finally for left-continuity, suppose 𝑦𝑛 ↑ 𝑦 then ⋂𝑛β‰₯1 𝐼𝑦𝑛

= 𝐼𝑦 by definitionof 𝐼𝑦 so 𝑔(𝑦𝑛) β†’ 𝑔(𝑦).

30

Page 32: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

5 Foundations of probability theory

Remark. If 𝐹 is continuous and strictly increasing then 𝑔 = 𝐹 βˆ’1.

Now back to the proposition. Set πœ‡ = π‘”βˆ—π‘š where π‘š is the Lebesgue measureon (0, 1). πœ‡ is a probability measure as 𝑔 is Borel-measurable. By the lemma

πœ‡((π‘Ž, 𝑏]) = π‘š(π‘”βˆ’1(π‘Ž, 𝑏]) = π‘š((𝐹(π‘Ž), 𝐹 (𝑏))) = 𝐹(𝑏) βˆ’ 𝐹(π‘Ž).

Proposition 5.4. If πœ‡ is a Borel probability measure on R then there existssome probability space (Ξ©, β„±,P) and a random variable 𝑋 on Ξ© such thatπœ‡π‘‹ = πœ‡.

In fact, one can even pick Ξ© = (0, 1), β„± = ℬ(0, 1) and P = π‘š, theLebesgue measure.

Proof. For the first claim set Ξ© = R, β„± = ℬ(R), P = πœ‡ and 𝑋(π‘₯) = π‘₯.For the second claim, set 𝐹(𝑑) = πœ‡((βˆ’βˆž, 𝑑]) and take 𝑋 = 𝑔 where 𝑔 is the

auxillary function defined in the previous lemma, namely

𝑋(πœ”) = inf{π‘₯ ∢ 𝐹(π‘₯) β‰₯ πœ”}.

Check that πœ‡π‘‹ = πœ‡:

πœ‡π‘‹((π‘Ž, 𝑏]) = P(𝑋 ∈ (π‘Ž, 𝑏])= π‘š({πœ” ∈ (0, 1) ∢ π‘Ž < 𝑋(𝑀) ≀ 𝑏})= π‘š({πœ” ∈ (0, 1) ∢ 𝐹 (π‘Ž) < πœ” < 𝐹(𝑏)})

Remark. If πœ‡ is a Borel probability measure on R such that πœ‡ = 𝑓𝑑𝑑 forsome 𝑓 β‰₯ 0 measurable, we say that πœ‡ has a density (with respect to Lebesguemeasure) and 𝑓 is called the density of πœ‡. Here πœ‡ = 𝑓𝑑𝑑 means that πœ‡((π‘Ž, 𝑏]) =βˆ«π‘π‘Ž

𝑓(𝑑)𝑑𝑑.

Example.1. uniform distribution on [0, 1]:

𝑓(𝑑) = 1[0,1](𝑑)𝐹(𝑑) = πœ‡((βˆ’βˆž, 𝑑] ∩ [0, 1])

2. exponential distribution of rate πœ†:

π‘“πœ†(𝑑) = πœ†π‘’βˆ’πœ†π‘‘1𝑑β‰₯0

πΉπœ†(𝑑) = βˆ«π‘‘

βˆ’βˆžπ‘“πœ†(𝑠)𝑑𝑠 = 1𝑑β‰₯0(1 βˆ’ π‘’βˆ’πœ†π‘‘)

3. Gaussian distribution with standard deviation 𝜎 and mean π‘š:

π‘“πœŽ,π‘š(𝑑) = 1√2πœ‹πœŽ2

exp(βˆ’(𝑑 βˆ’ π‘š)2

2𝜎2 )

𝐹𝜎,π‘š(𝑑) = βˆ«π‘‘

βˆ’βˆž

1√2πœ‹πœŽ2

exp(βˆ’(𝑠 βˆ’ π‘š)2

2𝜎2 )𝑑𝑠

31

Page 33: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

5 Foundations of probability theory

Definition (mean, moment, variance). If 𝑋 is a random variable then

1. E(𝑋) is called the mean,

2. E(π‘‹π‘˜) is called the π‘˜th-moment of 𝑋,

3. Var(𝑋) = E((𝑋 βˆ’ E𝑋)2) = E(𝑋2) βˆ’ E(𝑋)2 is called the variance.

Remark. Suppose 𝑓 β‰₯ 0 is measurable and 𝑋 is a random variable. Then

E(𝑓(𝑋)) = ∫R

𝑓(π‘₯)π‘‘πœ‡π‘‹(π‘₯)

where by definition of πœ‡π‘‹ = π‘‹βˆ—P.

32

Page 34: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

6 Independence

6 IndependenceIndependence is the key notion that makes probability theory distinct from(abstract) measure theory.

Definition (independence). Let (Ξ©, β„±,P) be a probability space. A se-quence of events (𝐴𝑛)𝑛β‰₯1 is called independent or mutually independent iffor all 𝐹 βŠ† N finite,

P(β‹‚π‘–βˆˆπΉ

𝐴𝑖) = βˆπ‘–βˆˆπΉ

P(𝐴𝑖).

Definition (independent 𝜎-algebra). A sequence of 𝜎-algebras (π’œπ‘›)𝑛β‰₯1where π’œπ‘› βŠ† β„± is called independent if for all 𝐴𝑛 ∈ π’œπ‘›, the family (𝐴𝑛)𝑛β‰₯1is independent.

Remark.

1. To prove that (π’œπ‘›)𝑛β‰₯1 is an independent family, it is enough to checkthe independence condition for all 𝐴𝑛’s with 𝐴𝑛 ∈ Π𝑛 where Π𝑛 is a πœ‹-system generating π’œπ‘›. The proof is an application of Dynkin lemma. Forexample for 𝜎-algebras π’œ1, π’œ2, suffices to check

P(𝐴1 ∩ 𝐴2) = P(𝐴1)P(𝐴2)

for all 𝐴1 ∈ Π1, 𝐴2 ∈ Π2. Fix 𝐴2 ∈ Π2, look at the measures

𝐴1 ↦ P(𝐴1 ∩ 𝐴2)𝐴1 ↦ P(𝐴1)P(𝐴2)

on π’œ1. They coincide on Ξ 1 by assumption and hence everywhere on π’œ1.Subsequently consider π’œ2.

Notation. Suppose 𝑋 is a random variable. Denote by 𝜎(𝑋) the smallest𝜎-subalgebra π’œ of β„± such that 𝑋 is π’œ-measurable, i.e.

𝜎(𝑋) = 𝜎({πœ” ∈ Ξ© ∢ 𝑋(πœ”) ≀ 𝑑}π‘‘βˆˆR).

Definition (independence). A sequence of random variables (𝑋𝑖)𝑖β‰₯1 iscalled independent if the sequence of 𝜎-subalgebras (𝜎(𝑋𝑖))𝑖β‰₯1 is indepen-dent.

Remark. This is equivalent to the condition that for all (𝑑𝑖)𝑖β‰₯1, for all 𝑛,

P((𝑋1 ≀ 𝑑1) ∩ β‹― ∩ (𝑋𝑛 ≀ 𝑑𝑛)) =𝑛

βˆπ‘–=1

P(𝑋𝑖 ≀ 𝑑𝑖).

Yet another equivalent formulation is

πœ‡(𝑋1,…,𝑋𝑛) =𝑛

⨂𝑖=1

πœ‡π‘‹π‘–

as Borel probability measures on R𝑛. β€œThe joint law is the same as the productof individual laws”.

33

Page 35: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

6 Independence

Note. Note that independence is a property of a family so pairwise indepen-dence is necessary but not sufficient for independence. A famous counterexampleis Berstein’s example: take 𝑋 and π‘Œ to be random variables for two independentfair coins flips. Set 𝑍 = |𝑋 βˆ’ π‘Œ |. Then 𝑍 = 0 if and only if 𝑋 = π‘Œ. Check that

P(𝑍 = 0) = P(𝑍 = 1) = 12

and each pair (𝑋, π‘Œ ), (𝑋, 𝑍) and (π‘Œ , 𝑍) is independent. But (𝑋, π‘Œ , 𝑍) is notindependent.

Proposition 6.1. If 𝑋 and π‘Œ are independent random variables, 𝑋 β‰₯0, π‘Œ β‰₯ 0 then

E(π‘‹π‘Œ ) = E(𝑋)E(π‘Œ ).

Proof. Essentially Tonelli-Fubini:

E(π‘‹π‘Œ ) = ∫R2

π‘₯π‘¦π‘‘πœ‡π‘‹,π‘Œ(π‘₯, 𝑦) = ∫R2

π‘‘πœ‡π‘‹(π‘₯)π‘‘πœ‡π‘Œ(𝑦)

= (∫R

π‘₯π‘‘πœ‡π‘‹(π‘₯)) (∫R

π‘¦π‘‘πœ‡π‘Œ(𝑦))

= E(𝑋)E(π‘Œ )

Remark. As in Tonelli-Fubini, we may require π‘‹π‘Œ to be integrable instead andthe same conclusion holds.

Example. Let Ξ© = (0, 1), β„± = ℬ(0, 1),P = π‘š the Lebesgue measure. Writethe decimal expansion of πœ” ∈ (0, 1) as

πœ” = 0.πœ€1πœ€2 …

where πœ€π‘–(πœ”) ∈ {0, … , 9}. Choose a convention so that each πœ” has a well-definedexpansion (to avoid things like 0.099 β‹― = 0.100 …). Now let 𝑋𝑛(πœ”) = πœ€π‘›(πœ”).Claim that the (𝑋𝑛)𝑛β‰₯1 are iid. random variables uniformly distributed on{0, … , 9}, where β€œiid.” stands for independently and identically distributed.

Proof. Easy check. For example 𝑋1(πœ”) = ⌊10πœ”βŒ‹ so

P(𝑋1 = 𝑖1) = 110

.

Similarly for all 𝑛

P(𝑋1 = 𝑖1, … , 𝑋𝑛 = 𝑖𝑛) = 110𝑛 , P(𝑋𝑛 = 𝑖𝑛) = 1

10so

P(𝑋1 = 𝑖1, … , 𝑋𝑛 = 𝑖𝑛) =𝑛

βˆπ‘–=1

P(π‘‹π‘˜ = π‘–π‘˜).

34

Page 36: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

6 Independence

Remark.πœ” = βˆ‘

𝑛β‰₯1

𝑋𝑛(πœ”)10𝑛

is distributed according to Lebesgue measure so if we want we can constructLebesgue measure as the law of this random variable.

Proposition 6.2 (infinite product of product measure). Let (Ω𝑖, ℱ𝑖, πœ‡π‘–)𝑖β‰₯1be a sequence of probability spaces, Ξ© = βˆπ‘–β‰₯1 Ω𝑖 and β„° be the Boolean algebraof cylinder sets, i.e. sets of the form

𝐴 Γ— βˆπ‘–β‰₯𝑛

Ω𝑖

for some 𝐴 ∈ ⨂𝑛𝑖=1 ℱ𝑖. Set β„± = 𝜎(β„°), the infinite product 𝜎-algebra. Then

there is a unique probability measure πœ‡ on (Ξ©, β„±) such that it agrees withproduct measures on all cylinder sets, i.e.

πœ‡(𝐴 Γ— βˆπ‘–>𝑛

Ω𝑖) = (𝑛

⨂𝑖=1

πœ‡π‘–)(𝐴)

for all 𝐴 ∈ ⨂𝑛𝑖=1 ℱ𝑖.

Proof. Omitted. See example sheet 3.

Lemma 6.3 (Borel-Cantelli). Let (Ξ©, β„±,P) be a probability space and (𝐴𝑛)𝑛β‰₯1a sequence of events.

1. If βˆ‘π‘›β‰₯1 P(𝐴𝑛) < ∞ then

P(lim sup𝑛

𝐴𝑛) = 0.

2. Conversely, if (𝐴𝑛)𝑛β‰₯1 are independent and βˆ‘π‘›β‰₯1 P(𝐴𝑛) = ∞ then

P(lim sup𝑛

𝐴𝑛) = 1.

Note that lim sup𝑛 𝐴𝑛 is also called 𝐴𝑛 io. meaning β€œinfinitely often”.

Proof.

1. Let π‘Œ = βˆ‘π‘›β‰₯1 1𝐴𝑛be a random variable. Then

E(π‘Œ ) = βˆ‘π‘›β‰₯1

E(1𝐴𝑛) = βˆ‘

𝑛β‰₯1P(𝐴𝑛).

Since π‘Œ β‰₯ 0, recall that we prove that E(π‘Œ ) < ∞ implies that π‘Œ < ∞almost surely, i.e. P-almost everywhere.

2. Note that(lim sup

𝑛𝐴𝑛)𝑐 = ⋃

𝑁⋂

𝑛β‰₯𝑁𝐴𝑐

𝑛

35

Page 37: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

6 Independence

so

P( ⋂𝑛β‰₯𝑁

𝐴𝑐𝑛) ≀ P(

𝑀⋂

𝑛=𝑁𝐴𝑐

𝑛)

=π‘€βˆ

𝑛=𝑁P(𝐴𝑐

𝑛) =π‘€βˆ

𝑛=𝑁(1 βˆ’ P(𝐴𝑛))

β‰€π‘€βˆ

𝑛=𝑁exp(βˆ’P(𝐴𝑛))

≀ exp(βˆ’π‘€

βˆ‘π‘›=𝑁

P(𝐴𝑛))

β†’ 0

as 𝑀 β†’ ∞. ThusP( β‹‚

𝑛β‰₯𝑁𝐴𝑐

𝑛) = 0

for all 𝑁 soP(⋃

𝑁⋂

𝑛β‰₯𝑁𝐴𝑐

𝑛) = 0.

Definition (random/stochastic process, filtration, tail 𝜎-algebra, tail event).Let (Ξ©, β„±,P) be a probability space and (𝑋𝑛)𝑛β‰₯1 a sequence of random vari-ables.

1. (𝑋𝑛)𝑛β‰₯1 is sometimes called a random process or stochastic process.

2.ℱ𝑛 = 𝜎(𝑋1, … , 𝑋𝑛) βŠ† β„±

is called the associated filtration. ℱ𝑛 βŠ† ℱ𝑛+1.

3.π’ž = β‹‚

𝑛β‰₯1𝜎(𝑋𝑛, 𝑋𝑛+1, … )

is called the tail 𝜎-algebra of the process. Its elements are called tailevents.

Example. Tail events are those not affected by the first few terms in the se-quence of random variables. For example,

{πœ” ∈ Ξ© ∢ lim𝑛

𝑋𝑛(πœ”) exists}

is a tail event, so is{πœ” ∈ Ξ© ∢ lim sup

𝑛𝑋𝑛(πœ”) β‰₯ 𝑇 }.

Theorem 6.4 (Kolmogorov 0βˆ’1 law). If (𝑋𝑛)𝑛β‰₯1 is a sequence of mutually

36

Page 38: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

6 Independence

independent random variables then for all 𝐴 ∈ π’ž,

P(𝐴) ∈ {0, 1}.Proof. Pick 𝐴 ∈ π’ž. Fix 𝑛. For all 𝐡 ∈ 𝜎(𝑋1, … , 𝑋𝑛),

P(𝐴 ∩ 𝐡) = P(𝐴)P(𝐡)

as π’ž is independent of 𝜎(𝑋1, … , 𝑋𝑛). The measures 𝐡 ↦ P(𝐴)P(𝐡) and𝐡 ↦ P(𝐴 ∩ 𝐡) coincide on each ℱ𝑛 so on ⋃𝑛β‰₯1 ℱ𝑛. Hence they coincideon 𝜎(⋃𝑛β‰₯1 ℱ𝑛) βŠ‡ π’ž so

P(𝐴) = P(𝐴 ∩ 𝐴) = P(𝐴)P(𝐴)

soP(𝐴) ∈ {0, 1}.

6.1 Useful inequalities

Proposition 6.5 (Cauchy-Schwarz). Suppose 𝑋, π‘Œ are random variablesthen

E(|π‘‹π‘Œ |) ≀ √E(𝑋2) β‹… E(π‘Œ 2).

Proof. For all 𝑑 ∈ R,

0 ≀ E((|𝑋| + 𝑑|π‘Œ |)2) = E(𝑋2) + 2𝑑E(|π‘‹π‘Œ |) + 𝑑2E(π‘Œ 2)

so viewed as a quadratic in 𝑑, the discriminant is nonpositive, i.e.

(E(|π‘‹π‘Œ |)2 βˆ’ E(𝑋)2 βˆ’ E(π‘Œ )2 ≀ 0.

Proposition 6.6 (Markov). Let 𝑋 β‰₯ 0 be a random variable. Then for all𝑑 β‰₯ 0,

𝑑P(𝑋 β‰₯ 𝑑) ≀ E(𝑋).

Proof.E(𝑋) β‰₯ E(𝑋1𝑋β‰₯𝑑) β‰₯ E(𝑑1𝑋β‰₯𝑑) = 𝑑P(𝑋 β‰₯ 𝑑)

Proposition 6.7 (Chebyshev). Let π‘Œ be a random variable with E(π‘Œ 2) < ∞,then for all 𝑑 ∈ R,

𝑑2P(|π‘Œ βˆ’ E(π‘Œ )| β‰₯ 𝑑) ≀ Var π‘Œ .

E(π‘Œ 2) < ∞ implies that E(|π‘Œ |) < ∞ by Cauchy-Schwarz, so Var π‘Œ < ∞.The converse is more subtle.

Proof. Apply Markov to 𝑋 = |π‘Œ βˆ’ E(π‘Œ )|2.

37

Page 39: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

6 Independence

Theorem 6.8 (strong law of large numbers). Let (𝑋𝑛)𝑛β‰₯1 be a sequence ofiid. random variables. Assume E(|𝑋1|) < ∞. Let

𝑆𝑛 =𝑛

βˆ‘π‘˜=1

π‘‹π‘˜,

then 1𝑛 𝑆𝑛 converges almost surely to E(𝑋1).

Proof. We prove the theorem under a stonger condition: we assume E(𝑋41) <

∞. This implies, by Cauchy-Schwarz, E(𝑋21),E(|𝑋1|) < ∞. Subsequently

E(|𝑋1|3) < ∞. The full proof is much harder but will be given later whenwe have developed enough machinery.

wlog we may assume E(𝑋1) = 0 by replacing 𝑋𝑛 with 𝑋𝑛 βˆ’ E(𝑋1). Have

E(𝑆4𝑛) = βˆ‘

𝑖,𝑗,π‘˜,β„“E(π‘‹π‘–π‘‹π‘—π‘‹π‘˜π‘‹β„“).

All terms vanish because E(𝑋𝑖) = 0 and (𝑋𝑖)𝑖β‰₯1 are independent, except forE(𝑋4

𝑖 ) and E(𝑋2𝑖 𝑋2

𝑗 ) for 𝑖 β‰  𝑗. For example,

E(𝑋𝑖𝑋3𝑗 ) = E(𝑋𝑖) β‹… E(𝑋3

𝑗 ) = 0

for 𝑖 β‰  𝑗. Thus

E(𝑆4𝑛) =

π‘›βˆ‘π‘–=1

E(𝑋4𝑖 ) + 6 βˆ‘

𝑖<𝑗E(𝑋2

𝑖 𝑋2𝑗 ).

By Cauchy-Schwarz,

E(𝑋2𝑖 𝑋2

𝑗 ) ≀ √E𝑋4𝑖 β‹… E𝑋4

𝑗 = E𝑋41

soE(𝑆4

𝑛) ≀ (𝑛 + 6 β‹… 𝑛(𝑛 βˆ’ 1)2

)E𝑋41

and asymptotically,E(𝑆𝑛

𝑛)4 = 𝑂( 1

𝑛2 )

soE(βˆ‘

𝑛β‰₯1(𝑆𝑛

𝑛)4) = βˆ‘

𝑛β‰₯1E(𝑆𝑛

𝑛)4 < ∞.

Hence βˆ‘( 𝑆𝑛𝑛 )4 < ∞ almost surely and it follows that

limπ‘›β†’βˆž

𝑆𝑛𝑛

= 0

almost surely.

Strong law of large numbers has a very important statistical implication: wecan sample the mean of larger number of iid. to detect an unknown law, at leastthe mean.

38

Page 40: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

7 Convergence of random variables

7 Convergence of random variables

Definition (weak convergence). A sequence of probability measures (πœ‡π‘›)𝑛β‰₯1on (R𝑑, ℬ(R𝑑)) is said to converge weakly to a measure πœ‡ if for all 𝑓 ∈ 𝐢𝑏(R𝑑),the set of continuous bounded functions on R𝑑,

limπ‘›β†’βˆž

πœ‡π‘›(𝑓) = πœ‡(𝑓).

Example.

1. Let πœ‡π‘› = 𝛿1/𝑛 be the Dirac mass on R𝑑, i.e. for π‘₯ ∈ R𝑑, 𝛿π‘₯ is the Borelprobability measure on R𝑑 such that

𝛿π‘₯(𝐴) = {1 π‘₯ ∈ 𝐴0 π‘₯ βˆ‰ 𝐴

then πœ‡π‘› β†’ 𝛿0.

2. Let πœ‡π‘› = 𝒩(0, 𝜎2𝑛), Gaussian distribution with standard deviation πœŽπ‘›,

where πœŽπ‘› β†’ 0, then again πœ‡π‘› β†’ 𝛿0. Indeed,

πœ‡π‘›(𝑓) = ∫ 𝑓(π‘₯)π‘‘πœ‡π‘›(π‘₯)

= ∫ 𝑓(π‘₯) 1√2πœ‹πœŽ2

𝑛exp(βˆ’ π‘₯2

2𝜎2𝑛

)𝑑π‘₯

= ∫ 𝑓(π‘₯πœŽπ‘›) 1√2πœ‹

exp(βˆ’π‘₯2

2)𝑑π‘₯

πœŽπ‘› β†’ 0 so 𝑓(π‘₯πœŽπ‘›) β†’ 𝑓(0) so by dominated convergence theorem, πœ‡π‘›(𝑓) →𝑓(0) = 𝛿0(𝑓).

Definition (convergence of random variable). A sequence (𝑋𝑛)𝑛β‰₯1 of R𝑑-valued random variables on (Ξ©, β„±,P) is said to converge to a random variable𝑋

1. almost surely iflim

π‘›β†’βˆžπ‘‹π‘›(πœ”) = 𝑋(πœ”)

for P-almost every πœ”.

2. in probability or in measure if for all πœ€ > 0,

limπ‘›β†’βˆž

P(‖𝑋𝑛 βˆ’ 𝑋‖ > πœ€) = 0.

Note that all norms on R𝑑 are equivalent so we don’t have to specifyone.

39

Page 41: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

7 Convergence of random variables

3. in distribution or in law if πœ‡π‘‹π‘›β†’ πœ‡π‘‹ weakly, where πœ‡π‘‹ = π‘‹βˆ—P is

the law of 𝑋, a Borel probability measure on R𝑑. Equivalently, for all𝑓 ∈ 𝐢𝑏(R𝑑), E(𝑓(𝑋𝑛)) β†’ E(𝑓(𝑋)).

Proposition 7.1. 1 ⟹ 2 ⟹ 3.

Proof.

1. 1 ⟹ 2:P(‖𝑋𝑛 βˆ’ 𝑋‖ > πœ€) = E(1β€–π‘‹π‘›βˆ’π‘‹β€–>πœ€)

so if 𝑋𝑛 β†’ 𝑋 almost surely then

1β€–π‘‹π‘›βˆ’π‘‹β€–>πœ€ β†’ 0

P-almost everywhere so by dominated convergence theorem P(‖𝑋𝑛 βˆ’π‘‹β€– >πœ€) β†’ 0.

2. 2 ⟹ 3: given 𝑓 ∈ 𝐢𝑏(R𝑑), need to show that πœ‡π‘‹π‘›(𝑓) β†’ πœ‡π‘‹(𝑓). But

πœ‡π‘‹π‘›(𝑓) βˆ’ πœ‡π‘‹(𝑓) = E(𝑓(𝑋𝑛) βˆ’ 𝑓(𝑋)).

To bound this, note that 𝑓 is continuous and R𝑑 is locally compact so it islocally uniformly continuous. In particular for all πœ€ > 0 exists 𝛿 > 0 suchthat if β€–π‘₯β€– < 1

πœ€ and ‖𝑦 βˆ’ π‘₯β€– < 𝛿 then |𝑓(𝑦) βˆ’ 𝑓(π‘₯)| < πœ€. Thus

|E(𝑓(𝑋𝑛) βˆ’ 𝑓(𝑋))| ≀ E(1β€–π‘‹π‘›βˆ’π‘‹β€–<𝛿1‖𝑋‖<1/πœ€ |𝑓(𝑋𝑛) βˆ’ 𝑓(𝑋)|⏟⏟⏟⏟⏟⏟⏟<πœ€

)

+ 2 β€–π‘“β€–βˆžβŸ<∞

(P(‖𝑋𝑛 βˆ’ 𝑋‖ β‰₯ 𝛿) + P(‖𝑋‖ β‰₯ 1πœ€

))

solim sup

π‘›β†’βˆž|E(𝑓(𝑋𝑛) βˆ’ 𝑓(𝑋))| ≀ πœ€ + 2β€–π‘“β€–βˆž P(‖𝑋‖ > 1

πœ€)

βŸβŸβŸβŸβŸβ†’0 as πœ€β†’0

.

which is 0 as πœ€ is arbitrary.

Remark. When 𝑑 = 1, 3 is equivalent to 𝐹𝑋𝑛(π‘₯) β†’ 𝐹𝑋(π‘₯) for all π‘₯ as 𝑛 β†’ ∞

where 𝐹𝑋 is continuous. See example sheet 3.

The strict converses do not hold but we can say something weaker.

Proposition 7.2. If 𝑋𝑛 β†’ 𝑋 in probability then there is a subsequence(π‘›π‘˜)π‘˜ such that π‘‹π‘›π‘˜

β†’ 𝑋 almost surely as π‘˜ β†’ ∞.

Proof. We know for all πœ€ > 0, P(‖𝑋𝑛 βˆ’ 𝑋‖ > πœ€) β†’ 0 as 𝑛 β†’ ∞. So for all π‘˜exists π‘›π‘˜ such that

P(β€–π‘‹π‘›π‘˜βˆ’ 𝑋‖ > 1

π‘˜) ≀ 1

2π‘˜

soβˆ‘π‘˜β‰₯1

P(β€–π‘‹π‘›π‘˜βˆ’ 𝑋‖ > 1

π‘˜) < ∞

40

Page 42: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

7 Convergence of random variables

so by the first Borel-Cantelli lemma

P(β€–π‘‹π‘›π‘˜βˆ’ 𝑋‖ > 1

π‘˜io.) = 0.

This means that with probability 1, β€–π‘‹π‘›π‘˜βˆ’ 𝑋‖ β†’ 0 as π‘˜ β†’ ∞.

Definition (convergence in mean). Let (𝑋𝑛)𝑛β‰₯1 and 𝑋 be R𝑑-valued inte-grable random variables. We say that (𝑋𝑛)𝑛 converges in mean or in 𝐿1 to𝑋 if

limπ‘›β†’βˆž

E(‖𝑋𝑛 βˆ’ 𝑋‖) = 0.

Remark.

1. If 𝑋𝑛 β†’ 𝑋 in mean then 𝑋𝑛 β†’ 𝑋 in probability by Markov inequality:

πœ€ β‹… P(‖𝑋𝑛 βˆ’ 𝑋‖ > πœ€) ≀ E(‖𝑋𝑛 βˆ’ 𝑋‖).

2. The converse is false. For example take Ξ© = (0, 1), β„± = ℬ(Ξ£) and PLebesgue measure. Let 𝑋𝑛 = 𝑛1[0, 1

𝑛 ]. 𝑋𝑛 β†’ 0 almost surely but E𝑋𝑛 = 1.

When does convergence in probability imply convergence in mean? We needsome kind of domination assumption.

Definition (uniformly integrable). A sequence of random variables (𝑋𝑛)𝑛β‰₯1is uniformly integrable if

limπ‘€β†’βˆž

lim supπ‘›β†’βˆž

E(‖𝑋𝑛‖1‖𝑋𝑛‖β‰₯𝑀) = 0.

Remark. If (𝑋𝑛)𝑛β‰₯1 are dominated, namely exists an integrable random vari-able π‘Œ β‰₯ 0 such that ‖𝑋𝑛‖ ≀ π‘Œ for all 𝑛 then (𝑋𝑛)𝑛 is uniformly integrable bydominated convergence theorem:

E(‖𝑋𝑛‖1‖𝑋𝑛‖β‰₯𝑀) ≀ E(π‘Œ1π‘Œ β‰₯𝑀) β†’ 0

as 𝑀 β†’ ∞.

Theorem 7.3. Let (𝑋𝑛)𝑛β‰₯1 be a sequence of R𝑑-valued integrable randomvariable. Let 𝑋 be another random variable. Then TFAE:

1. 𝑋 is integrable and 𝑋𝑛 β†’ 𝑋 in mean,

2. (𝑋𝑛)𝑛β‰₯1 is uniformly integrable and 𝑋𝑛 β†’ 𝑋 in probability.

Proof.

β€’ 1 ⟹ 2: Left to show uniform integrability:

E(‖𝑋𝑛‖1‖𝑋𝑛‖β‰₯𝑀) ≀ E(‖𝑋𝑛 βˆ’ 𝑋‖1‖𝑋𝑛‖β‰₯𝑀) + E(‖𝑋‖1‖𝑋𝑛‖β‰₯𝑀)≀ E(‖𝑋𝑛 βˆ’ 𝑋‖) + E(‖𝑋‖1‖𝑋𝑛‖β‰₯𝑀(1‖𝑋‖≀ 𝑀

2+ 1‖𝑋‖> 𝑀

2))

≀ E(‖𝑋𝑛 βˆ’ 𝑋‖) + E(‖𝑋‖1β€–π‘‹π‘›βˆ’π‘‹β€–β‰₯ 𝑀2

1‖𝑋‖≀ 𝑀2

) + E(‖𝑋‖1‖𝑋‖β‰₯ 𝑀2

)

≀ E(‖𝑋𝑛 βˆ’ 𝑋‖) + 𝑀2P(‖𝑋𝑛 βˆ’ 𝑋‖ β‰₯ 𝑀

2) + E(‖𝑋‖1‖𝑋‖β‰₯ 𝑀

2)

41

Page 43: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

7 Convergence of random variables

Take lim sup,

lim supπ‘›β†’βˆž

E(‖𝑋𝑛‖1‖𝑋𝑀‖β‰₯𝑛) ≀ 0 + 0 + E(‖𝑋‖1‖𝑋‖β‰₯ 𝑀2

) β†’ 0

by dominated convergence theorem.

β€’ 2 ⟹ 1: Prove first that 𝑋 is integrable. By the previous proposition,we can find a subsequence (π‘›π‘˜)π‘˜ such that π‘‹π‘›π‘˜

β†’ 𝑋 almost surely. ByFatou’s lemma,

E(‖𝑋‖1‖𝑋‖β‰₯𝑀) ≀ lim infπ‘˜β†’0

E(β€–π‘‹π‘›π‘˜β€–1β€–π‘‹π‘›π‘˜

β€–β‰₯𝑀)

which goes to 0 as 𝑀 β†’ ∞ by uniform integrability assumption. Thus

E(‖𝑋‖) ≀ 𝑀 + E(‖𝑋‖1‖𝑋‖β‰₯𝑀) < ∞

for 𝑀 sufficiently large. Thus 𝑋 is integrable.To show convergence in mean, we use the same trick of spliting into smalland big parts.

E(‖𝑋𝑛 βˆ’ 𝑋‖) = E((1β€–π‘‹π‘›βˆ’π‘‹β€–β‰€πœ€ + 1β€–π‘‹π‘›βˆ’π‘‹β€–>πœ€)‖𝑋𝑛 βˆ’ 𝑋‖)≀ πœ€ + E(1β€–π‘‹π‘›βˆ’π‘‹β€–>πœ€β€–π‘‹π‘› βˆ’ 𝑋‖(1‖𝑋𝑛‖≀𝑀 + 1‖𝑋𝑛‖>𝑀))≀ πœ€ + † + ‑

where

† = E(‖𝑋𝑛 βˆ’ 𝑋‖1β€–π‘‹π‘›βˆ’π‘‹β€–>πœ€1‖𝑋𝑛‖≀𝑀)≀ E((𝑀 + ‖𝑋‖)1β€–π‘‹π‘›βˆ’π‘‹β€–>πœ€(1‖𝑋‖≀𝑀 + 1‖𝑋‖>𝑀)≀ 2𝑀P(‖𝑋𝑛 βˆ’ 𝑋‖ > πœ€) + 2E(‖𝑋‖1‖𝑋‖>𝑀)

solim sup

π‘›β†’βˆžβ€  ≀ 2E(‖𝑋‖1‖𝑋‖>𝑀) β†’ 0

as 𝑀 β†’ ∞. On the other hand

‑ = E(‖𝑋𝑛 βˆ’ 𝑋‖1β€–π‘‹π‘›βˆ’π‘‹β€–>πœ€1‖𝑋𝑛‖>𝑀)≀ E((‖𝑋𝑛‖ + ‖𝑋‖)1‖𝑋𝑛‖β‰₯𝑀)≀ E(‖𝑋𝑛‖1‖𝑋𝑛‖β‰₯𝑀 + ‖𝑋‖1‖𝑋‖>𝑀 + ‖𝑋‖1‖𝑋𝑛‖β‰₯𝑀1‖𝑋‖≀𝑀)≀ 2E(‖𝑋𝑛‖1‖𝑋𝑛‖β‰₯𝑀) + E(‖𝑋‖1‖𝑋‖>𝑀)

taking lim sup,

lim supπ‘›β†’βˆž

‑ ≀ 2 lim supπ‘›β†’βˆž

E(‖𝑋𝑛‖1‖𝑋𝑛‖β‰₯𝑀) + E(‖𝑋‖1‖𝑋‖>𝑀) β†’ 0 + 0

as 𝑀 β†’ ∞.

42

Page 44: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

7 Convergence of random variables

Definition. We say that a sequence of random variables (𝑋𝑛)𝑛β‰₯1 is boundedin 𝐿𝑝 if there exists 𝐢 > 0 such that E(‖𝑋𝑛‖𝑝) ≀ 𝐢 for all 𝑛.

Proposition 7.4. If 𝑝 > 1 and (𝑋𝑛)𝑛β‰₯1 is bounded in 𝐿𝑝 then (𝑋𝑛)𝑛β‰₯1 isuniformly integrable.

Proof.

π‘€π‘βˆ’1E(‖𝑋𝑛‖1‖𝑋𝑛‖β‰₯𝑀) ≀ E(‖𝑋𝑛‖𝑝1‖𝑋𝑛‖β‰₯𝑀) ≀ E(‖𝑋𝑛‖𝑝) ≀ 𝐢

solim sup

π‘›β†’βˆžE(‖𝑋𝑛‖1‖𝑋𝑛‖β‰₯𝑀) ≀ 𝐢

π‘€π‘βˆ’1 β†’ 0

as 𝑀 β†’ ∞.

This provides a sufficient condition for uniform integrability and thus con-vergnce in mean.

43

Page 45: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

8 𝐿𝑝 spaces

8 𝐿𝑝 spacesRecall that πœ‘ ∢ 𝐼 β†’ R is convex means that for all π‘₯, 𝑦 ∈ 𝐼, for all 𝑑 ∈ [0, 1],

πœ‘(𝑑π‘₯ + (1 βˆ’ 𝑑)𝑦) ≀ π‘‘πœ‘(π‘₯) + (1 βˆ’ 𝑑)πœ‘(𝑦).

Proposition 8.1 (Jensen inequality). Let 𝐼 be an open interval of R andπœ‘ ∢ 𝐼 β†’ R a convex function. Let 𝑋 be a random variable (Ξ©, β„±,P). Assume𝑋 is integrable and takes values in 𝐼. Then

E(πœ‘(𝑋)) β‰₯ πœ‘(E(𝑋)).

Remark.

1. As 𝑋 ∈ 𝐼 almost surely and 𝐼 is an interval, have E(𝑋) ∈ 𝐼.

2. We’ll show that πœ‘(𝑋)βˆ’ is integrable so

E(πœ‘(𝑋)) = E(πœ‘(𝑋)+) βˆ’ E(πœ‘(𝑋)βˆ’)

with the possibility that both sides are infinity.

Lemma 8.2. TFAE:

1. πœ‘ is convex,

2. there exists a family β„± of affine functions (π‘₯ ↦ π‘Žπ‘₯ + 𝑏) such thatπœ‘ = supβ„“βˆˆβ„± β„“ on 𝐼.

Proof.

1. 2 ⟹ 1: every β„“ is convex and the supremum of β„“ is convex:

β„“(𝑑π‘₯ + (1 βˆ’ 𝑑)𝑦) = 𝑑ℓ(π‘₯) + (1 βˆ’ 𝑑)β„“(𝑦) ≀ 𝑑 supβ„“βˆˆβ„±

β„“(π‘₯) + (1 βˆ’ 𝑑) supβ„“βˆˆβ„±

β„“(𝑦)

so

πœ‘(𝑑π‘₯ + (1 βˆ’ 𝑑)𝑦) = supβ„“βˆˆβ„±

β„“(𝑑π‘₯ + (1 βˆ’ 𝑑)𝑦)

≀ 𝑑 supβ„“βˆˆβ„±

β„“(π‘₯) + (1 βˆ’ 𝑑) supβ„“βˆˆβ„±

β„“(𝑦)

= π‘‘πœ‘(π‘₯) + (1 βˆ’ 𝑑)πœ‘(𝑦)

2. We need to show that for all π‘₯0 ∈ 𝐼 we can find an affine function

β„“π‘₯0(π‘₯) = πœƒπ‘₯0

(π‘₯ βˆ’ π‘₯0) + πœ‘(π‘₯0),

where πœƒπ‘₯0is morally the slope at π‘₯0, such that πœ‘(π‘₯) β‰₯ β„“π‘₯0

(π‘₯) for all π‘₯ ∈ 𝐼.Then have πœ‘ = supπ‘₯0∈𝐼 β„“π‘₯0

.

To find πœƒπ‘₯0observe that for all π‘₯ < π‘₯0 < 𝑦 where π‘₯, 𝑦 ∈ 𝐼, have

πœ‘(π‘₯0) βˆ’ πœ‘(π‘₯)π‘₯0 βˆ’ π‘₯

≀ πœ‘(𝑦) βˆ’ πœ‘(π‘₯0)𝑦 βˆ’ π‘₯0

.

44

Page 46: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

8 𝐿𝑝 spaces

Indeed this is the convexity of πœ‘ on [π‘₯, 𝑦] with 𝑑 = π‘₯0βˆ’π‘₯π‘¦βˆ’π‘₯ . This holds for

all π‘₯ < π‘₯0, 𝑦 > π‘₯0 so there exists πœƒ ∈ R such that

πœ‘(π‘₯0) βˆ’ πœ‘(π‘₯)π‘₯0 βˆ’ π‘₯

≀ πœƒ ≀ πœ‘(𝑦) βˆ’ πœ‘(π‘₯0)𝑦 βˆ’ π‘₯0

.

Then just set β„“π‘₯0(π‘₯) = πœƒ(π‘₯ βˆ’ π‘₯0) + πœ‘(π‘₯0). By construction πœ‘(π‘₯) β‰₯ β„“π‘₯0

(π‘₯)for all π‘₯ ∈ 𝐼.

Proof of Jensen inequality. Let πœ‘(π‘₯) = supβ„“βˆˆβ„± β„“(π‘₯) where β„“ affine. then

E(πœ‘(𝑋)) β‰₯ E(β„“(𝑋)) = β„“(E(𝑋))

for all β„“ ∈ β„± so take supremum,

E(πœ‘(𝑋)) β‰₯ supβ„“βˆˆβ„±

β„“(E(𝑋)) = πœ‘(E(𝑋)).

Also for the remark,

βˆ’πœ‘ = βˆ’ supβ„“βˆˆβ„±

β„“ = infβ„“βˆˆβ„±

(βˆ’β„“)

so πœ‘βˆ’ = (βˆ’πœ‘)+ ≀ |β„“| for all β„“ ∈ β„±. Then

πœ‘(𝑋)βˆ’ ≀ |β„“(𝑋)| ≀ |π‘Ž||𝑋| + |𝑏|.

As 𝑋 is integrable, πœ‘(𝑋)βˆ’ is integrable.

Jensen inequality is for probability space only. The following applies to allmeasure spaces.

Proposition 8.3 (Minkowski inquality). Let (𝑋, π’œ, πœ‡) be a measure spaceand 𝑓, 𝑔 measurable functions on 𝑋. Let 𝑝 ∈ [1, ∞) and define the 𝑝-norm

‖𝑓‖𝑝 = (βˆ«π‘‹

|𝑓|π‘π‘‘πœ‡)1/𝑝

.

Then‖𝑓 + 𝑔‖𝑝 ≀ ‖𝑓‖𝑝 + ‖𝑔‖𝑝.

Proof. wlog assume ‖𝑓‖𝑝, ‖𝑓‖𝑝 β‰  0. Need to show

βˆ₯‖𝑓‖𝑝

‖𝑓‖𝑝 + ‖𝑔‖𝑝

𝑓‖𝑓‖ 𝑝

+‖𝑔‖𝑝

‖𝑓‖𝑝 + ‖𝑔‖𝑝

𝑔‖𝑔‖𝑝

βˆ₯𝑝

≀ 1.

Suffice to show for all 𝑑 ∈ [0, 1], for all 𝐹, 𝐺 measurable such that ‖𝐹‖𝑝 = ‖𝐺‖𝑝 =1, have

‖𝑑𝐹 + (1 βˆ’ 𝑑)𝐺‖𝑝 ≀ 1β€œthe unit ball is convex”. For this note that

[0, +∞) β†’ [0, +∞)π‘₯ ↦ π‘₯𝑝

45

Page 47: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

8 𝐿𝑝 spaces

is convex if 𝑝 β‰₯ 1 so

|𝑑𝐹 + (1 βˆ’ 𝑑)𝐺|𝑝 ≀ 𝑑|𝐹 |𝑝 + (1 βˆ’ 𝑑)|𝐺|𝑝

and∫

𝑋|𝑑𝐹 + (1 βˆ’ 𝑑)𝐺|π‘π‘‘πœ‡ ≀ 𝑑 ∫

𝑋|𝐹 |π‘π‘‘πœ‡

⏟⏟⏟⏟⏟=1

+(1 βˆ’ 𝑑) βˆ«π‘‹

|𝐺|π‘π‘‘πœ‡βŸβŸβŸβŸβŸ

=1

= 1.

Proposition 8.4 (HΓΆlder inequality). Suppose (𝑋, π’œ, πœ‡) is a measure spaceand let 𝑓, 𝑔 be measurable functions on 𝑋. Given 𝑝, π‘ž ∈ (1, ∞) such that1𝑝 + 1

π‘ž = 1,

βˆ«π‘‹

|𝑓𝑔|π‘‘πœ‡ ≀ (βˆ«π‘‹

|𝑓|π‘π‘‘πœ‡)1/𝑝

(βˆ«π‘‹

|𝑔|π‘žπ‘‘πœ‡)1/π‘ž

with equality if and only if there exists (𝛼, 𝛽) β‰  (0, 0) such that 𝛼|𝑓|𝑝 = 𝛽|𝑔|π‘žπœ‡-almost everywhere.

Lemma 8.5 (Young inequality). For all 𝑝, π‘ž ∈ (1, ∞) such that 1𝑝 + 1

π‘ž = 1,for all π‘Ž, 𝑏 β‰₯ 0, have

π‘Žπ‘ ≀ π‘Žπ‘

𝑝+ π‘π‘ž

π‘ž.

Proof of HΓΆlder inequality. wlog assume ‖𝑓‖𝑝, β€–π‘”β€–π‘ž β‰  0. By scaling by factors(𝛼, 𝛽) β‰  (0, 0) wlog ‖𝑓‖𝑝 = β€–π‘”β€–π‘ž = 1. Then by Young inequality,

|𝑓𝑔| ≀ 1𝑝

|𝑓|𝑝 + 1π‘ž

|𝑔|π‘ž

so∫

𝑋|𝑓𝑔|π‘‘πœ‡ ≀ 1

π‘βˆ«

𝑋|𝑓|π‘π‘‘πœ‡ + 1

π‘žβˆ«

𝑋|𝑔|π‘žπ‘‘πœ‡ = 1

𝑝+ 1

π‘ž= 1.

Remark. Apply Jensen inequality to πœ‘(π‘₯) = π‘₯𝑝′/𝑝 for 𝑝′ > 𝑝, we have

E(|𝑋|𝑝)1/𝑝 ≀ E(|𝑋|𝑝′)1/𝑝′ ,

so the function 𝑝 ↦ E(|𝑋|𝑝)1/𝑝 is non-decreasing. This can be used, for example,to show that if 𝑋 has finite 𝑝′th moment then it has finite 𝑝th moment for 𝑝′ β‰₯ 𝑝.

Definition. Let (𝑋, π’œ, πœ‡) be a measure space.

β€’ For 𝑝 β‰₯ 1,

ℒ𝑝(𝑋, π’œ, πœ‡) = {𝑓 ∢ 𝑋 β†’ R measurable such that |𝑓|𝑝 is πœ‡-integrable}.

β€’ For 𝑝 = ∞,

β„’βˆž(𝑋, π’œ, πœ‡) = {𝑓 ∢ 𝑋 β†’ R measurable such that essup |𝑓| < ∞}

46

Page 48: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

8 𝐿𝑝 spaces

where

essup |𝑓| = inf{𝑑 ∢ |𝑓(π‘₯)| ≀ 𝑑 for πœ‡-almost every π‘₯}.

Lemma 8.6. ℒ𝑝(𝑋, π’œ, πœ‡) is an R-vector space.

Proof. For 𝑝 < ∞ use Minkowski inequality. Similar we can check that ‖𝑓 +π‘”β€–βˆž ≀ β€–π‘“β€–βˆž + β€–π‘”β€–βˆž for all 𝑓, 𝑔.

Definition. We say 𝑓 and 𝑔 are πœ‡-equivalent and write 𝑓 β‰‘πœ‡ 𝑔 if for πœ‡-almost every π‘₯, 𝑓(π‘₯) = 𝑔(π‘₯).

Check that this is an equvalence relation stable under addition and multi-plication.

Definition (𝐿𝑝-space). Define

𝐿𝑝(𝑋, π’œ, πœ‡) = ℒ𝑝(𝑋, π’œ, πœ‡)/ β‰‘πœ‡

and if [𝑓] denotes the equivalence class of 𝑓 under β‰‘πœ‡ we define

β€–[𝑓]‖𝑝 = ‖𝑓‖𝑝.

Proposition 8.7. For 𝑝 ∈ [1, ∞], 𝐿𝑝(𝑋, π’œ, πœ‡) is a normed vector spaceunder ‖⋅‖𝑝 and it is complete, i.e. it is a Banach space.

Proof. If 𝑓 β‰‘πœ‡ 𝑔 then ‖𝑓‖𝑝 = ‖𝑔‖𝑝 so ‖⋅‖𝑝 on 𝐿𝑝 is well-defined. Triangleinequality follows from Minkowski inequality and linearity is obvious so ‖⋅‖𝑝is indeed a norm.

For completeness, pick (𝑓𝑛)𝑛 a Cauchy sequence in ℒ𝑝(𝑋, π’œ, πœ‡). Need toshow that there exists 𝑓 ∈ ℒ𝑝 such that ‖𝑓𝑛 βˆ’ 𝑓‖𝑝 β†’ 0 as 𝑛 β†’ ∞. This thenimplies that [𝑓𝑛] β†’ [𝑓] in 𝐿𝑝.

We can extract a subsequence π‘›π‘˜ ↑ ∞ such that β€–π‘“π‘›π‘˜+1βˆ’ π‘“π‘›π‘˜

‖𝑝 ≀ 2βˆ’π‘˜. Let

𝑆𝐾 =𝐾

βˆ‘π‘˜=1

|π‘“π‘›π‘˜+1βˆ’ π‘“π‘›π‘˜

|

then

‖𝑆𝐾‖𝑝 ≀𝐾

βˆ‘π‘˜=1

β€–π‘“π‘›π‘˜+1βˆ’ π‘“π‘›π‘˜

‖𝑝 ≀𝐾

βˆ‘π‘˜=1

2βˆ’π‘˜ ≀ 1

so by monotone convergence,

limπΎβ†’βˆž

βˆ«π‘‹

|𝑆𝐾|π‘π‘‘πœ‡ = βˆ«π‘‹

|π‘†βˆž|π‘π‘‘πœ‡,

i.e. π‘†βˆž ∈ ℒ𝑝. In particular for πœ‡-almost everywhere π‘₯, |π‘†βˆž(π‘₯)| < ∞, i.e.

βˆ‘π‘˜β‰₯1

|π‘“π‘›π‘˜+1(π‘₯) βˆ’ π‘“π‘›π‘˜

(π‘₯)| < ∞.

47

Page 49: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

8 𝐿𝑝 spaces

Hence (π‘“π‘›π‘˜(π‘₯))π‘˜ is a Cauchy sequence in R. By completeness of R, the limit

exists and set 𝑓(π‘₯) to be it. When this limit does not exist set 𝑓(π‘₯) = 0.We then have, in case 𝑝 < ∞, by Fatou’s lemma

‖𝑓𝑛 βˆ’ 𝑓‖𝑝 ≀ lim infπ‘˜β†’βˆž

‖𝑓𝑛 βˆ’ π‘“π‘›π‘˜β€–π‘ ≀ πœ€

for any πœ€ for 𝑛 sufficiently large. Thus

limπ‘›β†’βˆž

‖𝑓𝑛 βˆ’ 𝑓‖𝑝 = 0.

When 𝑝 = ∞, we use the fact that if 𝑓𝑛 β†’ 𝑓 πœ‡-almost everywhere then

β€–π‘“β€–βˆž ≀ lim infπ‘›β†’βˆž

β€–π‘“π‘›β€–βˆž.

Proof. Let 𝑑 > lim supπ‘›β†’βˆžβ€–π‘“π‘›β€–βˆž. Then exists π‘›π‘˜ ↑ ∞ such that

β€–π‘“π‘›π‘˜β€–βˆž = essup |π‘“π‘›π‘˜

| = inf{𝑠 β‰₯ 0 ∢ |π‘“π‘›π‘˜(π‘₯)| ≀ 𝑠 for πœ‡-almost every π‘₯} < 𝑑

for all π‘˜. Thus for all π‘˜, for πœ‡-almost every π‘₯, |π‘“π‘›π‘˜(π‘₯)| < 𝑑. But by 𝜎-additivity

of πœ‡ we can swap the quantifiers, i.e. for πœ‡-almost every π‘₯, for all π‘₯, |π‘“π‘›π‘˜(π‘₯)| < 𝑑.

Thus for πœ‡-almost every π‘₯, ‖𝑓(π‘₯)β€–βˆž ≀ 𝑑.

Proposition 8.8 (approximation by simple functions). Let 𝑝 ∈ [1, ∞). Let𝑉 be the linear span of all simple functions on 𝑋. Then 𝑉 ∩ ℒ𝑝 is dense inℒ𝑝.

Proof. Note that 𝑔 ∈ ℒ𝑝 implies 𝑔+, π‘”βˆ’ ∈ ℒ𝑝. Thus by writing 𝑓 = 𝑓+ βˆ’ π‘“βˆ’

and using Minkowski inequality, suffice to show 𝑓 β‰₯ 0 is the limit of a sequenceof simple functions.

Recall there exist simple functions 0 ≀ 𝑔𝑛 ≀ 𝑓 such that 𝑔𝑛(π‘₯) ↑ 𝑓(π‘₯) forπœ‡-almost every π‘₯. Then

limπ‘›β†’βˆž

‖𝑔𝑛 βˆ’ 𝑓‖𝑝𝑝 = lim

π‘›β†’βˆžβˆ«

𝑋|𝑔𝑛 βˆ’ 𝑓|π‘π‘‘πœ‡ = 0

by dominated convergence theorem (|𝑔𝑛 βˆ’ 𝑓| ≀ 𝑔𝑛 + 𝑓 ≀ 2𝑓 so 𝑔𝑛 βˆ’ 𝑓 is πœ‡-integrable).

Remark. When 𝑋 = R𝑑, π’œ = ℬ(R𝑑) and πœ‡ is the Lebesgue measure, 𝐢𝑐(R𝑑),the space of continuous functions with compact support is dense in ℒ𝑝(𝑋, π’œ, πœ‡)when 𝑝 ∈ [1, ∞) (this does not hold for 𝑝 = ∞: a constant nonzero function hasno noncompact support). See example sheet. In fact, 𝐢∞

𝑐 (R𝑑) suffices.

48

Page 50: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

9 Hilbert space and 𝐿2-methods

9 Hilbert space and 𝐿2-methods

Definition (inner product). Let 𝑉 be a complex vector space. A Hermitianinner product on 𝑉 is a map

𝑉 Γ— 𝑉 β†’ C(π‘₯, 𝑦) ↦ ⟨π‘₯, π‘¦βŸ©

such that

1. βŸ¨π›Όπ‘₯ + 𝛽𝑦, π‘§βŸ© = π›ΌβŸ¨π‘₯, π‘§βŸ© + π›½βŸ¨π‘¦, π‘§βŸ© for all 𝛼, 𝛽 ∈ C, for all π‘₯, 𝑦, 𝑧 ∈ 𝑉.

2. βŸ¨π‘¦, π‘₯⟩ = ⟨π‘₯, π‘¦βŸ©.

3. ⟨π‘₯, π‘₯⟩ ∈ R and ⟨π‘₯, π‘₯⟩ β‰₯ 0, with equality if and only if π‘₯ = 0.

Definition (Hermitian norm). The Hermitian norm is defined as β€–π‘₯β€– =√⟨π‘₯, π‘₯⟩.

Lemma 9.1. Properties of norm:

1. linearity: β€–πœ†π‘₯β€– = |πœ†|β€–π‘₯β€– for all πœ† ∈ C, π‘₯ ∈ 𝑉,

2. Cauchy-Schwarz: |⟨π‘₯, π‘¦βŸ©| ≀ β€–π‘₯β€– β‹… β€–πœ“β€–,

3. triangle inequality: β€–π‘₯ + 𝑦‖ ≀ β€–π‘₯β€– + ‖𝑦‖

4. parallelogram identity: β€–π‘₯ + 𝑦‖2 + β€–π‘₯ βˆ’ 𝑦‖2 = 2(β€–π‘₯β€–2 + ‖𝑦‖2)

Proof. Exercise. For reference see author’s notes on IID Linear Analysis.

Corollary 9.2. (𝑉 , β€–β‹…β€–) is a normed vector space.

Definition (Hilbert space). We say (𝑉 , β€–β‹…β€–) is a Hilbert space if it is com-plete.

Example. Let 𝑉 = 𝐿2(𝑋, π’œ, πœ‡) where (𝑋, π’œ, πœ‡) is a measure space. Then wecan define

βŸ¨π‘“, π‘”βŸ© = βˆ«π‘‹

π‘“π‘”π‘‘πœ‡

which is well-defined (i.e. finite) by Cauchy-Schwarz. The axioms are easy tocheck, with positive-definiteness given by

0 = βŸ¨π‘“, π‘“βŸ© = βˆ«π‘‹

|𝑓|2π‘‘πœ‡

if and only if 𝑓 = 0 πœ‡-almost everywhere so 𝑓 = 0.

49

Page 51: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

9 Hilbert space and 𝐿2-methods

Proposition 9.3. Let 𝐻 be a Hilbert space and let π’ž be a closed convexsubset of 𝐻. Then for all π‘₯ ∈ 𝐻, there exists unique 𝑦 ∈ π’ž such that

β€–π‘₯ βˆ’ 𝑦‖ = 𝑑(π‘₯, π’ž)

where by definition𝑑(π‘₯, π’ž) = inf

π‘βˆˆπ’žβ€–π‘₯ βˆ’ 𝑐‖.

This 𝑦 is called the orthogonal projection of π‘₯ on π’ž.

Proof. Let 𝑐𝑛 ∈ π’ž be a sequence such that β€–π‘₯ βˆ’ 𝑐𝑛‖ β†’ 𝑑(π‘₯, π’ž). Let’s show that(𝑐𝑛)𝑛 is a Cauchy sequence. By parallelogram identity,

βˆ₯π‘₯ βˆ’ 𝑐𝑛2

+ π‘₯ βˆ’ π‘π‘š2

βˆ₯2

+ βˆ₯π‘₯ βˆ’ 𝑐𝑛2

βˆ’ π‘₯ βˆ’ π‘π‘š2

βˆ₯2

= 12

(β€–π‘₯ βˆ’ 𝑐𝑛‖2 + β€–π‘₯ βˆ’ π‘π‘šβ€–2)

soβˆ₯βˆ₯βˆ₯βˆ₯π‘₯ βˆ’ 𝑐𝑛 + π‘π‘š

2βŸβˆˆπ’ž

βˆ₯βˆ₯βˆ₯βˆ₯

2

⏟⏟⏟⏟⏟⏟⏟β‰₯𝑑(π‘₯,π’ž)2

+14

‖𝑐𝑛 βˆ’ π‘π‘šβ€–2 = 12

(β€–π‘₯ βˆ’ 𝑐𝑛‖2 + β€–π‘₯ βˆ’ π‘π‘šβ€–2)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

→𝑑(π‘₯,π’ž)2

solim

𝑛,π‘šβ†’βˆžβ€–π‘π‘› βˆ’ π‘π‘šβ€– = 0

i.e. (𝑐𝑛)𝑛 is Cauchy. By completeness exist limπ‘›β†’βˆž 𝑐𝑛 = 𝑦 ∈ 𝐻. As π’ž is closed,𝑦 ∈ π’ž. As β€–π‘₯ βˆ’ 𝑐𝑛‖ β†’ 𝑑(π‘₯, π’ž), β€–π‘₯ βˆ’ 𝑦‖ β†’ 𝑑(π‘₯, π’ž). This shows existence of 𝑦.

For uniqueness, use parallelogram identity

βˆ₯π‘₯ βˆ’ 𝑦 + 𝑦′

2βˆ₯2

⏟⏟⏟⏟⏟β‰₯𝑑(π‘₯,π’ž)

+14

‖𝑦 βˆ’ 𝑦′‖2 = 12

(β€–π‘₯ βˆ’ 𝑦‖2 + β€–π‘₯ βˆ’ 𝑦′‖2) = 𝑑(π‘₯, π’ž)2

so ‖𝑦 βˆ’ 𝑦′‖ = 0.

Corollary 9.4. Suppose 𝑉 ≀ 𝐻 is a closed subspace of a Hilbert space 𝐻.Then

𝐻 = 𝑉 βŠ• 𝑉 βŸ‚

where𝑉 βŸ‚ = {π‘₯ ∈ 𝐻 ∢ ⟨π‘₯, π‘£βŸ© = 0 for all 𝑣 ∈ 𝑉 }

is the orthogonal of 𝑉.

Proof. 𝑉 ∩ 𝑉 βŸ‚ = 0 by positivity of inner product. If π‘₯ ∈ 𝐻 then there exists aunique 𝑦 ∈ 𝑉 such that β€–π‘₯ βˆ’ 𝑦‖ = 𝑑(π‘₯, 𝑦). Need to show that π‘₯ βˆ’ 𝑦 ∈ 𝑉 βŸ‚.

For all 𝑧 ∈ 𝑉,β€–π‘₯ βˆ’ 𝑦 βˆ’ 𝑧‖ β‰₯ β€–π‘₯ βˆ’ 𝑦‖

as 𝑦 + 𝑧 ∈ 𝑉. Thus

β€–π‘₯ βˆ’ 𝑦‖2 + ‖𝑧‖ βˆ’ 2 Re⟨π‘₯ βˆ’ 𝑦, π‘§βŸ© β‰₯ β€–π‘₯ βˆ’ 𝑦‖2.

50

Page 52: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

9 Hilbert space and 𝐿2-methods

Rearrange,2 Re⟨π‘₯ βˆ’ 𝑦, π‘§βŸ© ≀ ‖𝑧‖2

for all 𝑧 ∈ 𝑉. Now substitute 𝑑𝑧 for 𝑧 where 𝑑 ∈ R+, have

𝑑 β‹… 2 Re π‘₯ βˆ’ 𝑦, 𝑧 ≀ 𝑑2‖𝑧‖2.

For 𝑑 = 0,Re⟨π‘₯ βˆ’ 𝑦, π‘§βŸ© ≀ 0

Similarly replace 𝑧 by βˆ’π‘§ to conclude Re⟨π‘₯βˆ’π‘¦, π‘§βŸ© = 0. Finally replace 𝑧 by π‘’π‘–πœƒπ‘§to have ⟨π‘₯ βˆ’ 𝑦, π‘§βŸ© = 0 for all 𝑧. Thus π‘₯ βˆ’ 𝑦 ∈ 𝑉 βŸ‚.

Definition (bounded linear form). A linear form β„“ ∢ 𝐻 β†’ C is bounded ifthere exists 𝐢 > 0 such that |β„“(π‘₯)| ≀ 𝐢‖π‘₯β€– for all π‘₯ ∈ 𝐻.

Remark. β„“ bounded is equivalent to β„“ continuous.

Theorem 9.5 (Riesz representation theorem). Let 𝐻 be a Hilbert space.For every bounded linear form β„“ there exists 𝑣0 ∈ 𝐻 such that

β„“(π‘₯) = ⟨π‘₯, 𝑣0⟩

for all π‘₯ ∈ 𝐻.

Proof. By boundedness of β„“, ker β„“ is closed so write

𝐻 = ker β„“ βŠ• (ker β„“)βŸ‚.

If β„“ = 0 then just pick 𝑣0 = 0. Otherwise pick π‘₯0 ∈ (ker β„“)βŸ‚ \ {0}. But (ker β„“)βŸ‚

is spanned by π‘₯0: indeed for any π‘₯ ∈ (ker β„“)βŸ‚,

β„“(π‘₯) = β„“(π‘₯)β„“(π‘₯0)

β„“(π‘₯0)

soβ„“(π‘₯ βˆ’ β„“(π‘₯)

β„“(π‘₯0)π‘₯0) = 0

so π‘₯ βˆ’ β„“(π‘₯)β„“(π‘₯0) ∈ ker β„“ ∩ (ker β„“)βŸ‚ = 0. Now let

𝑣0 = β„“(π‘₯0)β€–π‘₯0β€–2 π‘₯0

and observe that β„“(π‘₯) βˆ’ ⟨π‘₯, 𝑣0⟩ vanishes on ker β„“ and on (ker β„“)βŸ‚ = Cπ‘₯0. Thusit is identically zero.

Definition (absolutely continuous, singular measure). Let (𝑋, π’œ) be a mea-surable space and let πœ‡, 𝜈 be two measures on (𝑋, π’œ).

1. πœ‡ is absolutely continuous with respect to 𝜈, denoted πœ‡ β‰ͺ 𝜈, if forevery 𝐴 ∈ π’œ, 𝜈(𝐴) = 0 implies πœ‡(𝐴) = 0.

2. πœ‡ is singular, denoted πœ‡ βŸ‚ 𝜈, if exists Ξ© ∈ π’œ such that πœ‡(Ξ©) = 0 and

51

Page 53: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

9 Hilbert space and 𝐿2-methods

𝜈(Ω𝑐) = 0.Example.

1. Let 𝜈 be the Lebesgue measure on (R, ℬ(R)) and π‘‘πœ‡ = π‘“π‘‘πœˆ where 𝑓 β‰₯ 0is a Borel function then πœ‡ β‰ͺ 𝜈.

2. If πœ‡ = 𝛿π‘₯0, the Dirac mass at π‘₯0 ∈ R, then πœ‡ βŸ‚ 𝑣.

Non-examinable theorem and proof:

Theorem 9.6 (Radon-Nikodym). Assume πœ‡ and 𝜈 are 𝜎-finite measureson (𝑋, π’œ).

1. If πœ‡ β‰ͺ 𝜈 then there exists 𝑔 β‰₯ 0 measurable such that π‘‘πœ‡ = π‘”π‘‘πœˆ,namely

πœ‡(𝐴) = ∫𝐴

𝑔(π‘₯)π‘‘πœˆ(π‘₯)

for all 𝐴 ∈ π’œ. 𝑔 is called the density of πœ‡ with respect to 𝜈 or Radon-Nikodym derivative, sometimes denoted by 𝑔 = π‘‘πœ‡

π‘‘πœˆ .

2. For any πœ‡, 𝜈 𝜎-finite, πœ‡ decomposes as

πœ‡ = πœ‡π‘Ž + πœ‡π‘ 

where πœ‡π‘Ž β‰ͺ 𝜈 and πœ‡π‘  βŸ‚ 𝜈. Moreover this decomposition is unique.

Proof. Consider 𝐻 = 𝐿2(𝑋, π’œ, πœ‡ + 𝜈), which is a Hilbert space. First assume πœ‡and 𝜈 are finite. Consider the linear form

β„“ ∢ 𝐻 β†’ R

𝑓 ↦ πœ‡(𝑓) = βˆ«π‘‹

π‘“π‘‘πœ‡

β„“ is bounded by Cauchy-Schwarz and finiteness of the measures:

|πœ‡(𝑓)| ≀ πœ‡(|𝑓|) ≀ (πœ‡ + 𝜈)(|𝑓|) ≀ √(πœ‡ + 𝜈)(𝑋) β‹… βˆšβˆ«π‘‹

|𝑓|2𝑑(πœ‡ + 𝜈) = 𝐢 β‹… ‖𝑓‖𝐻

so by Riesz representation theorem, there exists 𝑔0 ∈ 𝐿2(𝑋, π’œ, πœ‡ + 𝜈) such that

πœ‡(𝑓) = βˆ«π‘‹

𝑓𝑔0𝑑(πœ‡ + 𝜈). (βˆ—)

Claim that for (πœ‡+𝜈)-almost every π‘₯, 0 ≀ 𝑔0(π‘₯) ≀ 1: take 𝑓 = 1{𝑔0<0} and plugit into (βˆ—),

0 ≀ πœ‡({𝑔0 < 0}) = βˆ«π‘‹

1{𝑔0<0}𝑔0βŸβ‰€0

𝑑(πœ‡ + 𝜈) ≀ 0

so equality throughout. Thus 𝑔0 β‰₯ 0 (πœ‡ + 𝜈)-almost everywhere. Similarly take𝑓 = 1{𝑔0>1+πœ€} for πœ€ > 0 and plug it into (βˆ—),

(πœ‡ + 𝜈)({𝑔0 > 1 + πœ€}) β‰₯ πœ‡({𝑔0 > 1 + πœ€})

= βˆ«π‘‹

1{𝑔0>1+πœ€}𝑔0𝑑(πœ‡ + 𝜈)

β‰₯ (1 + πœ€)(πœ‡ + 𝜈)({𝑔0 > 1 + πœ€})

52

Page 54: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

9 Hilbert space and 𝐿2-methods

so must have(πœ‡ + 𝜈)({𝑔0 > 1 + πœ€}) = 0

i.e. 𝑔0 ≀ 1 (πœ‡ + 𝜈)-almost everywhere.Now set Ξ© = {π‘₯ ∈ 𝑋 ∢ 𝑔0 ∈ [0, 1)} so on Ω𝑐, 𝑔0 = 1 (πœ‡ + 𝜈)-almost

everywhere. Then (βˆ—) is equivalent to

βˆ«π‘‹

𝑓(1 βˆ’ 𝑔0)π‘‘πœ‡ = βˆ«π‘‹

𝑓𝑔0π‘‘πœˆ

for all 𝑓 ∈ 𝐿2(𝑋, π’œ, πœ‡ + 𝜈). Hence this holds for all 𝑓 β‰₯ 0. Now 𝑓 be 𝑓1βˆ’π‘”0

1Ξ©,get

∫Ω

π‘“π‘‘πœ‡ = ∫Ω

𝑓 𝑔01 βˆ’ 𝑔0

π‘‘πœˆ. (βˆ—βˆ—)

Set

πœ‡π‘Ž(𝐴) = πœ‡(𝐴 ∩ Ξ©)πœ‡π‘ (𝐴) = πœ‡(𝐴 ∩ Ω𝑐)

Clearly πœ‡ = πœ‡π‘Ž + πœ‡π‘ . Claim that this is the required result, i.e.

1. πœ‡π‘Ž β‰ͺ 𝜈,

2. πœ‡π‘  βŸ‚ 𝜈,

3. π‘‘πœ‡π‘Ž = π‘”π‘‘πœˆ where 𝑔 = 𝑔01βˆ’π‘”0

1Ξ©.

Proof.

1. If 𝜈(𝐴) = 0 set 𝑓 = 1𝐴 and plug into (βˆ—βˆ—) to get πœ‡(𝐴 ∩ Ξ©) = 0, namelyπœ‡π‘Ž(𝐴) = 0.

2. Set 𝑓 = 1Ω𝑐 . On Ω𝑐, 𝑔0 = 1 (πœ‡ + 𝜈)-almost everywhere. Plug this into (βˆ—)to get 𝜈(Ω𝑐) = 0. But πœ‡π‘ (Ξ©) = 0 so πœ‡π‘  βŸ‚ 𝜈.

3. (βˆ—βˆ—) is equivalent to π‘‘πœ‡π‘Ž = π‘”π‘‘πœˆ where 𝑔 = 𝑔01βˆ’π‘”0

1Ξ©.

This settles part 2 of the theorem, and also part 1 as if πœ‡ β‰ͺ 𝜈 then πœ‡ = πœ‡π‘Ž.If πœ‡ are 𝜈 are not finite but only 𝜎-finite, use the old trick of partition 𝑋

into countably many πœ‡- and 𝜈-finite sets and take their intersections. Supposewe get a disjoint countable union 𝑋 = ⋃𝑛 𝑋𝑛. Then πœ‡ = βˆ‘π‘› πœ‡|𝑋𝑛

where foreach 𝑛 we can write

πœ‡|𝑋𝑛= (πœ‡|𝑋𝑛

)π‘Ž + (πœ‡|𝑋𝑛)𝑠.

Then set

πœ‡π‘Ž = βˆ‘π‘›

(πœ‡|𝑋𝑛)π‘Ž

πœ‡π‘  = βˆ‘π‘›

(πœ‡|𝑋𝑛)𝑠

53

Page 55: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

9 Hilbert space and 𝐿2-methods

Remains to check uniqueness of decomposition. Suppose πœ‡ can be decom-posed in two ways

πœ‡ = πœ‡π‘Ž + πœ‡π‘  = πœ‡β€²π‘Ž + πœ‡β€²

𝑠.

As πœ‡π‘ , πœ‡β€²π‘  βŸ‚ 𝜈 there exists Ξ©0, Ξ©β€²

0 ∈ π’œ such that

πœ‡π‘ (Ξ©0) = 0, 𝜈(Ω𝑐0) = 0

πœ‡β€²π‘ (Ξ©β€²

0) = 0, 𝜈((Ξ©β€²0)𝑐) = 0

Set Ξ©1 = Ξ©0 ∩ Ξ©β€²0. Check that

πœ‡π‘ (Ξ©1) = πœ‡β€²π‘ (Ξ©1) = 0

𝜈(Ω𝑐1) = 𝜈(Ω𝑐

0 βˆͺ Ω′𝑐0 ) = 0

Now πœ‡π‘Ž, πœ‡β€²π‘Ž β‰ͺ 𝜈 so

πœ‡π‘Ž(Ω𝑐1) = πœ‡β€²

π‘Ž(Ω𝑐1) = 0.

Hence for all 𝐴 ∈ π’œ,

πœ‡π‘Ž(𝐴) = πœ‡π‘Ž(𝐴 ∩ Ξ©1) = πœ‡(𝐴 ∩ Ξ©1) = πœ‡β€²π‘Ž(𝐴 ∩ Ξ©1) = πœ‡β€²

π‘Ž(𝐴)

so πœ‡π‘Ž = πœ‡β€²π‘Ž and hence πœ‡π‘  = πœ‡β€²

𝑠.

Proposition 9.7. Let (Ξ©, β„±,P) be a probability space. Let 𝒒 be a 𝜎-subalgbebra of β„± and 𝑋 a random variable on (Ξ©, β„±,P). Assume 𝑋 isintegrable, then there exists a random variable π‘Œ on (Ξ©, 𝒒,P) such that

E(1𝐴𝑋) = E(1π΄π‘Œ )

for all 𝐴 ∈ 𝒒. Moreover π‘Œ is unique almost surely.

If you are perplexed by why it is nontrivial to recover a random variable on 𝒒from one on β„±, as 𝒒 βŠ† β„± so it seems that we can easily restrict to a β€œsub-randomvariable”. But this reasoning makes absolutely no sense as random variablesare functions from the space. In other words, id ∢ (Ξ©, β„±,P) β†’ (Ξ©, 𝒒,P) ismeasurable but its inverse is not, and it is not obvious that 𝑋 has a pushforward.

Definition (conditional expectation). π‘Œ as above is called the conditionalexpectation of 𝑋 with respect to 𝒒, denote by

π‘Œ = E(𝑋|𝒒).

Proof. wlog assume 𝑋 β‰₯ 0. Set πœ‡(𝐴) = E(1𝐴𝑋) for all 𝐴 ∈ 𝒒. πœ‡ is finiteby integrability of 𝑋 and is a measure on (Ξ©, 𝒒). Moreover πœ‡ β‰ͺ P. Thus byRadon-Nikodym there exists 𝑔 β‰₯ 0 𝒒-measurable such that

πœ‡(𝐴) = ∫𝐴

𝑔𝑑P = E(1𝐴𝑔).

Set π‘Œ = 𝑔.Uniqueness is shown in example sheet 3.

54

Page 56: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

9 Hilbert space and 𝐿2-methods

Remark. In case 𝑋 ∈ 𝐿2(Ξ©, β„±,P) then π‘Œ is the orthogonal projection of 𝑋onto 𝐿2(Ξ©, 𝒒,P). It is well-defined since 𝐿2(Ξ©, β„±,P) is a Hilbert space and𝐿2(Ξ©, 𝒒,P) is a closed subspace. In this case TFAE:

1. E((𝑋 βˆ’ π‘Œ )1𝐴) = 0 for all 𝐴 ∈ 𝒒,

2. E((𝑋 βˆ’ π‘Œ )β„Ž) = 0 for all β„Ž simple on (Ξ©, 𝒒,P),

3. E((𝑋 βˆ’ π‘Œ )β„Ž) = 0 for all β„Ž β‰₯ 0 𝒒-measurable,

4. E((𝑋 βˆ’ π‘Œ )β„Ž) = 0 for all β„Ž ∈ 𝐿2(Ξ©, 𝒒,P).

Remark. Special case when 𝒒 = {βˆ…, 𝐡, 𝐡𝑐, Ξ©} where 𝐡 ∈ β„±:

E(1𝐴|𝒒)(πœ”) = {P(𝐴|𝐡) πœ” ∈ 𝐡P(𝐴|𝐡𝑐) πœ” ∈ 𝐡𝑐

whereP(𝐴|𝐡) = P(𝐴 ∩ 𝐡)

P(𝐡).

Proposition 9.8 (non-examinable). Properties of conditional expectation:

1. linearity: E(𝛼𝑋 + π›½π‘Œ |𝒒) = 𝛼E(𝑋|𝒒) + 𝛽E(π‘Œ |𝒒).

2. if 𝑋 is 𝒒-measurable then E(𝑋|𝒒) = E(𝑋).

3. positivity: if 𝑋 β‰₯ 0 then E(𝑋|𝒒) β‰₯ 0.

4. E(E(𝑋|𝒒)|β„‹) = E(𝑋|β„‹) if β„‹ βŠ† 𝒒.

5. if 𝑍 is 𝒒-measurable and bounded then E(𝑋𝑍|𝒒) = 𝑍 β‹… E(𝑋|𝒒).

55

Page 57: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

10 Fourier transform

10 Fourier transform

Definition (Fourier transform). Let 𝑓 ∈ 𝐿1(R𝑑, ℬ(R𝑑), 𝑑π‘₯) where 𝑑π‘₯ is theLebesgue measure. The function

𝑓 ∢ R𝑑 β†’ C

𝑒 ↦ ∫R𝑑

𝑓(π‘₯)π‘’π‘–βŸ¨π‘’,π‘₯βŸ©π‘‘π‘₯

where βŸ¨π‘’, π‘₯⟩ = 𝑒1π‘₯1 + β‹― + 𝑒𝑑π‘₯𝑑, is called the Fourier transform of 𝑓.

Proposition 10.1.

1. | 𝑓(𝑒)| ≀ ‖𝑓‖1.

2. 𝑓 is continuous.

Proof. 1 is clear. 2 follows from dominated convergence theorem.

Definition. Given a finite Borel measure πœ‡ on R𝑑, its Fourier transform isgiven by

πœ‡(𝑒) = ∫R𝑑

π‘’π‘–βŸ¨π‘’,π‘₯βŸ©π‘‘πœ‡(π‘₯).

Again | πœ‡(𝑒)| ≀ πœ‡(R𝑑) and πœ‡ is continuous.

Remark. If 𝑋 is an R𝑑-valued random variable with law πœ‡π‘‹ then πœ‡π‘‹ is calledthe characteristic function of 𝑋.

Example.

1. Normalised Gaussian distribution on R: πœ‡ = 𝒩(0, 1). π‘‘πœ‡ = 𝑔𝑑π‘₯ where

𝑔(π‘₯) = π‘’βˆ’π‘₯2/2√

2πœ‹.

Claim thatπœ‡(𝑒) = 𝑔(𝑒) = π‘’βˆ’π‘’2/2,

i.e. 𝑔 =√

2πœ‹π‘”. This is the defining characteristic of Gaussian distribution.

Proof. Since (𝑒 ↦ |𝑖𝑒 β‹… 𝑒𝑖𝑒π‘₯π‘’βˆ’π‘₯2/2|) ∈ 𝐿1(R), we can differentiate under

56

Page 58: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

10 Fourier transform

integral sign to get

𝑑𝑑𝑒

𝑔(𝑒) = 𝑑𝑑𝑒

∫R

𝑒𝑖𝑒π‘₯π‘’βˆ’π‘₯2/2 𝑑π‘₯√2πœ‹

= ∫R

𝑖π‘₯𝑒𝑖𝑒π‘₯π‘’βˆ’π‘₯2/2 𝑑π‘₯√2πœ‹

= βˆ’ ∫R

𝑖𝑒𝑖𝑒π‘₯𝑔′(π‘₯)𝑑π‘₯

= 𝑖 ∫R

(𝑒𝑖𝑒π‘₯)′𝑔(π‘₯)𝑑π‘₯

= βˆ’π‘’ ∫R

𝑒𝑖𝑒π‘₯𝑔(π‘₯)𝑑π‘₯

= βˆ’π‘’ 𝑔(𝑒)

Thus𝑑

𝑑𝑒( 𝑔(𝑒)𝑒𝑒2/2) = 0

so𝑔(𝑒) = 𝑔(0)π‘’βˆ’π‘’2/2.

But𝑔(0) = ∫

R𝑔(π‘₯)𝑑π‘₯ = 1

so 𝑔(𝑒) = π‘’βˆ’π‘’2/2 as required.

2. 𝑑-dimensional version: πœ‡ = 𝒩(0, 𝐼𝑑). π‘‘πœ‡(π‘₯) = 𝐺(π‘₯)𝑑π‘₯ where 𝑑π‘₯ =𝑑π‘₯1 β‹― 𝑑π‘₯𝑑 and

𝐺(π‘₯) =𝑑

βˆπ‘–=1

𝑔(π‘₯𝑖) = π‘’βˆ’β€–π‘₯β€–2/2

(√

2πœ‹)𝑑.

Then

𝐺(𝑒) = ∫R𝑑

π‘’π‘–βŸ¨π‘’,π‘₯⟩𝐺(π‘₯)𝑑π‘₯

=𝑑

βˆπ‘–=1

∫R

𝑔(π‘₯𝑖)𝑒𝑖𝑒𝑖π‘₯𝑖𝑑π‘₯𝑖

=𝑑

βˆπ‘–=1

𝑔(𝑒𝑖)

= π‘’βˆ’β€–π‘’β€–2/2

Theorem 10.2 (Fourier inversion formula).

1. If 𝑓 ∈ 𝐿1(R𝑑) is such that 𝑓 ∈ 𝐿1(R𝑑) then 𝑓 is continuous (i.e. 𝑓equals to a continuous function almost everywhere) and

𝑓(π‘₯) = 1(2πœ‹)𝑑

𝑓(βˆ’π‘₯).

57

Page 59: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

10 Fourier transform

2. If πœ‡ is a finite Borel measure on R𝑑 such that πœ‡ ∈ 𝐿1(R𝑑) then πœ‡ hasa continuous density with respect to Lebesgue measure, i.e. π‘‘πœ‡ = 𝑓𝑑π‘₯with

𝑓(π‘₯) = 1(2πœ‹)𝑑

πœ‡(βˆ’π‘₯).

Remark. In other words

𝑓(π‘₯) = 1(2πœ‹)𝑑 ∫

R𝑑

𝑓(𝑒)π‘’βˆ’π‘–βŸ¨π‘’,π‘₯βŸ©π‘‘π‘’

where 𝑓(𝑒) are Fourier coefficients and π‘’βˆ’π‘–βŸ¨π‘’,π‘₯⟩ are called Fourier modes, whichare characters π‘’π‘–βŸ¨π‘’,βˆ’βŸ© ∢ R𝑑 β†’ {𝑧 ∈ C ∢ |𝑧| = 1}. Informally this says that every𝑓 can be written as an β€œinfinite linear combination” of Fourier modes.

Proof. (!UPDATE: 1 does not quite reduce to 2 as 𝑓 = 𝑓+ βˆ’ π‘“βˆ’ does not quitehold. Instead write 𝑓 = π‘Ž πœ‡π‘‹ βˆ’ 𝑏 πœ‡π‘Œ where π‘Ž = ‖𝑓+β€–1, π‘‘πœ‡π‘₯ = 𝑓

‖𝑓+β€–1𝑑π‘₯)

1 reduces to 2 by considering 𝑓 = 𝑓+ βˆ’ π‘“βˆ’. In 2 we may assume wlog πœ‡is a probability measure so is the law of some random variable 𝑋. Let 𝑓(π‘₯) =

1(2πœ‹)𝑑

πœ‡(βˆ’π‘₯). We need to show πœ‡ = 𝑓𝑑π‘₯, which is equivalent to for all 𝐴 ∈ ℬ(R𝑑),

πœ‡(𝐴) = ∫R𝑑

𝑓1𝐴𝑑π‘₯.

Let β„Ž = 1𝐴 and wlog assume 𝐴 is a bounded Borel set. The trick is to introducean independent Gaussian random variable 𝑁 ∼ 𝒩(0, 𝐼𝑑) with law 𝐺𝑑π‘₯. We have

∫R𝑑

β„Ž(π‘₯)π‘‘πœ‡(π‘₯) = E(β„Ž(𝑋))) = limπœŽβ†’0

E(β„Ž(𝑋 + πœŽπ‘))

by dominated convergence theorem. But

E(β„Ž(𝑋 + πœŽπ‘)) = E(∫R𝑑

β„Ž(𝑋 + 𝜎π‘₯)𝐺(π‘₯)𝑑π‘₯)

= E(∫R𝑑

∫R𝑑

β„Ž(𝑋 + 𝜎π‘₯)π‘’π‘–βŸ¨π‘’,π‘₯⟩𝐺(𝑒) 𝑑𝑒𝑑π‘₯(√

2πœ‹)𝑑)

as𝐺(π‘₯) = 1

(√

2πœ‹)𝑑𝐺(π‘₯) = 1

(√

2πœ‹)π‘‘βˆ«R𝑑

𝐺(𝑒)π‘’π‘–βŸ¨π‘’,π‘₯βŸ©π‘‘π‘’.

So by a change of variable 𝑦 = 𝜎π‘₯,

E(β„Ž(𝑋 + πœŽπ‘)) = E(∫ ∫ β„Ž(𝑋 + 𝑦)π‘’π‘–βŸ¨π‘’,𝑦/𝜎⟩𝐺(𝑒) 𝑑𝑒(√

2πœ‹)π‘‘π‘‘π‘¦πœŽπ‘‘ )

= E(∫ ∫ β„Ž(𝑧)π‘’π‘–βŸ¨π‘’/𝜎,π‘§βˆ’π‘₯⟩𝐺(𝑒) 𝑑𝑒(√

2πœ‹πœŽ2)𝑑𝑑𝑧)

= ∫ ∫ β„Ž(𝑧)π‘’π‘–βŸ¨π‘’/𝜎,π‘§βŸ© πœ‡π‘‹(βˆ’π‘’πœŽ

)𝐺(𝑒) 𝑑𝑒(√

2πœ‹πœŽ2)𝑑𝑑𝑧 Tonelli-Fubini

= 1(2πœ‹)𝑑 ∫ ∫ πœ‡π‘‹(𝑒)π‘’βˆ’π‘–βŸ¨π‘’,π‘§βŸ©β„Ž(𝑧)π‘’βˆ’πœŽ2‖𝑒‖2/2𝑑𝑒𝑑𝑧

58

Page 60: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

10 Fourier transform

We want a condition to ensure 𝑓 ∈ 𝐿1(R𝑑). Clearly continuity is necessary.Here we use a generally principle in Fourier analysis: Fourier transform convertsdecay at infinity to smoothness. In Fourier inversion formula, if 𝑒 is large thenthe Fourier character has fast oscillation. Thus if 𝑓 decays fast at infinity thenthe Fourier coefficients also decays fast, and the resulting transfrom is smoother.

Proposition 10.3. If 𝑓, 𝑓 β€² and 𝑓″ exists (for example if 𝑓 is 𝐢2) and arein 𝐿1 then 𝑓 ∈ 𝐿1.

Proof. We prove the case 𝑑 = 1. The general case follows from Tonelli-Fubini.We show first that 𝑓, 𝑓 β€² ∈ 𝐿1 implies that 𝑓(𝑒) = 𝑖

𝑒𝑓 β€²(𝑒). This easily follows

from integration by parts:

𝑓(𝑒) = ∫ 𝑓(π‘₯)𝑒𝑖𝑒π‘₯𝑑π‘₯

= 1𝑖𝑒

∫ 𝑓(π‘₯)(𝑒𝑖𝑒π‘₯)′𝑑π‘₯

= βˆ’ 1𝑖𝑒

∫ 𝑓 β€²(π‘₯)𝑒𝑖𝑒π‘₯𝑑π‘₯

so in particular | 𝑓(𝑒)| ≀ 1|𝑒| ‖𝑓

β€²β€–1.Thus if 𝑓, 𝑓 β€², 𝑓″ ∈ 𝐿1 then 𝑓(𝑒) = βˆ’ 1

𝑒2𝑓″(𝑒) so

| 𝑓(𝑒)| ≀ ‖𝑓″‖1|𝑒2|

.

As ∫∞1

1|𝑒|2 𝑑𝑒 < ∞, 𝑓 ∈ 𝐿1.

Definition (convolution). Given two Borel measures πœ‡ and 𝜈 on R𝑑, wedefine their convolution πœ‡ βˆ— 𝜈 as the image of πœ‡ βŠ— 𝜈 under the addition map

Ξ¦ ∢ R𝑑 Γ— R𝑑 β†’ R𝑑

(π‘₯, 𝑦) ↦ π‘₯ + 𝑦

i.e. πœ‡ βˆ— 𝜈 = Ξ¦βˆ—(πœ‡ βŠ— 𝜈)

Thus given 𝐴 ∈ ℬ(R𝑑),

Ξ¦βˆ—(πœ‡ βŠ— 𝑣)(𝐴) = πœ‡ βŠ— 𝜈({(π‘₯, 𝑦) ∢ π‘₯ + 𝑦 ∈ 𝐴}).

Example. Given 𝑋, π‘Œ independent random variables and πœ‡, 𝜈 be laws of 𝑋, π‘Œrespectively, then πœ‡ βˆ— 𝜈 is the law of 𝑋 + π‘Œ.

Definition (convolution). If 𝑓, 𝑔 ∈ 𝐿1(R𝑑) define their convolution 𝑓 βˆ— 𝑔 by

(𝑓 βˆ— 𝑔)(π‘₯) = ∫R𝑑

𝑓(π‘₯ βˆ’ 𝑑)𝑔(𝑑)𝑑𝑑.

59

Page 61: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

10 Fourier transform

This is well defined by Fubini: 𝑓, 𝑔 ∈ 𝐿1 so

∫R𝑑

∫R𝑑

|𝑓(π‘₯ βˆ’ 𝑑)𝑔(𝑑)|𝑑𝑑𝑑π‘₯ < ∞

and

‖𝑓 βˆ— 𝑔‖1 = ∫ ∣ ∫ 𝑓(π‘₯ βˆ’ 𝑑)𝑔(𝑑)π‘‘π‘‘βˆ£π‘‘π‘₯ ≀ ∫ ∫ |𝑓(π‘₯ βˆ’ 𝑑)𝑔(𝑑)|𝑑𝑑𝑑π‘₯ ≀ ‖𝑓‖1 β‹… ‖𝑔‖1

Therefore (𝐿1(R𝑑), βˆ—) forms a Banach algebra.

Remark. If πœ‡, 𝜈 are two finite Borel measures on R𝑑 and if πœ‡, 𝜈 β‰ͺ 𝑑π‘₯, i.e.absolutely continuous, then by Radon-Nikodym there exist 𝑓, 𝑔 ∈ 𝐿1(R𝑑) suchthat

π‘‘πœ‡ = 𝑓𝑑π‘₯π‘‘πœˆ = 𝑔𝑑π‘₯

then πœ‡ βˆ— 𝜈 β‰ͺ 𝑑π‘₯ and𝑑(πœ‡ βˆ— 𝜈) = (𝑓 βˆ— 𝑔)𝑑π‘₯.

Proposition 10.4 (Gaussian approximation). If 𝑓 ∈ 𝐿𝑝(R𝑑) where 𝑝 ∈[1, ∞) then

limπœŽβ†’0

‖𝑓 βˆ— 𝐺𝜎 βˆ’ 𝑓‖𝑝 = 0

where 𝐺𝜎 = 𝒩(0, 𝜎2𝐼𝑑), i.e.

𝐺𝜎(π‘₯) = 1(√

2πœ‹πœŽ2)π‘‘π‘’βˆ’ β€–π‘₯β€–2

2𝜎2 .

Lemma 10.5 (continuity of translation in 𝐿𝑝). Suppose 𝑝 ∈ [1, ∞) and𝑓 ∈ 𝐿𝑝. Then

lim𝑑→0

β€–πœπ‘‘(𝑓) βˆ’ 𝑓‖𝑝 = 0

where πœπ‘‘(𝑓)(π‘₯) = 𝑓(π‘₯ + 𝑑), 𝑑 ∈ R𝑑.

Proof. Example sheet.

Proof of Gaussian approximation.

(𝑓 βˆ— 𝐺𝜎 βˆ’ 𝑓)(π‘₯) = ∫R𝑑

𝐺𝜎(𝑑)(𝑓(π‘₯ βˆ’ 𝑑) βˆ’ 𝑓(π‘₯))𝑑𝑑 = E(𝑓(π‘₯ βˆ’ πœŽπ‘) βˆ’ 𝑓(π‘₯))

where 𝑁 ∼ 𝒩(0, 𝐼𝑑) is Gaussian with density 𝐺1. Then

‖𝑓 βˆ— 𝐺𝜎 βˆ’ 𝑓‖𝑝𝑝 ≀ E(‖𝑓(π‘₯ + πœŽπ‘) βˆ’ 𝑓(π‘₯)‖𝑝

𝑝) = E(β€–πœπœŽπ‘(𝑓) βˆ’ 𝑓‖𝑝𝑝)

by Jensen’s inequality and convexity of π‘₯ ↦ π‘₯𝑝. By the lemma,

limπœŽβ†’0

β€–πœπœŽπ‘(𝑓) βˆ’ 𝑓‖𝑝 = 0.

As β€–πœπœŽπ‘(𝑓) βˆ’ 𝑓‖𝑝 ≀ 2‖𝑓‖𝑝, apply dominated convergence theorem to get therequired result.

60

Page 62: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

10 Fourier transform

Proposition 10.6.

β€’ If 𝑓, 𝑔 ∈ 𝐿1(R𝑑) then𝑓 βˆ— 𝑔 = 𝑓 β‹… 𝑔.

β€’ If πœ‡, 𝜈 are finite Borel measure then

πœ‡ βˆ— 𝜈 = πœ‡ β‹… 𝜈.

Proof. 1 reduces to 2 by writing

𝑓(π‘₯)𝑑π‘₯ = 𝑓+(π‘₯)𝑑π‘₯ βˆ’ π‘“βˆ’(π‘₯)𝑑π‘₯ = π‘Žπ‘‘πœ‡ βˆ’ π‘π‘‘πœˆ

for some probability measure πœ‡, 𝜈.wlog we may assume πœ‡ and 𝜈 are laws of independent random variables 𝑋

and π‘Œ. Then by a previous result πœ‡ βˆ— 𝜈 is just the law of 𝑋 + π‘Œ so

πœ‡ βˆ— 𝜈(𝑒) = ∫ π‘’π‘–βŸ¨π‘’,π‘₯+π‘¦βŸ©π‘‘πœ‡(π‘₯)π‘‘πœˆ(𝑦)

= E(π‘’π‘–βŸ¨π‘’,𝑋+π‘Œ ⟩) = E(π‘’π‘–βŸ¨π‘’,π‘‹βŸ©π‘’π‘–βŸ¨π‘’,π‘Œ ⟩) homomorphism= E(π‘’π‘–βŸ¨π‘’,π‘‹βŸ©)E(π‘’π‘–βŸ¨π‘’,π‘Œ ⟩) as 𝑋, π‘Œ are independent= πœ‡(𝑒) β‹… 𝜈(𝑒).

In short, this is precisely because π‘’π‘–βŸ¨π‘’,βˆ’βŸ© are characters.

Theorem 10.7 (LΓ©vy criterion). Let (𝑋𝑛)𝑛β‰₯1 and 𝑋 be R𝑑-valued randomvariables. Then TFAE:

1. 𝑋𝑛 β†’ 𝑋 in law,

2. For all 𝑒 ∈ R𝑑, πœ‡π‘‹π‘›(𝑒) β†’ πœ‡π‘‹(𝑒).

In particular if πœ‡π‘‹ = πœ‡π‘Œ for two random variables 𝑋 and π‘Œ then 𝑋 = π‘Œ inlaw, i.e. πœ‡π‘‹ = πœ‡π‘Œ.

Thus Fourier transform is an injection from Borel measure to certain functionspace.

Proof.

β€’ 1 ⟹ 2: Clear by defintion as 𝑓(π‘₯) = π‘’π‘–βŸ¨π‘’,π‘₯⟩ is continuous and boundedfor all 𝑒 ∈ R𝑑.

β€’ 2 ⟹ 1: Need to show that for all 𝑔 ∈ 𝐢𝑏(R𝑑),

E(𝑔(𝑋𝑛)) β†’ E(𝑔(𝑋)).

wlog it’s enough to check this for all 𝑔 ∈ πΆβˆžπ‘ (R𝑑). For the sufficiency see

example sheet.Note that for all 𝑔 ∈ 𝐢∞

𝑐 (R𝑑), 𝑔 ∈ 𝐿1 so by Fourier inversion formula

𝑔(π‘₯) = ∫ 𝑔(𝑒)π‘’βˆ’π‘–βŸ¨π‘’,π‘₯⟩ 𝑑𝑒(2πœ‹)𝑑 .

61

Page 63: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

10 Fourier transform

Hence

E(𝑔(𝑋𝑛)) = ∫ 𝑔(𝑒)E(π‘’βˆ’π‘–βŸ¨π‘’,π‘‹π‘›βŸ©)𝑑𝑒

= ∫ 𝑔(𝑒) πœ‡π‘‹π‘›(βˆ’π‘’) 𝑑𝑒

(2πœ‹)𝑑

β†’ ∫ 𝑔(𝑒)��𝑋(βˆ’π‘’) 𝑑𝑒(2πœ‹)𝑑

= E(𝑔(𝑋))

by dominated convergence theorem.

Theorem 10.8 (Plancherel formula).

1. If 𝑓 ∈ 𝐿1(R𝑑) ∩ 𝐿2(R𝑑) then 𝑓 ∈ 𝐿2(R𝑑) and

β€– 𝑓‖22 = (2πœ‹)𝑑‖𝑓‖2

2.

2. If 𝑓, 𝑔 ∈ 𝐿1(R𝑑) ∩ 𝐿2(R𝑑) then

⟨ 𝑓, π‘”βŸ©πΏ2 = (2πœ‹)π‘‘βŸ¨π‘“, π‘”βŸ©πΏ2 .

3. The Fourier transform

β„± ∢ 𝐿1(R𝑑) ∩ 𝐿2(R𝑑) β†’ 𝐿2(R𝑑)

𝑓 ↦ 1(√

2πœ‹)𝑑𝑓

extends uniquely to a linear operator on 𝐿2(R𝑑) which is an isometry.Moreover

β„± ∘ β„±(𝑓) = 𝑓

where 𝑓(π‘₯) = 𝑓(βˆ’π‘₯), for all 𝑓 ∈ 𝐿2(R𝑑).

Proof. First we prove 1 and 2 assuming 𝑓 , 𝑔 ∈ 𝐿1(R𝑑). By Fourier inversionformula,

β€– 𝑓‖22 = ∫

R𝑑

| 𝑓(𝑒)|2𝑑𝑒

= ∫ 𝑓(𝑒) 𝑓(𝑒)𝑑𝑒

= ∫ (∫ 𝑓(π‘₯)π‘’π‘–βŸ¨π‘’,π‘₯βŸ©π‘‘π‘₯) 𝑓(𝑒)𝑑𝑒

= ∫ ∫ 𝑓(π‘₯) 𝑓(𝑒)π‘’βˆ’π‘–βŸ¨π‘’,π‘₯βŸ©π‘‘π‘’π‘‘π‘₯

= ∫ 𝑓(π‘₯)𝑓(π‘₯)(2πœ‹)𝑑𝑑π‘₯

= (2πœ‹)𝑑‖𝑓‖22

62

Page 64: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

10 Fourier transform

and in particular 𝑓 ∈ 𝐿2(R𝑑).Similarly for 2,

⟨ 𝑓, π‘”βŸ©πΏ2 = ∫ 𝑓(𝑒) 𝑔(𝑒)𝑑𝑒

= ∫ (∫ 𝑓(π‘₯)π‘’π‘–βŸ¨π‘’,π‘₯βŸ©π‘‘π‘₯) 𝑔(𝑒)𝑑𝑒

= ∫ ∫ 𝑓(π‘₯) 𝑔(𝑒)π‘’βˆ’π‘–βŸ¨π‘’,π‘₯βŸ©π‘‘π‘’π‘‘π‘₯

= ∫ 𝑓(π‘₯)𝑔(π‘₯)𝑑π‘₯(2πœ‹)𝑑

= (2πœ‹)π‘‘βŸ¨π‘“, π‘”βŸ©πΏ2

Now for the general case we use Gaussian as a mollifier. Consider

π‘“πœŽ = 𝑓 βˆ— 𝐺𝜎

π‘”πœŽ = 𝑔 βˆ— 𝐺𝜎

and based on results and computations before,π‘“πœŽ = 𝑓 β‹… 𝐺𝜎 = π‘“π‘’βˆ’πœŽβ€–π‘’β€–2/2.

As β€– π‘“β€–βˆž ≀ ‖𝑓‖1, π‘“πœŽ ∈ 𝐿1(R𝑑). Thus π‘“πœŽ ∈ 𝐿2(R𝑑) and β€– π‘“πœŽβ€–22 = (2πœ‹)π‘‘β€–π‘“πœŽβ€–2

2. Butby Gaussian approximation we know that π‘“πœŽ β†’ 𝑓 in 𝐿2(R𝑑) as 𝜎 β†’ 0. Henceβ€–π‘“πœŽβ€–2 β†’ ‖𝑓‖2. Then

β€– π‘“πœŽβ€–22 = β€– 𝑓 β‹… πΊπœŽβ€–2

2 = ∫ | 𝑓(𝑒)|2π‘’βˆ’πœŽβ€–π‘’β€–2/2𝑑𝑒 β†’ ‖𝑓‖22

as 𝜎 β†’ 0 by monotone convergence theorem. Thus

β€– 𝑓‖22 = (2πœ‹)𝑑‖𝑓‖2

2.

For 2,⟨ π‘“πœŽ, π‘”πœŽβŸ© = ∫ 𝑓 π‘”π‘’βˆ’πœŽβ€–π‘’β€–2𝑑𝑒 β†’ ∫ 𝑓 𝑔𝑑𝑒

as 𝜎 β†’ 0 by dominated convergence theorem as 𝑓 𝑔 ∈ 𝐿1. The result followsfrom Gaussian approximation.

For 3, 𝐿1(R𝑑)∩𝐿2(R𝑑) is dense in 𝐿2(R𝑑) because it contains 𝐢𝑐(R𝑑). Thenextend by completeness: given 𝑓 ∈ 𝐿2(R𝑑), pick a sequence 𝑓𝑛 ∈ 𝐿1(R𝑑) ∩𝐿2(R𝑑) such that 𝑓𝑛 β†’ 𝑓 in 𝐿2(R𝑑). Then define

ℱ𝑓 = limπ‘›β†’βˆž

ℱ𝑓𝑛.

The limit exists as 𝐿2(R𝑑) is complete. β„± is well-defined as

‖ℱ𝑓𝑛 βˆ’ β„±π‘“π‘šβ€–2 = ‖𝑓𝑛 βˆ’ π‘“π‘šβ€–2

by 1. Finally,‖ℱ𝑓‖2 = ‖𝑓‖2

for all 𝑓 ∈ 𝐿2(R𝑑) andβ„± ∘ β„±(𝑓) = 𝑓

for all 𝑓 such that 𝑓, 𝑓 ∈ 𝐿1(R𝑑). Thus by continuity this holds for 𝐿2(R𝑑).

63

Page 65: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

11 Gaussians

11 Gaussians

Definition (Gaussian). An R𝑑-valued random variable 𝑋 is called Gaussianif for all 𝑒 ∈ R𝑑, βŸ¨π‘‹, π‘’βŸ© is Gaussian, namely its law has the form 𝒩(π‘š, 𝜎2)for some π‘š ∈ R, 𝜎 > 0.

Proposition 11.1. The law of a Gaussian vector 𝑋 = (𝑋1, … , 𝑋𝑑) ∈ R𝑑

is uniquely determined by

1. its mean E𝑋 = (E𝑋1, … ,E𝑋𝑑),

2. its covariance matrix (Cov(𝑋𝑖, 𝑋𝑗))𝑖𝑗 where

Cov(𝑋𝑖, 𝑋𝑗) = E((𝑋𝑖 βˆ’ E𝑋𝑖)(𝑋𝑗 βˆ’ E𝑋𝑗)).

Proof. If 𝑑 = 1 then this just says that it is determined by its mean π‘š andcovariance 𝜎2, which is obviously true. For 𝑑 > 1, compute the characteristicfunction

πœ‡π‘‹(𝑒) = E(π‘’π‘–βŸ¨π‘‹,π‘’βŸ©)

but by assumption βŸ¨π‘‹, π‘’βŸ© is Gaussian in 𝑑 = 1 so its law is determined

1. the mean E(βŸ¨π‘‹, π‘’βŸ©) = ⟨E𝑋, π‘’βŸ©,

2. the variance VarβŸ¨π‘‹, π‘’βŸ©. But

VarβŸ¨π‘‹, π‘’βŸ© = E((βŸ¨π‘‹, π‘’βŸ© βˆ’ EβŸ¨π‘‹, π‘’βŸ©)2) = βˆ‘π‘–,𝑗

𝑒𝑖𝑒𝑗 Cov(𝑋𝑖, 𝑋𝑗).

In particular this shows that (Cov(𝑋𝑖, 𝑋𝑗))𝑖𝑗 is a non-negative semidefinitesymmetric matrix.

Proposition 11.2. If 𝑋 is a Gaussian vector then exists 𝐴 ∈ ℳ𝑑(R), 𝑏 ∈ R𝑑

such that 𝑋 has the same law as 𝐴𝑁 + 𝑏 where 𝑁 = (𝑁1, … , 𝑁𝑑), (𝑁𝑖)𝑑𝑖=1

are iid. 𝒩(0, 1).

Proof. Take 𝐴 such that

π΄π΄βˆ— = (Cov(𝑋𝑖, 𝑋𝑗))𝑖𝑗

where π΄βˆ— is the adjoint/transpose of 𝐴, and

𝑏 = (E𝑋1, … ,E𝑋𝑑).

Check that for all 𝑒 ∈ R𝑑,

E(βŸ¨π‘‹, π‘’βŸ©) = βŸ¨π‘, π‘’βŸ©Var(βŸ¨π‘‹, π‘’βŸ©) = βŸ¨π΄π΄βˆ—π‘’, π‘’βŸ© = β€–π΄βˆ—π‘’β€–2

2 = Var(βŸ¨π΄π‘ + 𝑏, π‘’βŸ©)

64

Page 66: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

11 Gaussians

Proposition 11.3. If (𝑋1, … , 𝑋𝑑) is a Gaussian vector then TFAE:

1. 𝑋𝑖’s are independent.

2. 𝑋𝑖’s are pairwise independent.

3. (Cov(𝑋𝑖, 𝑋𝑗))𝑖𝑗 is a diagonal matrix.

Proof. 1 ⟹ 2 ⟹ 3 is obvious. 3 ⟹ 1 as we can choose 𝐴 to be diagonal.Thus 𝑋 has the same law as (π‘Ž1𝑁1, … , π‘Žπ‘‘π‘π‘‘) + 𝑏.

Theorem 11.4 (central limit theorem). Let (𝑋𝑖)𝑖β‰₯1 be R𝑑-valued iid. ran-dom variables with law πœ‡. Assume they have second moment, i.e. E(‖𝑋1β€–2) <∞. Let 𝐦 = E(𝑋1) ∈ R𝑑 and

π‘Œπ‘› = 𝑋1 + β‹― + 𝑋𝑛 βˆ’ 𝑛 β‹… π¦βˆšπ‘›

.

Then π‘Œπ‘› converges in law to a central Gaussian on R𝑑 with law 𝒩(0, 𝐾)where

𝐾𝑖𝑗 = (Cov(𝑋1))𝑖𝑗 = [∫R𝑑

(π‘₯𝑖 βˆ’ E(𝑋1))(π‘₯𝑗 βˆ’ E(𝑋1))π‘‘πœ‡(π‘₯)]𝑖𝑗

.

Proof. The proof is an application of LΓ©vy criterion. Need to show πœ‡π‘Œπ‘›(𝑒) β†’

πœ‡π‘Œ(𝑒) as 𝑛 β†’ ∞ for all 𝑒, where π‘Œ ∼ 𝒩(0, 𝐾). As

πœ‡π‘Œπ‘›(𝑒) = E(π‘’π‘–βŸ¨π‘Œπ‘›,π‘’βŸ©),

this is equivalent to show that for all 𝑒, βŸ¨π‘Œπ‘›, π‘’βŸ© converges in law to βŸ¨π‘Œ , π‘’βŸ©. But

βŸ¨π‘Œπ‘›, π‘’βŸ© = βŸ¨π‘‹1, π‘’βŸ© + β‹― + βŸ¨π‘‹π‘›, π‘’βŸ© βˆ’ π‘›βŸ¨π¦, π‘’βŸ©βˆšπ‘›

so we reduce the problem to 1-dimension case. By rescaling wlog E(𝑋1) =0,E(𝑋2

1) = 1.Now

πœ‡π‘Œπ‘›(𝑒) = E(π‘’π‘–π‘’π‘Œπ‘›)

= E(exp(𝑖𝑒𝑋1 + β‹― + π‘‹π‘›βˆšπ‘›

))

=𝑛

βˆπ‘–=1

E(exp(𝑖𝑒 π‘‹π‘–βˆšπ‘›

))

= (E(exp(𝑖𝑒 π‘‹π‘–βˆšπ‘›

)))𝑛

= ( πœ‡( π‘’βˆšπ‘›

))𝑛

65

Page 67: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

11 Gaussians

But E(𝑋1) = 0,E(𝑋21) = 1 so we can differentiate πœ‡ under the integral sign

πœ‡(𝑒) = ∫R

𝑒𝑖𝑒π‘₯π‘‘πœ‡(π‘₯)

𝑑𝑑𝑒

πœ‡(𝑒) = ∫R

𝑖π‘₯𝑒𝑖𝑒π‘₯π‘‘πœ‡(π‘₯) = 𝑖E(𝑋1)

𝑑2

𝑑𝑒2 πœ‡(𝑒) = ∫R

βˆ’π‘₯2𝑒𝑖𝑒π‘₯π‘‘πœ‡(π‘₯) = βˆ’E(𝑋21)

Taylor expand πœ‡ around 0 to 2nd order,

πœ‡(𝑒) = πœ‡(0) + 𝑒 πœ‡β€²(0) + 𝑒2

2πœ‡β€³(𝑒) + π‘œ(𝑒2)

= 1 + 0 β‹… 𝑒 βˆ’ 𝑒2

2+ π‘œ(𝑒2)

soπœ‡π‘Œπ‘›

(𝑒) = (1 βˆ’ 𝑒2

2𝑛+ π‘œ(𝑒2

𝑛))𝑛 β†’ π‘’βˆ’π‘’2/2 = 𝑔(𝑒)

as 𝑛 β†’ ∞ where 𝑔 is the law of π‘Œ.

66

Page 68: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

12 Ergodic theory

12 Ergodic theoryLet (𝑋, π’œ, πœ‡) be a measure space. Let 𝑇 ∢ 𝑋 β†’ 𝑋 be an π’œ-measurable map. Weare interested in the trajectories of 𝑇 𝑛π‘₯ for 𝑛 β‰₯ 0 and their statistical behaviour.In particular we are interested in those 𝑇 preserving measure πœ‡.

Definition (measure-preserving). 𝑇 ∢ 𝑋 β†’ 𝑋 is measure-preserving ifπ‘‡βˆ—πœ‡ = πœ‡. (𝑋, π’œ, πœ‡, 𝑇 ) is called a measure-preserving dynamical system.

Definition (invariant function, invariant set, invariant 𝜎-algebra).

β€’ A measurable function 𝑓 ∢ 𝑋 β†’ R is called 𝑇-invariant if 𝑓 = 𝑓 ∘ 𝑇.

β€’ A set 𝐴 ∈ π’œ is 𝑇-invariant if 1𝐴 is 𝑇-invariant.

‒𝒯 = {𝐴 ∈ π’œ ∢ 𝐴 is 𝑇-invariant}

is called the 𝑇-invariant 𝜎-algebra.

Lemma 12.1. 𝑓 is 𝑇-invariant if and only if 𝑓 is 𝒯-measurable.

Proof. Indeed for all 𝑑 ∈ R,

{π‘₯ ∈ 𝑋 ∢ 𝑓(π‘₯) < 𝑑} = {π‘₯ ∈ 𝑋 ∢ 𝑓 ∘ 𝑇 (π‘₯) < 𝑑} = 𝑇 βˆ’1({π‘₯ ∈ 𝑋 ∢ 𝑓(π‘₯) < 𝑑}).

Definition (ergodic). 𝑇 is ergodic with respect to πœ‡, or that πœ‡ is ergodicwith respect to 𝑇, if for all 𝐴 ∈ 𝒯, πœ‡(𝐴) = 0 or πœ‡(𝐴𝑐) = 0.

This condition asserts that 𝒯 is trivial, i.e. its elements are either null orconull.

Lemma 12.2. 𝑇 is ergodic with respect to πœ‡ if and only if every invariantfunction 𝑓 is almost everywhere constant.

Proof. Exercise.

Example.

1. Let 𝑋 be a finite space, 𝑇 ∢ 𝑋 β†’ 𝑋 a map and πœ‡ = # the countingmeasure, then 𝑇 is measure preserving is equivalent to 𝑇 being a bijection,and 𝑇 is ergodic is equivalent to there does not exists a partition 𝑋 =𝑋1 βˆͺ 𝑋2 such that both 𝑋1 and 𝑋2 are 𝑇-invariant, which is equivalent tofor all π‘₯, 𝑦 ∈ 𝑋, there exists 𝑛 such that 𝑇 𝑛π‘₯ = 𝑦.

2. Let 𝑋 = R𝑑/Z𝑑, π’œ the Borel 𝜎-algebra and πœ‡ the Lebesgue measure.Given π‘Ž ∈ R𝑑, translation π‘‡π‘Ž ∢ π‘₯ ↦ π‘₯ + π‘Ž is measure-preserving. π‘‡π‘Žis ergodic with respect to πœ‡ if and only if (1, π‘Ž1, … , π‘Žπ‘‘), where π‘Žπ‘–β€™s arecoordinates of π‘Ž, are linearly independent. See example sheet. (hint:Fourier transform)

67

Page 69: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

12 Ergodic theory

3. Let 𝑋 = R/Z and again π’œ Borel 𝜎-algebra and πœ‡ the Lebesgue mea-sure. The doubling map 𝑇 ∢ π‘₯ ↦ 2π‘₯ βˆ’ ⌊2π‘₯βŒ‹ is ergodic with respect toπœ‡. (hint: again consider Fourier coefficeints). Intuitively in the graphof this function the preimage πœ€ is two segments each of length πœ€/2, someasure-preserving.

4. Furstenberg conjecture: every ergodic measure πœ‡ on R/Z invariant under𝑇2, 𝑇3 must be either Lebesgue or finitely supported.

12.1 The canonical modelLet (𝑋𝑛)𝑛β‰₯1 be an R𝑑-valued stochastic process on (Ξ©, β„±,P). Let 𝑋 = (R𝑑)Nand define the sample path map

Ξ¦ ∢ Ξ© β†’ π‘‹πœ” ↦ (𝑋𝑛(πœ”))𝑛β‰₯1

Let

𝑇 ∢ 𝑋 β†’ 𝑋(π‘₯𝑛)𝑛β‰₯1 ↦ (π‘₯𝑛+1)𝑛β‰₯1

be the shift map. Let π‘₯𝑛 ∢ 𝑋 β†’ R𝑑 be the 𝑛th coordinate function and letπ’œ = 𝜎(π‘₯𝑛 ∢ 𝑛 β‰₯ 1).

Note. π’œ is the infinite product 𝜎-algebra ℬ(R𝑑)βŠ—N of ℬ(R𝑑)N.

Let πœ‡ = Ξ¦βˆ—P, a probability measure on (𝑋, π’œ). This πœ‡ is called the law ofthe process (𝑋𝑛)𝑛β‰₯1. Now (𝑋, π’œ, πœ‡, 𝑇 ) is called the canonical model associatedto (𝑋𝑛)𝑛β‰₯1.

Proposition 12.3 (stationary process). TFAE:

1. (𝑋, π’œ, πœ‡, 𝑇 ) is measure-preserving.

2. For all π‘˜ β‰₯ 1, the law of (𝑋𝑛, 𝑋𝑛+1, … , 𝑋𝑛+π‘˜) on (R𝑑)π‘˜ is independentof 𝑛.

In this case we say that (𝑋𝑛)𝑛β‰₯1 is a stationary process.

Proof.

β€’ 1 ⟹ 2: πœ‡ = π‘‡βˆ—πœ‡ implies πœ‡ = 𝑇 π‘›βˆ— πœ‡ for all πœ‡ and this says law of (𝑋𝑖)𝑖β‰₯1

is the same as that of (𝑋𝑖+𝑛)𝑖β‰₯1.

β€’ 1 ⟸ 2: πœ‡ and 𝑇 π‘›βˆ— πœ‡ agree on cylinders 𝐴 Γ— (R𝑑)N\𝐹 for 𝐹 βŠ† N finite,

𝐴 ∈ ℬ((R𝑑)𝐹).

In some sense ergodic system is the study of stationary process.

68

Page 70: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

12 Ergodic theory

Proposition 12.4 (Bernoulli shift). If (𝑋𝑛)𝑛β‰₯1 are iid. then (𝑋, π’œ, πœ‡, 𝑇 )is ergodic. It is called the Bernoulli shift associated to the law 𝜈 of 𝑋1. Wehave

πœ‡ = πœˆβŠ—N.

Proof. Claim that Ξ¦βˆ’1(𝒯) βŠ† π’ž, the tail 𝜎-algebra of (𝑋𝑛)𝑛β‰₯1. But Kolmogorov0-1 law says that if 𝐴 ∈ 𝒯 then P(Ξ¦βˆ’1(𝐴)) = 0 or 1, so πœ‡(𝐴) = 0 or 1, thus πœ‡is 𝑇-ergodic.

Given 𝐴 ∈ 𝒯, 𝑇 βˆ’1𝐴 = 𝐴 so

Ξ¦βˆ’1(𝐴) = {πœ” ∈ Ξ© ∢ (𝑋𝑛(πœ”))𝑛β‰₯1 ∈ 𝐴}= {πœ” ∈ Ξ© ∢ (𝑋𝑛(πœ”))𝑛β‰₯1 ∈ 𝑇 βˆ’1𝐴}= {πœ” ∈ Ξ© ∢ (𝑋𝑛+1(πœ”))𝑛β‰₯1 ∈ 𝐴}= {πœ” ∈ Ξ© ∢ (𝑋𝑛+π‘˜(πœ”))𝑛β‰₯1 ∈ 𝐴} for all π‘˜βˆˆ 𝜎(π‘‹π‘˜, π‘‹π‘˜+1, … )

for πœ‡-almost every π‘₯.

Theorem 12.5 (von Neumann mean ergodic theorem). Let (𝑋, π’œ, πœ‡, 𝑇 )be a measure-preserving system. Let 𝑓 ∈ 𝐿2(𝑋, π’œ, πœ‡). Then the ergodicaverage

𝑆𝑛𝑓 = 1𝑛

π‘›βˆ’1βˆ‘π‘–=0

𝑓 ∘ 𝑇 𝑖

converges in 𝐿2 to 𝑓, a 𝑇-invariant function. In fact 𝑓 is the orthogonalprojection of 𝑓 onto 𝐿2(𝑋, 𝒯, πœ‡).

The intuition is as follow: if 𝑓 is the indicator function of a set, then 𝑆𝑛𝑓 isexactly the average of the time each orbit spending in 𝐴.

Proof. Hilbert space argument. Let 𝐻 = 𝐿2(𝑋, π’œ, πœ‡) and define

π‘ˆ ∢ 𝐻 β†’ 𝐻𝑓 ↦ 𝑓 ∘ 𝑇

which is an isometry: because πœ‡ is 𝑇-invariant, ∫ |𝑓 ∘ 𝑇 |2π‘‘πœ‡ = ∫ |𝑓|2π‘‘πœ‡. Thenby Riesz representation theorem it has an adjoint

π‘ˆ βˆ— ∢ 𝐻 β†’ 𝐻π‘₯ ↦ π‘ˆ βˆ—π‘₯

which satisfies βŸ¨π‘ˆβˆ—π‘₯, π‘¦βŸ© = ⟨π‘₯, π‘ˆπ‘¦βŸ© for all 𝑦 ∈ 𝐻. Let

π‘Š = {πœ‘ βˆ’ πœ‘ ∘ 𝑇 ∢ πœ‘ ∈ 𝐻}

be the coboundaries. Let 𝑓 ∈ π‘Š. Then

𝑆𝑛𝑓 = 1𝑛

π‘›βˆ’1βˆ‘π‘–=0

(πœ‘ ∘ 𝑇 𝑖 βˆ’ πœ‘ ∘ 𝑇 𝑖+1) = πœ‘ βˆ’ πœ‘ ∘ 𝑇 𝑛

𝑛→ 0

69

Page 71: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

12 Ergodic theory

as 𝑛 β†’ ∞.Let 𝑓 ∈ π‘Š then again 𝑆𝑛𝑓 β†’ 0 because for all πœ€ exists 𝑔 ∈ π‘Š such that

‖𝑓 βˆ’ 𝑔‖ < πœ€. Then

‖𝑆𝑛𝑓 βˆ’ 𝑆𝑛𝑔‖ = ‖𝑆𝑛(𝑓 βˆ’ 𝑔)β€– ≀ ‖𝑓 βˆ’ 𝑔‖ ≀ πœ€

so lim sup𝑛‖𝑆𝑛𝑓‖ ≀ πœ€.Have 𝐻 = π‘Š βŠ• π‘ŠβŸ‚ and π‘ŠβŸ‚ = π‘Š βŸ‚. Claim π‘Š βŸ‚ is exactly the 𝑇-invariant

functions. The theorem then follows because if 𝑓 ∘ 𝑇 = 𝑓 then 𝑆𝑛𝑓 = 𝑓 for all 𝑓.

Proof of claim.

π‘Š βŸ‚ = {𝑔 ∈ 𝐻 ∢ βŸ¨π‘”, πœ‘ βˆ’ π‘ˆπœ‘βŸ© = 0 for all πœ‘ ∈ 𝐻}= {𝑔 ∢ βŸ¨π‘”, πœ‘βŸ© = βŸ¨π‘”, π‘ˆπœ‘βŸ© for all πœ‘}= {𝑔 ∢ βŸ¨π‘”, πœ‘βŸ© = βŸ¨π‘ˆ βˆ—π‘”, πœ‘βŸ© for all πœ‘}= {𝑔 ∢ π‘ˆ βˆ—π‘” = 𝑔}= {𝑔 ∢ π‘ˆπ‘” = 𝑔}

where the last equality is by

β€–π‘ˆπ‘” βˆ’ 𝑔‖2 = 2‖𝑔‖2 βˆ’ 2 ReβŸ¨π‘”, π‘ˆπ‘”βŸ© = 2‖𝑔‖2 βˆ’ 2 ReβŸ¨π‘ˆβˆ—π‘”, π‘”βŸ©,

and this shows that π‘Š βŸ‚ are exactly 𝑇-invariant functions.

In fact we can do better:

Theorem 12.6 (Birkhoff (pointwise) ergodic theorem). Let (𝑋, π’œ, πœ‡, 𝑇 ) bea measure-preserving system. Assume πœ‡ is finite (actually 𝜎-finite suffices)and let 𝑓 ∈ 𝐿1(𝑋, π’œ, πœ‡). Then

𝑆𝑛𝑓 = 1𝑛

π‘›βˆ’1βˆ‘π‘–=0

𝑓 ∘ 𝑇 𝑖

converges πœ‡-almost everywhere to a 𝑇-invariant function 𝑓 ∈ 𝐿1. Moreover𝑆𝑛𝑓 β†’ 𝑓 in 𝐿1.

Corollary 12.7 (strong law of large numbers). Let (𝑋𝑛)𝑛β‰₯1 be a sequenceof iid. random variables. Assume E(|𝑋1|) < ∞. Let

𝑆𝑛 =𝑛

βˆ‘π‘˜=1

π‘‹π‘˜,

then 1𝑛 𝑆𝑛 converges almost surely to E(𝑋1).

Proof. Let (𝑋, π’œ, πœ‡, 𝑇 ) be the canonical model associated to (𝑋𝑛)𝑛β‰₯1, where𝑋 = RN, π’œ = ℬ(R)βŠ—N, 𝑇 the shift operator and πœ‡ = πœˆβŠ—N where 𝜈 is the law of𝑋1. It is a Bernoulli shift. Let

𝑓 ∢ 𝑋 β†’ Rπ‘₯ ↦ π‘₯1

70

Page 72: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

12 Ergodic theory

the first coodinate. Then 𝑓 ∘ 𝑇 𝑖(π‘₯) = π‘₯𝑖+1 so

1𝑛

(𝑋1 + β‹― + 𝑋𝑛)(πœ”) = 𝑆𝑛𝑓(π‘₯)

where π‘₯ = (𝑋𝑛(πœ”))𝑛β‰₯1. Hence by Birkhoff ergodic theorem

1𝑛

(𝑋1 + β‹― + 𝑋𝑛) β†’ 𝑓 = ∫ π‘“π‘‘πœ‡ = ∫ π‘₯1π‘‘πœ‡ = E(𝑋1)

almost surely.

Remark. If 𝑇 is ergodic then 𝑓 is almost everywhere constant. Hence 𝑓 =∫ π‘“π‘‘πœ‡.

Lemma 12.8 (maximal ergodic lemma). Let 𝑓 ∈ 𝐿1(𝑋, π’œ, πœ‡) and 𝛼 ∈ R.Let

𝐸𝛼 = {π‘₯ ∈ 𝑋 ∢ sup𝑛β‰₯1

𝑆𝑛𝑓(π‘₯) > 𝛼}

thenπ›Όπœ‡(𝐸𝛼) ≀ ∫

𝐸𝛼

π‘“π‘‘πœ‡.

Lemma 12.9 (maximal inequality). Let

𝑓0 = 0

𝑓𝑛 = 𝑛𝑆𝑛𝑓 =π‘›βˆ’1βˆ‘π‘–=0

𝑓 ∘ 𝑇 𝑖 𝑛 β‰₯ 1

Let𝑃𝑁 = {π‘₯ ∈ 𝑋 ∢ max

0≀𝑛≀𝑁𝑓𝑛(π‘₯) > 0}.

Then∫

𝑃𝑁

π‘“π‘‘πœ‡ β‰₯ 0.

Proof of maximal inequality. Set 𝐹𝑁 = max0≀𝑛≀𝑁 𝑓𝑛. Observe that for all 𝑛 ≀𝑁, 𝐹𝑁 β‰₯ 𝑓𝑛 and hence

𝐹𝑁 ∘ 𝑇 + 𝑓 β‰₯ 𝑓𝑛 ∘ 𝑇 + 𝑓 = 𝑓𝑛+1.

Now if π‘₯ ∈ 𝑃𝑛 then

𝐹𝑁(π‘₯) ≀ max0≀𝑛≀𝑁

𝑓𝑛+1 ≀ 𝐹𝑁 ∘ 𝑇 + 𝑓

Integrate to get∫

𝑃𝑁

πΉπ‘π‘‘πœ‡ ≀ βˆ«π‘ƒπ‘

𝐹𝑁 ∘ 𝑇 π‘‘πœ‡ + βˆ«π‘ƒπ‘

π‘“π‘‘πœ‡.

Note that 𝐹𝑁(π‘₯) = 0 if π‘₯ βˆ‰ 𝑃𝑁 because 𝑓0 = 0 so

βˆ«π‘ƒπ‘

𝐹𝑁 = βˆ«π‘‹

𝐹𝑁 ≀ βˆ«π‘‹

𝐹𝑁 ∘ 𝑇 + βˆ«π‘ƒπ‘›

𝑓.

71

Page 73: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

12 Ergodic theory

As πœ‡ is 𝑇-invariant, ∫ 𝐹𝑁 ∘ 𝑇 = ∫ 𝐹𝑁 so

βˆ«π‘ƒπ‘

π‘“π‘‘πœ‡ β‰₯ 0.

Proof of maximal ergodic lemma. Apply the maximal inequality to 𝑔 = 𝑓 βˆ’ 𝛼.Observe that

𝐸𝛼(𝑓) = ⋃𝑁β‰₯1

𝑃𝑁(𝑔)

and 𝑆𝑛𝑔 = 𝑆𝑛𝑓 βˆ’ 𝛼. Thus

βˆ«πΈπ›Ό(𝑓)

(𝑓 βˆ’ 𝛼)π‘‘πœ‡ β‰₯ 0,

which is equivalent toπ›Όπœ‡(𝐸𝛼) ≀ ∫

𝐸𝛼

π‘“π‘‘πœ‡.

Proof of Birkhoff (pointwise) ergodic theorem. Let

𝑓 = lim sup𝑛

𝑆𝑛𝑓

𝑓 = lim inf𝑛

𝑆𝑛𝑓

Observe that 𝑓 = 𝑓 ∘ 𝑇 , 𝑓 ∘ 𝑇 = 𝑓: indeed

𝑆𝑛𝑓 ∘ 𝑇 = 1𝑛

(𝑓 ∘ 𝑇 + β‹― + 𝑓 ∘ 𝑇 𝑛) = 1𝑛

((𝑛 + 1)𝑆𝑛+1𝑓 βˆ’ 𝑓).

Need to show that 𝑓 = 𝑓 πœ‡-almost everywhere. This is equivalent to for all𝛼, 𝛽 ∈ Q, 𝛼 > 𝛽, the set

𝐸𝛼,𝛽(𝑓) = {π‘₯ ∈ 𝑋 ∢ 𝑓(π‘₯) < 𝛽, 𝑓(π‘₯) > 𝛼}

is πœ‡-null, as then{π‘₯ ∢ 𝑓(π‘₯) β‰  𝑓(π‘₯)} = ⋃

𝛼>𝛽𝐸𝛼,𝛽

is πœ‡-null by subadditivity. Observe that 𝐸𝛼,𝛽 is 𝑇-invariant. Apply the maximalergodic theorem to 𝐸𝛼,𝛽 to get

π›Όπœ‡(𝐸𝛼,𝛽(𝑓)) ≀ βˆ«πΈπ›Ό,𝛽

π‘“π‘‘πœ‡.

Duallyβˆ’π›½πœ‡(𝐸𝛼,𝛽(𝑓)) ≀ ∫

𝐸𝛼,𝛽

βˆ’π‘“π‘‘πœ‡

72

Page 74: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

12 Ergodic theory

soπ›Όπœ‡(𝐸𝛼,𝛽) ≀ π›½πœ‡(𝐸𝛼,𝛽).

But 𝛼 > 𝛽 so πœ‡(𝐸𝛼,𝛽) = 0.We have proved that the limit lim𝑛 𝑆𝑛𝑓 exists almost everywhere, which we

now define to be 𝑓, and left to show 𝑓 ∈ 𝐿1 and lim𝑛‖𝑆𝑛𝑓 βˆ’ 𝑓‖1 = 0. This is anapplication of Fatou’s lemma:

∫ |𝑓|π‘‘πœ‡ = ∫ lim inf𝑛

|𝑆𝑛𝑓|π‘‘πœ‡

≀ lim inf𝑛

∫ |𝑆𝑛𝑓|π‘‘πœ‡

≀ lim inf𝑛

‖𝑆𝑛𝑓‖1

≀ ‖𝑓‖1

so 𝑓 ∈ 𝐿1 where the last inequality is because

‖𝑆𝑛𝑓‖ ≀ 1𝑛

(‖𝑓‖1 + β‹― + ‖𝑓 ∘ 𝑇 π‘›βˆ’1β€–1) = ‖𝑓‖1.

Now to show ‖𝑆𝑛𝑓 βˆ’ 𝑓‖1 β†’ 0, we truncate 𝑓. Let 𝑀 > 0 and set πœ‘π‘€ = 𝑓1|𝑓|<𝑀.Note that

β€’ |πœ‘π‘€| ≀ 𝑀 so |π‘†π‘›πœ‘π‘€| ≀ 𝑀. Hence by dominated convergence theoremβ€–π‘†π‘›πœ‘π‘€ βˆ’ πœ‘π‘€β€–1 β†’ 0.

β€’ πœ‘π‘€ β†’ 𝑓 πœ‡-almost everywhere and also in 𝐿1 by dominated convergencetheorem.

Thus by Fatou’s lemma,

β€–πœ‘π‘€ βˆ’ 𝑓‖1 ≀ lim inf𝑛

β€–π‘†π‘›πœ‘π‘€ βˆ’ 𝑆𝑛𝑓‖1 ≀ β€–πœ‘π‘€ βˆ’ 𝑓‖1

Finally

‖𝑆𝑛𝑓 βˆ’ 𝑓‖1 ≀ ‖𝑆𝑛𝑓 βˆ’ π‘†π‘›πœ‘π‘€β€–1 + β€–π‘†π‘›πœ‘π‘€ βˆ’ πœ‘π‘€β€–1 + β€–πœ‘π‘€ βˆ’ 𝑓‖1

≀ ‖𝑓 βˆ’ πœ‘π‘€β€–1 + β€–π‘†π‘›πœ‘π‘€ βˆ’ πœ‘π‘€β€–1 + ‖𝑓 βˆ’ πœ‘π‘€β€–1

solim sup‖𝑆𝑛𝑓 βˆ’ 𝑓‖1 ≀ 2‖𝑓 βˆ’ πœ‘π‘€β€–1

for all 𝑀, so goes to 0 as 𝑀 β†’ ∞.

Remark.

1. The theorem holds if πœ‡ is only assumed to be 𝜎-finite.

2. The theorem holds if 𝑓 ∈ 𝐿𝑝 for 𝑝 ∈ [1, ∞). The 𝑆𝑛𝑓 β†’ 𝑓 in 𝐿𝑝.

73

Page 75: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

Index

𝐿𝑝-space, 47πœ‹-system, 12𝜎-algebra, 10

independence, 33invariant, 67tail, 36

Bernoulli shift, 69Birkhoff ergodic theorem, 70Boolean algebra, 2Borel 𝜎-algebra, 12Borel measurable, 19Borel-Cantelli lemma, 35

canonical model, 68CarathΓ©odory extension theorem,

14central limit theorem, 65characteristic function, 56coboundary, 69completion, 17conditional expectation, 54convergence

almost surely, 39in 𝐿1, 41in distribution, 39in mean, 41in measure, 39in probability, 39

convolution, 59covariance matrix, 64

density, 31, 52Dirac mass, 39distribution, 29distribution function, 29Dynkin lemma, 12

ergodic, 67expectation, 29

Fatou’s lemma, 23filtration, 36Fourier transform, 56Furstenberg conjecture, 68

Gaussian, 64Gaussian approximation, 60

Hermitian norm, 49Hilbert space, 49HΓΆlder inequality, 46

iid., 34independence, 33infinite product 𝜎-algebra, 35, 68inner product, 49integrable, 20integral with respect to a measure,

20invariant 𝜎-algebra, 67invariant function, 67invariant set, 67io., 35

Jensen inequality, 44

Kolmogorov 0 βˆ’ 1 law, 36, 69

law, 29law of large numbers, 38law of larger numbers, 70Lebesgue measure, 8Lebesgue’s dominated convergence

theorem, 23LΓ©vy criterion, 61

maximal ergodic lemma, 71maximal inequality, 71mean, 32, 64measurable, 18measurable space, 10measure, 10

absolutely continuous, 51finitely additive, 2singular, 51

measure-preserving, 67measure-preserving dynamical

system, 67Minkowski inequality, 45moment, 32monotone convergence theorem, 20

null set, 5

orthogonal projection, 50

Plancherel formula, 62probability measure, 29

74

Page 76: Probability and Measure - Aleph epsilonqk206.user.srcf.net/notes/probability_and_measure.pdfΒ Β· 1 Lebesguemeasure Proposition1.1.Letπ΅βŠ†R𝑑beabox.Letβ„°(𝐡)bethefamilyofelementary

Index

probability space, 29product 𝜎-algebra, 26

infinite, 35, 68product measure, 27

infinite, 35

Radon-Nikodym derivative, 52Radon-Nikodym theorem, 52random process, 36random variable, 29

independence, 33Riesz representation theorem, 51,

69

sample path map, 68simple function, 19

stationary process, 68stochastic process, 68strong law of large numbers, 38, 70

tail event, 36, 69Tonelli-Fubini theorem, 27

uniformly integrable, 41

variance, 32von Neumann mean ergodic

theorem, 69

weak convergence, 39

Young inequality, 46

75