
MEASURE AND INTEGRATION

Syafiq Johar

[email protected]

Contents

1 Riemann/Darboux Integration

2 Measure

2.1 Caratheodory Extension Theorem

2.2 Lebesgue Space

2.3 Borel Space

2.4 Cantor Set

3 Measurable Functions

4 Lebesgue Integration

4.1 Lebesgue Integration of Non-Negative Functions

4.2 Lebesgue Integration of General Functions

5 Convergence Theorems

6 Double Integrals

6.1 Product Measure

6.2 Theorems by Tonelli and Fubini

1 Riemann/Darboux Integration

The intuition behind the Riemann or Darboux integral is that we slice the subgraph of a function into strips and approximate the area of the subgraph from above and below by rectangles. These rectangles have a well-defined area, which is simply the product of their side lengths. The integral is defined to be the "limiting" area of these rectangles as we take finer and finer slices.

The first step in defining the Riemann integral is to define partitions and step functions.

Definition 1.1 (Partition). Let $[a, b]$ be an interval in $\mathbb{R}$. Then a partition $P$ of the interval $[a, b]$ is a finite sequence of points $a = x_0 < x_1 < x_2 < \cdots < x_n = b$. Each $(x_{i-1}, x_i]$ is called a subinterval of the partition.

Definition 1.2 (Refinement of partition). A partition $P'$ given by $a = x'_0 < x'_1 < x'_2 < \cdots < x'_m = b$ is called a refinement of the partition $P$ defined above if each $x_i$ is equal to $x'_j$ for some $j \in \{0, 1, \dots, m\}$.

An important remark here is that if $P_1$ and $P_2$ are two different partitions of $[a, b]$, then there exists a common refinement $P_3$ of both $P_1$ and $P_2$. It is obtained simply by taking the union of the partition points of the two partitions and arranging them in increasing order.
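As a small illustration (a sketch, not part of the original notes), a partition can be stored as an increasing list of exact rational points. The hypothetical helpers `common_refinement` and `is_refinement` below take the union of the points and sort it, exactly as described above.

```python
from fractions import Fraction


def common_refinement(p1, p2):
    """Common refinement of two partitions of the same interval [a, b].

    Each partition is a strictly increasing list x_0 < x_1 < ... < x_n.
    """
    if p1[0] != p2[0] or p1[-1] != p2[-1]:
        raise ValueError("partitions must share the endpoints a and b")
    # Union of the partition points, arranged in increasing order.
    return sorted(set(p1) | set(p2))


def is_refinement(fine, coarse):
    """True if every point of `coarse` is also a point of `fine`."""
    return set(coarse) <= set(fine)


P1 = [Fraction(0), Fraction(1, 2), Fraction(1)]
P2 = [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)]
P3 = common_refinement(P1, P2)  # the points 0, 1/3, 1/2, 2/3, 1
```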

Definition 1.3 (Step function). A function $\varphi : [a, b] \to \mathbb{R}$ is called a step function adapted to the partition $P$ if $\varphi$ is constant on each subinterval $(x_{i-1}, x_i]$.

Suppose that for each $i = 1, 2, \dots, n$ we have $\varphi(x) = c_i$ for $x \in (x_{i-1}, x_i]$. We can express the step function $\varphi$ as the sum

$$\varphi = \sum_{i=1}^{n} c_i \mathbf{1}_{(x_{i-1}, x_i]},$$

where $\mathbf{1}_{(x_{i-1}, x_i]}$ is the indicator function of the interval $(x_{i-1}, x_i]$. We know how to "integrate" these step functions: we sum the lengths of the subintervals of the partition, each weighted by the value of $\varphi$ on that subinterval. More precisely, this is defined as:

Definition 1.4. Let $\varphi$ be a step function adapted to a partition $P$. Suppose that for each $i = 1, 2, \dots, n$ we have $\varphi(x) = c_i$ for $x \in (x_{i-1}, x_i]$. Then we define

$$I(\varphi, P) = \sum_{i=1}^{n} c_i |x_i - x_{i-1}|.$$
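Definition 1.4 can be evaluated directly. The sketch below (the helper names `step_integral` and `restrict_to_refinement` are ours) stores a step function as the list of its values $c_1, \dots, c_n$ on the subintervals of $P$, computes $I(\varphi, P)$, and re-expresses the same step function on a refinement, so one can check that the value does not change.

```python
from fractions import Fraction


def step_integral(points, values):
    """I(phi, P) = sum_i c_i |x_i - x_{i-1}| for a step function adapted to P.

    `points` is the partition x_0 < x_1 < ... < x_n and `values[i - 1]` is the
    constant value c_i of phi on the subinterval (x_{i-1}, x_i].
    """
    if len(values) != len(points) - 1:
        raise ValueError("need exactly one value per subinterval")
    return sum(c * abs(right - left)
               for c, left, right in zip(values, points, points[1:]))


def restrict_to_refinement(points, values, finer_points):
    """Values of the same step function on the subintervals of a refinement."""
    new_values = []
    for right in finer_points[1:]:
        # The finer subinterval ending at `right` lies inside (x_{k-1}, x_k]
        # for the smallest k with right <= x_k.
        k = next(j for j in range(1, len(points)) if right <= points[j])
        new_values.append(values[k - 1])
    return new_values


P = [Fraction(0), Fraction(1), Fraction(3)]
c = [2, 5]  # phi = 2 on (0, 1] and phi = 5 on (1, 3]
P_fine = [Fraction(0), Fraction(1, 2), Fraction(1), Fraction(2), Fraction(3)]
print(step_integral(P, c))                                          # 12
print(step_integral(P_fine, restrict_to_refinement(P, c, P_fine)))  # 12
```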

Note that if $P'$ is a refinement of $P$, then $\varphi$ is also adapted to $P'$ and we have $I(\varphi, P) = I(\varphi, P')$.

How do we adapt this to general functions? We define upper and lower Darboux sums. For a bounded function $f : [a, b] \to \mathbb{R}$ and a partition $P$, we define

$$m_i = \inf_{x \in (x_{i-1}, x_i]} f(x), \qquad M_i = \sup_{x \in (x_{i-1}, x_i]} f(x).$$

From this, we approximate the function $f$ from below and from above by step functions adapted to the partition $P$, called the lower and upper approximations respectively:

$$\underline{f}(P) = \sum_{i=1}^{n} m_i \mathbf{1}_{(x_{i-1}, x_i]}, \qquad \overline{f}(P) = \sum_{i=1}^{n} M_i \mathbf{1}_{(x_{i-1}, x_i]}.$$

Note that, given a partition $P$, we have the pointwise ordering $\underline{f}(P) \le f \le \overline{f}(P)$ on $(a, b]$. Furthermore, if $P'$ is a refinement of $P$, then $\underline{f}(P') \ge \underline{f}(P)$ and $\overline{f}(P') \le \overline{f}(P)$; that is, the finer the partition, the larger the lower approximation and the smaller the upper approximation, pointwise.

From this, we define the lower and upper Darboux sums as:

$$L_{f,P} = I(\underline{f}(P), P) = \sum_{i=1}^{n} m_i |x_i - x_{i-1}|, \qquad U_{f,P} = I(\overline{f}(P), P) = \sum_{i=1}^{n} M_i |x_i - x_{i-1}|.$$

By the same argument as for the pointwise approximations, the finer the partition, the larger the lower Darboux sum and the smaller the upper Darboux sum. Note that for a given partition $P$, we have $L_{f,P} \le U_{f,P}$ since $m_i \le M_i$ for every $i$. Finally, we define the lower and upper Darboux integrals by taking the supremum of the lower sums and the infimum of the upper sums over all possible partitions:

$$L_f = \sup\{L_{f,P} : P \text{ is a partition of } [a, b]\}, \qquad U_f = \inf\{U_{f,P} : P \text{ is a partition of } [a, b]\}.$$

Note that if $P_1$ and $P_2$ are two different partitions of $[a, b]$ and $P_3$ is their common refinement, we have

$$L_{f,P_1} \le L_{f,P_3} \le U_{f,P_3} \le U_{f,P_2},$$

which implies that the lower Darboux sum of any partition of $[a, b]$ is always less than or equal to the upper Darboux sum of any partition. Thus we must have $L_f \le U_f$. We call a function $f : [a, b] \to \mathbb{R}$ Darboux integrable, or simply integrable, if $L_f = U_f$, and we define its integral as

$$\int_a^b f(x)\, dx = L_f = U_f.$$
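For a continuous increasing function, the infimum and supremum over each subinterval $(x_{i-1}, x_i]$ are $f(x_{i-1})$ and $f(x_i)$, so the Darboux sums can be computed exactly. The sketch below assumes exactly this (continuity and monotonicity are not checked), uses uniform partitions and `fractions.Fraction` for exact arithmetic; the name `darboux_sums_increasing` is ours.

```python
from fractions import Fraction


def darboux_sums_increasing(f, a, b, n):
    """Lower and upper Darboux sums of f on the uniform partition of [a, b].

    Assumes (without checking) that f is continuous and increasing, so that
    m_i = f(x_{i-1}) and M_i = f(x_i) on each subinterval (x_{i-1}, x_i].
    """
    h = Fraction(b - a) / n
    points = [a + i * h for i in range(n + 1)]
    lower = sum(f(left) * h for left in points[:-1])
    upper = sum(f(right) * h for right in points[1:])
    return lower, upper


def square(x):
    return x * x


for n in (1, 2, 4, 8, 16):
    L, U = darboux_sums_increasing(square, 0, 1, n)
    print(n, L, U, U - L)
```

For $f(x) = x^2$ on $[0, 1]$ the gap $U_{f,P} - L_{f,P}$ equals $\frac{1}{n}$ for the uniform partition into $n$ subintervals, so the two sums squeeze the common value $L_f = U_f = \frac{1}{3}$.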

Remark 1.5. Technically, the above construction is called the Darboux integral. Riemann's construction instead considers a "tagged partition", in which for each subinterval of a partition $P$ a point (the tag) $p_i \in (x_{i-1}, x_i)$ is chosen. A Riemann sum is defined exactly like a Darboux sum, except that instead of building the step function from the infimum or supremum of $f$ on each subinterval, the step function takes the value $f(p_i)$ on the $i$-th subinterval, giving $\sum_{i=1}^{n} f(p_i) |x_i - x_{i-1}|$. This can be shown to be equivalent to Darboux's construction.
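A Riemann sum only needs one tag per subinterval. The sketch below (helper names are ours) uses the midpoints of a uniform partition as tags; for $f(x) = x^2$ on $[0, 1]$ the result always lies between the lower and upper Darboux sums of the same partition, since $m_i \le f(p_i) \le M_i$.

```python
from fractions import Fraction


def riemann_sum(f, points, tags):
    """Riemann sum  sum_i f(p_i) |x_i - x_{i-1}|  for a tagged partition."""
    if len(tags) != len(points) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = Fraction(0)
    for p, left, right in zip(tags, points, points[1:]):
        if not left < p < right:
            raise ValueError("each tag must lie inside its subinterval")
        total += f(p) * (right - left)
    return total


def uniform_partition(a, b, n):
    h = Fraction(b - a) / n
    return [a + i * h for i in range(n + 1)]


def midpoint_tags(points):
    return [(left + right) / 2 for left, right in zip(points, points[1:])]


def square(x):
    return x * x


P = uniform_partition(0, 1, 4)
R = riemann_sum(square, P, midpoint_tags(P))  # 21/64, between 7/32 and 15/32
```

With midpoint tags this sum equals $\frac{1}{3} - \frac{1}{12 n^2}$ for the uniform partition into $n$ subintervals, which tends to $\frac{1}{3}$.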

Proposition 1.6 (Properties of the Riemann/Darboux integral). Suppose that $f, g : [a, b] \to \mathbb{R}$ are integrable on $[a, b]$. Then:

1. The integral is linear, that is, for constants $\lambda, \mu$, we have $\int_a^b (\lambda f + \mu g)\, dx = \lambda \int_a^b f\, dx + \mu \int_a^b g\, dx$.

2. The functions $\min(f, g)$ and $\max(f, g)$ are also integrable.

3. If $f \le g$ pointwise, then $\int_a^b f\, dx \le \int_a^b g\, dx$. A direct corollary is $\left|\int_a^b f\, dx\right| \le \int_a^b |f|\, dx$.

The following results are together called the fundamental theorem of calculus:

Proposition 1.7 (Fundamental theorem of calculus).

1. Suppose that $h : (a, b) \to \mathbb{R}$ is integrable on $(a, b)$, and let $H : [a, b] \to \mathbb{R}$ be defined as

$$H(x) = \int_a^x h(y)\, dy.$$

Then $H$ is continuous on $[a, b]$. Furthermore, if $h$ is continuous at $c \in (a, b)$, then $H$ is differentiable at $c$ with $H'(c) = h(c)$.

2. Suppose that $H$ is continuous on $[a, b]$ and differentiable in $(a, b)$, and that its derivative $H'$ is integrable on $(a, b)$. Then

$$\int_a^b H'(y)\, dy = H(b) - H(a).$$

Proposition 1.8.

1. If f differs from an integrable function in finitely many points, then f is also integrable,

2. If f is continuous on [a, b], then it is integrable,

3. If f is monotone on [a, b], then it is integrable,

4. If f : (a, b)→ R is continuous and bounded on (a, b), then it is integrable on (a, b).

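There are some problems with taking limits under the Darboux/Riemann integral. For example, consider the function $g : [0, 1] \to \mathbb{R}$ defined as

$$g(x) = \begin{cases} 0 & \text{if } x \in \mathbb{Q}, \\ 1 & \text{if } x \notin \mathbb{Q}. \end{cases}$$

It differs from the integrable constant function $1$ only at the countably many rational points of $[0, 1]$. Enumerating these as $q_1, q_2, \dots$ and letting $g_n$ be $0$ at $q_1, \dots, q_n$ and $1$ elsewhere, each $g_n$ differs from $1$ at finitely many points and is therefore integrable by Proposition 1.8, and $g_n \to g$ pointwise. However, $g$ itself is not integrable: every subinterval of positive length contains both rational and irrational points, so regardless of the partition used we always have $L_{g,P} = 0$ and $U_{g,P} = 1$, and the lower and upper Darboux integrals are never equal.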

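Another example is the following: for $n = 1, 2, \dots$, let $f_n : [0, 1] \to \mathbb{R}$ be the sequence of functions

$$f_n(x) = \begin{cases} n & \text{if } x \in (0, \tfrac{1}{n}], \\ 0 & \text{otherwise}. \end{cases}$$

Each $f_n$ is integrable with integral $n \cdot \frac{1}{n} = 1$, while $f_n(x) \to 0$ for every $x \in [0, 1]$ (indeed $f_n(0) = 0$ for all $n$, and $f_n(x) = 0$ as soon as $n > 1/x$). However, this gives

$$\lim_{n \to \infty} \int_0^1 f_n\, dx = \lim_{n \to \infty} 1 = 1, \qquad \int_0^1 \lim_{n \to \infty} f_n\, dx = \int_0^1 0\, dx = 0.$$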
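The mismatch can also be seen numerically. In the sketch below (function names are ours), `f(n, x)` evaluates $f_n$ exactly: at any fixed point the values are eventually $0$, even though every $f_n$ has integral $n \cdot \frac{1}{n} = 1$, being a step function equal to $n$ on a subinterval of length $\frac{1}{n}$ and $0$ elsewhere.

```python
from fractions import Fraction


def f(n, x):
    """f_n(x) = n for x in (0, 1/n], and 0 elsewhere in [0, 1]."""
    return n if 0 < x <= Fraction(1, n) else 0


def integral_f(n):
    """Integral of f_n via Definition 1.4: value n on a subinterval of length 1/n."""
    return n * Fraction(1, n)


x = Fraction(1, 7)
print([f(n, x) for n in range(1, 10)])         # [1, 2, 3, 4, 5, 6, 7, 0, 0]
print({integral_f(n) for n in range(1, 100)})  # every integral equals 1
```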

However, not all is lost. Under stronger hypotheses, we do have some limit/convergence results:

Proposition 1.9.

1. Suppose that a sequence of functions $f_n : [a, b] \to \mathbb{R}$ converges uniformly to a function $f$, that is, $f_n \xrightarrow{u} f$. If all the $f_n$ are integrable on $[a, b]$, then $f$ is also integrable with

$$\lim_{n \to \infty} \int_a^b f_n\, dx = \int_a^b \lim_{n \to \infty} f_n\, dx = \int_a^b f\, dx.$$

2. Suppose that $\varphi_n : [a, b] \to \mathbb{R}$ are integrable functions and $|\varphi_n| \le M_n$ for constants $M_n$ with $\sum_{n=1}^{\infty} M_n < \infty$. Then the sum $\sum_{n=1}^{\infty} \varphi_n$ is integrable and

$$\sum_{n=1}^{\infty} \int_a^b \varphi_n\, dx = \int_a^b \sum_{n=1}^{\infty} \varphi_n\, dx.$$

3. Suppose that $f_n : [a, b] \to \mathbb{R}$ is a sequence of functions that are continuously differentiable in $(a, b)$, with $f_n \xrightarrow{pw} f$ on $[a, b]$ and $f_n' \xrightarrow{u} g$, where $g : (a, b) \to \mathbb{R}$ is a bounded function. Then $f$ is differentiable and $f' = g$, that is,

$$\lim_{n \to \infty} f_n' = \left(\lim_{n \to \infty} f_n\right)'.$$

Propositions 1.8 and 1.9 are the main results of this kind for the Riemann/Darboux integral, and their hypotheses are quite restrictive. Even with these limitations, the Riemann/Darboux integral is good enough for some purposes.

However, there remains the limitation of not being able to take limits (and hence derivatives, infinite sums, ...) across the integral. This limitation stems from the fact that we approximate our functions by step functions built from finite partitions of the domain of integration. Finite partitions are important in the construction because the lengths of the subintervals make sense, and thus the integral makes sense. For example, the function $g$ which takes the value $0$ on the rationals and $1$ otherwise cannot be squeezed between step functions whose integrals are arbitrarily close.

If we enlarge the class of approximating functions to other functions which are "integrable" in some other sense, we have a better chance of integrating these more complicated functions. We just need to approximate them by "simple" functions of the form

$$\varphi = \sum_{i=1}^{n} c_i \mathbf{1}_{E_i},$$

where the $E_i$ are subsets of the domain to which we can assign sizes, just like the intervals. Then the approximating integral will be defined as

$$I(\varphi) = \sum_{i=1}^{n} c_i |E_i|,$$

where $|E_i|$ denotes the size of $E_i$. These sizes are called the measures of the sets $E_i$, and the sets $E_i$ will be called measurable sets.
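For instance, if each $E_i$ is a finite union of disjoint intervals, then $|E_i|$ is just the total length and $I(\varphi)$ can be computed directly. In the sketch below (names ours), each $E_i$ is assumed to be given as a list of pairwise disjoint intervals $(c, d]$, stored as pairs; overlapping input is not detected.

```python
from fractions import Fraction


def length(intervals):
    """Total length of a finite union of pairwise disjoint intervals (c, d]."""
    return sum((d - c for c, d in intervals), Fraction(0))


def simple_integral(terms):
    """I(phi) = sum_i c_i |E_i| for phi = sum_i c_i 1_{E_i}.

    `terms` is a list of pairs (c_i, E_i), where each E_i is assumed to be a
    list of pairwise disjoint intervals (c, d].
    """
    return sum((c * length(E) for c, E in terms), Fraction(0))


phi = [
    (2, [(Fraction(0), Fraction(1)), (Fraction(2), Fraction(3))]),
    (5, [(Fraction(1), Fraction(3, 2))]),
]
print(simple_integral(phi))  # 2*2 + 5*(1/2) = 13/2
```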


2 Measure

We first think about what a measure is. A measure should be defined on "certain subsets" of our domain, say $X$ (for demonstration, we shall consider $\mathbb{R}$), and take values in $[0, \infty]$. This is done by demanding that the measure satisfies certain reasonable properties. First we need to define the domain of the measure. What are these "certain subsets"? They form a subcollection of the power set of $X$. Is it possible for this collection to be the whole power set? We shall see that for some domains, the power set is too big for the measure to make reasonable "physical" sense. We denote by $2^X$ the power set of the set $X$, that is, the collection of all subsets of $X$.

Definition 2.1 (π-system). Let $X$ be a domain. A non-empty collection $\mathcal{P} \subseteq 2^X$ of subsets of $X$ is called a π-system if it is closed under finite intersections, that is, for any $A, B \in \mathcal{P}$, we have $A \cap B \in \mathcal{P}$.

We have seen an example of a π-system during the construction of the Riemann/Darboux integral. The collection of sets

$$\mathcal{J} = \{(c, d] : a \le c \le d \le b\}$$

of intervals in $[a, b]$ forms a π-system, since the intersection of two such intervals is again of this form (possibly empty, as $(c, c] = \emptyset$). We can "measure" the sets in this π-system by defining the length of an interval $(c, d]$ as the non-negative number $d - c \ge 0$. This length is called a content. Now we generalise this collection further by allowing finite unions and set differences.

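Definition 2.2 (Rings). Let $X$ be a domain. A non-empty collection $\mathcal{R} \subseteq 2^X$ of subsets of $X$ is called a ring if it is closed under finite unions and set differences, that is, for $A, B \in \mathcal{R}$, we have:

1. $A \cup B \in \mathcal{R}$,

2. $A \setminus B \in \mathcal{R}$.

These also imply that $\emptyset = A \setminus A \in \mathcal{R}$ and $A \cap B = A \setminus (A \setminus B) \in \mathcal{R}$.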

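Here are two facts about rings:

Lemma 2.3.

1. The intersection $\bigcap_{\alpha} \mathcal{R}_\alpha$ of any family of rings $\mathcal{R}_\alpha$ of $X$ is also a ring.

2. Let $\mathcal{S} \subseteq 2^X$ be an arbitrary non-empty collection of subsets of $X$. Then there exists exactly one ring $\mathcal{R}(\mathcal{S})$ containing $\mathcal{S}$ and contained in every ring $\mathcal{R}$ containing $\mathcal{S}$ (that is, $\mathcal{R}(\mathcal{S})$ is the smallest ring containing $\mathcal{S}$). We call $\mathcal{R}(\mathcal{S})$ the ring generated by $\mathcal{S}$.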

From the lemma above, there exists a unique ring generated by $\mathcal{J}$, the collection of intervals in $[a, b]$. The sets in this ring $\mathcal{R}(\mathcal{J})$ are called elementary sets; each of them is a finite union of pairwise disjoint intervals from $\mathcal{J}$. Again, there are no qualms about defining the content on the sets in $\mathcal{R}(\mathcal{J})$: the content of a finite union of pairwise disjoint intervals is just the sum of their individual contents. We say that the content on $\mathcal{J}$ extends to a content on $\mathcal{R}(\mathcal{J})$.
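Computing the content of an elementary set amounts to first rewriting the union as a disjoint union. The sketch below (names ours) does this by sorting and merging intervals $(c, d]$, represented as pairs $(c, d)$; simply adding the lengths of overlapping intervals would overcount.

```python
from fractions import Fraction


def normalise(intervals):
    """Rewrite a finite union of intervals (c, d] as a disjoint union.

    Empty intervals (c >= d) are dropped; overlapping or touching intervals
    are merged, e.g. (0, 1] and (1, 2] become (0, 2].
    """
    merged = []
    for c, d in sorted((c, d) for c, d in intervals if c < d):
        if merged and c <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], d))
        else:
            merged.append((c, d))
    return merged


def content(intervals):
    """Content of the elementary set given by a finite union of intervals."""
    return sum((d - c for c, d in normalise(intervals)), Fraction(0))


E = [(0, 1), (Fraction(1, 2), 2), (3, 4), (5, 5)]
print(normalise(E), content(E))  # disjoint pieces (0, 2], (3, 4]; content 3
```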

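Definition 2.4 (σ-rings). Let $X$ be a domain. A non-empty collection $\mathcal{R} \subseteq 2^X$ of subsets of $X$ is called a σ-ring if it is a ring which is closed under countable unions, that is, for $A, B, A_i \in \mathcal{R}$, we have:

1. $\bigcup_{i=1}^{\infty} A_i \in \mathcal{R}$,

2. $A \setminus B \in \mathcal{R}$.

These also imply that $\emptyset \in \mathcal{R}$ and $\bigcap_{i=1}^{\infty} A_i \in \mathcal{R}$.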

This is where the problem of defining contents starts: how do we define the content of a set made up of infinitely many pieces? The content is not suitable here since it is only finitely additive. Also, how would we define the "length" of the intersection of infinitely many sets? For good measure (no pun intended), we also throw the whole space $X$ into the collection, and hence the complements of all its sets; the resulting collection is called a σ-algebra. This will be the natural collection of sets on which to define measures.

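Definition 2.5 (σ-algebra). Let $X$ be a domain. A non-empty collection $\mathcal{F} \subseteq 2^X$ of subsets of $X$ is called a σ-algebra (or σ-field) if it is a σ-ring such that $X$ itself is contained in $\mathcal{F}$. That is, for $A, B, A_i \in \mathcal{F}$, we have:

1. $X \in \mathcal{F}$,

2. $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$,

3. $A \setminus B \in \mathcal{F}$.

These also imply that $\emptyset \in \mathcal{F}$ and $\bigcap_{i=1}^{\infty} A_i \in \mathcal{F}$.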

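Remark 2.6. Alternatively, instead of requiring $A \setminus B \in \mathcal{F}$ for all $A, B \in \mathcal{F}$, it is enough to have $X \setminus C \in \mathcal{F}$ for any $C \in \mathcal{F}$, since for any $A, B \in \mathcal{F}$ we have $A \setminus B = X \setminus (B \cup (X \setminus A))$.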

Similar to rings, we have the following Lemma for σ-algebras.

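Lemma 2.7. Let $\mathcal{S} \subseteq 2^X$ be an arbitrary non-empty collection of subsets of $X$. Then there exists exactly one σ-algebra $\mathcal{F}(\mathcal{S})$ containing $\mathcal{S}$ and contained in every σ-algebra $\mathcal{F}$ containing $\mathcal{S}$ (that is, $\mathcal{F}(\mathcal{S})$ is the smallest σ-algebra containing $\mathcal{S}$). We call $\mathcal{F}(\mathcal{S})$ the σ-algebra generated by $\mathcal{S}$.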

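Thus, for a given set $X$, every σ-algebra on $X$ is a σ-ring, every σ-ring is a ring, and every ring is a π-system:

$$\{\sigma\text{-algebras on } X\} \subseteq \{\sigma\text{-rings on } X\} \subseteq \{\text{rings on } X\} \subseteq \{\pi\text{-systems on } X\}.$$

Note that $2^X$ is itself a σ-algebra of $X$ because it satisfies all the defining properties of a σ-algebra.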

We aim to generalise the notion of length from the π-system $\mathcal{J}$ to the largest collection of subsets possible. Sadly, in general we cannot do this for the collection of all subsets $2^X$; we shall see this below for $X = \mathbb{R}$. Instead, the best we can do is to define it on some smaller σ-algebra of $X$, singled out by a condition called the Caratheodory condition. We demand that the measure on a σ-algebra satisfies certain reasonable "physical" properties. From these properties, we define:

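Definition 2.8 (Measure). A measure $\mu$ on a σ-algebra $\mathcal{F}$ is a function $\mu : \mathcal{F} \to [0, \infty]$ such that:

1. $\mu(\emptyset) = 0$,

2. $\mu$ is non-negative, that is, for any $E \in \mathcal{F}$, we have $\mu(E) \ge 0$,

3. $\mu$ is σ-additive, that is, for any countable collection of pairwise disjoint sets $E_i \in \mathcal{F}$, we have

$$\mu\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \mu(E_i).$$

The sets in $\mathcal{F}$ are called $\mu$-measurable sets.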

Remark 2.9. The content, as mentioned earlier, is just like a measure except that it is only finitely additive instead of σ-additive, which is why it is naturally defined on rings rather than on σ-algebras.

The pair (X,F) is called a measurable space. The triple (X,F , µ) is called a measure space.

A measure space satisfies these properties:

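Proposition 2.10. Let $(X, \mathcal{F}, \mu)$ be a measure space. Then:

1. If $A \subseteq B$ where $A, B \in \mathcal{F}$, then $\mu(A) \le \mu(B)$.

2. If $A_i \in \mathcal{F}$ are such that $A_i \subseteq A_{i+1}$ for all $i = 1, 2, \dots$, then $\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \lim_{i \to \infty} \mu(A_i)$.

3. If $A_i \in \mathcal{F}$ are such that $A_i \supseteq A_{i+1}$ for all $i = 1, 2, \dots$ and $\mu(A_1) < \infty$, then $\mu\left(\bigcap_{i=1}^{\infty} A_i\right) = \lim_{i \to \infty} \mu(A_i)$.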

Proof. For the first one, since $B = A \cup (B \setminus A)$ is a disjoint union, we have $\mu(B) = \mu(A) + \mu(B \setminus A) \ge \mu(A)$.

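For the second one, define $A'_1 = A_1$ and $A'_i = A_i \setminus A_{i-1}$ for all $i = 2, 3, \dots$. Then the $A'_i$ are pairwise disjoint, $A_n = \bigcup_{i=1}^{n} A'_i$ and $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} A'_i$. Thus:

$$\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \mu\left(\bigcup_{i=1}^{\infty} A'_i\right) = \sum_{i=1}^{\infty} \mu(A'_i) = \lim_{i \to \infty} \sum_{k=1}^{i} \mu(A'_k) = \lim_{i \to \infty} \mu(A_i).$$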

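Finally, for the third, define $B_i = A_1 \setminus A_i$. Then $(B_i)$ is an increasing nested sequence of sets, and $\bigcup_{i=1}^{\infty} B_i = A_1 \setminus \bigcap_{i=1}^{\infty} A_i$. Thus, using part 2, we get

$$\lim_{i \to \infty} \mu(B_i) = \mu\left(\bigcup_{i=1}^{\infty} B_i\right) = \mu\left(A_1 \setminus \bigcap_{i=1}^{\infty} A_i\right) = \mu(A_1) - \mu\left(\bigcap_{i=1}^{\infty} A_i\right), \tag{1}$$

where the last equality uses $\mu(A_1) < \infty$. On the other hand, since $A_1 = A_i \cup B_i$ for all $i$ and $A_i$ and $B_i$ are disjoint, we have $\mu(A_1) = \mu(B_i) + \mu(A_i)$ for all $i = 1, 2, \dots$. Hence (1) becomes

$$\lim_{i \to \infty} \left(\mu(A_1) - \mu(A_i)\right) = \mu(A_1) - \mu\left(\bigcap_{i=1}^{\infty} A_i\right).$$

Since $\mu(A_1) < \infty$, rearranging the terms yields the result.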

Most of the time, we are interested in σ-finite measures, since these allow us to work locally in order to get a global picture. The definition of a σ-finite measure is:

Definition 2.11 (σ-finite measure). Let $(X, \mathcal{F}, \mu)$ be a measure space. Then the measure $\mu$ is σ-finite if there exist countably many sets $E_i \in \mathcal{F}$ such that $X = \bigcup_{i=1}^{\infty} E_i$ and $\mu(E_i) < \infty$ for all $i = 1, 2, \dots$.

2.1 Caratheodory Extension Theorem

Let us construct a σ-algebra on $\mathbb{R}$ by building on what we know so far; the idea extends to more general spaces. We begin with the π-system $\mathcal{J}$ made up of intervals of the form $(a, b]$. The content $m$ on the sets in $\mathcal{J}$ is defined as

$$m : \mathcal{J} \to [0, \infty], \qquad (a, b] \mapsto |b - a|.$$

This content $m$ can easily be extended to the ring generated by $\mathcal{J}$, which we call $\mathcal{R}(\mathcal{J})$.

Now we want to extend this content to a bigger collection of sets. Let us be ambitious and try to define it on all subsets of $\mathbb{R}$. For any $A \in 2^{\mathbb{R}}$, we define the size of the set $A$ by something called the outer measure $m^*$:

$$m^*(A) = \inf\left\{\sum_{i=1}^{\infty} m(J_i) : J_i \in \mathcal{J} \text{ such that } A \subseteq \bigcup_{i=1}^{\infty} J_i\right\}. \tag{2}$$

Essentially, we cover the set $A$ with countably many intervals and define the outer measure of $A$ as the infimum of the total contents of all such interval covers of $A$; this could be infinite. The outer measure $m^*$ satisfies the following properties:
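Definition (2) already shows that countable sets are small. The sketch below (helper names are ours) lists finitely many rationals of $[0, 1]$ and covers the $i$-th one by an interval $(c, d]$ of length $\varepsilon / 2^i$; the total length stays below $\varepsilon$ however many points are covered. Applying the same cover to all the rationals of $[0, 1]$ gives $m^*(\mathbb{Q} \cap [0, 1]) \le \varepsilon$ for every $\varepsilon > 0$; the code, of course, only handles a finite prefix.

```python
from fractions import Fraction
from math import gcd


def first_rationals(count):
    """The first `count` rationals p/q in [0, 1], ordered by denominator q."""
    found = []
    q = 1
    while len(found) < count:
        for p in range(q + 1):
            if gcd(p, q) == 1 and len(found) < count:
                found.append(Fraction(p, q))
        q += 1
    return found


def epsilon_cover(points, eps):
    """Cover the i-th point (i = 1, 2, ...) by an interval (c, d] of length eps / 2**i."""
    cover = []
    for i, x in enumerate(points, start=1):
        half = eps / 2**i / 2
        cover.append((x - half, x + half))
    return cover


eps = Fraction(1, 100)
points = first_rationals(200)
cover = epsilon_cover(points, eps)
total = sum(d - c for c, d in cover)
print(float(total), total < eps)
```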

Lemma 2.12. The outer measure $m^* : 2^{\mathbb{R}} \to [0, \infty]$ satisfies:

1. $m^*(\emptyset) = 0$,

2. $m^*(A) \ge 0$ for any $A \in 2^{\mathbb{R}}$,

3. $m^*$ is σ-subadditive, that is, for any countable collection of sets $E_i \in 2^{\mathbb{R}}$, we have

$$m^*\left(\bigcup_{i=1}^{\infty} E_i\right) \le \sum_{i=1}^{\infty} m^*(E_i).$$

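Proof. For the last property, if $\sum_{i=1}^{\infty} m^*(E_i) = \infty$, we are done. Otherwise, $\sum_{i=1}^{\infty} m^*(E_i) < \infty$, so $m^*(E_i) < \infty$ for all $i$. Fix an arbitrary $\varepsilon > 0$. Then, by definition of the outer measure, for each $i = 1, 2, \dots$ there exists a countable cover $\{J_i^j\}_{j=1}^{\infty} \subseteq \mathcal{J}$ of $E_i$ such that

$$\sum_{j=1}^{\infty} m(J_i^j) < m^*(E_i) + \frac{\varepsilon}{2^i}.$$

Thus, since $\bigcup_{i=1}^{\infty} \{J_i^j\}_{j=1}^{\infty}$ forms a countable cover of $\bigcup_{i=1}^{\infty} E_i$, by definition of the outer measure we have

$$m^*\left(\bigcup_{i=1}^{\infty} E_i\right) \le \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} m(J_i^j) < \sum_{i=1}^{\infty} \left(m^*(E_i) + \frac{\varepsilon}{2^i}\right) = \sum_{i=1}^{\infty} m^*(E_i) + \varepsilon,$$

and since $\varepsilon > 0$ is arbitrary, we are done.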

The last property is weaker than σ-additivity, and in fact $m^*$ is not σ-additive on all of $2^{\mathbb{R}}$ (the argument of Example 2.18 shows this), so the outer measure is not a genuine measure on $2^{\mathbb{R}}$ in the sense of Definition 2.8. To recap:

$$\begin{array}{ccccc} \mathcal{J} & \subseteq & ? & \subseteq & 2^{\mathbb{R}} \\ m \text{ (content)} & & ? & & m^* \text{ (outer measure)} \end{array}$$

That is, $\mathcal{J}$ is too small to carry a measure (it is not a σ-algebra), while $2^{\mathbb{R}}$ is too big. So we need a σ-algebra somewhere in between: we throw away some sets from $2^{\mathbb{R}}$ so that the outer measure becomes a measure on the remaining collection. This is where the Caratheodory condition comes in:

Definition 2.13 (Caratheodory condition). Consider a domain $X$ and a σ-algebra $\mathcal{G} \subseteq 2^X$ equipped with an outer measure $m^*$. A set $E \in \mathcal{G}$ is $m^*$-measurable if it satisfies the Caratheodory condition:

$$m^*(F) = m^*(F \cap E) + m^*(F \cap E^c) \quad \text{for all } F \in \mathcal{G}.$$

The collection of sets in $\mathcal{G}$ which satisfy the Caratheodory condition is denoted $\mathcal{G}^*$.

Remark 2.14. Sometimes the Caratheodory condition is relaxed to simply requiring

$$m^*(F) \ge m^*(F \cap E) + m^*(F \cap E^c) \quad \text{for all } F \in \mathcal{G},$$

since the reverse inequality $\le$ is always true by the σ-subadditivity of $m^*$.

An important consequence of this definition is the following:

Theorem 2.15 (Caratheodory Extension Theorem). If $X \in \mathcal{G}$, then $\mathcal{G}^*$ is a σ-algebra of $X$ and $m^*$ is a measure on $\mathcal{G}^*$.

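Proof. We begin by showing that $\mathcal{G}^*$ is a σ-algebra. Clearly $\emptyset, X \in \mathcal{G}^*$. Now we want to show that for any $A, B \in \mathcal{G}^*$, we have $A \setminus B = A \cap B^c \in \mathcal{G}^*$, that is, we require

$$\text{Goal: } m^*(G) = m^*(G \cap A \cap B^c) + m^*(G \cap (A^c \cup B)) \quad \text{for any } G \in \mathcal{G}.$$

Note that since $A, B \in \mathcal{G}^*$, for any $F \in \mathcal{G}$ we have

$$m^*(F) = m^*(F \cap A) + m^*(F \cap A^c), \tag{3}$$

$$m^*(F) = m^*(F \cap B) + m^*(F \cap B^c). \tag{4}$$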

Applying (4) to $F = G \cap A$, we get $m^*(G \cap A) = m^*(G \cap A \cap B) + m^*(G \cap A \cap B^c)$. Substituting this into (3) with $F = G$ yields

$$m^*(G) = m^*(G \cap A \cap B) + m^*(G \cap A \cap B^c) + m^*(G \cap A^c). \tag{5}$$

Next, consider (3) with $F = G \cap (A^c \cup B)$. This gives us

$$\begin{aligned} m^*(G \cap (A^c \cup B)) &= m^*(G \cap (A^c \cup B) \cap A) + m^*(G \cap (A^c \cup B) \cap A^c) \\ &= m^*(G \cap B \cap A) + m^*(G \cap A^c), \end{aligned}$$

by distributivity of the set operations. Substituting this into (5) yields the desired goal.

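Note that $A \cup B = X \setminus ((X \setminus A) \setminus B) \in \mathcal{G}^*$, thus any finite union of elements of $\mathcal{G}^*$ also lies in $\mathcal{G}^*$.

Next we want to prove closure under countable unions. Let $E_i \in \mathcal{G}^*$, that is, each $E_i$ satisfies the Caratheodory condition. Without loss of generality, assume further that they are pairwise disjoint (otherwise replace $E_i$ by $E_i \setminus \bigcup_{j<i} E_j$, which lies in $\mathcal{G}^*$ by the finite closure properties above and gives the same union). Fix $n < \infty$. Then $\bigcup_{i=1}^{n} E_i$ is also $m^*$-measurable, so for any $F \in \mathcal{G}$ we have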

$$m^*(F) = m^*\left(F \cap \left(\bigcup_{i=1}^{n} E_i\right)\right) + m^*\left(F \cap \left(\bigcup_{i=1}^{n} E_i\right)^c\right). \tag{6}$$

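Furthermore, since $E_n \in \mathcal{G}^*$, we have

$$\begin{aligned} m^*\left(F \cap \left(\bigcup_{i=1}^{n} E_i\right)\right) &= m^*\left(F \cap \left(\bigcup_{i=1}^{n} E_i\right) \cap E_n\right) + m^*\left(F \cap \left(\bigcup_{i=1}^{n} E_i\right) \cap E_n^c\right) \\ &= m^*(F \cap E_n) + m^*\left(F \cap \left(\bigcup_{i=1}^{n-1} E_i\right)\right) = \cdots = \sum_{i=1}^{n} m^*(F \cap E_i), \end{aligned}$$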

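by using the fact that the $E_i$ are pairwise disjoint and induction on $n$. Substituting this into (6), and using the monotonicity of $m^*$ (any cover of a set also covers its subsets) together with $\left(\bigcup_{i=1}^{n} E_i\right)^c \supseteq \left(\bigcup_{i=1}^{\infty} E_i\right)^c$, we have

$$m^*(F) = \sum_{i=1}^{n} m^*(F \cap E_i) + m^*\left(F \cap \left(\bigcup_{i=1}^{n} E_i\right)^c\right) \ge \sum_{i=1}^{n} m^*(F \cap E_i) + m^*\left(F \cap \left(\bigcup_{i=1}^{\infty} E_i\right)^c\right).$$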

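Taking the limit as $n \to \infty$ and then using the σ-subadditivity of $m^*$, we get

$$\begin{aligned} m^*(F) &\ge \sum_{i=1}^{\infty} m^*(F \cap E_i) + m^*\left(F \cap \left(\bigcup_{i=1}^{\infty} E_i\right)^c\right) \\ &\ge m^*\left(\bigcup_{i=1}^{\infty} (F \cap E_i)\right) + m^*\left(F \cap \left(\bigcup_{i=1}^{\infty} E_i\right)^c\right) = m^*\left(F \cap \left(\bigcup_{i=1}^{\infty} E_i\right)\right) + m^*\left(F \cap \left(\bigcup_{i=1}^{\infty} E_i\right)^c\right), \end{aligned}$$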

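which implies the Caratheodory condition for $\bigcup_{i=1}^{\infty} E_i$ by Remark 2.14. Thus $\mathcal{G}^*$ is a σ-algebra.

Furthermore, since $\mathcal{G}$ is a σ-algebra, we may take $F = \bigcup_{i=1}^{\infty} E_i \in \mathcal{G}$. Then $F \cap E_i = E_i$ and $F \cap \left(\bigcup_{i=1}^{\infty} E_i\right)^c = \emptyset$, so the chain above reads $m^*(F) \ge \sum_{i=1}^{\infty} m^*(E_i) \ge m^*(F)$, that is,

$$\sum_{i=1}^{\infty} m^*(E_i) = m^*\left(\bigcup_{i=1}^{\infty} E_i\right),$$

for arbitrary pairwise disjoint sets $E_i \in \mathcal{G}^*$. Thus $m^*$ is σ-additive on $\mathcal{G}^*$ and hence a measure on $\mathcal{G}^*$.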

Some of the most obvious sets satisfying the Caratheodory condition are the $m^*$-null sets, that is, the sets $E$ with $m^*(E) = 0$.

Lemma 2.16.

1. Suppose that $(E_i)$ is a countable collection of $m^*$-null sets. Then $\bigcup_{i=1}^{\infty} E_i$ is also $m^*$-null.

2. If $E$ is $m^*$-null, then $E \in \mathcal{G}^*$.

Proof. The first claim follows directly from the σ-subadditivity of $m^*$.

For the second, we have to show that $E$ satisfies the Caratheodory condition. Pick an arbitrary $F \in \mathcal{G}$. Since $F \supseteq F \cap E^c$, monotonicity of $m^*$ gives

$$m^*(F) \ge m^*(F \cap E^c). \tag{7}$$

On the other hand, $m^*(F \cap E) \le m^*(E) = 0$. Thus (7) is really $m^*(F) \ge m^*(F \cap E^c) + m^*(F \cap E)$, which is the Caratheodory condition in the form of Remark 2.14.

2.2 Lebesgue Space

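Going back to our construction: from the set of intervals $\mathcal{J}$ with content $m$, we constructed an outer measure $m^*$ on $2^{\mathbb{R}}$. To turn this outer measure into a genuine measure, we discard from $2^{\mathbb{R}}$ the sets which do not satisfy the Caratheodory condition, so that $m^*$ restricted to the remaining collection, which is a σ-algebra denoted $\mathcal{L}$, is a measure. We write $\mu = m^*|_{\mathcal{L}}$.

Furthermore, the intervals in $\mathcal{J}$ belong to $\mathcal{L}$, and the measure $\mu$ restricted to $\mathcal{J}$ is simply the content of the interval. The space $(\mathbb{R}, \mathcal{L}, \mu)$ is called the Lebesgue space. One property of the Lebesgue space is that it is a complete measure space: if $\mu(E) = 0$ and $A \subseteq E$, then $m^*(A) \le m^*(E) = 0$ by monotonicity, so $A \in \mathcal{L}$ by Lemma 2.16.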

Definition 2.17 (Complete measure space). A measure space (X,F , µ) is called a complete

measure space if whenever E ∈ F is such that µ(E) = 0, then any subset of E is also in F .

The Lebesgue σ-algebra $\mathcal{L}$ is not all of $2^{\mathbb{R}}$, although it is very difficult to construct a set which is not in $\mathcal{L}$. Here is an example of a subset of $\mathbb{R}$ which is not Lebesgue measurable:

Example 2.18. The Lebesgue measure is invariant under translation, that is, for any $E \in \mathcal{L}$ and any $x \in \mathbb{R}$, we have $E + x \in \mathcal{L}$ and $\mu(E + x) = \mu(E)$.

Consider the interval $[0, 1]$ and define an equivalence relation on $[0, 1]$ by $x \sim y$ if and only if $x - y \in \mathbb{Q}$. Divide $[0, 1]$ into equivalence classes and, by the Axiom of Choice, choose a representative from each class. Let $A \subseteq [0, 1]$ be the set of these representatives.

This set $A$ is not Lebesgue measurable. Suppose it is, so that $\mu(A) \in [0, \infty]$ is defined. The rational numbers in $[-1, 1]$ are countable, so enumerate them as $r_1, r_2, \dots$ and consider the sets $r_i + A$ for $i = 1, 2, \dots$. These sets are pairwise disjoint by construction (since $A$ contains exactly one element of each equivalence class), and $\mu(r_i + A) = \mu(A)$. Furthermore, $[0, 1] \subseteq \bigcup_{i=1}^{\infty} (r_i + A) \subseteq [-1, 2]$, since every $x \in [0, 1]$ is equivalent to some $a \in A$, and then $x - a \in \mathbb{Q} \cap [-1, 1]$. Thus, by σ-additivity and monotonicity, we have

$$1 \le \sum_{i=1}^{\infty} \mu(r_i + A) = \sum_{i=1}^{\infty} \mu(A) \le 3.$$

This yields a contradiction: either $\mu(A) = 0$ or $\mu(A) > 0$. The former implies that $1 \le \sum_{i=1}^{\infty} \mu(A) = 0$, while the latter implies that an infinite sum of equal positive terms is bounded by $3$; both are absurd. Hence $A$ is not Lebesgue measurable.

A similar construction shows:

Lemma 2.19. Every set in $\mathcal{L}$ with positive measure contains a non-Lebesgue measurable set.

2.3 Borel Space

Of course, there are other ways to construct a σ-algebra from a given collection of sets. Recall Lemma 2.7: given a collection of sets $\mathcal{S} \subseteq 2^X$, we can construct the σ-algebra $\mathcal{F}(\mathcal{S})$ generated by $\mathcal{S}$. This goes in the opposite direction to our previous construction: we add subsets to the collection $\mathcal{S}$ until it becomes a σ-algebra (previously, we discarded sets from $2^X$ until $m^*$ became a measure on what remained).

For $X = \mathbb{R}$, our previous method yields the Lebesgue σ-algebra $\mathcal{L}$. The other method yields the Borel space $\mathcal{B}$, which is the σ-algebra on $\mathbb{R}$ generated by the open sets of $\mathbb{R}$. From the constructions, we can expect $\mathcal{B}$ to be smaller than $\mathcal{L}$. Indeed, $\mathcal{B}$ is by definition the smallest σ-algebra containing the open sets, and $\mathcal{L}$ is a σ-algebra containing the open sets (every open set is a countable union of intervals from $\mathcal{J}$, which lie in $\mathcal{L}$), so $\mathcal{B} \subseteq \mathcal{L}$. In fact, we have:

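Lemma 2.20.

1. $\mathcal{B} \subsetneq \mathcal{L}$, that is, every Borel measurable set is Lebesgue measurable, and there are Lebesgue measurable sets which are not Borel measurable.

2. If $E \in \mathcal{L}$, then there are $A, B \in \mathcal{B}$ such that $A \subseteq E \subseteq B$ and $\mu(B \setminus E) = \mu(E \setminus A) = 0$. That is, sets in $\mathcal{L}$ differ from sets in $\mathcal{B}$ by Lebesgue null sets.

3. $\mathcal{L}$ is the completion of $\mathcal{B}$ with respect to $\mu$.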

An interesting lemma is the following:

Lemma 2.21. A strictly increasing homeomorphism on an interval maps Borel sets to Borel

sets.


Proof. The function f is a strictly increasing homeomorphism, so it has a continuous inverse.

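It is easy to check that for any continuous function $g$ the collection

$$\mathcal{A} = \{E : g^{-1}(E) \in \mathcal{B}\}$$

is a σ-algebra containing the open sets. Thus $\mathcal{A} \supseteq \mathcal{B}$ by definition of $\mathcal{B}$. Taking $g = f^{-1}$, so that $g^{-1}(E) = f(E)$, we have

$$\mathcal{B} \subseteq \mathcal{A} = \{E : f(E) \in \mathcal{B}\},$$

which yields the result.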

2.4 Cantor Set

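One important example in measure theory is the Cantor set $C$. It is constructed iteratively from the interval $[0, 1]$ by removing the open middle third of this segment, then removing the open middle third of each of the two remaining segments, then of each of the four remaining segments, then of each of the eight remaining segments, and so on ad infinitum. An explicit formula for $C$ is

$$C = [0, 1] \setminus \bigcup_{n=1}^{\infty} \bigcup_{k=0}^{3^{n-1}-1} \left(\frac{3k+1}{3^n}, \frac{3k+2}{3^n}\right).$$

At level $n$, this removes the open middle third of every interval $\left[\frac{k}{3^{n-1}}, \frac{k+1}{3^{n-1}}\right]$; those not lying in a remaining segment were already removed at an earlier step, so removing them again does no harm.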

[Figure: the first stages n = 0, 1, 2, ... of the construction of the Cantor set.]

In this form, $C$ is $[0, 1]$ with a union of open intervals removed, so $C$ is a closed subset of $[0, 1]$. It also has Lebesgue measure 0:

Lemma 2.22. The Lebesgue measure of C is 0, that is µ(C) = 0.

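Proof. We note that at each step of the construction, the remaining intervals form a finite cover of the set $C$: $[0, 1]$ is a cover, $[0, \frac{1}{3}] \cup [\frac{2}{3}, 1]$ is a cover, et cetera. In general, for every $n \in \mathbb{N}$, the union of the $2^n$ closed intervals of length $3^{-n}$ remaining after $n$ steps,

$$K_n = \bigcup_{(d_1, \dots, d_n) \in \{0, 2\}^n} \left[\sum_{j=1}^{n} \frac{d_j}{3^j}, \ \sum_{j=1}^{n} \frac{d_j}{3^j} + \frac{1}{3^n}\right],$$

is a cover of $C$. Thus, by monotonicity, $\mu(C) \le \mu(K_n) = 2^n \cdot 3^{-n} = \left(\frac{2}{3}\right)^n$ for all $n \in \mathbb{N}$. Taking the limit as $n \to \infty$, we have $\mu(C) = 0$.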
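The stages of the construction are easy to generate exactly. In the sketch below (names ours), `cantor_stage(n)` returns the $2^n$ closed intervals remaining after $n$ steps, as pairs $(c, d)$; their total length is $(2/3)^n$, the bound used in the proof above.

```python
from fractions import Fraction


def cantor_stage(n):
    """Closed intervals [c, d] remaining after n steps of the construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for c, d in intervals:
            third = (d - c) / 3
            # Keep the left and right closed thirds, drop the open middle third.
            next_intervals.append((c, c + third))
            next_intervals.append((d - third, d))
        intervals = next_intervals
    return intervals


def total_length(intervals):
    return sum((d - c for c, d in intervals), Fraction(0))


for n in range(5):
    print(n, len(cantor_stage(n)), total_length(cantor_stage(n)))
```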

Another description of the Cantor set uses the ternary expansion of numbers in the interval $[0, 1]$. A ternary expansion of a number $x$ in this interval is an expansion

$$x = \sum_{i=1}^{\infty} \frac{a_i}{3^i} = 0.a_1 a_2 a_3 \ldots_{(3)}, \qquad a_i \in \{0, 1, 2\}.$$

It is unique except for numbers of the form $k/3^N$, which have two expansions, one ending in repeated $0$s and one ending in repeated $2$s (for example, $\frac{1}{3} = 0.1000\ldots_{(3)} = 0.0222\ldots_{(3)}$). The first step of the construction removes exactly the numbers in $(\frac{1}{3}, \frac{2}{3})$, that is, those all of whose ternary expansions begin with $0.1\ldots_{(3)}$. The second step removes, from the remaining set, the numbers all of whose expansions begin with $0.01\ldots_{(3)}$ or $0.21\ldots_{(3)}$. Iterating, we remove every number all of whose ternary expansions contain the digit $1$. So $x \in C$ if and only if $x$ has a ternary expansion using only the digits $0$ and $2$. Distinct sequences in $\{0, 2\}^{\mathbb{N}}$ give distinct points of $C$, so by Cantor's diagonal argument, $C$ is uncountable.
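For a rational number, membership in $C$ can be decided exactly. Rather than manipulating digit strings, the sketch below (names ours) uses the equivalent self-similar description: $x \in C$ if and only if either $x \le \frac{1}{3}$ and $3x \in C$, or $x \ge \frac{2}{3}$ and $3x - 2 \in C$. Each step reads off one ternary digit; for rational $x$ the orbit must eventually repeat, and if it repeats without ever entering an open middle third, then $x$ survives every step of the construction.

```python
from fractions import Fraction


def in_cantor_set(x):
    """Decide whether a rational x in [0, 1] lies in the Cantor set."""
    x = Fraction(x)
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    seen = set()
    while x not in seen:
        seen.add(x)
        if Fraction(1, 3) < x < Fraction(2, 3):
            return False  # the current ternary digit is forced to be 1
        # Zoom into the left or right closed third that contains x.
        x = 3 * x if x <= Fraction(1, 3) else 3 * x - 2
    return True  # the orbit cycles without entering a removed middle third


print(in_cantor_set(Fraction(1, 4)), in_cantor_set(Fraction(1, 2)))
```

For example, $\frac{1}{4} = 0.0202\ldots_{(3)}$ lies in $C$ even though it is not an endpoint of any removed interval.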

Definition 2.23 (Cantor staircase). The Cantor staircase is the function $C : [0, 1] \to [0, 1]$ defined as the limit of the sequence of functions $C_n : [0, 1] \to [0, 1]$ constructed iteratively as follows. Let $C_0(x) = x$. For every integer $n \ge 0$, the function $C_{n+1}$ is defined in terms of $C_n$ as

$$C_{n+1}(x) = \begin{cases} \dfrac{C_n(3x)}{2} & \text{if } x \in [0, \frac{1}{3}], \\ \dfrac{1}{2} & \text{if } x \in [\frac{1}{3}, \frac{2}{3}], \\ \dfrac{1 + C_n(3x - 2)}{2} & \text{if } x \in [\frac{2}{3}, 1]. \end{cases}$$

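[Figure: graphs of the first few $C_n$, with horizontal-axis ticks at $\frac{1}{9}, \frac{2}{9}, \frac{1}{3}, \frac{2}{3}, \frac{7}{9}, \frac{8}{9}$ and vertical-axis ticks at $\frac{1}{4}, \frac{1}{2}, \frac{3}{4}$.]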

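From this construction, one may check that the convergence is uniform. Indeed, for each $n \ge 1$, by splitting into the three regions, we have

$$\max_{x \in [0, 1]} |C_{n+1}(x) - C_n(x)| \le \frac{1}{2} \max_{x \in [0, 1]} |C_n(x) - C_{n-1}(x)|,$$

so, summing the resulting geometric series, for every $n \in \mathbb{N}$ we have

$$\max_{x \in [0, 1]} |C(x) - C_n(x)| \le \frac{2}{2^n} \max_{x \in [0, 1]} |C_1(x) - C_0(x)| = \frac{1}{3 \cdot 2^n},$$

using $\max_{x \in [0, 1]} |C_1(x) - C_0(x)| = \frac{1}{6}$ (attained at $x = \frac{1}{3}$). This proves uniform convergence, and since all the $C_n$ are continuous, $C$ is continuous.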
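The iteration can be carried out exactly with rational arithmetic. The sketch below (the function name `staircase` is ours) evaluates $C_n$ recursively and compares $\max|C_{n+1} - C_n|$ on a finite grid with the bound $\frac{1}{6 \cdot 2^n}$ that follows from the halving estimate and $\max|C_1 - C_0| = \frac{1}{6}$. Checking on a grid is only a sanity check, not a proof.

```python
from fractions import Fraction


def staircase(n, x):
    """Exact value of the n-th iterate C_n(x) from Definition 2.23."""
    x = Fraction(x)
    if n == 0:
        return x  # C_0(x) = x
    if x <= Fraction(1, 3):
        return staircase(n - 1, 3 * x) / 2
    if x < Fraction(2, 3):
        return Fraction(1, 2)  # constant on the middle third
    return (1 + staircase(n - 1, 3 * x - 2)) / 2


grid = [Fraction(k, 81) for k in range(82)]
for n in range(6):
    gap = max(abs(staircase(n + 1, x) - staircase(n, x)) for x in grid)
    print(n, gap, Fraction(1, 6 * 2**n))  # observed gap vs. predicted bound
```

As a further check, the values $C_n(\frac{1}{4})$ approach $\frac{1}{3}$, the value of the staircase at $\frac{1}{4}$.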

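Furthermore, take an arbitrary $x \in [0, 1] \setminus C$. Then $x$ lies in one of the open intervals removed in the construction, say at step $N$, and this open interval is a neighbourhood of $x$ contained in $[0, 1] \setminus C$. On this interval, for every $n \ge N$, $C_n$ is constant, equal to $\frac{2k+1}{2^N}$ for some $k \in \{0, 1, \dots, 2^{N-1} - 1\}$; hence so is $C$. Thus $C'(x) = 0$ for all $x \in [0, 1] \setminus C$, and since $\mu(C) = 0$, the derivative of the Cantor staircase vanishes almost everywhere. So the Cantor staircase is locally constant almost everywhere, yet it is continuous and increases from $0$ to $1$. Strange!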

Example 2.24. Now let us construct a Lebesgue measurable set which is not Borel measurable. Consider the function $f(x) = C(x) + x$ for $x \in [0, 1]$, where $C$ is the Cantor staircase. This function satisfies:

1. $f$ is continuous and strictly increasing.

2. $f' = 1$ almost everywhere.

3. Since $f$ is continuous and strictly increasing with $f(0) = 0$ and $f(1) = 2$, it is a bijection from $[0, 1]$ onto $[0, 2]$, so $f^{-1} : [0, 2] \to [0, 1]$ exists.

4. $f^{-1}$ is continuous (a continuous bijection from a compact space to a Hausdorff space is a homeomorphism).

The function $f$ maps intervals in $[0, 1] \setminus C$ to intervals in $[0, 2]$ of the same length. Indeed, suppose $(a, b) \subseteq [0, 1] \setminus C$. Then the staircase is constant on $(a, b)$, so by continuity $C(a) = C(b)$, and hence

$$\mu(f((a, b))) = \mu((f(a), f(b))) = f(b) - f(a) = C(b) + b - C(a) - a = b - a.$$

Since $[0, 1] \setminus C$ is a countable disjoint union of such open intervals and $f$ is injective, σ-additivity gives $\mu(f([0, 1] \setminus C)) = \mu([0, 1] \setminus C) = 1$, as $\mu(C) = 0$.

Since $[0, 2]$ is the disjoint union of $f(C)$ and $f([0, 1] \setminus C)$, taking measures gives $2 = \mu(f(C)) + 1$, so the image of $C$ has measure $1$. Thus the map $f$ stretches a null set into a set of positive measure. By Lemma 2.19, there exists a non-measurable set $N \subseteq f(C)$, that is, $N \notin \mathcal{L}$.

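Note that $N \subseteq f(C)$, so $f^{-1}(N) \subseteq C$. Hence $f^{-1}(N)$ is a subset of a null set, and by completeness of the Lebesgue space, $f^{-1}(N) \in \mathcal{L}$ with measure $0$. However, $f^{-1}(N)$ is not Borel. Indeed, suppose it is; then by Lemma 2.21 applied to the strictly increasing homeomorphism $f$, we have $f(f^{-1}(N)) = N \in \mathcal{B}$. Since $\mathcal{B} \subseteq \mathcal{L}$ and $N \notin \mathcal{L}$ by choice, we get a contradiction.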

3 Measurable Functions

Since we have properly defined measures and subsets of domains which can be measured, we now

proceed to define functions which we want to integrate. These functions are called measurable

functions. A remark here is that the measurable functions do not depend on the measures

themselves, just the underlying σ-algebra on the domain X. We define it as:

Definition 3.1 (Measurable functions). Let $(X, \mathcal{F})$ and $(Y, \mathcal{E})$ be measurable spaces. The map $f : X \to Y$ is measurable if for any $E \in \mathcal{E}$, its preimage under $f$ is in $\mathcal{F}$, that is, $f^{-1}(E) \in \mathcal{F}$. To emphasise the dependence on $\mathcal{E}$ and $\mathcal{F}$, we sometimes write $f : (X, \mathcal{F}) \to (Y, \mathcal{E})$.

In particular, we have:

Definition 3.2. Let $(X, \mathcal{F})$ be a measurable space. The map $f : X \to \mathbb{R}$ is $\mathcal{F}$-measurable if for any $E \in \mathcal{B}$, its preimage under $f$ is in $\mathcal{F}$, that is, $f^{-1}(E) \in \mathcal{F}$.

Some properties:

Proposition 3.3. Let (X,F) be a measurable space. Then:

1. Let A ⊂ X. Then the indicator function 1A is measurable iff A ∈ F.

2. Let f : X → R be F-measurable and g : R → R be Borel measurable. Then g ∘ f : X → R is F-measurable.

3. If a ∈ R and f, g : X → R are F-measurable, then so are af, f ± g, fg, f/g (if g ≠ 0), max(f, g), min(f, g), f+, f− and |f|, where f+ = max(f, 0) and f− = max(−f, 0).


We can extend the definition of F-measurable functions to functions taking the values ±∞.

Definition 3.4. A function f : X → [−∞, ∞] is F-measurable if {f = ∞} and {f = −∞} are in F and if for any E ∈ B, its preimage under f is in F, that is f⁻¹(E) ∈ F.

Measurable functions behave well under limits, which makes them excellent candidates for integration. Recall the following definitions:

Definition 3.5 (Limit inferior and limit superior). Let $(x_n)_{n=1}^\infty$ be a sequence of real numbers. We define the limit inferior of this sequence as the infimum of the set of limits (in [−∞, ∞]) of subsequences of (x_n). More explicitly, it is defined as:

$$\liminf_{n\to\infty} x_n = \lim_{n\to\infty}\Big(\inf_{m\ge n} x_m\Big) = \sup_{n\ge 1}\Big(\inf_{m\ge n} x_m\Big).$$

Similarly, the limit superior of this sequence is the supremum of the set of limits of subsequences of (x_n). More explicitly, it is defined as:

$$\limsup_{n\to\infty} x_n = \lim_{n\to\infty}\Big(\sup_{m\ge n} x_m\Big) = \inf_{n\ge 1}\Big(\sup_{m\ge n} x_m\Big).$$
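The "limit of tail infima/suprema" form of the definition can be illustrated numerically. The sketch below uses the hypothetical sequence $x_n = (-1)^n(1 + 1/n)$, for which lim inf = −1 and lim sup = 1, and computes the tail infima and suprema of a finite truncation of M terms. Since only finitely many terms can be inspected, only tails starting well before M are meaningful; this truncation is an assumption of the sketch, not part of the definition.

```python
# Sketch: tail infima/suprema of x_n = (-1)^n (1 + 1/n), truncated to M terms.
def x(n: int) -> float:
    """n-th term of the example sequence (n >= 1)."""
    return (-1) ** n * (1 + 1 / n)


M = 10_000  # we can only inspect finitely many terms
terms = [x(n) for n in range(1, M + 1)]

# tail_inf[n - 1] = min of x_m over n <= m <= M (tail_sup likewise with max),
# computed right-to-left in a single pass.
tail_inf = terms[:]
tail_sup = terms[:]
for i in range(M - 2, -1, -1):
    tail_inf[i] = min(terms[i], tail_inf[i + 1])
    tail_sup[i] = max(terms[i], tail_sup[i + 1])

for n in (1, 10, 100, 1000):
    # tail infima increase towards liminf = -1, tail suprema decrease towards limsup = 1
    print(n, tail_inf[n - 1], tail_sup[n - 1])
```

The tail infima form an increasing sequence and the tail suprema a decreasing one, which is why the limits in the definition always exist in [−∞, ∞].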

Lemma 3.6. Here are some properties of the limit inferior and limit superior of sequences (x_n) and (y_n):

1. for the sequence (x_n), we have $\inf_n x_n \le \liminf_{n\to\infty} x_n \le \limsup_{n\to\infty} x_n \le \sup_n x_n$,

2. the sequence (x_n) has a limit in [−∞, ∞] if and only if $\limsup_{n\to\infty} x_n = \liminf_{n\to\infty} x_n$, and this common value is $\lim_{n\to\infty} x_n$,

3. the limit superior satisfies finite subadditivity, that is, whenever all the terms are defined, we have:

$$\limsup_{n\to\infty}(x_n + y_n) \le \limsup_{n\to\infty} x_n + \limsup_{n\to\infty} y_n,$$

4. the limit inferior satisfies finite superadditivity, that is, whenever all the terms are defined, we have:

$$\liminf_{n\to\infty}(x_n + y_n) \ge \liminf_{n\to\infty} x_n + \liminf_{n\to\infty} y_n.$$
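The inequalities in items 3 and 4 can be strict. For instance, with the illustrative sequences $x_n = (-1)^n$ and $y_n = (-1)^{n+1}$, we have $x_n + y_n = 0$ for every n, while each sequence alone oscillates between −1 and 1:

$$\limsup_{n\to\infty}(x_n+y_n) = 0 < 2 = \limsup_{n\to\infty}x_n + \limsup_{n\to\infty}y_n,$$

$$\liminf_{n\to\infty}(x_n+y_n) = 0 > -2 = \liminf_{n\to\infty}x_n + \liminf_{n\to\infty}y_n.$$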

With the limit inferior and limit superior defined for sequences of real numbers, we extend them pointwise to sequences of functions, that is $(\liminf_{n} f_n)(x) = \liminf_{n} f_n(x)$, and we have:

Proposition 3.7. Let f_n : X → R be a sequence of F-measurable functions. Then $\sup_n f_n$, $\inf_n f_n$, $\limsup_{n\to\infty} f_n$ and $\liminf_{n\to\infty} f_n$ are F-measurable (as functions into [−∞, ∞]). In particular, if $\lim_{n\to\infty} f_n$ exists and equals f, then f is F-measurable.

We are now ready to approximate F-measurable functions by an analogue of the step functions we defined for the Riemann/Darboux integral. We define simple functions:


Definition 3.8 (Simple functions). Let (X, F) be a measurable space. Then a function φ : X → R is called a simple function if there exist a finite n, some constants c_i ∈ R and a collection of measurable sets E_i ∈ F such that:

$$\varphi = \sum_{i=1}^n c_i\,\mathbf 1_{E_i}.$$

The following is a useful result which says that any non-negative measurable function can be approximated from below by simple functions.

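Proposition 3.9. Let f : X → [0, ∞] be an F-measurable function. Then there is an increasing sequence (f_n) of non-negative simple functions such that f_n ↑ f pointwise.

Proof. For each n ∈ N, we define:

$$f_n = \sum_{k=1}^{2^{2n}-1}\frac{k}{2^n}\,\mathbf 1_{E_k^{(n)}} + 2^n\,\mathbf 1_{A_n},$$

where:

$$E_k^{(n)} = \Big\{x : \frac{k}{2^n} \le f(x) < \frac{k+1}{2^n}\Big\} \quad\text{and}\quad A_n = \{x : f(x) \ge 2^n\}.$$

These sets are in F since f is F-measurable, so each f_n is simple; in words, f_n rounds f down to the nearest multiple of $2^{-n}$, capped at $2^n$. Since each dyadic interval $[k/2^n, (k+1)/2^n)$ splits into two intervals of length $2^{-(n+1)}$, and $2^n \le 2^{n+1}$, we have $f_n \le f_{n+1} \le f$. Moreover, $0 \le f - f_n < 2^{-n}$ on $\{f < 2^n\}$ and $f_n = 2^n$ on $\{f \ge 2^n\}$. Therefore f_n ↑ f as n → ∞, including at the points where f = ∞.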
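The construction in the proof only depends on the value f(x) at each point, so it can be sketched as a function of that value. The helper `dyadic_approx` below is a hypothetical name for this illustration: it returns $f_n(x)$ given $f(x)$, using $\lfloor 2^n f(x)\rfloor / 2^n$ off $A_n$ and the cap $2^n$ on $A_n$, with `math.inf` standing in for f(x) = ∞.

```python
import math


def dyadic_approx(f_value: float, n: int) -> float:
    """Value of the n-th simple function f_n from Proposition 3.9 at a point
    where f takes the value f_value (which may be math.inf)."""
    cap = 2.0 ** n
    if f_value >= cap:
        # the point lies in A_n = {f >= 2^n}
        return cap
    # the point lies in E_k^(n) with k = floor(2^n f(x)), so f_n(x) = k / 2^n
    return math.floor(f_value * cap) / cap


# Sample values of a non-negative function f at a few points.
values = [0.0, 0.3, 1.0 / 3.0, math.pi, 7.9, 1000.0, math.inf]
for v in values:
    print(v, [dyadic_approx(v, n) for n in range(1, 8)])
```

Each row increases towards the sampled value; for f(x) = ∞ the approximations are 2, 4, 8, ..., matching f_n = 2^n on A_n.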

Another useful notion is that of a property holding almost everywhere.

Proposition 3.10. Let (X, F, µ) be a measure space. We say that a property holds almost everywhere (or a.e.) if the set of points where the property does not hold has measure 0. Suppose that (X, F, µ) is a complete measure space. Then:

1. If f : X → [−∞, ∞] is F-measurable and f = g a.e. (that is µ{x : f(x) ≠ g(x)} = 0), then g is also F-measurable.

2. If f_n : X → [−∞, ∞] is a sequence of F-measurable functions and f_n → f a.e., then f is also F-measurable. Here f_n → f a.e. means that f_n(x) → f(x) outside a null set; for real-valued f_n this requires:

$$\mu\{x : f_n(x) \text{ does not converge}\} = \mu\{x : (f_n(x)) \text{ is not Cauchy}\} = \mu\Big(\bigcup_{k=1}^\infty\bigcap_{N=1}^\infty\bigcup_{m,n\ge N}\Big\{|f_n - f_m| \ge \frac1k\Big\}\Big) = 0.$$

4 Lebesgue Integration

Now we are in a position to define the Lebesgue integral. We assume two conditions here: the measure space is complete and σ-finite. We first look at non-negative functions.


4.1 Lebesgue Integration of Non-Negative Functions

As we did for the Riemann/Darboux integral, we first define the integral of simple functions. Recall that simple functions are functions φ of the form:

$$\varphi = \sum_{i=1}^n c_i\,\mathbf 1_{E_i},$$

where E_i ∈ F. For a non-negative simple function (c_i ≥ 0), the natural definition of its integral is:

$$I(\varphi) = \sum_{i=1}^n c_i\,\mu(E_i),$$

with the convention 0 · ∞ = 0. One checks that I(φ) does not depend on the chosen representation of φ, and that I is linear for non-negative coefficients, that is I(λφ + νψ) = λI(φ) + νI(ψ) for λ, ν ≥ 0. Thus, for a general non-negative F-measurable function f : X → [0, ∞], we define:

$$\int_X f\,d\mu = \sup\{I(\varphi) : \varphi \text{ is a simple function s.t. } 0 \le \varphi \le f\}.$$
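Independence of the representation can be checked on a small example. The sketch below assumes a hypothetical finite measure space X = {0, ..., 5} with F = all subsets and µ given by point weights, and integrates the same simple function written two ways: as 1_A + 1_B with overlapping A, B, and as a combination over the disjoint pieces A ∩ B, A \ B, B \ A. Exact fractions avoid rounding issues.

```python
from fractions import Fraction

# Hypothetical finite measure space: point weights mu({x}) on X = {0, ..., 5}.
weights = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1),
           3: Fraction(0), 4: Fraction(2), 5: Fraction(1, 6)}


def mu(E):
    """Measure of a subset E of X."""
    return sum((weights[x] for x in E), Fraction(0))


def I(rep):
    """Integral of the simple function sum c_i 1_{E_i}, given as a list of (c_i, E_i)."""
    return sum((c * mu(E) for c, E in rep), Fraction(0))


def evaluate(rep, x):
    """Pointwise value of sum c_i 1_{E_i} at x."""
    return sum((c for c, E in rep if x in E), Fraction(0))


A = {0, 1, 2}
B = {2, 4}
overlapping = [(Fraction(1), A), (Fraction(1), B)]  # 1_A + 1_B
disjoint = [(Fraction(2), A & B), (Fraction(1), A - B), (Fraction(1), B - A)]

print(I(overlapping), I(disjoint))
```

Both representations define the same function and give the same value of I.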

An equivalent definition of the Lebesgue integral is via an improper Riemann integral. Suppose that f : (X, F, µ) → [0, ∞] is an F-measurable function. Then its Lebesgue integral can be written as the improper Riemann integral:

$$\int_X f\,d\mu = \int_0^\infty \mu\{x \in X : f(x) \ge t\}\,dt.$$

This definition makes sense because f is F-measurable, so the integrand on the right-hand side is well defined for all t, and the function

$$F : [0,\infty) \to [0,\infty], \qquad t \mapsto \mu\{x \in X : f(x) \ge t\}$$

is monotone (non-increasing) and, by Proposition 1.8, is Riemann integrable in the improper sense.
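The two definitions can be compared on a function with finitely many values. The sketch assumes a hypothetical finite measure space given by point weights; there the first definition reduces to the weighted sum Σ f(x)µ({x}), while t ↦ µ{f ≥ t} is a step function that is constant between consecutive values of f, so the improper Riemann integral is a finite sum.

```python
from fractions import Fraction

# Hypothetical finite measure space (point weights) and a non-negative f on it.
weights = {"a": Fraction(1, 2), "b": Fraction(3), "c": Fraction(1, 4), "d": Fraction(2)}
f = {"a": Fraction(3), "b": Fraction(1, 2), "c": Fraction(5), "d": Fraction(3)}


def direct_integral():
    # f is itself simple: sum of f(x) * mu({x}).
    return sum((f[x] * weights[x] for x in weights), Fraction(0))


def distribution(t):
    # mu{x : f(x) >= t}
    return sum((weights[x] for x in weights if f[x] >= t), Fraction(0))


def layer_cake_integral():
    # On (lo, hi] between consecutive levels, mu{f >= t} equals mu{f >= hi},
    # and it vanishes beyond max f, so the improper integral is a finite sum.
    levels = sorted(set(f.values()) | {Fraction(0)})
    total = Fraction(0)
    for lo, hi in zip(levels, levels[1:]):
        total += (hi - lo) * distribution(hi)
    return total


print(direct_integral(), layer_cake_integral())
```

Both computations give 41/4 for these weights.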

A non-negative function f is called Lebesgue integrable if $\int_X f\,d\mu < \infty$. The space of integrable functions is denoted L1(X, F, µ), or simply L1(X) if there is no confusion. From the definition, these properties are clear:

1. if λ > 0, then $\int_X \lambda f\,d\mu = \lambda\int_X f\,d\mu$,

2. if 0 ≤ f ≤ g and f, g are both F-measurable functions, then $\int_X f\,d\mu \le \int_X g\,d\mu$,

3. if E, F ∈ F are disjoint measurable subsets of X and f ∈ L1(X), then $\int_{E\cup F} f\,d\mu = \int_E f\,d\mu + \int_F f\,d\mu$.

The following two results need some more work:

Proposition 4.1.

1. If f : X → [0,∞] is integrable, then µ{f =∞} = 0.

2. If f : X → [0, ∞] is F-measurable and $\int_X f\,d\mu = 0$, then f = 0 a.e. on X.


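Proof. The proof uses Markov's inequality, which states that if f : X → [0, ∞] is F-measurable, then for any λ > 0 we have:

$$\mu\{f \ge \lambda\} \le \frac1\lambda\int_X f\,d\mu.$$

This inequality is proven by considering the simple function $\varphi = \lambda\mathbf 1_{\{f\ge\lambda\}} \le f$, which implies that

$$\lambda\,\mu\{f \ge \lambda\} = \int_{\{f\ge\lambda\}}\lambda\,d\mu = \int_X\varphi\,d\mu \le \int_X f\,d\mu,$$

which yields the inequality after rearrangement.

For the first statement, $\int_X f\,d\mu < \infty$ is a finite constant and {f = ∞} ⊆ {f ≥ λ} for every λ > 0, so taking the limit as λ → ∞ in Markov's inequality yields µ{f = ∞} = 0. For the second one, Markov's inequality implies that µ{f ≥ λ} = 0 for every λ > 0, so by countable subadditivity

$$\mu\{f > 0\} = \mu\Big(\bigcup_{n=1}^\infty\Big\{f \ge \frac1n\Big\}\Big) \le \sum_{n=1}^\infty\mu\Big\{f \ge \frac1n\Big\} = 0.$$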
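Markov's inequality can be checked exhaustively on a small example. The sketch assumes a hypothetical finite measure space given by point weights (so the integral is a weighted sum) and compares µ{f ≥ λ} with (1/λ)∫ f dµ over a grid of λ values, using exact fractions.

```python
from fractions import Fraction

# Hypothetical finite measure space (point weights) and a non-negative f on it.
weights = [Fraction(1, 2), Fraction(1), Fraction(2), Fraction(1, 3), Fraction(1, 6)]
f = [Fraction(0), Fraction(1, 4), Fraction(3), Fraction(7), Fraction(10)]

integral = sum((fx * w for fx, w in zip(f, weights)), Fraction(0))


def measure_at_least(lam):
    """mu{f >= lam}."""
    return sum((w for fx, w in zip(f, weights) if fx >= lam), Fraction(0))


for lam in [Fraction(1, 10), Fraction(1), Fraction(3), Fraction(5), Fraction(10)]:
    lhs = measure_at_least(lam)
    rhs = integral / lam
    print(lam, lhs, rhs, lhs <= rhs)
```

The bound is crude for small λ (it can exceed µ(X)) but becomes informative as λ grows, which is exactly how it is used in the proof above.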

A key result for non-negative F-measurable functions is the Monotone Convergence Theorem, or MCT:

Theorem 4.2 (Monotone Convergence Theorem, MCT). Let f_n : X → [0, ∞] be an increasing sequence of non-negative F-measurable functions with $\lim_{n\to\infty} f_n = f$. Then we have:

$$\int_X f\,d\mu = \int_X\lim_{n\to\infty}f_n\,d\mu = \lim_{n\to\infty}\int_X f_n\,d\mu,$$

where these integrals take values in [0, ∞].

Proof. We know that fn ≤ f . Thus, we have´X fn dµ ≤

´X f dµ for all n ∈ N. Taking the

supremum yields supn(´X fn dµ

)≤´X f dµ. Now we need to show the reverse inequality.

Consider a simple function φ =∑k

i=1 ci1Ei such that 0 ≤ φ ≤ f . Fix λ ∈ (0, 1) and consider

the set Bn = {x : fn(x) ≥ λφ(x)}. Thus, Bn is F-measurable and since fn is increasing,

Bn ⊆ Bn+1 for all n ∈ N. Furthermore, we have⋃∞n=1Bn = X since we have f(x) > λφ(x) and

fn(x)→ f(x) for all x ∈ X. Since λφ1Bn ≤ fn1Bn ≤ fn, we have

λ

ˆBn

φdµ ≤ˆXfn dµ, (8)

By definition of φ, we have:

ˆBn

φdµ =

k∑i=1

ciµ(Ei ∩Bn) −−−→n→∞

k∑i=1

ciµ(Ei) =

ˆXφdµ.

Thus, taking the limit as n→∞ in (8), we get:

λ

ˆXφdµ ≤ lim

n→∞fn dµ,

for all λ ∈ (0, 1). Taking the limit as λ→ 1 yields:

ˆXφdµ ≤ lim

n→∞fn dµ.

Since φ is an arbitrary simple function such that φ ≤ f , by definition of the Lebesgue integral

of f , we have the opposite inequality.
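As a numerical sketch of the MCT (an illustrative choice, not taken from the notes), take X = N with counting measure, so that the integral of a non-negative function is the sum of its values, $f(k) = 1/k^2$ and $f_n = f\,\mathbf 1_{\{1,\dots,n\}}$, which increase pointwise to f. The MCT says that $\int f_n = \sum_{k\le n} 1/k^2$ increases to $\int f = \sum_{k\ge1} 1/k^2$, which is known to equal $\pi^2/6$.

```python
import math


def integral_fn(n: int) -> float:
    """Integral of f_n = f * 1_{1..n}, f(k) = 1/k^2, against counting measure on N."""
    return sum(1.0 / k**2 for k in range(1, n + 1))


values = [integral_fn(n) for n in (1, 10, 100, 1000, 10000)]
print(values, math.pi**2 / 6)
```

The printed integrals increase and approach π²/6 from below, with the gap at n = 10000 of order 10⁻⁴.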


A direct corollary is that the Lebesgue integral is linear on non-negative functions, that is, if f, g are non-negative F-measurable functions and λ, ν ≥ 0, we have:

$$\int_X(\lambda f + \nu g)\,d\mu = \lambda\int_X f\,d\mu + \nu\int_X g\,d\mu.$$

For the additivity, consider simple functions φ_n ↑ f and ψ_n ↑ g (which exist by Proposition 3.9). Then φ_n + ψ_n ↑ f + g and, by the MCT and the linearity of integration of simple functions, we have:

$$\int_X(f+g)\,d\mu = \lim_{n\to\infty}\int_X(\varphi_n+\psi_n)\,d\mu = \lim_{n\to\infty}\Big(\int_X\varphi_n\,d\mu + \int_X\psi_n\,d\mu\Big) = \lim_{n\to\infty}\int_X\varphi_n\,d\mu + \lim_{n\to\infty}\int_X\psi_n\,d\mu = \int_X f\,d\mu + \int_X g\,d\mu.$$

Since the Lebesgue integral behaves well under limits of non-negative functions, we can do much more. For example, an infinite sum is defined as the limit of finite sums as the number of terms goes to infinity: if (f_k) is a sequence of functions, we define the infinite sum (series) pointwise via the limit of the partial sums $S_n(x) = \sum_{k=1}^n f_k(x)$, that is:

$$\sum_{k=1}^\infty f_k(x) := \lim_{n\to\infty}\sum_{k=1}^n f_k(x) = \lim_{n\to\infty}S_n(x).$$

If the functions in the series are all non-negative, then the partial sums S_n(x) are non-negative and increase pointwise. Thus, applying the MCT to the partial sums and noting that the Lebesgue integral is linear for a finite sum of non-negative terms, we have:

Proposition 4.3 (Series of non-negative functions). Suppose that f_n : X → [0, ∞] is a sequence of non-negative F-measurable functions. Then we have:

$$\sum_{n=1}^\infty\int_X f_n\,d\mu = \int_X\sum_{n=1}^\infty f_n\,d\mu.$$

In particular, $\sum_{n=1}^\infty f_n \in L^1(X,\mu)$ iff $\sum_{n=1}^\infty\int_X f_n\,d\mu < \infty$.
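For example, take X = [0, 1] with Lebesgue measure and $f_n(x) = x^{n-1}(1-x) \ge 0$ (with $x^0 = 1$). Term by term,

$$\sum_{n=1}^\infty\int_0^1 x^{n-1}(1-x)\,dx = \sum_{n=1}^\infty\Big(\frac1n - \frac1{n+1}\Big) = 1,$$

while pointwise $\sum_{n=1}^\infty x^{n-1}(1-x) = 1$ for x ∈ [0, 1) and 0 at x = 1, so

$$\int_0^1\sum_{n=1}^\infty f_n\,d\mu = \mu([0,1)) = 1,$$

in agreement with Proposition 4.3.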

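A variant of this concerns a set E that is split into countably many disjoint measurable subsets:

Proposition 4.4. Suppose that (E_n) is a sequence of pairwise disjoint measurable sets and $E = \bigcup_{n=1}^\infty E_n$. Let f : X → [0, ∞] be a non-negative F-measurable function. Then we have:

$$\int_E f\,d\mu = \sum_{n=1}^\infty\int_{E_n} f\,d\mu.$$

This follows from Proposition 4.3 applied to the functions $f\mathbf 1_{E_n}$, whose sum is $f\mathbf 1_E$.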

4.2 Lebesgue Integration of General Functions

In fact, the construction of the integral generalises to F-measurable functions taking values in [−∞, ∞], not just non-negative functions. This is done by splitting an arbitrary F-measurable function into its positive and negative parts, that is:

$$f = f^+ - f^-,$$

where f+ = max(f, 0) and f− = max(−f, 0) are non-negative functions on X. Since these two functions are also F-measurable, $\int_X f^+\,d\mu$ and $\int_X f^-\,d\mu$ are well defined (though they can be infinite).

So, if at least one of the two integrals above is finite, we define the Lebesgue integral of f as:

$$\int_X f\,d\mu = \int_X f^+\,d\mu - \int_X f^-\,d\mu,$$

which takes values in [−∞, ∞].

Definition 4.5 (Lebesgue integrable functions). If both of the integrals $\int_X f^+\,d\mu$ and $\int_X f^-\,d\mu$ are finite, we call the function f = f+ − f− Lebesgue integrable. The space of Lebesgue integrable functions over X is denoted as:

$$L^1(X,\mathcal F,\mu) = \Big\{f : X \to [-\infty,\infty] \ \Big|\ f \text{ is } \mathcal F\text{-measurable and } \int_X|f|\,d\mu = \int_X(f^+ + f^-)\,d\mu < \infty\Big\}.$$

Here are some direct consequences of the definition above:

Lemma 4.6. For F-measurable functions f, g : X → [−∞,∞], we have the following results:

1. the function f is Lebesgue integrable if and only if |f | is Lebesgue integrable,

2. if |g| ≤ f and f ∈ L1(X), then g ∈ L1(X),

3. if 0 ≤ g ≤ |f| and g ∉ L1(X), then f ∉ L1(X),

4. if f, g ∈ L1(X) and f ≤ g, then $\int_X f\,d\mu \le \int_X g\,d\mu$,

5. if Y ∈ F is a measurable subset of X and f ∈ L1(X), then f ∈ L1(Y),

6. if f and g are both integrable, α, β ∈ R and αf + βg is defined, then the function αf + βg is integrable and $\int_X(\alpha f + \beta g)\,d\mu = \alpha\int_X f\,d\mu + \beta\int_X g\,d\mu$,

7. if f ∈ L1(X) and f = g a.e., then g ∈ L1(X),

8. if f ∈ L1(X), then f(x) ∈ R for a.e. x ∈ X,

9. if f ∈ L1(X) and $\int_X|f|\,d\mu = 0$, then f = 0 a.e.,

10. if f ∈ L1(X) and g : X → R is a bounded and F-measurable function, then fg ∈ L1(X).

5 Convergence Theorems

Having proved the monotone convergence theorem (MCT) for non-negative functions, we now want to extend it to a more general family of functions. Theorem 4.2 extends to integrable functions on X that are not necessarily non-negative:

Theorem 5.1 (General MCT). Let f_n : X → R be an increasing sequence of integrable functions with $\lim_{n\to\infty} f_n = f$ a.e. Suppose further that the set $\{\int_X f_n\,d\mu : n \in \mathbb N\}$ is bounded above. Then f ∈ L1(X, F, µ) and:

$$\int_X f\,d\mu = \int_X\lim_{n\to\infty}f_n\,d\mu = \lim_{n\to\infty}\int_X f_n\,d\mu.$$


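Proof. Consider the sequence of non-negative functions defined by g_n = f_n − f_1 for n ∈ N. This sequence of functions is increasing and converges a.e. to f − f_1. Applying Theorem 4.2 to it and using linearity, we get

$$\int_X(f - f_1)\,d\mu = \lim_{n\to\infty}\int_X(f_n - f_1)\,d\mu = \lim_{n\to\infty}\int_X f_n\,d\mu - \int_X f_1\,d\mu,$$

which is finite since the integrals of f_n are bounded above. Hence f − f_1 is integrable, so f = (f − f_1) + f_1 is integrable, and adding $\int_X f_1\,d\mu$ to both sides gives the result.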

Since integrals can be thought of as limits of sums, the finite subadditivity of the limit superior and the finite superadditivity of the limit inferior, which we have seen in Lemma 3.6, carry over to Lebesgue integrals. These generalisations are called Fatou's Lemmas.

Theorem 5.2 (Fatou's Lemma). Suppose that f_n : X → [0, ∞] is a sequence of F-measurable functions. Then:

$$\int_X\liminf_{n\to\infty}f_n\,d\mu \le \liminf_{n\to\infty}\int_X f_n\,d\mu.$$

In particular, if (f_n) is a sequence of non-negative F-measurable functions and f_n → f a.e., then:

$$\int_X f\,d\mu \le \liminf_{n\to\infty}\int_X f_n\,d\mu.$$

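Proof. Define $g_n(x) = \inf_{i\ge n} f_i(x)$ for n ∈ N. Thus g_n ≤ f_i for all i = n, n + 1, ..., each g_n is non-negative and F-measurable, and $g_n \uparrow \liminf_{n\to\infty} f_n$. We apply Theorem 4.2 to the sequence (g_n) to get:

$$\lim_{n\to\infty}\int_X g_n\,d\mu = \int_X\lim_{n\to\infty}g_n\,d\mu = \int_X\liminf_{n\to\infty}f_n\,d\mu. \tag{9}$$

On the other hand, g_n ≤ f_i for all i = n, n + 1, ..., so $\int_X g_n\,d\mu \le \int_X f_i\,d\mu$ for i = n, n + 1, .... Thus, we have:

$$\int_X g_n\,d\mu \le \inf_{i\ge n}\int_X f_i\,d\mu. \tag{10}$$

Letting n → ∞ in (10), the right-hand side increases to $\liminf_{n\to\infty}\int_X f_n\,d\mu$, and combining this with (9) gives the result.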

Corollary 5.3 (Reverse Fatou's Lemma). Suppose that f_n : X → [0, ∞] is a sequence of F-measurable functions such that there exists a non-negative integrable function g with f_n ≤ g for all n ∈ N. Then:

$$\limsup_{n\to\infty}\int_X f_n\,d\mu \le \int_X\limsup_{n\to\infty}f_n\,d\mu.$$

Proof. Apply Fatou's Lemma to the sequence of non-negative functions h_n = g − f_n and rearrange the inequality, subtracting the finite quantity $\int_X g\,d\mu$ from both sides. Note that lim inf(−f_n) = −lim sup f_n.

Other powerful convergence theorems, which do not require monotonicity of the terms in the sequence, are the Dominated and Bounded Convergence Theorems:

Theorem 5.4 (Dominated Convergence Theorem, DCT). Let f_n : X → [−∞, ∞] be a sequence of F-measurable functions with $\lim_{n\to\infty} f_n = f$ a.e. on X. Suppose further that there exists an integrable function g ∈ L1(X, F, µ) such that |f_n(x)| ≤ g(x) a.e. on X for n ∈ N. Then:

1. f_n and f are in L1(X, F, µ).

2. $\lim_{n\to\infty}\int_X f_n\,d\mu = \int_X f\,d\mu$.


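Proof. From Proposition 3.10, f is F-measurable. Since |f_n(x)| ≤ g(x) a.e., each f_n is integrable by comparison (Lemma 4.6), and letting n → ∞ gives |f| ≤ g a.e., so f is integrable as well. Applying Fatou's Lemma to the sequence of non-negative functions h_n = g − f_n, we get

$$\int_X\liminf_{n\to\infty}h_n\,d\mu \le \liminf_{n\to\infty}\int_X h_n\,d\mu$$

$$\Rightarrow\ \int_X(g - f)\,d\mu \le \liminf_{n\to\infty}\int_X(g - f_n)\,d\mu = \int_X g\,d\mu - \limsup_{n\to\infty}\int_X f_n\,d\mu$$

$$\Rightarrow\ \limsup_{n\to\infty}\int_X f_n\,d\mu \le \int_X f\,d\mu,$$

where in the last step we subtracted the finite quantity $\int_X g\,d\mu$. Repeating the process with the non-negative sequence k_n = g + f_n, we get:

$$\int_X f\,d\mu \le \liminf_{n\to\infty}\int_X f_n\,d\mu.$$

So, we get the inequality:

$$\limsup_{n\to\infty}\int_X f_n\,d\mu \le \int_X f\,d\mu \le \liminf_{n\to\infty}\int_X f_n\,d\mu.$$

However, by Lemma 3.6, the last term is always at most the first term, so the whole chain consists of equalities:

$$\limsup_{n\to\infty}\int_X f_n\,d\mu = \int_X f\,d\mu = \liminf_{n\to\infty}\int_X f_n\,d\mu,$$

which implies that $\lim_{n\to\infty}\int_X f_n\,d\mu$ exists and equals $\int_X f\,d\mu$.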
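As a worked example of the DCT (an illustrative choice), take X = (0, ∞) with Lebesgue measure and $f_n(x) = (1 + x/n)^{-n}$ for n ≥ 2. For x > 0 the sequence $(1 + x/n)^n$ increases in n to $e^x$, so $f_n(x) \to e^{-x}$ and $0 \le f_n(x) \le f_2(x) = (1 + x/2)^{-2} =: g(x)$, where $\int_0^\infty g\,dx = 2 < \infty$. The DCT therefore gives

$$\lim_{n\to\infty}\int_0^\infty\Big(1 + \frac xn\Big)^{-n}dx = \int_0^\infty e^{-x}\,dx = 1.$$

This agrees with a direct computation using the substitution x = nu:

$$\int_0^\infty\Big(1 + \frac xn\Big)^{-n}dx = n\int_0^\infty(1+u)^{-n}\,du = \frac{n}{n-1} \xrightarrow[n\to\infty]{} 1.$$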

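The integrable function g here, called the dominating function, cannot be dropped from the hypotheses. A counter-example is X = R with Lebesgue measure and $f_n = \mathbf 1_{(n-1,n]}$. We know that $\int_X f_n\,d\mu = 1$ for all n ∈ N and f_n → 0 pointwise. But

$$1 = \lim_{n\to\infty}\int_X f_n\,d\mu \quad\text{while}\quad \int_X\lim_{n\to\infty}f_n\,d\mu = 0,$$

which do not agree. This is because there is no integrable function that dominates all of the f_n at the same time: any such g would satisfy $g \ge \sup_n f_n = \mathbf 1_{(0,\infty)}$, which is not integrable. A corollary of the DCT is the Bounded Convergence Theorem: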

Corollary 5.5 (Bounded Convergence Theorem, BCT). Let (X, F, µ) be a finite measure space, that is µ(X) < ∞. Let f_n : X → [−∞, ∞] be a sequence of F-measurable functions with $\lim_{n\to\infty} f_n = f$ a.e. on X. Suppose further that there exists a constant K ∈ R such that |f_n(x)| ≤ K a.e. on X for n ∈ N. Then:

1. f_n and f are in L1(X, F, µ).

2. $\lim_{n\to\infty}\int_X f_n\,d\mu = \int_X f\,d\mu$.

Proof. Use the constant function $K\mathbf 1_X$, which is integrable since µ(X) < ∞, as the dominating function and apply the DCT.

Limits also appear implicitly in infinite sums and in differentiation. Recall Proposition 4.3, in which we used the MCT to exchange the order of an infinite sum and an integral. By applying this to the series of positive and negative parts separately, we have:


Proposition 5.6 (Series of functions). Suppose that f_n : X → [−∞, ∞] is a sequence of F-measurable functions. Then we have:

1. if $\sum_{n=1}^\infty\int_X|f_n|\,d\mu < \infty$, then the partial sums S_n(x) converge a.e. to an integrable function,

2. if $\sum_{n=1}^\infty|f_n(x)|$ is integrable, then the partial sums S_n(x) converge a.e. to an integrable function.

In both cases, we have:

$$\sum_{n=1}^\infty\int_X f_n\,d\mu = \int_X\sum_{n=1}^\infty f_n\,d\mu.$$

Another operation that requires a limiting process is differentiation. Recall that if f : R → R is a function, we define its derivative at y_0 ∈ R as:

$$\frac{df}{dy}(y_0) = \lim_{h\to0}\frac{f(y_0+h) - f(y_0)}{h}.$$

If we have a function f : X × R → R of two variables, say f(x, y), we can fix the variable y ∈ R and consider the function f_y(x) = f(x, y) on X. If f_y ∈ L1(X) for each y ∈ R, we have a well-defined function F : R → R given by:

$$F(y) = \int_X f(x,y)\,d\mu = \int_X f_y(x)\,d\mu.$$

We can prove the continuity of this function under some mild assumptions.

Theorem 5.7. Let I ⊂ R be an interval and f : X × I → R be a function such that:

1. for each y ∈ I, the function f_y is in L1(X),

2. for a.e. x ∈ X and every y_0 ∈ I, we have $\lim_{y\to y_0} f(x,y) = f(x,y_0)$,

3. for each y_0 ∈ I, there is an open subinterval J_0 ⊂ I with y_0 ∈ J_0 and a function g_0 ∈ L1(X) such that for all y ∈ J_0, we have |f(x, y)| ≤ g_0(x) a.e. in x,

then F(y) is continuous on I.

Proof. Pick y_0 ∈ I and let (y_n) be a sequence of points in I such that y_n → y_0. For large enough N, we must have y_n ∈ J_0 for all n ≥ N; discarding finitely many terms, we may assume y_n ∈ J_0 for all n ∈ N. Then (f_{y_n}) is a sequence of functions from X to R. For this y_0 we have the integrable dominating function g_0, so for all n we have |f_{y_n}(x)| ≤ g_0(x) a.e. in x, and f(x, y_n) → f(x, y_0) for a.e. x. We apply the DCT to get:

$$\lim_{n\to\infty}F(y_n) = \lim_{n\to\infty}\int_X f_{y_n}(x)\,d\mu = \int_X\lim_{n\to\infty}f(x,y_n)\,d\mu = \int_X f(x,y_0)\,d\mu = F(y_0).$$

Since this holds for every sequence y_n → y_0, F is continuous at y_0 ∈ I. Since y_0 is arbitrarily chosen, this proves the theorem.


Since F is continuous everywhere in I, it is natural to ask whether F is also differentiable. We now want to find conditions under which the derivative with respect to the variable y commutes with the Lebesgue integral, that is:

$$\frac{dF}{dy}(y) = \int_X\frac{\partial f}{\partial y}(x,y)\,d\mu.$$

We have the following theorem:

Theorem 5.8. Let I ⊂ R be an interval and f : X × I → R be a function such that:

1. for each y ∈ I, the function f_y is in L1(X),

2. for each x ∈ X and y ∈ I, the partial derivative $\frac{\partial f}{\partial y}(x,y)$ exists,

3. for each y_0 ∈ I, there is an open subinterval J_0 ⊂ I with y_0 ∈ J_0 and a function g_0 ∈ L1(X) such that for all y ∈ J_0, we have $\big|\frac{\partial f}{\partial y}(x,y)\big| \le g_0(x)$ a.e. in x,

then F(y) is differentiable on I and:

$$\frac{dF}{dy}(y) = \int_X\frac{\partial f}{\partial y}(x,y)\,d\mu.$$

Proof. Pick y_0 ∈ I and let (y_n) be a sequence of points in I such that y_n → y_0 and y_n ≠ y_0. For large enough N, we must have y_n ∈ J_0 for all n ≥ N; discarding finitely many terms, we may assume y_n ∈ J_0 for all n ∈ N. Then (f_{y_n}) is a sequence of functions from X to R. Define:

$$g_n(x) = \frac{f(x,y_n) - f(x,y_0)}{y_n - y_0}.$$

Since y_n ≠ y_0 for all n ∈ N, g_n is a constant multiple of a difference of two functions in L1(X), which implies that g_n ∈ L1(X) for all n. Furthermore, since $\frac{\partial f}{\partial y}(x,y)$ exists for all x ∈ X and y ∈ I, we have $g_n(x) \to \frac{\partial f}{\partial y}(x,y_0)$ as n → ∞.

Moreover, for each x and n, the function y ↦ f(x, y) is differentiable (hence continuous) on I, so by the Mean Value Theorem there exists some ξ_n = ξ_n(x) ∈ (min(y_0, y_n), max(y_0, y_n)) ⊂ J_0 such that:

$$g_n(x) = \frac{f(x,y_n) - f(x,y_0)}{y_n - y_0} = \frac{\partial f}{\partial y}(x,\xi_n).$$

For this y_0 we have the integrable dominating function g_0, so since ξ_n ∈ J_0 we get $|g_n(x)| = \big|\frac{\partial f}{\partial y}(x,\xi_n)\big| \le g_0(x)$ a.e. in x, and $g_n(x) \to \frac{\partial f}{\partial y}(x,y_0)$. Thus, we apply the DCT to get:

$$\frac{F(y_n) - F(y_0)}{y_n - y_0} = \int_X\frac{f(x,y_n) - f(x,y_0)}{y_n - y_0}\,d\mu = \int_X g_n(x)\,d\mu \xrightarrow[n\to\infty]{} \int_X\frac{\partial f}{\partial y}(x,y_0)\,d\mu.$$

Since this holds for every sequence y_n → y_0 with y_n ≠ y_0, the function F is differentiable at y_0 ∈ I and:

$$\frac{dF}{dy}(y_0) = \int_X\frac{\partial f}{\partial y}(x,y_0)\,d\mu,$$

and since y_0 is arbitrarily chosen, this proves the theorem.
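As an illustration of Theorem 5.8 (an example chosen here, not from the notes), take X = (0, ∞) with Lebesgue measure, I = (0, ∞) and $f(x,y) = e^{-yx}$, so that $F(y) = \int_0^\infty e^{-yx}\,dx = 1/y$ and $\frac{\partial f}{\partial y}(x,y) = -x e^{-yx}$. Given y_0 ∈ I, choose J_0 = (y_0/2, 2y_0). For y ∈ J_0,

$$\Big|\frac{\partial f}{\partial y}(x,y)\Big| = x e^{-yx} \le x e^{-y_0x/2} =: g_0(x), \qquad \int_0^\infty g_0(x)\,dx = \frac{4}{y_0^2} < \infty,$$

so all three hypotheses hold and

$$\frac{dF}{dy}(y) = -\int_0^\infty x e^{-yx}\,dx = -\frac{1}{y^2},$$

which agrees with differentiating F(y) = 1/y directly.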


6 Double Integrals

6.1 Product Measure

Let us extend the Lebesgue integral from two measure spaces (X, F, µ1) and (Y, G, µ2) to their product space X × Y. We first need to define a candidate for the measure µ on X × Y.

We can clearly define the measure (or in this case, the content) of any set in X × Y of the rectangular form A × B ⊂ X × Y, where A ∈ F and B ∈ G, via m(A × B) = µ1(A)µ2(B). The collection of all such rectangles forms a π-system, but it is not a σ-algebra, so it is not good enough for Lebesgue integration.

Any σ-algebra on X × Y that contains all the sets A × B with A ∈ F and B ∈ G must contain many more sets than these rectangles! We proceed to define an outer measure m* on all subsets of X × Y by the covering argument we have seen in Section 2.1. That is, for any E ∈ 2^{X×Y}, we define:

$$m^*(E) = \inf\Big\{\sum_{i=1}^\infty\mu_1(A_i)\mu_2(B_i) : A_i \in \mathcal F,\ B_i \in \mathcal G \text{ s.t. } E \subseteq \bigcup_{i=1}^\infty A_i\times B_i\Big\}. \tag{11}$$

Similar to the construction in Section 2.1, this defines an outer measure, not a genuine measure. Therefore, we proceed by removing from 2^{X×Y} the sets which do not satisfy the Caratheodory condition (see Definition 2.13). The resulting collection of sets, which we now call H, is a σ-algebra, and the outer measure restricted to this σ-algebra, which we now call µ, is a genuine measure. We call this measure the product measure.

Furthermore, for any A ∈ F and B ∈ G, we have A×B ∈ H and µ(A×B) = µ1(A)µ2(B).

Moreover, if µ1 and µ2 are σ-finite measures on X and Y respectively, the product measure µ

is unique. Caratheodory Extension Theorem saves the day again!

6.2 Theorems by Tonelli and Fubini

Let (X,F , µ1) and (Y,G, µ2) be σ-finite measure spaces and (X × Y,H, µ) be their product

measure space. Now let us look at the theorem by Tonelli, which allows one to swap the order

of integration of non-negative functions under some very mild assumptions.

Theorem 6.1 (Tonelli's theorem). Let f : X × Y → [0, ∞] be an H-measurable function. Then:

1. the function x ↦ f(x, y) is F-measurable for a.e. y ∈ Y,

2. the function x ↦ $\int_Y f(x,y)\,d\mu_2$ is non-negative and F-measurable,

3. we have the equality:

$$\int_{X\times Y}f(x,y)\,d\mu = \int_X\Big(\int_Y f(x,y)\,d\mu_2\Big)d\mu_1 = \int_Y\Big(\int_X f(x,y)\,d\mu_1\Big)d\mu_2.$$

We can extend this theorem to integrable functions by considering f+ and f− separately.

This is Fubini’s theorem:


Theorem 6.2 (Fubini's theorem). Let f : X × Y → [−∞, ∞] be Lebesgue integrable, that is f ∈ L1(X × Y, H, µ). Then:

1. for almost all y ∈ Y, f_y ∈ L1(X), where f_y(x) = f(x, y) is a function of x for a fixed y,

2. defining $F(y) = \int_X f_y(x)\,d\mu_1$, we have that F ∈ L1(Y), and the equality:

$$\int_Y F(y)\,d\mu_2 = \int_{X\times Y}f(x,y)\,d\mu,$$

3. we have the equality:

$$\int_{X\times Y}f(x,y)\,d\mu = \int_X\Big(\int_Y f(x,y)\,d\mu_2\Big)d\mu_1 = \int_Y\Big(\int_X f(x,y)\,d\mu_1\Big)d\mu_2.$$

Putting these two theorems together, we have the Fubini-Tonelli theorem:

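Theorem 6.3 (Fubini-Tonelli theorem). Let f : X × Y → [−∞, ∞] be H-measurable and suppose that:

$$\int_X\Big(\int_Y|f(x,y)|\,d\mu_2\Big)d\mu_1 < \infty \quad\text{or}\quad \int_Y\Big(\int_X|f(x,y)|\,d\mu_1\Big)d\mu_2 < \infty.$$

Then f is Lebesgue integrable, that is f ∈ L1(X × Y, H, µ), and hence the conclusions of Fubini's theorem hold.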
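The finiteness hypothesis in Theorem 6.3 cannot simply be dropped. A standard illustration (not from the notes) assumes X = Y = N = {1, 2, ...} with counting measure, so that integrals are sums and the product measure is counting measure on N × N, and uses the hypothetical array a(n, n) = 1, a(n + 1, n) = −1 and a(m, n) = 0 otherwise. Every inner sum has at most two non-zero terms and is computed exactly below; the outer sums are truncated at N terms, which is an assumption of the sketch.

```python
# a(m, n) on N x N (m, n >= 1): a(n, n) = 1, a(n + 1, n) = -1, and 0 otherwise.
def a(m: int, n: int) -> int:
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0


def inner_sum_over_m(n: int) -> int:
    # For fixed n only m = n and m = n + 1 are non-zero, so this finite sum is the full series.
    return sum(a(m, n) for m in range(1, n + 2))


def inner_sum_over_n(m: int) -> int:
    # For fixed m only n = m and n = m - 1 (when m >= 2) are non-zero.
    return sum(a(m, n) for n in range(1, m + 1))


N = 200  # truncation of the outer sums
sum_n_then_m = sum(inner_sum_over_m(n) for n in range(1, N + 1))
sum_m_then_n = sum(inner_sum_over_n(m) for m in range(1, N + 1))
abs_mass = sum(abs(a(m, n)) for m in range(1, N + 1) for n in range(1, N + 1))
print(sum_n_then_m, sum_m_then_n, abs_mass)
```

The two iterated sums are 0 and 1 for every truncation, while the total mass of |a| on the N × N square is 2N − 1, which grows without bound. So neither iterated integral of |f| is finite, and the two orders of integration disagree.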
