TRANSCRIPT
2301 690 (2011/1)
Special Topics in Advanced Mathematics:
Theory of Copulae
Songkiat Sumetkijakan
Contents
1 Necessary Probability Theory 3
1.1 Riemann and Lebesgue integrations . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Probability spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Expected values and variances . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.6 Conditional probability and independence . . . . . . . . . . . . . . . . . . . 20
1.7 More on expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.8 Moment generating functions . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.9 Laws of large numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.10 Central limit theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2301 690 Special Topics in Advanced Mathematics: Copulae 2011/1 (Songkiat) p.2
1 Necessary Probability Theory
N = {1, 2, 3, . . . }, the set of all natural numbers.
N_0 = {0, 1, 2, 3, . . . }, the set of all nonnegative integers.
Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . . }, the set of all integers.
Q = {m/n : m ∈ Z, n ∈ Z, n ≠ 0}, the set of all rational numbers.
R = the set of all real numbers.
1.1 Riemann and Lebesgue integrations
Riemann integration and its shortcomings
Given a bounded function f : [a, b] → R⁺, one may estimate the area between the curve
y = f(x) and the x-axis over the interval [a, b] by first subdividing [a, b] into n
subintervals. This first step is done by giving the n + 1 endpoints, called a partition
P = {x_0, x_1, . . . , x_n}, of those n subintervals. It is customary to order the endpoints in
the partition P so that a = x_0 < x_1 < · · · < x_n = b. We then compute the lower sum and
the upper sum of f with respect to the partition P by

    L(P ; f) = ∑_{i=1}^{n} (x_i − x_{i−1}) m_i(f),
    U(P ; f) = ∑_{i=1}^{n} (x_i − x_{i−1}) M_i(f),

where m_i(f) is the infimum of f(x) and M_i(f) is the supremum of f(x) for x ∈ [x_{i−1}, x_i].
If lim_{‖P‖→0} L(P ; f) = lim_{‖P‖→0} U(P ; f), then f is said to be Riemann-integrable
on [a, b] and the Riemann integral of f on [a, b] is

    ∫_a^b f(x) dx = lim_{‖P‖→0} L(P ; f) = lim_{‖P‖→0} U(P ; f).

Here ‖P‖, the mesh of P, is the largest distance between two adjacent points in the
partition P. As the partition gets finer and finer, the lower and upper sums of a
Riemann-integrable function become increasingly better lower and upper bounds of the
area under the curve. The Riemann-integrability of an arbitrary bounded function
f : [a, b] → R is defined in exactly the same way; only the area interpretation is lost.
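A quick numerical sketch (not from the text) of the lower and upper sums above, for the
increasing function f(x) = x² on [0, 1], where m_i(f) = f(x_{i−1}) and M_i(f) = f(x_i):

```python
# Lower and upper sums L(P; f) and U(P; f) over the equal partition of [a, b]
# into n subintervals, assuming f is nondecreasing on [a, b].
def lower_upper_sums(f, a, b, n):
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    lower = sum(h * f(xs[i - 1]) for i in range(1, n + 1))  # uses m_i(f)
    upper = sum(h * f(xs[i]) for i in range(1, n + 1))      # uses M_i(f)
    return lower, upper

f = lambda x: x * x
for n in (10, 100, 1000):
    L, U = lower_upper_sums(f, 0.0, 1.0, n)
    print(n, L, U)   # both approach the true area 1/3 as n grows
```

The gap U(P; f) − L(P; f) here equals h · (f(1) − f(0)) = 1/n, so it shrinks as the mesh does.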
Even though the definition of the Riemann integral is very intuitive and behaves nicely
when the integrand f is nice, e.g. continuous, it has some serious drawbacks, some of
which are illustrated below. We first define uniform convergence, a very strong mode of
convergence.
Figure 1: Riemann integration
Definition 1. A sequence of functions f_n : D → R converges uniformly to f : D → R if

    lim_{n→∞} sup{ |f_n(x) − f(x)| : x ∈ D } = 0.

Note that if f_n converges uniformly to f then f_n converges pointwise to f, i.e. f_n(x)
converges to f(x) at every x ∈ D. The converse is false, as can be seen by considering the
sequence f_n(x) = nx/(nx + 1) on [0, 1].
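A numerical look (not from the text) at why this sequence fails to converge uniformly: its
pointwise limit is f(0) = 0 and f(x) = 1 for x > 0, yet the supremum of the error stays at 1
for every n.

```python
# f_n(x) = n*x/(n*x + 1) converges pointwise on [0, 1] but not uniformly:
# sup |f_n - f| does not go to 0 because of points x close to 0.
def f_n(n, x):
    return n * x / (n * x + 1)

def f_limit(x):
    return 0.0 if x == 0 else 1.0

# grid includes very small positive x, where the error is largest
grid = [0.0] + [10.0 ** (-k) for k in range(12, 0, -1)] + [0.5, 1.0]
for n in (10, 1000, 10 ** 6):
    sup_err = max(abs(f_n(n, x) - f_limit(x)) for x in grid)
    print(n, sup_err)   # stays close to 1, so convergence is not uniform
```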
Example. Determine whether each of the following sequences of functions converges
uniformly on [0, 1]: i) f_n(x) = nx/(n²x² + 1); ii) f_n(x) = nx²/(nx + 1);
iii) f_n(x) = nx(1 − x)ⁿ.
Theorem 1.1 (Limits of Riemann Integrals). Assume that f_n is a sequence of continuous
functions defined on [a, b] and f_n converges uniformly to a function f. Then f is Riemann-
integrable (in fact, f is continuous) and

    lim_{n→∞} ∫_a^b f_n(x) dx = ∫_a^b lim_{n→∞} f_n(x) dx = ∫_a^b f(x) dx.
Theorem 1.2 (Interchanging the order of Riemann Integration). Assume that
f : [a, b] × [c, d] → R is continuous. Then the functions

    F(y) = ∫_a^b f(x, y) dx  and  G(x) = ∫_c^d f(x, y) dy,

where a ≤ x ≤ b and c ≤ y ≤ d, are Riemann-integrable and ∫_c^d F(y) dy = ∫_a^b G(x) dx.
In other words,

    ∫_c^d ∫_a^b f(x, y) dx dy = ∫_a^b ∫_c^d f(x, y) dy dx.
Example. i) A simple function that is not Riemann-integrable. Define f : [0, 1] → R by

    f(x) = 1 if x ∈ Q,
           0 if x ∉ Q.

Since Q is dense in R, which means that every open interval in R contains a member of Q,
every partition P = {x_0, . . . , x_n} of [0, 1] yields M_i(f) = sup_{x∈[x_{i−1},x_i]} f(x) = 1
and hence U(P ; f) = ∑_{i=1}^n (x_i − x_{i−1}) = 1. Thus lim_{‖P‖→0} U(P ; f) = 1.
Likewise, since R \ Q is also dense in R, any partition P of [0, 1] gives
lim_{‖P‖→0} L(P ; f) = 0. Therefore, f is not Riemann-integrable even though it is just
the characteristic function of Q on [0, 1].
ii) An increasing sequence of Riemann-integrable functions whose limit is not Riemann-
integrable. Let r_1, r_2, . . . be an enumeration of all rational numbers in [0, 1]. We then
define, for each n ∈ N, f_n : [0, 1] → R by

    f_n(x) = 1 if x ∈ {r_1, . . . , r_n},
             0 elsewhere.

Obviously, (f_n) is an increasing sequence and lim_{n→∞} f_n(x) = f(x) for all x ∈ [0, 1],
where f is the function from part i). We have seen that f is not Riemann-integrable,
while each f_n, being a finite sum of characteristic functions of singletons, is clearly
Riemann-integrable.
Lebesgue integration
Given f : [a, b] → R⁺ with f([a, b]) ⊆ [A, B), let us subdivide [A, B) into n subintervals
by a partition {y_0, y_1, . . . , y_n} and define, for i = 1, . . . , n,

    E_i = {x ∈ [a, b] : f(x) ∈ [y_{i−1}, y_i)}.

We then estimate the area between the curve y = f(x) and the x-axis over the interval
[a, b] by ∑_{i=1}^n y_{i−1} m(E_i), where m(E_i) should be some meaningful “length” of the
set E_i. This would be easy if E_i were just an interval with endpoints α < β, i.e. (α, β),
[α, β), (α, β], or [α, β], whose length is simply β − α. But E_i could be a very complicated
set, as f is an arbitrary function, and hence the length of E_i is not clearly defined. In
fact, it turns out that there is no way to assign a “length” to every subset of R in a
consistent manner. This is where measure theory comes in. A solution is to regard some
sets as “measurable” and discard the rest. The class of measurable sets is so large that
constructing a non-measurable set requires invoking the Axiom of Choice. Then, since the
shape of the set E_i depends directly on f, it is also necessary to define the Lebesgue
integral only for “measurable functions.” Nonetheless, we shall postpone a detailed
discussion of measure/probability theory to the next chapter.
Figure 2: Lebesgue integration
Let us now focus on defining the Lebesgue integral. Observe that the finer the partition
P = {y_0, . . . , y_n}, the larger the approximate area S(P ; f) = ∑_{i=1}^n y_{i−1} m(E_i).
We then define

    ∫_{[a,b]} f dm = lim_{‖P‖→0} S(P ; f).
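An illustrative sketch (not from the text) of this range-partition sum for f(x) = x² on
[0, 1]: here each preimage E_i = {x : f(x) ∈ [y_{i−1}, y_i)} is just the interval
[√y_{i−1}, √y_i), so its measure is known exactly.

```python
import math

def lebesgue_sum(n):
    """Partition the range [0, 1) of f(x) = x**2 into n equal pieces and
    form S(P; f) = sum_i y_{i-1} * m(E_i)."""
    total = 0.0
    for i in range(1, n + 1):
        y_lo, y_hi = (i - 1) / n, i / n
        measure = math.sqrt(y_hi) - math.sqrt(y_lo)   # m(E_i) exactly
        total += y_lo * measure
    return total

for n in (10, 100, 10000):
    print(n, lebesgue_sum(n))   # increases toward the true integral 1/3
```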
This deceptively simple idea of partitioning the range of f (in Lebesgue integration)
instead of partitioning the domain (in Riemann integration) has proved to be very crucial
in developing a much more satisfactory and broader theory of integration. We state here
some fundamental theorems without proofs.
Theorem 1.3 (Monotone Convergence Theorem). Let f_n be a sequence of non-negative
measurable functions on [a, b]. Suppose that 0 ≤ f_1 ≤ f_2 ≤ . . . and
lim_{n→∞} f_n(x) = f(x) at each x ∈ [a, b]. Then

    lim_{n→∞} ∫_{[a,b]} f_n dm = ∫_{[a,b]} f dm.
Theorem 1.4 (Dominated Convergence Theorem). Let f_n and f be measurable functions
for which f_n converges pointwise to f. Suppose there is a function g for which
∫_{[a,b]} g dm is finite and |f_n| ≤ g for all n. Then

    lim_{n→∞} ∫_{[a,b]} f_n dm = ∫_{[a,b]} f dm.
Theorem 1.5 (Fubini’s Theorem). Let f : [a, b] × [c, d] → R be measurable. Assume that
either

    ∫_{[a,b]} ∫_{[c,d]} |f(x, y)| dm(y) dm(x)  or  ∫_{[c,d]} ∫_{[a,b]} |f(x, y)| dm(x) dm(y)

is finite. Then

    ∫_{[a,b]} ∫_{[c,d]} f(x, y) dm(y) dm(x) = ∫_{[c,d]} ∫_{[a,b]} f(x, y) dm(x) dm(y).
Example. 1. The functions f_n in the Example after Theorem 1.2 all have
∫_{[0,1]} f_n dm = 0. And since f = 0 everywhere except on Q ∩ [0, 1], which has measure
zero, ∫_{[0,1]} f dm = 0 as well.

2. The sequence f_n defined by f_n(x) = nx(1 − x)ⁿ for x ∈ [0, 1] converges pointwise to
f = 0 but not uniformly. But one can check that

    ∫_0^1 f_n(x) dx = ∫_0^1 nx(1 − x)ⁿ dx = n/((n + 1)(n + 2)) → 0 = ∫_0^1 f(x) dx   (1)

as n → ∞. Thus lim_{n→∞} ∫_0^1 f_n(x) dx = ∫_0^1 f(x) dx even though the sequence does
not satisfy the uniform convergence assumption in Theorem 1.1. On the other hand, every
f_n is bounded by e^{−1}, a constant function with finite integral on [0, 1], and hence (1)
follows straightforwardly from Theorem 1.4.
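A numerical check (not from the text) of this example: a midpoint Riemann sum matches
the exact value n/((n + 1)(n + 2)), and the maximum of f_n, attained at x = 1/(n + 1),
indeed stays below the dominating constant 1/e.

```python
import math

def f_n(n, x):
    return n * x * (1 - x) ** n

def midpoint_integral(n, steps=20000):
    # simple midpoint rule for the integral of f_n over [0, 1]
    h = 1.0 / steps
    return h * sum(f_n(n, (k + 0.5) * h) for k in range(steps))

for n in (1, 5, 50):
    exact = n / ((n + 1) * (n + 2))
    approx = midpoint_integral(n)
    peak = f_n(n, 1.0 / (n + 1))          # the maximum of f_n on [0, 1]
    print(n, exact, approx, peak <= math.exp(-1))
```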
1.2 Probability spaces
Definition 2 (Algebras). Let Ω be any nonempty set. A collection F of subsets of Ω is
called an algebra if

1. Ω ∈ F;
2. if A ∈ F then A^c ∈ F; and
3. if A_1, A_2, . . . , A_n are in F then ⋃_{k=1}^n A_k ∈ F.

Definition 3 (σ-algebras or event spaces). Let Ω be any nonempty set. A collection F of
subsets of Ω is called a σ-algebra if

1. Ω ∈ F;
2. if A ∈ F then A^c ∈ F; and
3. if A_1, A_2, . . . are in F then ⋃_{k=1}^∞ A_k ∈ F.

The pair (Ω, F) is called a measurable space, and the members of F are called F-measurable
sets or events. The σ-algebra F itself is sometimes referred to as an event space.
Interpretation. To model an experiment, Ω serves as the set of all possible outcomes,
the so-called sample space. We think of any member of the σ-algebra F as an event that
may take place. For example, in the experiment of rolling a fair die once, we should take
Ω = {1, 2, . . . , 6}. How would we choose a σ-algebra? If we observe the roll directly and
are interested in the number that turns up, then our σ-algebra must contain all of the
singletons {1}, {2}, . . . , {6}, and the only choice is P(Ω). On the other hand, if we are
only interested in whether the roll turns up high (4-6) or low (1-3), then we don’t want to
distinguish, e.g., between 1, 2, and 3. So a natural σ-algebra is {∅, {1, 2, 3}, {4, 5, 6}, Ω}.
Example. 1. P(Ω) is the finest σ-algebra on Ω while {∅, Ω} is the coarsest σ-algebra.

2. In an experiment of tossing a fair coin twice, our sample space is

    Ω = {HH, HT, TH, TT}

where, for example, HT means the coin turns up heads then tails. If we are told only
how many heads turn up, then our σ-algebra must contain {HH}, {HT, TH}, and {TT}.
After taking all possible set operations, the σ-algebra is

    {∅, {HH}, {HT, TH}, {TT}, {HH, HT, TH}, {HT, TH, TT}, {HH, TT}, Ω}.
Proposition 1.6. Let (Ω, F) be a measurable space. Then

1. the empty set ∅ is in F;
2. if A_1, A_2, . . . are in F then ⋂_{n=1}^∞ A_n ∈ F; and
3. if A, B ∈ F then A \ B ∈ F.
In practice, we usually know exactly what kind of sets we wish to consider as events, but
this collection is rarely a σ-algebra. So we want to find the most suitable σ-algebra for
the problem, i.e., one large enough to contain the collection of sets we are interested in
but not too large. Certainly, P(Ω) is always large enough, and we can always use it
whenever Ω is a finite set. However, when Ω is an uncountably infinite set, using P(Ω) as
our σ-algebra would cause serious problems once we want to define a probability measure
on it.
Proposition 1.7. The intersection of an arbitrary nonempty collection of σ-algebras on a
set Ω is a σ-algebra.
Corollary 1.8. Let Ω be a set and let C be a family of subsets of Ω. Then there exists
a smallest σ-algebra on Ω containing all members of C. It is called the σ-algebra on Ω
generated by C, and is denoted by σ(C).

Proof. Observe that P(Ω) is a σ-algebra containing all members of C. Then

    σ(C) = ⋂ {F : F is a σ-algebra on Ω and C ⊆ F}

is the smallest σ-algebra on Ω containing all members of C.
Proposition 1.9. Let A ⊆ P(Ω).

1. If A is a σ-algebra then σ(A) = A.

2. If B is a σ-algebra and A ⊆ B then σ(A) ⊆ B.
Example. 1. Ω = {1, 2, . . . , 6} and C = {{1}, {2}, . . . , {6}} ⇒ σ(C) = P(Ω).

2. Ω = {1, 2, . . . , 6} and C = {{1}, {2, 3}} ⇒
σ(C) = {∅, {1}, {2, 3}, {4, 5, 6}, {1, 2, 3}, {2, 3, 4, 5, 6}, {1, 4, 5, 6}, Ω}.

3. In general, it is not always possible to list all members of σ(C). However, if C is a
countable set of mutually disjoint sets, say C = {A_1, A_2, . . . }, such that
⋃_{n=1}^∞ A_n = Ω, then σ(C) = {⋃_{i∈I} A_i : I ⊆ N}.
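On a finite Ω the generated σ-algebra can be computed by brute force, since every algebra
on a finite set is automatically a σ-algebra. A sketch (not from the text) that recovers the
8-element σ-algebra of Example 2:

```python
def generate_sigma_algebra(omega, generators):
    """Close the generating collection under complements and pairwise unions."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a in list(sets):
            c = omega - a                       # complement
            if c not in sets:
                sets.add(c); changed = True
            for b in list(sets):
                u = a | b                       # union
                if u not in sets:
                    sets.add(u); changed = True
    return sets

sigma = generate_sigma_algebra({1, 2, 3, 4, 5, 6}, [{1}, {2, 3}])
print(len(sigma))                    # 8, matching Example 2 above
print(sorted(map(sorted, sigma)))
```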
Definition 4 (Probability spaces). Let (Ω, F) be a measurable space. A measure on F is
a function µ : F → [0, ∞] such that µ(∅) = 0 and, for any countable collection of disjoint
sets {A_n}_{n=1}^∞ in F,

    µ(⋃_{n=1}^∞ A_n) = ∑_{n=1}^∞ µ(A_n).   (countably additive)

(Ω, F, µ) is called a measure space. A measure P is said to be a probability measure if
P(Ω) = 1 and, in this case, (Ω, F, P) is called a probability space.
Example. Let Ω be any set and F = P(Ω).

1. [Relative frequency] For each A ⊆ Ω, define

    µ(A) = |A| ≡ the number of elements in A, if A is a finite set,
           ∞, if A is an infinite set.

Then µ is a measure, called the counting measure on Ω.

If |Ω| = N < ∞ and each ω ∈ Ω is equally likely to happen, then we let
P(A) = µ(A)/µ(Ω) = |A|/N. This defines a probability measure on (Ω, P(Ω)). Notice that
in this case of a finite sample space we can always take P(Ω) as the σ-algebra.
2. Fix an ω_0 ∈ Ω and define, for each A ⊆ Ω,

    µ_{ω_0}(A) = 1 if ω_0 ∈ A,
                 0 if ω_0 ∉ A.

Then µ_{ω_0} is a measure on Ω (verify!), called the Dirac measure concentrated at ω_0.
In this probability space, ω_0 happens with probability 1.
3. If Ω is countably infinite then it is impossible to construct a probability measure under
which each ω ∈ Ω is equally likely to happen.

4. If Ω is uncountably infinite then it is, in general, impossible to assign a probability
to every subset of Ω. So we have to opt for a smaller σ-algebra which still contains
simple sets such as intervals.
Definition 5 (Borel σ-algebra). The σ-algebra generated by the collection of all open sets
in R is called the Borel σ-algebra on R, denoted by B. Each element in the Borel σ-algebra
is called a Borel set.
Since every open set in R is a countable union of open intervals, the Borel σ-algebra on
R can also be generated by open intervals (a, b).
Proposition 1.10. Let Ω ≠ ∅, let {A_i}_{i=1}^∞ be a sequence of mutually disjoint subsets
of Ω (mutually exclusive events) such that ⋃_{i=1}^∞ A_i = Ω, and let F be the σ-algebra
generated by {A_i}_{i=1}^∞. If p_1, p_2, . . . are real numbers in [0, 1] for which
∑_{i=1}^∞ p_i = 1, then the set function P : F → [0, 1] defined by

    P(⋃_{i∈I} A_i) = ∑_{i∈I} P(A_i) = ∑_{i∈I} p_i,  I ⊆ N,

is a probability measure on F.
Proposition 1.11. Let (Ω, F, P) be a probability space.

1. If A_1, A_2, . . . , A_N ∈ F are mutually disjoint, then

    P(⋃_{n=1}^N A_n) = ∑_{n=1}^N P(A_n).   (finitely additive)

2. If A, B ∈ F and A ⊆ B, then P(A) ≤ P(B).

3. If A, B ∈ F and A ⊆ B, then P(B \ A) = P(B) − P(A).
4. For any events A_1, A_2, . . . ,

    P(⋃_{n=1}^∞ A_n) ≤ ∑_{n=1}^∞ P(A_n).
Theorem 1.12 (Continuity of probability measures). Let F be a σ-algebra on Ω, let P be
a probability measure on F, and let {A_n}_{n=1}^∞ be a sequence of events. Then:

1. If A_1 ⊆ A_2 ⊆ . . . , then P(⋃_{n=1}^∞ A_n) = lim_{n→∞} P(A_n).

2. If A_1 ⊇ A_2 ⊇ . . . , then P(⋂_{n=1}^∞ A_n) = lim_{n→∞} P(A_n).
Proof. Put B_1 = A_1, B_2 = A_2 \ A_1, . . . , B_n = A_n \ A_{n−1}, so that the B_n’s are
mutually disjoint sets in F,

    A_n = ⋃_{k=1}^n B_k,  and  ⋃_{n=1}^∞ A_n = ⋃_{n=1}^∞ B_n.

Then

    P(⋃_{k=1}^∞ A_k) = P(⋃_{k=1}^∞ B_k) = ∑_{k=1}^∞ P(B_k)
                     = lim_{n→∞} ∑_{k=1}^n P(B_k) = lim_{n→∞} P(⋃_{k=1}^n B_k)
                     = lim_{n→∞} P(A_n).

For 2., let A = ⋂_{n=1}^∞ A_n and, for each n = 1, 2, . . . , let D_n = A_1 \ A_n. Then

    D_1 ⊆ D_2 ⊆ . . .  and  A_1 \ A = ⋃_{n=1}^∞ (A_1 \ A_n) = ⋃_{n=1}^∞ D_n.

Applying 1. to the D_n gives P(A_1 \ A) = lim_{n→∞} P(D_n), i.e., by Proposition 1.11.3,

    P(A_1) − P(A) = lim_{n→∞} (P(A_1) − P(A_n)) = P(A_1) − lim_{n→∞} P(A_n),

and hence P(A) = lim_{n→∞} P(A_n).
The following example illustrates that constructing a probability space on an infinite
sample space is not at all trivial.

Example. Consider the experiment of tossing a fair coin successively infinitely many
times. The sample space is the sequence space

    Ω = {H, T}^∞ = {(ω_n) ≡ ω_1 ω_2 · · · ω_n · · · : ω_i is either H or T for each i = 1, 2, . . . }.
The infinite cardinality of Ω prevents us from using the idea of relative frequency to define
probability here. So we start by defining easy events:

    A_{a_1 a_2 ··· a_k} ≡ {(ω_n) ∈ {H, T}^∞ : ω_1 = a_1, ω_2 = a_2, . . . , ω_k = a_k},

where a_1 a_2 · · · a_k ∈ {H, T}^k. It is natural to assign probability 1/2^k to each of these
events. More generally, if S ⊆ {H, T}^k then the event

    A_S ≡ {(ω_n) ∈ {H, T}^∞ : ω_1 ω_2 · · · ω_k ∈ S}

should be assigned probability |S|/2^k. We then take as the event space the σ-algebra σ(A)
generated by

    A = {A_S : k ∈ N, S ⊆ {H, T}^k} ∪ {Ω}.

σ(A) contains some events that are not in A but can certainly be obtained from events in
A through a series of countable unions and complements. For instance, show that the
following are events in σ(A) \ A:

• “all tosses after the third toss come up heads”
• “infinitely many heads come up throughout the experiment”

The next step is to define a probability measure on σ(A). We only know what probabilities
to assign to events in A. For the members of σ(A) \ A, the probabilities are determined by
Carathéodory’s Extension Theorem.

Theorem 1.13 (Carathéodory’s Extension Theorem). Let A be an algebra on Ω ≠ ∅ and
let µ : A → [0, 1] satisfy µ(Ω) = 1. If µ is countably additive on A, then there exists a
probability measure P on σ(A) such that P(A) = µ(A) for all A ∈ A.

To apply Carathéodory’s Extension Theorem to the above example, we define µ on A by
µ(Ω) = 1 and

    µ(A_S) = |S|/2^k  for S ⊆ {H, T}^k and k ∈ N.

Although it is not at all easy, it is possible to verify that µ is countably additive on A.
By Carathéodory’s Extension Theorem, we have a unique probability measure P on σ(A)
which agrees with µ on each A_S. Let us now compute the probabilities of the two events
listed above.
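A sketch of the two computations, using Theorem 1.12 and countable subadditivity (the
event names B_n and C_N are ours, not from the text):

```latex
% Event 1: all tosses after the third come up heads.  With
% B_n = \{\omega : \omega_4 = \cdots = \omega_{3+n} = H\}, the B_n decrease, so
P\Big(\bigcap_{n=1}^{\infty} B_n\Big)
  = \lim_{n\to\infty} P(B_n)
  = \lim_{n\to\infty} 2^{-n} = 0.

% Event 2: infinitely many heads.  Its complement is \bigcup_{N=1}^{\infty} C_N
% with C_N = \{\omega : \omega_n = T \text{ for all } n > N\}, and each C_N has
% probability 0 by the same limit argument, so subadditivity gives
P(\text{infinitely many heads})
  \;\ge\; 1 - \sum_{N=1}^{\infty} P(C_N) \;=\; 1.
```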
1.3 Random variables
Definition 6 (Random variables). Let Ω be a set, F a σ-algebra on Ω, and X : Ω → R.
X is called a random variable if {ω ∈ Ω : X(ω) < α} ∈ F for all α ∈ R. A random
variable is called discrete if X(Ω) is a countable set.

Remark. 1. If F = P(Ω) then every function X : Ω → R is a random variable. In
particular, whenever Ω is finite, we can take F = P(Ω) and every X is a random variable.

2. If X is discrete then X is a random variable if and only if {ω ∈ Ω : X(ω) = α} ∈ F
for all α ∈ R. This is not true for general random variables.

3. The σ-algebra depends on the random variable we want to consider.
Example. 1. Two dice are rolled and Jack bets on the sum being 10. If he wins, he’ll
get 2 baht; otherwise, he’ll have to pay 1 baht. Then a random variable representing
Jack’s gain from this game is

    X(ω) = 2 if ω = (4, 6), (5, 5), or (6, 4),
           −1 otherwise.

What is the sample space Ω? In each of the following situations, what is the σ-algebra F,
and is X a random variable on such (Ω, F)?

• Jack observes the dice himself, i.e., he has complete information about the outcome
of the experiment.
• Jack is told only the sum of the two dice.
• Jack is told only whether each die is Hi (4-6) or Lo (1-3).

In the situations where X is a random variable, assuming the dice are fair, find P(X = 2),
i.e. the probability that Jack wins.
2. Even though the head-to-head record of Federer versus Roddick is something like 10-1,
let’s assume that the probability that Federer wins a set from Roddick is 2/3. In a
five-set match, let X be the number of sets played. For example, if somebody wins in
three sets, then X = 3. Compute P(X = 3), P(X = 4), and P(X = 5). What are the
underlying sample space and σ-algebra?
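A brute-force sketch (not from the text) of this best-of-five computation: enumerate all
length-5 strings of set winners with per-set probability 2/3 for Federer, and read off from
each string’s prefix how many sets the match lasts. Weighting full-length strings is valid
because the probabilities of the unused suffixes sum to 1.

```python
from itertools import product
from fractions import Fraction

p = Fraction(2, 3)                       # P(Federer wins a set), from the text
dist = {3: Fraction(0), 4: Fraction(0), 5: Fraction(0)}

for outcome in product("FR", repeat=5):
    prob = Fraction(1)
    for s in outcome:
        prob *= p if s == "F" else 1 - p
    f_wins = r_wins = 0
    for sets_played, s in enumerate(outcome, start=1):
        if s == "F":
            f_wins += 1
        else:
            r_wins += 1
        if f_wins == 3 or r_wins == 3:   # match ends here
            break
    dist[sets_played] += prob

print(dist)   # {3: Fraction(1, 3), 4: Fraction(10, 27), 5: Fraction(8, 27)}
```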
3. Let (Ω, F, P) be a probability space. For any event A ∈ F, the indicator function of A
is the function 1_A : Ω → {0, 1} defined as

    1_A(ω) = 1 if ω ∈ A,
             0 otherwise.

For example, the random variable X in Example 1 is 2 · 1_A − 1_{Ω\A} where
A = {(4, 6), (5, 5), (6, 4)}.
Theorem 1.14. The following statements are equivalent.

1. {ω ∈ Ω : X(ω) < α} ∈ F for all α ∈ R.

2. {ω ∈ Ω : X(ω) ≤ α} ∈ F for all α ∈ R.

3. {ω ∈ Ω : X(ω) > α} ∈ F for all α ∈ R.

4. {ω ∈ Ω : X(ω) ≥ α} ∈ F for all α ∈ R.

Proof. 1. ⇒ 2.: Let α ∈ R. Since {ω : X(ω) < α + 1/n} is measurable for each
n = 1, 2, . . . , so is the countable intersection

    {ω : X(ω) ≤ α} = ⋂_{n=1}^∞ {ω : X(ω) < α + 1/n}.

2. ⇒ 3.: For each real number α, if {ω : X(ω) ≤ α} is measurable then
{ω : X(ω) > α} = {ω : X(ω) ≤ α}^c is clearly measurable as well.

3. ⇒ 4.: This follows from observing that

    {ω : X(ω) ≥ α} = ⋂_{n=1}^∞ {ω : X(ω) > α − 1/n}.

4. ⇒ 1.: {ω : X(ω) < α} = {ω : X(ω) ≥ α}^c ∈ F.
Theorem 1.15. If X and Y are random variables and c ∈ R, then cX, X + Y, X · Y, and
|X| are random variables.

Proposition 1.16. If X is a random variable on (Ω, F, P) and f is Borel measurable on
R, then f ∘ X is a random variable.
1.4 Distributions
Definition 7 (Distribution functions). Let (Ω, F, P) be a probability space and X a
random variable. Define F_X : R → [0, 1] by

    F_X(x) = P({ω ∈ Ω : X(ω) ≤ x}).

F_X is called the distribution function of X. In probability, we (almost) always write
P(X ≤ x) for P({ω ∈ Ω : X(ω) ≤ x}). Similarly, for E ⊆ R, P(X ∈ E) means
P({ω ∈ Ω : X(ω) ∈ E}). This is an abuse of notation where Ω is understood from the
context.
Example. 1. Consider the game Jack played. Find F_X. Now, if Jack plays the game
twice, find the distribution of Jack’s total gain.

2. By randomly choosing a real number in Ω = [0, 1], we mean that the probability of
a number in an interval (a, b) ⊆ [0, 1] being chosen is b − a, i.e. P((a, b)) = b − a.
Here, we use the Borel σ-algebra on [0, 1] and the probability measure P determined
by the probabilities of open intervals. If a is chosen, let X = a. Find F_X and F_{X²}.

3. Two random variables on different probability spaces may have the same distribution
function. As should be apparent from the definition, the distribution function extracts
only the information on the probability of the random variable taking values in any
given interval.
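For Example 2 above, a worked sketch: since P((a, b)) = b − a and X(a) = a,

```latex
F_X(x) = P(X \le x) =
\begin{cases} 0, & x < 0,\\ x, & 0 \le x \le 1,\\ 1, & x > 1, \end{cases}
\qquad
F_{X^2}(x) = P\big(X \le \sqrt{x}\,\big) =
\begin{cases} 0, & x < 0,\\ \sqrt{x}, & 0 \le x \le 1,\\ 1, & x > 1. \end{cases}
```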
Proposition 1.17. Let (Ω, F, P) be a probability space. Then X is a random variable if
and only if X^{−1}(E) = {ω ∈ Ω : X(ω) ∈ E} ∈ F for all Borel sets E ∈ B(R).

Definition 8 (Distributions). Let X be a random variable on the probability space
(Ω, F, P). The set function P_X defined on (R, B(R)) by

    P_X(E) = P(X ∈ E) = P(X^{−1}(E))  for all E ∈ B(R)

is a probability measure called the distribution of X.
Theorem 1.18. A function F : R → [0, 1] is the distribution function of a random variable
if and only if

1. F is nondecreasing;

2. lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1; and

3. F is right continuous, i.e., for all x_0 ∈ R, lim_{x→x_0⁺} F(x) = F(x_0).
Definition 9 (Probability functions). If X is a discrete random variable on (Ω, F, P) then
the function f : R → [0, 1] defined as

    f(x) = P(X = x)  for all x ∈ R

is called the probability function of X.

Note that 0 ≤ f(x) ≤ 1 and F_X(x) = ∑_{t≤x} f(t). Clearly, if t ∉ X(Ω) then f(t) = 0.
Definition 10 (Continuous random variables). A random variable X on (Ω, F, P) is called
a continuous random variable if there is a Lebesgue integrable function f : R → [0, ∞)
such that

    F_X(x) = ∫_{−∞}^x f(t) dt  for all x ∈ R.

f is called the probability density function (pdf) of X. Of course,

    ∫_{−∞}^∞ f(t) dt = lim_{x→∞} F_X(x) = 1.

Note that this is the same as saying

    P_X(A) = ∫_A f(t) dt  for all A ∈ B(R).
Example. 1. A continuous random variable is not a random variable that is continuous
as a function on Ω. Consider the constant functions.

2. If Ω is finite, no function on Ω is a continuous random variable.

3. A random variable can be neither discrete nor continuous. For example, consider an
experiment where we toss a coin; if it comes up heads, let X = 2. Otherwise, we
choose a number x between 0 and 1 randomly and let X = x. This random variable
X is neither discrete nor continuous.
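For the coin/uniform mixture in item 3, a sketch of the distribution function (assuming a
fair coin):

```latex
F_X(x) =
\begin{cases}
0, & x < 0,\\
x/2, & 0 \le x < 1,\\
1/2, & 1 \le x < 2,\\
1, & x \ge 2.
\end{cases}
```

The ramp on [0, 1) rules out discreteness, while the jump of size 1/2 at x = 2 rules out
the existence of a density, so X is neither discrete nor continuous.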
Definition 11. Two random variables X and Y, possibly on different probability spaces,
are said to be identically distributed if they have the same distribution function
(distribution), i.e. F_X = F_Y (P_X = P_Y).

Proposition 1.19. Let f : R → [0, ∞) be a Lebesgue integrable function for which
∫_R f(x) dx = 1. Then f is the pdf of a continuous random variable with distribution
function F : R → [0, 1] defined as

    F(x) = ∫_{−∞}^x f(t) dt,  x ∈ R.
Proposition 1.20. Suppose X is a continuous random variable on (Ω, F, P) with pdf f.
Then

1. for all a ∈ R, P(X = a) = 0; and

2. for any interval I with endpoints a < b, P(X ∈ I) = ∫_a^b f(x) dx.
Definition 12 (Some discrete distributions).

Bernoulli. A r.v. X with P(X = 1) = p and P(X = 0) = 1 − p is said to be a Bernoulli
random variable with parameter p.

Binomial. X is said to have a binomial distribution with parameters n and p if, for each
k = 0, 1, . . . , n,

    P(X = k) = C(n, k) p^k (1 − p)^{n−k},  where C(n, k) = n!/(k!(n − k)!).
Poisson. X is said to have a Poisson distribution with parameter λ > 0 if for each k ∈ N_0,

    P(X = k) = e^{−λ} λ^k / k!.
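A quick sanity check (not from the text) that these probability functions sum to 1 (for the
Poisson, up to truncation of the series):

```python
import math

def binomial_pmf(n, p, k):
    # C(n, k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    # e^(-lam) * lam^k / k!
    return math.exp(-lam) * lam ** k / math.factorial(k)

print(sum(binomial_pmf(10, 0.3, k) for k in range(11)))   # 1.0 (up to rounding)
print(sum(poisson_pmf(2.5, k) for k in range(100)))       # ~1.0
```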
Definition 13 (Some continuous distributions).

Uniform. X is said to have the uniform distribution on [a, b], or to be uniformly
distributed on [a, b], a < b, if its pdf is

    f(t) = 1/(b − a), a ≤ t ≤ b,
           0, otherwise.

Exponential. X has the exponential distribution with parameter λ, or is exponentially
distributed with parameter λ > 0, if its pdf is

    f(t) = λ e^{−λt}, t ≥ 0,
           0, otherwise.

Standard Normal. Define the standard normal density

    φ(x) = (1/√(2π)) e^{−x²/2}.

Then X is said to have the standard normal distribution if its pdf is φ.
Definition 14 (Almost surely). An event A is said to happen almost surely (a.s.) in
(Ω,F , P ) if P (A) = 1. For example,
1. X = Y a.s. on Ω means P (X = Y ) = 1.
2. |X| ≤M a.s. means P (|X| ≤M) = 1.
1.5 Expected values and variances
X is called a simple random variable if X(Ω) is a finite set. Note that simple random
variables are discrete. Also, a simple random variable can always be written in the standard
form
X =n∑k=1
ak1Ak(2)
where Ak = X = ak, X(Ω) = a1, a2, . . . , an, and all a1, a2, . . . , an are distinct.
Let us note that any given random variable X can be written as
X = X+ −X−
where X+(ω) = max(X(ω), 0), ω ∈ Ω, is the positive part and X−(ω) = −min(X(ω), 0),
ω ∈ Ω, is the negative part. Note that both X+ ≥ 0 and X− ≥ 0.
Definition 15 (Expected values/Expectations). Let X be a random variable on a
probability space (Ω, F, P). We define the expected value E[X] of X as follows.

• If X is a simple random variable with the standard form in (2), we define

    E[X] = ∫_Ω X dP = ∑_{k=1}^n a_k P(A_k).

In particular, if X = 1_A then E[X] = P(A).

• If X ≥ 0 then we define

    E[X] = ∫_Ω X dP = sup {E[Y] : Y is simple and Y ≤ X}.

• If at least one of E[X⁺] and E[X⁻] is finite, then we define

    E[X] = E[X⁺] − E[X⁻].

X is called integrable if both E[X⁺] and E[X⁻] are finite, i.e., E[|X|] < ∞. From now on,
we shall consider only integrable random variables, and we say that the expectation of X
does not exist if neither E[X⁺] nor E[X⁻] is finite.

This measure-theoretic approach to defining expected values needs a series of theorems in
measure theory, such as the monotone convergence theorem, Fatou’s lemma, and Lebesgue’s
dominated convergence theorem.
Proposition 1.21. Let X, Y be (integrable) random variables, a ∈ R, and A, B ∈ F. Then

1. If X ≤ Y a.s., then E[X] ≤ E[Y].

2. If X ≥ 0 and A ⊆ B, then E[X · 1_A] ≤ E[X · 1_B].

3. E[aX] = a E[X].

4. E[X + Y] = E[X] + E[Y].

5. If A and B are disjoint, then E[X · 1_{A∪B}] = E[X · 1_A] + E[X · 1_B].
Example. Let (Ω, F, P) be any probability space with Ω = {ω_1, ω_2, . . . } a countable
set. Then the expected value of a random variable X with ∑_{n=1}^∞ |X(ω_n)| P({ω_n}) < ∞ is

    E[X] = ∑_{n=1}^∞ X(ω_n) P({ω_n}).
Note that for a given random variable X on (Ω, F, P), P_X is a probability measure on
(R, B(R)) and so ∫_R x dP_X(x) is defined. In fact, by Proposition 1.23,

    E[X] = ∫_R x dP_X(x).
Proposition 1.22. Let (Ω, F, P) be a probability space. If X is a discrete random variable
on (Ω, F, P) with probability function f, then

    E[X] = ∑_{x∈X(Ω)} x f(x).

If X is a continuous random variable on (Ω, F, P) with pdf f, then

    E[X] = ∫_R x f(x) dx.   (3)
Definition 16 (Variances). Let X be a random variable for which both E[X] and E[X²]
exist. Then the variance of X is defined as

    Var(X) = E[(X − E[X])²].
To justify the equation (3), we have to make a change of variable and use the fact that
the distribution of a continuous random variable is absolutely continuous with respect to
Lebesgue measure on R. This is why continuous random variables are sometimes referred
to as absolutely continuous random variables.
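As a concrete check of Proposition 1.22 and Definition 16 (a sketch, not from the text):
the expectation and variance of Jack’s gain X from the two-dice bet of Section 1.3,
computed by enumerating the 36 equally likely outcomes.

```python
from fractions import Fraction

outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
X = {w: (2 if w[0] + w[1] == 10 else -1) for w in outcomes}
p = Fraction(1, 36)                     # each of the 36 outcomes equally likely

EX = sum(p * X[w] for w in outcomes)                 # E[X]
var = sum(p * (X[w] - EX) ** 2 for w in outcomes)    # E[(X - E[X])^2]

print(EX, var)   # -3/4 11/16
```

So on average Jack loses 3/4 baht per game.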
Proposition 1.23 (Change of variables). Let X be a random variable on a probability
space (Ω, F, P), and let g be any random variable on (X(Ω), B(X(Ω)), P_X) with g ≥ 0
or E_{P_X}[|g|] < ∞. Then

    ∫_{X(Ω)} g(x) dP_X(x) = ∫_Ω (g ∘ X)(ω) dP(ω).
Definition 17 (Absolute continuity). Let (Ω, F) be a measurable space and let P, Q be
probability measures on (Ω, F). We say that P is absolutely continuous with respect to
Q, denoted by P ≪ Q, if, for all A ∈ F, Q(A) = 0 implies P(A) = 0.
Theorem 1.24 (Radon-Nikodym Theorem). Let (Ω, F) be a measurable space and let P,
Q be probability measures on (Ω, F). If P ≪ Q then there exists a unique (up to Q-a.e.
equality) f : Ω → [0, ∞) such that

    P(A) = ∫_A f dQ,  A ∈ F,

and, consequently,

    ∫_A g dP = ∫_A g · f dQ,  A ∈ F.
1.6 Conditional probability and independence
Definition 18 (Conditional probabilities). Let A, B ∈ F with P(A) > 0. We define the
conditional probability of B given A by

    P(B|A) = P(B ∩ A) / P(A).

If both P(A) and P(B) are positive, then P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B).
Theorem 1.25. Let (Ω, F, P) be a probability space and A ∈ F with P(A) > 0. Define
F_A = {A ∩ B : B ∈ F} and P_A : F_A → [0, 1] by P_A(B) = P(B|A). Then (A, F_A, P_A)
is a probability space.

We call a set of events {B_1, B_2, . . . , B_n} a partition of Ω if B_1, B_2, . . . , B_n are
mutually disjoint, ⋃_{i=1}^n B_i = Ω, and each P(B_i) > 0.
Theorem 1.26 (Bayes’ rule). Let {B_1, B_2, . . . , B_n} be a partition of (Ω, F, P) and let
P(A) > 0. Then

1. P(A) = ∑_{i=1}^n P(B_i) P(A|B_i);

2. P(B_k|A) = P(B_k) P(A|B_k) / P(A) = P(B_k) P(A|B_k) / ∑_{i=1}^n P(B_i) P(A|B_i).
Example. A football team has had a poor season and the manager is likely to be fired at
the end of the season. If the team wins its final game, his chance of being fired is 60%,
but if the team fails to win, then the chance of his being fired is 80%. The probability
that the team wins its final game is 0.3. Find:

1. the probability that the manager is fired;

2. given that he was fired, the probability that the team won the final game.
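The two answers follow directly from Theorem 1.26; a worked sketch of the numbers from
the example:

```python
from fractions import Fraction

p_win = Fraction(3, 10)
p_fired_given_win = Fraction(6, 10)
p_fired_given_not = Fraction(8, 10)

# Total probability: P(fired) = P(win)P(fired|win) + P(not win)P(fired|not win)
p_fired = p_win * p_fired_given_win + (1 - p_win) * p_fired_given_not
# Bayes' rule: P(win | fired)
p_win_given_fired = p_win * p_fired_given_win / p_fired

print(p_fired)            # 37/50, i.e. 0.74
print(p_win_given_fired)  # 9/37
```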
Definition 19 (Independence of events). Let (Ω, F, P) be a probability space. The events
A, B ∈ F are said to be independent if

    P(A ∩ B) = P(A) P(B).

A subcollection A of F is said to be pairwise independent if, for any pair of events
A, B ∈ A, P(A ∩ B) = P(A) P(B). More generally, a collection A of events in F is called
independent if

    P(A_1 ∩ A_2 ∩ · · · ∩ A_n) = P(A_1) P(A_2) · · · P(A_n)

for every finite subcollection {A_1, A_2, . . . , A_n} of distinct events in A.
Example. 1. Pairwise independence does not imply independence. Consider, for example,
the events A = {HH, HT}, B = {HH, TH}, and C = {HH, TT} when a fair coin is tossed
twice.

2. ∅ and Ω are independent of any event. More generally, events with probability 0 or 1
are independent of any event.

3. If two disjoint events A and B are independent, then either P(A) = 0 or P(B) = 0.
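A direct check (not from the text) of item 1: every pair among A, B, C multiplies
correctly, but the triple intersection {HH} has probability 1/4, not 1/8.

```python
from fractions import Fraction
from itertools import combinations

# two fair tosses: four equally likely outcomes
P = lambda E: Fraction(len(E), 4)   # E is a subset of {HH, HT, TH, TT}

A, B, C = {"HH", "HT"}, {"HH", "TH"}, {"HH", "TT"}

for E, F in combinations([A, B, C], 2):
    assert P(E & F) == P(E) * P(F)          # pairwise independence holds

print(P(A & B & C), P(A) * P(B) * P(C))     # 1/4 versus 1/8: not independent
```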
Proposition 1.27. 1. Suppose P(B) > 0. Then A, B are independent if and only if
P(A|B) = P(A).

2. If A and B are independent, then Ω \ A and B are independent.

3. If A ⊆ F is independent, then it is pairwise independent.

Given a random variable X on a probability space (Ω, F, P), we define the σ-algebra
generated by X, denoted by σ(X), to be the σ-algebra generated by all inverse images of
Borel sets under X, i.e.

    σ(X) = {X^{−1}(E) : E ∈ B(R)}.

Verify that σ(X) is a σ-algebra.
Definition 20 (Independence of random variables). Random variables X1, X2, . . . , Xm on
(Ω, F, P) are said to be independent if for any Ai ∈ σ(Xi), i = 1, 2, . . . , m, the collection
A1, A2, . . . , Am is independent. In other words, X and Y are independent if
P(X^{−1}(E) ∩ Y^{−1}(F)) = P(X^{−1}(E)) P(Y^{−1}(F)) = PX(E)PY(F)
for all Borel sets E, F ∈ B(R).
Example. 1. Consider two urns, red and blue, each holding k chips that are numbered
1, 2, . . . , k. A chip is to be drawn at random from each urn. Let Xr, Xb be the numbers
on the chips drawn from the red and blue urns, respectively. Then Xr and Xb are
independent.
2. A constant (a.s.) random variable is independent of any other random variable.
3. A random variable is independent of itself iff it is constant a.s.
4. Let X, Y be independent random variables and let f, g be Borel measurable functions
on R. Then the random variables f ∘ X and g ∘ Y are independent.
Proposition 1.28. X and Y are independent if and only if for all a, b ∈ R,
P (X ≤ a, Y ≤ b) = P (X ≤ a)P (Y ≤ b).
Proposition 1.29. Discrete random variables X1, . . . , Xm are independent iff for any
x1, . . . , xm ∈ R,
P (X1 = x1, . . . , Xm = xm) = P (X1 = x1) · · ·P (Xm = xm).
Example. 1. Toss a fair coin twice and consider X1(HH) = 2 and X1 = −1 otherwise;
X2(TT) = 2 and X2 = −1 otherwise;
Y1(ω) = 2 if ω ∈ {HH, HT} and −1 if ω ∈ {TH, TT}; and
Y2(ω) = 2 if ω ∈ {HH, TH} and −1 if ω ∈ {HT, TT}.
Then X1, X2 are not independent while Y1, Y2 are independent.
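Both claims can be checked directly with the criterion of Proposition 1.29; the sketch below tests the product rule at every pair of values.

```python
from itertools import product
from fractions import Fraction

# Two fair coin tosses, uniform probability on four outcomes.
omega = [a + b for a, b in product("HT", repeat=2)]
X1 = {w: 2 if w == "HH" else -1 for w in omega}
X2 = {w: 2 if w == "TT" else -1 for w in omega}
Y1 = {w: 2 if w in ("HH", "HT") else -1 for w in omega}
Y2 = {w: 2 if w in ("HH", "TH") else -1 for w in omega}

def independent(U, V):
    # Proposition 1.29: P(U = u, V = v) = P(U = u)P(V = v) for all values u, v.
    P = lambda pred: Fraction(sum(pred(w) for w in omega), len(omega))
    return all(
        P(lambda w: U[w] == u and V[w] == v)
        == P(lambda w: U[w] == u) * P(lambda w: V[w] == v)
        for u in set(U.values()) for v in set(V.values())
    )

print(independent(X1, X2))  # False
print(independent(Y1, Y2))  # True
```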
Definition 21. The joint distribution function of random variables X1, X2, . . . , Xm on
(Ω,F , P ) is the function FX1,...,Xm : Rm → [0, 1] defined by
FX1,...,Xm (x1, . . . , xm) = P (X1 ≤ x1, . . . , Xm ≤ xm) .
The joint probability density function of X1, X2, . . . , Xm, if it exists, is the function
fX1,X2,...,Xm : R^m → [0, ∞) for which
P(X1 ≤ x1, . . . , Xm ≤ xm) = ∫_{−∞}^{xm} · · · ∫_{−∞}^{x1} fX1,X2,...,Xm(t1, . . . , tm) dt1 · · · dtm.
So, for example, if FX1,X2 is twice differentiable then fX1,X2(x1, x2) = ∂²FX1,X2 / ∂x1∂x2.
Example. If fX,Y(x, y) = c e^{−2x} / (1 + y²) for x > 0 and y ∈ R (and 0 otherwise), what is c?
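Since the density factors into a function of x and a function of y, the normalizing constant comes from two one-dimensional integrals; the sketch below derives c and then sanity-checks the total mass numerically.

```python
import math

# Separability: fX,Y(x, y) = c · e^(−2x) · 1/(1 + y²), so the total mass is
# c · ∫_0^∞ e^(−2x) dx · ∫_R dy/(1 + y²) = c · (1/2) · π, forcing c = 2/π.
c = 2 / math.pi

# Crude numeric check of the normalization by midpoint rules on
# truncated ranges (the truncation error is tiny for these tails).
dx, dy = 1e-3, 1e-2
ix = sum(math.exp(-2 * (k + 0.5) * dx) for k in range(int(20 / dx))) * dx          # ≈ 1/2
iy = 2 * sum(1 / (1 + ((k + 0.5) * dy) ** 2) for k in range(int(1000 / dy))) * dy  # ≈ π, by symmetry
print(c * ix * iy)  # ≈ 1
```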
Remark. One can recover the pdf of Xi from the joint pdf of X1, . . . , Xn by integrating out
the other variables. For instance,
fX(x) = ∫_R fX,Y(x, y) dy, fY(y) = ∫_R fX,Y(x, y) dx.
Proposition 1.30. Let X1, X2, . . . , Xm be random variables with the joint distribution
function FX1,...,Xm. Then they are independent iff for any x1, . . . , xm ∈ R,
FX1,...,Xm (x1, . . . , xm) = FX1(x1) . . . FXm(xm).
In particular, if Xi’s are continuous independent random variables, then the joint probability
density function of X1, X2, . . . , Xm is
fX1,X2,...,Xm(t1, t2, . . . , tm) = fX1(t1)fX2(t2) · · · fXm(tm).
Example. 1. Toss a fair coin twice
1.7 More on expectations
Similar to the one-variable case, if g : R² → R is a Borel measurable function and X, Y are
random variables on (Ω, F, P) and (Σ, G, Q), respectively, then we define
Eg(X, Y) = ∫_{Ω×Σ} g(X, Y) d(P × Q).
Hence, if X, Y are discrete with joint probability function f(x, y) then
Eg(X, Y) = ∑_{x,y} g(x, y)f(x, y);
and if X, Y are continuous with joint density function f then
Eg(X, Y) = ∫_R ∫_R g(x, y)f(x, y) dx dy.
Theorem 1.31. Let X,Y be random variables on (Ω,F , P ) whose expectations exist.
1. If X,Y are independent then E[XY ] = E[X]E[Y ]. But the converse is not true in
general.
2. X,Y are independent if and only if for any measurable functions g, h for which both
Eg(X) and Eh(Y ) exist,
E[g(X)h(Y )] = E[g(X)]E[h(Y )].
The covariance of X and Y is defined as
Cov(X, Y) = E[(X − EX)(Y − EY)] = E[XY] − E[X]E[Y].
It then follows immediately that if X, Y are independent then their covariance is 0. And
since
Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y),
if X, Y are independent then Var(X + Y) = Var(X) + Var(Y).
Example. Since a binomial random variable is the sum of n independent Bernoulli(p) random variables, its variance is np(1 − p).
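The binomial variance formula can be confirmed exactly from the pmf; the parameter values below are an illustrative choice.

```python
from math import comb

# Exact check that Var(Bin(n, p)) = n·p·(1 − p), computed from the pmf.
n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(mean, var)  # 3.0 and 2.1 (= n·p and n·p·(1 − p)), up to float rounding
```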
Definition 22 (Conditional expectation). Let X be a discrete random variable whose
expectation exists and A be an event with P(A) > 0. The conditional expectation of X
given A is
E[X|A] = ∑_{x∈Im X} x P(X = x|A).
Theorem 1.32. Let X be a discrete random variable whose expectation exists and B1, . . . , Bn
be a partition of Ω. Then
E[X] = ∑_{i=1}^n E[X|Bi]P(Bi).
Example. Suppose that an urn contains N cards labeled x1, . . . , xN. Let X, Y be the
numbers on the first and second cards chosen at random. Suppose the cards are drawn
randomly without replacement. Find E[X], E[Y] and Cov(X, Y).
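For a small concrete deck one can compute these quantities exactly by enumerating all ordered draws; the deck {1, . . . , 5} below is an illustrative choice, not from the notes. Both marginals are uniform on the deck, and the covariance matches the without-replacement formula −σ²/(N − 1), with σ² the population variance of the labels.

```python
from itertools import permutations
from statistics import mean

# Enumerate all equally likely ordered draws (X, Y) of two distinct cards.
cards = [1, 2, 3, 4, 5]
N = len(cards)
draws = list(permutations(cards, 2))

EX = mean(x for x, y in draws)
EY = mean(y for x, y in draws)
cov = mean(x * y for x, y in draws) - EX * EY

# Compare with the closed form −σ²/(N − 1).
xbar = mean(cards)
sigma2 = mean((c - xbar) ** 2 for c in cards)
print(EX, EY, cov, -sigma2 / (N - 1))  # 3.0 3.0 -0.5 -0.5
```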
If P and Q are probability measures on (Ω, F), then we have expectations with respect
to different measures:
EP[X] = ∫_Ω X dP and EQ[X] = ∫_Ω X dQ.
If P ≪ Q (P is absolutely continuous with respect to Q) then one can write expectations
with respect to one measure in terms of the other:
EP[X] = ∫_Ω X dP = ∫_Ω X (dP/dQ) dQ = EQ[X dP/dQ],
where dP/dQ is the Radon–Nikodym derivative.
Definition 23. Let X, Y be random variables on (Ω, F, P). Then E[X|Y] is a random
variable on (Ω, σ(Y)) satisfying
∫_A X dP = ∫_A E[X|Y] dP for all A ∈ σ(Y).
Note that E[X|Y] is unique up to a.s. equivalence.
Example. Consider E[X|Y] where Y = 1A and where Y = ∑_{i=1}^n ai 1Ai.
Proposition 1.33. Let X,Y be random variables on (Ω,F , P ).
1. E [E[X|Y ]] = E[X]
2. If Y ≡ c then E[X|Y ] = E[X].
3. If X,Y are independent then E[X|Y ] = E[X].
In the above, what really matters is the σ-algebra that Y generates.
Definition 24. Let X be a random variable on (Ω, F, P) and G be a σ-algebra with G ⊆ F.
Then E[X|G] is a random variable on (Ω, G) satisfying
∫_A X dP = ∫_A E[X|G] dP for all A ∈ G.
Theorem 1.34. Let X,Y be random variables on (Ω,F , P ) and G ⊆ F be a σ-algebra. Let
a, b ∈ R.
1. If X is G-measurable, then E[X|G] = X a.s.
2. E[aX + bY |G] = E[aX|G] + E[bY |G] a.s.
3. If X is G-measurable and E[XY ] is finite then E[XY |G] = X E[Y |G] a.s.
4. If H ⊆ G is a σ-algebra then
E[X|H] = E [E[X|G]|H] = E [E[X|H]|G] a.s.
5. If X ≤ Y a.s., then E[X|G] ≤ E[Y |G] a.s.
1.8 Moment generating functions
Definition 25. The moment generating function of a random variable X on (Ω,F , P ) is
defined as
MX(t) = E[etX ].
It is understood that the domain of MX is the set of all t for which E[etX ] exists.
Example. Find MX if
1. X ∼ Ber(p)
2. X ∼ Poi(λ)
3. X ∼ N (µ, σ2)
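For the Bernoulli case the answer is MX(t) = 1 − p + p e^t; the sketch below checks this against E[e^{tX}] computed directly from the distribution, and recovers E[X] = M′(0) by a central difference (the values of p and t are illustrative).

```python
import math

# Bernoulli(p) mgf: M(t) = 1 − p + p·e^t.
p, t = 0.3, 0.7
M = lambda s: 1 - p + p * math.exp(s)

# E[e^{tX}] summed over the two outcomes X = 0, 1.
E_etX = (1 - p) * math.exp(t * 0) + p * math.exp(t * 1)
print(abs(M(t) - E_etX))  # 0.0

# Proposition 1.35, item 3: E[X] = M'(0), here via a central difference.
h = 1e-6
dM0 = (M(h) - M(-h)) / (2 * h)
print(dM0)  # ≈ 0.3 = p = E[X]
```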
Proposition 1.35. Let X be a random variable whose moment generating function exists
on (−δ, δ) for some δ > 0. Then
1. MX^(k)(t) = ∑_{x∈Im(X)} (d^k/dt^k)(e^{tx}) P(X = x) if X is discrete.
2. MX^(k)(t) = ∫_R (d^k/dt^k)(e^{tx}) f(x) dx if X is continuous.
3. E[X^k] = MX^(k)(0)
4. MX(t) = ∑_{k=0}^∞ (t^k/k!) E[X^k]
Theorem 1.36. If MX = MY on some interval (−δ, δ) then X and Y have the same
distribution.
Theorem 1.37. If X,Y are independent then MX+Y = MXMY .
Example. 1. Find MX if X is a binomial with parameters n, p.
2. If X ∼ N(µ1, σ1²) and Y ∼ N(µ2, σ2²) where X, Y are independent, then X + Y ∼
N(µ1 + µ2, σ1² + σ2²).
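The sum-of-normals fact in item 2 can be illustrated by simulation; the sketch below checks only the sample mean and variance of X + Y against µ1 + µ2 and σ1² + σ2² (illustrative parameters, not a proof).

```python
import random
import statistics

# Monte Carlo sanity check that X + Y has mean µ1 + µ2 and
# variance σ1² + σ2² for independent normals.
random.seed(0)
mu1, s1, mu2, s2 = 1.0, 2.0, -0.5, 1.5
z = [random.gauss(mu1, s1) + random.gauss(mu2, s2) for _ in range(200_000)]
print(statistics.fmean(z))     # ≈ 0.5  (= µ1 + µ2)
print(statistics.variance(z))  # ≈ 6.25 (= σ1² + σ2²)
```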
1.9 Laws of large numbers
Theorem 1.38 (Chebyshev’s inequality). Let X be a random variable on (Ω, F, P) with
finite variance and ε > 0. Then
P(|X − EX| ≥ ε) ≤ Var(X)/ε².
More generally, for any r > 0,
P(|X − EX| ≥ ε) ≤ E(|X − EX|^r)/ε^r.
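Chebyshev's bound is usually far from tight; the sketch below compares the exact tail probability with the bound for a binomial example (X ∼ Bin(20, 0.5), so EX = 10 and Var(X) = 5; the numbers are an illustrative choice).

```python
from math import comb

# Exact P(|X − EX| ≥ ε) versus the Chebyshev bound Var(X)/ε².
n, p, eps = 20, 0.5, 4
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
exact = sum(pk for k, pk in enumerate(pmf) if abs(k - n * p) >= eps)
bound = n * p * (1 - p) / eps**2
print(exact, bound)  # exact ≈ 0.115, bound = 0.3125
```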
Definition 26 (Modes of convergence). Let Xn be a sequence of random variables on a
common probability space (Ω, F, P).
• Xn is said to converge almost surely to X (Xn → X a.s.) if
P(lim_{n→∞} Xn = X) = 1.
• Xn is said to converge in the mean of order p > 0 to X (Xn → X in Lp) if
lim_{n→∞} E(|Xn − X|^p) = 0.
• Xn is said to converge in probability to a random variable X (Xn → X in prob.) if
for all ε > 0,
lim_{n→∞} P(|Xn − X| ≥ ε) = 0.
Observe that both convergence in the mean and convergence a.s. imply convergence
in probability. There are no other implications among these modes in general. Convergence
in probability is thus the weakest of the three modes of convergence.
Theorem 1.39 (Weak law of large numbers I). Let Xn be a sequence of independent
random variables, each with expected value µ and variance σ². Then
(1/n) ∑_{k=1}^n Xk → µ, as n → ∞, in probability.
Proof. (1/n) ∑_{k=1}^n Xk has expectation µ and variance σ²/n. Therefore, by Chebyshev’s
inequality,
P(|(X1 + · · · + Xn)/n − µ| ≥ ε) ≤ σ²/(nε²).
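The concentration of the sample mean can be seen in a short simulation; the fair-die setup below (µ = 3.5) is an illustrative choice, not a proof.

```python
import random
import statistics

# Illustration of the weak law: the sample mean of fair-die rolls
# settles near µ = 3.5 as n grows.
random.seed(1)
for n in (10, 1_000, 100_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, statistics.fmean(rolls))
```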
Theorem 1.40 (Weak law of large numbers II). Let X1, X2, . . . be uncorrelated random
variables (E[XiXj] = E[Xi]E[Xj] for i ≠ j) with means µ1, µ2, . . . and variances σ1², σ2², . . . .
Assume that
lim_{n→∞} (1/n²) ∑_{k=1}^n σk² = 0.
Then for each ε > 0,
lim_{n→∞} P(|(X1 + · · · + Xn)/n − (µ1 + · · · + µn)/n| ≥ ε) = 0.
Example. Suppose that an urn contains N cards labelled x1, . . . , xN such that ∑_{i=1}^N xi = 0.
Let X1, . . . , Xn be the numbers on the cards chosen at random. Suppose the cards are drawn
randomly without replacement.
Let A1, A2, . . . be events in a probability space (Ω, F, P). Then the event
⋂_{n=1}^∞ ⋃_{m=n}^∞ Am = {ω ∈ Ω : ω belongs to infinitely many of the An}
is called “An i.o.” (An infinitely often).
Lemma 1.41 (Borel-Cantelli). 1. If ∑_{n=1}^∞ P(An) < ∞ then P(An i.o.) = 0.
2. If ∑_{n=1}^∞ P(An) = ∞ and the An’s are independent, then P(An i.o.) = 1.
Example. 1. If Xn → X in probability then there is a subsequence (nk) such that Xnk
converges to X a.s.
2. What is the probability that two consecutive heads will come up infinitely often in the
repeated tossing of a fair coin?
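The answer to the coin question is 1: taking Ak = {tosses 2k − 1 and 2k are both heads} gives independent events with P(Ak) = 1/4, so ∑ P(Ak) = ∞ and Borel-Cantelli (part 2) applies. A simulation agrees that occurrences of "HH" keep accumulating; the block-event choice above and the sketch below are illustrations, not part of the notes.

```python
import random

# Count overlapping occurrences of "HH" in a long simulated toss sequence.
random.seed(2)
tosses = "".join(random.choice("HT") for _ in range(100_000))
count = sum(1 for i in range(len(tosses) - 1) if tosses[i : i + 2] == "HH")
print(count)  # roughly 100_000 / 4, and it grows without bound as tosses continue
```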
Theorem 1.42 (Strong law of large numbers). Let Xn be a sequence of independent,
identically distributed random variables with finite expected value µ = E[Xi]. Then
(1/n) ∑_{k=1}^n Xk → µ almost surely as n → ∞.
A simple proof assumes that the fourth moment is finite (E[Xi⁴] < ∞) and uses
Chebyshev’s inequality and the Borel-Cantelli lemma.
1.10 Central limit theorem
Theorem 1.43 (Central limit theorem). Let Xn be a sequence of independent, identically
distributed random variables with
E[Xi] = µ, Var(Xi) = σ².
Then for all (a, b) ⊆ (−∞, ∞),
lim_{n→∞} P(a ≤ ∑_{i=1}^n (Xi − µ)/(√n σ) ≤ b) = (1/√(2π)) ∫_a^b e^{−x²/2} dx.
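The theorem can be illustrated by simulation: standardized sums of uniform(0, 1) variables (µ = 1/2, σ² = 1/12, an illustrative choice) should land in (−1, 1) with probability near Φ(1) − Φ(−1) ≈ 0.6827.

```python
import math
import random

# Monte Carlo illustration of the CLT with uniform(0, 1) summands.
random.seed(3)
mu, sigma, n, trials = 0.5, math.sqrt(1 / 12), 30, 50_000
hits = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    z = (s - n * mu) / (math.sqrt(n) * sigma)  # standardized sum
    hits += -1 <= z <= 1
print(hits / trials)  # ≈ 0.68
```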