A dissertation under the direction of Gordon Simons.
SOME STRUCTURAL RELATIONSHIPS BETWEEN WEAK CONVERGENCE OF PROBABILITY MEASURES AND CONVERGENCE IN PROBABILITY
Flavio W. Rodrigues
Department of Statistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 812
March, 1972
TABLE OF CONTENTS
ACKNOWLEDGMENTS
ABSTRACT
CHAPTER
I INTRODUCTION
1.1 Introduction and preliminary background
1.2 Review of the literature and description of the results
II THE REAL LINE
2.1 The non-atomic case
2.2 The atomic case
2.3 The general case
III GENERALIZATION TO COMPLETE, SEPARABLE METRIC SPACES
3.1 The Skorokhod partition of S
3.2 The non-atomic case
3.3 The atomic and the general case
IV SOME RELATIONSHIPS BETWEEN THE METRICS $L$ AND $\rho$
4.1 The lower bound for $\rho$
4.2 Weak convergence and equivalent probability measures
APPENDIX
REFERENCES
ACKNOWLEDGMENTS
I would like to express my deepest gratitude to my advisor
Professor G. D. Simons for proposing this problem and for the many
valuable suggestions he made during the course of this investigation. I
would also like to thank him for his patience in going through the
several first drafts of the manuscript and for his words of encouragement
and confidence in the difficult moments.
I wish to thank Professor W. Hoeffding, Professor M. R. Leadbetter
and Professor W. L. Smith for reading the manuscript and offering
helpful suggestions.
I extend my thanks to those members of the faculty in the
Departments of Statistics and Mathematics who contributed to my education
at the University of North Carolina at Chapel Hill.
For financial support, I would like to express my gratitude to
Conselho Nacional de Pesquisas (C. N. Pq.), Brazil, which granted me
a scholarship for the larger part of my stay in the United States.
My thanks also go to the Pan American Health Organization for a travel
grant and financial support during my first year of graduate studies.
It is a pleasure to acknowledge the cooperation and understanding I
received from the faculty and administration of the School of Public
Health of the University of São Paulo during my leave of absence.
I owe a deep sense of gratitude to my wife, Regina, for her
continuous support and enthusiasm during all the phases of my studies. I
also want to thank Dr. Carlos A. B. Dantas, whose insistence and
enthusiasm were greatly responsible for my decision to come to the
United States to pursue graduate studies.
Finally, I thank Mrs. Cynthia Grossman for her able and careful
typing of the manuscript.
ABSTRACT
Let $\{X_n\}$ be a sequence of random variables defined on a
probability space $(\Omega, \mathcal{A}, P)$ whose corresponding sequence of
probability distributions $\{Q_n\}$ is weakly convergent to a probability
distribution $Q$. In this study, we show the existence, on
$(\Omega, \mathcal{A}, P)$, of a sequence $\{Y_n\}$ of identically distributed
random variables, with probability distribution $Q$, such that
$X_n - Y_n$ converges to zero, in probability, as $n$ tends to infinity.
In the special case of a non-atomic $(\Omega, \mathcal{A}, P)$ we use the
quantile function, associated with the distribution function of $Q$, to
explicitly determine a particular version of the $Y_n$'s as functions of
the $X_n$'s and of an auxiliary uniform random variable.
This result is then extended to sequences of random elements taking
values on a complete, separable, metric space. To accomplish this
extension we consider a total ordering of the metric space induced by a
special sequence of partitions which were first introduced by Skorokhod.
The techniques used in the real line are adapted to metric spaces by
means of a type of generalized distribution function associated with
each probability measure on the space.
Some relationships between the metric $\rho$, associated with
convergence in probability, and the Lévy-Prohorov metric, $L$, are also
investigated. In particular, a result relating convergence in probability,
weak convergence and the class of probability measures which are
equivalent to $P$, is shown to be valid on separable metric spaces.
CHAPTER I
INTRODUCTION
1.1. Introduction and preliminary background.
The basic facts concerning the extension of the theory of weak
convergence of probability measures to metric spaces have been known
at least since 1940 (see, e.g., Alexandrov [1940]). However, it was not
until 1956, with the publication of Prohorov's fundamental paper
"Convergence of random processes and limit theorems in probability theory",
that the importance and far-reaching consequences of such an extension
were fully understood. A detailed account of both the theory and
applications of weak convergence, plus an extensive bibliography, can be
found in the monographs by Parthasarathy [1967] and Billingsley [1968].
In this dissertation, we will be concerned with problems relating
weak convergence with other types of convergence, and with the structure
of the basic probability space, $(\Omega, \mathcal{A}, P)$. Suppose, for
example, that we are given a weakly convergent sequence, $\{P_n\}$, of
probability measures on the $\sigma$-field of Borel sets of a separable
metric space $S$. Consider, now, a probability space,
$(\Omega, \mathcal{A}, P)$, where it is possible to define a sequence,
$\{X_n\}$, of random elements, with values in $S$, whose corresponding
sequence of probability distributions is $\{P_n\}$. We are interested in
the implications of the weak convergence of the $P_n$'s for the
convergence properties of the $X_n$'s and for the structure of
$(\Omega, \mathcal{A}, P)$. Before describing our results, we need to recall
some definitions and facts about probability measures in metric spaces.
From now on, S will always denote a separable metric space with
distance d. The particular distance will, in general, be of no concern
to us; what matters is that the topology of S is given by means of a
metric. For the reader's convenience we have assembled, in the Appendix,
the results about metric spaces that will be needed in the sequel.
Definition 1.1.1. The Borel $\sigma$-field of $S$ is the smallest
$\sigma$-field containing the open (closed) sets of $S$. It will be denoted
by $\mathcal{S}$ and, since $S$ is separable, it is also the $\sigma$-field
generated by the open balls of $S$.
Definition 1.1.2. A measurable function from a measurable space
$(\Omega, \mathcal{A})$ to $(S, \mathcal{S})$ will be called a random element
(r.e.) of $S$. Any r.e. defined on a probability space
$(\Omega, \mathcal{A}, P)$ induces, in the usual way, a probability measure
on $(S, \mathcal{S})$.
Definition 1.1.3. A sequence $\{Q_n\}$ of probability measures on
$(S, \mathcal{S})$ is weakly convergent to a probability measure $Q$ on
$(S, \mathcal{S})$ if for every real valued, continuous and bounded
function, $f$, on $S$, we have
(1.1.1)    $\int f \, dQ_n \to \int f \, dQ$  as  $n \to \infty$.
Definition 1.1.4. Let $Q$ be any probability measure on $(S, \mathcal{S})$.
A Borel set $A$ is said to be a $Q$-continuity set if $Q(\partial A)$ is
equal to zero. Here, $\partial A$ denotes the boundary of $A$.
THEOREM 1.1.1. Let $\{P_n\}$ and $P$ be probability measures on
$(S, \mathcal{S})$. Then, $P_n$ converges weakly to $P$ if and only if
$P_n(A)$ converges to $P(A)$ for every $P$-continuity set
$A \in \mathcal{S}$.
For the proof and for other conditions equivalent to weak
convergence, see Billingsley [1968; pages 11-12].
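The restriction to continuity sets in Theorem 1.1.1 can be illustrated with a small numerical sketch. The example is chosen here and is not taken from the text: on the real line, $Q_n$ is the point mass at $1/n$ and $Q$ is the point mass at $0$, so $Q_n$ converges weakly to $Q$. The helper `point_mass_of_interval` is a hypothetical name introduced only for this illustration.

```python
def point_mass_of_interval(point, a, b):
    """Mass that the point mass at `point` assigns to the interval (a, b]."""
    return 1.0 if a < point <= b else 0.0

ns = [1, 10, 100, 1000]

# A = (-1, 0]: its boundary {-1, 0} carries Q-mass 1, so A is NOT a
# Q-continuity set and Q_n(A) need not converge to Q(A).
qn_A = [point_mass_of_interval(1.0 / n, -1.0, 0.0) for n in ns]
q_A = point_mass_of_interval(0.0, -1.0, 0.0)

# B = (-1, 0.5]: its boundary {-1, 0.5} carries Q-mass 0, so B is a
# Q-continuity set and Q_n(B) converges to Q(B).
qn_B = [point_mass_of_interval(1.0 / n, -1.0, 0.5) for n in ns]
q_B = point_mass_of_interval(0.0, -1.0, 0.5)

print("Q_n(A):", qn_A, " Q(A):", q_A)
print("Q_n(B):", qn_B, " Q(B):", q_B)
```

In this sketch $Q_n(A) = 0$ for every $n$ while $Q(A) = 1$, whereas $Q_n(B)$ equals $Q(B) = 1$ as soon as $1/n \le 0.5$.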
For any subset $A$ of $S$ and $\delta > 0$, let
(1.1.2)    $A^{\delta} = \{x \in S : d(x, A) \le \delta\}$.
Definition 1.1.5. Let $P$ and $Q$ be two measures on $\mathcal{S}$. The
Lévy-Prohorov distance between $P$ and $Q$ is defined to be:
(1.1.3)    $L(P, Q) = \inf\{\varepsilon > 0 : P(F) \le Q(F^{\varepsilon}) + \varepsilon$,
           for all $F$ closed, $F \subset S\}$.
$L$ is a metric on the space $Z(S)$ of probability measures on
$(S, \mathcal{S})$ and it has been shown, by Prohorov [1956] and Dudley
[1968], that convergence in the metric $L$ is equivalent to weak
convergence.
For random elements $X$ and $Y$, from $(\Omega, \mathcal{A}, P)$ to
$(S, \mathcal{S})$, we will denote by $d(X, Y)$ the function that at
$\omega \in \Omega$ takes the value $d(X(\omega), Y(\omega))$. It can be
shown (see Billingsley [1968], pg. 225) that when $S$ is separable,
$d(X, Y)$ is a random variable.
Definition 1.1.6. Let $\{X_n\}$ and $X$ be r.e.'s from
$(\Omega, \mathcal{A}, P)$ to $(S, \mathcal{S})$. The sequence is said to
converge in probability to $X$ if, for all $\varepsilon > 0$,
$P(\{\omega : d(X_n, X) \ge \varepsilon\})$ converges to zero as $n$ tends to
infinity.
Convergence in probability can also be expressed by means of a
metric on the space of all r.e.'s from $(\Omega, \mathcal{A}, P)$ to
$(S, \mathcal{S})$. Given any two r.e.'s $X$ and $Y$ define
(1.1.4)    $\rho(X, Y) = \inf\{\varepsilon > 0 : P(\{\omega : d(X, Y) > \varepsilon\}) \le \varepsilon\}$.
If we interpret equality of r.e.'s to mean equality a.e. then $\rho$
is a metric and convergence in the metric $\rho$ is equivalent to
convergence in probability.
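To make (1.1.4) concrete, the following sketch evaluates $\rho$ on a finite probability space, where the infimum can be found exactly. It is only an illustration under assumptions made here: the sample space has four equally likely points, $S$ is the real line with $d(x, y) = |x - y|$, and the function name `ky_fan` is hypothetical.

```python
def ky_fan(distances, probs):
    """inf{eps > 0 : P(D > eps) <= eps} for a r.v. D >= 0 on a finite space.

    `distances[i]` is the value of D = d(X, Y) at the i-th sample point and
    `probs[i]` is the probability of that point.
    """
    levels = sorted(set(distances) | {0.0})
    for j, level in enumerate(levels):
        # P(D > eps) is constant for eps in [level, next level).
        tail = sum(p for d, p in zip(distances, probs) if d > level)
        # Smallest eps in that interval with tail <= eps, if it exists.
        candidate = max(level, tail)
        upper = levels[j + 1] if j + 1 < len(levels) else float("inf")
        if candidate < upper:
            return candidate

probs = [0.25, 0.25, 0.25, 0.25]
X = [0.0, 1.0, 2.0, 3.0]
Y = [0.125, 1.125, 2.0, 5.0]
D = [abs(x - y) for x, y in zip(X, Y)]

rho_XY = ky_fan(D, probs)
# If X_n = X + 1/n, then d(X_n, X) = 1/n everywhere and rho(X_n, X) = 1/n.
rho_shift = [ky_fan([1.0 / n] * 4, probs) for n in (1, 2, 4, 8)]

print(rho_XY)
print(rho_shift)
```

Here $X$ and $Y$ differ by more than $0.25$ only on a set of probability $0.25$, so $\rho(X, Y) = 0.25$, and the shifted sequence shows $\rho(X_n, X)$ tending to zero together with convergence in probability.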
1.2. Review of the literature and description of the results.
We start by quoting the classical result which, loosely speaking,
says that convergence in probability implies weak convergence.
THEOREM 1.2.1. Let $\{X_n\}$ and $X$ be r.e.'s defined on
$(\Omega, \mathcal{A}, P)$ with values on $(S, \mathcal{S})$. If $\{X_n\}$
converges in probability to $X$, the corresponding sequence $\{Q_n\}$ of
probability distributions converges weakly to the probability
distribution $Q$ of $X$. The converse is also true in the case where $Q$
has all its mass concentrated on a singleton.
In this dissertation we shall look into several types of converses
for Theorem 1.2.1. As a matter of fact, our main results could be
considered as converses for Theorem 1.2.1, in the special case where the
r.e.'s are assumed to be defined on the same probability space. The
following theorem, due to Skorokhod [1956], can be regarded as an
existence type of converse for Theorem 1.2.1 and will play a fundamental
role in the sequel.
THEOREM 1.2.2. (Skorokhod) Let $\{P_n\}$ and $P$ be probability
measures on $(S, \mathcal{S})$ and assume that $\{P_n\}$ converges weakly
to $P$. If $S$ is separable and complete, we can find a probability space,
with random elements $\{X_n\}$ and $X$ defined on it, such that
a) For each $n \ge 1$, $P_n$ is the probability distribution of $X_n$
and $P$ is the probability distribution of $X$.
b) $X_n$ converges to $X$ everywhere, as $n$ tends to infinity.
This result has been extended, by Dudley [1968], to spaces $S$
which are only separable. For further extensions, which don't assume
separability, see the paper by Wichura [1970]. A survey paper by Pyke
[1970] gives several examples of a.s. convergent processes, constructed
with the help of Theorem 1.2.2, which have a wide variety of
applications to probability and statistics.
For $S$ complete, Strassen [1965] proved the important result that
if $P$ and $Q$ are any probability measures in $Z(S)$, the Prohorov
distance $L(P, Q)$ is the minimum distance "in probability" (see
(1.1.4)) between r.e.'s distributed according to $P$ and $Q$. From
this result, it follows that if $P_n$ converges weakly to $P$, sequences
of r.e.'s $\{X_n\}$ and $\{Y_n\}$, whose distributions are, respectively,
$\{P_n\}$ and $P$, could be constructed in such a way that
$\rho(X_n, Y_n)$ converges to zero as $n$ tends to infinity. Of course,
this last result is also implied by Skorokhod's theorem which is, in
fact, stronger.
Both results mentioned above assume as given a weakly convergent
sequence, $\{P_n\}$, of probability measures and go on to construct a
probability space, with a sequence of r.e.'s defined on it, distributed
according to $\{P_n\}$ and satisfying a specific convergence property. In
our main result (Theorem 1.2.3, below) we take a different approach,
assuming both the probability space and the sequence of r.e.'s to be
given a priori.
THEOREM 1.2.3. Let $\{X_n\}$ be a sequence of r.e.'s defined on a
probability space $(\Omega, \mathcal{A}, P)$, with values on a complete,
separable metric space $S$. Suppose that the corresponding sequence of
probability distributions, $\{Q_n\}$, converges weakly to the probability
distribution $Q$. Then, there exists, on $(\Omega, \mathcal{A}, P)$, a
sequence $\{Y_n\}$ of identically distributed r.e.'s, with probability
distribution $Q$, such that the distance "in probability"
$\rho(X_n, Y_n)$ converges to zero as $n$ tends to infinity.
Remarks: 1) We say that a probability measure $Q$, on $(S, \mathcal{S})$,
is admissible for $(\Omega, \mathcal{A}, P)$ if there exists, in this
probability space, a r.e., with values in $S$, whose probability
distribution is $Q$. It follows, from Theorem 1.2.3, that the class of
probability measures on $(S, \mathcal{S})$, which are admissible for a
given probability space, is closed under weak convergence.
2) Suppose that for some probability space $(\Omega, \mathcal{A}, P)$ and
for some probability measure $Q$ on $\mathcal{S}$, $X$ is the unique (up to
an equivalence) r.e. on $(\Omega, \mathcal{A}, P)$ whose probability
distribution is $Q$.¹ Then, if $\{X_n\}$ is any sequence of r.e.'s on
$(\Omega, \mathcal{A}, P)$, whose probability distributions are weakly
convergent to $Q$, it follows that $X_n$ converges to $X$ in probability.
¹ An example of this situation is provided by a degenerate probability
measure, i.e. a probability measure that puts all its mass on a
singleton. For spaces with atoms (see Definition 2.1.2) this situation
may occur even for non-degenerate $Q$.
In Chapter II, we discuss the case where $S$ is the real line with
the usual metric. In this case, the proofs can be greatly simplified by
the well-known constructions based on the natural order of the real
line. For example, we will show that if $X$ is any random variable,
with distribution function $F$, defined on a non-atomic probability
space (Definition 2.1.3), it is possible to modify $F(X)$ in order to
obtain a uniform random variable $U$, in such a way that $X$ can be
expressed as a measurable function of $U$. This result, described in
Theorem 2.1.2, may have some interest in itself. For atomic probability
spaces, we show that Theorem 1.2.3 is a consequence of the relations
between tightness (see Definition 2.2.2) of the sequence of probability
measures and the existence of a.e. convergent subsequences (Theorem
2.2.2). The general case requires a little more than a simple
combination of the other two, in virtue of the fact that weak convergence
is not, necessarily, preserved under conditioning.
In Chapter III, we consider the problem in an abstract complete,
separable, metric space. The difficulty here will be in the non-atomic
case, since the powerful machinery of distribution functions is not
available. We will show, however, that an equivalent procedure can be
developed, by partitioning the metric space, in the way used by
Skorokhod in the proof of his result (Theorem 1.2.2).
In Chapter IV, we will discuss some relationships between the
metrics $L$ and $\rho$ from the point of view of the structure of the
probability space $(\Omega, \mathcal{A}, P)$. By restricting $L$ to the
subspace of admissible probability measures we will be able to discuss a
structural version of Strassen's result. Furthermore, we will extend to
metric spaces a result of Padmanabhan [1970] and show how this extension
will allow us to use $L$ to define a new metric $\rho_0$, in the space of
r.e.'s, which is equivalent to $\rho$. By means of an example we will show
that $\rho_0$ is not necessarily complete which implies that, although
equivalent, the two metrics generate different uniformities.
CHAPTER II
THE REAL LINE
In this chapter we consider sequences of probability measures on
the $\sigma$-field $\mathcal{B}$ of Borel sets of the real line ($R^1$). As
is well known, a probability measure on $(R^1, \mathcal{B})$ is completely
determined by the corresponding distribution function (to be hereafter
abbreviated as d.f.). A random variable (r.v.), $X$, defined in some
probability space $(\Omega, \mathcal{A}, P)$ induces in
$(R^1, \mathcal{B})$ a probability measure $P_X$ given by:
$P_X(B) = P(X^{-1}(B))$ for all $B \in \mathcal{B}$.
We will make use of the words increasing, decreasing, positive,
negative in their loose interpretation. The qualifier strictly will be
added when necessary. All d.f.'s are assumed to be continuous from the
right and proper, that is such that $F(-\infty) = 0$ and $F(+\infty) = 1$.
Finally, we recall that the weak convergence of $\{P_n\}$ to $P$ is,
here, equivalent to the convergence of the corresponding sequence of
d.f.'s, $\{F_n\}$, to the d.f. $F$ of $P$, at all points of continuity of
the latter.
2.1. The non-atomic case.
Definition 2.1.1. Let $(\Omega, \mathcal{A}, P)$ be a probability space.
Given $A \in \mathcal{A}$, we define the $P$-equivalence class determined
by $A$ to be:
$[A] = \{B \in \mathcal{A} : P(A \,\Delta\, B) = 0\}$, where $\Delta$
indicates the symmetric difference.
Definition 2.1.2. An atom of a probability space
$(\Omega, \mathcal{A}, P)$ is the $P$-equivalence class of a set
$A \in \mathcal{A}$ for which $P(A) > 0$ and such that for all
$B \in \mathcal{A}$, $B \subset A$, we have either $P(B) = 0$ or
$P(B) = P(A)$.
Remark. If $A$ is an atom of $(\Omega, \mathcal{A}, P)$ then the d.f. of
every r.v. defined on this space has a jump point of size $\ge P(A)$. On
the other hand, if $X$ is a r.v. defined on $(\Omega, \mathcal{A}, P)$ and
$F$ is the d.f. of $X$, then the atoms of the probability space
$(\Omega, X^{-1}(\mathcal{B}), P)$ are given by:
$\{X^{-1}(\{x\}) : x$ is a jump point of $F\}$.
We will not, in general, distinguish between the event $A$ and the
$P$-equivalence class $[A]$. Statements such as: the event $A$ is an atom,
can be made rigorous by the convention that two atoms $A$ and $B$ are
equal whenever $P(A \,\Delta\, B) = 0$. Since the intersection of two
distinct atoms of $(\Omega, \mathcal{A}, P)$ has probability zero it
follows that a probability space has at most a countable number of atoms.
Let $\{A_n\}$ denote the atoms of $(\Omega, \mathcal{A}, P)$ and put:
$A_0 = \bigcup_{n=1}^{\infty} A_n$.
Definition 2.1.3. A probability space without atoms is called
non-atomic. A probability space is called atomic if $P(A_0) = 1$.
The following theorem is due to Varadarajan [1958]:
THEOREM 2.1.1. If $(\Omega, \mathcal{A}, P)$ is a non-atomic probability
space, it is possible to define r.v.'s $\xi_1, \xi_2, \ldots$, with
arbitrary, preassigned consistent, joint distributions.
Corollary. A probability space admits a uniformly distributed
random variable if and only if it is non-atomic.
Definition 2.1.4. Let $F$ be any d.f. For all $t$, $0 < t < 1$,
define:
(2.1.1)    $F^{-1}(t) = \inf\{x \in R : F(x) \ge t\}$.
It is easy to see that $F^{-1}$ is increasing and continuous from the left.
Some properties of $F^{-1}$, that will be needed later, are listed in the
following lemma:
Lemma 2.1.1. Let $F$ be any d.f. For all real $x$ and every $t$,
$0 < t < 1$, we have:
if and only if
implies
and
x < F-1 (t)
t ::; F(x)
F-1 (t) ::; x.
and
implies
t < F(x)
i)
ii)
iii)
iv) $F^{-1}$ is continuous at $t$ if and only if $\{x : F(x) = t\}$ is
either empty or a singleton. Conversely, $F$ is continuous
at $x$ if and only if $\{t : F^{-1}(t) = x\}$ is either empty or a
singleton.
v) $F_n$ converges weakly to $F$ if and only if $F_n^{-1}$ converges
to $F^{-1}$ at all continuity points of $F^{-1}$.
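The quantile function of Definition 2.1.4 is easy to compute for a purely discrete d.f., which gives a quick way to experiment with its order properties. The sketch below uses a hypothetical three-atom distribution chosen for illustration and checks, with exact rational arithmetic, the standard equivalence "$F^{-1}(t) \le x$ if and only if $t \le F(x)$", a well-known property of the quantile function that is stated here independently of the numbering of the lemma.

```python
from fractions import Fraction as Fr

# Hypothetical discrete distribution: atoms (sorted) and their masses.
atoms = [0, 1, 3]
masses = [Fr(1, 4), Fr(1, 2), Fr(1, 4)]

def cdf(x):
    """Right-continuous d.f. F(x) = P(X <= x)."""
    return sum(m for a, m in zip(atoms, masses) if a <= x)

def quantile(t):
    """F^{-1}(t) = inf{x : F(x) >= t}, for 0 < t < 1.

    For a discrete d.f. the infimum is the smallest atom a with F(a) >= t.
    """
    for a in atoms:
        if cdf(a) >= t:
            return a
    raise ValueError("t must lie in (0, 1)")

xs = [Fr(k, 2) for k in range(-2, 9)]   # grid of x values in [-1, 4]
ts = [Fr(k, 8) for k in range(1, 8)]    # grid of t values in (0, 1)

all_equivalent = all(
    (quantile(t) <= x) == (t <= cdf(x)) for x in xs for t in ts
)
print(all_equivalent)
print([quantile(t) for t in ts])
```

The printed list of quantiles is constant on $(0, 1/4]$, $(1/4, 3/4]$ and $(3/4, 1)$, which shows the left-continuity of $F^{-1}$ at the jump levels of $F$.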
Proof: i, ii, iii, iv are direct consequences of the definition of
$F^{-1}$ and the properties of the infimum of a set of real numbers. To
prove v, suppose first that $F_n$ converges weakly to $F$. Let $t_0$
denote a continuity point of $F^{-1}$. In correspondence to a given
$\varepsilon > 0$, arbitrary, choose two continuity points, $x$ and $y$,
of $F$, such that:
(2.1.2)    $x < F^{-1}(t_0) < y$    and
(2.1.3)    $y - x < \varepsilon$.
This choice is possible in virtue of the fact that the set of
continuity points of $F$ is dense in the reals.
By part ii of the lemma, we have:
(2.1.4)    $F(x) < t_0$.
By part i of the lemma and the fact that $F$ is increasing, we have:
(2.1.5)    $t_0 \le F(F^{-1}(t_0)) \le F(y)$.
Finally, since $t_0$ is a continuity point of $F^{-1}$, the first half of
part iv of the lemma implies:
(2.1.6)    $t_0 < F(y)$.
The convergence of $F_n(x)$ to $F(x)$ and of $F_n(y)$ to $F(y)$ together
with (2.1.4) and (2.1.6) imply that there exists an integer $N$, such
that:
(2.1.7)    $F_n(x) < t_0 < F_n(y)$    for all $n \ge N$.
Parts ii and iii of the lemma imply:
(2.1.8)    $x \le F_n^{-1}(t_0) \le y$    for all $n \ge N$.
By putting together (2.1.2), (2.1.3) and (2.1.8) we have:
(2.1.9)    $|F_n^{-1}(t_0) - F^{-1}(t_0)| < \varepsilon$    for all $n \ge N$.
This completes the proof of the first part of v.
To prove the other half, observe first that the set of continuity
points of $F^{-1}$ is dense in $(0,1)$. Hence, given $\varepsilon > 0$, and a
continuity point $x_0$ of $F$, consider the possibilities:
a) $0 < F(x_0) < 1$. Choose two continuity points, $t_1$ and $t_2$,
of $F^{-1}$, such that: $t_1 < F(x_0) < t_2$ and
$t_2 - t_1 < \varepsilon$.
b) If $F(x_0) = 0$ or $F(x_0) = 1$ choose, respectively, a
continuity point $t_1 < \varepsilon$ or a continuity point
$t_2 > 1 - \varepsilon$.
The rest of the proof is analogous. □
Remark. It follows from part iii) of the lemma that if $U$ is a
uniform r.v. on $[0,1]$ and $F$ is any d.f., the r.v. $F^{-1}(U)$ has d.f.
$F$. Hence, it can be shown that if $X$ is a r.v. with a continuous d.f.
$F$, there exists a uniform r.v. $U$ such that: $F^{-1}(U) = X$ a.e. In
fact, since $F$ is continuous, $F(X)$ is uniformly distributed and
putting $U = F(X)$ the result follows from part i) of the lemma and the
fact that $X$ and $F^{-1}(F(X))$ have the same distribution.
THEOREM 2.1.2. Let $X$ be a r.v., with d.f. $F$, defined in a
non-atomic probability space $(\Omega, \mathcal{A}, P)$. Then, there exists
in $(\Omega, \mathcal{A}, P)$ a uniformly distributed r.v. $U$, such that:
$F^{-1}(U) = X$ a.e.
Remark on notation. For any d.f. $F$ and $x \in R^1$, we write:
$F(x-) = \lim_{t \uparrow x} F(t)$. Similarly, for any bivariate d.f. $G$,
we write:
(2.1.10)    $G(x-, y) = \lim_{t \uparrow x} G(t, y)$.
Proof of the theorem. Choose a uniformly distributed r.v. $Z$ in
$(\Omega, \mathcal{A}, P)$. (The existence of $Z$ is assured by the Corollary
to Theorem 2.1.1.) Denote by $G$ the joint d.f. of $X$ and $Z$. For every
pair of real numbers, $(x, z)$, define:
(2.1.11)    $H(x, z) = F(x-) + G(x, z) - G(x-, z)$.
Now, define for $\omega \in \Omega$:
(2.1.12)    $U(\omega) = H(X(\omega), Z(\omega))$.
It follows that:
(2.1.13)    $F(X(\omega)-) \le U(\omega) \le F(X(\omega))$    all $\omega \in \Omega$.
We claim that the r.v. $U$ satisfies the requirements of the theorem.
Recall that to show that $U$ is uniformly distributed, it will be enough
to show that:
(2.1.14)    $P(\{\omega : U(\omega) < t\}) \le t \le P(\{\omega : U(\omega) \le t\})$    all $t \in (0,1)$.
Let $t$, $0 < t < 1$, be given. Put: $x_0 = F^{-1}(t)$. By part i) of
Lemma 2.1.1, we have:
(2.1.15)    $F(x_0) = F(F^{-1}(t)) \ge t$.
We first prove that:
(2.1.16)    $\{\omega : U(\omega) < t\} \subset \{\omega : X(\omega) \le x_0\}$.
In fact, suppose that $\omega_0$ is such that: $U(\omega_0) < t$. It
follows, from (2.1.13), that: $F(X(\omega_0)-) < t$.
Now, observe that for any real $y > x_0$, we have:
$F(y-) \ge F(x_0) \ge t$. The result follows.
We now consider, separately, the two possible cases in (2.1.15).
Case 1. $F(x_0) = t$. First note that $X(\omega) \le x_0$, together with
(2.1.13), implies:
$U(\omega) \le F(X(\omega)) \le F(x_0) = t$.
Hence, we have
(2.1.17)    $\{\omega : X(\omega) \le x_0\} \subset \{\omega : U(\omega) \le t\}$.
(2.1.16) and (2.1.17) imply:
$P(\{\omega : U(\omega) < t\}) \le F(x_0) \le P(\{\omega : U(\omega) \le t\})$.
Since $F(x_0) = t$, (2.1.14) follows.
Case 2. $F(F^{-1}(t)) = F(x_0) > t$. This assumption and the definition
of $F^{-1}$ imply: $F(x_0-) \le t$. Hence:
(2.1.18)    $0 \le t - F(x_0-) < F(x_0) - F(x_0-)$.
Now, observe that $G(x_0, z) - G(x_0-, z)$ is, for $z \in [0,1]$, a
continuous function of $z$, whose maximum is $F(x_0) - F(x_0-)$ and whose
minimum is zero. Since a continuous function, defined in a compact set,
assumes all the values between its maximum and its minimum, it follows
from (2.1.18) that there exists $z_0 \in [0,1]$, such that:
(2.1.19)    $F(x_0-) + G(x_0, z_0) - G(x_0-, z_0) = t$.
We will show that:
(2.1.20)    $\{\omega : U(\omega) < t\} \subset \{\omega : X(\omega) < x_0\} \cup (\{\omega : X(\omega) = x_0\} \cap \{\omega : Z(\omega) \le z_0\}) \subset \{\omega : U(\omega) \le t\}$.
To prove the first half of (2.1.20) assume that $\omega_0$ is such that:
$U(\omega_0) < t$. It follows, by (2.1.16), that $X(\omega_0) \le x_0$. If
$X(\omega_0) < x_0$, we are done. Suppose: $X(\omega_0) = x_0$. Hence, from
the definition of $U$, we have:
$U(\omega_0) = F(x_0-) + G(x_0, Z(\omega_0)) - G(x_0-, Z(\omega_0))$.
From the assumption $U(\omega_0) < t$ and (2.1.19), we have
(2.1.21)    $G(x_0, Z(\omega_0)) - G(x_0-, Z(\omega_0)) < G(x_0, z_0) - G(x_0-, z_0)$.
Observe now that
$G(x_0, z) - G(x_0-, z) = P(\{\omega : X(\omega) = x_0\} \cap \{\omega : Z(\omega) \le z\})$
is increasing with $z$, and hence (2.1.21) implies:
(2.1.22)    $Z(\omega_0) \le z_0$.
The first half of (2.1.20) follows.
To complete the proof of (2.1.20), let $\omega_1$ be such that:
$X(\omega_1) \le x_0$. If $X(\omega_1) < x_0$, since $x_0 = F^{-1}(t)$, we
have, by part i) of Lemma 2.1.1, that:
(2.1.23)    $F(X(\omega_1)) < t$.
(2.1.13) and (2.1.23) imply: $U(\omega_1) < t$ and we are done. Suppose
now that: $X(\omega_1) = x_0$ and $Z(\omega_1) \le z_0$. We have:
$U(\omega_1) = F(x_0-) + G(x_0, Z(\omega_1)) - G(x_0-, Z(\omega_1))$
$\le F(x_0-) + G(x_0, z_0) - G(x_0-, z_0)$.
From (2.1.19), the expression on the right hand side is equal to $t$ and
the proof of (2.1.20) is completed.
Using (2.1.20) we can write:
$P(\{\omega : U(\omega) < t\}) \le F(x_0-) + G(x_0, z_0) - G(x_0-, z_0) \le P(\{\omega : U(\omega) \le t\})$.
(2.1.14) now follows from (2.1.19) and the proof that $U$ has the
uniform distribution is complete.
To finish the proof of the theorem, observe that $F^{-1}(U)$ is a r.v.
whose d.f. is $F$. On the other hand, from (2.1.13) and part i) of
Lemma 2.1.1, we have:
(2.1.24)    $F^{-1}(U(\omega)) \le X(\omega)$    all $\omega \in \Omega$.
Since $F^{-1}(U)$ and $X$ have the same distribution, (2.1.24) implies:
$F^{-1}(U) = X$ a.e. □
Remarks. 1) In the case that $F$ is continuous, the theorem
reduces to the situation discussed in the remark following Lemma 2.1.1.
2) The r.v. $U$ is clearly not unique and the role played by $Z$,
in the construction of $U$, would have been fulfilled by any r.v., with
a continuous d.f., taking its values on a compact set.
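The construction (2.1.11)-(2.1.12) can be tried out numerically. The sketch below is an illustration under an extra assumption that the theorem does not need: $Z$ is taken independent of $X$, so that $G(x, z) - G(x-, z) = P(X = x)\, z$ and $U = F(X-) + P(X = X(\omega))\, Z$. The three-atom distribution, the seed and all names are hypothetical choices made here.

```python
import random

# Hypothetical discrete distribution of X (masses are exact binary fractions).
atoms = [0, 1, 3]
masses = [0.25, 0.5, 0.25]
cum = [0.25, 0.75, 1.0]     # F(a) at each atom a
left = [0.0, 0.25, 0.75]    # F(a-) at each atom a

def quantile(t):
    """F^{-1}(t) = inf{x : F(x) >= t} for the discrete d.f. above."""
    for a, c in zip(atoms, cum):
        if c >= t:
            return a
    return atoms[-1]

rng = random.Random(0)
n = 20000
xs = rng.choices(atoms, weights=masses, k=n)
# Assumption: Z is uniform on (0, 1] and independent of X.
zs = [1.0 - rng.random() for _ in range(n)]

# U = H(X, Z) = F(X-) + (F(X) - F(X-)) * Z under the independence assumption.
us = [left[atoms.index(x)] + masses[atoms.index(x)] * z for x, z in zip(xs, zs)]

recovered = [quantile(u) for u in us]
mean_u = sum(us) / n

print(recovered == xs)
print(round(mean_u, 3))
```

Each $U$ falls in the interval $(F(X-), F(X)]$ attached to the observed atom, which is why applying $F^{-1}$ recovers $X$, while spreading $U$ uniformly inside that interval is what makes $U$ uniform on $(0, 1]$ overall.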
THEOREM 2.1.3. Let $\{X_n\}$ be a sequence of r.v.'s defined in a
non-atomic probability space $(\Omega, \mathcal{A}, P)$. Suppose that the
corresponding sequence, $\{F_n\}$, of d.f.'s is weakly convergent to a
d.f. $F$. Then, there exists in $(\Omega, \mathcal{A}, P)$ a sequence,
$\{Y_n\}$, of identically distributed r.v.'s, with d.f. $F$, such that the
sequence $Z_n = X_n - Y_n$ converges in probability to zero as $n$ tends
to infinity.
Proof. We first prove that if $U$ is any uniformly distributed
r.v., $F_n^{-1}(U)$ converges to $F^{-1}(U)$ a.e. In fact, by part v) of
Lemma 2.1.1, the weak convergence of $F_n$ to $F$ implies the convergence
of $F_n^{-1}$ to $F^{-1}$ at all points of continuity of $F^{-1}$. Since
the set $D$ of discontinuities of $F^{-1}$ is, at most, countable it
follows that: $P(\{\omega : U(\omega) \in D\}) = 0$. Therefore,
$F_n^{-1}(U)$ converges to $F^{-1}(U)$ a.e.
By the previous theorem, to each r.v. $X_n$ we can associate a
uniformly distributed r.v. $U_n$, such that:
(2.1.25)    $F_n^{-1}(U_n) = X_n$ a.e.    all $n \ge 1$.
Define:
(2.1.26)    $Y_n = F^{-1}(U_n)$    all $n \ge 1$.
Clearly, the $Y_n$'s are identically distributed with d.f. $F$.
Furthermore, observe that for fixed $n$, the distribution of
$F_n^{-1}(U_n) - F^{-1}(U_n)$ depends only on the distribution of $U_n$
and is, therefore, the same as the distribution of
$F_n^{-1}(U) - F^{-1}(U)$. Since the latter sequence converges to zero
a.e. it follows that: $X_n - Y_n$ converges to zero, in probability, as
$n$ tends to infinity. □
Remark. In the case where all the $X_n$'s have continuous d.f.'s,
we have, for all $n \ge 1$, $U_n = F_n(X_n)$. Hence, each $Y_n$ can be
explicitly determined as a Borel measurable function of $X_n$ alone, i.e.:
$Y_n = F^{-1}(F_n(X_n))$    all $n \ge 1$.
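The formula $Y_n = F^{-1}(F_n(X_n))$ of the remark can be followed on a simple family of continuous d.f.'s. The sketch below is a hypothetical example chosen here: $X_n$ is exponential with mean $1 + 1/n$, so $F_n(x) = 1 - e^{-x/(1 + 1/n)}$ converges weakly to $F(x) = 1 - e^{-x}$. In this case the construction gives $Y_n = X_n / (1 + 1/n)$, hence $|X_n - Y_n| = X_n/(n + 1)$.

```python
import math

def F_n(x, n):
    """d.f. of an exponential r.v. with mean 1 + 1/n (illustrative choice)."""
    return -math.expm1(-x / (1.0 + 1.0 / n))

def F_inv(t):
    """Quantile function of the limiting d.f. F(x) = 1 - exp(-x)."""
    return -math.log1p(-t)

def Y_n(x, n):
    """Value of Y_n = F^{-1}(F_n(X_n)) when X_n takes the value x."""
    return F_inv(F_n(x, n))

x = 2.0
gaps = [abs(x - Y_n(x, n)) for n in (1, 10, 100, 1000)]
print(gaps)

def prob_far(n, eps):
    """P(|X_n - Y_n| > eps) = P(X_n > eps * (n + 1)) = exp(-eps * n)."""
    return math.exp(-eps * n)

print([prob_far(n, 0.1) for n in (1, 10, 100, 1000)])
```

For a fixed observed value the gap $|X_n - Y_n|$ shrinks like $1/(n+1)$, and the exact tail probability $e^{-\varepsilon n}$ tends to zero for every $\varepsilon > 0$, in agreement with Theorem 2.1.3 for this particular family.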
2.2. The atomic case.
We will now consider sequences of r.v.'s defined in an atomic
probability space. Many of the basic definitions and results used in this
chapter will again be needed in Chapters Three and Four in the more
general context of separable metric spaces. We introduce them here, in
their particular versions for $R^1$, in order to make this chapter
complete and self-contained.
Definition 2.2.1. Let $\mathcal{G}$ be a family of d.f.'s (equivalently a
family of probability measures on $(R^1, \mathcal{B})$). We call
$\mathcal{G}$ relatively compact if every sequence of elements of
$\mathcal{G}$ contains a subsequence which is weakly convergent to a d.f.
(not necessarily in $\mathcal{G}$).
Definition 2.2.2. A family $\mathcal{G}$ of d.f.'s is said to be tight if
for every $\varepsilon > 0$, there exists a closed, finite interval,
$[a,b]$, such that:
(2.2.1)    $F(b) - F(a) > 1 - \varepsilon$    all $F \in \mathcal{G}$.
THEOREM 2.2.1. (Prohorov) A family $\mathcal{G}$ of d.f.'s is relatively
compact if and only if it is tight.
For the proof and a discussion of the implications of this result
see, e.g., Billingsley [1968, pages 35-40].
THEOREM 2.2.2. Let $\{X_n\}$ be a sequence of random variables
defined in an atomic probability space. If the corresponding sequence,
$\{F_n\}$, of d.f.'s is tight, then every subsequence of $\{X_n\}$ has a
further subsequence which converges a.e.
Proof. Since every subsequence of a tight sequence of d.f.'s is
itself a tight sequence it is enough to show that $\{X_n\}$ has an a.e.
convergent subsequence. Let us denote by $\{A_i\}_{i \ge 1}$ the atoms of
$(\Omega, \mathcal{A}, P)$ and write:
$p_i = P(A_i)$    all $i \ge 1$.
Definition 2.1.2 and the assumption that the space is atomic imply:
(2.2.2)    $p_i > 0$ all $i \ge 1$  and  $\sum_{i=1}^{\infty} p_i = 1$.
Furthermore, for every fixed $i \ge 1$ there exists a numerical sequence
$\{x_{ni}\}_{n \ge 1}$ such that:
(2.2.3)    $X_n(\omega) = x_{ni}$  for all $n \ge 1$ and almost all $\omega \in A_i$.
We now prove that, for every fixed $i \ge 1$, $\{x_{ni}\}_{n \ge 1}$ is a
bounded sequence of real numbers. In fact, since $p_i > 0$, we can choose
$\varepsilon$, such that:
(2.2.4)    $0 < \varepsilon < p_i$.
It follows from the assumption of tightness that there exists a
bounded, closed interval, $[a,b]$, such that:
(2.2.5)    $F_n(b) - F_n(a) > 1 - \varepsilon$    all $n \ge 1$.
On the other hand, from (2.2.3) and (2.2.4) we have:
(2.2.6)    $P(\{\omega : X_n(\omega) = x_{ni}\}) \ge P(A_i) = p_i > \varepsilon$    all $n \ge 1$.
It is clear that if for some $n_0 \ge 1$, we had
$x_{n_0 i} \notin [a,b]$, we would have a contradiction. It follows that
$\{x_{ni}\}_{n \ge 1}$ is a bounded sequence of real numbers, for all
$i \ge 1$.
The construction of an a.e. convergent subsequence can now be
accomplished by the diagonal procedure as used, for example, in the
proof of the theorem of Helly-Bray. Equivalently, we can say that the
existence of such a subsequence is a consequence of the criterion for
the relative compactness of subsets of $R^{\infty}$ (see, e.g., Billingsley
[1968, page 219]). □
We now recall that, in Chapter I, we introduced the notation $\rho$
for the metric corresponding to convergence in probability.
In any metric space $(S, d)$, for $s \in S$ and $A$ a subset of $S$,
we write $d(s, A)$ for the distance between $s$ and $A$, which is
defined to be:
$d(s, A) = \inf_{t \in A} d(s, t)$.
THEOREM 2.2.3. Let $\{X_n\}$ be a sequence of r.v.'s defined in an
atomic probability space $(\Omega, \mathcal{A}, P)$. Suppose that the
corresponding sequence, $\{F_n\}$, of d.f.'s is weakly convergent to a
d.f. $F$. Then, if we denote by $\mathcal{G}$ the family of r.v.'s in
$(\Omega, \mathcal{A}, P)$ whose d.f. is $F$, we have:
i) $\mathcal{G}$ is not empty.
ii) $\rho(X_n, \mathcal{G})$ converges to zero as $n$ tends to infinity.
Proof. Clearly, every subsequence of $\{F_n\}$ is weakly convergent
to the same limit $F$. Hence, $\{F_n\}$ is relatively compact and
therefore, by Theorem 2.2.1, $\{F_n\}$ is tight. By the previous theorem,
there exists a subsequence $\{X_{n'}\}$ of $\{X_n\}$ which converges a.e.
It follows that: $X(\omega) = \lim_{n'} X_{n'}(\omega)$ is a r.v. on
$(\Omega, \mathcal{A}, P)$, whose d.f. is $F$, and that implies that
$\mathcal{G}$ is not empty.
To prove ii) let $\{X_{n'}\}$ be any subsequence of $\{X_n\}$. Since the
corresponding sequence, $\{F_{n'}\}$, of d.f.'s is again tight it follows,
by the previous theorem, that we can find a subsequence, $\{X_{n''}\}$,
which converges a.e. and hence, in probability, to a r.v.
$Y \in \mathcal{G}$. Hence:
(2.2.7)    $\rho(X_{n''}, \mathcal{G}) \le \rho(X_{n''}, Y) \to 0$    as $n'' \to \infty$.
This shows that any subsequence of the numerical sequence
$\{\rho(X_n, \mathcal{G})\}$ contains a further subsequence which converges
to zero. The result follows. □
Corollary. In the same conditions as in the above theorem, there
exists on $(\Omega, \mathcal{A}, P)$ a sequence, $\{Y_n\}$, of identically
distributed r.v.'s, with d.f. $F$, such that $X_n - Y_n$ converges to
zero, in probability, as $n$ tends to infinity.
Proof. Let $\{a_n\}$ be a strictly positive numerical sequence, such
that: $\lim_{n \to \infty} a_n = 0$. For each $n \ge 1$, there exists
$Y_n \in \mathcal{G}$ such that:
(2.2.8)    $\rho(X_n, Y_n) < \rho(X_n, \mathcal{G}) + a_n$.
Furthermore, it is clear from the definition of $\rho$ that:
$\rho(X_n, Y_n) = \rho(X_n - Y_n, 0)$.
By the previous theorem and the assumption about $\{a_n\}$ it follows
that:
$\rho(X_n - Y_n, 0) \to 0$    as $n \to \infty$.
The result now follows from the equivalence between $\rho$-convergence and
convergence in probability.
Remarks. 1) In the first comment that followed Theorem 1.2.3 of
Chapter I, we introduced the concept of an admissible probability
measure for a probability space. In the real line we will talk about
admissible d.f.'s and part i) of the previous theorem says that the class
of admissible d.f.'s, for an atomic space, is closed under weak
convergence. This result may seem contradictory, if we consider the fact
that there are sequences, $\{F_n\}$, of purely discrete d.f.'s, which
converge weakly to a continuous d.f. $F$. Of course, the answer to this
apparent contradiction lies in the fact that no sequence of r.v.'s
with those d.f.'s can be defined in an atomic space.
2) Consider, now, a sequence, $\{X_n\}$, of identically distributed
Bernoulli r.v.'s, with probability $p$ of success, $0 < p < 1$. Of
course, the corresponding sequence of d.f.'s is admissible for any
probability space which contains an event with probability $p$. However,
if we require the $X_n$'s to be independent it follows, from the above
results and the central limit theorem, that such a sequence of r.v.'s
cannot be defined on an atomic space. Since independence plays an
important role in the construction of such examples, it is probably
worthwhile to consider the problem of the existence of sequences of
independent r.v.'s on atomic spaces.
Rényi [1970, pages 167-168] shows that there are atomic probability
spaces where non-trivial sequences of independent events exist but
our freedom to choose the values of their probabilities is severely
restricted by the atomic structure. In the theorem below, we rephrase
Rényi's result in a way which is more convenient for our purposes.
THEOREM 2.2.4. (Renyi) Let $\{B_n\}$ be a sequence of independent
events on an atomic probability space $(\Omega, \mathcal{A}, P)$. Let
$q_n = P(B_n)$, $n \ge 1$. Then, we have:
(2.2.9) $\sum_{n=1}^{\infty} \min(q_n, 1-q_n) < \infty.$
It is well-known that, in atomic spaces, convergence in probability
and convergence a.e. are equivalent (see, e.g., Neveu [1965, page 48]).
We now use Renyi's result to show that, with the assumption of indepen-
dence, convergence in law is also equivalent to the other two.
THEOREM 2.2.5. Let $\{X_n\}$ be a sequence of independent r.v.'s
defined on an atomic probability space. If the corresponding sequence of
d.f.'s is weakly convergent to a d.f. $F$, then $F$ is degenerate and the
$X_n$'s converge a.e.
Proof. Let $x$ denote a continuity point of $F$. The events:
$B_n = \{\omega: X_n(\omega) \le x\}$
form a sequence of independent events. Put: $q_n = P(B_n)$, all $n \ge 1$.
From the assumption of weak convergence, we have:
(2.2.10) $\lim_{n \to \infty} q_n = q$, $0 \le q \le 1$.
From (2.2.9) it follows that:
(2.2.11) $\lim_{n \to \infty} \min[q_n, (1-q_n)] = 0.$
But,
$\min[q_n, (1-q_n)] = \tfrac{1}{2}\,(1 - |2q_n - 1|)$ all $n \ge 1$.
From (2.2.10) and (2.2.11), we have:
$1 - |2q - 1| = 0.$
It follows that $q$ is either zero or one, which shows that $F$ is
degenerate. Since convergence in law to a degenerate d.f. implies
convergence in probability and the latter is, in atomic spaces,
equivalent to convergence a.e., the proof is complete.
2.3. The general case.
In this section we discuss the problem in general probability
spaces, that is, spaces which have both an atomic and a non-atomic part.
If, as in Definition 2.1.3, we denote by $A_0$ the union of all the atoms
of $(\Omega, \mathcal{A}, P)$ we have:
(2.3.1) $0 < P(A_0) < 1.$
In order to reduce the problem to the situations discussed in the
previous sections, we introduce two new probability measures, $P_1$ and
$P_2$, on the measurable space $(\Omega, \mathcal{A})$. For all
$B \in \mathcal{A}$ define:
(2.3.2) $P_1(B) = P(B \mid A_0) = \dfrac{P(B \cap A_0)}{P(A_0)}$
(2.3.3) $P_2(B) = P(B \mid A_0^c) = \dfrac{P(B \cap A_0^c)}{P(A_0^c)}$
Because of (2.3.1), $P_1$ and $P_2$ are both well defined and they are,
clearly, probability measures on $(\Omega, \mathcal{A})$. Furthermore, it
can be easily checked that the probability spaces $(\Omega, \mathcal{A}, P_1)$
and $(\Omega, \mathcal{A}, P_2)$ are atomic and non-atomic respectively.
Some remarks on notation. Since we now have three different
probability measures in the same space, we will, whenever confusion is
possible, state in parentheses the measure under consideration at
that moment. So far, we have considered r.v.'s as being defined in
probability spaces, since we do not distinguish between two real valued
measurable functions defined on $(\Omega, \mathcal{A})$ which differ only
on a set of $P$-measure zero. In this section, each real valued,
measurable function defined on $(\Omega, \mathcal{A})$ determines three
different r.v.'s. Observe also that any element of the $P$-equivalence
class of $X$ determines the two other corresponding r.v.'s in
$(\Omega, \mathcal{A}, P_1)$ and $(\Omega, \mathcal{A}, P_2)$ since both
$P_1$ and $P_2$ are absolutely continuous with respect to $P$. The
converse is, of course, not true. Finally, if $F$ denotes the d.f. of $X$
with respect to $P$, we will write $F^{(1)}$ and $F^{(2)}$ for the d.f.'s
of $X$ with respect to $P_1$ and $P_2$ respectively.
It follows, from (2.3.2) and (2.3.3), that:
(2.3.4) $F(x) = P(A_0)\,F^{(1)}(x) + P(A_0^c)\,F^{(2)}(x)$, all real $x$.
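The decomposition of the d.f. of $X$ into its atomic and non-atomic parts follows in a few lines from the definitions of $P_1$ and $P_2$. The display below is a sketch of that calculation in our own notation, assuming $0 < P(A_0) < 1$ so that both conditional measures are defined.

```latex
\begin{align*}
F(x) &= P(\{\omega : X(\omega) \le x\}) \\
     &= P(\{X \le x\} \cap A_0) + P(\{X \le x\} \cap A_0^c) \\
     &= P(A_0)\, P_1(\{X \le x\}) + P(A_0^c)\, P_2(\{X \le x\}) \\
     &= P(A_0)\, F^{(1)}(x) + P(A_0^c)\, F^{(2)}(x).
\end{align*}
```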
The above remarks suggest that a way to deal with questions, concerning
r.v.'s in a general probability space, is to try to solve the problem,
separately, in the associated atomic and non-atomic spaces. If solu-
tions can be found, in both cases, we hope that a suitable combination
of them will provide an answer for the original question.
In our problem, however, this approach will not allow us to apply
directly the theorems of the previous sections. This is so because, as
the following example shows, the weak convergence of $\{F_n\}$ does not
necessarily imply the weak convergence of $\{F_n^{(1)}\}$ and
$\{F_n^{(2)}\}$.
Example 2.3.1. Let $\Omega$ be the closed interval $[0,1]$, $\mathcal{B}$
the $\sigma$-field of its Borel sets and $m$ be the Lebesgue measure. Let
$\mathcal{A}$ be the class of sets formed by the interval $(\tfrac12, 1]$
and all Borel subsets of $[0, \tfrac12]$. $\mathcal{A}$ is clearly a sub
$\sigma$-field of $\mathcal{B}$ and if $P$ denotes the restriction of $m$
to $\mathcal{A}$, $(\Omega, \mathcal{A}, P)$ is a probability space whose
unique atom is $(\tfrac12, 1]$. Define, for all $n \ge 1$:
$X_{2n}(\omega) = \begin{cases} 0 & \text{if } \omega \in (\tfrac12, 1] \\ 1 & \text{if } \omega \in [0, \tfrac12] \end{cases}$
$X_{2n-1}(\omega) = \begin{cases} 0 & \text{if } \omega \in [0, \tfrac12] \\ 1 & \text{if } \omega \in (\tfrac12, 1] \end{cases}$
Since the $X_n$'s are, with respect to $P$, identically distributed
it follows that the $F_n$'s form a constant sequence and are, therefore,
weakly convergent. However, for each $x$, $0 < x < 1$, both
$\{F_n^{(1)}(x)\}$ and $\{F_n^{(2)}(x)\}$ are oscillating sequences of
zeros and ones, which shows that neither one of the sequences,
$\{F_n^{(1)}\}$ or $\{F_n^{(2)}\}$, converges weakly.
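The behaviour described in Example 2.3.1 can be checked by direct computation. The sketch below assumes the values of $X_n$ are as we read them from the example (0 on the atom $(\tfrac12, 1]$ for even $n$, 1 on the atom for odd $n$); the function names and the representative points 3/4 and 1/4 are our own choices.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def X(n, omega):
    """X_n(omega) as read from Example 2.3.1; the atom is A_0 = (1/2, 1]."""
    on_atom = omega > HALF
    if n % 2 == 0:
        return 0 if on_atom else 1
    return 1 if on_atom else 0

def dfs(n, x):
    """Return (F_n(x), F_n^(1)(x), F_n^(2)(x)) for a real x."""
    # X_n is constant on the atom and on its complement, so one
    # representative point from each part determines both conditional d.f.'s.
    F1 = Fraction(1 if X(n, Fraction(3, 4)) <= x else 0)  # d.f. under P_1
    F2 = Fraction(1 if X(n, Fraction(1, 4)) <= x else 0)  # d.f. under P_2
    F = HALF * F1 + HALF * F2                              # P(A_0) = 1/2
    return F, F1, F2
```

Evaluating `dfs(n, Fraction(1, 2))` for successive `n` gives a constant first component and alternating second and third components.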
The next two lemmas will show how to overcome this difficulty. The
first one shows that, although weak convergence may be lost in the de-
composition of the space, tightness is preserved.
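Lemma 2.3.1. Let $\{X_n\}$ be a sequence of r.v.'s on the probability
space $(\Omega, \mathcal{A}, P)$. Suppose that the corresponding sequence
$\{F_n\}$ of d.f.'s is tight. Then, both $\{F_n^{(1)}\}$ and
$\{F_n^{(2)}\}$ are tight.
Proof. We prove the result for $\{F_n^{(1)}\}$, the other case being
totally analogous. Let $\varepsilon > 0$, arbitrary, be given. Since
$\{F_n\}$ is tight, there exists a closed, finite interval $[a,b]$ such
that:
(2.3.5) $P(\{\omega: X_n(\omega) \in [a,b]\}) > 1 - \varepsilon P(A_0)$ all $n \ge 1$.
Clearly, for any 2 sets, $B \in \mathcal{A}$ and $C \in \mathcal{A}$, we have:
(2.3.6) $P[B \cap C] \ge P[B] + P[C] - 1.$
Hence, it follows that:
$P(\{\omega: X_n(\omega) \in [a,b]\} \cap A_0) > (1 - \varepsilon)\,P(A_0)$ all $n \ge 1$.
Hence, by (2.3.2):
$P_1(\{\omega: X_n(\omega) \in [a,b]\}) = \dfrac{P(\{\omega: X_n(\omega) \in [a,b]\} \cap A_0)}{P(A_0)} > 1 - \varepsilon$ all $n \ge 1$.
The result follows.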
Observations. 1. We chose to present a direct proof of the lemma
above because it is simple and depends only on the definitions. A
shorter proof can be obtained from the fact that both $P_1$ and $P_2$ are
absolutely continuous with respect to $P$.
2. Note that the tightness of $\{F_n^{(1)}\}$ or $\{F_n^{(2)}\}$ alone is
not sufficient for $\{F_n\}$ to be tight. However, if both are tight,
$\{F_n\}$ is tight.
Lemma 2.3.2. Let $\{X_n\}$ be a sequence of r.v.'s in the probability
space $(\Omega, \mathcal{A}, P)$. Then, the weak convergence of any two of
the sequences $\{F_n\}$, $\{F_n^{(1)}\}$, $\{F_n^{(2)}\}$ implies the weak
convergence of the third. Furthermore, if the weak limits are,
respectively, $F$, $F^{(1)}$ and $F^{(2)}$, (2.3.4) is true.
Proof. It is enough to show the result for one pair of sequences,
the proof for the other two being totally analogous. Assume that
$\{F_n\}$ is weakly convergent to $F$ and $\{F_n^{(1)}\}$ is weakly
convergent to $F^{(1)}$. From (2.3.4) we have, for all real $x$ and all
$n \ge 1$:
(2.3.7) $F_n(x) = P(A_0)\,F_n^{(1)}(x) + P(A_0^c)\,F_n^{(2)}(x).$
Let $D_1$ denote the set of points in the real line which are
continuity points of both $F$ and $F^{(1)}$. Since $D_1^c$ is countable,
it follows that $D_1$ is dense in the line. It follows from our
assumptions and from (2.3.7) that $\{F_n^{(2)}\}$ converges at every point
of $D_1$. Since $D_1$ is dense and $\{F_n^{(2)}\}$ is tight it follows
that $\{F_n^{(2)}\}$ is weakly convergent to a d.f. $F^{(2)}$. (2.3.7)
now implies the validity of (2.3.4) for all real $x$ which is a
continuity point of the three d.f.'s $F$, $F^{(1)}$ and $F^{(2)}$. Since
the set of such $x$'s is again dense in the line (it has a countable
complement), the validity of (2.3.4), for all real $x$, follows and the
proof is complete.
We are now in a position to extend the results of the previous
sections to general probability spaces.
THEOREM 2.3.1. Let $\{X_n\}$ be a sequence of r.v.'s, defined in the
probability space $(\Omega, \mathcal{A}, P)$ and suppose that the sequence
$\{F_n\}$ of d.f.'s is weakly convergent to a d.f. $F$. Then, if
$\mathcal{G}$ denotes the class of r.v.'s on $(\Omega, \mathcal{A}, P)$
whose d.f. is $F$, we have:
i) $\mathcal{G}$ is not empty
ii) $\rho(X_n, \mathcal{G}) \to 0$ as $n \to \infty$.
Proof. Since $\{F_n\}$ is tight it follows, from Lemma 2.3.1, that
both $\{F_n^{(1)}\}$ and $\{F_n^{(2)}\}$ are tight. Let $\{F_{n'}^{(1)}\}$
denote a weakly convergent subsequence of $\{F_n^{(1)}\}$ and let $G$ be
its limit. The sequence $\{X_{n'}\}$ satisfies the conditions of Lemma
2.3.2. It follows that $\{F_{n'}^{(2)}\}$ is weakly convergent to a d.f.
$H$ and we have:
(2.3.8) $F(x) = P(A_0)\,G(x) + P(A_0^c)\,H(x)$ all $x \in R^1$.
The sequence $\{X_{n'}\}$, considered as a sequence of r.v.'s in
$(\Omega, \mathcal{A}, P_1)$, satisfies the assumptions of Theorem 2.2.3.
Hence, there exists in $(\Omega, \mathcal{A}, P_1)$ a r.v. $Y$ whose d.f.
($P_1$) is $G$. On the other hand, since $(\Omega, \mathcal{A}, P_2)$ is
non-atomic we can choose a r.v. $Z$ whose d.f. ($P_2$) is $H$.
Define:
(2.3.9) $X(\omega) = Y(\omega)\,I_{A_0}(\omega) + Z(\omega)\,I_{A_0^c}(\omega)$
($I_{A_0}$ is the indicator of the set $A_0$).
(2.3.2) and (2.3.3), plus some easy calculations, imply that the
d.f. ($P$) of $X$ is equal to the right-hand side of (2.3.8) and that
implies that $\mathcal{G}$ is not empty.
To prove ii) we will show, as in the proof of Theorem 2.2.3, that
any subsequence of the numerical sequence $\{\rho(X_n, \mathcal{G})\}$
contains a further subsequence which converges to zero. Let $\{X_{n'}\}$
be any subsequence of $\{X_n\}$. Since $\{F_{n'}\}$ is a tight sequence of
d.f.'s it follows by Lemma 2.3.1 that both $\{F_{n'}^{(1)}\}$ and
$\{F_{n'}^{(2)}\}$ are tight. Choose a subsequence $\{F_{n''}^{(1)}\}$
which is weakly convergent to a d.f. $G$. By Lemma 2.3.2,
$\{F_{n''}^{(2)}\}$ is also weakly convergent and we let $H$ denote its
limit. $\{X_{n''}\}$, considered as a sequence of r.v.'s in
$(\Omega, \mathcal{A}, P_1)$, satisfies the assumptions of Theorem 2.2.3
and, considered as a sequence in $(\Omega, \mathcal{A}, P_2)$, it
satisfies the assumptions of Theorem 2.1.3. By the corollary to Theorem
2.1.3 there exists, on $(\Omega, \mathcal{A}, P_1)$, a sequence
$\{Z_{n''}\}$ of identically distributed r.v.'s, with d.f. $G$, such that:
(2.3.10) $X_{n''} - Z_{n''} \xrightarrow{\text{Prob}} 0 \quad [P_1].$
Similarly, there exists on $(\Omega, \mathcal{A}, P_2)$ a sequence
$\{U_{n''}\}$ of identically distributed r.v.'s, with d.f. $H$, such that:
(2.3.11) $X_{n''} - U_{n''} \xrightarrow{\text{Prob}} 0 \quad [P_2].$
Define, as in (2.3.9):
$Y_{n''} = Z_{n''}\,I_{A_0} + U_{n''}\,I_{A_0^c}.$
Clearly, the $Y_{n''}$'s have d.f. $F$ and it remains to be shown that
$X_{n''} - Y_{n''}$ converges to zero in probability ($P$). Let
$\varepsilon > 0$, arbitrary, be given.
$\{\omega: |X_{n''} - Y_{n''}| > \varepsilon\} = (\{\omega: |X_{n''} - Y_{n''}| > \varepsilon\} \cap A_0) \cup (\{\omega: |X_{n''} - Y_{n''}| > \varepsilon\} \cap A_0^c).$
Hence
$P(\{\omega: |X_{n''} - Y_{n''}| > \varepsilon\}) = P(\{\omega: |X_{n''} - Z_{n''}| > \varepsilon\} \cap A_0) + P(\{\omega: |X_{n''} - U_{n''}| > \varepsilon\} \cap A_0^c)$
$= P(A_0) \cdot P_1(\{\omega: |X_{n''} - Z_{n''}| > \varepsilon\}) + P(A_0^c) \cdot P_2(\{\omega: |X_{n''} - U_{n''}| > \varepsilon\}).$
By (2.3.10) and (2.3.11) both terms in the last sum converge to
zero as $n'' \to \infty$. Hence, it follows that:
$\rho(X_{n''}, Y_{n''}) \to 0$ as $n'' \to \infty$ $[P]$.
But, we also have:
$\rho(X_{n''}, \mathcal{G}) \le \rho(X_{n''}, Y_{n''})$ all $n'' \ge 1$.
From this we have:
$\rho(X_{n''}, \mathcal{G}) \to 0$ as $n'' \to \infty$.
The result follows.
Remarks. 1. The corollary to Theorem 2.2.3 is also valid here and
the proof is similar.
2. The observations that followed Theorem 2.2.3 also apply in the
present situation since, the existence of a unique atom is enough to
prevent the admissibility of continuous d.f.'s. In particular, the
proof of Renyi's result follows without any changes and in the proof of
Theorem 2.2.5, we substitute the equivalence between the two types of
convergence (which is not true here), by a combination of (2.2.9) with
the Borel-Cantelli lemma.
CHAPTER III
GENERALIZATION TO COMPLETE, SEPARABLE METRIC SPACES
In this chapter we are going to consider sequences of random
elements, defined on a probability space $(\Omega, \mathcal{A}, P)$ and
taking values on $(S, \mathcal{S})$, where $S$ is a complete, separable,
metric space and $\mathcal{S}$ is the $\sigma$-field of its Borel sets. As
mentioned in Chapter I, we will make extensive use of the class of
partitions of $S$ introduced by Skorokhod in the proof of his result
(Theorem 3.2.2). The main part of Section 3.1 will be devoted to the
study of the properties of this class of partitions.
3.1. The Skorokhod partition of S
Definition 3.1.1. Let $S$ be a separable metric space and $Q$ a
probability measure on $(S, \mathcal{S})$. $Q$ is said to be non-atomic if
$Q(\{s\}) = 0$ for all $s \in S$.
Since we have been using the concept of non-atomic in a more
general context (with respect to an abstract measurable space
$(\Omega, \mathcal{A})$), we decided to include here the proof of the
equivalence of the two definitions on separable metric spaces.
THEOREM 3.1.1. Let $S$ be a separable metric space and $Q$ a
probability measure on $(S, \mathcal{S})$. Then, a necessary and
sufficient condition for $(S, \mathcal{S}, Q)$ to be a non-atomic
probability space (Definition 2.1.3) is that $Q(\{s\}) = 0$ for all
$s \in S$.
Proof: Suppose first that $(S, \mathcal{S}, Q)$ is a non-atomic
probability space. Let $s_0$ denote an element of $S$. Since the only
subsets of $\{s_0\}$ are $\{s_0\}$ itself and the empty set it follows,
from Definition 2.1.2, that $\{s_0\}$ is an atom of $(S, \mathcal{S}, Q)$
if and only if $Q(\{s_0\}) > 0$. Since $(S, \mathcal{S}, Q)$ is assumed to
be atom-free it follows that $Q(\{s_0\}) = 0$.
Conversely let us now assume $Q$ to be such that $Q(\{s\}) = 0$, for
all $s \in S$. Let $B$ be an atom of $(S, \mathcal{S}, Q)$. We will show
that these assumptions lead to a contradiction. We are going to make use
of the fact that a separable metric space can, for each $n \ge 1$, be
covered by a countable sequence of disjoint Borel sets, each one with
diameter (diam) less than $1/n$ (see Appendix). For each $n \ge 1$, let
$\{A_i^n\}$ denote the collection of Borel sets satisfying the above
requirements. Therefore, we have:
(3.1.1) $B = \bigcup_{i=1}^{\infty} (A_i^n \cap B)$ for all $n \ge 1$.
Since $B$ was assumed to be an atom, there exists, for each $n \ge 1$,
a unique index $i_n$ such that:
(3.1.2) $Q(A_{i_n}^n \cap B) = Q(B).$
Put:
(3.1.3) $A_n = A_{i_n}^n \cap B.$
Therefore we have:
(3.1.4) $Q(A_n) = Q(B)$ and $Q(B - A_n) = 0.$
From (3.1.4) it follows that:
(3.1.5) $Q\Big[\bigcup_{n=1}^{\infty} (B - A_n)\Big] = 0.$
But, we also have:
(3.1.6) $\bigcap_{n=1}^{\infty} A_n = B - \Big[\bigcup_{n=1}^{\infty} (B - A_n)\Big].$
From (3.1.5) and (3.1.6) we conclude:
(3.1.7) $Q\Big[\bigcap_{n=1}^{\infty} A_n\Big] = Q(B).$
Let now $x$ and $y$ denote 2 points in $\bigcap_{n=1}^{\infty} A_n$. We can write:
(3.1.8) $d(x,y) \le \operatorname{diam}\Big[\bigcap_{n=1}^{\infty} A_n\Big] \le \operatorname{diam} A_n \le \operatorname{diam} A_{i_n}^n < 1/n.$
Since (3.1.8) is true for all $n \ge 1$, it follows that $d(x,y) = 0$
and hence we can conclude that $\bigcap_{n=1}^{\infty} A_n$ is either empty
or a singleton. On the other hand the assumption that $B$ is an atom
implies $Q(B) > 0$ and hence it follows from (3.1.7) that there exists
$s_0 \in S$ such that:
(3.1.9) $\bigcap_{n=1}^{\infty} A_n = \{s_0\}.$
Hence, from (3.1.7):
$Q(\{s_0\}) = Q(B) > 0.$
The last statement contradicts the assumption that $Q(\{s\}) = 0$ for all
$s \in S$, and the proof of the theorem is complete. □
Definition 3.1.2. A partition of a set $X$ is any finite, or
denumerably infinite, collection of disjoint subsets of $X$, whose union
is $X$. Given two partitions, $\{A_i\}$ and $\{B_j\}$, of the same set $X$
we say that $\{A_i\}$ is a refinement (or a sub-partition) of $\{B_j\}$ if
each $A_i$ is a subset of some $B_j$.
Definition 3.1.3. A Skorokhod partition $\mathcal{P}$ of a separable
metric space $S$ is an ordered countable collection $\mathcal{P}_1$,
$\mathcal{P}_2$, ... of partitions of $S$, each partition being a
refinement of the preceding one and such that for all $k \ge 1$ the
elements of $\mathcal{P}_k$ are nonempty Borel sets whose diameters are at
most $(1/2)^k$.¹
Definition 3.1.4. Let $P$ be a probability measure on $(S, \mathcal{S})$.
A Skorokhod partition $\mathcal{P}$ of $S$ is said to be $P$-continuous if
for all $k \ge 1$ the elements of $\mathcal{P}_k$ are $P$-continuity sets.
We will show in the Appendix that for each probability measure $P$
on $(S, \mathcal{S})$ there exists a $P$-continuous Skorokhod partition of
$S$.
Remarks on notation. We will adopt the same nested-type notation
used by Skorokhod [1956]. The elements of $\mathcal{P}_1$ will be denoted
by $\{S_{i_1}: i_1 \ge 1\}$. For each fixed value of $i_1$,
$\{S_{i_1,i_2}: i_2 \ge 1\}$ denotes all elements of $\mathcal{P}_2$ which
are subsets of $S_{i_1}$. In general,
$\{S_{i_1,\ldots,i_{k-1},i_k}: i_k \ge 1\}$ denotes the elements of
$\mathcal{P}_k$ which are subsets of the element $S_{i_1,\ldots,i_{k-1}}$
of $\mathcal{P}_{k-1}$. Hence, for all $k > 1$ and all
$(i_1,\ldots,i_{k-1})$ we have:
(3.1.10) $S_{i_1,\ldots,i_{k-1}} = \bigcup_{i_k} S_{i_1,\ldots,i_{k-1},i_k}.$
¹ The name "Skorokhod partition" involves a slight abuse of language
since $\mathcal{P}$ is actually a family of partitions of $S$.
Furthermore, since $\mathcal{P}_1$ is a partition of $S$ it follows that:
(3.1.11) $\bigcup_{i_1} S_{i_1} = S.$
Observe that since each set $S_{i_1,\ldots,i_k}$ is required to be
nonempty we may not assume without loss of generality that the range of
$i_k$ is infinite. Since $S_{i_1,\ldots,i_{k-1}}$ is nonempty the range of
$i_k$ contains at least the number one.
With this notation we have a one-to-one correspondence between
$\mathcal{P}_m$ and a nonempty subset of the Cartesian product $N^m$,
where $N$ denotes the set of natural numbers. Recall that $N^m$ can be
ordered lexicographically and with this order $N^m$ is a totally ordered
set. Given any two elements $a$ and $b$ of $N^m$, we write $a \preceq b$
to indicate that $a$ precedes $b$ in the lexicographic order of $N^m$. We
will write $a \prec b$ to indicate that $a$ precedes and is not equal to
$b$.
Recall that a totally ordered set is said to be well-ordered if
each nonempty subset has a first element. In other words, $(A, \preceq)$
is well-ordered if for each nonempty subset $B$ there exists $b_0 \in B$
satisfying $b_0 \preceq b$ for each $b \in B$. It is clear from the
definition that the induced order on a subset of a well-ordered set is a
well-ordering of that subset. The following two lemmas are stated here
for future reference.
Lemma 3.1.1. Let $A_1$ and $A_2$ be two well-ordered sets. The
Cartesian product $A_1 \times A_2$, ordered lexicographically, is a
well-ordered set.
Proof: We give only a brief outline of the proof. Let $D$ be a
nonempty subset of $A_1 \times A_2$. Consider the set:
$B_1 = \bigcup_{a_2 \in A_2} \{a_1 \in A_1: (a_1, a_2) \in D\}.$
Since $D$ is nonempty it follows that $B_1$ is a nonempty subset of $A_1$.
Let $a_1^0$ be the minimum element of $B_1$, which exists since $A_1$ is
well-ordered. Consider now the set:
$B_2 = \{a_2 \in A_2: (a_1^0, a_2) \in D\}.$
$B_2$ is a nonempty subset of $A_2$ and hence it has a minimum element
$a_2^0$. It is now easy to show that $(a_1^0, a_2^0)$ is the minimum
element of $D$ in the lexicographic order of $A_1 \times A_2$.
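The two-step construction in the proof outline of Lemma 3.1.1 can be sketched for a finite set of pairs. This is only an illustration: in the lemma the sets are well-ordered and possibly infinite, while the code assumes a finite nonempty `D` of integer pairs. The name `lex_min` is our own.

```python
def lex_min(D):
    """Lexicographic minimum of a nonempty finite set D of pairs,
    found in two steps as in the proof outline."""
    # B1: all first coordinates that occur in D
    B1 = {a1 for (a1, _) in D}
    a1_0 = min(B1)
    # B2: second coordinates paired with the minimal first coordinate
    B2 = {a2 for (a1, a2) in D if a1 == a1_0}
    a2_0 = min(B2)
    return (a1_0, a2_0)

D = {(3, 1), (2, 7), (2, 4), (5, 0)}
```

Python compares tuples lexicographically, so the built-in `min(D)` gives an independent check of the result.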
Lemma 3.1.2. For each $m \ge 1$, $N^m$ ordered lexicographically is a
well-ordered set.
Proof: The result is trivially true for $m = 1$ since $N$ is a
well-ordered set. The result follows by induction using the previous
lemma.
Definition 3.1.5. Let $\mathcal{P}$ be a Skorokhod partition of $S$ and
$P$ a probability measure on $(S, \mathcal{S})$. For a fixed $s \in S$ and
each $m \ge 1$, let $S_{k_1,\ldots,k_m}$ denote the unique element of
$\mathcal{P}_m$ which contains $s$. Define:
(3.1.12) $F_1(s) = \begin{cases} 0 & \text{if } k_1 = 1 \\ \sum_{i_1=1}^{k_1-1} P(S_{i_1}) & \text{if } k_1 > 1 \end{cases}$
(3.1.13) $G_1(s) = F_1(s) + P(S_{k_1}).$
In general, for $m > 1$, define:
(3.1.14) $F_m(s) = \begin{cases} F_{m-1}(s) & \text{if } k_m = 1 \\ F_{m-1}(s) + \sum_{i_m=1}^{k_m-1} P(S_{k_1,k_2,\ldots,k_{m-1},i_m}) & \text{if } k_m > 1 \end{cases}$
(3.1.15) $G_m(s) = F_m(s) + P(S_{k_1,k_2,\ldots,k_m}).$
Observations. 1) $G_m(s)$, for $s \in S_{k_1,\ldots,k_m}$, is equal to the
probability of the union of all elements $S_{i_1,\ldots,i_m}$ of
$\mathcal{P}_m$ for which $(i_1,\ldots,i_m) \preceq (k_1,\ldots,k_m)$.
Similarly $F_m(s)$, for $s \in S_{k_1,\ldots,k_m}$, is equal to the
probability of the union of all elements $S_{i_1,\ldots,i_m}$ of
$\mathcal{P}_m$ satisfying $(i_1,\ldots,i_m) \prec (k_1,\ldots,k_m)$. In
symbols we have, for $s \in S_{k_1,\ldots,k_m}$:
(3.1.16) $F_m(s) = P\Big[\bigcup_{(i_1,\ldots,i_m) \prec (k_1,\ldots,k_m)} S_{i_1,\ldots,i_m}\Big]$
(3.1.17) $G_m(s) = P\Big[\bigcup_{(i_1,\ldots,i_m) \preceq (k_1,\ldots,k_m)} S_{i_1,\ldots,i_m}\Big]$
2) $F_m$ and $G_m$ are both measurable functions from $(S, \mathcal{S})$
to $[0,1]$ and, for fixed $s$, the numerical sequences $\{F_m(s)\}$ and
$\{G_m(s)\}$ are both monotone, the first increasing and the latter
decreasing. Hence, for each $s \in S$, their limits exist and we put:
(3.1.18) $F(s) = \lim_{m \to \infty} F_m(s)$, $G(s) = \lim_{m \to \infty} G_m(s).$
3) Observe that if we order $\mathcal{P}_m$ lexicographically we induce
in a natural way a total order on $S$. In fact, if $s_1$ and $s_2$ are
two distinct points of $S$, then $d(s_1, s_2) > 0$ and hence, for some
$m \ge 1$, $s_1$ and $s_2$ are in different elements of $\mathcal{P}_m$,
and we may say that $s_1$ precedes $s_2$ if the set containing $s_1$
precedes the set containing $s_2$. With respect to this order the
functions $F$ and $G$ behave somewhat like distribution functions
associated with the probability measure $P$.
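To make the recursive definitions of $F_m$ and $G_m$ concrete, here is a minimal sketch under assumptions of our own: $S = [0,1)$ with the usual metric, $\mathcal{P}_k$ the dyadic intervals of length $2^{-k}$ (so every index is 1 or 2 and the diameters are at most $(1/2)^k$), and $P$ Lebesgue measure. The function names and the `mass` callback are hypothetical.

```python
from fractions import Fraction

def cell_bounds(ks):
    """Endpoints [lo, hi) of the dyadic cell with indices ks;
    index 1 selects the left half, index 2 the right half."""
    lo, hi = Fraction(0), Fraction(1)
    for k in ks:
        mid = (lo + hi) / 2
        if k == 1:
            hi = mid
        else:
            lo = mid
    return lo, hi

def indices(s, m):
    """Indices (k_1, ..., k_m) of the level-m cell containing s, 0 <= s < 1."""
    ks = ()
    for _ in range(m):
        lo, hi = cell_bounds(ks)
        ks += (1,) if s < (lo + hi) / 2 else (2,)
    return ks

def F_and_G(s, m, mass):
    """F_m(s) and G_m(s) following the recursion of Definition 3.1.5;
    mass(lo, hi) must return P([lo, hi))."""
    ks = indices(s, m)
    F = Fraction(0)
    for j in range(m):
        # add the mass of the sibling cells that precede k_{j+1} at level j+1
        for i in range(1, ks[j]):
            F += mass(*cell_bounds(ks[:j] + (i,)))
    G = F + mass(*cell_bounds(ks))
    return F, G

def uniform_mass(lo, hi):
    """Lebesgue measure of [lo, hi)."""
    return hi - lo

s = Fraction(3, 10)
values = [F_and_G(s, m, uniform_mass) for m in range(1, 6)]
```

For this choice of $P$ the values of $F_m(s)$ increase and those of $G_m(s)$ decrease with $m$, and they bracket $s$, in line with observation 2).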
Lemma 3.1.3. Let $S$ be a separable metric space, $\mathcal{P}$ a
Skorokhod partition of $S$ and $P$ a probability measure on
$(S, \mathcal{S})$. Then, the functions $F$ and $G$ satisfy the relation:
(3.1.19) $G(s) = F(s) + P(\{s\})$ all $s \in S$.
Hence, if $P$ is non-atomic $F(s) = G(s)$ for all $s \in S$.
Proof: From (3.1.15) we have that:
$G_m(s) = F_m(s) + P(S_{k_1,\ldots,k_m})$
for all $m \ge 1$ and $s \in S_{k_1,\ldots,k_m}$. As $m$ increases, for
fixed $s$, the sequence of sets $\{S_{k_1,\ldots,k_m}\}$ is monotonically
decreasing, the sequence of its diameters converges to zero and
$s \in S_{k_1,\ldots,k_m}$ all $m \ge 1$. By the same reasoning used in
the proof of Theorem 3.1.1 (see (3.1.8)) we have that:
(3.1.20) $\bigcap_{m=1}^{\infty} S_{k_1,\ldots,k_m} = \{s\}.$
Therefore, we have:
(3.1.21) $\lim_{m \to \infty} P(S_{k_1,\ldots,k_m}) = P(\{s\}).$
(3.1.19) follows from (3.1.18) and (3.1.21). The last part of the lemma
follows from Theorem 3.1.1.
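A small worked case may help to see how an atom shows up as the gap between $G$ and $F$. The setting is an assumption of ours: $S = [0,1)$ with dyadic partitions, and $P = \tfrac12 \delta_{s^*} + \tfrac12 \lambda$, where $\delta_{s^*}$ is the point mass at some $s^* \in [0,1)$ and $\lambda$ is Lebesgue measure. Writing $S_{k_1,\ldots,k_m}$ for the level-$m$ cell containing $s^*$:

```latex
\begin{align*}
G_m(s^*) - F_m(s^*) &= P(S_{k_1,\ldots,k_m})
   = \tfrac12 + \tfrac12 \cdot 2^{-m}, \\
G(s^*) - F(s^*) &= \lim_{m \to \infty} \big(\tfrac12 + 2^{-(m+1)}\big)
   = \tfrac12 = P(\{s^*\}).
\end{align*}
```

For any other point $s \ne s^*$ the cell containing $s$ eventually excludes $s^*$, so the difference tends to $0 = P(\{s\})$.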
Remark. So far, it may appear that we have a perfect analogy
between $F$ and $G$ and distribution functions on $R^1$. However, it
should be kept in mind that the properties of distribution functions
which depend on the relations between the order and the topology of
$R^1$ have no straightforward counterparts in the present situation. In
fact, although $\mathcal{P}$ depends on the topology of $S$, the relative
arbitrariness with which the elements of $\mathcal{P}$ are chosen prevents
the existence of a simple relationship between the order on $S$, induced
by $\mathcal{P}$, and the topology of $S$.
Lemma 3.1.4. Let $\{P_n\}$ be a sequence of probability measures on
$(S, \mathcal{S})$ and assume that $\{P_n\}$ is weakly convergent to a
probability measure $P$ on $(S, \mathcal{S})$. Let $\mathcal{P}$ denote a
$P$-continuous Skorokhod partition of $S$ and, for each $n \ge 1$, let
$\{F_m^n: m \ge 1\}$ and $\{G_m^n: m \ge 1\}$ be the sequences of
measurable functions associated with the pair $(P_n, \mathcal{P})$ in the
way described in Definition 3.1.5. Similarly $\{F_m\}$ and $\{G_m\}$ will
denote the functions associated with $(P, \mathcal{P})$. For each
$m \ge 1$ and all $s \in S$:
$\lim_{n \to \infty} F_m^n(s) = F_m(s)$ and $\lim_{n \to \infty} G_m^n(s) = G_m(s).$
Proof: We will prove the result by induction on $m$. Consider
$m = 1$ and let $s$ be any element of $S$. From (3.1.11) it follows that
there exists $n_1 \ge 1$ such that $S_{n_1} \in \mathcal{P}_1$ and
$s \in S_{n_1}$. If $n_1 = 1$, we have by (3.1.12) that
(3.1.22) $F_1^n(s) = 0$ all $n \ge 1$
and also that:
$F_1(s) = 0.$
Suppose now that $n_1 > 1$. Again by (3.1.12) we have that:
(3.1.23) $F_1^n(s) = \sum_{i_1=1}^{n_1-1} P_n(S_{i_1})$ all $n \ge 1$, $\quad F_1(s) = \sum_{i_1=1}^{n_1-1} P(S_{i_1}).$
Since the $S_{i_1}$'s are $P$-continuity sets and $P_n$ converges weakly
to $P$ it follows that for each $i_1$, $1 \le i_1 \le n_1 - 1$, we have:
$\lim_{n \to \infty} P_n(S_{i_1}) = P(S_{i_1})$ and the result follows.
Assume now that the result is true for $m = k-1$ and let us prove
the result for $m = k$. Let $S_{n_1,n_2,\ldots,n_k}$ denote the element of
$\mathcal{P}_k$ which contains $s$. By (3.1.14) we have:
$F_k^n(s) = \begin{cases} F_{k-1}^n(s) & \text{if } n_k = 1 \\ F_{k-1}^n(s) + \sum_{i_k=1}^{n_k-1} P_n(S_{n_1,\ldots,n_{k-1},i_k}) & \text{if } n_k > 1. \end{cases}$
Similarly:
$F_k(s) = \begin{cases} F_{k-1}(s) & \text{if } n_k = 1 \\ F_{k-1}(s) + \sum_{i_k=1}^{n_k-1} P(S_{n_1,\ldots,n_{k-1},i_k}) & \text{if } n_k > 1. \end{cases}$
Clearly, if $n_k = 1$ the result follows by the induction hypothesis.
Otherwise:
$\lim_{n \to \infty} F_k^n(s) = \lim_{n \to \infty} F_{k-1}^n(s) + \sum_{i_k=1}^{n_k-1} \lim_{n \to \infty} P_n(S_{n_1,\ldots,n_{k-1},i_k}).$
The result now follows by the induction hypothesis and the fact that the
elements of $\mathcal{P}_k$ are $P$-continuity sets. The proof of the
result for $\{G_m^n\}$ follows from the result for $\{F_m^n\}$ and
(3.1.15). □
We would like to point out that our objective is to use the
Skorokhod partition as a device that would allow us to adapt, for metric
spaces, the proofs given in Chapter II for the real line. With this
objective we proceed to find the probability distributions of the r.v.'s
$F(s)$ and $G(s)$ defined on $(S, \mathcal{S}, P)$ with values in $[0,1]$.
Let $P$ be a probability measure on $(S, \mathcal{S})$, $\mathcal{P}$ a
Skorokhod partition of $S$. Let $\{F_m\}$ and $\{G_m\}$ be the two
sequences of measurable functions associated with the pair
$(P, \mathcal{P})$ as in Definition 3.1.5. In correspondence to each
$t \in [0,1]$ define:
(3.1.24) $B_m = \{s: F_m(s) < t\}$, $\quad A_m = \{s: G_m(s) \le t\}$, $\quad m \ge 1$.
Both $B_m$ and $A_m$ are clearly dependent on $t$ but, since $t$ will
be held fixed throughout the proofs, we do not indicate this dependence
in the notation.
In the following lemmas we establish some important properties of
the sequences of sets introduced in (3.1.24). In all of them $t$ is an
arbitrary but fixed real number in $[0,1]$ and $\{F_m\}$, $\{G_m\}$ have
the meaning given to them in Definition 3.1.5.
Lemma 3.1.5. i) For each $m \ge 1$ both $B_m$ and $A_m$ contain, with
each element $s$, the entire set of $\mathcal{P}_m$ which contains $s$.
Furthermore, whenever $S_{k_1,\ldots,k_m}$ is contained in $B_m$ or
$A_m$, the same is true for all elements $S_{i_1,\ldots,i_m}$ of
$\mathcal{P}_m$ satisfying $(i_1,\ldots,i_m) \preceq (k_1,\ldots,k_m)$.
ii) $B_m - A_m$, for each $m \ge 1$, is either empty or equal to an
element of $\mathcal{P}_m$.
Proof: The first part of i) is an immediate consequence of the
fact that both $F_m$ and $G_m$ are constant on each element of
$\mathcal{P}_m$. The second part comes from the fact that, for fixed $m$,
$F_m$ and $G_m$ are increasing with respect to the lexicographic order on
$\mathcal{P}_m$.
To prove ii) observe first that if $B_m - A_m$ is not empty it also
contains, with each element $s$, the entire element of $\mathcal{P}_m$
which contains $s$. Suppose that we have two distinct elements of
$\mathcal{P}_m$, $S_{i_1,\ldots,i_m}$ and $S_{k_1,\ldots,k_m}$, contained
in $B_m - A_m$. Assume that
$(i_1,\ldots,i_m) \prec (k_1,\ldots,k_m).$
Since the elements of $\mathcal{P}_m$ are not empty we can take
$s_1 \in S_{i_1,\ldots,i_m}$ and $s_2 \in S_{k_1,\ldots,k_m}$. From
(3.1.16) and (3.1.17) it follows that:
(3.1.25) $G_m(s_1) \le F_m(s_2).$
On the other hand, since $s_1 \notin A_m$ it follows from (3.1.24) that:
(3.1.26) $G_m(s_1) > t.$
(3.1.25) and (3.1.26) imply that $F_m(s_2) > t$ and hence, by (3.1.24),
we have that $s_2 \notin B_m$ which clearly contradicts our assumption
that $s_2 \in B_m - A_m$. This shows that $B_m - A_m$ cannot contain two
distinct elements of $\mathcal{P}_m$ and hence the desired conclusion
follows. □
Lemma 3.1.6. i) For each $m \ge 1$ and $s \in B_m$:
$G_m(s) \le P(B_m).$
ii) For each $m \ge 1$ and $s \in B_m - A_m$:
$G_m(s) = P(B_m)$ and $F_m(s) = P(A_m).$
Proof: i) Let $S_{n_1,\ldots,n_m}$ denote the element of $\mathcal{P}_m$
which contains $s$. By part i) of the previous lemma we have that:
$\bigcup_{(i_1,\ldots,i_m) \preceq (n_1,\ldots,n_m)} S_{i_1,\ldots,i_m} \subset B_m.$
Hence it follows that:
$P\Big[\bigcup_{(i_1,\ldots,i_m) \preceq (n_1,\ldots,n_m)} S_{i_1,\ldots,i_m}\Big] \le P(B_m).$
The result now follows from (3.1.17).
ii) Since we assumed that $B_m - A_m$ is not empty, by part ii) of the
previous lemma $B_m - A_m$ is equal to an element $S_{n_1,\ldots,n_m}$ of
$\mathcal{P}_m$. By part i) of the same lemma, it follows that:
$B_m = \bigcup_{(i_1,\ldots,i_m) \preceq (n_1,\ldots,n_m)} S_{i_1,\ldots,i_m}$, $\quad A_m = \bigcup_{(i_1,\ldots,i_m) \prec (n_1,\ldots,n_m)} S_{i_1,\ldots,i_m}.$
The conclusion now follows from (3.1.16) and (3.1.17).
Lemma 3.1.7. i) $P(B_m) \ge t.$
ii) $P(A_m) \le t.$
Proof: i) If $P(B_m) = 1$, we are done since $t \le 1$. Assume then
$P(B_m) < 1$. Hence there are elements of $\mathcal{P}_m$ which are not
contained in $B_m$ and therefore the set:
$L = \{(i_1,\ldots,i_m): S_{i_1,\ldots,i_m} \subset B_m^c\}$
is a nonempty subset of $N^m$. It follows by Lemma 3.1.2 that $L$ has a
minimum element that we denote by $(n_1,\ldots,n_m)$. By part i) of Lemma
3.1.5 we have that:
(3.1.27) $B_m = \bigcup_{(i_1,\ldots,i_m) \prec (n_1,\ldots,n_m)} S_{i_1,\ldots,i_m}.$
Let now $s \in S_{n_1,\ldots,n_m}$. By (3.1.16) and (3.1.27) we have that:
(3.1.28) $F_m(s) = P(B_m).$
But $s \notin B_m$ and hence by (3.1.24):
(3.1.29) $F_m(s) \ge t.$
(3.1.28) and (3.1.29) imply $P(B_m) \ge t$ and the proof of i) is
complete.
ii) Let $R = \{t \in [0,1]: P(\{s: G_m(s) = t\}) > 0\}$. First we show
ii) for $t \in R$. Observe again that, since $G_m$ is constant on the
elements of $\mathcal{P}_m$, $\{s: G_m(s) = t\}$ is a finite or countable
union of elements of $\mathcal{P}_m$. Let $(n_1,\ldots,n_m)$ be the first
element of the set
$\{(i_1,\ldots,i_m): S_{i_1,\ldots,i_m} \subset \{s: G_m(s) = t\}\}$.
Assume now that $S_{k_1,\ldots,k_m} \subset \{s: G_m(s) = t\}$ and
$(n_1,\ldots,n_m) \prec (k_1,\ldots,k_m)$. It follows from the definition
of $G_m$ that $P(S_{k_1,\ldots,k_m}) = 0$. Hence, by part i) of Lemma
3.1.5, we have that:
(3.1.30) $P(A_m) = P\Big[\bigcup_{(i_1,\ldots,i_m) \preceq (n_1,\ldots,n_m)} S_{i_1,\ldots,i_m}\Big].$
From (3.1.17), it follows that, if $s \in S_{n_1,\ldots,n_m}$:
$P(A_m) = G_m(s) = t.$
Consider now $t \in \bar{R}$, $t \notin R$. Assume first that we have a
sequence $\{t_n\}$, $t_n \in R$, $t_n < t$, all $n \ge 1$,
$\lim_{n \to \infty} t_n = t$. Then, we can write:
(3.1.31) $\{s: G_m(s) < t\} = \bigcup_{n=1}^{\infty} \{s: G_m(s) \le t_n\}.$
Hence, since $t \notin R$, we have:
(3.1.32) $P(A_m) = P(\{s: G_m(s) < t\}) = P\Big(\bigcup_{n=1}^{\infty} \{s: G_m(s) \le t_n\}\Big).$
Therefore, since $(\{s: G_m(s) \le t_n\})_{n \ge 1}$ is an increasing
sequence of sets:
(3.1.33) $P(A_m) = \lim_{n \to \infty} P(\{s: G_m(s) \le t_n\}) = \lim_{n \to \infty} t_n = t.$
The case where there exists a sequence $\{t_n\}$, $t_n > t$ all
$n \ge 1$, can be handled in a similar way. Hence, we have proved that
for all $t \in \bar{R}$, $P(A_m) = t$.
Finally, consider the case $t \notin \bar{R}$. If there exists no $t^*$,
$t^* \in \bar{R}$, $t^* < t$, we would have that
$P(\{s: G_m(s) \le t\}) = P(A_m) = 0 \le t$ and we would be done.
Otherwise, since $\bar{R}$ is a closed subset of $[0,1]$ there exists a
largest element $t^* \in \bar{R}$, $t^* < t$. Clearly:
$P(A_m) = P(\{s: G_m(s) \le t\}) = P(\{s: G_m(s) \le t^*\}) = t^* < t.$
This completes the proof of the lemma. □
THEOREM 3.1.2. Let $P$ be a probability measure on $(S, \mathcal{S})$ and
$\mathcal{P}$ a Skorokhod partition of $S$. As before, $\{F_m\}$ and
$\{G_m\}$, $F$ and $G$ are the functions introduced in Definition 3.1.5.
For all $m \ge 1$ and each $t \in [0,1]$ let $B_m$ and $A_m$ be given by
(3.1.24). Then, for all values of $t \in [0,1]$ for which
$\lim_{m \to \infty} P(B_m) = t$:
(3.1.33) $\{s: F(s) < t\} \subset \bigcap_{m=1}^{\infty} B_m \subset \{s: G(s) \le t\}.$
Hence, for all such values of $t$:
(3.1.34) $P(\{s: F(s) < t\}) \le t \le P(\{s: G(s) \le t\}).$
Proof: The first inclusion in (3.1.33) is in fact true for all
$t \in [0,1]$. It follows from (3.1.24) and the fact that
$\{F_m(s)\}_{m \ge 1}$ is increasing that:
(3.1.35) $\{s: F(s) < t\} \subset \{s: F_m(s) < t\} = B_m$ all $m \ge 1$.
The first inclusion in (3.1.33) follows from (3.1.35).
To prove the other inclusion, let $s_0 \in \bigcap_{m=1}^{\infty} B_m$.
By part i) of Lemma 3.1.6, we have that:
(3.1.36) $G_m(s_0) \le P(B_m)$ all $m \ge 1$.
Therefore, by (3.1.18) we have that:
$G(s_0) = \lim_{m \to \infty} G_m(s_0) \le \lim_{m \to \infty} P(B_m) = t.$
Hence, $s_0 \in \{s: G(s) \le t\}$ and the proof of (3.1.33) is complete.
(3.1.34) now follows from the fact that $\{B_m\}$ is a decreasing
sequence of sets and hence
$P\Big(\bigcap_{m=1}^{\infty} B_m\Big) = \lim_{m \to \infty} P(B_m).$
For the next theorem we assume that we are given a non-atomic
probability measure on $(S, \mathcal{S})$. From Theorem 3.1.1 it follows
that for the existence of a non-atomic measure on $(S, \mathcal{S})$ it is
necessary that $S$ be an uncountable set. This condition is also
sufficient if $S$ is separable and complete (see, e.g., Parthasarathy
[1967, pages 53-55]).
THEOREM 3.1.3. Let $P$ be a non-atomic probability measure on
$(S, \mathcal{S})$ and $\mathcal{P}$ a Skorokhod partition of $S$. Then,
the r.v. $F(s) = G(s)$, defined on $(S, \mathcal{S}, P)$, is uniformly
distributed on $[0,1]$.
Proof: We first observe that for each $m \ge 1$,
$P(A_m \cap B_m^c) = 0$. In fact suppose that $s \in A_m \cap B_m^c$. It
follows from (3.1.24) that:
(3.1.37) $G_m(s) \le t$ and $F_m(s) \ge t.$
It follows from (3.1.15) that the element of $\mathcal{P}_m$ to which
$s$ belongs has $P$-measure zero. Therefore, since $A_m \cap B_m^c$, if
not empty, is a finite or countable union of elements of $\mathcal{P}_m$,
it follows that $P(A_m \cap B_m^c) = 0$. Hence, we have for all
$m \ge 1$:
(3.1.38) $P(B_m - A_m) = P(B_m) - P(A_m).$
Suppose now that for a given value of $t$, we have:
(3.1.39) $\lim_{m \to \infty} P(B_m) > t.$
From (3.1.38), (3.1.39) and part ii) of Lemma 3.1.7 it follows
that:
(3.1.40) $P\Big[\bigcap_{m=1}^{\infty} (B_m - A_m)\Big] = \lim_{m \to \infty} P(B_m - A_m) > 0.$
However, by part ii) of Lemma 3.1.5 we know that $B_m - A_m$ is an
element of $\mathcal{P}_m$ for each $m \ge 1$. By the same argument used
in the proof of Theorem 3.1.1, $\bigcap_{m=1}^{\infty} (B_m - A_m)$ is
either empty or a singleton. Hence, (3.1.40) implies the existence of an
element $s_0 \in S$ such that:
$P(\{s_0\}) > 0.$
This clearly contradicts the assumption that $P$ is non-atomic and
therefore for all $t \in [0,1]$ we should have:
$\lim_{m \to \infty} P(B_m) = t$. Therefore, by the previous theorem and
the fact that $F = G$ we have for all $t \in [0,1]$:
$P(\{s: F(s) < t\}) \le t \le P(\{s: F(s) \le t\}).$
The result follows.
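As an illustration of our own (not an example from the text), take $S = [0,1)$ with the usual metric, let $\mathcal{P}_k$ consist of the dyadic intervals $[j2^{-k}, (j+1)2^{-k})$, and let $P$ have density $2s$. The lexicographic order of the cells then agrees with the usual order on the interval, and the following sketch shows how uniformity of $F$ comes out in this case:

```latex
\begin{align*}
F_m(s) &= P\big([0, \lfloor 2^m s \rfloor 2^{-m})\big)
        = \big(\lfloor 2^m s \rfloor 2^{-m}\big)^2, \\
F(s) &= \lim_{m \to \infty} F_m(s) = s^2, \\
P(\{s : F(s) < t\}) &= P(\{s : s < \sqrt{t}\}) = (\sqrt{t})^2 = t,
        \qquad 0 \le t \le 1.
\end{align*}
```

In this special case the statement reduces to the familiar fact that a continuous d.f. evaluated at its own random variable is uniformly distributed.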
3.2. The non-atomic case
We are now going to apply the techniques developed in the previous
section to extend to metric spaces the results of Chapter II. Our first
theorem shows how to associate to each r.e. defined on a non-atomic
probability space a convenient uniform r.v. It is, therefore, the
equivalent for metric spaces of Theorem 2.1.2 of Chapter II.
THEOREM 3.2.1. Let $X$ be a r.e. defined on a non-atomic probability
space $(\Omega, \mathcal{A}, P)$ and let $P_X$ denote the probability
measure induced by $X$ on $(S, \mathcal{S})$. Let $\mathcal{P}$ be a
Skorokhod partition of $S$, $F$ and $G$ be the two measurable functions
determined by the pair $(P_X, \mathcal{P})$ as described in Definition
3.1.5. Finally, let $Z$ be any uniformly distributed r.v. defined on
$(\Omega, \mathcal{A}, P)$. For $s \in S$ and $z \in [0,1]$, define:
(3.2.1) $H(s,z) = F(s) + P(\{\omega: Z(\omega) \le z\} \cap X^{-1}(\{s\})).$
Then, $H(s,z)$ is a measurable function from
$(S \times [0,1], \mathcal{S} \times \mathcal{B})$ to $([0,1], \mathcal{B})$
and the r.v. $U(\omega) = H(X(\omega), Z(\omega))$ is uniformly
distributed on $[0,1]$.
Proof: The proof that $H$ is measurable is straightforward. Observe
only that $F(s)$ is measurable and that the second term is different
from zero for, at most, a countable number of values of $s$.
By Lemma 3.1.3 we can write:
(3.2.2) $G(s) = F(s) + P_X(\{s\}) = F(s) + P(X^{-1}(\{s\})).$
From (3.2.1), (3.2.2) and the definition of $U$, it follows that:
(3.2.3) $F(X(\omega)) \le U(\omega) \le G(X(\omega))$ all $\omega \in \Omega$.
Let now $t$, $0 \le t \le 1$, be given. In correspondence to $t$, $P_X$
and $\mathcal{P}$ consider the sequences $\{B_m\}$ and $\{A_m\}$
introduced in (3.1.24). We have two cases to consider:
1) For the given value of $t$, $\lim_{m \to \infty} P_X(B_m) = t$. It
then follows from Theorem 3.1.2 that:
(3.2.4) $P(\{\omega: F(X(\omega)) < t\}) \le t \le P(\{\omega: G(X(\omega)) \le t\}).$
On the other hand, from (3.2.3) we have that:
(3.2.5) $\{\omega: U(\omega) < t\} \subset \{\omega: F(X(\omega)) < t\}$, $\quad \{\omega: G(X(\omega)) \le t\} \subset \{\omega: U(\omega) \le t\}.$
(3.2.4) and (3.2.5) imply that for all values of $t \in [0,1]$ for which
$\lim_{m \to \infty} P_X(B_m) = t$ we have that:
(3.2.6) $P(\{\omega: U(\omega) < t\}) \le t \le P(\{\omega: U(\omega) \le t\}).$
2) Assume now that for the given value of t we have:
lim PX(B) > t. By the argument used in the proof of Theorem 3.1.3,~ m
it follows that there exists So € S, such that:
(3.2.7)
(3.2.8)
S € B - A all m 2 1o m m
Hence, by part ii) of Lemma 3.1.6 we have that:
51
0.2.9)Gm(so) = PX(Bm)
Fm(sO) = PX(Am)·
Therefore, it follows from our assumption and from part ii) of
Lemma 3.1.7 that:
(3.2.10) = =
Consider now, for every z E [0,1]:
(3.2.11) fez) -1= P(x ({so}) n {w: Z(w) ~ z}).
Clearly fez) is a continuous function of z and we have:
(3.2.12) f(O) = 0; f(l)
On the other hand, from Lemma 3.1.3 and (3.2.10) we have that:
(3.2.13)
Since a continuous function defined on a compact set assumes all
the values between its minimum and its maximum it follows that there
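exists $z_0 \in [0,1]$, such that:
(3.2.14) $f(z_0) = t - F(s_0)$.
We will now show that:
(3.2.15) $\{\omega: U(\omega) < t\} \subset X^{-1}\left[\bigcup_{m \ge 1} A_m\right] \cup \left[X^{-1}(\{s_0\}) \cap \{\omega: Z(\omega) \le z_0\}\right]$
         $\subset \{\omega: U(\omega) \le t\}$.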
To prove the first inclusion in (3.2.15) let $\omega_0$ be such that
$U(\omega_0) < t$. It follows from (3.2.3) that $F(X(\omega_0)) < t$. Recall now
that, as mentioned in the proof of Theorem 3.1.2, the first inclusion in
(3.1.33) is valid for all $t \in [0,1]$. Hence, we have that:
(3.2.16) $X(\omega_0) \in \bigcap_{m=1}^{\infty} B_m$.
Observe that we have either $X(\omega_0) \in A_m$ for some $m \ge 1$ or
$X(\omega_0) \in B_m - A_m$ for all $m \ge 1$. In the first case we are done and in the
latter it follows that we must have $X(\omega_0) = s_0$. Therefore, in this
last case we can write:
(3.2.17)
On the other hand from (3.2.1) and (3.2.11) it follows that:
(3.2.18) =
Therefore, by (3.2.14) we have that:
(3.2.19)
Now if we observe that $H(s,z)$ is, for fixed $s$, an increasing function
of $z$ it follows from (3.2.17) and (3.2.19) that $Z(\omega_0) < z_0$.
This completes the proof of the first inclusion in (3.2.15).
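To prove the other half, let $\omega_1$ be such that:
$\omega_1 \in X^{-1}\left[\bigcup_{m \ge 1} A_m\right] \cup \left[X^{-1}(\{s_0\}) \cap \{\omega: Z(\omega) \le z_0\}\right]$.
We have two cases to consider:
1) $X(\omega_1) \in A_m$ for some $m \ge 1$. It follows from (3.1.24) that:
Hence, since $\{G_m(s)\}_m$ is a decreasing sequence: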
$G(X(\omega_1)) \le t$.
Therefore, by (3.2.3) we have that $U(\omega_1) \le t$ and hence
$\omega_1 \in \{\omega: U(\omega) \le t\}$.
2) $X(\omega_1) = s_0$ and $Z(\omega_1) \le z_0$. Hence, from the definition of $U$
we have that:
Hence, it follows from (3.2.19) that $U(\omega_1) \le t$ and the proof of
(3.2.15) is complete.
On the other hand, we have by (3.2.11):
Hence it follows from (3.2.10) and (3.2.14) that:
(3.2.20)
From (3.2.15) and (3.2.20) it follows that (3.2.6) is valid for all
$t \in [0,1]$ and this implies that $U(\omega)$ is uniformly distributed in
$[0,1]$.
Remarks. 1) If $P_X$ is a non-atomic measure on $(S, \mathcal{S})$ we have
that:
$F(X(\omega)) = U(\omega) = G(X(\omega))$
and hence the theorem above reduces to Theorem 3.1.3.
2) It is not necessary that the r.v. $Z(\omega)$ be uniformly distributed.
Any continuous r.v. taking values on a compact set of $\mathbb{R}$ would
be enough for our purposes.
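
The uniformity claim of Theorem 3.2.1 can be checked by hand in a simple special case. The sketch below is only an illustration under assumptions that are not in the text: X is a discrete real-valued r.v., F(s) is taken to be P(X < s), and Z is uniform and independent of X, so that H(s,z) reduces to F(s) + z P(X = s). The function name cdf_of_u is hypothetical.

```python
from fractions import Fraction

def cdf_of_u(masses, t):
    """Return P(U <= t) for U = F(X) + Z * P(X = x).

    Assumptions (not from the text): X is discrete with atoms listed in
    increasing order and probabilities `masses`, F(s) = P(X < s), and Z is
    uniform on [0, 1] and independent of X.
    """
    total = Fraction(0)
    left = Fraction(0)                  # F(s): mass strictly to the left of the atom
    for p in masses:
        # Given X = s, U is uniform on [F(s), F(s) + p]; its share of [0, t] is:
        total += min(max(t - left, Fraction(0)), p)
        left += p
    return total
```

For any choice of masses summing to one, the value returned should equal t itself, which is the statement that U is uniform on [0,1].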
In Chapter II, we saw how the inverse function $F^{-1}$ of a d.f. $F$
determines a r.v. on the probability space $([0,1], \mathcal{B}, L)$ whose d.f. is
$F$. Here, $\mathcal{B}$ denotes the Borel sets of $[0,1]$ and $L$ is the ordinary
Lebesgue measure. We now introduce the construction that will play the
role of $F^{-1}$ in the present context. The idea of this construction is
due to Skorokhod [1956] and we reproduce his proof in the theorem below
as a matter of convenience.
THEOREM 3.2.2. (Skorokhod). Let $S$ be a complete, separable
metric space and $Q$ be a probability measure on $(S, \mathcal{S})$. Then there
exists on $([0,1], \mathcal{B}, L)$ a r.e. $X$ with values on $(S, \mathcal{S})$ whose
probability distribution is $Q$.
Proof: Let $\mathcal{P}$ denote a $Q$-continuous Skorokhod partition of $S$.
In correspondence to each $\mathcal{P}_m$ we are going to associate a partition of
$[0,1]$, by means of intervals $\Delta_{i_1,i_2,\ldots,i_m}$, which we take to be
left-closed and right-open, satisfying the following conditions:
~i' i' ifl' ... , m
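2) The length of $\Delta_{i_1,\ldots,i_m}$ is $Q(S_{i_1,\ldots,i_m})$.
From each element $S_{i_1,\ldots,i_m}$ of $\mathcal{P}_m$ we choose a point
$x_{i_1,i_2,\ldots,i_m}$.
For each $m \ge 1$ and $\omega \in [0,1]$ we define:
(3.2.21) $X_m(\omega) = x_{i_1,\ldots,i_m}$ if $\omega \in \Delta_{i_1,\ldots,i_m}$.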
1 1 , .. " m
Since each partition in P is a refinement of the preceding ones
we have:
(3.2.22)
It follows that $\{X_m(\omega)\}$ is a Cauchy sequence and since $S$ is
complete, $\lim_{m \to \infty} X_m(\omega)$ exists.
We define for all $\omega \in [0,1]$:
(3.2.23) $X(\omega) = \lim_{m \to \infty} X_m(\omega)$.
To complete the proof it remains to be shown that the probability
distribution of $X$ is $Q$. We claim that it is enough to show that $Q$
and $LX^{-1}$ are equal on the Borel subsets of $S$ which are $Q$-continuity
sets. In fact, this class of sets is a field and it can be shown that
the $\sigma$-field generated by this field is $\mathcal{S}$.
Let $A$ denote a $Q$-continuity set. Let $A^{(m)}$ denote the union of
all elements of $\mathcal{P}_m$ which are contained in $A$ and let $A'^{(m)}$ denote
the union of all elements of $\mathcal{P}_m$ which are not contained in $S - A$.
Clearly:
(3.2.24) $A^{(m)} \subset A \subset A'^{(m)}$.
Observe also that from the construction of $X$ we have:
(3.2.25) $L(\{\omega: X(\omega) \in A^{(m)}\}) = Q(A^{(m)})$.
Similarly we have that:
(3.2.26) $L(\{\omega: X(\omega) \in A'^{(m)}\}) = Q(A'^{(m)})$.
Let now $C^{(m)}$ denote the set of points whose distance from the
boundary of $A$ is less than or equal to $(\frac{1}{2})^m$. Observe now that if
$S_{i_1,\ldots,i_m} \subset A'^{(m)} - A^{(m)}$, $S_{i_1,\ldots,i_m}$ contains points of the boundary
of $A$ and hence the distance from a point in $S_{i_1,\ldots,i_m}$ to the boundary
is less than $(\frac{1}{2})^m$. Hence, we have that:
(3.2.27) $A'^{(m)} - A^{(m)} \subset C^{(m)}$.
But, $\{C^{(m)}\}$ is a decreasing sequence of sets whose limit is the
boundary of $A$. Hence, since $A$ is a $Q$-continuity set, it follows
that:
(3.2.28) $\lim_{m \to \infty} Q(C^{(m)}) = 0$.
The result follows. □
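
A minimal sketch of the interval bookkeeping in the Skorokhod construction may help fix ideas. It assumes, beyond what the text states, that the cells of each level are laid out on [0,1) in lexicographic order of their index tuples, which is one way of making the intervals of level m+1 refine those of level m. The names skorokhod_intervals and x_m are hypothetical, and the cell masses are supplied by hand rather than derived from a measure Q.

```python
from fractions import Fraction

def skorokhod_intervals(masses):
    """Map each index tuple (i1, ..., im) to a left-closed, right-open interval.

    `masses` maps the index tuples of one level m to Q(S_{i1,...,im}).
    Assumption: intervals are laid out in lexicographic order of the tuples.
    """
    intervals = {}
    left = Fraction(0)
    for idx in sorted(masses):
        intervals[idx] = (left, left + masses[idx])   # length equals the cell mass
        left += masses[idx]
    return intervals

def x_m(omega, intervals, points):
    """X_m(omega): the chosen point of the cell whose interval contains omega."""
    for idx, (a, b) in intervals.items():
        if a <= omega < b:
            return points[idx]
    raise ValueError("omega outside [0, 1)")
```

With this layout each interval has length equal to the mass of its cell, so the event that X_m falls in a given cell has Lebesgue measure equal to the Q-mass of that cell.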
Remark. Observe that the partition of $[0,1]$ given by the intervals
$\Delta_{i_1,i_2,\ldots,i_m}$ coincides with the partition determined by the
intervals $[F_m(s), G_m(s))$, where $s \in S_{i_1,\ldots,i_m}$, for $F_m$ and $G_m$
determined by the pair $(Q, \mathcal{P})$ as in Definition 3.1.5.
Corollary. Let $S$ be a complete, separable metric space and $Q$ a
probability measure on $(S, \mathcal{S})$. Then, there exists in any non-atomic
probability space a r.e. $X$, with values in $S$, whose probability
distribution is $Q$.
Proof: Since $(\Omega, \mathcal{A}, P)$ is non-atomic, by Theorem 2.1.1, there
exists on $(\Omega, \mathcal{A}, P)$ a uniform r.v. $Z$. The construction of the above
theorem applied to each value $Z(\omega)$ produces the desired r.e.
Lemma 3.2.1. Let $Z$ be a uniformly distributed r.v., defined on
the non-atomic probability space $(\Omega, \mathcal{A}, P)$ and let $Q$ be a
probability measure on $(S, \mathcal{S})$ where $S$ is a complete separable metric
space. Let $X$ denote the r.e. obtained by applying the Skorokhod
construction to the r.v. $Z$. Then, for all $m \ge 1$ and all choices of
$(i_1,\ldots,i_m)$:
(3.2.29) $P(\{\omega: X(\omega) \in S_{i_1,\ldots,i_m}\} \cap \{\omega: Z(\omega) \notin \Delta_{i_1,\ldots,i_m}\}) = 0$.
Proof: The result is trivial and it is in fact assumed in the
proof of Skorokhod's result (see (3.2.25) and (3.2.26)). Observe that
if $Z(\omega) \in \Delta_{k_1,\ldots,k_m}$, then
$X(\omega) \in \bar{S}_{k_1,\ldots,k_m}$. Hence, we have that:
(3.2.30) c
Hence it follows that:
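(3.2.31) $P(\{\omega: X(\omega) \in S_{i_1,\ldots,i_m}\} \cap \{\omega: Z(\omega) \notin \Delta_{i_1,\ldots,i_m}\})$
         $= Q(S_{i_1,\ldots,i_m} \cap \overline{S^c_{i_1,\ldots,i_m}})$
         $\le Q(\bar{S}_{i_1,\ldots,i_m} \cap \overline{S^c_{i_1,\ldots,i_m}}) = 0$.
The last equality follows from the fact that $\mathcal{P}$ is $Q$-continuous. □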
We now prove a more general version of Theorem 2.1.3 of Chapter II.
THEOREM 3.2.3. Let $\{X_n\}$ be a sequence of r.e.'s defined on a
non-atomic probability space $(\Omega, \mathcal{A}, P)$, with values on $(S, \mathcal{S})$ where
$S$ is a complete, separable metric space. Suppose that the corresponding
sequence $\{Q_n\}$ of probability distributions on $(S, \mathcal{S})$ is
weakly convergent to a probability distribution $Q$. Then, there exists
on $(\Omega, \mathcal{A}, P)$ a sequence $\{Y_n\}$ of r.e.'s, with values on $(S, \mathcal{S})$, and
such that the following conditions are satisfied:
i) For each $n \ge 1$ the probability distribution of $Y_n$ is $Q$.
ii) $d(X_n, Y_n)$ converges to zero in probability as $n$ tends to
infinity.
Proof: Let $\mathcal{P}$ be a $Q$-continuous Skorokhod partition of $S$. For
each $n \ge 1$, let $U_n$ denote a uniformly distributed r.v. associated
with the r.e. $X_n$ and the partition $\mathcal{P}$ as in Theorem 3.2.1. For each
$n \ge 1$, $\{F_m^n\}$ and $\{G_m^n\}$ denote the measurable functions, introduced in
Definition 3.1.5, determined by the pair $(Q_n, \mathcal{P})$. Similarly, $\{F_m\}$
and $\{G_m\}$ are determined by $(Q, \mathcal{P})$. The Skorokhod construction
applied to each $U_n$ determines, by the corollary to Theorem 3.2.2, a
r.e. $Y_n$ with probability distribution $Q$. We now proceed to show that
the sequence $\{Y_n\}$ satisfies ii).
Let us recall that in the Skorokhod construction for each $m \ge 1$ a
partition of $[0,1]$ was determined by $(Q, \mathcal{P})$ and we denoted by
$\Delta_{i_1,\ldots,i_m}$ the elements of the partition associated with $\mathcal{P}_m$.
Furthermore, recall that by the remark following the proof of Theorem 3.2.2 we
had, for each $m \ge 1$ and all choices of $i_1,\ldots,i_m$:
(3.2.32) $\Delta_{i_1,\ldots,i_m} = [F_m(s), G_m(s))$, $s \in S_{i_1,\ldots,i_m}$.
By (3.2.3) and the properties of the sequences $\{F_m^n\}_{m \ge 1}$, $\{G_m^n\}_{m \ge 1}$
we have:
(3.2.33) $F_m^n(X_n(\omega)) \le U_n(\omega) \le G_m^n(X_n(\omega))$.
If we now observe that the intervals $[F_m^n(s), G_m^n(s))$ for $s \in S_{i_1,\ldots,i_m}$
form also a partition of $[0,1)$ it follows from (3.2.33) that the
assumption that $U_n$ is an interior point of any such interval implies
that $X_n$ should belong to the corresponding element $S_{i_1,\ldots,i_m}$.
Hence we can write:
(3.2.34)
Hence, from (3.2.32) and (3.2.34):
(3.2.35) $P(\{\omega: X_n(\omega) \notin S_{i_1,\ldots,i_m}\} \cap \{\omega: U_n(\omega) \in \Delta_{i_1,\ldots,i_m}\})$
         $\le P(\{\omega: U_n(\omega) \notin [F_m^n(s), G_m^n(s))\} \cap \{\omega: U_n(\omega) \in (F_m(s), G_m(s))\})$
where $s$ is an element of $S_{i_1,\ldots,i_m}$.
The term on the right of (3.2.35) is the probability that a uniform
r.v. belongs to the intersection of an interval with the complement
of another interval. It can be easily checked that the value of this
probability is given by:
(3.2.36) $(G_m(s) - F_m(s)) - \max(0, \min(G_m(s), G_m^n(s)) - \max(F_m(s), F_m^n(s)))$.
On the other hand by Lemma 3.1.4:
$\lim_{n \to \infty} F_m^n(s) = F_m(s)$,
$\lim_{n \to \infty} G_m^n(s) = G_m(s)$.
Therefore, it follows that:
$\lim_{n \to \infty} P(\{\omega: X_n(\omega) \notin S_{i_1,\ldots,i_m}\} \cap \{\omega: U_n(\omega) \in \Delta_{i_1,\ldots,i_m}\}) = 0$.
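
Formula (3.2.36) is easy to evaluate directly. The following sketch, with the hypothetical name mass_outside and with the four endpoints passed in as plain numbers, computes the length of the part of the interval (F_m(s), G_m(s)) that is not covered by [F_m^n(s), G_m^n(s)).

```python
from fractions import Fraction

def mass_outside(f, g, fn, gn):
    """Formula (3.2.36): length of the part of (f, g) not covered by [fn, gn)."""
    overlap = max(0, min(g, gn) - max(f, fn))   # length of the common part, if any
    return (g - f) - overlap
```

When fn and gn coincide with f and g the value is zero, which is the limiting behaviour used in the proof.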
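We now observe that given any three events $A_1$, $A_2$, $A_3$ on a probability
space we always have:
(3.2.37) $P(A_1 \cap A_2) \le P(A_1 \cap A_3) + P(A_2 \cap A_3^c)$.
Using (3.2.37) we can write:
(3.2.38) $P(\{X_n \notin S_{i_1,\ldots,i_m}\} \cap \{Y_n \in S_{i_1,\ldots,i_m}\}) \le P(\{X_n \notin S_{i_1,\ldots,i_m}\} \cap \{U_n \in \Delta_{i_1,\ldots,i_m}\})$
         $+ P(\{Y_n \in S_{i_1,\ldots,i_m}\} \cap \{U_n \notin \Delta_{i_1,\ldots,i_m}\})$.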
If we now recall that $Y_n$ was obtained by applying the Skorokhod
construction to $U_n$, it follows by Lemma 3.2.1 that the second term on the
right of (3.2.38) is zero. Furthermore, we have just shown that the
first term converges to zero as $n$ tends to infinity. Hence, we have:
(3.2.39) $\lim_{n \to \infty} P(\{\omega: X_n(\omega) \notin S_{i_1,\ldots,i_m}\} \cap \{\omega: Y_n(\omega) \in S_{i_1,\ldots,i_m}\}) = 0$.
Let now $\varepsilon > 0$ be given. Choose $m \ge 1$ such that $(\frac{1}{2})^m < \varepsilon$. We can
write:
(3.2.40) $P(\{\omega: d(X_n(\omega), Y_n(\omega)) > \varepsilon\}) \le P(\{\omega: d(X_n, Y_n) > (\frac{1}{2})^m\})$.
On the other hand, we have that:
(3.2.41) $\{\omega: d(X_n, Y_n) > (\frac{1}{2})^m\} \subset \bigcup \{\omega: X_n \notin S_{i_1,\ldots,i_m}\} \cap \{\omega: Y_n \in S_{i_1,\ldots,i_m}\}$.
The last union extends to all sets $S_{i_1,\ldots,i_m}$ in $\mathcal{P}_m$.
Hence we have that:
(3.2.42) $P(\{\omega: d(X_n, Y_n) > \varepsilon\}) \le \sum_{(i_1,\ldots,i_m)} P(\{\omega: Y_n \in S_{i_1,\ldots,i_m}\} \cap \{\omega: X_n \notin S_{i_1,\ldots,i_m}\})$.
By (3.2.39) each term of the sum on the right of (3.2.42) converges to
zero as $n$ tends to infinity. On the other hand, each term is
dominated by $P(Y_n \in S_{i_1,\ldots,i_m})$, which is independent of $n$, since
all $Y_n$'s have the same probability distribution. Furthermore,
$\sum P(Y_n \in S_{i_1,\ldots,i_m}) = 1$ and hence by the dominated convergence theorem
it follows that:
(3.2.43) $\lim_{n \to \infty} P(\{\omega: d(X_n(\omega), Y_n(\omega)) > \varepsilon\}) = 0$. □
3.3. The atomic and the general cases
As we mentioned before, very few changes are required in the proofs
of Sections 2.2 and 2.3 of Chapter II to generalize the results
obtained there to complete separable metric spaces. In this section, we
prove a few lemmas that will allow us to make the necessary changes. We
start by stating a more general version of the definitions of relative
compactness and tightness.
Definition 3.3.1. Let $S$ be any metric space. A family $\Gamma$ of
probability measures on $(S, \mathcal{S})$ is relatively compact if every sequence
$\{P_n\}$ of elements of $\Gamma$ contains a subsequence which is weakly
convergent to a probability measure on $(S, \mathcal{S})$ (not necessarily an element of
$\Gamma$).
Definition 3.3.2. Let $S$ be an arbitrary metric space and $\Gamma$ a
family of probability measures on $(S, \mathcal{S})$. $\Gamma$ is said to be tight if
for every $\varepsilon > 0$ there exists a compact set $K$ such that $P(K) > 1 - \varepsilon$
for all $P$ in $\Gamma$.
THEOREM 3.3.1. (Prohorov). Let $S$ be a complete separable metric
space. For a family $\Gamma$ of probability measures on $(S, \mathcal{S})$ to be
relatively compact it is necessary and sufficient that $\Gamma$ be tight.
Theorem 3.3.1 is stated in the way it was proved by Prohorov [1956]
and this will be enough for our purposes. Varadarajan [1961] extended
the sufficiency of the condition to arbitrary metric spaces.
THEOREM 3.3.2. Let $\{X_n\}$ be a sequence of r.e.'s defined on the
atomic probability space $(\Omega, \mathcal{A}, P)$ and taking values on $(S, \mathcal{S})$. If
the corresponding sequence $\{Q_n\}$ of probability measures on $(S, \mathcal{S})$ is
tight then every subsequence of $\{X_n\}$ has a further subsequence which
converges a.e.
Proof: Let $\{A_i\}$ denote the atoms of $(\Omega, \mathcal{A}, P)$. Since $P(A_i) > 0$,
all $i \ge 1$, in the same way as in the real line case tightness implies
that for each $i$ there exists a compact set $K_i$ such that for almost
all $\omega \in A_i$ the sequence $\{X_n(\omega)\}$ is entirely contained in $K_i$. But
each sequence contained in a compact set of a metric space has a
convergent subsequence. Hence, we can apply the diagonal procedure as in
Theorem 2.2.2 to complete the proof of the result.
THEOREM 3.3.3. Let $\{X_n\}$ be a sequence of r.e.'s defined on an
atomic probability space $(\Omega, \mathcal{A}, P)$ and taking values on $(S, \mathcal{S})$ where
$S$ is a complete separable metric space. Suppose that the corresponding
sequence of probability measures $\{Q_n\}$ is weakly convergent to a
probability measure $Q$ on $(S, \mathcal{S})$. Then, if $A_Q$ denotes the class of
r.e.'s on $(\Omega, \mathcal{A}, P)$ whose probability distribution on $(S, \mathcal{S})$ is $Q$,
we have:
i) $A_Q$ is not empty.
ii) $\rho(X_n, A_Q)$ converges to zero as $n$ tends to infinity.
The proof is totally analogous to the proof of Theorem 2.2.3.
Corollary. Under the same conditions as in the above theorem, there
exists on $(\Omega, \mathcal{A}, P)$ a sequence $\{Y_n\}$ of identically distributed
r.e.'s, with probability distribution $Q$, such that $d(X_n, Y_n)$
converges to zero in probability as $n$ tends to infinity.
Proof: See proof of corollary to Theorem 2.2.3.
Remark on notation: Recall that we use $d$ to denote the metric
on $S$ and $\rho$ to denote the metric on the space of r.e.'s corresponding
to convergence in probability.
We consider now the general case. Let $(\Omega, \mathcal{A}, P)$ be a probability
space, let $\{A_i\}$ denote the atoms of $(\Omega, \mathcal{A}, P)$ and put as in
Section 2.3 $A_0 = \bigcup_{i \ge 1} A_i$. Since the cases $P(A_0) = 0$ and $P(A_0) = 1$
were already discussed we shall assume here that:
(3.3.1) $0 < P(A_0) < 1$.
We introduce again the two measures $P_1$ and $P_2$ on $(\Omega, \mathcal{A})$
given by:
(3.3.2) $P_1(B) = \dfrac{P(B \cap A_0)}{P(A_0)}$,
(3.3.3) $P_2(B) = \dfrac{P(B \cap A_0^c)}{P(A_0^c)}$, all $B \in \mathcal{A}$.
The probability spaces $(\Omega, \mathcal{A}, P_1)$ and $(\Omega, \mathcal{A}, P_2)$ are atomic and
non-atomic respectively. If $X$ is a r.e. defined on $(\Omega, \mathcal{A})$ with
values on $(S, \mathcal{S})$ we will denote by $Q$, $Q^{(1)}$ and $Q^{(2)}$ the
probability measures induced by $X$ on $(S, \mathcal{S})$ in correspondence to $P$, $P_1$
and $P_2$ respectively.
Lemma 3.3.1. Let $\{X_n\}$ be a sequence of r.e.'s on $(\Omega, \mathcal{A}, P)$ and
suppose that the corresponding sequence $\{Q_n\}$ of probability distributions
on $(S, \mathcal{S})$ is tight. Then, both $\{Q_n^{(1)}\}$ and $\{Q_n^{(2)}\}$ are tight.
Proof: Totally analogous to the proof of Lemma 2.3.1.
The result contained in the next lemma can be quite useful when we
deal with weak-convergence from the point of view of continuity sets.
Although we found no reference to it in the literature the result is
probably known and its proof is relatively simple.
Lemma 3.3.2. Let $S$ be a separable metric space, $Q_1$ and $Q_2$
be any two probability measures on $(S, \mathcal{S})$. Let $F_1$ and $F_2$ denote
the fields of $Q_1$-continuity sets and $Q_2$-continuity sets respectively.
Then, $\mathcal{S}$ is the smallest $\sigma$-field containing the field $F = F_1 \cap F_2$.
Proof: Let $x$ be any element of $S$ and $r$ be any positive real
number. We will denote by $B(x,r)$ the open ball of center $x$ and
radius $r$, that is:
(3.3.4) $B(x,r) = \{y \in S: d(x,y) < r\}$.
Recall that for $A \subset S$ we write $\partial A$ to denote the boundary of $A$. It
is easy to see that:
(3.3.5) $\partial B(x,r) \subset \{y: d(x,y) = r\}$.
Let now $x_0 \in S$ and $r_0 > 0$ be given. There exists at most a
countable number of values of $r$ for which:
(3.3.6) $Q_1(\{y: d(x_0,y) = r\}) > 0$.
A similar statement can be made for $Q_2$.
Hence from (3.3.5) and Definition 1.1.4, it follows that for all
but a countable number of values of $r < r_0$ we have:
(3.3.7) $B(x_0,r) \in F_1 \cap F_2$.
Hence, we can choose a sequence $\{r_n\}$ of real numbers, $r_n < r_0$
for all $n \ge 1$, $\lim_{n \to \infty} r_n = r_0$ and such that for all $n \ge 1$,
$B(x_0,r_n) \in F_1 \cap F_2$.
Therefore, we have that:
(3.3.8) $B(x_0,r_0) = \bigcup_{n=1}^{\infty} B(x_0,r_n)$.
(3.3.8) shows that every open ball of $S$ belongs to the $\sigma$-field
generated by $F_1 \cap F_2$. Therefore, it follows that this $\sigma$-field contains
the $\sigma$-field generated by the open balls. Since the latter, for
separable $S$, coincides with $\mathcal{S}$ the proof of the lemma is complete.
Lemma 3.3.3. Let $\{X_n\}$ be a sequence of r.e.'s defined on
$(\Omega, \mathcal{A}, P)$ with values on $(S, \mathcal{S})$. Let us denote by $\{Q_n\}$, $\{Q_n^{(1)}\}$ and
$\{Q_n^{(2)}\}$ the three sequences of probability measures induced by $\{X_n\}$ on
$(S, \mathcal{S})$. Then the weak convergence of any two of the sequences implies
the weak convergence of the third. Furthermore, if $Q$, $Q^{(1)}$ and $Q^{(2)}$
denote the respective weak limits we have for all $B \in \mathcal{S}$:
(3.3.9) $Q(B) = P(A_0)\, Q^{(1)}(B) + P(A_0^c)\, Q^{(2)}(B)$.
Proof: We observe first that if any two of the sequences are tight the same is
true for the third. This is a consequence of Lemma 3.3.1 if the pair
assumed tight includes $\{Q_n\}$. Otherwise, recall that:
(3.3.10) $Q_n(B) = P(A_0)\, Q_n^{(1)}(B) + P(A_0^c)\, Q_n^{(2)}(B)$
for all $B \in \mathcal{S}$, all $n \ge 1$. Hence, if both $\{Q_n^{(1)}\}$ and $\{Q_n^{(2)}\}$ are
tight the tightness of $\{Q_n\}$ follows from the fact that the union of
two compact sets is compact.
Assume now that $Q_n$ converges weakly to $Q$ and $Q_n^{(1)}$ converges
weakly to $Q^{(1)}$. It follows from (3.3.10) that $\{Q_n^{(2)}(B)\}$ converges for
all sets $B$ which are continuity sets of both $Q$ and $Q^{(1)}$. Since $\{Q_n^{(2)}\}$
is tight every subsequence of $\{Q_n^{(2)}\}$ contains a weakly convergent
subsequence. Let $\{Q_{n'}^{(2)}\}$ be a weakly convergent subsequence of $\{Q_n^{(2)}\}$
and let $Q^0$ be its limit. It follows from (3.3.10) that for all $B$
which is both a $Q$-continuity set and a $Q^{(1)}$-continuity set, we have:
=
Therefore all subsequential limits of $\{Q_n^{(2)}\}$ coincide on a class of
sets which is, by the previous lemma, a field which generates $\mathcal{S}$. Hence
it follows that $\{Q_n^{(2)}\}$ is weakly convergent to a limit $Q^{(2)}$ and the
validity of (3.3.9) for all $B \in \mathcal{S}$ follows.
THEOREM 3.3.4. Let $\{X_n\}$ be a sequence of r.e.'s defined on
$(\Omega, \mathcal{A}, P)$ with values on $(S, \mathcal{S})$ where $S$ is a complete separable
metric space. Suppose that the corresponding sequence $\{Q_n\}$ of
probability measures is weakly convergent to a probability measure $Q$. Then
if $A_Q$ denotes the class of r.e.'s on $(\Omega, \mathcal{A}, P)$ whose probability
distribution is $Q$ we have:
i) $A_Q$ is not empty.
ii) $\rho(X_n, A_Q)$ converges to zero as $n$ tends to infinity.
Proof: The proof follows the same lines as the proof of Theorem
2.3.1 of Chapter II.
CHAPTER IV
SOME RELATIONSHIPS BETWEEN THE METRICS L AND $\rho$
In this chapter we will look into some relationships between the
Levy-Prohorov metric $L$ (Definition 1.1.5) and the metric $\rho$ (see
1.1.4) associated with convergence in probability.
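4.1. The lower bound for $\rho$
We will begin by stating a result which is a consequence of the
definitions of $L$ and $\rho$ and which, we believe, was first proved by
Prohorov [1956].
Lemma 4.1.1. Let $X$ and $Y$ be any two r.e.'s defined on a
probability space $(\Omega, \mathcal{A}, P)$, with values on $(S, \mathcal{S})$. Let $P_X$ and $P_Y$ be
the probability measures induced on $(S, \mathcal{S})$ by $X$ and $Y$ respectively.
Then:
$L(P_X, P_Y) \le \rho(X,Y)$.
Proof: Let $\varepsilon_0 > \rho(X,Y)$ be given and let $F$ be a closed set in
$S$. As before we write $F^{\varepsilon_0}$ to indicate the set $\{x \in S: d(x,F) \le \varepsilon_0\}$.
We can write:
$P(X \in F) = P(X \in F \ \&\ Y \in F^{\varepsilon_0}) + P(X \in F \ \&\ Y \notin F^{\varepsilon_0})$.
Hence, it follows that:
$P(X \in F) \le P(Y \in F^{\varepsilon_0}) + P(X \in F \ \&\ Y \notin F^{\varepsilon_0})$.
But, on the other hand:
$\{\omega: X(\omega) \in F\} \cap \{\omega: Y(\omega) \notin F^{\varepsilon_0}\} \subset \{\omega: d(X(\omega),Y(\omega)) > \varepsilon_0\}$.
Since $\varepsilon_0 > \rho(X,Y)$ it follows from the definition of $\rho$, that:
$P(\{\omega: d(X(\omega),Y(\omega)) > \varepsilon_0\}) \le \varepsilon_0$.
Hence, we can write:
$P(X \in F) \le P(Y \in F^{\varepsilon_0}) + \varepsilon_0$.
Therefore, it follows from the definition of $L$ (Definition 1.1.5),
that:
$L(P_X, P_Y) \le \varepsilon_0$.
Since the last result is true for all $\varepsilon_0 > \rho(X,Y)$ the proof of the
lemma is complete. □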
Lemma 4.1.1 shows that given two probability measures $Q_1$ and $Q_2$ on
$(S, \mathcal{S})$, $L(Q_1,Q_2)$ is a lower bound for the distance in probability
between r.e.'s, defined in some probability space, whose marginals are $Q_1$
and $Q_2$. The following result due to Strassen [1965] shows that this
lower bound is always attained when $S$ is a complete, separable metric
space.
THEOREM 4.1.1. (Strassen). Let $S$ be a complete separable metric
space and $Q_1$ and $Q_2$ be two probability measures on $(S, \mathcal{S})$. Then,
there exists a probability measure $\lambda$ on $S \times S$ with marginals $Q_1$
and $Q_2$ such that for every pair of r.e.'s $(X_1, X_2)$, whose joint
probability distribution is $\lambda$, $\rho(X_1,X_2) = L(Q_1,Q_2)$.
For the proof, see Strassen [1965], Theorem 11, pages 436-438.
We will now look at this result from a different point of view.
Let $(\Omega, \mathcal{A}, P)$ be a probability space, $E$ be the class of r.e.'s
defined on $(\Omega, \mathcal{A}, P)$ with values on $(S, \mathcal{S})$ and $G$ be the class of
probability measures on $(S, \mathcal{S})$ which are admissible for $(\Omega, \mathcal{A}, P)$.
Recall that since we do not distinguish between random elements which
are equal a.e., each element of $E$ is in fact an equivalence class of
r.e.'s. Furthermore, if $(\Omega, \mathcal{A}, P)$ is non-atomic, $G$ coincides with
the class $Z(S)$ of all probability measures on $(S, \mathcal{S})$.
Let $Q_1$ and $Q_2$ be two elements of $G$ and let $A_{Q_1}$ and $A_{Q_2}$
denote respectively the classes of r.e.'s whose probability distributions
are $Q_1$ and $Q_2$. For a given $X \in A_{Q_1}$ we will look for
conditions for the existence of $Y \in A_{Q_2}$ for which $\rho(X,Y)$ is arbitrarily
close to the lower bound $L(Q_1,Q_2)$. We now show, by means of an
example, that when $S$ is the real line and $(\Omega, \mathcal{A}, P)$ is non-atomic, the
construction used in Chapter II (using the inverse of a d.f.) does not,
in general, produce a r.e. for which the lower bound is attained.
Therefore, if the metric $\rho$ is used as a criterion of optimality the
sequence $\{Y_n\}$ constructed in Theorem 2.1.3 is not necessarily the
"best" one possible.
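Example 4.1.1. Let $\Omega$ be the closed interval $[0,1]$, $\mathcal{A}$ be the
$\sigma$-field of Borel subsets of $[0,1]$ and $P$ be the Lebesgue measure.
Let the d.f.'s $G_1$ and $G_2$ be given by:
(4.1.1) $G_1(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } 0 \le x \le 1 \\ 1 & \text{if } x > 1 \end{cases}$
(4.1.2) $G_2(x) = \begin{cases} 0 & \text{if } x < 0 \\ 2/3 & \text{if } 0 \le x < 1/4 \\ 1 & \text{if } x \ge 1/4. \end{cases}$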
Let $Q_1$ and $Q_2$ be the probability measures determined by $G_1$ and $G_2$
respectively. To determine $L(Q_1,Q_2)$ we need to find the values of
$\varepsilon > 0$ for which the inequality:
(4.1.3) $Q_2(F) \le Q_1(F^{\varepsilon}) + \varepsilon$
is satisfied for all closed subsets $F$ of the real line.
It is easy to see that for any closed subset $F$ of $\mathbb{R}$ we have
that:
$Q_2(F) = Q_2(F \cap \{0, 1/4\})$,
$Q_1(F^{\varepsilon}) \ge Q_1((F \cap \{0, 1/4\})^{\varepsilon})$.
Hence it follows that to calculate $L(Q_1,Q_2)$ we can restrict our
attention to the three nonempty subsets of $\{0, 1/4\}$. If we write for
those sets the inequalities given by (4.1.3) and we use (4.1.1) and
(4.1.2) we will get that $L(Q_1,Q_2) = 3/8$.
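
The value 3/8 can be confirmed by checking the three inequalities directly. The sketch below is an illustration only: it assumes that Q_2 places mass 2/3 at 0 and 1/3 at 1/4 (as the inverse d.f. in (4.1.4) indicates), that Q_1 is Lebesgue measure on [0,1], and that (4.1.3) reads Q_2(F) <= Q_1(F^eps) + eps with F^eps the closed eps-neighbourhood of F. The names ATOMS, q1_of_neighbourhood and satisfies_4_1_3 are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Assumed atoms of Q2 and their masses.
ATOMS = {Fraction(0): Fraction(2, 3), Fraction(1, 4): Fraction(1, 3)}

def q1_of_neighbourhood(points, eps):
    """Lebesgue measure on [0, 1] of the union of [x - eps, x + eps], x in points."""
    pieces = sorted((max(x - eps, Fraction(0)), min(x + eps, Fraction(1))) for x in points)
    total, reach = Fraction(0), Fraction(0)
    for a, b in pieces:
        a = max(a, reach)               # skip the part already counted
        if b > a:
            total += b - a
            reach = b
    return total

def satisfies_4_1_3(eps):
    """True if Q2(F) <= Q1(F^eps) + eps for every nonempty subset F of the atoms."""
    atoms = list(ATOMS)
    for k in range(1, len(atoms) + 1):
        for subset in combinations(atoms, k):
            q2 = sum(ATOMS[x] for x in subset)
            if q2 > q1_of_neighbourhood(subset, eps) + eps:
                return False
    return True
```

The binding constraint is the full set {0, 1/4}, for which the inequality becomes 1 <= 1/4 + 2 eps.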
Let now $X$ be the uniformly distributed r.v. given by the identity
map from $[0,1]$ to the real line. From the definition of the inverse
of a d.f. (Definition 2.1.4) we have that:
(4.1.4) $Y(x) = G_2^{-1}(X) = \begin{cases} 0 & 0 \le x \le 2/3 \\ 1/4 & 2/3 < x \le 1. \end{cases}$
Hence, we can write:
$P(\{\omega: |X(\omega) - Y(\omega)| > \varepsilon\}) = P(Y = 0 \ \&\ X > \varepsilon)$
$+ P(Y = 1/4 \ \&\ |X - 1/4| > \varepsilon)$.
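Therefore, from (4.1.4) and the definition of $X$ it follows that:
$P(\{\omega: |X(\omega) - Y(\omega)| > \varepsilon\}) = P([0, \tfrac{2}{3}] \cap (\varepsilon, 1])$
$+ P((\tfrac{2}{3}, 1] \cap ([0, \tfrac{1}{4} - \varepsilon) \cup (\tfrac{1}{4} + \varepsilon, 1]))$.
Hence, it follows that: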
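(4.1.5) $P(\{\omega: |X(\omega) - Y(\omega)| > \varepsilon\}) = \begin{cases} 1 - \varepsilon & 0 \le \varepsilon \le 5/12 \\ 17/12 - 2\varepsilon & 5/12 < \varepsilon \le 2/3 \\ 3/4 - \varepsilon & 2/3 < \varepsilon \le 3/4 \\ 0 & \varepsilon > 3/4. \end{cases}$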
To obtain $\rho(X,Y)$ we equate the right hand side of (4.1.5) to $\varepsilon$ and
solve the resulting equation. The result, $17/36$, is clearly larger than
$3/8$. However, it can be easily checked that the lower bound, $3/8$, is
attained for the r.v. $Z$ defined by:
$Z(\omega) = \begin{cases} 1/4 & 3/8 \le \omega \le 17/24 \\ 0 & \text{otherwise.} \end{cases}$
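
Both distances in this example can be verified with exact arithmetic. The sketch below assumes that $\rho$ is the value of $\varepsilon$ at which the probability of a deviation larger than $\varepsilon$ equals $\varepsilon$, as the text's recipe of equating (4.1.5) to $\varepsilon$ suggests; the definition (1.1.4) itself is not reproduced here. The helper exceed_prob and the piece lists Y and Z are hypothetical names describing step functions on [0,1] under Lebesgue measure, with X the identity map.

```python
from fractions import Fraction

def exceed_prob(pieces, eps):
    """Lebesgue measure of {w in [0, 1]: |w - V(w)| > eps}.

    `pieces` lists (a, b, v): V equals v on the interval from a to b
    (endpoints carry no mass, so open versus closed does not matter).
    """
    total = Fraction(0)
    for a, b, v in pieces:
        lo, hi = max(a, v - eps), min(b, v + eps)
        close = max(hi - lo, Fraction(0))   # part of the piece within eps of v
        total += (b - a) - close
    return total

# Y = inverse d.f. of G2 applied to X; Z = the competing r.v. of the example.
Y = [(Fraction(0), Fraction(2, 3), Fraction(0)),
     (Fraction(2, 3), Fraction(1), Fraction(1, 4))]
Z = [(Fraction(0), Fraction(3, 8), Fraction(0)),
     (Fraction(3, 8), Fraction(17, 24), Fraction(1, 4)),
     (Fraction(17, 24), Fraction(1), Fraction(0))]
```

Evaluating exceed_prob(Y, 17/36) and exceed_prob(Z, 3/8) returns the argument itself in each case, which is the fixed-point condition used in the text.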
We will now show that if $(\Omega, \mathcal{A}, P)$ is non-atomic for each $X \in A_{Q_1}$
there exists $Y \in A_{Q_2}$ for which $\rho(X,Y)$ is arbitrarily close to
$L(Q_1,Q_2)$. We are going to need the following known results about
non-atomic spaces.
Lemma 4.1.2. Let $A_0$ be any event on a non-atomic probability
space $(\Omega, \mathcal{A}, P)$. Let $q_1,\ldots,q_n$ be real numbers such that: $q_i \ge 0$,
$\sum_{i=1}^{n} q_i \le P(A_0)$. Then, there exist $n$ disjoint subsets of $A_0$,
$B_1, B_2, \ldots, B_n$, $B_i \in \mathcal{A}$, $P(B_i) = q_i$, all $i = 1,\ldots,n$.
Proof: For $n = 1$, this is a well-known result. (See, e.g.,
Neveu [1965], page 18.) The result for $n$ can be easily proved by
induction.
Lemma 4.1.3. Let $Q_1$ and $Q_2$ be two probability measures on
$(S, \mathcal{S})$ with finite support¹. Let $\lambda$ be a probability measure on
$[S \times S, \mathcal{S} \times \mathcal{S}]$ whose marginals are $Q_1$ and $Q_2$. Then, for any r.e. $X$
defined on the non-atomic space $(\Omega, \mathcal{A}, P)$ with probability
distribution $Q_1$, there exists a r.e. $Y$ with probability distribution $Q_2$
and such that the joint distribution of $(X,Y)$ is $\lambda$.
Proof: Let $\{s_1,\ldots,s_n\}$ and $\{s'_1,\ldots,s'_m\}$ denote the supports of
$Q_1$ and $Q_2$ respectively. Assume that:
$Q_1(\{s_i\}) = p_i$, $1 \le i \le n$
$Q_2(\{s'_j\}) = q_j$, $1 \le j \le m$
$\lambda((s_i, s'_j)) = \lambda_{ij}$, $1 \le i \le n$, $1 \le j \le m$.
Clearly, from the assumptions about $Q_1$, $Q_2$ and $\lambda$ it follows
that:
$\sum_{j=1}^{m} \lambda_{ij} = p_i$, $1 \le i \le n$
$\sum_{i=1}^{n} \lambda_{ij} = q_j$, $1 \le j \le m$.
The construction of the r.e. $Y$ can now be accomplished by
applying the previous lemma to each one of the sets $X^{-1}(\{s_i\})$
with the numbers $\lambda_{i1}, \ldots, \lambda_{im}$. □
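
The construction in the proof of Lemma 4.1.3 can be illustrated concretely. The sketch below assumes, for illustration only, that the non-atomic space is [0,1) with Lebesgue measure and that each set X^{-1}({s_i}) is an interval [a_i, b_i); the splitting guaranteed abstractly by Lemma 4.1.2 is then carried out by cutting consecutive subintervals. The name build_y and its argument layout are hypothetical.

```python
from fractions import Fraction

def build_y(x_intervals, lam, targets):
    """Split each X^{-1}({s_i}) = [a_i, b_i) into pieces of length lam[i][j]
    and let Y take the value targets[j] on the j-th piece."""
    y_pieces = []                       # (left, right, value of Y)
    for i, (a, b) in enumerate(x_intervals):
        left = a
        for j, mass in enumerate(lam[i]):
            y_pieces.append((left, left + mass, targets[j]))
            left += mass
        # Row i of lam must add up to p_i = b - a.
        assert left == b, "row sums of lam must match the masses of X"
    return y_pieces
```

By construction the pair (X, Y) puts mass lam[i][j] on the point (s_i, targets[j]), so Y has the column sums of lam as its distribution.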
¹ By support of a probability measure we understand the smallest
closed subset of $S$ which contains all the mass.
THEOREM 4.1.2. Let $Q_1$ and $Q_2$ be any two probability measures
on $(S, \mathcal{S})$ where $S$ is a complete, separable metric space. Let
$(\Omega, \mathcal{A}, P)$ be a non-atomic probability space and as before let $A_{Q_1}$ and
$A_{Q_2}$ denote the classes of r.e.'s on $(\Omega, \mathcal{A}, P)$ whose probability
distributions are $Q_1$ and $Q_2$ respectively. Then, for any $X \in A_{Q_1}$:
$\rho(X, A_{Q_2}) = L(Q_1, Q_2)$.
Proof: Let $X \in A_{Q_1}$ be given. Since $S$ is complete there exists
a sequence $\{X_n\}$ of simple r.e.'s (r.e.'s taking at most a finite
number of values) such that $\{X_n\}$ converges a.e. (and hence in
probability) to $X$. Let $\{Q_{1n}\}$ denote the probability distributions of the
$X_n$'s. Clearly $\{Q_{1n}\}$ converges weakly to $Q_1$. On the other hand,
there exists a sequence $\{Q_{2n}\}$ of probability measures with finite
support such that $\{Q_{2n}\}$ is weakly convergent to $Q_2$. In fact, the set of
probability measures with finite support is dense in the class of all
probability measures in $(S, \mathcal{S})$. (See, e.g., Parthasarathy [1967],
page 44.) Recall now that, by Strassen's result, given any two
probability measures $R_1$ and $R_2$ on $(S, \mathcal{S})$, there exists a probability
measure $\lambda$ on $[S \times S, \mathcal{S} \times \mathcal{S}]$, whose marginals are $R_1$ and $R_2$ and such
that for every pair of r.e.'s $(Z_1, Z_2)$ whose joint probability
distribution is $\lambda$, we have: $\rho(Z_1,Z_2) = L(R_1,R_2)$. For each $n \ge 1$, let us
denote by $\lambda_n$ the probability measure on $[S \times S, \mathcal{S} \times \mathcal{S}]$, associated in
this way with the pair $(Q_{1n}, Q_{2n})$. Since both $Q_{1n}$ and $Q_{2n}$ have a
finite support, by Lemma 4.1.3 given $X_n$ there exists $Y_n$, with
probability distribution $Q_{2n}$ and such that $(X_n, Y_n)$ has joint
distribution $\lambda_n$. Hence, it follows that, for all $n \ge 1$:
(4.1.6) $\rho(X_n, Y_n) = L(Q_{1n}, Q_{2n})$.
But, $\{Y_n\}$ is a sequence of r.e.'s on $(\Omega, \mathcal{A}, P)$ whose probability
distributions, $\{Q_{2n}\}$, are weakly convergent to $Q_2$. Hence by our
main result of Chapter III:
(4.1.7) $\lim_{n \to \infty} \rho(Y_n, A_{Q_2}) = 0$.
On the other hand, since $\{Q_{1n}\}$ and $\{Q_{2n}\}$ are weakly convergent to
$Q_1$ and $Q_2$ respectively, it follows that:
(4.1.8) $\lim_{n \to \infty} L(Q_{1n}, Q_{2n}) = L(Q_1, Q_2)$.
Finally, by the triangle inequality we can write:
$\rho(X, A_{Q_2}) \le \rho(X, X_n) + \rho(X_n, Y_n) + \rho(Y_n, A_{Q_2})$.
Therefore, by (4.1.6):
$\rho(X, A_{Q_2}) \le \rho(X, X_n) + L(Q_{1n}, Q_{2n}) + \rho(Y_n, A_{Q_2})$.
Hence, by (4.1.7) and (4.1.8) it follows that:
$\rho(X, A_{Q_2}) \le \lim_{n \to \infty} \rho(X, X_n) + L(Q_1, Q_2)$.
Since $X_n$ converges to $X$ a.e. it follows that:
$\rho(X, A_{Q_2}) \le L(Q_1, Q_2)$.
Since the reverse inequality is always true by Lemma 4.1.1, the
proof of the theorem is complete. □
Remark. The theorem above does not permit us to conclude that for
each $X \in A_{Q_1}$ there exists $Y \in A_{Q_2}$ for which $\rho(X,Y) = L(Q_1,Q_2)$.
Theorem 4.1.2 says that if we consider in the space $E$ of r.e.'s the
classes of equivalence determined by their probability distributions,
elements of the same class are at the same distance from any other
class, provided that the probability space is non-atomic.
The following example shows that this result fails in general.
Example 4.1.2. Let $\Omega = [0,1]$ and $\mathcal{A}$ be the $\sigma$-field formed by
$[0, \tfrac{1}{3}]$ and all Borel subsets of $(\tfrac{1}{3}, 1]$. $P$ will be the Lebesgue
measure. Since the mass of the atomic part is smaller than $\tfrac{1}{2}$ it can be
easily seen that for any $a$, $0 \le a \le 1$, there exists in $(\Omega, \mathcal{A}, P)$
an event with probability $a$. Consider the following probability
measures on the real line:
(4.1.9) $Q_1(\{0\}) = Q_1(\{1\}) = 1/2$;
        $Q_2(\{0\}) = 1/4$; $Q_2(\{1\}) = 3/4$.
Both $Q_1$ and $Q_2$ are admissible for $(\Omega, \mathcal{A}, P)$ and by the same
argument used in Example 4.1.1, we can say that to calculate $L(Q_1,Q_2)$
we need only to consider the nonempty subsets of $\{0,1\}$. However, it is
easy to see that $\{1\}$ is the only set for which (4.1.3) is not
satisfied for all $\varepsilon > 0$. Hence, it follows that: $L(Q_1,Q_2) = 3/4 - 1/2 = 1/4$.
Let now $X$ be given by:
$X(\omega) = \begin{cases} 0 & 0 \le \omega \le 1/2 \\ 1 & \text{otherwise.} \end{cases}$
Observe now that any set in $(\Omega, \mathcal{A}, P)$ whose probability is $3/4$ has
to contain the atom $[0, \tfrac{1}{3}]$. Hence any r.v. $Y$ with probability
distribution $Q_2$ has to be equal to 1 in $[0, \tfrac{1}{3}]$. Therefore, for any
$Y \in A_{Q_2}$:
$\rho(X,Y) \ge 1/3$.
Hence, it follows that:
$\rho(X, A_{Q_2}) \ge 1/3$.
On the other hand, if we consider the r.v. $X'(\omega) = 1 - X(\omega)$,
$X' \in A_{Q_1}$ and we will show that:
$\rho(X', A_{Q_2}) = 1/4$.
In fact, let $Y$ be defined by:
$Y(\omega) = \begin{cases} 1 & 0 \le \omega \le 3/4 \\ 0 & \text{otherwise.} \end{cases}$
Clearly, for any $\varepsilon$, $0 < \varepsilon < 1$, we can write:
$\{\omega: |X'(\omega) - Y(\omega)| \ge \varepsilon\} = (\tfrac{1}{2}, \tfrac{3}{4}]$. The result follows. □
Remark. Let us consider for $Q_1$ and $Q_2$ in $G$, the quantity $L_0$
defined by:
if and only if $Q_1 = Q_2$. Furthermore, it follows from Theorem 3.3.4 of
Chapter III that a sequence $\{Q_n\}$ in $G$ converges weakly to $Q$ if and
only if $L_0(Q_n, Q)$ converges to zero as $n$ tends to infinity.
It is not difficult to show that $L_0$ satisfies the triangle
inequality and it follows from Theorem 4.1.2 that for non-atomic
probability spaces $L_0$ coincides with $L$. However the previous example may
be used to show that $L_0$ is not necessarily symmetric. We will make
use of Example 4.1.2 to show that the standard procedures to symmetrize
$L_0$ (e.g. maximum, arithmetic mean) fail to preserve the equivalence
with $L$. Let $(\Omega, \mathcal{A}, P)$ be as in Example 4.1.2 and define, for each
$n \ge 1$:
2 I= - +3 n =
Clearly, $Q_n \in G$ for all $n$ and $\{Q_n\}$ converges weakly to $Q$, given
by:
$Q(\{0\}) = 2/3$, $Q(\{1\}) = 1/3$.
Let $Y \in A_Q$ be defined by:
$Y(\omega) = \begin{cases} 0 & \text{if } 1/3 < \omega \le 1 \\ 1 & \text{if } 0 \le \omega \le 1/3. \end{cases}$
By the same argument used in Example 4.1.2, for any $X_n \in A_{Q_n}$, $X_n^{-1}(\{0\})$
must contain the atom $[0, \tfrac{1}{3}]$. Hence, it follows that it is impossible
to choose a sequence $\{X_n\}$, $X_n \in A_{Q_n}$, such that $\rho(X_n, Y)$ converges to
zero as $n$ tends to infinity. Therefore, $L_0(Q, Q_n)$ does not converge
to zero and hence the symmetrized $L_0$ will not be equivalent to $L$.
4.2. Weak convergence and equivalent probability measures
In this section, we extend to metric spaces a result of
Padmanabhan [1970] and show how this extension allows us to use $L$ to
define a metric in the space of r.e.'s which is equivalent to $\rho$.
Recall that given two probability measures $P$, $Q$, in the same
measurable space $(\Omega, \mathcal{A})$ we say that $Q$ is absolutely continuous with
respect to $P$ ($Q \ll P$) if $Q(A) = 0$ for every $A \in \mathcal{A}$ for which
$P(A) = 0$. Two probability measures $P$ and $Q$ for which both
conditions $P \ll Q$ and $Q \ll P$ are satisfied are said to be equivalent.
Lemma 4.2.1. Let $A$ and $B$ be any two events on a probability
space $(\Omega, \mathcal{A}, P)$. A necessary and sufficient condition for the
existence of an equivalent probability measure $Q$ on $(\Omega, \mathcal{A})$ for which
$Q(A) \ne Q(B)$ is that $P(A \Delta B) > 0$.
Proof: Suppose first that for some probability measure $Q$,
equivalent to $P$, we have $Q(A) \ne Q(B)$. Then we can write:
$0 < |Q(A) - Q(B)| \le Q(A \Delta B)$.
Hence, $Q(A \Delta B) > 0$; $P(A \Delta B) > 0$ now follows from the definition of
equivalence.
To prove the converse, note that
$A \Delta B = (A - B) \cup (B - A)$.
Hence, $P(A \Delta B) > 0$ implies that either $P(A - B) > 0$ or $P(B - A) > 0$.
Assume that $P(A - B) > 0$. Observe now that if $P(A) \ne P(B)$ we are done
since $P$ is equivalent to itself. It only remains to consider the case
$P(A) = P(B)$.
Define for each $C \in \mathcal{A}$:
(4.2.1) $P_0(C) = P(C \mid A) = \dfrac{P(C \cap A)}{P(A)}$.
Since $P(A - B) > 0$, $P(A)$ is strictly positive and hence $P_0$ is well
defined.
Define for each $C \in \mathcal{A}$:
(4.2.2) $Q(C) = \dfrac{P_0(C) + P(C)}{2}$.
$Q$ is a probability measure on $(\Omega, \mathcal{A})$ and $Q$ is equivalent to $P$. On
the other hand:
(4.2.3) $Q(A) = \dfrac{P_0(A) + P(A)}{2} = \dfrac{1 + P(A)}{2}$,
        $Q(B) = \dfrac{1}{2}\left[\dfrac{P(A \cap B)}{P(A)} + P(B)\right] = \dfrac{1}{2}\left[1 + P(B) - \dfrac{P(A \cap B^c)}{P(A)}\right]$.
Since $P(A \cap B^c) > 0$ and $P(A) = P(B)$ it follows from (4.2.3) that:
$Q(A) > Q(B)$.
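
The measure Q built in (4.2.1) and (4.2.2) can be tried out on a small example. The sketch below assumes a finite sample space with all point masses positive, which is a simplification not made in the text; the names tilt and prob are hypothetical.

```python
from fractions import Fraction

def tilt(p, a):
    """Q = (P(. | A) + P) / 2 on a finite sample space; p maps outcome -> mass."""
    pa = sum(p[w] for w in a)           # P(A), assumed strictly positive
    return {w: ((p[w] / pa if w in a else 0) + p[w]) / 2 for w in p}

def prob(q, event):
    """Mass that the measure q assigns to a set of outcomes."""
    return sum(q[w] for w in event)
```

For four equally likely outcomes with A = {1, 2} and B = {2, 3}, one has P(A) = P(B) while Q(A) = 3/4 and Q(B) = 1/2, and every outcome keeps positive mass, so Q is equivalent to P.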
THEOREM 4.2.1. Let X and y be any two r.e. 's defined on a
probability space (n, A, P) and taking values on (S, S) where S is
a separable metric space. Suppose also that the two r.e. 's are dis
tinct, that is: P({w: X(w) = yew)}) < 1. Then, there exists a proba-
79
bility measure Q on (n t A)t Q equivaZent to P~ suoh that~ with
respeot to Q.. X and Y have different probabi Uty distributions. In
other words if two r.e. 's.. X and Y~ induoe the same probability mea
sures on (St S).. with respeot to all probability measures equivalent
to P it follows that X = Y a.e.
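Proof: Let $d$ denote the metric on $S$. Since $X$ and $Y$ are
distinct there exists $\varepsilon_0 > 0$ such that:
(4.2.4) $P(\{\omega: d(X(\omega),Y(\omega)) > \varepsilon_0\}) > 0.$
Since $S$ is separable there exists a countable, disjoint collection
$\{A_i\}$ of elements of $\mathcal{S}$, whose union is $S$ and such that, for each $i$,
$\operatorname{diam}(A_i) < \varepsilon_0$.
Hence, it follows that:
(4.2.5) $P(\{\omega: d(X,Y) > \varepsilon_0\}) = \sum_{i=1}^{\infty} P(X^{-1}(A_i) \cap \{\omega: d(X,Y) > \varepsilon_0\}).$
From (4.2.4) and (4.2.5) it follows that for some $A_{i_0}$ we have:
$P(X^{-1}(A_{i_0}) \cap \{\omega: d(X(\omega),Y(\omega)) > \varepsilon_0\}) > 0.$
On the other hand, since $\operatorname{diam}(A_{i_0}) < \varepsilon_0$, a point $\omega$ with
$X(\omega) \in A_{i_0}$ and $d(X(\omega),Y(\omega)) > \varepsilon_0$ cannot satisfy $Y(\omega) \in A_{i_0}$, so that:
$X^{-1}(A_{i_0}) \cap \{\omega: d(X(\omega),Y(\omega)) > \varepsilon_0\} \subset X^{-1}(A_{i_0}) \cap (Y^{-1}(A_{i_0}))^c.$
Therefore, it follows that:
(4.2.6) $P(X^{-1}(A_{i_0}) \cap (Y^{-1}(A_{i_0}))^c) > 0.$
By the previous lemma, there exists a probability measure $Q$ on
$(\Omega, \mathcal{A})$, $Q$ equivalent to $P$, such that:
$Q(X^{-1}(A_{i_0})) \ne Q(Y^{-1}(A_{i_0})).$
Or, equivalently, we have:
$Q(\{\omega: X(\omega) \in A_{i_0}\}) \ne Q(\{\omega: Y(\omega) \in A_{i_0}\}).$
The result follows. □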
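A minimal illustration of Theorem 4.2.1, assuming a hypothetical fair-coin space on which $X$ is the outcome and $Y$ its flip (the names `tilt` and `preimage` are illustrative only). Under $P$ the two r.e.'s have the same distribution although they are never equal; tilting $P$ toward the event $\{X = 1\}$, as in the preceding lemma, gives an equivalent measure under which the distributions differ.

```python
from fractions import Fraction

# Hypothetical two-point space: a fair coin.
P_weights = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def P(event):
    return sum(P_weights[w] for w in event)

def X(w):
    return w          # the outcome itself

def Y(w):
    return 1 - w      # the flipped outcome, so X != Y everywhere

def preimage(Z, value):
    """The event {w : Z(w) == value}."""
    return {w for w in P_weights if Z(w) == value}

def tilt(A):
    """Q = (P(. | A) + P) / 2, the measure used in the preceding lemma."""
    pA = P(A)
    def Q(event):
        return (P(event & A) / pA + P(event)) / 2
    return Q

X_in_A = preimage(X, 1)   # {1}
Y_in_A = preimage(Y, 1)   # {0}
Q = tilt(X_in_A)

print(P(X_in_A), P(Y_in_A))   # 1/2 and 1/2: same law under P
print(Q(X_in_A), Q(Y_in_A))   # 3/4 and 1/4: different laws under Q
```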
Definition 4.2.1. Let $E$ be any set. A map $d: E \times E \to R^1$ is said
to be a pseudo-metric in $E$ whenever, for all $(x,y) \in E \times E$ and all $z \in E$:
1) $d(x,y) \ge 0$ and $d(x,x) = 0$.
2) $d(x,y) = d(y,x)$.
3) $d(x,y) \le d(x,z) + d(z,y)$.
Remark. Every metric on E is a pseudometric but the converse is
not true since for a pseudometric d, d(x,y) = 0 does not necessarily
imply x = y.
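As a concrete instance of the remark, the sketch below checks the three axioms by brute force on a small finite set for the map $d(x,y) = |x^2 - y^2|$. Both the set `E` and this choice of `d` are illustrative assumptions, not taken from the text. The map is a pseudometric but not a metric, because it cannot distinguish $x$ from $-x$.

```python
from itertools import product

# Hypothetical finite set and candidate distance, chosen for illustration.
E = [-2, -1, 0, 1, 2]

def d(x, y):
    return abs(x * x - y * y)

def is_pseudometric(dist, points):
    """Check axioms 1) to 3) of Definition 4.2.1 on a finite set."""
    for x, y in product(points, repeat=2):
        if dist(x, y) < 0 or dist(x, x) != 0:
            return False
        if dist(x, y) != dist(y, x):
            return False
    for x, y, z in product(points, repeat=3):
        if dist(x, y) > dist(x, z) + dist(z, y):
            return False
    return True

def separates_points(dist, points):
    """True when dist(x, y) > 0 for every pair of distinct points."""
    return all(dist(x, y) > 0
               for x, y in product(points, repeat=2) if x != y)

print(is_pseudometric(d, E))    # True
print(separates_points(d, E))   # False, since d(-1, 1) == 0
```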
A family $\{d_\theta: \theta \in \Theta\}$ of pseudometrics in a set $E$ is said to be
separating if for every pair $(x,y) \in E \times E$, with $x \ne y$, there exists
$d_\theta$ such that $d_\theta(x,y) > 0$. Given a separating family of pseudometrics
on a set $E$, the topology on $E$ generated by the sets:
$\{\{y: d_\theta(x,y) < \varepsilon\}: x \in E,\ \theta \in \Theta,\ \varepsilon > 0\}$
is called the topology induced on $E$ by the family $\{d_\theta: \theta \in \Theta\}$.
Clearly, since the family $\{d_\theta: \theta \in \Theta\}$ is assumed to be separating, the
topology induced by this family is Hausdorff.
Definition 4.2.2. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let
$E$ denote the space of r.e.'s defined on $(\Omega, \mathcal{A}, P)$ with values on
$(S, \mathcal{S})$. Let $\{P_\theta: \theta \in \Theta\}$ denote the class of probability measures on
$(\Omega, \mathcal{A})$ which are equivalent to $P$. For each $\theta \in \Theta$ and $(X,Y) \in E \times E$,
define:
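(4.2.7) $d_\theta(X,Y) = L(P_\theta X^{-1}, P_\theta Y^{-1}),$
where $L$ is the metric on the space of probability measures on $(S, \mathcal{S})$
and $P_\theta X^{-1}$ denotes the probability distribution of $X$ with respect to $P_\theta$.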
Lemma 4.2.2. For each $\theta \in \Theta$, $d_\theta$ is a pseudometric in $E$. The
family $\mathcal{V} = \{d_\theta: \theta \in \Theta\}$ is separating and hence induces a Hausdorff
topology on $E$.
Proof: Since $L$ is a metric on the space of probability measures
on $(S, \mathcal{S})$, the verification that each $d_\theta$ is a pseudometric on $E$ is
straightforward. The other half of the lemma follows trivially from
Theorem 4.2.1.
THEOREM 4.2.2. Let $\{X_n\}$ and $X$ be r.e.'s defined on $(\Omega, \mathcal{A}, P)$
with values on $(S, \mathcal{S})$. Then, $\{X_n\}$ converges to $X$ in probability if
and only if for every $P_X$-continuity set $A \subset S$ and every $C \in \mathcal{A}$ we
have:
(4.2.8) $\lim_{n \to \infty} P(X_n^{-1}(A) \cap C) = P(X^{-1}(A) \cap C).$
(The weak convergence of $\{P_{X_n}\}$ to $P_X$ is equivalent to condition
(4.2.8) with $C = \Omega$.)
Proof: 1) Suppose first that $\{X_n\}$ converges to $X$ in probability.
Then, it can be shown (see, e.g., Billingsley [1968] page 26)
that, for every $P_X$-continuity set $A$:
(4.2.9) $\lim_{n \to \infty} P(X_n^{-1}(A) \triangle X^{-1}(A)) = 0.$
On the other hand, using the properties of the operation $\triangle$, we can
write for any $C \in \mathcal{A}$:
(4.2.10) $|P(X_n^{-1}(A) \cap C) - P(X^{-1}(A) \cap C)| \le P\{(X_n^{-1}(A) \cap C) \triangle (X^{-1}(A) \cap C)\}$
         $= P(C \cap (X_n^{-1}(A) \triangle X^{-1}(A))) \le P(X_n^{-1}(A) \triangle X^{-1}(A)).$
The desired result follows from (4.2.9) and (4.2.10).
2) Suppose now that for every $P_X$-continuity set $A$ and every
$C \in \mathcal{A}$ we have:
(4.2.11) $\lim_{n \to \infty} P(X_n^{-1}(A) \cap C) = P(X^{-1}(A) \cap C).$
Let $\varepsilon > 0$ be given. By the argument used in the proof of the
existence of Skorokhod partitions (see Appendix) there exists a
partition of $S$ by means of a countable collection $\{B_i\}$ of $P_X$-continuity
sets, each one of them with diameter smaller than $\varepsilon$. Hence, we can
write:
(4.2.12) $P(\{\omega: d(X_n(\omega),X(\omega)) \ge \varepsilon\}) \le \sum_{i \ge 1} P(X^{-1}(B_i) \cap X_n^{-1}(B_i^c)).$
For each $i \ge 1$, it follows, from the fact that $B_i$ is a $P_X$-continuity
set and (4.2.11), applied with $A = B_i^c$ and $C = X^{-1}(B_i)$, that:
$\lim_{n \to \infty} P(X^{-1}(B_i) \cap X_n^{-1}(B_i^c)) = 0.$
But on the other hand, the sum on the right of (4.2.12) is dominated
by $\sum_i P(X^{-1}(B_i)) = 1$, for all $n \ge 1$. It follows that:
$\lim_{n \to \infty} P(\{\omega: d(X_n(\omega),X(\omega)) \ge \varepsilon\}) = 0.$
Since $\varepsilon > 0$ is arbitrary, this completes the proof.
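The role of the set $C$ in condition (4.2.8) can be seen in a small example. The sketch below assumes a hypothetical fair-coin space with $X(\omega) = \omega$ and $X_n(\omega) = 1 - \omega$ for every $n$, and takes $A = (1/2, 3/2)$, whose boundary has $P_X$-measure zero; all names are illustrative. The condition holds with $C = \Omega$ (the laws agree) but fails for $C = \{X = 1\}$, in agreement with the fact that $X_n$ does not converge to $X$ in probability.

```python
from fractions import Fraction

# Hypothetical two-point space: a fair coin.
P_weights = {0: Fraction(1, 2), 1: Fraction(1, 2)}
whole_space = set(P_weights)

def P(event):
    return sum(P_weights[w] for w in event)

def X(w):
    return w

def X_n(w):
    return 1 - w      # the same random element for every n

def in_A(s):
    """Membership in A = (1/2, 3/2); its boundary carries no P_X mass."""
    return Fraction(1, 2) < s < Fraction(3, 2)

def joint(Z, C):
    """P(Z^{-1}(A) intersected with C), the quantity appearing in (4.2.8)."""
    return P({w for w in C if in_A(Z(w))})

C = {1}               # the event {X = 1}

# With C equal to the whole space the two sides of (4.2.8) agree.
print(joint(X_n, whole_space), joint(X, whole_space))   # 1/2 and 1/2
# With C = {X = 1} they differ, so (4.2.8) fails.
print(joint(X_n, C), joint(X, C))                       # 0 and 1/2
# Direct check: d(X_n, X) = 1 everywhere.
far_apart = P({w for w in whole_space if abs(X_n(w) - X(w)) >= Fraction(1, 2)})
print(far_apart)                                        # 1
```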
The next result is the extension to metric spaces of Theorem 2.1
of Padmanabhan [1970].
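Corollary. With the notation introduced in Definition 4.2.2, the
metric topology on $E$, given by $\rho$, and the topology generated by
$\mathcal{V} = \{d_\theta: \theta \in \Theta\}$ are sequentially equivalent. In other words, a
sequence $\{X_n\}$ converges to $X$ in probability if and only if the
corresponding sequence of probability distributions of the $X_n$ with
respect to $P_\theta$ converges weakly to the probability distribution of $X$
with respect to $P_\theta$, for all $\theta \in \Theta$.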
Proof: In one direction the result is trivial since convergence
in probability is preserved when we substitute $P$ by an equivalent
probability measure.
To prove the result in the other direction let $A$ be a $P_X$-continuity
set and let $C \in \mathcal{A}$, $P(C) \ne 0$. Define for $D \in \mathcal{A}$:
(4.2.13) $Q(D) = [P(D \mid C) + P(D)]/2.$
Clearly $Q$ is equivalent to $P$ and $A$ is a $Q_X$-continuity set. Our
assumptions, applied to both $Q$ and $P$, give
$Q(X_n^{-1}(A)) \to Q(X^{-1}(A))$ and $P(X_n^{-1}(A)) \to P(X^{-1}(A))$, hence
$P(X_n^{-1}(A) \mid C) \to P(X^{-1}(A) \mid C)$. It follows that:
(4.2.14) $\lim_{n \to \infty} P(X_n^{-1}(A) \cap C) = P(X^{-1}(A) \cap C).$
Since (4.2.14) is trivially true for sets $C$ with $P(C) = 0$, the
result follows from Theorem 4.2.2. □
Remark. If we denote by $\tau$ the $\rho$-topology on $E$ and by $\tau_{\mathcal{V}}$ the
topology induced by $\mathcal{V}$ it follows from the previous theorem and by
some well-known results in topology (see Wilansky [1970], page 27,
Theorem 3.1.2) that $\tau_{\mathcal{V}} \subset \tau$.
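THEOREM 4.2.3. Let $(\Omega, \mathcal{A}, P)$ be a probability space and assume
that $\mathcal{A}$ is a separable σ-field, that is, $\mathcal{A}$ has a countable sub-class
$\mathcal{F}$ such that $\mathcal{A}$ is the smallest σ-field containing $\mathcal{F}$. Then there
exists a countable subclass $\mathcal{G} \subset \mathcal{V}$ such that $\tau_{\mathcal{G}} = \tau$.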
Proof: We will show first that with the assumption of separability
Theorem 4.2.2 holds under the weaker assumption: $C \in \mathcal{F}$. There is no
loss of generality if we assume that $\mathcal{F}$ is a field since the field
generated by a countable class is countable. Let $\{A_n\}$ be a sequence
of sets in $\mathcal{A}$ and assume that for some $A \in \mathcal{A}$:
(4.2.15) $\lim_{n \to \infty} P(A_n \cap C) = P(A \cap C)$ all $C \in \mathcal{F}$.
Let now $B \in \mathcal{A}$ and $\varepsilon > 0$ be given. Since $\mathcal{F}$ is a field and $\mathcal{F}$
generates $\mathcal{A}$, there exists $C \in \mathcal{F}$ such that:
$P(B \triangle C) \le \varepsilon/3.$
By the same argument used in (4.2.10) it follows that for all
$D \in \mathcal{A}$:
(4.2.16) $|P(D \cap C) - P(D \cap B)| \le \varepsilon/3.$
On the other hand, we can write:
$|P(A_n \cap B) - P(A \cap B)| \le |P(A_n \cap B) - P(A_n \cap C)| + |P(A_n \cap C) - P(A \cap C)| + |P(A \cap C) - P(A \cap B)|.$
From (4.2.15) and (4.2.16) it follows that:
$|P(A_n \cap B) - P(A \cap B)| \le \varepsilon$ all $n$ larger than some $N \ge 1$.
Hence, it follows that:
$\lim_{n \to \infty} P(A_n \cap B) = P(A \cap B)$ all $B \in \mathcal{A}$.
Let now $\{C_i\}$ denote the elements of $\mathcal{F}$ and in correspondence to each
$C_i$, with $P(C_i) \ne 0$, define a probability measure $Q_i$, given by:
$Q_i(B) = \{P(B \mid C_i) + P(B)\}/2,$ $B \in \mathcal{A}$.
Let $\mathcal{G}$ denote the family of pseudometrics determined by $\{Q_i\}$ as in
(4.2.7). We have just shown that $\tau_{\mathcal{G}}$ and $\tau$ are sequentially
equivalent. Furthermore, since $\mathcal{G}$ is countable, $\tau_{\mathcal{G}}$ can be metrized by a
metric $\rho_0$.
Therefore, $\tau_{\mathcal{G}} = \tau$ and the two metrics $\rho$ and $\rho_0$ are equivalent. □
We now show, by means of an example, that the space $(E, \rho_0)$ is not
necessarily complete.
Example 4.2.2. Let $\{X_n\}$ be a sequence of independent r.v.'s
defined on $(\Omega, \mathcal{A}, P)$ and assume that the corresponding sequence of
probability distributions is weakly convergent to a non-degenerate d.f. $F$.
$\mathcal{F}_n$ is the smallest σ-field with respect to which $X_1, \ldots, X_n$ are
measurable. Put $\mathcal{F} = \bigcup_{n \ge 1} \mathcal{F}_n$, and let $\mathcal{G}$ be the σ-field generated by
$\mathcal{F}$. Let $B \in \mathcal{F}$ and $x$ be a continuity point of $F$. Hence, $B \in \mathcal{F}_{n_0}$
for some $n_0 \ge 1$. Clearly, for all $n > n_0$:
$P(\{\omega: X_n \le x\} \cap B) = P(\{\omega: X_n \le x\}) P(B).$
It follows that:
(4.2.17) $\lim_{n \to \infty} P(\{\omega: X_n(\omega) \le x\} \cap B) = F(x) P(B).$
Hence, it follows that the sequence of probability distributions
converges weakly to $F$ with respect to all probability measures given
by (4.2.13) with $C \in \mathcal{F}$. It follows that in the case of a countable $\mathcal{F}$,
$\{X_n\}$ is fundamental in the metric $\rho_0$. However, if there existed an
$X$ such that $\{X_n\}$ converges to $X$ in probability, it would follow from
(4.2.17) and Theorem 4.2.2 that $X$ would have to be independent of $\mathcal{F}$
and hence $F$ would have to be degenerate, which is a contradiction.
APPENDIX
We are now going to review the essential facts about metric spaces
that were used in the previous chapters. With the exception of the
results related to the Skorokhod partition the material is standard and it
is presented here only for the reader's convenience. A full account of
these and related results can be found in any good introductory
textbook on topology or real analysis (see, e.g., Royden [1963] or Wilansky
[1970]).
The basic concepts (e.g.: open and closed sets; interior, closure
and boundary of a set; dense and nowhere dense sets; etc.) as well as
their properties are assumed to be known. As before we will denote
metric spaces by the letter $S$ and the metric on $S$ will be denoted by
$d$. If $A$ is a subset of $S$ we write $\bar{A}$, $A^{\circ}$ and $\partial A$ to denote the
closure, the interior and the boundary of $A$ respectively.
A.1. Separable metric spaces
Definition A.1.1. An open cover of a metric space $S$ is a
collection of open subsets of $S$ whose union is $S$.
Definition A.1.2. A collection $\mathcal{B}$ of open subsets of $S$ is said
to be a base for $S$ if for every open set $O$ in $S$ and every $x \in O$
there exists $B \in \mathcal{B}$ such that: $x \in B \subset O$.
Remark. It is an obvious consequence of the definition that if $\mathcal{B}$
is a base for $S$ then every open subset of $S$ can be expressed as a
union of elements of $\mathcal{B}$.
Definition A.1.3. A metric space $S$ is said to be second countable
if it has a countable base.
Definition A.1.4. A metric space $S$ is separable if it contains a
countable dense subset. In other words, $S$ is separable if there
exists a countable subset $D$ of $S$ such that $\bar{D} = S$.
THEOREM A.1.1. For a metric space $S$ the three conditions below
are equivalent.
i) $S$ is separable.
ii) $S$ is second countable.
iii) Each open cover of any subset of $S$ has a countable subcover.
For the proof, see Wilansky [1970; page 76].
Corollary. Let $S$ be a separable metric space and $\mathcal{S}$ be the
σ-field of the Borel sets of $S$. Let $\delta$ be any positive real number.
Then, $S$ can be written as a countable, disjoint union of elements of
$\mathcal{S}$, each one of them having diameter smaller than $\delta$.
Proof: Consider the open cover of $S$ given by the open balls
$\{B(x, r): x \in S\}$, where the radius $r > 0$ is chosen small enough that
each ball has diameter smaller than $\delta$ (for instance $r = \delta/3$). Let
$\{B_n\}$ denote a countable subcover, which exists by the previous theorem.
Hence, we have:
(A.1.1) $S = \bigcup_{n \ge 1} B_n.$
Furthermore, it is clear that:
(A.1.2) $\operatorname{diam}(B_n) < \delta$ all $n \ge 1$.
Now since $\mathcal{S}$ is a σ-field it follows that there exists a disjoint
sequence $\{A_n\}$ of elements of $\mathcal{S}$ such that:
(A.1.3) $\bigcup_n A_n = \bigcup_n B_n = S,$
(A.1.4) $A_n \subset B_n$ all $n \ge 1$.
The result follows. □
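The disjoint sequence in (A.1.3) and (A.1.4) is usually obtained by removing from each $B_n$ the part already covered by its predecessors, that is, $A_n = B_n - (B_1 \cup \cdots \cup B_{n-1})$. The text does not spell this step out, so the sketch below is an assumed reading of it, shown on hypothetical finite sets standing in for the $B_n$.

```python
def disjointify(cover):
    """Return A_n = B_n minus (B_1 u ... u B_{n-1}) for a list of sets B_n."""
    seen = set()
    pieces = []
    for B in cover:
        pieces.append(set(B) - seen)   # keep only the points not yet covered
        seen |= set(B)
    return pieces

# Hypothetical overlapping cover of {0, ..., 5}.
cover = [{0, 1, 2}, {2, 3}, {1, 3, 4}, {4, 5}]
pieces = disjointify(cover)

print(pieces)   # [{0, 1, 2}, {3}, {4}, {5}]
```

Each piece is built from the $B_n$ by a set difference and a finite union, which is why the $A_n$ remain in the σ-field; some pieces may be empty.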
THEOREM A.1.2. Let $S$ be a separable metric space. Then, $\mathcal{S}$ is
the smallest σ-field containing the open balls of $S$.
Proof: Since $S$ is separable it follows from Theorem A.1.1 that
there exists a countable collection of open balls which is a base for
$S$. By the remark following Definition A.1.2 we can say that every open
set in $S$ is a countable union of open balls of $S$. Hence, it follows
that the σ-field generated by the open balls of $S$ contains all open
sets of $S$. Hence, this σ-field contains $\mathcal{S}$ since $\mathcal{S}$ is by definition
the smallest σ-field containing the open sets of $S$. The inclusion in
the other direction is trivial since every open ball is an open set. □
A.2. Completeness and compactness on metric spaces
Definition A.2.1. A sequence $\{x_n\}$ of points of a metric space $S$
is said to be a Cauchy sequence if $d(x_m, x_n)$ converges to zero as both
$m$ and $n$ go to infinity.
Remark. It is easy to see that every convergent sequence in a
metric space is a Cauchy sequence and also that the converse to this
statement is not in general true.
Definition A.2.2. A metric space where every Cauchy sequence is
convergent is called a complete metric space.
Remark. Completeness is not a topological property in the sense
that two metrics $d_1$ and $d_2$ can generate the same topology on $S$
(that is, they determine the same class of open sets) and yet $S$ can be
complete with respect to one of them and not with respect to the other.
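A standard example of this remark, not taken from the text, is the real line with $d_1(x,y) = |x - y|$ and $d_2(x,y) = |\arctan x - \arctan y|$. Since arctan is a homeomorphism of the line onto $(-\pi/2, \pi/2)$, both metrics give the usual topology, but the sequence $x_n = n$ is Cauchy for $d_2$ without converging to any real number. The sketch below checks the Cauchy behaviour numerically on a finite stretch of the sequence; the helper `tail_diameter` is an illustrative name.

```python
import math

def d1(x, y):
    return abs(x - y)

def d2(x, y):
    return abs(math.atan(x) - math.atan(y))

def tail_diameter(dist, seq, start):
    """Largest distance between two terms of seq from index start onward."""
    tail = seq[start:]
    return max(dist(a, b) for a in tail for b in tail)

# The sequence x_n = n, truncated to its first 400 terms.
seq = [float(n) for n in range(1, 401)]

# Under d2 the tail shrinks (Cauchy behaviour); under d1 it does not.
print(tail_diameter(d2, seq, 200) < 0.01)   # True
print(tail_diameter(d1, seq, 200))          # 199.0
```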
Definition A.2.3. A subset K of a metric space S is compact if
every open cover of K has a finite subcover.
There are several important properties which are equivalent to
compactness in metric spaces. We will only state the one below which
was used in relation with the definition of tightness.
THEOREM A.2.1. Let $K$ be a compact subset of a metric space $S$
and $\{x_n\}$ be a sequence of elements of $K$. Then $\{x_n\}$ has a limit
point. (Equivalently, $\{x_n\}$ admits a convergent subsequence.)
For the proof, see Wilansky [1970; page 124].
A.3. The existence of Skorokhod partitions
THEOREM A.3.1. (Skorokhod). Let $S$ be a separable metric space
and $P$ a probability measure on $(S, \mathcal{S})$. Then, there exists a
P-continuous Skorokhod partition of $S$.
Proof: In the proof of the Corollary to Theorem A.1.1, we saw that
for each $k \ge 1$, we can express $S$ as a countable union of open balls
whose radii are smaller than $(1/2)^{k+2}$. Therefore, for each $k \ge 1$
there exists a sequence $\{x_i^{(k)}\}_{i \ge 1}$ of points of $S$ and a sequence
$\{r_i^{(k)}\}_{i \ge 1}$ of positive real numbers such that:
(A.3.1) $S = \bigcup_{i \ge 1} B(x_i^{(k)}, r_i^{(k)})$ and $r_i^{(k)} < (1/2)^{k+2}$ all $i \ge 1$.
We claim now that for each $k \ge 1$ we can choose a real number $r_k$,
$(1/2)^{k+2} \le r_k < (1/2)^{k+1}$, such that for all $i \ge 1$, $B(x_i^{(k)}, r_k)$ is a
P-continuity set. In fact, by the same argument used in the proof of Lemma
3.3.2, for each $i$ the number of values of $r$ in that interval for
which $B(x_i^{(k)}, r)$ is not a P-continuity set is at most countable. Hence
it follows from (A.3.1) and the choice of the $r_k$'s that for each $k \ge 1$:
(A.3.2) $S = \bigcup_{i \ge 1} B(x_i^{(k)}, r_k).$
It is also clear that the diameter of each ball $B(x_i^{(k)}, r_k)$ is at most
$2 r_k$, and hence smaller than $(1/2)^k$.
We are now going to express the union in (A.3.2) as a disjoint
union by defining:
(A.3.3) $D_i^k = B(x_i^{(k)}, r_k) - \bigcup_{j=1}^{i-1} B(x_j^{(k)}, r_k).$
Finally, we define:
$S_{i_1, \ldots, i_k} = D_{i_1}^1 \cap \cdots \cap D_{i_k}^k.$
We observe first that $S_{i_1, \ldots, i_k}$ is a subset of $B(x_{i_k}^{(k)}, r_k)$ and hence
the diameter of $S_{i_1, \ldots, i_k}$ is smaller than $(1/2)^k$. Furthermore, since
the class of P-continuity sets is a field and the open balls
$B(x_i^{(k)}, r_k)$ are in this class it follows that $S_{i_1, \ldots, i_k}$ is a
P-continuity set.
The verification of the other properties of the Skorokhod partition
(Definition 3.1.3) is straightforward.
REFERENCES
ALEXANDROV, A. D. (1940-1943). Additive set functions in abstract
spaces. Mat. Sb. 8, 307-348; 9, 563-628; 13, 169-238.
BILLINGSLEY, P. (1968). Convergence of Probability Measures.
John Wiley & Sons, Inc., New York.
DUDLEY, R. M. (1968). Distances of probability measures and random
variables. Ann. Math. Statist. 39, 1563-1572.
NEVEU, J. (1965). Mathematical Foundations of the Calculus of
Probability. Holden-Day, Inc., San Francisco.
PADMANABHAN, A. R. (1970). Convergence in probability and allied
results. Math. Jap. 15, 111-117.
PARTHASARATHY, K. R. (1967). Probability Measures on Metric Spaces.
Academic Press, New York.
PROHOROV, Y. V. (1956). Convergence of random processes and limit
theorems in probability theory. Theor. Prob. Appl. 1, 157-214.
PYKE, R. (1968). Applications of almost surely convergent constructions
of weakly convergent processes. Proc. Internat. Symp. Prob.
Inform. Theor., Springer-Verlag, Berlin.
RENYI, A. (1970). Foundations of Probability. Holden-Day, Inc.,
San Francisco.
ROYDEN, H. L. (1968). Real Analysis. Second edition, Macmillan,
New York.
SKOROKHOD, A. V. (1956). Limit theorems for stochastic processes.
Theor. Prob. Appl. 1, 261-290.
STRASSEN, V. (1965). The existence of probability measures with given
marginals. Ann. Math. Statist. 36, 423-439.
VARADARAJAN, V. S. (1958). On an existence theorem for probability
spaces. Selected Translations in Mathematical Statistics, Vol. 2.
American Mathematical Society, Providence.
VARADARAJAN, V. S. (1961). Measures on topological spaces.
Translations of the American Mathematical Society, Series 2, 1965,
Vol. 48, American Mathematical Society, Providence.
WICHURA, M. J. (1970). On the construction of almost uniformly
convergent random variables with given weakly convergent image laws.
Ann. Math. Statist. 41, 284-291.
WILANSKY, A. (1970). Topology for Analysis. Ginn and Company,
Waltham, Massachusetts.