
A dissertation under the direction of Gordon Simons.

SOME STRUCTURAL RELATIONSHIPS BETWEEN WEAK CONVERGENCE OF PROBABILITY MEASURES AND CONVERGENCE IN PROBABILITY

Flavio W. Rodrigues

Department of Statistics

University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series No. 812

March, 1972

TABLE OF CONTENTS

CHAPTER

ACKNOWLEDGMENTS

ABSTRACT

I INTRODUCTION
    1.1 Introduction and preliminary background
    1.2 Review of the literature and description of the results

II THE REAL LINE
    2.1 The non-atomic case
    2.2 The atomic case
    2.3 The general case

III GENERALIZATION TO COMPLETE, SEPARABLE METRIC SPACES
    3.1 The Skorokhod partition of S
    3.2 The non-atomic case
    3.3 The atomic and the general case

IV SOME RELATIONSHIPS BETWEEN THE METRICS L AND ρ
    4.1 The lower bound for ρ
    4.2 Weak convergence and equivalent probability measures

APPENDIX

REFERENCES

ACKNOWLEDGMENTS

I would like to express my deepest gratitude to my advisor Professor G. D. Simons for proposing this problem and for the many valuable suggestions he made during the course of this investigation. I would also like to thank him for his patience in going through the several first drafts of the manuscript and for his words of encouragement and confidence in the difficult moments.

I wish to thank Professor W. Hoeffding, Professor M. R. Leadbetter and Professor W. L. Smith for reading the manuscript and offering helpful suggestions.

I extend my thanks to those members of the faculty in the Departments of Statistics and Mathematics who contributed to my education at the University of North Carolina at Chapel Hill.

For financial support, I would like to express my gratitude to Conselho Nacional de Pesquisas (C. N. Pq.), Brazil, which granted me a scholarship for the larger part of my stay in the United States. My thanks also go to Pan American Health Organization for a travel grant and financial support during my first year of graduate studies.

It is a pleasure to acknowledge the cooperation and understanding I received from the faculty and administration of the School of Public Health of the University of São Paulo during my leave of absence.

I owe a deep sense of gratitude to my wife, Regina, for her continuous support and enthusiasm during all the phases of my studies. I also want to thank Dr. Carlos A. B. Dantas whose insistence and enthusiasm were greatly responsible for my decision to come to the United States to pursue graduate studies.

Finally, I thank Mrs. Cynthia Grossman for her able and careful typing of the manuscript.

ABSTRACT

Let $\{X_n\}$ be a sequence of random variables defined on a probability space $(\Omega, \mathcal{A}, P)$ whose corresponding sequence of probability distributions $\{Q_n\}$ is weakly convergent to a probability distribution $Q$. In this study, we show the existence, on $(\Omega, \mathcal{A}, P)$, of a sequence $\{Y_n\}$ of identically distributed random variables, with probability distribution $Q$, such that $X_n - Y_n$ converges to zero, in probability, as $n$ tends to infinity. In the special case of a non-atomic $(\Omega, \mathcal{A}, P)$ we use the quantile function, associated with the distribution function of $Q$, to explicitly determine a particular version of the $Y_n$'s as functions of the $X_n$'s and of an auxiliary uniform random variable.

This result is then extended to sequences of random elements taking values on a complete, separable, metric space. To accomplish this extension we consider a total ordering of the metric space induced by a special sequence of partitions which were first introduced by Skorokhod. The techniques used in the real line are adapted to metric spaces by means of a type of generalized distribution function associated with each probability measure on the space.

Some relationships between the metric $\rho$, associated with convergence in probability, and the Lévy-Prohorov metric, $L$, are also investigated. In particular, a result relating convergence in probability, weak convergence and the class of probability measures which are equivalent to $P$, is shown to be valid on separable metric spaces.

CHAPTER I

INTRODUCTION

1.1. Introduction and preliminary background.

The basic facts concerning the extension of the theory of weak convergence of probability measures to metric spaces have been known at least since 1940 (see, e.g., Alexandrov [1940]). However, it was not until 1956, with the publication of Prohorov's fundamental paper "Convergence of random processes and limit theorems in probability theory", that the importance and far-reaching consequences of such an extension were fully understood. A detailed account of both the theory and applications of weak convergence, plus an extensive bibliography, can be found in the monographs by Parthasarathy [1967] and Billingsley [1968].

In this dissertation, we will be concerned with problems relating weak convergence with other types of convergence, and with the structure of the basic probability space, $(\Omega, \mathcal{A}, P)$. Suppose, for example, that we are given a weakly convergent sequence, $\{P_n\}$, of probability measures on the $\sigma$-field of Borel sets of a separable metric space $S$. Consider, now, a probability space, $(\Omega, \mathcal{A}, P)$, where it is possible to define a sequence, $\{X_n\}$, of random elements, with values in $S$, whose corresponding sequence of probability distributions is $\{P_n\}$. We are interested in the implications of the weak convergence of the $P_n$'s for the convergence properties of the $X_n$'s and for the structure of $(\Omega, \mathcal{A}, P)$. Before describing our results, we need to recall some definitions and facts about probability measures in metric spaces.

From now on, $S$ will always denote a separable metric space with distance $d$. The particular distance will, in general, be of no concern to us; what matters is that the topology of $S$ is given by means of a metric. For the reader's convenience we have assembled, in the Appendix, the results about metric spaces that will be needed in the sequel.

Definition 1.1.1. The Borel $\sigma$-field of $S$ is the smallest $\sigma$-field containing the open (closed) sets of $S$. It will be denoted by $\mathcal{S}$ and, since $S$ is separable, it is also the $\sigma$-field generated by the open balls of $S$.

Definition 1.1.2. A measurable function from a measurable space $(\Omega, \mathcal{A})$ to $(S, \mathcal{S})$ will be called a random element (r.e.) of $S$. Any r.e. defined on a probability space $(\Omega, \mathcal{A}, P)$ induces, in the usual way, a probability measure on $(S, \mathcal{S})$.

Definition 1.1.3. A sequence $\{Q_n\}$ of probability measures on $(S, \mathcal{S})$ is weakly convergent to a probability measure $Q$ on $(S, \mathcal{S})$ if for every real valued, continuous and bounded function, $f$, on $S$, we have

(1.1.1) $\int f \, dQ_n \to \int f \, dQ$ as $n \to \infty$.

Definition 1.1.4. Let $Q$ be any probability measure on $(S, \mathcal{S})$. A Borel set $A$ is said to be a $Q$-continuity set if $Q(\partial A)$ is equal to zero. Here, $\partial A$ denotes the boundary of $A$.

THEOREM 1.1.1. Let $\{P_n\}$ and $P$ be probability measures on $(S, \mathcal{S})$. Then, $P_n$ converges weakly to $P$ if and only if $P_n(A)$ converges to $P(A)$ for every $P$-continuity set $A \in \mathcal{S}$.

For the proof and for other conditions equivalent to weak convergence, see Billingsley [1968; pages 11-12].
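The restriction to $P$-continuity sets in Theorem 1.1.1 can be illustrated with a minimal sketch. The example is an assumption made here for illustration and is not taken from the text: on the real line, $P_n$ is the unit mass at $1/n$ and $P$ is the unit mass at $0$, so that $P_n$ converges weakly to $P$. The set $A = (-\infty, 0]$ has boundary $\{0\}$, which carries $P$-mass one, and $P_n(A)$ does not converge to $P(A)$; the set $B = (-\infty, 1/2]$ is a $P$-continuity set and $P_n(B)$ does converge to $P(B)$. The function names are hypothetical.

```python
# Illustrative only: P_n = unit mass at 1/n, P = unit mass at 0 (hypothetical example).

def point_mass_prob(atom: float, upper: float) -> float:
    """Probability that a unit point mass at `atom` assigns to (-inf, upper]."""
    return 1.0 if atom <= upper else 0.0


def probabilities(upper: float, n_max: int = 1000) -> tuple[list[float], float]:
    """Return ([P_n((-inf, upper]) for n = 1..n_max], P((-inf, upper]))."""
    seq = [point_mass_prob(1.0 / n, upper) for n in range(1, n_max + 1)]
    return seq, point_mass_prob(0.0, upper)


# A = (-inf, 0]: the boundary {0} carries P-mass 1, so A is not a P-continuity set.
seq_a, limit_a = probabilities(0.0)

# B = (-inf, 1/2]: the boundary {1/2} carries P-mass 0, so B is a P-continuity set.
seq_b, limit_b = probabilities(0.5)
```

Here every entry of `seq_a` is 0 while `limit_a` is 1, whereas `seq_b` is eventually equal to `limit_b`.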

For any subset $A$ of $S$ and $\delta > 0$, let

(1.1.2) $A^{\delta} = \{x \in S: d(x,A) \le \delta\}$.

Definition 1.1.5. Let $P$ and $Q$ be two measures on $S$. The Lévy-Prohorov distance between $P$ and $Q$ is defined to be:

(1.1.3) $L(P,Q) = \inf\{\varepsilon > 0: P(F) \le Q(F^{\varepsilon}) + \varepsilon \text{ for all closed } F \subset S\}$.

$L$ is a metric on the space $Z(S)$ of probability measures on $(S, \mathcal{S})$ and it has been shown, by Prohorov [1956] and Dudley [1968], that convergence in the metric $L$ is equivalent to weak convergence.
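Definition (1.1.3) can be evaluated numerically in very small cases. The following is a minimal sketch, assuming both measures are finitely supported on the real line and are given as dictionaries mapping atoms to masses (this representation and the function names are assumptions made for illustration). For a finitely supported $P$ it is enough to test subsets of the support of $P$, and the condition in (1.1.3) is monotone in $\varepsilon$, so a bisection search applies. The enumeration of subsets is exponential in the number of atoms, so this is only meant for toy examples.

```python
from itertools import combinations


def prohorov_condition(p: dict, q: dict, eps: float) -> bool:
    """Check P(F) <= Q(F^eps) + eps for every non-empty subset F of the support of P.

    Replacing a closed set F by its intersection with the support of P keeps
    P(F) and can only shrink F^eps, so these subsets are the critical ones.
    """
    atoms = list(p)
    for r in range(1, len(atoms) + 1):
        for subset in combinations(atoms, r):
            p_mass = sum(p[x] for x in subset)
            # Q-mass of the closed eps-neighbourhood of the subset.
            q_mass = sum(w for y, w in q.items()
                         if min(abs(y - x) for x in subset) <= eps)
            if p_mass > q_mass + eps + 1e-12:   # small slack for rounding
                return False
    return True


def levy_prohorov(p: dict, q: dict, tol: float = 1e-9) -> float:
    """Approximate L(P, Q) of (1.1.3) by bisection on eps in [0, 1]."""
    lo, hi = 0.0, 1.0   # eps = 1 always satisfies the condition
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prohorov_condition(p, q, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For example, `levy_prohorov({0.0: 1.0}, {0.3: 1.0})` is approximately 0.3: two unit masses at distance 0.3 are at Lévy-Prohorov distance 0.3.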

For random elements $X$ and $Y$, from $(\Omega, \mathcal{A}, P)$ to $(S, \mathcal{S})$, we will denote by $d(X,Y)$ the function that at $\omega \in \Omega$ takes the value $d(X(\omega), Y(\omega))$. It can be shown (see Billingsley [1968], pg. 225) that when $S$ is separable, $d(X,Y)$ is a random variable.

Definition 1.1.6. Let $\{X_n\}$ and $X$ be r.e.'s from $(\Omega, \mathcal{A}, P)$ to $(S, \mathcal{S})$. The sequence is said to converge in probability to $X$ if, for all $\varepsilon > 0$, $P(\{\omega: d(X_n, X) \ge \varepsilon\})$ converges to zero as $n$ tends to infinity.

Convergence in probability can also be expressed by means of a metric on the space of all r.e.'s from $(\Omega, \mathcal{A}, P)$ to $(S, \mathcal{S})$. Given any two r.e.'s $X$ and $Y$ define

(1.1.4) $\rho(X,Y) = \inf\{\varepsilon > 0: P(\{\omega: d(X,Y) > \varepsilon\}) \le \varepsilon\}$.

If we interpret equality of r.e.'s to mean equality a.e. then $\rho$ is a metric and convergence in the metric $\rho$ is equivalent to convergence in probability.
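The metric (1.1.4), often called the Ky Fan metric, can be computed exactly on a finite sample space. The sketch below assumes a finite $\Omega = \{\omega_1, \ldots, \omega_m\}$ and takes as input the values $d(X(\omega_i), Y(\omega_i))$ together with the probabilities $P(\{\omega_i\})$; the function name and this input format are assumptions made for illustration.

```python
def ky_fan_distance(distances: list[float], probs: list[float]) -> float:
    """rho(X, Y) = inf{eps > 0 : P(d(X, Y) > eps) <= eps} on a finite sample space.

    distances[i] is d(X(w_i), Y(w_i)) and probs[i] is P({w_i}).
    """
    def tail(eps: float) -> float:
        # P(d(X, Y) > eps)
        return sum(p for d, p in zip(distances, probs) if d > eps)

    # The tail is a right-continuous, decreasing step function, so the infimum
    # is attained either at one of the distance values or at one of the levels
    # taken by the tail.
    candidates = {0.0} | set(distances)
    candidates |= {tail(c) for c in candidates}
    return min(c for c in candidates if tail(c) <= c)
```

For instance, if $X$ and $Y$ agree on an event of probability 1/2 and are at distance 1 on its complement, `ky_fan_distance([0.0, 1.0], [0.5, 0.5])` returns 0.5.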

1.2. Review of the literature and description of the results.

We start by quoting the classical result which, loosely speaking, says that convergence in probability implies weak convergence.

THEOREM 1.2.1. Let $\{X_n\}$ and $X$ be r.e.'s defined on $(\Omega, \mathcal{A}, P)$ with values on $(S, \mathcal{S})$. If $\{X_n\}$ converges in probability to $X$, the corresponding sequence $\{Q_n\}$ of probability distributions converges weakly to the probability distribution $Q$ of $X$. The converse is also true in the case where $Q$ has all its mass concentrated on a singleton.

In this dissertation we shall look into several types of converses for Theorem 1.2.1. As a matter of fact, our main results could be considered as converses for Theorem 1.2.1, in the special case where the r.e.'s are assumed to be defined on the same probability space. The following theorem, due to Skorokhod [1956], can be regarded as an existence type of converse for Theorem 1.2.1 and will play a fundamental role in the sequel.

THEOREM 1.2.2. (Skorokhod) Let $\{P_n\}$ and $P$ be probability measures on $(S, \mathcal{S})$ and assume that $\{P_n\}$ converges weakly to $P$. If $S$ is separable and complete, we can find a probability space, with random elements $\{X_n\}$ and $X$ defined on it, such that

a) For each $n \ge 1$, $P_n$ is the probability distribution of $X_n$ and $P$ is the probability distribution of $X$.

b) $X_n$ converges to $X$ everywhere, as $n$ tends to infinity.

This result has been extended, by Dudley [1968], to spaces $S$ which are only separable. For further extensions, which don't assume separability, see the paper by Wichura [1970]. A survey paper by Pyke [1970] gives several examples of a.s. convergent processes, constructed with the help of Theorem 1.2.2, which have a wide variety of applications to probability and statistics.

For $S$ complete, Strassen [1965] proved the important result that if $P$ and $Q$ are any probability measures in $Z(S)$, the Prohorov distance $L(P,Q)$ is the minimum distance "in probability" (see (1.1.4)) between r.e.'s distributed according to $P$ and $Q$. From this result, it follows that if $P_n$ converges weakly to $P$, sequences of r.e.'s $\{X_n\}$ and $\{Y_n\}$, whose distributions are, respectively, $\{P_n\}$ and $P$, could be constructed in such a way that $\rho(X_n, Y_n)$ converges to zero as $n$ tends to infinity. Of course, this last result is also implied by Skorokhod's theorem which is, in fact, stronger.

Both results mentioned above assume as given a weakly convergent sequence, $\{P_n\}$, of probability measures and go on to construct a probability space, with a sequence of r.e.'s defined on it, distributed according to $\{P_n\}$ and satisfying a specific convergence property. In our main result (Theorem 1.2.3, below) we take a different approach, assuming both the probability space and the sequence of r.e.'s to be given a priori.

THEOREM 1.2.3. Let $\{X_n\}$ be a sequence of r.e.'s defined on a probability space $(\Omega, \mathcal{A}, P)$, with values on a complete, separable metric space $S$. Suppose that the corresponding sequence of probability distributions, $\{Q_n\}$, converges weakly to the probability distribution $Q$. Then, there exists, on $(\Omega, \mathcal{A}, P)$, a sequence $\{Y_n\}$ of identically distributed r.e.'s, with probability distribution $Q$, such that the distance "in probability" $\rho(X_n, Y_n)$ converges to zero as $n$ tends to infinity.

Remarks: 1) We say that a probability measure $Q$, on $(S, \mathcal{S})$, is admissible for $(\Omega, \mathcal{A}, P)$ if there exists, in this probability space, a r.e., with values in $S$, whose probability distribution is $Q$. It follows, from Theorem 1.2.3, that the class of probability measures on $(S, \mathcal{S})$, which are admissible for a given probability space, is closed under weak convergence.

2) Suppose that for some probability space $(\Omega, \mathcal{A}, P)$ and for some probability measure $Q$ on $S$, $X$ is the unique (up to an equivalence) r.e. on $(\Omega, \mathcal{A}, P)$ whose probability distribution is $Q$.¹ Then, if $\{X_n\}$ is any sequence of r.e.'s on $(\Omega, \mathcal{A}, P)$ whose probability distributions are weakly convergent to $Q$, it follows that $X_n$ converges to $X$ in probability.

¹ An example of this situation is provided by a degenerate probability measure, i.e. a probability measure that puts all its mass on a singleton. For spaces with atoms (see Definition 2.1.2) this situation may occur even for non-degenerate $Q$.

In Chapter II, we discuss the case where $S$ is the real line with the usual metric. In this case, the proofs can be greatly simplified by the well-known constructions based on the natural order of the real line. For example, we will show that if $X$ is any random variable, with distribution function $F$, defined on a non-atomic probability space (Definition 2.1.3), it is possible to modify $F(X)$ in order to obtain a uniform random variable $U$, in such a way that $X$ can be expressed as a measurable function of $U$. This result, described in Theorem 2.1.2, may have some interest in itself. For atomic probability spaces, we show that Theorem 1.2.3 is a consequence of the relations between tightness (see Definition 2.2.2) of the sequence of probability measures and the existence of a.e. convergent subsequences (Theorem 2.2.2). The general case requires a little more than a simple combination of the other two, in virtue of the fact that weak convergence is not, necessarily, preserved under conditioning.

In Chapter III, we consider the problem in an abstract complete,

separable, metric space. The difficulty here will be in the non-atomic

case, since the powerful machinery of distribution functions is not

available. We will show, however, that an equivalent procedure can be

developed, by partitioning the metric space, in the way used by

Skorokhod in the proof of his result (Theorem 1.2.2).

In Chapter IV, we will discuss some relationships between the metrics $L$ and $\rho$ from the point of view of the structure of the probability space $(\Omega, \mathcal{A}, P)$. By restricting $L$ to the subspace of admissible probability measures we will be able to discuss a structural version of Strassen's result. Furthermore, we will extend to metric spaces a result of Padmanabhan [1970] and show how this extension will allow us to use $L$ to define a new metric $\rho_0$, in the space of r.e.'s, which is equivalent to $\rho$. By means of an example we will show that $\rho_0$ is not necessarily complete which implies that, although equivalent, the two metrics generate different uniformities.

CHAPTER II

THE REAL LINE

In this chapter we consider sequences of probability measures on the $\sigma$-field $\mathcal{B}$ of Borel sets of the real line $(R^1)$. As is well known, a probability measure on $(R^1, \mathcal{B})$ is completely determined by the corresponding distribution function (to be hereafter abbreviated as d.f.). A random variable (r.v.), $X$, defined in some probability space $(\Omega, \mathcal{A}, P)$ induces in $(R^1, \mathcal{B})$ a probability measure $P_X$ given by:

$P_X(B) = P(X^{-1}(B))$ for all $B \in \mathcal{B}$.

We will make use of the words increasing, decreasing, positive, negative in their loose interpretation. The qualifier strictly will be added when necessary. All d.f.'s are assumed to be continuous from the right and proper, that is such that $F(-\infty) = 0$ and $F(+\infty) = 1$.

Finally, we recall that the weak convergence of $\{P_n\}$ to $P$ is, here, equivalent to the convergence of the corresponding sequence of d.f.'s, $\{F_n\}$, to the d.f. $F$ of $P$, at all points of continuity of the latter.

2.1. The non-atomic case.

Definition 2.1.1. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Given $A \in \mathcal{A}$, we define the $P$-equivalence class determined by $A$, to be: $[A] = \{B \in \mathcal{A}: P(A \Delta B) = 0\}$, where $\Delta$ indicates the symmetric difference.

Definition 2.1.2. An atom of a probability space $(\Omega, \mathcal{A}, P)$ is the $P$-equivalence class of a set $A \in \mathcal{A}$ for which $P(A) > 0$ and such that for all $B \in \mathcal{A}$, $B \subset A$, we have either $P(B) = 0$ or $P(B) = P(A)$.

Remark. If $A$ is an atom of $(\Omega, \mathcal{A}, P)$ then, the d.f. of every r.v. defined on this space has a jump point of size $\ge P(A)$. On the other hand if $X$ is a r.v. defined on $(\Omega, \mathcal{A}, P)$ and $F$ is the d.f. of $X$, then the atoms of the probability space $(\Omega, X^{-1}(\mathcal{B}), P)$ are given by: $\{X^{-1}(\{x\}): x \text{ is a jump point of } F\}$.

We will not, in general, distinguish between the event $A$ and the $P$-equivalence class $[A]$. Statements such as: the event $A$ is an atom, can be made rigorous by the convention that 2 atoms $A$ and $B$ are equal whenever $P(A \Delta B) = 0$. Since the intersection of two distinct atoms of $(\Omega, \mathcal{A}, P)$ has probability zero it follows that a probability space has at most a countable number of atoms. Let $\{A_n\}$ denote the atoms of $(\Omega, \mathcal{A}, P)$ and put: $A_0 = \bigcup_{n=1}^{\infty} A_n$.

Definition 2.1.3. A probability space without atoms is called non-atomic. A probability space is called atomic if $P(A_0) = 1$.

The following theorem is due to Varadarajan [1958]:

THEOREM 2.1.1. If $(\Omega, \mathcal{A}, P)$ is a non-atomic probability space, it is possible to define r.v.'s $\xi_1, \xi_2, \ldots$, with arbitrary, pre-assigned consistent, joint distributions.

Corollary. A probability space admits a uniformly distributed random variable if and only if it is non-atomic.

Definition 2.1.4. Let $F$ be any d.f. For all $t$, $0 < t < 1$, define:

(2.1.1) $F^{-1}(t) = \inf\{x \in R: F(x) \ge t\}$.

It is easy to see that $F^{-1}$ is increasing and continuous from the left.
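As a concrete illustration of (2.1.1), here is a minimal sketch for a purely discrete d.f. with finitely many atoms. The representation by sorted atoms and their masses, and the function names, are assumptions made for illustration. The sketch can also be used to check, on examples, the standard equivalence $F^{-1}(t) \le x$ if and only if $t \le F(x)$.

```python
from bisect import bisect_left
from itertools import accumulate


def make_quantile(atoms: list[float], probs: list[float]):
    """Return (F, F_inv) for the discrete d.f. with the given sorted atoms and masses."""
    cum = list(accumulate(probs))   # cum[k] = F(atoms[k])

    def F(x: float) -> float:
        # Right-continuous d.f.: total mass of the atoms that are <= x.
        k = sum(1 for a in atoms if a <= x)
        return cum[k - 1] if k else 0.0

    def F_inv(t: float) -> float:
        # inf{x : F(x) >= t} is the first atom whose cumulative mass reaches t.
        return atoms[bisect_left(cum, t)]

    return F, F_inv
```

With atoms 0, 1, 2 and masses 1/4, 1/4, 1/2, the quantile function equals 0 on $(0, 1/4]$, 1 on $(1/4, 1/2]$ and 2 on $(1/2, 1)$, which shows the left continuity at the jump points.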

Some properties of $F^{-1}$, that will be needed later, are listed in the following lemma:

Lemma 2.1.1. Let $F$ be any d.f. For all real $x$ and every $t$, $0 < t < 1$, we have:

if and only if

implies

and

x < F-1 (t)

t ::; F(x)

F-1 (t) ::; x.

and

implies

t < F(x)

i)

ii)

iii)

iv) $F^{-1}$ is continuous at $t$ if and only if $\{x: F(x) = t\}$ is either empty or a singleton. Conversely, $F$ is continuous at $x$ if and only if $\{t: F^{-1}(t) = x\}$ is either empty or a singleton.

v) $F_n$ converges weakly to $F$ if and only if $F_n^{-1}$ converges to $F^{-1}$ at all continuity points of $F^{-1}$.

Proof: i, ii, iii, iv are direct consequences of the definition of $F^{-1}$ and the properties of the infimum of a set of real numbers. To prove v, suppose first that $F_n$ converges weakly to $F$. Let $t_0$ denote a continuity point of $F^{-1}$. In correspondence to a given $\varepsilon > 0$, arbitrary, choose two continuity points, $x$ and $y$, of $F$, such that:

(2.1.2)

and

(2.1.3) $y - x < \varepsilon$.

This choice is possible in virtue of the fact that the set of continuity points of $F$ is dense in the reals.

By part ii of the lemma, we have:

(2.1.4) $F(x) < t_0$.

By part i of the lemma and the fact that $F$ is increasing, we have:

(2.1.5)

Finally, since $t_0$ is a continuity point of $F^{-1}$, the first half of part iv of the lemma implies:

(2.1.6) $t_0 < F(y)$.

The convergence of $F_n(x)$ to $F(x)$ and of $F_n(y)$ to $F(y)$ together with (2.1.4) and (2.1.6) imply that there exists an integer $N$, such that:

(2.1.7) for all $n \ge N$.

Parts ii and iii of the lemma imply:

(2.1.8) for all $n \ge N$.

By putting together (2.1.2), (2.1.3) and (2.1.8) we have:

(2.1.9) for all $n \ge N$.

This completes the proof of the first part of v.
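As a reading aid, the chain of inequalities that this half of the argument appears to rely on can be sketched as follows. The exact form of the displays (2.1.2), (2.1.5) and (2.1.7) to (2.1.9), including which inequalities are strict, is an assumption made here for illustration; only (2.1.3), (2.1.4) and (2.1.6) are taken from the text.

```latex
\begin{align*}
& x < F^{-1}(t_0) < y, \qquad y - x < \varepsilon
   && \text{(assumed form of (2.1.2), together with (2.1.3))} \\
& F(x) < t_0 < F(y)
   && \text{((2.1.4) and (2.1.6))} \\
& F_n(x) < t_0 < F_n(y) \quad \text{for all } n \ge N
   && \text{(assumed form of (2.1.7))} \\
& x \le F_n^{-1}(t_0) \le y \quad \text{for all } n \ge N
   && \text{(assumed form of (2.1.8))} \\
& \bigl| F_n^{-1}(t_0) - F^{-1}(t_0) \bigr| < \varepsilon \quad \text{for all } n \ge N
   && \text{(assumed form of (2.1.9))}
\end{align*}
```

Under this reading the last line follows because both $F_n^{-1}(t_0)$ and $F^{-1}(t_0)$ lie in the interval $[x, y]$, whose length is less than $\varepsilon$.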

To prove the other half, observe first that the set of continuity points of $F^{-1}$ is dense in $(0,1)$. Hence, given $\varepsilon > 0$, and a continuity point $x_0$ of $F$, consider the possibilities:

a) $0 < F(x_0) < 1$. Choose two continuity points, $t_1$ and $t_2$, of $F^{-1}$, such that: $t_1 < F(x_0) < t_2$ and $t_2 - t_1 < \varepsilon$.

b) If $F(x_0) = 0$ or $F(x_0) = 1$ choose, respectively, a continuity point $t_1 < \varepsilon$ or a continuity point $t_2 > 1 - \varepsilon$.

The rest of the proof is analogous. □

Remark. It follows from part iii) of the lemma that if $U$ is a uniform r.v. on $[0,1]$ and $F$ is any d.f., the r.v. $F^{-1}(U)$ has d.f. $F$. Hence, it can be shown that if $X$ is a r.v. with a continuous d.f. $F$, there exists a uniform r.v. $U$ such that: $F^{-1}(U) = X$ a.e. In fact, since $F$ is continuous, $F(X)$ is uniformly distributed and putting $U = F(X)$ the result follows from part i) of the lemma and the fact that $X$ and $F^{-1}(F(X))$ have the same distribution.

THEOREM 2.1.2. Let $X$ be a r.v., with d.f. $F$, defined in a non-atomic probability space $(\Omega, \mathcal{A}, P)$. Then, there exists in $(\Omega, \mathcal{A}, P)$ a uniformly distributed r.v. $U$, such that: $F^{-1}(U) = X$ a.e.

Remark on notation. For any d.f. $F$ and $x \in R^1$, we write: $F(x-) = \lim_{t \uparrow x} F(t)$. Similarly for any bivariate d.f. $G$, we write:

(2.1.10) $G(x-, y) = \lim_{t \uparrow x} G(t, y)$.

Proof of the theorem. Choose a uniformly distributed r.v. $Z$ in $(\Omega, \mathcal{A}, P)$. (The existence of $Z$ is assured by the Corollary to Theorem 2.1.1.) Denote by $G$ the joint d.f. of $X$ and $Z$. For every pair of real numbers, $(x, z)$, define:

(2.1.11) $H(x,z) = F(x-) + G(x,z) - G(x-,z)$.

Now, define for $\omega \in \Omega$:

(2.1.12) $U(\omega) = H(X(\omega), Z(\omega))$.

It follows that:

(2.1.13) $F(X(\omega)-) \le U(\omega) \le F(X(\omega))$ all $\omega \in \Omega$.

We claim that the r.v. $U$ satisfies the requirements of the theorem. Recall that to show that $U$ is uniformly distributed, it will be enough to show that:

(2.1.14) $P(\{\omega: U(\omega) < t\}) \le t \le P(\{\omega: U(\omega) \le t\})$ all $t \in (0,1)$.

Let $t$, $0 < t < 1$, be given. Put: $x_0 = F^{-1}(t)$. By part i) of Lemma 2.1.1, we have:

(2.1.15) $F(x_0) \ge t$.

We first prove that:

(2.1.16) $\{\omega: U(\omega) < t\} \subset \{\omega: X(\omega) \le x_0\}$.

In fact, suppose that $\omega_0$ is such that: $U(\omega_0) < t$. It follows, from (2.1.13), that: $F(X(\omega_0)-) < t$. Now, observe that for any real $y > x_0$, we have: $F(y-) \ge F(x_0) \ge t$. The result follows.

We now consider, separately, the two possible cases in (2.1.15).

Case 1. $F(x_0) = t$. First note that $X(\omega) \le x_0$, together with (2.1.13), implies:

$U(\omega) \le F(X(\omega)) \le F(x_0) = t$.

Hence, we have

(2.1.17) $\{\omega: X(\omega) \le x_0\} \subset \{\omega: U(\omega) \le t\}$.

(2.1.16) and (2.1.17) imply:

$P(\{\omega: U(\omega) < t\}) \le F(x_0) \le P(\{\omega: U(\omega) \le t\})$.

Since $F(x_0) = t$, (2.1.14) follows.

Case 2. $F(F^{-1}(t)) = F(x_0) > t$. This assumption and the definition of $F^{-1}$ imply: $F(x_0-) \le t$. Hence:

(2.1.18) $F(x_0-) \le t < F(x_0)$.

Now, observe that $G(x_0,z) - G(x_0-,z)$ is, for $z \in [0,1]$, a continuous function of $z$, whose maximum is $F(x_0) - F(x_0-)$ and whose minimum is zero. Since a continuous function, defined in a compact set, assumes all the values between its maximum and its minimum, it follows from (2.1.18), that there exists $z_0 \in [0,1]$, such that:

(2.1.19) $F(x_0-) + G(x_0,z_0) - G(x_0-,z_0) = t$.

We will show that:

(2.1.20) $\{\omega: U(\omega) < t\} \subset \{\omega: X(\omega) < x_0\} \cup \{\{\omega: X(\omega) = x_0\} \cap \{\omega: Z(\omega) \le z_0\}\} \subset \{\omega: U(\omega) \le t\}$.

To prove the first half of (2.1.20) assume that $\omega_0$ is such that: $U(\omega_0) < t$. It follows by (2.1.16), that $X(\omega_0) \le x_0$. If $X(\omega_0) < x_0$, we are done. Suppose: $X(\omega_0) = x_0$. Hence, from the definition of $U$, we have:

$U(\omega_0) = F(x_0-) + G(x_0, Z(\omega_0)) - G(x_0-, Z(\omega_0))$.

From the assumption $U(\omega_0) < t$ and (2.1.19), we have

(2.1.21) $G(x_0, Z(\omega_0)) - G(x_0-, Z(\omega_0)) < G(x_0, z_0) - G(x_0-, z_0)$.

Observe now that: $G(x_0,z) - G(x_0-,z) = P(\{\omega: X(\omega) = x_0\} \cap \{\omega: Z(\omega) \le z\})$ is increasing with $z$, and hence (2.1.21) implies:

(2.1.22) $Z(\omega_0) < z_0$.

The first half of (2.1.20) follows.

To complete the proof of (2.1.20), let $\omega_1$ be such that: $X(\omega_1) \le x_0$. If $X(\omega_1) < x_0$, since $x_0 = F^{-1}(t)$, we have, by part i) of Lemma 2.1.1, that:

(2.1.23) $F(X(\omega_1)) < t$.

(2.1.13) and (2.1.23) imply: $U(\omega_1) < t$ and we are done. Suppose now, that: $X(\omega_1) = x_0$ and $Z(\omega_1) \le z_0$. We have:

$U(\omega_1) = F(x_0-) + G(x_0, Z(\omega_1)) - G(x_0-, Z(\omega_1)) \le F(x_0-) + G(x_0, z_0) - G(x_0-, z_0)$.

From (2.1.19), the expression on the right hand side is equal to $t$ and the proof of (2.1.20) is completed.

Using (2.1.20) we can write:

$P(\{\omega: U(\omega) < t\}) \le F(x_0-) + G(x_0, z_0) - G(x_0-, z_0) \le P(\{\omega: U(\omega) \le t\})$.

(2.1.14) now follows from (2.1.19) and the proof that $U$ has the uniform distribution is complete.

To finish the proof of the theorem, observe that $F^{-1}(U)$ is a r.v. whose d.f. is $F$. On the other hand, from (2.1.13) and part i) of Lemma 2.1.1, we have:

(2.1.24) $F^{-1}(U) \le X$.

Since $F^{-1}(U)$ and $X$ have the same distribution, (2.1.24) implies: $F^{-1}(U) = X$ a.e. □

Remarks. 1) In the case that $F$ is continuous, the theorem reduces to the situation discussed in the remark following Lemma 2.1.1.

2) The r.v. $U$ is clearly not unique and the role played by $Z$, in the construction of $U$, would have been fulfilled by any r.v., with a continuous d.f., taking its values on a compact set.
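The construction (2.1.11) and (2.1.12) can be illustrated by a minimal sketch under two additional assumptions that the theorem does not require: $X$ is discrete with finitely many atoms, and $Z$ is independent of $X$. In that case $G(x,z) - G(x-,z) = P(X = x)\,z$ for $z \in [0,1]$, so that $U = F(X-) + P(X = x)\,Z$ on the event $\{X = x\}$. The atoms, masses and function names below are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

ATOMS = [0.0, 1.0, 2.0]          # hypothetical values taken by X
MASSES = [0.25, 0.25, 0.5]       # hypothetical P(X = atom)
CUM = list(accumulate(MASSES))   # CUM[k] = F(ATOMS[k])


def quantile(t: float) -> float:
    """F^{-1}(t) = inf{x : F(x) >= t} for 0 < t <= 1."""
    return ATOMS[bisect_left(CUM, t)]


def randomized_transform(k: int, z: float) -> float:
    """U = F(x-) + P(X = x) * z for x = ATOMS[k], assuming Z independent of X."""
    left_limit = CUM[k - 1] if k > 0 else 0.0   # F(x-)
    return left_limit + MASSES[k] * z


def recovers_x(k: int, z: float) -> bool:
    """Check F^{-1}(U) = X at the sample point (X, Z) = (ATOMS[k], z)."""
    return quantile(randomized_transform(k, z)) == ATOMS[k]
```

On the event $\{X = x\}$ the variable $U$ ranges over the interval from $F(x-)$ to $F(x)$, and these intervals tile $(0,1)$, which is why $U$ is uniform in this special case. The identity $F^{-1}(U) = X$ holds whenever $Z > 0$ and can fail only on the null event $\{Z = 0\}$, in agreement with the "a.e." in the theorem.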

THEOREM 2.1.3. Let $\{X_n\}$ be a sequence of r.v.'s defined in a non-atomic probability space $(\Omega, \mathcal{A}, P)$. Suppose that the corresponding sequence, $\{F_n\}$, of d.f.'s is weakly convergent to a d.f. $F$. Then, there exists in $(\Omega, \mathcal{A}, P)$ a sequence, $\{Y_n\}$, of identically distributed r.v.'s, with d.f. $F$, such that the sequence $Z_n = X_n - Y_n$ converges in probability to zero as $n$ tends to infinity.

Proof. We first prove that if $U$ is any uniformly distributed r.v., $F_n^{-1}(U)$ converges to $F^{-1}(U)$ a.e. In fact, by part v) of Lemma 2.1.1, the weak convergence of $F_n$ to $F$ implies the convergence of $F_n^{-1}$ to $F^{-1}$ at all points of continuity of $F^{-1}$. Since the set $D$ of discontinuities of $F^{-1}$ is, at most, countable it follows that: $P(\{\omega: U(\omega) \in D\}) = 0$. Therefore, $F_n^{-1}(U)$ converges to $F^{-1}(U)$ a.e.

By the previous theorem, to each r.v. $X_n$ we can associate a uniformly distributed r.v. $U_n$, such that:

(2.1.25) $F_n^{-1}(U_n) = X_n$ a.e. all $n \ge 1$.

Define:

(2.1.26) $Y_n = F^{-1}(U_n)$ all $n \ge 1$.

Clearly, the $Y_n$'s are identically distributed with d.f. $F$. Furthermore, observe that for fixed $n$, the distribution of $F_n^{-1}(U_n) - F^{-1}(U_n)$ depends only on the distribution of $U_n$ and is, therefore, the same as the distribution of $F_n^{-1}(U) - F^{-1}(U)$. Since the latter sequence converges to zero a.e. it follows that $X_n - Y_n = F_n^{-1}(U_n) - F^{-1}(U_n)$ converges to zero, in probability, as $n$ tends to infinity. □

Remark. In the case where all the $X_n$'s have continuous d.f.'s, we have, for all $n \ge 1$, $U_n = F_n(X_n)$. Hence, each $Y_n$ can be explicitly determined as a Borel measurable function of $X_n$ alone, i.e.:

$Y_n = F^{-1}(F_n(X_n))$ all $n \ge 1$.
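The explicit formula $Y_n = F^{-1}(F_n(X_n))$ of the preceding remark can be tried out on a simple family. The sketch below assumes, purely for illustration, that $F_n$ is the normal d.f. with mean $1/n$ and unit variance and that the limit $F$ is the standard normal d.f.; in that case the formula reduces to $Y_n = X_n - 1/n$, so that $|X_n - Y_n| = 1/n$. It uses `statistics.NormalDist` from the Python standard library; the function names are hypothetical.

```python
from statistics import NormalDist

LIMIT = NormalDist(mu=0.0, sigma=1.0)   # limit d.f. F (hypothetical choice)


def coupled_value(x: float, n: int) -> float:
    """Y_n = F^{-1}(F_n(x)) at a point where X_n = x, with F_n = N(1/n, 1)."""
    f_n = NormalDist(mu=1.0 / n, sigma=1.0)
    return LIMIT.inv_cdf(f_n.cdf(x))


def gap(x: float, n: int) -> float:
    """|X_n - Y_n| at a point where X_n takes the value x."""
    return abs(x - coupled_value(x, n))
```

Up to rounding, `gap(x, n)` equals `1 / n` for every moderate value of `x`, so in this example $X_n - Y_n$ converges to zero uniformly and not only in probability.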

2.2. The atomic case.

We will now consider sequences of r.v.'s defined in an atomic probability space. Many of the basic definitions and results, used in this chapter, will again be needed in Chapters Three and Four in the more general context of separable metric spaces. We introduce them here, in their particular versions for $R^1$, in order to make this chapter complete and self contained.

Definition 2.2.1. Let $\mathcal{G}$ be a family of d.f.'s (equivalently a family of probability measures on $(R^1, \mathcal{B})$). We call $\mathcal{G}$ relatively compact if every sequence of elements of $\mathcal{G}$ contains a subsequence which is weakly convergent to a d.f. (not necessarily in $\mathcal{G}$).

Definition 2.2.2. A family $\mathcal{G}$ of d.f.'s is said to be tight if for every $\varepsilon > 0$, there exists a closed, finite interval, $[a,b]$, such that:

(2.2.1) $F(b) - F(a) > 1 - \varepsilon$ all $F \in \mathcal{G}$.

THEOREM 2.2.1. (Prohorov) A family $\mathcal{G}$ of d.f.'s is relatively compact if and only if it is tight.

For the proof and a discussion of the implications of this result see e.g. Billingsley [1968, pages 35-40].
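Condition (2.2.1) can be explored numerically on two simple families. The sketch below is an illustration under assumptions that are not in the text: the first family consists of normal d.f.'s with mean 0 and standard deviation $1 + 1/n$, the second of normal d.f.'s with mean $n$ and unit variance, and only the first fifty members of each are examined. A finite check of this kind does not prove tightness; it only shows how the mass on a fixed interval behaves.

```python
from statistics import NormalDist


def mass_on_interval(dist: NormalDist, a: float, b: float) -> float:
    """F(b) - F(a) for the d.f. F of `dist`, as in (2.2.1)."""
    return dist.cdf(b) - dist.cdf(a)


# Hypothetical families, restricted to finitely many members for the check.
spreading = [NormalDist(0.0, 1.0 + 1.0 / n) for n in range(1, 51)]   # mass stays near 0
drifting = [NormalDist(float(n), 1.0) for n in range(1, 51)]         # mass moves to +infinity

# Smallest mass that each family puts on the fixed interval [-5, 5].
worst_spreading = min(mass_on_interval(d, -5.0, 5.0) for d in spreading)
worst_drifting = min(mass_on_interval(d, -5.0, 5.0) for d in drifting)
```

In the first family every member keeps more than 98% of its mass on $[-5, 5]$, which is the behaviour (2.2.1) asks for uniformly over the family. In the second family the mass on any fixed interval becomes negligible as $n$ grows, so no single interval can work for all members.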

THEOREM 2.2.2. Let $\{X_n\}$ be a sequence of random variables defined in an atomic probability space. If the corresponding sequence, $\{F_n\}$, of d.f.'s is tight, then every subsequence of $\{X_n\}$ has a further subsequence which converges a.e.

Proof. Since every subsequence of a tight sequence of d.f.'s is itself a tight sequence it is enough to show that $\{X_n\}$ has an a.e. convergent subsequence. Let us denote by $\{A_i\}_{i \ge 1}$ the atoms of $(\Omega, \mathcal{A}, P)$ and write:

$p_i = P(A_i)$ all $i \ge 1$.

Definition 2.1.2 and the assumption that the space is atomic, imply:

(2.2.2) $p_i > 0$ all $i \ge 1$ and $\sum_{i=1}^{\infty} p_i = 1$.

Furthermore, for every fixed $i \ge 1$ there exists a numerical sequence $\{x_{ni}\}_{n \ge 1}$ such that:

(2.2.3) $X_n(\omega) = x_{ni}$ for all $n \ge 1$ and almost all $\omega \in A_i$.

We now prove that, for every fixed $i \ge 1$, $\{x_{ni}\}_{n \ge 1}$ is a bounded sequence of real numbers. In fact, since $p_i > 0$, we can choose $\varepsilon$, such that:

(2.2.4) $0 < \varepsilon < p_i$.

It follows from the assumption of tightness that there exists a bounded, closed interval, $[a,b]$, such that:

(2.2.5) $F_n(b) - F_n(a) > 1 - \varepsilon$ all $n \ge 1$.

On the other hand, from (2.2.3) and (2.2.4) we have:

(2.2.6) $P(\{\omega: X_n(\omega) = x_{ni}\}) \ge p_i > \varepsilon$ all $n \ge 1$.

It is clear that if for some $n_0 \ge 1$, we had $x_{n_0 i} \notin [a,b]$, we would have a contradiction. It follows that $\{x_{ni}\}_{n \ge 1}$ is a bounded sequence of real numbers, for all $i \ge 1$.

The construction of an a.e. convergent subsequence can now be accomplished by the diagonal procedure as used, for example, in the proof of the theorem of Helly-Bray. Equivalently, we can say that the existence of such a subsequence is a consequence of the criterion for the relative compactness of subsets of $R^{\infty}$ (see, e.g., Billingsley [1968, page 219]). □

We now recall that, in Chapter I, we introduced the notation $\rho$ for the metric corresponding to convergence in probability.

In any metric space $(S, d)$, for $s \in S$ and $A$ a subset of $S$, we write $d(s, A)$ for the distance between $s$ and $A$, which is defined to be:

$d(s, A) = \inf_{t \in A} d(s, t)$.

THEOREM 2.2.3. Let $\{X_n\}$ be a sequence of r.v.'s defined in an atomic probability space $(\Omega, \mathcal{A}, P)$. Suppose that the corresponding sequence, $\{F_n\}$, of d.f.'s is weakly convergent to a d.f. $F$. Then, if we denote by $\mathcal{G}$ the family of r.v.'s in $(\Omega, \mathcal{A}, P)$ whose d.f. is $F$, we have:

i) $\mathcal{G}$ is not empty.

ii) $\rho(X_n, \mathcal{G})$ converges to zero as $n$ tends to infinity.

Proof. Clearly, every subsequence of $\{F_n\}$ is weakly convergent to the same limit $F$. Hence, $\{F_n\}$ is relatively compact and therefore, by Theorem 2.2.1, $\{F_n\}$ is tight. By the previous theorem, there exists a subsequence $\{X_{n'}\}$ of $\{X_n\}$ which converges a.e. It follows that: $X(\omega) = \lim_{n'} X_{n'}(\omega)$ is a r.v. on $(\Omega, \mathcal{A}, P)$, whose d.f. is $F$ and that implies that $\mathcal{G}$ is not empty.

To prove ii) let $\{X_{n'}\}$ be any subsequence of $\{X_n\}$. Since the corresponding sequence, $\{F_{n'}\}$, of d.f.'s is again tight it follows, by the previous theorem, that we can find a subsequence, $\{X_{n''}\}$, which converges a.e. and hence, in probability to a r.v. $Y \in \mathcal{G}$. Hence:

(2.2.7) $\rho(X_{n''}, \mathcal{G}) \le \rho(X_{n''}, Y) \to 0$ as $n'' \to \infty$.

This shows that any subsequence of the numerical sequence $\{\rho(X_n, \mathcal{G})\}$ contains a further subsequence which converges to zero. The result follows.

Corollary. In the same conditions, as in the above theorem, there exists on $(\Omega, \mathcal{A}, P)$ a sequence, $\{Y_n\}$, of identically distributed r.v.'s, with d.f. $F$, such that $X_n - Y_n$ converges to zero, in probability, as $n$ tends to infinity.

Proof. Let $\{a_n\}$ be a strictly positive numerical sequence, such that: $\lim_{n \to \infty} a_n = 0$. For each $n \ge 1$, there exists $Y_n \in \mathcal{G}$ such that:

(2.2.8) $\rho(X_n, Y_n) < \rho(X_n, \mathcal{G}) + a_n$.

Furthermore, it is clear from the definition of $\rho$ that:

$\rho(X_n, Y_n) = \rho(X_n - Y_n, 0)$.

By the previous theorem and the assumption about $\{a_n\}$ it follows that:

$\rho(X_n - Y_n, 0) \to 0$ as $n \to \infty$.

The result now follows from the equivalence between $\rho$-convergence and convergence in probability.

Remarks. 1) In the first comment that followed Theorem 1.2.3 of Chapter I, we introduced the concept of an admissible probability measure for a probability space. In the real line we will talk about admissible d.f.'s and part i) of the previous theorem says that the class of admissible d.f.'s, for an atomic space, is closed under weak convergence. This result may seem contradictory, if we consider the fact that there are sequences, $\{F_n\}$, of purely discrete d.f.'s, which converge weakly to a continuous d.f. $F$. Of course, the answer to this apparent contradiction lies in the fact that no sequence of r.v.'s, with those d.f.'s, can be defined in an atomic space.

2) Consider, now, a sequence, $\{X_n\}$, of identically distributed Bernoulli r.v.'s, with probability $p$ of success, $0 < p < 1$. Of course, the corresponding sequence of d.f.'s is admissible for any probability space which contains an event with probability $p$. However, if we require the $X_n$'s to be independent it follows, from the above results and the central limit theorem, that such a sequence of r.v.'s cannot be defined on an atomic space. Since independence plays an important role in the construction of such examples, it is probably worthwhile to consider the problem of the existence of sequences of independent r.v.'s on atomic spaces.

Renyi [1970, pages 167-168] shows that there are atomic probability spaces where non-trivial sequences of independent events exist, but our freedom to choose the values of their probabilities is severely restricted by the atomic structure. In the theorem below, we rephrase

Renyi's result in a way which is more convenient for our purposes.

THEOREM 2.2.4. (Renyi) Let $\{B_n\}$ be a sequence of independent events on an atomic probability space $(\Omega, \mathcal{A}, P)$. Let $q_n = P(B_n)$, $n \ge 1$. Then, we have:

(2.2.9) $\sum_{n=1}^{\infty} \min(q_n, 1 - q_n) < \infty.$

It is well-known that, in atomic spaces, convergence in probability

and convergence a.e. are equivalent (see, e.g., Neveu [1965, page 48]).

We now use Renyi's result to show that, with the assumption of indepen-

dence, convergence in law is also equivalent to the other two.

THEOREM 2.2.5. Let $\{X_n\}$ be a sequence of independent r.v.'s defined on an atomic probability space. If the corresponding sequence of d.f.'s is weakly convergent to a d.f. $F$, then $F$ is degenerate and the $X_n$'s converge a.e.

Proof. Let $x$ denote a continuity point of $F$. The events:

$B_n = \{\omega : X_n(\omega) \le x\}$

form a sequence of independent events. Put $q_n = P(B_n)$, all $n \ge 1$. From the assumption of weak convergence, we have:

(2.2.10) $\lim_{n \to \infty} q_n = q$, $0 \le q \le 1.$

From (2.2.9) it follows that:

(2.2.11) $\lim_{n \to \infty} \min[q_n, (1 - q_n)] = 0.$

But,

$\min[q_n, (1 - q_n)] = \tfrac{1}{2}\,(1 - |2q_n - 1|)$, all $n \ge 1.$

From (2.2.10) and (2.2.11), we have:

$1 - |2q - 1| = 0.$

It follows that $q$ is either zero or one, which shows that $F$ is degenerate. Since convergence in law to a degenerate d.f. implies convergence in probability and the latter is, in atomic spaces, equivalent to convergence a.e., the proof is complete. □
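The last step of the proof rests on the elementary identity $\min(q, 1-q) = \tfrac{1}{2}(1 - |2q - 1|)$ for $0 \le q \le 1$. The following is a minimal sketch, not part of the original argument, that checks the identity on a few exact rational values and confirms that the expression vanishes only at $q = 0$ and $q = 1$. The function name and the sample grid are illustrative assumptions.

```python
from fractions import Fraction

def min_via_abs(q):
    """Closed form of min(q, 1 - q) for q in [0, 1]."""
    return (1 - abs(2 * q - 1)) / 2

# Exact rational sample points in [0, 1] (an illustrative grid).
samples = [Fraction(k, 8) for k in range(9)]

# The closed form agrees with min(q, 1 - q) at every sample point.
agrees = all(min_via_abs(q) == min(q, 1 - q) for q in samples)

# Among the samples, the expression vanishes only at the endpoints.
zeros = [q for q in samples if min_via_abs(q) == 0]
```

Passing to the limit in the identity is what turns (2.2.11) into the condition $1 - |2q - 1| = 0$, that is, $q \in \{0, 1\}$.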

2.3. The general case.

In this section we discuss the problem in general probability spaces, that is, spaces which have both an atomic and a non-atomic part. If, as in Definition 2.1.3, we denote by $A_0$ the union of all the atoms of $(\Omega, \mathcal{A}, P)$ we have:

(2.3.1) $0 < P(A_0) < 1.$

In order to reduce the problem to the situations discussed in the previous sections, we introduce two new probability measures, $P_1$ and $P_2$, on the measurable space $(\Omega, \mathcal{A})$. For all $B \in \mathcal{A}$ define:

(2.3.2) $P_1(B) = P(B \mid A_0) = \dfrac{P(B \cap A_0)}{P(A_0)}$

(2.3.3) $P_2(B) = P(B \mid A_0^c) = \dfrac{P(B \cap A_0^c)}{P(A_0^c)}$

Because of (2.3.1), $P_1$ and $P_2$ are both well defined and they are, clearly, probability measures on $(\Omega, \mathcal{A})$. Furthermore, it can be easily checked that the probability spaces $(\Omega, \mathcal{A}, P_1)$ and $(\Omega, \mathcal{A}, P_2)$ are atomic and non-atomic respectively.

Some remarks on notation. Since we now have three different probability measures on the same space, we will, whenever confusion is possible, state in parentheses the measure under consideration at that moment. So far, we have considered r.v.'s as being defined on probability spaces, since we do not distinguish between two real valued measurable functions defined on $(\Omega, \mathcal{A})$ which differ only on a set of $P$-measure zero. In this section, each real valued, measurable function defined on $(\Omega, \mathcal{A})$ determines three different r.v.'s. Observe also that any element of the $P$-equivalence class of $X$ determines the two other corresponding r.v.'s in $(\Omega, \mathcal{A}, P_1)$ and $(\Omega, \mathcal{A}, P_2)$, since both $P_1$ and $P_2$ are absolutely continuous with respect to $P$. The converse is, of course, not true. Finally, if $F$ denotes the d.f. of $X$ with respect to $P$, we will write $F^{(1)}$ and $F^{(2)}$ for the d.f.'s of $X$ with respect to $P_1$ and $P_2$ respectively.

It follows, from (2.3.2) and (2.3.3), that:

(2.3.4) $F(x) = P(A_0)\, F^{(1)}(x) + P(A_0^c)\, F^{(2)}(x)$, all real $x$.

The above remarks suggest that a way to deal with questions, concerning

r.v.'s in a general probability space, is to try to solve the problem,

separately, in the associated atomic and non-atomic spaces. If solu-

tions can be found, in both cases, we hope that a suitable combination

of them will provide an answer for the original question.

In our problem, however, this approach will not allow us to apply directly the theorems of the previous sections. This is so because, as the following example shows, the weak convergence of the $\{F_n\}$ does not necessarily imply the weak convergence of $\{F_n^{(1)}\}$ and $\{F_n^{(2)}\}$.

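Example 2.3.1. Let $\Omega$ be the closed interval $[0,1]$, $\mathcal{B}$ the $\sigma$-field of its Borel sets and $m$ be the Lebesgue measure. Let $\mathcal{A}$ be the $\sigma$-field generated by the interval $(\tfrac{1}{2}, 1]$ and all Borel subsets of $[0, \tfrac{1}{2}]$. $\mathcal{A}$ is clearly a sub $\sigma$-field of $\mathcal{B}$ and if $P$ denotes the restriction of $m$ to $\mathcal{A}$, $(\Omega, \mathcal{A}, P)$ is a probability space whose unique atom is $(\tfrac{1}{2}, 1]$. Define, for all $n \ge 1$:

$X_{2n}(\omega) = \begin{cases} 0 & \text{if } \omega \in (\tfrac{1}{2}, 1] \\ 1 & \text{if } \omega \in [0, \tfrac{1}{2}] \end{cases}$

$X_{2n-1}(\omega) = \begin{cases} 0 & \text{if } \omega \in [0, \tfrac{1}{2}] \\ 1 & \text{if } \omega \in (\tfrac{1}{2}, 1] \end{cases}$

Since the $X_n$'s are, with respect to $P$, identically distributed it follows that the $F_n$'s form a constant sequence and are, therefore, weakly convergent. However, for each $x$, $0 < x < 1$, both $\{F_n^{(1)}(x)\}$ and $\{F_n^{(2)}(x)\}$ are oscillating sequences of zeros and ones, which shows that neither one of the sequences, $\{F_n^{(1)}\}$ or $\{F_n^{(2)}\}$, converges weakly.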
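The oscillation described in Example 2.3.1 can be checked with a few lines of exact arithmetic. The sketch below is only an illustration under stated assumptions: each $X_n$ is represented by its value on the atom $(\tfrac{1}{2}, 1]$ and its value on $[0, \tfrac{1}{2}]$, and the odd-indexed variables are assumed to take the value 1 on the atom and 0 elsewhere (the reading consistent with the conclusions of the example). The function names are hypothetical.

```python
from fractions import Fraction

P_A0 = Fraction(1, 2)  # P-measure of the atom A0 = (1/2, 1]

def x(n):
    """X_n as a pair (value on the atom A0, value on the non-atomic part)."""
    return (0, 1) if n % 2 == 0 else (1, 0)

def df_P(n, t):
    """F_n(t): d.f. of X_n with respect to P."""
    on_atom, off_atom = x(n)
    return P_A0 * (on_atom <= t) + (1 - P_A0) * (off_atom <= t)

def df_P1(n, t):
    """F_n^(1)(t): d.f. of X_n with respect to P1 = P( . | A0)."""
    return Fraction(int(x(n)[0] <= t))

def df_P2(n, t):
    """F_n^(2)(t): d.f. of X_n with respect to P2 = P( . | A0^c)."""
    return Fraction(int(x(n)[1] <= t))

t = Fraction(1, 2)  # any point with 0 < t < 1 behaves the same way
seq_P = [df_P(n, t) for n in range(1, 7)]    # constant sequence
seq_P1 = [df_P1(n, t) for n in range(1, 7)]  # alternates between 0 and 1
seq_P2 = [df_P2(n, t) for n in range(1, 7)]  # alternates between 1 and 0
```

In this sketch the three d.f.'s also satisfy the mixture relation (2.3.4) for every $n$, even though only $\{F_n\}$ converges.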


The next two lemmas will show how to overcome this difficulty. The

first one shows that, although weak convergence may be lost in the de-

composition of the space, tightness is preserved.

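Lemma 2.3.1. Let $\{X_n\}$ be a sequence of r.v.'s on the probability space $(\Omega, \mathcal{A}, P)$. Suppose that the corresponding sequence $\{F_n\}$ of d.f.'s is tight. Then, both $\{F_n^{(1)}\}$ and $\{F_n^{(2)}\}$ are tight.

Proof. We prove the result for $\{F_n^{(1)}\}$, the other case being totally analogous. Let $\varepsilon > 0$, arbitrary, be given. Since $\{F_n\}$ is tight, there exists a closed, finite interval $[a,b]$ such that:

(2.3.5) $P(\{\omega : X_n(\omega) \in [a,b]\}) > 1 - \varepsilon P(A_0)$, all $n \ge 1.$

Clearly, for any 2 sets $B \in \mathcal{A}$ and $C \in \mathcal{A}$, we have:

(2.3.6) $P[B \cap C] \ge P[B] + P[C] - 1.$

Hence, it follows that:

$P(\{\omega : X_n(\omega) \in [a,b]\} \cap A_0) > (1 - \varepsilon)\, P(A_0)$, all $n \ge 1.$

Hence, by (2.3.2):

$P_1(\{\omega : X_n(\omega) \in [a,b]\}) = \dfrac{P(\{\omega : X_n(\omega) \in [a,b]\} \cap A_0)}{P(A_0)} > 1 - \varepsilon$, all $n \ge 1.$

The result follows.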

Observations. 1. We choose to present a direct proof of the lemma

above because it is simple and depends only on the definitions. A

shorter proof can be obtained from the fact that both P1 and P2 are

absolutely continuous with respect to P.


2. Note that the tightness of $\{F_n^{(1)}\}$ or $\{F_n^{(2)}\}$ is not sufficient for $\{F_n\}$ to be tight. However, if both are tight, $\{F_n\}$ is tight.

Lemma 2.3.2. Let $\{X_n\}$ be a sequence of r.v.'s in the probability space $(\Omega, \mathcal{A}, P)$. Then, the weak convergence of any two of the sequences $\{F_n\}$, $\{F_n^{(1)}\}$, $\{F_n^{(2)}\}$ implies the weak convergence of the third. Furthermore, if the weak limits are, respectively, $F$, $F^{(1)}$ and $F^{(2)}$, (2.3.4) is true.

Proof. It is enough to show the result for one pair of sequences, the proof for the other two being totally analogous. Assume that $\{F_n\}$ is weakly convergent to $F$ and $\{F_n^{(1)}\}$ is weakly convergent to $F^{(1)}$. From (2.3.4) we have, for all real $x$ and all $n \ge 1$:

(2.3.7) $F_n(x) = P(A_0)\, F_n^{(1)}(x) + P(A_0^c)\, F_n^{(2)}(x).$

Let $D_1$ denote the set of points in the real line which are continuity points of both $F$ and $F^{(1)}$. Since $D_1^c$ is countable, it follows that $D_1$ is dense in the line. It follows from our assumptions and from (2.3.7) that $\{F_n^{(2)}\}$ converges at every point of $D_1$. Since $D_1$ is dense and $\{F_n^{(2)}\}$ is tight it follows that $\{F_n^{(2)}\}$ is weakly convergent to a d.f. $F^{(2)}$. (2.3.7) now implies the validity of (2.3.4) for all real $x$ which is a continuity point of the three d.f.'s $F$, $F^{(1)}$ and $F^{(2)}$. Since the set of such $x$'s is again dense in the line (it has a countable complement), the validity of (2.3.4), for all real $x$, follows and the proof is complete.

We are now in a position to extend the results of the previous

sections to general probability spaces.

THEOREM 2.3.1. Let $\{X_n\}$ be a sequence of r.v.'s defined in the probability space $(\Omega, \mathcal{A}, P)$ and suppose that the sequence $\{F_n\}$ of d.f.'s is weakly convergent to a d.f. $F$. Then, if $\mathcal{G}$ denotes the class of r.v.'s on $(\Omega, \mathcal{A}, P)$ whose d.f. is $F$, we have:

i) $\mathcal{G}$ is not empty

ii) $\rho(X_n, \mathcal{G}) \to 0$ as $n \to \infty$.

Proof. Since $\{F_n\}$ is tight it follows, from Lemma 2.3.1, that both $\{F_n^{(1)}\}$ and $\{F_n^{(2)}\}$ are tight. Let $\{F_{n'}^{(1)}\}$ denote a weakly convergent subsequence of $\{F_n^{(1)}\}$ and let $G$ be its limit. The sequence $\{X_{n'}\}$ satisfies the conditions of Lemma 2.3.2. It follows that $\{F_{n'}^{(2)}\}$ is weakly convergent to a d.f. $H$ and we have:

(2.3.8) $F(x) = P(A_0)\, G(x) + P(A_0^c)\, H(x)$, all $x \in R^1.$

The sequence $\{X_{n'}\}$, considered as a sequence of r.v.'s in $(\Omega, \mathcal{A}, P_1)$, satisfies the assumptions of Theorem 2.2.3. Hence, there exists in $(\Omega, \mathcal{A}, P_1)$ a r.v. $Y$ whose d.f. ($P_1$) is $G$. On the other hand, since $(\Omega, \mathcal{A}, P_2)$ is non-atomic we can choose a r.v. $Z$ whose d.f. ($P_2$) is $H$. Define:

(2.3.9) $X(\omega) = Y(\omega)\, I_{A_0}(\omega) + Z(\omega)\, I_{A_0^c}(\omega)$

($I_{A_0}$ is the indicator of the set $A_0$).

(2.3.2) and (2.3.3), plus some easy calculations, imply that the d.f. ($P$) of $X$ is equal to the right-hand side of (2.3.8) and that implies that $\mathcal{G}$ is not empty.

To prove ii) we will show, as in the proof of Theorem 2.2.3, that any subsequence of the numerical sequence $\{\rho(X_n, \mathcal{G})\}$ contains a further subsequence which converges to zero. Let $\{X_{n'}\}$ be any subsequence of $\{X_n\}$. Since $\{F_{n'}\}$ is a tight sequence of d.f.'s it follows by Lemma 2.3.1 that both $\{F_{n'}^{(1)}\}$ and $\{F_{n'}^{(2)}\}$ are tight. Choose a subsequence $\{F_{n''}^{(1)}\}$ which is weakly convergent to a d.f. $G$. By Lemma 2.3.2, $\{F_{n''}^{(2)}\}$ is also weakly convergent and we let $H$ denote its limit. $\{X_{n''}\}$, considered as a sequence of r.v.'s in $(\Omega, \mathcal{A}, P_1)$, satisfies the assumptions of Theorem 2.2.3 and, considered as a sequence in $(\Omega, \mathcal{A}, P_2)$, it satisfies the assumptions of Theorem 2.1.3. By the corollary to Theorem 2.2.3 there exists, on $(\Omega, \mathcal{A}, P_1)$, a sequence $\{Z_{n''}\}$ of identically distributed r.v.'s, with d.f. $G$, such that:

(2.3.10) $X_{n''} - Z_{n''} \xrightarrow{\text{Prob}} 0 \quad [P_1].$

Similarly, there exists on $(\Omega, \mathcal{A}, P_2)$ a sequence $\{U_{n''}\}$ of identically distributed r.v.'s, with d.f. $H$, such that:

(2.3.11) $X_{n''} - U_{n''} \xrightarrow{\text{Prob}} 0 \quad [P_2].$

Define, as in (2.3.9):

$Y_{n''} = Z_{n''}\, I_{A_0} + U_{n''}\, I_{A_0^c}.$

Clearly, the $Y_{n''}$'s have d.f. $F$ and it remains to be shown that $X_{n''} - Y_{n''}$ converges to zero in probability ($P$). Let $\varepsilon > 0$, arbitrary, be given.

$\{\omega : |X_{n''} - Y_{n''}| > \varepsilon\} = (\{\omega : |X_{n''} - Y_{n''}| > \varepsilon\} \cap A_0) \cup (\{\omega : |X_{n''} - Y_{n''}| > \varepsilon\} \cap A_0^c).$

Hence

$P(\{\omega : |X_{n''} - Y_{n''}| > \varepsilon\}) = P(\{\omega : |X_{n''} - Z_{n''}| > \varepsilon\} \cap A_0) + P(\{\omega : |X_{n''} - U_{n''}| > \varepsilon\} \cap A_0^c)$
$= P(A_0) \cdot P_1(\{\omega : |X_{n''} - Z_{n''}| > \varepsilon\}) + P(A_0^c) \cdot P_2(\{\omega : |X_{n''} - U_{n''}| > \varepsilon\}).$

By (2.3.10) and (2.3.11) both terms in the last sum converge to zero as $n'' \to \infty$. Hence, it follows that:

$\rho(X_{n''}, Y_{n''}) \to 0$ as $n'' \to \infty$ $[P]$.

But, we also have:

$\rho(X_{n''}, \mathcal{G}) \le \rho(X_{n''}, Y_{n''})$, all $n'' \ge 1.$

From this we have:

$\rho(X_{n''}, \mathcal{G}) \to 0$ as $n'' \to \infty.$

The result follows.

Remarks. 1. The corollary to Theorem 2.2.3 is also valid here and

the proof is similar.

2. The observations that followed Theorem 2.2.3 also apply in the

present situation since, the existence of a unique atom is enough to

prevent the admissibility of continuous d.f.'s. In particular, the

proof of Renyi's result follows without any changes and in the proof of

Theorem 2.2.5, we substitute the equivalence between the two types of

convergence (which is not true here), by a combination of (2.2.9) with

the Borel-Cantelli lemma.

CHAPTER III

GENERALIZATION TO COMPLETE, SEPARABLE METRIC SPACES

In this chapter we are going to consider sequences of random elements, defined on a probability space $(\Omega, \mathcal{A}, P)$ and taking values on $(S, \mathcal{S})$, where $S$ is a complete, separable, metric space and $\mathcal{S}$ is the $\sigma$-field of its Borel sets. As mentioned in Chapter I, we will make extensive use of the class of partitions of $S$ introduced by Skorokhod in the proof of his result (Theorem 3.2.2). The main part of Section 3.1 will be devoted to the study of the properties of this class of partitions.

3.1. The Skorokhod partition of S

Definition 3.1.1. Let $S$ be a separable metric space and $Q$ a probability measure on $(S, \mathcal{S})$. $Q$ is said to be non-atomic if $Q(\{s\}) = 0$ for all $s \in S$.

Since we have been using the concept of non-atomic in a more general context (with respect to an abstract measurable space $(\Omega, \mathcal{A})$), we decided to include here the proof of the equivalence of the two definitions on separable metric spaces.

THEOREM 3.1.1. Let $S$ be a separable metric space and $Q$ a probability measure on $(S, \mathcal{S})$. Then, a necessary and sufficient condition for $(S, \mathcal{S}, Q)$ to be a non-atomic probability space (Definition 2.1.3) is that $Q(\{s\}) = 0$ for all $s \in S$.

Proof: Suppose first that $(S, \mathcal{S}, Q)$ is a non-atomic probability space. Let $s_0$ denote an element of $S$. Since the only subsets of $\{s_0\}$ are $\{s_0\}$ itself and the empty set it follows, from Definition 2.1.2, that $\{s_0\}$ is an atom of $(S, \mathcal{S}, Q)$ if and only if $Q(\{s_0\}) > 0$. Since $(S, \mathcal{S}, Q)$ is assumed to be atom-free it follows that $Q(\{s_0\}) = 0$.

Conversely let us now assume $Q$ to be such that $Q(\{s\}) = 0$ for all $s \in S$. Let $B$ be an atom of $(S, \mathcal{S}, Q)$. We will show that these assumptions lead to a contradiction. We are going to make use of the fact that a separable metric space can, for each $n \ge 1$, be covered by a countable sequence of disjoint Borel sets, each one with diameter (diam) less than $1/n$ (see Appendix). For each $n \ge 1$, let $\{A_i^n\}$ denote the collection of Borel sets satisfying the above requirements. Therefore, we have:

(3.1.1) $B = \bigcup_{i=1}^{\infty} (A_i^n \cap B)$ for all $n \ge 1.$

Since $B$ was assumed to be an atom, there exists, for each $n \ge 1$, a unique index $i_n$ such that $Q(A_{i_n}^n \cap B) = Q(B)$. Writing $A_n = A_{i_n}^n \cap B$, we therefore have:

(3.1.4) $Q(A_n) = Q(B)$ and $Q(B - A_n) = 0.$

From (3.1.4) it follows that:

(3.1.5) $Q\left[\bigcup_{n=1}^{\infty} (B - A_n)\right] = 0.$

But, we also have:

(3.1.6) $\bigcap_{n=1}^{\infty} A_n = B - \bigcup_{n=1}^{\infty} (B - A_n).$

From (3.1.5) and (3.1.6) we conclude:

(3.1.7) $Q\left[\bigcap_{n=1}^{\infty} A_n\right] = Q(B).$

Let now $x$ and $y$ denote 2 points in $\bigcap_{n=1}^{\infty} A_n$. We can write:

(3.1.8) $d(x,y) \le \operatorname{diam}\left[\bigcap_{n=1}^{\infty} A_n\right] \le \operatorname{diam} A_n \le \operatorname{diam} A_{i_n}^n < 1/n.$

Since (3.1.8) is true for all $n \ge 1$, it follows that $d(x,y) = 0$ and hence we can conclude that $\bigcap_{n=1}^{\infty} A_n$ is either empty or a singleton. On the other hand the assumption that $B$ is an atom implies $Q(B) > 0$ and hence it follows from (3.1.7) that there exists $s_0 \in S$ such that:

(3.1.9) $\bigcap_{n=1}^{\infty} A_n = \{s_0\}.$

Hence, from (3.1.7):

$Q(\{s_0\}) = Q(B) > 0.$

The last statement contradicts the assumption that $Q(\{s\}) = 0$ for all $s \in S$, and the proof of the theorem is complete. □

Definition 3.1.2. A partition of a set $X$ is any finite, or denumerably infinite, collection of disjoint subsets of $X$ whose union is $X$. Given two partitions, $\{A_i\}$ and $\{B_j\}$, of the same set $X$ we say that $\{A_i\}$ is a refinement (or a sub-partition) of $\{B_j\}$ if each $A_i$ is a subset of some $B_j$.

Definition 3.1.3. A Skorokhod partition $\mathcal{P}$ of a separable metric space $S$ is an ordered countable collection $\mathcal{P}_1, \mathcal{P}_2, \ldots$ of partitions of $S$, each partition being a refinement of the preceding one and such that for all $k \ge 1$ the elements of $\mathcal{P}_k$ are nonempty Borel sets whose diameters are at most $(1/2)^k$.¹

Definition 3.1.4. Let $P$ be a probability measure on $(S, \mathcal{S})$. A Skorokhod partition $\mathcal{P}$ of $S$ is said to be $P$-continuous if for all $k \ge 1$ the elements of $\mathcal{P}_k$ are $P$-continuity sets.

We will show in the Appendix that for each probability measure $P$ on $(S, \mathcal{S})$ there exists a $P$-continuous Skorokhod partition of $S$.

Remarks on notation. We will adopt the same nested-type notation used by Skorokhod [1956]. The elements of $\mathcal{P}_1$ will be denoted by $\{S_{i_1} : i_1 \ge 1\}$. For each fixed value of $i_1$, $\{S_{i_1,i_2} : i_2 \ge 1\}$ denotes all elements of $\mathcal{P}_2$ which are subsets of $S_{i_1}$. In general, $\{S_{i_1,\ldots,i_{k-1},i_k} : i_k \ge 1\}$ denotes the elements of $\mathcal{P}_k$ which are subsets of the element $S_{i_1,\ldots,i_{k-1}}$ of $\mathcal{P}_{k-1}$. Hence, for all $k > 1$ and all $i_1, \ldots, i_{k-1}$ we have:

(3.1.10) $S_{i_1,\ldots,i_{k-1}} = \bigcup_{i_k} S_{i_1,\ldots,i_{k-1},i_k}.$

Furthermore, since $\mathcal{P}_1$ is a partition of $S$ it follows that:

(3.1.11) $\bigcup_{i_1} S_{i_1} = S.$

Observe that since each set $S_{i_1,\ldots,i_k}$ is required to be nonempty we may not assume without loss of generality that the range of $i_k$ is infinite. Since $S_{i_1,\ldots,i_{k-1}}$ is nonempty the range of $i_k$ contains at least the number one.

¹ The name "Skorokhod partition" involves a slight abuse of language since $\mathcal{P}$ is actually a family of partitions of $S$.

With this notation we have a one-to-one correspondence between $\mathcal{P}_m$ and a nonempty subset of the Cartesian product $N^m$, where $N$ denotes the set of natural numbers. Recall that $N^m$ can be ordered lexicographically and with this order $N^m$ is a totally ordered set.

Given any two elements $a$ and $b$ of $N^m$, we write $a \preceq b$ to indicate that $a$ precedes $b$ in the lexicographic order of $N^m$. We will write $a \prec b$ to indicate that $a$ precedes and is not equal to $b$.

Recall that a totally ordered set is said to be well-ordered if each nonempty subset has a first element. In other words, $(A, \preceq)$ is well-ordered if for each nonempty subset $B$ there exists $b_0 \in B$ satisfying $b_0 \preceq b$ for each $b \in B$. It is clear from the definition that the induced order on a subset of a well-ordered set is a well-ordering of that subset. The following two lemmas are stated here for future reference.

Lemma 3.1.1. Let $A_1$ and $A_2$ be two well-ordered sets. The Cartesian product $A_1 \times A_2$, ordered lexicographically, is a well-ordered set.

Proof: We give only a brief outline of the proof. Let $D$ be a nonempty subset of $A_1 \times A_2$. Consider the set:

$B_1 = \bigcup_{a_2 \in A_2} \{a_1 \in A_1 : (a_1, a_2) \in D\}.$

Since $D$ is nonempty it follows that $B_1$ is a nonempty subset of $A_1$. Let $a_1^0$ be the minimum element of $B_1$, which exists since $A_1$ is well-ordered. Consider now the set:

$B_2 = \{a_2 \in A_2 : (a_1^0, a_2) \in D\}.$

$B_2$ is a nonempty subset of $A_2$ and hence it has a minimum element $a_2^0$. It is now easy to show that $(a_1^0, a_2^0)$ is the minimum element of $D$ in the lexicographic order of $A_1 \times A_2$.

Lemma 3.1.2. For each $m \ge 1$, $N^m$ ordered lexicographically is a well-ordered set.

Proof: The result is trivially true for $m = 1$ since $N$ is a well-ordered set. The result follows by induction using the previous lemma.
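The two-step construction in the proof outline of Lemma 3.1.1 can be mirrored directly in code. The sketch below is only an illustration on finite sets of pairs of integers (the lemma itself concerns arbitrary nonempty subsets); the function name `lex_min` is a hypothetical helper. It first minimizes over first coordinates and then over the second coordinates attached to that minimum, and the result can be compared with Python's built-in tuple ordering, which is lexicographic.

```python
def lex_min(pairs):
    """Lexicographic minimum of a nonempty finite set of pairs, in two steps."""
    # B1: all first coordinates that occur in the set.
    b1 = {a1 for (a1, _) in pairs}
    a1_min = min(b1)
    # B2: second coordinates paired with the minimal first coordinate.
    b2 = {a2 for (a1, a2) in pairs if a1 == a1_min}
    a2_min = min(b2)
    return (a1_min, a2_min)

# Illustrative subset D of N x N.
D = {(3, 1), (2, 5), (2, 4), (7, 1)}
first = lex_min(D)  # coincides with min(D) under Python's tuple comparison
```

Iterating the same step coordinate by coordinate gives the first element of a finite subset of $N^m$, which is the induction used in Lemma 3.1.2.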

Definition 3.1.5. Let $\mathcal{P}$ be a Skorokhod partition of $S$ and $P$ a probability measure on $(S, \mathcal{S})$. For a fixed $s \in S$ and each $m \ge 1$, let $S_{k_1,\ldots,k_m}$ denote the unique element of $\mathcal{P}_m$ which contains $s$. Define:

(3.1.12) $F_1(s) = \begin{cases} 0 & \text{if } k_1 = 1 \\ \sum_{i_1=1}^{k_1-1} P(S_{i_1}) & \text{if } k_1 > 1 \end{cases}$

(3.1.13) $G_1(s) = F_1(s) + P(S_{k_1}).$

In general, for $m > 1$, define:

(3.1.14) $F_m(s) = \begin{cases} F_{m-1}(s) & \text{if } k_m = 1 \\ F_{m-1}(s) + \sum_{i_m=1}^{k_m-1} P(S_{k_1,k_2,\ldots,k_{m-1},i_m}) & \text{if } k_m > 1 \end{cases}$

(3.1.15) $G_m(s) = F_m(s) + P(S_{k_1,k_2,\ldots,k_m}).$

Observations. 1) $G_m(s)$, for $s \in S_{k_1,\ldots,k_m}$, is equal to the probability of the union of all elements $S_{i_1,\ldots,i_m}$ of $\mathcal{P}_m$ for which $(i_1,\ldots,i_m) \preceq (k_1,\ldots,k_m)$. Similarly $F_m(s)$, for $s \in S_{k_1,\ldots,k_m}$, is equal to the probability of the union of all elements $S_{i_1,\ldots,i_m}$ of $\mathcal{P}_m$ satisfying $(i_1,\ldots,i_m) \prec (k_1,\ldots,k_m)$. In symbols we have:

(3.1.16) $F_m(s) = P\left[\bigcup_{(i_1,\ldots,i_m) \prec (k_1,\ldots,k_m)} S_{i_1,\ldots,i_m}\right]$, $s \in S_{k_1,\ldots,k_m}$

(3.1.17) $G_m(s) = P\left[\bigcup_{(i_1,\ldots,i_m) \preceq (k_1,\ldots,k_m)} S_{i_1,\ldots,i_m}\right]$, $s \in S_{k_1,\ldots,k_m}.$

2) $F_m$ and $G_m$ are both measurable functions from $(S, \mathcal{S})$ to $[0,1]$ and for fixed $s$, the numerical sequences $\{F_m(s)\}$ and $\{G_m(s)\}$ are both monotone, the first increasing and the latter decreasing. Hence, for each $s \in S$, their limits exist and we put:

(3.1.18) $F(s) = \lim_{m \to \infty} F_m(s)$, $G(s) = \lim_{m \to \infty} G_m(s).$

3) Observe that if we order $\mathcal{P}_m$ lexicographically we induce in a natural way a total order on $S$. In fact, if $s_1$ and $s_2$ are two distinct points of $S$, then $d(s_1, s_2) > 0$ and hence for some $m \ge 1$, $s_1$ and $s_2$ are in different elements of $\mathcal{P}_m$, and we may say that $s_1$ precedes $s_2$ if the set containing $s_1$ precedes the set containing $s_2$. With respect to this order the functions $F$ and $G$ behave somewhat like distribution functions associated with the probability measure $P$.
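To make Definition 3.1.5 concrete, the sketch below evaluates the recursion (3.1.12)-(3.1.15) in a simple special case. Everything about the setting is an assumption made for illustration: $S = [0,1)$, the partition at level $k$ consists of the dyadic intervals of length $2^{-k}$ (each cell has two children, indexed 1 and 2 from left to right), and the measure is half Lebesgue measure plus a point mass of $\tfrac{1}{2}$ at $\tfrac{1}{3}$. The names `mass` and `f_and_g` are hypothetical.

```python
from fractions import Fraction

ATOM = Fraction(1, 3)       # location of the single atom (illustrative choice)
ATOM_MASS = Fraction(1, 2)  # mass of the atom; the rest is spread uniformly

def mass(lo, hi):
    """P([lo, hi)) for the illustrative measure."""
    return (1 - ATOM_MASS) * (hi - lo) + (ATOM_MASS if lo <= ATOM < hi else 0)

def f_and_g(s, m):
    """Return (F_m(s), G_m(s)) for the dyadic partition of [0, 1)."""
    lo, hi = Fraction(0), Fraction(1)
    f = Fraction(0)
    for _ in range(m):
        mid = (lo + hi) / 2
        if s < mid:
            # s lies in the first child (index 1): nothing is added to F.
            hi = mid
        else:
            # s lies in the second child (index 2): add the first child's mass.
            f += mass(lo, mid)
            lo = mid
    # G_m(s) adds the mass of the level-m cell that contains s.
    return f, f + mass(lo, hi)

f_atom, g_atom = f_and_g(ATOM, 20)             # G - F stays above the atom's mass
f_reg, g_reg = f_and_g(Fraction(1, 4), 10)     # G - F shrinks to zero off the atom
```

In this example $G_m(s) - F_m(s)$ decreases to $P(\{s\})$ as $m$ grows, which is the behaviour one expects from the left and right limits of a distribution function.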

Lemma 3.1.3. Let $S$ be a separable metric space, $\mathcal{P}$ a Skorokhod partition of $S$ and $P$ a probability measure on $(S, \mathcal{S})$. Then, the functions $F$ and $G$ satisfy the relation:

(3.1.19) $G(s) = F(s) + P(\{s\})$, all $s \in S.$

Hence, if $P$ is non-atomic, $F(s) = G(s)$ for all $s \in S$.

Proof: From (3.1.15) we have that:

$G_m(s) - F_m(s) = P(S_{k_1,\ldots,k_m})$

for all $m \ge 1$ and $s \in S_{k_1,\ldots,k_m}$. As $m$ increases, for fixed $s$, the sequence of sets $\{S_{k_1,\ldots,k_m}\}$ is monotonically decreasing, the sequence of its diameters converges to zero and $s \in S_{k_1,\ldots,k_m}$, all $m \ge 1$. By the same reasoning used in the proof of Theorem 3.1.1 (see (3.1.8)) we have that:

(3.1.20) $\bigcap_{m=1}^{\infty} S_{k_1,\ldots,k_m} = \{s\}.$

Therefore, we have:

(3.1.21) $\lim_{m \to \infty} P(S_{k_1,\ldots,k_m}) = P(\{s\}).$

(3.1.19) follows from (3.1.18) and (3.1.21). The last part of the lemma follows from Theorem 3.1.1.

Remark. So far, it may appear that we have a perfect analogy between $F$ and $G$ and distribution functions on $R^1$. However, it should be kept in mind that the properties of distribution functions which depend on the relations between the order and the topology of $R^1$ have no straightforward counterparts in the present situation. In fact, although $\mathcal{P}$ depends on the topology of $S$, the relative arbitrariness with which the elements of $\mathcal{P}$ are chosen prevents the existence of a simple relationship between the order on $S$, induced by $\mathcal{P}$, and the topology of $S$.

Lemma 3.1.4. Let $\{P_n\}$ be a sequence of probability measures on $(S, \mathcal{S})$ and assume that $\{P_n\}$ is weakly convergent to a probability measure $P$ on $(S, \mathcal{S})$. Let $\mathcal{P}$ denote a $P$-continuous Skorokhod partition of $S$ and, for each $n \ge 1$, let $\{F_m^n : m \ge 1\}$ and $\{G_m^n : m \ge 1\}$ be the sequences of measurable functions associated with the pair $(P_n, \mathcal{P})$ in the way described in Definition 3.1.5. Similarly $\{F_m\}$ and $\{G_m\}$ will denote the functions associated with $(P, \mathcal{P})$. For each $m \ge 1$ and all $s \in S$:

$\lim_{n \to \infty} F_m^n(s) = F_m(s)$ and $\lim_{n \to \infty} G_m^n(s) = G_m(s).$

Proof: We will prove the result by induction on $m$. Consider $m = 1$ and let $s$ be any element of $S$. From (3.1.11) it follows that there exists $n_1 \ge 1$ such that $S_{n_1} \in \mathcal{P}_1$ and $s \in S_{n_1}$. If $n_1 = 1$, we have by (3.1.12) that:

(3.1.22) $F_1^n(s) = 0$, all $n \ge 1$

and also that:

$F_1(s) = 0.$

Suppose now that $n_1 > 1$. Again by (3.1.12) we have that:

(3.1.23) $F_1^n(s) = \sum_{i_1=1}^{n_1-1} P_n(S_{i_1})$, all $n \ge 1$; $\quad F_1(s) = \sum_{i_1=1}^{n_1-1} P(S_{i_1}).$

Since the $S_{i_1}$'s are $P$-continuity sets and $P_n$ converges weakly to $P$ it follows that for each $i_1$, $1 \le i_1 \le n_1 - 1$, we have:

$\lim_{n \to \infty} P_n(S_{i_1}) = P(S_{i_1})$

and the result follows.

Assume now that the result is true for $m = k-1$ and let us prove the result for $m = k$. Let $S_{n_1,n_2,\ldots,n_k}$ denote the element of $\mathcal{P}_k$ which contains $s$. By (3.1.14) we have:

$F_k^n(s) = \begin{cases} F_{k-1}^n(s) & \text{if } n_k = 1 \\ F_{k-1}^n(s) + \sum_{i_k=1}^{n_k-1} P_n(S_{n_1,\ldots,n_{k-1},i_k}) & \text{if } n_k > 1. \end{cases}$

Similarly:

$F_k(s) = \begin{cases} F_{k-1}(s) & \text{if } n_k = 1 \\ F_{k-1}(s) + \sum_{i_k=1}^{n_k-1} P(S_{n_1,\ldots,n_{k-1},i_k}) & \text{if } n_k > 1. \end{cases}$

Clearly, if $n_k = 1$ the result follows by the induction hypothesis. Otherwise:

$\lim_{n \to \infty} F_k^n(s) = \lim_{n \to \infty} F_{k-1}^n(s) + \sum_{i_k=1}^{n_k-1} \lim_{n \to \infty} P_n(S_{n_1,\ldots,n_{k-1},i_k}).$

The result now follows by the induction hypothesis and the fact that the elements of $\mathcal{P}_k$ are $P$-continuity sets. The proof of the result for $\{G_m^n\}$ follows from the result for $\{F_m^n\}$ and (3.1.15). □

We would like to point out that our objective is to use the Skorokhod partition as a device that would allow us to adapt, for metric spaces, the proofs given in Chapter II for the real line. With this objective we proceed to find the probability distributions of the r.v.'s $F(s)$ and $G(s)$ defined on $(S, \mathcal{S}, P)$ with values in $[0,1]$.

Let $P$ be a probability measure on $(S, \mathcal{S})$, $\mathcal{P}$ a Skorokhod partition of $S$. Let $\{F_m\}$ and $\{G_m\}$ be the two sequences of measurable functions associated with the pair $(P, \mathcal{P})$ as in Definition 3.1.5. In correspondence to each $t \in [0,1]$ define:

(3.1.24) $B_m = \{s : F_m(s) < t\}$, $\quad A_m = \{s : G_m(s) \le t\}$, $\quad m \ge 1.$

Both $B_m$ and $A_m$ are clearly dependent on $t$ but, since $t$ will be held fixed throughout the proofs, we do not indicate this dependence in the notation.

In the following lemmas we establish some important properties of the sequences of sets introduced in (3.1.24). In all of them $t$ is an arbitrary but fixed real number in $[0,1]$ and $\{F_m\}$, $\{G_m\}$ have the meaning given to them in Definition 3.1.5.

Lemma 3.1.5. i) For each $m \ge 1$ both $B_m$ and $A_m$ contain, with each element $s$, the entire set of $\mathcal{P}_m$ which contains $s$. Furthermore, whenever $S_{k_1,\ldots,k_m}$ is contained in $B_m$ or $A_m$, the same is true for all elements $S_{i_1,\ldots,i_m}$ of $\mathcal{P}_m$ satisfying $(i_1,\ldots,i_m) \preceq (k_1,\ldots,k_m)$.

ii) $B_m - A_m$, for each $m \ge 1$, is either empty or equal to an element of $\mathcal{P}_m$.

Proof: The first part of i) is an immediate consequence of the fact that both $F_m$ and $G_m$ are constant on each element of $\mathcal{P}_m$. The second part comes from the fact that, for fixed $m$, $F_m$ and $G_m$ are increasing with respect to the lexicographic order on $\mathcal{P}_m$.

To prove ii) observe first that if $B_m - A_m$ is not empty it also contains, with each element $s$, the entire element of $\mathcal{P}_m$ which contains $s$. Suppose that we have two distinct elements of $\mathcal{P}_m$, $S_{i_1,\ldots,i_m}$ and $S_{k_1,\ldots,k_m}$, contained in $B_m - A_m$. Assume that $(i_1,\ldots,i_m) \prec (k_1,\ldots,k_m)$. Since the elements of $\mathcal{P}_m$ are not empty we can take $s_1 \in S_{i_1,\ldots,i_m}$ and $s_2 \in S_{k_1,\ldots,k_m}$. From (3.1.16) and (3.1.17) it follows that:

(3.1.25) $G_m(s_1) \le F_m(s_2).$

On the other hand, since $s_1 \notin A_m$ it follows from (3.1.24) that:

(3.1.26) $G_m(s_1) > t.$

(3.1.25) and (3.1.26) imply that $F_m(s_2) > t$ and hence, by (3.1.24), we have that $s_2 \notin B_m$, which clearly contradicts our assumption that $s_2 \in B_m - A_m$. This shows that $B_m - A_m$ cannot contain two distinct elements of $\mathcal{P}_m$ and hence the desired conclusion follows. □

Lemma 3.1.6. i) For each $m \ge 1$ and $s \in B_m$:

$G_m(s) \le P(B_m).$

ii) For each $m \ge 1$ and $s \in B_m - A_m$:

$G_m(s) = P(B_m)$ and $F_m(s) = P(A_m).$

Proof: i) Let $S_{n_1,\ldots,n_m}$ denote the element of $\mathcal{P}_m$ which contains $s$. By part i) of the previous lemma we have that:

$\bigcup_{(i_1,\ldots,i_m) \preceq (n_1,\ldots,n_m)} S_{i_1,\ldots,i_m} \subset B_m.$

Hence it follows that:

$P\left[\bigcup_{(i_1,\ldots,i_m) \preceq (n_1,\ldots,n_m)} S_{i_1,\ldots,i_m}\right] \le P(B_m).$

The result now follows from (3.1.17).

ii) Since we assumed that $B_m - A_m$ is not empty, by part ii) of the previous lemma $B_m - A_m$ is equal to an element $S_{n_1,\ldots,n_m}$ of $\mathcal{P}_m$. By part i) of the same lemma, it follows that:

$B_m = \bigcup_{(i_1,\ldots,i_m) \preceq (n_1,\ldots,n_m)} S_{i_1,\ldots,i_m}$, $\quad A_m = \bigcup_{(i_1,\ldots,i_m) \prec (n_1,\ldots,n_m)} S_{i_1,\ldots,i_m}.$

The conclusion now follows from (3.1.16) and (3.1.17).

Lemma 3.1.7. i) $P(B_m) \ge t.$

ii) $P(A_m) \le t.$

Proof: i) If $P(B_m) = 1$, we are done since $t \le 1$. Assume then $P(B_m) < 1$. Hence there are elements of $\mathcal{P}_m$ which are not contained in $B_m$ and therefore the set:

$L = \{(i_1,\ldots,i_m) : S_{i_1,\ldots,i_m} \subset B_m^c\}$

is a nonempty subset of $N^m$. It follows by Lemma 3.1.2 that $L$ has a minimum element that we denote by $(n_1,\ldots,n_m)$. By part i) of Lemma 3.1.5 we have that:

(3.1.27) $B_m = \bigcup_{(i_1,\ldots,i_m) \prec (n_1,\ldots,n_m)} S_{i_1,\ldots,i_m}.$

Let now $s \in S_{n_1,\ldots,n_m}$. By (3.1.16) and (3.1.27) we have that:

(3.1.28) $F_m(s) = P(B_m).$

But $s \notin B_m$ and hence by (3.1.24):

(3.1.29) $F_m(s) \ge t.$

(3.1.28) and (3.1.29) imply $P(B_m) \ge t$ and the proof of i) is complete.

ii) Let $R = \{t \in [0,1] : P(\{s : G_m(s) = t\}) > 0\}$ and let $\overline{R}$ denote the closure of $R$. First we show ii) for $t \in R$. Observe again that, since $G_m$ is constant on the elements of $\mathcal{P}_m$, $\{s : G_m(s) = t\}$ is a finite or countable union of elements of $\mathcal{P}_m$. Let $(n_1,\ldots,n_m)$ be the first element of the set $\{(i_1,\ldots,i_m) : S_{i_1,\ldots,i_m} \subset \{s : G_m(s) = t\}\}$. Assume now that $S_{k_1,\ldots,k_m} \subset \{s : G_m(s) = t\}$ and $(n_1,\ldots,n_m) \ne (k_1,\ldots,k_m)$. It follows from the definition of $G_m$ that $P(S_{k_1,\ldots,k_m}) = 0$. Hence, by part i) of Lemma 3.1.5, we have that:

(3.1.30) $P(A_m) = P\left[\bigcup_{(i_1,\ldots,i_m) \preceq (n_1,\ldots,n_m)} S_{i_1,\ldots,i_m}\right].$

From (3.1.17), it follows that, if $s \in S_{n_1,\ldots,n_m}$:

$P(A_m) = G_m(s) = t.$

Consider now $t \in \overline{R}$, $t \notin R$. Assume first that we have a sequence $\{t_n\}$, $t_n \in R$, $t_n < t$, all $n \ge 1$, $\lim_{n \to \infty} t_n = t$. Then, we can write:

(3.1.31) $\{s : G_m(s) < t\} = \bigcup_{n=1}^{\infty} \{s : G_m(s) \le t_n\}.$

Hence, since $t \notin R$, we have:

(3.1.32) $P(A_m) = P(\{s : G_m(s) < t\}) = P\left(\bigcup_{n=1}^{\infty} \{s : G_m(s) \le t_n\}\right).$

Therefore, since $(\{s : G_m(s) \le t_n\})_{n \ge 1}$ is an increasing sequence of sets:

(3.1.33) $P(A_m) = \lim_{n \to \infty} P(\{s : G_m(s) \le t_n\}) = \lim_{n \to \infty} t_n = t.$

The case where there exists a sequence $\{t_n\}$, $t_n > t$, all $n \ge 1$, can be handled in a similar way. Hence, we have proved that for all $t \in \overline{R}$, $P(A_m) = t$.

Finally, consider the case $t \notin \overline{R}$. If there exists no $t^* \in \overline{R}$, $t^* < t$, we would have that $P(\{s : G_m(s) \le t\}) = P(A_m) = 0 \le t$ and we would be done. Otherwise, since $\overline{R}$ is a closed subset of $[0,1]$, there exists a largest element $t^* \in \overline{R}$, $t^* < t$. Clearly:

$P(A_m) = P(\{s : G_m(s) \le t\}) = P(\{s : G_m(s) \le t^*\}) = t^* < t.$

This completes the proof of the lemma. □
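The conclusions of Lemmas 3.1.5 and 3.1.7 are easy to observe numerically in a toy setting. The following sketch is only an illustration under assumed choices: $S = [0,1)$ with the dyadic partition at each level, and a measure made of half Lebesgue measure plus a point mass of $\tfrac{1}{2}$ at $\tfrac{1}{3}$. For a given level $m$ and threshold $t$ it computes $P(B_m)$, $P(A_m)$ and the number of partition elements in $B_m - A_m$, with $B_m$ and $A_m$ as in (3.1.24). The names `mass` and `level_sets` are hypothetical.

```python
from fractions import Fraction

ATOM, ATOM_MASS = Fraction(1, 3), Fraction(1, 2)  # illustrative measure

def mass(lo, hi):
    """P([lo, hi)): half Lebesgue measure plus a point mass at ATOM."""
    return (1 - ATOM_MASS) * (hi - lo) + (ATOM_MASS if lo <= ATOM < hi else 0)

def level_sets(m, t):
    """Return (P(B_m), P(A_m), number of level-m cells in B_m - A_m)."""
    n = 2 ** m
    f = Fraction(0)            # F_m on the current cell: mass of the preceding cells
    p_b = p_a = Fraction(0)
    gap = 0
    for j in range(n):
        w = mass(Fraction(j, n), Fraction(j + 1, n))
        g = f + w              # G_m on the current cell
        in_b, in_a = f < t, g <= t
        p_b += w if in_b else 0
        p_a += w if in_a else 0
        gap += in_b and not in_a
        f = g
    return p_b, p_a, gap

p_b, p_a, gap = level_sets(2, Fraction(1, 4))
```

With $m = 2$ and $t = \tfrac{1}{4}$ the cell containing the atom is the single element of $B_m - A_m$, and $P(A_m) \le t \le P(B_m)$ as the lemmas assert.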

THEOREM 3.1.2. Let $P$ be a probability measure on $(S, \mathcal{S})$ and $\mathcal{P}$ a Skorokhod partition of $S$. As before, $\{F_m\}$ and $\{G_m\}$, $F$ and $G$ are the functions introduced in Definition 3.1.5. For all $m \ge 1$ and each $t \in [0,1]$ let $B_m$ and $A_m$ be given by (3.1.24). Then, for all values of $t \in [0,1]$ for which $\lim_{m \to \infty} P(B_m) = t$:

(3.1.33) $\{s : F(s) < t\} \subset \bigcap_{m=1}^{\infty} B_m \subset \{s : G(s) \le t\}.$

Hence, for all such values of $t$:

(3.1.34) $P(\{s : F(s) < t\}) \le t \le P(\{s : G(s) \le t\}).$

Proof: The first inclusion in (3.1.33) is in fact true for all $t \in [0,1]$. It follows from (3.1.24) and the fact that $\{F_m(s)\}_{m \ge 1}$ is increasing that:

(3.1.35) $\{s : F(s) < t\} \subset \{s : F_m(s) < t\} = B_m$, all $m \ge 1.$

The first inclusion in (3.1.33) follows from (3.1.35).

To prove the other inclusion, let $s_0 \in \bigcap_{m=1}^{\infty} B_m$. By part i) of Lemma 3.1.6, we have that:

(3.1.36) $G_m(s_0) \le P(B_m)$, all $m \ge 1.$

Therefore, by (3.1.18) we have that:

$G(s_0) = \lim_{m \to \infty} G_m(s_0) \le \lim_{m \to \infty} P(B_m) = t.$

Hence, $s_0 \in \{s : G(s) \le t\}$ and the proof of (3.1.33) is complete. (3.1.34) now follows from the fact that $\{B_m\}$ is a decreasing sequence of sets and hence

$P\left[\bigcap_{m=1}^{\infty} B_m\right] = \lim_{m \to \infty} P(B_m).$

For the next theorem we assume that we are given a non-atomic probability measure on $(S, \mathcal{S})$. From Theorem 3.1.1 it follows that for the existence of a non-atomic measure on $(S, \mathcal{S})$ it is necessary that $S$ be an uncountable set. This condition is also sufficient if $S$ is separable and complete (see, e.g., Parthasarathy [1967, pages 53-55]).

THEOREM 3.1.3. Let $P$ be a non-atomic probability measure on $(S, \mathcal{S})$ and $\mathcal{P}$ a Skorokhod partition of $S$. Then, the r.v. $F(s) = G(s)$, defined on $(S, \mathcal{S}, P)$, is uniformly distributed on $[0,1]$.

Proof: We first observe that for each $m \ge 1$, $P(A_m \cap B_m^c) = 0$. In fact suppose that $s \in A_m \cap B_m^c$. It follows from (3.1.24) that:

(3.1.37) $G_m(s) \le t$ and $F_m(s) \ge t.$

It follows from (3.1.15) that the element of $\mathcal{P}_m$ to which $s$ belongs has $P$-measure zero. Therefore, since $A_m \cap B_m^c$, if not empty, is a finite or countable union of elements of $\mathcal{P}_m$, it follows that $P(A_m \cap B_m^c) = 0$. Hence, we have for all $m \ge 1$:

(3.1.38) $P(B_m - A_m) = P(B_m) - P(A_m).$

Suppose now that for a given value of $t$, we have:

(3.1.39) $\lim_{m \to \infty} P(B_m) > t.$

From (3.1.38), (3.1.39) and part ii) of Lemma 3.1.7 it follows that:

(3.1.40) $P\left[\bigcap_{m=1}^{\infty} (B_m - A_m)\right] = \lim_{m \to \infty} P(B_m - A_m) > 0.$

However, by part ii) of Lemma 3.1.5 we know that $B_m - A_m$ is an element of $\mathcal{P}_m$ for each $m \ge 1$. By the same argument used in the proof of Theorem 3.1.1, $\bigcap_{m=1}^{\infty} (B_m - A_m)$ is either empty or a singleton. Hence, (3.1.40) implies the existence of an element $s_0 \in S$ such that:

$\bigcap_{m=1}^{\infty} (B_m - A_m) = \{s_0\}$ and $P(\{s_0\}) > 0.$

This clearly contradicts the assumption that $P$ is non-atomic and therefore for all $t \in [0,1]$ we should have $\lim_{m \to \infty} P(B_m) = t$. Therefore, by the previous theorem and the fact that $F = G$ we have for all $t \in [0,1]$:

$P(\{s : F(s) < t\}) \le t \le P(\{s : F(s) \le t\}).$

The result follows. □

3.2. The non-atomic case

We are now going to apply the techniques developed in the previous

section to extend to metric spaces the results of Chapter II. Our first

theorem shows how to associate to each r.e. defined on a non-atomic

probability space a convenient uniform r.v. It is, therefore, the

equivalent for metric spaces of Theorem 2.1.2 of Chapter II.

THEOREM 3.2.1. Let $X$ be a r.e. defined on a non-atomic probability space $(\Omega, \mathcal{A}, P)$ and let $P_X$ denote the probability measure induced by $X$ on $(S, \mathcal{S})$. Let $\mathcal{P}$ be a Skorokhod partition of $S$, $F$ and $G$ be the two measurable functions determined by the pair $(P_X, \mathcal{P})$ as described in Definition 3.1.5. Finally, let $Z$ be any uniformly distributed r.v. defined on $(\Omega, \mathcal{A}, P)$. For $s \in S$ and $z \in [0,1]$, define:

(3.2.1) $H(s,z) = F(s) + P(\{\omega : Z(\omega) \le z\} \cap X^{-1}(\{s\})).$

Then, $H(s,z)$ is a measurable function from $(S \times [0,1], \mathcal{S} \times \mathcal{B})$ to $([0,1], \mathcal{B})$ and the r.v. $U(\omega) = H(X(\omega), Z(\omega))$ is uniformly distributed on $[0,1]$.
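A small exact computation may help to see why the construction in (3.2.1) produces a uniform variable. The sketch below treats only a special case and rests on assumptions that are not part of the theorem: $X$ takes finitely many values, listed in the order induced by the partition, and $Z$ is taken to be independent of $X$, so that $P(\{Z \le z\} \cap X^{-1}(\{s\})) = z \, P_X(\{s\})$ and $U = F(X) + Z \, P_X(\{X\})$. The function name `df_of_u` is hypothetical.

```python
from fractions import Fraction

def df_of_u(t, masses):
    """P(U <= t) for U = F(X) + Z * P(X = x), Z uniform and independent of X.

    `masses` lists the (strictly positive) probabilities of the values of X,
    in the order induced by the partition.
    """
    total = Fraction(0)
    f = Fraction(0)  # F(s): total mass of the values preceding s
    for p in masses:
        # Given X = s, U is uniform on [f, f + p]; clip (t - f) / p to [0, 1].
        frac = min(max((t - f) / p, Fraction(0)), Fraction(1))
        total += p * frac
        f += p
    return total

masses = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # illustrative law of X
checks = [df_of_u(t, masses) == t
          for t in (Fraction(0), Fraction(1, 7), Fraction(1, 2), Fraction(3, 4), Fraction(1))]
```

Each atom of $P_X$ of mass $p$ is spread uniformly over an interval of length $p$, and these intervals tile $[0,1]$, which is why every entry of `checks` holds in this special case.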

Proof: The proof that $H$ is measurable is straightforward. Observe only that $F(s)$ is measurable and that the second term is different from zero for, at most, a countable number of values of $s$.

By Lemma 3.1.3 we can write:

(3.2.2) G(s) = = F(s) + P(X-1({S}».

From (3.2.1), (3.2.2) and the definition of U, it follows that:

(3.2.3) $F(X(\omega)) \le U(\omega) \le G(X(\omega))$ for all $\omega \in \Omega$.

Let now t, $0 \le t \le 1$, be given. In correspondence to t, $P_X$ and $\mathcal{P}$ consider the sequences $\{B_m\}$ and $\{A_m\}$ introduced in (3.1.24). We have two cases to consider:

1) For the given value of t, $\lim_{m\to\infty} P_X(B_m) = t$. It then follows from Theorem 3.1.2 that:

(3.2.4) $P(\{\omega: F(X(\omega)) < t\}) \le t \le P(\{\omega: G(X(\omega)) \le t\}).$

On the other hand, from (3.2.3) we have that:

(3.2.5) $\{\omega: U(\omega) < t\} \subset \{\omega: F(X(\omega)) < t\}$, $\{\omega: G(X(\omega)) \le t\} \subset \{\omega: U(\omega) \le t\}.$

(3.2.4) and (3.2.5) imply that for all values of $t \in [0,1]$ for which $\lim_{m\to\infty} P_X(B_m) = t$ we have that:

(3.2.6) $P(\{\omega: U(\omega) < t\}) \le t \le P(\{\omega: U(\omega) \le t\}).$

2) Assume now that for the given value of t we have: $\lim_{m\to\infty} P_X(B_m) > t$. By the argument used in the proof of Theorem 3.1.3, it follows that there exists $s_0 \in S$, such that:

(3.2.7)

(3.2.8) $s_0 \in B_m - A_m$ for all $m \ge 1$.

Hence, by part ii) of Lemma 3.1.6 we have that:

(3.2.9) $G_m(s_0) = P_X(B_m)$, $F_m(s_0) = P_X(A_m).$

Therefore, it follows from our assumption and from part ii) of

Lemma 3.1.7 that:

(3.2.10) = =

Consider now, for every z E [0,1]:

(3.2.11) $f(z) = P(X^{-1}(\{s_0\}) \cap \{\omega: Z(\omega) \le z\}).$

Clearly f(z) is a continuous function of z and we have:

(3.2.12) $f(0) = 0$; $f(1) = P(X^{-1}(\{s_0\})).$

On the other hand, from Lemma 3.1.3 and (3.2.10) we have that:

(3.2.13)

Since a continuous function defined on a compact set assumes all

the values between its minimum and its maximum it follows that there

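exists $z_0 \in [0,1]$, such that:

(3.2.14)

We will now show that:

(3.2.15) $\{\omega: U(\omega) < t\} \subset X^{-1}\left[\bigcup_{m=1}^{\infty} A_m\right] \cup \{X^{-1}(\{s_0\}) \cap \{\omega: Z(\omega) \le z_0\}\} \subset \{\omega: U(\omega) \le t\}.$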

To prove the first inclusion in (3.2.15) let $\omega_0$ be such that $U(\omega_0) < t$. It follows from (3.2.3) that $F(X(\omega_0)) < t$. Recall now that, as mentioned in the proof of Theorem 3.1.2, the first inclusion in (3.1.33) is valid for all $t \in [0,1]$. Hence, we have that:

(3.2.16) $X(\omega_0) \in \bigcap_{m=1}^{\infty} B_m.$

Observe that we have either $X(\omega_0) \in A_m$ for some $m \ge 1$ or $X(\omega_0) \in B_m - A_m$ for all $m \ge 1$. In the first case we are done and in the latter it follows that we must have $X(\omega_0) = s_0$. Therefore, in this last case we can write:

(3.2.17)

On the other hand from (3.2.1) and (3.2.11) it follows that:

(3.2.18) =

Therefore, by (3.2.14) we have that:

(3.2.19)

Now if we observe that H(s,z) is, for fixed s, an increasing function of z it follows from (3.2.17) and (3.2.19) that $Z(\omega_0) < z_0$. This completes the proof of the first inclusion in (3.2.15).

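To prove the other half, let $\omega_1$ be such that:

$\omega_1 \in X^{-1}\left[\bigcup_{m=1}^{\infty} A_m\right] \cup \{X^{-1}(\{s_0\}) \cap \{\omega: Z(\omega) \le z_0\}\}.$

We have two cases to consider:

1) $X(\omega_1) \in A_m$ for some $m \ge 1$. It follows from (3.1.24) that:

Hence, since $\{G_m(s)\}_m$ is a decreasing sequence:

$G(X(\omega_1)) \le t.$

Therefore, by (3.2.3) we have that $U(\omega_1) \le t$ and hence $\omega_1 \in \{\omega: U(\omega) \le t\}$.

2) $X(\omega_1) = s_0$ and $Z(\omega_1) \le z_0$. Hence, from the definition of U we have that:

Hence, it follows from (3.2.19) that $U(\omega_1) \le t$ and the proof of (3.2.15) is complete.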

On the other hand, we have by (3.2.11):

Hence it follows from (3.2.10) and (3.2.14) that:

(3.2.20)

From (3.2.15) and (3.2.20) it follows that (3.2.6) is valid for all $t \in [0,1]$ and this implies that $U(\omega)$ is uniformly distributed in [0,1]. □

Remarks. 1) If $P_X$ is a non-atomic measure on $(S, \mathcal{S})$ we have that:

$F(X(\omega)) = U(\omega) = G(X(\omega))$

and hence the theorem above reduces to Theorem 3.1.3.

2) It is not necessary that the r.v. $Z(\omega)$ be uniformly distributed. Any continuous r.v. taking values on a compact set of $\mathbb{R}$ would be enough for our purposes.

In Chapter II, we saw how the inverse function $F^{-1}$ of a d.f. F determines a r.v. on the probability space $([0,1], \mathcal{B}, L)$ whose d.f. is F. Here, $\mathcal{B}$ denotes the Borel sets of [0,1] and L is the ordinary Lebesgue measure. We now introduce the construction that will play the role of $F^{-1}$ in the present context. The idea of this construction is due to Skorokhod [1956] and we reproduce his proof in the theorem below as a matter of convenience.

THEOREM 3.2.2. (Skorokhod). Let S be a complete, separable metric space and Q be a probability measure on $(S, \mathcal{S})$. Then there exists on $([0,1], \mathcal{B}, L)$ a r.e. X with values on $(S, \mathcal{S})$ whose probability distribution is Q.

Proof: Let $\mathcal{P}$ denote a Q-continuous Skorokhod partition of S. In correspondence to each $\mathcal{P}_m$ we are going to associate a partition of [0,1], by means of intervals $\Delta_{i_1,i_2,\ldots,i_m}$, which we take to be left-closed and right-open, satisfying the following conditions:

~i' i' ifl' ... , m

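2) The length of $\Delta_{i_1,\ldots,i_m}$ is $Q(S_{i_1,\ldots,i_m})$.

From each element $S_{i_1,\ldots,i_m}$ of $\mathcal{P}_m$ we choose a point $x_{i_1,i_2,\ldots,i_m}$. For each $m \ge 1$ and $\omega \in [0,1]$ we define:

(3.2.21) $X_m(\omega) = x_{i_1,\ldots,i_m}$ if $\omega \in \Delta_{i_1,\ldots,i_m}$.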

Since each partition in P is a refinement of the preceding ones

we have:

(3.2.22)

It follows that $\{X_m(\omega)\}$ is a Cauchy sequence and since S is complete, $\lim_{m\to\infty} X_m(\omega)$ exists.

We define for all $\omega \in [0,1]$:

(3.2.23) $X(\omega) = \lim_{m\to\infty} X_m(\omega).$

To complete the proof it remains to be shown that the probability distribution of X is Q. We claim that it is enough to show that Q and $LX^{-1}$ are equal on the Borel subsets of S which are Q-continuity sets. In fact, this class of sets is a field and it can be shown that the $\sigma$-field generated by this field is $\mathcal{S}$.

Let A denote a Q-continuity set. Let $A(m)$ denote the union of all elements of $\mathcal{P}_m$ which are contained in A and let $A'(m)$ denote the union of all elements of $\mathcal{P}_m$ which are not contained in $S - A$. Clearly:

(3.2.24) $A(m) \subset A \subset A'(m).$

Observe also that from the construction of X we have:

(3.2.25) $L(\{\omega: X(\omega) \in A(m)\}) = Q(A(m)).$

Similarly we have that:

(3.2.26) $L(\{\omega: X(\omega) \in A'(m)\}) = Q(A'(m)).$

Let now $C(m)$ denote the set of points whose distance from the boundary of A is less than or equal to $(\tfrac{1}{2})^m$. Observe now that if $S_{i_1,\ldots,i_m} \subset A'(m) - A(m)$, $S_{i_1,\ldots,i_m}$ contains points of the boundary of A and hence the distance from a point in $S_{i_1,\ldots,i_m}$ to the boundary is less than $(\tfrac{1}{2})^m$. Hence, we have that:

(3.2.27)

But, $\{C(m)\}$ is a decreasing sequence of sets whose limit is the boundary of A. Hence, since A is a Q-continuity set, it follows that:

(3.2.28)

The result follows. □
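As an illustration only, the construction can be sketched numerically in a special case. The choices below are assumptions made for the example and do not come from the text: S = [0,1), Q has d.f. x^2 (so Q has no atoms and every dyadic cell is a Q-continuity set), the m-th partition consists of the dyadic cells [k/2^m, (k+1)/2^m), assumed here to qualify as a Skorokhod partition, and the chosen point of each cell is its left endpoint. The names `cdf` and `skorokhod_approx` are hypothetical.

```python
import math

def cdf(x):
    """Assumed d.f. of Q on [0, 1): Q([0, x)) = x**2."""
    return x * x

def skorokhod_approx(omega, m):
    """Return X_m(omega): the left endpoint of the dyadic cell
    [k/2^m, (k+1)/2^m) whose associated interval
    [cdf(k/2^m), cdf((k+1)/2^m)) contains omega.
    The interval has length Q(cell) and is left-closed, right-open."""
    n = 2 ** m
    lo, hi = 0, n
    # Invariant of the bisection: cdf(lo/n) <= omega < cdf(hi/n).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cdf(mid / n) <= omega:
            lo = mid
        else:
            hi = mid
    return lo / n

omega = 0.3
# X_1(omega), ..., X_20(omega): chosen points of the nested cells.
approximations = [skorokhod_approx(omega, m) for m in range(1, 21)]
# In this special case the limit X(omega) is the inverse d.f. at omega.
limit = math.sqrt(omega)
```

In this one-dimensional case the limit coincides with the inverse d.f. of Chapter II, which is the role the text assigns to the construction; the sequence X_m(omega) stays within 2^(-m) of the limit because the cells are nested with that diameter.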

Remark. Observe that the partition of [0,1] given by the intervals $\Delta_{i_1,i_2,\ldots,i_m}$ coincides with the partition determined by the intervals $[F_m(s), G_m(s))$, where $s \in S_{i_1,\ldots,i_m}$, for $F_m$ and $G_m$ determined by the pair $(Q, \mathcal{P})$ as in Definition 3.1.5.

Corollary. Let S be a complete, separable metric space and Q a probability measure on $(S, \mathcal{S})$. Then, there exists in any non-atomic probability space a r.e. X, with values in S, whose probability distribution is Q.

Proof: Since $(\Omega, \mathcal{A}, P)$ is non-atomic, by Theorem 2.1.1, there exists on $(\Omega, \mathcal{A}, P)$ a uniform r.v. Z. The construction of the above theorem applied to each value $Z(\omega)$ produces the desired r.e.

Lemma 3.2.1. Let Z be a uniformly distributed r.v., defined on the non-atomic probability space $(\Omega, \mathcal{A}, P)$ and let Q be a probability measure on $(S, \mathcal{S})$ where S is a complete separable metric space. Let X denote the r.e. obtained by applying the Skorokhod construction to the r.v. Z. Then, for all $m \ge 1$ and all choices of $i_1,\ldots,i_m$:

(3.2.29) $P(\{\omega: X(\omega) \in S_{i_1,\ldots,i_m}\} \cap \{\omega: Z(\omega) \notin \Delta_{i_1,\ldots,i_m}\}) = 0.$

Proof: The result is trivial and it is in fact assumed in the proof of Skorokhod's result (see (3.2.25) and (3.2.26)). Observe that

if $Z(\omega) \in \Delta_{k_1,\ldots,k_m}$, then $X(\omega) \in \bar{S}_{k_1,\ldots,k_m}$. Hence, we have that:

'

(3.2.30) c

Hence it follows that:

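(3.2.31) $P(\{\omega: X(\omega) \in S_{i_1,\ldots,i_m}\} \cap \{\omega: Z(\omega) \notin \Delta_{i_1,\ldots,i_m}\}) = Q(S_{i_1,\ldots,i_m} \cap \overline{S^c_{i_1,\ldots,i_m}}) \le Q(\bar{S}_{i_1,\ldots,i_m} \cap \overline{S^c_{i_1,\ldots,i_m}}) = 0.$

The last equality follows from the fact that $\mathcal{P}$ is Q-continuous. □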

We now prove a more general version of Theorem 2.1.3 of Chapter II.

THEOREM 3.2.3. Let $\{X_n\}$ be a sequence of r.e.'s defined on a non-atomic probability space $(\Omega, \mathcal{A}, P)$, with values on $(S, \mathcal{S})$ where S is a complete, separable metric space. Suppose that the corresponding sequence $\{Q_n\}$ of probability distributions on $(S, \mathcal{S})$ is weakly convergent to a probability distribution Q. Then, there exists on $(\Omega, \mathcal{A}, P)$ a sequence $\{Y_n\}$ of r.e.'s, with values on $(S, \mathcal{S})$, and such that the following conditions are satisfied:

i) For each $n \ge 1$ the probability distribution of $Y_n$ is Q.

ii) $d(X_n, Y_n)$ converges to zero in probability as n tends to infinity.

Proof: Let $\mathcal{P}$ be a Q-continuous Skorokhod partition of S. For each $n \ge 1$, let $U_n$ denote a uniformly distributed r.v. associated with the r.e. $X_n$ and the partition $\mathcal{P}$ as in Theorem 3.2.1. For each $n \ge 1$, $\{F^n_m\}$ and $\{G^n_m\}$ denote the measurable functions, introduced in Definition 3.1.5, determined by the pair $(Q_n, \mathcal{P})$. Similarly, $\{F_m\}$ and $\{G_m\}$ are determined by $(Q, \mathcal{P})$. The Skorokhod construction applied to each $U_n$ determines, by the corollary to Theorem 3.2.2, a r.e. $Y_n$ with probability distribution Q. We now proceed to show that the sequence $\{Y_n\}$ satisfies ii).

Let us recall that in the Skorokhod construction for each $m \ge 1$ a partition of [0,1] was determined by $(Q, \mathcal{P})$ and we denoted by $\Delta_{i_1,\ldots,i_m}$ the elements of the partition associated with $\mathcal{P}_m$. Furthermore, recall that by the remark following the proof of Theorem 3.2.2 we had, for each $m \ge 1$ and all choices of $i_1,\ldots,i_m$:

(3.2.32) $\Delta_{i_1,\ldots,i_m} = [F_m(s), G_m(s))$, $s \in S_{i_1,\ldots,i_m}$.

By (3.2.3) and the properties of the sequences $\{F^n_m\}_{m \ge 1}$, $\{G^n_m\}_{m \ge 1}$ we have:

(3.2.33) $F^n_m(X_n(\omega)) \le U_n(\omega) \le G^n_m(X_n(\omega)).$

If we now observe that the intervals $[F^n_m(s), G^n_m(s))$ for $s \in S_{i_1,\ldots,i_m}$ form also a partition of [0,1) it follows from (3.2.33) that the assumption that $U_n$ is an interior point of any such interval implies that $X_n$ should belong to the corresponding element $S_{i_1,\ldots,i_m}$. Hence we can write:

(3.2.34)

Hence, from (3.2.32) and (3.2.34)

(3.2.35) $P(\{\omega: X_n(\omega) \notin S_{i_1,\ldots,i_m}\} \cap \{\omega: U_n(\omega) \in \Delta_{i_1,\ldots,i_m}\}) \le P(\{\omega: U_n(\omega) \notin [F^n_m(s), G^n_m(s))\} \cap \{\omega: U_n(\omega) \in [F_m(s), G_m(s))\})$

where s is an element of $S_{i_1,\ldots,i_m}$.

The term on the right of (3.2.35) is the probability that a uniform r.v. belongs to the intersection of an interval with the complement of another interval. It can be easily checked that the value of this probability is given by:

(3.2.36) $(G_m(s) - F_m(s)) - \max(0, \min(G_m(s), G^n_m(s)) - \max(F_m(s), F^n_m(s))).$
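The value in (3.2.36) is the length of the first interval minus the length of its overlap with the second. A minimal sketch of that computation follows, using exact rational arithmetic; the function name and the numerical endpoints are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def prob_in_first_not_second(f, g, fn, gn):
    """P(U in [f, g) and U not in [fn, gn)) for U uniform on [0, 1].

    Mirrors (3.2.36): (g - f) minus the length of the overlap of
    [f, g) and [fn, gn), where the overlap length is clipped at zero
    when the two intervals are disjoint."""
    overlap = max(Fraction(0), min(g, gn) - max(f, fn))
    return (g - f) - overlap

f, g = Fraction(1, 4), Fraction(3, 4)

# Partial overlap: [1/4, 3/4) minus [1/2, 1) leaves [1/4, 1/2).
partial = prob_in_first_not_second(f, g, Fraction(1, 2), Fraction(1))

# Disjoint intervals: nothing is removed.
disjoint = prob_in_first_not_second(f, g, Fraction(4, 5), Fraction(1))

# As the endpoints of the second interval approach those of the first,
# the probability tends to zero (here it equals 1/n).
shifted = [prob_in_first_not_second(f, g, f + Fraction(1, n), g + Fraction(1, n))
           for n in (10, 100, 1000)]
```

The last computation is the mechanism used in the proof: once the endpoints converge, the probability in (3.2.35) vanishes in the limit.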

On the other hand by Lemma 3.1.4:

$\lim_{n\to\infty} F^n_m(s) = F_m(s)$, $\lim_{n\to\infty} G^n_m(s) = G_m(s).$

Therefore, it follows that:

$\lim_{n\to\infty} P(\{\omega: X_n(\omega) \notin S_{i_1,\ldots,i_m}\} \cap \{\omega: U_n(\omega) \in \Delta_{i_1,\ldots,i_m}\}) = 0.$

We now observe that given any three events A1 , A2 , A3 on a probability

space we always have:

(3.2.37)

Using (3.2.37) we can write:

(3.2.38)

If we now recall that $Y_n$ was obtained by applying the Skorokhod construction to $U_n$, it follows by Lemma 3.2.1 that the second term on the right of (3.2.38) is zero. Furthermore, we have just shown that the first term converges to zero as n tends to infinity. Hence, we have:

(3.2.39) $\lim_{n\to\infty} P(\{\omega: X_n(\omega) \notin S_{i_1,\ldots,i_m}\} \cap \{\omega: Y_n(\omega) \in S_{i_1,\ldots,i_m}\}) = 0.$

Let now $\varepsilon > 0$ be given. Choose $m \ge 1$ such that $(\tfrac{1}{2})^m < \varepsilon$. We can write:

(3.2.40) $P(\{\omega: d(X_n(\omega), Y_n(\omega)) > \varepsilon\}) \le P(\{\omega: d(X_n, Y_n) > (\tfrac{1}{2})^m\}).$

On the other hand, we have that:

(3.2.41) $\{\omega: d(X_n, Y_n) > (\tfrac{1}{2})^m\} \subset \bigcup \{\omega: X_n \notin S_{i_1,\ldots,i_m}\} \cap \{\omega: Y_n \in S_{i_1,\ldots,i_m}\}.$

The last union extends to all sets $S_{i_1,\ldots,i_m}$ in $\mathcal{P}_m$. Hence we have that:

(3.2.42) $P(\{\omega: d(X_n, Y_n) > \varepsilon\}) \le \sum_{(i_1,\ldots,i_m)} P(\{\omega: Y_n \in S_{i_1,\ldots,i_m}\} \cap \{\omega: X_n \notin S_{i_1,\ldots,i_m}\}).$

By (3.2.39) each term of the sum on the right of (3.2.42) converges to zero as n tends to infinity. On the other hand, each term is dominated by $P(Y_n \in S_{i_1,\ldots,i_m})$, which is independent of n, since all $Y_n$'s have the same probability distribution. Furthermore, $\sum P(Y_n \in S_{i_1,\ldots,i_m}) = 1$ and hence by the dominated convergence theorem it follows that:

(3.2.43) $\lim_{n\to\infty} P(\{\omega: d(X_n(\omega), Y_n(\omega)) > \varepsilon\}) = 0.$ □


3.3. The atomic and the general cases

As we mentioned before very few changes are required on the proofs

of Sections 2.2 and 2.3 of Chapter II to generalize the results, ob-

tained there, to complete separable metric spaces. In this section, we

prove a few lemmas that will allow us to make the necessary changes. We

start by stating a more general version of the definitions of relative

compactness and tightness.

Definition 3.3.1. Let S be any metric space. A family $\Gamma$ of probability measures on $(S, \mathcal{S})$ is relatively compact if every sequence $\{P_n\}$ of elements of $\Gamma$ contains a subsequence which is weakly convergent to a probability measure on $(S, \mathcal{S})$ (not necessarily an element of $\Gamma$).

Definition 3.3.2. Let S be an arbitrary metric space and $\Gamma$ a family of probability measures on $(S, \mathcal{S})$. $\Gamma$ is said to be tight if for every $\varepsilon > 0$ there exists a compact set K such that $P(K) > 1 - \varepsilon$ for all P in $\Gamma$.

THEOREM 3.3.1. (Prohorov). Let S be a complete separable metric space. For a family $\Gamma$ of probability measures on $(S, \mathcal{S})$ to be relatively compact it is necessary and sufficient that $\Gamma$ be tight.

Theorem 3.3.1 is stated in the way it was proved by Prohorov [1956] and this will be enough for our purposes. Varadarajan [1961] extended the sufficiency of the condition to arbitrary metric spaces.

THEOREM 3.3.2. Let $\{X_n\}$ be a sequence of r.e.'s defined on the atomic probability space $(\Omega, \mathcal{A}, P)$ and taking values on $(S, \mathcal{S})$. If the corresponding sequence $\{Q_n\}$ of probability measures on $(S, \mathcal{S})$ is tight then every subsequence of $\{X_n\}$ has a further subsequence which converges a.e.

Proof: Let $\{A_i\}$ denote the atoms of $(\Omega, \mathcal{A}, P)$. Since $P(A_i) > 0$, all $i \ge 1$, in the same way as in the real line case tightness implies that for each i there exists a compact set $K_i$ such that for almost all $\omega \in A_i$ the sequence $\{X_n(\omega)\}$ is entirely contained in $K_i$. But, each sequence contained in a compact set of a metric space has a convergent subsequence. Hence, we can apply the diagonal procedure as in Theorem 2.2.2 to complete the proof of the result.

THEOREM 3.3.3. Let $\{X_n\}$ be a sequence of r.e.'s defined on an atomic probability space $(\Omega, \mathcal{A}, P)$ and taking values on $(S, \mathcal{S})$ where S is a complete separable metric space. Suppose that the corresponding sequence of probability measures $\{Q_n\}$ is weakly convergent to a probability measure Q on $(S, \mathcal{S})$. Then, if $A_Q$ denotes the class of r.e.'s on $(\Omega, \mathcal{A}, P)$ whose probability distribution on $(S, \mathcal{S})$ is Q, we have:

i) $A_Q$ is not empty.

ii) $\rho(X_n, A_Q)$ converges to zero as n tends to infinity.

The proof is totally analogous to the proof of Theorem 2.2.3.

Corollary. Under the same conditions as in the above theorem, there exists on $(\Omega, \mathcal{A}, P)$ a sequence $\{Y_n\}$ of identically distributed r.e.'s, with probability distribution Q, such that $d(X_n, Y_n)$ converges to zero in probability as n tends to infinity.

Proof: See proof of corollary to Theorem 2.2.3.

Remark on notation: Recall that we use d to denote the metric on S and $\rho$ to denote the metric on the space of r.e.'s corresponding to convergence in probability.

We consider now the general case. Let $(\Omega, \mathcal{A}, P)$ be a probability space, let $\{A_i\}$ denote the atoms of $(\Omega, \mathcal{A}, P)$ and put as in Section 2.3 $A_0 = \bigcup_{i \ge 1} A_i$. Since the cases $P(A_0) = 0$ and $P(A_0) = 1$ were already discussed we shall assume here that:

(3.3.1) $0 < P(A_0) < 1.$

We introduce again the two measures $P_1$ and $P_2$ on $(\Omega, \mathcal{A})$ given by:

(3.3.2) $P_1(B) = \dfrac{P(B \cap A_0)}{P(A_0)}$

(3.3.3) $P_2(B) = \dfrac{P(B \cap A_0^c)}{P(A_0^c)}$, all $B \in \mathcal{A}$.

The probability spaces $(\Omega, \mathcal{A}, P_1)$ and $(\Omega, \mathcal{A}, P_2)$ are atomic and non-atomic respectively. If X is a r.e. defined on $(\Omega, \mathcal{A})$ with values on $(S, \mathcal{S})$ we will denote by Q, $Q^{(1)}$ and $Q^{(2)}$ the probability measures induced by X on $(S, \mathcal{S})$ in correspondence to P, $P_1$ and $P_2$ respectively.

Lemma 3.3.1. Let $\{X_n\}$ be a sequence of r.e.'s on $(\Omega, \mathcal{A}, P)$ and suppose that the corresponding sequence $\{Q_n\}$ of probability distributions on $(S, \mathcal{S})$ is tight. Then, both $\{Q_n^{(1)}\}$ and $\{Q_n^{(2)}\}$ are tight.

Proof: Totally analogous to the proof of Lemma 2.3.1.


The result contained in the next lemma can be quite useful when we

deal with weak-convergence from the point of view of continuity sets.

Although we found no reference to it in the literature the result is

probably known and its proof is relatively simple.

Lemma 3.3.2. Let S be a separable metric space, $Q_1$ and $Q_2$ be any two probability measures on $(S, \mathcal{S})$. Let $F_1$ and $F_2$ denote the fields of $Q_1$-continuity sets and $Q_2$-continuity sets respectively. Then, $\mathcal{S}$ is the smallest $\sigma$-field containing the field $F = F_1 \cap F_2$.

Proof: Let x be any element of S and r be any positive real number. We will denote by B(x,r) the open ball of center x and radius r, that is:

(3.3.4) $B(x,r) = \{y \in S: d(x,y) < r\}.$

Recall that for $A \subset S$ we write $\partial A$ to denote the boundary of A. It is easy to see that:

(3.3.5) $\partial B(x,r) \subset \{y: d(x,y) = r\}.$

Let now $x_0 \in S$ and $r_0 > 0$ be given. There exists at most a countable number of values of r for which:

(3.3.6)

A similar statement can be made for $Q_2$.

Hence from (3.3.5) and Definition 1.1.4, it follows that for all but a countable number of values of $r < r_0$ we have:

(3.3.7)

Hence, we can choose a sequence $\{r_n\}$ of real numbers, $r_n < r_0$ for all $n \ge 1$, $\lim_{n\to\infty} r_n = r_0$ and such that for all $n \ge 1$, $B(x_0, r_n) \in F_1 \cap F_2$. Therefore, we have that:

(3.3.8) $B(x_0, r_0) = \bigcup_{n=1}^{\infty} B(x_0, r_n).$

(3.3.8) shows that every open ball of S belongs to the $\sigma$-field generated by $F_1 \cap F_2$. Therefore, it follows that this $\sigma$-field contains the $\sigma$-field generated by the open balls. Since the latter, for separable S, coincides with $\mathcal{S}$ the proof of the lemma is complete.

Lemma 3.3.3. Let $\{X_n\}$ be a sequence of r.e.'s defined on $(\Omega, \mathcal{A}, P)$ with values on $(S, \mathcal{S})$. Let us denote by $\{Q_n\}$, $\{Q_n^{(1)}\}$ and $\{Q_n^{(2)}\}$ the three sequences of probability measures induced by $\{X_n\}$ on $(S, \mathcal{S})$. Then the weak convergence of any two of the sequences implies the weak convergence of the third. Furthermore, if Q, $Q^{(1)}$ and $Q^{(2)}$ denote the respective weak limits we have for all $B \in \mathcal{S}$:

(3.3.9) $Q(B) = P(A_0)\, Q^{(1)}(B) + P(A_0^c)\, Q^{(2)}(B).$

We observe first that if any two of the sequences are tight the same is true for the third. This is a consequence of Lemma 3.3.1 if the pair assumed tight includes $\{Q_n\}$. Otherwise, recall that:

(3.3.10) $Q_n(B) = P(A_0)\, Q_n^{(1)}(B) + P(A_0^c)\, Q_n^{(2)}(B)$

for all $B \in \mathcal{S}$, all $n \ge 1$. Hence, if both $\{Q_n^{(1)}\}$ and $\{Q_n^{(2)}\}$ are tight the tightness of $\{Q_n\}$ follows from the fact that the union of two compact sets is compact.

Assume now that $Q_n$ converges weakly to Q and $Q_n^{(1)}$ converges weakly to $Q^{(1)}$. It follows from (3.3.10) that $\{Q_n^{(2)}(B)\}$ converges for all sets B which are continuity sets of both Q and $Q^{(1)}$. Since $\{Q_n^{(2)}\}$ is tight every subsequence of $\{Q_n^{(2)}\}$ contains a weakly convergent subsequence. Let $\{Q_{n_k}^{(2)}\}$ be a weakly convergent subsequence of $\{Q_n^{(2)}\}$ and let $Q^0$ be its limit. It follows from (3.3.10) that for all B which is both a Q-continuity set and a $Q^{(1)}$-continuity set, we have:

=

Therefore all subsequential limits of $\{Q_n^{(2)}\}$ coincide on a class of sets which is, by the previous lemma, a field which generates $\mathcal{S}$. Hence it follows that $\{Q_n^{(2)}\}$ is weakly convergent to a limit $Q^{(2)}$ and the validity of (3.3.9) for all $B \in \mathcal{S}$ follows.

THEOREM 3.3.4. Let $\{X_n\}$ be a sequence of r.e.'s defined on $(\Omega, \mathcal{A}, P)$ with values on $(S, \mathcal{S})$ where S is a complete separable metric space. Suppose that the corresponding sequence $\{Q_n\}$ of probability measures is weakly convergent to a probability measure Q. Then if $A_Q$ denotes the class of r.e.'s on $(\Omega, \mathcal{A}, P)$ whose probability distribution is Q we have:

i) $A_Q$ is not empty.

ii) $\rho(X_n, A_Q)$ converges to zero as n tends to infinity.

Proof: The proof follows the same lines as the proof of Theorem 2.3.1 of Chapter II.

CHAPTER IV

SOME RELATIONSHIPS BETWEEN THE METRICS L AND $\rho$

In this chapter we will look into some relationships between the Lévy-Prohorov metric L (Definition 1.1.5) and the metric $\rho$ (see 1.1.4) associated with convergence in probability.

4.1. The lower bound for $\rho$

We will begin by stating a result which is a consequence of the definitions of L and $\rho$ and which, we believe, was first proved by Prohorov [1956].

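Lemma 4.1.1. Let X and Y be any two r.e.'s defined on a probability space $(\Omega, \mathcal{A}, P)$, with values on $(S, \mathcal{S})$. Let $P_X$ and $P_Y$ be the probability measures induced on $(S, \mathcal{S})$ by X and Y respectively. Then $L(P_X, P_Y) \le \rho(X,Y)$.

Proof: Let $\varepsilon_0 > \rho(X,Y)$ be given and let F be a closed set in S. As before we write $F^{\varepsilon_0}$ to indicate the set $\{x \in S: d(x,F) \le \varepsilon_0\}$. We can write:

$P(X \in F) = P(X \in F \ \&\ Y \in F^{\varepsilon_0}) + P(X \in F \ \&\ Y \notin F^{\varepsilon_0}).$

But, on the other hand:

$\{\omega: X(\omega) \in F\} \cap \{\omega: Y(\omega) \notin F^{\varepsilon_0}\} \subset \{\omega: d(X(\omega), Y(\omega)) > \varepsilon_0\}.$

Since $\varepsilon_0 > \rho(X,Y)$ it follows from the definition of $\rho$, that:

$P(\{\omega: d(X(\omega), Y(\omega)) > \varepsilon_0\}) \le \varepsilon_0.$

Hence, we can write:

$P(X \in F) \le P(Y \in F^{\varepsilon_0}) + \varepsilon_0.$

Therefore, it follows from the definition of L (Definition 1.1.5), that:

$L(P_X, P_Y) \le \varepsilon_0.$

Since the last result is true for all $\varepsilon_0 > \rho(X,Y)$ the proof of the lemma is complete. □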

Lemma 4.1.1 shows that given two probability measures $Q_1$ and $Q_2$ on $(S, \mathcal{S})$, $L(Q_1,Q_2)$ is a lower bound for the distance in probability between r.e.'s, defined in some probability space, whose marginals are $Q_1$ and $Q_2$. The following result due to Strassen [1965] shows that this lower bound is always attained when S is a complete, separable metric space.

THEOREM 4.1.1. (Strassen). Let S be a complete separable metric space and $Q_1$ and $Q_2$ be two probability measures on $(S, \mathcal{S})$. Then, there exists a probability measure $\lambda$ on $S \times S$ with marginals $Q_1$ and $Q_2$ such that for every pair of r.e.'s $(X_1, X_2)$, whose joint probability distribution is $\lambda$, $\rho(X_1, X_2) = L(Q_1, Q_2)$.

For the proof, see Strassen [1965], Theorem 11, pages 436-438.

We will now look at this result from a different point of view. Let $(\Omega, \mathcal{A}, P)$ be a probability space, E be the class of r.e.'s defined on $(\Omega, \mathcal{A}, P)$ with values on $(S, \mathcal{S})$ and G be the class of probability measures on $(S, \mathcal{S})$ which are admissible for $(\Omega, \mathcal{A}, P)$. Recall that since we do not distinguish between random elements which are equal a.e., each element of E is in fact an equivalence class of r.e.'s. Furthermore, if $(\Omega, \mathcal{A}, P)$ is non-atomic, G coincides with the class Z(S) of all probability measures on $(S, \mathcal{S})$.

Let $Q_1$ and $Q_2$ be two elements of G and let $A_{Q_1}$ and $A_{Q_2}$ denote respectively the classes of r.e.'s whose probability distributions are $Q_1$ and $Q_2$. For a given $X \in A_{Q_1}$ we will look for conditions for the existence of $Y \in A_{Q_2}$ for which $\rho(X,Y)$ is arbitrarily close to the lower bound $L(Q_1,Q_2)$. We now show, by means of an example, that when S is the real line and $(\Omega, \mathcal{A}, P)$ is non-atomic, the construction used in Chapter II (using the inverse of a d.f.) does not, in general, produce a r.e. for which the lower bound is attained. Therefore, if the metric $\rho$ is used as a criterion of optimality the sequence $\{Y_n\}$ constructed in Theorem 2.1.3 is not necessarily the "best" one possible.

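Example 4.1.1. Let $\Omega$ be the closed interval [0,1], $\mathcal{A}$ be the $\sigma$-field of Borel subsets of [0,1] and P be the Lebesgue measure. Let the d.f.'s $G_1$ and $G_2$ be given by:

(4.1.1) $G_1(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } 0 \le x \le 1 \\ 1 & \text{if } x > 1 \end{cases}$

(4.1.2) $G_2(x) = \begin{cases} 0 & \text{if } x < 0 \\ 2/3 & \text{if } 0 \le x < 1/4 \\ 1 & \text{if } x \ge 1/4. \end{cases}$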

Let $Q_1$ and $Q_2$ be the probability measures determined by $G_1$ and $G_2$ respectively. To determine $L(Q_1,Q_2)$ we need to find the values of $\varepsilon > 0$ for which the inequality:

(4.1.3)

is satisfied for all closed subsets F of the real line.

It is easy to see that for any closed subset F of $\mathbb{R}$ we have that:

$Q_2(F) = Q_2(F \cap \{0, \tfrac{1}{4}\})$

$Q_1(F^{\varepsilon}) \ge Q_1((F \cap \{0, \tfrac{1}{4}\})^{\varepsilon}).$

Hence it follows that to calculate $L(Q_1,Q_2)$ we can restrict our attention to the three nonempty subsets of $\{0, \tfrac{1}{4}\}$. If we write for those sets the inequalities given by (4.1.3) and we use (4.1.1) and (4.1.2) we will get that $L(Q_1,Q_2) = \tfrac{3}{8}$.
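The value 3/8 can be checked with exact arithmetic. The sketch below rests on stated assumptions: Q1 is taken to be the uniform distribution on [0,1], Q2 puts mass 2/3 at 0 and 1/3 at 1/4, and the inequality tested for each finite set F is Q2(F) <= Q1(F^eps) + eps with F^eps the closed eps-neighbourhood, which is the form suggested by the two displayed relations (the statement of (4.1.3) itself is not legible in the source). All function and variable names are illustrative.

```python
from fractions import Fraction as Fr

# Assumed point masses of Q2 (jumps of G2 at 0 and at 1/4).
Q2 = {Fr(0): Fr(2, 3), Fr(1, 4): Fr(1, 3)}

def q1_of_neighbourhood(points, eps):
    """Lebesgue measure of [0,1] intersected with the union of the
    intervals [p - eps, p + eps]; this is Q1(F^eps) for a finite set F
    when Q1 is uniform on [0,1]."""
    intervals = sorted((max(Fr(0), p - eps), min(Fr(1), p + eps)) for p in points)
    total, right = Fr(0), Fr(0)
    for a, b in intervals:
        a = max(a, right)       # skip the part already counted
        if b > a:
            total += b - a
            right = b
    return total

SUBSETS = [(Fr(0),), (Fr(1, 4),), (Fr(0), Fr(1, 4))]

def inequality_holds(subset, eps):
    """Q2(F) <= Q1(F^eps) + eps for the finite set F = subset."""
    return sum(Q2[p] for p in subset) <= q1_of_neighbourhood(subset, eps) + eps

def holds_for_all_subsets(eps):
    return all(inequality_holds(s, eps) for s in SUBSETS)

at_bound = holds_for_all_subsets(Fr(3, 8))
just_below = holds_for_all_subsets(Fr(3, 8) - Fr(1, 1000))
```

Under these assumptions the binding set is {0, 1/4}: at eps = 3/8 its neighbourhood has Q1-measure 5/8 and the inequality holds with equality, while for slightly smaller eps it fails.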

Let now X be the uniformly distributed r.v. given by the identity map from [0,1] to the real line. From the definition of the inverse of a d.f. (Definition 2.1.4) we have that:

(4.1.4) $Y(x) = G_2^{-1}(x) = \begin{cases} 0 & 0 \le x < 2/3 \\ 1/4 & 2/3 < x \le 1. \end{cases}$

Hence, we can write:

$P(\{\omega: |X(\omega) - Y(\omega)| > \varepsilon\}) = P(Y = 0 \ \&\ X > \varepsilon) + P(Y = 1/4 \ \&\ |X - 1/4| > \varepsilon).$

Therefore, from (4.1.4) and the definition of X it follows that:

$P(\{\omega: |X(\omega) - Y(\omega)| > \varepsilon\}) = P([0, \tfrac{2}{3}] \cap (\varepsilon, 1]) + P((\tfrac{2}{3}, 1] \cap ([0, \tfrac{1}{4} - \varepsilon) \cup (\tfrac{1}{4} + \varepsilon, 1])).$

(4.1.5) $P(\{\omega: |X(\omega) - Y(\omega)| > \varepsilon\}) =$

To obtain $\rho(X,Y)$ we equate the right hand side of (4.1.5) to $\varepsilon$ and solve the resulting equation. The result, $\tfrac{17}{36}$, is clearly larger than $\tfrac{3}{8}$. However, it can be easily checked that the lower bound, $\tfrac{3}{8}$, is attained for the r.v. Z defined by:

$Z(\omega) = \begin{cases} 1/4 & 3/8 \le \omega \le 17/24 \\ 0 & \text{otherwise.} \end{cases}$
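Both values quoted in the example can be verified exactly. The sketch below assumes, in line with the procedure described above, that the distance in probability is the value of eps at which P(|X - Y| > eps) equals eps; it takes X as the identity on [0,1] with Lebesgue measure, Y equal to 0 on [0, 2/3] and 1/4 on (2/3, 1], and Z equal to 1/4 on [3/8, 17/24] and 0 elsewhere. The helper names are illustrative.

```python
from fractions import Fraction as Fr

def far_length(a, b, c, eps):
    """Lebesgue measure of {w in [a, b] : |w - c| > eps}."""
    overlap = max(Fr(0), min(b, c + eps) - max(a, c - eps))
    return (b - a) - overlap

def p_exceeds_Y(eps):
    """P(|X - Y| > eps) with Y = 0 on [0, 2/3] and Y = 1/4 on (2/3, 1]."""
    return (far_length(Fr(0), Fr(2, 3), Fr(0), eps)
            + far_length(Fr(2, 3), Fr(1), Fr(1, 4), eps))

def p_exceeds_Z(eps):
    """P(|X - Z| > eps) with Z = 1/4 on [3/8, 17/24] and Z = 0 elsewhere."""
    return (far_length(Fr(3, 8), Fr(17, 24), Fr(1, 4), eps)
            + far_length(Fr(0), Fr(3, 8), Fr(0), eps)
            + far_length(Fr(17, 24), Fr(1), Fr(0), eps))

rho_XY = Fr(17, 36)   # fixed point claimed for the inverse-d.f. construction
rho_XZ = Fr(3, 8)     # fixed point claimed for Z
```

Since single points have Lebesgue measure zero, treating the pieces as closed intervals does not change the probabilities. At eps = 3/8 the construction based on the inverse d.f. still has P(|X - Y| > eps) = 5/8, which is why it falls short of the lower bound.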

We will now show that if $(\Omega, \mathcal{A}, P)$ is non-atomic for each $X \in A_{Q_1}$ there exists $Y \in A_{Q_2}$ for which $\rho(X,Y)$ is arbitrarily close to $L(Q_1,Q_2)$. We are going to need the following known results about non-atomic spaces.

Lemma 4.1.2. Let $A_0$ be any event on a non-atomic probability space $(\Omega, \mathcal{A}, P)$. Let $q_1,\ldots,q_n$ be real numbers such that: $q_i \ge 0$, $\sum_{i=1}^{n} q_i \le P(A_0)$. Then, there exist n disjoint subsets of $A_0$, $B_1, B_2, \ldots, B_n$, $B_i \in \mathcal{A}$, $P(B_i) = q_i$, all $i = 1,\ldots,n$.

Proof: For n = 1, this is a well-known result. (See, e.g., Neveu [1965], page 18.) The result for n can be easily proved by induction.

Lemma 4.1.3. Let $Q_1$ and $Q_2$ be two probability measures on $(S, \mathcal{S})$ with finite support.¹ Let $\lambda$ be a probability measure on $[S \times S, \mathcal{S} \times \mathcal{S}]$ whose marginals are $Q_1$ and $Q_2$. Then, for any r.e. X defined on the non-atomic space $(\Omega, \mathcal{A}, P)$ with probability distribution $Q_1$, there exists a r.e. Y with probability distribution $Q_2$ and such that the joint distribution of (X,Y) is $\lambda$.

Proof: Let $\{s_1,\ldots,s_n\}$ and $\{s'_1,\ldots,s'_m\}$ denote the supports of $Q_1$ and $Q_2$ respectively. Assume that:

$Q_1(\{s_i\}) = p_i$, $1 \le i \le n$

$Q_2(\{s'_j\}) = q_j$, $1 \le j \le m$

$\lambda((s_i, s'_j)) = \lambda_{ij}$, $1 \le i \le n$, $1 \le j \le m$.

Clearly, from the assumptions about $Q_1$, $Q_2$ and $\lambda$ it follows that:

$\sum_{j=1}^{m} \lambda_{ij} = p_i$, $1 \le i \le n$

$\sum_{i=1}^{n} \lambda_{ij} = q_j$, $1 \le j \le m$.

The construction of the r.e. Y can now be accomplished by applying the previous lemma to each one of the sets $X^{-1}(\{s_i\})$ with the numbers $\lambda_{i1},\ldots,\lambda_{im}$. □
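The splitting step in this proof can be sketched concretely. The example below is an illustration under assumptions that are not in the text: the probability space is [0,1) with Lebesgue measure, X takes its i-th value on consecutive intervals of length p_i, and each such interval is cut into consecutive pieces of length lambda_ij on which Y is given its j-th value. The function name `build_coupling` and the sample matrix are hypothetical.

```python
from fractions import Fraction as Fr

def build_coupling(lam):
    """lam[i][j] is the joint mass lambda_ij.

    Returns a list of pieces (start, end, i, j): on [start, end) the
    r.e. X takes its i-th value and Y takes its j-th value. The set
    where X takes its i-th value is the union of the pieces with
    first index i, of total length p_i = sum_j lam[i][j]."""
    pieces, start = [], Fr(0)
    for i, row in enumerate(lam):
        for j, mass in enumerate(row):
            if mass > 0:
                pieces.append((start, start + mass, i, j))
                start += mass
    return pieces

# Sample joint masses: p = (1/2, 1/2) for X and q = (1/4, 3/4) for Y.
lam = [[Fr(1, 4), Fr(1, 4)],
       [Fr(0),    Fr(1, 2)]]
pieces = build_coupling(lam)

def joint_mass(i, j):
    """Lebesgue measure of the set where (X, Y) takes the pair (i, j)."""
    return sum((e - s for s, e, a, b in pieces if (a, b) == (i, j)), Fr(0))

def y_marginal(j):
    """Lebesgue measure of the set where Y takes its j-th value."""
    return sum((e - s for s, e, a, b in pieces if b == j), Fr(0))
```

On a general non-atomic space the pieces are supplied by Lemma 4.1.2 rather than by cutting intervals; the interval picture is only the simplest instance.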

¹ By support of a probability measure we understand the smallest closed subset of S which contains all the mass.

THEOREM 4.1.2. Let $Q_1$ and $Q_2$ be any two probability measures on $(S, \mathcal{S})$ where S is a complete, separable metric space. Let $(\Omega, \mathcal{A}, P)$ be a non-atomic probability space and as before let $A_{Q_1}$ and $A_{Q_2}$ denote the classes of r.e.'s on $(\Omega, \mathcal{A}, P)$ whose probability distributions are $Q_1$ and $Q_2$ respectively. Then, for any $X \in A_{Q_1}$:

$\rho(X, A_{Q_2}) = L(Q_1, Q_2).$

Proof: Let $X \in A_{Q_1}$ be given. Since S is complete there exists a sequence $\{X_n\}$ of simple r.e.'s (r.e.'s taking at most a finite number of values) such that $\{X_n\}$ converges a.e. (and hence in probability) to X. Let $\{Q_{1n}\}$ denote the probability distributions of the $X_n$'s. Clearly $\{Q_{1n}\}$ converges weakly to $Q_1$. On the other hand, there exists a sequence $\{Q_{2n}\}$ of probability measures with finite support such that $\{Q_{2n}\}$ is weakly convergent to $Q_2$. In fact, the set of probability measures with finite support is dense in the class of all probability measures in $(S, \mathcal{S})$. (See, e.g., Parthasarathy [1967], page 44.) Recall now that, by Strassen's result, given any two probability measures $R_1$ and $R_2$ on $(S, \mathcal{S})$, there exists a probability measure $\lambda$ on $[S \times S, \mathcal{S} \times \mathcal{S}]$, whose marginals are $R_1$ and $R_2$ and such that for every pair of r.e.'s $(Z_1, Z_2)$ whose joint probability distribution is $\lambda$, we have: $\rho(Z_1, Z_2) = L(R_1, R_2)$. For each $n \ge 1$, let us denote by $\lambda_n$ the probability measure on $[S \times S, \mathcal{S} \times \mathcal{S}]$, associated in this way with the pair $(Q_{1n}, Q_{2n})$. Since both $Q_{1n}$ and $Q_{2n}$ have a finite support, by Lemma 4.1.3 given $X_n$ there exists $Y_n$, with probability distribution $Q_{2n}$ and such that $(X_n, Y_n)$ has joint distribution $\lambda_n$. Hence, it follows that, for all $n \ge 1$:

(4.1.6) $\rho(X_n, Y_n) = L(Q_{1n}, Q_{2n}).$

But, $\{Y_n\}$ is a sequence of r.e.'s on $(\Omega, \mathcal{A}, P)$ whose probability distributions, $\{Q_{2n}\}$, are weakly convergent to $Q_2$. Hence by our main result of Chapter III:

(4.1.7) $\lim_{n\to\infty} \rho(Y_n, A_{Q_2}) = 0.$

On the other hand, since $\{Q_{1n}\}$ and $\{Q_{2n}\}$ are weakly convergent to $Q_1$ and $Q_2$ respectively, it follows that:

(4.1.8) $\lim_{n\to\infty} L(Q_{1n}, Q_{2n}) = L(Q_1, Q_2).$

Finally, by the triangle inequality we can write:

$\rho(X, A_{Q_2}) \le \rho(X, X_n) + \rho(X_n, Y_n) + \rho(Y_n, A_{Q_2}).$

Therefore, by (4.1.6):

$\rho(X, A_{Q_2}) \le \rho(X, X_n) + L(Q_{1n}, Q_{2n}) + \rho(Y_n, A_{Q_2}).$

Hence, by (4.1.7) and (4.1.8) it follows that:

$\rho(X, A_{Q_2}) \le \lim_{n\to\infty} \rho(X, X_n) + L(Q_1, Q_2).$

Since $X_n$ converges to X a.e. it follows that:

$\rho(X, A_{Q_2}) \le L(Q_1, Q_2).$

Since the reverse inequality is always true by Lemma 4.1.1, the proof of the theorem is complete. □

Remark. The theorem above does not permit us to conclude that for each $X \in A_{Q_1}$ there exists $Y \in A_{Q_2}$ for which $\rho(X,Y) = L(Q_1,Q_2)$. Theorem 4.1.2 says that if we consider in the space E of r.e.'s the classes of equivalence determined by their probability distributions, elements of the same class are at the same distance from any other class, provided that the probability space is non-atomic.

The following example shows that this result fails in general.

Example 4.1.2. Let $\Omega = [0,1]$ and $\mathcal{A}$ be the $\sigma$-field formed by $[0, \tfrac{1}{3}]$ and all Borel subsets of $(\tfrac{1}{3}, 1]$. P will be the Lebesgue measure. Since the mass of the atomic part is smaller than $\tfrac{1}{2}$ it can be easily seen that for any a, $0 \le a \le 1$, there exists in $(\Omega, \mathcal{A}, P)$ an event with probability a. Consider the following probability measures on the real line:

(4.1.9) $Q_1(\{0\}) = Q_1(\{1\}) = \tfrac{1}{2}$; $Q_2(\{0\}) = \tfrac{1}{4}$; $Q_2(\{1\}) = \tfrac{3}{4}$.

Both $Q_1$ and $Q_2$ are admissible for $(\Omega, \mathcal{A}, P)$ and by the same argument used in Example 4.1.1, we can say that to calculate $L(Q_1,Q_2)$ we need only to consider the nonempty subsets of {0,1}. However, it is easy to see that {1} is the only set for which (4.1.3) is not satisfied for all $\varepsilon > 0$. Hence, it follows that: $L(Q_1,Q_2) = \tfrac{3}{4} - \tfrac{1}{2} = \tfrac{1}{4}$.

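Let now X be given by:

$X(\omega) = \begin{cases} 0 & 0 \le \omega \le 1/2 \\ 1 & \text{otherwise.} \end{cases}$

Observe now that any set in $(\Omega, \mathcal{A}, P)$ whose probability is 3/4 has to contain the atom $[0, \tfrac{1}{3}]$. Hence any r.v. Y with probability distribution $Q_2$ has to be equal to 1 in $[0, \tfrac{1}{3}]$. Therefore, for any $Y \in A_{Q_2}$:

$\rho(X,Y) \ge 1/3.$

Hence:

$\rho(X, A_{Q_2}) \ge 1/3.$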

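On the other hand, if we consider the r.v. $X'(\omega) = 1 - X(\omega)$, $X' \in A_{Q_1}$ and we will show that:

$\rho(X', A_{Q_2}) = 1/4.$

In fact, let Y be defined by:

$Y(\omega) = \begin{cases} 1 & 0 \le \omega \le 3/4 \\ 0 & \text{otherwise.} \end{cases}$

Clearly, for any $\varepsilon$, $0 < \varepsilon < 1$, we can write: $\{\omega: |X'(\omega) - Y(\omega)| \ge \varepsilon\} = (\tfrac{1}{2}, \tfrac{3}{4}]$. The result follows. □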

Remark. Let us consider for $Q_1$ and $Q_2$ in G, the quantity $L_0$ defined by:

if and only if $Q_1 = Q_2$. Furthermore, it follows from Theorem 3.3.4 of Chapter III that a sequence $\{Q_n\}$ in G converges weakly to Q if and only if $L_0(Q_n, Q)$ converges to zero as n tends to infinity.

It is not difficult to show that $L_0$ satisfies the triangle inequality and it follows from Theorem 4.2.2 that for non-atomic probability spaces $L_0$ coincides with L. However the previous example may be used to show that $L_0$ is not necessarily symmetric. We will make use of Example 4.1.2 to show that the standard procedures to symmetrize $L_0$ (e.g. maximum, arithmetic mean) fail to preserve the equivalence with L. Let $(\Omega, \mathcal{A}, P)$ be as in Example 4.1.2 and define, for each $n \ge 1$:

2 I= - +3 n =

Clearly, $Q_n \in G$ for all n and $\{Q_n\}$ converges weakly to Q, given by:

$Q(\{0\}) = 2/3$, $Q(\{1\}) = 1/3.$

Let $Y \in A_Q$ be defined by:

$Y(\omega) = \begin{cases} 0 & \text{if } 1/3 < \omega \le 1 \\ 1 & \text{if } 0 \le \omega \le 1/3. \end{cases}$

By the same argument used in Example 4.1.2 for any $X_n \in A_{Q_n}$, $X_n^{-1}(\{0\})$ must contain the atom $[0, \tfrac{1}{3}]$. Hence, it follows that it is impossible to choose a sequence $\{X_n\}$, $X_n \in A_{Q_n}$, such that $\rho(X_n, Y)$ converges to zero as n tends to infinity. Therefore, $\rho(Y, A_{Q_n})$ does not converge to zero and hence the symmetrized $L_0$ will not be equivalent to L.

4.2. Weak convergence and equivalent probability measures

In this section, we extend to metric spaces a result of Padmanabhan [1970] and show how this extension allows us to use L to define a metric in the space of r.e.'s which is equivalent to $\rho$.

Recall that given two probability measures P, Q, in the same measurable space $(\Omega, \mathcal{A})$ we say that Q is absolutely continuous with respect to P ($Q \ll P$) if Q(A) = 0 for every $A \in \mathcal{A}$ for which P(A) = 0. Two probability measures P and Q for which both conditions $P \ll Q$ and $Q \ll P$ are satisfied are said to be equivalent.

Lemma 4.2.1. Let A and B be any two events on a probability space $(\Omega, \mathcal{A}, P)$. A necessary and sufficient condition for the existence of an equivalent probability measure Q on $(\Omega, \mathcal{A})$ for which $Q(A) \ne Q(B)$ is that $P(A \triangle B) > 0$.

Proof: Suppose first that for some probability measure Q, equivalent to P, we have $Q(A) \ne Q(B)$. Then we can write:

$0 < |Q(A) - Q(B)| \le Q(A \,\Delta\, B)$.

Hence $Q(A \,\Delta\, B) > 0$; $P(A \,\Delta\, B) > 0$ now follows from the definition of equivalence.

To prove the converse, note that

$A \,\Delta\, B = (A - B) \cup (B - A)$.

Hence, $P(A \,\Delta\, B) > 0$ implies that either P(A-B) > 0 or P(B-A) > 0. Assume that P(A-B) > 0. Observe now that if $P(A) \ne P(B)$ we are done, since P is equivalent to itself. It only remains to consider the case P(A) = P(B).

Define for each $C \in \mathcal{A}$:

(4.2.1) $P_0(C) = P(C \mid A) = \dfrac{P(C \cap A)}{P(A)}$.

Since P(A-B) > 0, P(A) is strictly positive and hence $P_0$ is well defined.

Define for each $C \in \mathcal{A}$:

(4.2.2) $Q(C) = \dfrac{P_0(C) + P(C)}{2}$.

Q is a probability measure on $(\Omega, \mathcal{A})$ and Q is equivalent to P. On the other hand:

(4.2.3) $Q(A) = \dfrac{P_0(A) + P(A)}{2} = \dfrac{1 + P(A)}{2}$,

$Q(B) = \dfrac{1}{2}\left[\dfrac{P(A \cap B)}{P(A)} + P(B)\right] = \dfrac{1}{2}\left[1 + P(B) - \dfrac{P(A \cap B^c)}{P(A)}\right]$.

Since $P(A \cap B^c) > 0$ and P(A) = P(B) it follows from (4.2.3) that:

Q(A) > Q(B).
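The construction of Q from P in the proof of Lemma 4.2.1 can be checked by hand on a small finite space. The sketch below is only an illustration: the six-point space, the uniform P and the sets A and B are assumptions, chosen so that P(A) = P(B) and P(A-B) > 0; the helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical finite probability space: six equally likely points.
P = {w: Fraction(1, 6) for w in range(6)}

A = {0, 1, 2}   # P(A) = 1/2
B = {2, 3, 4}   # P(B) = 1/2, and A - B = {0, 1} has positive probability


def prob(measure, event):
    """Probability of an event (a set of points) under a point-mass dict."""
    return sum(measure[w] for w in event)


def lemma_measure(P, A):
    """Q(C) = [P(C | A) + P(C)] / 2, the average of P(. | A) and P."""
    pA = prob(P, A)
    return {w: ((P[w] / pA if w in A else 0) + P[w]) / 2 for w in P}


Q = lemma_measure(P, A)

print(prob(Q, A), prob(Q, B))
```

Exact rational arithmetic is used so that the comparison of Q(A) and Q(B) is not blurred by rounding. Since Q is at least P/2 and vanishes wherever P does, the two measures have the same null sets.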

THEOREM 4.2.1. Let X and Y be any two r.e.'s defined on a probability space $(\Omega, \mathcal{A}, P)$ and taking values on $(S, \mathcal{S})$, where S is a separable metric space. Suppose also that the two r.e.'s are distinct, that is: $P(\{\omega: X(\omega) = Y(\omega)\}) < 1$. Then, there exists a probability measure Q on $(\Omega, \mathcal{A})$, Q equivalent to P, such that, with respect to Q, X and Y have different probability distributions. In other words, if two r.e.'s, X and Y, induce the same probability measures on $(S, \mathcal{S})$ with respect to all probability measures equivalent to P, it follows that X = Y a.e.

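Proof: Let d denote the metric on S. Since X and Y are distinct there exists $\varepsilon_0 > 0$ such that:

(4.2.4) $P(\{\omega: d(X(\omega),Y(\omega)) > \varepsilon_0\}) > 0$.

Since S is separable there exists a countable, disjoint collection $\{A_i\}$ of elements of $\mathcal{S}$, whose union is S and such that, for each i, $\mathrm{diam}(A_i) < \varepsilon_0$. Hence:

(4.2.5) $P(\{\omega: d(X,Y) > \varepsilon_0\}) = \sum_{i=1}^{\infty} P(X^{-1}(A_i) \cap \{\omega: d(X,Y) > \varepsilon_0\})$.

From (4.2.4) and (4.2.5) it follows that for some $A_{i_0}$ we have:

$P(X^{-1}(A_{i_0}) \cap \{\omega: d(X(\omega),Y(\omega)) > \varepsilon_0\}) > 0$.

On the other hand, since $\mathrm{diam}(A_{i_0}) < \varepsilon_0$, $X(\omega) \in A_{i_0}$ and $d(X(\omega),Y(\omega)) > \varepsilon_0$ together imply $Y(\omega) \notin A_{i_0}$. Therefore, it follows that:

(4.2.6) $P(X^{-1}(A_{i_0}) \cap (Y^{-1}(A_{i_0}))^c) > 0$.

By the previous lemma, there exists a probability measure Q on $(\Omega, \mathcal{A})$, Q equivalent to P, such that:

$Q(X^{-1}(A_{i_0})) \ne Q(Y^{-1}(A_{i_0}))$.

Or, equivalently, we have:

$Q(\{\omega: X(\omega) \in A_{i_0}\}) \ne Q(\{\omega: Y(\omega) \in A_{i_0}\})$.

The result follows. □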
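Theorem 4.2.1 can be illustrated on the smallest possible example. In the sketch below the two-point space, the uniform P and the maps X and Y are assumptions made for illustration: X and Y differ at every point yet have the same law under P, and reweighting P on the event {X = 0}, in the way the proof of Lemma 4.2.1 does, separates their laws.

```python
from fractions import Fraction
from collections import defaultdict


def law(measure, rv):
    """Distribution induced by the map rv under a point-mass dict."""
    out = defaultdict(Fraction)
    for w, p in measure.items():
        out[rv(w)] += p
    return dict(out)


# Hypothetical two-point space with uniform P.
P = {0: Fraction(1, 2), 1: Fraction(1, 2)}
X = lambda w: w        # X(w) = w
Y = lambda w: 1 - w    # Y(w) = 1 - w, so X != Y at every point

# Equivalent measure Q = [P(. | A) + P] / 2 with A = X^{-1}({0}).
A = {w for w in P if X(w) == 0}
pA = sum(P[w] for w in A)
Q = {w: ((P[w] / pA if w in A else 0) + P[w]) / 2 for w in P}

print(law(P, X) == law(P, Y))   # same law under P
print(law(Q, X) == law(Q, Y))   # different laws under Q
```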

Definition 4.2.1. Let E be any set. A map $d: E \times E \to R^1$ is said to be a pseudo-metric in E whenever, for all $(x,y) \in E \times E$ and all $z \in E$:

1) $d(x,y) \ge 0$ and $d(x,x) = 0$.

2) $d(x,y) = d(y,x)$.

3) $d(x,y) \le d(x,z) + d(z,y)$.

Remark. Every metric on E is a pseudometric but the converse is

not true since for a pseudometric d, d(x,y) = 0 does not necessarily

imply x = y.
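A concrete pseudometric that is not a metric may help fix ideas. The sketch below uses an assumed example, the distance between first coordinates of points in the plane, and checks conditions 1) to 3) of Definition 4.2.1 on a small sample of points.

```python
import itertools


def d(x, y):
    """Pseudometric on the plane that only compares first coordinates."""
    return abs(x[0] - y[0])


points = [(0, 0), (0, 1), (1, 0), (2, 5)]

# Conditions 1)-3) of Definition 4.2.1 on a small sample of points.
for x, y, z in itertools.product(points, repeat=3):
    assert d(x, y) >= 0 and d(x, x) == 0
    assert d(x, y) == d(y, x)
    assert d(x, y) <= d(x, z) + d(z, y)

# d fails to be a metric: two distinct points at distance zero.
print(d((0, 0), (0, 1)))
```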

A family $\{d_\theta: \theta \in \Theta\}$ of pseudometrics in a set E is said to be separating if for every pair $(x,y) \in E \times E$, with $x \ne y$, there exists $d_\theta$ such that $d_\theta(x,y) > 0$. Given a separating family of pseudometrics on a set E, the topology on E generated by the sets

$\{\{y: d_\theta(x,y) < \varepsilon\}: x \in E,\ \theta \in \Theta,\ \varepsilon > 0\}$

is called the topology induced on E by the family $\{d_\theta: \theta \in \Theta\}$. Clearly, since the family $\{d_\theta: \theta \in \Theta\}$ is assumed to be separating, the topology induced by this family is Hausdorff.

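Definition 4.2.2. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let E denote the space of r.e.'s defined on $(\Omega, \mathcal{A}, P)$ with values on $(S, \mathcal{S})$. Let $\{P^\theta: \theta \in \Theta\}$ denote the class of probability measures on $(\Omega, \mathcal{A})$ which are equivalent to P. For each $\theta \in \Theta$ and $(X,Y) \in E \times E$, define:

(4.2.7) $d_\theta(X,Y) = L(P^\theta_X, P^\theta_Y)$.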


Lemma 4.2.2. For each $\theta \in \Theta$, $d_\theta$ is a pseudometric in E. The family $\mathcal{V} = \{d_\theta: \theta \in \Theta\}$ is separating and hence induces a Hausdorff topology on E.

Proof: Since L is a metric on the space of probability measures on $(S, \mathcal{S})$, the verification that each $d_\theta$ is a pseudometric on E is straightforward. The other half of the lemma follows trivially from Theorem 4.2.1.

THEOREM 4.2.2. Let $\{X_n\}$ and X be r.e.'s defined on $(\Omega, \mathcal{A}, P)$ with values on $(S, \mathcal{S})$. Then, $\{X_n\}$ converges to X in probability if and only if for every $P_X$-continuity set $A \subset S$ and every $C \in \mathcal{A}$ we have:

(4.2.8) $\lim_{n \to \infty} P(X_n^{-1}(A) \cap C) = P(X^{-1}(A) \cap C)$.

Proof: 1) Suppose first that $\{X_n\}$ converges to X in probability. Then, it can be shown (see, e.g., Billingsley [1968], page 26) that, for every $P_X$-continuity set A:

(4.2.9) $\lim_{n \to \infty} P(X_n^{-1}(A) \,\Delta\, X^{-1}(A)) = 0$.

On the other hand, using the properties of the operation $\Delta$, we can write for any $C \in \mathcal{A}$:

(4.2.10) $|P(X_n^{-1}(A) \cap C) - P(X^{-1}(A) \cap C)| \le P\{(X_n^{-1}(A) \cap C) \,\Delta\, (X^{-1}(A) \cap C)\}$

$= P(C \cap (X_n^{-1}(A) \,\Delta\, X^{-1}(A))) \le P(X_n^{-1}(A) \,\Delta\, X^{-1}(A))$.

The desired result follows from (4.2.9) and (4.2.10).

Note that the weak convergence of $\{P_{X_n}\}$ to $P_X$ is equivalent to condition (4.2.8) with $C = \Omega$.

2) Suppose now that for every $P_X$-continuity set A and every $C \in \mathcal{A}$ we have:

(4.2.11) $\lim_{n \to \infty} P(X_n^{-1}(A) \cap C) = P(X^{-1}(A) \cap C)$.

Let $\varepsilon > 0$ be given. By the argument used in the proof of the existence of Skorokhod partitions (see Appendix) there exists a partition of S by means of a countable collection $\{B_i\}$ of $P_X$-continuity sets, each one of them with diameter smaller than $\varepsilon$. Hence, we can write:

(4.2.12) $P(\{\omega: d(X_n(\omega),X(\omega)) \ge \varepsilon\}) \le \sum_{i \ge 1} P(X^{-1}(B_i) \cap X_n^{-1}(B_i^c))$.

For each $i \ge 1$, it follows, from the fact that $B_i^c$ is a $P_X$-continuity set and (4.2.11), that:

$\lim_{n \to \infty} P(X^{-1}(B_i) \cap X_n^{-1}(B_i^c)) = P(X^{-1}(B_i) \cap X^{-1}(B_i^c)) = 0$.

But on the other hand, the sum on the right of (4.2.12) is dominated by $\sum_i P(X^{-1}(B_i)) = 1$, for all $n \ge 1$. It follows that:

$\lim_{n \to \infty} P(\{\omega: d(X_n(\omega),X(\omega)) \ge \varepsilon\}) = 0$.

Since $\varepsilon > 0$ is arbitrary, this completes the proof.
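The set inequality used in (4.2.10) is elementary and can be verified exhaustively on a small finite space. In the sketch below, which is only an illustration, the sets An and A play the roles of the inverse images of A under X_n and X; the four-point space and its point masses are assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical four-point space with unequal point masses.
P = {0: Fraction(1, 10), 1: Fraction(2, 10),
     2: Fraction(3, 10), 3: Fraction(4, 10)}


def prob(event):
    """Probability of a set of points."""
    return sum(P[w] for w in event)


def all_events(points):
    """Every subset of a finite set, as frozensets."""
    pts = list(points)
    return [frozenset(c) for r in range(len(pts) + 1)
            for c in combinations(pts, r)]


events = all_events(P)

# Exhaustive check of |P(An & C) - P(A & C)| <= P(An ^ A),
# where ^ is the symmetric difference of sets.
worst_gap = min(prob(An ^ A) - abs(prob(An & C) - prob(A & C))
                for An in events for A in events for C in events)
print(worst_gap)
```

A non-negative value of `worst_gap` means the inequality holds for every triple of events on this space.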

The next result is the extension to metric spaces of Theorem 2.1 of Padmanabhan [1970].

Corollary. With the notation introduced in Definition 4.2.2, the metric topology on E, given by $\rho$, and the topology generated by $\mathcal{V} = \{d_\theta: \theta \in \Theta\}$ are sequentially equivalent. In other words, a sequence $\{X_n\}$ converges to X in probability if and only if the corresponding sequence $\{P^\theta_{X_n}\}$ converges weakly to $P^\theta_X$ for all $\theta \in \Theta$.


Proof: In one direction the result is trivial since convergence in probability is preserved when we substitute P by an equivalent probability measure.

To prove the result in the other direction let A be a $P_X$-continuity set and let $C \in \mathcal{A}$, $P(C) \ne 0$. Define for $D \in \mathcal{A}$:

(4.2.13) $Q(D) = [P(D \mid C) + P(D)]/2$.

Clearly Q is equivalent to P and A is a $Q_X$-continuity set. It follows from our assumptions, applied to both Q and P, that:

(4.2.14) $\lim_{n \to \infty} P(X_n^{-1}(A) \cap C) = P(X^{-1}(A) \cap C)$.

Since (4.2.14) is trivially true for sets C with P(C) = 0, the result follows from Theorem 4.2.2. □

Remark. If we denote by $\mathcal{T}$ the $\rho$-topology on E and by $\mathcal{T}_{\mathcal{V}}$ the topology induced by $\mathcal{V}$, it follows from the previous theorem and from some well-known results in topology (see Wilansky [1970], page 27, Theorem 3.1.2) that $\mathcal{T}_{\mathcal{V}} \subset \mathcal{T}$.

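THEOREM 4.2.3. Let $(\Omega, \mathcal{A}, P)$ be a probability space and assume that $\mathcal{A}$ is a separable σ-field, that is, $\mathcal{A}$ has a countable sub-class $\mathcal{F}$ such that $\mathcal{A}$ is the smallest σ-field containing $\mathcal{F}$. Then there exists a countable subclass $\mathcal{G} \subset \mathcal{V}$ such that $\mathcal{T}_{\mathcal{G}} = \mathcal{T}$.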

Proof: We will show first that with the assumption of separability Theorem 4.2.2 holds under the weaker assumption: $C \in \mathcal{F}$. There is no loss of generality if we assume that $\mathcal{F}$ is a field, since the field generated by a countable class is countable. Let $\{A_n\}$ be a sequence of sets in $\mathcal{A}$ and assume that for some $A \in \mathcal{A}$:

(4.2.15) $\lim_{n \to \infty} P(A_n \cap C) = P(A \cap C)$ all $C \in \mathcal{F}$.

Let now $B \in \mathcal{A}$ and $\varepsilon > 0$ be given. Since $\mathcal{F}$ is a field and $\mathcal{F}$ generates $\mathcal{A}$, there exists $C \in \mathcal{F}$ such that:

$P(B \,\Delta\, C) \le \varepsilon/3$.

By the same argument used in (4.2.10) it follows that for all $D \in \mathcal{A}$:

(4.2.16) $|P(D \cap C) - P(D \cap B)| \le \varepsilon/3$.

On the other hand, we can write:

$|P(A_n \cap B) - P(A \cap B)| \le |P(A_n \cap B) - P(A_n \cap C)| + |P(A_n \cap C) - P(A \cap C)| + |P(A \cap C) - P(A \cap B)|$.

From (4.2.15) and (4.2.16) it follows that:

$|P(A_n \cap B) - P(A \cap B)| \le \varepsilon$ all n larger than some $N \ge 1$.

Hence:

$\lim_{n \to \infty} P(A_n \cap B) = P(A \cap B)$ all $B \in \mathcal{A}$.

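Let now $\{C_i\}$ denote the elements of $\mathcal{F}$ and in correspondence to each $C_i$, with $P(C_i) \ne 0$, define a probability measure $Q_i$, given by:

$Q_i(B) = \{P(B \mid C_i) + P(B)\}/2$, $B \in \mathcal{A}$.

Let $\mathcal{G}$ denote the family of pseudometrics determined by $\{Q_i\}$ as in (4.2.7). We have just shown that $\mathcal{T}_{\mathcal{G}}$ and $\mathcal{T}$ are sequentially equivalent. Furthermore, since $\mathcal{G}$ is countable, $\mathcal{T}_{\mathcal{G}}$ can be metrized by a metric, denoted $\rho_0$.

Therefore, $\mathcal{T}_{\mathcal{G}} = \mathcal{T}$ and the two metrics $\rho$ and $\rho_0$ are equivalent. □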
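A countable family of pseudometrics is commonly metrized by a weighted, truncated sum. The sketch below shows one standard choice; it is an assumption made for illustration and is not claimed to be the formula used in the text. The function name `combine` and the two coordinate pseudometrics are hypothetical.

```python
def combine(pseudometrics):
    """Return rho0(x, y) = sum_i 2^{-(i+1)} * min(1, d_i(x, y)).

    For a finite list this is a pseudometric, and it is a metric as soon
    as the family is separating.  For a countably infinite family the
    same series converges because term i is at most 2^{-(i+1)}.
    """
    def rho0(x, y):
        return sum(min(1.0, d(x, y)) / 2 ** (i + 1)
                   for i, d in enumerate(pseudometrics))
    return rho0


# Hypothetical separating family on the plane: one pseudometric per coordinate.
d1 = lambda x, y: abs(x[0] - y[0])
d2 = lambda x, y: abs(x[1] - y[1])
rho0 = combine([d1, d2])

print(rho0((0, 0), (0, 1)))   # d1 alone cannot separate these two points
```

Truncating each term at 1 keeps the series summable without changing which sequences converge, since convergence in the combined distance is equivalent to convergence in every member of the family.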

We now show, by means of an example, that the space $(E, \rho_0)$ is not necessarily complete.

Example 4.2.2. Let $\{X_n\}$ be a sequence of independent r.v.'s defined on $(\Omega, \mathcal{A}, P)$ and assume that the corresponding sequence of probability distributions is weakly convergent to a non-degenerate d.f. F. For each $n \ge 1$, let $\mathcal{F}_n$ denote the smallest σ-field with respect to which $X_1, \dots, X_n$ are measurable. Put $\mathcal{F} = \bigcup_{n \ge 1} \mathcal{F}_n$, and let $\mathcal{G}$ be the σ-field generated by $\mathcal{F}$. Let $B \in \mathcal{F}$ and x be a continuity point of F. Hence, $B \in \mathcal{F}_{n_0}$ for some $n_0 \ge 1$. Clearly, for all $n > n_0$:

$P(\{\omega: X_n \le x\} \cap B) = P(\{\omega: X_n \le x\}) P(B)$.

It follows that:

(4.2.17) $\lim_{n \to \infty} P(\{\omega: X_n(\omega) \le x\} \cap B) = F(x) P(B)$.

Hence, it follows that the sequence of probability distributions converges weakly to F with respect to all probability measures given by (4.2.13) with $C \in \mathcal{F}$. It follows that in the case of a countable $\mathcal{F}$, $\{X_n\}$ is fundamental in the metric $\rho_0$. However, if there existed X such that $\{X_n\}$ converges to X in probability, it would follow from (4.2.17) and Theorem 4.2.2 that X would have to be independent of $\mathcal{F}$ and hence the d.f. F would have to be degenerate, which is a contradiction.
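The two facts driving Example 4.2.2 can be seen in the simplest special case, independent fair coin tosses. The sketch below is an illustration under that assumption (a finite product space with N = 4 tosses; all names are hypothetical): events depending on the first coordinates factorize against later coordinates, while distinct coordinates never get close to each other in probability.

```python
from fractions import Fraction
from itertools import product

N = 4  # hypothetical number of independent fair coin tosses
omega = list(product([0, 1], repeat=N))
P = {w: Fraction(1, 2 ** N) for w in omega}


def prob(pred):
    """Probability of the set of outcomes satisfying a predicate."""
    return sum(P[w] for w in omega if pred(w))


X = lambda n: (lambda w: w[n - 1])   # X_n = n-th coordinate, n = 1..N

# B depends on X_1 and X_2 only, so it belongs to the sigma-field F_2.
B = lambda w: w[0] == 1 or w[1] == 0

# Factorization used before (4.2.17): for n > 2, {X_n <= 0} is independent of B.
lhs = prob(lambda w: X(3)(w) <= 0 and B(w))
rhs = prob(lambda w: X(3)(w) <= 0) * prob(B)

# Yet X_3 and X_4 stay apart: P(|X_3 - X_4| >= 1) does not shrink.
apart = prob(lambda w: abs(X(3)(w) - X(4)(w)) >= 1)
print(lhs, rhs, apart)
```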

APPENDIX

We are now going to review the essential facts about metric spaces that were used in the previous chapters. With the exception of the results related to the Skorokhod partition, the material is standard and it is presented here only for the reader's convenience. A full account of these and related results can be found in any good introductory textbook on topology or real analysis (see, e.g., Royden [1963] or Wilansky [1970]).

The basic concepts (e.g.: open and closed sets; interior, closure and boundary of a set; dense and nowhere dense sets; etc.) as well as their properties are assumed to be known. As before we will denote metric spaces by the letter S and the metric on S will be denoted by d. If A is a subset of S we write $\bar{A}$, $A^{o}$ and $\partial A$ to denote the closure, the interior and the boundary of A respectively.

A.1. Separable metric spaces

Definition A.1.1. An open cover of a metric space S is a collection of open subsets of S whose union is S.

Definition A.1.2. A collection $\mathcal{B}$ of open subsets of S is said to be a base for S if for every open set O in S and every $x \in O$ there exists $B \in \mathcal{B}$ such that: $x \in B \subset O$.

Remark. It is an obvious consequence of the definition that if $\mathcal{B}$ is a base for S then every open subset of S can be expressed as a union of elements of $\mathcal{B}$.

Definition A.1.3. A metric space S is said to be second countable if it has a countable base.

Definition A.1.4. A metric space S is separable if it contains a countable dense subset. In other words, S is separable if there exists a countable subset D of S such that $\bar{D} = S$.

THEOREM A.1.1. For a metric space S the three conditions below are equivalent.

i) S is separable.

ii) S is second countable.

iii) Each open cover of any subset of S has a countable subcover.

For the proof, see Wilansky [1970; page 76].

Corollary. Let S be a separable metric space and $\mathcal{S}$ be the σ-field of the Borel sets of S. Let $\delta$ be any positive real number. Then, S can be written as a countable, disjoint union of elements of $\mathcal{S}$, each one of them having diameter smaller than $\delta$.

Proof: Consider the open cover of S given by the open balls of radius smaller than $\delta/2$. Let $\{B_n\}$ denote a countable subcover, which exists by the previous theorem. Hence, we have:

(A.1.1) $S = \bigcup_{n \ge 1} B_n$.

Furthermore, it is clear that:

(A.1.2) $\mathrm{diam}(B_n) < \delta$ all $n \ge 1$.

Now since $\mathcal{S}$ is a σ-field it follows that there exists a disjoint sequence $\{A_n\}$ of elements of $\mathcal{S}$ such that:

(A.1.3) $\bigcup_n A_n = \bigcup_n B_n = S$.

(A.1.4) $A_n \subset B_n$ all $n \ge 1$.

The result follows. □
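The disjoint sequence in (A.1.3) and (A.1.4) is usually obtained by removing from each set everything already covered by the earlier ones. The sketch below shows this standard step on finite sets; the example cover is an assumption and the function name is hypothetical.

```python
def disjointify(cover):
    """Turn sets B_1, B_2, ... into disjoint sets A_1, A_2, ... with
    A_n contained in B_n and the same union:
    A_n = B_n minus the union of B_1, ..., B_{n-1}."""
    seen = set()
    result = []
    for B in cover:
        result.append(set(B) - seen)
        seen |= set(B)
    return result


# Hypothetical overlapping cover of {0, ..., 6} by small sets.
cover = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6}, {1, 5}]
parts = disjointify(cover)
print(parts)
```

Because each new set is a subset of the corresponding old one, any bound on the diameters of the original sets carries over unchanged. Some of the new sets may be empty.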

THEOREM A.1.2. Let S be a separable metric space. Then, $\mathcal{S}$ is the smallest σ-field containing the open balls of S.

Proof: Since S is separable it follows from Theorem A.1.1 that there exists a countable collection of open balls which is a base for S. By the remark following Definition A.1.2 we can say that every open set in S is a countable union of open balls of S. Hence, it follows that the σ-field generated by the open balls of S contains all open sets of S. Hence, this σ-field contains $\mathcal{S}$, since $\mathcal{S}$ is by definition the smallest σ-field containing the open sets of S. The inclusion in the other direction is trivial since every open ball is an open set. □

A.2. Completeness and compactness on metric spaces

Definition A.2.1. A sequence $\{x_n\}$ of points of a metric space S is said to be a Cauchy sequence if $d(x_m, x_n)$ converges to zero as both m and n go to infinity.

Remark. It is easy to see that every convergent sequence in a

metric space is a Cauchy sequence and also that the converse to this

statement is not in general true.


Definition A.2.2. A metric space where every Cauchy sequence is

convergent is called a complete metric space.

Remark. Completeness is not a topological property in the sense that two metrics $d_1$ and $d_2$ can generate the same topology on S (that is, they determine the same class of open sets) and yet S can be complete with respect to one of them and not with respect to the other.
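A standard example of this phenomenon, given here as an illustration and not taken from the text, is the real line with the usual metric and with the metric obtained by comparing arctangents. Both generate the usual open sets, since arctan is a homeomorphism of the line onto an open interval. The sequence x_n = n is Cauchy for the second metric and has no limit on the line.

```python
import math

d1 = lambda x, y: abs(x - y)                          # usual metric on R
d2 = lambda x, y: abs(math.atan(x) - math.atan(y))    # same open sets, different metric

# Far-out terms of x_n = n are close for d2 and far apart for d1.
print(d2(1000, 2000), d1(1000, 2000))
```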

Definition A.2.3. A subset K of a metric space S is compact if

every open cover of K has a finite subcover.

There are several important properties which are equivalent to

compactness in metric spaces. We will only state the one below which

was used in relation with the definition of tightness.

THEOREM A.2.1. Let K be a compact subset of a metric space S and $\{x_n\}$ be a sequence of elements of K. Then $\{x_n\}$ has a limit point. (Equivalently, $\{x_n\}$ admits a convergent subsequence.)

For the proof, see Wilansky [1970; page 124].

A.3. The existence of Skorokhod partitions

THEOREM A.3.1. (Skorokhod). Let S be a separable metric space and P a probability measure on $(S, \mathcal{S})$. Then, there exists a P-continuous Skorokhod partition of S.

Proof: In the proof of the Corollary to Theorem A.1.1, we saw that for each $k \ge 1$, we can express S as a countable union of open balls whose radii are smaller than $(1/2)^{k+2}$. Therefore, for each $k \ge 1$ there exists a sequence $\{x_i^{(k)}\}_{i \ge 1}$ of points of S and a sequence $\{r_i^{(k)}\}_{i \ge 1}$ of positive real numbers such that:

(A.3.1) $S = \bigcup_{i \ge 1} B(x_i^{(k)}, r_i^{(k)})$, $r_i^{(k)} < (1/2)^{k+2}$ all $i \ge 1$.

We claim now that for each $k \ge 1$ we can choose a real number $r_k$ in the interval $[(1/2)^{k+2}, (1/2)^{k+1})$ such that for all $i \ge 1$, $B(x_i^{(k)}, r_k)$ is a P-continuity set. In fact, by the same argument used in the proof of Lemma 3.3.2, for each i the number of values of r in that interval for which $B(x_i^{(k)}, r)$ is not a P-continuity set is at most countable. Hence it follows from (A.3.1) and the choice of the $r_k$'s that for each $k \ge 1$

(A.3.2) $S = \bigcup_{i \ge 1} B(x_i^{(k)}, r_k)$.

It is also clear that

We are now going to express the union in (A.3.2) as a disjoint union by defining:

(A.3.3) $D_i^{(k)} = B(x_i^{(k)}, r_k) - \bigcup_{j=1}^{i-1} B(x_j^{(k)}, r_k)$.

Finally, we define:

$S_{i_1, \dots, i_k} = D_{i_1}^{(1)} \cap \dots \cap D_{i_k}^{(k)}$.

We observe first that $S_{i_1, \dots, i_k}$ is a subset of $B(x_{i_k}^{(k)}, r_k)$ and hence the diameter of $S_{i_1, \dots, i_k}$ is smaller than $(1/2)^k$. Furthermore, since the class of P-continuity sets is a field and the open balls $B(x_i^{(k)}, r_k)$ are in this class, it follows that $S_{i_1, \dots, i_k}$ is a P-continuity set. The verification of the other properties of the Skorokhod partition (Definition 3.1.3) is straightforward.
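The choice of a common radius that avoids boundary mass can be made concrete in a simple special case. The sketch below assumes a purely atomic measure on the real line with finitely many atoms, so that a ball is an open interval and it is a continuity set exactly when neither endpoint is an atom. The centers, atoms, interval and function names are all hypothetical.

```python
from fractions import Fraction


def bad_radii(centers, atoms):
    """Radii r > 0 for which some interval (x - r, x + r) has an atom
    on its boundary {x - r, x + r}."""
    return {abs(a - x) for x in centers for a in atoms}


def choose_radius(centers, atoms, lo, hi, steps=1000):
    """Scan a grid in [lo, hi) and return the first admissible radius."""
    bad = bad_radii(centers, atoms)
    for j in range(steps):
        r = lo + (hi - lo) * j / steps
        if r not in bad:
            return r
    raise ValueError("no admissible radius on the grid")


centers = [Fraction(0), Fraction(1, 4)]
atoms = [Fraction(1, 8), Fraction(3, 8)]
r = choose_radius(centers, atoms, Fraction(1, 8), Fraction(1, 4))
print(r)
```

In this discrete setting the set of excluded radii is finite, so a grid search suffices. In the general argument the excluded set is only known to be countable, which is still enough to leave admissible radii in any interval of positive length.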

REFERENCES

ALEXANDROV, A. D. (1940-1943). Additive set functions in abstract spaces. Mat. Sb. 8, 307-348; 9, 563-628; 13, 169-238.

BILLINGSLEY, P. (1968). Convergence of Probability Measures. John Wiley & Sons, Inc., New York.

DUDLEY, R. M. (1968). Distances of probability measures and random variables. Ann. Math. Statist. 39, 1563-1572.

NEVEU, J. (1965). Mathematical Foundations of the Calculus of Probability. Holden-Day, Inc., San Francisco.

PADMANABHAN, A. R. (1970). Convergence in probability and allied results. Math. Jap. 15, 111-117.

PARTHASARATHY, K. R. (1967). Probability Measures on Metric Spaces. Academic Press, New York.

PROHOROV, Y. V. (1956). Convergence of random processes and limit theorems in probability theory. Theor. Prob. Appl. 1, 157-214.

PYKE, R. (1968). Applications of almost surely convergent constructions of weakly convergent processes. Proc. Internat. Symp. Prob. Inform. Theor., Springer-Verlag, Berlin.

RENYI, A. (1970). Foundations of Probability. Holden-Day, Inc., San Francisco.

ROYDEN, H. L. (1968). Real Analysis. Second edition, MacMillan, New York.

SKOROKHOD, A. V. (1956). Limit theorems for stochastic processes. Theor. Prob. Appl. 1, 261-290.

STRASSEN, V. (1965). The existence of probability measures with given marginals. Ann. Math. Statist. 36, 423-439.

VARADARAJAN, V. S. (1958). On an existence theorem for probability spaces. Selected Translations in Mathematical Statistics, Vol. 2. American Mathematical Society, Providence.

VARADARAJAN, V. S. (1961). Measures on topological spaces. Translations of the American Mathematical Society, Series 2, 1965, Vol. 48, American Mathematical Society, Providence.

WICHURA, M. J. (1970). On the construction of almost uniformly convergent random variables with given weakly convergent image laws. Ann. Math. Statist. 41, 284-291.

WILANSKY, A. (1970). Topology for Analysis. Ginn and Company, Waltham, Massachusetts.
