Probability: Basic Ideas and Selected Topics

Eric B. Hall

Gary L. Wise

ALL RIGHTS RESERVED. UNAUTHORIZED DUPLICATION IS STRICTLY PROHIBITED.

Preface

In writing this book we were faced with a serious dilemma. To how much of the vast subject of probability theory should an undergraduate student be exposed? Although it is tempting to remain at the level of coin flipping, card shuffling, and Riemann integration, we feel that such an approach does a great disservice to the students by reinforcing the many popular myths about probability. In particular, probability theory is simply a branch of measure theory, and no one should sugar-coat that fact. Although some might suggest that this approach is over the head of an average student, such has not been the case in our experience. Indeed, most of the reluctance to cover probability at this level seems to originate behind the desk rather than in front of it. The importance of probability is increasing even faster than the frontier of scientific knowledge, and hence the usefulness of the standard non-measure-theoretic approach is being left far behind. Students need to be able to reason clearly and think critically rather than just learn how to parrot a few simplistic results. Our goal with this book is to provide the serious student of engineering with a rigorous yet understandable introduction to basic probability that not only will serve his or her present needs but will continue to serve as a useful reference into the next century.


Contents

Preface

Acknowledgments

Introduction

Notation

1 Set Theory
  1.1 Introduction
  1.2 Unions and Intersections
  1.3 Relations
  1.4 Functions
  1.5 σ-Algebras
  1.6 Dynkin's π-λ Theorem
  1.7 Topological Spaces
  1.8 Caveats and Curiosities

2 Measure Theory
  2.1 Definitions
  2.2 Supremums and Infimums
  2.3 Convergence of Sets: Lim Inf and Lim Sup
  2.4 Measurable Functions
  2.5 Real Borel Sets
  2.6 Lebesgue Measure and Lebesgue Measurable Sets
  2.7 Caveats and Curiosities

3 Integration
  3.1 The Riemann Integral
  3.2 The Riemann-Stieltjes Integral
  3.3 The Lebesgue Integral
    3.3.1 Simple Functions
    3.3.2 Measurable Functions
    3.3.3 Properties of the Lebesgue Integral
  3.4 The Riemann Integral and the Lebesgue Integral
  3.5 The Riemann-Stieltjes Integral and the Lebesgue Integral
  3.6 Caveats and Curiosities

4 Functional Analysis
  4.1 Vector Spaces
  4.2 Normed Linear Spaces
  4.3 Inner Product Spaces
  4.4 The Radon-Nikodym Theorem
  4.5 Caveats and Curiosities

5 Probability Theory
  5.1 Introduction
  5.2 Random Variables and Distributions
  5.3 Independence
  5.4 The Binomial Distribution
    5.4.1 The Poisson Approximation to the Binomial Distribution
  5.5 Multivariate Distributions
  5.6 Caratheodory Extension Theorem
  5.7 Expectation
  5.8 Useful Inequalities
  5.9 Transformations of Random Variables
  5.10 Moment Generating and Characteristic Functions
  5.11 The Gaussian Distribution
  5.12 The Bivariate Gaussian Distribution
  5.13 Multivariate Gaussian Distributions
  5.14 Convergence of Random Variables
    5.14.1 Pointwise Convergence
    5.14.2 Almost Sure Convergence
    5.14.3 Convergence in Probability
    5.14.4 Convergence in Lp
    5.14.5 Convergence in Distribution
  5.15 The Central Limit Theorem
  5.16 Laws of Large Numbers
  5.17 Conditioning
  5.18 Regression Functions
  5.19 Statistical Hypothesis Testing
  5.20 Caveats and Curiosities

6 Random Processes
  6.1 Introduction
  6.2 Gaussian Processes
  6.3 Second Order Random Processes
  6.4 The Karhunen-Loeve Expansion
  6.5 Markov Chains
  6.6 Markov Processes
  6.7 Martingales
  6.8 Random Processes with Orthogonal Increments
  6.9 Wide Sense Stationary Random Processes
  6.10 Complex-Valued Random Processes
  6.11 Linear Operations on WSS Random Processes
  6.12 Nonlinear Transformations
  6.13 Brownian Motion
  6.14 Caveats and Curiosities

7 Problems
  7.1 Set Theory
  7.2 Measure Theory
  7.3 Integration Theory
  7.4 Functional Analysis
  7.5 Distributions & Probabilities
  7.6 Independence
  7.7 Random Variables
  7.8 Moments
  7.9 Transformations of Random Variables
  7.10 The Gaussian Distribution
  7.11 Convergence
  7.12 Conditioning
  7.13 True/False Questions

8 Solutions
  8.1 Solutions to Exercises
  8.2 Solutions to Problems
  8.3 Solutions to True/False Questions


Acknowledgments

The authors would like to thank David Drumm for many helpful suggestions. In addition, they would like to thank Dr. Herb Woodson, Dr. Stephen Szygenda, Dr. Tom Edgar, Dr. Edward Powers, Dr. Francis Bostick, and Dr. James Cogdell. Also, they would like to acknowledge the wonderful help that GLW received in his recovery from a stroke, and in this regard, they mention the supportive friendship of the preceding friends as well as that of Dr. Michael Edmond and many dedicated therapists, including Michelle Sanderson, Jerilyn Iliff, Janice Johnson, Audrey Schooling, Liz Larue, and Mischa Smith. Finally, they are grateful to Carey Taylor of the Texas Rehabilitation Commission for his help in providing services for GLW's recovery.

This book was typeset using the LaTeX typesetting system, developed by Leslie Lamport as an extension of Donald Knuth's TeX.


Introduction

This book is designed to impart a working knowledge of probability theory and random processes that will enable a student to undertake serious studies in this area. No prior experience with probability, statistics, or real analysis is required. All that is needed is a familiarity with basic calculus and an ability to follow mathematical reasoning.

Any course on probability theory must go down one of two roads. On the first road the student flips coins, shuffles cards, looks at pretty bell curves, and considers many simple consequences of deep, dark theorems mentioned only in footnotes. Although this road is popular with engineers (and some statisticians), it is a dead-end road that produces students capable of dealing only with a few overly restrictive special cases and incapable of thinking for themselves. The second road treats probability theory as a branch of an area of mathematics known as measure theory. Although this approach requires a student to first learn some very basic aspects of set theory and real analysis, the benefits of taking this road are enormous. Students suddenly understand the results that they are applying, formerly obtuse theorems become transparently easy, and seemingly advanced engineering tools such as the Kalman filter are seen as simple consequences of much more general results. In this work we will take the latter road without apology.


Notation

    ℝ                      the set of all real numbers
    ℤ                      the set of all integers
    ℕ                      the set of all integers greater than zero
    ℚ                      the set of all rational numbers
    ℂ                      the set of all complex numbers
    ∅                      the empty set
    i                      the imaginary unit
    z*                     the complex conjugate of the complex number z
    A ⫋ B                  A ⊂ B and A ≠ B
    ℙ(S)                   the set of all subsets of the set S
    I_A                    the indicator function of the set A
    Aᶜ                     the complement of the set A
    A \ B                  the set of points in A that are not in B
    A △ B                  (A \ B) ∪ (B \ A)
    -A                     {-x : x ∈ A} for A ⊂ ℝ
    L_p(Ω, ℱ, μ)           the set of all μ-equivalence classes of functions
                           f: (Ω, ℱ) → (ℝ, ℬ(ℝ)) such that ∫_Ω |f|^p dμ < ∞
    ℬ(T)                   the Borel subsets of a Borel subset T of ℝ
    ℳ(A)                   the Lebesgue measurable subsets of A ∈ ℬ(ℝ)
    m                      Lebesgue measure on ℳ(ℝ)
    λ                      Lebesgue measure on ℬ(ℝ)
    σ({Aᵢ : i ∈ I})        the smallest σ-algebra including {Aᵢ : i ∈ I}
    σ({Xᵢ : i ∈ I})        the smallest σ-algebra for which Xᵢ is measurable
                           for each i ∈ I
    f|_A                   the function f restricted to A
    f⁺                     max{f, 0} for a real-valued function f
    f⁻                     -min{f, 0} for a real-valued function f
    a.e. [μ]               almost everywhere with respect to the measure μ;
                           i.e., pointwise off a μ-null set
    a.s.                   almost surely
    {a ∈ A : condition}    the set of points in A for which the indicated
                           condition is true
    ∀                      "for all" or "for each"
    ∃                      "there exists"
    st                     "such that"
    wp                     "with probability"
    ~                      "has the distribution"
    □                      Quod Erat Demonstrandum
    ◇                      This symbol denotes an unusually difficult section
                           or problem. Proceed with caution.

1 Set Theory

1.1 Introduction

We will take a naive approach to set theory. That is, we will assume that any describable collection of objects is a set. Consider a set A. By writing x ∈ A we will mean that x is an element of the set A. By writing x ∉ A we will mean that x is not an element of the set A. Note that x ∈ A and x ∉ A cannot both be true simultaneously. To see why our approach is naive, let R denote the set of all sets A such that A ∉ A.¹ If R ∈ R then by definition it follows that R ∉ R. Similarly, if R ∉ R then by definition it follows that R ∈ R. Thus, although R is a describable collection of objects, R is not a set!

This paradox was discovered by Bertrand Russell and had a rather devastating effect on the work of a German logician named Gottlob Frege who later wrote: "To a scientific author hardly something worse can happen than the destruction of the foundation of his edifice after the completion of his work. I was placed in this position by a letter of Mr. Bertrand Russell when the printing came to a close." To avoid such paradoxes, set theory is based upon systems of axioms such as the Zermelo-Fraenkel system. Mathematics is based upon such systems of axioms and "mathematical truths" must be understood in that light. One such axiom that we will use without hesitation is the Axiom of Choice, which simply states that for any collection {X_a : a ∈ A} of nonempty sets, there exists a function c mapping A to ⋃_{a∈A} X_a such that c(a) ∈ X_a for each a ∈ A. Although seemingly innocuous, there are many deep and dark consequences of the Axiom of Choice.

The set with no elements is called the empty set and is denoted by ∅. We say that a set B is a subset of a set A, and we write B ⊂ A, if x ∈ A whenever x ∈ B. We sometimes denote this by saying that A is a superset of B, in which case we write A ⊃ B. Note that any set is a subset and a superset of itself. Two sets A and B are said to be equal if A ⊂ B and if B ⊂ A. In this case we write A = B. If A and B are not equal we write A ≠ B. A set A is said to be a proper subset of B if A ⊂ B and if A ≠ B. (We sometimes denote this by writing A ⫋ B.)

¹It is possible for a set to be an element of itself. For example, the set of all sets that contain more than one element is itself a set that contains more than one element, and hence is an element of itself.

Later generations will regard set theory as a malady from which one has recovered. -Poincaré

Consider a nonempty set Ω and let x be an element from Ω. The set {x} containing only the element x is called a singleton set. In general, for elements xᵢ from Ω where i ranges over some index set I, we will let {xᵢ : i ∈ I} denote the set containing only the elements xᵢ for i ∈ I.

Exercise 1.1 Is there any difference between {∅} and ∅?

For any set A the power set of A is denoted by ℙ(A) and is defined to be the set of all subsets of A. That is, a set B is an element of ℙ(A) if and only if B ⊂ A. In set notation we may write ℙ(A) = {B : B ⊂ A}. Note that ∅ ∈ ℙ(A) and that A ∈ ℙ(A) for any set A.
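For a small finite set, the power set can be enumerated directly. The following Python sketch (the helper name is ours) checks the two facts just noted, along with the standard count |ℙ(A)| = 2^|A|:

    from itertools import combinations

    def power_set(s):
        """Return P(s), the set of all subsets of the finite set s."""
        elems = list(s)
        return {frozenset(c) for r in range(len(elems) + 1)
                for c in combinations(elems, r)}

    A = {1, 2, 3}
    PA = power_set(A)
    assert frozenset() in PA and frozenset(A) in PA   # both the empty set and A belong to P(A)
    assert len(PA) == 2 ** len(A)                     # |P(A)| = 2^|A|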

Exercise 1.2 What is ℙ(∅)?

1.2 Unions and Intersections

Let Ω and I be nonempty sets and consider a collection of subsets of Ω denoted by {Aᵢ : i ∈ I}. In this case the set I is called an index set and is often taken to be a subset of the real line ℝ. The intersection of the sets in {Aᵢ : i ∈ I} is denoted by ⋂ᵢ∈I Aᵢ and is defined to be the set of all points in Ω that are in Aᵢ for each i ∈ I. That is,

\[ \bigcap_{i \in I} A_i = \{x \in \Omega : x \in A_i \ \forall\, i \in I\}. \]

(Note that this intersection equals Ω if I = ∅.) The union of the sets in {Aᵢ : i ∈ I} is denoted by ⋃ᵢ∈I Aᵢ and is defined to be the set of all points in Ω that are in Aᵢ for some i ∈ I. That is,

\[ \bigcup_{i \in I} A_i = \{x \in \Omega : \exists\, i \in I \text{ st } x \in A_i\}. \]

(Note that this union equals ∅ if I = ∅.) In other words, for two sets A and B, the set A ∩ B contains the elements that are in A and in B, and the set A ∪ B contains the elements that are in A or in B.² If I = {1, ..., n} for some positive integer n then we will often write

\[ \bigcap_{i \in I} A_i \quad \text{as} \quad \bigcap_{i=1}^{n} A_i \]

or as A₁ ∩ ⋯ ∩ Aₙ, and similarly for unions. If I = ℕ, the set of positive integers, then we will often write it

\[ \text{as} \quad \bigcap_{i=1}^{\infty} A_i \]

and similarly for unions.

²This "or" is not an exclusive or. That is, a point that is in both A and B is also in A ∪ B.

Consider three sets A, B, and C. You should be able to prove the following properties concerning unions and intersections:

1. A ∩ B = B ∩ A and A ∪ B = B ∪ A. That is, unions and intersections are commutative.

2. A ∩ ∅ = ∅ and A ∪ ∅ = A.

3. A ∪ A = A ∩ A = A. That is, unions and intersections are idempotent.

4. (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C). That is, unions and intersections are associative.

5. (A ∩ B) ⊂ A and A ⊂ (A ∪ B).

6. A ⊂ B if and only if A ∪ B = B.

7. If A ⊂ C and B ⊂ C then (A ∪ B) ⊂ C.

8. (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) and (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C). That is, unions and intersections are distributive.

Exercise 1.3 Prove that A ⊂ B if and only if A ∪ B = B.

The set difference of two sets A and B is denoted by A \ B and is defined to be the set of points in A that are not in B. The set A \ B is sometimes called the relative complement of B in A. If the set A is clear from the context of our discussion we will often write A \ B as Bᶜ and refer to it simply as the complement of B. That is, if Ω is some fixed nonempty underlying set then Bᶜ = {x ∈ Ω : x ∉ B}.

The symmetric difference of two sets A and B is denoted by A △ B and is defined to be the set (A \ B) ∪ (B \ A). Two sets A and B are said to be disjoint if A ∩ B = ∅. A collection of sets is said to be disjoint if any two distinct sets from the collection are disjoint.

In what follows, any set of the form Aᶜ should be interpreted to refer to the set Ω \ A for some fixed nonempty set Ω that contains every point of interest. You should be able to prove the following properties:

1. (Aᶜ)ᶜ = A.

2. A ∪ Aᶜ = Ω.

3. A and Aᶜ are disjoint.
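All of the properties in the two preceding lists can be spot-checked on randomly chosen finite sets. A minimal Python sketch (helper name ours), using the built-in set operators |, &, -, and ^ in the roles of ∪, ∩, \, and △:

    import random

    def random_subset(universe):
        return {x for x in universe if random.random() < 0.5}

    omega = set(range(10))          # a small universe
    for _ in range(100):
        A, B, C = (random_subset(omega) for _ in range(3))
        # commutativity, idempotence, associativity, distributivity
        assert A | B == B | A and A & B == B & A
        assert A | A == A & A == A
        assert (A | B) | C == A | (B | C) and (A & B) & C == A & (B & C)
        assert (A | B) & C == (A & C) | (B & C)
        assert (A & B) | C == (A | C) & (B | C)
        # complements with respect to the universe
        Ac = omega - A
        assert omega - Ac == A and A | Ac == omega and A & Ac == set()
        # symmetric difference: (A \ B) ∪ (B \ A)
        assert A ^ B == (A - B) | (B - A)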


It may seem to be a stark paradox that, just when mathematics has been brought close to the ultimate in abstractness, its applications have begun to multiply and proliferate in an extraordinary fashion . . . . Far from being paradoxical, however, this conjunction of two apparently opposite trends in the development of mathematics may rightly be viewed as the sign of an essential truth about mathematics itself. For it is only to the extent that mathematics is freed from the bonds which have attached it in the past to particular aspects of reality that it can become the extremely flexible and powerful instrument we need to break paths into areas now beyond our ken. -Marshall Stone


The Cartesian product of two sets A and B is denoted by A × B and is defined to be the set of all ordered pairs (a, b) for which a ∈ A and b ∈ B. For example, ℝ × ℝ (often denoted by ℝ²) is the plane. For n ∈ ℕ, the Cartesian product of n sets A₁, ..., Aₙ is the set of all ordered n-tuples (a₁, ..., aₙ) where aᵢ ∈ Aᵢ for each positive integer i ≤ n. This product is denoted by

\[ \prod_{i=1}^{n} A_i \]

or by A₁ × ⋯ × Aₙ. Note that this product is empty if Aᵢ is empty for any i. For example, ℝ × ℝ × ℝ (denoted by ℝ³) is the set of all ordered triples of three real numbers. Note that ℝ³, ℝ × ℝ², and ℝ² × ℝ are three distinct sets.

For sets A and B, the set Bᴬ is the set of all functions mapping A into B. Let Λ and Ω be nonempty sets and, for each λ ∈ Λ, let A_λ be a nonempty subset of Ω. The Cartesian product of the A_λ's over the set Λ is the subset of Ω^Λ consisting of every point {w_λ ∈ Ω : λ ∈ Λ} such that w_λ ∈ A_λ for all λ ∈ Λ. We denote this product by

\[ \prod_{\lambda \in \Lambda} A_\lambda. \]

In the context of this product, the set A_λ is called the λ-th factor. Also, if {h_λ ∈ Ω : λ ∈ Λ} is a point in the product then h_λ is called the λ-th coordinate of the point. For λ ∈ Λ, we will let π_λ: ∏_{α∈Λ} A_α → A_λ be the mapping that assigns a point in ∏_{λ∈Λ} A_λ to its λ-th coordinate. The map π_λ is called the canonical projection into the λ-th factor or the evaluation at λ.

1.1 Theorem (DeMorgan's Law) Let Ω and I be nonempty sets and assume that Aᵢ ⊂ Ω for each i ∈ I. Then

\[ \bigcap_{i \in I} A_i = \Bigl(\bigcup_{i \in I} A_i^{c}\Bigr)^{c}. \]

Proof. If I = ∅ then the result reduces to Ω = ∅ᶜ, which follows by definition.

Let I be an arbitrary nonempty set, and for each i ∈ I, let Aᵢ be a subset of Ω. First assume that ⋂ᵢ∈I Aᵢ = ∅. Then for each ω ∈ Ω, there exists some i ∈ I such that ω ∉ Aᵢ and hence such that ω ∈ ⋃ᵢ∈I Aᵢᶜ. We have shown that Ω ⊂ ⋃ᵢ∈I Aᵢᶜ. Clearly, ⋃ᵢ∈I Aᵢᶜ ⊂ Ω. Thus, Ω = ⋃ᵢ∈I Aᵢᶜ, and it follows that

\[ \bigcap_{i \in I} A_i = \emptyset = \Omega^{c} = \Bigl(\bigcup_{i \in I} A_i^{c}\Bigr)^{c}. \]

Now, assume that ⋂ᵢ∈I Aᵢ ≠ ∅. Let ω be any point belonging to ⋂ᵢ∈I Aᵢ. Then ω ∈ Aᵢ for each i ∈ I. In particular, ω ∉ ⋃ᵢ∈I Aᵢᶜ, and thus

\[ \bigcap_{i \in I} A_i \subset \Bigl(\bigcup_{i \in I} A_i^{c}\Bigr)^{c}. \]

Conversely, it follows from this that assuming that ⋂ᵢ∈I Aᵢ ≠ ∅ implies that (⋃ᵢ∈I Aᵢᶜ)ᶜ ≠ ∅. Now let ω ∈ (⋃ᵢ∈I Aᵢᶜ)ᶜ. Then ω ∉ Aᵢᶜ for any i ∈ I and thus ω ∈ Aᵢ for all i ∈ I. Therefore,

\[ \Bigl(\bigcup_{i \in I} A_i^{c}\Bigr)^{c} \subset \bigcap_{i \in I} A_i. \]

Hence, we have

\[ \bigcap_{i \in I} A_i = \Bigl(\bigcup_{i \in I} A_i^{c}\Bigr)^{c} \]

for any set I and for any family {Aᵢ : i ∈ I} of subsets of Ω. □

Note that the following corollaries are immediate consequences of the previous theorem.

1.1 Corollary A ∪ B = (Aᶜ ∩ Bᶜ)ᶜ.

1.2 Corollary A ∩ B = (Aᶜ ∪ Bᶜ)ᶜ.
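DeMorgan's Law and both corollaries admit a quick finite sanity check. The Python sketch below (helper names ours) draws random subsets of a small universe:

    import random

    omega = set(range(20))
    index = range(5)
    A = {i: {x for x in omega if random.random() < 0.5} for i in index}

    def complement(s):
        return omega - s

    lhs = set.intersection(*(A[i] for i in index))
    rhs = complement(set.union(*(complement(A[i]) for i in index)))
    assert lhs == rhs   # the intersection of the A_i equals the complement of the union of complements

    # the two corollaries, for a pair of sets
    B, C = A[0], A[1]
    assert B | C == complement(complement(B) & complement(C))
    assert B & C == complement(complement(B) | complement(C))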

1.3 Relations

Consider subsets A and B of a nonempty set Ω. A relation R between A and B is a subset of A × B. If R is a relation between A and B, then two points a ∈ A and b ∈ B are said to be R-related if (a, b) ∈ R. We will call a relation R between A and A a relation R on A. A relation R on A is said to be transitive³ if (a₁, a₂) ∈ R and (a₂, a₃) ∈ R imply that (a₁, a₃) ∈ R. A relation R on A is symmetric if (a₁, a₂) ∈ R implies that (a₂, a₁) ∈ R. A relation R on A is reflexive if (a, a) ∈ R for all a ∈ A. A relation R on A is called an equivalence relation if it is reflexive, symmetric, and transitive.

³Even though a relation is a set, there is a difference between a transitive set and a transitive relation. Here we are defining a transitive relation.
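For a finite relation, stored as a set of ordered pairs, each of the three defining properties is directly testable. A Python sketch (helper names ours), using congruence modulo 2 as an example of an equivalence relation:

    def is_reflexive(R, A):
        return all((a, a) in R for a in A)

    def is_symmetric(R):
        return all((b, a) in R for (a, b) in R)

    def is_transitive(R):
        return all((a, d) in R
                   for (a, b) in R for (c, d) in R if b == c)

    # congruence mod 2 on {0, 1, 2, 3} is reflexive, symmetric, and transitive
    A = {0, 1, 2, 3}
    R = {(a, b) for a in A for b in A if (a - b) % 2 == 0}
    assert is_reflexive(R, A) and is_symmetric(R) and is_transitive(R)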

1.4 Functions

Let A and B be nonempty sets. A function f mapping A into B (written as f: A → B) is a relation between A and B such that:

1. if a ∈ A then there exists b ∈ B such that (a, b) ∈ f, and,

2. if (a, b₁) ∈ f and (a, b₂) ∈ f then b₁ = b₂.

Thus, a function is defined everywhere on A and assigns precisely one element of B to each element in A. If A and B are nonempty sets and if f is a function mapping A into B, then we typically use the notation f(a) = b to denote that (a, b) ∈ f. The set A is called the domain of the function f. If S ⊂ A then we will let f(S) denote the subset of B given by {b ∈ B : b = f(a) for some a ∈ S}. The set f(S) is called the image of S under f. The set f(A) is sometimes called the range of f. By convention, for a ∈ A, f({a}) is usually taken to be the element f(a) ∈ B rather than the subset {f(a)} of B. For any set A, the indicator function of A is denoted by I_A(x) and equals 1 if x ∈ A and equals zero otherwise.

Example 1.1 Let f: ℝ → [0, ∞) via f(x) = x². Then f((1, 2]) = (1, 4], f({2, 3}) = {4, 9}, and f({-3, 3}) = {9}. □

A function f: A → B is said to be injective or one-to-one if any two distinct elements of A have distinct images in B; that is, if a₁ ≠ a₂ then f(a₁) ≠ f(a₂). A function f: A → B is said to be surjective or onto if f(A) = B; that is, given any b ∈ B there exists an a ∈ A such that f(a) = b. A function f: A → B is said to be bijective or to be a bijection between A and B if it is both injective and surjective.

Let f: A → B and let M ⊂ B. The inverse image of M with respect to f is denoted by f⁻¹(M) and is defined to be the set {a ∈ A : f(a) ∈ M}. Note that f⁻¹ is a function mapping ℙ(B) into ℙ(A). A function f: A → B is bijective if and only if f⁻¹({x}) is a function mapping the set of all singleton subsets of B into the set of all singleton subsets of A. In this case we write f⁻¹({x}) as f⁻¹(x) and say that f is invertible with inverse f⁻¹.
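On finite sets, images and inverse images are one-line computations. The sketch below (helper names ours) revisits Example 1.1 on a finite domain and shows that the inverse image is well defined even when f is not injective:

    def image(f, S):
        """f(S) = {f(a) : a in S}"""
        return {f(a) for a in S}

    def preimage(f, A, M):
        """The inverse image of M: {a in A : f(a) in M}"""
        return {a for a in A if f(a) in M}

    A = {-3, -2, -1, 0, 1, 2, 3}
    f = lambda x: x * x
    assert image(f, {-3, 3}) == {9}          # as in Example 1.1
    assert preimage(f, A, {4}) == {-2, 2}    # defined even though f is not injective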

Exercise 1.4 For each of the following functions answer the following questions: Is the function onto? Is the function one-to-one? If yes to both then what is the inverse of the function?

1. f: ℝ → ℝ via f(x) = x².

2. f: ℝ → [0, ∞) via f(x) = x².

3. f: [0, ∞) → ℝ via f(x) = x².

4. f: [0, ∞) → [0, ∞) via f(x) = x².

The inverse of a function f: A → B can be defined in two different ways. First, we can consider f⁻¹ to be a function that maps ℙ(B) to ℙ(A). This type of inverse always exists for any function f. Second, we can consider f⁻¹ to be a function that maps B to A. This type of inverse exists if and only if f is one-to-one and onto. The type of inverse under consideration must be inferred from the context.

Exercise 1.5 For a bijection f: A → B show that f(f⁻¹(b)) = b and f⁻¹(f(a)) = a for each a in A and each b in B.

Exercise 1.6 Let S be any set with exactly two elements and let R be any set with exactly three elements. Does there exist a bijection from R into S? Why or why not? Does there exist a bijection from S into R? Why or why not?

Exercise 1.7 If there exists a bijection of A into B then must there also exist a bijection of B into A?

Two sets A and B are said to be equipotent if there exists a bijection mapping A to B. A set S is said to be countable if it is empty or if it is equipotent to a subset of the positive integers. A set S is said to be finite if it is empty or if it is equipotent to a set of the form {1, 2, ..., n} for some positive integer n. A set S is said to be countably infinite if it is countable but not finite. A set is said to be uncountable if it is not countable.

◇ Example 1.2 Let A = ℝ and let B denote the set of all functions that map A into {0, 1}. We will show that A and B are not equipotent. Assume, by way of contradiction, that A and B are equipotent. There then exists a function g: A → B such that g is onto and one-to-one. For a real number α, denote g(α) by f_α(x); that is, g(α) is a function mapping ℝ to {0, 1}. Let φ(x) = 1 - f_x(x) and note that φ ∈ B. Since g is bijective, there exists a point α ∈ A such that φ(x) = f_α(x), which implies that 1 - f_x(x) = f_α(x). If we let x = α then it follows that 1 - f_α(α) = f_α(α), which in turn implies that f_α(α) = ½. This, however, is not possible since f_α takes values only in the set {0, 1}. This contradiction implies that A and B are not equipotent. □

1.2 Theorem (Dedekind) A set is an infinite set if and only if it is equipotent to a proper subset of itself.

1.1 Lemma If A and B are sets, if A is countable, and if f: A → B, then f(A) is countable.

1.2 Lemma Let T be a set having at least two distinct elements and let I be an infinite set. The set of all functions mapping I to T is uncountable.

1.3 Theorem (Schroeder-Bernstein) Let A and B be sets. If there exists a one-to-one mapping of A to B and a one-to-one mapping of B to A then A and B are equipotent.

Proof. This result is proved on page 20 of Real and Abstract Analysis by E. Hewitt and K. Stromberg (Springer-Verlag, New York, 1965). □

1.3 Lemma (Cantor) For any set Ω, the sets Ω and ℙ(Ω) are not equipotent.

Proof. Assume that Ω and ℙ(Ω) are equipotent. There then exists a function f mapping Ω to ℙ(Ω) that is onto and one-to-one. Let U = {ω ∈ Ω : ω ∉ f(ω)}. Since U ∈ ℙ(Ω) it follows that U = f(x) for some point x in Ω. Is x ∈ U? If x ∈ U then x ∉ f(x), which implies that x ∉ U. If x ∉ U then x ∈ f(x), which implies that x ∈ U. Thus, no such function f exists and the desired result follows. Note that this lemma implies that "the set of all sets" is not a set! □

Exercise 1.8 Show that any subset of a countable set must itself be countable.

Exercise 1.9 Show that a countably infinite union of countable sets is countable. That is, show that ⋃ᵢ∈ℕ Aᵢ must be countable if Aᵢ is countable for each i ∈ ℕ.

1.4 Theorem The set ℚ of rational numbers is countable.

Proof. Note that

\[ \mathbb{Q} = \bigcup_{n \in \mathbb{Z}} \left[ \bigcup_{k \in \mathbb{N}} \left\{ \frac{n}{k} \right\} \right] \]

and hence ℚ is countable since it may be written as a countable union of countable sets. □

1.5 Theorem The set of all real numbers is an uncountable set.

Proof. Assume that [0, 1) is countable. Hence there exists a bijective function f: [0, 1) → ℕ. Using this function f, enumerate the set [0, 1) as a sequence {a₁, a₂, ...}. Notice that each aᵢ corresponds to a point in [0, 1) and hence may be expressed as a decimal expansion where we agree that any expansion ending with a string of all 9's will instead be written in a form ending with a string of 0's. Construct an element b of [0, 1) as follows: Let b = 0.n₁n₂n₃··· where, for each i ∈ ℕ, nᵢ is chosen to be a single digit (from {1, ..., 8}, say, so that the expansion of b is itself in the agreed form) that is not equal to the ith digit in the decimal expansion of aᵢ. Since b is an element in [0, 1) that is not equal to aᵢ for any i, we conclude that [0, 1) (and hence ℝ) is uncountable. □


1.5 σ-Algebras

Consider a nonempty set Ω and a subset 𝒜 of ℙ(Ω). (That is, an element of 𝒜 is a subset of Ω.) The set 𝒜 is said to be an algebra (or a field) on Ω if the following three properties are satisfied:

1. Ω ∈ 𝒜.

2. If A ∈ 𝒜 then Aᶜ ∈ 𝒜.

3. If A ∈ 𝒜 and B ∈ 𝒜 then A ∪ B ∈ 𝒜.

That is, an algebra on Ω is a subset of ℙ(Ω) that contains Ω, that is closed under complementation, and that is closed under finite unions.

The set 𝒜 is said to be a σ-algebra (or a σ-field) on the nonempty set Ω if the following three properties are satisfied:

1. Ω ∈ 𝒜.

2. If A ∈ 𝒜 then Aᶜ ∈ 𝒜.

3. If Aₙ ∈ 𝒜 for each n ∈ ℕ then ⋃ₙ∈ℕ Aₙ ∈ 𝒜.

That is, a σ-algebra on Ω is a subset of ℙ(Ω) that contains Ω, that is closed under complementation, and that is closed under countable unions. Note that any σ-algebra is an algebra and that any algebra contains the empty set. Note also that DeMorgan's Law implies that an algebra is closed under finite intersections and that a σ-algebra is closed under countable intersections. Finally, note that any algebra containing only a finite number of elements is also a σ-algebra.

Let's briefly review our notation: Let A be a subset of a nonempty set Ω and assume that ω is a point in A and that A is an element of an algebra 𝒜 on Ω. Then ω ∈ {ω} ⊂ A ⊂ Ω ∈ 𝒜, A ∈ 𝒜 ⊂ ℙ(Ω), and ω ∈ A. In the following exercises let Ω be a nonempty set.


Exercise 1.10 Is {∅, Ω} a σ-algebra on Ω?

Exercise 1.11 Is ℙ(Ω) a σ-algebra on Ω?

Exercise 1.12 Let Ω = {1, 2, 3}. Find five different σ-algebras on Ω.

Exercise 1.13 Show that an intersection of σ-algebras is itself a σ-algebra. Does the same hold for a union of σ-algebras?

Exercise 1.14 Let Ω be the set of all real numbers and let 𝒜 be the collection of all subsets of Ω that are either finite or have finite complements. (A set with a finite complement is said to be cofinite.) Is 𝒜 an algebra on Ω? Is 𝒜 a σ-algebra on Ω?

Exercise 1.15 Let Ω be the set of all real numbers and let 𝒜 be the collection of all subsets of Ω that are either countable or have countable complements. (A set with a countable complement is said to be cocountable.) Is 𝒜 an algebra on Ω? Is 𝒜 a σ-algebra on Ω?

Consider a nonempty set Ω and a σ-algebra 𝒜 on Ω. The ordered pair (Ω, 𝒜) is called a measurable space and sets in 𝒜 are called measurable sets. Later we will refer to Ω as a sample space and refer to measurable sets as events.

Consider a nonempty set Ω and let ℱ be any subset of ℙ(Ω). The σ-algebra generated by ℱ is denoted by σ(ℱ) and is defined to be the smallest σ-algebra on Ω that contains each element in ℱ. That is, if ℬ is any σ-algebra on Ω that contains each element in ℱ then σ(ℱ) ⊂ ℬ. Note, also, that if ℱ is already a σ-algebra, then σ(ℱ) = ℱ.
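When Ω is finite, countable unions reduce to finite unions, so σ(ℱ) can be computed by repeatedly closing under complements and pairwise unions until nothing new appears. An illustrative Python sketch (the helper name is ours):

    def generated_sigma_algebra(omega, F):
        """Close F, a collection of subsets of the finite set omega,
        under complement and pairwise (hence finite) union."""
        sigma = {frozenset(), frozenset(omega)} | {frozenset(s) for s in F}
        while True:
            new = {frozenset(omega) - s for s in sigma}
            new |= {a | b for a in sigma for b in sigma}
            if new <= sigma:
                return sigma
            sigma |= new

    omega = {1, 2, 3}
    assert generated_sigma_algebra(omega, [{1}]) == \
           {frozenset(), frozenset({1}), frozenset({2, 3}), frozenset(omega)}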

Exercise 1.16 What is the difference (if any) between σ(∅) and σ({∅})?


Exercise 1.17 What is σ({∅})?

Exercise 1.18 What is σ({A}) for a subset A of Ω?

Exercise 1.19 What is σ({A, B}) for subsets A and B of Ω? (In general, σ({A, B}) will contain 16 elements.)

Exercise 1.20 Consider a σ-algebra ℱ on a nonempty set Ω. Does there exist a subset A of Ω such that A ⊂ ℱ and A ∈ ℱ?

We will next consider several properties of inverse functions. In each of the following three lemmas we will let the context set our notation.

Let A and B be nonempty sets and let f: A → B. Further, for a nonempty set I, let Bᵢ be a subset of B for each i ∈ I. Note that:

\[
\begin{aligned}
f^{-1}\Bigl(\bigcup_{i \in I} B_i\Bigr)
&= \Bigl\{a \in A : f(a) \in \bigcup_{i \in I} B_i\Bigr\} &&\text{(by definition of the inverse)}\\
&= \{a \in A : \exists\, i \in I \text{ st } f(a) \in B_i\} &&\text{(by definition of the union)}\\
&= \{a \in A : \exists\, i \in I \text{ st } a \in f^{-1}(B_i)\} &&\text{(by definition of } f^{-1})\\
&= \Bigl\{a \in A : a \in \bigcup_{i \in I} f^{-1}(B_i)\Bigr\} &&\text{(by definition of the union)}\\
&= \bigcup_{i \in I} f^{-1}(B_i).
\end{aligned}
\]

Thus, we have the following result:

1.4 Lemma \( f^{-1}\bigl(\bigcup_{i \in I} B_i\bigr) = \bigcup_{i \in I} f^{-1}(B_i) \).


Next, let M be a subset of B and notice that

\[
\begin{aligned}
a \in (f^{-1}(M))^{c} &\iff a \notin f^{-1}(M)\\
&\iff f(a) \notin M\\
&\iff f(a) \in M^{c}\\
&\iff a \in f^{-1}(M^{c}).
\end{aligned}
\]

Thus, we have the following result:

1.5 Lemma \( (f^{-1}(M))^{c} = f^{-1}(M^{c}) \).

Again, let A and B be nonempty sets and let f: A → B. Further, for a nonempty set I, let Bᵢ, for i ∈ I, be a subset of B. Note that:

\[
\begin{aligned}
\bigcap_{i \in I} f^{-1}(B_i)
&= \Bigl(\bigcup_{i \in I} \bigl(f^{-1}(B_i)\bigr)^{c}\Bigr)^{c} &&\text{via DeMorgan's Law}\\
&= \Bigl(\bigcup_{i \in I} f^{-1}(B_i^{c})\Bigr)^{c} &&\text{via Lemma 1.5}\\
&= \Bigl(f^{-1}\Bigl(\bigcup_{i \in I} B_i^{c}\Bigr)\Bigr)^{c} &&\text{via Lemma 1.4}\\
&= f^{-1}\Bigl(\Bigl(\bigcup_{i \in I} B_i^{c}\Bigr)^{c}\Bigr) &&\text{via Lemma 1.5}\\
&= f^{-1}\Bigl(\bigcap_{i \in I} B_i\Bigr) &&\text{via DeMorgan's Law.}
\end{aligned}
\]

Thus, we have the following result:

1.6 Lemma \( \bigcap_{i \in I} f^{-1}(B_i) = f^{-1}\bigl(\bigcap_{i \in I} B_i\bigr) \).

If f: A → B and if ℱ is a subset of ℙ(B) then we will let f⁻¹(ℱ) denote the subset of ℙ(A) consisting of every subset of A that is an inverse image of some element in ℱ. That is, S ∈ f⁻¹(ℱ) if and only if S = f⁻¹(T) for some T ∈ ℱ. The following theorem follows quickly from the three preceding results.

1.6 Theorem Let A and B be nonempty sets and let f: A → B. If ℬ is a σ-algebra on B then f⁻¹(ℬ) is a σ-algebra on A.
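Theorem 1.6 can be checked mechanically on a finite example by forming f⁻¹(ℬ) element by element and testing the σ-algebra properties (for a finite collection, closure under pairwise unions suffices). A Python sketch with illustrative names:

    def preimages(f, domain, collection):
        """The collection { f^{-1}(T) : T in collection }, as frozensets."""
        return {frozenset(x for x in domain if f[x] in T) for T in collection}

    A = {1, 2, 3, 4}
    B = {0, 1}
    f = {1: 0, 2: 0, 3: 1, 4: 1}                 # f(x) = 0 for x <= 2, else 1
    sigma_B = {frozenset(), frozenset({0}), frozenset({1}), frozenset(B)}

    sigma_A = preimages(f, A, sigma_B)
    assert frozenset(A) in sigma_A                                  # contains A
    assert all(frozenset(A) - S in sigma_A for S in sigma_A)        # complements
    assert all(S | T in sigma_A for S in sigma_A for T in sigma_A)  # unions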


Exercise 1.21 Let A and B be nonempty sets and let f: A → B. For a subset S of A, let f(S) denote the subset of B given by {f(s) : s ∈ S}. For a subset P of ℙ(A), let f(P) denote the subset of ℙ(B) given by {f(S) : S ∈ P}. If 𝒜 is a σ-algebra on A then must f(𝒜) be a σ-algebra on B?

1.7 Theorem Consider measurable spaces (Ω₁, ℱ₁) and (Ω₂, ℱ₂) and let f be a function mapping Ω₁ to Ω₂. Let 𝒜 be a collection of subsets of Ω₂ such that σ(𝒜) = ℱ₂. If f⁻¹(𝒜) ⊂ ℱ₁ then f⁻¹(ℱ₂) ⊂ ℱ₁.

Proof. It follows from Lemma 1.4 and Lemma 1.5 that the collection 𝒢 of all subsets A of Ω₂ such that f⁻¹(A) ∈ ℱ₁ is a σ-algebra on Ω₂. Note that ∅ ∈ 𝒢 since f⁻¹(∅) = ∅. (In the last equation the first empty set is the empty subset of Ω₂ and the second is the empty subset of Ω₁.) Further, note that 𝒜 ⊂ 𝒢. This implies that σ(𝒜) ⊂ 𝒢. Since σ(𝒜) = ℱ₂ the desired result follows immediately. □

◇ 1.6 Dynkin's π-λ Theorem

Consider a nonempty set Ω. A subset 𝒫 of ℙ(Ω) is said to be a π-system if it is closed under the formation of finite intersections; that is, if A ∈ 𝒫 and B ∈ 𝒫 imply that A ∩ B ∈ 𝒫. A subset ℒ of ℙ(Ω) is said to be a λ-system if it satisfies the following three properties:

1. Ω ∈ ℒ.

2. If A ∈ ℒ then Aᶜ ∈ ℒ.

3. If Aₙ ∈ ℒ for each n ∈ ℕ and if Aᵢ ∩ Aⱼ = ∅ when i ≠ j then ⋃ₙ∈ℕ Aₙ ∈ ℒ.


That is, a λ-system contains Ω, is closed under the formation of complements, and is closed under the formation of countable disjoint unions.

The following result is called Dynkin's π-λ Theorem and is often quite helpful in proving uniqueness.

1.8 Theorem (Dynkin) Consider a nonempty set Ω and subsets 𝒫 and ℒ of ℙ(Ω). If 𝒫 is a π-system and if ℒ is a λ-system then 𝒫 ⊂ ℒ implies that σ(𝒫) ⊂ ℒ.

Proof. Let λ(𝒫) denote the intersection of all λ-systems that include 𝒫 as a subset. Each family of sets in this intersection contains Ω, is closed under proper differences, and is closed under strictly increasing limits of sets.⁴ Thus, the intersection itself contains Ω, is closed under proper differences, and is closed under strictly increasing limits of sets. Hence, λ(𝒫) is a λ-system. Note that λ(𝒫) ⊂ ℒ. Thus if λ(𝒫) is a π-system, then it will be a σ-algebra that is a superset of σ(𝒫) and a subset of ℒ, and hence the desired result will follow. Therefore, we will show that λ(𝒫) is a π-system.

For each subset A of Ω, let 𝒫_A denote the family of all subsets B of Ω such that A ∩ B is an element of λ(𝒫). Let A₁ ∈ λ(𝒫). Notice that Ω ∈ 𝒫_{A₁} since A₁ ∩ Ω = A₁ ∈ λ(𝒫). Now assume that C₁ and C₂ are elements of 𝒫_{A₁} such that C₁ ⊂ C₂. Then (A₁ ∩ C₁) ⊂ (A₁ ∩ C₂) and thus ((A₁ ∩ C₂) \ (A₁ ∩ C₁)) ∈ λ(𝒫); also,

\[ (A_1 \cap C_2) \setminus (A_1 \cap C_1) = (A_1 \cap C_2) \cap (A_1 \cap C_1)^{c} = (A_1 \cap C_2) \cap (A_1^{c} \cup C_1^{c}) = A_1 \cap C_2 \cap C_1^{c} = A_1 \cap (C_2 \setminus C_1). \]

Thus 𝒫_{A₁} is closed under proper differences. Finally, assume that {Dₙ}ₙ∈ℕ is an increasing sequence of sets in 𝒫_{A₁}. Then the sequence {A₁ ∩ Dₙ}ₙ∈ℕ is either increasing or there exists some k ∈ ℕ such that n > k implies that A₁ ∩ Dₙ = A₁. In either case, lim(A₁ ∩ Dₙ) ∈ λ(𝒫) since λ(𝒫) is a λ-system containing the set A₁. Thus, 𝒫_{A₁} is a λ-system. Furthermore, notice that 𝒫 ⊂ 𝒫_{A₁} since (A₁ ∩ B) ∈ 𝒫 ⊂ λ(𝒫) for all B ∈ 𝒫. Thus, if A₁ ∈ 𝒫, then 𝒫_{A₁} is a λ-system that includes 𝒫. Since λ(𝒫) is the minimal λ-system that includes 𝒫, we see that for A₁ ∈ 𝒫, λ(𝒫) ⊂ 𝒫_{A₁}.

From this we see that if A₁ ∈ 𝒫 and B ∈ λ(𝒫) then (A₁ ∩ B) ∈ λ(𝒫). This, in turn, implies that for B ∈ λ(𝒫), 𝒫 ⊂ 𝒫_B. Since, for B ∈ λ(𝒫), 𝒫_B is a λ-system that includes 𝒫 and since λ(𝒫) is the minimal λ-system that includes 𝒫, we see that λ(𝒫) ⊂ 𝒫_B. Now we observe that this means that for B ∈ λ(𝒫) and for C ∈ λ(𝒫) we have (B ∩ C) ∈ λ(𝒫). Thus, λ(𝒫) is a π-system. □

⁴Limits of sets will be defined later. If this is your first trip through the book, then hold off on this proof until you have read the next chapter.

◇ 1.7 Topological Spaces

Let Ω be a nonempty set. A topology 𝒰 for Ω is a subset of ℙ(Ω) that contains Ω, contains the empty set, is closed under finite intersections, and is closed under arbitrary unions. A topological space is an ordered pair (Ω, 𝒰) where Ω is a nonempty set and 𝒰 is a topology for Ω. The sets in 𝒰 are called the open sets with respect to the topology 𝒰 on Ω. The complement of an open set is called a closed set. Note that in any topological space (Ω, 𝒰) the sets Ω and ∅ are both open and closed. It follows from DeMorgan's Law that a finite union of closed sets is closed and an arbitrary intersection of closed sets is closed.

Example 1.3 Consider the set ℝᵏ for a positive integer k. For x ∈ ℝᵏ and r ∈ (0, ∞), let B(x, r) denote the open Euclidean ball in ℝᵏ centered at x with radius r; that is,

\[ B(x, r) = \Bigl\{ y \in \mathbb{R}^k : \sum_{i=1}^{k} (\pi_i(x) - \pi_i(y))^2 < r^2 \Bigr\}. \]

For the usual topology on ℝᵏ, a subset U of ℝᵏ is open if and only if for any x ∈ U there exists a positive real number r such that B(x, r) ⊂ U. Unless noted otherwise, we will always assume that ℝᵏ is equipped with its usual topology. □

Let (Ω, 𝒰) be a topological space and let A ⊂ Ω. A point ω ∈ Ω is a limit point of A if A ∩ (U \ {ω}) is not empty for any open set U that contains ω. Note that a limit point of A need not be an element of A. The closure of A is the union of A with the set of all limit points of A. A neighborhood of A is any subset of Ω that includes an open superset of A. An isolated point of A is a point ω ∈ A such that A ∩ N = {ω} for some neighborhood N of {ω}. A closed set that has no isolated points is said to be a perfect set. A subset of Ω is said to be a G_δ set if it is expressible as a countable intersection of open sets. A subset of Ω is said to be an F_σ set if it is expressible as a countable union of closed sets.

1.8 Caveats and Curiosities

It is important to keep in mind the crucial role that topology plays in dealing with questions regarding convergence. For example, the set {ℝ, ∅} is a topology on the real line that is called the trivial topology on ℝ. Under this topology, every sequence of real numbers converges to every real number!


I wanted certainty in the kind of way in which people want religious faith. I thought that certainty is more likely to be found in mathematics than elsewhere. But I discovered that many mathematical demonstrations, which my teachers expected me to accept, were full of fallacies, and that, if certainty were indeed discoverable in mathematics, it would be in a new field of mathematics, with more solid foundations than those that had hitherto been thought secure. But as the work proceeded, I was continually reminded of the fable about the elephant and the tortoise. Having constructed an elephant upon which the mathematical world could rest, I found the elephant tottering, and proceeded to construct a tortoise to keep the elephant from falling. But the tortoise was no more secure than the elephant, and after some twenty years of very arduous toil, I came to the conclusion that there was nothing more that I could do in the way of making mathematical knowledge indubitable. -Bertrand Russell (who, with Alfred North Whitehead, constructed a 362 page proof that 1+1=2.)


2 Measure Theory

2.1 Definitions

A measure μ on a measurable space (Ω, ℱ) is a function on ℱ that satisfies the following three properties:

1. μ: ℱ → [0, ∞]

2. μ(∅) = 0

3. If Aₙ ∈ ℱ for all n ∈ ℕ and if Aₙ ∩ Aₘ = ∅ when m ≠ n then

\[ \mu\Bigl(\bigcup_{n \in \mathbb{N}} A_n\Bigr) = \sum_{n \in \mathbb{N}} \mu(A_n). \]

A function that satisfies property (3) is said to be countably additive. Thus, a measure is a countably additive, nonnegative, extended real-valued set function that maps the empty set to zero. If A is an element of ℱ then μ(A) is called the measure (or μ-measure) of A.

A measure μ on a measurable space (Ω, ℱ) is said to be a finite measure if μ(Ω) < ∞. A measure μ on a measurable space (Ω, ℱ) is said to be a σ-finite measure if Ω may be written as Ω = ⋃ₙ∈ℕ Aₙ where Aₙ ∈ ℱ and μ(Aₙ) < ∞ for each n.

If μ is a measure on a measurable space (Ω, ℱ) then the resulting ordered triplet (Ω, ℱ, μ) is called a measure space. A probability measure P on a measurable space (Ω, ℱ) is a measure on (Ω, ℱ) such that P(Ω) = 1. The associated measure space (Ω, ℱ, P) is then called a probability space and sets in ℱ are called events. If A is an event then P(A) is called the probability of A. Note that it does not make sense to discuss the probability of subsets of Ω that are not events.

Example 2.1 Let Ω be a nonempty set and let ω₀ be a point in Ω. Let μ: ℙ(Ω) → {0, 1} via μ(A) = 1 if ω₀ ∈ A and μ(A) = 0 if ω₀ ∉ A. Then μ is a measure on (Ω, ℙ(Ω)) and (Ω, ℙ(Ω), μ) is a probability space. The particular measure in this example is known as Dirac measure at the point ω₀. □

Example 2.2 Let Ω be any nonempty set and define μ on ℙ(Ω) by letting μ(A) = ∞ if A is an infinite set and by letting μ(A) equal the number of points in A if A is a finite set. Then μ is a measure on (Ω, ℙ(Ω)). The particular measure in this example is known as counting measure. □
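Both examples are easy to realize on finite sets. A Python sketch of Dirac measure and counting measure (the function names are ours), together with a check of additivity on a disjoint pair:

    def dirac(w0):
        """Dirac measure at w0: mu(A) = 1 if w0 in A, else 0."""
        return lambda A: 1 if w0 in A else 0

    def counting(A):
        """Counting measure of a finite set A."""
        return len(A)

    mu = dirac('heads')
    assert mu({'heads', 'tails'}) == 1 and mu({'tails'}) == 0
    assert mu(set()) == 0                      # the empty set gets measure zero
    # additivity over a disjoint decomposition
    A, B = {1}, {2, 3}
    assert counting(A | B) == counting(A) + counting(B)
    assert mu({'heads'} | {'tails'}) == mu({'heads'}) + mu({'tails'})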

2.1 Theorem (Monotonicity) Consider a measure space (Ω, ℱ, μ) and let A and B be elements of ℱ. If A ⊂ B then μ(A) ≤ μ(B).

Proof. Notice that B = A ∪ (B \ A) and also that A ∩ (B \ A) = ∅. Since μ is countably additive we see that μ(B) = μ(A) + μ(B \ A). Further, since μ is nonnegative, we see that μ(B \ A) ≥ 0. Thus, it follows that μ(A) ≤ μ(B). □

2.2 Theorem Consider a measure space (Ω, ℱ, μ). The measure μ is countably subadditive. That is, given any (not necessarily disjoint) sequence {Aₙ}ₙ∈ℕ of sets in ℱ it follows that

\[ \mu\Bigl(\bigcup_{n \in \mathbb{N}} A_n\Bigr) \le \sum_{n \in \mathbb{N}} \mu(A_n). \]

Proof. Define a new sequence {A′ₙ}ₙ∈ℕ of measurable sets as follows: let A′₁ = A₁ and let

\[ A'_n = A_n \setminus \bigcup_{k=1}^{n-1} A_k \quad \text{for } n \in \mathbb{N} \setminus \{1\}. \]

Note that

\[ \bigcup_{n \in \mathbb{N}} A_n = \bigcup_{n \in \mathbb{N}} A'_n \]

and that the A′ₙ's are disjoint. (The collection {A′ₙ : n ∈ ℕ} is called a disjointification of the collection {Aₙ : n ∈ ℕ}.) Further, since A′ₙ ⊂ Aₙ for each n, Theorem 2.1 implies that μ(A′ₙ) ≤ μ(Aₙ) for each n. This observation combined with the countable additivity of μ implies that

\[ \mu\Bigl(\bigcup_{n \in \mathbb{N}} A_n\Bigr) = \sum_{n \in \mathbb{N}} \mu(A'_n) \le \sum_{n \in \mathbb{N}} \mu(A_n). \quad \square \]

Logic is the railway track along which the mind glides easily. It is the axioms that determine our destination by setting us on this track or the other, and it is in the matter of choice of axioms that applied mathematics differs most fundamentally from pure. Pure mathematics is controlled (or should we say "uncontrolled"?) by a principle of ideological isotropy: any line of thought is as good as another, provided that it is logically smooth. Applied mathematics on the other hand follows only those tracks which offer a view of natural scenery; if sometimes the track dives into a tunnel it is because there is prospect of scenery at the far end. -J. L. Synge

2.2 Supremums and Infimums

Let S be a subset of ℝ. An element x ∈ ℝ is said to be an upper bound of S if y ≤ x for all y ∈ S. An element x ∈ ℝ is said to be a lower bound of S if x ≤ y for all y ∈ S.

We say that a subset of ℝ is bounded above (below) if it has an upper (lower) bound. If a subset of ℝ has both an upper bound and a lower bound then we say that the set is bounded. We say that a subset of ℝ is unbounded if it lacks either an upper or a lower bound.

Let S be a subset of ℝ. If S is bounded above then an upper bound of S is said to be a supremum (or least upper bound) of S if it is less than every other upper bound of S. We denote the supremum of S by sup S. If S is bounded below then a lower bound of S is said to be an infimum (or greatest lower bound) of S if it is greater than every other lower bound of S. We denote the infimum of S by inf S. If S is not bounded above then we will define sup S to be ∞ and if S is not bounded below then we will define inf S to be -∞. Thus, any subset of ℝ possesses an infimum and a supremum.

The supremum of a subset S of ℝ and the infimum of S need not belong to S. If sup S is an element of S then we sometimes refer to sup S as the maximum of S and denote it by max S. If inf S is an element of S then we sometimes refer to inf S as the minimum of S and denote it by min S. For example, if S = (a, b] then inf S = a, sup S = b, max S = b, and min S does not exist.

◇ Exercise 2.1 Does there exist a subset A of ℝ such that sup A < inf A?

2.3 Convergence of Sets: Lim Inf and Lim Sup

Let {Aₙ}ₙ∈ℕ be a sequence of subsets of some nonempty set Ω. The set of all elements from Ω that belong to all but a finite number of the Aₙ's is called the inferior limit of the sequence {Aₙ}ₙ∈ℕ and is denoted by lim inf Aₙ or sometimes by [Aₙ a.a.] where a.a. is an abbreviation for "almost always." The set of all elements from Ω that belong to infinitely many Aₙ's is called the superior limit of the sequence {Aₙ}ₙ∈ℕ and is denoted by lim sup Aₙ or sometimes by [Aₙ i.o.] where i.o. is an abbreviation for "infinitely often." That is,

\[ \liminf A_n = \bigcup_{k=1}^{\infty} \bigcap_{m=k}^{\infty} A_m \]

and

\[ \limsup A_n = \bigcap_{k=1}^{\infty} \bigcup_{m=k}^{\infty} A_m. \]

If lim inf Aₙ = lim sup Aₙ = A then we say that the sequence {Aₙ}ₙ∈ℕ converges to the set A. In such a case we denote A by limₙ→∞ Aₙ.

Exercise 2.2 Show that lim sup Aₙ = (lim inf(Aₙᶜ))ᶜ.

Exercise 2.3 Show that lim inf Aₙ ⊂ lim sup Aₙ.

Exercise 2.4 Define subsets Aₙ of ℝ via

\[ A_n = \begin{cases} (-1/n,\ 1] & \text{if } n \text{ is odd}\\ (-1,\ 1/n] & \text{if } n \text{ is even} \end{cases} \]

for positive integers n. Show that lim inf Aₙ = {0} and that lim sup Aₙ = (-1, 1].
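With a finite horizon, the two displayed formulas can be evaluated verbatim for sequences of finite sets. The sketch below uses a simpler integer-valued sequence than the exercise so that the finite truncation is exact: Aₙ = {0, 1} for odd n and Aₙ = {0} for even n, giving lim inf Aₙ = {0} and lim sup Aₙ = {0, 1}. The helper names and the horizon convention are ours:

    def liminf(A, K):
        """Union over k < K of the intersection of A_k, A_{k+1}, ..."""
        return set.union(*(set.intersection(*A[k:]) for k in range(K)))

    def limsup(A, K):
        """Intersection over k < K of the union of A_k, A_{k+1}, ..."""
        return set.intersection(*(set.union(*A[k:]) for k in range(K)))

    # A_n = {0, 1} for odd n and {0} for even n, n = 1, ..., 100
    A = [{0, 1} if n % 2 == 1 else {0} for n in range(1, 101)]
    assert liminf(A, 90) == {0}        # only 0 lies in all but finitely many A_n
    assert limsup(A, 90) == {0, 1}     # 1 occurs infinitely often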

2.3 Theorem (The First Borel-Cantelli Lemma) Let (Ω, ℱ, μ) be a measure space. If {Aₙ}ₙ∈ℕ is a sequence of measurable sets and if

\[ \sum_{n=1}^{\infty} \mu(A_n) < \infty \]

then μ(lim sup Aₙ) = 0.

Proof. To begin, note that since

\[ \limsup A_n = \bigcap_{m=1}^{\infty} \bigcup_{k=m}^{\infty} A_k \]

it follows that lim sup Aₙ ⊂ ⋃_{k=m}^∞ Aₖ for each m ∈ ℕ. Thus, the monotonicity of μ implies that

\[ \mu(\limsup A_n) \le \mu\Bigl(\bigcup_{k=m}^{\infty} A_k\Bigr) \]

for each m ∈ ℕ. Hence, via the countable subadditivity of μ, it follows that

\[ \mu(\limsup A_n) \le \sum_{k=m}^{\infty} \mu(A_k) \]

for each m ∈ ℕ. Note that since Σ_{n=1}^∞ μ(Aₙ) < ∞ it follows that Σ_{k=m}^∞ μ(Aₖ) → 0 as m → ∞; that is, the tail of the convergent series vanishes. Therefore, since μ(lim sup Aₙ) is nonnegative and must be smaller than any positive value, we conclude that μ(lim sup Aₙ) = 0. □
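A Monte Carlo illustration (ours, and with an extra assumption the lemma itself does not need: the events below are generated independently). Take P(Aₙ) = 1/n², so that Σ P(Aₙ) converges; almost every sample point should then fall in only a small, finite number of the Aₙ:

    import random

    N_SAMPLES, N_EVENTS = 2_000, 2_000
    counts = []
    for _ in range(N_SAMPLES):
        # for one sample point, count how many events A_n occur (P(A_n) = 1/n^2)
        hits = sum(random.random() < 1.0 / n ** 2
                   for n in range(1, N_EVENTS + 1))
        counts.append(hits)

    print(max(counts))                   # typically a single-digit number
    print(sum(c >= 10 for c in counts))  # almost always 0 samples reach 10 hits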

The following continuity property of finite measures will be useful in proving some later results.

2.1 Lemma Consider a measure space (Ω, ℱ, μ) such that μ(Ω) < ∞. If {Aₙ}ₙ∈ℕ is a sequence of measurable sets that converges to some (measurable) set A then the sequence μ(Aₙ) converges to μ(A) as n → ∞.

2.4 Measurable Functions

Let (Ω₁, ℱ₁) and (Ω₂, ℱ₂) be measurable spaces. If f: Ω₁ → Ω₂ is such that f⁻¹(ℱ₂) ⊂ ℱ₁ then we say that f is a measurable function mapping (Ω₁, ℱ₁) into (Ω₂, ℱ₂), and we denote this property by writing f: (Ω₁, ℱ₁) → (Ω₂, ℱ₂).

Example 2.3 Let Ω₁ = {red, blue, green} and let Ω₂ = {0, 1}. Further, let ℱ₁ = {∅, Ω₁, {red, blue}, {green}} and let ℱ₂ = {∅, Ω₂, {0}, {1}}. Define f: Ω₁ → Ω₂ via f(red) = f(blue) = 0 and f(green) = 1. Define g: Ω₁ → Ω₂ via g(red) = 0 and g(green) = g(blue) = 1. Note that f⁻¹(∅) = ∅ and f⁻¹(Ω₂) = Ω₁. (Indeed, these relationships always hold.) In addition, note that f⁻¹({0}) = {red, blue} and that f⁻¹({1}) = {green}. Thus, since f⁻¹(ℱ₂) ⊂ ℱ₁ (equal, in fact) we conclude that f is a measurable function mapping (Ω₁, ℱ₁) into (Ω₂, ℱ₂). Note, however, that since g⁻¹({0}) = {red} ∉ ℱ₁ it follows that g is not a measurable function mapping (Ω₁, ℱ₁) into (Ω₂, ℱ₂). □
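The two functions in Example 2.3 can be tested mechanically: measurability of f: (Ω₁, ℱ₁) → (Ω₂, ℱ₂) just says that every set in ℱ₂ pulls back into ℱ₁. A Python sketch (helper names ours):

    def preimage(f, domain, T):
        return frozenset(x for x in domain if f[x] in T)

    def is_measurable(f, domain, F1, F2):
        """True iff the inverse image of every set in F2 lies in F1."""
        return all(preimage(f, domain, T) in F1 for T in F2)

    O1 = {'red', 'blue', 'green'}
    F1 = {frozenset(), frozenset(O1), frozenset({'red', 'blue'}), frozenset({'green'})}
    F2 = {frozenset(), frozenset({0, 1}), frozenset({0}), frozenset({1})}

    f = {'red': 0, 'blue': 0, 'green': 1}
    g = {'red': 0, 'blue': 1, 'green': 1}
    assert is_measurable(f, O1, F1, F2)        # every pullback lands in F1
    assert not is_measurable(g, O1, F1, F2)    # g pulls {0} back to {red}, not in F1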


2.5 Real Borel Sets

Recall that a bounded open interval in ℝ is a subset of ℝ of the form (a, b) where a and b are real numbers such that a < b and where, as usual, (a, b) = {x ∈ ℝ : a < x < b}. Let 𝒜 denote the collection of all bounded open intervals in ℝ. The collection of Borel subsets of ℝ is denoted by ℬ(ℝ) and is defined by ℬ(ℝ) = σ(𝒜). That is, ℬ(ℝ) is the smallest σ-algebra on ℝ that contains every bounded open interval. The subsets of ℝ in ℬ(ℝ) are called real Borel sets or Borel measurable subsets of ℝ. Note that (ℝ, ℬ(ℝ)) is a measurable space.

I hold ... that utility alone is not a proper measure of value, and would even go so far as to say that it is, when strictly and short-sightedly applied, a dangerously false measure of value. For mathematics, which is at once the pure and untrammelled creation of the mind and the indispensable tool of science and modern technology, the adoption of a strictly utilitarian standard could lead only to disaster; it would first bring about the drying up of the sources of new mathematical knowledge and would thereby eventually cause the suspension of significant new activity in applied mathematics as well. In mathematics we need rather to aim at a proper balance between pure theory and practical applications ... -Marshall Stone

◇ Exercise 2.5 Try to find a subset of ℝ that is not a real Borel set.


Consider a measurable space (Ω, ℱ). If f: (Ω, ℱ) → (ℝ, ℬ(ℝ)) then f is said to be a real-valued ℱ-measurable function defined on Ω. If f: (ℝ, ℬ(ℝ)) → (ℝ, ℬ(ℝ)) then f is said to be a real-valued Borel measurable function defined on ℝ.

Exercise 2.6 Show that any countable subset of ℝ is a real Borel set.

Let f : ℝ → ℝ. Recall that we say that f is continuous at the real number x if and only if for any ε > 0 there exists δ > 0 such that if |x - y| < δ, then |f(x) - f(y)| < ε. Further, if f is continuous at x for each real number x, then we say that f is continuous.

2.4 Theorem Let f : ℝ → ℝ. The function f is continuous if and only if for each open set U of real numbers, f⁻¹(U) is an open set.

Proof. Suppose that f⁻¹(U) is open for each open set U of real numbers, and let x be an arbitrary real number. Then, given any real number ε > 0, the interval I = (f(x) - ε, f(x) + ε) is an open set, and so f⁻¹(I) must be open. Now, since x ∈ f⁻¹(I), there must exist some real number δ > 0 such that (x - δ, x + δ) ⊂ f⁻¹(I). But this implies that if |x - y| < δ, then f(y) ∈ (f(x) - ε, f(x) + ε). Hence, f is continuous at x and, since x was arbitrary, f is continuous.

Now, suppose that f : ℝ → ℝ is continuous, and let U be a nonempty open subset of ℝ. If f⁻¹(U) is empty then the desired result follows since the empty set is open. Assume then that f⁻¹(U) is not empty and let x ∈ f⁻¹(U). Then, since f(x) ∈ U there exists some ε > 0 such that (f(x) - ε, f(x) + ε) is a subset of U. Since f is continuous at x there exists a δ > 0 such that |f(x) - f(y)| < ε when |x - y| < δ. Thus, for every y ∈ (x - δ, x + δ), it follows that f(y) ∈ (f(x) - ε, f(x) + ε) ⊂ U, and hence (x - δ, x + δ) ⊂ f⁻¹(U). Thus, f⁻¹(U) is open. □

◇ Exercise 2.7 Show that any continuous function mapping ℝ to ℝ is Borel measurable.


Consider a nonempty set Ω. A σ-subalgebra of a σ-algebra ℱ on Ω is a σ-algebra on Ω that is a subset of ℱ. For example, {∅, Ω} is a σ-subalgebra of any σ-algebra on Ω. For a second example, let ℱ = {∅, Ω, A, Aᶜ} for some proper subset A of Ω. Even though the subset {∅, A} of ℱ is a σ-algebra on A, it is not a σ-subalgebra of ℱ.

We say that a σ-algebra 𝒜 on a nonempty set Ω is countably generated if 𝒜 = σ({Aₙ : n ∈ ℕ}) for some choice of the Aₙ's. If ℱ is a countably generated σ-algebra on a nonempty set Ω and if 𝒢 is a σ-subalgebra of ℱ, then must 𝒢 be countably generated? In the following example we show that the answer is no.

Example 2.4 Let Ω = [0, 1] and let ℱ = ℬ([0, 1]). Further, let 𝒢 be the σ-subalgebra of ℱ given by the countable and cocountable subsets of [0, 1]. (A set is cocountable if it has a countable complement.) It follows from one of the problems that ℱ is countably generated. Assume now that 𝒢 is also countably generated. That is, assume that 𝒢 = σ({Aₙ : n ∈ ℕ}) where Aₙ ⊂ [0, 1] for each n ∈ ℕ. Note that without loss of generality, we may assume that Aₙ is countable for each n ∈ ℕ. Let B = ⋃ₙ∈ℕ Aₙ and note that B is also countable. Thus, there exists some real number x such that x ∈ [0, 1] \ B. Notice also that if 𝒟 is the family of all subsets of B and their complements then 𝒟 is a σ-algebra such that 𝒢 ⊃ 𝒟 ⊃ σ({Aₙ : n ∈ ℕ}). But, 𝒟 ≠ 𝒢 since {x} is in 𝒢 but not in 𝒟. This contradiction implies that 𝒢 is not countably generated even though it is a σ-subalgebra of the countably generated σ-algebra ℱ. □

2.2 Lemma Consider a measurable space (Ω, ℱ) and real-valued ℱ-measurable functions f and g defined on Ω. The set {ω ∈ Ω : f(ω) > g(ω)} is an element of ℱ.

Proof. Write the set ℚ of rational numbers as a sequence {rₙ}ₙ∈ℕ. Note that

\[
\begin{aligned}
\{\omega \in \Omega : f(\omega) > g(\omega)\}
&= \bigcup_{n \in \mathbb{N}} \{\omega \in \Omega : f(\omega) > r_n > g(\omega)\}\\
&= \bigcup_{n \in \mathbb{N}} \bigl(\{\omega \in \Omega : f(\omega) > r_n\} \cap \{\omega \in \Omega : g(\omega) < r_n\}\bigr)\\
&= \bigcup_{n \in \mathbb{N}} \bigl(f^{-1}((r_n, \infty)) \cap g^{-1}((-\infty, r_n))\bigr).
\end{aligned}
\]

The desired result then follows since (rₙ, ∞) and (-∞, rₙ) are in ℬ(ℝ) for each n ∈ ℕ. □

2.3 Lemma Consider a measurable space (Ω, ℱ) and a real-valued ℱ-measurable function f defined on Ω. If α is any real number then f + α and αf are ℱ-measurable functions defined on Ω.

2.4 Lemma Consider a measurable space (Ω, ℱ) and real-valued ℱ-measurable functions f and g defined on Ω. The function f + g is an ℱ-measurable function defined on Ω.

2.5 Lemma Consider a measurable space (Ω, ℱ) and real-valued ℱ-measurable functions f and g defined on Ω. The function fg is an ℱ-measurable function defined on Ω, and, if g is nonzero then f/g is an ℱ-measurable function defined on Ω.

2.6 Lemma Consider a measurable space (Ω, ℱ) and a sequence {fₙ}ₙ∈ℕ of real-valued ℱ-measurable functions defined on Ω. The functions sup_{k∈ℕ} fₖ(x) and inf_{k∈ℕ} fₖ(x) are ℱ-measurable functions defined on Ω.

Consider a sequence {xₙ}ₙ∈ℕ of real numbers. Recall that the superior limit of this sequence is given by

\[ \limsup_{n \to \infty} x_n = \inf_{j \in \mathbb{N}} \sup_{n \ge j} x_n \]

and the inferior limit of this sequence is given by

\[ \liminf_{n \to \infty} x_n = \sup_{j \in \mathbb{N}} \inf_{n \ge j} x_n. \]

Further, this sequence is said to converge to a real number x if

\[ \limsup_{n \to \infty} x_n = \liminf_{n \to \infty} x_n = x. \]

Finally, a sequence {fₙ}ₙ∈ℕ of real-valued functions defined on some nonempty set Ω is said to converge pointwise to a function f : Ω → ℝ if the sequence {fₙ(ω)}ₙ∈ℕ of real numbers converges to the real number f(ω) for each ω ∈ Ω. In this case, we denote the pointwise limit f as limₙ→∞ fₙ.


2.7 Lemma Consider a measurable space (Ω, F) and a sequence {fₙ}_{n∈ℕ} of real-valued F-measurable functions defined on Ω. If lim_{n→∞} fₙ exists then it is an F-measurable function defined on Ω.

2.6 Lebesgue Measure and Lebesgue Measurable Sets

For an open interval (a, b) of ℝ, let ℓ((a, b)) denote the length of the interval (a, b). That is, if I = (a, b) with a < b then ℓ(I) = b − a.

Let A be a subset of ℝ. We will say that a countable collection {Iₙ : n ∈ M ⊂ ℕ} of open intervals covers A if

A ⊂ ∪_{n∈M} Iₙ.

For each such set A, let S_A be the subset of ℝ given by

S_A = { Σ_{n∈M} ℓ(Iₙ) : {Iₙ : n ∈ M ⊂ ℕ} is a countable collection of open intervals that covers A }.

The outer Lebesgue measure of A is denoted by m*(A) and is defined by m*(A) = inf S_A. (Note that outer Lebesgue measure is defined for every set in P(ℝ) but is not a measure on (ℝ, P(ℝ)) since it fails to be countably additive.)
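
To make the definition concrete, here is a small Python sketch (ours; the helper names are illustrative) that covers the rationals in [0, 1] by open intervals of total length less than ε. Since ε is arbitrary, this suggests that m*(ℚ ∩ [0, 1]) = 0, even though the rationals are dense in [0, 1].

    from fractions import Fraction

    def some_rationals(count):
        """Enumerate `count` distinct rationals in [0, 1]."""
        seen = []
        q = 1
        while len(seen) < count:
            for p in range(q + 1):
                r = Fraction(p, q)
                if r not in seen:
                    seen.append(r)
                    if len(seen) == count:
                        break
            q += 1
        return seen

    def cover_total_length(eps, count):
        """Cover the n-th rational by an open interval of length
        eps / 2**(n+1); the total length is then strictly less than eps."""
        return sum(eps / 2 ** (n + 1) for n in range(count))

    # No matter how many rationals we enumerate, the cover has total
    # length < eps, so inf S_A = 0 for A the rationals in [0, 1].
    print(len(some_rationals(100)), cover_total_length(0.01, 100))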

2.1 Definition (The Caratheodory Criterion) A subset E of ℝ is said to be Lebesgue measurable if

m*(B) = m*(B ∩ E) + m*(B ∩ Eᶜ)

for every subset B of ℝ.

Let M(ℝ) denote the collection of all subsets of ℝ that satisfy the Caratheodory Criterion; that is, M(ℝ) denotes the collection of all Lebesgue measurable subsets of ℝ.


2.5 Theorem The set B(ℝ) is a proper subset of M(ℝ).

2.6 Theorem The set M(ℝ) is a proper subset of P(ℝ).

Proof. For the construction of a non-Lebesgue-measurable subset of the real line, see pages 41-42 of Counterexamples in Probability and Real Analysis by G. Wise and E. Hall (Oxford University Press, New York, 1993). Also, see page 63 of Real Analysis by H. L. Royden (Macmillan, New York, 1988, Second edition). □

2.7 Theorem The set M(ℝ) is a σ-algebra on ℝ.

Proof. See pages 56-58 of Real Analysis by H. L. Royden (Macmillan, New York, 1988, Second edition). □

Lebesgue measure m on the measurable space (ℝ, M(ℝ)) is defined to be the restriction of m* to M(ℝ). That is, m(A) is equal to m*(A) if A ∈ M(ℝ) and m(A) is left undefined if A ∉ M(ℝ). Lebesgue measure λ on the measurable space (ℝ, B(ℝ)) is defined to be the restriction of m to B(ℝ).

Lebesgue measure corresponds to our intuitive concept of length. That is, the Lebesgue measure of an interval is the length of the interval. Lebesgue measure, however, is defined for subsets of ℝ that are much more complicated than intervals. Note, also, that we have only defined Lebesgue measure for certain subsets of the real line. Later, we will define it for certain subsets of ℝᵏ. In any case, however, when discussing the Lebesgue measure of a set A it will always be true that A is a subset of ℝᵏ for some positive integer k.

2.8 Theorem Let λ denote Lebesgue measure on (ℝ, B(ℝ)). If x ∈ ℝ then λ({x}) = 0.

Proof. For each positive integer n, let Iₙ denote the subset of ℝ given by

Iₙ = (x − 1/(2n), x + 1/(2n)).


Note that λ(Iₙ) = 1/n. Further, since {x} ⊂ Iₙ for each n it follows via monotonicity that λ({x}) ≤ 1/n for any positive integer n. Thus, we conclude that λ({x}) = 0. □

◇ Exercise 2.8 If A is a Lebesgue measurable subset of ℝ having zero Lebesgue measure then must A be countable?

Consider a measure space (Ω, F, μ). A subset of Ω is said to be a null set (or a μ-null set) if it is measurable and has measure zero. That is, A is a null set if A ∈ F and if μ(A) = 0. Let A ⊂ B where B is a null set. If A ∈ F then A must also be a null set since μ(A) ≤ μ(B). In general, however, A need not be a null set since A need not be an element of F. A measure space is said to be complete if every subset of a null set is a measurable set. Note that while the empty set is always a null set, a null set need not be empty.

2.9 Theorem Corresponding to any measure space (Ω, F, μ) there exists a complete measure space (Ω, F₀, μ₀) such that

1. F ⊂ F₀.

2. μ(A) = μ₀(A) for each set A ∈ F.

3. A ∈ F₀ if and only if A = E ∪ F where E ∈ F and where F ⊂ N for some N ∈ F with μ(N) = 0.

The measure space (Ω, F₀, μ₀) is said to be the completion of (Ω, F, μ).

2.10 Theorem The measure space (ℝ, M(ℝ), m) is the completion of the measure space (ℝ, B(ℝ), λ).

Exercise 2.9 If we complete Lebesgue measure on the real Borel sets, then we obtain the real Lebesgue sets. There do exist measures on the real Borel sets that when completed yield the power set of ℝ. Can you think of such a measure?


For a positive integer n, let ℝⁿ denote the n-fold Cartesian product of ℝ with itself. That is, an element of ℝⁿ is an ordered n-tuple of the form (a₁, ..., aₙ) where aᵢ ∈ ℝ for each i. A set I of the form I = I₁ × ⋯ × Iₙ where Iₖ is an open interval of the form (aₖ, bₖ) for each k is called an open rectangle in ℝⁿ. The smallest σ-algebra on ℝⁿ that contains every open rectangle in ℝⁿ is denoted by B(ℝⁿ) and is called the set of Borel measurable subsets of ℝⁿ. Note that, for any positive integer n, (ℝⁿ, B(ℝⁿ)) is a measurable space. If f : (ℝᵏ, B(ℝᵏ)) → (ℝ, B(ℝ)) for some k ∈ ℕ then f is said to be a real-valued Borel measurable function defined on ℝᵏ.

2.11 Theorem For any k ∈ ℕ there exists a unique measure Λ on (ℝᵏ, B(ℝᵏ)) such that

Λ(A₁ × ⋯ × Aₖ) = λ(A₁) ⋯ λ(Aₖ)

for any sets A₁, ..., Aₖ from B(ℝ), where λ is Lebesgue measure on (ℝ, B(ℝ)). The measure Λ on (ℝᵏ, B(ℝᵏ)) is called Lebesgue measure on (ℝᵏ, B(ℝᵏ)).

2.7 Caveats and Curiosities


3 Integration

3.1 The Riemann Integral

Let f be a bounded real-valued function defined on an interval [a, b] and let Γ = {α₀, ..., αₙ} be a subdivision of [a, b]; that is, a = α₀ < α₁ < ⋯ < αₙ = b for some positive integer n. Let S denote the collection of all subdivisions of [a, b]. Define real-valued functions S₁ and S₂ on S via

S₁(Γ) = Σ_{i=1}^{n} (αᵢ − αᵢ₋₁) sup{f(x) : αᵢ₋₁ < x ≤ αᵢ}

and

S₂(Γ) = Σ_{i=1}^{n} (αᵢ − αᵢ₋₁) inf{f(x) : αᵢ₋₁ < x ≤ αᵢ}

where Γ = {α₀, ..., αₙ} is an element from S. The upper Riemann integral of f over [a, b] is given by

U ∫ₐᵇ f(x) dx = inf{S₁(Γ) : Γ ∈ S}

and the lower Riemann integral of f over [a, b] is given by

L ∫ₐᵇ f(x) dx = sup{S₂(Γ) : Γ ∈ S}.

If the upper and lower Riemann integrals of f over [a, b] are each equal to the same value β then we say that f is Riemann integrable over [a, b] and we denote the value β by ∫ₐᵇ f(x) dx and call it the Riemann integral of f over [a, b]. As the next example shows, it is not difficult to find functions that are not Riemann integrable.
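
For a concrete feel for these definitions, the following Python sketch (ours) approximates S₁(Γ) and S₂(Γ) on uniform subdivisions; the sup and inf over each cell are estimated by sampling, which is adequate for well-behaved integrands. For f(x) = x² on [0, 1] the two values squeeze toward 1/3.

    def riemann_sums(f, a, b, n, samples=100):
        """Approximate the upper sum S1 and lower sum S2 of f on [a, b]
        for the uniform subdivision with n cells."""
        h = (b - a) / n
        upper = lower = 0.0
        for i in range(n):
            xs = [a + i * h + j * h / samples for j in range(samples + 1)]
            values = [f(x) for x in xs]
            upper += h * max(values)
            lower += h * min(values)
        return upper, lower

    for n in (10, 100, 1000):
        print(n, riemann_sums(lambda x: x * x, 0.0, 1.0, n))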

Example 3.1 Let [a, b] with a < b be a subinterval of ℝ and define a real-valued function f on [a, b] via

f(x) = { 0 if x is irrational,
         1 if x is rational.


Once when walking past a lounge in the University of Chicago that was filled with a loud crowd watching TV, [Zygmund] asked one of his students what was going on. The student told him that the crowd was watching the World Series and explained to him some of the features of this baseball phenomenon. Zygmund thought about it all for a few minutes and commented, "I think it should be called the World Sequence." -Ronald Coifman and Robert Strichartz writing about Antoni Zygmund


That is, f(x) = I_ℚ(x). Let Γ = {α₀, ..., αₙ} be a subdivision of [a, b]. Given any positive integer i ≤ n, there exist a rational number qᵢ and an irrational number rᵢ such that αᵢ₋₁ < qᵢ ≤ αᵢ and such that αᵢ₋₁ < rᵢ ≤ αᵢ. Hence, it follows that sup{f(x) : αᵢ₋₁ < x ≤ αᵢ} = 1 and inf{f(x) : αᵢ₋₁ < x ≤ αᵢ} = 0. From this we conclude that S₁(Γ) = Σ_{i=1}^{n} (αᵢ − αᵢ₋₁) = b − a and S₂(Γ) = 0. Since these values do not depend upon the particular subdivision Γ that was selected it follows that the upper Riemann integral of f over [a, b] is equal to b − a and that the lower Riemann integral of f over [a, b] is equal to zero. Since these values do not coincide, we see that f is not Riemann integrable over [a, b]. □

Example 3.1 points out a serious shortcoming of the Riemann integral. In particular, for a Borel set E we would like I_E to be integrable and ∫_ℝ I_E(x) dx to equal the Lebesgue measure of E. That is, ideally ∫_ℝ I_ℚ(x) dx should equal zero (the Lebesgue measure of ℚ) but the Riemann integral of I_ℚ does not exist. Although the Riemann integral is not general enough or powerful enough for our purposes, it remains useful for other purposes due to its simplicity and computability.

We will consider two additional types of integration. The first will be a straightforward extension of the Riemann integral and, as above, will be used to integrate functions defined on a subset


of the real line. The second new integration technique will be much more general in that it will allow us to integrate functions defined on arbitrary sets.

3.2 The Riemann-Stieltjes Integral

Let f be a real-valued function that is defined on an interval [a, b]. As before, let Γ = {α₀, ..., αₙ} be a subdivision of [a, b]. (That is, a = α₀ < α₁ < ⋯ < αₙ = b.) Let 𝒢 denote the set of all subdivisions of [a, b]. Define a function S mapping 𝒢 into the extended nonnegative reals via

S(Γ) = Σ_{i=1}^{n} |f(αᵢ) − f(αᵢ₋₁)|.

The variation of f over [a, b] is defined by

V = sup{S(Γ) : Γ ∈ 𝒢}.

If V < ∞ then we say that f is of bounded variation on [a, b]. If V = ∞ then we say that f is of unbounded variation on [a, b].

Example 3.2 Consider a function f defined on [a, b] that is nondecreasing; that is, if a ≤ x < y ≤ b then f(x) ≤ f(y). Then S(Γ) = f(b) − f(a) for any subdivision Γ and hence it follows that V = f(b) − f(a). □

Example 3.3 Let f(x) = I_ℚ(x) for x ∈ [a, b]. Then, given any positive number B there exists a subdivision Γ of [a, b] such that S(Γ) > B. (Simply choose Γ = {α₀, ..., αₙ} such that n is large and such that αᵢ is rational when i is even and irrational when i is odd.) Thus, V = ∞ and we conclude that f is of unbounded variation on [a, b]. □

Exercise 3.1 A function f defined on [a, b] and taking values in ℝ is said to satisfy a Lipschitz condition on [a, b] if there exists a constant C such that |f(x) − f(y)| ≤ C|x − y| for all x and y in [a, b]. Show that for such a function f it follows that V ≤ C(b − a) where V is the variation of f over [a, b].
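
The following Python sketch (ours) evaluates S(Γ) for a given subdivision. For the nondecreasing function of Example 3.2 every subdivision returns f(b) − f(a) exactly, while for an oscillating function refining Γ drives S(Γ) upward toward the variation V.

    import math

    def variation_sum(f, subdivision):
        """S(Gamma): the sum of |f(a_i) - f(a_{i-1})| over a subdivision."""
        return sum(abs(f(b) - f(a))
                   for a, b in zip(subdivision, subdivision[1:]))

    grid = [i / 10 for i in range(11)]
    print(variation_sum(lambda x: x ** 3, grid))   # exactly f(1) - f(0) = 1

    # Refining the subdivision increases S(Gamma) for an oscillating f.
    for n in (10, 100, 1000):
        grid = [i / n for i in range(n + 1)]
        print(n, variation_sum(lambda x: math.sin(50 * x), grid))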

Now, let f and g be real-valued functions defined on [a, b] and consider a subdivision Γ = {α₀, ..., αₙ} of [a, b]. Let Φ be a sample from the subdivision Γ. That is, Φ = {β₁, ..., βₙ} is a collection of real numbers such that αᵢ₋₁ ≤ βᵢ ≤ αᵢ for each positive integer i ≤ n. Let 𝒢 denote the set of all subdivisions of [a, b] and for a subdivision Γ let S_Γ denote the collection of all samples from the subdivision Γ. Let 𝒮 denote the set of all ordered pairs of the form (Γ, Φ) where Φ ∈ S_Γ and define a function R mapping 𝒮 to ℝ via

R((Γ, Φ)) = Σ_{i=1}^{n} f(βᵢ)(g(αᵢ) − g(αᵢ₋₁)).

The value R((Γ, Φ)) is called a Riemann-Stieltjes sum of f with respect to g for the subdivision Γ.

This was all part of his passion for order in the world of mathematics. He could not stand untidiness in his chosen territory, blunders, obscurity, or vagueness, unproven assertions or half substantiated claims ... the man who did his job incompetently, who spoilt Landau's world, received no mercy: that was the unpardonable sin in Landau's eyes, to make a mathematical mess where there had been order before. -G. H. Hardy and H. Heilbronn writing about Edmund Landau

For a subdivision Γ = {α₀, ..., αₙ} of [a, b], let

|Γ| = max_{1≤i≤n} (αᵢ − αᵢ₋₁)

denote the size of Γ. If the limit

lim_{|Γ|→0} R((Γ, Φ))

exists and is finite then that limit is called the Riemann-Stieltjes integral of f with respect to g on [a, b] and is denoted by

∫_{[a, b]} f(x) dg(x).

(Note that this limit does not depend on Φ.) If g(x) = x then ∫_{[a, b]} f(x) dg(x) is simply the Riemann integral of the function f over [a, b].

3.1 Theorem If f is continuous on [a, b] and if g is of bounded variation on [a, b] then the Riemann-Stieltjes integral of f with respect to g on [a, b] exists.

3.2 Theorem (Integration by Parts) If

∫_{[a, b]} f(x) dg(x)

exists then so does

∫_{[a, b]} g(x) df(x)

and

∫_{[a, b]} f(x) dg(x) = (f(b)g(b) − f(a)g(a)) − ∫_{[a, b]} g(x) df(x).

3.3 Theorem If f is continuous on [a, b] and if g has a continuous derivative g′ on [a, b] then

∫_{[a, b]} f dg = ∫ₐᵇ f g′ dx.
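
Theorem 3.3 invites a numerical check. The Python sketch below (ours) forms a Riemann-Stieltjes sum of f with respect to g on a fine uniform subdivision; for f(x) = x and g(x) = x² on [0, 1] the sum approaches ∫₀¹ x · 2x dx = 2/3.

    def rs_sum(f, g, a, b, n):
        """Riemann-Stieltjes sum of f with respect to g on a uniform
        subdivision of [a, b], sampling each cell at its left endpoint."""
        h = (b - a) / n
        return sum(f(a + i * h) * (g(a + (i + 1) * h) - g(a + i * h))
                   for i in range(n))

    for n in (10, 1000, 100000):
        print(n, rs_sum(lambda x: x, lambda x: x * x, 0.0, 1.0, n))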

3.3 The Lebesgue Integral

3.3.1 Simple Functions

Consider a measure space (Ω, F, μ). A function f : Ω → ℝ is said to be a simple function¹ if it has the form

f(ω) = Σ_{i=1}^{n} aᵢ I_{Aᵢ}(ω)

¹Or, more precisely, a measurable simple function.


where n ∈ ℕ, where aᵢ ∈ ℝ for each i, and where the Aᵢ's are disjoint elements of F. Note that such a simple function is a measurable mapping from (Ω, F) to (ℝ, B(ℝ)). Note also that any function having the form given above with the Aᵢ's not disjoint may be written as a simple function by taking intersections. For a simple function f as given above we will define the Lebesgue integral of f over Ω to be

∫_Ω f dμ = Σ_{i=1}^{n} aᵢ μ(Aᵢ).

Example 3.4 Let Ω = {Head, Tail} and let F = P(Ω). Let μ be a measure defined on (Ω, F) via μ({Head}) = 1/2 and μ({Tail}) = 1/2. Let f map Ω to ℝ via

f(ω) = a₁ I_{Tail}(ω) + a₂ I_{Head}(ω)

where a₁ and a₂ are real numbers. Note that

∫_Ω f dμ = a₁ μ({Tail}) + a₂ μ({Head}) = (a₁ + a₂)/2. □
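
The integral of a simple function is a finite weighted sum, so it transcribes directly into code. A minimal Python sketch (ours), applied to the coin-flip measure of Example 3.4:

    from fractions import Fraction

    def simple_integral(coeffs, masses):
        """Integral of the simple function sum_i a_i * I_{A_i} with
        respect to mu, where masses[i] = mu(A_i) and the A_i are
        disjoint."""
        return sum(a * m for a, m in zip(coeffs, masses))

    # Example 3.4 with a1 = 3 and a2 = 5: the integral is (3 + 5)/2 = 4.
    print(simple_integral([3, 5], [Fraction(1, 2), Fraction(1, 2)]))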

3.3.2 Measurable Functions

Consider a measurable real-valued function f defined on (Ω, F) and assume that f(ω) ≥ 0 for all ω ∈ Ω. Let S_f denote the set of all simple functions h defined on (Ω, F) such that 0 ≤ h(ω) ≤ f(ω) for all ω ∈ Ω. For such a nonnegative measurable function f we define the Lebesgue integral of f over Ω to be

∫_Ω f dμ = sup{∫_Ω h dμ : h ∈ S_f}.

Consider a measurable real-valued function f defined on (Ω, F) and let f⁺ and f⁻ denote the positive and negative parts of f, respectively. That is, f⁺(ω) = max{f(ω), 0} and f⁻(ω) = max{−f(ω), 0} for each ω ∈ Ω. Note that f⁺ and f⁻ are nonnegative measurable functions, that |f| = f⁺ + f⁻, and that f = f⁺ − f⁻. We will define the Lebesgue integral of f over Ω to be

∫_Ω f dμ = ∫_Ω f⁺ dμ − ∫_Ω f⁻ dμ

provided that the two integrals on the right are not both equal to ∞; if they are each infinite then we say that the Lebesgue integral of f does not exist. The function f is said to be Lebesgue integrable if ∫_Ω f dμ exists and is finite. If A ∈ F then we will let

∫_A f dμ = ∫_Ω f I_A dμ.

Note that the Lebesgue integral of a nonnegative measurable function always exists although the value of the integral may be ∞.

We have considered two important concepts that are associated with Lebesgue. It is important not to confuse them. Lebesgue measure is a particular example of a measure that is only defined for certain subsets of the real line or ℝᵏ. The Lebesgue integral allows us to integrate real-valued measurable functions that are defined on any measurable space. In particular, the Lebesgue integral is defined on general measure spaces and need not have any relation at all to Lebesgue measure. If, however, we consider the Lebesgue integral with respect to Lebesgue measure on ℝᵏ then for certain functions we recover the familiar Riemann integral.

3.3.3 Properties of the Lebesgue Integral

Consider a measure space (Ω, F, μ). A condition is said to hold almost everywhere with respect to the measure μ (written a.e. [μ]) if there exists a μ-null set B such that the condition holds for all ω in Ω \ B. For example, if Ω = ℝ and if μ is Lebesgue measure then I_ℚ(x) = 0 a.e. [μ]. Lebesgue integrals satisfy the following properties:


1. If ∫_Ω f dμ exists and if k ∈ ℝ then ∫_Ω kf dμ exists and equals k ∫_Ω f dμ.

2. If g(ω) ≥ h(ω) for all ω ∈ Ω then

∫_Ω g dμ ≥ ∫_Ω h dμ

provided that these integrals exist.

3. If ∫_Ω f dμ exists then |∫_Ω f dμ| ≤ ∫_Ω |f| dμ.

4. If the Lebesgue integral of f and of g each exist then

∫_Ω (f + g) dμ = ∫_Ω f dμ + ∫_Ω g dμ

provided that the right hand side is not of the form ∞ − ∞ or −∞ + ∞.

5. A real-valued measurable function f is integrable if and only if |f| is integrable.

6. If f = 0 a.e. [μ] then ∫_Ω f dμ = 0.

7. If g = h a.e. [μ], if ∫_Ω g dμ exists, and if h is measurable then ∫_Ω h dμ exists and is equal to ∫_Ω g dμ.

8. If h is integrable then h is finite a.e. [μ].

9. If h ≥ 0 and ∫_Ω h dμ = 0 then h = 0 a.e. [μ].

The following two results are the "work-horses" of real analysis:

3.4 Theorem (Monotone Convergence or B. Levi's Theorem) Let {fₙ}_{n∈ℕ} be a sequence of measurable real-valued functions defined on Ω such that 0 ≤ f₁(ω) ≤ f₂(ω) ≤ ⋯ for all ω ∈ Ω and such that fₙ(ω) → f(ω) as n → ∞ for all ω ∈ Ω for some function f. The function f is measurable and

∫_Ω fₙ dμ → ∫_Ω f dμ

as n → ∞.


Proof. For a proof of this theorem, see page 172 of Real and Abstract Analysis by E. Hewitt and K. Stromberg (Springer-Verlag, New York, 1965). □

3.5 Theorem (Dominated Convergence Theorem) Let {fₙ}_{n∈ℕ} be a sequence of measurable real-valued functions defined on Ω such that f(ω) = lim_{n→∞} fₙ(ω) exists for all ω ∈ Ω. If |fₙ| ≤ g for some integrable function g and for each n ∈ ℕ then

1. f is integrable,

2. lim_{n→∞} ∫_Ω |fₙ − f| dμ = 0, and

3. lim_{n→∞} ∫_Ω fₙ dμ = ∫_Ω f dμ.

Proof. For a proof of this theorem, see pages 172-173 of Real and Abstract Analysis by E. Hewitt and K. Stromberg (Springer-Verlag, New York, 1965). □
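
A quick numerical illustration of dominated convergence (ours): take fₙ(x) = xⁿ on [0, 1], dominated by the integrable function g ≡ 1. The pointwise limit is 0 on [0, 1), and the integrals ∫₀¹ fₙ dλ = 1/(n + 1) indeed tend to 0.

    def integral_power(n, cells=100000):
        """Midpoint-rule approximation of the integral of x**n on [0, 1];
        the exact value is 1 / (n + 1)."""
        h = 1.0 / cells
        return sum(((i + 0.5) * h) ** n for i in range(cells)) * h

    for n in (1, 5, 50, 500):
        print(n, integral_power(n), 1 / (n + 1))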

3.4 The Riemann Integral and the Lebesgue Integral

3.6 Theorem Let f be a bounded real-valued function defined on an interval [a, b]. If f is Riemann integrable on [a, b] then f is Lebesgue integrable with respect to Lebesgue measure on [a, b] and the two integrals are equal.

Proof. This result is proved on pages 121-122 of Real Variables by A. Torchinsky (Addison-Wesley, Redwood City, California, 1988). □

3.7 Theorem Let f be a bounded real-valued function defined on an interval [a, b]. The function f is Riemann integrable on [a, b] if and only if f is continuous a.e. on [a, b] with respect to Lebesgue measure.


Proof. This result is proved on page 123 of Real Variables by A. Torchinsky (Addison-Wesley, Redwood City, California, 1988). □

Note 3.1 The previous result holds for a bounded function defined on a bounded interval. In contrast, there do exist functions that possess improper Riemann integrals and yet are not Lebesgue integrable. The function sin(x)/x is such a function. (Recall that for an improper Riemann integral either the integrand or the interval over which the integrand is integrated is unbounded.)

Note 3.2 Let λ denote Lebesgue measure on (ℝ, B(ℝ)). If f : ℝ → ℝ is Lebesgue integrable then we will often denote the integral

∫_ℝ f dλ

via the more familiar notation

∫_ℝ f(x) dx.

The second expression, however, is just our notation for the Lebesgue integral with respect to Lebesgue measure and should not ordinarily be taken to refer to a Riemann integral.

3.5 The Riemann-Stieltjes Integral and the Lebesgue Integral

A function F : ℝ → ℝ is said to be right continuous if

lim_{y↓x} F(y) = F(x)

for any x ∈ ℝ. (The notation y ↓ x means that y → x with y > x.)


3.8 Theorem Let F : ℝ → ℝ be nondecreasing and right continuous. Let f be a continuous, real-valued function defined on [a, b]. The function F induces a measure μ on (ℝ, B(ℝ)) such that μ((s, t]) = F(t) − F(s) for all s < t and such that

∫ₐᵇ f(x) dF(x) = ∫_{(a, b]} f dμ.

Proof. This result is proved on pages 5-9 of Probability Theory by R. G. Laha and V. K. Rohatgi (John Wiley, New York, 1979). This proof uses the Caratheodory Extension Theorem that is developed in Section 5.6 of this book. □

3.6 Caveats and Curiosities


4 Functional Analysis

4.1 Vector Spaces

Let X be a nonempty set and suppose that there exists a mapping f of X × X into X that is called the addition function and is denoted by f(x₁, x₂) = x₁ + x₂. Suppose also that there is a mapping g of ℝ × X into X that is called the scalar multiplication function and is denoted by g(α, x) = αx. The set X endowed with two such mappings is called a real vector space if the following properties are satisfied:

1. x + y = y + x for all x and y in X.

2. x + (y + z) = (x + y) + z for all x, y, and z in X.

3. There exists in X a unique element denoted by 0 and called the zero element such that x + 0 = x for each x in X.

4. To each x in X there corresponds a unique element in X denoted by −x such that x + (−x) = 0. (We will often write y + (−x) as y − x.)

5. α(x + y) = αx + αy for each α in ℝ and each x and y from X.

6. (α + β)x = αx + βx for each α and β in ℝ and each x in X.

7. α(βx) = (αβ)x for each α and β in ℝ and each x in X.

8. 1x = x for each x in X.

9. 0x = 0 for each x in X where the 0 on the left is a real number and the 0 on the right is the element in X described in Property 3.

Consider a real vector space X (or, more precisely, a real vector space (X, f, g)). A finite set {x₁, ..., xₙ} of elements (vectors) from X is said to be linearly dependent (or consist of elements


that are linearly dependent) if there exist real numbers (scalars) a₁, ..., aₙ, not all zero, such that a₁x₁ + ⋯ + aₙxₙ = 0. Otherwise, the elements are said to be linearly independent. An infinite set is said to be linearly independent if every finite subset of it is linearly independent.

A nonempty subset M of a vector space X is called a subspace of X if x + y and αx are in M for every α in ℝ and every x and y from M. A subspace M of X is said to be a proper subspace if M ≠ X. A subspace of a vector space is itself a vector space. The intersection of any family of subspaces is itself a subspace.

Let S be a nonempty subset of a vector space X and let ℒ(S) be the set of all finite linear combinations of elements from S. That is, x ∈ ℒ(S) if and only if x = α₁x₁ + ⋯ + αₙxₙ for some positive integer n and where xᵢ ∈ S and αᵢ ∈ ℝ for each i. The set ℒ(S) is a subspace of X and is called the linear manifold generated by S or the linear span of S.

If X is a vector space then there may be some positive integer n such that X contains a set of n vectors that are linearly independent while every set of n + 1 vectors in X is linearly dependent. In this case we say that X is finite-dimensional and of dimension n. The trivial vector space {0} has dimension 0. If X is not finite-dimensional then it is infinite-dimensional. (The set ℝᵏ endowed with the standard operations is k-dimensional. Spaces whose elements are functions are typically infinite-dimensional.) If X is n-dimensional for some positive integer n then there exists a linearly independent set S consisting of n elements such that the linear span of S is X itself. Such a set is called a basis for X.

4.2 Normed Linear Spaces

A mapping from a vector space X into ℝ is called a norm on X and is denoted by ‖·‖ if it satisfies the following properties:

1. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for each x and y from X.


2. ‖αx‖ = |α| ‖x‖ for each α in ℝ and each x in X.

3. ‖x‖ ≥ 0 for each x in X.

4. ‖x‖ = 0 if and only if x = 0.

A nonempty set X is said to be a metric space if there exists a mapping ρ of X × X into ℝ (called a metric or distance function) such that:

1. ρ(x₁, x₂) ≥ 0 for each x₁ and x₂ from X.

2. ρ(x₁, x₂) = 0 if and only if x₁ = x₂.

3. ρ(x₁, x₂) = ρ(x₂, x₁) for each x₁ and x₂ from X.

4. ρ(x₁, x₃) ≤ ρ(x₁, x₂) + ρ(x₂, x₃) for each x₁, x₂, and x₃ from X.¹

An open ball centered at a point p in a metric space (X, ρ) is a set consisting of all points q in X such that ρ(p, q) < r for some fixed positive r. A point p is said to be a limit point of a subset E of X if every open ball centered at p contains a point q such that q ≠ p and such that q ∈ E. The set E is closed if every limit point of E is an element of E.

A sequence {xᵢ}_{i∈ℕ} of elements from a metric space (X, ρ) is said to be a Cauchy sequence if for every ε > 0 there exists an integer N such that ρ(xₙ, xₘ) < ε whenever n ≥ N and m ≥ N. A metric space in which every Cauchy sequence converges to a point in the space is said to be a complete metric space.

A vector space X equipped with a norm is called a normed linear space. With the aid of this norm on X we can define a metric d on X by letting d(x, y) = ‖x − y‖ for each x and y from X. That is, a normed linear space is also a metric space. A normed linear space that is complete with respect to the metric induced by its norm is called a Banach space.

¹This property is called the Triangle Inequality.


4.3 Inner Product Spaces

Consider a real vector space X. A mapping of X × X into ℝ is called an inner product on X and is denoted by (x, y) if it satisfies the following conditions:

1. (αx + βy, z) = α(x, z) + β(y, z) for each α and β in ℝ and each x, y, and z in X.

2. (x, y) = (y, x).

3. (x, x) ≥ 0.

4. (x, x) = 0 if and only if x = 0.

A vector space endowed with an inner product is called an inner product space or a pre-Hilbert space. An inner product may be used to define a norm by letting ‖x‖ = √(x, x). A complete inner product space is called a Hilbert space.

Mathematics is the one area of human enterprise where the motivation to deceive has been practically eliminated. Not because mathematicians are necessarily virtuous people, but because the nature of mathematical ability is such that deception can be immediately determined by other mathematicians. This requirement of honesty soon affects the character of the continuous student of mathematics. -Howard Fehr

Two elements x and y in an inner product space X are said to be orthogonal if (x, y) = 0. If S is a set of vectors from X then a vector y is said to be orthogonal to the set S if (x, y) = 0 for each x in S. A set S of vectors from X such that (x, y) = 0 for any distinct elements x and y from S is said to be


an orthogonal set. An orthogonal set S of vectors from X such that ‖x‖ = 1 for each x ∈ S is said to be an orthonormal set. An orthonormal subset S of X is said to be total if there exists no orthonormal subset of X of which S is a proper subset. For a subspace M of a Hilbert space H let M⊥ (pronounced 'M perp') denote the subspace of H consisting of all elements in H that are orthogonal to every element in M.

Example 4.1 The vectors [0, 1] and [1, 0] comprise a total orthonormal subset of the inner product space ℝ² where the inner product is simply the vector dot product. □

4.1 Theorem (Bessel's Inequality) Let {u₁, u₂, ...} be an orthonormal subset of an inner product space X. Then, for each x ∈ X, it follows that

Σ_{k=1}^{∞} (x, uₖ)² ≤ ‖x‖².

4.2 Theorem Let {u₁, u₂, ...} be an orthonormal subset of a Hilbert space X. Each of the following conditions is necessary and sufficient for the orthonormal set to be total:

1. x = Σ_{n=1}^{∞} (x, uₙ)uₙ for each x ∈ X.

2. ‖x‖² = Σ_{n=1}^{∞} (x, uₙ)² for each x ∈ X.²

²This equality is called Parseval's identity.

4.3 Theorem (Parallelogram Law) In an inner product space, the following equality holds for any two elements x and y of the space:

‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².

4.4 Theorem If {x₁, ..., xₙ} is an orthonormal subset of a Hilbert space H and if x ∈ H then

‖x − Σ_{j=1}^{n} aⱼxⱼ‖


is minimized when aⱼ = (x, xⱼ) for j = 1, ..., n. (That is, the aⱼ's provide the coefficients for a best linear estimator of x in terms of the xⱼ's.)

Proof. Note that

‖x − Σ_{j=1}^{n} aⱼxⱼ‖² = ‖x‖² − 2 Σ_{j=1}^{n} aⱼ(x, xⱼ) + ‖Σ_{j=1}^{n} aⱼxⱼ‖²,

where we note that

‖Σ_{j=1}^{n} aⱼxⱼ‖² = Σ_{j=1}^{n} aⱼ²

since the xⱼ's are orthonormal. Thus, we have

‖x − Σ_{j=1}^{n} aⱼxⱼ‖² = ‖x‖² + Σ_{j=1}^{n} (aⱼ² − 2aⱼ(x, xⱼ)) = ‖x‖² + Σ_{j=1}^{n} ((aⱼ − (x, xⱼ))² − (x, xⱼ)²)

since

(aⱼ − (x, xⱼ))² = aⱼ² − 2aⱼ(x, xⱼ) + (x, xⱼ)².

Thus, we have

0 ≤ ‖x − Σ_{j=1}^{n} aⱼxⱼ‖² = ‖x‖² − Σ_{j=1}^{n} (x, xⱼ)² + Σ_{j=1}^{n} (aⱼ − (x, xⱼ))²,

which is minimized when aⱼ = (x, xⱼ). □
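
Theorem 4.4 is easy to test numerically in ℝ⁴ (our sketch, assuming numpy is available): an orthonormal pair is produced by a QR factorization, and perturbing the optimal coefficients aⱼ = (x, xⱼ) can only increase the approximation error.

    import numpy as np

    rng = np.random.default_rng(0)
    q, _ = np.linalg.qr(rng.standard_normal((4, 2)))
    x1, x2 = q[:, 0], q[:, 1]          # an orthonormal pair in R^4

    x = rng.standard_normal(4)
    best = np.array([x @ x1, x @ x2])  # a_j = (x, x_j)

    def error(a):
        return np.linalg.norm(x - a[0] * x1 - a[1] * x2)

    print(error(best))                 # the minimum
    print(error(best + [0.3, -0.2]))   # any perturbation is worse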

A subset E of a vector space X is said to be convex if it has the following geometric property: whenever x and y are in E and 0 < t < 1 then the point (1 − t)x + ty is also in E. That is, convexity requires that E contain the line segment between any two of its points.


4.5 Theorem Let M be a nonempty closed convex subset of a Hilbert space H. If x ∈ H, then there is a unique element y₀ ∈ M such that ‖x − y₀‖ = inf{‖x − y‖ : y ∈ M}. The element y₀ is called the projection of x on M.

Proof. Let d = inf{‖x − y‖ : y ∈ M} and choose points y₁, y₂, ... ∈ M such that ‖x − yₙ‖ → d as n → ∞. We will show that {yₙ}_{n∈ℕ} is a Cauchy sequence.

The parallelogram law states that ‖u + v‖² + ‖u − v‖² = 2‖u‖² + 2‖v‖² for all u and v in H. Let u = yₙ − x and v = yₘ − x to obtain

‖yₙ + yₘ − 2x‖² + ‖yₙ − yₘ‖² = 2‖yₙ − x‖² + 2‖yₘ − x‖²,

or,

‖yₙ − yₘ‖² = 2‖yₙ − x‖² + 2‖yₘ − x‖² − 4‖(yₙ + yₘ)/2 − x‖².

Since (yₙ + yₘ)/2 ∈ M (by convexity), it follows that

‖(yₙ + yₘ)/2 − x‖² ≥ d².

Thus

‖yₙ − yₘ‖² ≤ 2‖yₙ − x‖² + 2‖yₘ − x‖² − 4d².

Since the right hand side of this expression goes to 0 as n, m → ∞ it follows that {yₙ}_{n=1}^{∞} is Cauchy.

Since H is complete, yₙ converges to some limit y₀ ∈ H as n → ∞. Thus, ‖x − yₙ‖ → ‖x − y₀‖ as n → ∞. But then ‖x − y₀‖ = d and y₀ ∈ M since M is closed. Thus such an element y₀ exists.

To prove uniqueness, let y₀, z₀ ∈ M with ‖x − y₀‖ = ‖x − z₀‖ = d. In the parallelogram law, let u = y₀ − x and v = z₀ − x to obtain

‖y₀ + z₀ − 2x‖² + ‖y₀ − z₀‖² = 2‖x − y₀‖² + 2‖x − z₀‖² = 4d².

But,

‖y₀ + z₀ − 2x‖² = 4‖(y₀ + z₀)/2 − x‖² ≥ 4d².

Thus, ‖y₀ − z₀‖ = 0, which implies that y₀ = z₀. □


4.6 Theorem Let M be a closed subspace of a Hilbert space H, let x ∈ H, and let y₀ be an element of M. Then ‖x − y₀‖ = inf{‖x − y‖ : y ∈ M} iff x − y₀ ⊥ M, i.e., iff (x − y₀, y) = 0 for all y ∈ M.

Proof. Assume that x − y₀ ⊥ M. If y ∈ M then

‖x − y‖² = ‖x − y₀ − (y − y₀)‖² = ‖x − y₀‖² + ‖y − y₀‖² − 2(x − y₀, y − y₀) = ‖x − y₀‖² + ‖y − y₀‖² ≥ ‖x − y₀‖²

since y − y₀ ∈ M. Thus, ‖x − y₀‖ = inf{‖x − y‖ : y ∈ M}.

Assume now that ‖x − y₀‖ = inf{‖x − y‖ : y ∈ M}. Let y ∈ M and let c be a real number. Since M is a subspace it follows that y₀ + cy ∈ M. Thus ‖x − y₀ − cy‖ ≥ ‖x − y₀‖. But,

‖x − y₀ − cy‖² = ‖x − y₀‖² + c²‖y‖² − 2(x − y₀, cy).

Thus,

c²‖y‖² − 2(x − y₀, cy) ≥ 0.

Let c = b(x − y₀, y) for some b ∈ ℝ. Then

(x − y₀, cy) = (x − y₀, b(x − y₀, y)y) = b(x − y₀, y)².

Thus,

b²(x − y₀, y)²‖y‖² − 2b(x − y₀, y)² = (x − y₀, y)²(b²‖y‖² − 2b) ≥ 0.

But (b²‖y‖² − 2b) < 0 if b is small and positive. Thus (x − y₀, y) = 0. □

4.7 Theorem (Hilbert Space Projection Theorem) Let M be a closed subspace of a Hilbert space H. If x ∈ H, then x has a unique representation x = y + z where y ∈ M and z ∈ M⊥. Furthermore, y is the projection of x on M; that is, y is the nearest point in M to x.

Proof. Let y₀ be the projection of x on M (see Theorem 4.5) and let y = y₀ and z = x − y₀. Theorem 4.6 implies that z ∈ M⊥. Thus such a representation exists. To prove uniqueness, let x = y + z = y′ + z′ where y, y′ ∈ M and z, z′ ∈ M⊥. Then y − y′ ∈ M since M is a subspace and y − y′ ∈ M⊥ since y − y′ = z′ − z. Thus y − y′ is orthogonal to itself, which implies that y = y′. But then z = z′, which proves uniqueness. □

4.8 Theorem (Riesz-Frechet) Consider a real Hilbert space H. Every bounded linear function f : H → ℝ may be expressed as an inner product on H. That is, every bounded linear function f : H → ℝ may be expressed in the form f(h) = (h, z) where z ∈ H is uniquely determined by f and has norm ‖z‖ = ‖f‖.

Proof. If f = 0 then let z = 0 and note that f(h) = (h, z) and ‖z‖ = ‖f‖ = 0. Assume now that f ≠ 0. Let N(f) denote the null space of f; that is, N(f) consists of those points h in H such that f(h) = 0.

Note that N(f) is a vector space. That is, if u and v are in N(f) then, since f is linear, f(u) + f(v) = f(u + v) = 0. Further, if α is a scalar and if u ∈ N(f) then αu ∈ N(f) since αf(u) = f(αu) = 0. Note also that N(f) is closed since f is a bounded, linear, and hence continuous, map. Thus, since f ≠ 0 it follows that N(f) ≠ H and hence, via Theorem 4.7, that N(f)⊥ ≠ {0}.

Let z₀ be any nonzero element of N(f)⊥, and let v = f(x)z₀ − f(z₀)x for some fixed x ∈ H. Applying f to each side implies that f(v) = f(x)f(z₀) − f(z₀)f(x) = 0. That is, v ∈ N(f). Further, since z₀ ∈ N(f)⊥ it follows that (v, z₀) = 0 = f(x)(z₀, z₀) − f(z₀)(x, z₀), which implies that f(x)‖z₀‖² − f(z₀)(x, z₀) = 0. Solving for f(x) implies that

f(x) = (f(z₀)/‖z₀‖²)(x, z₀),

where we recall that ‖z₀‖² > 0. Finally, we may rewrite f(x) as (x, z) where

z = (f(z₀)/‖z₀‖²) z₀. □


◇ 4.4 The Radon-Nikodym Theorem

4.9 Theorem Consider σ-finite measures μ and ν defined on a measurable space (Ω, F) such that any μ-null set is also a ν-null set. There exists an a.e. [μ] unique F-measurable function h : Ω → ℝ such that

ν(F) = ∫_F h dμ

for all F ∈ F.
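
On a finite set the theorem is transparent: h is simply the ratio of point masses wherever μ charges a point. A small Python sketch (ours; dictionaries stand in for the measures):

    def radon_nikodym(mu, nu):
        """Discrete Radon-Nikodym derivative h = d(nu)/d(mu) on a finite
        set; nu must vanish wherever mu does (absolute continuity)."""
        assert all(mu.get(w, 0) > 0 for w, m in nu.items() if m > 0)
        return {w: nu.get(w, 0) / m for w, m in mu.items() if m > 0}

    mu = {"a": 0.5, "b": 0.25, "c": 0.25}
    nu = {"a": 0.1, "b": 0.6, "c": 0.3}
    h = radon_nikodym(mu, nu)

    # nu(F) is recovered as the integral of h over F with respect to mu.
    print(sum(h[w] * mu[w] for w in ("a", "b")), nu["a"] + nu["b"])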

4.5 Caveats and Curiosities


5 Probability Theory

5.1 Introduction

Modern probability theory is a branch of measure theory that is distinguished by its special emphasis and applications. Much of the terminology of probability theory was established hundreds of years ago by people such as Pascal, Fermat, Bernoulli, Laplace, and Gauss. While this historical foundation provided much of the current vocabulary used in probability, it did not provide a rigorous mathematical basis for probability theory. Near the end of the nineteenth century, C. S. Peirce, the founder of pragmatism, wrote:

This branch of mathematics [probability] is the only one, I believe, in which good writers frequently get results entirely erroneous. In elementary geometry the reasoning is frequently fallacious, but erroneous conclusions are avoided; but it may be doubted if there is a single extensive treatise on probabilities in existence which does not contain solutions absolutely indefensible. This is partly owing to the want of any regular methods of procedure; for the subject involves too many subtleties to make it easy to put problems into equations without such aid.

At the beginning of the twentieth century measure theory was established primarily through the work of Henri Lebesgue. In 1929 Andrei Kolmogorov developed a measure-theoretical approach to probability theory and established probability theory as a rigorous mathematical theory.¹ Thus, much of the vocabulary of probability theory was established hundreds of years before the vocabulary of measure theory was established. Consequently, many concepts have different names when seen from the perspectives of probability theory and measure theory. For

¹This incident seems to have been overlooked by a large part of the engineering community.


example, an event in probability theory is a measurable set in measure theory. On the other hand, there are concepts such as statistical independence in probability theory that have no analog in measure theory.

5.2 Random Variables and Distributions

Consider a probability space (Ω, F, P); that is, consider a measure space (Ω, F, P) such that P(Ω) = 1. A real-valued F-measurable function defined on Ω is said to be a random variable defined on (Ω, F, P). That is, X is a random variable if X : (Ω, F) → (ℝ, B(ℝ)). Note that X is a function and X(ω) is a real number. Note, also, that if P₁ and P₂ are probability measures defined on (Ω, F), then a random variable X defined on (Ω, F, P₁) is also a random variable defined on (Ω, F, P₂). A random variable X defined on (Ω, F, P) is said to be a bounded random variable if there exists some real number B such that |X(ω)| < B for all ω ∈ Ω.

The probability distribution function of a random variable X is the function F : ℝ → [0, 1] defined by F(x) = P(X ≤ x) where P(X ≤ x) denotes the probability of the event {ω ∈ Ω : X(ω) ≤ x}. (How do we know that this set is an event?) If several random variables are under consideration we may denote the distribution function of X by F_X. A probability distribution function F of a random variable X satisfies the following properties:

1. lim_{x→−∞} F(x) = 0.

2. lim_{x→∞} F(x) = 1.

3. If x < y then F(x) ≤ F(y).

4. F is right continuous.


[In statistics] you have the fact that the concepts are not very clean. The idea of probability, of randomness, is not a clean mathematical idea. You cannot produce random numbers mathematically. They can only be produced by things like tossing dice or spinning a roulette wheel. With a formula, any formula, the number you get would be predictable and therefore not random. So as a statistician you have to rely on some conception of a world where things happen in some way at random, a conception which mathematicians don't have. -Lucien LeCam

5. P(x < X ≤ y) = F(y) − F(x).

6. P(X > x) = 1 − F(x).

7. P(X = x) = F(x) − lim_{y↑x} F(y).

8. P(X = x) = 0 for x ∈ ℝ if and only if F is continuous at x.
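
These properties are easy to observe by simulation (our sketch): the empirical distribution function of a large exponential sample tracks F(x) = 1 − e^{−x}, and the fraction of samples in (x, y] matches F(y) − F(x), as in property 5.

    import bisect
    import math
    import random

    random.seed(0)
    sample = sorted(random.expovariate(1.0) for _ in range(100000))

    def ecdf(x):
        """Fraction of the sample that is <= x."""
        return bisect.bisect_right(sample, x) / len(sample)

    for x in (0.5, 1.0, 2.0):
        print(x, round(ecdf(x), 4), round(1 - math.exp(-x), 4))

    x, y = 0.5, 2.0
    print(round(ecdf(y) - ecdf(x), 4))   # approximates P(x < X <= y)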

Exercise 5.1 Consider a probability distribution function F. Show that lim_{x→−∞} F(x) = 0.

Exercise 5.2 Consider a probability distribution function F. Show that F is right continuous.

Exercise 5.3 Consider a random variable X defined on a probability space (Ω, F, P). Show that P(X = x) = F(x) − lim_{y↑x} F(y) where F is the probability distribution function of X.


Example 5.1 If F is continuous then P(X = x) is zero for each x ∈ ℝ. Does this mean that X cannot take on the value x for any x ∈ ℝ? No. Consider a dart that lands in a circular dart board of unit area in such a way that the probability that the dart lands in any particular circular region of the board is simply the area of that region. Since a single point on the board can be enclosed within a circle of arbitrarily small area it follows that the probability of hitting any particular point is zero. Thus, even though our (idealized) dart will hit a point when thrown, the probability that it will hit that point is zero before it is thrown. □

A random variable is said to be discrete if it takes values only in some countable subset of ℝ. A probability distribution function is said to be atomic if it is continuous except at most a countable number of points and if it is constant between any two adjacent points from the union of the set of discontinuities with {−∞, ∞}. A probability distribution function F is said to be absolutely continuous if

F(x) = ∫_{−∞}^{x} f(t) dt

for some integrable Borel measurable function f. If a function is absolutely continuous then it is continuous, but there do exist continuous functions that are not absolutely continuous. A probability distribution function F is said to be singular if

(d/dx) F(x) = 0 a.e.

with respect to Lebesgue measure. If a distribution function is atomic then it is singular, but there do exist singular distribution functions that are not atomic.

The following corollary relates our definition of a random variable to a definition that is frequently found in introductory texts.

5.1 Corollary Consider a measurable space (Ω, F) and let X be a function mapping Ω to ℝ. It follows that X⁻¹((−∞, x]) ∈ F for each x ∈ ℝ if and only if X⁻¹(B(ℝ)) ⊂ F.

Proof. It follows immediately that if X⁻¹(B(ℝ)) ⊂ F then X⁻¹((−∞, x]) ∈ F for each x ∈ ℝ. Further, using Theorem 1.7, it follows that X⁻¹(B(ℝ)) ⊂ F if X⁻¹((−∞, x]) ∈ F for each x ∈ ℝ. □

5.2 Corollary Any continuous function mapping ℝ to ℝ is Borel measurable.

Proof. Let f : ℝ → ℝ and recall that, for any subset A of ℝ, f⁻¹(Aᶜ) = (f⁻¹(A))ᶜ. Next, assume that f is continuous and recall from Theorem 2.4 that for any open subset U of ℝ, f⁻¹(U) is open. Thus, we see that for a closed subset K of ℝ, f⁻¹(K) is closed. Further, from Corollary 5.1 it follows that f is Borel measurable if and only if for each x ∈ ℝ, f⁻¹((−∞, x]) ∈ B(ℝ). Note that for any x ∈ ℝ, (−∞, x] is a closed set since it is the complement of the open set (x, ∞). Thus, for any x ∈ ℝ, f⁻¹((−∞, x]) is closed and hence is a real Borel set. We thus conclude that f is Borel measurable. □

I told him, I'm a scientist, we're objective. I told him a crash was improbable. I was trying to remember the exact probability when we smashed into the ground. -27 year old botanist Wim Kodman trying to calm a friend as their jet flew through turbulence.

5.1 Theorem (The Lebesgue Decomposition Theorem) Any probability distribution function F may be written in the form F(x) = α₁F₁(x) + α₂F₂(x) + α₃F₃(x) where αᵢ ≥ 0 for each i, where α₁ + α₂ + α₃ = 1, and where

1. F₁ is an atomic probability distribution function,

2. F₂ is an absolutely continuous probability distribution function, and

3. F₃ is a singular, continuous probability distribution function.


Note 5.1 The Cantor-Lebesgue function (which is developed by Example 2.6 on page 54 of Counterexamples in Probability and Real Analysis² by Gary Wise and Eric Hall) is an example of a distribution function that is continuous and singular. In particular, it is equal to zero at zero, is equal to one at one, and is nondecreasing and continuous, yet has a derivative that is almost everywhere equal to zero.

²Oxford University Press, 1993.

Consider a random variable X that possesses a probability distribution function F that is absolutely continuous. There exists a nonnegative Borel measurable function f mapping ℝ to ℝ such that

P(X ∈ A) = ∫_A f(x) dx

for any real Borel set A. Such a function f is called a probability density function of the random variable X and exists if and only if the probability distribution function of X is absolutely continuous. A probability density function f for X is often denoted by f_X. Note that if X possesses an absolutely continuous distribution function F then F is differentiable a.e. with respect to Lebesgue measure and X possesses a probability density function given by the derivative of F at points where the derivative exists and defined to be any nonnegative value at points where the derivative does not exist.

Let X be a random variable with an absolutely continuous probability distribution function F and a probability density function f. The function f satisfies the following properties:

1. F(x) = ∫_{(−∞, x]} f(s) ds.

2. ∫_ℝ f(x) dx = 1.

3. P(a ≤ X ≤ b) = P(a < X < b).

◇ Note 5.2 Consider a random variable X defined on a probability space (Ω, F, P) and the corresponding measure μ_X defined on (ℝ, B(ℝ)) such that μ_X(B) = P(X ∈ B) for each B ∈ B(ℝ). If μ_X is absolutely continuous with respect to Lebesgue measure λ defined on (ℝ, B(ℝ)) then there exists a Radon-Nikodym derivative dμ_X/dλ. A nonnegative version of this Radon-Nikodym derivative is known as a probability density function of X. Note from the Radon-Nikodym Theorem that such a function must be Borel measurable. Thus, there exist nonnegative integrable functions that integrate to one, yet which are not probability density functions.

5.3 Independence

Consider a probability space (Ω, F, P). Recall that elements of F are said to be events. Two events A and B are said to be independent if P(A ∩ B) = P(A)P(B). Consider an index set I and let Aᵢ be an event for each i ∈ I. The sets {Aᵢ : i ∈ I} are mutually independent³ if for every finite collection {i₁, i₂, ..., iₖ} of distinct indices from I it follows that

P(A_{i₁} ∩ A_{i₂} ∩ ⋯ ∩ A_{iₖ}) = P(A_{i₁}) P(A_{i₂}) ⋯ P(A_{iₖ}).

The sets {Aᵢ : i ∈ I} are said to be pairwise independent if P(Aᵢ ∩ Aⱼ) = P(Aᵢ)P(Aⱼ) for all i and j from I with i ≠ j. If the index set I contains only two elements then mutual independence and pairwise independence are equivalent. In general, however, pairwise independence is implied by, but does not imply, mutual independence.

Note 5.3 Consider three events A₁, A₂, and A₃. The following chart illustrates the difference between pairwise independence and mutual independence of the three events: pairwise independence requires only the first three of the equalities below, while mutual independence requires all four.

³Many authors omit the word "mutually," but we prefer to retain it as a way of reinforcing the distinction between mutual independence and pairwise independence.


P(A₁ ∩ A₂) = P(A₁)P(A₂)

P(A₂ ∩ A₃) = P(A₂)P(A₃)

P(A₁ ∩ A₃) = P(A₁)P(A₃)

P(A₁ ∩ A₂ ∩ A₃) = P(A₁)P(A₂)P(A₃)
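
A standard concrete instance (our illustration, not from the text) is two fair coin tosses with A₁ = {first toss is heads}, A₂ = {second toss is heads}, and A₃ = {the tosses agree}: the three events are pairwise independent, yet the triple-product condition fails.

    from fractions import Fraction
    from itertools import product

    omega = list(product("HT", repeat=2))   # four equally likely outcomes

    def prob(event):
        return Fraction(sum(1 for w in omega if event(w)), len(omega))

    A1 = lambda w: w[0] == "H"
    A2 = lambda w: w[1] == "H"
    A3 = lambda w: w[0] == w[1]

    # Pairwise independence holds:
    for e, f in ((A1, A2), (A2, A3), (A1, A3)):
        print(prob(lambda w: e(w) and f(w)) == prob(e) * prob(f))   # True

    # ... but mutual independence fails:
    print(prob(lambda w: A1(w) and A2(w) and A3(w)))   # 1/4
    print(prob(A1) * prob(A2) * prob(A3))              # 1/8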

Consider a probability space (Ω, F, P). Let F₁, F₂, ..., Fₙ be subsets (not necessarily σ-subalgebras) of F. (That is, each Fᵢ is a collection of events.) The collections F₁, F₂, ..., Fₙ are said to be mutually independent if given any A₁ ∈ F₁, any A₂ ∈ F₂, ..., and any Aₙ ∈ Fₙ, it follows that A₁, A₂, ..., Aₙ are mutually independent.

[Cantor's theory] seems to me the most admirable fruit of the mathematical mind and indeed one of the highest achievements of man's intellectual processes .... No one shall expel us from the paradise which Cantor has created for us. -David Hilbert

Let X be a random variable defined on (Ω, F, P). We define the σ-algebra generated by X (denoted by σ(X)) to be the smallest σ-subalgebra of F with respect to which X is measurable. That is, σ(X) = X⁻¹(B(ℝ)). For a collection X₁, ..., Xₙ of random variables we will let σ(X₁, ..., Xₙ) denote the smallest σ-algebra with respect to which X₁, ..., Xₙ are each measurable. Note that

σ(X₁, ..., Xₙ) = σ(σ(X₁) ∪ ⋯ ∪ σ(Xₙ)).

Random variables X₁, X₂, ..., Xₙ defined on (Ω, F, P) are said to be mutually independent if σ(X₁), σ(X₂), ..., σ(Xₙ) are mutually independent collections of events.


5.2 Theorem For an integer n > 1, consider mutually independent random variables X₁, X₂, ..., Xₙ defined on a common probability space. Let m be a positive integer such that m < n. Further, consider two functions f and g such that f : (ℝᵐ, B(ℝᵐ)) → (ℝ, B(ℝ)) and g : (ℝⁿ⁻ᵐ, B(ℝⁿ⁻ᵐ)) → (ℝ, B(ℝ)). The random variables f(X₁, ..., Xₘ) and g(Xₘ₊₁, ..., Xₙ) are independent.

5.3 Theorem (The Second Borel-Cantelli Lemma) Consider a probability space (Ω, F, P). If {Aₙ}_{n∈ℕ} is a sequence of mutually independent events and if

Σ_{n=1}^{∞} P(Aₙ) = ∞

then P(lim sup Aₙ) = 1.

Proof. Since lim sup Aₙ = (lim inf Aₙᶜ)ᶜ and since P(A) + P(Aᶜ) = 1 for any event A, the desired result will follow if we show that P(lim inf Aₙᶜ) = 0. Recall that

lim inf Aₙᶜ = ∪_{k=1}^{∞} ∩_{m=k}^{∞} Aₘᶜ.

By countable subadditivity it follows that

P(lim inf Aₙᶜ) ≤ Σ_{k=1}^{∞} P(∩_{m=k}^{∞} Aₘᶜ).

Thus, the desired result will follow if we show that

P(∩_{m=k}^{∞} Aₘᶜ) = 0

for all k ∈ ℕ. Let j ∈ ℕ and note that

P(∩_{n=k}^{k+j} Aₙᶜ) = Π_{n=k}^{k+j} (1 − P(Aₙ))

via independence of the Aₙ's (and hence of the Aₙᶜ's). Note that 1 − x ≤ e^{−x} for all x ∈ ℝ and, in particular, for all x ∈ [0, 1]. Thus, it follows that

P(∩_{n=k}^{k+j} Aₙᶜ) = Π_{n=k}^{k+j} (1 − P(Aₙ)) ≤ Π_{n=k}^{k+j} exp(−P(Aₙ)) = exp(−Σ_{n=k}^{k+j} P(Aₙ)).

Since Σ_{n=1}^{∞} P(Aₙ) = ∞ we see that

exp(−Σ_{n=k}^{k+j} P(Aₙ)) → 0

and hence that

P(∩_{n=k}^{k+j} Aₙᶜ) → 0

as j → ∞ for any k ∈ ℕ. Since

∩_{n=k}^{k+j} Aₙᶜ → ∩_{n=k}^{∞} Aₙᶜ

as j → ∞, the desired result follows from Lemma 2.1. □

Example 5.2 An adaptive communications system transmits blocks of bits where each block contains a fixed number of bits. Let Xₙ equal 1 or 0 depending on whether an error occurs or does not occur in block n, respectively. Assume that the Xₙ's are mutually independent. Further, let pₙ denote the probability that Xₙ = 1. In this example we will derive a condition on the pₙ's that is necessary and sufficient for there to be almost surely only a finite number of errors.

Let Eₙ denote the event that the nth block of data has an error. That is, let Eₙ denote the event that Xₙ = 1. Thus, if Ω is the set of all possible sequences of received bits, then ω ∈ Eₙ if and only if sequence ω contains an error in block n. Note that lim sup Eₙ is the event that infinitely many errors occur. That is, lim sup Eₙ is the set of ω such that ω ∈ Eₙ for infinitely many different values of n. Thus, our problem is to determine a condition that is both necessary and sufficient to ensure that P(lim sup Eₙ) = 0. If this probability is zero then with probability one there will be only a finite number of errors.

By the first Borel-Cantelli lemma we know that if Σ_{n=1}^{∞} pₙ < ∞ then P(lim sup Eₙ) = 0. Further, the second Borel-Cantelli lemma implies that if Σ_{n=1}^{∞} P(Eₙ) = ∞ then P(lim sup Eₙ) = 1; that is, there will almost surely be infinitely many errors. Thus, it is necessary that Σ_{n=1}^{∞} pₙ < ∞ for there to be almost surely only a finite number of errors. Finally, we conclude that Σ_{n=1}^{∞} pₙ < ∞ if and only if there are almost surely only a finite number of errors. □
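
A simulation (ours) makes the dichotomy vivid: with pₙ = 1/n² (summable) the error count stabilizes as the horizon grows, while with pₙ = 1/n (divergent sum) errors keep accumulating.

    import random

    random.seed(1)

    def error_count(p, horizon):
        """Number of blocks n = 1, ..., horizon in which an independent
        error of probability p(n) occurs."""
        return sum(random.random() < p(n) for n in range(1, horizon + 1))

    for horizon in (10**3, 10**4, 10**5):
        print(horizon,
              error_count(lambda n: 1.0 / n**2, horizon),   # sum finite
              error_count(lambda n: 1.0 / n, horizon))      # sum infinite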

Example 5.3 Although we know that all years are not of equal length, and although we might suspect that all days of the year are not equally likely to be birthdays, we will nevertheless make the simplifying assumptions that all years have 365 days and that each day is equally likely to be a birthday. This example is concerned with the probability of the existence of a common birthday between any two or more people among a given group of people. It seems easier to calculate the probability that each of the birthdays are different. Note that for two people, the probability of no common birthday is given by 1 − (1/365); that is, the first person has some birthday, and the second person then has 364 possible days for a noncommon birthday. Further, for three people, the probability of no common birthday is given by (1 − (1/365))(1 − (2/365)), and, for four people, the probability of no common birthday is given by (1 − (1/365))(1 − (2/365))(1 − (3/365)). Continuing in this way, we see that for n people (where n is a positive integer less than 365), the probability of no common birthday is given by

(1 − 1/365)(1 − 2/365)(1 − 3/365) × ⋯ × (1 − (n − 1)/365).

Checking this numerically, we find that for n = 23, this probability is less than 1/2. Thus, for 23 or more people, the probability that at least two have a common birthday exceeds 1/2. □
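
The numerical check mentioned above is a short loop (our sketch):

    def p_no_common_birthday(n, days=365):
        """Probability that n people all have distinct birthdays."""
        p = 1.0
        for k in range(1, n):
            p *= 1 - k / days
        return p

    print(p_no_common_birthday(22))   # about 0.524
    print(p_no_common_birthday(23))   # about 0.493, the first value below 1/2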


5.4 The Binomial Distribution

Consider a finite sequence of n terms taking the values H and T. Let N(n, k) denote the number of such sequences of length n having exactly k H's. Note that if we know this quantity for sequences of length n − 1 then we see that in sequences of length n, the sequences that have exactly k H's are given by those which have exactly k H's in the first n − 1 terms and a T for the nth term and those sequences that have k − 1 H's in the first n − 1 terms and an H in the nth term. Hence, N(n, k) = N(n − 1, k) + N(n − 1, k − 1). Next, use induction, and assume that

N(n, k) = n! / (k!(n − k)!).

(We use the convention that zero factorial is one.) Assume that this expression is correct for n − 1. Then,

N(n, k) = (n − 1)!/(k!(n − 1 − k)!) + (n − 1)!/((k − 1)!(n − k)!) = (n!/(k!(n − k)!)) ((n − k)/n + k/n) = n!/(k!(n − k)!).

Note that for k = 0 or for k = n, it follows straightforwardly that N(n, 0) = 1 and N(n, n) = 1. For n = 1 we have that N(1, 0) = 1 and N(1, 1) = 1. Thus, the general result follows by induction, and we conclude that the number of ways of selecting k items from a set of n items is given by

n! / (k!(n − k)!),

which is denoted by (ⁿₖ) and read as "n choose k."
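
The recurrence and the closed form cross-check directly (our sketch; math.comb is the standard library's binomial coefficient):

    import math

    def N(n, k):
        """Count length-n H/T sequences with exactly k H's via the
        recurrence N(n, k) = N(n - 1, k) + N(n - 1, k - 1)."""
        if k < 0 or k > n:
            return 0
        if n == 0:
            return 1
        return N(n - 1, k) + N(n - 1, k - 1)

    # Agrees with n! / (k! (n - k)!) for all small n and k.
    print(all(N(n, k) == math.comb(n, k)
              for n in range(12) for k in range(n + 1)))   # True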


He uses statistics as a drunken man uses lampposts - for support rather than illumination. -Andrew Lang


In many of the elementary aspects of probability, a sequence of mutually independent trials whose only outcomes are success or failure is considered such that the probability of success is fixed from trial to trial. Such trials are called Bernoulli trials.

Consider a finite sequence of n Bernoulli trials where the probability of success on a trial is given by p. We model the underlying probability space as the set of all sequences of length n consisting of S's and F's. Let q = 1 − p. We assign a sequence of k S's and n − k F's to have probability pᵏqⁿ⁻ᵏ. Now, consider the probability of getting exactly k S's in n trials. Each such sequence has probability pᵏqⁿ⁻ᵏ. Further, there are (ⁿₖ) such sequences. Since probability measures are countably additive, it follows that to find the probability of obtaining exactly k S's in n trials, we simply multiply the common probability of one such sequence by the total number of such sequences. Hence, the probability of obtaining exactly k S's in n trials is given by

(ⁿₖ) pᵏ qⁿ⁻ᵏ.

Further, note that the event of having exactly k₁ successes is disjoint from the event of having exactly k₂ successes if k₁ ≠ k₂. Thus, we see that the probability of having no more than r successes in n trials is given by

Σ_{k=0}^{r} (ⁿₖ) pᵏ qⁿ⁻ᵏ.

A random variable X taking values in the set {0, 1, ..., n} for some positive integer n such that

P(X = k) = (ⁿₖ) pᵏ (1 − p)ⁿ⁻ᵏ

for some p ∈ [0, 1] is said to have a binomial distribution with parameters p and n.


5.4.1 The Poisson Approximation to the Binomial Distribution

Let b(k; n, p) denote the probability of obtaining exactly k successes in n Bernoulli trials where p denotes the probability of success. It is common to deal with a binomial distribution where, relatively speaking, the parameter n is large and the parameter p is small, and yet the product λ = np is positive and of moderate size. In such cases it is often convenient to use an approximation that is due to Poisson.

For k = 0 it follows that

\[
b(0; n, p) = (1 - p)^n = \left( 1 - \frac{\lambda}{n} \right)^n.
\]

Taking logarithms and using Taylor's expansion yields

\[
\ln b(0; n, p) = n \ln\!\left( 1 - \frac{\lambda}{n} \right) = -\lambda - \frac{\lambda^2}{2n} - \cdots.
\]

Thus, for large n, it follows that b(0; n, p) ≈ e^{-λ}. Alternatively, we could have obtained this result by recalling that, for fixed λ,

\[
\lim_{n \to \infty} \left( 1 - \frac{\lambda}{n} \right)^n = e^{-\lambda}.
\]

Also, for any fixed positive integer k, it follows that for sufficiently large n,

\[
\frac{b(k; n, p)}{b(k - 1; n, p)} = \frac{\lambda - (k - 1)p}{k(1 - p)} \approx \frac{\lambda}{k}.
\]

From this we successively conclude that

\[
b(1; n, p) \approx \lambda\, b(0; n, p) \approx \lambda e^{-\lambda}
\]

and

\[
b(2; n, p) \approx \frac{1}{2} \lambda\, b(1; n, p) \approx \frac{1}{2} \lambda^2 e^{-\lambda}.
\]

Induction thus implies that

\[
b(k; n, p) \approx \frac{\lambda^k}{k!} e^{-\lambda}.
\]


This is the classical Poisson approximation to the binomial distribution.

Let

\[
p(k; \lambda) = e^{-\lambda} \frac{\lambda^k}{k!}.
\]

We have shown that p(k; λ) is an approximation for b(k; n, p) when n is sufficiently large. Note that

\[
\sum_{k=0}^{\infty} p(k; \lambda) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = 1.
\]
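The quality of the approximation is easy to inspect numerically. The sketch below is ours, not the authors'; the choices n = 1000 and p = 0.002 (so λ = 2) are arbitrary assumptions in the "n large, p small" regime.

```python
from math import comb, exp, factorial

n, p = 1000, 0.002
lam = n * p  # lambda = np is of moderate size while n is large and p is small

for k in range(6):
    b = comb(n, k) * p**k * (1 - p)**(n - k)    # exact binomial probability
    pois = exp(-lam) * lam**k / factorial(k)    # Poisson approximation
    print(f"k={k}  binomial={b:.6f}  poisson={pois:.6f}")
```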

5.5 Multivariate Distributions

For a positive integer n, consider random variables X1, X2, ..., Xn defined on a probability space (Ω, F, P). The joint probability distribution function of X1, ..., Xn is the function F : ℝ^n → [0, 1] defined by F(x1, ..., xn) = P(X1 ≤ x1 and ... and Xn ≤ xn). We will often denote the function F by F_{X1,...,Xn} when the particular random variables of interest are not clear from context.

The random variables X1, ..., Xn possess a joint probability density function f if there exists a nonnegative Borel measurable function f : ℝ^n → ℝ such that

\[
P((X_1, X_2, \ldots, X_n) \in A) = \int_A f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n
\]

for all A ∈ B(ℝ^n). Note that the integral of f over ℝ^n is equal to 1.

For a positive integer n consider random variables X1, ..., Xn defined on the same probability space and possessing a joint probability density function f_{X1,...,Xn}. For any positive integer i ≤ n, the random variable Xi possesses a probability density function f_{Xi} given by

\[
f_{X_i}(x_i) = \int_{\mathbb{R}^{n-1}} f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n.
\]


A density function obtained in this way is called a marginal density function.

5.4 Theorem For a positive integer n consider random variables X1, ..., Xn defined on the same probability space. The random variables X1, ..., Xn are mutually independent if and only if F_{X1,...,Xn}(x1, ..., xn) = F_{X1}(x1) ⋯ F_{Xn}(xn) for all x1, ..., xn ∈ ℝ.

5.5 Theorem For a positive integer n consider random variables X1, ..., Xn defined on the same probability space and possessing a joint probability density function f_{X1,...,Xn}. The random variables X1, ..., Xn are mutually independent if and only if f_{X1,...,Xn}(x1, ..., xn) = f_{X1}(x1) ⋯ f_{Xn}(xn) a.e. with respect to Lebesgue measure on B(ℝ^n).

A random variable X is said to have a uniform distribution on an interval [a, b] if

\[
F_X(x) = \begin{cases} 0 & \text{if } x < a \\ \dfrac{x - a}{b - a} & \text{if } a \le x \le b \\ 1 & \text{if } x > b. \end{cases}
\]

Note that a density function for X is given by

\[
f_X(x) = \frac{1}{b - a}\, I_{[a,b]}(x).
\]

If we knew that the outcome of an experiment resulted in values from some interval [a, b], but had no reason to believe that those values would tend to concentrate toward any particular part of that interval, then we might choose to model the experiment via a uniform distribution.

Example 5.4 In this example we will consider a problem known as Buffon's needle problem, an early example of the problem-solving technique in which a nonprobabilistic problem is solved using probabilistic methods. Consider a plane that is ruled by the lines y = n for n ∈ ℤ and onto which a needle of unit length is cast randomly.


What is the probability that the needle intersects one of the ruled lines?

Let (X, Y) denote the coordinates of the center of the needle and let Θ denote the angle between the needle and the x axis. Let Z denote the distance from the needle's center to the nearest line beneath it. Note that Z = Y - ⌊Y⌋, where ⌊x⌋ (the floor of x) denotes the greatest integer not greater than x.

We will model the statement "the needle is cast randomly" via the following assumptions:

1. Z is uniformly distributed on [0, 1].

2. Θ is uniformly distributed on [0, π].

3. Z and Θ are independent.

Note that these assumptions imply that

\[
f_{Z,\Theta}(z, \theta) = f_Z(z)\, f_\Theta(\theta) = \frac{1}{\pi}\, I_{[0,1]}(z)\, I_{[0,\pi]}(\theta).
\]

For what values of z and θ will the needle intersect the line immediately above its center? If z < 1/2 then the needle cannot intersect the line above its center. Assume then that 1/2 ≤ z ≤ 1. In this case the needle, whose half-length is 1/2, intersects the line directly above its center if and only if (1/2) sin θ ≥ 1 - z; that is, if and only if θ0 ≤ θ ≤ π - θ0, where θ0 = sin^{-1}(2(1 - z)). Thus, the probability that the needle intersects the line above it is given by

\[
\frac{1}{\pi} \int_{1/2}^{1} \int_{\sin^{-1}(2(1-z))}^{\pi - \sin^{-1}(2(1-z))} d\theta\, dz
= \frac{1}{2} - \frac{2}{\pi} \int_{0}^{1/2} \sin^{-1}(2y)\, dy
= \frac{1}{2} - \frac{2}{\pi} \left[ y \sin^{-1}(2y) + \frac{1}{2}\sqrt{1 - 4y^2} \right]_{0}^{1/2}
= \frac{1}{\pi}.
\]

By symmetry the needle has the same probability of hitting the line directly beneath its center. Thus, the probability that the needle hits any line on the grid is given by 2/π.


Note that this experiment can be used to obtain an estimate of the numerical value of π. That is, throw the needle N times and count the number of times H that the needle hits a line. The ratio 2N/H should be close to π for large values of N. Indeed, we will show later that this ratio converges to π. Solving a deterministic problem via probabilistic techniques in this way is known as Monte Carlo simulation. □
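The following is a minimal Monte Carlo sketch (ours, not from the text) of the experiment under the modeling assumptions above: Z uniform on [0, 1], Θ uniform on [0, π], and Z and Θ independent. The drop count is an arbitrary choice.

```python
import random
from math import sin, pi

def buffon_estimate(drops):
    """Estimate pi by casting a unit needle onto unit-spaced horizontal lines."""
    hits = 0
    for _ in range(drops):
        z = random.random()           # distance from the center to the line below
        theta = random.random() * pi  # angle between the needle and the x axis
        half_height = 0.5 * sin(theta)
        # The needle crosses a line iff its vertical half-extent reaches it.
        if z <= half_height or 1 - z <= half_height:
            hits += 1
    return 2 * drops / hits  # the text's ratio 2N/H

print(buffon_estimate(1_000_000))  # should be near pi
```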

5.6 Caratheodory Extension Theorem

Let Ω be a nonempty set, and let A be an algebra of subsets of Ω. That is, A is a nonempty set of subsets of Ω that is closed under the operations of taking complements and finite unions. Recall that it follows from DeMorgan's Law that an algebra is also closed under the operation of taking finite intersections. Further, recall that an algebra on Ω contains both the empty set and the set Ω.

By a measure λ on an algebra A we mean a function λ defined on A and taking values in [0, ∞] that satisfies the following two properties:

1. λ(∅) = 0 and, for A ∈ A, λ(A) ≥ 0.

2. If {A_n}_{n∈ℕ} is a sequence of disjoint sets in A whose union ∪_{n∈ℕ} A_n also belongs to A, then

\[
\lambda\!\left( \bigcup_{n\in\mathbb{N}} A_n \right) = \sum_{n\in\mathbb{N}} \lambda(A_n).
\]

Note that whenever a countable union of disjoint sets in the algebra is itself in the algebra, we require the measure on the algebra to be countably additive there, just as a measure on a σ-algebra would be.


Let Ω be a nonempty set. A function M defined on P(Ω) and taking values in [0, ∞] is called an outer measure if it satisfies the following three properties:

1. M(A) ≥ 0 for all A ∈ P(Ω), and M(∅) = 0.

2. M(A) ≤ M(B) if A ⊂ B ⊂ Ω.

3. \( M\!\left( \bigcup_{k=1}^{\infty} A_k \right) \le \sum_{k=1}^{\infty} M(A_k) \) for any sequence {A_k}_{k∈ℕ} of subsets of Ω.

As an example of an outer measure, note that Lebesgue outer measure is an outer measure on P(ℝ). Further, note that Dirac measure at a fixed point of a set is an outer measure on the family of all subsets of the set of interest.

As with Lebesgue outer measure, it is possible to use an outer measure to characterize a family of measurable sets. In doing so, we base the definition of measurability on Caratheodory's condition. For a given outer measure M, we say that a subset S of Ω is measurable if M(A) = M(A ∩ S) + M(A \ S) for any subset A of Ω. This condition has somewhat of an artificial touch to it. It almost seems mysterious, since it is not in the least intuitive. Indeed, it singles out those subsets S of Ω that split every subset of Ω into two pieces on which the outer measure adds. Note that a subset S of Ω is measurable if and only if M(A1 ∪ A2) = M(A1) + M(A2) whenever A1 ⊂ S and A2 ⊂ S^c. Note that it follows from property (3) of an outer measure that M(A) ≤ M(A ∩ S) + M(A \ S). Hence, we see that a subset S of Ω is measurable if and only if, for any subset A of Ω, it follows that M(A) ≥ M(A ∩ S) + M(A \ S). Now, it follows almost immediately that if M is an outer measure on P(Ω) and if Z is a subset of Ω such that M(Z) = 0, then Z is measurable. That is, let Z be such a set and let A be any subset of Ω. Then we have that M(A ∩ Z) + M(A \ Z) ≤ M(Z) + M(A) by property (2) of outer measures. Then, since M(Z) = 0, we have that M(A ∩ Z) + M(A \ Z) ≤ M(A), which characterizes Z as being measurable since it is always true that M(A ∩ Z) + M(A \ Z) ≥ M(A) by property (3) of outer measures.


If M is an outer measure on the subsets of Ω, and if A is a measurable set, then M(A) is called the M-measure, or simply the measure, of A. This terminology is justified by the next theorem. Before presenting that theorem, however, we present a lemma that will be of use in proving it.

5.1 Lemma Let Ω be a nonempty set and let M be an outer measure on the subsets of Ω. If A1 and A2 are measurable, then so is A1 \ A2.

Proof. We will show that M(A ∪ B) = M(A) + M(B) whenever A ⊂ (A1 \ A2) and B ⊂ (A1 \ A2)^c. Since B = (B ∩ A2) ∪ (B \ A2), it follows that A ∪ B = (A ∪ (B \ A2)) ∪ (B ∩ A2). Hence, since A ∪ (B \ A2) ⊂ A2^c and (B ∩ A2) ⊂ A2, it follows from the measurability of A2 that M(A ∪ B) = M(A ∪ (B \ A2)) + M(B ∩ A2). However, A ⊂ A1 and (B \ A2) ⊂ (A1 \ A2)^c \ A2 ⊂ A1^c. Therefore, since A1 is measurable, M(A ∪ (B \ A2)) = M(A) + M(B \ A2). Combining equalities and using the measurability of A2 we see that M(A ∪ B) = M(A) + M(B \ A2) + M(B ∩ A2) = M(A) + M(B), and the lemma is proved. □

As before, let Ω be a nonempty set. If S is a subset of Ω, then any family of subsets of Ω whose union contains S as a subset is known as a cover of S. A countable cover of S is a cover of S that is countable.

5.6 Theorem Let Ω be a nonempty set. Let M be an outer measure on the subsets of Ω.

1. The family of M-measurable subsets of Ω forms a σ-algebra on Ω.

2. If {A_k}_{k∈ℕ} is a sequence of disjoint measurable sets then

\[
M\!\left( \bigcup_{k=1}^{\infty} A_k \right) = \sum_{k=1}^{\infty} M(A_k).
\]

More generally, for any subset A of Ω, it follows that

\[
M\!\left( A \cap \bigcup_{k=1}^{\infty} A_k \right) = \sum_{k=1}^{\infty} M(A \cap A_k)
\]

and

\[
M(A) = \sum_{k=1}^{\infty} M(A \cap A_k) + M\!\left( A \setminus \bigcup_{k=1}^{\infty} A_k \right).
\]

Proof. Let {A_k}_{k∈ℕ} be a sequence of disjoint measurable subsets of Ω. Let E = ∪_{k∈ℕ} A_k and, for each positive integer j, let E_j = ∪_{k=1}^{j} A_k. We will show that

\[
M(A) = \sum_{k=1}^{j} M(A \cap A_k) + M(A \setminus E_j).
\]

The proof will proceed by induction on j. For j = 1, the result follows from the measurability of A1. Now, assuming that the result holds for j - 1, it follows from the measurability of A_j that

\[
M(A) = M(A \cap A_j) + M(A \setminus A_j) = M(A \cap A_j) + \sum_{k=1}^{j-1} M((A \setminus A_j) \cap A_k) + M((A \setminus A_j) \setminus E_{j-1}).
\]

Recalling that the A_k's are disjoint, it follows that (A \ A_j) ∩ A_k = A ∩ A_k for k ≤ j - 1. Therefore, since (A \ A_j) \ E_{j-1} = A \ E_j, it follows that

\[
M(A) = \sum_{k=1}^{j} M(A \cap A_k) + M(A \setminus E_j),
\]

as required. This completes the proof of the previous claim.

Next, since E_j ⊂ E, it follows that M(A \ E_j) ≥ M(A \ E). Using this fact with the above result and considering the limit as j → ∞, we see that

\[
M(A) \ge \sum_{k=1}^{\infty} M(A \cap A_k) + M(A \setminus E) \ge M(A \cap E) + M(A \setminus E).
\]

However, by property (3) of outer measures we also have that M(A) ≤ M(A ∩ E) + M(A \ E). Therefore, E is measurable, and

\[
M(A) = \sum_{k=1}^{\infty} M(A \cap A_k) + M(A \setminus E).
\]


If we replace A with A ∩ E in this equation we see that

\[
M(A \cap E) = \sum_{k=1}^{\infty} M(A \cap A_k),
\]

and the proof of (2) is complete.

Note that we have also shown that a countable union of disjoint measurable sets is measurable. To prove (1), we must show that a countable union of arbitrary measurable sets is measurable.

Returning now to the proof of (1), it follows from Lemma 5.1 and the fact that Ω is measurable that the complement of a measurable set is also measurable. Moreover, since E1 ∪ E2 = (E1^c \ E2)^c, it follows that E1 ∪ E2 is measurable if E1 and E2 are measurable. Therefore, any finite union of measurable sets is measurable. Next, let {E_k}_{k∈ℕ} be a sequence of measurable sets. If, for each positive integer j, B_j = ∪_{k=1}^{j} E_k, then

\[
\bigcup_{k=1}^{\infty} E_k = B_1 \cup (B_2 \setminus B_1) \cup (B_3 \setminus B_2) \cup \cdots.
\]

Since the B_j's are measurable and nondecreasing, the terms on the right are measurable and disjoint. Thus, by the case already considered, it follows that ∪_{k=1}^{∞} E_k is measurable. This completes the proof of the theorem. □

A measure μ on an algebra A is said to be σ-finite (with respect to A) if Ω can be written as Ω = ∪_{k∈ℕ} Ω_k where, for each positive integer k, Ω_k ∈ A and μ(Ω_k) < ∞. For example, Lebesgue measure is σ-finite on the algebra generated by the intervals (a, b].

Let Ω be a nonempty set, and let A be an algebra on Ω. If μ is a measure on the algebra A, we define the outer extension μ* of μ as follows: For any subset A of Ω,

\[
\mu^*(A) = \inf \sum_{k=1}^{\infty} \mu(A_k),
\]

where the infimum is taken over all countable covers of A by sets in A. Note that it is always possible to find such a cover of A


since Ω itself belongs to A. The fact that A is an algebra allows us to assume without loss of generality that the sets A_k are disjoint. We will make this assumption throughout the remainder of the section.

5.2 Lemma Let Ω be a nonempty set. If A is an algebra on Ω and if μ is a measure on A then the outer extension μ* of μ is an outer measure.

Proof. Note that μ*(∅) = 0 since ∅ ∈ A, and μ*(A) ≥ 0 for any subset A of Ω. If A1 and A2 are two subsets of Ω such that A1 ⊂ A2, then any countable cover of A2 by sets in A is also a countable cover of A1 by sets in A. Thus, we see that μ*(A1) ≤ μ*(A2). Now, let {A_k}_{k∈ℕ} be any sequence of subsets of Ω. We wish to show that

\[
\mu^*\!\left( \bigcup_{k\in\mathbb{N}} A_k \right) \le \sum_{k\in\mathbb{N}} \mu^*(A_k).
\]

Let ε be a positive real number. For each positive integer k, there is a countable covering of A_k by sets {A_{jk}}_{j∈ℕ} from A such that

\[
\sum_{j\in\mathbb{N}} \mu(A_{jk}) \le \mu^*(A_k) + \frac{\varepsilon}{2^k},
\]

since μ*(A_k) is defined as an infimum. Now, since ∪_{k∈ℕ} A_k ⊂ ∪_{j∈ℕ} ∪_{k∈ℕ} A_{jk}, it follows that

\[
\mu^*\!\left( \bigcup_{k\in\mathbb{N}} A_k \right) \le \sum_{k\in\mathbb{N}} \sum_{j\in\mathbb{N}} \mu(A_{jk}) \le \sum_{k\in\mathbb{N}} \mu^*(A_k) + \varepsilon
\]

and, since ε > 0 may be chosen arbitrarily close to zero, the desired result follows. □

5.7 Theorem (Caratheodory Extension Theorem) Let A be an algebra on a nonempty set Ω. If λ is a measure on A, let λ* be the corresponding outer measure, and let A* be the σ-algebra of λ*-measurable sets. Then

1. the restriction of λ* to A* is an extension of λ;


2. if λ is σ-finite with respect to A, and if S is any σ-algebra with A ⊂ S ⊂ A*, then λ* is the only measure on S that is an extension of λ.

Proof. Let A ∈ A. Then clearly λ*(A) ≤ λ(A). On the other hand, given disjoint sets {A_k : k ∈ ℕ} in A that cover A, let A_k' = A_k ∩ A. Then A_k' ∈ A and A is the disjoint union of the A_k''s. Hence λ(A) = Σ_{k∈ℕ} λ(A_k'). Since A_k' ⊂ A_k, it follows that λ(A) ≤ Σ_{k∈ℕ} λ(A_k). Therefore, λ(A) ≤ λ*(A), and the proof of (1) is complete.

To prove (2), which states the uniqueness of the extension, let μ be any measure on the σ-algebra S, where A ⊂ S ⊂ A*, that agrees with λ on A. Given a set E ∈ S, consider any countable collection {E_k} such that E ⊂ ∪_{k∈ℕ} E_k and such that each E_k ∈ A. Then

\[
\mu(E) \le \sum_{k\in\mathbb{N}} \mu(E_k) = \sum_{k\in\mathbb{N}} \lambda(E_k).
\]

Therefore, by the definition of λ*, it follows that μ(E) ≤ λ*(E). To show that equality holds, first suppose that there exists a set A ∈ A with E ⊂ A and λ(A) < ∞. Applying what has just been proved to A \ E, which belongs to S, we see that μ(A \ E) ≤ λ*(A \ E). However,

\[
\mu(E) + \mu(A \setminus E) = \mu(A) = \lambda^*(A) = \lambda^*(E) + \lambda^*(A \setminus E).
\]

Since each of these terms is finite (due to the fact that λ(A) is finite), it follows that μ(E) = λ*(E) in this case.

In the general case, since λ is σ-finite, there exist disjoint A_k ∈ A such that the A_k's cover Ω and such that λ(A_k) < ∞. We may apply the result above to each E ∩ A_k (which is a subset of A_k) to show that μ(E ∩ A_k) = λ*(E ∩ A_k). By summing over k, we see that μ(E) = λ*(E), and this completes the proof. □

The next result follows as a consequence of the Caratheodory Extension Theorem.

5.8 Theorem Let F : ℝ → [0, 1] be a probability distribution function and let μ0((a, b]) = F(b) - F(a) for -∞ ≤ a < b. Then there is a unique extension of μ0 to a measure μ on B(ℝ) such that μ(I) < ∞ for any bounded interval I.

Consider a random variable X defined on a probability space (Ω, F, P). The distribution or law of X is the probability measure P_X on (ℝ, B(ℝ)) defined by P_X(A) = P(X ∈ A) = P({ω ∈ Ω : X(ω) ∈ A}) for each A ∈ B(ℝ). We say that P_X is the measure on (ℝ, B(ℝ)) induced by X. Note that F_X(x) = P_X((-∞, x]) and that P_X = P ∘ X^{-1}.

5.9 Theorem If F is a nondecreasing, right-continuous real-valued function defined on ℝ then there exists a unique measure μ on (ℝ, B(ℝ)) such that μ((a, b]) = F(b) - F(a) for all a < b.

The measure μ corresponding to the function F in Theorem 5.9 is said to be the measure on (ℝ, B(ℝ)) induced by F and is obtained via Theorem 5.8. If F_X is the distribution function of a random variable X then the measure on (ℝ, B(ℝ)) induced by F_X is equal to the measure on (ℝ, B(ℝ)) induced by X.

5.10 Theorem If F is any probability distribution function then there exists, on some probability space, a random variable X such that F_X = F.

Proof. Let μ be the measure on (ℝ, B(ℝ)) induced by F and define a random variable X on the probability space (ℝ, B(ℝ), μ) by letting X(ω) = ω for each ω ∈ ℝ. The distribution function F_X of X is given by F_X(x) = μ({ω : X(ω) ≤ x}) = μ((-∞, x]). From Theorem 5.9 we know that the measure μ is such that μ((a, b]) = F(b) - F(a). In particular, μ((-∞, x]) = F(x) and hence F(x) = F_X(x). □

The clearer the teacher makes it, the worse it is for you. You must work things out for yourself and make the ideas your own. -William Osgood


Many questions about a random variable X may be answered based only on the distribution function of X. That is, to answer such questions we do not need to know the probability space on which X is defined. Instead, we may simply take the distribution function F_X and use Theorem 5.10 to define a random variable Y on a probability space (ℝ, B(ℝ), μ) where μ is the measure on (ℝ, B(ℝ)) induced by F_X. Any question about X that depends only upon F_X will have the same answer if we ask it about the random variable Y instead. Thus, we will often say "let X be a random variable with distribution function F_X" and make no reference to the underlying probability space on which X is defined.

The following result establishes a link between the concept of measurability and the existence of a functional relation. In particular, this result places on firm footing the engineering concept of a data processor.

5.11 Theorem Consider a collection {X1, ..., Xn} of random variables defined on a probability space (Ω, F, P). A random variable X defined on this space is measurable with respect to σ(X1, ..., Xn) if and only if there exists a Borel measurable function f : ℝ^n → ℝ such that X(ω) = f(X1(ω), ..., Xn(ω)) for all ω ∈ Ω.

5.7 Expectation

If X is a random variable defined on (Ω, F, P) then the expected value of X is denoted by E[X] and is defined by

\[
E[X] = \int_{\Omega} X\, dP,
\]

provided the integral exists. If g : (ℝ, B(ℝ)) → (ℝ, B(ℝ)) is measurable then

\[
E[g(X)] = \int_{\Omega} g(X)\, dP.
\]

A random variable X for which E[X] exists and is finite is said to be integrable, or to have a finite mean, or to be a first order random variable.


5.12 Theorem Consider an integrable random variable X and a Borel measurable function g : ℝ → ℝ. If F_X is the distribution function of X and if μ_X is the measure on (ℝ, B(ℝ)) induced by X then

\[
E[g(X)] = \int_{\mathbb{R}} g(x)\, dF_X(x) = \int_{\mathbb{R}} g\, d\mu_X.
\]

Further, if X possesses a density function f_X then

\[
E[g(X)] = \int_{\mathbb{R}} g(x) f_X(x)\, dx.
\]

Example 5.5 In this example we will find a rather simple expectation using three different methods in order to illustrate some of the concepts that we have been considering.

Let X be a random variable defined on a probability space (Ω, F, P) with distribution function

\[
F(x) = \begin{cases} 0 & \text{if } x < -2 \\ \tfrac{1}{2} & \text{if } -2 \le x < 3 \\ 1 & \text{if } x \ge 3. \end{cases}
\]

Note that P(X = -2) = P(X = 3) = 1/2. What is E[X^2 + 1]?

Method I: We will first find the expectation via a Lebesgue integral over Ω with respect to P. Let A = {ω ∈ Ω : X(ω) = -2} and let B = {ω ∈ Ω : X(ω) = 3}, and note that Ω \ (A ∪ B) is a P-null set. Further, note that

\[
\int_{\Omega} (X^2 + 1)\, dP = \int_A (X^2 + 1)\, dP + \int_B (X^2 + 1)\, dP = \int_A (4 + 1)\, dP + \int_B (9 + 1)\, dP = 5P(A) + 10P(B) = \frac{15}{2}.
\]

The following result will be used by Method II.


5.3 Lemma If

\[
G(x) = \begin{cases} \alpha & \text{if } x < y \\ \beta & \text{if } x \ge y, \end{cases}
\]

where β > α, and if h : ℝ → ℝ is continuous at y, then

\[
\int_{\mathbb{R}} h(x)\, dG(x) = (\beta - \alpha)\, h(y).
\]

Proof. Consider a subdivision Γ = {a_0, ..., a_n} of an interval [a, b] such that a < y < b, and assume that a_{j-1} < y < a_j. Recall the notation we introduced during our derivation of the Riemann-Stieltjes integral. The desired result follows since

\[
R(\Gamma) = \sum_{i=1}^{n} h(b_i)\left( G(a_i) - G(a_{i-1}) \right) = h(b_j)(\beta - \alpha)
\]

and since h(b_j) → h(y) as |Γ| → 0. □

Method II: We will next express E[X^2 + 1] as a Riemann-Stieltjes integral over ℝ with respect to F. Let 0 < ε < 1 and note that

\[
\int_{\mathbb{R}} (x^2 + 1)\, dF(x) = \int_{[-2-\varepsilon,\, -2+\varepsilon)} (x^2 + 1)\, dF(x) + \int_{[3-\varepsilon,\, 3+\varepsilon)} (x^2 + 1)\, dF(x) = \frac{1}{2}(4 + 1) + \frac{1}{2}(9 + 1) = \frac{15}{2}.
\]

Method III: Finally, we will express E[X^2 + 1] as a Lebesgue integral over ℝ with respect to the measure μ_X on (ℝ, B(ℝ)) induced by X. First, note that

\[
\mu_X(A) = P(X \in A) = \begin{cases} 1 & \text{if } -2 \in A \text{ and } 3 \in A \\ 1/2 & \text{if } -2 \in A \text{ and } 3 \in A^c \\ 1/2 & \text{if } -2 \in A^c \text{ and } 3 \in A \\ 0 & \text{if } -2 \in A^c \text{ and } 3 \in A^c \end{cases}
\]

for A ∈ B(ℝ). Note that ℝ \ {-2, 3} is a μ_X-null set. Thus, it follows that

\[
E[X^2 + 1] = \int_{\{-2\}} (x^2 + 1)\, d\mu_X + \int_{\{3\}} (x^2 + 1)\, d\mu_X = (4 + 1)\mu_X(\{-2\}) + (9 + 1)\mu_X(\{3\}) = \frac{15}{2}. \quad \Box
\]
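All three methods reduce to weighting the two atoms of the distribution. A small sketch (ours, not from the text) confirms the value both exactly and by simulation; the sample size is arbitrary.

```python
import random

# Exact: the law mu_X puts mass 1/2 on -2 and mass 1/2 on 3.
exact = 0.5 * ((-2) ** 2 + 1) + 0.5 * (3 ** 2 + 1)

# Simulation: draw X from its law and average X^2 + 1.
n = 100_000
sim = sum(random.choice([-2, 3]) ** 2 + 1 for _ in range(n)) / n

print(exact, sim)  # 7.5 and an estimate near 7.5
```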


5.4 Lemma Consider a random variable X that takes values only in some countable set {x_1, x_2, ...} and a function g : ℝ → ℝ such that g(X) is integrable. It follows that

\[
E[g(X)] = \sum_{i=1}^{\infty} g(x_i)\, P(X = x_i).
\]

Example 5.6 This example is known as the St. Petersburg Paradox. Consider the following game. A fair coin is flipped until a tail appears; we win $2^k if the tail appears on the kth toss. Let the random variable X denote our winnings. What is E[X]? That is, how much should we be required to "put up" in order to make the game fair? Note that X takes on the value 2^k with probability 2^{-k}; i.e., the probability that we toss k - 1 heads and then toss one tail. Thus,

\[
E[X] = \sum_{k=1}^{\infty} 2^k\, 2^{-k} = \infty.
\]

The paradox arises since most people would "expect" their winnings to be much less. The problem arises from our inability to put in perspective the very small probabilities of winning very large amounts. The problem returns a much more realistic value if we assign a maximum amount that can be won; that is, if we are allowed to "break the bank" when we reach a preassigned level. □

Example 5.7 As part of a reliability study, a total of n items are tested. Suppose that each item has an exponential failure time distribution given by

\[
F_{T_i}(t) = 1 - e^{-\lambda t}
\]

for t > 0, where T_i is a random variable that denotes the time at which the ith item fails and where λ is a fixed positive constant. Note that if λ is large then we expect the item to fail quickly. Assume that the T_i's are mutually independent. (Is this a good assumption?) Let T denote the time at which the first failure occurs. What is the expected value of T? Note that T exceeds some positive time t if and only if T_i > t for each i. Thus, for t > 0,

\[
P(T > t) = P(T_1 > t, T_2 > t, \ldots, T_n > t) = P(T_1 > t) P(T_2 > t) \cdots P(T_n > t)
= \int_t^{\infty} \lambda e^{-\lambda t_1}\, dt_1 \int_t^{\infty} \lambda e^{-\lambda t_2}\, dt_2 \cdots \int_t^{\infty} \lambda e^{-\lambda t_n}\, dt_n
= e^{-\lambda t} \cdots e^{-\lambda t} = e^{-n\lambda t}.
\]

From this we see that F_T(t) = 1 - e^{-nλt} for t > 0. Recall from the fundamental theorem of calculus that if a probability distribution function is differentiable then its derivative is a probability density function corresponding to that distribution. Thus, f_T(t) = nλe^{-nλt} for t > 0, from which it follows that

\[
E[T] = \int_0^{\infty} t f_T(t)\, dt = \frac{1}{n\lambda},
\]

where we have used the fact that \(\int_0^{\infty} y e^{-y}\, dy = 1\). Note that the expected time of the first failure decreases as either n or λ increases. □

5.8 Useful Inequalities

Let X be a random variable defined on (0, :.F, P). If kEN then E[Xk] is called the kth moment of X and E[(X - E[X])k] is called the kth central moment of X. The first moment of X is called the mean of X and the second central moment of X is called the variance of X and is denoted by (]"2, (]"1-, or by VAR[X]. The standard deviation of X is denoted by (Tx and is given by the nonnegative square root of the variance of X. A random variable with a finite second moment is said to be a second order random variable.

5.13 Theorem If k > 0 and if E[X^k] is finite then E[X^j] is finite when 0 < j < k.


Proof. Note that E[X^j] is finite if and only if E[|X|^j] is finite. Further, note that

\[
E[|X|^j] = \int_{\Omega} |X|^j\, dP = \int_{\{|X|^j < 1\}} |X|^j\, dP + \int_{\{|X|^j \ge 1\}} |X|^j\, dP
\le \int_{\{|X|^j < 1\}} 1\, dP + \int_{\{|X|^j \ge 1\}} |X|^k\, dP
\le P(\{|X|^j < 1\}) + E[|X|^k] < \infty.
\]

Thus, if the kth moment is finite then all lower moments are also finite. □

Exercise 5.4 The density function

\[
f(x) = \frac{1}{\pi (1 + x^2)}
\]

for x ∈ ℝ is called a Cauchy density function. Let X be a random variable with density function f. Show that none of the odd moments of X exists and that none of the even moments of X is finite.

Exercise 5.5 Although the first moment of a random variable need not exist, the second moment of a random variable always exists. Why?

Exercise 5.6 Show that if X is a second order random variable then

\[
VAR[X] = E[X^2] - (E[X])^2.
\]

5.14 Theorem Consider a positive integer n and let X1, ..., Xn be mutually independent random variables defined on (Ω, F, P). If Xi ≥ 0 for each i or if each Xi is integrable then E[X1 ⋯ Xn] exists and is equal to E[X1] ⋯ E[Xn].


5.1 Inequality (Hölder) If 1 < p < ∞, 1 < q < ∞, and 1/p + 1/q = 1, then

\[
E[|XY|] \le \left( E[|X|^p] \right)^{1/p} \left( E[|Y|^q] \right)^{1/q}.
\]

5.2 Inequality (Minkowski) If p ≥ 1, then

\[
\left( E[|X + Y|^p] \right)^{1/p} \le \left( E[|X|^p] \right)^{1/p} + \left( E[|Y|^p] \right)^{1/p}.
\]

The following inequality is a special case of Hölder's inequality.

5.3 Inequality (Cauchy-Schwarz) \( E[|XY|] \le \sqrt{E[X^2]}\, \sqrt{E[Y^2]} \).

5.4 Inequality (Chebyshev) If α > 0 then

\[
P(|X - E[X]| \ge \alpha) \le \frac{VAR[X]}{\alpha^2}.
\]

Example 5.8 Consider again Buffon's needle problem from Section 5.5 and recall that the random variable Y = H/N provides an estimate of 2/π, where H denotes the number of times the needle hits a line in N drops. Note that

\[
P(H = h) = \binom{N}{h} \left( \frac{2}{\pi} \right)^h \left( 1 - \frac{2}{\pi} \right)^{N - h}
\]

for h = 0, 1, ..., N, where we have used the binomial distribution from Section 5.4. Thus, E[Y] = 2/π and

\[
VAR[Y] = \frac{2}{\pi N}\left( 1 - \frac{2}{\pi} \right).
\]

What value of N ensures that |Y - (2/π)| < 0.01 with probability 0.999? Chebyshev's inequality implies that such will be true if

\[
\frac{1}{(1/100)^2} \cdot \frac{2}{\pi N}\left( 1 - \frac{2}{\pi} \right) < 0.001.
\]

This inequality holds when N > 2,313,350. The dedicated reader is invited to verify this result empirically. □

Recall that a function Φ : ℝ → ℝ is said to be convex if Φ(λx + (1 - λ)y) ≤ λΦ(x) + (1 - λ)Φ(y) whenever 0 ≤ λ ≤ 1. A sufficient condition for Φ to be convex is that it have a nonnegative second derivative.


5.5 Inequality (Jensen) If Φ is convex on an interval containing the range of X then Φ(E[X]) ≤ E[Φ(X)]. Note that letting Φ(x) = x^2 implies that (E[X])^2 ≤ E[X^2].

5.6 Inequality (Lyapounov) If 0 < α ≤ β then

\[
\left( E[|X|^{\alpha}] \right)^{1/\alpha} \le \left( E[|X|^{\beta}] \right)^{1/\beta}.
\]

Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write. -H. G. Wells

Let X and Y be random variables with finite means and assume that E[XY] is also finite. The covariance of X and Y is denoted by COV[X, Y] and is defined to be COV[X, Y] = E[(X - E[X])(Y - E[Y])]. Note that COV[X, Y] = E[XY] - E[X]E[Y], also. The random variables X and Y are said to be uncorrelated if COV[X, Y] = 0; that is, if E[XY] = E[X]E[Y]. Note that if X and Y are independent (and if E[X], E[Y], and E[XY] are finite) then X and Y are uncorrelated. If the variances σ_X^2 and σ_Y^2 of X and Y are finite and nonzero then the correlation coefficient between X and Y is denoted by ρ(X, Y) and is defined by

\[
\rho(X, Y) = \frac{COV[X, Y]}{\sigma_X \sigma_Y}.
\]

5.15 Theorem If X1, ..., Xn are second order random variables then

\[
VAR[X_1 + \cdots + X_n] = \sum_{i=1}^{n} VAR[X_i] + 2 \sum_{i < j} COV[X_i, X_j].
\]

5.3 Corollary If X1, ..., Xn are second order, uncorrelated random variables (that is, if COV[Xi, Xj] = 0 when i ≠ j) then

\[
VAR[X_1 + \cdots + X_n] = \sum_{i=1}^{n} VAR[X_i].
\]


5.4 Corollary If X1, ..., Xn are second order, mutually independent random variables then

\[
VAR[X_1 + \cdots + X_n] = \sum_{i=1}^{n} VAR[X_i].
\]

5.9 Transformations of Random Variables

5.16 Theorem If X and Y have a joint probability density function f_{X,Y}(x, y) then the random variable Z = X + Y possesses a density function given by

\[
f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, z - x)\, dx.
\]

Proof. Let A_z = {(x, y) ∈ ℝ^2 : x + y ≤ z} and note that

\[
P(Z \le z) = \iint_{A_z} f_{X,Y}(x, y)\, dy\, dx
= \int_{-\infty}^{\infty} \int_{-\infty}^{z - x} f_{X,Y}(x, y)\, dy\, dx
= \int_{-\infty}^{\infty} \int_{-\infty}^{z} f_{X,Y}(x, s - x)\, ds\, dx
= \int_{-\infty}^{z} \int_{-\infty}^{\infty} f_{X,Y}(x, s - x)\, dx\, ds,
\]

where we substituted s = x + y and then interchanged the order of integration. Thus, we have found a nonnegative function f_Z(s) such that

\[
P(Z \le z) = \int_{-\infty}^{z} f_Z(s)\, ds
\]

for all z ∈ ℝ. It follows by definition that f_Z is a probability density function for Z. □

5.5 Corollary If X and Y are independent random variables possessing density functions f_X and f_Y, respectively, then the random variable Z = X + Y possesses a probability density function given by

\[
f_Z(z) = \int_{\mathbb{R}} f_X(x) f_Y(z - x)\, dx.
\]


Note that this density for Z is the convolution of f_X and f_Y.

For example, if X and Y are independent random variables each with a uniform distribution on [0, 1], then X + Y has a triangular distribution on [0, 2]; the sketch below illustrates this case. A proof of the following result will be supplied by Example 5.10.
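Before stating that result, here is a quick numerical sketch (ours, not from the text) of the triangular example just mentioned. The convolution integral gives f_Z(z) = z on [0, 1] and 2 - z on [1, 2]; the sample size and bin count below are arbitrary assumptions.

```python
import random

def triangular_density(z):
    """Convolution of two uniform [0, 1] densities."""
    if 0 <= z <= 1:
        return z
    if 1 < z <= 2:
        return 2 - z
    return 0.0

# Compare a histogram of X + Y with the triangular density.
n, bins = 200_000, 10
width = 2 / bins
counts = [0] * bins
for _ in range(n):
    z = random.random() + random.random()
    counts[min(int(z / width), bins - 1)] += 1

for i in range(bins):
    mid = (i + 0.5) * width
    print(f"z={mid:.1f}  empirical={counts[i] / (n * width):.3f}  "
          f"exact={triangular_density(mid):.3f}")
```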

5.17 Theorem If X and Y possess a joint probability density function f_{X,Y}(x, y) then the random variable B = XY possesses a probability density function given by

\[
f_B(b) = \int_{\mathbb{R}} f_{X,Y}\!\left( \frac{b}{t}, t \right) \frac{1}{|t|}\, dt.
\]

5.18 Theorem Consider a random variable X that possesses a probability density function f_X and a function g : ℝ → ℝ that possesses a differentiable inverse. The random variable Y = g(X) possesses a probability density function given by

\[
f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right|.
\]

Example 5.9 Consider a random variable X with probability density function f_X and let g(x) = ax + b for a, b ∈ ℝ with a ≠ 0. Let Y = g(X) and note that g^{-1}(y) = (y - b)/a. Thus, Theorem 5.18 implies that

\[
f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right| = f_X\!\left( \frac{y - b}{a} \right) \left| \frac{d}{dy}\!\left( \frac{y - b}{a} \right) \right| = f_X\!\left( \frac{y - b}{a} \right) \frac{1}{|a|}. \quad \Box
\]

5.19 Theorem Consider random variables X and Y that possess a joint probability density function f_{X,Y}(x, y). Consider functions g : ℝ^2 → ℝ and h : ℝ^2 → ℝ for which there exist functions α : ℝ^2 → ℝ and β : ℝ^2 → ℝ such that α(g(x, y), h(x, y)) = x and β(g(x, y), h(x, y)) = y, and such that the four partial derivatives ∂α/∂b, ∂α/∂t, ∂β/∂b, and ∂β/∂t each exist. The random variables B = g(X, Y) and T = h(X, Y) possess a joint probability density function given by

\[
f_{B,T}(b, t) = f_{X,Y}(\alpha(b, t), \beta(b, t)) \left| \det \begin{bmatrix} \dfrac{\partial \alpha}{\partial b} & \dfrac{\partial \alpha}{\partial t} \\[4pt] \dfrac{\partial \beta}{\partial b} & \dfrac{\partial \beta}{\partial t} \end{bmatrix} \right|.
\]

Example 5.10 As an example we will prove Theorem 5.17. Let g(x, y) = xy and h(x, y) = y. Let α(b, t) = b/t and β(b, t) = t, and note that α(g(x, y), h(x, y)) = x and β(g(x, y), h(x, y)) = y as desired. Let B = XY and T = Y. Using the previous result it follows that

\[
f_{B,T}(b, t) = f_{X,Y}\!\left( \frac{b}{t}, t \right) \left| \det \begin{bmatrix} \dfrac{1}{t} & -\dfrac{b}{t^2} \\[4pt] 0 & 1 \end{bmatrix} \right| = f_{X,Y}\!\left( \frac{b}{t}, t \right) \frac{1}{|t|}.
\]

Thus, it follows that

\[
f_B(b) = \int_{\mathbb{R}} f_{B,T}(b, t)\, dt = \int_{\mathbb{R}} f_{X,Y}\!\left( \frac{b}{t}, t \right) \frac{1}{|t|}\, dt,
\]

as claimed. □


5.10 Moment Generating and Characteristic Functions

The moment generating function of a random variable X is defined to be M_X(s) = E[e^{sX}] for all s ∈ ℝ for which the expectation is finite, provided that M_X(s) is finite in some nonempty open interval containing the origin.

5.20 Theorem The moment generating function of a bounded random variable exists.

Proof. Let X be a bounded random variable, and note that e^{sX} is bounded as well for each fixed value of s. Thus, E[e^{sX}] exists for each fixed s and, by the dominated convergence theorem, is a continuous function of s. □

5.21 Theorem Consider a random variable X for which the moment generating function M_X(s) exists. The function M_X satisfies the following property:

1. \( M_X(s) = \sum_{k=0}^{\infty} s^k E[X^k] / k! \).⁴

5.22 Theorem If X and Y are independent random variables possessing moment generating functions M_X and M_Y, respectively, then the sum X + Y possesses a moment generating function that is given by M_{X+Y}(s) = M_X(s) M_Y(s).

5.23 Theorem Consider two random variables X and Y possessing moment generating functions M_X and M_Y, respectively. The random variables X and Y have the same distribution if and only if M_X = M_Y.

⁴This result is known as Taylor's Theorem.


In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oolitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upward of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact. -Mark Twain

Example 5.11 A random variable X is said to have a Poisson distribution with parameter λ > 0 if

\[
P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}
\]

for each nonnegative integer k. Note that for such a random variable X, the moment generating function M_X exists and is given by

\[
M_X(s) = \sum_{k=0}^{\infty} e^{sk} \frac{\lambda^k}{k!} e^{-\lambda} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^s)^k}{k!} = e^{-\lambda} e^{\lambda e^s} = \exp(\lambda(e^s - 1)),
\]


where we have recalled that the Taylor series expansion for e^z is given by

\[
e^z = \sum_{k=0}^{\infty} \frac{z^k}{k!}.
\]

Now, assume that X and Y are independent random variables, each with a Poisson distribution with parameter λ. What is the distribution of X + Y? Using Theorem 5.22 we see that

\[
M_{X+Y}(s) = M_X(s) M_Y(s) = \exp(\lambda(e^s - 1)) \exp(\lambda(e^s - 1)) = \exp(2\lambda(e^s - 1)),
\]

and hence from Theorem 5.23 it follows that X + Y is Poisson with parameter 2λ. □
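The same conclusion can be checked without transforms by convolving the two probability mass functions directly. The sketch below is ours; λ = 3 is an arbitrary assumption.

```python
from math import exp, factorial

lam = 3.0

def poisson_pmf(k, mean):
    return exp(-mean) * mean**k / factorial(k)

for m in range(6):
    # P(X + Y = m) by convolving two Poisson(lam) mass functions
    conv = sum(poisson_pmf(j, lam) * poisson_pmf(m - j, lam) for j in range(m + 1))
    print(f"m={m}  convolution={conv:.6f}  poisson(2*lam)={poisson_pmf(m, 2 * lam):.6f}")
```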

A problem with the moment generating function is that it need not exist and hence is difficult to use in a general setting. The characteristic function defined below shares many of the same properties as the moment generating function yet always exists. In nonprobabilistic contexts, a moment generating function is similar to a Laplace transform and a characteristic function is similar to a Fourier transform.

The characteristic function of a random variable X is the function Φ_X : ℝ → ℂ defined by

\[
\Phi_X(t) = E[e^{itX}] = E[\cos(tX)] + i\, E[\sin(tX)].
\]

For the characteristic functions of several common distributions, see Table 5.1.

5.24 Theorem A characteristic function Φ_X exists for any random variable X and it possesses the following properties:

1. |Φ_X(t)| ≤ Φ_X(0) = 1 for all t ∈ ℝ.

2. If E[|X^k|] < ∞ then Φ_X^{(k)}(0) = i^k E[X^k].

3. \( \Phi_X(-t) = \overline{\Phi_X(t)} \).

4. Φ_X(t) is real-valued if and only if F_X is symmetric; that is, if and only if ∫_B dF_X(x) = ∫_{-B} dF_X(x) for any real Borel set B, where -B = {-x : x ∈ B}. (Note that a random variable with a symmetric, absolutely continuous probability distribution function possesses an even probability density function.)


5.25 Theorem Distinct probability distributions correspond to distinct characteristic functions.

5.26 Theorem If X and Y are independent random variables then

\[
\Phi_{X+Y}(t) = \Phi_X(t)\, \Phi_Y(t).
\]

5.27 Theorem If a, b ∈ ℝ and if Y = aX + b then

\[
\Phi_Y(t) = e^{itb}\, \Phi_X(at).
\]

5.28 Theorem (Continuity Property) Suppose that {F_n}_{n∈ℕ} is a sequence of probability distribution functions with corresponding characteristic functions {Φ_n : n ∈ ℕ}. If there exists a probability distribution function F such that F_n(x) → F(x) at each point x where F is continuous, then Φ_n(t) → Φ(t) for all t, where Φ is the characteristic function of F. Conversely, if Φ(t) = lim_{n→∞} Φ_n(t) exists and is continuous at t = 0, then Φ is the characteristic function of some probability distribution function F, and F_n(x) → F(x) at each point x where F is continuous.

5.11 The Gaussian Distribution

A random variable X is said to be a Gaussian random variable or to possess a Gaussian distribution if X has a probability density function of the form

\[
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( \frac{-(x - m)^2}{2\sigma^2} \right)
\]

for all x ∈ ℝ, where m ∈ ℝ and σ^2 > 0 are fixed parameters. To indicate that X has such a distribution we write X ~ N(m, σ^2).⁵ As we will see, the mean of X is m and the variance of X is σ^2. Note that these two parameters completely specify the Gaussian distribution of X.

"Some texts refer to the Gaussian distribution as the Normal distribu­tion. The "N" in our notation comes from this latter terminology.


Table 5.1 Common Characteristic Functions

    Distribution   f_X(x)                      Φ_X(t)
    Uniform        I_{(0,1)}(x)                (e^{it} - 1)/(it)
    Exponential    e^{-x} I_{(0,∞)}(x)         1/(1 - it)
    Laplace        (1/2) e^{-|x|}              1/(1 + t^2)
    Cauchy         1/(π(1 + x^2))              e^{-|t|}
    Gaussian       (1/√(2π)) e^{-x^2/2}        e^{-t^2/2}


If X ~ N(0, 1) (i.e., if X is Gaussian with zero mean and unit variance), then we say that X is a standard Gaussian random variable or that X has a standard Gaussian distribution.

5.29 Theorem If X ~ N(m, σ^2) then X possesses a moment generating function given by

\[
M_X(t) = \exp\!\left( \frac{\sigma^2 t^2}{2} + tm \right).
\]

Proof. Note that

\[
M_X(t) = \int_{-\infty}^{\infty} e^{tx} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( \frac{-(x - m)^2}{2\sigma^2} \right) dx
= \exp\!\left( \frac{\sigma^2 t^2}{2} + tm \right) \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( \frac{-(x - (\sigma^2 t + m))^2}{2\sigma^2} \right) dx,
\]

where the final integral equals 1 since the integrand is a Gaussian density with mean σ^2 t + m and variance σ^2. □


Example 5.12 The moment generating function may now be used to confirm that if X ~ N(m, σ^2) then E[X] = m and VAR[X] = σ^2. Note that M_X'(t) = (m + σ^2 t) M_X(t) and that M_X''(t) = σ^2 M_X(t) + (m + σ^2 t)^2 M_X(t). Thus, E[X] = M_X'(0) = m and E[X^2] = M_X''(0) = σ^2 + m^2, which implies that VAR[X] = E[X^2] - (E[X])^2 = σ^2, as expected. □

5.30 Theorem If a random variable X has a N(m, σ^2) distribution then the random variable

\[
W = \frac{X - m}{\sigma}
\]

is a standard Gaussian random variable.

Proof. Note that

\[
F_W(w) = P(W \le w) = P\!\left( \frac{X - m}{\sigma} \le w \right) = P(X \le \sigma w + m).
\]

Thus, it follows that

\[
F_W(w) = \int_{-\infty}^{\sigma w + m} f_X(x)\, dx = \int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}} \exp\!\left( \frac{-y^2}{2} \right) dy, \quad \text{with } y = \frac{x - m}{\sigma}.
\]

Thus, we see that W has a standard Gaussian distribution. □

Note 5.4 If

\[
\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp(-t^2/2)\, dt
\]

then, for x ≥ 0,

\[
\Phi(x) = 1 - \frac{1}{2}\left( 1 + d_1 x + d_2 x^2 + d_3 x^3 + d_4 x^4 + d_5 x^5 + d_6 x^6 \right)^{-16} + \varepsilon(x)
\]

where |ε(x)| < 1.5 × 10^{-7} and where

    d1 = 0.0498673470
    d2 = 0.0211410061
    d3 = 0.0032776263
    d4 = 0.0000380036
    d5 = 0.0000488906
    d6 = 0.0000053830.


Further, if Φ(x) = 1 - p for 0 < p ≤ 1/2 then

\[
x = t - \frac{c_0 + c_1 t + c_2 t^2}{1 + q_1 t + q_2 t^2 + q_3 t^3} + \varepsilon(p)
\]

where |ε(p)| < 4.5 × 10^{-4}, where \( t = \sqrt{\ln(1/p^2)} \), and where

    c0 = 2.515517
    c1 = 0.802853
    c2 = 0.010328
    q1 = 1.432788
    q2 = 0.189269
    q3 = 0.001308.
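Both rational approximations are straightforward to code. The sketch below is ours, not from the text; it checks the first formula against an exact Φ computed from Python's math.erf, and implements the inverse formula under the assumption that t = √(ln(1/p²)) as stated above.

```python
from math import erf, sqrt, log

D = (0.0498673470, 0.0211410061, 0.0032776263,
     0.0000380036, 0.0000488906, 0.0000053830)

def phi_approx(x):
    """Approximate the standard Gaussian distribution function for x >= 0."""
    s = 1.0
    for i, d in enumerate(D, start=1):
        s += d * x**i
    return 1 - 0.5 * s**(-16)

def phi_inverse_approx(p):
    """Approximate the x with Phi(x) = 1 - p, for 0 < p <= 1/2."""
    t = sqrt(log(1 / p**2))
    num = 2.515517 + 0.802853 * t + 0.010328 * t**2
    den = 1 + 1.432788 * t + 0.189269 * t**2 + 0.001308 * t**3
    return t - num / den

for x in (0.0, 0.5, 1.0, 2.0, 3.0):
    exact = 0.5 * (1 + erf(x / sqrt(2)))
    print(f"x={x}  approx={phi_approx(x):.7f}  exact={exact:.7f}")
```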

5.12 The Bivariate Gaussian Distribution

Two random variables X and Y are said to possess a bivariate or joint Gaussian distribution if they possess a joint probability density function of the form

\[
f_{X,Y}(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\!\left( \frac{-q(x, y)}{2} \right)
\]

with

\[
q(x, y) = \frac{1}{1 - \rho^2} \left[ \left( \frac{x - m_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{x - m_1}{\sigma_1} \right)\!\left( \frac{y - m_2}{\sigma_2} \right) + \left( \frac{y - m_2}{\sigma_2} \right)^2 \right],
\]


where σ1 > 0, σ2 > 0, m1 ∈ ℝ, m2 ∈ ℝ, and |ρ| < 1. Our notation for this distribution is N(m1, m2, σ1^2, σ2^2, ρ). Such random variables X and Y are said to be jointly Gaussian or mutually Gaussian.

Exercise 5.7 For X and Y as above, show that X ~ N(m1, σ1^2) and that Y ~ N(m2, σ2^2).

Exercise 5.8 For X and Y as above, show that the correlation coefficient of X and Y is ρ.

5.31 Theorem Let X and Y have a N(m1, m2, σ1^2, σ2^2, ρ) distribution. The random variables X and Y are independent if and only if ρ = 0. That is, mutually Gaussian random variables X and Y are independent if and only if they are uncorrelated.

Proof. We have already seen that if two random variables are independent (and the relevant expectations are finite) then they are uncorrelated. To see that, in this case, uncorrelated random variables are independent, simply let ρ = 0 and note that f_{X,Y}(x, y) = f_X(x) f_Y(y). □

5.32 Theorem If X and Y possess a bivariate Gaussian distribution then X + Y is a Gaussian random variable.

5.13 Multivariate Gaussian Distributions

A collection {X1, ..., Xn} of random variables is said to possess a multivariate Gaussian distribution (or to be jointly Gaussian or mutually Gaussian) if they possess a joint probability density function of the form

\[
f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \frac{1}{\sqrt{(2\pi)^n \det \Sigma}} \exp\!\left( -\frac{1}{2} (x - m)^T \Sigma^{-1} (x - m) \right)
\]


where x = [x1 ⋯ xn]^T, m = [m1 ⋯ mn]^T, and Σ is a symmetric positive definite matrix. Recall that a matrix N is symmetric if N = N^T and that a real symmetric matrix is positive definite if all of its eigenvalues are positive. It follows easily that E[Xi] = mi for i = 1, ..., n and that COV[Xi, Xj] = σ_{ij} where

\[
\Sigma = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} \end{bmatrix}.
\]

The matrix Σ is called the covariance matrix of X1, ..., Xn. We denote such a distribution for X = [X1, ..., Xn]^T by writing X ~ N(m, Σ).

Except for boolean algebra there is no theory more universally employed in mathematics than linear algebra; and there is hardly any theory which is more elementary, in spite of the fact that generations of professors and textbook writers have obscured its simplicity by preposterous calculations with matrices. -J. Dieudonné

5.33 Theorem If a collection of Gaussian random variables are mutually independent then they are mutually Gaussian.

5.34 Theorem If a collection of mutually Gaussian random variables are (pairwise) uncorrelated then they are mutually independent.

5.35 Theorem If X = [X1, ..., Xn]^T has a N(μ, Σ) distribution with Σ positive definite, if C is an m × n real matrix with rank⁶ m ≤ n, and if b is an m × 1 real vector, then CX + b has a N(Cμ + b, CΣC^T) distribution and CΣC^T is positive definite.

⁶The rank of a matrix is the number of linearly independent rows (or columns) in the matrix. The matrix C in this theorem can have more columns than rows, but the rows must be linearly independent.


5.36 Theorem If X = [X1, ..., Xn]^T is composed of mutually Gaussian, positive variance random variables, then there exists a nonsingular n × n real matrix C such that Z = CX is a random vector composed of mutually independent, positive variance Gaussian random variables.

Example 5.13 Let X1 and X2 be mutually Gaussian random variables with zero means, unit variances, and correlation coefficient 1/2. Let

\[
\begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} = \begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} c_1 X_1 + c_2 X_2 \\ c_3 X_1 + c_4 X_2 \end{bmatrix}.
\]

Note that Z1 and Z2 are mutually Gaussian. Thus, for Z1 and Z2 to be independent we require that E[Z1 Z2] = E[Z1]E[Z2] = 0. Let c1 = c3 = 1, let c2 = 0, and note that E[Z1 Z2] = E[X1(X1 + c4 X2)] = E[X1^2] + c4 E[X1 X2]. Note that E[X1^2] = 1 and E[X1 X2] = 1/2. Thus, Z1 and Z2 are independent if c4 = -2. □
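The choice c4 = -2 can also be verified through the covariance matrix: with C = [[1, 0], [1, -2]] and Σ = [[1, 1/2], [1/2, 1]], the matrix CΣC^T from Theorem 5.35 is diagonal. A sketch (ours) using numpy:

```python
import numpy as np

Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])   # zero means, unit variances, rho = 1/2
C = np.array([[1.0,  0.0],
              [1.0, -2.0]])      # c1 = c3 = 1, c2 = 0, c4 = -2

cov_Z = C @ Sigma @ C.T
print(cov_Z)  # off-diagonal entries are 0, so Z1 and Z2 are uncorrelated
              # and, being mutually Gaussian, independent
```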

Example 5.14 Let X1, ..., Xn be random variables possessing a joint probability density function given by

Thus, since any subset of {X1, ..., Xn} containing n - 1 random variables is composed of mutually independent standard Gaussian random variables, it follows that any proper subset of {X1, ..., Xn} containing at least two random variables is also composed of mutually independent standard Gaussian random variables. However, it is clear that the random variables in {X1, ..., Xn} are neither mutually independent nor mutually Gaussian. This example points out the dangers that arise when one attempts to show that a collection of random variables is jointly Gaussian. □

5.14 Convergence of Random Variables

Consider a probability space (Ω, F, P) and a sequence {X_n}_{n∈ℕ} of random variables defined on that space. In this section we will consider several ways in which the elements of this sequence may converge.

5.14.1 Pointwise Convergence

Consider a probability space (Ω, F, P) and a sequence {X_n}_{n∈ℕ} of random variables defined on that space. We say that the X_n's converge pointwise to a random variable X defined on (Ω, F, P) if |X_n(ω) - X(ω)| → 0 as n → ∞ for each ω ∈ Ω. In such a case we write X_n → X.

5.14.2 Almost Sure Convergence

In a probabilistic context, a condition that holds almost everywhere with respect to the underlying probability measure of interest is said to hold almost surely (written a.s.) or with probability one (written wp1).


Consider a probability space (Ω, F, P) and a sequence {X_n}_{n∈ℕ} of random variables defined on that space. We say that the X_n's converge almost surely to a random variable X defined on (Ω, F, P) if there exists a set E ∈ F such that P(E) = 0 and such that |X_n(ω) - X(ω)| → 0 as n → ∞ for each ω ∈ E^c. In such a case we write X_n → X a.s.

5.37 Theorem Consider a probability space (Ω, F, P) and a sequence {X_n}_{n∈ℕ} of random variables defined on that space. If X is a random variable defined on (Ω, F, P) such that

\[
\sum_{n=1}^{\infty} E[(X_n - X)^2] < \infty,
\]

then X_n → X a.s.

Example 5.15 Consider the probability space given by ([0, 1], B([0, 1]), λ) where B([0, 1]) denotes the collection of real Borel subsets of [0, 1] and λ is Lebesgue measure on B([0, 1]). Define random variables X_n for n ∈ ℕ on this space via

\[
X_n(\omega) = \begin{cases} 0 & \text{if } \omega \in [0, 1] \cap \mathbb{Q}^c \\ n & \text{if } \omega \in [0, 1] \cap \mathbb{Q}. \end{cases}
\]

Note that X_n(ω) → ∞ as n → ∞ for all ω ∈ [0, 1] ∩ ℚ. Even so, [0, 1] ∩ ℚ is countable and hence is a Lebesgue null set. Further, off the set [0, 1] ∩ ℚ we see that X_n → 0 as n → ∞. Thus, we conclude that X_n → 0 a.s. Note that this also follows from Theorem 5.37 since

\[
\sum_{n=1}^{\infty} E[X_n^2] = \sum_{n=1}^{\infty} n^2\, \lambda([0, 1] \cap \mathbb{Q}) = 0,
\]

which is finite. □

5.14.3 Convergence in Probability

Consider a probability space (Ω, F, P) and a sequence {X_n}_{n∈ℕ} of random variables defined on that space. We say that X_n converges in probability to a random variable X defined on (Ω, F, P) if, for each ε > 0, P(|X_n - X| ≥ ε) → 0 as n → ∞. In such a case we write X_n →_P X.


5.38 Theorem Consider a probability space (Ω, F, P) and a sequence {X_n}_{n∈ℕ} of random variables defined on that space. Let X be a random variable also defined on (Ω, F, P). If X_n → X a.s. then X_n →_P X. That is, convergence in probability is weaker than almost sure convergence.

Example 5.16 Consider a sequence of mutually independent random variables {X_n}_{n∈ℕ} such that

\[
P(X_n = \alpha) = \begin{cases} \frac{1}{n} & \text{if } \alpha = 1 \\ 1 - \frac{1}{n} & \text{if } \alpha = 0. \end{cases}
\]

Let ε > 0 and note that

\[
P(|X_n - 0| \ge \varepsilon) = P(X_n \ge \varepsilon) = \begin{cases} 0 & \text{if } \varepsilon > 1 \\ \frac{1}{n} & \text{if } 0 < \varepsilon \le 1. \end{cases}
\]

Thus, P(X_n ≥ ε) → 0 as n → ∞ for any ε > 0, which implies that X_n →_P 0. Does X_n → 0 a.s.? See Problem 11.1. □

5.14.4 Convergence in Lp

Consider a probability space (Ω, F, P), and let p be a positive real number. Let L_p(Ω, F, P) denote the set of all random variables defined on (Ω, F, P) whose pth absolute moment is finite, where we agree to identify any two random variables that are equal almost surely. (The pth absolute moment of a random variable X is E[|X|^p].)

Consider a probability space (Ω, F, P) and a sequence {X_n}_{n∈ℕ} of random variables defined on that space such that X_n ∈ L_p(Ω, F, P) for some fixed p > 0. We say that the X_n's converge in L_p (or in the pth mean) to a random variable X ∈ L_p(Ω, F, P) if E[|X_n - X|^p] → 0 as n → ∞. In such a case we write X_n → X in L_p. If p = 1 then L_p convergence is sometimes called convergence in mean. If p = 2 then L_p convergence is sometimes called convergence in mean-square and we often write X_n →_{m.s.} X.

5.39 Theorem Consider a probability space (Ω, F, P) and a sequence {X_n}_{n∈ℕ} of random variables defined on that space. Let X be a random variable also defined on (Ω, F, P). If there exists some p > 0 such that X_n → X in L_p then X_n →_P X. That is, convergence in probability is weaker than convergence in L_p.

After passing through several rooms in a museum filled with the paintings of a rather well-known modern painter, [Zygmund] mused, "Mathematics and art are quite different. We could not publish so many papers that used repeatedly the same idea and still command the respect of our colleagues." -Ronald Coifman and Robert Strichartz, writing about Antoni Zygmund

Exercise 5.9 Does the converse to Theorem 5.39 hold?

Exercise 5.10 Construct a sequence of random variables that does not converge pointwise to zero at any point yet does converge to zero in L_p for any p > 0.

Exercise 5.11 Show by an example that almost sure convergence need not imply convergence in L_p.

5.14.5 Convergence in Distribution

A sequence {X_n}_{n∈ℕ} of random variables is said to converge in distribution or converge in law to a random variable X if the sequence {F_{X_n}}_{n∈ℕ} of distribution functions converges to F_X(x) at all points x where F_X is continuous. In such a case we write X_n →_L X. Note that these random variables need not be defined on the same probability space.


Table 5.2 Relations Between Types of Convergence

    Relation                                              Reference
    X_n → X in L_p  does not imply  X_n → X a.s.          Exercise 5.10
    X_n → X a.s.    does not imply  X_n → X in L_p        Exercise 5.11
    X_n → X a.s.    implies         X_n →_P X             Theorem 5.38
    X_n →_P X       does not imply  X_n → X a.s.          Example 5.16
    X_n → X in L_p  implies         X_n →_P X             Theorem 5.39
    X_n →_P X       does not imply  X_n → X in L_p        Problem 11.3
    X_n →_P X       implies         X_n →_L X             Theorem 5.40
    X_n →_L X       does not imply  X_n →_P X             Example 5.17

5.40 Theorem Consider a probability space (Ω, F, P) and a sequence {X_n}_{n∈ℕ} of random variables defined on that space. Let X be a random variable also defined on (Ω, F, P). If X_n →_P X then X_n →_L X.

Example 5.17 Let X take on the values 0 and 1 each with probability 1/2, and let X_n = X for each n ∈ ℕ. Let Y = 1 - X. Note that X_n →_L Y since F_{X_n} = F_X = F_Y for all n ∈ ℕ, even though |X_n - Y| = 1 for each n ∈ ℕ. □

Table 5.2 summarizes the relationships between the different types of convergence that we have considered.

5.15 The Central Limit Theorem

The Central Limit Theorem states that the sum of many independent random variables will be approximately Gaussian if each term in the sum has a high probability of being small.


A key word in that description is "approximately." Nowhere does the Central Limit Theorem state that anything actually has a Gaussian distribution, except perhaps in a limit. In engineering applications, Gaussian assumptions are often justified by appeals to the Central Limit Theorem. Such appeals, however, are often at best not properly supported and at worst simply specious. We must always keep in mind that the Central Limit Theorem is not a magic wand that can make anything have a Gaussian distribution.

5.41 Theorem (Central Limit Theorem) Suppose that {X_n}_{n∈ℕ} is a mutually independent sequence of identically distributed random variables each with mean m and finite positive variance σ^2. If S_n = X_1 + ⋯ + X_n then

\[
\frac{S_n - nm}{\sigma \sqrt{n}} \xrightarrow{L} Z,
\]

where Z is a standard Gaussian random variable.

Proof. (Sketch) Let m = O. Let ¢ be the characteristic

function of Xn and note that Sn;;;; has characteristic function ()yn

[¢ ((j~) In. Since the X;'s have a finite variance, Taylor's

t 2()2 theorem implies that ¢(t) = 1 - -2- + /3(t) where fJ(t)/t 2

----7 0

as t ----7 O. (Recall from calculus that

1 - ~n ----7 exp -; ( 2)11 (2)

as n ----7 00.) Thus, it follows that the characteristic function of S ~. converges to exp( -t2 /2), the characteristic function of a

()yn

standard Gaussian random variable, as n ----7 00. The desired result follows from Theorem 5.28 on page 108. D
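A simulation sketch (ours, not from the text) makes the statement concrete for one particular distribution, standardized sums of uniform random variables; the values of n and the number of repetitions are arbitrary assumptions.

```python
import random
from math import sqrt

def standardized_sum(n):
    """(S_n - n*m) / (sigma * sqrt(n)) for uniform [0, 1] summands."""
    m, sigma = 0.5, sqrt(1 / 12)  # mean and standard deviation of uniform [0, 1]
    s = sum(random.random() for _ in range(n))
    return (s - n * m) / (sigma * sqrt(n))

# Fraction of standardized sums below 1.0; Phi(1) is about 0.8413.
n, reps = 30, 100_000
below = sum(standardized_sum(n) < 1.0 for _ in range(reps)) / reps
print(below)
```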

Note 5.5 A sequence of mutually independent, identically distributed, second order random variables exists such that the convergence rate associated with the Central Limit Theorem is arbitrarily slow. For details about this result, see the article "A Lower Bound for the Convergence Rate in the Central Limit Theorem" by V. K. Matskyavichyus in Theory of Probability and its Applications, 1983, Vol. 28, No. 3, pp. 596-601. This result calls into question the standard engineering claim that the sum of a few dozen random variables is always approximately Gaussian.

5.16 Laws of Large Numbers

Consider n mutually independent tosses of a coin with constant probability p of turning up heads. Let r denote the number of times that the coin comes up heads in n tosses. If n is large then it is reasonable to expect the ratio r/n to be close to p. The laws of large numbers make this idea mathematically precise. According to the Weak Law of Large Numbers (WLLN), the ratio r/n converges to p in probability. According to the Strong Law of Large Numbers (SLLN), r/n converges to p almost surely.

5.42 Theorem (WLLN) If {X_n}_{n∈ℕ} is a sequence of identically distributed, mutually independent random variables each with a finite mean m, then

\[
\frac{X_1 + \cdots + X_n}{n} \xrightarrow{P} m.
\]

Proof. We will prove only the special case in which the X_n's each have a finite positive variance σ^2. Let

\[
\overline{X} = \frac{1}{n} \sum_{k=1}^{n} X_k
\]

and apply Chebyshev's inequality to X̄ to obtain

\[
P\!\left( \left| \frac{X_1 + \cdots + X_n}{n} - m \right| \ge \varepsilon \right) \le \frac{\sigma^2}{n \varepsilon^2}.
\]

The desired result now follows immediately. □
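For coin tossing the WLLN is easy to watch numerically. The sketch below is ours; the value p = 0.3 and the checkpoints are arbitrary assumptions.

```python
import random

p, heads = 0.3, 0
for n in range(1, 1_000_001):
    heads += random.random() < p  # one Bernoulli(p) toss
    if n in (10, 100, 10_000, 1_000_000):
        print(f"n={n:>8}  r/n={heads / n:.4f}")  # drifts toward p = 0.3
```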


5.43 Theorem (SLLN) If {X_n}_{n∈ℕ} is a sequence of identically distributed, mutually independent random variables each with a finite mean m and a finite positive variance σ^2, then

\[
\frac{X_1 + \cdots + X_n}{n} \to m \quad \text{a.s.}
\]

5.17 Conditioning

Consider a random variable X defined on a probability space (Ω, F, P) with E[|X|] < ∞, and let G be a σ-subalgebra of F. The conditional expectation of X given G is denoted by E[X|G] and is defined to be any random variable defined on (Ω, F, P) that satisfies the following two properties:

1. E[X|G] is G-measurable.

2. ∫_B E[X|G] dP = ∫_B X dP for all B ∈ G.

Any G-measurable random variable that is equal a.s. to E[X|G] is called a version of E[X|G].

5.44 Theorem Consider a random variable X defined on a probability space (Ω, F, P) with E[|X|] < ∞, and let 𝒢 be a σ-subalgebra of F. The conditional expectation E[X | 𝒢] exists and is almost surely unique.

If A ∈ F then the conditional probability of A given 𝒢 is denoted by P(A | 𝒢) and is defined by

$$P(A \mid \mathcal{G}) = E[I_A \mid \mathcal{G}].$$

Thus, P(A | 𝒢) satisfies the following two properties:

1. P(A | 𝒢) is 𝒢-measurable.

2. ∫_G P(A | 𝒢) dP = P(A ∩ G) for all G ∈ 𝒢.


Exercise 5.12 Consider a random variable X defined on a probability space (Ω, F, P) with E[|X|] < ∞. Show that E[X | F] = X a.s.

Exercise 5.13 Consider a random variable X defined on a probability space (Ω, F, P) with E[|X|] < ∞. Show that E[X | {∅, Ω}] = E[X]. Does this hold pointwise or just almost surely?

Consider random variables X and Y defined on a probability space (Ω, F, P) with E[|X|] < ∞ and E[|Y|] < ∞, and let 𝒢 be a σ-subalgebra of F. Conditional expectations satisfy the following properties:

1. If X = a a.s. for a ∈ ℝ then E[X | 𝒢] = a a.s.

2. If α ∈ ℝ and β ∈ ℝ then E[αX + βY | 𝒢] = αE[X | 𝒢] + βE[Y | 𝒢] a.s.

3. If X ≤ Y a.s. then E[X | 𝒢] ≤ E[Y | 𝒢] a.s.

4. |E[X | 𝒢]| ≤ E[|X| | 𝒢] a.s.

Property (1) is a special case of the following result.

5.45 Theorem Consider integrable random variables X and Y defined on a probability space (Ω, F, P), and let 𝒢 be a σ-subalgebra of F. If X is 𝒢-measurable and if E[XY] is finite then E[XY | 𝒢] = X E[Y | 𝒢] a.s.

5.6 Corollary Consider a random variable X defined on a probability space (Ω, F, P) with E[|X|] < ∞, and let 𝒢 be a σ-subalgebra of F. If X is 𝒢-measurable then E[X | 𝒢] = X a.s.

5.46 Theorem Consider a random variable X defined on a probability space (Ω, F, P) with E[|X|] < ∞, and let 𝒢₁ and 𝒢₂ be σ-subalgebras of F such that 𝒢₁ ⊂ 𝒢₂. It follows that

$$E[E[X \mid \mathcal{G}_1] \mid \mathcal{G}_2] = E[E[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] = E[X \mid \mathcal{G}_1] \quad \text{a.s.}$$


Proof. We will first show that E[E[X | 𝒢₂] | 𝒢₁] = E[X | 𝒢₁] a.s. Recall that if

1. Y is 𝒢₁-measurable, and

2. ∫_G Y dP = ∫_G X dP for all G ∈ 𝒢₁,

then Y = E[X | 𝒢₁] a.s. Thus, if E[E[X | 𝒢₂] | 𝒢₁] satisfies the previous two properties then E[E[X | 𝒢₂] | 𝒢₁] = E[X | 𝒢₁] a.s. By definition, E[E[X | 𝒢₂] | 𝒢₁] is 𝒢₁-measurable. Thus, we need only show that

$$\int_G E[E[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, dP = \int_G X \, dP$$

for all G ∈ 𝒢₁. By definition, the conditional expectation E[E[X | 𝒢₂] | 𝒢₁] must satisfy

$$\int_G E[E[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, dP = \int_G E[X \mid \mathcal{G}_2] \, dP \quad \text{for all } G \in \mathcal{G}_1. \tag{1}$$

Similarly, E[X | 𝒢₂] must satisfy

$$\int_G E[X \mid \mathcal{G}_2] \, dP = \int_G X \, dP \quad \text{for all } G \in \mathcal{G}_2,$$

which, since 𝒢₁ ⊂ 𝒢₂, implies that

$$\int_G E[X \mid \mathcal{G}_2] \, dP = \int_G X \, dP \quad \text{for all } G \in \mathcal{G}_1.$$

Substituting this expression into (1) implies that

$$\int_G E[E[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, dP = \int_G X \, dP$$

for all G ∈ 𝒢₁, which is what we wanted to show. We will next show that E[E[X | 𝒢₁] | 𝒢₂] = E[X | 𝒢₁] a.s. In this regard, the following lemma will be useful.

5.5 Lemma Consider a random variable Z defined on a probability space (Ω, F, P) and let 𝒢₁ and 𝒢₂ be σ-subalgebras of F. If Z is 𝒢₁-measurable and if 𝒢₁ ⊂ 𝒢₂ then Z is 𝒢₂-measurable.


Proof. Since Z is 𝒢₁-measurable it follows that Z⁻¹(B(ℝ)) ⊂ 𝒢₁. But, since 𝒢₁ ⊂ 𝒢₂ it follows that Z⁻¹(B(ℝ)) ⊂ 𝒢₂. But this means that Z is 𝒢₂-measurable. □

We will now continue with our proof of Theorem 5.46. By definition, E[X | 𝒢₁] is 𝒢₁-measurable. Lemma 5.5 thus implies that E[X | 𝒢₁] is also 𝒢₂-measurable. Corollary 5.6 thus implies that E[E[X | 𝒢₁] | 𝒢₂] = E[X | 𝒢₁] a.s. □
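Theorem 5.46 can be checked numerically on a finite probability space, where E[X | 𝒢] is computed by averaging X over the cells of the partition generating 𝒢. The sketch below is my illustration, not the book's; the eight-point space, the values of X, and the nested partitions are arbitrary.

```python
import numpy as np

# Uniform eight-point sample space; X is an arbitrary random variable on it.
X = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])

def cond_exp(x, partition):
    """E[x | sigma-algebra generated by the partition]: blockwise averaging."""
    out = np.empty_like(x)
    for block in partition:
        out[list(block)] = x[list(block)].mean()
    return out

G1 = [{0, 1, 2, 3}, {4, 5, 6, 7}]      # coarse partition generating G1
G2 = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]  # refinement, so G1 is a subalgebra of G2

lhs1 = cond_exp(cond_exp(X, G2), G1)   # E[E[X|G2]|G1]
lhs2 = cond_exp(cond_exp(X, G1), G2)   # E[E[X|G1]|G2]
rhs = cond_exp(X, G1)                  # E[X|G1]
print(np.allclose(lhs1, rhs), np.allclose(lhs2, rhs))  # True True
```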

5.7 Corollary Consider a random variable X defined on a probability space (Ω, F, P) with E[|X|] < ∞, and let 𝒢 be a σ-subalgebra of F. It follows that E[E[X | 𝒢]] = E[X].

Consider random variables X, Y₁, ..., Yₙ defined on a probability space (Ω, F, P) with E[|X|] < ∞. The conditional expectation of X given Y₁, ..., Yₙ is denoted by E[X | Y₁, ..., Yₙ] and is defined to be E[X | σ(Y₁, ..., Yₙ)].

5.47 Theorem If X and Y are independent random variables defined on a probability space (Ω, F, P) with X integrable then E[X | Y] = E[X] a.s.

Proof. Since X and Y are independent it follows that P(A ∩ B) = P(A)P(B) for all A ∈ σ(X) and for all B ∈ σ(Y). Let A ∈ σ(Y) and consider the random variable I_A. Since σ(I_A) ⊂ σ(Y) it follows that X and I_A are independent. Note that, for all A ∈ σ(Y),

$$\int_A E[X \mid Y]\,dP = \int_A X\,dP = \int_\Omega I_A X\,dP = E[I_A X] = E[I_A]\,E[X] = \int_\Omega I_A\,dP\;E[X] = \int_A E[X]\,dP.$$

Note also that E[X] is σ(Y)-measurable. Thus it follows that E[X | Y] = E[X] a.s. □


Exercise 5.14 Show that if E[X | Y] = E[X] then E[XY] = E[X]E[Y].

5.48 Theorem (Jensen's Inequality) Consider a random variable X defined on a probability space (Ω, F, P) with E[|X|] < ∞, and let 𝒢 be a σ-subalgebra of F. If φ is a convex real-valued function defined on ℝ and if φ(X) is integrable then

$$\varphi(E[X \mid \mathcal{G}]) \le E[\varphi(X) \mid \mathcal{G}] \quad \text{a.s.}$$

Example 5.18 Consider a random variable X defined on a probability space (Ω, F, P) with E[X²] < ∞, and let 𝒢₁ and 𝒢₂ be σ-subalgebras of F such that 𝒢₁ ⊂ 𝒢₂. We will show that

$$E[(X - E[X \mid \mathcal{G}_1])^2] \ge E[(X - E[X \mid \mathcal{G}_2])^2].$$

To begin, for a σ-subalgebra 𝒢 of F, note that

$$\begin{aligned}
E[(X - E[X \mid \mathcal{G}])^2] &= E[X^2 - 2X E[X \mid \mathcal{G}] + E[X \mid \mathcal{G}]^2] \\
&= E[X^2] - 2E[X E[X \mid \mathcal{G}]] + E[E[X \mid \mathcal{G}]^2] \\
&= E[X^2] - 2E[E[X E[X \mid \mathcal{G}] \mid \mathcal{G}]] + E[E[X \mid \mathcal{G}]^2] \\
&= E[X^2] - 2E[E[X \mid \mathcal{G}]^2] + E[E[X \mid \mathcal{G}]^2] \\
&= E[X^2] - E[E[X \mid \mathcal{G}]^2].
\end{aligned}$$

Thus, the desired inequality holds if and only if

$$E[E[X \mid \mathcal{G}_1]^2] \le E[E[X \mid \mathcal{G}_2]^2].$$

The desired result follows since, via Jensen's inequality,

$$E[E[X \mid \mathcal{G}_1]^2] = E[(E[E[X \mid \mathcal{G}_2] \mid \mathcal{G}_1])^2] \le E[E[E[X \mid \mathcal{G}_2]^2 \mid \mathcal{G}_1]] = E[E[X \mid \mathcal{G}_2]^2]. \qquad \Box$$


Consider a probability space (Ω, F, P) and let H denote the Hilbert space of square integrable random variables defined on (Ω, F, P), where ⟨X, Y⟩ = E[XY] and where we agree to identify any two random variables X and Y for which E[(X − Y)²] = 0. Let X, Y₁, ..., Yₙ be second order random variables defined on (Ω, F, P). Our goal now is to find a Borel measurable function f : ℝⁿ → ℝ so that E[(X − f(Y₁, ..., Yₙ))²] is minimized over all such functions f. Let G be the subspace of H given by all elements of H that may be written as Borel measurable transformations of Y₁, ..., Yₙ. Using the Hilbert Space Projection Theorem (Theorem 4.7) we know that the function we are seeking is the projection of X on G; that is, we seek the point in G that is nearest to X.

5.6 Lemma The projection of X on G is given by E[X | Y₁, ..., Yₙ].

Proof. First, note that E[X | Y₁, ..., Yₙ] ∈ G since (via Jensen's inequality) we have

$$E[E[X \mid Y_1, \ldots, Y_n]^2] \le E[E[X^2 \mid Y_1, \ldots, Y_n]] = E[X^2] < \infty.$$

Next, let Z ∈ G and note that

$$E[XZ] = E[E[XZ \mid Y_1, \ldots, Y_n]] = E[Z\,E[X \mid Y_1, \ldots, Y_n]].$$

That is,

$$\langle X, Z \rangle = \langle E[X \mid Y_1, \ldots, Y_n], Z \rangle,$$

which implies that

$$\langle X - E[X \mid Y_1, \ldots, Y_n], Z \rangle = 0.$$

Thus, X − E[X | Y₁, ..., Yₙ] is orthogonal to every element in G. □

Thus, we conclude that the best minimum mean-square Borel measurable estimate of X in terms of Y₁, ..., Yₙ is given by E[X | Y₁, ..., Yₙ]. The following result shows that the Borel measurability of our estimators cannot be dispensed with:

5.49 Theorem Let M be any real number. There exists a probability space (Ω, F, P), two bounded random variables X and Y defined on (Ω, F, P), and a function f : ℝ → ℝ such that X(ω) = f(Y(ω)) for all ω ∈ Ω yet such that E[(X − E[X | Y])²] > M.


◊ Proof. See "A Note on a Common Misconception in Estimation" by Gary Wise in Systems and Control Letters, 1985, Vol. 5, pp. 355-356. For related material, see also "A Result on Multidimensional Quantization" by Eric Hall and Gary Wise, in Proceedings of the American Mathematical Society, Vol. 118, No. 2, June 1993, pp. 609-613. □

5.18 Regression Functions

Consider random variables X and Y defined on a probability space (Ω, F, P) with E[|X|] < ∞. A regression function of X given Y = y is denoted by E[X | Y = y] and is defined to be any real-valued Borel measurable function on ℝ that satisfies

$$\int_B E[X \mid Y = y] \, dF_Y(y) = \int_{Y^{-1}(B)} X \, dP$$

for all B ∈ B(ℝ).

5.50 Theorem Any two regression functions of X given Y = y are equal almost everywhere with respect to the measure induced by F_Y.

5.51 Theorem Consider random variables X and Y defined on a probability space (Ω, F, P) with E[|X|] < ∞. If φ(y) = E[X | Y = y] then E[X | Y] = φ(Y) a.s.

5.52 Theorem Consider two random variables X and Y possessing a joint density function f_{X,Y}. Then

$$E[X \mid Y = y] = \int_{\mathbb{R}} x \, \frac{f_{X,Y}(x, y)}{f_Y(y)} \, dx$$

almost everywhere with respect to the measure induced by F_Y. That is, a version of E[X | Y = y] is given by

$$\int_{\mathbb{R}} x \, \frac{f_{X,Y}(x, y)}{f_Y(y)} \, dx.$$

(The ratio f_{X,Y}(x, y)/f_Y(y) is called a conditional density of X given Y = y and is denoted by f_{X|Y}(x | y).)

Example 5.19 Let X and Y be zero mean, unit variance, mutually Gaussian random variables with correlation coefficient ρ. In this example we will find E[X | Y = y] and E[X | Y]. Note that

$$\begin{aligned}
E[X \mid Y = y] &= \int_{\mathbb{R}} x\,\frac{f_{X,Y}(x, y)}{f_Y(y)}\,dx \\
&= \int_{\mathbb{R}} x\, \frac{\frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left(\frac{-(x^2 - 2\rho x y + y^2)}{2(1-\rho^2)}\right)}{\frac{1}{\sqrt{2\pi}}\exp\left(\frac{-y^2}{2}\right)}\,dx \\
&= \exp\left(\frac{y^2}{2}\right)\exp\left(\frac{-y^2}{2(1-\rho^2)}\right) \int_{\mathbb{R}} \frac{x}{\sqrt{2\pi}\sqrt{1-\rho^2}}\exp\left(\frac{-(x^2 - 2\rho x y \pm \rho^2 y^2)}{2(1-\rho^2)}\right) dx \\
&= \exp\left(\frac{y^2}{2}\right)\exp\left(\frac{-y^2}{2(1-\rho^2)}\right)\exp\left(\frac{\rho^2 y^2}{2(1-\rho^2)}\right) \int_{\mathbb{R}} \frac{x}{\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\left(\frac{-(x - \rho y)^2}{2(1-\rho^2)}\right) dx \\
&= \exp\left(\frac{y^2(1-\rho^2) - y^2 + \rho^2 y^2}{2(1-\rho^2)}\right)\rho y \;=\; \rho y,
\end{aligned}$$

where the square was completed by writing x² − 2ρxy = (x − ρy)² − ρ²y², and where the final integral above is simply the mean of a N(ρy, 1 − ρ²) random variable. Thus, it follows that E[X | Y = y] = ρy and hence that E[X | Y] = ρY a.s. □
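The conclusion E[X | Y] = ρY is easy to verify by simulation. The sketch below (mine, not the book's; the value of ρ, the bin width, and the sample size are arbitrary) bins samples of the bivariate Gaussian pair by the value of Y and compares the within-bin mean of X to ρy.

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.6                                      # correlation coefficient (arbitrary)
cov = [[1.0, rho], [rho, 1.0]]                 # zero mean, unit variances
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=500_000).T

# Estimate E[X | Y = y] by averaging X over narrow bins of Y.
for y0 in (-1.0, 0.0, 1.5):
    mask = np.abs(y - y0) < 0.05
    print(f"y = {y0:+.1f}: binned mean {x[mask].mean():+.3f}, rho*y = {rho * y0:+.3f}")
```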


As the following theorem shows, the existence of a joint density function for two random variables X and Y with X integrable places no additional restrictions on the regression function of X given Y = y.

5.53 Theorem Let g be any Borel measurable function mapping ℝ into ℝ. There exist random variables X and Y possessing a joint density function such that X is integrable and E[X | Y = y] = g(y) for all y ∈ ℝ.

Proof. Let g : ℝ → ℝ be Borel measurable and define

$$f(x, y) = \frac{1}{4} \exp[-\exp(|y|)\,|x - g(y)|].$$

Note that f(x, y) is a joint probability density function since

$$\begin{aligned}
\int_{\mathbb{R}}\int_{\mathbb{R}} f(x, y)\,dx\,dy &= \int_{\mathbb{R}}\int_{\mathbb{R}} \frac{1}{4}\exp[-\exp(|y|)\,|x - g(y)|]\,dx\,dy \\
&= \int_{\mathbb{R}}\int_{\mathbb{R}} \frac{1}{4}\exp[-\exp(|y|)\,|z|]\,dz\,dy \\
&= \int_{\mathbb{R}} \frac{1}{2}\exp(-|y|)\,dy = 1.
\end{aligned}$$

Let X and Y be random variables such that the pair (X, Y) has a joint probability density function given by f(x, y). Notice from the above calculation that a marginal probability density function of the second coordinate is given by f_Y(y) = exp(−|y|)/2. Recall that a version of E[X | Y = y] is given by ∫_ℝ x [f(x, y)/f_Y(y)] dx. This version will be used throughout the remainder of this proof. Substituting for f_Y(y) implies that

$$\begin{aligned}
E[X \mid Y = y] &= 2\exp(|y|) \int_{\mathbb{R}} \frac{x}{4}\exp[-\exp(|y|)\,|x - g(y)|]\,dx \\
&= 2\exp(|y|) \int_{\mathbb{R}} \frac{z + g(y)}{4}\exp[-\exp(|y|)\,|z|]\,dz \\
&= \frac{2\exp(|y|)\,g(y)}{2\exp(|y|)} = g(y).
\end{aligned}$$

Hence, the random variables X and Y with the joint probability density function f(x, y) are such that E[X | Y = y] = g(y), where g was an arbitrarily preselected Borel measurable function. □
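The density in this proof is easy to sample from: Y is Laplace(0, 1) and, given Y = y, X is Laplace with location g(y) and scale exp(−|y|). The sketch below is my illustration, not the book's; the choice of g, the bin width, and the sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
g = lambda y: np.sin(3 * y) + y**2             # an arbitrary Borel measurable g

# Under f(x, y) = (1/4) exp(-exp(|y|) |x - g(y)|):
#   Y ~ Laplace(0, 1); given Y = y, X ~ Laplace(loc=g(y), scale=exp(-|y|)).
y = rng.laplace(0.0, 1.0, size=1_000_000)
x = rng.laplace(g(y), np.exp(-np.abs(y)))

for y0 in (-1.0, 0.5, 2.0):
    mask = np.abs(y - y0) < 0.05
    print(f"y = {y0:+.2f}: binned E[X|Y=y] {x[mask].mean():+.3f}, g(y) {g(y0):+.3f}")
```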


◊ 5.19 Statistical Hypothesis Testing

Basic concepts of statistics arise in medicine, engineering, sociology, business, education, and other areas. For example, consider a medical situation in which a new medication for a particular problem is being tested. Assume that patients are divided into two groups, and assume that the patients in the first group are each given the new medication and that the patients in the second group are each given a placebo. (A placebo is a sugar pill that is identical in appearance to the medication.) Assume that each group has 50 patients in it. What if 36 patients in the first group were found to be free of the medical problem of concern, and 25 patients in the second group were found to be free of the medical problem of concern? How might we describe these results? Of the 50 patients taking the new medication, 36 of them improved. This is an objective result of the data. However, we should be careful before concluding that 72% of the time the new medication will be effective. That conclusion belongs in the realm of statistical inference.

A statistical hypothesis is a nonempty family of probability measures on a given measurable space. For convenience, we will take our underlying measurable space to be (ℝ, B(ℝ)). Then, for instance, the probability measures of interest could be distributions of random variables. A statistical hypothesis is said to be simple if it is a singleton set. Roughly speaking, we are trying to discern which hypothesis is in effect based upon knowledge of a realization of a random variable. For example, if the hypotheses are simple and if, under one hypothesis, unit measure is given to a particular Borel set, and if, under the other hypothesis, unit measure is given to a disjoint Borel set, then it should be straightforward to discern which hypothesis is in effect. Indeed, just see which of the two Borel sets contains the realization and announce the corresponding hypothesis.

Consider the situation where we have two disjoint simple statistical hypotheses H₀ and H₁, and assume that we know the probability π₀ of H₀ and π₁ of H₁. These probabilities are often called the priors, since such a probability is the probability of a hypothesis being true without regard to any random variable that is observed. For convenience, assume that the relevant measures associated with H₀ and H₁ have probability densities denoted, respectively, by f₀ and f₁.

We note that there are two types of errors we could make in reaching a decision. We could announce H₁ when H₀ is true, or we could announce H₀ when H₁ is true. Our goal will be to make a decision in such a way as to minimize the probability of error P_e. Let S₀ denote a Borel set such that if a realization belongs to S₀ then we announce H₀, and if a realization belongs to S₀ᶜ we announce H₁. Let S₁ = S₀ᶜ. Thus, it follows that

$$P_e = \pi_0 \int_{S_1} f_0(x)\,dx + \pi_1 \int_{S_0} f_1(x)\,dx.$$

Rewriting this, we have

$$\begin{aligned}
P_e &= \pi_0 \int_{S_1} f_0(x)\,dx + \pi_1 \int_{S_0} f_1(x)\,dx + \pi_1 \int_{S_1} f_1(x)\,dx - \pi_1 \int_{S_1} f_1(x)\,dx \\
&= \pi_1 + \int_{S_1} \left(\pi_0 f_0(x) - \pi_1 f_1(x)\right) dx.
\end{aligned}$$

Now, we see that we can minimize P_e by choice of S₁ by defining S₁ to be the set of real numbers x such that π₀f₀(x) − π₁f₁(x) < 0. Consequently, S₀ is the set of all real numbers x such that π₀f₀(x) − π₁f₁(x) ≥ 0. We note that the treatment of equality in these inequalities is arbitrary, since moving the points of equality from one set to the other does not change the corresponding integral.

As an example, consider testing the hypothesis that a random variable X has a standard normal distribution versus the hypothesis that X is normal with a mean and a variance of one. Assume that each hypothesis is equally likely; that is, π₀ = π₁ = 1/2. Using the above procedure, we announce that the mean is one for real numbers x such that

$$\exp\left(\frac{-(x-1)^2}{2}\right) > \exp\left(\frac{-x^2}{2}\right);$$

that is, we announce that the mean is one when x > 1/2. Hence, we announce that the mean is one whenever X ∈ (1/2, ∞), and this test minimizes the probability of error.
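A minimal numerical sketch of this test (mine, not the book's): draw the true hypothesis with equal priors, observe X, announce mean one exactly when x > 1/2, and compare the empirical error rate to the minimum P_e, which here equals Φ(−1/2) ≈ 0.3085.

```python
import numpy as np

rng = np.random.default_rng(4)
trials = 1_000_000

# Draw the true hypothesis (equally likely), then X ~ N(mean, 1).
h1 = rng.uniform(size=trials) < 0.5            # True -> H1 (mean one) in effect
x = rng.normal(h1.astype(float), 1.0)

announce_h1 = x > 0.5                          # the minimum-P_e rule derived above
print(f"empirical P_e = {np.mean(announce_h1 != h1):.4f}")  # about 0.3085
```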


5.20 Caveats and Curiosities


6 Random Processes

6.1 Introduction

Throughout this chapter, we will assume that all probability spaces are complete unless otherwise specified. A random process (or a stochastic process) defined on a probability space (Ω, F, P) is an indexed collection of random variables each defined on (Ω, F, P). We denote a random process by {X(t) : t ∈ T} where T is a nonempty index set that often denotes time and is usually (in these notes, always) taken to be a subset of ℝ. Thus, for each fixed t in T, X(t) (or, more precisely, X(t, ·)) is simply a random variable defined on (Ω, F, P). If T is a countably infinite set then we say that {X(t) : t ∈ T} is a random sequence or a discrete time or discrete parameter random process. If T is a subinterval of ℝ then we say that {X(t) : t ∈ T} is a continuous time or continuous parameter random process. We will often denote a random process {X(t) : t ∈ T} by {X(t)} when the index set T is arbitrary or clear from the context.

Consider a random process {X(t) : t ∈ T}. A function X(·, ω₀) : T → ℝ obtained by fixing some ω₀ ∈ Ω and letting t vary is called a sample function or sample path or trajectory of the random process {X(t)}. If T is countably infinite then a sample path is called a sample sequence. If {t₁, t₂, ..., tₙ} is any finite set of elements from T then the joint probability distribution of the random variables X(t₁), ..., X(tₙ) is called a finite dimensional distribution of the random process {X(t)}.

A random process {Y(t) : t ∈ T} is said to be a modification of a random process {X(t) : t ∈ T} if X(t) = Y(t) a.s. for all t ∈ T. Notice that in such a case {X(t) : t ∈ T} and {Y(t) : t ∈ T} have the same family of finite dimensional distributions. Also, note that the associated P-null set can depend on t.

Two random processes {X(t) : t ∈ T} and {Y(t) : t ∈ T} are said to be indistinguishable if, for almost every ω, X(t, ω) = Y(t, ω) for all t ∈ T. Notice that there is just one set of measure zero off of which X(t) = Y(t) for all t in T, while for a modification the set of measure zero off of which X(t) = Y(t) may depend on t. If T is a countable set then the two definitions are equivalent since a countable union of null sets is itself a null set.

The value of a problem is not so much in coming up with the answer as in the ideas and attempted ideas it forces on the would-be solver. - I. N. Herstein

Let D be a subset of ℝ. The set D is said to be dense in ℝ if every nonempty open subset of ℝ contains an element from D. For example, the set ℚ of rational numbers is dense in ℝ. Let {X(t) : t ∈ T} be a random process defined on a complete probability space (Ω, F, P) where T is an interval. The random process {X(t) : t ∈ T} is said to be separable if there exists a countable dense subset I of T and a null set N ∈ F such that if ω ∈ Nᶜ and t ∈ T then there exists a sequence {tₙ}ₙ∈ℕ of elements from I with tₙ → t such that X(tₙ, ω) → X(t, ω).

6.1 Theorem Consider a random process {X(t) : t ∈ T} defined on a complete probability space and assume that T ∈ B(ℝ). There exists a separable random process {Y(t) : t ∈ T} defined on the same probability space that is a modification of {X(t) : t ∈ T}.

Theorem 6.1 says that requiring a random process to be separable places no additional restrictions on the family of finite dimensional distributions of that process. In short, any random process admits a separable modification.

Let (Ω₁, F₁) and (Ω₂, F₂) be two measurable spaces. If A ∈ F₁ and B ∈ F₂ then A × B is called a measurable rectangle. The smallest σ-algebra on Ω₁ × Ω₂ that contains every measurable rectangle is denoted by F₁ × F₂ and is called the product σ-algebra on Ω₁ × Ω₂.

6.2 Theorem If (Ω₁, F₁, μ₁) and (Ω₂, F₂, μ₂) are σ-finite measure spaces then there exists a σ-finite measure on the measurable space (Ω₁ × Ω₂, F₁ × F₂), called the product measure and denoted by μ₁ × μ₂, such that, for any measurable rectangle A × B, μ₁ × μ₂(A × B) = μ₁(A)μ₂(B).

Consider a random process {X(t) : t ∈ T ⊂ ℝ} defined on a probability space (Ω, F, P), and let M(ℝ) denote the collection of all Lebesgue measurable subsets of ℝ. If T is an element of M(ℝ) and if X is a measurable mapping from (ℝ × Ω, M(ℝ) × F) to (ℝ, B(ℝ)) then we say that the random process {X(t) : t ∈ T} is measurable.

Example 6.1 Let A be a subset of ℝ that is not a Lebesgue measurable set, let X be a positive random variable defined on a probability space (Ω, F, P), and define a random process {Y(t) : t ∈ ℝ} on this space via Y(t, ω) = X(ω)I_A(t). The inverse image of the Borel set (0, ∞) is A × Ω, which is not a measurable set in the product measure space. Thus, {Y(t)} is not a measurable random process. Notice that for each fixed t, Y(t) is a random variable yet, for each fixed ω, Y(t) is a non-Lebesgue measurable function of t. □

6.3 Theorem Let {X(t) : t ∈ T} be a random process defined on a complete probability space and assume that T is a Lebesgue measurable subset of ℝ. Suppose that there exists a subset N of ℝ having Lebesgue measure zero such that X(s) converges in probability to X(t) as s → t for every t in T \ N. (That is, suppose that {X(t) : t ∈ T \ N} is continuous in probability.) Then there exists a random process defined on the same space that is a measurable and separable modification of {X(t) : t ∈ T}.

Theorem 6.3 says that any random process that is continuous in probability admits a modification that is both separable and measurable. Recall that separability places no additional restrictions on the family of finite dimensional distributions of a random process. This statement cannot be made for measurability. In particular, there exist random processes that do not possess measurable modifications. An example of such a process (that is, nevertheless, discussed frequently in engineering contexts) is provided by the following theorem.


6.4 Theorem Let {X(t) : t ∈ ℝ} be a random process composed of second order, positive variance, mutually independent random variables defined on the same probability space. The random process {X(t) : t ∈ ℝ} does not admit a measurable modification.

One is often confronted with a need to integrate the sample paths of a random process. The following theorem presents conditions that are sufficient to ensure that almost all of the sample paths of a random process are Lebesgue integrable. Later, we will define an L2 (or mean-square) integral for a certain family of random processes. It will not be defined as a pathwise integral but instead will be defined as an L2 limit. (If both the pathwise integral and the L2 integral exist they will be equal almost surely.) We will find this latter type of integral much more useful for our purposes than an integral based upon the sample paths of a random process.

6.5 Theorem Let {X(t) : t ∈ T} be a measurable random process. All sample paths of the random process are Lebesgue measurable functions of t. If E[X(t)] exists for all t ∈ T then it defines a Lebesgue measurable function of t. Further, if A is a Lebesgue measurable subset of T and if ∫_A E[|X(t)|] dt < ∞ then almost all sample paths of {X(t) : t ∈ T} are Lebesgue integrable over A.

6.2 Gaussian Processes

A random process {X(t) : t ∈ T} is called a Gaussian process if the random variables X(t₁), X(t₂), ..., X(tₙ) are mutually Gaussian for every finite subset {t₁, t₂, ..., tₙ} of T.

6.6 Theorem Let {Y(t) : t ∈ T} be any random process such that E[|Y(t)|²] < ∞ for all t ∈ T. There exists a Gaussian process {X(t) : t ∈ T} (defined, perhaps, on a different probability space) such that E[X(t)] = 0 and E[X(s)X(t)] = E[Y(s)Y(t)] for all s and t in T.


6.3 Second Order Random Processes

A random process {X(t) : t ∈ T ⊂ ℝ} is said to be a second order random process or an L2 random process if E[X²(t)] < ∞ for all t ∈ T. The autocovariance function of such a process is defined to be K(t₁, t₂) = E[(X(t₁) − E[X(t₁)])(X(t₂) − E[X(t₂)])] where t₁, t₂ ∈ T. Notice that if {X(t) : t ∈ T} has autocovariance function K and if f : T → ℝ then {X(t) + f(t) : t ∈ T} also has autocovariance function K. That is, changing the means of the individual random variables in {X(t) : t ∈ T} does not change the autocovariance function of the process. The autocorrelation function of a second order random process {X(t) : t ∈ T} is defined to be R(t₁, t₂) = E[X(t₁)X(t₂)] for t₁ and t₂ in T. Note that K(t₁, t₂) = R(t₁, t₂) − E[X(t₁)]E[X(t₂)]. Note, also, that for a zero mean, second order random process, the autocovariance function is equal to the autocorrelation function.

6.7 Theorem Let K be a real-valued nonnegative definite function defined on T × T such that K(t, s) = K(s, t) for any t and s in T. There exists a second order random process {X(t) : t ∈ T} whose autocovariance function is K.

A random process {X(t) : t ∈ T} is said to be strictly stationary if given any positive integer n, any elements t₁ < t₂ < ⋯ < tₙ from T, and any h > 0 such that tᵢ + h ∈ T for each i ≤ n, the joint distribution function of the random variables X(t₁ + h), X(t₂ + h), ..., X(tₙ + h) is the same as that of X(t₁), X(t₂), ..., X(tₙ). That is, if t denotes time, a strictly stationary random process is one whose finite dimensional distributions remain the same as time is shifted.

Example 6.2 Let {X(t) : t ∈ ℝ} be a random process composed of identically distributed, mutually independent random variables. Let s, t₁, t₂, ..., tₙ ∈ ℝ where n ∈ ℕ, and note that

$$\begin{aligned}
F_{X(t_1+s), \ldots, X(t_n+s)}(x_1, \ldots, x_n) &= P(X(t_1+s) \le x_1, \ldots, X(t_n+s) \le x_n) \\
&= P(X(t_1+s) \le x_1) \cdots P(X(t_n+s) \le x_n) \\
&= P(X(t_1) \le x_1) \cdots P(X(t_n) \le x_n) \\
&= F_{X(t_1), \ldots, X(t_n)}(x_1, \ldots, x_n).
\end{aligned}$$

Thus, it follows that {X(t) : t ∈ ℝ} is a strictly stationary random process. □

A random process {X(t) : t ∈ T} is said to be wide sense stationary (WSS) if it is a second order process, and if K(s, t) depends only on the difference s − t. We denote K(s + t, s) by K(t) for a random process that is wide sense stationary. In the case of a WSS random process {X(t) : t ∈ T} the assumption that E[X(t)] is a constant function of t is often added. However, this condition is unnatural mathematically and has nothing to do with the essential properties of interest for WSS random processes. For example, let Θ be a random variable with a uniform distribution on [0, π]. For each t ∈ ℝ, let X(t) = cos(2πt + Θ). Then {X(t) : t ∈ ℝ} is a WSS random process with a nonconstant mean.

Example 6.3 Let {X(t) : t ∈ ℝ} be a random process composed of mutually independent random variables such that E[X(t)] = 0 for all t ∈ ℝ and such that E[X²(t)] = σ² ∈ (0, ∞) for all t ∈ ℝ. This random process is wide sense stationary since

$$E[X(t)X(t+s)] = \begin{cases} E[X(t)]\,E[X(t+s)] = 0 & \text{if } s \neq 0 \\ E[X^2(t)] = \sigma^2 & \text{if } s = 0 \end{cases}$$

is a constant function of t. □

6.8 Theorem A strictly stationary second order random process is wide sense stationary.

6.9 Theorem A wide sense stationary Gaussian process with a constant mean is strictly stationary.¹

We will next consider a calculus for second order processes. That is, we will consider a framework in which we may discuss continuity, differentiation, and integration of second order random processes.

¹ Thus, the phrase "stationary Gaussian process" is not ambiguous.


A second order random process {X(t) : t ∈ ℝ} is said to be L2 continuous at the point t ∈ ℝ if X(t + h) → X(t) in L2 as h → 0. A second order random process {X(t) : t ∈ ℝ} is said to be L2 differentiable at the point t ∈ ℝ if (X(t + h) − X(t))/h converges in L2 to a limit X′(t) as h → 0. The next theorem relates L2 continuity of a second order random process to the autocovariance function of the random process.

Recall that a function f : ℝ² → ℝ is said to be continuous at (x, y) if for every ε > 0 there exists a δ > 0 such that |f(x, y) − f(a, b)| < ε for all points (a, b) in ℝ² such that

$$\sqrt{(x - a)^2 + (y - b)^2} < \delta.$$

6.10 Theorem Let {X(t) : t ∈ ℝ} be a second order random process such that E[X(t)] is continuous. The random process {X(t) : t ∈ ℝ} is L2 continuous at r ∈ ℝ if and only if K is continuous at (r, r) ∈ ℝ².

6.1 Lemma If an autocovariance function is continuous at (t, t) for all t in ℝ then it is continuous at (s, t) for all s and t in ℝ.

6.1 Corollary Let {X(t) : t ∈ ℝ} be a WSS random process with autocovariance function K(t). If the process is L2 continuous at some point s then K is continuous at the origin. If K is continuous at the origin then it is continuous everywhere and the random process is L2 continuous for all t.

Notice that the random process in Example 6.3 is nowhere L2 continuous since its autocovariance function is discontinuous at the origin. We will next relate L2 differentiability and differentiability of the autocovariance function in the wide sense stationary case.

6.11 Theorem Let {X(t) : t ∈ ℝ} be a WSS random process with autocovariance function K(t). If the process is L2 differentiable at all points t ∈ ℝ then K(t) is twice differentiable for all t ∈ ℝ and {X′(t) : t ∈ ℝ} is a wide sense stationary random process with autocovariance function −K″(t).

We next consider integration of second order random processes. Let {X(t) : a ≤ t ≤ b} be a second order random process with autocovariance function K where a and b are real numbers with a < b. Let g be a real-valued function defined on [a, b]. We define

$$\int_a^b g(t) X(t)\,dt$$

as follows. Let Δ = {t₀, t₁, ..., tₙ} be such that a = t₀ < t₁ < ⋯ < tₙ = b, and let |Δ| denote the maximum of |tᵢ − tᵢ₋₁| over all positive integers i ≤ n. Define

$$I(\Delta) = \sum_{k=1}^{n} g(t_k) X(t_k)(t_k - t_{k-1}).$$

If I(Δ) converges in L2 to some random variable Z as |Δ| → 0 then we say that g(t)X(t) is L2 integrable on [a, b] and we denote the L2 limit Z by

$$\int_a^b g(t) X(t)\,dt.$$

6.12 Theorem If, in the context of our discussion, E[X(t)] and g(t) are continuous on [a, b] and if K is continuous on [a, b] × [a, b], then g(t)X(t) is L2 integrable on [a, b].

6.13 Theorem If E[X(t)] = 0, if g and h are continuous on [a, b], and if K is continuous on [a, b] × [a, b], then

$$E\left[\int_a^b g(s)X(s)\,ds \int_a^b h(t)X(t)\,dt\right] = \int_a^b \int_a^b g(s)h(t)K(s, t)\,ds\,dt,$$

and

$$E\left[\int_a^b g(s)X(s)\,ds\right] = E\left[\int_a^b h(t)X(t)\,dt\right] = 0.$$

6.14 Theorem If E[X(t)] = 0, if h is continuous on [a, b], and if K is continuous on [a, b] × [a, b], then

$$E\left[X(s) \int_a^b h(t)X(t)\,dt\right] = \int_a^b K(s, t)h(t)\,dt.$$

6.2 Lemma If a sequence of random variables defined on some probability space converges in L2 and converges almost surely then the limits are equal with probability one.


6.15 Theorem If the integral ∫ₐᵇ g(t)X(t) dt exists as an L2 integral and, for almost all ω, as a Riemann integral, then the two integrals are equal with probability one.

Proof. If g(t)X(t) is both L2 integrable and Riemann integrable a.s. on [a, b] then (using the current notation) I(Δ) converges to Z in L2 and almost surely. Thus, the desired conclusion follows by the previous lemma. □

◊ 6.4 The Karhunen-Loève Expansion

Let K be a continuous autocovariance function defined on [a, b] × [a, b]. Define an integral operator A on L²([a, b]) (the set of all square integrable real-valued Lebesgue measurable functions defined on [a, b], where we identify any two functions that are equal a.e.) via

$$A[f](s) = \int_a^b K(s, t) f(t)\,dt,$$

where a ≤ s ≤ b and f ∈ L²([a, b]). Notice that A maps the real-valued function f defined on [a, b] to the real-valued function A[f] defined on [a, b]. A function e(·) is said to be an eigenfunction of the integral operator A if A[e](s) = λe(s) for some constant λ and for a < s < b. The constant λ is called the eigenvalue associated with the eigenfunction e(t).

6.16 Theorem (Mercer) Using the above notation, let {eₙ(·)}ₙ∈ℕ be a sequence of eigenfunctions of the integral operator A such that

1. $$\int_a^b e_j(t)\,e_k(t)\,dt = \begin{cases} 0 & \text{if } j \neq k \\ 1 & \text{if } j = k, \end{cases}$$

2. if e(·) is any eigenfunction of A then e(·) is equal to a linear combination of the eₙ's, and

3. the eigenvalue λₙ associated with eₙ is nonzero for each n ∈ ℕ.

It then follows that

$$K(s, t) = \sum_{n=1}^{\infty} \lambda_n e_n(s) e_n(t)$$

for s and t in [a, b] (where the series converges absolutely and uniformly in both variables).

6.17 Theorem (Karhunen-Loève) Let {X(t) : a ≤ t ≤ b} be a second order process with zero mean and continuous autocovariance function K. Let {eₙ(t)}ₙ∈ℕ be a sequence of eigenfunctions of the integral operator A (as defined above) associated with K that satisfies properties (1), (2), and (3) of Mercer's Theorem. Then

$$X(t) = \sum_{n=1}^{\infty} Z_n e_n(t)$$

for a ≤ t ≤ b, where

$$Z_n = \int_a^b X(t) e_n(t)\,dt.$$

Further, the Zₙ's are zero mean, orthogonal (E[Z_k Z_j] = 0 for k ≠ j) random variables such that E[Zₙ²] = λₙ, and the series converges in L2 to X(t) uniformly in t; that is,

$$E\left[\left(X(t) - \sum_{k=1}^{n} Z_k e_k(t)\right)^{2}\right] \to 0$$

as n → ∞, uniformly for t in [a, b].

Notice that each term in the above series expansion for X(t) is a product of a random part (that is, a function of ω) and a deterministic part (that is, a function of t). As the following theorem shows, the Karhunen-Loève expansion takes on a special form when the random process is Gaussian.

6.18 Theorem Let the previous discussion set notation. In the Karhunen-Loève expansion for a Gaussian random process, the random sequence {Zᵢ}ᵢ∈ℕ is a Gaussian random sequence composed of mutually independent random variables.
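The eigenvalues and eigenfunctions of the operator A can be approximated by discretizing K on a grid, which turns A[f](s) = ∫ K(s, t)f(t) dt into a matrix-vector product. The sketch below is my illustration, not the book's: it uses K(s, t) = min(s, t) on [0, 1], the autocovariance of standard Brownian motion, whose exact eigenvalues are known to be λₙ = 1/((n − ½)²π²).

```python
import numpy as np

# Discretize A[f](s) = int_0^1 K(s, t) f(t) dt with K(s, t) = min(s, t).
n = 1000
t = (np.arange(n) + 0.5) / n            # midpoint grid on [0, 1]
A = np.minimum.outer(t, t) / n          # kernel values times quadrature weight 1/n

evals = np.linalg.eigvalsh(A)[::-1]     # largest eigenvalues first
exact = [1 / ((k - 0.5) ** 2 * np.pi ** 2) for k in (1, 2, 3)]
print(np.round(evals[:3], 5), np.round(exact, 5))  # the leading values agree
```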


6.5 Markov Chains

Consider a discrete parameter random process {Xₙ : n ∈ ℕ ∪ {0}} where each random variable in the process takes values only in some subset C = {αᵢ : i ∈ I} of ℝ where I is a subset of ℕ. For each j and k from I, let pⱼ = P(X₀ = αⱼ) and let pⱼₖ = P(X₁ = αₖ | X₀ = αⱼ). The random sequence {Xₙ : n ∈ ℕ ∪ {0}} is said to be a Markov chain if, for any nonnegative integer n,

$$P(X_0 = \alpha_{j_0},\, X_1 = \alpha_{j_1},\, \ldots,\, X_n = \alpha_{j_n}) = p_{j_0}\, p_{j_0 j_1}\, p_{j_1 j_2} \cdots p_{j_{n-1} j_n}.$$

The points in C are called the states of the Markov chain, the pⱼ values are called the initial probabilities of the Markov chain, and the pⱼₖ values are called the transition probabilities of the Markov chain. If C is a finite set then the Markov chain is said to be a finite Markov chain.

Higher order transition probabilities of a Markov chain are defined as follows. Let

$$p_{jk}^{(n)} = P(X_n = \alpha_k \mid X_0 = \alpha_j).$$

This probability is equal to the sum of the probabilities of all possible distinct sequences of states that begin at state αⱼ and arrive, n steps later, at state αₖ. For example, if n = 2, then

$$p_{jk}^{(2)} = \sum_{m \in I} p_{jm}\, p_{mk}.$$

A simple inductive argument shows that in general we have

$$p_{jk}^{(m+n)} = \sum_{i \in I} p_{ji}^{(m)}\, p_{ik}^{(n)},$$

which is a special case of a general Markov property known as the Chapman-Kolmogorov equation.
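For a finite chain the transition probabilities form a matrix P with entries pⱼₖ, and the Chapman-Kolmogorov equation is exactly the statement that the matrix of n-step transition probabilities is the matrix power Pⁿ, so P^(m+n) = P^m P^n. A minimal sketch (mine, with an arbitrary three-state matrix):

```python
import numpy as np

# Rows index the current state j, columns the next state k; rows sum to one.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

m, n = 2, 3
lhs = np.linalg.matrix_power(P, m + n)          # p_{jk}^{(m+n)}
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)
print(np.allclose(lhs, rhs))                    # True: Chapman-Kolmogorov

# Unconditional distribution at step n from initial probabilities p_j:
p0 = np.array([1.0, 0.0, 0.0])
print(p0 @ np.linalg.matrix_power(P, 4))        # the p_k^{(4)} values
```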

The unconditional probability of entering state αₖ at the nth step is denoted by pₖ⁽ⁿ⁾ and is given by

$$p_k^{(n)} = \sum_{j \in I} p_j\, p_{jk}^{(n)}.$$

Note that if pᵢ = 1 (that is, if the Markov chain always begins in state αᵢ) then pₖ⁽ⁿ⁾ = p_{ik}⁽ⁿ⁾.

We will say that a state αₖ can be reached from state αⱼ if there exists some nonnegative integer n for which p_{jk}⁽ⁿ⁾ is positive. A set A of states is said to be closed if no state outside of A can be reached from any state inside A. For an arbitrary set A of states, the smallest closed set containing A is said to be the closure of A. If the singleton set containing a particular state is closed then that state is said to be an absorbing state. A Markov chain is said to be irreducible if there exists no closed set of states other than the set of all states. Note that a Markov chain is irreducible if and only if every state can be reached from every other state.

A state αⱼ is said to have period m > 1 if p_{jj}⁽ⁿ⁾ = 0 unless n is a multiple of m, and if m is the largest integer with this property. A state αⱼ is said to be aperiodic if no such period exists. Let f_{jk}⁽ⁿ⁾ denote the probability that, for a Markov chain starting in state αⱼ, the first entry into state αₖ occurs at the nth step. Let

$$f_{jk} = \sum_{n=1}^{\infty} f_{jk}^{(n)}.$$

Note that if fⱼⱼ = 1 for a Markov chain that begins in state αⱼ then a return to state αⱼ will occur at some later time with probability one. In this case, we let

$$\mu_j = \sum_{n=1}^{\infty} n f_{jj}^{(n)},$$

and we call μⱼ the mean recurrence time for the state αⱼ. A state αⱼ is said to be persistent if fⱼⱼ = 1 and is said to be transient if fⱼⱼ < 1. A persistent state αⱼ is said to be a null state if μⱼ = ∞. An aperiodic persistent state αⱼ with μⱼ < ∞ is said to be ergodic.

6.19 Theorem A state αⱼ is transient if and only if $\sum_{n=0}^{\infty} p_{jj}^{(n)} < \infty$.

6.20 Theorem A persistent state αⱼ is a null state if and only if $\sum_{n=0}^{\infty} p_{jj}^{(n)} = \infty$ yet $p_{jj}^{(n)} \to 0$ as n → ∞.


6.21 Theorem If the state αⱼ is aperiodic then limₙ→∞ p_{jj}⁽ⁿ⁾ is either equal to zero or to 1/μⱼ.

6.22 Theorem In a finite Markov chain there exist no null states and it is impossible that all states are transient.

6.6 Markov Processes

Consider a random process {X(t) : t ∈ T} defined on a probability space (Ω, F, P) where T ⊂ ℝ. The random process {X(t) : t ∈ T} is said to be a Markov process if, for k ∈ ℕ and 0 ≤ t₁ ≤ t₂ ≤ ⋯ ≤ tₖ ≤ u, where tᵢ ∈ T for each i and u ∈ T,

$$P(X(u) \in B \mid X(t_1), \ldots, X(t_k)) = P(X(u) \in B \mid X(t_k)) \quad \text{a.s.}$$

for each real Borel set B. Recall that a conditional expectation with respect to {X(s) : s ≤ t} is by definition a conditional expectation with respect to σ({X(s) : s ≤ t}), the smallest σ-algebra with respect to which every random variable in the set {X(s) : s ≤ t} is measurable.

6.23 Theorem Consider a Markov process {X(t) : t ∈ [0, ∞)}. For all real Borel sets B and for all t ≤ u, it follows that P(X(u) ∈ B | {X(s) : s ≤ t}) = P(X(u) ∈ B | X(t)) a.s.

Note that the result of the previous theorem follows from the seemingly weaker condition used to define a Markov process. The theorem says roughly that a conditional probability of a future event (at time u) associated with a Markov process given the present (at time t) and the past (at times before t) is the same as a conditional probability of that future event given just the present. That is, for a Markov process, the past and the present combined are no more "informative" than just the present for determining the probability of some future event. The following corollary to the previous theorem restates this property in terms of conditional expectation.


6.2 Corollary Consider a Markov process {X(t) : t ∈ [0, ∞)}. If Z is an integrable random variable that is σ({X(s) : s ≥ t})-measurable then E[Z | {X(s) : s ≤ t}] = E[Z | X(t)] a.s.

The following theorem says that the future and the past of a Markov process are conditionally independent given the present.

6.24 Theorem Consider a Markov process {X(t) : t ∈ [0, ∞)}. If Z is an integrable random variable that is σ({X(s) : s ≥ t})-measurable and if Y is an integrable random variable that is σ({X(s) : s ≤ t})-measurable then E[ZY | X(t)] = E[Z | X(t)] E[Y | X(t)] a.s.

Notice in the previous theorem that Z is a function of the present and the future of the Markov process and that Y is a function of the present and the past of the Markov process.

6.25 Theorem (Chapman-Kolmogorov) Consider a Markov process {X(t) : t ∈ [0, ∞)}. If Z is an integrable random variable that is σ({X(s) : s ≥ t})-measurable and if 0 ≤ t₀ < t then E[Z | X(t₀)] = E[E[Z | X(t)] | X(t₀)] a.s.

Example 6.4 Consider a zero mean, Gaussian, Markov random process {X(t) : t ∈ ℝ} and consider real numbers t₁ < t₂ < ⋯ < tₙ < t. Since the process is Markov it follows that E[X(t) | X(t₁), X(t₂), ..., X(tₙ)] = E[X(t) | X(tₙ)] a.s. But, since the process is also Gaussian, we know that E[X(t) | X(tₙ)] = aX(tₙ) a.s. for some real constant a. Now, consider the problem of estimating or predicting the value of the process at some future time t based on a collection of past samples of the process by taking the conditional expectation of the process at time t given the past samples. For a Gaussian Markov process, this estimate is simply a linear function of the last sample. All previous samples taken before the last sample may be discarded. Although we will not show it here, an estimate of this type based on conditional expectation provides a best (in a minimum mean-square error sense) estimator of the random variable of interest as a Borel measurable transformation of the data. □
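A standard discrete time instance of a zero mean Gaussian Markov process is the AR(1) recursion Xₙ₊₁ = aXₙ + Wₙ₊₁ with i.i.d. Gaussian noise; this recursion, and the parameter choices below, are my illustration rather than the book's. The sketch estimates E[Xₙ₊₁ | Xₙ = v] by binning and checks that it is the linear function a·v of the last sample alone.

```python
import numpy as np

rng = np.random.default_rng(5)
a, steps = 0.8, 500_000

# Stationary zero mean Gaussian AR(1): X_{n+1} = a X_n + W_{n+1}.
w = rng.normal(0.0, 1.0, size=steps)
x = np.empty(steps)
x[0] = rng.normal(0.0, 1.0 / np.sqrt(1 - a * a))  # stationary start
for i in range(1, steps):
    x[i] = a * x[i - 1] + w[i]

# E[X_{n+1} | X_n = v] should be a*v, regardless of earlier samples.
for v in (-2.0, 0.0, 1.0):
    mask = np.abs(x[:-1] - v) < 0.05
    print(f"v = {v:+.1f}: binned mean {x[1:][mask].mean():+.3f}, a*v = {a * v:+.3f}")
```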


6.7 Martingales

Let {Xₙ}ₙ∈ℕ be a random sequence defined on a probability space (Ω, F, P) and let {Fₙ}ₙ∈ℕ be a sequence of σ-subalgebras of F. The random sequence {Xₙ}ₙ∈ℕ is said to be a martingale relative to {Fₙ : n ∈ ℕ} if the following four conditions hold for each positive integer n:

1. Fₙ ⊂ Fₙ₊₁,

2. Xₙ is Fₙ-measurable,

3. E[|Xₙ|] < ∞, and

4. E[Xₙ₊₁ | Fₙ] = Xₙ a.s.

Do you know what it is to be possessed by a problem, to have within yourself some urge that keeps you at it every waking moment, that makes you alert to every sign pointing the way to its solution; to be gripped by a piece of work so that you cannot let it alone, and to go on with deep joy to its accomplishment? -Lao G. Simons

A sequence of σ-algebras that satisfies condition (1) is called a filtration. If condition (2) holds for all n ∈ ℕ then we say that the random sequence {Xₙ}ₙ∈ℕ is adapted to the filtration {Fₙ : n ∈ ℕ}. If Fₙ = σ(X₁, X₂, ..., Xₙ) then {Fₙ : n ∈ ℕ} is a filtration and is called the canonical filtration associated with the random sequence {Xₙ}ₙ∈ℕ. If a martingale is given without a specified filtration then it should be regarded as a martingale with respect to its canonical filtration.

6.26 Theorem If a random sequence is a martingale with respect to some filtration then it is a martingale with respect to its canonical filtration.


Proof. Assume that {Xₙ : n ∈ ℕ} is a martingale with respect to some filtration {Fₙ : n ∈ ℕ} and let m and n be positive integers such that m < n. Since Xₘ is Fₘ-measurable and Fₘ ⊂ Fₙ, it follows that Xₘ is Fₙ-measurable. Thus, it follows that X₁, X₂, ..., Xₙ are each Fₙ-measurable for any positive integer n. Finally, since σ(X₁, ..., Xₙ) ⊂ Fₙ for any n ∈ ℕ, it follows that

$$E[X_{n+1} \mid \sigma(X_1, \ldots, X_n)] = E\big[E[X_{n+1} \mid \mathcal{F}_n] \,\big|\, \sigma(X_1, \ldots, X_n)\big] = E[X_n \mid \sigma(X_1, \ldots, X_n)] = X_n \quad \text{a.s.} \qquad \Box$$

A random sequence {Xₙ}ₙ∈ℕ is said to be a submartingale relative to {Fₙ : n ∈ ℕ} if conditions (1), (2), and (3) given above and condition (4′) given below each hold for every positive integer n: (4′) E[Xₙ₊₁ | Fₙ] ≥ Xₙ a.s.

A random sequence {Xₙ}ₙ∈ℕ is said to be a supermartingale relative to {Fₙ : n ∈ ℕ} if conditions (1), (2), and (3) given above and condition (4″) given below each hold for every positive integer n: (4″) E[Xₙ₊₁ | Fₙ] ≤ Xₙ a.s.

Example 6.5 Let X₁, X₂, ... be mutually independent random variables with zero means and finite variances. Further, let Sₙ = X₁ + X₂ + ⋯ + Xₙ and Tₙ = Sₙ² for n ∈ ℕ. Note that

$$\begin{aligned}
E[T_{n+1} \mid X_1, \ldots, X_n] &= E[(X_1 + X_2 + \cdots + X_n + X_{n+1})^2 \mid X_1, \ldots, X_n] \\
&= E[(X_1 + \cdots + X_n)^2 + 2X_{n+1}(X_1 + \cdots + X_n) + X_{n+1}^2 \mid X_1, \ldots, X_n] \\
&= E[S_n^2 + 2X_{n+1}S_n + X_{n+1}^2 \mid X_1, \ldots, X_n] \\
&= E[S_n^2 \mid X_1, \ldots, X_n] + 2E[X_{n+1}S_n \mid X_1, \ldots, X_n] + E[X_{n+1}^2 \mid X_1, \ldots, X_n] \\
&= S_n^2 + 2S_n E[X_{n+1}] + E[X_{n+1}^2] \\
&= T_n + E[X_{n+1}^2] \;\ge\; T_n \quad \text{a.s.}
\end{aligned}$$

Thus, {Tₙ} is a submartingale with respect to σ(X₁, ..., Xₙ). □
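The computation in Example 6.5 can be probed numerically: fix one realized prefix X₁, ..., Xₙ, then estimate E[Tₙ₊₁ | X₁, ..., Xₙ] by averaging over many fresh draws of Xₙ₊₁. A minimal sketch (mine, not the book's; the prefix length, distribution, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)

# One realized prefix X_1, ..., X_n with zero mean and variance one.
n = 20
s_n = rng.normal(0.0, 1.0, size=n).sum()
t_n = s_n ** 2

# Estimate E[T_{n+1} | X_1, ..., X_n] by averaging over fresh X_{n+1} draws.
x_next = rng.normal(0.0, 1.0, size=1_000_000)
t_next = (s_n + x_next) ** 2
print(f"T_n = {t_n:.3f}, estimated E[T_(n+1) | prefix] = {t_next.mean():.3f}")
# The estimate sits near T_n + E[X^2] = T_n + 1 >= T_n, as the example predicts.
```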


Martingales are often used to model gambling games that are fair. That is, they model a game in which the expected fortune of the gambler after the next play is the same as his present fortune. In this context, a submartingale would represent a game that is favorable to the gambler and a supermartingale would represent a game that is unfavorable to the gambler. (A "martingale" is part of a horse's harness that prevents the horse from raising its head too high. A martingale became a gambling term through its association with horse racing and later was used to describe processes of this sort.) The following theorem is called the Martingale Convergence Theorem and is due to Joseph Doob.

6.27 Theorem (Doob) Let the random sequence {Xₙ}ₙ∈ℕ be a submartingale with respect to its canonical filtration. If supₙ∈ℕ E[|Xₙ|] is finite then Xₙ converges almost surely to a random variable X such that E[|X|] ≤ supₙ∈ℕ E[|Xₙ|].

Consider a filtration {Fₙ : n ∈ ℕ} and let F_∞ denote the smallest σ-algebra containing ⋃ₙ₌₁^∞ Fₙ. In this case we write Fₙ ↑ F_∞ and have the following result.

6.28 Theorem If Fₙ ↑ F_∞ and if Z is an integrable random variable then E[Z | Fₙ] converges to E[Z | F_∞] a.s.

6.8 Random Processes with Orthogonal Increments

A random process {X(t) : t ∈ T} is said to possess orthogonal increments if E[|X(t) − X(s)|²] < ∞ for all s, t ∈ T and if, whenever parameter values satisfy the inequality s₁ < t₁ ≤ s₂ < t₂, the increments X(t₁) − X(s₁) and X(t₂) − X(s₂) of the process are orthogonal; that is, E[(X(t₁) − X(s₁))(X(t₂) − X(s₂))] = 0.

6.29 Theorem Let {X(t) : t ∈ T} be a random process with orthogonal increments. There exists a nondecreasing function F(t) such that E[|X(t) − X(s)|²] = F(t) − F(s) when s < t. Further, the function F is unique up to an additive constant.

Notice that the previous theorem implies that the mean-square continuity of a random process with orthogonal increments is related to the pointwise continuity of the corresponding function F(t). We denote the relationship between the function F(t) and the random process {X(t)} by writing E[|dX(t)|²] = dF(t). (Our use of a differential here is just for notational purposes.)

Let {Y(t) : t ∈ ℝ} be a random process with orthogonal increments and let h(t) be a real-valued function. We are now going to direct our attention toward defining an integral of the form

$$\int_{\mathbb{R}} h(t)\,dY(t).$$

Since the sample functions of the random process {Y(t)} are not generally of bounded variation, we cannot define the above integral as an ordinary Riemann-Stieltjes integral with respect to the individual sample functions. Instead, we will define this integral (called a stochastic integral of h(t) with respect to the random process {Y(t)}) as an L2 limit. As usual, we begin by defining the integral when h(t) is a step function.

Assume that the function h(t) is of the following form, where the aᵢ and cᵢ are real numbers for each i and where a₁ < a₂ < ⋯ < aₙ:

$$h(t) = \begin{cases} 0 & \text{if } t < a_1 \\ c_{j-1} & \text{if } a_{j-1} \le t < a_j \text{ for } 1 < j \le n \\ 0 & \text{if } t \ge a_n. \end{cases}$$

For such a function h we will define the stochastic integral of h(t) with respect to {Y(t)} to be

$$\sum_{j=2}^{n} c_{j-1}\left(Y(a_j) - Y(a_{j-1})\right).$$

More precisely, we define the integral to be any random variable that is equal almost surely to the sum on the right hand side. (One technical detail is that if aⱼ is a discontinuity point of F then instead of Y(aⱼ) in the previous definition we use the mean-square limit of Y(t) as t ↑ aⱼ. This limit will exist due to the relation between F and Y(t).) It is not difficult to show that if h(t) and g(t) are step functions (as defined above) and if E[|dY(t)|²] = dF(t), then

$$E\left[\int_{\mathbb{R}} h(t)\,dY(t) \int_{\mathbb{R}} g(t)\,dY(t)\right] = \int_{\mathbb{R}} h(t)g(t)\,dF(t).$$

Now, consider a real-valued function h(t) and let {hₙ(t)}ₙ∈ℕ be a sequence of step functions (as defined above) such that

$$\int_{\mathbb{R}} (h(t) - h_n(t))^2\,dF(t) \to 0$$

as n → ∞. Further, let Zₙ denote the stochastic integral of the step function hₙ(t) with respect to the random process {Y(t)}. The previous observation implies that there exists a random variable Z such that E[(Z − Zₙ)²] → 0 as n → ∞. Further, the random variable Z does not depend on the particular sequence {hₙ}ₙ∈ℕ given above. That is, a random variable equal almost surely to Z will be obtained when any sequence {hₙ}ₙ∈ℕ converging in the above sense to h is chosen. We define the stochastic integral

$$\int_{\mathbb{R}} h(t)\,dY(t)$$

to be any random variable that is equal almost surely to the random variable Z.

6.30 Theorem In the context of our discussion, a function h may be represented as a limit of step functions (in the above sense) if the integral ∫_ℝ h²(t) dF(t) exists and is finite.

If h(t) = g(t) + ip(t) is a complex-valued function such that g and p each satisfy the condition of the previous theorem, then the integral ∫_ℝ h(t) dY(t) is defined to be ∫_ℝ g(t) dY(t) + i∫_ℝ p(t) dY(t). We will say more about complex-valued random processes later.


6.9 Wide Sense Stationary Random Processes

Recall the definition of a wide sense stationary random process that was given on page 140. In this section we will consider zero mean, continuous time WSS random processes with particular concern for their harmonic properties. Throughout this section we will assume, based upon the following theorem, that all wide sense stationary random processes satisfy the following condition:

$$\lim_{t - s \to 0} E[|X(t) - X(s)|^2] = 0.$$

6.31 Theorem A wide sense stationary random process {X(t) : t ∈ ℝ} possesses a separable and measurable modification if

$$\lim_{t - s \to 0} E[|X(t) - X(s)|^2] = 0.$$

Further, if a modification of {X(t) : t ∈ ℝ} is measurable then the process must satisfy the previous condition.

Thus, the previous condition is a minimal continuity hypothesis and we will assume that it is satisfied whenever we discuss continuous parameter WSS random processes. In addition, we will always take the parameter set of such a process to be either ℝ or [0, ∞). Recall that the autocovariance function of a zero mean WSS random process is defined by K(t) = E[X(s + t)X(s)]. Note that the autocovariance function K(t) is continuous since

$$|K(t) - K(s)| = |E[(X(t) - X(s))X(0)]| \le \sqrt{E[|X(t) - X(s)|^2]\,E[|X(0)|^2]},$$

where the right hand side approaches zero as t − s → 0 via the previous continuity hypothesis.

6.32 Theorem The autocovariance function K(t) of a zero mean WSS random process may be expressed as

$$K(t) = \int_{\mathbb{R}} e^{2\pi i t\lambda}\,dF(\lambda),$$

where the function F is nondecreasing, bounded, right continuous, and such that F(−∞) = 0. Further, the function F is the unique such function for which the above equality is satisfied.

Consider a zero mean WSS random process with an autocovariance function K and let F denote the function obtained via the previous theorem for this autocovariance function. The function F is called the spectral distribution function of a zero mean WSS random process with autocovariance function K. If F is absolutely continuous then its derivative F′ exists almost everywhere and is called a spectral density function of a WSS random process with autocovariance function K.

6.33 Theorem If ∫_ℝ |K(t)| dt < ∞ then there exists a continuous spectral density function given by

$$F'(\lambda) = \int_{\mathbb{R}} e^{-2\pi i t\lambda} K(t)\,dt.$$

The spectrum of a WSS random process is given by the set of all real numbers λ₀ such that F(λ₀ + ε) > F(λ₀ − ε) for every ε > 0. That is, the spectrum consists of all points of increase of the spectral distribution function F. Note in particular that the spectrum of a WSS random process is a subset of ℝ and is not, as is frequently misstated, a function. The spectrum consists of frequencies that enter into the harmonic analysis of both the autocovariance function and the sample functions of the random process.
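As a concrete instance of Theorem 6.33 (my illustration, not an example from the book), K(t) = exp(−|t|) is integrable, and the formula gives the continuous spectral density f(λ) = 2/(1 + 4π²λ²). The quadrature below confirms this; the truncation interval and grid are arbitrary choices.

```python
import numpy as np

# f(lambda) = int K(t) e^{-2 pi i t lambda} dt, here with K(t) = exp(-|t|).
t, dt = np.linspace(-40, 40, 400_001, retstep=True)  # K decays fast; [-40, 40] suffices
K = np.exp(-np.abs(t))

for lam in (0.0, 0.25, 1.0):
    f_num = (K * np.exp(-2j * np.pi * t * lam)).sum().real * dt  # Riemann sum
    f_exact = 2 / (1 + 4 * np.pi ** 2 * lam ** 2)
    print(f"lambda = {lam}: quadrature {f_num:.6f}, closed form {f_exact:.6f}")
```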

6.34 Theorem Every zero mean wide sense stationary random process {X(t) : t ∈ ℝ} satisfying the continuity condition given at the beginning of this section possesses a spectral representation of the form

$$X(t) = \int_{\mathbb{R}} e^{2\pi i t\lambda}\,dY(\lambda),$$

where the random process {Y(λ) : λ ∈ ℝ} has orthogonal increments and is such that E[|dY(λ)|²] = dF(λ), where F is the spectral distribution function of {X(t)}.

Let {X(t) : t ∈ ℝ} be a mean-square continuous, wide sense stationary random process defined on a probability space (Ω, F, P) and let H denote the real Hilbert space L²(Ω, F, P). Note that for any real number t, the random variable X(t) is a point in H. Further, note that ‖X(t)‖ = √R(0), where R is the autocorrelation function of the random process {X(t) : t ∈ ℝ}. Let S denote the sphere in H consisting of all points in H that are at a distance of √R(0) from the origin. Note that the random process {X(t) : t ∈ ℝ} is a subset of S. Further, since the random process is mean-square continuous, we see that as s → t, ‖X(s) − X(t)‖ → 0, which implies that the random process is a continuous curve in H.

6.10 Complex-Valued Random Processes

It is often convenient to be able to deal with random processes that are complex-valued. The extension from the real case is very straightforward.

If {Y(t) : t ∈ T} is a random process taking values in the complex plane then Y(t) = X(t) + iZ(t) where {X(t) : t ∈ T} and {Z(t) : t ∈ T} are real-valued random processes. Further, E[Y(t)] = E[X(t)] + iE[Z(t)] for all t for which the expectations on the right hand side exist and are finite. We say that a complex-valued function is measurable if the real and imaginary parts of the function are each measurable. Finally, the autocovariance function of a complex-valued random process {Y(t)} is given by K(s, t) = E[(Y(s) − E[Y(s)])(Y(t) − E[Y(t)])*].

One should be careful not to carelessly apply theorems about real-valued random processes to complex-valued random processes. For example, a wide sense stationary complex-valued Gaussian process need not be strictly stationary, and there exist two complex-valued mutually Gaussian random variables that are uncorrelated but not independent.


6.11 Linear Operations on WSS Random Processes

Let {X(t) : t ∈ ℝ} be a zero mean WSS random process with a spectral representation given by

$$X(t) = \int_{\mathbb{R}} e^{2\pi i t\lambda}\,dY(\lambda),$$

where E[|dY(λ)|²] = dF(λ). By a linear operation on the process {X(t)} we will mean a transformation of {X(t)} into a random process {Z(t) : t ∈ ℝ} of the form

$$Z(t) = \int_{\mathbb{R}} e^{2\pi i t\lambda}\, C(\lambda)\,dY(\lambda).$$

The function C(λ) may be any real or complex-valued function for which the following integral exists and is finite:

$$\int_{\mathbb{R}} |C(\lambda)|^2\,dF(\lambda).$$

The function C is called the gain of the linear operation. Note that the process {Z(t)} is a zero mean WSS random process and also satisfies the continuity condition given at the beginning of this section. (That {Z(t)} is zero mean follows from the definition of the stochastic integral and the fact that {X(t)} is zero mean.) Further, {Z(t)} is a WSS random process since its autocovariance function Q(t) is given by

$$Q(t) = \int_{\mathbb{R}} e^{2\pi i t\lambda}\,|C(\lambda)|^2\,dF(\lambda).$$

In addition, the spectral distribution function G(λ) of {Z(t)} is given by

$$G(\lambda) = \int_{(-\infty,\,\lambda]} |C(\mu)|^2\,dF(\mu).$$

If {X(t)} possesses a spectral density function f(λ) then {Z(t)} possesses a spectral density function g(λ) that is given by g(λ) = |C(λ)|² f(λ).


In engineering contexts, it is common to consider linear operations (sometimes called linear filters) of the following form where the function h(t) is often (imprecisely) called an impulse response function:

    Z(t) = \int_{\mathbb{R}} h(s) \, X(t - s) \, ds.

As the following theorem shows, this linear filter is a special case of the linear operations that we have been considering.

6.35 Theorem Let {X(t) : t ∈ ℝ} be a zero mean WSS random process with a bounded spectral density function and a spectral representation given by

    X(t) = \int_{\mathbb{R}} e^{2\pi i t \lambda} \, dY(\lambda).

Further, let h be a real or complex-valued function defined on ℝ that is continuous a.e. with respect to Lebesgue measure, integrable, and square integrable. Define

    \int_{\mathbb{R}} h(s) \, X(t - s) \, ds

to be the limit in L₂ as T → ∞ of the L₂ integral

    \int_{(-T, T)} h(s) \, X(t - s) \, ds.

This L₂ limit exists and is equal to

    \int_{\mathbb{R}} e^{2\pi i t \lambda} \, H(\lambda) \, dY(\lambda),

where H is the Fourier transform of h. The function H is sometimes called the transfer function of the linear filter.
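For instance (a worked example added for illustration, not part of the original text), consider the impulse response h(s) = e^{−s} for s ≥ 0 and h(s) = 0 otherwise, which is continuous a.e., integrable, and square integrable. With the convention H(λ) = ∫_ℝ h(s) e^{−2πiλs} ds implicit in the spectral representation above,

    H(\lambda) = \int_0^{\infty} e^{-s} e^{-2\pi i \lambda s} \, ds = \frac{1}{1 + 2\pi i \lambda},

so that, by the result of the preceding paragraphs, the output spectral density is g(λ) = |H(λ)|² f(λ) = f(λ)/(1 + 4π²λ²); such a filter attenuates the high-frequency content of its input.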

◊ 6.12 Nonlinear Transformations


Random processes often appear as models for random signals and noise. An assumption of stationarity is often warranted. Nonlinear systems that commonly appear in practice include half wave rectifiers, limiters, square law devices, and others. Let {X(t) : t ∈ ℝ} be a stationary Gaussian random process with mean zero and positive variance σ². Assume that the autocorrelation function of X(t) is denoted by R(τ) = E[X(t)X(t + τ)]. Further, assume that the function R(·) is positive definite. For two random variables X(t₁) and X(t₂) in the random process, let ρ(t₁ − t₂) = R(t₁ − t₂)/σ² denote their correlation coefficient. Recall that a bivariate probability density function f(·, ·) exists for these two random variables. Further, we can and do take this bivariate probability density function to be continuous as a function of its two real arguments. Indeed, we note that f(x, y) can be taken as the bivariate Gaussian density function given on page 112 with m₁ = m₂ = 0, with σ₁ = σ₂ = σ, and with ρ = ρ(t₁ − t₂). That is, X(t₁) and X(t₂) have a N(0, 0, σ², σ², ρ(t₁ − t₂)) distribution. Now, let p denote the continuous marginal Gaussian density function. That is, p is the continuous probability density function corresponding to the N(0, σ²) distribution. Let τ = t₁ − t₂.

Define the measure m on B(ℝ) via m(B) = ∫_B p(x) dx, and note that m is equivalent to Lebesgue measure on B(ℝ). Further, consider the real Hilbert space L₂(ℝ, B(ℝ), m). We will take this real Hilbert space as our space of nonlinearities. We will let the Borel measurable function g correspond to a point in L₂(ℝ, B(ℝ), m), and we will refer to g as a nonlinearity. Nonlinear systems such as this are often referred to as zero memory nonlinearities. That is, the output is a Borel measurable function of the input at the same time; if it depends on the input at earlier times, then the system is said to have memory. We will be concerned with the random process {g(X(t)) : t ∈ ℝ}. Note that this random process is also a stationary random process.

For a nonnegative integer n, let

    B_n(x) = \frac{(-1)^n \sigma^n}{\sqrt{n!}} \, \frac{1}{p(x)} \, \frac{d^n}{dx^n} p(x),

where p is the univariate Gaussian density function given above; equivalently, B_n(x) = He_n(x/σ)/√(n!), where He_n denotes the nth probabilists' Hermite polynomial. These functions are called the orthonormalized Hermite polynomials and are obtained by applying the Gram-Schmidt orthonormalization procedure to the collection of functions of the form x^k for nonnegative integers k. Note that B_n is an nth degree polynomial. Note, also, that for each nonnegative integer n, the norm of B_n is unity, and ⟨B_n, B_m⟩ = 0 for nonnegative distinct integers n and m. Thus, the set of functions {B_n : n ∈ ℕ ∪ {0}} is a set of orthonormal functions in L₂(ℝ, B(ℝ), m). Indeed, it is an orthonormal subset of the real Hilbert space L₂(ℝ, B(ℝ), m).
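The orthonormality can be checked numerically (a sketch assuming the equivalence B_n(x) = He_n(x/σ)/√(n!) noted above; the value of σ and the sample size are arbitrary choices):

    import numpy as np
    from math import factorial
    from numpy.polynomial.hermite_e import hermeval

    rng = np.random.default_rng(3)
    sigma = 2.0                        # hypothetical input standard deviation

    def B(n, x):
        # B_n(x) = He_n(x/sigma)/sqrt(n!), He_n the probabilists' Hermite polynomial
        coef = np.zeros(n + 1); coef[n] = 1.0
        return hermeval(x / sigma, coef) / np.sqrt(factorial(n))

    # Monte Carlo check of orthonormality in L2(R, B(R), m): with X ~ N(0, sigma^2),
    # E[B_n(X) B_m(X)] is the integral of B_n B_m dm and should equal delta_{nm}.
    X = sigma * rng.standard_normal(1_000_000)
    G = np.array([[np.mean(B(n, X) * B(m, X)) for m in range(5)] for n in range(5)])
    print(np.round(G, 2))              # approximately the 5 x 5 identity matrix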

Recall Lusin's Theorem which states that for any element g of L₂(ℝ, B(ℝ), m) and for any positive ε there exists a continuous function c(·) that is equal to g on a given bounded interval pointwise off a set of Lebesgue measure less than ε. Recalling the Weierstrass approximation theorem, we know that there exists a sequence of polynomials that converges to c(·) uniformly on the interval of interest. Thus we can make the uniform norm between c(·) and a polynomial arbitrarily small on the interval of concern. This polynomial can be written as a linear combination of the orthonormalized polynomials B_n(·). With this reasoning, we see that the linear span of the set of orthonormalized polynomials {B_n(·) : n ∈ ℕ ∪ {0}} is dense in L₂(ℝ, B(ℝ), m); that is, {B_n(·) : n ∈ ℕ ∪ {0}} is a complete orthonormal set. Hence, any g ∈ L₂(ℝ, B(ℝ), m) can be expressed as

    g = \sum_{n=0}^{\infty} b_n B_n(\cdot),

where the convergence is in L₂(ℝ, B(ℝ), m). Note that

    \sum_{n=0}^{\infty} b_n^2 = \int_{\mathbb{R}} |g(x)|^2 \, dm(x) < \infty

via Parseval's equality and since g is an element of L₂(ℝ, B(ℝ), m).

Consider again the bivariate density function f of the random variables X(t₁) and X(t₂) where, for convenience, we let X = X(t₁) and Y = X(t₂). This bivariate density admits a series expansion, given via the Mehler series, as

    f(x, y) = p(x) \, p(y) \sum_{n=0}^{\infty} \rho^n(\tau) \, B_n(x) \, B_n(y),


where the convergence is in the following sense:

    \lim_{N \to \infty} \int_{\mathbb{R}} \int_{\mathbb{R}} \left( \frac{f(x, y)}{p(x) p(y)} - \sum_{n=0}^{N} \rho^n(\tau) B_n(x) B_n(y) \right)^{2} p(x) \, p(y) \, dx \, dy = 0.

We now consider a nonlinearity g and the bivariate density function f as given above. Consider, also, the output random process g(X(t)). We are interested in the bandwidth characteristics of the output. The autocorrelation function of the output is given by R_g(τ) = E[g(X(t)) g(X(t + τ))]. Thus, we see that

    R_g(\tau) = E[g(X) g(Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x) \, g(y) \, f(x, y) \, dx \, dy.

Observe, further, that

    R_g(\tau) = \sum_{n=0}^{\infty} b_n^2 \, \rho^n(\tau),

where the convergence is uniform in τ, since the b_n's are square summable and since |ρ(τ)| ≤ 1.
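The following sketch (not from the text) estimates the coefficients b_n for a hard limiter g(x) = sgn(x) and evaluates the series for R_g(τ) at a few input correlation values; the truncation level and sample size are arbitrary choices:

    import numpy as np
    from math import factorial
    from numpy.polynomial.hermite_e import hermeval

    rng = np.random.default_rng(4)
    sigma = 1.0

    def B(n, x):
        # B_n(x) = He_n(x/sigma)/sqrt(n!), He_n the probabilists' Hermite polynomial
        coef = np.zeros(n + 1); coef[n] = 1.0
        return hermeval(x / sigma, coef) / np.sqrt(factorial(n))

    g = np.sign                          # a hard limiter as the nonlinearity

    # Hermite coefficients b_n = <g, B_n> in L2(m), i.e. an expectation against
    # the N(0, sigma^2) weight, estimated by Monte Carlo (only odd n survive).
    X = sigma * rng.standard_normal(1_000_000)
    b = np.array([np.mean(g(X) * B(n, X)) for n in range(12)])

    # Output autocorrelation R_g(tau) = sum_n b_n^2 rho^n(tau) at a few
    # hypothetical input correlation values rho(tau).
    rho = np.array([1.0, 0.5, 0.0, -0.5])
    R_g = np.array([np.sum(b**2 * r**np.arange(12)) for r in rho])
    print(b.round(3))     # b_1 ~ sqrt(2/pi) ~ 0.798, even-index b_n ~ 0
    print(R_g.round(3))   # at rho = 1 the truncated series is close to E[g(X)^2] = 1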

Now suppose that the input random process {X(t) : t ∈ ℝ} has a spectral density function that has compact support. Let S denote the spectral density function of the input, and let Ω denote the support of S. We are thus assuming that the input is bandlimited and that the Lebesgue measure of Ω is finite. Recall that the Fourier transform of σ²ρ(τ) is equal to S(ω). Hence, by well known properties of Fourier transforms, we see that the Fourier transform of ρⁿ(τ) is given by

    \sigma^{-2n} S(\omega)^{*n},

where S(ω)*ⁿ = (S * S * ⋯ * S)(ω) denotes S(·) convolved with itself n − 1 times. (Note that S * S(ω) is S convolved with itself once.) Assume that the nonlinearity g is not almost everywhere equal to a polynomial. Then infinitely many of the b_n's are nonzero. Note that in this case, if the input random process {X(t) : t ∈ ℝ} were not bandlimited, then the output random process {g(X(t)) : t ∈ ℝ} would not be bandlimited, either.


Next we show that even if the input random process were bandlimited, the output random process is not bandlimited when g is not a polynomial. Observe that the support of

    \sigma^{-2n} S(\omega)^{*n}

is given by the n-fold Minkowski sum of Ω with itself; that is, its support is Ω ⊕ ⋯ ⊕ Ω = ⊕ⁿΩ where ⊕ denotes the Minkowski sum; that is, A ⊕ B = {a + b : a ∈ A, b ∈ B}. Assume again that the input is bandlimited. It follows that ∞ > λ(⊕ⁿΩ) ≥ nλ(Ω) where λ is Lebesgue measure. Further, recalling the Cantor ternary set, we see that the above inequality can be strict, and by considering a closed interval, we see that it can be satisfied with equality. For the moment, we will measure bandwidth by the Lebesgue measure of the support of the spectral density. Thus, we see that the output is bandlimited if and only if there exists a positive integer N such that b_n = 0 for all n > N. Notice that this condition is equivalent to the nonlinearity being almost everywhere equal to a polynomial. However, since we are assuming that the nonlinearity is not almost everywhere equal to a polynomial, we see that the b_n's do not truncate, and thus the output is not bandlimited. On the other hand, if the nonlinearity is almost everywhere equal to a polynomial then we see that the output is bandlimited if and only if the input is bandlimited. We summarize this result with the following theorem.

6.36 Theorem The output random process in the above discussion is strictly bandlimited if and only if the Gaussian input random process is strictly bandlimited and the nonlinearity g(·) is almost everywhere equal to a polynomial.

Recall, in particular, that "limiters" are not polynomials, and thus the output of a limiter with a Gaussian input is never bandlimited.

Now we will consider the case where the input random process is not strictly bandlimited but has a finite second moment bandwidth given by

    B_2[X] = \int_{-\infty}^{\infty} \omega^2 \, \sigma^{-2} S(\omega) \, d\omega.


Assume for now that the mean has been subtracted from the output random process. Observe that this is equivalent to assuming that b₀ = 0 since B₀(x) = 1 for all x and thus b₀ = E[g(X(t))]. Observe, also, that σ^{−2n}S(ω)*ⁿ can be viewed as a density function of a sum of n mutually independent, identically distributed random variables each with mean zero and variance equal to the second moment bandwidth of the input B₂[X]. Thus it follows that

    \int_{-\infty}^{\infty} \omega^2 \, \sigma^{-2n} S(\omega)^{*n} \, d\omega = n \, B_2[X].

Now recall that the output autocorrelation function is given by

    R(\tau) = \sum_{n=1}^{\infty} b_n^2 \, \rho^n(\tau).

Further, recall from standard properties of Fourier transforms that B₂[X] = −ρ''(0). Similarly, letting B₂[Y] denote the output second moment bandwidth, note that

    B_2[Y] = \frac{-R''(0)}{R(0)}.

Next, using Fubini's theorem on term by term differentiation, we will deduce the preceding derivative. Note that ρ(τ) is maximized at the origin. Further, there exists a positive number δ such that ρ(τ) is monotone in (0, δ). Also, ρⁿ(τ) has the same monotonicity property. Taking derivatives from the right, and using Fubini's theorem on term by term differentiation, we see that

    R'(\tau) = \sum_{n=1}^{\infty} b_n^2 \, n \, \rho^{n-1}(\tau) \, \rho'(\tau),

    R''(\tau) = \sum_{n=1}^{\infty} \left( b_n^2 \, n(n-1) \, \rho^{n-2}(\tau) (\rho'(\tau))^2 + b_n^2 \, n \, \rho^{n-1}(\tau) \, \rho''(\tau) \right).

Thus, since ρ(0) = 1 and ρ'(0) = 0, we see that

    R''(0) = \sum_{n=1}^{\infty} n \, b_n^2 \, \rho''(0).


Further, we see that

    B_2[Y] = \frac{-R''(0)}{R(0)} = \frac{\sum_{n=1}^{\infty} n \, b_n^2}{\sum_{n=1}^{\infty} b_n^2} \, B_2[X].

Thus the output second moment bandwidth is greater than or equal to the input second moment bandwidth, with equality holding precisely when b_n = 0 for all n > 1. This, however, characterizes the case where g is almost everywhere equal to an affine function; that is, when g(x) = ax + b for real numbers a and b. We summarize this result in the following theorem.

6.37 Theorem If {X(t) : t ∈ ℝ} is a zero mean Gaussian random process that has a finite mean square bandwidth, and if g is a nonlinearity such that g(X(t)) has zero mean, then the mean square bandwidth of g(X(t)) is greater than or equal to that of the input. Equality holds if and only if g is almost everywhere equal to an affine function.
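As a concrete illustration (a worked example added here, not part of the original text), take σ = 1 and the cube law device g(x) = x³. Since x³ = He₃(x) + 3He₁(x), while B₁ = He₁ and B₃ = He₃/√6, the only nonzero coefficients are b₁ = 3 and b₃ = √6. Hence

    B_2[Y] = \frac{1 \cdot 3^2 + 3 \cdot 6}{3^2 + 6} \, B_2[X] = \frac{9}{5} \, B_2[X],

so the cube law device expands the second moment bandwidth of its input by a factor of 1.8.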

◊ 6.13 Brownian Motion

The random processes that we will describe in this section were first used to model the movement of a particle suspended in a fluid and bombarded by molecules in thermal motion. Such motion was first analyzed by a nineteenth-century botanist named Robert Brown. The mathematical foundations of the theory were later developed by Albert Einstein in 1905 and (rigorously) by Norbert Wiener in 1924.

A Brownian motion process (or a Wiener process) is a random process {W(t) : t ≥ 0} defined on some probability space (Ω, F, P) that satisfies the following four properties:

1. W(0, ω) = 0 for each ω ∈ Ω,

2. for any real numbers 0 ≤ t₀ < t₁ < ⋯ < t_k, the increments W(t_k) − W(t_{k−1}), W(t_{k−1}) − W(t_{k−2}), ..., W(t₁) − W(t₀) are mutually independent,


3. for 0 ≤ s < t, the increment W(t) − W(s) is a Gaussian random variable with mean zero and variance t − s,

4. for each ω ∈ Ω, W(t, ω) is continuous in t.

6.38 Theorem There exists a random process defined on a probability space (that may be taken as the unit interval with Lebesgue measure) that satisfies conditions (1), (2), (3), and (4) given above.

Returning to the physical motivation given above, a Brownian motion process may be used as a model for a single component of the path of a suspended particle subjected to molecular bombardment. For example, consider the projection onto the vertical axis of such a particle's path. Condition (2) reflects a lack of memory of the suspended particle. That is, although the future behavior of the particle depends on its present position, it does not depend on how the particle arrived at its present position. Condition (3), which specifies that the increments have zero mean, indicates that the particle is equally likely to go up or down. That is, there is no drift. Condition (3), which specifies that the variance of the increments grows in proportion to the length of the interval, indicates that the particle tends to wander away from its present position and having done so suffers no force tending to restore it. Condition (4) is a natural condition to expect of the path of a particle and condition (1) is merely a convention.

Since W(t) − W(0) = W(t), property (3) implies that W(t) is Gaussian with mean 0 and variance t. If 0 ≤ s < t then, using the previous properties, we see that E[W(s)W(t)] = E[W(s)(W(t) − W(s))] + E[W²(s)] = E[W(s)]E[W(t) − W(s)] + E[W²(s)] = s. Thus, it follows that E[W(s)W(t)] = min{s, t}.
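Properties (1)-(3) translate directly into a simulation on a grid (a sketch; property (4) holds only up to the grid resolution, and all sizes are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(5)

    # Simulate Brownian motion on [0, 1] from independent Gaussian increments:
    # W(t_k) - W(t_{k-1}) ~ N(0, t_k - t_{k-1}), with W(0) = 0.
    n_paths, n_steps = 5000, 200
    dt = 1.0 / n_steps
    t = np.linspace(0.0, 1.0, n_steps + 1)

    dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

    # Empirical check of E[W(s) W(t)] = min(s, t) at a pair of times.
    s_idx, t_idx = 60, 150                      # s = 0.3, t = 0.75
    print(np.mean(W[:, s_idx] * W[:, t_idx]))   # should be close to 0.3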

6.39 Theorem There exists a Brownian motion process that is a measurable random process.

6.40 Theorem With probability one, W(t, ω) is nowhere differentiable as a function of t.

Thus, off a null set, the sample paths of a Brownian motion process are continuous and nowhere differentiable. A nowhere differentiable path represents the motion of a particle that at no time has a velocity. Further, since a function of bounded variation is differentiable a.e., the sample paths of a Brownian motion process are of unbounded variation almost surely. Brownian motion is a commonly used model for noise in engineering applications. In the following example we will find the Karhunen-Loève expansion of a Brownian motion process.

Example 6.6 Consider a Brownian motion process restricted to the interval [0, 1]. The autocovariance function of such a random process is given by K(s, t) = min(s, t) for s, t ∈ [0, 1]. To find the eigenvalues of the integral operator associated with this autocovariance function, we must solve the integral equation

    \int_0^1 \min(s, t) \, e(t) \, dt = \lambda \, e(s), \qquad 0 \le s \le 1,

which reduces to

    \int_0^s t \, e(t) \, dt + s \int_s^1 e(t) \, dt = \lambda \, e(s), \qquad 0 \le s \le 1. \tag{1}

Leibniz's rule² implies that

    \frac{d}{ds} \int_0^s t \, e(t) \, dt = s \, e(s)

and that

    \frac{d}{ds} \, s \int_s^1 e(t) \, dt = \int_s^1 e(t) \, dt - s \, e(s).

Thus, differentiating (1) with respect to s implies that

    \int_s^1 e(t) \, dt = \lambda \, \frac{d}{ds} e(s) \tag{2}

²If ∂γ/∂s(t, s) exists and is continuous and if α(s) and β(s) are differentiable real-valued functions then

    \frac{d}{ds} \int_{\alpha(s)}^{\beta(s)} \gamma(t, s) \, dt = \gamma(\beta(s), s) \, \frac{d\beta(s)}{ds} - \gamma(\alpha(s), s) \, \frac{d\alpha(s)}{ds} + \int_{\alpha(s)}^{\beta(s)} \frac{\partial \gamma}{\partial s}(t, s) \, dt.


and differentiating (2) with respect to s implies that

    -e(s) = \lambda \, \frac{d^2}{ds^2} e(s). \tag{3}

Recall that a solution of (3) will have the form

    e(s) = A \sin(s/\sqrt{\lambda}) + B \cos(s/\sqrt{\lambda})

for λ > 0 and A, B ∈ ℝ. Setting s = 0 in (1) implies that e(0) = 0 for λ > 0 and hence that B = 0. Setting s = 1 in (2) implies that cos(1/√λ) = 0 which in turn implies that

    \frac{1}{\sqrt{\lambda}} = \frac{(2n-1)\pi}{2}

for n ∈ ℕ. Thus, writing e(s) as a function of n and s we have

    e_n(s) = A \sin\left(\frac{(2n-1)\pi s}{2}\right)

for n ∈ ℕ. Note that, for distinct positive integers n and m, writing j = 2n − 1 and k = 2m − 1,

    \int_0^1 e_n(s) \, e_m(s) \, ds = \frac{2A^2}{\pi} \left[ \frac{\sin((j-k)\pi/2)}{2(j-k)} - \frac{\sin((j+k)\pi/2)}{2(j+k)} \right] = 0,

since the sum and the difference of two odd integers are even. Thus, the e_n's are orthogonal. Requiring the e_n's to be orthonormal implies that A = √2. Thus, the eigenvalues are given by

    \lambda_n = \frac{4}{(2n-1)^2 \pi^2}

and the orthonormalized eigenfunctions are given by

    e_n(t) = \sqrt{2} \, \sin((2n-1)\pi t/2)

for n ∈ ℕ. Thus, the Karhunen-Loève theorem implies that

    X(t) = \sum_{n=1}^{\infty} Z_n \, e_n(t)

where

    Z_n = \int_0^1 X(t) \, e_n(t) \, dt

for each n ∈ ℕ; for this Gaussian process the Z_n's are mutually independent N(0, λ_n) random variables.


□
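The expansion can be exercised numerically (a sketch; the truncation level, grid, and path count are arbitrary choices, and we use the independence of the Z_n's noted in the example):

    import numpy as np

    rng = np.random.default_rng(6)

    # Karhunen-Loeve synthesis of Brownian motion on [0, 1]: truncate
    # X(t) ~ sum_{n=1}^{N} Z_n e_n(t) with independent Z_n ~ N(0, lambda_n),
    # lambda_n = 4/((2n-1)^2 pi^2) and e_n(t) = sqrt(2) sin((2n-1) pi t / 2).
    N = 500
    t = np.linspace(0.0, 1.0, 401)
    n = np.arange(1, N + 1)

    lam = 4.0 / ((2*n - 1)**2 * np.pi**2)
    e = np.sqrt(2.0) * np.sin(np.outer((2*n - 1) * np.pi / 2.0, t))  # (N, len(t))

    n_paths = 2000
    Z = rng.standard_normal((n_paths, N)) * np.sqrt(lam)
    X = Z @ e                                    # partial-sum sample paths

    # Sanity check: Var X(t) should approximate min(t, t) = t.
    print(np.round(X.var(axis=0)[::100], 2))     # compare with t = 0, .25, .5, .75, 1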


6.14 Caveats and Curiosities


7 Problems

7.1 Set Theory

Problem 1.1. Let Ω be a nonempty set and, for each t ∈ ℝ, let A_t be a subset of Ω. Assume that if t₁ < t₂ then A_{t₁} ⊂ A_{t₂}. Show that ∪_{t∈ℝ} A_t = ∪_{n∈ℕ} A_n.

Problem 1.2. Let Ω be a nonempty set, let F be a σ-algebra on Ω, and let A be a nonempty subset of Ω. Let G be a family of subsets of A given by G = {B ∈ P(A) : B ∈ F}. Is G a σ-algebra on A?

Problem 1.3. Prove or Disprove: The set of all integers is equipotent to the set of all positive, even integers.

Problem 1.4. Consider a nonempty set Ω and let F be a subset of P(Ω). Show that σ(F) exists and is unique. (Recall that σ(F) is the smallest σ-algebra on Ω that contains every element in F.)

Problem 1.5. Consider a nonempty set Ω. A subset of Ω is said to be cofinite if its complement is finite. (That is, A is cofinite iff Aᶜ is finite.) Let F be the subset of P(Ω) consisting entirely of all finite and cofinite subsets of Ω. Must F be an algebra on Ω? Must F be a σ-algebra on Ω?

Problem 1.6. Consider nonempty sets X and Y and consider a function f: X → Y. Show that f(f⁻¹(A)) ⊂ A for all A ⊂ Y and that B ⊂ f⁻¹(f(B)) for all B ⊂ X.

Problem 1.7. For a function f: X → Y, show that the following three statements are equivalent:

1. f is one-to-one.


2. f(A ∩ B) = f(A) ∩ f(B) for all A ⊂ X and all B ⊂ X.

3. f(A) ∩ f(B) = ∅ whenever A and B are disjoint subsets of X.

Problem 1.8. Show that f: X → Y is onto if and only if f(f⁻¹(B)) = B for each subset B of Y.

Problem 1.9. Consider the set S of all sequences of the form {a₁, a₂, a₃, ...} where aᵢ is equal to 0 or 1 for each i. (For example, {0, 0, 0, ...}, {1, 1, 1, ...}, and {1, 0, 1, 0, ...} are all points in S.) Show that S is an uncountable set. (That is, show that there are uncountably many different such sequences of 0's and 1's.)

Problem 1.10. Any real number that is a root of a (nonzero) polynomial with integer coefficients is called an algebraic number. (For example, √2 is algebraic yet π is not.) Show that the set of all algebraic numbers is countable.

◊ Problem 1.11. Show that any uncountable set of positive real numbers includes a countable subset whose elements sum to ∞.

Problem 1.12. Let Ω be a nonempty set and let F be a collection of subsets of Ω such that Ω ∈ F and such that if A and B are in F then A \ B is in F. Show that F is an algebra on Ω.

◊ Problem 1.13. A σ-algebra is said to be countably generated if it is equal to σ(A₁, A₂, ...) for some countable collection {A_n} of measurable sets. Show that B(ℝ) is countably generated.

Problem 1.14. What is the smallest σ-algebra on ℝ that contains every singleton subset of ℝ?


Problem 1.15. Consider an uncountable set A and let B be a countably infinite subset of A. Show that A is equipotent to A \ B.

Problem 1.16. Show that the interval (0, 1] is equipotent to the set of all nonnegative real numbers.

◊ Problem 1.17. Does there exist a σ-algebra with a countably infinite number of elements?

7.2 Measure Theory

Problem 2.1. Let μ: P(ℕ) → [0, ∞] via μ(A) = 0 if A is a finite set and μ(A) = ∞ if A is not a finite set. Is μ a measure on (ℕ, P(ℕ))?

Problem 2.2. Prove or Disprove: Let (Ω, F, μ) be a measure space and let {A_n}_{n∈ℕ} be a sequence of sets from F. Assume that the sequence {A_n}_{n∈ℕ} is a strictly decreasing sequence; that is, assume that, for each positive integer n, A_{n+1} is a proper subset of A_n. If the sequence {A_n}_{n∈ℕ} converges to the empty set as n → ∞ then μ(A_n) converges to zero as n → ∞.

◊ Problem 2.3. Let Ω = ℝ² and, for each positive integer n, let A_n be the open ball of radius one centered at the point ((−1)ⁿ/n, 0). Find lim inf_{n→∞} A_n and lim sup_{n→∞} A_n.

Problem 2.4. Consider the measure space (ℝ, B(ℝ), λ) where λ denotes Lebesgue measure. Let {A_t : t ∈ I} be a collection of null sets where I is any index set.

1. Show that ∪_{t∈I} A_t need not be a measurable set.

2. If ∪_{t∈I} A_t is measurable then must it be a null set?


I would quarrel with mathematics, and say that the sum of zeros is a dangerous num­ber. -Stanislaw Jerzy Lec


Problem 2.5. Show that any countable subset of ℝ is Lebesgue measurable and has Lebesgue measure zero.

Problem 2.6. Consider a function f that maps ℝ into ℝ. If |f| is a Borel measurable function then must f also be a Borel measurable function? Explain.

◊ Problem 2.7. Suppose that P₁ and P₂ are probability measures on σ(P) where P is a π-system. Prove that if P₁ and P₂ agree on P then they also agree on σ(P).

7.3 Integration Theory

Problem 3.1. Let f_n: [0, 1] → ℝ be defined via

    f_n(x) = n if x ∈ (0, 1/n), and f_n(x) = 0 if x ∉ (0, 1/n).

1. Find lim_{n→∞} f_n(x).

2. Find lim_{n→∞} ∫₀¹ f_n(x) dx.

3. Find ∫₀¹ lim_{n→∞} f_n(x) dx.

Problem 3.2. Let Ω = {1, 2, 3, 4} and let μ be a real-valued function on P(Ω) such that μ(A) is equal to the number of points in A. (For example, μ({1, 3}) = 2 and μ(Ω) = 4.) Show that μ is a measure on (Ω, P(Ω)). Let f: Ω → ℝ via f(ω) = ω². Evaluate the Lebesgue integral

    \int_{\Omega} f \, d\mu.

Problem 3.3. Consider a continuous probability distribution function¹ F for which F(a) = 0 and F(b) = 1 for some a and b from ℝ. Evaluate the following Riemann-Stieltjes integral:

    \int_{[a, b]} F(x) \, dF(x).

Problem 3.4. Let F be a probability distribution function that is absolutely continuous and let c > 0. Evaluate the following integral:

    \int_{-\infty}^{\infty} (F(x + c) - F(x)) \, dx.

Problem 3.5. Engineers frequently use the "delta function" δ(t) which has the interesting property that if f: ℝ → ℝ is continuous at the origin then

    \int_{-\infty}^{\infty} \delta(t) f(t) \, dt = f(0).

Unfortunately, no such function δ exists since if it did it would equal 0 for nonzero t, and hence would integrate to zero. That is, the above integral would be zero for any continuous function f. We can, however, obtain such a "sampling property" using a Riemann-Stieltjes integral. For what function g: ℝ → ℝ is it true that

    \int_{(a, b)} f(x) \, dg(x) = f(0)

when f is continuous at the origin and when a < 0 < b? Why can't we simply define δ(t) to be the derivative of g(t)?

¹Probability distribution functions are defined on page 70 in Section 5.2. If you have not yet studied that section, you may want to defer this problem and the next problem until later.


7.4 Functional Analysis

Problem 4.1. For real numbers x and y, let d(x, y) = (x − y)². Does d define a metric on the set of real numbers?

Problem 4.2. Let (M, ρ) be a metric space. Show that the closure of an open ball B(x, r) = {y ∈ M : ρ(x, y) < r} need not equal the corresponding closed ball B̄(x, r) = {y ∈ M : ρ(x, y) ≤ r}.

Problem 4.3. Let a and b be real numbers with a < b. Let C[a, b] denote the set of all real-valued functions that are continuous on [a, b] and consider a metric d on C[a, b] defined by

    d(f, g) = \max_{t \in [a, b]} |f(t) - g(t)|.

Show that if {f_n}_{n∈ℕ} is a sequence of points in C[a, b] such that, for t ∈ [a, b], f_n(t) → 0 as n → ∞ then it need not follow that d(f_n, 0) → 0 as n → ∞.

Problem 4.4. Consider the set ℚ of all rational numbers endowed with a metric d given by d(x, y) = |x − y|. This metric space is called the rational line. Show that the rational line is not complete.

7.5 Distributions & Probabilities

Problem 5.1. Assume that B and C are random variables possessing a joint probability density function given by

    f_{B, C}(b, c) = 1 for 0 ≤ b ≤ 1 and 0 ≤ c ≤ 1, and 0 otherwise.

What is the probability that the roots of the equation x² + 2Bx + C = 0 are real?


Problem 5.2. Consider a random variable X that has a uniform distribution on (0, 1). What is the probability that the first digit after the decimal point in √X will be a 3?

Problem 5.3. Consider a random variable X that has a continuous, strictly increasing, positive probability distribution function F_X. What is the probability distribution function of the random variable Z = F_X(X)?

Problem 5.4. Consider a random variable X that has a continuous, strictly increasing, positive probability distribution function F. Find a probability density function for the random variable Y = −ln(F(X)).

Problem 5.5. Consider random variables X and Y, let F_X denote the distribution function of X, let F_Y denote the distribution function of Y, and let F_{X,Y} denote the joint distribution function of X and Y. Let Z = max{X, Y} and let W = min{X, Y}. Find the distribution of Z and the distribution of W in terms of F_X, F_Y, and F_{X,Y}.

Problem 5.6. Consider real numbers x₁, x₂, y₁, and y₂ such that x₁ ≤ x₂ and y₁ ≤ y₂. Show that if F(x, y) is a joint probability distribution function then it must follow that

    F(x_2, y_2) - F(x_2, y_1) - F(x_1, y_2) + F(x_1, y_1) \ge 0.

Show that the function

    G(x, y) = 0 if x + y < 1, and G(x, y) = 1 if x + y ≥ 1

is not a joint probability distribution function.

Problem 5.7. Find the marginal probability density function f_X if the joint probability density function f_{X,Y} is uniform on the circle of radius one centered at the origin.


7.6 Independence

Problem 6.1. Consider a probability space (Ω, F, P) and two events A and B from F that are independent. Show that Aᶜ and Bᶜ are also independent events.

Problem 6.2. Assume that a dart is thrown at a circular dart board having unit area in such a way that the probability the dart lands in any particular circular region of the board is given simply by the area of that region. Let (X, Y) denote the coordinates of the dart's position on the board after one throw. Are the random variables X and Y independent? Explain.

Problem 6.3. Consider a toss of two fair dice. Let A denote the event that the number appearing on the first die is even. Let B denote the event that the number appearing on the second die is odd. Let C denote the event that the numbers on the two dice are either both even or both odd. Are the events A, B, and C mutually independent? Explain.

Problem 6.4. Consider a monkey that is seated at a typewriter and who makes a single keystroke each second. Assume that the keystrokes are mutually independent events. (Is this a good assumption for a human typist?) Further, assume that the set of all possible outcomes of a keystroke include all lowercase and uppercase English letters, the numbers zero through nine, all punctuation, and a space. Assume that each possible outcome of a keystroke has a fixed positive probability of being typed. The typewriter never fails, the monkey is immortal, and there is an endless stream of paper. (All standard assumptions!) Prove that, with probability one, the entire script of the play Hamlet by William Shakespeare will be typed an infinite number of times.


Problem 6.5. Let X and Y be random variables possessing a joint probability density function given by f(x, y) = 2exp(−x − y) for 0 < x < y < ∞. Are X and Y independent?

Problem 6.6. Assume that missiles are fired at a target in such a way that the point at which each missile lands has a uniform distribution on the interior of a disc of radius 5 miles centered around the target. If we assume that the points at which the missiles land are mutually independent, then how many missiles must we fire to ensure at least a 0.95 probability of at least one hit not more than one mile from the target?

7.7 Random Variables

Problem 7.1. Consider a random variable X defined on a probability space (Ω, F, P) such that X(ω) = 87 for each ω ∈ Ω. What is σ(X)?

Problem 7.2. Consider a probability space (ℝ, B(ℝ), P) where P is any probability measure on (ℝ, B(ℝ)). Let X be a random variable defined on this space via X(ω) = ω². (Note that such a definition is possible only because we have let Ω = ℝ.) What is σ(X)?

Problem 7.3. Let X and Y be random variables such that E[(X − Y)²] = 0. Show that X = Y a.s.

Problem 7.4. Consider a random variable X with probability density function

    f_X(x) = \frac{1}{2} \exp(-|x|)

for x ∈ ℝ. Use Chebyshev's inequality to find an upper bound on the probability that |X| > 2. What is the actual value of that probability?


Problem 7.5. Consider a random variable X defined on a probability space (Ω, F, P). Let g: ℝ → ℝ be a Borel measurable function and let Y = g(X). Show that σ(Y) ⊂ σ(X). When will σ(Y) = σ(X)?

◊ Problem 7.6. Consider a random variable X. Show that σ(X) is countably generated.

◊ Problem 7.7. Prove that a function X mapping a measurable space (Ω, F) into (ℝ, B(ℝ)) is a random variable if and only if the set {ω ∈ Ω : X(ω) ≤ x} is an element of F for each x ∈ ℝ.

Problem 7.8. Let X and Y be integrable random variables defined on (Ω, F, P). Show that X = Y a.s. if and only if

    \int_F X \, dP = \int_F Y \, dP

for all F ∈ F.

Problem 7.9. Consider independent random variables X and Y such that each has a uniform distribution on the interval [0, 2]. Find E[|X − Y|].

Problem 7.10. For a positive integer n, let X₁, ..., X_n be a collection of mutually independent, identically distributed random variables each with a uniform distribution on the interval [0, θ] for some fixed positive real number θ. If Z = max{X₁, ..., X_n} then what is E[Z]?

Problem 7.11. Consider a random variable X whose characteristic function C_X(t) is such that C_X(2) = 0. For a fixed real number s, find E[cos(X + s) cos(X + s + 1)].


7.8 Moments

Problem 8.1. Consider a nonnegative, integrable random variable X defined on a probability space (Ω, F, P). Show that

    E[X] = \int_0^{\infty} P(X > t) \, dt.

Problem 8.2. Consider a random variable X with a finite second moment. Show that E[(X − m)²] is minimized over all m ∈ ℝ when m = E[X].

Problem 8.3. Consider random variables X and Y with finite second moments. Show that E[XY] is finite.

Problem 8.4. Consider random variables X and Y with finite second moments. Show that COV[X, Y] = E[XY] − E[X]E[Y].

Problem 8.5. Consider random variables X and Y with finite second moments. Show that |ρ(X, Y)| ≤ 1.

◊ Problem 8.6. Consider random variables X and Y with finite second moments. What can be said about X and Y if ρ(X, Y) = ±1?

Problem 8.7. Let Y be a random variable with a uniform distribution on [a, b] where a < b. What is VAR[Y]?

Problem 8.8. If X is Poisson with parameter λ > 0 then what is VAR[X]? Let X₁, X₂, X₃, and X₄ be mutually independent Poisson random variables each with a mean equal to 3. Let Y = 4X₁ + X₂ + 6X₃ + 3X₄. What is VAR[Y]? (The Poisson distribution is defined in Example 5.11 on page 105.)


Problem 8.9. Consider random variables X and Y such that each has a finite positive second moment. Find a real number a for which E[(X − aY)²] is minimized.

Problem 8.10. Let X₁, ..., X_n be mutually independent random variables each with variance σ² and mean μ. Find the correlation coefficient between Σ_{i=1}^{n} X_i and X₁.

Problem 8.11. Let X be a random variable with a Poisson distribution having parameter λ. Find E[t^X] for t ∈ ℝ. (The Poisson distribution is defined in Example 5.11 on page 105.)

Problem 8.12. For an integer n > 1, let X₁, ..., X_n be mutually independent random variables that are uniformly distributed over the interval (−1, 1). Find the characteristic function for the sum X₁ + ⋯ + X_n.

◊ Problem 8.13. Let Φ(t) be the characteristic function of a random variable that possesses an even probability density function. Show that 1 + Φ(2t) ≥ 2Φ²(t) for all t ∈ ℝ.

Problem 8.14. Let X denote the number of 'Heads' that occur when a fair coin is flipped twice. What is the moment generating function of X? Find E[Xⁿ] for n ∈ ℕ.

Problem 8.15. Consider independent random variables X and Y such that

    X = 0 wp 1/3,  1 wp 1/3,  2 wp 1/3

and

    Y = 0 wp 1/3,  1 wp 2/3.

Let Z be a random variable such that

    Z = 0 if X + Y = 0 or X + Y = 3,  1 if X + Y = 1,  2 if X + Y = 2.


Find M_X(s) (the moment generating function of X), M_Z(s) (the moment generating function of Z), and M_{X+Z}(s) (the moment generating function of X + Z). Is M_{X+Z}(s) = M_X(s)M_Z(s)? Are X and Z independent random variables?

Problem 8.16. Consider a random variable Θ with a uniform distribution on [0, 2π]. Let X = cos(Θ) and let Y = sin(Θ). Are X and Y uncorrelated? Are X and Y independent?

◊ Problem 8.17. Let x₁, x₂, y₁, and y₂ be real numbers such that x₁ ≠ x₂ and y₁ ≠ y₂. Consider random variables X and Y defined on the same probability space such that P(X = x₁) + P(X = x₂) = 1 with P(X = x₁) > 0 and P(X = x₂) > 0 and such that P(Y = y₁) + P(Y = y₂) = 1 with P(Y = y₁) > 0 and P(Y = y₂) > 0. Prove or Disprove: If X and Y are uncorrelated then X and Y are independent.

7.9 Transformations of Random Variables

Problem 9.1. The radius of a circle is approximately measured in such a way that the approximation has a uniform distribution on the interval (a, b) where b > a > 0. Find the distribution of the resulting approximation of the circumference of the circle and of the resulting approximation of the area of the circle.

Problem 9.2. Let X and Y be independent random variables with densities

    f_X(x) = \frac{1}{\pi \sqrt{1 - x^2}}, \qquad |x| < 1,

and

    f_Y(y) = \frac{y}{\sigma^2} \exp\left(\frac{-y^2}{2\sigma^2}\right), \qquad y > 0,


respectively. Find the distribution of the product XY.

Problem 9.3. Let X and Y be independent random variables each with a density function given by f(x) = e^{−x} I_{[0,∞)}(x). Let W = X/(X + Y). What is the distribution of W?

Problem 9.4. Consider a positive random variable X with density function f_X. Find a density function for 1/X.

7.10 The Gaussian Distribution

Problem 10.1. Consider random variables X and Z that are defined on the same probability space (Ω, F, P). Assume that X has a standard Gaussian distribution and that Z has a Gaussian distribution with mean 5 and variance 4. Find a real number a such that P(X > a) = P(Z < 2.44).

Problem 10.2. Let X be a Gaussian random variable with mean m and variance σ². What is E[X³] in terms of m and σ²? What is E[X⁹⁷] if m = 0 and σ² = 38?

Problem 10.3. For a fixed positive integer n, let Z and X₁, ..., X_n be zero mean, unit variance, mutually independent Gaussian random variables. Let

    T = \frac{Z}{\sqrt{W}}

and let

    W = \frac{1}{n} \sum_{i=1}^{n} X_i^2.

The random variable W has a density function f_W that you do not need to find. Instead, find an expression for a density function of T in terms of f_W and f_Z where f_Z is a density function for Z.


Problem 10.4. Let X be a standard Gaussian random variable, and let Z be a random variable that takes on the values 1 and −1 each with probability 1/2. Assume that X and Z are independent, and let Y = XZ. Show that Y is a standard Gaussian random variable. Is X + Y a Gaussian random variable? Are X and Y uncorrelated? Are X and Y independent?

Problem 10.5. Let X₁ and X₂ be zero mean, unit variance, mutually Gaussian random variables with correlation coefficient 1/3. Let X denote the random vector [X₁ X₂]ᵀ. Find a real 2 × 2 matrix C so that the random vector Z = CX is composed of independent Gaussian random variables.

Problem 10.6. Let X be a Gaussian random variable with mean m₁ and variance σ₁². Let Y be a Gaussian random variable with mean m₂ and variance σ₂². Assume that X and Y are independent and find the distribution of X + Y.

Problem 10.7. Let f₁ be a N(m₁, σ₁²) density function and let f₂ be a N(m₂, σ₂²) density function. Consider a random variable X that has a density given by λf₁(x) + (1 − λ)f₂(x) where 0 < λ < 1. Find the moment generating function for X, the mean of X, and the variance of X.

Problem 10.8. Let X and Y be independent Gaussian random variables each with mean zero and variance one. Find E[max(X, Y)].

7.11 Convergence

Problem 11.1. Define a sequence of mutually independent random variables as follows:

    X_n = 1 with probability 1/n, and X_n = 0 with probability 1 − 1/n


for n ∈ ℕ. Does X_n → 0 a.s.? Explain.

Problem 11.2. A random variable X is said to have a Cauchy distribution centered at zero with parameter a > 0 if X has a density given by

    f_X(x) = \frac{a}{\pi (a^2 + x^2)}.

The characteristic function C_X(t) of X is given by C_X(t) = exp(−a|t|). Let {X_n}_{n∈ℕ} be a sequence of mutually independent random variables each having a Cauchy distribution centered at zero with parameter a = 1. For a fixed positive integer n, let S_n = X₁ + ⋯ + X_n. What is the distribution of S_n/n?

Problem 11.3. Show via an example that a sequence of random variables may converge in probability without converging in L_p for any p > 1.

Problem 11.4. Let c be a real constant. Show that X_n → c in distribution if and only if X_n → c in probability.

Problem 11.5. Consider a sequence {X_n}_{n∈ℕ} of mutually independent random variables each with a uniform distribution on the interval (0, 1]. For each positive integer n, let Z_n = n(1 − max(X₁, ..., X_n)). Does Z_n converge in distribution? If so then to what distribution does F_{Z_n} converge?

Problem 11.6. Consider a numerical scheme in which the round-off error to the second decimal place has the uniform distribution on the interval (−0.05, 0.05). What is an approximate value of the probability that the absolute error in the sum of 1000 such numbers is less than 2?

Problem 11.7. If you toss a fair coin 10,000 times then what (approximately) is the probability that you will observe exactly 5000 heads?


Problem 11.8. Let {X_n}_{n∈ℕ} be a sequence of second order random variables defined on (Ω, F, P) and let a be a real number. Find conditions on E[X_n] and VAR[X_n] that are both sufficient and necessary to ensure that X_n → a in L₂.

7.12 Conditioning

Problem 12.1. Let U and V be independent random variables each with a zero mean, unit variance Gaussian distribution. Let X = U + V and Y = U − V. Show that X and Y are independent random variables each with a zero mean Gaussian distribution having a variance equal to 2. Find E[X|U] and E[Y|U]. Are E[X|U] and E[Y|U] independent random variables?

Problem 12.2. Show via an example that E[X|Y] = E[X] need not imply that X and Y are independent.

Problem 12.3. Let X and Y be independent, zero mean random variables and let Z = XY. Assume that Z has a finite mean. Find E[Z|X], E[Z|Y], and E[Z|X, Y].

Problem 12.4. Consider the probability space ([0, 1], B([0, 1]), λ) where λ denotes Lebesgue measure. Consider subsets of [0, 1] given by A = [0, 1/4], B = (1/4, 2/3], and C = (2/3, 1]. Let F be the σ-algebra on Ω given by σ({A, B, C}) and let X(ω) = ω² for ω ∈ [0, 1]. Find E[X|F].

Problem 12.5. Consider second order random variables X, Y, and Z defined on the same probability space. Show that if X and Z are independent and if X and Y are independent then E[XZ|Y] = E[X]E[Z|Y] a.s.


Problem 12.6. Consider a sequence {Y_n}_{n∈ℕ} of mutually independent random variables each with mean zero and positive variance σ². For each positive integer n, let

Problem 12.7. Let X and Y be second order random variables defined on the same probability space. The conditional variance of X given Y is denoted by VAR[X|Y] and is defined by

    VAR[X|Y] = E[(X - E[X|Y])^2 \mid Y].

Show that

    VAR[X] = E[VAR[X|Y]] + VAR[E[X|Y]].

Problem 12.8. Let X and Y be random variables defined on the same probability space and assume that E[X²] < ∞. Let g be a Borel measurable function mapping ℝ to ℝ. Show that

    E[(X - g(Y))^2] = E[(X - E[X|Y])^2] + E[(E[X|Y] - g(Y))^2].

For what such function g is E[(X − g(Y))²] minimized?

Problem 12.9. Let Ω = {1, 2, 3, 4, 5, 6} and let F be the power set of Ω. Define a probability measure P on (Ω, F) by letting P({ω}) = 1/6 for each ω ∈ Ω. Let G = σ({{1, 3, 5}}) and let X(ω) = ω for each ω ∈ Ω. Find E[X|G].

Problem 12.10. Consider a probability space (Ω, F, P) and let Ω₁, ..., Ω_N be disjoint measurable subsets of Ω such that Ω = Ω₁ ∪ ⋯ ∪ Ω_N and such that P(Ω_i) > 0 for each i. Let G be the σ-algebra on Ω generated by Ω₁, ..., Ω_N and let X be an integrable random variable defined on (Ω, F, P). Find E[X|G] for all ω ∈ Ω_i.


Problem 12.11. Let X be a random variable defined on (Ω, F, P) such that E[X²] is finite. Let G₁ and G₂ be σ-subalgebras of F. If Y = E[X|G₁] a.s. and X = E[Y|G₂] a.s. then show that X = Y a.s.

Problem 12.12. Let X be a random variable with mean 3 and variance 2. Let Y be a random variable such that E[Y] = 4 and E[XY] = −3. If E[Y|X] = a + bX a.s. then find a and b.

◊ Problem 12.13. Let X and Y be zero mean, positive variance, mutually Gaussian random variables possessing a correlation coefficient ρ such that |ρ| < 1. Show that E[X²Y²] = E[X²]E[Y²] + 2(E[XY])².

Problem 12.14. Consider a sequence {Y₁, Y₂, ...} of mutually independent random variables that are defined on the same probability space and that each have a mean of 1. For each positive integer n, let X_n = Y₁Y₂⋯Y_n. Find

and

where j < m < n.

Problem 12.15. Consider random variables X and Y possessing a joint probability density function

    f(x, y) = 8xy if 0 < x < 1 and 0 < y < x, and 0 otherwise.

Find E[X|Y] and E[Y|X].

7.13 True/False Questions

A statement that is not always true should be considered false. For example, the statement "If x² = 4 then x = 2" is a false statement.


1. The set ℝ is a subset of the set ℝ².

2. There exists a function f: ℝ → ℝ such that f: A → A is a bijection for any nonempty subset A of ℝ.

3. Any σ-algebra is a λ-system.

4. Consider a complete measure space (Ω, F, P) and let A be an element of F. If B ⊂ A then B ∈ F.

5. Consider a probability space (Ω, F, P) such that Ω and ℝ are equipotent. If X: Ω → ℝ is bijective then X is a random variable defined on (Ω, F, P).

6. Consider two independent random variables X and Y defined on a probability space (Ω, F, P). There does not exist a set A such that A ∈ σ(X) ∩ σ(Y) and 0 < P(A) < 1.

7. If the second moment of a random variable X exists then the first moment of X must also exist.

8. If f: ℝ → ℝ is constant a.e. with respect to Lebesgue measure then f is Riemann integrable.

9. Consider a nonempty set Ω and two subsets F and G of P(Ω). If F ∩ G = ∅ then σ(F) ≠ σ(G).

10. The expected value of an integrable random variable must be an element of the range of that random variable.

11. If two sets A and B are such that A is a subset of B then there always exists an element x in the set B that is not in the set A.


12. There exists a nonempty set Ω such that the power set of Ω is the smallest σ-algebra that contains Ω.

13. A probability measure is always a σ-finite measure.

14. The infimum of a set of positive real numbers must itself be a positive real number.

15. The collection of real Borel sets is the smallest σ-algebra on the real line that contains every closed interval.

16. There exist two subsets A and B of ℝ such that B is a Lebesgue null set, such that A is a subset of B, and such that A is not an element of M(ℝ).

17. It is possible for a random variable to be independent of itself.

18. A random variable X possessing an even probability density function must have a mean equal to zero.

19. It is possible for disjoint events to be independent and it is possible for disjoint events not to be independent.

20. A Lebesgue measurable subset of the real line that is not countable must have positive Lebesgue measure.

21. If X and Y are Gaussian random variables then X + Y must be a Gaussian random variable.

22. If X and Y are uncorrelated Gaussian random variables then X and Y must be independent random variables.


23. If X and Y are independent Gaussian random variables then X + Y must be a Gaussian random variable.

24. A function mapping the real line to a finite subset of the real line must be Riemann integrable.

25. Consider a probability space (Ω, F, P), a random variable X defined on this space, and a σ-subalgebra G of F. The conditional expectation E[X|G] must be F-measurable.

26. If all of the sample paths of a random process are continuous then all of the sample paths of a modification of that process must also be continuous.

27. Two distinct second order random processes must possess distinct autocovariance functions.

28. There does not exist a random variable with a first moment equal to √2 and a second moment equal to 1.

29. Let Ω be a nonempty set and let f be a function mapping Ω to ℝ. There always exists a σ-algebra F on Ω so that f is a measurable mapping from (Ω, F) to (ℝ, B(ℝ)).

30. If ∫ f dμ is a Lebesgue integral then μ must be Lebesgue measure.

31. Consider two random variables X and Y defined on a probability space (Ω, F, P). If E[X − Y] = 0 then P(X = Y) = 1.

32. Consider two random variables X and Y defined on the same probability space. If E[X + Y] < ∞ then E[X] < ∞ and E[Y] < ∞.


33. Consider a function g: ℝ → ℝ and a random variable X defined on (Ω, F, P). The function g(X) will always be a random variable defined on (Ω, F, P).

34. If a random variable X is equal almost surely to a certain conditional expectation then X must be a version of that conditional expectation.

35. Consider random variables X and Y defined on the same probability space. If X = Y a.s. then σ(X) = σ(Y).

36. Consider a random variable X that possesses an absolutely continuous probability distribution function, and let g be a Borel measurable function mapping ℝ to ℝ. The random variable g(X) must also possess an absolutely continuous probability distribution function.

37. There exists a probability density function f such that the supremum of the set {f(x) : x ∈ ℝ} is not finite.

38. Consider two random variables X and Y defined on the same probability space. If X is σ(Y)-measurable then σ(X, Y) = σ(X).

39. A set may be equipotent to a proper subset of itself.

40. Let Ω be a set containing at least two elements and let F and G be two distinct σ-algebras on Ω. The set F ∪ G is never a σ-algebra on Ω.


8 Solutions

8.1 Solutions to Exercises

1.1. Yes, {∅} is a set containing one element and ∅ is the set containing no elements.

1.2. Since the only subset of the empty set is the empty set itself it follows that {∅} is the power set of ∅.

1.3. Assume that A ⊂ B. If A ∪ B is empty then B is empty and hence A ∪ B = B. Assume that A ∪ B is not empty and let x ∈ A ∪ B. By definition of union, it follows that either x ∈ A or x ∈ B. If x ∈ A then x ∈ B since A ⊂ B. Thus, x ∈ B and we conclude that A ∪ B ⊂ B. If B is empty then A is empty and hence A ∪ B = B. Assume that B is not empty and let x ∈ B. Then x ∈ A ∪ B which implies that B ⊂ A ∪ B. Thus, we conclude that A ∪ B = B.

Assume that A ∪ B = B. If A is empty then A ⊂ B for any set B. Assume that A is not empty and let x ∈ A. Then x ∈ A ∪ B which implies that x ∈ B since A ∪ B = B. Thus, we conclude that A ⊂ B.

1.4. The first function is not onto and not one-to-one. The second function is onto but not one-to-one. The third function is one-to-one but not onto. The fourth function is bijective with inverse f⁻¹(x) = √x.

1.5. Choose some b ∈ B and note that since f is onto there exists some a ∈ A such that f(a) = b. Since f is bijective it follows that f⁻¹({b}) = {a}; that is, f⁻¹(b) = a. Substitution thus implies that f(f⁻¹(b)) = b.

Choose some a ∈ A and let f(a) = b. As above, note that f⁻¹(b) = a. Substitution thus implies that f⁻¹(f(a)) = a.

1.6. There does not exist a bijection from R into S since no function from R to S can be one-to-one. There does not exist a bijection from S into R since no function from S to R can be onto.


1.7. Yes, if f: A → B and f is bijective then f⁻¹: B → A is a bijection from B to A. That is, f⁻¹ is onto since f is defined on all of A and f⁻¹ is one-to-one since if f(a) = b₁ and f(a) = b₂ then b₁ = b₂. (That is, if b₁ ≠ b₂ then f⁻¹(b₁) cannot be equal to f⁻¹(b₂).)

1.8. Consider a countable set C and a subset B of C. Since C is countable there exists a bijection f mapping C to a subset N of the positive integers. Let g mapping B to f(B) be the restriction of f to B; that is, g = f on B and g is undefined on C \ B. Note that g is onto since it maps B to f(B) and that g is one-to-one since f is one-to-one. Thus, B is countable since g is a bijection from B to f(B) ⊂ N.

1.9. For notational simplicity, assume that all of the A_i's are countably infinite. For each i ∈ ℕ, let A_i = {a^i_1, a^i_2, ...}. (For example, if f is a bijection from A_i to ℕ then we could simply choose a^i_j such that f(a^i_j) = j.) Note that we may arrange the a^i_j in matrix form as:

    a^1_1  a^1_2  a^1_3  ...
    a^2_1  a^2_2  a^2_3  ...
    a^3_1  a^3_2  a^3_3  ...
    ...

Define a sequence {b_i}_{i∈ℕ} by selecting elements from the above array diagonal by diagonal in the following manner:

    b_1  b_3  b_6   ...
    b_2  b_5  b_9   ...
    b_4  b_8  b_13  ...
    ...

Note that this sequence defines a bijection from the union of the A_i's to ℕ. That is, the countable union is itself countable.
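The diagonal traversal in this argument is easy to make explicit (a sketch, not from the text; the function name is invented for illustration):

    from itertools import count, islice

    def diagonal_pairs():
        # Yield the index pairs (i, j), i, j >= 1, in the diagonal order used
        # in Solution 1.9: (1,1), (2,1), (1,2), (3,1), (2,2), (1,3), ...
        for d in count(2):            # d = i + j is constant on each diagonal
            for i in range(d - 1, 0, -1):
                yield (i, d - i)

    # The position of a pair (i, j) in this list defines the bijection used in
    # the solution; the first six pairs cover the first three diagonals.
    print(list(islice(diagonal_pairs(), 6)))
    # [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3)]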

1.10. Yes. This is the smallest possible algebra or σ-algebra on Ω.

1.11. Yes. This is the largest possible σ-algebra on Ω since it contains every subset of Ω.

1.12. Five σ-algebras on Ω are {∅, Ω}, P(Ω), {∅, Ω, {1}, {2, 3}}, {∅, Ω, {2}, {1, 3}}, and {∅, Ω, {3}, {1, 2}}.


1.13. Consider nonempty sets Ω and I, and for each i ∈ I let A_i be a σ-algebra on Ω. Let A denote the intersection of the A_i's for i ∈ I. (That is, A ∈ A if and only if A ∈ A_i for each i ∈ I.) First, note that Ω ∈ A since Ω ∈ A_i for each i ∈ I. Second, note that if A ∈ A then A and hence Aᶜ is in A_i for each i ∈ I which implies that Aᶜ ∈ A. Finally, assume that A_n ∈ A for each n ∈ ℕ. Then A_n ∈ A_i for each n ∈ ℕ and each i ∈ I. Thus, ∪_{n∈ℕ} A_n ∈ A_i for each i ∈ I which implies that ∪_{n∈ℕ} A_n ∈ A.

Note that a union of σ-algebras need not be a σ-algebra. Let F = {A, Aᶜ, ∅, Ω} and G = {B, Bᶜ, ∅, Ω}. Note that F ∪ G (generally) does not include A ∪ B.

1.14. First, note that Ω ∈ A since Ωᶜ = ∅ is finite. Second, note that if A ∈ A then either Aᶜ is finite or has a finite complement and hence Aᶜ ∈ A. Further, note that if A and B are in A then A ∪ B is finite if A and B are each finite and A ∪ B has finite complement if either A or B has finite complement since (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ. Thus, A is closed under finite unions, and hence A is an algebra. To see that A is not a σ-algebra let A_i = {i} for i ∈ ℕ and note that ∪_{i∈ℕ} A_i = ℕ which is neither finite nor has a finite complement.

1.15. First, note that Ω ∈ A since Ωᶜ = ∅ is finite, and hence countable. Second, note that if A ∈ A then either Aᶜ is countable or has a countable complement and hence Aᶜ ∈ A. Now, let A_i for each i ∈ ℕ be an element from A. If each of the A_i's is countable then so is their countable union. If one or more of the A_i's is cocountable then (by DeMorgan's Law) it follows that their countable union is also cocountable. In each case, we see that the union of the A_i's is in A. Thus, A is both an algebra and a σ-algebra.

1.16. They each equal {∅, Ω}, but for different reasons. The σ-algebra σ(∅) is the smallest σ-algebra on Ω that contains every set in ∅. Since there are no sets in ∅, σ(∅) is simply the smallest σ-algebra on Ω, which is {∅, Ω}. The σ-algebra σ({∅}) is the smallest σ-algebra on Ω that contains ∅, which again is {∅, Ω}.

1.17. This again is simply {∅, Ω}.

1.18. This is {A, Aᶜ, Ω, ∅}.


1.19. Note that σ({A, B}) = {Ω, ∅, A, Aᶜ, B, Bᶜ, A ∪ B, (A ∪ B)ᶜ, A ∪ Bᶜ, B \ A, B ∪ Aᶜ, A \ B, Aᶜ ∪ Bᶜ, A ∩ B, A △ B, (A △ B)ᶜ}.

1.20. Yes, ∅ ∈ F and ∅ ⊂ F.

1.21. No. Let A = ℝ and let A = {∅, A}. Further, let f: ℝ → ℝ via f(x) = 3 for all x ∈ ℝ. Then f(A) = {∅, {3}} is not a σ-algebra on ℝ since ℝ ∉ f(A).

For another example, let A = ℝ and let A = {∅, A, {5}, {5}ᶜ}. Further, let f: ℝ → [−1, 1] via f(x) = sin(x). Then f(A) = {∅, [−1, 1], {sin(5)}} which is not a σ-algebra on [−1, 1] since it does not contain {sin(5)}ᶜ.

2.1. Yes. Since every real number is an upper bound of ∅ it follows that the least upper bound (or supremum) of ∅ is −∞. Since every real number is a lower bound of ∅ it follows that the greatest lower bound (or infimum) of ∅ is ∞. Thus, sup ∅ < inf ∅.

2.2. Note that

    (\limsup A_n^c)^c = \left[ \bigcap_{k=1}^{\infty} \bigcup_{m=k}^{\infty} A_m^c \right]^c
                      = \bigcup_{k=1}^{\infty} \left[ \bigcup_{m=k}^{\infty} A_m^c \right]^c
                      = \bigcup_{k=1}^{\infty} \bigcap_{m=k}^{\infty} A_m
                      = \liminf A_n.

2.3. Assume that lim inf A_n is not empty and note that

    \omega \in \liminf A_n \implies \omega \in \bigcup_{k=1}^{\infty} \bigcap_{m=k}^{\infty} A_m
    \implies \exists N \text{ such that } \omega \in \bigcap_{m=N}^{\infty} A_m
    \implies \omega \in \bigcup_{m=k}^{\infty} A_m \text{ for all } k
    \implies \omega \in \bigcap_{k=1}^{\infty} \bigcup_{m=k}^{\infty} A_m = \limsup A_n.

2.4. Recall that lim inf A_n consists of all those points that belong to all but perhaps a finite number of the A_n's. Let α be a positive real number. Choose a positive integer N so that 1/N < α. Note that α ∉ A_n when n is an even integer greater than N. Since there are an infinite number of such n's it follows that α cannot be in lim inf A_n. A similar argument implies that no negative real number is in lim inf A_n. Note, however, that since 0 ∈ A_n for each n, it follows that 0 ∈ lim inf A_n. Thus, we conclude that lim inf A_n = {0}.

Recall that lim sup A_n consists of all those points that belong to infinitely many of the A_n's. Note that any real number from the interval (−1, 0] is in A_n for any even integer n, and that any real number from the interval [0, 1] is in A_n for any odd integer n. Further, any real number outside of these intervals is not in A_n for any n. Thus, lim sup A_n = (−1, 1].

2.5. Note that this exercise asked you to try to find a non-Borel set. In particular, you could have successfully completed this problem without actually finding such a set!

The purpose of this exercise is to convince the reader that constructing a non-Borel subset of the real line is not a trivial task. Since the construction of such a set at this point would take us rather far afield, a non-Borel set will not be presented here. For many examples, see the book Counterexamples in Probability and Real Analysis by Gary Wise and Eric Hall.

A proof of the existence of a non-Borel set is not quite as difficult. It follows immediately from the fact that the set of real Borel sets is equipotent to ℝ, whereas the power set of ℝ is not.

2.6. To begin, we will show that any singleton subset of ℝ is a real Borel set. Note that, for any x ∈ ℝ,
$$\{x\} = \bigcap_{n=1}^{\infty}\left(x - \frac{1}{n},\; x + \frac{1}{n}\right).$$
Thus, since {x} is a countable intersection of bounded open intervals, it follows that {x} must be an element of B(ℝ), the smallest σ-algebra containing every bounded open interval.


Now, let C be a countable subset of ℝ. Since C is countable we may enumerate its elements as a sequence {c₁, c₂, ...}. Note that
$$C = \bigcup_{n=1}^{\infty} \{c_n\}.$$
Thus, since C is a countable union of sets from B(ℝ), it follows that C must also be an element of B(ℝ).

2.7. A function f : ℝ → ℝ is continuous if and only if f⁻¹(U) is open for every open subset U of ℝ. Further, a function f : ℝ → ℝ is Borel measurable if and only if f⁻¹((−∞, x)) is a Borel set for each x ∈ ℝ. Since (−∞, x) is open for each x ∈ ℝ and each open subset of ℝ is a Borel set, the desired result follows immediately.

2.8. The Cantor ternary set is an uncountable subset of ℝ that has Lebesgue measure zero.

2.9. Dirac measure on a single point will yield the power set of the reals when completed.

3.1. Let G denote the collection of all subdivisions of [a, b] and recall that V = sup{S(Γ) : Γ ∈ G} where
$$S(\Gamma) = \sum_{i=1}^{m} |f(a_i) - f(a_{i-1})|$$
if Γ = {a₀, a₁, ..., a_m}. Since |f(x) − f(y)| ≤ C|x − y| for all x and y in [a, b], it follows that
$$S(\Gamma) \le C\sum_{i=1}^{m} (a_i - a_{i-1}) = C(b - a)$$
for any Γ ∈ G, and hence that V ≤ C(b − a).

5.1. Since the set {ω ∈ Ω : X(ω) ≤ n} converges to the empty set as n → −∞, it follows from Lemma 2.1 that F(n) → 0 as n → −∞. From this the desired result follows immediately.

5.2. We must show that lim_{y↓x} F(y) = F(x). Again, we can use Lemma 2.1 since the set {ω ∈ Ω : X(ω) ≤ x + (1/n)} converges to the set {ω ∈ Ω : X(ω) ≤ x} as n → ∞.

5.3. Since P(X ≤ x) = P(X < x) + P(X = x), the desired result will follow if we show that P(X < x) = lim_{y↑x} F(y). Let {u_n}_{n∈ℕ} be a strictly increasing sequence whose limit is x, and let A_n = {ω ∈ Ω : X(ω) ≤ u_n}. Note that ⋃_{n=1}^∞ A_n = {ω ∈ Ω : X(ω) < x}. Note further that F(u_n) = P(A_n) → P(⋃_{n=1}^∞ A_n) = P(X < x) as n → ∞ since A_n ⊂ A_{n+1} for each n ∈ ℕ. Thus, the desired result follows from Lemma 2.1.

5.4. Recall that a standard Cauchy random variable has density f(x) = 1/(π(1 + x²)) for x ∈ ℝ. Thus,
$$\int_0^{\infty} x f(x)\,dx = +\infty \quad\text{and}\quad \int_{-\infty}^{0} x f(x)\,dx = -\infty,$$
which implies that the first moment does not exist. Note that

1. x² f(x) → 1/π as x → ±∞,

2. if n is even and n > 2 then lim_{x→±∞} xⁿ f(x) = ∞, and

3. if n is odd and n > 1 then lim_{x→±∞} xⁿ f(x) = ±∞.

Thus, the odd moments do not exist and the even moments are infinite.

5.5. The only way that a Lebesgue integral of a measurable function can fail to exist is if one encounters a sum of the form ∞ − ∞. This cannot occur if the measurable function is nonnegative or nonpositive.

5.6. Note that
$$\mathrm{VAR}[X] = E[(X - E[X])^2] = E[X^2] - 2E[X]E[X] + E[X]^2 = E[X^2] - 2E[X]^2 + E[X]^2 = E[X^2] - E[X]^2.$$


5.7. Recall that X and Y possess a joint density of the form
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\!\left(\frac{-q(x, y)}{2}\right)$$
where
$$q(x, y) = \frac{1}{1-\rho^2}\left[\frac{(x-m_1)^2}{\sigma_1^2} - \frac{2\rho(x-m_1)(y-m_2)}{\sigma_1\sigma_2} + \frac{(y-m_2)^2}{\sigma_2^2}\right].$$
Further, recall that X has a density function f_X given by
$$f_X(x) = \int_{\mathbb{R}} f_{X,Y}(x, y)\,dy.$$
The desired result follows after substituting, completing the square, and integrating.

5.8. Recall that X and Y possess a joint density of the form
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\!\left(\frac{-q(x, y)}{2}\right)$$
where
$$q(x, y) = \frac{1}{1-\rho^2}\left[\frac{(x-m_1)^2}{\sigma_1^2} - \frac{2\rho(x-m_1)(y-m_2)}{\sigma_1\sigma_2} + \frac{(y-m_2)^2}{\sigma_2^2}\right].$$
Further, recall that COV[X, Y] = E[XY] − E[X]E[Y]. The desired result follows immediately after finding
$$\iint xy\, f_{X,Y}(x, y)\,dx\,dy.$$

5.9. No, see Problem 11.3.

5.10. Consider a sequence {X_n}_{n∈ℕ} of random variables defined as follows on the probability space ([0, 1], B([0, 1]), λ) where λ is Lebesgue measure on B([0, 1]). Let X₁ = I_{[0,1/2]}, X₂ = I_{[1/2,1]}, X₃ = I_{[0,1/4]}, X₄ = I_{[1/4,1/2]}, X₅ = I_{[1/2,3/4]}, X₆ = I_{[3/4,1]}, X₇ = I_{[0,1/8]}, X₈ = I_{[1/8,1/4]}, ..., X₁₄ = I_{[7/8,1]}, X₁₅ = I_{[0,1/16]}, etc.


Note that X_n does not converge to zero at any point in [0, 1] even though E[|X_n − 0|^p] = E[X_n] → 0 as n → ∞ for any p > 0.
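A quick numerical sketch of this sliding-indicator scheme (the indexing below matches the list above; the code itself is not from the text): each fixed ω is covered once per block of intervals, so X_n(ω) = 1 infinitely often, while the interval lengths, and hence E[X_n], shrink to zero.

```python
def interval(n):
    # Block k (k >= 1) holds the 2**k intervals of length 2**-k,
    # occupying indices n = 2**k - 1, ..., 2**(k+1) - 2.
    k = (n + 1).bit_length() - 1
    j = n - (2**k - 1)             # position of X_n within its block
    return (j / 2**k, (j + 1) / 2**k)

omega = 0.3                        # any fixed point of [0, 1]
hits = [n for n in range(1, 2**12 - 1)
        if interval(n)[0] <= omega <= interval(n)[1]]
print(len(hits))                   # one hit per block: X_n(omega) = 1 infinitely often
print(interval(1000)[1] - interval(1000)[0])  # lengths (= E[X_n]) tend to 0
```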

5.11. Consider the probability space given by (0, 1), the Borel subsets of (0, 1), and Lebesgue measure. Define a sequence of random variables on this space by setting X_n(ω) = 2ⁿ I_{(0,1/n)}(ω) for n ∈ ℕ. Note that X_n converges pointwise to zero as n → ∞. However,
$$E[|X_n - 0|^p] = E[X_n^p] = \int_0^{1/n} 2^{np}\,d\omega = \frac{2^{np}}{n},$$
which goes to ∞ as n → ∞ for every p > 0. Thus, the X_n's do not converge to zero in L_p.

5.12. Since X is a random variable on (Ω, F, P) it must be F-measurable, and thus satisfies the first property in the definition of E[X|F]. Further, X trivially satisfies the second property of that definition. Thus, E[X|F] = X a.s.

5.13. Since E[X] is a constant it is measurable with respect to any σ-algebra and thus satisfies the first property in the definition of E[X|{∅, Ω}]. Further, note that
$$\int_{\Omega} E[X]\,dP = E[X] = \int_{\Omega} X\,dP,$$
and that any integral over ∅ is zero. Thus, E[X] satisfies the second property in the definition of E[X|{∅, Ω}]. We conclude that E[X|{∅, Ω}] = E[X] a.s. Note, however, that this equality actually holds pointwise since the only null set in {∅, Ω} is the empty set.

5.14. Note that
$$E[XY] = E[E[XY|Y]] = E[Y E[X|Y]] = E[Y E[X]] = E[X]E[Y].$$


8.2 Solutions to Problems

1.1. If ⋃_{t∈ℝ} A_t is empty then it follows immediately that ⋃_{t∈ℝ} A_t ⊂ ⋃_{n∈ℕ} A_n. Assume then that ⋃_{t∈ℝ} A_t is not empty and let x ∈ ⋃_{t∈ℝ} A_t. Then there exists some y ∈ ℝ such that x ∈ A_y. Let m be any positive integer such that m > y and note that by assumption A_y ⊂ A_m. Hence, x ∈ A_m and x ∈ ⋃_{n∈ℕ} A_n. Thus, ⋃_{t∈ℝ} A_t ⊂ ⋃_{n∈ℕ} A_n.

If ⋃_{n∈ℕ} A_n is empty then it follows immediately that ⋃_{n∈ℕ} A_n ⊂ ⋃_{t∈ℝ} A_t. Assume that ⋃_{n∈ℕ} A_n is not empty and let x ∈ ⋃_{n∈ℕ} A_n. Then, x ∈ ⋃_{t∈ℝ} A_t since ℕ ⊂ ℝ. Hence, ⋃_{n∈ℕ} A_n ⊂ ⋃_{t∈ℝ} A_t and we conclude that in fact the two sets are equal.

1.2. Not necessarily. Simply let A be a subset of Ω that is not in F. Then, A is not in G and hence G is not a σ-algebra on A.

1.3. Let B denote the set of positive, even integers and define f : ℤ → B via f(0) = 2, f(n) = 4n if n ∈ ℕ, and f(n) = 4|n| + 2 if −n ∈ ℕ. Since f is bijective it follows that B and ℤ are equipotent.

1.4. To begin, we will show that an intersection of σ-algebras on Ω is itself a σ-algebra on Ω. Consider a nonempty set Ω and a nonempty set Λ. For each λ ∈ Λ assume that F_λ is a σ-algebra on Ω and let M = ⋂_{λ∈Λ} F_λ. Note that Ω ∈ F_λ for each λ ∈ Λ since F_λ is a σ-algebra on Ω for each λ ∈ Λ. Hence, Ω ∈ M. Next, let A ∈ M and note that A ∈ F_λ for each λ ∈ Λ. Hence, A^c ∈ F_λ for each λ ∈ Λ since F_λ is a σ-algebra on Ω for each λ ∈ Λ. Thus, A^c ∈ M and we see that M is closed under complementation. Finally, let A_n ∈ M for each n ∈ ℕ and note that A_n ∈ F_λ for each λ ∈ Λ and each n ∈ ℕ. Hence, ⋃_{n∈ℕ} A_n ∈ F_λ for each λ ∈ Λ since each F_λ is closed under countable unions. Thus, since this union must also be in M, it follows that M is closed under countable unions. Combining these three results we see that M is itself a σ-algebra on Ω.

Now, returning to the problem, let C denote the family of all σ-algebras on Ω that contain each element in F. Note that C is not empty since P(Ω) ∈ C. Let M denote the σ-algebra on Ω given by the intersection of all of the σ-algebras in C. Note that M contains every element in F. Further, assume that L is another σ-algebra on Ω that contains every element in F. Since L ∈ C it follows that M ⊂ L. Thus, M is the smallest σ-algebra on Ω that contains every element in F. That is, M = σ(F).

To show that M is unique, assume that M₁ = σ(F) and that M₂ = σ(F). By definition of σ(F) it follows that M₁ ⊂ M₂ and that M₂ ⊂ M₁. Hence, we conclude that M₁ = M₂; that is, σ(F) is the unique such σ-algebra on Ω.

1.5. The set F is an algebra on Ω but need not be a σ-algebra on Ω. Since Ω^c = ∅ is finite it follows that Ω is cofinite and hence that Ω ∈ F. If A ∈ F then A is either finite or cofinite and hence A^c is either cofinite or finite. In either case, A^c ∈ F. Finally, let A and B be elements of F. If A and B are each finite then A ∪ B is finite and hence is an element of F. If either A or B is cofinite then either A^c or B^c must be finite, which implies that (A ∪ B)^c = A^c ∩ B^c is finite and hence that (A ∪ B)^c is in F. Since F is closed under complementation it follows that A ∪ B ∈ F and hence that F is an algebra on Ω. To see that F need not be a σ-algebra, let Ω = ℝ and let A_n = {n} for each n ∈ ℕ. Note that A_n is finite for each n and hence is an element of F for each n. However, ⋃_{n∈ℕ} A_n = ℕ, and ℕ is neither finite nor cofinite. Thus, since F is not closed under countable unions, it follows that F is not a σ-algebra on Ω.

1.6. If y ∈ f(f⁻¹(A)) then y = f(x) for some x ∈ f⁻¹(A). If x ∈ f⁻¹(A) then f(x) ∈ A. Thus, since y ∈ A, we conclude that f(f⁻¹(A)) is a subset of A. If x ∈ B then f(x) ∈ f(B) and hence x ∈ f⁻¹(f(B)). Thus, B is a subset of f⁻¹(f(B)).

1.7. [(1) ⇒ (2)] If y ∈ f(A) ∩ f(B) then there exist a ∈ A and b ∈ B such that y = f(a) = f(b). Since f is one-to-one it follows that a = b ∈ A ∩ B and hence that y ∈ f(A ∩ B). Further, if A ∩ B ≠ ∅ and y ∈ f(A ∩ B) then there exists some point z ∈ A ∩ B such that y = f(z). Since z ∈ A and z ∈ B it follows that y ∈ f(A) ∩ f(B). Thus, it follows that f(A ∩ B) = f(A) ∩ f(B).

[(2) ⇒ (3)] This part is obvious since f(∅) = ∅.


[(3) ⇒ (1)] Let f(a) = f(b). If a ≠ b then {a} and {b} are disjoint yet f({a}) ∩ f({b}) is equal to {f(a)}, which is not empty. Hence f is one-to-one.

1.8. Assume that f is onto and that B ⊂ Y. If b ∈ B then there exists some a ∈ X such that f(a) = b and hence such that a ∈ f⁻¹(B). Thus, b = f(a) ∈ f(f⁻¹(B)). This and Problem 1.6 imply that f(f⁻¹(B)) = B.

Next, assume that f(f⁻¹(B)) = B for every subset B of Y. If y ∈ Y then f(f⁻¹({y})) is equal to {y}, which implies that f⁻¹({y}) is not empty. Thus, f is onto.

1.9. Assume that the set S is countable and let the sequence {a₁, a₂, ...} denote the elements in S. Construct a sequence β of 0's and 1's as follows: let the n-th term in β be 0 if the n-th term in a_n is 1 and let the n-th term in β be 1 otherwise. Note that β is an element of S yet is different from a_n for each n ∈ ℕ. This contradiction implies that the set S is not countable.

1.10. Fix n ∈ ℕ and note that every polynomial p(x) = a₀ + a₁x + ⋯ + a_nxⁿ with integer coefficients is uniquely determined by the point (a₀, a₁, ..., a_n) from the countable set ℤ^{n+1}. Thus, the set P of all such polynomials is countable and we may list the elements of P as a sequence {p₁, p₂, ...}. The fundamental theorem of algebra implies that the set A_k = {x ∈ ℝ : p_k(x) = 0} is a finite set for each k. Since a countable union of finite sets is countable, it follows that the set of all algebraic numbers is countable.

1.11. A point x ∈ ℝ is said to be a point of condensation of a subset E of ℝ if every open interval containing x contains uncountably many elements of E. To begin, we will show that any uncountable subset E of ℝ has at least one point of condensation.

Assume that there exists no condensation point of E. Then, for each x ∈ E there exists an open interval I_x such that x ∈ I_x and such that I_x ∩ E is countable. Let J_x be an open interval such that J_x ⊂ I_x, such that x ∈ J_x, and such that J_x has rational endpoints. Note that J_x ∩ E is also countable. Further, the collection of all such intervals J_x is countable and may be enumerated as N₁, N₂, etc. Note that
$$E = \bigcup_{k=1}^{\infty} (N_k \cap E),$$
which implies that E is countable. This contradiction implies that E must have at least one point of condensation.

Now, let E be an uncountable set of positive real numbers and let a be a condensation point of E. If a ≠ 0 then let (α, β) be an open interval containing a such that α > 0. Let {x_n}_{n∈ℕ} be a sequence of distinct points in (α, β) ∩ E and note that Σ_{n=1}^∞ x_n = ∞ since x_n > α for each n. If a = 0 then, since (0, β) = ⋃_{k=1}^∞ (1/k, β), it follows that some interval of the form (1/k, β) contains uncountably many points of E. From this point, we may proceed as we did when a ≠ 0.

1.12. Since Ω ∈ F we see that F is closed under complementation; that is, if A ∈ F then Ω \ A = A^c ∈ F. Now, let A ∈ F and B ∈ F. Then B^c ∈ F and hence A \ B^c = A ∩ B ∈ F. Thus, F is closed under finite intersections. De Morgan's Law thus implies that F is also closed under finite unions.

1.13. Recall that B(ℝ) is the smallest σ-algebra on ℝ containing all bounded open intervals. Let Q be the collection of all bounded open intervals of ℝ with rational endpoints and note that Q is countable. Further, note that σ(Q) is a subset of B(ℝ) since Q is a subset of the collection of all bounded open intervals. Assume that σ(Q) is a proper subset of B(ℝ). Then there must exist an open interval (x, y) that is not an element of σ(Q) since B(ℝ) is the smallest σ-algebra containing all such intervals. Let {x_n}_{n∈ℕ} and {y_n}_{n∈ℕ} be sequences of rational numbers such that x_n ↓ x and y_n ↑ y with x_n < y_n for each n ∈ ℕ. Note that since (x, y) = ⋃_{n=1}^∞ (x_n, y_n) it follows that (x, y) ∈ σ(Q). This contradiction implies that σ(Q) = B(ℝ), and thus we see that B(ℝ) is countably generated.

1.14. Consider the σ-algebra F given by the countable and cocountable subsets of ℝ. Note that F contains every singleton subset of ℝ. Assume that there exists a σ-algebra Q such that Q contains every singleton subset of ℝ and such that Q is a proper subset of F. Let F ∈ F with F ∉ Q. Note that F can be written as a countable union of singleton sets or as a complement of such a countable union. Thus, F ∈ Q. This contradiction implies that F must be the smallest σ-algebra containing all singleton sets.

1.15. To begin, note that A \ B is uncountable. Let C be a countable subset of A \ B. Enumerate the elements of B and C such that B = {b₁, b₂, ...} and C = {c₁, c₂, ...}. Finally, consider a function f : A \ B → A via
$$f(x) = \begin{cases} x & \text{if } x \notin C \\ b_n & \text{if } x = c_{2n} \\ c_n & \text{if } x = c_{2n-1}. \end{cases}$$
Note that f is onto and one-to-one. Thus, we conclude that A and A \ B are equipotent.

1.16. Let f : (0, 1] → [0, ∞) via f(x) = (1 − x)/x. Let y ∈ [0, ∞) and note that f(1/(1 + y)) = y. Thus, since 1/(1 + y) ∈ (0, 1], we see that f is onto. Next, let a, b ∈ (0, 1] with a ≠ b. Since (1 − a)/a ≠ (1 − b)/b, we see that f is one-to-one. Thus, we conclude that (0, 1] and [0, ∞) are equipotent.

1.17. No. Assume that M is a countably infinite σ-algebra on a nonempty set Ω and, for each ω ∈ Ω, let
$$A_\omega = \bigcap_{\{M \in \mathcal{M} \,:\, \omega \in M\}} M.$$
Note that there are at most countably many distinct A_ω's since M is countable. If there are only finitely many A_ω's then M is finite, which contradicts our assumption. However, if there are countably infinitely many distinct A_ω's then M must be uncountable. To see why this last point holds, consider an enumeration of the elements of M as {M₁, M₂, ...}, consider an enumeration of the distinct A_ω's as {A₁, A₂, ...}, and define
$$N_j = \begin{cases} A_j & \text{if } A_j \not\subset M_j \\ \varnothing & \text{if } A_j \subset M_j. \end{cases}$$
Note that N = ⋃_{j=1}^∞ N_j is different from M_j for every j, and hence we conclude that M is not countable.

2.1. No. Let A_n = {n} for each n ∈ ℕ. Then ℕ = ⋃_{n=1}^∞ A_n and the A_n's are disjoint, but μ(ℕ) = ∞ ≠ Σ_{n=1}^∞ μ(A_n) = 0.


2.2. Consider the measure space (ℝ, B(ℝ), λ) where λ is Lebesgue measure. Let A_n = (n, ∞) for each n ∈ ℕ and note that the A_n's comprise a strictly decreasing sequence of Borel sets. Further, the sequence converges to the empty set, since given any real number x there exists an integer m such that x ∉ A_n for any n > m; i.e., lim sup A_n = ∅. However, λ(A_n) = ∞ for each n ∈ ℕ and hence λ(A_n) ↛ 0 as n → ∞.

(What would happen if we required the measure μ to be a finite measure?)

2.3. Let U = {(x, y) ∈ ℝ² : x² + y² < 1} and recall that lim inf A_n = ⋃_{n=1}^∞ ⋂_{k=n}^∞ A_k. Assume that (x, y) ∈ lim inf A_n. Then there exists some n ∈ ℕ such that (x, y) ∈ ⋂_{k=n}^∞ A_k. Note that (x, y) ∈ ⋂_{k=n}^∞ A_k if and only if
$$\left(x - \frac{(-1)^k}{k}\right)^2 + y^2 < 1$$
for all k ≥ n, since (x, y) ∈ A_k if and only if
$$\left(x - \frac{(-1)^k}{k},\; y\right) \in U.$$
Expanding, it follows that
$$x^2 - \frac{2x(-1)^k}{k} + \frac{1}{k^2} + y^2 < 1$$
for all k ≥ n. Assume that x² + y² ≥ 1. Then it follows that
$$\frac{2x(-1)^k}{k} > \frac{1}{k^2}$$
for all k ≥ n, and hence that 2x(−1)^k > 1/k for all k ≥ n. This last result, however, cannot be true since the left hand side alternates sign (or is zero) while the right hand side is always positive. Thus we conclude that x² + y² must be less than 1. Hence (x, y) ∈ U and thus lim inf A_n ⊂ U.


Now, assume that (x, y) ∈ U, let ε = 1 − (x² + y²), and note that ε > 0. Further, note that
$$\left(x - \frac{(-1)^k}{k}\right)^2 + y^2 = x^2 - \frac{2x(-1)^k}{k} + \frac{1}{k^2} + y^2 \le x^2 + \frac{2|x|}{k} + \frac{1}{k^2} + y^2 \le x^2 + \frac{2}{k} + \frac{1}{k} + y^2 = x^2 + y^2 + \frac{3}{k} = 1 - \varepsilon + \frac{3}{k}$$
since (x, y) ∈ U and k ∈ ℕ. Thus, we see that
$$\left(x - \frac{(-1)^k}{k}\right)^2 + y^2 < 1$$
if 3/k ≤ ε, that is, if k ≥ 3/ε. Thus, for n ≥ 3/ε it follows that (x, y) ∈ A_k for all k ≥ n. Hence, U ⊂ lim inf A_n, which combined with the earlier result implies that lim inf A_n = U.

Let S = {(x, y) ∈ ℝ² : x² + y² ≤ 1} \ {(0, 1), (0, −1)}. Recall that lim sup A_n = ⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k and assume that (x, y) ∈ lim sup A_n, so that (x, y) ∈ ⋃_{k=n}^∞ A_k for all n ∈ ℕ. We will first show that x² + y² ≤ 1. Let ε = x² + y² − 1 and assume that ε > 0. Note that (x, y) ∈ A_k if and only if
$$\left(x - \frac{(-1)^k}{k}\right)^2 + y^2 < 1,$$
which is true if and only if
$$\varepsilon < \frac{2x(-1)^k}{k} - \frac{1}{k^2}.$$
That is, (x, y) ∈ A_k if and only if
$$\varepsilon < \frac{2x}{k} - \frac{1}{k^2}$$
if k is even, and if and only if
$$\varepsilon < \frac{-2x}{k} - \frac{1}{k^2}$$
if k is odd. Assume that x ≥ 0. Then since ε > 0 we see that (x, y) ∉ A_k for any odd value of k. Let n be an integer such that n > 2(x + 1)/ε and let k be any even integer not less than n. Since (x, y) ∈ lim sup A_n, there exists an (even) integer m such that m ≥ n and such that
$$\varepsilon < \frac{2x}{m} - \frac{1}{m^2}.$$
From this we conclude that
$$x > \frac{m\varepsilon}{2} + \frac{1}{2m} > \frac{m\varepsilon}{2} \ge \frac{n\varepsilon}{2}.$$
Recall, however, that n > 2(x + 1)/ε. Hence, nε/2 > x + 1 and yet nε/2 ≤ x. This contradiction implies that ε cannot be positive when x ≥ 0. A similar procedure shows that ε cannot be positive when x ≤ 0. Thus, we see that x² + y² ≤ 1 if (x, y) ∈ lim sup A_n. Next, let (x, y) = (0, ±1). Then (x, y) ∉ A_k for any k ∈ ℕ since
$$\left(x - \frac{(-1)^k}{k}\right)^2 + y^2 = \frac{1}{k^2} + 1 > 1$$
for all k ∈ ℕ. Hence, lim sup A_n ⊂ S.

Now, let (x, y) ∈ S and consider x < 0 and k odd. Then
$$\left(x - \frac{(-1)^k}{k}\right)^2 + y^2 = \left(x + \frac{1}{k}\right)^2 + y^2.$$
Note that if k > 1/|x| then, since x < 0, it follows that x + (1/k) < 0 and |x + (1/k)| < |x|. Hence, if k > 1/|x| and if k is odd then
$$\left(x + \frac{1}{k}\right)^2 + y^2 < x^2 + y^2 \le 1.$$
Thus, for any n ∈ ℕ we can find some k ≥ n such that (x, y) ∈ A_k if x < 0. Hence, (x, y) ∈ ⋃_{k=n}^∞ A_k for all n ∈ ℕ if x < 0. A similar argument shows that (x, y) ∈ ⋃_{k=n}^∞ A_k for all n ∈ ℕ if x > 0. Finally, if (0, y) ∈ S then (0, y) ∈ U = lim inf A_n. Since lim inf A_n ⊂ lim sup A_n we thus see that (0, y) ∈ lim sup A_n. Hence, we conclude that S ⊂ lim sup A_n. Combined with our earlier result we see that lim sup A_n = S.


2.4. Let C be a subset of ℝ that is not a real Borel set. Let A_t = {t} for each t ∈ ℝ and note that λ(A_t) = 0 for each t ∈ ℝ. Let the index set I be given by C. Then ⋃_{t∈I} A_t = C ∉ B(ℝ). That is, an arbitrary union of null sets need not be a measurable set.

Next, let C be a real Borel set such that λ(C) > 0, and again let the index set I be given by C. Then ⋃_{t∈I} A_t = C. That is, even when an arbitrary union of null sets is measurable it need not be a null set.

2.5. Let A be a countable subset of ℝ and note that, since A is countable, we may express A as a countable union of singleton sets; that is, A = ⋃_{n=1}^∞ {a_n} where a_n ∈ ℝ for each n. Recall that singleton subsets of ℝ are Borel sets. Thus A, as a countable union of Borel sets, must also be a Borel set. Since the Borel sets are a subset of the Lebesgue sets, we conclude that A is Lebesgue measurable. Let m denote Lebesgue measure on the real line and the Lebesgue measurable subsets of the real line. By countable subadditivity (or countable additivity if the a_n's are distinct) we see that m(A) must be zero since m({a_n}) = 0 for each n.

2.6. Let A be a subset of ℝ such that A ∉ B(ℝ). Define a function f : ℝ → ℝ via f(x) = 2I_A(x) − 1. Note that f is not Borel measurable since f⁻¹({1}) = A ∉ B(ℝ). However, |f| ≡ 1 is Borel measurable.

2.7. Let L be the collection of all sets A ∈ σ(P) such that P₁(A) = P₂(A). Note that Ω ∈ L since P₁ and P₂ are probability measures. Further, if A ∈ L then A^c ∈ L since P₁(A^c) = 1 − P₁(A) = 1 − P₂(A) = P₂(A^c). Finally, if A_n ∈ L for each n ∈ ℕ and if the A_n's are disjoint then ⋃_{n∈ℕ} A_n ∈ L since
$$P_1\!\left(\bigcup_{n\in\mathbb{N}} A_n\right) = \sum_{n=1}^{\infty} P_1(A_n) = \sum_{n=1}^{\infty} P_2(A_n) = P_2\!\left(\bigcup_{n\in\mathbb{N}} A_n\right).$$


Thus, L is a λ-system. By assumption, P ⊂ L and P is a π-system. Thus, the π-λ theorem implies that σ(P) ⊂ L.

3.1. Note that f_n(0) = 0 for all n ∈ ℕ, and let x ∈ (0, 1]. Choose n ∈ ℕ such that 1/n < x and note that f_m(x) = 0 for all m ≥ n. Hence, lim_{n→∞} f_n(x) = 0 for all x ∈ [0, 1]. Since ∫₀¹ f_n(x) dx = 1 for all n, it follows that lim_{n→∞} ∫₀¹ f_n(x) dx = 1. The final integral, ∫₀¹ lim_{n→∞} f_n(x) dx, is zero since the integrand is zero.

3.2. Clearly, μ maps P(Ω) into [0, ∞]; indeed, it maps it into the set {0, 1, 2, 3, 4}. Further, μ(∅) = 0 since ∅ contains zero elements. Finally, if A and B are disjoint sets then μ(A ∪ B) is simply μ(A) + μ(B), the number of points in A ∪ B. Next, note that
$$\int f\,d\mu = \int_{\{1\}} f\,d\mu + \int_{\{2\}} f\,d\mu + \int_{\{3\}} f\,d\mu + \int_{\{4\}} f\,d\mu = f(1)\mu(\{1\}) + f(2)\mu(\{2\}) + f(3)\mu(\{3\}) + f(4)\mu(\{4\}) = 1 \times 1 + 4 \times 1 + 9 \times 1 + 16 \times 1 = 30.$$

3.3. Recall the integration by parts theorem for Riemann-Stieltjes integrals. Since F is continuous and of bounded variation, it follows that the integral exists. Thus, integrating by parts we see that
$$\int_a^b F(x)\,dF(x) = (F(b))^2 - (F(a))^2 - \int_a^b F(x)\,dF(x).$$
Since F(b) = 1 and F(a) = 0, we see that
$$\int_a^b F(x)\,dF(x) = \frac12.$$

3.4. Let f be a probability density function associated with F. Then
$$\int_{-\infty}^{\infty} (F(x + c) - F(x))\,dx = \int_{-\infty}^{\infty}\int_x^{x+c} f(t)\,dt\,dx = \int_{-\infty}^{\infty}\int_{t-c}^{t} dx\,f(t)\,dt = c\int_{-\infty}^{\infty} f(t)\,dt = c.$$


For an alternate solution, note that
$$E[X] = \int_0^{\infty} P(X > t)\,dt - \int_{-\infty}^{0} P(X < t)\,dt.$$
Thus,
$$c = E[X - (X - c)] = E[X] - E[X - c] = \int_0^{\infty} (1 - F(t))\,dt - \int_{-\infty}^{0} F(t)\,dt - \int_0^{\infty} (1 - F(t + c))\,dt + \int_{-\infty}^{0} F(t + c)\,dt$$
$$= \int_0^{\infty} (F(t + c) - F(t))\,dt + \int_{-\infty}^{0} (F(t + c) - F(t))\,dt = \int_{-\infty}^{\infty} (F(t + c) - F(t))\,dt.$$

3.5. Using Lemma 5.3 on page 96, it follows that we should choose
$$g(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x > 0. \end{cases}$$
Note that g is not differentiable at the origin, and hence the typical engineering appeal to "derivatives of the step function" is nonsensical.

4.1. No. Consider the three real numbers 1, 2, and 3. Note that d(1, 3) = 4 but d(1, 2) = 1 and d(2, 3) = 1. Thus, we see that d(1, 3) > d(1, 2) + d(2, 3). Hence, d does not satisfy the triangle inequality and consequently cannot be a metric.

4.2. Consider the metric ρ defined on the positive integers ℕ via ρ(n, m) = |n − m|. Notice that the open ball B(1, 1) = {1}, while the closed ball B̄(1, 1) = {1, 2}. Further, the closure of B(1, 1) is equal to {1}. Thus, the closure of B(1, 1) is a proper subset of the closed ball B̄(1, 1).

4.3. Let a = 0, let b = 1, and let f_n(t) = n²te^{−nt}. Clearly this sequence converges to zero pointwise as n → ∞. However, note that f_n(t) has a maximum at t = 1/n and that f_n(1/n) = n/e. Thus, we see that although f_n(t) → 0 as n → ∞, d(f_n, 0) → ∞ as n → ∞.


4.4. For each positive integer n, let x_n be a rational number in the interval (√2 − (1/n), √2). Note that d(x_n, x_m) < (1/n) + (1/m). Hence, we see that {x_n}_{n∈ℕ} is a Cauchy sequence in ℚ. However, there is no element in ℚ to which x_n converges. Since we have found a Cauchy sequence in ℚ that does not converge to a point in ℚ, we see that the rational line is not complete.

5.1. The polynomial x² + 2Bx + C has real roots if and only if B² − C ≥ 0. Thus, we are seeking the probability that B² ≥ C. This probability is given by
$$P(B^2 \ge C) = \int_0^1 \int_0^{b^2} f_{B,C}(b, c)\,dc\,db = \int_0^1 b^2\,db = \frac13.$$
That is, the polynomial has real roots with probability 1/3.

5.2. Note that
$$P(0.3 \le \sqrt{X} < 0.4) = P(0.09 \le X < 0.16) = 0.16 - 0.09 = 0.07.$$
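The same number falls out of a quick simulation (assuming X is uniform on (0, 1), which is what the computation above uses):

```python
import math
import random

random.seed(0)
trials = 10**6
hits = sum(0.3 <= math.sqrt(random.random()) < 0.4 for _ in range(trials))
print(hits / trials)  # ~ 0.07
```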

5.3. Note first that F_X⁻¹ : (0, 1) → ℝ exists and is strictly increasing. Thus, if Z = F_X(X) then it follows that F_Z(z) = P(Z ≤ z) = P(F_X(X) ≤ z) = P(X ≤ F_X⁻¹(z)) = F_X(F_X⁻¹(z)) = z for 0 < z < 1. Thus, Z is uniform on (0, 1).

5.4. Let Y = −ln(F(X)) and, as above, note that F_Y(y) = P(Y ≤ y) = P(−ln(F(X)) ≤ y) = P(ln(F(X)) ≥ −y) = P(F(X) ≥ exp(−y)) = P(X ≥ F⁻¹(exp(−y))) = 1 − P(X ≤ F⁻¹(exp(−y))) = 1 − F(F⁻¹(exp(−y))) = 1 − exp(−y) for y ≥ 0, where the sixth equality follows from the continuity of the indicated distribution function. Thus, f_Y(y) = exp(−y) for y ≥ 0 and is zero for y < 0.

5.5. Note that F_Z(z) = P(Z ≤ z) = P(X ≤ z, Y ≤ z) = F_{X,Y}(z, z). Also, note that F_W(w) = P(W ≤ w) = 1 − P(W > w) = 1 − P(X > w, Y > w) = P(X ≤ w) + P(Y ≤ w) − P(X ≤ w, Y ≤ w) = F_X(w) + F_Y(w) − F_{X,Y}(w, w).

5.6. Assume that X and Y have a joint probability distribution function given by F. Note that P(x₁ < X ≤ x₂, y₁ < Y ≤ y₂) = P(X ≤ x₂, Y ≤ y₂) − P(X ≤ x₁, Y ≤ y₂) − P(X ≤ x₂, Y ≤ y₁) + P(X ≤ x₁, Y ≤ y₁) = F(x₂, y₂) − F(x₁, y₂) − F(x₂, y₁) + F(x₁, y₁) ≥ 0. Thus, G is not a distribution function since G(2, 2) − G(0, 2) − G(2, 0) + G(0, 0) = 1 − 1 − 1 + 0 = −1 < 0.

5.7. Let C be the disc of radius one centered at the origin. Note that
$$f_X(x) = \int_{\mathbb{R}} \frac{1}{\pi} I_C(x, y)\,dy = \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} \frac{1}{\pi}\,dy = \frac{2}{\pi}\sqrt{1 - x^2}$$
for −1 < x < 1.

6.1. To begin, note that the three sets A ∩ B, A ∩ B^c, and B ∩ A^c partition A ∪ B. Thus, by countable additivity it follows that P(A ∪ B) = P(A ∩ B) + P(A ∩ B^c) + P(B ∩ A^c), which implies that P(A^c ∩ B^c) = 1 − P(A)P(B) − P(A ∩ B^c) − P(B ∩ A^c), where we have used De Morgan's Law and the fact that A and B are independent. Note that since A and B are independent and since A ∩ B^c and A ∩ B partition A, it follows that P(A ∩ B^c) = P(A) − P(A ∩ B) = P(A)(1 − P(B)) = P(A)P(B^c). Similarly, it follows that P(B ∩ A^c) = P(B)P(A^c). Substituting, we see that P(A^c ∩ B^c) = P(A^c)P(B^c), which implies that A^c and B^c are independent.

6.2. Note that a circle with unit area has radius r = 1/√π. Assume that the dart board is the circle of unit area centered at the origin in ℝ². Note that P(X ∈ [r/√2, r]) and P(Y ∈ [r/√2, r]) are each positive since the dart's final resting place is determined by a uniform distribution over the area of the board. However, P(X ∈ [r/√2, r], Y ∈ [r/√2, r]) is zero since the region in question is outside of the circle. Thus, X and Y are not independent.

6.3. No, since P(A), P(B), and P(C) each equal 1/2 yet P(A ∩ B ∩ C) is equal to zero.

6.4. Let h denote the number of keystrokes required to type Hamlet. Let Ω = {ω₁, ..., ω_m} denote the m different characters that the typewriter is able to produce. The monkey's output may be thought of as a sequence of experiments where the outcome of each experiment is an element of Ω^h. The probability of each possible outcome is simply the product of the probabilities of the keystrokes required to produce it. Let p_i denote the probability of α_i ∈ Ω^h where 1 ≤ i ≤ m^h. Note that p_i is positive for each i and that the text of the play Hamlet corresponds to α_j for some integer j. We may model the situation as follows: repeatedly toss an m^h-sided die where the i-th side of the die appears on top with probability p_i. Our question then is on how many tosses will the j-th side appear on top. Since each side comes up with positive probability and since the tosses are made independently, the second Borel-Cantelli lemma implies that the j-th side (and, indeed, each side) will with probability one appear infinitely many times.

6.5. No, since
$$f_X(x) = \int_x^{\infty} 2e^{-x}e^{-y}\,dy = 2e^{-2x};\quad x \ge 0,$$
and since
$$f_Y(y) = \int_0^y 2e^{-x}e^{-y}\,dx = 2e^{-y}(1 - e^{-y});\quad y \ge 0,$$
and thus f(x, y) ≠ f_X(x)f_Y(y).

6.6. To begin, note that the area of the disc is 25π and the area of the disc with those points removed that are less than one mile from the center is 24π. Thus, the probability that there are no hits within one mile of the target after N shots is (24/25)^N. Hence, the probability that there is at least one hit within a mile of the target after N shots is equal to 1 − (24/25)^N. This probability exceeds 0.95 when N ≥ 74.
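The cutoff N = 74 is easy to confirm directly:

```python
# Smallest N with 1 - (24/25)**N >= 0.95.
N = 1
while 1 - (24 / 25)**N < 0.95:
    N += 1
print(N, 1 - (24 / 25)**N)  # 74 0.9512...
```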

7.1. Recall that σ(X) = X⁻¹(B(ℝ)). Let A be a real Borel set. If 87 ∈ A then X⁻¹(A) = Ω since X(ω) = 87 for each ω ∈ Ω. Similarly, if 87 ∉ A then X⁻¹(A) is empty. Thus, σ(X) = {∅, Ω}.

7.2. If A is a real Borel set such that A ⊂ (−∞, 0) then X⁻¹(A) = ∅. Further, if A is any real Borel set then X⁻¹(A) = X⁻¹(B) where B = A ∩ [0, ∞). If A is a real Borel set such that A ⊂ [0, ∞) then let √A denote the set {√x : x ∈ A} and let −√A denote the set {−√x : x ∈ A}. For such a set A it then follows that X⁻¹(A) = √A ∪ (−√A), and hence σ(X) consists of all sets of this form. But any real Borel set B ⊂ [0, ∞) may be written as √C for the set C = {x² : x ∈ B}. Thus, σ(X) consists of all sets of the form B ∪ (−B) where B ⊂ [0, ∞) is a real Borel set.

7.3. Assume that X is not equal to Y a.s. Then there must exist a set of positive probability on which X − Y is not zero. Hence, there exists a set of positive probability on which (X − Y)² is positive. But, if (X − Y)² is positive on a set of positive probability then E[(X − Y)²] cannot be equal to zero. This contradiction implies that X must equal Y a.s.

7.4. Note that E[X] = 0 and that
$$\mathrm{VAR}[X] = \frac12 \int_{\mathbb{R}} x^2 e^{-|x|}\,dx = 2.$$
Thus, Chebyshev implies that P(|X| > 2) ≤ 1/2. Note, however, that
$$P(|X| > 2) = 1 - \frac12\int_{-2}^{2} e^{-|x|}\,dx = e^{-2} \approx 0.135.$$
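A simulation of this Laplace density (exponential magnitude with a random sign) shows how loose the Chebyshev bound is here:

```python
import math
import random

random.seed(0)
trials = 10**6
count = sum(abs(random.expovariate(1.0) * random.choice((-1, 1))) > 2
            for _ in range(trials))
print(count / trials, math.exp(-2))  # both ~ 0.135, versus the bound 0.5
```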

7.5. Recall that σ(Y) = Y⁻¹(B(ℝ)). Let B ∈ B(ℝ) and note that Y⁻¹(B) = X⁻¹(g⁻¹(B)). Since g is Borel measurable it follows that g⁻¹(B) ∈ B(ℝ) and hence that X⁻¹(g⁻¹(B)) ∈ σ(X). Equality will occur when any Borel set B may be written as g⁻¹(A) for some A ∈ B(ℝ). (For example, if g is bijective.)

7.6. Let A and B be nonempty sets, let f : A → B, and let G be a collection of subsets of B. To begin, we will show that f⁻¹(σ_B(G)) = σ_A(f⁻¹(G)) where, for a nonempty set M and a collection M of subsets of M, σ_M(M) denotes the smallest σ-algebra on M that contains every set in M.

Recall that if {B_i : i ∈ I} is a collection of subsets of B then f⁻¹(⋃_{i∈I} B_i) = ⋃_{i∈I} f⁻¹(B_i) and f⁻¹(⋂_{i∈I} B_i) = ⋂_{i∈I} f⁻¹(B_i). That is, intersections and inverses commute and unions and inverses commute. Let G₁ = {G ⊂ B : G ∈ G or G^c ∈ G}. Further, for each positive integer i > 1, let G_i denote the set of all countable unions and all countable intersections of elements in ⋃_{j<i} G_j. Note that ⋃_{j<∞} G_j is a σ-algebra and, in fact, is


equal to σ(G). Further, f⁻¹(σ(G)) = ⋃_{i<∞} f⁻¹(G_i). Also, since f⁻¹(G_i) is the set of all countable unions and countable intersections of elements in ⋃_{j<i} f⁻¹(G_j), it follows that ⋃_{i<∞} f⁻¹(G_i) is a σ-algebra. Indeed, it is the σ-algebra generated by f⁻¹(G). Thus, f⁻¹(σ(G)) = σ(f⁻¹(G)).

Consider a random variable X defined on a probability space (Ω, F, P). Now, let S be a countable collection of subsets of ℝ such that B(ℝ) = σ(S). Our first result implies that X⁻¹(σ_ℝ(S)) = σ_Ω(X⁻¹(S)). Note, also, that X⁻¹(σ_ℝ(S)) = X⁻¹(B(ℝ)) = σ_Ω(X). Thus, σ_Ω(X) = σ_Ω(X⁻¹(S)), from which it follows immediately that σ_Ω(X) is countably generated.

7.7. Consider measurable spaces (Ω₁, F₁) and (Ω₂, F₂) and let f be a function mapping Ω₁ to Ω₂. Let A be a collection of subsets of Ω₂ such that σ(A) = F₂. If f⁻¹(A) ⊂ F₁ then f⁻¹(F₂) ⊂ F₁. To see why this holds, recall that complements, unions, and intersections commute with inverses. Thus, the collection Y of all subsets A of Ω₂ such that f⁻¹(A) ∈ F₁ is a σ-algebra on Ω₂. Note that Ω₂ ∈ Y since f⁻¹(Ω₂) = Ω₁. Further, note that A ⊂ Y. This implies that σ(A) ⊂ Y. Since σ(A) = F₂ the desired result follows immediately.

It follows immediately that if X⁻¹(B(ℝ)) ⊂ F then X⁻¹((−∞, x]) ∈ F for each x ∈ ℝ. Further, using the result of the previous paragraph, it follows that X⁻¹(B(ℝ)) ⊂ F if X⁻¹((−∞, x]) ∈ F for each x ∈ ℝ.

7.8. The forward implication is clear. The reverse implication follows quickly via a proof by contradiction.

7.9. Note that
$$E[|X - Y|] = \frac14\int_0^2\int_0^2 |x - y|\,dx\,dy = \frac14\int_0^2\int_y^2 (x - y)\,dx\,dy + \frac14\int_0^2\int_0^y (y - x)\,dx\,dy$$
$$= \frac14\int_0^2\left(2 - 2y + \frac{y^2}{2}\right)dy + \frac14\int_0^2 \frac{y^2}{2}\,dy = \frac14\int_0^2 (y^2 - 2y + 2)\,dy = \frac23.$$

7.10. Note that
$$P(Z \le z) = P(\max(X_1, \ldots, X_n) \le z) = P(X_1 \le z, \ldots, X_n \le z) = P(X_1 \le z)\cdots P(X_n \le z) = \left(\frac{z}{B}\right)^n$$
for 0 < z ≤ B. Thus, it follows that
$$f_Z(z) = \frac{n z^{n-1}}{B^n}$$
for 0 < z < B. Hence,
$$E[Z] = \int_0^B z\,\frac{n z^{n-1}}{B^n}\,dz = \frac{nB}{n+1}.$$
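For 7.10, a quick Monte Carlo check of E[Z] = nB/(n + 1), taking n = 5 and B = 1 as a representative case:

```python
import random

random.seed(0)
n, B, trials = 5, 1.0, 10**5
est = sum(max(random.uniform(0, B) for _ in range(n))
          for _ in range(trials)) / trials
print(est, n * B / (n + 1))  # ~ 0.833 vs exact 5/6
```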

7.11. To begin, recall that
$$\cos(a)\cos(b) = \frac12\cos(a - b) + \frac12\cos(a + b)$$
and
$$\cos(a + b) = \cos(a)\cos(b) - \sin(a)\sin(b).$$
Note also that E[cos(2X)] = 0 and E[sin(2X)] = 0 since C_X(2) = E[cos(2X)] + iE[sin(2X)] = 0. Thus, it follows that
$$E[\cos(X + s)\cos(X + s + 1)] = E\left[\frac12\cos(1) + \frac12\cos(2X + 2s + 1)\right] = \frac12\cos(1) + \frac12 E[\cos(2X + 2s + 1)]$$
$$= \frac12\cos(1) + \frac12 E[\cos(2X)\cos(2s + 1) - \sin(2X)\sin(2s + 1)]$$
$$= \frac12\cos(1) + \frac12\cos(2s + 1)E[\cos(2X)] - \frac12\sin(2s + 1)E[\sin(2X)] = \frac12\cos(1).$$

8.1. Note that
$$E[X] = \int_0^{\infty} x\,dF(x) = \int_0^{\infty}\int_0^x dt\,dF(x) = \int_0^{\infty}\int_t^{\infty} dF(x)\,dt = \int_0^{\infty} P(X > t)\,dt.$$

8.2. Since E[(X − m)²] = E[X²] − 2mE[X] + m², it follows that
$$\frac{d}{dm} E[(X - m)^2] = -2E[X] + 2m.$$
Setting this latter expression equal to zero implies that m = E[X] is a critical point. Since
$$\frac{d^2}{dm^2} E[(X - m)^2] = 2 > 0,$$
it follows that m = E[X] minimizes E[(X − m)²].

8.3. Apply the Cauchy-Schwarz inequality to the product XY to see that
$$|E[XY]| \le E[|XY|] \le \sqrt{E[X^2]}\sqrt{E[Y^2]}.$$

8.4. Note that COV[X, Y] = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y] − E[X]E[Y] + E[X]E[Y] = E[XY] − E[X]E[Y].

8.5. Apply the Cauchy-Schwarz inequality to X − E[X] and Y − E[Y] to see that
$$|E[(X - E[X])(Y - E[Y])]| \le \sqrt{E[(X - E[X])^2]}\sqrt{E[(Y - E[Y])^2]} = \sigma_X\sigma_Y,$$
which implies that
$$|\rho(X, Y)| = \frac{|\mathrm{COV}[X, Y]|}{\sigma_X\sigma_Y} \le 1.$$


8.6. For a, b ∈ ℝ let Z = aX − bY. Note that 0 ≤ E[Z²] = a²E[X²] − 2abE[XY] + b²E[Y²]. Note that the right hand side is a quadratic in a that has at most one real root (possibly of multiplicity two). The roots of this expression are given by
$$a = \frac{2bE[XY] \pm \sqrt{4b^2E[XY]^2 - 4E[X^2]b^2E[Y^2]}}{2E[X^2]}.$$
Based upon the previous observation we know that 4b²E[XY]² − 4E[X²]b²E[Y²] ≤ 0 and hence that E[XY]² − E[X²]E[Y²] ≤ 0. Equality holds if and only if E[Z²], as a function of a, has a real root. Thus, equality holds if and only if E[(aX − bY)²] = 0 for some a and b not both equal to zero; that is, if and only if P(aX = bY) = 1 for a and b not both zero. In fact, if ρ(X, Y) = 1 then Y increases linearly with X (almost surely) and if ρ(X, Y) = −1 then Y decreases linearly with X (almost surely).

8.7. It follows quickly that
$$E[Y] = \frac{1}{b - a}\int_a^b y\,dy = \frac{a + b}{2}$$
and that
$$\mathrm{VAR}[Y] = \frac{1}{b - a}\int_a^b y^2\,dy - \left(\frac{a + b}{2}\right)^2 = \frac{(b - a)^2}{12}.$$

8.8. Recall that M_X(t) = exp(λ(e^t − 1)). Thus, M′_X(t) = λe^t exp(λ(e^t − 1)) and M″_X(t) = (λe^t + λ²e^{2t}) exp(λ(e^t − 1)). Thus, E[X] = M′_X(0) = λ and E[X²] = M″_X(0) = λ + λ², which implies that VAR[X] = E[X²] − E[X]² = λ + λ² − λ² = λ. Recall that VAR[aX] = a²VAR[X] for a ∈ ℝ. Thus, by independence we see that VAR[Y] = (16 + 1 + 36 + 9)(3) = 186.

8.9. Note that E[(X − aY)²] = E[X²] − 2aE[XY] + a²E[Y²]. Hence,
$$\frac{d}{da} E[(X - aY)^2] = -2E[XY] + 2aE[Y^2] = 0$$
if a = E[XY]/E[Y²]. Since
$$\frac{d^2}{da^2} E[(X - aY)^2] = 2E[Y^2] > 0,$$
it follows that this choice of a results in a minimum value of E[(X − aY)²].


8.10. Let Z = Σ_{k=1}^n X_k and note that E[X₁Z] = E[X₁²] + (n − 1)μ² = σ² + μ² + (n − 1)μ² = σ² + nμ², that E[X₁] = μ, and that E[Z] = nμ. Thus, COV[X₁, Z] = σ² + nμ² − (μ)(nμ) = σ². Further, VAR[X₁] = σ² and VAR[Z] = nσ². Thus,
$$\rho(X_1, Z) = \frac{\mathrm{COV}[X_1, Z]}{\sqrt{\mathrm{VAR}[X_1]}\sqrt{\mathrm{VAR}[Z]}} = \frac{\sigma^2}{\sigma\,\sigma\sqrt{n}} = \frac{1}{\sqrt{n}}.$$

8.11. Recall that
$$P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}$$
for k = 0, 1, 2, .... Thus,
$$E[t^X] = \sum_{k=0}^{\infty} t^k\,\frac{e^{-\lambda}\lambda^k}{k!} = e^{-\lambda}e^{\lambda t} = \exp(\lambda(t - 1)).$$

8.12. Recall that if Y has a uniform distribution on (0, 1) then
$$\varphi_Y(t) = \frac{e^{it} - 1}{it}.$$
Note that if X = 2Y − 1 then X has a uniform distribution on (−1, 1) and
$$\varphi_X(t) = e^{-it}\varphi_Y(2t) = e^{-it}\left(\frac{e^{2it} - 1}{2it}\right) = \frac{e^{it} - e^{-it}}{2it} = \frac{\sin(t)}{t}.$$
Finally, if S_n = X₁ + ⋯ + X_n then
$$\varphi_{S_n}(t) = \left(\frac{\sin(t)}{t}\right)^n.$$


8.13. Note that Φ(t) is real-valued, and hence that
$$\Phi(t) = E[\cos(tX)] = \int_{-\infty}^{\infty} \cos(tx)f(x)\,dx.$$
Let g(x) = cos(tx)√(f(x)), let h(x) = √(f(x)), and recall that Schwarz's inequality implies that
$$\left(\int_{-\infty}^{\infty} g(x)h(x)\,dx\right)^2 \le \int_{-\infty}^{\infty} g^2(x)\,dx \int_{-\infty}^{\infty} h^2(x)\,dx.$$
Thus,
$$\Phi^2(t) \le \int_{-\infty}^{\infty} \cos^2(tx)f(x)\,dx \int_{-\infty}^{\infty} f(x)\,dx = \frac12\int_{-\infty}^{\infty} (1 + \cos(2tx))f(x)\,dx = \frac12 + \frac12\Phi(2t),$$
from which the desired result follows immediately.

For an alternate solution (that does not require X to possess a density function), note that
$$\Phi(t) = E[\cos(tX)] = E\left[2\cos^2\!\left(\frac{tX}{2}\right) - 1\right] = 2E\left[\cos^2\!\left(\frac{tX}{2}\right)\right] - 1.$$
Thus, Φ(2t) = 2E[cos²(tX)] − 1. Jensen's inequality implies that E[cos²(tX)] ≥ E[cos(tX)]². Thus, Φ(2t) ≥ 2E[cos(tX)]² − 1 = 2Φ²(t) − 1, from which the desired result again follows immediately.

8.14. Note that X is equal to 0, 1, and 2 with probabilities 1/4, 1/2, and 1/4, respectively. Thus,
$$M_X(s) = E[e^{sX}] = \sum_{k=0}^{2} e^{sk}P(X = k) = \frac14 + \frac12 e^s + \frac14 e^{2s} = \frac14(1 + e^s)^2.$$


Note that
$$M_X'(s) = \frac12 e^s + \frac12 e^{2s},$$
that
$$M_X''(s) = \frac12 e^s + e^{2s},$$
and, in general, that
$$M_X^{(n)}(s) = \frac12 e^s + 2^{n-2} e^{2s}.$$
Thus, we see that
$$E[X^n] = M_X^{(n)}(0) = \frac12 + 2^{n-2}$$
for n ∈ ℕ.
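Since X takes only three values, the moment formula can be checked by direct enumeration:

```python
# X takes the values 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
for n in range(1, 6):
    moment = sum(x**n * p for x, p in pmf.items())
    print(n, moment, 0.5 + 2**(n - 2))  # E[X**n] = 1/2 + 2**(n-2)
```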

8.15. To begin, note that
$$M_X(s) = E[e^{sX}] = \frac13(1 + e^s + e^{2s}).$$
Further, note that
$$Z = \begin{cases} 0 & \text{if } X = 0 \text{ and } Y = 0 \\ 0 & \text{if } X = 2 \text{ and } Y = 1 \\ 1 & \text{if } X = 1 \text{ and } Y = 0 \\ 1 & \text{if } X = 0 \text{ and } Y = 1 \\ 2 & \text{if } X = 1 \text{ and } Y = 1 \\ 2 & \text{if } X = 2 \text{ and } Y = 0. \end{cases}$$
From this we see that Z equals 0, 1, and 2 each with probability 1/3, from which it follows that
$$M_Z(s) = E[e^{sZ}] = \frac13(1 + e^s + e^{2s}).$$
Next, note that
$$X + Z = \begin{cases} 0 & \text{if } X = 0 \text{ and } Y = 0 \\ 1 & \text{if } X = 0 \text{ and } Y = 1 \\ 2 & \text{if } X = 1 \text{ and } Y = 0 \\ 2 & \text{if } X = 2 \text{ and } Y = 1 \\ 3 & \text{if } X = 1 \text{ and } Y = 1 \\ 4 & \text{if } X = 2 \text{ and } Y = 0. \end{cases}$$


Thus, we see that X + Z equals 0, 1, 2, 3, and 4 with probabilities 1/9, 2/9, 1/3, 2/9, and 1/9, respectively. Hence,
$$M_{X+Z}(s) = \frac19 + \frac29 e^s + \frac13 e^{2s} + \frac29 e^{3s} + \frac19 e^{4s}.$$
Note also that
$$M_X(s)M_Z(s) = \frac19(1 + e^s + e^{2s})^2 = \frac19 + \frac29 e^s + \frac13 e^{2s} + \frac29 e^{3s} + \frac19 e^{4s} = M_{X+Z}(s)$$
as well. However, X and Z are not independent since P(Z = 0, X = 1) = 0 even though P(Z = 0) and P(X = 1) are each positive.

8.16. The random variables X and Y are uncorrelated since
$$E[XY] = \int_0^{2\pi} \frac{1}{2\pi}\cos(\theta)\sin(\theta)\,d\theta = 0,$$
$$E[X] = \int_0^{2\pi} \frac{1}{2\pi}\cos(\theta)\,d\theta = 0,$$
and
$$E[Y] = \int_0^{2\pi} \frac{1}{2\pi}\sin(\theta)\,d\theta = 0.$$
X and Y are not independent, however, since P(X ∈ [1/√2, 1], Y ∈ [1/√2, 1]) = 0 ≠ P(X ∈ [1/√2, 1])P(Y ∈ [1/√2, 1]).

8.17. Consider uncorrelated random variables X and Y with joint probability distribution P(X = x_i, Y = y_j) = p_{ij} for i, j = 1, 2 and marginal probability distributions P(X = x_i) = p_i for i = 1, 2 and P(Y = y_j) = q_j for j = 1, 2. Note that p₁₁ + p₁₂ + p₂₁ + p₂₂ = 1, p_{i1} + p_{i2} = p_i for i = 1, 2, p_{1j} + p_{2j} = q_j for j = 1, 2, p₁ + p₂ = 1, and q₁ + q₂ = 1.

Note that E[XY] = Σ_{i=1}^2 Σ_{j=1}^2 x_i y_j p_{ij}, E[X] = Σ_{i=1}^2 x_i p_i, and E[Y] = Σ_{j=1}^2 y_j q_j. Since X and Y are uncorrelated it follows that E[XY] − E[X]E[Y] = 0 and hence that x₁y₁(p₁₁ − p₁q₁) + x₁y₂(p₁₂ − p₁q₂) + x₂y₁(p₂₁ − p₂q₁) + x₂y₂(p₂₂ − p₂q₂) = 0. Notice that p₁₂ = p₁ − p₁₁, p₂₁ = q₁ − p₁₁, and p₂₂ = q₂ − p₁₂ = q₂ − p₁ + p₁₁. Substitution yields x₁y₁(p₁₁ − p₁q₁) − x₁y₂(p₁₁ − p₁ + p₁q₂) − x₂y₁(p₁₁ − q₁ + p₂q₁) + x₂y₂(p₁₁ − p₁ + q₂ − p₂q₂) = 0. Next, note that p₁q₁ = p₁ − p₁q₂ = q₁ − p₂q₁ = p₁ − q₂ + p₂q₂. Substituting again implies that (x₁y₁ − x₁y₂ − x₂y₁ + x₂y₂)(p₁₁ − p₁q₁) = 0, or that (x₁ − x₂)(y₁ − y₂)(p₁₁ − p₁q₁) = 0. Since x₁ ≠ x₂ and y₁ ≠ y₂ it follows that p₁₁ = p₁q₁. From this we see that p₁₂ = p₁ − p₁q₁ = p₁(1 − q₁) = p₁q₂, p₂₁ = q₁ − p₁q₁ = q₁(1 − p₁) = p₂q₁, and p₂₂ = p₂ − p₂₁ = p₂ − p₂q₁ = p₂(1 − q₁) = p₂q₂. That is, p_{ij} = p_i q_j for i, j = 1, 2. Thus, X and Y are independent.

9.1. Let f_R denote a uniform density on (a, b). Let X be the length of the circumference and let Y be the area of the circle. It follows that
$$f_X(x) = \frac{1}{2\pi} f_R\!\left(\frac{x}{2\pi}\right) = \frac{1}{2\pi(b - a)}$$
for 2πa < x < 2πb, and
$$f_Y(y) = \frac{1}{2\sqrt{\pi y}} f_R\!\left(\sqrt{\frac{y}{\pi}}\right) = \frac{1}{2\sqrt{\pi y}}\,\frac{1}{b - a}$$
for πa² ≤ y ≤ πb².

9.2. Let Z = XY. Theorem 5.17 on page 103 implies that
$$f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|y|} f_Y(y) f_X\!\left(\frac{z}{y}\right) dy = \frac{1}{\pi\sigma^2}\int_{|z|}^{\infty} y \exp\!\left(\frac{-y^2}{2\sigma^2}\right) \frac{1}{|y|}\,\frac{1}{\sqrt{1 - (z^2/y^2)}}\,dy$$
(since |z/y| < 1 and Y > 0)
$$= \frac{1}{\pi\sigma^2}\exp\!\left(\frac{-z^2}{2\sigma^2}\right)\int_0^{\infty} \exp\!\left(\frac{-t^2}{2\sigma^2}\right) \frac{1}{\sqrt{1 - (z^2/(t^2 + z^2))}}\,\frac{t}{\sqrt{t^2 + z^2}}\,dt$$
(where we let y² = t² + z²)
$$= \frac{1}{\pi\sigma^2}\exp\!\left(\frac{-z^2}{2\sigma^2}\right)\int_0^{\infty} \exp\!\left(\frac{-t^2}{2\sigma^2}\right) dt = \frac{1}{\pi\sigma^2}\exp\!\left(\frac{-z^2}{2\sigma^2}\right)\frac{\sqrt{2\pi\sigma^2}}{2} = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(\frac{-z^2}{2\sigma^2}\right)$$
for z ∈ ℝ. That is, Z is N(0, σ²).

9.3. Let g(x, y) = x/(x + y) and let h(x, y) = x + y. Note that if a(b, t) = bt then a(g(x, y), h(x, y)) = a(x/(x + y), x + y) = x, and if β(b, t) = t(1 − b) then β(g(x, y), h(x, y)) = β(x/(x + y), x + y) = y. Let B = X/(X + Y) and T = X + Y. Then
$$f_{B,T}(b, t) = f_{X,Y}(a(b, t), \beta(b, t)) \left|\det \begin{bmatrix} \partial a/\partial b & \partial \beta/\partial b \\ \partial a/\partial t & \partial \beta/\partial t \end{bmatrix}\right| = f_{X,Y}(bt, t(1 - b)) \left|\det \begin{bmatrix} t & -t \\ b & 1 - b \end{bmatrix}\right| = e^{-bt}e^{-t(1-b)}|t| = te^{-t}$$
for t > 0 and 0 < b < 1. Thus,
$$f_B(b) = \int_0^{\infty} te^{-t}\,dt = 1$$
for 0 < b < 1, which implies that B is uniform on (0, 1).
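A simulation confirms the uniformity of B; the joint density above is that of independent exponential(1) random variables, which is what is sampled here:

```python
import random

random.seed(0)
samples = []
for _ in range(10**5):
    x, y = random.expovariate(1.0), random.expovariate(1.0)
    samples.append(x / (x + y))
# The empirical CDF of B should match the uniform CDF at any point.
for b in (0.25, 0.5, 0.75):
    print(b, sum(s <= b for s in samples) / len(samples))
```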

9.4. Let g(x) = 1/x for x > 0 and note that g⁻¹ = g. Thus, it follows that
$$f_{1/X}(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy} g^{-1}(y)\right| = f_X(1/y)\left|-\frac{1}{y^2}\right| = \frac{f_X(1/y)}{y^2}$$
for y > 0.

10.1. Note that
$$P(Z < 2.44) = P\!\left(\frac{Z - 5}{2} < \frac{2.44 - 5}{2}\right) = P(X < -1.28) = P(X > 1.28) = \frac{1}{10},$$
where X = (Z − 5)/2 is standard Gaussian. Thus, a = 1.28.
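The quantile can be checked numerically via the error function, using Φ(x) = (1 + erf(x/√2))/2:

```python
import math

def phi(x):
    # Standard Gaussian CDF via the error function.
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

print(1 - phi(1.28))  # ~ 0.1003, so P(X > 1.28) is 1/10 to two decimals
```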

10.2. Recall the moment generating function M_X(t) = exp(σ²t²/2 + tm) and note that
$$M_X''(t) = \left(\sigma^2 + (m + \sigma^2 t)^2\right) \exp\!\left(\frac{\sigma^2 t^2}{2} + tm\right)$$
and that
$$M_X'''(t) = \left(2\sigma^2(m + \sigma^2 t) + (\sigma^2 + (m + \sigma^2 t)^2)(m + \sigma^2 t)\right) \exp\!\left(\frac{\sigma^2 t^2}{2} + tm\right).$$
Thus, E[X³] = M‴_X(0) = 2mσ² + (σ² + m²)m = 3mσ² + m³. If m = 0 then E[X⁹⁷] = 0 since x⁹⁷f_X(x) is an integrable, odd function.

10.3. Let T = √n Z/W and let B = W. Note that Z = TB/√n. Theorem 5.19 on page 103 and the independence of W and Z imply that
$$f_{T,B}(t, b) = f_{Z,W}\!\left(\frac{tb}{\sqrt{n}},\, b\right)\left|\det\begin{bmatrix} b/\sqrt{n} & 0 \\ t/\sqrt{n} & 1 \end{bmatrix}\right| = f_Z\!\left(\frac{tb}{\sqrt{n}}\right) f_W(b)\,\frac{|b|}{\sqrt{n}}.$$
Thus, we see that
$$f_T(t) = \int_{\mathbb{R}} f_Z\!\left(\frac{tb}{\sqrt{n}}\right) f_W(b)\,\frac{|b|}{\sqrt{n}}\,db.$$

10.4. Note that
$$P(Y \le y) = P(XZ \le y) = P(XZ \le y, Z = 1) + P(XZ \le y, Z = -1) = P(X \le y, Z = 1) + P(-X \le y, Z = -1)$$
$$= P(X \le y)P(Z = 1) + P(-X \le y)P(Z = -1) = P(X \le y)\frac12 + P(X \le y)\frac12 = P(X \le y)$$
(since −X is also standard Gaussian),


which implies that Y is standard Gaussian. Note, however, that X + Y is not Gaussian since
$$X + Y = \begin{cases} 2X & \text{wp } 1/2 \\ 0 & \text{wp } 1/2. \end{cases}$$
That is, X + Y has a discontinuous distribution function. Note, also, that X and Y are uncorrelated since E[XY] = E[X²Z] = E[X²]E[Z] = 0. Finally, however, note that X and Y are not independent since P(X ∈ [1, 2], Y ∈ [3, 4]) = 0 yet P(X ∈ [1, 2]) and P(Y ∈ [3, 4]) are each positive.

10.5. Let
$$\begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} = \begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} c_1 X_1 + c_2 X_2 \\ c_3 X_1 + c_4 X_2 \end{bmatrix}.$$
Note that Z₁ and Z₂ are mutually Gaussian. Thus, for Z₁ and Z₂ to be independent we require that E[Z₁Z₂] = E[Z₁]E[Z₂]. (Note that we effectively have one equation and four variables, so long as we ensure that C is not singular.) Let c₁ = c₃ = 1, let c₂ = 0, and note that E[Z₁Z₂] = E[X₁(X₁ + c₄X₂)] = E[X₁²] + c₄E[X₁X₂]. Note that E[X₁²] = 1 and E[X₁X₂] = 1/3. Thus, Z₁ and Z₂ are independent if c₄ = −3 with c₁, c₂, and c₃ as given above.

10.6. Recall that
$$M_X(t) = \exp\!\left(\frac{\sigma_1^2 t^2}{2} + tm_1\right) \quad\text{and}\quad M_Y(t) = \exp\!\left(\frac{\sigma_2^2 t^2}{2} + tm_2\right).$$
Thus,
$$M_{X+Y}(t) = \exp\!\left(\frac{(\sigma_1^2 + \sigma_2^2)t^2}{2} + t(m_1 + m_2)\right),$$
which implies that X + Y is N(m₁ + m₂, σ₁² + σ₂²).

10.7. Note that
$$M_X(t) = \int_{\mathbb{R}} e^{tx}(\lambda f_1(x) + (1 - \lambda)f_2(x))\,dx = \lambda\int_{\mathbb{R}} e^{tx} f_1(x)\,dx + (1 - \lambda)\int_{\mathbb{R}} e^{tx} f_2(x)\,dx$$
$$= \lambda\exp\!\left(\frac{\sigma_1^2 t^2}{2} + tm_1\right) + (1 - \lambda)\exp\!\left(\frac{\sigma_2^2 t^2}{2} + tm_2\right) = \lambda M_1(t) + (1 - \lambda)M_2(t),$$


where M₁ is the moment generating function associated with f₁ and M₂ is the moment generating function associated with f₂. Note, also, that
$$E[X] = M_X'(0) = \left[\lambda(\sigma_1^2 t + m_1)M_1(t) + (1 - \lambda)(\sigma_2^2 t + m_2)M_2(t)\right]_{t=0} = \lambda m_1 + (1 - \lambda)m_2.$$
Further,
$$E[X^2] = M_X''(0) = \left[\lambda\sigma_1^2 M_1(t) + \lambda(\sigma_1^2 t + m_1)^2 M_1(t) + (1 - \lambda)\sigma_2^2 M_2(t) + (1 - \lambda)(\sigma_2^2 t + m_2)^2 M_2(t)\right]_{t=0} = \lambda\sigma_1^2 + \lambda m_1^2 + (1 - \lambda)\sigma_2^2 + (1 - \lambda)m_2^2.$$
Thus,
$$\mathrm{VAR}[X] = E[X^2] - E[X]^2 = \lambda\sigma_1^2 + \lambda m_1^2 + (1 - \lambda)\sigma_2^2 + (1 - \lambda)m_2^2 - (\lambda m_1 + (1 - \lambda)m_2)^2.$$

10.8. Let Z = max(X, Y) and note that F_Z(z) = F_{X,Y}(z, z) via Problem 5.5. Let F be the distribution function of X (and Y) and let f be a density function for X (and Y). Then, since F_Z(z) = F(z)F(z), it follows that f_Z(z) = F′_Z(z) = 2F(z)f(z). Thus,
$$E[Z] = \int_{-\infty}^{\infty} z f_Z(z)\,dz = \int_{-\infty}^{\infty} 2z f(z)F(z)\,dz = 2\int_{-\infty}^{\infty} z\,\frac{e^{-z^2/2}}{\sqrt{2\pi}}\int_{-\infty}^{z}\frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt\,dz$$
$$= \frac{1}{\pi}\int_{-\infty}^{\infty} e^{-t^2/2}\int_t^{\infty} z e^{-z^2/2}\,dz\,dt = \frac{1}{\pi}\int_{-\infty}^{\infty}\int_{2t^2}^{\infty}\frac12 e^{-w/2}\,dw\,dt = \frac{1}{\pi}\int_{-\infty}^{\infty} e^{-t^2}\,dt = \frac{1}{\sqrt{\pi}}.$$

11.1. Note that the X_n's are mutually independent and that Σ_{n=1}^∞ P(X_n = 1) = ∞. Thus, the second Borel-Cantelli Lemma implies that P(lim sup{X_n = 1}) = 1. That is, there exists a null set A such that for any ω ∈ A^c, X_n(ω) = 1 for infinitely many values of n. Hence, X_n(ω) does not converge to zero for any ω ∈ A^c. Since P(A^c) = 1 it follows that the X_n's cannot converge to zero almost surely.

11.2. Since C_{X_i}(t) = e^{−|t|} for each i and since the X_i's are mutually independent, it follows that
$$C_{S_n/n}(t) = \left(C_{X_i}\!\left(\frac{t}{n}\right)\right)^n = C_{X_i}(t).$$
Thus, S_n/n has a Cauchy distribution centered at zero with parameter 1. That is, the distribution of the normalized sum is the same as the distribution of any specific random variable in the sum.

11.3. Consider a sequence {X_n}_{n∈ℕ} of random variables such that
$$X_n = \begin{cases} n^3 & \text{with probability } 1/n^2 \\ 0 & \text{with probability } 1 - (1/n^2). \end{cases}$$
Fix ε > 0 and note that
$$P(|X_n - 0| \ge \varepsilon) = P(X_n \ge \varepsilon) = \begin{cases} 0 & \text{if } \varepsilon > n^3 \\ 1/n^2 & \text{if } 0 < \varepsilon \le n^3. \end{cases}$$
Thus, since P(X_n ≥ ε) → 0 as n → ∞ for any positive ε, we conclude that X_n converges in probability to zero. Note, however, that since E[X_n^p] = n^{3p}/n² = n^{3p−2}, it follows that the X_n's do not converge to zero in L_p for any p ≥ 1.

11.4. By Theorem 5.40 we know that X_n → c in distribution if X_n → c in probability. We will show that X_n → c in probability if X_n → c in distribution. If we consider c to be a constant random variable then F_c(x) = I_{[c,∞)}(x). Let ε > 0 be given and note that
$$P(|X_n - c| \ge \varepsilon) = P(X_n \le c - \varepsilon) + P(X_n \ge c + \varepsilon).$$
Note that P(X_n ≤ c − ε) = F_{X_n}(c − ε) → 0 as n → ∞ since F_{X_n}(x) → F_c(x) as n → ∞ for all x < c. Similarly, P(X_n ≥ c + ε) → 0 as n → ∞. From this we see that X_n → c in probability.

11.5. Note that
$$P(Z_n \le z) = P(n(1 - \max\{X_1, \ldots, X_n\}) \le z) = P\!\left(1 - \max\{X_1, \ldots, X_n\} \le \frac{z}{n}\right) = P\!\left(\max\{X_1, \ldots, X_n\} \ge 1 - \frac{z}{n}\right)$$
$$= 1 - P\!\left(X_1 < 1 - \frac{z}{n}, \ldots, X_n < 1 - \frac{z}{n}\right) = 1 - P\!\left(X_1 < 1 - \frac{z}{n}\right)\cdots P\!\left(X_n < 1 - \frac{z}{n}\right) = 1 - \left(1 - \frac{z}{n}\right)^n$$
for 0 < z < n. Recall that
$$\left(1 - \frac{z}{n}\right)^n \to e^{-z}$$
as n → ∞. Thus, F_{Z_n}(z) → 1 − e^{−z} for 0 < z < ∞ as n → ∞.

11.6. If the X_i's are mutually independent uniform random variables taking values in the interval (−0.05, 0.05) then X_i has mean zero and variance 0.01/12 for each i. Let Z be a random variable with a standard Gaussian distribution function and let Φ(x) = P(Z ≤ x). The Central Limit Theorem implies that S_n has approximately the same distribution as Zσ√n. Thus,
$$P(|S_{1000}| < 2) \approx P\!\left(|Z| < \frac{2}{\sqrt{1000(0.01)/12}}\right) = P(|Z| < 2.19) = 2\Phi(2.19) - 1 = 0.97.$$
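The final number can be reproduced with the error function:

```python
import math

def phi(x):
    # Standard Gaussian CDF via the error function.
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

sd = math.sqrt(1000 * 0.01 / 12)   # standard deviation of S_1000
print(2 * phi(2 / sd) - 1)         # ~ 0.971
```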


11.7. Let X_i equal 0 or 1 if the i-th flip is tails or heads, respectively. Thus, X = Σ_{i=1}^{10,000} X_i denotes the number of heads that are observed in 10,000 flips. We will approximate P(X = 5000) by finding P(4999.5 < X < 5000.5) with an appeal to the Central Limit Theorem.

Let Z be a standard Gaussian random variable, and note that E[X_i] = 1/2 and VAR[X_i] = E[X_i²] − E[X_i]² = 1/2 − 1/4 = 1/4. The Central Limit Theorem implies that (X − 10,000(1/2))/(√10,000 √(1/4)) has approximately the same distribution as Z. Thus,
$$P(4999.5 < X < 5000.5) \approx P\!\left(\frac{4999.5 - 10{,}000(\frac12)}{\sqrt{10{,}000}\sqrt{\frac14}} < Z < \frac{5000.5 - 10{,}000(\frac12)}{\sqrt{10{,}000}\sqrt{\frac14}}\right) = P\!\left(-\frac{1}{100} < Z < \frac{1}{100}\right)$$
$$= \Phi\!\left(\frac{1}{100}\right) - \Phi\!\left(-\frac{1}{100}\right) = 2\left(\Phi\!\left(\frac{1}{100}\right) - \Phi(0)\right) = 2\Phi\!\left(\frac{1}{100}\right) - 1.$$
Note that Φ(0.01) = 0.5040. Hence, P(X = 5000) ≈ 2(0.5040) − 1 = 0.008.
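Here the exact binomial probability is within reach of exact integer arithmetic, and it agrees with the continuity-corrected estimate:

```python
import math

exact = math.comb(10_000, 5_000) / 2**10_000
print(exact)  # ~ 0.00798, close to the estimate 2 * Phi(0.01) - 1 ~ 0.008
```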

11.8. Note that
$$E[(X_n - \alpha)^2] = E[X_n^2] - 2\alpha E[X_n] + \alpha^2 = \mathrm{VAR}[X_n] + E[X_n]^2 - 2\alpha E[X_n] + \alpha^2 = \mathrm{VAR}[X_n] + (E[X_n] - \alpha)^2 \to 0$$
if and only if VAR[X_n] → 0 and E[X_n] → α as n → ∞.

12.1. Note that X and Y are mutually Gaussian and that X and Y are uncorrelated since E[XY] = E[(U + V)(U − V)] = E[U²] − E[V²] = 0. Thus, X and Y are independent. Problem 10.6 implies that X and Y are each N(0, 2).


Note that E[X|U] = E[U + V|U] = E[U|U] + E[V|U] = U + E[V] = U a.s. Similarly, E[Y|U] = U a.s. Thus, not only are E[X|U] and E[Y|U] not independent, but they are each equal almost surely to the same positive variance random variable.

12.2. Let Ω = {a, b, c} and define a measure P on P(Ω) via P({a}) = P({b}) = P({c}) = 1/3. Define random variables X and Y on the resulting probability space via Y(a) = 1, Y(b) = Y(c) = −1, X(a) = 1, X(b) = 2, and X(c) = 0. Since the distributions are discrete, the second condition in the definition of conditional expectation reduces to
$$\sum_{\omega \in M} E[X|Y](\omega)P(\{\omega\}) = \sum_{\omega \in M} X(\omega)P(\{\omega\}),$$
where M ∈ σ(Y) = {∅, Ω, {a}, {b, c}}. Substitution for M implies that E[X|Y](a) = X(a) and E[X|Y](b) + E[X|Y](c) = X(b) + X(c). Since E[X|Y] is σ(Y)-measurable it follows that E[X|Y](b) = E[X|Y](c). Thus, E[X|Y](a) = E[X|Y](b) = E[X|Y](c) = 1 and we see that E[X|Y] = E[X] = 1/3 + 2/3 = 1 as required. However, X and Y are not independent since P(X = 1, Y = 1) = 1/3 yet P(X = 1)P(Y = 1) = 1/9.

12.3. Note that E[Z|X] = E[XY|X] = XE[Y|X] = XE[Y] = 0 a.s. Similarly, E[Z|Y] = 0 a.s. However, Z is σ(X, Y)-measurable and hence E[Z|X, Y] = Z a.s.

12.4. Since E[X|F] is F-measurable it follows that E[X|F] = a₁I_A + a₂I_B + a₃I_C for some real constants a₁, a₂, and a₃. Recall that E[X|F] must satisfy
$$\int_F E[X|\mathcal{F}]\,d\lambda = \int_F X\,d\lambda$$
for all F ∈ F. Choosing F = A implies that
$$a_1\lambda(A) = \int_A X\,d\lambda,$$
which in turn implies that
$$a_1 = \frac{1}{\lambda(A)}\int_0^{1/4} w^2\,dw = \frac{1}{48}.$$
Similarly, a₂ = 97/432 and a₃ = 19/27.


12.5. Note that if A ∈ σ(Y) then
$$\int_A E[XZ|Y]\,dP = \int_A XZ\,dP = E[I_A XZ] = E[X]E[I_A Z] = E[X]\int_A Z\,dP = E[X]\int_A E[Z|Y]\,dP.$$
Thus, E[XZ|Y] = E[X]E[Z|Y] a.s.

12.6. Recall from Problem 12.5 that if X and Z are independent and if X and Y are independent then E[XZ|Y] = E[X]E[Z|Y] a.s. Let F_n denote σ(X₁, ..., X_n). Thus, it follows that
$$E\left[\left(\sum_{k=1}^{n+1} Y_k\right)^2 - (n+1)\sigma^2 \,\Big|\, \mathcal{F}_n\right] = E\left[\left(\sum_{k=1}^{n} Y_k + Y_{n+1}\right)^2 - (n+1)\sigma^2 \,\Big|\, \mathcal{F}_n\right]$$
$$= E\left[\left(\sum_{k=1}^{n} Y_k\right)^2 + 2Y_{n+1}\sum_{k=1}^{n} Y_k + Y_{n+1}^2 - n\sigma^2 - \sigma^2 \,\Big|\, \mathcal{F}_n\right] = E\left[X_n + 2Y_{n+1}\sum_{k=1}^{n} Y_k + Y_{n+1}^2 - \sigma^2 \,\Big|\, \mathcal{F}_n\right]$$
$$= E[X_n|\mathcal{F}_n] + 2E\left[Y_{n+1}\sum_{k=1}^{n} Y_k \,\Big|\, \mathcal{F}_n\right] + E[Y_{n+1}^2|\mathcal{F}_n] - \sigma^2 = X_n + 2E[Y_{n+1}]E\left[\sum_{k=1}^{n} Y_k \,\Big|\, \mathcal{F}_n\right] + E[Y_{n+1}^2] - \sigma^2 = X_n \text{ a.s.},$$
where in the second to last step we used our first result and noticed that Y²_{n+1} is independent of X_i for i ≤ n.

12.7. Note that
$$\mathrm{VAR}[X|Y] = E[(X - E[X|Y])^2|Y] = E[X^2 - 2XE[X|Y] + E[X|Y]^2\,|\,Y] = E[X^2|Y] - 2E[X|Y]E[X|Y] + E[X|Y]^2 = E[X^2|Y] - E[X|Y]^2.$$
Thus,
$$E[\mathrm{VAR}[X|Y]] = E[E[X^2|Y]] - E[E[X|Y]^2] = E[X^2] - E[E[X|Y]^2]$$
and
$$\mathrm{VAR}[E[X|Y]] = E[E[X|Y]^2] - E[E[X|Y]]^2 = E[E[X|Y]^2] - E[X]^2.$$
Thus, E[VAR[X|Y]] + VAR[E[X|Y]] = E[X²] − E[X]² = VAR[X].

12.8. To begin, note that
$$E[(X - g(Y))^2] = E[(X - E[X|Y] + E[X|Y] - g(Y))^2] = E[(X - E[X|Y])^2] + 2E[(X - E[X|Y])(E[X|Y] - g(Y))] + E[(E[X|Y] - g(Y))^2].$$
The first result now follows since
$$E[(X - E[X|Y])(E[X|Y] - g(Y))] = E[E[(X - E[X|Y])(E[X|Y] - g(Y))|Y]] = E[(E[X|Y] - g(Y))E[X - E[X|Y]\,|\,Y]] = E[(E[X|Y] - g(Y))(E[X|Y] - E[X|Y])] = 0.$$
From this result it is clear that E[(X − g(Y))²] is minimized over all Borel measurable functions g when we let g(y) = E[X|Y = y].

12.9. Let A = {1, 3, 5}, let B = {2, 4, 6}, and note that E[X|G] = a₁I_A + a₂I_B for some choice of a₁ and a₂ from ℝ. Since
$$\int_A E[X|\mathcal{G}]\,dP = a_1 P(A) = \int_A X\,dP = 1P(\{1\}) + 3P(\{3\}) + 5P(\{5\}) = \frac96,$$
it follows that a₁/2 = 9/6; that is, a₁ = 3. Similarly, it follows that a₂ = 4. Thus, E[X|G] = 3I_A + 4I_B.
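For a fair die these are just the conditional averages over the odd and even faces:

```python
# Conditional expectation of a fair die roll given its parity.
odd, even = (1, 3, 5), (2, 4, 6)
print(sum(odd) / len(odd), sum(even) / len(even))  # 3.0 4.0
```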


12.10. It follows by the definition of conditional expectation that
$$E[X|\mathcal{G}](\omega) = \frac{\int_{\Omega_i} X\,dP}{P(\Omega_i)}$$
for all ω ∈ Ω_i.

12.11. To begin, note that
$$E[(X - Y)^2] = E[X^2] - 2E[XY] + E[Y^2],$$
where
$$E[XY] = E[E[Y|\mathcal{G}_2]Y] = E[E[E[Y|\mathcal{G}_2]Y|\mathcal{G}_2]] = E[E[Y|\mathcal{G}_2]^2] = E[X^2]$$
and
$$E[XY] = E[XE[X|\mathcal{G}_1]] = E[E[XE[X|\mathcal{G}_1]|\mathcal{G}_1]] = E[E[X|\mathcal{G}_1]^2] = E[Y^2].$$
Thus, E[(X − Y)²] = E[XY] − 2E[XY] + E[XY] = 0 and, hence, the desired result follows via Problem 7.3.

12.12. Since E[Y|X] = α + βX, it follows that E[E[Y|X]] = E[Y] = α + βE[X]. Substitution implies that 4 = α + 3β. Note also that
\[
E[XY] = E[E[XY \mid X]] = E[X E[Y \mid X]] = E[X(\alpha + \beta X)] = \alpha E[X] + \beta E[X^2] = \alpha E[X] + \beta\, \mathrm{VAR}[X] + \beta E[X]^2.
\]
Substituting here implies that −3 = 3α + 11β. Solving these two equations yields α = 53/2 and β = −15/2.
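For the record, the 2 × 2 linear system can be solved exactly, e.g. by Cramer's rule; the snippet below is a minimal check of the arithmetic.

```python
# Solve  alpha + 3*beta = 4  and  3*alpha + 11*beta = -3  by Cramer's rule.
from fractions import Fraction

det = Fraction(1 * 11 - 3 * 3)                 # determinant of [[1,3],[3,11]]
alpha = Fraction(4 * 11 - 3 * (-3)) / det
beta = Fraction(1 * (-3) - 4 * 3) / det
print(alpha, beta)  # 53/2 -15/2
```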


12.13. To begin, note that
\[
E[Y^2 \mid X = x] = \sigma_Y^2 (1 - \rho^2) + \frac{\rho^2 \sigma_Y^2}{\sigma_X^2}\, x^2
\]
since, for each fixed x, a conditional density f(y|x) of Y given X = x is Gaussian with mean ρσ_Y x/σ_X and variance σ_Y^2(1 − ρ^2). Next, recall that a moment generating function for X exists and is given by M_X(t) = exp(σ_X^2 t^2/2). Finding the fourth derivative of this function and evaluating it at t = 0 implies that E[X^4] = 3σ_X^4. Thus, since ρ = E[XY]/(σ_X σ_Y), it follows that
\[
\begin{aligned}
E[X^2 Y^2] &= E[E[X^2 Y^2 \mid X]] \\
&= E[X^2 E[Y^2 \mid X]] \\
&= E\!\left[X^2 \left(\sigma_Y^2 (1 - \rho^2) + \frac{\rho^2 \sigma_Y^2}{\sigma_X^2}\, X^2\right)\right] \\
&= \sigma_Y^2 (1 - \rho^2)\, \sigma_X^2 + \frac{\rho^2 \sigma_Y^2}{\sigma_X^2}\, (3\sigma_X^4) \\
&= \sigma_X^2 \sigma_Y^2 + 2 \rho^2 \sigma_X^2 \sigma_Y^2 \\
&= E[X^2] E[Y^2] + 2 (E[XY])^2.
\end{aligned}
\]
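The fourth-moment identity is easy to test by simulation. The sketch below is our own check; it samples zero-mean jointly Gaussian X, Y with arbitrarily chosen σ_X = 2, σ_Y = 1, and ρ = 0.7.

```python
# Monte Carlo check of E[X^2 Y^2] = E[X^2]E[Y^2] + 2(E[XY])^2 for
# zero-mean jointly Gaussian X, Y.
import random

random.seed(2)
sx, sy, rho, n = 2.0, 1.0, 0.7, 500_000
acc = 0.0
for _ in range(n):
    u = random.gauss(0.0, 1.0)
    x = sx * u
    y = sy * (rho * u + (1 - rho**2) ** 0.5 * random.gauss(0.0, 1.0))
    acc += (x * y) ** 2
print(acc / n)                                 # Monte Carlo E[X^2 Y^2]
print(sx**2 * sy**2 + 2 * (rho * sx * sy)**2)  # exact value: 7.92
```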

12.14. First, note that
\[
\begin{aligned}
E[Y_1 \cdots Y_n \mid X_1, \ldots, X_m] &= E[X_m Y_{m+1} \cdots Y_n \mid X_1, \ldots, X_m] \\
&= X_m E[Y_{m+1} \cdots Y_n] \\
&= X_m.
\end{aligned}
\]
Next, note that
\[
\begin{aligned}
E[(X_n - X_m) \cos(X_j)] &= E[E[(X_n - X_m) \cos(X_j) \mid X_1, \ldots, X_m]] \\
&= E[\cos(X_j)\, E[X_n - X_m \mid X_1, \ldots, X_m]] \\
&= E[\cos(X_j)\, (E[X_n \mid X_1, \ldots, X_m] - X_m)] \\
&= E[\cos(X_j)(X_m - X_m)] \\
&= 0.
\end{aligned}
\]
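A simulation sketch of the orthogonality relation just derived (our own; it assumes, as the first chain above suggests, that X_m = Y_1 ⋯ Y_m for i.i.d. Y_k with E[Y_k] = 1):

```python
# Monte Carlo check that E[(X_n - X_m) cos(X_j)] is close to 0 for
# j <= m < n, with X_m the running product of i.i.d. mean-one Y_k.
import math
import random

random.seed(3)
m, n, j, trials = 3, 6, 2, 400_000
acc = 0.0
for _ in range(trials):
    xs, prod = [], 1.0
    for _ in range(n):
        prod *= random.choice((0.5, 1.5))  # E[Y_k] = 1
        xs.append(prod)
    acc += (xs[n - 1] - xs[m - 1]) * math.cos(xs[j - 1])
print(acc / trials)  # close to 0
```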

12.15. To begin, note that
\[
f_Y(y) = \int_{\mathbb{R}} f(x, y) \, dx = \int_y^1 8xy \, dx = 4y(1 - y^2)
\]
for 0 ≤ y ≤ 1. Also, note that
\[
f_X(x) = \int_{\mathbb{R}} f(x, y) \, dy = \int_0^x 8xy \, dy = 4x^3
\]
for 0 ≤ x ≤ 1. Thus,
\[
E[X \mid Y = y] = \int_{\mathbb{R}} x\, \frac{f(x, y)}{f_Y(y)} \, dx = \frac{2}{1 - y^2} \int_y^1 x^2 \, dx = \frac{2}{3} \cdot \frac{1 - y^3}{1 - y^2} = \frac{2}{3} \cdot \frac{1 + y + y^2}{1 + y}
\]
for 0 ≤ y ≤ 1. Further,
\[
E[Y \mid X = x] = \int_{\mathbb{R}} y\, \frac{f(x, y)}{f_X(x)} \, dy = \frac{2}{x^2} \int_0^x y^2 \, dy = \frac{2x}{3}
\]
for 0 ≤ x ≤ 1.
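As a quick exact spot check (our own) of the first conditional expectation at y = 1/2, where the formula above gives (2/3)(1 + y + y^2)/(1 + y) = 7/9:

```python
# Verify E[X | Y = 1/2] = 7/9 for the density f(x, y) = 8xy on
# 0 <= y <= x <= 1, both by direct integration and by the closed form.
from fractions import Fraction

y = Fraction(1, 2)
f_Y = 4 * y * (1 - y**2)                # marginal density: 3/2
num = Fraction(8, 3) * y * (1 - y**3)   # integral from y to 1 of x * 8xy dx
print(num / f_Y)                                   # 7/9
print(Fraction(2, 3) * (1 + y + y**2) / (1 + y))   # 7/9, matching the formula
```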


8.3 Solutions to True/False Questions

1. False   2. True    3. True    4. False   5. False
6. True    7. False   8. False   9. False   10. False
11. False  12. True   13. True   14. False  15. True
16. False  17. True   18. False  19. True   20. False
21. False  22. False  23. True   24. False  25. True
26. False  27. False  28. True   29. True   30. False
31. False  32. False  33. False  34. False  35. False
36. False  37. True   38. False  39. True   40. False


Index

absolutely continuous distribution, 72
affine function, 164
algebra, 24
almost always, 36
almost everywhere, 53
almost sure convergence, 116, 151
almost surely, 116
atomic distribution, 72
autocorrelation function, 139
autocovariance function, 139, 156
axiom of choice, 13
Banach space, 61
bandwidth, 161
  second moment, 162
basis, 60
Bernoulli trial, 81
Bessel's inequality, 63
bijection, 20
bijective function, 20
binomial distribution, 81
bivariate Gaussian distribution, 112
Borel measurable function, 40, 46, 73
Borel set, 39, 46
Borel-Cantelli lemma
  first, 37
  second, 77
bounded random variable, 70
bounded set, 35
bounded variation, 49
Brown, Robert, 164
Brownian motion process, 164
Buffon's needle problem, 84, 100
canonical filtration, 149
canonical projection, 18
Cantor-Lebesgue function, 74
Carathéodory criterion, 43
Carathéodory extension theorem, 86
Cartesian product
  arbitrary index set, 17
  n sets, 17
  two sets, 17
Cauchy distribution, 99, 109
Cauchy sequence, 61
Cauchy-Schwarz inequality, 100
central limit theorem, 120
central moment, 98
Chapman-Kolmogorov equation, 145
Chapman-Kolmogorov theorem, 148
characteristic function, 107, 121
Chebyshev's inequality, 100, 122
choose, 80
closed set, 30, 61
closure, 30, 146
cocountable set, 25, 41
cofinite set, 25
complement, 16
complete measure space, 45
complete metric space, 61
complete probability space, 135
complex-valued random process, 153, 156
conditional density, 130
conditional expectation, 123, 126
conditional independence, 148
conditional probability, 123
continuous function, 40, 73, 141
continuous in probability, 137
continuous time random process, 135
convergence in L_p, 118
convergence in distribution, 119
convergence in law, 119
convergence in mean, 118
convergence in mean-square, 118
convergence in probability, 117, 137
convergence in the pth mean, 118
convergence of random variables, 116
convergence of sets, 37
convergent sequence, 42
convexity, 64, 100, 127
convolution, 103
coordinate, 18
correlation coefficient, 101, 113
countable additivity, 33
countable cover, 88
countable set, 21
countable subadditivity, 34
countable union, 23
countably infinite set, 21
counting measure, 34
covariance, 101
covariance matrix, 114
cover, 88
data processor, 94
Dedekind's theorem, 22
DeMorgan's law, 18
dense set, 136
density function, 74
dimension, 60
Dirac measure, 34
discrete random variable, 72
discrete time random process, 135
disjoint sets, 16
disjointification, 34
distance function, 61
distribution, 93
distribution function, 70
domain, 20
dominated convergence theorem, 55
Doob, Joseph, 151
Dynkin's π-λ theorem, 29
eigenfunction, 143
eigenvalue, 143
Einstein, Albert, 164
empty set, 13
equipotence, 21
equipotent sets, 21
equivalence relation, 19
evaluation, 18
event, 25, 75
expectation, 94
expected value, 94
exponential distribution, 109
F_σ set, 31
factor, 18
field, 24
filtration, 149
finite dimensional distribution, 135
finite measure, 33
finite set, 21
finite-dimensional vector space, 60
first order random variable, 94
floor, 85
Fourier transform, 107
Frege, Gottlob, 13
Fubini's Theorem, 163
function of a random variable, 103
functions, 19
fundamental theorem of calculus, 98
G_δ set, 31
gain, 157
gambling, 151
Gaussian density function, 108
Gaussian distribution, 109, 130
Gaussian Markov process, 148
Gaussian process, 138
Gaussian random variable, 108
Gram-Schmidt procedure, 160
greatest lower bound, 36
half wave rectifier, 159
Hall, Eric, 74, 129
Hilbert space, 62, 128
Hilbert space projection theorem, 66, 128
Hölder's inequality, 100
improper Riemann integral, 56
impulse response function, 158
independent events, 75
independent random variables, 76
index set, 14, 135
indicator function, 20
indistinguishable random processes, 135
induced measure, 57, 93
induced metric, 61
inferior limit, 36, 42
infimum, 36
infinitely often, 36
initial probability, 145
injective function, 20
inner product, 62
inner product space, 62
integrable function, 53
integrable random variable, 94
integrable sample paths, 138
integral operator, 143
integration by parts, 51
intersection, 14
inverse image, 20
invertible function, 20
isolated point, 31
Jensen's inequality, 101, 127
joint Gaussian distribution, 112
joint probability density function, 83
joint probability distribution function, 83
jointly Gaussian random variables, 113
Karhunen-Loève expansion, 144, 166
Kolmogorov, Andrei, 69
L_p(Ω, F, P), 118
L_2 continuous process, 141
L_2 differentiable process, 141
L_2 integrable process, 142
L_2 integral, 138
λ-system, 28
Laplace distribution, 109
Laplace transform, 107
law, 93
least upper bound, 36
Lebesgue decomposition theorem, 73
Lebesgue, Henri, 69
Lebesgue integrable function, 53, 55
Lebesgue integral, 52, 53
Lebesgue measurable set, 43
Lebesgue measure, 44
Leibniz's rule, 166
lim inf, 36
lim sup, 36
limit point, 30, 61
limiter, 159, 162
linear filter, 158
linear manifold, 60
linear operation, 157
linear span, 60
linearly dependent vectors, 59
linearly independent vectors, 60
Lipschitz condition, 49
lower bound, 35
lower Riemann integral, 47
Lusin's Theorem, 160
Lyapounov's inequality, 101
marginal density function, 84
Markov chain, 145
  irreducible, 146
Markov process, 147
martingale, 149
martingale convergence theorem, 151
mean, 94, 98
mean recurrence time, 146
mean-square continuity, 141, 152
mean-square integral, 138
measurable function, 38
measurable modification, 137
measurable random process, 137
measurable rectangle, 136
measurable set, 25
measurable space, 25
measure, 33
measure on an algebra, 86
measure space, 33
Mehler series, 160
Mercer's theorem, 143
metric, 61
metric space, 61
minimum mean-square estimate, 128, 148
Minkowski sum, 162
Minkowski's inequality, 100
modification, 135
moment, 98
moment generating function, 105
monotone convergence theorem, 54
monotonicity, 34
Monte Carlo analysis, 84
Monte Carlo simulation, 86
multivariate Gaussian distribution, 113
mutually Gaussian random variables, 113
mutually independent events, 75
mutually independent random variables, 76, 84
neighborhood, 31
nondecreasing function, 49
nonnegative definite function, 139
norm, 60
normed linear space, 61
nowhere differentiable function, 165
null set, 45
one-to-one function, 20
onto function, 20
open ball, 61
open Euclidean ball, 30
open interval, 39
open rectangle, 46
open set, 30
orthogonal increments, 151
orthogonal vectors, 62
orthonormal vectors, 63
outer Lebesgue measure, 43
pairwise independent events, 75
parallelogram law, 63
Parseval's equality, 160
Parseval's identity, 63
perfect set, 31
π-system, 28
π_λ, 18
placebo, 132
pointwise convergence, 116
pointwise limit, 42
Poisson approximation, 82
Poisson distribution, 106
positive definite matrix, 114
power set, 14
pre-Hilbert space, 62
probability density function, 74
probability distribution function, 70
probability measure, 33
probability space, 33, 70
product σ-algebra, 136
product measure, 136
product of random variables, 103
projection, 65, 128
proper subset, 14
proper subspace, 60
quantization, 129
ℝ^n, 46
Radon-Nikodym theorem, 68
random process, 135
random sequence, 135
random variable, 70
range, 20
rank, 114
rational line, 174
real vector space, 59
reflexive relation, 19
regression function, 129
relations, 19
relative complement, 16
Riemann integral, 47, 51, 55, 143
Riemann-Stieltjes integral, 51, 96
Riemann-Stieltjes sum, 50
Riesz-Fréchet theorem, 67
right continuous function, 56, 70
Russell, Bertrand, 13
sample function, 135
sample path, 135
sample sequence, 135
sample space, 25
Schroeder-Bernstein theorem, 22
second order random process, 139
second order random variable, 98
separable modification, 136, 137
separable random process, 136
sequence of random variables, 116
set difference, 16
σ(A), 216
σ-algebra, 24
  countably generated, 41
  generated, 25, 76
σ-field, 24
σ-finite measure, 33
σ-subalgebra, 41
simple function, 51
singleton set, 14
singular distribution, 72
size of a subdivision, 50
spectral density function, 155
spectral distribution function, 155
spectral representation, 155
spectrum, 155
square law device, 159
St. Petersburg paradox, 97
standard deviation, 98
standard Gaussian distribution, 110, 111
standard Gaussian random variable, 110
state, 145
  absorbing, 146
  aperiodic, 146
  closed set, 146
  ergodic, 146
  mean recurrence time, 146
  null, 146
  period, 146
  persistent, 146
  reachable, 146
  transient, 146
stationary Gaussian process, 140
statistical hypothesis, 132
statistical inference, 132
step function, 152
stochastic integral, 152
stochastic process, 135
strictly stationary process, 139
strong law of large numbers, 123
subdivision, 47
submartingale, 150
subset, 13
subspace, 60
sum of random variables, 102
superior limit, 36, 42
supermartingale, 150
superset, 13
supremum, 36
surjective function, 20
symmetric difference, 16
symmetric distribution, 107
symmetric matrix, 114
symmetric relation, 19
Taylor's series, 107, 121
Taylor's theorem, 105
topological space, 30
topology, 30
total set, 63
trajectory, 135
transfer function, 158
transition probability, 145
transitive relation, 19
triangle inequality, 61
unbounded set, 36
unbounded variation, 49
uncorrelated random variables, 101
uncountable set, 21
uniform convergence, 144
uniform distribution, 84, 109
union, 15
upper bound, 35
upper Riemann integral, 47
usual topology, 30
variance, 98
vector space, 59
version, 123
weak law of large numbers, 122
wide sense stationary process, 140, 154
Wiener, Norbert, 164
Wiener process, 164
Weierstrass Approximation Theorem, 160
Wise, Gary, 74, 129
with probability 1, 116
Zermelo-Fraenkel, 13
zero memory nonlinearity, 159