probability theory and mathematical

Post on 26-Jan-2016


[Marek Fisz]


Probability Theory and Mathematical Statistics

MAREK FISZ
Late Professor of Mathematics, New York University

THIRD EDITION

A WILEY PUBLICATION IN MATHEMATICAL STATISTICS

John Wiley & Sons, Inc., New York • London • Sydney

Published in collaboration with PWN, Polish Scientific Publishers

AUTHORIZED TRANSLATION FROM THE POLISH. TRANSLATED BY R. BARTOSZYNSKI

THE FIRST AND SECOND EDITIONS OF THIS BOOK WERE PUBLISHED AND COPYRIGHTED IN POLAND UNDER THE TITLE RACHUNEK PRAWDOPODOBIENSTWA I STATYSTYKA MATEMATYCZNA

COPYRIGHT © 1963 BY JOHN WILEY & SONS, INC.

All Rights Reserved

This book or any part thereof must not be reproduced in any form without the written permission of the publisher.

TRANSLATIONS INTO OTHER LANGUAGES ARE PROHIBITED EXCEPT BY PERMISSION OF PWN, POLISH SCIENTIFIC PUBLISHERS

THIRD PRINTING, JANUARY, 1967

LIBRARY OF CONGRESS CATALOG CARD NUMBER: 63-7554

PRINTED IN THE UNITED STATES OF AMERICA

To Olga and Aleksander

Preface to the English Edition

The opening sentence of this book is "Probability theory is a part of mathematics which is useful in discovering and investigating the regular features of random events." However, even in the not very remote past this sentence would not have been found acceptable (or by any means evident) either by mathematicians or by researchers applying probability theory. It was not until the twenties and thirties of our century that the very nature of probability theory as a branch of mathematics and the relation between the concept of "probability" and that of "frequency" of random events was thoroughly clarified. The reader will find in Section 1.3 an extensive (although, certainly, not exhaustive) list of names of researchers whose contributions to the field of basic concepts of probability theory are important. However, the foremost importance of Laplace's Théorie Analytique des Probabilités, von Mises' Wahrscheinlichkeit, Statistik und Wahrheit, and Kolmogorov's Grundbegriffe der Wahrscheinlichkeitsrechnung should be stressed. With each of these three works, a new period in the history of probability theory was begun. In addition, the work of Steinhaus' school of independent functions contributed greatly to the clarification of fundamental concepts of probability theory.

The progress in foundations of probability theory, along with the introduction of the theory of characteristic functions, stimulated the exceedingly fast development of modern probability theory. In the field of limit theorems for sums of random variables, a fairly general theory was developed (Khintchin, Lévy, Kolmogorov, Feller, Gnedenko, and others) for independent random variables, whereas for dependent random variables some important particular results were obtained (Bernstein, Markov, Doeblin, Kolmogorov, and others). Furthermore, the theory of stochastic processes became a mathematically rigorous branch of probability theory (Markov, Khintchin, Lévy, Wiener, Kolmogorov, Feller, Cramér, Doob, and others).

Some ideas on application of probability theory that are now used in mathematical statistics are due to Bayes (estimation theory), Laplace (quality control of drugs), and to Gauss (theory of errors). However, it was not until this century that mathematical statistics grew into a self-contained scientific subject. In order to restrict myself to the names of just a few of the principal persons responsible for this growth, I mention only K. Pearson, R. A. Fisher, J. Neyman, and A. Wald, whose ideas and systematic research contributed so much to the high status of modern mathematical statistics.

At present, the development of the theory of probability and mathematical statistics is going on with extreme intensity. On the one hand, problems in classical probability theory unsolved, as of now, are attracting much attention, whereas, on the other hand, much work is being done in an attempt to obtain highly advanced generalizations of old concepts, particularly by considering probability theory in spaces more general than the finite dimensional Euclidean spaces usually treated. That probability theory is now closely connected with other parts of mathematics is evidenced by the fact that almost immediately after the formulation of the distribution theories by L. Schwartz and Mikusinski their probabilistic counterparts were thoroughly discussed (Gelfand, Itô, Urbanik). Moreover, probability theory and mathematical statistics are no longer simply "customers" of other parts of mathematics. On the contrary, mutual influence and interchange of ideas between probability theory and mathematical statistics and the other areas of mathematics are constantly going on. One example is the relation between analytic number theory and probability theory (Borel, Khintchin, Linnik, Erdős, Kac, Rényi, and others), and another example is the use of the theory of games in mathematical statistics and its stimulating effect on the development of the theory of games itself (v. Neumann, Morgenstern, Wald, Blackwell, Karlin, and others).

Areas of application of probability theory and mathematical statistics are increasing more and more. Statistical methods are now widely used in physics, biology, medicine, economics, industry, agriculture, fisheries, meteorology, and in communications. Statistical methodology has become an important component of scientific reasoning, as well as an integral part of well-organized business and social work.

The development of probability theory and mathematical statistics and their applications is marked by a constantly increasing flood of scientific papers that are published in many journals in many countries.

* * *

Having described briefly the position of modern probability theory and mathematical statistics, I now state the main purpose of this book:

1. To give a systematic introduction to modern probability theory and mathematical statistics.
2. To present an outline of many of the possible applications of these theories, accompanied by descriptive concrete examples.
3. To provide extensive (however, not exhaustive) references of other books and papers, mostly with brief indications as to their contents, thereby giving the reader the opportunity to complete his knowledge of the subjects considered.

Although great care has been taken to make this book mathematically rigorous, the intuitive approach as well as the applicability of the concepts and theorems presented are heavily stressed.

For the most part, the theorems are given with complete proofs. Some proofs, which are either too lengthy or require mathematical knowledge far beyond the scope of this book, were omitted.

The entire text of the book may be read by students with some background in calculus and algebra. However, no advanced knowledge in these fields or a knowledge in measure and integration theory is required. Some necessary advanced concepts (for instance, that of the Stieltjes integral) are presented in the text. Furthermore, this book is provided with a Supplement, in which some basic concepts and theorems of modern measure and integration theory are presented.

Every chapter is followed by "Problems and Complements." A large part of these problems are relatively easy and are to be solved by the reader, with the remaining ones given for information and stimulation.

This book may be used for systematic one-year courses either in probability theory or in mathematical statistics, either for senior undergraduate or graduate students. I have presented parts of the material, covered by this book, in courses at the University of Warsaw (Poland) for nine academic years, from 1951/1952 to 1959/1960, at the Peking University (China) in the Spring term of 1957, and in this country at the University of Washington and at Stanford, Columbia and New York Universities for the last several years.

This book is also suitable for nonmathematicians, as far as concepts, theorems, and methods of application are concerned.

* * *

I started to write this book at the end of 1950. Its first edition (374 pages) was published in Polish in 1954. All copies were sold within a few months. I then prepared the second, revised and extended, Polish edition, which was published in 1958, simultaneously with its German translation. Indications about changes and extensions introduced into the second Polish edition are given in its preface. In comparison with the second Polish edition, the present English one contains many extensions and changes, the most important of which are the following:

Entirely new are the Problems and Complements, the Supplement, and the Sections: 2.7,C, 2.8,C, 3.2,C, 3.6,G, 4.6,B, 6.4,B, 6.4,C, 6.12,D, 6.12,F, 6.15, 8.11, 9.4,B, 9.6,E, 9.9,B, 10.10,B, 10.11,E, 12.6,B, 13.5,E, 13.7,D, 14.2,E, 14.4,D, 15.1,C, 15.3,C, 16.3,D, 16.6, and 17.10,A. Section 8.10 (Stationary processes) is almost entirely new.

Considerably changed or complemented are Sections: 2.5,C, 3.5, 3.6,C, 4.1, 4.2, 5.6,B, 5.7, 5.13,B, 6.2, 6.4,A, 6.5, 6.12,E, 6.12,G, 7.5,B, 8.4,D, 8.8,B, 8.12, 9.1, 9.7, 10.12, 10.13, 12.4,C, 12.4,D, 13.3, 16.2,C.

These changes and extensions have all been made to fulfill more completely the main purpose of this book, as stated previously.

* * *

J. Lukaszewicz, A. M. Rusiecki, and W. Sadowski read the manuscript of the first edition and suggested many improvements. The criticism of E. Marczewski and the reviews by Z. W. Birnbaum and S. Zubrzycki of the first edition were useful in preparing the second edition; also useful were valuable remarks of K. Urbanik, who read the manuscript of the second edition. Numerous remarks and corrections were suggested by J. Wojtyniak, and R. Zasepa (first edition), and by L. Kubik, R. Sulanke, and J. Wloka (second edition). R. Bartoszynski, with the substantial collaboration of Mrs H. Infeld, translated the book from the Polish. J. Karush made valuable comments about the language. Miss D. Garbose did the editorial work. B. Eisenberg assisted me in the reading of the proofs. My sincere thanks go to all these people.

MAREK FISZ
New York
October, 1962

Contents

PART 1
PROBABILITY THEORY

1 RANDOM EVENTS
1.1 Preliminary remarks
1.2 Random events and operations performed on them
1.3 The system of axioms of the theory of probability
1.4 Application of combinatorial formulas for computing probabilities
1.5 Conditional probability
1.6 Bayes theorem
1.7 Independent events
Problems and Complements

2 RANDOM VARIABLES
2.1 The concept of a random variable
2.2 The distribution function
2.3 Random variables of the discrete type and the continuous type
2.4 Functions of random variables
2.5 Multidimensional random variables
2.6 Marginal distributions
2.7 Conditional distributions
2.8 Independent random variables
2.9 Functions of multidimensional random variables
2.10 Additional remarks
Problems and Complements

3 PARAMETERS OF THE DISTRIBUTION OF A RANDOM VARIABLE
3.1 Expected values
3.2 Moments
3.3 The Chebyshev inequality
3.4 Absolute moments
3.5 Order parameters
3.6 Moments of random vectors
3.7 Regression of the first type
3.8 Regression of the second type
Problems and Complements

4 CHARACTERISTIC FUNCTIONS
4.1 Properties of characteristic functions
4.2 The characteristic function and moments
4.3 Semi-invariants
4.4 The characteristic function of the sum of independent random variables
4.5 Determination of the distribution function by the characteristic function
4.6 The characteristic function of multidimensional random vectors
4.7 Probability-generating functions
Problems and Complements

5 SOME PROBABILITY DISTRIBUTIONS
5.1 One-point and two-point distributions
5.2 The Bernoulli scheme. The binomial distribution
5.3 The Poisson scheme. The generalized binomial distribution
5.4 The Pólya and hypergeometric distributions
5.5 The Poisson distribution
5.6 The uniform distribution
5.7 The normal distribution
5.8 The gamma distribution
5.9 The beta distribution
5.10 The Cauchy and Laplace distributions
5.11 The multidimensional normal distribution
5.12 The multinomial distribution
5.13 Compound distributions
Problems and Complements

6 LIMIT THEOREMS
6.1 Preliminary remarks
6.2 Stochastic convergence
6.3 Bernoulli's law of large numbers
6.4 The convergence of a sequence of distribution functions
6.5 The Riemann-Stieltjes integral
6.6 The Lévy-Cramér theorem
6.7 The de Moivre-Laplace theorem
6.8 The Lindeberg-Lévy theorem
6.9 The Lapunov theorem
6.10 The Gnedenko theorem
6.11 Poisson's, Chebyshev's, and Khintchin's laws of large numbers
6.12 The strong law of large numbers
6.13 Multidimensional limit distributions
6.14 Limit theorems for rational functions of some random variables
6.15 Final remarks
Problems and Complements

7 MARKOV CHAINS
7.1 Preliminary remarks
7.2 Homogeneous Markov chains
7.3 The transition matrix
7.4 The ergodic theorem
7.5 Random variables forming a homogeneous Markov chain
Problems and Complements

8 STOCHASTIC PROCESSES
8.1 The notion of a stochastic process
8.2 Markov processes and processes with independent increments
8.3 The Poisson process
8.4 The Furry-Yule process
8.5 Birth and death process
8.6 The Pólya process
8.7 Kolmogorov equations
8.8 Purely discontinuous and purely continuous processes
8.9 The Wiener process
8.10 Stationary processes
8.11 Martingales
8.12 Additional remarks
Problems and Complements

PART 2
MATHEMATICAL STATISTICS

9 SAMPLE MOMENTS AND THEIR FUNCTIONS
9.1 The notion of a sample
9.2 The notion of a statistic
9.3 The distribution of the arithmetic mean of independent normally distributed random variables
9.4 The $\chi^2$ distribution
9.5 The distribution of the statistic $(\bar{X}, S)$
9.6 Student's t-distribution
9.7 Fisher's Z-distribution
9.8 The distribution of $\bar{X}$ for some non-normal populations
9.9 The distribution of sample moments and sample correlation coefficients of a two-dimensional normal population
9.10 The distribution of regression coefficients
9.11 Limit distributions of sample moments
Problems and Complements

10 ORDER STATISTICS
10.1 Preliminary remarks
10.2 The notion of an order statistic
10.3 The empirical distribution function
10.4 Stochastic convergence of sample quantiles
10.5 Limit distributions of sample quantiles
10.6 The limit distributions of successive sample elements
10.7 The joint distribution of a group of quantiles
10.8 The distribution of the sample range
10.9 Tolerance limits
10.10 Glivenko theorem
10.11 The theorems of Kolmogorov and Smirnov
10.12 Rényi's theorem
10.13 The problem of k samples
Problems and Complements

11 AN OUTLINE OF THE THEORY OF RUNS
11.1 Preliminary remarks
11.2 The notion of a run
11.3 The probability distribution of the number of runs
11.4 The expected value and the variance of the number of runs
Problems and Complements

12 SIGNIFICANCE TESTS
12.1 The concept of a statistical test
12.2 Parametric tests for small samples
12.3 Parametric tests for large samples
12.4 The $\chi^2$ test
12.5 Tests of the Kolmogorov and Smirnov type
12.6 The Wald-Wolfowitz and Wilcoxon-Mann-Whitney tests
12.7 Independence tests by contingency tables
Problems and Complements

13 THE THEORY OF ESTIMATION
13.1 Preliminary notions
13.2 Consistent estimates
13.3 Unbiased estimates
13.4 The sufficiency of an estimate
13.5 The efficiency of an estimate
13.6 Asymptotically most efficient estimates
13.7 Methods of finding estimates
13.8 Confidence intervals
13.9 Bayes theorem and estimation
Problems and Complements

14 METHODS AND SCHEMES OF SAMPLING
14.1 Preliminary remarks
14.2 Methods of random sampling
14.3 Schemes of independent and dependent random sampling
14.4 Schemes of unrestricted and stratified random sampling
14.5 Random errors of measurements
Problems and Complements

15 AN OUTLINE OF ANALYSIS OF VARIANCE
15.1 One-way classification
15.2 Multiple classification
15.3 A modified regression problem
Problems and Complements

16 THEORY OF HYPOTHESES TESTING
16.1 Preliminary remarks
16.2 The power function and the OC function
16.3 Most powerful tests
16.4 Uniformly most powerful test
16.5 Unbiased tests
16.6 The power and consistency of nonparametric tests
16.7 Additional remarks
Problems and Complements

17 ELEMENTS OF SEQUENTIAL ANALYSIS
17.1 Preliminary remarks
17.2 The sequential probability ratio test
17.3 Auxiliary theorems
17.4 The fundamental identity
17.5 The OC function of the sequential probability ratio test
17.6 The expected value E(n)
17.7 The determination of A and B
17.8 Testing a hypothesis concerning the parameter p of a zero-one distribution
17.9 Testing a hypothesis concerning the expected value m of a normal population
17.10 Additional remarks
Problems and Complements

SUPPLEMENT
REFERENCES
STATISTICAL TABLES
AUTHOR INDEX
SUBJECT INDEX

ERRATA FOLLOWS INDEX

PART 1

Probability Theory

CHAPTER 1

Random Events

1.1 PRELIMINARY REMARKS

A Probability theory is a part of mathematics which is useful in discovering and investigating the regular features of random events. The following examples show what is ordinarily understood by the term random event.

Example 1.1.1. Let us toss a symmetric coin. The result may be either a head or a tail. For any one throw, we cannot predict the result, although it is obvious that it is determined by definite causes. Among them are the initial velocity of the coin, the initial angle of throw, and the smoothness of the table on which the coin falls. However, since we cannot control all these parameters, we cannot predetermine the result of any particular toss. Thus the result of a coin tossing, head or tail, is a random event.

Example 1.1.2. Suppose that we observe the average monthly temperature at a definite place and for a definite month, for instance, for January in Warsaw.¹ This average depends on many causes such as the humidity and the direction and strength of the wind. The effect of these causes changes year by year. Hence Warsaw's average temperature in January is not always the same. Here we can determine the causes for a given average temperature, but often we cannot determine the reasons for the causes themselves. As a result, we are not able to predict with a sufficient degree of accuracy what the average temperature for a certain January will be. Thus we refer to it as a random event.

¹ See example 12.5.1.

B It might seem that there is no regularity in the examples given. But if the number of observations is large, that is, if we deal with a mass phenomenon, some regularity appears.

Let us return to example 1.1.1. We cannot predict the result of any particular toss, but if we perform a long series of tossings, we notice that the number of times heads occur is approximately equal to the number of times tails appear. Let $n$ denote the number of all our tosses and $m$ the number of times heads appear. The fraction $m/n$ is called the frequency of appearance of heads. The frequency of appearance of tails is given by the fraction $(n - m)/n$. Experience shows that if $n$ is sufficiently large, thus if the tossings may be considered as a mass phenomenon, the fractions $m/n$ and $(n - m)/n$ differ little; hence each of them is approximately $1/2$. This regularity has been noticed by many investigators who have performed a long series of coin tossings. Buffon tossed a coin 4040 times, and obtained heads 2048 times; hence the ratio of heads was $m/n = 0.50693$. In 24,000 tosses, K. Pearson obtained a frequency of heads equal to 0.5005. We can see quite clearly that the observed frequencies oscillate about the number 0.5.

As a result of long observation, we can also notice certain regularities in example 1.1.2. We investigate this more closely in example 12.5.1.

Example 1.1.3. We cannot predict the sex of a newborn baby in any particular case. We treat this phenomenon as a random event. But if we observe a large number of births, that is, if we deal with a mass phenomenon, we are able to predict with considerable accuracy what will be the percentages of boys and girls among all newborn babies. Let us consider the number of births of boys and girls in Poland in the years 1927 to 1932. The data are presented in Table 1.1.1.

In this table $m$ and $f$ denote respectively the number of births of boys and girls in particular years. Denote the frequencies of births by $p_1$ and $p_2$, respectively; then

$$p_1 = \frac{m}{m + f}, \qquad p_2 = \frac{f}{m + f}.$$

TABLE 1.1.1
FREQUENCY OF BIRTHS OF BOYS AND GIRLS

Year of    Number of births        Total number    Frequency of births
birth      Boys m      Girls f     of births m+f   Boys p1    Girls p2
1927         496,544     462,189       958,733     0.518      0.482
1928         513,654     477,339       990,993     0.518      0.482
1929         514,765     479,336       994,101     0.518      0.482
1930         528,072     494,739     1,022,811     0.516      0.484
1931         496,986     467,587       964,573     0.515      0.485
1932         482,431     452,232       934,663     0.516      0.484
Total      3,032,452   2,833,422     5,865,874     0.517      0.483

One can see that the values of $p_1$ oscillate about the number 0.517, and the values of $p_2$ oscillate about the number 0.483.
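The frequencies quoted in this section can be recomputed directly from the counts. The following Python sketch does so for Table 1.1.1 and for Buffon's experiment; the names `births` and `frequencies` are illustrative and do not come from the text, and only the counts themselves are taken from it.

```python
# Birth counts from Table 1.1.1: year -> (boys m, girls f).
births = {
    1927: (496_544, 462_189),
    1928: (513_654, 477_339),
    1929: (514_765, 479_336),
    1930: (528_072, 494_739),
    1931: (496_986, 467_587),
    1932: (482_431, 452_232),
}


def frequencies(m, f):
    """Return (p1, p2) = (m / (m + f), f / (m + f))."""
    total = m + f
    return m / total, f / total


# Yearly frequencies of births of boys and girls.
for year, (m, f) in births.items():
    p1, p2 = frequencies(m, f)
    print(f"{year}   {m + f:>9,}  {p1:.3f}  {p2:.3f}")

# Frequencies over all six years taken together.
m_total = sum(m for m, _ in births.values())
f_total = sum(f for _, f in births.values())
p1_total, p2_total = frequencies(m_total, f_total)
print(f"Total  {m_total + f_total:>9,}  {p1_total:.3f}  {p2_total:.3f}")

# Buffon's coin tossing: 2048 heads in 4040 tosses.
buffon_ratio = 2048 / 4040
print(f"Buffon: m/n = {buffon_ratio:.5f}")
```

Working from the raw counts rather than the rounded frequencies makes it easy to see that the yearly values stay within a few thousandths of the six-year totals.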

Example 1.1.4. We throw a die. As a result of a throw one of the faces 1, ..., 6 appears. The appearance of any particular face is a random event. If, however, we perform a long series of throws, observing all those which give face one as a result, we will notice that the frequency of this event will oscillate about the number $1/6$. The same is true for any other face of the die.

This observed regularity, that the frequency of appearance of any random event oscillates about some fixed number when the number of experiments is large, is the basis of the notion of probability.

Concluding these preliminary remarks, let us stress the fact that the theory of probability is applicable only to events whose frequency of appearance can (under certain conditions) be either directly or indirectly observed or deduced by logical analysis.

1.2 RANDOM EVENTS AND OPERATIONS PERFORMED ON THEM

A We now construct the mathematical definition of a random event, the colloquial meaning of which was discussed in the preceding section.

The primitive notion of the axiomatic theory of probability is that of the set of elementary events. This set is denoted by $E$.

For every particular problem we must decide what is called the elementary event; this determines the set $E$.

Example 1.2.1. Suppose that when throwing a die we observe the frequency of the event, an even face. Then, the appearance of any particular face $i$, where $i = 1, \ldots, 6$, is an elementary event, and is denoted by $e_i$. Thus the whole set of elementary events contains 6 elements.

In our example we are investigating the random event $A$ that an even face will appear, that is, the event consisting of the elementary events, face 2, face 4, and face 6. We denote such an event by the symbol $(e_2, e_4, e_6)$. The random event $(e_2, e_4, e_6)$ occurs if and only if the result of a throw is either face 2 or face 4 or face 6.

If we wish to observe the appearance of an arbitrary face which is not face 1, we will have a random event consisting of five elements $(e_2, e_3, e_4, e_5, e_6)$.

Let us form the set $Z$ of random events which in this example is the set of all subsets of $E$. We include in $Z$ all the single elements of $E$: $(e_1), (e_2), (e_3), (e_4), (e_5), (e_6)$, where for instance, the random event $(e_4)$ is simply the appearance of the elementary event, face 4.

Besides the 6 one-element random events $(e_1), \ldots, (e_6)$, there also belong to the set $Z$ 15 two-element subsets $(e_1, e_2), \ldots, (e_5, e_6)$, 20 three-element subsets $(e_1, e_2, e_3), \ldots, (e_4, e_5, e_6)$, 15 four-element subsets $(e_1, e_2, e_3, e_4), \ldots, (e_3, e_4, e_5, e_6)$, and 6 five-element subsets $(e_1, e_2, e_3, e_4, e_5), \ldots, (e_2, e_3, e_4, e_5, e_6)$. But these are not all.

Now consider the whole set $E$ as an event. It is obvious that as a result of a throw we shall certainly obtain one of the faces 1, ..., 6, that is, we are sure that one of the elementary events of the set $E$ will occur. Usually, if the occurrence of an event is sure, we do not consider it a random event; nevertheless we shall consider a sure event as a random event and include it in the set $Z$ of random events.

Finally, in throwing a die, consider the event of a face with more than 6 dots appearing. This event includes no element of $E$; hence as a subset of $E$, it is an empty set. Such an event is, of course, impossible, and usually is not considered as a random event. However, we shall consider it as a random event and we shall include it in the set $Z$ of random events, denoting it by the symbol $(O)$.

Including the impossible and sure events, the set $Z$ of random events in our example has 64 elements.

Generally, if the set $E$ contains $n$ elements, then the set $Z$ of random events contains $2^n$ elements, namely,

impossible event (empty set),
$\binom{n}{1}$ one-element events,
$\binom{n}{2}$ two-element events,
$\ldots$
$\binom{n}{n-1}$ $(n - 1)$-element events,
sure event (the whole set of elementary events $E$).

B In example 1.2.1, the set $E$ of elementary events was finite; in the theory of probability we also consider situations where the set $E$ is denumerable or is of power continuum. In the latter case the set $Z$ of random events does not contain all events, that is, it does not contain all subsets of the set $E$. We shall restrict our considerations to a set $Z$ which is a Borel field of subsets of $E$. The definition of such a set $Z$ is given at the end of this section since this book is to be available to the readers who do not know the set operations which are involved in the definition of a Borel field.

We now give the definition of a random event. The notion of the set $Z$ appears in this definition. But since it has not been given precisely, we return to the notion of a random event once more (see definition 1.2.10).

Definition 1.2.1. Every element of the Borel field $Z$ of subsets of the set $E$ of elementary events is called a random event.

Definition 1.2.2. The event containing all the elements of the set $E$ of elementary events is called the sure event.

Definition 1.2.3. The event which contains no elements of the set $E$ of elementary events is called the impossible event.

The impossible event is denoted by $(O)$.

Definition 1.2.4. We say that event $A$ is contained in event $B$ if every elementary event belonging to $A$ belongs to $B$.

We write

$$A \subset B$$

and read: $A$ is contained in $B$.
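For a finite set $E$, the set $Z$ of example 1.2.1 can be enumerated explicitly, which gives a direct check of the counts 6, 15, 20, 15, 6 and of the total $2^6 = 64$. The Python sketch below is only an illustration: the helper `all_events` and the choice to represent an event as a `frozenset` of face numbers are assumptions of this sketch, not notation from the text.

```python
from itertools import combinations
from math import comb


def all_events(E):
    """Return every subset of the finite set E, grouped by number of elements."""
    elements = sorted(E)
    return {
        k: [frozenset(c) for c in combinations(elements, k)]
        for k in range(len(elements) + 1)
    }


E = frozenset(range(1, 7))  # faces of the die, standing for e_1, ..., e_6
events_by_size = all_events(E)
Z = [A for group in events_by_size.values() for A in group]

for k, group in events_by_size.items():
    # The number of k-element events equals the binomial coefficient C(6, k).
    print(k, len(group), comb(len(E), k))
print("number of random events:", len(Z))

even_face = frozenset({2, 4, 6})            # the event (e_2, e_4, e_6)
not_face_one = frozenset({2, 3, 4, 5, 6})   # the event (e_2, ..., e_6)

# Definition 1.2.4: A is contained in B if every element of A belongs to B.
print(even_face <= not_face_one)
```

The empty `frozenset()` plays the role of the impossible event $(O)$ and `E` itself that of the sure event; both appear in `Z`.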

We illustrate this notion by Fig. 1.2.1, where square $E$ represents the set of elementary events and circles $A$ and $B$ denote subsets of $E$. We see that $A$ is contained in $B$.

Fig. 1.2.1

Definition 1.2.4'. Two events $A$ and $B$ are called equal if $A$ is contained in $B$ and $B$ is contained in $A$.

We write

$$A = B.$$

We now postulate the following properties of $Z$.

Property 1.2.1. The set $Z$ of random events contains as an element the whole set $E$.

Property 1.2.2. The set $Z$ of random events contains as an element the empty set $(O)$.

These two properties state that the set $Z$ of random events contains as elements the sure and the impossible events.

Definition 1.2.5. We say that two events $A$ and $B$ are exclusive if they do not have any common element of the set $E$.

Example 1.2.2. Consider the random event $A$ that two persons from the group of $n$ persons born in Warsaw in 1950 will still be alive in the year 2000 and the event $B$ that two or more persons from the group considered will still be alive in the year 2000. Events $A$ and $B$ are not exclusive.

If, however, we consider the event $B'$ that only one person will still be alive in the year 2000, events $A$ and $B'$ will be exclusive.

Let us analyze this example more closely. In the group of $n$ elements being considered it may happen that 1, or 2, or 3 ... up to $n$ persons will still be alive in the year 2000, and it may happen that none of them will be alive at that time. Then the set $E$ consists of $n + 1$ elementary events $e_0, e_1, \ldots, e_n$, where the indices $0, 1, \ldots, n$ denote the number of persons from the group being considered who will still be alive in the year 2000. The random event $A$ in this example contains only one element, namely, the elementary event $e_2$. The random event $B$ contains $n - 1$ elementary events, namely, $e_2, e_3, \ldots, e_n$. The common element of the two events $A$ and $B$ is the elementary event $e_2$, and hence these two events are not exclusive. However, event $B'$ contains only one element, namely, the elementary event $e_1$. Thus events $A$ and $B'$ have no common element, and are exclusive.

C We now come to a discussion of operations on events. Let $A_1, A_2, \ldots$ be a finite or denumerable sequence of random events.

Definition 1.2.6. The event $A$ which contains those and only those elementary events which belong to at least one of the events $A_1, A_2, \ldots$ is called the alternative (or sum or union) of the events $A_1, A_2, \ldots$.

We write

$$A = A_1 \cup A_2 \cup \ldots \quad \text{or} \quad A = A_1 + A_2 + \ldots,$$

and read: $A_1$ or $A_2$ or ....
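Example 1.2.2 and definition 1.2.6 can be checked with ordinary Python sets. In this sketch the elementary event $e_k$ is represented by the integer $k$; the group size `n = 10` and the helper name `exclusive` are assumptions made only for illustration.

```python
def exclusive(A, B):
    """Definition 1.2.5: two events are exclusive if they share no elementary event."""
    return A.isdisjoint(B)


n = 10                        # hypothetical size of the group of persons
E = set(range(n + 1))         # e_0, e_1, ..., e_n
A = {2}                       # exactly two persons still alive
B = set(range(2, n + 1))      # two or more persons still alive
B_prime = {1}                 # only one person still alive

print(len(E), len(B))         # n + 1 and n - 1 elementary events
print(exclusive(A, B))        # A and B share e_2
print(exclusive(A, B_prime))  # A and B' have no common element

# Definition 1.2.6: the alternative contains exactly the elementary events
# that belong to at least one of the events.
alternative = A | B | B_prime
print(sorted(alternative))
```

Here the alternative of $A$, $B$, and $B'$ is the event that at least one person is still alive, that is, every elementary event except $e_0$.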

In our definition the alternative of random events corresponds to the set-theoretical sum of the subsets $A_1, A_2, \ldots$ of the set of elementary events.

Let us illustrate the alternative of events by Fig. 1.2.2. On this figure, square $E$ represents the set of elementary events and circles $A_1, A_2, A_3$ denote three events; the shaded area represents the alternative $A_1 + A_2 + A_3$.

Fig. 1.2.2

The alternative of the events $A_1, A_2, \ldots$ occurs if and only if at least one of these events occurs.

The essential question which arises here is whether the alternative of an arbitrary (finite or denumerable) number of random events belongs to $Z$ and hence is a random event. A positive answer to this question results from the following postulated property of the set $Z$ of random events.

Property 1.2.3. If a finite or denumerable number of events $A_1, A_2, \ldots$ belong to $Z$, then their alternative also belongs to $Z$.

It is easy to verify that for every event $A$ the following equalities are true:

$$A \cup A = A, \qquad A \cup E = E, \qquad A \cup (O) = A.$$

For example, we prove that

$$A \cup A = A.$$

In fact, every elementary event belonging to $A \cup A$ belongs to $A$; hence $(A \cup A) \subset A$. Similarly, $A \subset (A \cup A)$; thus $A \cup A = A$.

Definition 1.2.7. The random event $A$ containing those and only those elementary events which belong to $A_1$ but do not belong to $A_2$ is called the difference of the events $A_1$ and $A_2$.

We write

$$A = A_1 - A_2.$$

The difference of events is illustrated by Fig. 1.2.3, where square $E$ represents the set of all elementary events and circles $A_1$ and $A_2$ represent two events; the shaded area represents the difference $A_1 - A_2$.

Fig. 1.2.3
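The equalities for the alternative, and definition 1.2.7 of the difference, can be verified exhaustively when $E$ is small. The sketch below uses a four-element set $E$ chosen purely for illustration, with $Z$ taken to be the set of all subsets of $E$; `subsets` is a hypothetical helper, not something defined in the text.

```python
from itertools import chain, combinations


def subsets(E):
    """Return the set of all subsets of E, each as a frozenset."""
    items = sorted(E)
    return {
        frozenset(c)
        for c in chain.from_iterable(
            combinations(items, k) for k in range(len(items) + 1)
        )
    }


E = frozenset({1, 2, 3, 4})   # illustrative set of elementary events
empty = frozenset()           # the impossible event (O)
Z = subsets(E)

# A u A = A,  A u E = E,  A u (O) = A  for every event A in Z.
union_identities_hold = all(
    (A | A == A) and (A | E == E) and (A | empty == A) for A in Z
)
print(union_identities_hold)

# Definition 1.2.7: the difference keeps the elements of A1 that are not in A2.
A1, A2 = frozenset({1, 2, 3}), frozenset({3, 4})
difference = A1 - A2
print(sorted(difference))

# For this finite Z, alternatives and differences of events are again in Z.
closed = all((A | B) in Z and (A - B) in Z for A in Z for B in Z)
print(closed)
```

An exhaustive check of this kind is possible only because $Z$ is finite here; for a denumerable or continuum $E$ the closure of $Z$ under these operations is a postulate, as in property 1.2.3.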

The difference $A_1 - A_2$ occurs if and only if event $A_1$ but not event $A_2$ occurs.

If events $A_1$ and $A_2$ are exclusive, the difference $A_1 - A_2$ coincides with the event $A_1$.

As before, we postulate the following property of the set Z of random events.

Property 1.2.4. If events $A_1$ and $A_2$ belong to Z, then their difference also belongs to Z.

Example 1.2.3. Suppose that we investigate the number of children in a group of families. Consider the event A that a family chosen at random¹ has only one child and the event B that the family has at least one child. The alternative A + B is the event that the family has at least one child.

If it is known that in the group under investigation there are no families having more than n children, the set of elementary events consists of n + 1 elements which, as in example 1.2.2, are denoted by $e_0, e_1, \ldots, e_n$. Event A contains only one elementary event $e_1$, and event B contains n elementary events $e_1, \ldots, e_n$. The difference A - B is, of course, an impossible event since there is no elementary event which belongs to A and not to B. However, the difference B - A contains the elements $e_2, e_3, \ldots, e_n$ and is the event that the family has more than one child.

1 We shall discuss later the methods of making such a choice.

Definition 1.2.8. The event A which contains those and only those elements which belong to all the events $A_1, A_2, \ldots$ is called the product (or intersection) of these events.

We write

$A = A_1 \cap A_2 \cap \ldots$, or $A = A_1 A_2 \ldots$, or $A = \prod_i A_i$,

and read: $A_1$ and $A_2$ and ....

The product of events is illustrated by Fig. 1.2.4, where square E represents the set of elementary events, and circles $A_1$, $A_2$, $A_3$ represent three events; the shaded area represents the product $A_1 A_2 A_3$.

Fig. 1.2.4

In our definition the product of events $A_1, A_2, \ldots$ corresponds to the set-theoretical product of subsets $A_1, A_2, \ldots$ of the set of elementary events. A product of events occurs if and only if all these events occur.

We postulate the following property of Z.

Property 1.2.5. If a finite or denumerable number of events $A_1, A_2, \ldots$ belong to Z, then their product also belongs to Z.

It is easy to verify that for an arbitrary event A the following equalities are true:

$A \cap A = A, \quad A \cap E = A, \quad A \cap (O) = (O).$

Example 1.2.4. Consider the random event A that a farm chosen at random has at least one horse and at least one plow, with the additional condition that the maximum number of plows as well as the maximum number of horses are

two. Consider also the event B that on the farm there is exactly one horse and at most one plow. We find the product of events A and B.

In this example the set of elementary events has 9 elements which are denoted by the symbols

$e_{00}, e_{01}, e_{02}, e_{10}, e_{11}, e_{12}, e_{20}, e_{21}, e_{22},$

the first index denoting the number of horses, and the second the number of plows.

The random event A contains four elementary events, $e_{11}, e_{12}, e_{21}, e_{22}$, and the random event B contains two elementary events, $e_{10}$ and $e_{11}$. The product $A \cap B$ contains one elementary event $e_{11}$, and hence the event $A \cap B$ occurs if and only if on the chosen farm there is exactly one horse and exactly one plow.

Definition 1.2.9. The difference of events E - A is called the complement of the event A and is denoted by $\bar{A}$.

The complement of an event is illustrated by Fig. 1.2.5, where square E represents the set of elementary events, and circle A denotes some event; the shaded area represents the complement $\bar{A}$ of A.

Fig. 1.2.5

This definition may also be formulated in the following way: Event $\bar{A}$ occurs if and only if event A does not occur.

According to properties 1.2.1 and 1.2.4 of the set Z of random events, the complement $\bar{A}$ of A is a random event.

Example 1.2.5. Suppose we have a number of electric light bulbs. We are interested in the time t that they glow. We fix a certain value $t_0$ such that if the bulb burns out in a time shorter than $t_0$, we consider it to be defective. We select a bulb at random. Consider the random event A that we select a defective bulb. Then the random event that we select a good one, that is, a bulb that glows for a time no shorter than $t_0$, is the event $\bar{A}$, the complement of the event A.

We now give the definition (see Supplement) of the Borel field of events which was mentioned earlier.

Definition 1.2.10. A set Z of subsets of the set E of elementary events with properties 1.2.1 to 1.2.5 is called a Borel field of events, and its elements are called random events.

In the sequel we consider only random events, and often instead of writing "random event" we simply write "event."
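The operations on events introduced in this section are ordinary set operations, so examples 1.2.3 and 1.2.4 can be replayed with Python sets. The following is only a small sketch: the bound n = 5 on the number of children and all variable names are assumptions made here for illustration and do not come from the text.

```python
from itertools import product

# Example 1.2.3: elementary event k means "the family has k children".
n = 5                                   # assumed upper bound, for illustration only
E_children = set(range(n + 1))          # e_0, e_1, ..., e_n
A = {1}                                 # exactly one child
B = {k for k in E_children if k >= 1}   # at least one child

alternative = A | B                     # alternative A + B
diff_A_B = A - B                        # difference A - B
diff_B_A = B - A                        # difference B - A
complement_B = E_children - B           # complement of B: no children

# Example 1.2.4: elementary event (h, p) = (number of horses, number of plows).
E_farm = set(product(range(3), repeat=2))
A_farm = {(h, p) for (h, p) in E_farm if h >= 1 and p >= 1}
B_farm = {(h, p) for (h, p) in E_farm if h == 1 and p <= 1}
product_AB = A_farm & B_farm            # product of A and B

print(alternative == B)                 # the alternative A + B coincides with B
print(diff_A_B == set())                # A - B is the impossible event
print(sorted(diff_B_A))                 # families with more than one child
print(sorted(product_AB))               # exactly one horse and exactly one plow
```

Representing events as sets keeps the correspondence with the definitions direct: `|` is the alternative, `&` the product, and `-` the difference.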

D The following definitions will facilitate some of the formulations and proofs given in the subsequent parts of this book.

Definition 1.2.11. The sequence $\{A_n\}$ (n = 1, 2, ...) of events is called nonincreasing if for every n we have

$A_n \supset A_{n+1}.$

The product of a nonincreasing sequence of events $\{A_n\}$ is called the limit of this sequence. We write

$A = \prod_{n \ge 1} A_n = \lim_{n \to \infty} A_n.$

Definition 1.2.12. The sequence $\{A_n\}$ (n = 1, 2, ...) of events is called nondecreasing if for every n we have

$A_{n+1} \supset A_n.$

The sum of a nondecreasing sequence $\{A_n\}$ is called the limit of this sequence.

We write

$A = \sum_{n \ge 1} A_n = \lim_{n \to \infty} A_n.$

1.3 THE SYSTEM OF AXIOMS OF THE THEORY OF PROBABILITY

A In everyday language the notion of probability is used without a precise definition of its meaning. However, probability theory, as a mathematical discipline, must make this notion precise. This is done by constructing a system of axioms which formalize some basic properties of probability, or in brief, by the axiomatization of the theory of probability.¹ The additional properties of probability can be obtained as consequences of these axioms.

1 Many works have been devoted to the axiomatization of the theory of probability. We mention here the papers of Bernstein [1], Łomnicki [1], Rényi [1], Steinhaus [1], and the book by Mazurkiewicz [1]. The system of axioms given in this section was constructed by Kolmogorov [7]. (The numbers in brackets refer to the number of the paper quoted in the references at the end of the book.) The basic notions of probability theory are also discussed in the distinguished work of Laplace [1], and by Hausdorff [1], Mises [1, 2], Jeffreys [1], and Barankin [2].

In mathematics, the notion of random event defined in the preceding section corresponds to what is called a random event in everyday use. The system of axioms which is about to be formulated makes precise the notion of the probability of a random event. It is the mathematical formalization of certain regularities in the frequencies of occurrence of

random events (this last to be understood in the intuitive sense) observed during a long series of trials performed under constant conditions.

Suppose we are given the set of elementary events E and a Borel field Z of its subsets. As has already been mentioned (Section 1.1), it has been observed that the frequencies of occurrence of random events oscillate about some fixed number when the number of experiments is large. This observed regularity of the frequency of random events and the fact that the frequency is a non-negative fraction less than or equal to one have led us to accept the following axiom.

Axiom I. To every random event A there corresponds a certain number P(A), called the probability of A, which satisfies the inequality

$0 \le P(A) \le 1.$

The following simple example leads to the formulation of axiom II.

Example 1.3.1. Suppose there are only black balls in an urn. Let the random experiment consist in drawing a ball from the urn. Let m/n denote, as before, the frequency of appearance of the black ball. It is obvious that in this example we shall always have m/n = 1. Here, drawing the black ball out of the urn is a sure event and we see that its frequency equals one.

Taking into account this property of the sure event, we formulate the following axiom.

Axiom II. The probability of the sure event equals one.

We write

$P(E) = 1.$

We shall see in Section 2.3 that the converse of axiom II is not true: if the probability of a random event A equals one, or P(A) = 1, the set A may not include all the elementary events of the set E.

We have already seen that the frequency of appearance of face 6 in throwing a die oscillates about the number 1/6. The same is true for face 2. We notice that these two events are exclusive and that the frequency of occurrence of either face 6 or face 2 (that is, the frequency of the alternative of these events), which equals the sum of their frequencies, oscillates about the number 1/6 + 1/6 = 1/3.

Experience shows that if a card is selected from a deck of 52 cards (4 suits of 13 cards each) many times over, the frequency of appearance of any one of the four aces equals about 4/52, and the frequency of appearance of any spade equals about 13/52. Nevertheless, the frequency of appearance of the alternative, ace or spade, oscillates not about the number 4/52 + 13/52 = 17/52 but about the number 16/52. This phenomenon is explained by the fact that ace and spade are not exclusive random events (we could select the ace of spades). Therefore the frequency of the

alternative, ace or spade, is not equal to the sum of the frequencies of ace and spade. Taking into account this property of the frequency of the alternative of events, we formulate the last axiom.

Axiom III. The probability of the alternative of a finite or denumerable number of pairwise exclusive events equals the sum of the probabilities of these events.

Thus, if we have a finite or countable sequence of pairwise exclusive events $\{A_k\}$, k = 1, 2, ..., then, according to axiom III, the following formula holds:

(1.3.1) $P\left(\sum_k A_k\right) = \sum_k P(A_k).$

In particular, if a random event contains a finite or countable number of elementary events $e_k$ and $(e_k) \in Z$ (k = 1, 2, ...),

$P(e_1, e_2, \ldots) = P(e_1) + P(e_2) + \ldots$

The property expressed by axiom III is called the countable (or complete) additivity of probability.¹

Axiom III concerns only the sums of pairwise exclusive events. Now let A and B be two arbitrary random events, exclusive or not. We shall find the probability of their alternative.

We can write

$A \cup B = A \cup (B - AB), \quad B = AB \cup (B - AB).$

The right sides of these expressions are alternatives of exclusive events. Therefore, according to axiom III, we have

$P(A \cup B) = P(A) + P(B - AB), \quad P(B) = P(AB) + P(B - AB).$

From these two equations we obtain the probability of the alternative of two events

(1.3.2) $P(A \cup B) = P(A) + P(B) - P(AB).$

Let $A_1, A_2, \ldots, A_n$, where $n \ge 3$, be arbitrary random events. It is easy to deduce the formula (due to Poincaré [1])

(1.3.2') $P\left(\sum_{k=1}^{n} A_k\right) = \sum_{k=1}^{n} P(A_k) - \sum_{\substack{k_1, k_2 = 1 \\ k_1 < k_2}}^{n} P(A_{k_1} A_{k_2}) + \sum_{\substack{k_1, k_2, k_3 = 1 \\ k_1 < k_2 < k_3}}^{n} P(A_{k_1} A_{k_2} A_{k_3}) + \ldots + (-1)^{n+1} P(A_1 \ldots A_n).$

1 We could have said that the probability P(A), satisfying axioms I to III, is a normed, non-negative, and countably additive measure on the Borel field Z of subsets of E.
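Formula (1.3.2) and Poincaré's formula (1.3.2') can be checked by direct counting on the deck of 52 cards, taking every card to be equally likely. The sketch below is an illustration only: the helper names `prob` and `poincare`, and the third event "face card", are assumptions introduced here and are not part of the text.

```python
from fractions import Fraction
from itertools import combinations

ranks = range(1, 14)                      # 1 stands for the ace
suits = ("spades", "hearts", "diamonds", "clubs")
deck = [(r, s) for r in ranks for s in suits]

def prob(event):
    """Probability of an event (a set of cards) when all cards are equally likely."""
    return Fraction(len(event), len(deck))

ace = {c for c in deck if c[0] == 1}
spade = {c for c in deck if c[1] == "spades"}

# Formula (1.3.2): P(A u B) = P(A) + P(B) - P(AB)
lhs = prob(ace | spade)
rhs = prob(ace) + prob(spade) - prob(ace & spade)

def poincare(events):
    """Right-hand side of (1.3.2'): alternating sum over all products of events."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        for group in combinations(events, r):
            total += sign * prob(set.intersection(*group))
    return total

face = {c for c in deck if c[0] >= 11}    # assumed third event: jack, queen or king
three = [ace, spade, face]

print(lhs, rhs)
print(poincare(three) == prob(set.union(*three)))
```

Exact fractions are used so that the two sides can be compared with `==` rather than with a numerical tolerance.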

B Consider a finite or countable number of random events $A_k$, where k = 1, 2, .... If every elementary event of the set E belongs to at least one of the random events $A_1, A_2, \ldots$, we say that these events exhaust the set of elementary events E. The alternative $\sum_k A_k$ contains all the elementary events of the set E and therefore is the sure event. By axiom II we obtain

Theorem 1.3.1. If the events $A_1, A_2, \ldots$ exhaust the set of elementary events E,

(1.3.3) $P\left(\sum_k A_k\right) = 1.$

Example 1.3.2. Let the set of all non-negative integers form the set of elementary events. Let $(e_n)$ be the event of obtaining the number n, where n = 0, 1, 2, .... Suppose that

$P(e_n) = \frac{c}{n!},$

where c is some constant. From theorem 1.3.1 and axiom III it follows that

$P\left(\sum_{n=0}^{\infty} e_n\right) = c \sum_{n=0}^{\infty} \frac{1}{n!}.$

But $P\left(\sum_{n=0}^{\infty} e_n\right) = 1$ and $\sum_{n=0}^{\infty} 1/n! = e$, where e is the base of natural logarithms. We then have 1 = ce; hence

$P(e_n) = \frac{e^{-1}}{n!}.$

In the following chapters it turns out that in this example we have considered a particular case of the Poisson distribution which appears very often in practice.

We now prove the following theorem.

Theorem 1.3.2. The sum of the probabilities of any event A and its complement $\bar{A}$ is one.

Proof. From the definition of $\bar{A}$ it follows that the alternative $A \cup \bar{A}$ of A and $\bar{A}$ is the sure event; therefore, according to axiom II we have

$P(A \cup \bar{A}) = 1.$

But since events A and $\bar{A}$ are exclusive, we have, by axiom III,

$P(A \cup \bar{A}) = P(A) + P(\bar{A})$

and finally

(1.3.4) $P(A) + P(\bar{A}) = 1.$

Let A be the impossible event. We prove the next theorem.

Theorem 1.3.3. The probability of the impossible event is zero.

Proof. For every random event A we have the equality

$A \cup E = E.$

If A is the impossible event (does not contain any of the elementary events), A and E are exclusive because they have no common element. Applying axiom III, we obtain

$P(A) + P(E) = P(E).$

It follows immediately that

$P(A) = 0.$

We shall see in Section 2.3 that the converse is not true; from the fact that the probability of some event equals zero it does not follow that this event is impossible.

C The following two theorems have numerous applications.

Theorem 1.3.4. Let $\{A_n\}$, n = 1, 2, ..., be a nonincreasing sequence of events and let A be their product. Then

(1.3.5) $P(A) = \lim_{n \to \infty} P(A_n).$

Proof. If the sequence $\{A_n\}$ is nonincreasing, then for every n we have

$A_n = \sum_{k=n}^{\infty} A_k \bar{A}_{k+1} + A.$

It follows from formula (1.3.2) that

(1.3.6) $P(A_n) = P\left(\sum_{k=n}^{\infty} A_k \bar{A}_{k+1}\right) + P(A) - P\left(A \sum_{k=n}^{\infty} A_k \bar{A}_{k+1}\right).$

We note that

$A \sum_{k=n}^{\infty} A_k \bar{A}_{k+1} = \sum_{k=n}^{\infty} A A_k \bar{A}_{k+1}.$

For every k, the event $A A_k \bar{A}_{k+1}$ is the impossible event; therefore $P(A A_k \bar{A}_{k+1}) = 0$. By axiom III, we obtain

$P\left(A \sum_{k=n}^{\infty} A_k \bar{A}_{k+1}\right) = 0.$

Since the events under the summation sign on the right-hand side of formula (1.3.6) are exclusive, we have

(1.3.7) $P(A_n) = \sum_{k=n}^{\infty} P(A_k \bar{A}_{k+1}) + P(A).$

However, the series

$\sum_{k=1}^{\infty} P(A_k \bar{A}_{k+1})$

is convergent, being a sum of non-negative terms whose partial sums are bounded by one. It follows that as $n \to \infty$ the sum in (1.3.7) tends to zero. Thus, finally,

$\lim_{n \to \infty} P(A_n) = P(A).$

Theorem 1.3.5. Let $\{A_n\}$, n = 1, 2, ..., be a nondecreasing sequence of events and let A be their alternative. Then we have

(1.3.8) $P(A) = \lim_{n \to \infty} P(A_n).$

Proof. Consider the sequence of events $\{\bar{A}_n\}$ which are the complements of the events $A_n$. From the assumption that $\{A_n\}$ is a nondecreasing sequence it follows that $\{\bar{A}_n\}$ is a nonincreasing sequence. Let $\bar{A}$ be the product of events $\bar{A}_n$. From theorem 1.3.4 it follows that

$P(\bar{A}) = \lim_{n \to \infty} P(\bar{A}_n).$

Hence

$P(A) = 1 - P(\bar{A}) = 1 - \lim_{n \to \infty} P(\bar{A}_n) = 1 - \lim_{n \to \infty} [1 - P(A_n)] = \lim_{n \to \infty} P(A_n)$

and the theorem is proved.

We give one more simple theorem.

Theorem 1.3.6. If events A and B satisfy the condition

$A \subset B,$

then

$P(A) \le P(B).$

Proof. Let us write

$B = A + (B - A).$

Events A and B - A are exclusive; hence, according to axiom III,

$P(B) = P(A) + P(B - A).$

Since $P(B - A) \ge 0$, we have $P(B) \ge P(A)$.

1.4 APPLICATION OF COMBINATORIAL FORMULAS FOR COMPUTING PROBABILITIES

In some problems we can compute probabilities by applying combinatorial formulas. We illustrate this by some examples.

Example 1.4.1. Suppose we have 5 balls of different colors in an urn. Assume that the probability of drawing any particular ball is the same for any ball and equals p.

Here E consists of 5 elements and by hypothesis each has the same probability. Hence by theorem 1.3.1, we have 5p = 1, or p = 1/5.

Example 1.4.2. Suppose we have in the urn 9 slips of paper with the numbers 1 to 9 written on them, and suppose there are no two slips marked with the same number. Then E has 9 elementary events. Denote by A the event that on the slip of paper selected at random an even number will appear. What is the probability of this event?

As before, we suppose that the probability of selecting any particular slip is the same for any slip, and hence equals 1/9. We shall obtain a slip with an even number if we draw one of the slips marked with 2, 4, 6 or 8. According to axiom III, the required probability equals

$P(A) = \frac{1}{9} + \frac{1}{9} + \frac{1}{9} + \frac{1}{9} = \frac{4}{9}.$

If in the example considered we wish to compute the probability of selecting a slip with an odd number, we may notice that this random event is the complement of A (we denote it by $\bar{A}$) and, by theorem 1.3.2, we have

$P(\bar{A}) = 1 - P(A) = \frac{5}{9}.$

Example 1.4.3. Let us toss a coin three times. What is the probability that heads appear twice?

The number of all possible combinations which may occur as a result of three successive tosses equals $2^3 = 8$. Denote the appearance of heads by H and the appearance of tails by T. We have the following possible combinations:

HHH, HHT, HTH, THH, HTT, THT, TTH, TTT.

Consider each of these combinations as an elementary event and the whole collection of them as the set E. Suppose that the occurrence of each of them has the same probability. We then have that the probability of each particular combination equals $1/2^3$. From the table we see that heads appear twice in three elementary events (HHT, HTH, THH); hence by axiom III the required probability is 3/8.

If in the example just considered we had n tosses instead of 3 and looked for the probability of obtaining heads m times, our reasoning would have been as follows.

The number of all possible combinations with n tosses equals $2^n$. The number of combinations in which heads appear m times equals the number of combinations of m elements from n elements given by

$\binom{n}{m} = \frac{n!}{m!\,(n-m)!}.$

If every possible result of n successive tosses of a coin is equally likely, the required probability is

(1.4.1) $\frac{1}{2^n} \cdot \frac{n!}{m!\,(n-m)!}.$
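Formula (1.4.1) can be compared with a direct count of the equally likely sequences of heads and tails. The sketch below is illustrative; the function names are assumptions made here, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from itertools import product
from math import comb

def prob_heads(n, m):
    """Formula (1.4.1): probability of exactly m heads in n tosses of a fair coin."""
    return Fraction(comb(n, m), 2 ** n)

def prob_heads_by_counting(n, m):
    """Same probability, obtained by listing all 2**n equally likely sequences."""
    outcomes = list(product("HT", repeat=n))
    favourable = [o for o in outcomes if o.count("H") == m]
    return Fraction(len(favourable), len(outcomes))

print(prob_heads(3, 2))                 # 3/8, as in example 1.4.3
print(prob_heads_by_counting(3, 2))     # 3/8
```

The enumeration grows as $2^n$ and is meant only as a check for small n; the closed formula is what one would use in practice.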

Example 1.4.4. Compute the probability that heads appear at least twice in three successive tosses of a coin.

The random event under consideration will occur if in three tosses heads appear two or three times. According to formula (1.4.1), the probability that heads appear three times equals

$\frac{1}{2^3} \cdot \frac{3!}{3!\,0!} = \frac{1}{8},$

and the probability that heads appear twice equals 3/8, as we already know. Hence, according to axiom III, the required probability is

$\frac{1}{8} + \frac{3}{8} = \frac{1}{2}.$

In examples 1.4.1 to 1.4.4 the equiprobability of all elementary events was assumed. This assumption was obviously satisfied in our examples, but it is not always acceptable.

1.5 CONDITIONAL PROBABILITY

A Let us first consider some examples.

Example 1.5.1. A. Markov [4] has investigated the probability of the appearance of these pairs of letters in Russian:

Vowel after vowel,
Vowel after consonant.

To compute these probabilities he counted the corresponding pairs of letters in Pushkin's poem Eugene Onegin on the basis of a text of 20,000 letters, and he accepted the observed frequencies as probabilities.¹ The experiment yielded the following results: there were 8638 vowels, and the pair "vowel after vowel" appeared 1104 times.

Let us analyze this example. Denote a vowel by a and a consonant by b. As elementary events we shall consider the pairs aa, ba, ab, bb; the set of elementary events is then (aa, ab, ba, bb).

Consider event B that a pair of letters will appear in which a vowel is in second place. Event B may be written as (aa, ba). It is known that a vowel appears 8638 times. These vowels follow either another vowel (in the pairs aa) or a consonant (in the pairs ba). Because no vowel appears at the beginning of the text considered

"Мой дядя самых честных правил ...,"²

event B occurs 8638 times. Thus

$P(B) = \frac{8638}{20\,000} = 0.432.$

Consider now event A that the pair of letters occurs with a vowel in first place. Event A may be written as (aa, ab).

1 The methods of verification of such hypotheses are given in Part 2 of this book.

2 It means, "My uncle's shown his good intentions."

The question "What is the frequency of a vowel followed by a vowel?" might now be formulated as follows.

What is the probability of event A in cases when event B has already occurred? We are not interested here in the probability of event A in the whole set E of elementary events but in the conditional probability which would correspond to the conditional frequency of event A provided event B has occurred; in other words, the probability of event A in the set (aa, ba) considered as the whole set of elementary events.

In our example we are interested in the probability of the event (aa). The experiment showed that this event appeared 1104 times, and, since event B appeared 8638 times, the probability we are looking for equals

$\frac{1104}{8638} = 0.128.$

B In general, let B be an event in the set of elementary events E. The set B is then an element of the Borel field Z of subsets of the set E of all elementary events. Suppose P(B) > 0. Let us consider B as a new set of elementary events and denote by Z' the Borel field of all subsets of B which belong to the field Z.

Consider an arbitrary event A from the field Z. It may happen in particular cases that the event A belongs to the field Z', namely, when A is a subset of B. If, however, A contains any element of E which does not belong to B, A is not an element of Z'; yet some part of A may be a random event in Z', namely, when A and B have common elements, that is, when the product AB is not empty.

Now let B denote a fixed element of the field Z, where P(B) > 0, while A runs over all possible elements of Z; then all elements of Z' are products of the form AB. To stress the fact that the product AB is now being considered as an element of Z' (and not of Z) we denote it by the symbol A | B and read: "A provided that B" or "A provided that B has occurred."

If A contains B, A | B is the sure event (in the field Z').

Event A | B is illustrated by Fig. 1.5.1. Here square E represents the set of all elementary events, and circles A and B denote some random events. The shaded area represents the random event B, and the doubly shaded area represents the random event A | B, that is, "event A provided that B has occurred."

Fig. 1.5.1

The probability of the event A | B in the field Z' will be denoted by P(A | B) and read: The conditional probability of A provided B has occurred.

As will be shown shortly this probability can be defined by using the probability in the field Z; hence there is no need to postulate separately the existence of the probability P(A | B) and its properties.
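Returning to the letter counts of example 1.5.1, the conditional frequency can be obtained either directly as 1104/8638 or as a quotient of two frequencies taken over the whole text. The sketch below uses only the three counts quoted in the example; treating 1104/20 000 as the frequency of the pair aa is an assumption made here, by analogy with the way P(B) is computed in the example, and the variable names are illustrative.

```python
n_letters = 20000          # length of the text examined
n_vowels = 8638            # occurrences of event B = (aa, ba)
n_vowel_vowel = 1104       # occurrences of the pair aa, that is, of the product AB

p_B = n_vowels / n_letters             # frequency of B in the whole text
p_AB = n_vowel_vowel / n_letters       # frequency of AB in the whole text (assumed form)
p_A_given_B = p_AB / p_B               # quotient of the two frequencies

print(round(p_B, 3))                   # 0.432
print(round(p_A_given_B, 3))           # 0.128
```

The common factor 20 000 cancels, so the quotient agrees with the direct ratio 1104/8638 used in the example.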

C To facilitate the understanding of the definition of P(A | B), let us consider the following.

Suppose we have performed n random experiments and have obtained the event B m times. Moreover, in k ($k \le m$) of these experiments we also obtained the random event A. The frequency of AB equals k/n, and the frequency of B equals m/n; the frequency of the random event A, provided the random event B has occurred, equals k/m.

Applying the equality

(1.5.1) $\frac{k}{m} = \frac{k/n}{m/n}$

to the probabilities instead of the frequencies, we accept the following definition.

Definition 1.5.1. Let the probability of an event B be positive. The conditional probability of the event A provided B has occurred equals the probability of AB divided by the probability of B.

Thus

(1.5.2) $P(A \mid B) = \frac{P(AB)}{P(B)}$, where P(B) > 0.

Similarly,

(1.5.3) $P(B \mid A) = \frac{P(AB)}{P(A)}$, where P(A) > 0.

From (1.5.2) and (1.5.3) we obtain

(1.5.4) $P(AB) = P(B)P(A \mid B) = P(A)P(B \mid A).$

This formula is to be read: The probability of the product AB of two events equals the product of the probability of B times the conditional probability of A provided B has occurred or, what amounts to the same thing, to the probability of A times the probability of B provided A has occurred.

Let $A_1$, $A_2$, $A_3$ denote three events from the same field Z. Consider the expression $P(A_3 \mid A_1 A_2)$, or the probability of $A_3$ provided the product $A_1 A_2$ has occurred. According to (1.5.2) this probability, assuming that $P(A_1 A_2) > 0$, equals

(1.5.5) $P(A_3 \mid A_1 A_2) = \frac{P(A_1 A_2 A_3)}{P(A_1 A_2)}.$

From (1.5.5) and (1.5.3) we obtain for the probability of the product of three events the relations

(1.5.6) $P(A_1 A_2 A_3) = P(A_1 A_2)P(A_3 \mid A_1 A_2) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 A_2).$
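Formula (1.5.6) can be checked on a small case. The setting below is an assumption chosen here for illustration and does not appear in the text: three cards are drawn in succession without replacement from a deck of 52, all ordered draws being equally likely, and $A_i$ is the event that the i-th card drawn is an ace.

```python
from fractions import Fraction
from itertools import permutations

deck = ["ace"] * 4 + ["other"] * 48    # assumed deck: 4 aces among 52 cards

# Right-hand side of (1.5.6): P(A1) * P(A2 | A1) * P(A3 | A1 A2)
p_chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Left-hand side P(A1 A2 A3), by counting ordered triples of distinct cards
favourable = 0
total = 0
for i, j, k in permutations(range(len(deck)), 3):
    total += 1
    if deck[i] == deck[j] == deck[k] == "ace":
        favourable += 1
p_direct = Fraction(favourable, total)

print(p_chain, p_direct)
```

The conditional factors 3/51 and 2/50 are the frequencies of an ace among the cards that remain once the earlier aces have been removed, which is exactly the restriction to the new set of elementary events described above.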

Formula (1.5.6) is to be read: The probability of the product of three events equals the probability of the first event times the conditional probability of the second event provided the first event has occurred times the probability of the third event provided the product of the first two events has occurred.

Now let $A_1, A_2, \ldots, A_n$ be random events. We could consider the conditional probabilities $P(A_{k_1} A_{k_2} \ldots A_{k_r} \mid A_{k_{r+1}} \ldots A_{k_n})$ of the product of some subgroup consisting of r events ($1 \le r \le n - 1$) provided the product of the remaining n - r events has occurred. By a reasoning similar to that stated we obtain

(1.5.7) $P(A_1 A_2 \ldots A_n) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 A_2) \ldots P(A_n \mid A_1 \ldots A_{n-1}).$

D We shall show that the conditional probability satisfies axioms I to III.

We notice that the following inequality is true:

(1.5.8) $P(AB) \le P(B).$

In fact, event B may occur either when event A occurs, or when event A does not occur; hence

$B = AB \cup \bar{A}B,$

where $\bar{A}$ is the complement of A. Thus $AB \subset B$, and from theorem 1.3.6, we obtain (1.5.8).

Since $P(AB) \ge 0$ and $P(B) > 0$ we obtain, from formula (1.5.8),

$0 \le P(A \mid B) \le 1,$

which is the property expressed by axiom I.

Now let A | B be the sure event in field Z', that is, let AB = B. Then

$P(AB) = P(B),$

and hence

$P(A \mid B) = 1.$

This is the property expressed by axiom II.

Consider now the alternative $\sum_i (A_i \mid B)$ of pairwise exclusive events. We can write

$\sum_i (A_i \mid B) = \left(\sum_i A_i\right) \mid B,$

and hence

$P\left[\sum_i (A_i \mid B)\right] = P\left[\left(\sum_i A_i\right) \mid B\right].$

According to (1.5.2) and axiom III we have

$P\left[\left(\sum_i A_i\right) \mid B\right] = \frac{P\left[\left(\sum_i A_i\right) B\right]}{P(B)} = \frac{P\left(\sum_i A_i B\right)}{P(B)} = \sum_i \frac{P(A_i B)}{P(B)} = \sum_i P(A_i \mid B).$

This formula expresses the countable additivity of conditional probability.

Since all the axioms are satisfied for the conditional probabilities, the theorems derived from these axioms hold for the conditional probabilities.

1.6 BAYES THEOREM

A Before we start the general consideration let us consider an example.

Example 1.6.1. We have two urns. There are 3 white and 2 black balls in the first urn and 1 white and 4 black balls in the second. From an urn chosen at random we select one ball at random. What is the probability of obtaining a white ball if the probability of selecting each of the urns equals 0.5?

Denote by $A_1$ and $A_2$, respectively, the events of selecting the first or second urn, and by B the event of selecting a white ball. Event B may happen either together with event $A_1$ or together with event $A_2$; hence we have

$B = A_1 B + A_2 B,$

and since events $A_1 B$ and $A_2 B$ are exclusive, we have

$P(B) = P(A_1 B) + P(A_2 B).$

Applying formula (1.5.4) we obtain

(1.6.1) $P(B) = P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2).$

In this example we have $P(A_1) = P(A_2) = 0.5$, $P(B \mid A_1) = 0.6$, and $P(B \mid A_2) = 0.2$. Placing these values into (1.6.1) we obtain P(B) = 0.4.

Formula (1.6.1) obtained in this example is a special case of the theorem of absolute probability, which is now given.

Theorem 1.6.1. If the random events $A_1, A_2, \ldots$ are pairwise exclusive and exhaust the set E of elementary events, and if $P(A_i) > 0$ for i = 1, 2, ...; then for any random event B we have

(1.6.2) $P(B) = P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2) + \ldots$

In fact, from the assumptions it follows that B may happen together with one and only one of the events $A_i$. We then have

$B = A_1 B + A_2 B + \ldots$

and

(1.6.3) $P(B) = P(A_1 B) + P(A_2 B) + \ldots$

According to (1.5.4) we obtain for every i,

(1.6.4) $P(A_i B) = P(A_i)P(B \mid A_i).$

Substituting values (1.6.4) into (1.6.3) we get (1.6.2).

B Again let the events $A_i$ satisfy the assumptions of theorem 1.6.1. Suppose that the event B has occurred. Now what is the probability of $A_i$? This question is answered by the following theorem due to Bayes.

Theorem 1.6.2. If the events $A_1, A_2, \ldots$ satisfy the assumptions of the theorem of absolute probability and P(B) > 0, then for i = 1, 2, ... we have

(1.6.5) $P(A_i \mid B) = \frac{P(A_i)P(B \mid A_i)}{P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2) + \ldots}$

In fact, substituting $A_i$ for A in formula (1.5.4), we obtain

$P(A_i \mid B) = \frac{P(A_i)P(B \mid A_i)}{P(B)},$

and introducing in the denominator expression (1.6.2) for P(B), we obtain (1.6.5).

Formula (1.6.5) is called Bayes formula or the formula for a posteriori probability. The latter name is explained by the fact that this formula gives us the probability of $A_i$ after B has occurred. On the other hand, the probabilities $P(A_i)$ in this formula are called the a priori probabilities.

Bayes formula plays an important role in applications.

Example 1.6.2. Guns 1 and 2 are shooting at the same target. It has been found that gun 1 shoots on the average nine shots during the same time gun 2 shoots ten shots. The precision of these two guns is not the same; on the average, out of ten shots from gun 1 eight hit the target, and from gun 2, only seven.

During the shooting the target has been hit by a bullet, but it is not known which gun shot this bullet. What is the probability that the target was hit by gun 2?

Denote by $A_1$ and $A_2$ the events that a bullet is shot by gun 1 and gun 2, respectively. Taking into consideration the ratio of the average number of shots made by gun 1 to the average number of shots made by gun 2, we can put $P(A_1) = 0.9P(A_2)$.¹ Denote by B the event that the target is hit by the bullet. According to the data about the precision of the guns we have $P(B \mid A_1) = 0.8$ and $P(B \mid A_2) = 0.7$. According to Bayes formula

$P(A_2 \mid B) = \frac{P(A_2)P(B \mid A_2)}{P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2)} = \frac{0.7P(A_2)}{0.9P(A_2) \cdot 0.8 + 0.7P(A_2)} = 0.493.$

1 The methods of verifying such hypotheses will be given in Part 2.
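Formulas (1.6.2) and (1.6.5) translate directly into two short functions, which can then be applied to examples 1.6.1 and 1.6.2. This is a sketch under stated assumptions: the function names are chosen here, and for the guns the a priori probabilities are normalized to $P(A_1) = 9/19$, $P(A_2) = 10/19$, which follows from $P(A_1) = 0.9P(A_2)$ only if one also assumes $P(A_1) + P(A_2) = 1$ (the text leaves $P(A_2)$ as a common factor that cancels).

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """Formula (1.6.2): P(B) = sum of P(A_i) * P(B | A_i)."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes(priors, likelihoods):
    """Formula (1.6.5): the a posteriori probabilities P(A_i | B)."""
    p_b = total_probability(priors, likelihoods)
    return [p * l / p_b for p, l in zip(priors, likelihoods)]

# Example 1.6.1: two urns chosen with probability 1/2 each.
urn_priors = [Fraction(1, 2), Fraction(1, 2)]
white_given_urn = [Fraction(3, 5), Fraction(1, 5)]
p_white = total_probability(urn_priors, white_given_urn)

# Example 1.6.2: normalized priors (assumption: A1 and A2 exhaust E).
gun_priors = [Fraction(9, 19), Fraction(10, 19)]
hit_given_gun = [Fraction(8, 10), Fraction(7, 10)]
posterior = bayes(gun_priors, hit_given_gun)

print(p_white)                          # 2/5
print(round(float(posterior[1]), 3))    # 0.493
```

Because the common factor $P(A_2)$ cancels in (1.6.5), any positive scaling of the priors gives the same a posteriori probabilities; normalizing merely makes them usable in (1.6.2) as well.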

1.7 INDEPENDENT EVENTS

A In general, the conditional probability P(A | B) differs from P(A). However, the case when we have the equality

(1.7.1) $P(A \mid B) = P(A)$

is of special importance. Then the fact that B has occurred does not have any influence on the probability of A, or we could say, the probability of A is independent of the occurrence of B.

We notice that if (1.7.1) is satisfied, formula (1.5.4) gives

(1.7.2) $P(AB) = P(A)P(B).$

Formula (1.7.2) is also satisfied if

(1.7.3) $P(B \mid A) = P(B).$

Formula (1.7.2) was derived from formula (1.5.4) where it was assumed that P(A) > 0 and P(B) > 0; nevertheless this formula is also valid when one of these probabilities equals zero.

We now define the notion of independence of two random events.

Definition 1.7.1. Two events A and B are called independent if their probabilities satisfy (1.7.2), that is, if the probability of the product AB is equal to the product of the probabilities of A and B.

It follows from this definition that the notion of independence of two events is symmetric with respect to these events.

As we have already established, formula (1.7.2) can be obtained from either of the formulas (1.7.1) and (1.7.3). We also notice that formulas (1.5.4) and (1.7.2) with P(A) > 0 and P(B) > 0 give formulas (1.7.1) and (1.7.3). We thus deduce that each of the last formulas is a necessary and sufficient condition for the independence of two events with positive probabilities.

B The notion of independence of two events can be generalized to the case of any finite number of events.

Definition 1.7.2. Events $A_1, A_2, \ldots, A_n$ are independent if for all integer indices $k_1, k_2, \ldots, k_s$ satisfying the conditions

$1 \le k_1 < k_2 < \ldots < k_s \le n$

we have

(1.7.4) $P(A_{k_1} A_{k_2} \ldots A_{k_s}) = P(A_{k_1})P(A_{k_2}) \ldots P(A_{k_s}),$

that is, if the probability of the product of every combination $A_{k_1}, A_{k_2}, \ldots, A_{k_s}$ of events equals the product of the probabilities of these events.
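Definition 1.7.2 can be checked mechanically when the set of elementary events is finite and all elementary events are equally likely. The sketch below does this for four equally likely slips marked 110, 101, 011, 000, the setting of Bernstein's example discussed in this section; the function names are assumptions introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations

slips = ["110", "101", "011", "000"]     # four equally likely elementary events

def prob(event):
    """Probability of an event (a set of slips) under equal likelihood."""
    return Fraction(len(event), len(slips))

# A[i] is the event that the digit 1 appears in place i + 1.
A = [{s for s in slips if s[i] == "1"} for i in range(3)]

def product_rule_holds(events):
    """Check (1.7.4) for one particular combination of events."""
    joint = set.intersection(*events)
    expected = Fraction(1)
    for e in events:
        expected *= prob(e)
    return prob(joint) == expected

def independent(events):
    """Definition 1.7.2: (1.7.4) must hold for every combination of two or more events."""
    return all(product_rule_holds(group)
               for r in range(2, len(events) + 1)
               for group in combinations(events, r))

pairwise = all(product_rule_holds(pair) for pair in combinations(A, 2))

print(pairwise)          # every pair satisfies (1.7.2)
print(independent(A))    # the triple fails (1.7.4)
```

Checking every combination, and not only the pairs, is what separates independence in the sense of definition 1.7.2 from pairwise independence.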

Al + A2 = A2 + AhA1A2 = A2A¡,

Al + (A2 + A3) = (Al + A2) + A3,Al(A2A3) = (A1A2)A3,

Al(A2 + A3) = A1A2 + A1A3'

Problems and Complements1.1. Prove that the operations of addition and multiplication of random

events are commutative and satisfy the following associative and distributivelaws:

It is possible that Al' A2, ••• , An are pairwise independent, that is, eachtwo events among Al' A2, ••• , An are independent, that each three ofthemare independent, and so on, and yet Ab A2, •.• , An are not independent.This is illustrated by the example given by Bernstein.

Example 1.7.1. There are four slips of paper of identical size in an urn.Each slip is marked with one of the numbers 110, 101,011,000 and there are notwo slips marked with the same number. Consider event Al that on the slip selec­ted the number 1 appears in the first place, event A2 that 1 appears in the secondplace, and A3 that 1 appears in the third place. The number of slips of each cate­gory is 2, the number of all slips is 4; hence if we assume that each slip has thesame probability of being selected, we have

P(Al) = P(A2) = P(A3) = t .Let A denote the product A1A2A3' P(A) = Osince event A is impossible (there

is no slip marked with 111) and sinceP(Al)P(A2)P(A3) = i~O =P(A),

events Al' A2, and A3 are not independent. We shall show, however, that theseevents are pairwise independent. In fact, for the pair Al' A2 we have

P(A21 Al) = t =P(A2),since there are only two slips having 1in the first place, and only one among themwith 1 in the second place. In a similar way we could show that the remainingpairs are independent.

Independence of a countable number of events is defined in the following way.

Definition 1.7.3. Events $A_1, A_2, \ldots$ are independent if for every $n = 2, 3, \ldots$ events $A_1, A_2, \ldots, A_n$ are independent.

From definitions 1.7.3 and 1.7.2 it follows that if the random events $A_1, A_2, \ldots$ are independent, then for $n = 2, 3, \ldots$ and for arbitrary indices $k_1, k_2, \ldots, k_n$ events $A_{k_1}, A_{k_2}, \ldots, A_{k_n}$ are independent.

To stress the fact that we consider the independence in the sense of definitions 1.7.2 and 1.7.3, and not the independence of pairs, triples, and so on, the term independence en bloc, or mutual independence, is used in probability theory. We shall avoid these terms, and independence will always be understood in the sense of the given definitions.


1.2. Prove the relations:

$$A_1A_2 = A_1 - (A_1 - A_2), \qquad A_1 + A_2 = A_1 + (A_2 - A_1), \qquad A_1 - A_2 = A_1 - A_1A_2,$$

$$A_1(A_2 - A_3) = A_1A_2 - A_3.$$

1.3. Prove that $A_1 \subset A_2$ implies $A_1 + A_2 = A_2$ and $A_1A_2 = A_1$.

1.4. (a) Prove the following two identities, called de Morgan's laws:

$$\overline{A_1 + A_2} = \bar{A}_1\bar{A}_2, \qquad \overline{A_1A_2} = \bar{A}_1 + \bar{A}_2.$$

(b) Generalize these identities to the case of $n$ ($n > 2$) random events.

1.5. (a) Prove that for $n = 2, 3, \ldots$

$$A_1 + \ldots + A_n = A_1 + (A_2 - A_1A_2) + \ldots + (A_n - A_1A_n - \ldots - A_{n-1}A_n).$$

Note that the terms on the right-hand side are pairwise exclusive.

(b) Show that

$$\bigcup_{i=1}^{\infty} A_i = A_1 + (A_2 - A_1A_2) + (A_3 - A_1A_3 - A_2A_3) + \ldots.$$

1.6. Prove that properties 1.2.2 and 1.2.5 follow from properties 1.2.1, 1.2.3, and 1.2.4.

1.7. Let $\{A_n\}$ ($n = 1, 2, \ldots$) be an arbitrary sequence of random events. The random event $A^*$ which contains all the elementary events which belong to an infinite number of the events $A_n$ will be called the upper limit of the sequence $\{A_n\}$,

$$A^* = \limsup_{n \to \infty} A_n.$$

The random event $A_*$ which contains all the elementary events which belong to all but a finite number of the events $A_n$ will be called the lower limit of the sequence $\{A_n\}$,

$$A_* = \liminf_{n \to \infty} A_n.$$

(a) Prove that

$$A^* = \prod_{n=1}^{\infty} \sum_{k=n}^{\infty} A_k, \qquad A_* = \sum_{n=1}^{\infty} \prod_{k=n}^{\infty} A_k.$$

(b) Show that

$$A_* \subset A^*.$$

1.8. (Notation as in the preceding problem) If $A_* = A^*$, then $A = A_* = A^*$ is called the limit of the sequence $\{A_n\}$. Then we write

$$A = \lim_{n \to \infty} A_n.$$

(a) Prove that if $\{A_n\}$ is a nondecreasing sequence, then

$$A = A_* = A^* = \sum_{n=1}^{\infty} A_n.$$

(b) Prove that if $\{A_n\}$ is a nonincreasing sequence, then

$$A = A_* = A^* = \prod_{n=1}^{\infty} A_n.$$

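The decomposition in Problem 1.5(a) can be tried out on small finite sets before it is proved. The sketch below is an illustration only, not a proof: events are modeled as Python sets of elementary events, and the helper `disjoint_parts` and the sample events are assumptions introduced here. It uses the fact that removing $A_1A_n, \ldots, A_{n-1}A_n$ from $A_n$ is the same as removing everything that already belongs to $A_1 + \ldots + A_{n-1}$.

```python
from itertools import combinations

def disjoint_parts(events):
    """Return the terms A1, A2 - A1A2, A3 - A1A3 - A2A3, ... of Problem 1.5(a)."""
    parts = []
    seen = set()  # the sum of all events processed so far
    for event in events:
        parts.append(set(event) - seen)
        seen |= set(event)
    return parts

# Hypothetical sample events over the elementary events 1..6.
events = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5, 6}]
parts = disjoint_parts(events)

same_sum = set().union(*parts) == set().union(*events)
exclusive = all(not (p & q) for p, q in combinations(parts, 2))

print(parts, same_sum, exclusive)
```

Because the terms are pairwise exclusive and each term is contained in the corresponding event, the same decomposition is what makes the inequalities of Problem 1.10 plausible.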

1.9. (a) Prove that for an arbitrary sequence of random events $\{A_n\}$

$$P\left(\limsup_{n \to \infty} A_n\right) \geq \limsup_{n \to \infty} P(A_n),$$

and

$$P\left(\liminf_{n \to \infty} A_n\right) \leq \liminf_{n \to \infty} P(A_n).$$

Hint: Use theorems 1.3.4 and 1.3.5.

(b) Deduce that if $\lim_{n \to \infty} A_n$ exists, then

$$P\left(\lim_{n \to \infty} A_n\right) = \lim_{n \to \infty} P(A_n).$$

1.10. Using Problems 1.5(a) and (b), prove the inequalities

$$P\left(\bigcup_{i=1}^{n} A_i\right) \leq \sum_{i=1}^{n} P(A_i), \qquad P\left(\bigcup_{i=1}^{\infty} A_i\right) \leq \sum_{i=1}^{\infty} P(A_i).$$

In combinatorial Problems 1.11 to 1.16 assume that all the possible outcomes have the same probability.

1.11. A deck of cards contains 52 cards. Player G has been dealt 13 of them. Compute the probability that player G has
(a) exactly 3 aces,
(b) at least 3 aces,
(c) any 3 face cards of the same face value,
(d) any 3 cards of the same face value from the five highest denominations,
(e) any 3 cards of the same face value from the eight lowest denominations,
(f) any 3 cards of the same face value,
(g) three successive spades,
(h) at least three successive spades,
(i) three successive cards of any suit,
(j) at least three successive cards of any suit.

1.12. Three dice are thrown once. Compute the probability of obtaining
(a) face 2 on one die,
(b) face 3 on at least one die,
(c) an even sum,
(d) a sum divisible by 3,
(e) a sum exceeding 7,
(f) a sum smaller than 12,
(g) a sum which is a prime number.

1.13. (Chevalier de Méré's problem) Find which of the following two events is more likely to occur: (1) to obtain at least one ace in a simultaneous throw of four dice; (2) at least one double ace in a series of 24 throws of a pair of dice.

1.14. (Banach's problem) A mathematician carries two boxes of matches, each of which originally contained $n$ matches. Each time he lights a cigarette he selects one box at random. Compute the probability that when he eventually selects an empty box, the other will contain $r$ matches, where $r = 0, 1, \ldots, n$.
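For Problem 1.13 (Chevalier de Méré's problem), a short numerical check is possible by passing to the complementary events. This is a sketch under stated assumptions: all outcomes are equally likely, as required for Problems 1.11 to 1.16, the throws are independent, and "ace" is read as face 1. The variable names are illustrative.

```python
from fractions import Fraction

# Event (1): at least one ace (face 1) in a simultaneous throw of four dice.
# Complement: no ace on any of the four dice.
p_four_dice = 1 - Fraction(5, 6) ** 4

# Event (2): at least one double ace in 24 throws of a pair of dice.
# Complement: no double ace in any of the 24 throws.
p_double_ace = 1 - Fraction(35, 36) ** 24

print(p_four_dice, float(p_four_dice))
print(float(p_double_ace))
print(p_four_dice > p_double_ace)
```

Working with the complements avoids summing over the number of aces obtained, which is the usual way to handle "at least one" events.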

1.15. An urn contains $m$ white and $n - m$ black balls. Two players successively draw balls at random, putting the drawn ball back into the urn before the next drawing. The player who first succeeds in drawing a white ball wins. Compute the probability of winning by the player who starts the game.

1.16. There are 28 slips of paper; on each of them one letter is written. The letters and their frequencies are presented in the following table:

Letter            a  c  e  h  i  j  l  m  n  o  s  t  y  ż
Number of slips   3  1  3  1  1  2  2  1  2  2  2  4  2  2

The slips are then arranged in random order. What is the probability of obtaining the sentence¹ "Sto lat sto lat niech żyje żyje nam"?

1.17. The famous poet Goethe once gave his guest, the famous chemist Runge, a box of coffee beans. Runge used this gift, at that time very valuable, for scientific experiments, and for the first time obtained pure caffeine. Is it possible to compute the probability of this event? If so, is the answer unique? What are the factors which determine the precise formulation of the random event whose probability we compute?

1.18. (Bertrand's paradox) A circle is drawn around an equilateral triangle with side $a$. Then a random chord is drawn in this circle. The event $A$ occurs if and only if the length $l$ of this chord satisfies the relation $l > a$. State the conditions under which (a) $P(A) = 0.5$, (b) $P(A) = 1/3$, (c) $P(A) = 1/4$. Should these results be considered as paradoxical?

1.19. (Buffon's problem) A needle of length $2l$ is thrown at random on a plane on which parallel lines are drawn at a distance $2a$ apart ($a > l$). What is the probability of the needle intersecting any of these lines?

1.20. The probability that both of a pair of twins are boys equals 0.32 and the probability that both of them are girls equals 0.28. Find the conditional probability that
(a) the second twin is a boy, provided the first is a boy,
(b) the second twin is a girl, provided the first is a girl.
Hint: Use example 1.1.3.

1.21. (a) What should $n$ be in order that the probability of obtaining the face 6 at least once in a series of $n$ independent throws of a die will exceed 3/4?
(b) The events $A_1, A_2, \ldots$ are independent and $P(A_j) = p$ ($j = 1, 2, \ldots$). Find the least $n$ such that [...] where $p_0$ is a given number.

1.22. (a) The events $A_1, A_2, \ldots$ are independent and $P(A_k) = p_k$. What is the probability that none of the events $A_k$ occurs?
(b) Answer the question of Problem 1.22(a) without the assumption of independence.

1.23. Prove that if the events $A$ and $B$ are independent, the same is true for the events $A$ and $\bar{B}$.

¹ The beginning of a Polish birthday song. It means something similar to, "May he live a hundred years, a hundred years."
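Problem 1.21(a) lends itself to a direct search. The sketch below assumes a fair die, so that each independent throw shows the face 6 with probability 1/6, and uses the complement: the face 6 appears at least once in $n$ throws unless it fails to appear in every throw. The function name `least_throws` and its parameters are hypothetical, and the target must be smaller than 1 for the loop to terminate.

```python
from fractions import Fraction

def least_throws(target, p_success=Fraction(1, 6)):
    """Least n for which P(at least one success in n independent trials) > target."""
    n = 0
    p_none = Fraction(1)  # probability of no success in the first n trials
    while 1 - p_none <= target:
        n += 1
        p_none *= 1 - p_success
    return n

n_needed = least_throws(Fraction(3, 4))
print(n_needed)
```

The same helper can be called with another `p_success` to explore how the required number of trials changes with the probability of a single event.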


CHAPTER 2

Random Variables

2.1 THE CONCEPT OF A RANDOM VARIABLE

We can assign a number to every elementary event from a set $E$ of elementary events. In the coin-tossing example we assigned the number 1 to the appearance of heads and the number 0 to the appearance of tails. Then the probability of obtaining the number 1 as a result of an experiment will be the same as the probability of obtaining a head, and the probability of obtaining the number 0 will be the same as the probability of obtaining a tail.

Similarly, in the example of throwing a die we can assign to every result of a throw one of the numbers $i$ ($i = 1, \ldots, 6$) corresponding to the number of dots appearing on the face resulting from our throw. In general, let $e$ denote an elementary event of a set $E$ of elementary events. On the set $E$ we define a single-valued real function $X(e)$ such that, roughly speaking, the probability that this function will assume certain values is defined. To formulate precisely the conditions which are to be satisfied by this function let us introduce the notion of an inverse image.

Definition 2.1.1. Let $X(e)$ be a single-valued real function defined on the set $E$ of elementary events. The set $A$ of all elementary events to which the function $X(e)$ assigns values in a given set $S$ of real numbers is called the inverse image of the set $S$.

It is clear that the inverse image of the set $R$ of all real numbers is the whole set $E$.

Definition 2.1.2. A single-valued real function $X(e)$ defined on the set $E$ of elementary events is called a random variable¹ if the inverse image of every interval $I$ on the real axis of the form $(-\infty, x)$ is a random event.

We shall set the probability $P^{(X)}(I)$ that the random variable $X(e)$ takes on a value in the interval $I$ equal to the probability $P(A)$ of the inverse image $A$ of $I$.

¹ The notion of a random variable corresponds in the theory of real functions to the notion of a function measurable with respect to the field of sets being considered.
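The notions of inverse image and of the probability that a random variable takes a value in an interval can be illustrated on the die example. The following is a minimal sketch, assuming a fair die with six equally likely elementary events; the names `E`, `prob`, `X`, `inverse_image`, and `prob_X_in` are introduced here for illustration only.

```python
from fractions import Fraction

# Elementary events: the six faces of a die, assumed equally likely.
E = [1, 2, 3, 4, 5, 6]
prob = {e: Fraction(1, 6) for e in E}

def X(e):
    """Random variable: the number of dots on the face that appears."""
    return e

def inverse_image(X, E, in_S):
    """Set of elementary events e whose value X(e) lies in S (S given as a predicate)."""
    return {e for e in E if in_S(X(e))}

def prob_X_in(X, E, prob, in_S):
    """Probability that X takes a value in S, computed as P(inverse image of S)."""
    return sum((prob[e] for e in inverse_image(X, E, in_S)), Fraction(0))

# The interval I = (-infinity, 3): its inverse image is the event {1, 2}.
A = inverse_image(X, E, lambda v: v < 3)
p = prob_X_in(X, E, prob, lambda v: v < 3)
print(A, p)
```

On a finite set of elementary events every subset can be treated as a random event, so the measurability condition of Definition 2.1.2 is satisfied automatically in this toy setting.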
