
Probability Theory

    An introduction and beyond

    This is an article from my home page: www.olewitthansen.dk

    Ole Witt-Hansen (2017) 1997

Contents

    Chapter I. Combinatorics ............................................................................................................................................... 1

    1. The multiplication and addition principles ................................................................................................................ 1

    2. Permutations ................................................................................................................................................................. 2

    3. Combinations ................................................................................................................................................................ 3

    CHAPTER II. FINITE PROBABILITY FIELDS ....................................................................6

    1. Introduction .................................................................................................................................................................. 6

    2. A finite probability field ............................................................................................................................................... 6

3. Events ............................................................................................................................................................. 7
3.1 Using summation signs ............................................................................................................................................ 8
3.2 Symmetric outcome spaces ...................................................................................................................................... 8

    4. Manipulating events and probabilities...................................................................................................................... 11

    5. Samples without replacement .................................................................................................................................... 14

5.1 Examples .................................................................................................................................................... 14
5.2 Example. Probabilities in Lotto games .................................................................................................................. 15

    CHAPTER III. CONDITIONAL PROBABILITIES ..............................................................18

    1. Definition of a conditional probability ...................................................................................................................... 18

    2. Developing probabilities on events ............................................................................................................................ 21

    3. Independent events ..................................................................................................................................................... 23

    CHAPTER IV. PROBABILITY DISTRIBUTIONS ..............................................................25

    1. Repetition of an experiment....................................................................................................................................... 25

    2. An example of the Binomial Distribution ................................................................................................................. 25

    3. The general binomial distribution............................................................................................................................. 27

    4. The binomial distribution and Pascal’s triangle ...................................................................................................... 27

    5. Stick diagrams for the binomial distribution ........................................................................................................... 31

    6. Cumulated probabilities............................................................................................................................................. 31

    7. Samples with replacement.......................................................................................................................................... 32

    8. Hypothesis testing ....................................................................................................................................................... 33

    9. Samples without replacement. The hyper geometric distribution.......................................................................... 34

4. The Poisson distribution............................................................................................................................................. 34

    CHAPTER V. STOCHASTIC VARIABLES .......................................................................37

    1. What is a stochastic variable?.................................................................................................................................... 37

2. The mean value of a stochastic variable.................................................................................................................... 37
2.1 The variance and the standard deviation ................................................................................................................ 39

    3. The mean value and variances of the binomial distribution ................................................................................... 43

    4. The mean and the variance of the Poisson distribution........................................................................................... 44

    5. Estimating the population mean and population standard deviation .................................................................... 45

    6. Chebychev’s inequality .............................................................................................................................................. 46

    7. The distribution function. Graphic representation.................................................................................................. 48

    CHAPTER VI. CONTINUOUS PROBABILITY DISTRIBUTIONS ....................................49

    1. Introduction ................................................................................................................................................................ 49

4. The Normal Distribution............................................................................................................................................ 49
2.1 Stirling's approximation to the Gamma function................................................................................................... 50
2.3 The normal distribution as the asymptotic limit of the binomial distribution ........................................................ 51
2.6 The Poisson distribution as a limit of the binomial distribution ........................................................................... 54

2.7 The central limit theorem......................................................................................................................................... 55
2.7.1 Evaluating the moments of a distribution ........................................................................................................... 55

    3. Practical applications of the normal distribution .................................................................................................... 58

Example. Fluctuations in the weight of goods ............................................................................................................. 60
3.1 Normal distribution paper ...................................................................................................................................... 60
Exercise ........................................................................................................................................................................ 65
3.2 The statistical limits for deviations in a sample test .............................................................................................. 65

    CHAPTER VII. FITTING DATA TO THEORY ...................................................................66

1. The theoretical foundation of regression .................................................................................................................. 66
1.1 The method of maximum likelihood...................................................................................................................... 66

    2. Linear regression ........................................................................................................................................................ 67

    4. Correlation coefficients .............................................................................................................................................. 68

    5. Chi-square test ............................................................................................................................................................ 69

    6. The chi-square density function and distribution function.................................................................................... 70

    7. Pearson’s formula for the χ2 test ............................................................................................................................... 72

    8. Explaining Pearson’s formula. .................................................................................................................................. 74

Preface

These notes on probability theory were originally written in 1997, adapted for the third year of the 9-12 year Danish high school. They grew out of a frustration over the deterioration of the professional academic level in the Danish educational system, a development which since 1988 has gone from bad to catastrophic after 2005. So it would be out of the question to use these lecture notes today, if only because the mathematical skills which were earlier placed in the heads of the students have now been entirely transferred to the core store of computers.

The first two thirds of these notes are more or less a direct translation of the Danish notes. They are therefore at an introductory, but still rigorous, level, with extended detail in the logical procedures and calculations. The last third, however, is an extension to (my) university level, relying heavily on calculus, and dealing with the normal distribution as an asymptotic limit of the binomial distribution. The central limit theorem is also treated in some detail, as are the derivation of the chi-square distribution function and the significance of Pearson's formula for deciding the independence of two variables, especially adapted to the social sciences and biology.

1 March 2017, Ole Witt-Hansen.


    Chapter I. Combinatorics

1. The multiplication and addition principles
In combinatorics one is concerned with methods for systematically counting the number of ways to select a subset of elements from a given finite set. Combinatorics is founded on two simple but fundamental principles, called the multiplication principle and the addition principle. For practical reasons they are often referred to as the "both…and" principle and the "either…or" principle. Since mathematical theories are axiomatically founded, we shall initially state an obvious fact.

If we have a set with n different elements, one element can be selected in n different ways. For example, if we have a class with 24 students, a student can be selected in 24 different ways. However, if one must select both a student from the X-class (24 students) and a student from the Y-class (18 students), one may reason as follows: Each time you have selected a student from the X-class, you may choose a student from the Y-class in 18 different ways. The number of different ways to select the two students is therefore obviously 24 times 18: 24·18 = 432 possibilities.

This is an example of the contents of the multiplication principle: If you must select both an element from a set of n elements and an element from a set of m elements, this can be done in n·m different ways.

The multiplication principle is more informally referred to as the both…and principle, since if you can formulate your selection with the words both…and, then the numbers of choices in the selections should be multiplied with each other.

Let us next assume that you shall either select a student from the X-class or a student from the Y-class. In that case you merely make a selection among 24 plus 18 students, and the answer is 24 + 18 = 42 possibilities.

    This is an example of the contents of the addition principle. If you may select either an element from a set of n elements or an element from a set of m elements it can be done in n+m different ways.

    The addition principle is less formally referred to as the either…or principle, for the reason that if you can formulate your selection using the words either…or you should add the number of choices from the two selections. Although these examples appear utterly trivial, you may encounter far more complex situations, where you have to decide for yourself, whether the selections are both…and or either…or.


    The two principles can of course trivially be generalized to more than two sets of elements, since you must either add the choices of the sets or multiply them. Examples:

1. If you have to choose from a menu including starter, main dish and dessert, having 4 starters, 7 main dishes and 6 desserts, and if you want the whole menu, it can be done in 4·7·6 = 168 different ways; but if you can't afford a whole menu and settle for just one dish, there are 4 + 7 + 6 = 17 possibilities.

2. An older model of a lock on a bicycle had 6 keys, each having 3 positions (in, out and neutral). What is the number of different combinations to unlock? Well, the first key has 3 positions, and for each of these the next key also has 3 positions, which makes 9; so, using the multiplication principle, the number of combinations is 3·3·3·3·3·3 = 3^6 = 729. (The combination where all the keys are in neutral is not advisable, however.)

3. In Denmark we have for a long time had the possibility of playing on the outcomes of national football games. It is done on a coupon with 13 entries, each filled in with 1 (won), X (tie) or 2 (lost). What is the number of possibilities when filling in a coupon? Since each entry is filled in independently of the others, and you have to fill in all entries, the multiplication principle applies. So the result is: 3·3·3·…·3 = 3^13 = 1,594,323 possibilities.
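The two principles are easy to check by brute force. The short Python sketch below is an added illustration, not part of the original notes; it simply enumerates the selections of the three examples and confirms the counts.

```python
# Brute-force check of the multiplication and addition principles
# for the three examples above (illustrative sketch, not from the notes).
from itertools import product

# Example 1: a whole menu is one choice from each course (both...and -> multiply);
# a single dish is one choice from the pooled courses (either...or -> add).
starters, mains, desserts = range(4), range(7), range(6)
assert len(list(product(starters, mains, desserts))) == 4 * 7 * 6 == 168
assert len(starters) + len(mains) + len(desserts) == 4 + 7 + 6 == 17

# Example 2: six keys, each with 3 positions.
assert len(list(product(range(3), repeat=6))) == 3**6 == 729

# Example 3: a football coupon with 13 entries, each marked 1, X or 2.
assert sum(1 for _ in product("1X2", repeat=13)) == 3**13 == 1_594_323
```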

2. Permutations
Think of a set having n different elements (e.g. a class with 24 students). An ordered subset with q elements (think of the last row in the class, where q = 8) is called a q-permutation of a set having n elements, which we shall denote an n-set. We wish to find a formula, written symbolically P(n,q) or Pn,q, for the number of different q-permutations one may form from an n-set.

We start out by making a formula for P(n,n), that is, the number of different permutations of an n-set. We imagine that we have a row of n places (24 ordered seats), and we aim at finding the number of different ways they can be occupied by n elements (e.g. 24 persons):

n  n-1  n-2  …  3  2  1

Since the first and the second and … and the n'th place must all be occupied, we must apply the multiplication principle. The first place (seat) can be occupied in n (24) different ways, and for each choice the second place (seat) can be occupied in n-1 (23) different ways, and so on. By the multiplication principle the first two places (seats) can therefore be occupied in n(n-1) different ways. We may continue this line of argument, so that the n places (seats) can be occupied in n(n-1)(n-2)···3·2·1 (24·23·22·…·3·2·1) different ways. This leads to the formula:

P(n,n) = n(n-1)(n-2)···3·2·1    P(24,24) = 24·23·22·…·3·2·1

For the product of the integers from 1 to n, n(n-1)(n-2)···3·2·1, the standard symbol n! is used, and it is read "n factorial".

To secure the general validity of various formulas it is useful to define:

0! = 1

On most pocket calculators you can find, e.g., that 6! = 720 and 10! = 3,628,800.


    Thus we have shown the formula: P(n,n) = n!

Using the factorial notation, it is relatively easy to establish a formula for P(n,q), where n ≥ q, that is, the number of q-permutations taken from an n-set. The number of q-permutations that can be made from an n-set is, according to the arguments above:

P(n,q) = n(n-1)(n-2)···(n-q+1)

It is then straightforward to rewrite this formula, since we only have to multiply and divide by (n-q)! = (n-q)(n-q-1)···3·2·1:

P(n,q) = n(n-1)(n-2)···(n-q+1) = n(n-1)(n-2)···(n-q+1)·(n-q)!/(n-q)! = n!/(n-q)!

We thus have the formula:

P(n,q) = n!/(n-q)!

Notice that with the definition 0! = 1, the formula is also valid when n = q, since then P(n,n) = n!/0! = n!. Sometimes it is surprising to realize how large a number P(n,q) actually is. For example, in how many ways can you occupy the last row (8 seats) in a class having 24 students? The answer is:

P(24,8) = 24!/(24-8)! = 24·23·22·…·18·17 = 2.9654·10^10
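For readers who want to reproduce such numbers, here is a minimal Python sketch (an added illustration; the function name permutations_count is my own) that evaluates P(n,q) = n!/(n-q)! and checks the P(24,8) example against the standard library.

```python
# Compute P(n,q) = n!/(n-q)! and check it against the standard library.
from math import factorial, perm

def permutations_count(n: int, q: int) -> int:
    """Number of ordered selections (q-permutations) from an n-set."""
    return factorial(n) // factorial(n - q)

assert permutations_count(24, 24) == factorial(24)            # P(n,n) = n!
assert permutations_count(24, 8) == perm(24, 8) == 29_654_190_720
print(f"P(24,8) = {permutations_count(24, 8):.4e}")           # -> 2.9654e+10
```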

3. Combinations
If we have a set of n elements (an n-set), you may for example think of a class having 24 students, from which you shall select a subset with q elements, a q-subset. You may think of appointing a committee with 4 members. The selection of a q-subset from an n-set is not a permutation, because the order of the members in the q-subset is insignificant; the committee is unordered. A q-subset selected from an n-set is called a combination.

The number of different q-subset combinations that can be selected from an n-set is denoted C(n,q), or sometimes Cn,q. We wish to establish a formula for C(n,q).

We notice that trivially: C(n,1) = n and C(n,n) = 1. We shall first illustrate the reasoning with the example of selecting a committee of 4 students from a class with 24 students.


We imagine, however, that the committee after the selection must elect a chairman, a secretary (referent), a vice chairman and a cashier. The committee can be selected in C(24,4) different ways, but until now we do not know the number C(24,4). However, once the committee is chosen, the different positions in the committee can be distributed in 4·3·2·1 ways, since there are 4 possibilities for the chairman, then 3 possibilities for the secretary, and so on. Since we must select both the committee and distribute the positions in the committee, the number of possibilities of selecting the committee, and subsequently a chairman, a secretary, a vice chairman and a cashier, must be

C(24,4)·4·3·2·1 = C(24,4)·4!

But we might as well have selected the 4 positions directly, and the answer would then be P(24,4) (since the 4 elements in the combination are different). So when we put the two ways of calculating P(24,4) together, we have:

C(24,4)·4! = P(24,4)  ⇔  C(24,4) = P(24,4)/4! = 24!/(4!(24-4)!)

which is the wanted formula for C(24,4). We shall now repeat the argument with n and q replacing 24 and 4. To do so we shall count the number of q-permutations that can be selected from an n-set in two different ways. The selection of a q-permutation (directly) can be done in P(n,q) different ways. However, we could also do it the other way round: first select a q-combination, which can be done in C(n,q) different ways, and then permute the q elements, which can be done in q! different ways. One way or another, we must find the same number of q-permutations. This implies:

C(n,q)·q! = P(n,q)  ⇔  C(n,q) = P(n,q)/q! = n!/((n-q)!·q!)

(3.1)   C(n,q) = n!/((n-q)!·q!)

We have then obtained a general formula, which is also valid for q = 0 and n = q, because we have set 0! = 1. For practical and theoretical use, two further formulas are often applied when doing calculations by hand. The two formulas are merely a reduction to avoid n! for large values of n:

(3.2)   P(n,q) = n(n-1)(n-2)···(n-q+1)   and   C(n,q) = n(n-1)(n-2)···(n-q+1)/(1·2·3·…·q)

    Notice that the last formula has the same number of factors in the numerator and the denominator.


    Examples

Lotto: When you play Lotto (in Denmark), you must choose 7 numbers out of 36. We want to find the total number of possible Lotto coupons. This is, however, the same as finding the number of 7-subsets of a 36-set. So the answer is simply:

C(36,7) = 36!/((36-7)!·7!) = 8,347,680

1. In a class having 10 boys and 12 girls, a committee of 4 must be formed, having 2 girls and 2 boys. In how many ways can the committee be formed?

The 2 boys can be selected in C(10,2) = 45 different ways, and the 2 girls in C(12,2) = 66 different ways. According to the multiplication principle (since we shall choose both the boys and the girls), the result is: C(10,2)·C(12,2) = 2970.

2. From an assembly consisting of 8 women and 12 men a committee of 5 must be formed. In how many ways can it be done if:

a) One may choose freely among the 20 members? b) The committee must have at least one woman and one man? c) The committee must have at least two women and two men?

a) C(20,5) = 15,504. b) The answer is most easily found if we subtract the number of committees having only women and the number of committees having only men: C(20,5) - C(8,5) - C(12,5) = 15,504 - 56 - 792 = 14,656. c) There are only 2 possibilities: either 2 women and 3 men or 3 women and 2 men. Therefore we apply both the multiplication principle and the addition principle: C(12,3)·C(8,2) + C(12,2)·C(8,3) = 9856.
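The committee examples can be verified directly with the standard combination function; the sketch below is an added check, not part of the original text, and the helper name combinations_count is illustrative.

```python
# Check of formula (3.1) and of the committee examples above, using math.comb.
from math import comb, factorial

def combinations_count(n: int, q: int) -> int:
    return factorial(n) // (factorial(q) * factorial(n - q))    # C(n,q) = n!/(q!(n-q)!)

assert combinations_count(36, 7) == comb(36, 7) == 8_347_680    # the Lotto count
assert comb(10, 2) * comb(12, 2) == 2970                        # 2 boys and 2 girls
assert comb(20, 5) == 15_504                                    # free choice of 5 among 20
assert comb(20, 5) - comb(8, 5) - comb(12, 5) == 14_656         # at least one of each sex
assert comb(12, 3) * comb(8, 2) + comb(12, 2) * comb(8, 3) == 9856  # at least two of each
```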

Exercises

1. The Morse code alphabet consists of the symbols "·" (dot) and "–" (dash). A code for a letter or a digit consists of 1 to 5 symbols. How many codes is it possible to create?

2. There are 8 participants in a chess tournament. All must play against all. How many games of chess are to be played?

3. How many different seating plans are possible, if a company of 7 men and 6 ladies is to be seated around a circular table such that two ladies may not be seated next to each other? How many seating plans are possible, if the company consists of 6 men and 6 ladies?

4. In a tasting of 5 types of beer, 5 glasses are put in a row. How many possibilities are there for the placement of the glasses?

5. How many 3-digit numbers are there where all the digits are different? How many 3-digit numbers are there where exactly two digits are equal?

6. The registration number (the license plate) of a car (in Denmark) consists of two letters (A-Z) and five digits, e.g. OS 52911 or RU 45710. How many different license plates can be made? Given the two letters, how many plates are there when all the digits are different?

7. As mentioned, there are 3^13 different ways to fill in a coupon (won, tie, lost) when playing on the domestic football games. Exactly one of these coupons has all 13 correct. How many coupons have 12, 11 or 10 correct?


    Chapter II. Finite probability fields

1. Introduction
In daily life the concept of probability is heavily used, also when it has nothing to do with the mathematical definition of the word. "The probability that…" is often used synonymously with "there is a chance that…" or "it might…". In the mathematical theory, however, a probability is always a proper fraction or a decimal number between 0 and 1, or, what amounts to the same, a percentage less than or equal to 100%.

When one uses the concept of probability, it should, however, always be connected with an uncertainty about whether an event will occur or not, e.g. the chance that you win at roulette, or the probability that the metro will run without delay tomorrow. In each case the use of the term "probability" is an estimate, based on some insight, of whether an event will occur or not.

In mathematics the concept of probability forms a "model" of an "experiment", the outcome of which is partly unknown, within some definite boundaries. For example: the probability of getting 6 eyes when rolling a dice, or the probability of getting 3 of a kind in a hand of poker. In the mathematical theory one is actually not concerned with establishing the probabilities for the outcomes of a real-life experiment; but once the probabilities are given (right or not), the purpose of probability theory is to draw some logical consequences. The conclusions are often far-reaching and often unexpected. This is what we shall be occupied with in the following. But first we shall establish the mathematical model of a stochastic experiment. This could for example be a model of real-life experiments such as rolling a dice or picking a card from a deck of cards.

2. A finite probability field
A finite probability field (U, P) is an abstraction of a stochastic experiment. It consists of a finite set U and a function P defined on U, called the probability (or probability function), having the following properties:

    U = {u1, u2, u3, …, un}, called the outcome space, is the set of outcomes of an experiment, and the elements in U, are called outcomes.

P(u) is called the probability that the outcome u occurs.

For all u ∈ U: 0 ≤ P(u) ≤ 1, and P(u1) + P(u2) + P(u3) + … + P(un) = 1

    This may be formulated as follows:

    The probability of an outcome from an abstract experiment is always a number between 0 and 1, and the sum of the probabilities is equal to 1.


These are the only conditions necessary to establish a finite probability field. The mathematical part of the theory is, however, neither concerned with establishing the initial probabilities of an experiment, nor does it have anything to do with real-life experiences.

    Examples

    1. When rolling a dice the outcome space is U = {1, 2, 3, 4, 5, 6}. We only assume that the outcomes have the

same probability. According to the presuppositions above, we must therefore have 6·P(u) = 1, from which it follows that P(u) = 1/6 for all outcomes.

    2. Throwing two coins (heads or tails). Here we may establish the outcome space. If we abbreviate h (head) and t

    (tail), then U = {(h, h), (h, t), (t, h), (t, t)}. Assuming that these 4 outcomes are equally likely, the probability of each of these outcomes is P(u)= ¼.

    3. When playing at the roulette in a Casino there are 37 fields numbered 0…36. The outcome space is therefore

    U = {0, 1, 2, 3…36}. It is the common supposition that the fields have equal probability (and it ought to be so), so according to the presuppositions above, the probability that the ball will end in any of the 37 fields should be P(u) = 1/37.

4. When rolling two dice (a green one and a red one), we may establish the outcome space by writing down the outcomes (eyes on the green dice, eyes on the red dice), which are considered to be equally likely. With this notation the outcome space becomes

U = {(1,1), (1,2), …, (1,6), (2,1), (2,2), …, (2,6), …, (6,5), (6,6)}

Since there are 6 possibilities for the green dice as well as for the red dice, there must be 6·6 = 36 outcomes, and the probability of each outcome must therefore be P(u) = 1/36.

3. Events
An event H is formally defined as a subset of the outcome space. Since an event is a set, it is always written with a capital letter, whereas an outcome is always written with a lower case letter. If we have an outcome space:

    U = {u1, u2, u3, …, un} Then three events might be A = {u2, u7, u8}, B = {u4}, C = {u1, u7}. Events are most often characterized in words. When you roll a dice, where the outcome space is

    U ={1, 2, 3, 4, 5, 6}

    one can formulate various events as:

    A: Getting an even number of eyes. A = {2, 4, 6} B: The dice does not show 6 eyes. B = {1, 2, 3, 4, 5} C: The eyes are greater than 4. C = {5,6}

Intuitively, even in public school and without knowledge of probability theory, most people would agree on the probabilities of the three events mentioned above: P(A) = 3/6 = 1/2, P(B) = 5/6 and P(C) = 2/6 = 1/3.


In probability theory the challenge is, however, often to decide, using combinatorics, which outcomes belong to a certain event. In the previous examples it was trivial, but often this is far from the case. The probabilities stated in the dice example above are perfectly right, and this leads to the following definition, assuming an arbitrary outcome space U = {u1, u2, u3, …, un}:

    By the probability of an event P(H) one should understand the sum of probabilities P(u) of the outcomes u that belong to the event H.

So if U = {u1, u2, u3, …, un} and A = {u2, u7, u8}, then P(A) = P(u2) + P(u7) + P(u8). One may easily convince oneself that this definition of the probability of an event is in full accordance with the probabilities of the events A, B and C in the "rolling a dice" example above. Since both the outcome space itself and the empty set Ø = {} are formally subsets of U, this leads to two special events:

U is called the certain event, and P(U) = 1, according to the definition above. Ø is called the impossible event, and P(Ø) = 0, according to the definition above.

3.1 Using summation signs
When operating with events in probability theory it is practical, if not necessary, to use the so-called summation symbol ∑. The summation symbol is merely an abbreviated way of writing a sum of indexed terms. Below we show some examples.

∑_{n=2}^{10} 1/n = 1/2 + 1/3 + 1/4 + … + 1/10        P(H) = ∑_{u∈H} P(u)

The first summation sign displays the sum of the fractions from 1/2 to 1/10; n = 2 is called the lower limit, and 10 is the upper limit. The meaning of the summation symbol should otherwise be obvious. In the second summation sign there are no lower or upper limits. Instead it should be understood to mean that we add all probabilities of the outcomes that belong to H.

    3.2 Symmetric outcome spaces

In all the examples above we have considered outcome spaces where all outcomes have the same probability. Such an outcome space is called symmetric. In a symmetric outcome space having n outcomes, the probability of each outcome is

P(u) = 1/n

    This follows from the presupposition that the sum of probabilities is 1.


In a symmetric outcome space U one calculates, according to the definition of the probability of an event, the probability of an event H (having q elements) as q divided by n (the number of elements in U):

P(H) = ∑_{u∈H} P(u) = 1/n + 1/n + … + 1/n  (q terms)  = q/n

In this connection one has a manner of speaking, since the outcomes belonging to H are called favourable outcomes (or just favourable) and the outcomes belonging to U are called possible outcomes (or just possible). By the same token we shall from now on, when dealing with a symmetric outcome space, write n(U) for the number of elements in U and n(H) for the number of elements in H. This results in some formulas, which are easy to remember each time one must calculate the probability of an event in a symmetric outcome space:

P(H) = favourable outcomes / possible outcomes,  or just  P(H) = favourable/possible

P(H) = n(H)/n(U)

    Examples

    1. First we consider drawing a single card from a deck of cards consisting of 52 cards. We want to find the probabilities of the following events: A: We draw a spade. B: We draw a picture card. C: We draw an ace. The outcome space is obviously symmetric, and since n(U) = 52 then P(u) = 1/52, and for the three events we have: n(A) = 13, n(B) = 12 and n(C) = 4. Thus we find for the three probabilities:

P(A) = n(A)/n(U) = 13/52 = 1/4

P(B) = n(B)/n(U) = 12/52 = 3/13

P(C) = n(C)/n(U) = 4/52 = 1/13
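As a small illustration of the favourable/possible rule, the deck can be modelled explicitly and the three probabilities counted. The sketch below is an added example; the helper prob and the rank/suit labels are my own, not from the notes.

```python
# Count favourable outcomes over possible outcomes in the symmetric card space.
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))                       # the outcome space U, n(U) = 52

def prob(event):                                         # P(H) = n(H)/n(U)
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

assert prob(lambda c: c[1] == "spades") == Fraction(1, 4)           # P(A)
assert prob(lambda c: c[0] in ("J", "Q", "K")) == Fraction(3, 13)   # P(B), picture card
assert prob(lambda c: c[0] == "A") == Fraction(1, 13)               # P(C), ace
```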

2. We now look at a simultaneous roll of two dice (a red one and a green one). We wish to find the probabilities P(2), P(3), …, P(12) that the dice together show 2 eyes, 3 eyes, …, 12 eyes. We have previously established the symmetric outcome space as U = {(1,1), (1,2), …, (1,6), (2,1), (2,2), …, (2,6), …, (6,5), (6,6)}. Since this outcome space is symmetric and it has 36 elements, the probability of each outcome is P(u) = 1/36. The calculation of the probabilities is then reduced to finding the number of outcomes in each event:

Outcomes:     {(1,1)}          {(1,2),(2,1)}        {(1,3),(3,1),(2,2)}    {(1,4),(4,1),(2,3),(3,2)}
Probability:  P(2) = 1/36      P(3) = 2/36 = 1/18   P(4) = 3/36 = 1/12     P(5) = 4/36 = 1/9

Outcomes:     {(1,5),(5,1),(2,4),(4,2),(3,3)}       {(1,6),(6,1),(2,5),(5,2),(3,4),(4,3)}
Probability:  P(6) = 5/36                           P(7) = 6/36 = 1/6

In a quite similar way we find the probabilities P(8), …, P(12). Since they are symmetric with respect to P(7):

P(8) = 5/36,  P(9) = 4/36 = 1/9,  P(10) = 3/36 = 1/12,  P(11) = 2/36 = 1/18,  P(12) = 1/36

We are then able to define a new (not symmetric) outcome space, shown in the table below. Such a table is often denoted the probability distribution.

Sum:          2     3     4     5     6     7     8     9     10    11    12
Probability:  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

    Finally a “stick diagram” is often used as a graphical representation of the probability distribution.

(Stick diagram: the probabilities P(2), …, P(12) plotted as vertical sticks over the sums 2-12.)
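The whole table, and the data behind the stick diagram, can be rebuilt by enumerating the 36 outcomes. The following sketch is an added illustration, not part of the original notes.

```python
# Rebuild the probability distribution of the sum of two dice by enumeration.
from fractions import Fraction
from collections import Counter
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))          # (green, red), 36 outcomes
counts = Counter(g + r for g, r in outcomes)

distribution = {s: Fraction(counts[s], 36) for s in range(2, 13)}
for s in range(2, 13):
    print(f"{s:2d}  {str(distribution[s]):>5}  " + "#" * counts[s])  # crude stick diagram

assert distribution[2] == Fraction(1, 36) and distribution[7] == Fraction(1, 6)
```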

    3. The birthday problem:

The birthday problem is actually one of the most frequent examples used to illustrate elementary probability theory. Suppose we have a class with 24 students. If you make a bet whether at least two persons in the class have birthday on the same day of the year, what are the odds that you may win the bet? Said in another way: What is the probability that at least two students in the class have birthday on the same day of the year? We shall generalize the problem, changing the number of students from 24 to n.

When solving problems in probability theory it is sometimes easier and more advantageous to calculate the probability of the complementary event. The complementary event to an event H is read as "non H", and it is usually written H̄ (H with a bar over it). If H is the event that at least two persons in the class have birthday on the same day of the year, we shall therefore first find the probability of the complementary event: that no two have birthday on the same day of the year. Since the union of an event and its complementary event is equal to U, and as they have no common elements, we must have:

P(H) + P(H̄) = 1  ⇔  P(H) = 1 - P(H̄)

With this in mind, we shall calculate the probability that among n persons, none have birthday on the same day. Since each person has 365 possibilities of having a birthday, and since the persons are supposed to be independent of each other, the outcome space has 365·365·365·…·365 = 365^n different possibilities of placing the birthdays of n persons. Similarly we find the number of possibilities where no one has birthday on the same day as 365·364·363·…·(365-n+1) (n factors), since there are 365 possibilities for the first person, 365-1 possibilities for the second person, and so on. Since the outcome space is obviously symmetric, we may calculate the probability of no coincidence in birthdays with the formula:

P(H̄) = n(H̄)/n(U) = (365·364·363·…·(365-n+1))/365^n = (365/365)·((365-1)/365)·((365-2)/365)·…·((365-n+1)/365)


Exercising some patience it is certainly possible to calculate this probability, say for n = 24, with a pocket calculator or even with a mathematical computer program, but an analytic answer to this problem was given long before the invention of electronic calculators. We shall now show that it is possible (applying a little mathematics) to give a simple expression for the probability for an arbitrary n. To do so, we divide the denominator 365 into each of the factors in the numerator, followed by taking the natural logarithm on each side.

P(H̄) = (365/365)·((365-1)/365)·((365-2)/365)·…·((365-(n-1))/365) = 1·(1 - 1/365)·(1 - 2/365)·…·(1 - (n-1)/365)

ln P(H̄) = ln(1 - 1/365) + ln(1 - 2/365) + … + ln(1 - (n-1)/365)

Expanding the natural logarithm to first order, we have the formula ln(1 + h) ≈ h for |h| ≪ 1 (meaning |h| < 0.1; for example ln(1.1) = 0.095). Applying this formula to each term in the expression above:

(1.5)   ln P(H̄) ≈ -1/365 - 2/365 - 3/365 - … - (n-1)/365 = -(1/365)·(1 + 2 + 3 + … + (n-1)) = -n(n-1)/(2·365)

Our aim is then to determine n such that:

P(H̄) ≤ 1/2  ⇔  ln P(H̄) ≤ ln(1/2)  ⇔  ln P(H̄) ≤ -0.6931

This gives the inequality:

n(n-1)/(2·365) ≥ 0.6931  ⇔  n(n-1) ≥ 505.96

This is an algebraic inequality of second order:

n² - n - 505.96 ≥ 0

The positive solution to the corresponding quadratic equation is

n = (1 + √(1 + 4·505.96))/2 ≈ 23

and it gives the solution to the inequality:

n² - n - 505.96 ≥ 0  ⇔  n ≥ 23

So when the number of persons is 23 or more, the probability of no coincidence of birthdays among them is less than 50%, meaning that the probability of at least one coincidence of birthdays is greater than 50% when there are at least 23 persons.
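The approximation can be checked numerically; the sketch below (an added check, not from the original notes) compares the exact product with the first-order estimate and confirms that 23 persons suffice.

```python
# Exact birthday probability versus the first-order logarithm approximation.
from math import exp, prod

def p_no_shared_birthday(n: int) -> float:
    """Exact probability that n persons all have different birthdays (365-day year)."""
    return prod((365 - k) / 365 for k in range(n))

def p_no_shared_birthday_approx(n: int) -> float:
    return exp(-n * (n - 1) / (2 * 365))

for n in (10, 23, 24, 30):
    print(n, round(p_no_shared_birthday(n), 4), round(p_no_shared_birthday_approx(n), 4))

# For n = 23 the exact probability of no coincidence is about 0.493 < 1/2,
# so at least one shared birthday is more likely than not from 23 persons on.
assert p_no_shared_birthday(23) < 0.5 < p_no_shared_birthday(22)
```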

4. Manipulating events and probabilities
For two events, that is, two subsets A and B of a set U, one may, as is well known, form some new sets. They are:

The union A∪B is the set of elements which belong to either A or B (or both). The intersection A∩B is the set of elements which belong to both A and B. The surplus set A\B consists of the elements in A which do not belong to B.


The complementary set Ā to A consists of the elements in U which do not belong to A. These events are also called the "either…or event", the "both…and event", the "A but not B event" and the complementary event. Below the four types of composite sets are illustrated.

If A and B are events in an outcome space U, the four composite events A∪B, A∩B, A\B and Ā are also events. Within the framework of probability theory we express this as: The union event A∪B occurs if at least one of the events A or B occurs. The intersection event A∩B occurs if both A and B occur. The surplus event A\B occurs if A occurs, but B does not. The complementary event Ā occurs if A does not occur.

There are some rules for calculating the probabilities of composite events, and it is most appropriate to write these rules with the help of the summation sign, as they amount to adding the probabilities of the outcomes belonging to an event. Using this notation we have:

P(A) = ∑_{u∈A} P(u),   P(B) = ∑_{u∈B} P(u),   P(A∪B) = ∑_{u∈A∪B} P(u),   P(A∩B) = ∑_{u∈A∩B} P(u),   P(A\B) = ∑_{u∈A\B} P(u),   P(Ā) = ∑_{u∈Ā} P(u)

We may then derive the following rules:

P(A∪B) = ∑_{u∈A∪B} P(u) = ∑_{u∈A} P(u) + ∑_{u∈B} P(u) - ∑_{u∈A∩B} P(u)

When we add the probabilities of the outcomes that either belong to A or belong to B, we count the outcomes which belong to both A and B twice; therefore we must subtract these outcomes once. This gives the formula:


P(A∪B) = P(A) + P(B) - P(A∩B)

In other words: The probability that either A or B occurs is the probability that A occurs plus the probability that B occurs, minus the probability that they occur simultaneously. If A∩B = Ø, so that A and B have no common outcomes, the two events are said to exclude each other. In that case the formula becomes simpler, since P(Ø) = 0:

P(A∪B) = P(A) + P(B)

In the same manner we find:

P(Ā) = ∑_{u∈Ā} P(u) = ∑_{u∈U} P(u) - ∑_{u∈A} P(u)

P(Ā) = 1 - P(A)

The probability that A does not occur is 1 minus the probability that A occurs.

P(A\B) = ∑_{u∈A\B} P(u) = ∑_{u∈A} P(u) - ∑_{u∈A∩B} P(u)

P(A\B) = P(A) - P(A∩B)

The probability that A occurs but B does not occur is the probability that A occurs minus the probability that A and B occur simultaneously.

    Examples

1. We shall first consider the experiment of drawing a card from a deck of cards. We want to find the probabilities of the following events: A: You draw a picture card or a spade. B: You do not draw a picture card. C: You draw a picture card, but not in hearts. According to the formulas above for the probabilities of composite events we have, if we notice that there are 12 picture cards, 13 spades, 3 picture cards of spades and 3 picture cards of hearts:

P(A) = P(picture card) + P(spade) - P(picture card of spades) = 12/52 + 13/52 - 3/52 = 22/52

P(B) = 1 - P(picture card) = 1 - 12/52 = 40/52 = 10/13

P(C) = P(picture card) - P(picture card of hearts) = 12/52 - 3/52 = 9/52

2. We shall next consider rolling two dice, and we look at the following events: A: The two dice show the same number of eyes. B: The two dice show different numbers of eyes. C: The red dice shows more eyes than the green one. D: None of the dice shows 6 eyes. E: We get 6 eyes on at least one of the dice. With the rules stated above we have:

P(A) = 6/36 = 1/6   and   P(B) = 1 - P(A) = 1 - 1/6 = 5/6

In half of the 36 - 6 outcomes where the dice show different eyes, the red dice shows more eyes than the green dice, so the answer is:

P(C) = ½·P(B) = 15/36 = 5/12

There are 5·5 = 25 outcomes where none of the dice shows 6 eyes, therefore

P(D) = 25/36   and   P(E) = 1 - P(D) = 1 - 25/36 = 11/36
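The rules for composite events can be verified by brute force on the 36 outcomes of two dice. The following sketch is an added check using Python sets, not part of the original text.

```python
# Brute-force verification of the rules for composite events on two dice.
from fractions import Fraction
from itertools import product

U = set(product(range(1, 7), repeat=2))                  # outcome space, (red, green)

def P(event):                                            # symmetric space: n(H)/n(U)
    return Fraction(len(event), len(U))

A = {u for u in U if u[0] == u[1]}                       # same eyes on both
D = {u for u in U if 6 not in u}                         # no 6 on either
E = U - D                                                # at least one 6 (complement of D)

assert P(A) == Fraction(1, 6) and P(U - A) == 1 - P(A) == Fraction(5, 6)
assert P(D) == Fraction(25, 36) and P(E) == Fraction(11, 36)
assert P(A | E) == P(A) + P(E) - P(A & E)                # P(A∪E) = P(A) + P(E) - P(A∩E)
```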

5. Samples without replacement
Making a sample without replacement is a classical discipline in probability theory. Most often it is illustrated by pulling coloured balls at random from an otherwise closed basket and calculating the probability of getting various combinations. The technique is, however, best illustrated by doing some examples. In the examples below we shall make extensive use of the formula:

C(n,q) = n!/(q!(n-q)!)

which is the number of ways to select a combination of q elements (a q-subset) from a set having n elements (an n-set).

    5.1 Examples

1. In a class there are 10 boys and 12 girls, and they must form a committee with 4 members. We shall determine the probability of the following events: A: The committee has girls only. B: The committee has two boys and two girls. C: There is at least one boy in the committee. To calculate these probabilities we shall use the formula for a symmetric outcome space:

P(H) = favourable outcomes / possible outcomes = n(H)/n(U)

The possible outcomes are

C(22,4) = 22!/(4!(22-4)!) = (22·21·20·19)/(1·2·3·4) = 7315

and 4 girls can be selected in C(12,4) = 495 different ways. So we have:

P(A) = P(4 girls) = C(12,4)/C(22,4) = 495/7315 = 0.0677

In the same manner we get:

P(B) = P(2 boys and 2 girls) = C(10,2)·C(12,2)/C(22,4) = 45·66/7315 = 2970/7315 = 0.4060

P(C) = P(at least one boy) = 1 - P(4 girls) = 1 - 0.0677 = 0.9323

3. In a drawer we find, in disorder, 6 blue, 8 grey and 10 black socks. A person takes 3 socks (without checking the colour), hoping to get two of the same colour. We shall calculate the probabilities of:

A: You get socks in 3 different colours. B: You get either 3 blue, 3 grey or 3 black socks. C: You get at least 2 socks of the same colour.

The number of possible outcomes is: C(6+8+10, 3) = C(24,3) = (24·23·22)/(1·2·3) = 2024

P(A) = P(3 different colours) = (6·8·10)/C(24,3) = 480/2024 = 0.2372

P(B) = P(3 of the same colour) = (C(6,3) + C(8,3) + C(10,3))/C(24,3) = (20 + 56 + 120)/2024 = 196/2024 = 0.0968

P(C) = P(at least 2 of the same colour) = 1 - P(3 different colours) = 1 - 0.2372 = 0.7628
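The same counting pattern covers any sample drawn without replacement. Here is a short added sketch, not from the text, that recomputes the sock probabilities with exact fractions.

```python
# Recompute the sock example (sampling without replacement) with math.comb.
from fractions import Fraction
from math import comb

blue, grey, black = 6, 8, 10
total = comb(blue + grey + black, 3)                      # C(24,3) = 2024 possible draws

p_three_colours = Fraction(blue * grey * black, total)    # one sock of each colour
p_same_colour = Fraction(comb(blue, 3) + comb(grey, 3) + comb(black, 3), total)
p_at_least_two = 1 - p_three_colours                      # complementary event

print(float(p_three_colours), float(p_same_colour), float(p_at_least_two))
# -> approximately 0.2372, 0.0968 and 0.7628, as found above
```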

5.2 Example. Probabilities in Lotto games
In Denmark the weekly online Lotto game consists in guessing, in one row, 7 numbers out of 36. Besides the 7 drawn lotto numbers, an additional number (an add number) is also drawn. One may achieve a premium in the following manners:

1. Premium: Having all the 7 drawn numbers. 2. Premium: Having 6 right numbers plus the add number. 3. Premium: Having 6 out of 7 right numbers. 4. Premium: Having 5 out of 7 right numbers. 5. Premium: Having 4 out of 7 right numbers.

We shall begin by calculating the probabilities of these events when you have filled in one row. The number of different ways one may choose a combination of 7 numbers out of 36 is:

(1.3)   C(36,7) = 36!/(7!(36-7)!) = (36·35·34·33·32·31·30)/(1·2·3·4·5·6·7) = 8,347,680

Accordingly the probability P(7) of having the 7 right numbers in one row is:

P(7) = p7 = 1/C(36,7) = 1.1979·10^-7

When we calculate the probability of having 6 right numbers and the right add number, we reason as follows. The 6 correct numbers can be chosen among the 7 winning numbers in C(7,6) = 7 different ways, whereas the add number may be chosen in one way only. Therefore

P(6 + add) = C(7,6)·1/C(36,7) = 7/8,347,680 = 8.386·10^-7

In the same manner we find (by the multiplication principle) the probability of having 6 right numbers. The 6 right numbers can be chosen in C(7,6) = 7 different ways, and the "wrong" number may be chosen among 36 - 7 - 1 = 28 numbers. Therefore:

P(6) = C(7,6)·28/C(36,7) = 196/8,347,680 = 2.3479·10^-5

The probability of obtaining 5 right numbers is the number of ways 5 numbers can be selected from the 7 winning numbers, which is C(7,5) = 21, times the number of ways we can select 2 numbers from the 36 - 7 = 29 remaining numbers, which is C(29,2) = 406:

P(5) = C(7,5)·C(29,2)/C(36,7) = 8526/8,347,680 = 1.02·10^-3 = 1.02 ‰

In the same manner we calculate the probability of having 4 right numbers. The number of possibilities is C(7,4)·C(29,3):

P(4) = p4 = C(7,4)·C(29,3)/C(36,7) = 127,890/8,347,680 = 0.0153 = 1.53%
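The premium probabilities are easy to recompute. The sketch below is an added check under the same model as above (7 winning numbers and one add number out of 36); the variable names are illustrative.

```python
# Recompute the Lotto premium probabilities for one row.
from math import comb

rows = comb(36, 7)                                        # 8,347,680 possible rows

p7     = 1 / rows                                         # all 7 numbers right
p6_add = comb(7, 6) * 1 / rows                            # 6 right plus the add number
p6     = comb(7, 6) * 28 / rows                           # 6 right, last number among the 28 others
p5     = comb(7, 5) * comb(29, 2) / rows                  # 5 right numbers
p4     = comb(7, 4) * comb(29, 3) / rows                  # 4 right numbers

print(f"{p7:.4e}  {p6_add:.4e}  {p6:.4e}  {p5:.4e}  {p4:.4%}")
# -> approx 1.20e-07, 8.39e-07, 2.35e-05, 1.02e-03 and 1.53%
```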


From the calculated probabilities above we see that the probability of having more than 4 correct numbers is immensely small. On the other hand there is still a fair chance of getting 4 right numbers. This is of course a deliberate choice by the administrators of the game, since if a lotto player never wins, most (non-addicted) players will stop playing after a certain time.

A lotto coupon has 10 rows, and we shall first calculate the chance P(H) of winning on at least one of the rows. We do it by calculating the probability of the complementary event: P(H) = 1 - P(H̄). Since the 10 rows are independent, the probability that none of them has 4 right numbers is

P(no row with 4 right numbers) = (1 - p4)^10 = 0.9847^10 = 0.857

The chance of having at least one row with 4 right numbers is therefore:

P(at least one row with 4 right numbers) = 1 - 0.857 = 0.143

If we assume that someone buys 10 rows every week for 10 weeks, the probability of not winning is 0.857^10 = 0.214. So having bought a coupon for 10 weeks, the probability of winning is: P(at least one row with 4 right numbers in 10 weeks) = 1 - 0.214 = 0.786. But it is worth mentioning that the premium for having 4 right numbers is of the same magnitude as, or less than, the cost of the 10 coupons.

Exercises

    8. Let P be a probability function belonging to an outcome space U = {1,2,3,4,5}. Find P(5), if

    a) P(1) = P(2) = 0.1 and P(3) = P(4) = 0.2 b) P(1) = P(2) = P(3) = P(4) = P(5) c) P(1) =0.4 and P(2) = P(3) = P(4) = P(5)

9. Throwing three dice, find the probability that they do not all show the same number of eyes.

10. Throwing two dice, find the probability that exactly one of them shows 6 eyes.

11. A car driver must, on his way to work, pass through four traffic lights. It is assumed that green and red light are equally probable at each of them.

a) How many possibilities are there for the sequence of red and green lights along the route? b) What is the probability of meeting 4 green lights? c) What is the probability of meeting 3 green and 1 red light? d) What is the probability of meeting 2 green and 2 red lights?

12. Find the probability of getting 13, 12, 11 or 10 right when filling in a pools coupon having 13 rows, each with 3 choices (won, tie, lost).

13. Poker. From a deck of cards 5 cards are drawn. How many possibilities are there? How many possibilities are there of getting 5 cards in the same suit (a flush)?

What is the probability of getting a flush?

14. In a jar there are 6 red and 4 white balls. Three balls are drawn from the jar at random. Find the probability that:

a) Three red balls are drawn. b) At least two white balls are drawn. c) Exactly one white ball is drawn.


15. (A tough one). At a wine tasting 5 glasses are placed in a row, and you are supposed to decide where each wine comes from. On the supposition of pure guessing, what is the probability of getting 0, 1, 2, 3, 4 and 5 right? It is a tough one, since you must find out in how many permutations there are exactly zero, one, two, … five elements which are mapped to themselves.


    Chapter III. Conditional probabilities

1. Definition of a conditional probability
Earlier, in the Danish 9-12 grade high school (Gymnasium), the students could choose between entering the mathematical branch or the linguistic branch. In one Gymnasium 56% of the students are mathematicians and 44% are linguists. Among the mathematicians 60% are boys, whereas among the linguists 25% are boys. This is schematically illustrated below.

         Mat. 0.56   Ling. 0.44
Boys     0.60        0.25
Girls    0.40        0.75

Then we consider an experiment where we select a student at random. We shall introduce some shorthand notation for the various outcomes of the experiment.

    M: A mathematician is selected. L: A linguist is selected. B: A boy is selected. G: A girl is selected

Statistically we would expect the first two probabilities: P(M) = 0.56 and P(L) = 0.44. However, we cannot directly find the probabilities P(B) and P(G) for selecting a boy or a girl, since these two events are distributed among the events mathematician and linguist. However, if a boy is selected only among the mathematicians, the probability is 0.60. This we express in the following manner: The conditional probability of selecting a boy, on the condition that he is a mathematician, is 0.60. This is an example of what we in probability theory call a conditional probability, and it is written symbolically as:

    P(B | M) = 0.60 The vertical line between the two events reads: ”Given” or ”On the condition that”, and the probability P(B | M) is then read as:


    P(B | M): The probability of selecting a boy, given he is a mathematician. Or: P(B | M): The probability of selecting a boy on the condition that he is a mathematician. In the same manner we may define the probabilities:

P(G | M) = 0.40 ,  P(B | L) = 0.25 ,  P(G | L) = 0.75

We shall then look into how we may establish conditional probabilities in a symmetric outcome space, using the probability P(B | M) as an example. As usual we use the notation n(H) for the number of elements (outcomes) in an event (subset) H, and n(U) for the total number of outcomes. The event "a mathematician boy" is then written B∩M, and the conditional probability P(B | M) is as usual calculated as favourable outcomes divided by possible outcomes, that is, the number of mathematician boys divided by the number of mathematicians:

P(B | M) = n(B∩M)/n(M)

Dividing the numerator and the denominator on the right-hand side of the equation by n(U), we get:

P(B | M) = n(B∩M)/n(M) = (n(B∩M)/n(U)) / (n(M)/n(U)) = P(B∩M)/P(M)

This leads to the definition: For two arbitrary events A and H ≠ Ø in any (not necessarily symmetric) outcome space, we define the conditional probability of A given H (that is, given that H has occurred) as:

P(A | H) = P(A∩H)/P(H)

The probability of A given H is the probability that both A and H occur, divided by the probability that H occurs. Often the formula is used after multiplying by P(H):

P(A∩H) = P(A | H)·P(H)

Since A∩H = H∩A, we must also have P(A∩H) = P(H∩A), and consequently:

P(A | H)·P(H) = P(H | A)·P(A)  ⇔  P(A | H) = P(H | A)·P(A)/P(H)


This (controversial) formula is called Bayes' formula. Mathematically it is completely sound, but in some cases, when applied to probabilities of events in real life, it seems to violate the principle of causality, in the sense that it can predict the probability of an event that has already occurred, which does not make sense. If P(A) is the probability of an accident when driving on the highway in winter time, and P(H) is the probability of an icy road, then P(A | H) is the probability of having an accident if the road is icy. So far so good; but if we use Bayes' formula without reservations, then when it is turned the other way round to calculate P(H | A), which is mathematically perfectly legal, it predicts the probability of an icy road, given there has been an accident. The crucial point is of course that when the accident has happened, we know whether the road was icy or not, and therefore the notion of probability is meaningless. Statistically, however, it will probably be true that the number of accidents on icy roads corresponds to the statistical percentage of days with icy roads. It is true that Bayes' theorem has caused much (mostly philosophical) debate, in which we shall not participate, but in probability theory it is established as a most fundamental theorem.

    Examples: 1. A card is drawn from a deck of cards. You are informed that it is a card of hearts (H).

Find the probability that it is a picture card (B). The question may be answered in two different ways. The number of picture cards in hearts is 3, and there are 13 cards of hearts. From the definition of probability the answer is therefore: P(B | H) = 3/13.

But the question may also be calculated from the formula defining conditional probabilities:

P(B | H) = P(B∩H)/P(H) = (3/52)/(13/52) = 3/13

    2. For families having 3 children, we want to find the probabilities that:

a) Among the children there are both a girl and a boy (A). For the genders of 3 children there are 2^3 = 8 equally likely possibilities, since there are 2 possibilities for the first child and so on. There are two outcomes (all boys and all girls) where this is not the case, so

P(A) = 6/8 = 3/4

b) Same question, but where it is informed that they have a girl (G). Thus we must determine the probability P(A | G), where we notice that A∩G = A, n(A) = 6 and n(G) = 7 (8 minus the outcome of 3 boys):

P(A | G) = P(A∩G)/P(G) = (6/8)/(7/8) = 6/7

3. This example does not really concern conditional probabilities, but it is erroneously ascribed to conditional probability. It is rather a notorious riddle, which has also appeared in the magazine for teachers in the Danish 9-12 grade high school, where it rather surprisingly has given rise to lengthy debates! The problem can for example be stated as follows: There are three boxes. Two of them are empty, while the third one contains a gold coin. A person is asked to select one of the boxes. The box is not opened, but after the choice has been made, one of the remaining boxes (which does not contain the gold coin) is removed. The person is then asked if he wants to redo the selection. Within the framework of probability: Should he redo the selection or not, or is it without importance?


Most people are perhaps inclined to answer that it does not matter, since the gold coin may be in any of the three boxes. But this is erroneous, for the following reason. By the first choice a box was selected which holds the coin with probability 1/3. The probability that the coin is in one of the other two boxes is therefore 2/3. This probability cannot be changed by removing a box, so the probability that the coin is in one of those two boxes is still 2/3. Since the coin is not in the box that has been removed, the probability that the coin is in the remaining box must be 2/3. Consequently, one should always redo the choice, since the probability of selecting the box with the coin is thereby doubled from 1/3 to 2/3.
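For the sceptical reader, a small simulation (a sketch, not part of the original argument) confirms the 1/3 versus 2/3 split:

    import random

    def play(switch, trials=100_000):
        wins = 0
        for _ in range(trials):
            coin = random.randrange(3)          # box holding the gold coin
            choice = random.randrange(3)        # the person's first choice
            # Remove one box that holds no coin and is not the chosen one.
            removed = next(b for b in range(3) if b != coin and b != choice)
            if switch:
                # Switch to the box that is neither the first choice nor the removed one.
                choice = next(b for b in range(3) if b != choice and b != removed)
            wins += (choice == coin)
        return wins / trials

    print(play(switch=False))   # close to 1/3
    print(play(switch=True))    # close to 2/3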

2. Developing probabilities on events

We still need to answer the question in the leading example on conditional probabilities from the preceding section concerning the high school students: what is the probability of selecting a boy or a girl, when this probability is hidden in some conditional probabilities? So we set out to find the probability P(B) that the selected student is a boy. We know the conditional probabilities P(B | M) = 0.60 and P(B | L) = 0.25, together with the probabilities that the student is a mathematician or a linguist: P(M) = 0.56 and P(L) = 0.44.

We then "expand" the subset B (boy) on the subsets M (mathematician) and L (linguist):

B = (B∩M)∪(B∩L)

A boy is either a mathematician or a linguist, so the two sets (B∩M) and (B∩L) are disjoint (having no common elements), and we may apply the simplified form for adding the probabilities of the union of two sets:

P(B) = P(B∩M) + P(B∩L)

We then apply the formula for conditional probability, P(A∩B) = P(A | B)P(B), to each term:

P(B) = P(B | M)P(M) + P(B | L)P(L)

Generally this formula can only be applied when M∪L = U and M∩L = Ø, in which case M and L are said to form a class division of U. The formula has an obvious generalization to a class division having n subsets. The subsets H1, H2, H3, …, Hn are said to form a class division of a set U, if they uphold the two conditions:

U = H1 ∪ H2 ∪ H3 ∪ … ∪ Hn and Hi ∩ Hj = Ø for every i ≠ j.

It then follows that A = (A∩H1) ∪ (A∩H2) ∪ … ∪ (A∩Hn), and therefore

P(A) = P(A∩H1) + P(A∩H2) + … + P(A∩Hn)

Using the definition of conditional probability, we then obtain a general and very important formula.

P(A) = P(A | H1)P(H1) + P(A | H2)P(H2) + P(A | H3)P(H3) + … + P(A | Hn)P(Hn)

Using this formula puts us in a position to calculate the announced probability P(B) of selecting a boy in the class, using the probabilities from the table:

P(B) = P(B | M)P(M) + P(B | L)P(L) = 0.60·0.56 + 0.25·0.44 = 0.446
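Written out as a couple of lines of Python (a sketch, using the numbers of the example):

    # Law of total probability for the class example:
    # P(B) = P(B | M) P(M) + P(B | L) P(L)
    p_boy_given = {"M": 0.60, "L": 0.25}   # conditional probabilities of a boy
    p_group = {"M": 0.56, "L": 0.44}       # class division: mathematicians / linguists

    p_boy = sum(p_boy_given[h] * p_group[h] for h in p_group)
    print(round(p_boy, 3))   # 0.446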


    Examples

1. We shall elaborate a little on the example of selecting a student. It is stated that the student is a boy, and we wish to find the probability that he is a linguist, that is, the probability P(L | B). Since we only know the inverse probability, we apply Bayes' formula:

P(L | B) = P(B | L)·P(L) / P(B) = 0.25·0.44 / 0.446 ≈ 0.247

2. The next two exercises are notorious ones, appearing in a mathematical textbook that was used in the Danish Gymnasium until the 1988 reform. Very few (including teachers) succeeded in giving the right answer. There is a 75% probability that a player will lie when giving an answer. He rolls a dice and is asked whether it showed 6 eyes.

a) What is the probability that he answers yes?
b) What is the probability that it showed 6 eyes, if he answered yes?
c) Which answer (yes or no) gives the greater probability that the dice actually showed 6 eyes?

To find the answers we expand the probability on the events "6 eyes" and "not 6 eyes", and use Bayes' formula:

P(A) = P(A | B)·P(B) + P(A | not B)·P(not B)   and   P(B | A) = P(A | B)·P(B) / P(A)

a) Here P(yes | not 6) = 3/4, since the player then answers yes only if he lies, while P(yes | 6) = 1/4, since he then answers yes only if he tells the truth. Hence

P(yes) = P(yes | not 6)·P(not 6) + P(yes | 6)·P(6) = (3/4)·(5/6) + (1/4)·(1/6) = 16/24 = 2/3

b) P(6 | yes) = P(yes | 6)·P(6) / P(yes) = (1/4)·(1/6) / (2/3) = (1/24)·(3/2) = 1/16

c) With P(no) = 1 − P(yes) = 1/3 we get

P(6 | no) = P(no | 6)·P(6) / P(no) = (3/4)·(1/6) / (1/3) = (1/8)·3 = 3/8

Since 3/8 > 1/16, the answer "no" gives the greater probability that the dice actually showed 6 eyes.
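The three answers can be checked with a few lines of Python (a sketch, using only the probabilities stated in the problem):

    p_lie = 0.75
    p_six = 1 / 6

    # He says "yes" either by telling the truth on a six, or by lying on a non-six.
    p_yes_given_six = 1 - p_lie          # 1/4
    p_yes_given_not = p_lie              # 3/4

    p_yes = p_yes_given_six * p_six + p_yes_given_not * (1 - p_six)
    p_no = 1 - p_yes

    p_six_given_yes = p_yes_given_six * p_six / p_yes
    p_six_given_no = (1 - p_yes_given_six) * p_six / p_no

    print(round(p_yes, 4))            # 2/3  ≈ 0.6667
    print(round(p_six_given_yes, 4))  # 1/16 = 0.0625
    print(round(p_six_given_no, 4))   # 3/8  = 0.375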

3. Olga plays a certain game of patience, which statistically comes out every twentieth time. But she has noticed that if she allows herself a certain little harmless cheat, there are equal chances that it comes out or not. On average Olga's game of patience comes out every tenth time. What is the probability that Olga cheats the next time she sets up a game of patience?

The answer to this riddle is a bit tricky, because you cannot write the quantity asked for as the left side of an equation, as in the previous examples. First we write down the probabilities that appear from the formulation: P(comes out | no cheat) = 0.05, P(comes out | cheat) = 0.5, P(comes out) = 0.1. We want to find the probability P(cheat). So we write down an equation which contains the information we have, with P(cheat) as the only unknown:

P(comes out) = P(comes out | cheat)·P(cheat) + P(comes out | no cheat)·P(no cheat)
0.1 = 0.5·P(cheat) + 0.05·(1 − P(cheat))

The equation is solved to give P(cheat) = 0.05/0.45 = 1/9.
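The same little calculation as a Python check (a sketch):

    # 0.1 = 0.5 * p_cheat + 0.05 * (1 - p_cheat)
    # =>  p_cheat = (0.1 - 0.05) / (0.5 - 0.05)
    p_out, p_out_cheat, p_out_honest = 0.1, 0.5, 0.05
    p_cheat = (p_out - p_out_honest) / (p_out_cheat - p_out_honest)
    print(round(p_cheat, 4))   # 1/9 ≈ 0.1111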

Exercises

1. Two cards are drawn from a pile consisting of the aces and kings from a normal deck of cards. Find the probability that both cards are aces, when it is informed that: a) one card is an ace, b) one card is a black ace, c) one card is the ace of spades.


2. A physician estimates that there is an 80% probability that a patient suffers from the disease S. He submits the patient to a clinical test, where there is a 90% probability of a positive reaction if the patient has the disease S, and a 15% probability of a positive reaction if the patient does not have the disease S. a) What probability should the physician assign to a positive reaction of the test? b) The patient has a positive reaction. With what probability should the physician now think that the patient suffers from the disease?

3. 20% of the population are left-handed. Among the left-handed, 45% are right/left confused, while among the right-handed it is only 15%. Find the probability that a person selected at random is right/left confused. A person selected at random turns out to be right/left confused; find the probability that he is right-handed.

3. Independent events

We have not yet presented a general formula for the "both…and" event, that is, for calculating the probability P(A∩B) when P(A) and P(B) are known. Without resorting to probability theory, most people would probably just multiply the two probabilities.

    But this is only true if the two events A and B are independent of each other.

    Example

If you draw a card from a deck of cards and look at the events: A: a picture card. B: a red picture card. C: a card of hearts. Then both of the events A∩C and B∩C are "a picture card of hearts", so

P(A∩C) = P(B∩C) = 3/52

However, P(A)·P(C) = (12/52)·(13/52) = (12/52)·(1/4) = 3/52 = P(A∩C), while P(B)·P(C) = (6/52)·(13/52) = (3/26)·(1/4) = 3/104 ≠ P(B∩C).

So the two events A and C are independent of each other, while B and C are not. This is quite reasonable, since there is the same number of picture cards in all suits, whereas only half of the red cards are hearts. We may get a better understanding of this if we review the definition of conditional probability:

P(A∩B) = P(A | B) P(B)

which is the correct way of calculating P(A∩B) in all cases. The concept of independent events relies on the following definition.

An event A is said to be independent of an event B if and only if P(A) = P(A | B). Replacing P(A | B) by P(A) in the definition of conditional probability, we get P(A∩B) = P(A | B) P(B) = P(A)·P(B). So two events are (in the mathematical sense) independent if

P(A∩B) = P(A)·P(B)

Sometimes it can be quite difficult to decide whether two events are independent or not. The question can, however, be settled by calculating P(A∩B) and comparing it to P(A)·P(B).
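The card example above can be checked by brute force over the 52 cards; the encoding of suits and ranks below is just one possible choice (a sketch):

    from itertools import product
    from fractions import Fraction

    suits = ["hearts", "diamonds", "clubs", "spades"]
    ranks = list(range(2, 11)) + ["jack", "queen", "king", "ace"]
    deck = list(product(suits, ranks))

    def prob(event):
        # Exact probability of an event as (favourable outcomes) / 52.
        return Fraction(sum(1 for card in deck if event(card)), len(deck))

    A = lambda c: c[1] in ("jack", "queen", "king")         # picture card
    B = lambda c: A(c) and c[0] in ("hearts", "diamonds")   # red picture card
    C = lambda c: c[0] == "hearts"                          # card of hearts

    print(prob(lambda c: A(c) and C(c)), prob(A) * prob(C))  # 3/52 and 3/52  -> independent
    print(prob(lambda c: B(c) and C(c)), prob(B) * prob(C))  # 3/52 and 3/104 -> dependent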


However, if the aim is to calculate P(A∩B) from the knowledge of P(A) and P(B), you must find another way to settle the question of independence. Since the formula P(A∩B) = P(A)·P(B) is symmetric in A and B, if A is independent of B then B is also independent of A. We therefore simply say that the two events are independent (of each other).

    Examples

1. It is a common fallacy (among non-mathematicians) to assume that if you, for example, toss a coin and have had 5 occurrences of tails, then there is a greater probability of getting heads in the 6th toss. But this is not the case (unless you believe in the existence of a goddess of luck). Any toss of a coin is independent of the previous tosses and has the same probability. A quite similar fallacy occurs when playing roulette, or just playing on red and black. It is not very likely that red will come out 8 times in a row, but it happens (also to players using the Martingale system, doubling up on red each time black comes out). The probability is easily calculated as 1/2 · 1/2 · … · 1/2 = 1/2⁸ = 1/256. Similarly, it is not unusual to presume that the probability of getting a boy is greater if you already have 4 girls. (The opposite is perhaps even more likely, due to genetic causes.) It is certainly unlikely to throw tails 5 times in a row, but it is just as likely as throwing any other particular sequence. More precisely we have P(5 tails) = 1/2⁵ = 1/32 and P(at least 1 head) = 1 − P(5 tails) = 1 − 1/32 = 31/32.

2. The probability of getting 6 eyes when rolling a dice is 1/6. On average one should therefore expect to obtain 6 eyes about once in 6 rolls. We shall determine the probability of not rolling 6 eyes even once in 6 attempts. The probability of not throwing 6 eyes in a single roll is 5/6. Since the 6 throws are assumed to be independent of each other, the answer must be 5/6·5/6·5/6·5/6·5/6·5/6. So the result is: P(no 6 in 6 rolls) = (5/6)⁶ ≈ 0.335.

3. Sometimes it can be quite difficult to decide logically whether two events are dependent or not. In connection with rolling two dice we consider the following events:

A: The first throw shows 4 eyes. B: The sum of the eyes on the dice is 7. C: The sum of the eyes on the dice is 5.

Then we find:

P(A) = 1/6, P(B) = 6/36 = 1/6, P(C) = 4/36 = 1/9, P(A∩B) = 1/36 and P(A∩C) = 1/36

Since P(A)·P(B) = 1/36 = P(A∩B), while P(A)·P(C) = 1/54 ≠ 1/36 = P(A∩C), we conclude that A and B are independent, but A and C are not. This can be explained as follows: no matter what the first throw shows, it is still possible to make the sum 7 eyes, so the events are independent; but if the sum must be 5, the values 5 and 6 are excluded in the first throw, so the events are dependent.
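The same kind of brute-force check over the 36 outcomes of two dice (a sketch):

    from fractions import Fraction
    from itertools import product

    outcomes = list(product(range(1, 7), repeat=2))   # all 36 rolls of two dice

    def prob(event):
        return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

    A = lambda o: o[0] == 4          # first throw shows 4 eyes
    B = lambda o: sum(o) == 7        # sum is 7
    C = lambda o: sum(o) == 5        # sum is 5

    print(prob(lambda o: A(o) and B(o)) == prob(A) * prob(B))   # True  -> A, B independent
    print(prob(lambda o: A(o) and C(o)) == prob(A) * prob(C))   # False -> A, C dependent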


    Chapter IV. Probability distributions

1. Repetition of an experiment

We shall now look at a situation where an experiment is repeated a certain number of times and where each performance is independent of the preceding ones. In the series of experiments we are, however, only interested in whether a certain event occurs or not. More specifically we take as an example a repeated throw with a dice and focus on whether the dice shows six eyes or not. This choice of example puts no constraint on the general results, since it can readily be generalized to a repetition of any experiment which complies with the demand of independence. If the experiment is repeated n times, the probability distribution is the determination of the probabilities P(0), P(1), …, P(n) that the event in question occurs 0, …, n times.

2. An example of the binomial distribution

The binomial distribution is a mathematical model of an experiment repeated n times, where the individual experiments are independent of each other. The aim is to establish the probabilities that a specific event occurs 0, …, n times. Before we turn to the general formulas, we shall consider the case where a dice is thrown 12 times, and calculate the probability that the dice shows six eyes exactly two times. Throwing a dice once, we have a finite probability field (U1, P1) with the outcome space U1 = {1,2,3,4,5,6}.

If A is the event {6} and Ā = {1,2,3,4,5} is the complementary event, then obviously P1(A) = 1/6 and P1(Ā) = 5/6. If we imagine that the experiment is carried out 12 times, then the outcome space U consists of 12-tuples (u1, u2, u3, ..., u11, u12), where each ui is one of the outcomes {1}, …, {6}. The outcome space has 6¹² elements, namely 6 for each repetition, and this is often written formally as U = U1×U1×…×U1 = U1¹².

We shall first look at the event A4: the dice shows six eyes in the 4th throw. As the 4th throw is independent of the other 11 throws, we have for the probability of that event P(A4) = P1(6) = 1/6. Correspondingly, for the event A7: the dice shows 6 eyes in the 7th throw, we have P(A7) = P1(6) = 1/6. Then we look at the event A4∩A7: the dice shows 6 eyes both in the 4th and in the 7th throw. We should notice (and this is important) that we have assumed that the outcomes of the throws are independent of each other, and consequently we may find the probability of the "both…and" event by multiplying the probabilities of the two events:

P(A4∩A7) = P(A4)·P(A7) = (1/6)·(1/6) = 1/36

    Using the same arguments, we may determine the probability of the event:


A4∩A5∩A7∩Ā8∩Ā10, as

P(A4∩A5∩A7∩Ā8∩Ā10) = (1/6)·(1/6)·(1/6)·(5/6)·(5/6) = (1/6)³·(5/6)²

namely the probability 1/6 for each time we get 6 eyes, and the probability 5/6 for each time we do not. We are interested in the event: exactly two times we get 6 eyes in 12 throws. One possibility could be 6 eyes in the first and the second throw, and not 6 eyes in the following 10 throws. The probability of this specific event is, according to the considerations above:

P(6 eyes in the 1st and 2nd throw and not 6 eyes in the 3rd–12th throw) = (1/6)²·(5/6)¹⁰

The two throws where we get 6 eyes could, however, also be the 4th and the 7th throw, or any pair of throws selected from the 12 repetitions. For the event that the dice shows 6 eyes in exactly two out of the 12 throws, it is immaterial in which pair of throws it occurs. To find the probability of the event "the dice shows 6 eyes exactly two times in 12 throws", we need to add the probabilities of the outcomes where this is the case. Since they all have the same probability, we only need to find the number of such outcomes. We must therefore ask in how many different ways one can select 2 places out of twelve. This is the same question that we have encountered before: the number of different ways to select a q-subset from an n-set, and the answer is well known:

C(n, q) = n! / ((n − q)!·q!)

We may then write the sought probability:

P(the dice shows 6 eyes exactly two times in 12 attempts) = C(12,2)·(1/6)²·(5/6)¹⁰

For this reason the numbers C(n,q) are often called binomial coefficients, and in probability theory they are almost always written with the stacked symbol

( n )
( q )

The new symbol is read (in Denmark) as "n over q" (not to be confused with "n divided by q"), and it means the same as C(n,q). The probability that an event occurs exactly q times in n attempts is called a binomial probability, and the probabilities together are called the binomial distribution. If X denotes the number of times the event "6 eyes" occurs, we may write for X = 2:

P(X = 2) = C(12,2)·(1/6)²·(1 − 1/6)¹⁰ = C(12,2)·(1/6)²·(5/6)¹⁰


    The probability can be evaluated to: P(X = 2) = 0.2961. In exactly the same manner we can establish the probabilities: P(X = j); j = 0, 1, 2,...,12.

P(X = j) = C(12, j)·(1/6)^j·(1 − 1/6)^(12−j) = C(12, j)·(1/6)^j·(5/6)^(12−j),   j = 0, 1, 2, ..., 12

In this manner we have established a probability field with the outcome space U = {0, 1, 2, ..., 12}, corresponding to the dice showing 6 eyes exactly 0, 1, …, 12 times. The probability field is not symmetric, however, since (most of) the probabilities are different. Although anything else would be surprising, we have not formally proved that the probabilities above form a probability field, that is, that 0 ≤ P(u) ≤ 1 and Σ_{u∈U} P(u) = 1. However, this will be done below, when we have treated the general form of the binomial distribution.

3. The general binomial distribution

We now generalize to the case where we have n independent repetitions of an experiment. We are interested in whether an event A occurs in the j'th attempt or not. As usual in probability theory, we write the complementary event of A as Ā. We put P(A) = p, which is also called the primary probability; hence P(Ā) = 1 − p. The probability that A occurs in exactly j specified attempts out of n, and does not occur in the remaining n − j attempts, may be calculated in the same manner as in the more specific example of throwing a dice 12 times: it is p^j·(1 − p)^(n−j). The probability that A occurs exactly j times, independently of where in the sequence it happens, is therefore p^j·(1 − p)^(n−j) times the number of different ways we can select a j-subset from an n-set. The answer is, as we have used many times already:

C(n, j) = n! / ((n − j)!·j!)

    Thus we obtain the general expression for the binomial distribution

P(X = j) = C(n, j)·p^j·(1 − p)^(n−j),   j = 0, 1, 2, ..., n
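The formula translates directly into a few lines of Python (a sketch, assuming Python 3.8 or newer for math.comb), checked here against the dice example from the previous section:

    from math import comb

    def binom_pmf(j, n, p):
        # P(X = j) = C(n, j) * p^j * (1 - p)^(n - j)
        return comb(n, j) * p**j * (1 - p)**(n - j)

    # The dice example: n = 12 throws, primary probability p = 1/6.
    print(round(binom_pmf(2, 12, 1/6), 4))                        # 0.2961
    print(round(sum(binom_pmf(j, 12, 1/6) for j in range(13)), 10))  # 1.0, as it should be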

4. The binomial distribution and Pascal's triangle

Pascal's triangle is named after Blaise Pascal, a French mathematician and physicist from the 17th century. He is often regarded as a founder of probability theory, as well as the constructor of one of the first mechanical calculators. (Unfortunately he gave up his brilliant scientific career at an early age and dedicated the rest of his life to the study of theology.)


Pascal's triangle is in principle just a display of the binomial coefficients in a triangular scheme.

                              C(0,0)
                         C(1,0)   C(1,1)
                    C(2,0)   C(2,1)   C(2,2)
               C(3,0)   C(3,1)   C(3,2)   C(3,3)
          C(4,0)   C(4,1)   C(4,2)   C(4,3)   C(4,4)
     .......................................................
     C(n-1,0)  ...  C(n-1,q-1)   C(n-1,q)  ...  C(n-1,n-1)
     C(n,0)    ...        C(n,q)           ...  C(n,n)

                              1
                           1     1
                        1     2     1
                     1     3     3     1
                  1     4     6     4     1

In the first scheme above we have written the binomial coefficients in symbolic form C(n, q) for n = 0…4, and then added two rows corresponding to n − 1 and n. In the second scheme we have evaluated the binomial coefficients for n = 0…4. You should notice that along the sides of the triangle there are only ones.

This is because C(n, 0) = 1 and C(n, n) = 1, since there is only one way to select zero elements (and only one way to select all n elements) from an n-set.

You don't have to look at Pascal's triangle for very long to discover a very simple system for creating a new row from the one above: each number is the sum of its two neighbouring numbers (the one to the left and the one to the right) in the row above. This indicates that:

C(n, q) = C(n − 1, q − 1) + C(n − 1, q)

The formula can be proved algebraically using the expression for C(n, q):

C(n − 1, q − 1) + C(n − 1, q)
   = (n − 1)! / ((q − 1)!(n − q)!) + (n − 1)! / (q!(n − 1 − q)!)
   = (n − 1)! / ((q − 1)!(n − 1 − q)!) · ( 1/(n − q) + 1/q )
   = (n − 1)! / ((q − 1)!(n − 1 − q)!) · n / (q(n − q))
   = n! / (q!(n − q)!)
   = C(n, q)

    But it may also be proved using logic:


    Let us assume that we have a set with n elements: U = {a1 , a2 , a3 ,…, an }. We shall then divide the q-subsets into two groups:

1. The q-subsets that a1 belongs to. There are then n − 1 elements left, and q − 1 of them still to be selected, which can be done in C(n − 1, q − 1) different ways.

2. The q-subsets that a1 does not belong to. Here all q elements must be selected from the remaining n − 1 elements, which can be done in C(n − 1, q) different ways. Together the two groups include all q-subsets, so:

C(n, q) = C(n − 1, q − 1) + C(n − 1, q), as asserted.
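The recurrence also gives a convenient way of generating the triangle row by row; a short sketch in Python:

    def pascal_rows(n_max):
        # Build the rows of Pascal's triangle using C(n, q) = C(n-1, q-1) + C(n-1, q).
        row = [1]
        for _ in range(n_max + 1):
            yield row
            row = [1] + [row[q - 1] + row[q] for q in range(1, len(row))] + [1]

    for row in pascal_rows(4):
        print(row)
    # [1]
    # [1, 1]
    # [1, 2, 1]
    # [1, 3, 3, 1]
    # [1, 4, 6, 4, 1]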

Next we shall look at a so-called binomial: (a + b)^n, where a and b are numbers and n is an integer. Most students from public school know how to write it down for n = 2: (a + b)² = a² + 2ab + b². But what about n = 4 or n = 6? Before analysing, we present the results below for n = 0…5.

(a + b)⁰ = 1
(a + b)¹ = 1a + 1b
(a + b)² = 1a² + 2ab + 1b²
(a + b)³ = 1a³ + 3a²b + 3ab² + 1b³
(a + b)⁴ = 1a⁴ + 4a³b + 6a²b² + 4ab³ + 1b⁴
(a + b)⁵ = 1a⁵ + 5a⁴b + 10a³b² + 10a²b³ + 5ab⁴ + 1b⁵

What you discover is that the coefficients (arranged in this manner) in the expansion of the binomial are equal to the numbers in Pascal's triangle. What might come as a surprise is actually rather obvious. For example, look at (a + b)⁵:

(a + b)⁵ = (a + b)(a + b)(a + b)(a + b)(a + b)

Each term in the expansion of the product comes from taking b from j of the parentheses and a from the remaining 5 − j parentheses and multiplying them to get a^(5−j)·b^j. The number of different ways one may select j parentheses out of 5 is of course C(5, j). The same applies to all terms, which explains why the coefficients of the different terms are the binomial coefficients.

(a + b)⁵ = C(5,0)a⁵b⁰ + C(5,1)a⁴b¹ + C(5,2)a³b² + C(5,3)a²b³ + C(5,4)a¹b⁴ + C(5,5)a⁰b⁵

The formula is trivially generalized to the evaluation of (a + b)^n, but it is then more suitably written with the help of a summation sign:

(a + b)^n = Σ_{j=0}^{n} C(n, j)·a^(n−j)·b^j

    This formula is called the binomial formula.
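A quick numerical sanity check of the binomial formula for a couple of arbitrary values of a and b (a sketch, assuming Python 3.8+ for math.comb):

    from math import comb

    def binomial_sum(a, b, n):
        # Right-hand side of the binomial formula: sum of C(n, j) * a^(n-j) * b^j.
        return sum(comb(n, j) * a**(n - j) * b**j for j in range(n + 1))

    print(binomial_sum(2, 3, 5), (2 + 3)**5)            # 3125 3125
    print(binomial_sum(1.5, -0.5, 4), (1.5 - 0.5)**4)   # 1.0 1.0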


And again we stress that the coefficients of the terms a^(n−j)·b^j in the expansion of (a + b)^n are equal to the n'th row in Pascal's triangle. We only need to establish (formally) that the binomial distribution is a probability distribution, that is, that the sum of the probabilities P(X = j), j = 0, …, n, is equal to 1. This, however, is straightforward once we have established the binomial formula. Suppose namely that we have two numbers s and t for which s + t = 1, and therefore t = 1 − s. It then follows that (s + t)^n = 1. We then apply the binomial formula to evaluate (s + t)^n:

1 = 1^n = (s + t)^n = C(n,0)s^n·t⁰ + C(n,1)s^(n−1)·t¹ + … + C(n,j)s^(n−j)·t^j + … + C(n,n)s⁰·t^n

If in this formula we put t = p and s = 1 − p, we recognize the binomial probabilities:

1 = C(n,0)(1 − p)^n + C(n,1)p(1 − p)^(n−1) + … + C(n,j)p^j(1 − p)^(n−j) + … + C(n,n)p^n

From this it appears that the sum of the binomial probabilities is equal to 1. In a manner of speaking, it is often referred to as a "success" when the primary event occurs (e.g. the dice shows 6 eyes), and a "failure" if it does not occur. (You should not attach any semantics to these expressions, since the primary event might be that you get killed in a traffic accident, or that you get divorced.) The probability P(X = j) then means the probability of getting j "successes" in n attempts, and there exists a commonly used abbreviation b(j; n, p), or just b_j when n and p are understood. Often we are interested in the number j giving the largest probability, that is, the j for which b_j = b(j; n, p) is largest. To find it, we look at the ratio b_j / b_{j−1}:

b_j / b_{j−1} = [ C(n, j) p^j (1 − p)^(n−j) ] / [ C(n, j−1) p^(j−1) (1 − p)^(n−j+1) ] = (n − j + 1)p / ( j(1 − p) )

The last expression comes from writing out the binomial coefficients and reducing. First we write down the condition that the sequence b_0, b_1, b_2, ..., b_n is increasing:

b_j > b_{j−1}  ⇔  b_j / b_{j−1} > 1  ⇔  (n − j + 1)p > j(1 − p)  ⇔  j < np + p

From this we conclude that if j < np + p, then b_j > b_{j−1}, meaning that the sequence is increasing.


In a quite similar way, reversing the inequality sign, we find that b_j < b_{j−1} if and only if j > np + p. From this we conclude that

j ≥ np + 1  ⇒  b_j < b_{j−1},   or   j > np  ⇒  b_{j+1} < b_j,

implying that the sequence is decreasing. So the sequence b_0, b_1, b_2, ..., b_n is increasing as long as j < np + p and decreasing once j > np.

    The largest probability must therefore be at j = np if np is an integer, otherwise the largest value for the probability is found among the two neighbouring integers to np.
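As a small check of this rule (a sketch): for n = 12 and p = 1/6 we have np = 2, and the largest probability is indeed found at j = 2.

    from math import comb

    def binom_pmf(j, n, p):
        return comb(n, j) * p**j * (1 - p)**(n - j)

    n, p = 12, 1 / 6
    probs = [binom_pmf(j, n, p) for j in range(n + 1)]
    print(probs.index(max(probs)))   # 2, i.e. equal to np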

5. Stick diagrams for the binomial distribution

To get an overview of the binomial distribution, it is often illustrated by a so-called "stick diagram". Below the distribution is depicted for n = 12 and p = 1/3. Notice that the largest probability is found at j = np = 4, as we derived above.

[Stick diagram of the binomial distribution for n = 12, p = 1/3.]
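The diagram can be produced with a few lines of Python; this sketch assumes that matplotlib is available.

    from math import comb
    import matplotlib.pyplot as plt

    n, p = 12, 1 / 3
    js = list(range(n + 1))
    probs = [comb(n, j) * p**j * (1 - p)**(n - j) for j in js]

    plt.stem(js, probs)                      # one "stick" per value of j
    plt.xlabel("j (number of successes)")
    plt.ylabel("P(X = j)")
    plt.title("Binomial distribution, n = 12, p = 1/3")
    plt.show()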

6. Cumulated probabilities

It may – even after the appearance of mathematical calculators – be rather cumbersome to calculate binomial probabilities for high values of n. Even though one has been able to do numerical calculations on a pocket calculator since about 1970, it was not until the mid-1990s, with the appearance of mathematical pocket calculators, that one could directly evaluate formulas from combinatorics and probability. Before that one was forced to use entries in tables and often do manual interpolation. Tables are still used, but (when it concerns probabilities) they have always been designed to contain cumulated probabilities, that is, sums of probabilities from the low end or from the high end.

As an example, we shall again consider 12 throws with a dice, where the primary event is that the dice shows 6 eyes. The probability of exactly two successes we have up till now written P(X = 2), but similarly we may write e.g. P(X ≤ 4) for the probability of getting 6 eyes at most 4 times. P(X ≤ 4) is an example of a cumulated probability, having the precise meaning: P(X ≤ 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4). In general, for arbitrary n and p, we have:


P(X ≤ q) = Σ_{j=0}^{q} P(X = j)   and   P(X ≥ q) = Σ_{j=q}^{n} P(X = j)

As mentioned before, at the time when we used tables in high school, the tables only contained cumulated probabilities.

Examples

1. (This is an outdated example from the period when we used tables.) We want to find the probability of getting 6 eyes exactly two times when throwing a dice 12 times. In the table we find, for n = 12 and p = 1/6, P(X ≤ 2) = 0.6774 and P(X ≤ 1) = 0.3813. From this we find: P(X = 2) = P(X ≤ 2) − P(X ≤ 1) = 0.2961.
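Today the same table values are easy to reproduce directly; a minimal sketch:

    from math import comb

    def binom_cdf(q, n, p):
        # P(X <= q) = sum of P(X = j) for j = 0 .. q
        return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(q + 1))

    n, p = 12, 1 / 6
    print(round(binom_cdf(2, n, p), 4))                       # 0.6774
    print(round(binom_cdf(1, n, p), 4))                       # 0.3813
    print(round(binom_cdf(2, n, p) - binom_cdf(1, n, p), 4))  # 0.2961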

2. Same experiment, but now we wish to find the probability that we get at least two sixes in twelve attempts.

P(at least two sixes) =