Source: forge.scilab.org/upload/docintrodiscrprobas/files/introdiscreteprobas_v1.2.pdf

Introduction to probabilities with Scilab

Michaël Baudin

June 2011

Abstract

In this article, we present an introduction to probabilities with Scilab. Numerical experiments are based on Scilab. The first section presents discrete random variables and conditional probabilities. In the second section, we present combination problems, tree diagrams and Bernoulli trials. In the third section, we present the simulation of random processes with Scilab. Coin simulations are presented, as well as the Galton board.

Contents

1 Discrete random variables
  1.1 Sets
  1.2 Distribution function and probability
  1.3 Properties of discrete probabilities
  1.4 Uniform distribution
  1.5 Conditional probability
  1.6 Life table
  1.7 Bayes' formula
  1.8 Independent events
  1.9 Notes and references
  1.10 Exercises

2 Combinatorics
  2.1 Tree diagrams
  2.2 Permutations
  2.3 The gamma function
  2.4 Overview of functions in Scilab
  2.5 The gamma function in Scilab
  2.6 The factorial and log-factorial functions
  2.7 Computing factorial and log-factorial with Scilab
  2.8 Stirling's formula
  2.9 Computing permutations and log-permutations with Scilab
  2.10 The birthday problem
  2.11 A modified birthday problem
  2.12 Combinations


  2.13 Computing combinations and log-combinations with Scilab
  2.14 The poker game
  2.15 Bernoulli trials
  2.16 Computing the binomial distribution
  2.17 The hypergeometric distribution
  2.18 Computing the hypergeometric distribution with Scilab
  2.19 Notes and references
  2.20 Exercises

3 Simulation of random processes with Scilab
  3.1 Overview
  3.2 Generating uniform random numbers
  3.3 Simulating random discrete events
  3.4 Simulation of a coin
  3.5 Simulation of a Galton board
  3.6 Generate random permutations
  3.7 References and notes

4 Acknowledgments

5 Answers to exercises
  5.1 Answers for section 1
  5.2 Answers for section 2

Bibliography

Index


Copyright © 2008-2010 - Consortium Scilab - Digiteo - Michaël Baudin

This file must be used under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License:

http://creativecommons.org/licenses/by-sa/3.0


1 Discrete random variables

In this section, we present discrete random variables. The first subsection presents general definitions for sets, including unions and intersections. Then we present the definition of the discrete distribution function and the probability of an event. In the third subsection, we give properties of probabilities, such as, for example, the probability of the union of two disjoint events. The fourth subsection is devoted to the very common discrete uniform distribution. Then we present the definition of conditional probability. This leads to Bayes' formula, which allows us to compute the posterior conditional probability, given a set of hypothesis probabilities. This section ends with the definition of independent events.

1.1 Sets

A set is a collection of elements. In this document, we consider sets of elements in a fixed nonempty set Ω, called a space.

Assume that A is a set of elements. If x is a point in that set, we write x ∈ A. If there is no point in A, we write A = ∅. If the number of elements in A is finite, we denote by #(A) the number of elements in the set A. If the number of elements in A is infinite, the cardinality cannot be computed this way (for example, A = N).

The set Ac is the set of all points in Ω which are not in A:

Ac = {x ∈ Ω / x ∉ A}. (1)

The set Ac is called the complementary set of A. The set B is a subset of A if any point in B is also in A, and we write B ⊂ A.

The two sets A and B are equal if A ⊂ B and B ⊂ A. The difference set A − B is the set of all points of A which are not in B:

A − B = {x ∈ A / x ∉ B}. (2)

The intersection A ∩B of two sets A and B is the set of points common to A andB:

A ∩ B = {x ∈ Ω / x ∈ A and x ∈ B}. (3)

The union A ∪B of two sets A and B is the set of points which belong to at leastone of the sets A or B:

A ∪ B = {x ∈ Ω / x ∈ A or x ∈ B}. (4)

The operations that we have defined are presented in figure 1. Such figures are often referred to as Venn diagrams.

Two sets A and B are disjoint, or mutually exclusive, if their intersection is empty, i.e. A ∩ B = ∅.

In the following, we will use the fact that we can always decompose the union of two sets as the union of three disjoint subsets. Indeed, assume that A, B ⊂ Ω. We have

A ∪B = (A−B) ∪ (A ∩B) ∪ (B − A), (5)


Figure 1: Operations on sets. Upper left, union: A ∪ B; upper right, intersection: A ∩ B; lower left, complement: Ac; lower right, difference: A − B.

where the sets A − B, A ∩ B and B − A are disjoint. This decomposition will be used several times in this chapter.
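This decomposition is easy to verify numerically. The following sketch (in Python rather than Scilab, purely as a cross-check; the sets A and B are arbitrary choices of ours) confirms equation 5 and the pairwise disjointness of the three pieces:

```python
# Two arbitrary example sets inside a small space.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# The three pieces of the decomposition: A - B, A ∩ B, B - A.
pieces = [A - B, A & B, B - A]

# Their union reconstructs A ∪ B (equation 5)...
assert pieces[0] | pieces[1] | pieces[2] == A | B

# ...and the pieces are pairwise disjoint.
assert pieces[0] & pieces[1] == set()
assert pieces[0] & pieces[2] == set()
assert pieces[1] & pieces[2] == set()
```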

The cross product of two sets is the set

A × B = {(x, y) / x ∈ A, y ∈ B}. (6)

Assume that n is a positive integer. The power set An is the set

An = {(x1, . . . , xn) / x1, . . . , xn ∈ A}. (7)

Example 1.1 (Die with 6 faces) Consider a die with 6 faces. The space for this experiment is

Ω = {1, 2, 3, 4, 5, 6}. (8)

The set of even numbers is A = {2, 4, 6} and the set of odd numbers is B = {1, 3, 5}. Their intersection is empty, i.e. A ∩ B = ∅, which proves that A and B are disjoint. Since their union is the whole sample space, i.e. A ∪ B = Ω, these two sets are complementary, i.e. Ac = B and Bc = A.

1.2 Distribution function and probability

A random event is an event which has a chance of happening, and the probability is a numerical measure of that chance. What exactly is random is quite difficult to define. In this section, we define the probability associated with a distribution function, a concept that can be defined very precisely.


Assume that Ω is a set, called the sample space. In this document, we will consider the case where the sample space is finite, i.e. the number of elements in Ω is finite. Assume that we are performing random trials, so that each trial is associated with one outcome x ∈ Ω. Each subset A of the sample space Ω is called an event. We say that the event A ∩ B occurs if both the events A and B occur. We say that the event A ∪ B occurs if the event A or the event B occurs.

Example 1.2 (Die with 6 faces) Consider a die with 6 faces which is rolled once. The sample space is:

Ω = {1, 2, 3, 4, 5, 6}. (9)

Let us denote by x ∈ Ω the outcome of this experiment. When we focus on the experiments where x is odd, we consider the event A = {1, 3, 5}. When the outcome is lower than or equal to 3, we consider the event B = {1, 2, 3}. The intersection of these two events is C = {1, 3}, which occurs when the outcome x is both odd and lower than or equal to 3.

We will first define a distribution function and then derive the probability from there. The example of a die with 6 faces will serve as an illustration for these definitions.

Definition 1.1. (Distribution) A distribution function is a function f : Ω → [0, 1] which satisfies

0 ≤ f(x) ≤ 1, (10)

for all x ∈ Ω, and

∑_{x∈Ω} f(x) = 1. (11)

Example 1.3 (Die with 6 faces) Assume that a die with 6 faces is rolled once. The sample space for this experiment is

Ω = {1, 2, 3, 4, 5, 6}. (12)

Assume that the die is fair. This means that the probability of each of the six outcomes is the same, i.e. the distribution function is f(x) = 1/6 for x ∈ Ω, which satisfies the conditions of the definition 1.1.

Definition 1.2. (Probability) Assume that f is a distribution function on the sample space Ω. For any event A ⊂ Ω, the probability P of A is

P (A) = ∑_{x∈A} f(x). (13)

Example 1.4 (Die with 6 faces) Assume that a die with 6 faces is rolled once, so that the sample space for this experiment is Ω = {1, 2, 3, 4, 5, 6}. Assume that the distribution function is f(x) = 1/6 for x ∈ Ω. The event

A = {2, 4, 6} (14)


corresponds to the statement that the result of the roll is an even number. From the definition 1.2, the probability of the event A is

P (A) = f(2) + f(4) + f(6) (15)
      = 1/6 + 1/6 + 1/6 (16)
      = 1/2. (17)
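The same computation can be scripted. Here is a minimal sketch (in Python rather than Scilab, as a cross-check; the helper name prob is ours) that encodes the fair-die distribution and sums f over the event, following definition 1.2:

```python
from fractions import Fraction

# Fair die: f(x) = 1/6 for each face x, so definition 1.1 is satisfied.
omega = {1, 2, 3, 4, 5, 6}
f = {x: Fraction(1, 6) for x in omega}
assert sum(f.values()) == 1

def prob(event):
    # Definition 1.2: P(A) is the sum of f(x) over x in A.
    return sum(f[x] for x in event)

A = {2, 4, 6}                     # "the roll is even"
assert prob(A) == Fraction(1, 2)  # matches equations 15-17
```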

1.3 Properties of discrete probabilities

In this section, we present the properties that the probability P (A) satisfies. We also derive some results for the probabilities of other events, such as unions of disjoint events.

The following theorem gives some elementary properties satisfied by a probability P.

Proposition 1.3. (Probability) Assume that Ω is a sample space and that f is a distribution function on Ω. The probability of the event Ω is one, i.e.

P (Ω) = 1. (18)

The probability of the empty set is zero, i.e.

P (∅) = 0. (19)

Assume that A and B are two subsets of Ω. If A ⊂ B, then

P (A) ≤ P (B). (20)

For any event A ⊂ Ω, we have

0 ≤ P (A) ≤ 1. (21)

Proof. The equality 18 derives directly from the definition 11 of a distribution function, i.e. P (Ω) = ∑_{x∈Ω} f(x) = 1. The equality 19 also derives directly from 11.

Assume that A and B are two subsets of Ω such that A ⊂ B. Since a probability is a sum of nonnegative terms, we have

P (A) = ∑_{x∈A} f(x) ≤ ∑_{x∈B} f(x) = P (B), (22)

which proves the inequality 20.

The inequalities 21 derive directly from the definition 13 of a probability. First, the probability P (A) is nonnegative, since 10 states that f is nonnegative. Second, the probability P of an event A is lower than or equal to 1, since P (A) = ∑_{x∈A} f(x) ≤ ∑_{x∈Ω} f(x) = P (Ω) = 1, which concludes the proof.

Proposition 1.4. (Probability of two disjoint subsets) Assume that Ω is a sample space and that f is a distribution function on Ω.

Let A and B be two disjoint subsets of Ω. Then

P (A ∪B) = P (A) + P (B). (23)


Figure 2: Two disjoint sets.

The figure 2 presents the situation of two disjoint sets A and B. Since the two sets have no intersection, it suffices to add the probabilities associated with each event.

Proof. Assume that A and B are two disjoint subsets of Ω. We can decompose A ∪ B as A ∪B = (A−B) ∪ (A ∩B) ∪ (B − A), so that

P (A ∪B) = ∑_{x∈A∪B} f(x) (24)
         = ∑_{x∈A−B} f(x) + ∑_{x∈A∩B} f(x) + ∑_{x∈B−A} f(x). (25)

But A and B are disjoint, so that A−B = A, A ∩B = ∅ and B − A = B. Therefore,

P (A ∪B) = ∑_{x∈A} f(x) + ∑_{x∈B} f(x) (26)
         = P (A) + P (B), (27)

which concludes the proof.

Notice that the equality 23 can be generalized immediately to a sequence of disjoint events.

Proposition 1.5. (Probability of disjoint subsets) Assume that Ω is a sample space and that f is a distribution function on Ω.

For any pairwise disjoint events A1, A2, . . . , Ak ⊂ Ω with k ≥ 1, we have

P (A1 ∪ A2 ∪ . . . ∪ Ak) = P (A1) + P (A2) + . . .+ P (Ak). (28)

Proof. For example, we can use proposition 1.4 to prove the result by induction on the number of events.

Example 1.5 (Die with 6 faces) Assume that a die with 6 faces is rolled once, so that the sample space for this experiment is Ω = {1, 2, 3, 4, 5, 6}. Assume that the distribution function is f(x) = 1/6 for x ∈ Ω. The event A = {1, 2, 3} corresponds to the numbers lower than or equal to 3. The probability of this event is P (A) = 1/2. The event B = {5, 6} corresponds to the numbers greater than or equal to 5. The probability of this event is P (B) = 1/3. The two events are disjoint, so that the proposition 1.5 can be applied, which implies that P (A ∪B) = 5/6.
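The additivity of disjoint events is easy to check numerically; here is a sketch for the die of example 1.5 (Python used as a cross-check, with exact rational arithmetic; the helper prob is ours):

```python
from fractions import Fraction

# Fair die with the uniform distribution f(x) = 1/6.
omega = set(range(1, 7))
f = {x: Fraction(1, 6) for x in omega}

def prob(event):
    # Definition 1.2: sum f(x) over the event.
    return sum(f[x] for x in event)

A = {1, 2, 3}
B = {5, 6}
assert A & B == set()                    # the events are disjoint
assert prob(A | B) == prob(A) + prob(B)  # additivity (proposition 1.5)
assert prob(A | B) == Fraction(5, 6)
```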


Figure 3: Two sets with a nonempty intersection.

Proposition 1.6. (Probability of the complementary event) Assume that Ω is a sample space and that f is a distribution function on Ω.

For any subset A of Ω,

P (A) + P (Ac) = 1. (29)

Proof. We have Ω = A ∪ Ac, where the sets A and Ac are disjoint. Therefore, from proposition 1.4, we have

P (Ω) = P (A) + P (Ac), (30)

where P (Ω) = 1, which concludes the proof.

Example 1.6 (Die with 6 faces) Assume that a die with 6 faces is rolled once, so that the sample space for this experiment is Ω = {1, 2, 3, 4, 5, 6}. Assume that the distribution function is f(x) = 1/6 for x ∈ Ω. The event A = {2, 4, 6} corresponds to the statement that the result of the roll is an even number. The probability of this event is P (A) = 1/2. The complementary event is the event of an odd number, i.e. Ac = {1, 3, 5}. By proposition 1.6, the probability of an odd number is P (Ac) = 1 − P (A) = 1 − 1/2 = 1/2.

The following equality gives the probability of the union of two events in terms of the individual probabilities and the probability of the intersection.

Proposition 1.7. (Probability of the union) Assume that Ω is a sample space and that f is a distribution function on Ω. Assume that A and B are two subsets of Ω, not necessarily disjoint. We have:

P (A ∪B) = P (A) + P (B)− P (A ∩B). (31)

The figure 3 presents the situation where two sets A and B have a nonempty intersection. When we add the probabilities of the two events A and B, the intersection is counted twice. This is why it must be removed by subtraction.

Proof. Assume that A and B are two subsets of Ω. The proof is based on the analysis of the Venn diagram presented in figure 3. The idea of the proof is to compute the


probability P (A ∪ B) by constructing disjoint sets to which the equality 23 can be applied. We can decompose the union of the two sets A and B as the union of disjoint sets:

A ∪B = (A−B) ∪ (A ∩B) ∪ (B − A). (32)

The equality 23 leads to

P (A ∪B) = P (A−B) + P (A ∩B) + P (B − A). (33)

The next part of the proof is based on the computation of P (A − B) and P (B − A). We can decompose the set A as the union of the disjoint sets

A = (A−B) ∪ (A ∩B), (34)

which leads to P (A) = P (A−B) + P (A ∩B), which implies

P (A−B) = P (A)− P (A ∩B). (35)

Similarly, we can prove that

P (B − A) = P (B)− P (B ∩ A). (36)

We plug the two equalities 35 and 36 into 33, and find

P (A ∪B) = P (A)− P (A ∩B) + P (A ∩B) + P (B)− P (B ∩ A), (37)

which implies 31 and concludes the proof.

Example 1.7 (Disease) Assume that infections can be bacterial (B), viral (V) or both (B ∩ V ). This implies that B ∪ V = Ω, but the two events are not disjoint, i.e. B ∩ V ≠ ∅. Assume that P (B) = 0.7 and P (V ) = 0.4. What is the probability of having both types of infections?

The probability of having both infections is P (B ∩ V ). From proposition 1.7, we have P (B ∪ V ) = P (B) + P (V ) − P (B ∩ V ), which leads to P (B ∩ V ) = P (B) + P (V ) − P (B ∪ V ). We finally get P (B ∩ V ) = 0.7 + 0.4 − 1 = 0.1.

This example is based on an example presented in [15].
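The arithmetic of this example is a direct rearrangement of proposition 1.7; a one-line check (Python as a cross-check; the variable names are ours):

```python
# P(B ∩ V) = P(B) + P(V) - P(B ∪ V), with B ∪ V = Omega so P(B ∪ V) = 1.
p_B, p_V, p_union = 0.7, 0.4, 1.0
p_both = p_B + p_V - p_union
assert abs(p_both - 0.1) < 1e-12
```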

1.4 Uniform distribution

In this section, we describe the particular situation where the distribution function is uniform.

Definition 1.8. (Uniform distribution) Assume that Ω is a finite, nonempty sample space. The uniform distribution function is

f(x) = 1/#(Ω), (38)

for all x ∈ Ω.


Proposition 1.9. (Probability with uniform distribution) Assume that Ω is a finite, nonempty sample space and that f is the uniform distribution function. Then the probability of the event A ⊂ Ω is

P (A) = #(A)/#(Ω). (39)

Proof. The definition 1.2 implies

P (A) = ∑_{x∈A} f(x) (40)
      = ∑_{x∈A} 1/#(Ω) (41)
      = #(A)/#(Ω), (42)

which concludes the proof.

Example 1.8 (Die with 6 faces) Assume that a die with 6 faces is rolled once, so that the sample space for this experiment is Ω = {1, 2, 3, 4, 5, 6}. In the previous analysis of this example, we assumed that the distribution function is f(x) = 1/6 for x ∈ Ω. This is consistent with definition 1.8, since #(Ω) = 6. Such a die is a fair die, meaning that all faces have the same probability. The event A = {2, 4, 6} corresponds to the statement that the result of the roll is an even number. The number of outcomes in this event is #(A) = 3. From proposition 1.9, the probability of this event is P (A) = 1/2.
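With a uniform distribution, probabilities reduce to counting; a sketch of proposition 1.9 for this example (Python, as a cross-check):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "the roll is even"

# Proposition 1.9: P(A) = #(A) / #(Omega).
p_A = Fraction(len(A), len(omega))
assert p_A == Fraction(1, 2)
```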

1.5 Conditional probability

In this section, we define the conditional distribution function and the conditional probability. We analyze this definition in the particular situation of the uniform distribution.

In some situations, we want to consider the probability of an event A given that an event B has occurred. In this case, we consider the set B as a new sample space, and update the definition of the distribution function accordingly.

Definition 1.10. (Conditional distribution function) Assume that Ω is a sample space and that f is a distribution function on Ω. Assume that A is a nonempty subset of Ω. The function f(x|A) defined by

f(x|A) = f(x) / ∑_{y∈A} f(y), if x ∈ A,
f(x|A) = 0, if x ∉ A, (43)

is the conditional distribution function of x given A.

The hypothesis that A is nonempty implies P (A) = ∑_{x∈A} f(x) > 0, so that the equality 43 is well defined.

The figure 4 presents the situation where an event A is considered for a conditional distribution. The distribution function f(x) is with respect to the sample space Ω, while the conditional distribution function f(x|A) is with respect to the set A.


Figure 4: A set A, subset of the sample space Ω.

Proof. We must prove that the function f(x|A) is a distribution function, i.e. that it satisfies the equality

∑_{x∈Ω} f(x|A) = 1. (44)

Indeed, we have

∑_{x∈Ω} f(x|A) = ∑_{x∈A} f(x|A) + ∑_{x∉A} f(x|A) (45)
               = ∑_{x∈A} f(x) / ∑_{y∈A} f(y) (46)
               = 1, (47)

which concludes the proof.

This leads us to the following definition of the conditional probability of an event A given an event B.

Proposition 1.11. (Conditional probability) Assume that Ω is a finite sample space and A and B are two subsets of Ω. Assume that P (B) > 0. The conditional probability of the event A given the event B is

P (A|B) = P (A ∩B) / P (B). (49)

The figure 5 presents the situation where we consider the event A|B. The probability P (A) is with respect to Ω, while the probability P (A|B) is with respect to B.

Proof. Assume that A and B are subsets of the sample space Ω. The conditional distribution function f(x|B) can be used to compute the probability of the event A given the event B. Indeed, we have

P (A|B) = ∑_{x∈A} f(x|B) (50)
        = ∑_{x∈A∩B} f(x|B), (51)


Figure 5: The conditional probability P (A|B) measures the probability of the set A ∩ B with respect to the set B.

since f(x|B) = 0 if x ∉ B. Hence,

P (A|B) = ∑_{x∈A∩B} ( f(x) / ∑_{y∈B} f(y) ) (52)
        = (∑_{x∈A∩B} f(x)) / (∑_{y∈B} f(y)) (53)
        = P (A ∩B) / P (B). (54)

The previous equality is well defined since P (B) > 0.

This definition can be analyzed in the particular case where the distribution function is uniform. Assume that #(Ω) is the size of the sample space and #(A) (resp. #(B) and #(A ∩ B)) is the number of elements of A (resp. of B and of A ∩ B). Then the equation 49 implies

P (A|B) = (#(A ∩B)/#(Ω)) / (#(B)/#(Ω)) (55)
        = #(A ∩B) / #(B). (56)

We notice that

(#(B)/#(Ω)) · (#(A ∩B)/#(B)) = #(A ∩B)/#(Ω), (57)

for all A, B ⊂ Ω. This leads to the equality

P (B)P (A|B) = P (A ∩B), (58)

for all A, B ⊂ Ω. The previous equation could also have been found directly from the equation 49.
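Equations 56 and 58 can be checked on the die example (Python as a cross-check; the particular events A and B are choices of ours):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}   # the outcome is odd
B = {1, 2, 3}   # the outcome is lower than or equal to 3

def p(event):
    # Uniform case (proposition 1.9): P(E) = #(E) / #(Omega).
    return Fraction(len(event), len(omega))

p_A_given_B = Fraction(len(A & B), len(B))   # equation 56
assert p_A_given_B == Fraction(2, 3)
assert p(B) * p_A_given_B == p(A & B)        # equation 58
```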


Age group     Male   Female
<1          100000   100000
1-4          99276    99391
5-9          99156    99292
10-14        99085    99232
15-19        98989    99164
20-24        98573    98991
25-29        97887    98758
30-34        97223    98484
35-39        96526    98133
40-44        95665    97621
45-49        94396    96823
50-54        92487    95603
55-59        89643    93850
60-64        85726    91384
65-69        80364    87726
70-74        72889    82275
75-79        62860    74398
80-84        49846    63218
85-89        34096    48086
90-94        18315    30289
95-99         7198    14523
100+          1940     4804

Figure 6: Life table, United States of America, 2009 [14].

1.6 Life table

In this section, we analyze a life table and compute conditional probabilities with Scilab.

The World Health Organization gives life tables for many countries in the world [14]. The table in figure 6 gathers data compiled in the USA in 2009. The first line counts 100,000 males and females born alive, with values decreasing as the age increases.

From this table, we can easily deduce that 91384/100000 = 91.384 % of the females live to age 60 and that 63218/100000 = 63.218 % of the females live to age 80. We now consider a woman who is 60. What is the probability that she lives to age 80?

Let us denote by A = {a ≥ 60} the event that a woman lives to age 60, and let us denote by B = {a ≥ 80} the event that a woman lives to age 80. We want to compute the conditional probability P (a ≥ 80|a ≥ 60). By the proposition


1.11, we have

P (a ≥ 80|a ≥ 60) = P ({a ≥ 60} ∩ {a ≥ 80}) / P (a ≥ 60) (59)
                  = P (a ≥ 80) / P (a ≥ 60) (60)
                  = 0.63218 / 0.91384 (61)
                  = 0.6918, (62)

with 4 significant digits. In other words, a woman who is already 60 has a 69.18 % chance of living to 80.

It is easy to gather the data into Scilab variables, as in the following Scilab script. The ages variable contains the age classes. It is made of 22 entries, where the class #k is from age ages(k-1)+1 to ages(k). The number of male survivors in class #k is males(k), while the number of female survivors is females(k).

ages = [0;4;9;14;19;24;29;34;39;44;49;54;..
        59;64;69;74;79;84;89;94;99;100];
males = [100000;99276;99156;99085;98989;98573;..
         97887;97223;96526;95665;94396;92487;89643;85726;..
         80364;72889;62860;49846;34096;18315;7198;1940];
females = [100000;99391;99292;99232;99164;98991;98758;..
           98484;98133;97621;96823;95603;93850;91384;..
           87726;82275;74398;63218;48086;30289;14523;4804];

The following lifeprint function prints the data and displays a table similar to the one in figure 6.

function lifeprint(ages, males, females)
    // Print one line per age class, with male and female survivors.
    nc = size(ages, "*");
    mprintf("     <1 %6d %6d\n", males(1), females(1));
    for k = 2:nc
        amin = ages(k-1) + 1;
        amax = ages(k);
        mprintf("%3d-%3d %6d %6d\n", ..
            amin, amax, males(k), females(k));
    end
endfunction

We are now interested in computing the required probabilities with Scilab. The following lifeproba function returns the probability that a person lives to age a, given the data in the tables ages and survivors. In practice, the survivors variable will be equal either to males or to females. The algorithm searches the ages table for the class k which contains the age a, i.e. the index k such that a is contained in the age class from ages(k-1)+1 to ages(k). Then the probability of living to age a is computed by using the number of survivors associated with the class k.

function p = lifeproba(a, ages, survivors)
    nc = size(ages, "*");
    // Search for the age class containing a.
    found = %f;
    for k = 2:nc
        if ( a >= ages(k-1) + 1 & a <= ages(k) ) then
            found = %t;
            break
        end
    end
    if ( ~found ) then
        error("Age not found in table")
    end
    p = survivors(k) / survivors(1)
endfunction

Although the previous algorithm is rather naive (it does not make use of vectorized statements), it is sufficient for small life tables such as ours.

The following session shows how the lifeproba function computes the probability of living to age 60.

-->pa = lifeproba(60, ages, females)
 pa  =
    0.91384

The following lifecondproba function returns the probability that a person of age a lives to age b, given the data in the tables ages and survivors. We assume that a < b. The algorithm is a straightforward application of the proposition 1.11.

function p = lifecondproba(a, b, ages, survivors)
    if ( a >= b ) then
        p = 1
        return
    end
    // Proposition 1.11: P(live to b | live to a) = P(live to b) / P(live to a).
    pa = lifeproba(a, ages, survivors)
    pb = lifeproba(b, ages, survivors)
    p = pb / pa
endfunction

In the following session, we compute the probability that a woman lives to age 80, given that she is 60.

-->pab = lifecondproba(60, 80, ages, females)
 pab  =
    0.6917841

It is now easy to compute the probability that a woman lives to various ages, given that she is 40. This is done in the following script, which produces the figure 7.

bages = floor(linspace(41, 99, 20));
for k = 1 : 20
    pab(k) = lifecondproba(40, bages(k), ages, females);
end
plot(bages, pab, "bo-");
xtitle("Probability of living to age B, for a woman of age 40.", ..
    "B", "Probability");

1.7 Bayes’ formula

In this section, we present Bayes' formula, which allows us to compute the posterior conditional probability, given a set of hypothesis probabilities.

Proposition 1.12. (Bayes' formula) Assume that the sample space Ω can be decomposed into a sequence of events, which are called hypotheses. Let us denote by


Figure 7: Probability that a woman lives to various ages, given that she is 40.

Hi ⊂ Ω, with i = 1, . . . ,m, a sequence of pairwise disjoint sets such that

Ω = H1 ∪H2 ∪ . . . ∪Hm, (63)

where m is a positive integer. Then, for any event E ⊂ Ω with P (E) > 0,

P (Hi|E) = P (E|Hi)P (Hi) / (∑_{j=1,...,m} P (E|Hj)P (Hj)). (64)

In order to use Bayes' formula, assume that the probabilities of the hypotheses are known, that is, assume that the probabilities P (Hi) are given for 1 ≤ i ≤ m. Assume also that the probabilities P (E|Hi) are known for 1 ≤ i ≤ m. Then we are able to compute P (Hi|E), that is, if the event E has occurred, we are able to compute the probability of each hypothesis Hi. In practice, we select the most likely hypothesis Hi, for which the probability P (Hi|E) is maximum over i = 1, . . . ,m.
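Equation 64 can be sketched as a small function (Python as a cross-check; the function name bayes and the example numbers below are ours):

```python
def bayes(priors, likelihoods):
    # priors[i] = P(Hi); likelihoods[i] = P(E|Hi).
    # Returns the posteriors P(Hi|E) of equation 64.
    p_E = sum(l * p for l, p in zip(likelihoods, priors))  # equation 69
    return [l * p / p_E for l, p in zip(likelihoods, priors)]

# Three hypotheses with arbitrary priors and likelihoods.
posteriors = bayes([0.5, 0.3, 0.2], [0.9, 0.5, 0.1])
assert abs(sum(posteriors) - 1.0) < 1e-12   # posteriors sum to one
assert max(posteriors) == posteriors[0]     # H1 is the most likely hypothesis
```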

Proof. By definition of the conditional probability, we have

P (Hi|E) = P (Hi ∩ E) / P (E). (65)

The numerator can be computed by using the conditional probability P (E|Hi), which, by hypothesis, is known. Hence we have

P (Hi ∩ E) = P (E|Hi)P (Hi), (66)


for 1 ≤ i ≤ m. On the other hand, we must compute the denominator P (E). This can be done by using the fact that the sequence of hypotheses Hi is a decomposition of the whole sample space Ω. Hence, we can decompose the event E as

E = (E ∩ H1) ∪ (E ∩ H2) ∪ . . . ∪ (E ∩ Hm). (67)

By hypothesis, these events are pairwise disjoint. Therefore, by proposition 1.5, we can compute the probability of the event E as the sum of the probabilities of the disjoint subsets E ∩ Hi. We have

P (E) = ∑i=1,m P (E ∩ Hi). (68)

By definition of the conditional probability, we have P (E ∩ Hi) = P (E|Hi)P (Hi), which leads to

P (E) = ∑i=1,m P (E|Hi)P (Hi). (69)

We can now plug the equations 66 and 69 into 65, which concludes the proof.

The following exercise is presented in [17], in chapter 1, ”Elements of probability”.

Example 1.9 Consider the situation where an insurance company tries to compute the probability that a customer has an accident. This company makes the assumption that people either are accident prone or are not, and considers the probability of having an accident during the 1 year period following the insurance policy purchase. They assume that an accident-prone person will have an accident with probability 0.4. For a non-accident-prone person, the probability of having an accident is 0.2. Assume that 30% of the population is accident prone. Assume that a person has an accident. What is the probability that this person is accident prone?

Let us denote by E the event that the person will have an accident within the year of purchase and denote by H1 the event that the person is accident prone. By hypothesis, the sample space Ω of all the persons can be decomposed into the pairwise disjoint sets H1 and H2 = H1c, the complement of H1, which are, in this particular situation, the hypotheses. By hypothesis, we know that P (E|H1) = 0.4 and P (E|H2) = 0.2. We also know that P (H1) = 0.3, which implies that P (H2) = 1 − P (H1) = 0.7. We want to compute P (H1|E). By Bayes' formula 64, we have

P (H1|E) = P (E|H1)P (H1) / (P (E|H1)P (H1) + P (E|H2)P (H2)) (70)

= 0.4 × 0.3 / (0.4 × 0.3 + 0.2 × 0.7) (71)

≈ 0.4615, (72)

with 4 significant digits.
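As a minimal sketch (not part of the original text, the variable names are ours), the previous computation can be reproduced in a short Scilab session:

```scilab
// Prior probabilities of the hypotheses H1 (accident prone) and H2
PH = [0.3 0.7];
// Conditional probabilities of an accident: P(E|H1) and P(E|H2)
PEH = [0.4 0.2];
// Bayes' formula 64: posterior probabilities P(H1|E) and P(H2|E)
PHE = (PEH .* PH) / sum(PEH .* PH);
disp(PHE(1)) // approximately 0.4615385
```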


1.8 Independent events

In this section, we define independent events and give examples of such events.

Definition 1.13. (Independent events) Assume that Ω is a finite sample space. Two events A,B ⊂ Ω are independent if P (A) > 0, P (B) > 0 and

P (A|B) = P (A). (73)

In the previous definition, the roles of A and B can be reversed, which leads to the following result. If two events A,B ⊂ Ω are independent, then

P (B|A) = P (B). (74)

This is the subject of exercise 1.4. This definition, associated with the definition 49 of the conditional probability, leads immediately to the following proposition.

Proposition 1.14. (Independent events) Assume that Ω is a finite sample space. Assume that the two events A,B ⊂ Ω satisfy P (A) > 0 and P (B) > 0. Then the events A and B are independent if and only if

P (A ∩B) = P (A)P (B). (75)

Proof. Assume that the two events A,B ⊂ Ω satisfy P (A) > 0 and P (B) > 0. In the first part of this proof, assume that A and B are independent, and let us prove that 75 is satisfied. By definition of

the conditional probability 1.11, we have P (A∩B) = P (A|B)P (B). Since A and Bare independent, we have P (A|B) = P (A), which leads to P (A ∩ B) = P (A)P (B)and concludes the first part.

In the second part, let us assume that 75 is satisfied, and let us prove that the events A and B are independent. By definition of the conditional probability 1.11, we have

P (A|B) = P (A ∩ B) / P (B). (76)

By hypothesis, we have P (A ∩B) = P (A)P (B), so that

P (A|B) = P (A)P (B) / P (B) = P (A), (77)

since P (B) > 0. Similarly, switching the roles of A and B, by definition we have P (B|A) = P (B ∩ A)/P (A). By hypothesis, we have P (B ∩ A) = P (B)P (A), so that P (B|A) = P (B)P (A)/P (A) = P (B), since P (A) > 0, which concludes the proof.

The proposition 1.14 has an important and rather subtle consequence which is presented in the following example. This example is based on the ”Historical remarks” of section 1.2 in [7], which present the results collected by Gerolamo Cardano (1501-1576).


Example 1.10 (Die with 6 faces) Assume that a fair die with 6 faces is rolled twice (instead of once) and consider the problem of choosing the correct sample space. The correct sample space takes the order of the rolls into account and is

Ω = {(i, j) / i, j = 1, 6}. (78)

The set Ω has 6 × 6 = 36 elements. The wrong (for our purpose), unordered, sample space is

Ω2 = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 3), (3, 4), (3, 5), (3, 6), (4, 4), (4, 5), (4, 6),
(5, 5), (5, 6), (6, 6)}. (79)

The set Ω2 has 21 elements. The fact that the set Ω2 is the wrong sample space for this experiment is linked to the fact that the two rolls are independent. Therefore, the equality 75 can be applied, stating that the probability of two independent events is the product of their probabilities. Therefore, the probability of any event (i, j) is

P ((i, j)) = P (i)P (j), (80)

for all i, j = 1, 6. Since P (i) = P (j) = 1/6, we have

P ((i, j)) = 1/36, (81)

for all i, j = 1, 6. The only sample space which leaves this probability consistent with the uniform distribution equality 38 of a finite space is Ω. If the sample space Ω2 were chosen, the equality 75 would be violated, so that the two events would become dependent.

1.9 Notes and references

The material for section 1.1 is presented in [11], chapter 1, ”Sets, spaces and measures”. The same presentation is given in [7], section 1.2, ”Discrete probability distributions”. The example 1.3 is given in [7].

The equations 21, 18 and 23 are at the basis of probability theory, so that in [17], these properties are stated as axioms.

In some statistics books, such as [8] for example, the union of the sets A and Bis denoted by the sum A+B, and the intersection of sets is denoted by the productAB. We did not use these notations in this document.

The section 1.6 on life tables is inspired by an example presented in [7], in section 4.1, ”Discrete conditional Probability”. We have updated the data from 1990 to 2009 and introduced the associated Scilab function.


1.10 Exercises

Exercise 1.1 (Head and tail) Assume that we have a coin which is tossed twice. We record the outcomes so that the order matters, i.e. the sample space is Ω = {HH, HT, TH, TT}. Assume that the distribution function is uniform, i.e. the head and the tail have an equal probability.

1. What is the probability of the event A = {HH, HT, TH}?

2. What is the probability of the event B = {HH, HT}?

Exercise 1.2 (Two dice) Assume that we are rolling a pair of dice. Assume that each face has an equal probability.

1. What is the probability of getting a sum of 7 ?

2. What is the probability of getting a sum of 11 ?

3. What is the probability of getting a double one, i.e. snake eyes?

Exercise 1.3 (Mere's experiments) This exercise is presented in [7], in the historical remarks of section 1.2, ”Discrete Probability Distributions”. Famous letters between Pascal and Fermat were instigated by a request for help from a French nobleman and gambler, Chevalier de Mere. It is said that de Mere had been betting that, in four rolls of a die, at least one six would turn up (event A). He was winning consistently and, to get more people to play, he changed the game to betting that, in 24 rolls of two dice, a pair of sixes would turn up (event B). It is claimed that de Mere lost with 24 rolls and felt that 25 rolls were necessary to make the game favorable (event C). What is the probability of the three events A, B and C? Can you compute with Scilab the probability of event A for a number of rolls equal to 1, 2, 3 or 4? Can you compute with Scilab the probability of the event B or C for a number of rolls equal to 10, 20, 24, 25, 30?

Exercise 1.4 (Independent events) Assume that Ω is a finite sample space. Assume that thetwo events A,B ⊂ Ω are independent. Prove that P (B|A) = P (B).

Exercise 1.5 (Boole's inequality) Assume that Ω is a finite sample space. Let (Ei)i=1,n be a sequence of finite sets included in Ω and n > 0. Prove Boole's inequality:

P (⋃i=1,n Ei) ≤ ∑i=1,n P (Ei). (82)

Exercise 1.6 (Discrete conditional probability) In this exercise, we consider a laboratory blood test which is tried on healthy persons and on persons who have the disease. Assume that, when the person has the disease, the test is positive with probability 99 %. However, this test also generates ”false positives” with probability 1 %. This means that, when a person does not have the disease, the test is positive with probability 1 %. Assume that 0.5 % of the population has the disease. Given that the test is positive for one person, what is the probability that this person has the disease? Consider the case where 5 % of the population has the disease. Consider the case where the probability of a false positive is 0.1 % (but keep the probability of a true positive equal to 99 %).

This exercise is presented in [17], Chapter 1, ”Elements of probability”.

2 Combinatorics

In this section, we present several tools which allow us to compute probabilities of discrete events. One powerful analysis tool is the tree diagram, which is presented in the first part of this section. Then, we detail permutation and combination numbers, which allow us to solve many probability problems.


Figure 8: Tree diagram - The task is performed in 3 steps. There are 2 choices for step #1, 3 choices for step #2 and 2 choices for step #3. The total number of ways to perform the full sequence of steps is N = 2 · 3 · 2 = 12.

2.1 Tree diagrams

In this section, we present the general method which allows us to count the total number of ways that a task can be performed. We illustrate that method with tree diagrams.

Assume that a task is carried out in a sequence of n steps. The first step can be performed by making one choice among m1 possible choices. Similarly, there are m2 possible ways to perform the second step, and so forth. The complete sequence of steps can therefore be performed in N = m1m2 . . .mn different ways.
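As a small sketch (our own addition), the multiplication principle can be checked in Scilab; the vector m below holds the numbers of choices of figure 8:

```scilab
m = [2 3 2];   // choices available at steps 1, 2 and 3
N = prod(m);   // total number of ways to perform the sequence
disp(N)        // prints 12
```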

To illustrate the sequence of steps, the associated tree can be drawn. An example of such a tree diagram is given in the figure 8. Each node in the tree corresponds to one step in the sequence. The number of children of a parent node is equal to the number of possible choices for the step. At the bottom of the tree, there are N leaves, where each path, i.e. each sequence of nodes from the root to a leaf, corresponds to a particular sequence of choices.

We can think of the tree as representing a random experiment, where the finalstate is the outcome of the experiment. In this context, each choice is performedat random, depending on the probability associated with each branch. We willreview tree diagrams throughout this section and especially in the section devotedto Bernoulli trials.

2.2 Permutations

In this section, we present permutations, which are orderings of the elements of a given set.

Definition 2.1. ( Permutation) Assume that A is a finite set. A permutation of Ais a one-to-one mapping of A onto itself.

Without loss of generality, we can assume that the finite set A can be ordered and numbered from 1 to n = #(A), so that we can write A = {1, 2, . . . , n}. To define a particular permutation, one can write a matrix with 2 rows and n columns which


Figure 9: Tree diagram for the computation of permutations of the set A = {1, 2, 3}.

represents the mapping. One example of a permutation on the set A = {a1, a2, a3, a4} is

σ =
( 1 2 3 4
  2 1 4 3 ), (83)

which signifies that the mapping is:

• a1 → a2,

• a2 → a1,

• a3 → a4,

• a4 → a3.

Since the first row is always the same, there is no additional information providedby this row. This is why the permutation can be written by uniquely defining thesecond row. This way, the previous mapping can be written as

σ = ( 2 1 4 3 ). (84)

We can try to count the number of possible permutations of a given set A withn elements.

The tree diagram associated with the computation of the number of permutations for n = 3 is presented in figure 9. In the first step, we decide which number to place at index 1. For this index, we have 3 possibilities, that is, the numbers 1, 2 and 3. In the second step, we decide which number to place at index 2. At this index, we have 2 possibilities left, where the exact numbers depend on the branch. In the third step, we decide which number to place at index 3. At this last index, we only have one number left.

This leads to the following proposition, which defines the factorial function.

Proposition 2.2. (Factorial) The number of permutations of a set A of n elements is the factorial of n, defined by

n! = n · (n− 1) . . . 2 · 1. (85)


Proof. #1 Let us pick an element to place at index 1. There are n elements in the set, leading to n possible choices. For the element at index 2, there are n − 1 elements left in the set. For the element at index n, there is only 1 element left. The total number of permutations is therefore n · (n − 1) . . . 2 · 1, which concludes the proof.

Proof. #2 The element at index 1 can be located at indexes 1, 2, . . . , n so that there are n ways to set the element #1. Once the element at index 1 is placed, there are n − 1 ways to set the element at index 2. The last element at index n can only be set at the remaining index. The total number of permutations is therefore n · (n − 1) . . . 2 · 1, which concludes the proof.

When n = 0, it seems that we cannot define the number 0!. For reasons which will be clearer when we introduce the gamma function, it is convenient to define 0! as equal to one:

0! = 1. (86)

Example 2.1 Let us compute the number of permutations of the set A = {1, 2, 3}. By the equation 85, we have 3! = 3 · 2 · 1 = 6 permutations of the set A. These permutations are:

(1 2 3)(1 3 2)(2 1 3)(2 3 1)(3 1 2)(3 2 1)

(87)

The previous permutations can also be directly read from the tree diagram 9, from the root of the tree to each of the 6 leaves.
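In Scilab, the permutations of a vector can be generated with the perms function, which returns a matrix with one permutation per row:

```scilab
p = perms([1 2 3]);
disp(p)            // the 6 permutations of {1, 2, 3}, one per row
disp(size(p, "r")) // 6 rows, consistent with 3! = 6
```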

In some situations, not all the elements in the set A are involved in the permutation. Assume that j is an integer, so that 0 ≤ j ≤ n. A j-permutation is a permutation of a subset of j elements of A. The general counting method used for the previous proposition allows us to count the total number of j-permutations of a given set A.

Proposition 2.3. (j-permutations) Assume that j is a positive integer. The number of j-permutations of a set A of n elements is

(n)j = n · (n− 1) . . . (n− j + 1). (88)

Proof. The element at index 1 can be located at indexes 1, 2, . . . , n so that there are n ways to set the element at index 1. Once the element at index 1 is placed, there are n − 1 ways to set the element at index 2. The element at index j can be set at any of the remaining n − j + 1 indexes. The total number of j-permutations is therefore n · (n − 1) . . . (n − j + 1), which concludes the proof.


Notice that the number of j-permutations of n elements and the factorial of n are equal when j = n. Indeed, we have

(n)n = n · (n− 1) . . . (n− n+ 1) = n!. (89)

On the other hand, the number of 0-permutations of n elements can be defined to be equal to 1:

(n)0 = 1. (90)

Example 2.2 Let us compute the number of 2-permutations of the set A = {1, 2, 3, 4}. By the equation 88, we have (4)2 = 4 · 3 = 12 2-permutations of the set A. These permutations are:

(1 2)(1 3)(1 4)

(2 1)(2 3)(2 4)

(3 1)(3 2)(3 4)

(4 1)(4 2)(4 3)

(91)

We can check that the number of 2-permutations in a set of 4 elements is (4)2 = 12, which is strictly lower than the number of permutations 4! = 24.
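Since Scilab provides no built-in function for (n)j, a minimal sketch based on the product in the equation 88 might be (the name permutations_nj is ours, not a Scilab built-in):

```scilab
// Sketch of a function computing the number of j-permutations (n)j.
function p = permutations_nj ( n , j )
    if ( j == 0 ) then
        p = 1  // by convention, (n)0 = 1
    else
        p = prod ( n-j+1 : n )  // n*(n-1)*...*(n-j+1)
    end
endfunction
disp(permutations_nj(4, 2)) // prints 12, as in the example above
```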

2.3 The gamma function

In this section, we present the gamma function, which is closely related to the factorial function. The gamma function was first introduced by the Swiss mathematician Leonhard Euler in his goal to generalize the factorial to non integer values [18]. Efficient implementations of the factorial function are based on the gamma function and this is why this function will be analyzed in detail. The practical computation of the factorial function will be analyzed in the next section.

Definition 2.4. (Gamma function) Let x be a real with x > 0. The gamma function is defined by

Γ(x) = ∫_0^1 (− log(t))^(x−1) dt. (92)

The previous definition is not the usual form of the gamma function, but the following proposition allows us to recover it.

Proposition 2.5. (Gamma function) Let x be a real with x > 0. The gamma function satisfies

Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt. (93)

Proof. Let us consider the change of variable u = − log(t). Therefore, t = e^(−u), which leads, by differentiation, to dt = −e^(−u) du. We get (− log(t))^(x−1) dt = −u^(x−1) e^(−u) du. Moreover, if t = 0, then u = ∞ and if t = 1, then u = 0. This leads to

Γ(x) = −∫_∞^0 u^(x−1) e^(−u) du. (94)


For any continuously differentiable function f and any real numbers a and b, we have

∫_a^b f(x) dx = −∫_b^a f(x) dx. (95)

We reverse the bounds of the integral in the equality 94 and get the result.

The gamma function satisfies

Γ(1) = ∫_0^∞ e^(−t) dt = [−e^(−t)]_0^∞ = 0 + e^0 = 1. (96)

The following proposition makes the link between the gamma and the factorial functions.

Proposition 2.6. (Gamma and factorial) Let x be a real with x > 0. The gamma function satisfies

Γ(x+ 1) = xΓ(x) (97)

and

Γ(n+ 1) = n! (98)

for any integer n ≥ 0.

Proof. Let us prove the equality 97. We want to compute

Γ(x + 1) = ∫_0^∞ t^x e^(−t) dt. (99)

The proof is based on the integration by parts formula. For any continuously differentiable functions f and g and any real numbers a and b, we have

∫_a^b f(t)g′(t) dt = [f(t)g(t)]_a^b − ∫_a^b f′(t)g(t) dt. (100)

Let us define f(t) = t^x and g′(t) = e^(−t). We have f′(t) = x t^(x−1) and g(t) = −e^(−t). By the integration by parts formula 100, the equation 99 becomes

Γ(x + 1) = [−t^x e^(−t)]_0^∞ + ∫_0^∞ x t^(x−1) e^(−t) dt. (101)

Let us introduce the function h(t) = −t^x e^(−t). We have h(0) = 0 and lim_{t→∞} h(t) = 0, for any x > 0. Hence,

Γ(x + 1) = ∫_0^∞ x t^(x−1) e^(−t) dt = xΓ(x), (102)

which proves the equality 97. The equality 98 can be proved by induction on n. First, we already noticed that Γ(1) = 1. If we define 0! = 1, we have Γ(1) = 0!, which proves the equality 98 for n = 0. Then, assume that the equality holds for n and let us prove that Γ(n + 2) = (n + 1)!. By the equality 97, we have Γ(n + 2) = (n + 1)Γ(n + 1) = (n + 1)n! = (n + 1)!, which concludes the proof.
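As a quick numerical illustration (our own addition), the equality 98 can be checked in Scilab, where the last two columns below coincide:

```scilab
n = (0:6)';
// gamma(n+1) equals n! for n = 0, ..., 6
disp([n gamma(n+1) factorial(n)])
```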


The gamma function is not the only function f which satisfies f(n) = n!. But the Bohr-Mollerup theorem proves that the gamma function is the unique function f which satisfies the equalities f(1) = 1 and f(x + 1) = xf(x), and such that log(f(x)) is convex [2].

It is possible to extend this function to negative values by inverting the equation 97, which implies

Γ(x) = Γ(x + 1) / x, (103)

for x ∈ ]−1, 0[. This allows us to compute, for example, Γ(−1/2) = −2Γ(1/2). By induction, we can also compute the value of the gamma function for x ∈ ]−2, −1[. Indeed, the equation 103 implies

Γ(x + 1) = Γ(x + 2) / (x + 1), (104)

which leads to

Γ(x) = Γ(x + 2) / (x(x + 1)). (105)

By induction on the intervals ]−n − 1, −n[ with n a positive integer, this formula allows us to compute values of the gamma function for all x ≤ 0, except the nonpositive integers 0, −1, −2, . . .. This leads to the following proposition.

Proposition 2.7. (Gamma function for negative arguments) For any positive integer n and any real x such that x + n > 0,

Γ(x) = Γ(x + n) / (x(x + 1) . . . (x + n − 1)). (106)

Proof. The proof is by induction on n. The equation 103 proves that the equality is true for n = 1. Assume that the equality 106 is true for n and let us prove that it also holds for n + 1. By the equation 103 applied to x + n, we have

Γ(x + n) = Γ(x + n + 1) / (x + n). (107)

Therefore, we have

Γ(x) = Γ(x + n + 1) / (x(x + 1) . . . (x + n − 1)(x + n)), (108)

which proves that the statement holds for n + 1 and concludes the proof.
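As a quick numerical check (our own addition) of the equation 106, take x = −1/2 and n = 2:

```scilab
x = -0.5; n = 2;
lhs = gamma(x);                // direct evaluation of the gamma function
rhs = gamma(x+n) / (x*(x+1)); // right-hand side of equation 106
disp([lhs rhs])               // both are approximately -3.5449077
```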

The gamma function is singular for negative integer values of its argument, as stated in the following proposition.

Proposition 2.8. (Gamma function for integer negative arguments) For any nonnegative integer n,

Γ(−n + h) ∼ (−1)^n / (n! h), (109)

when h is small.


factorial returns n!
gamma returns Γ(x)
gammaln returns log(Γ(x))

Figure 10: Scilab commands for permutations.

Proof. Consider the equation 106 with x = −n + h. We have

Γ(−n + h) = Γ(h) / ((h − n)(h − n + 1) . . . (h − 1)). (110)

But Γ(h) = Γ(h + 1)/h, which leads to

Γ(−n + h) = Γ(h + 1) / ((h − n)(h − n + 1) . . . (h − 1)h). (111)

When h is small, the expression Γ(h + 1) converges to Γ(1) = 1. On the other hand, the product (h − n)(h − n + 1) . . . (h − 1)h converges to (−n)(−n + 1) . . . (−1)h = (−1)^n n! h, which leads to the term (−1)^n and concludes the proof.

We have reviewed the main properties of the gamma function. In practical situations, we use the gamma function in order to compute the factorial number, as we are going to see in the next sections. The main advantage of the gamma function over the factorial is that it avoids forming the product n! = n · (n − 1) . . . 1, which saves a significant amount of CPU time and computer memory.

2.4 Overview of functions in Scilab

The figure 10 presents the functions provided by Scilab to compute permutations. Notice that there is no function to compute the number of permutations (n)j = n · (n − 1) . . . (n − j + 1). This is why, in the next sections, we provide a Scilab function to compute (n)j.

In the next sections, we analyze each function in Scilab. We especially consider their numerical behavior and provide accurate and efficient Scilab functions to manage permutations. We emphasize the need for accuracy and robustness. For this purpose, we use the logarithmic scale to produce intermediate results which stay within the limited bounds of double precision floating point arithmetic.

2.5 The gamma function in Scilab

The gamma function computes Γ(x) for a real input argument. The mathematical function Γ(x) can be extended to complex arguments, but this has not been implemented in Scilab.

The following script allows to plot the gamma function for x ∈ [−4, 4].

x = linspace ( -4 , 4 , 1001 );

y = gamma ( x );

plot ( x , y );


Figure 11: The gamma function.

h = gcf();

h.children.data_bounds = [

- 4. -6

4. 6

];

The previous script produces the figure 11. The following session presents various values of the gamma function.

-->x = [-2 -1 -0 +0 1 2 3 4 5 6]’;

-->[x gamma(x)]

ans =

- 2. Nan

- 1. Nan

0. -Inf

0. Inf

1. 1.

2. 1.

3. 2.

4. 6.

5. 24.

6. 120.

Notice that the two floating point signed zeros -0 and +0 are associated with the function values −∞ and +∞. This is consistent with the value of the limit of the function from either side of the singular point. This contrasts with the value of the gamma function at negative integer points, where the function value is %nan. This is consistent with the fact that, at these singular points, the function is equal to −∞ on one side and +∞ on the other side. Therefore, since the argument x has one single floating point representation when it is a negative nonzero integer, the only solution consistent with the IEEE 754 standard is to set the result to %nan.

Notice that we used 1001 points to plot the gamma function. This allows us to get


Figure 12: The factorial function.

points exactly located at the singular points. These values are ignored by the plot function, which makes a nice plot. Indeed, if 1000 points were used instead, vertical lines corresponding to the y-values immediately at the left and the right of the singularity would be displayed.

2.6 The factorial and log-factorial functions

In the following script, we plot the factorial function for values of n from 1 to 10.

f = factorial (1:10)

plot ( 1:10 , f , "b-o" )

The result is presented in figure 12. We see that the growth rate of the factorial function is large.

The largest value of n such that n! is representable as a double precision floating point number is n = 170. In the following session, we check that 171! is not representable as a Scilab double.

-->factorial (170)

ans =

7.257D+306

-->factorial (171)

ans =

Inf

The factorial function is involved in many probability computations, sometimes as an intermediate result. Since it grows so fast, we might be interested in computing its order of magnitude instead of its value. Let us introduce the function fln as the logarithm of the factorial number n!:

fln(n) = log(n!). (112)


Figure 13: Logarithm of the factorial number.

Notice that we used the base-e logarithm function log, that is, the inverse of the exponential function.

The factorial number n! grows exponentially, but its logarithm grows much more slowly. In the figure 13, we plot the logarithm of n! in the interval [0, 170]. We see that the y coordinate varies only from 0 up to 800. Hence, there is a large number of integers n for which n! is not representable as a double but fln(n) is still representable as a double.
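This can be illustrated in Scilab (our own addition): the factorial overflows long before its logarithm does.

```scilab
disp(factorial(200))   // Inf: 200! is not representable as a double
disp(gammaln(200+1))   // about 863.2, i.e. fln(200), still representable
```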

2.7 Computing factorial and log-factorial with Scilab

In this section, we present how to compute the factorial function in Scilab. We focus on accuracy and efficiency.

The factorial function returns the factorial number associated with the given n. It has the following syntax:

f = factorial ( n )

In the following session, we compute n! for several values of n, from 0 to 7.

-->n = (0:7) ’;

-->[n factorial(n)]

ans =

0. 1.

1. 1.

2. 2.

3. 6.

4. 24.

5. 120.

6. 720.

7. 5040.

31

Page 32: Introduction to probabilities with Scilabforge.scilab.org/upload/docintrodiscrprobas/files/introdiscreteprobas_v1.2.pdfIntroduction to probabilities with Scilab Micha el Baudin June

The implementation of the factorial function in Scilab accepts both matrix and hypermatrix input arguments. In order to be fast, it uses vectorization. The following factorialScilab function represents the computational core of the actual implementation of the factorial function in Scilab.

function f = factorialScilab ( n )

n(n==0)=1

t = cumprod (1:max(n))

v = t(n(:))

f = matrix(v,size(n))

endfunction

The statement n(n==0)=1 sets all zeros of the matrix n to one, so that the next statements do not have to manage the special case 0! = 1. Then, we use the cumprod function in order to compute a column vector containing cumulated products, up to the maximum entry of n. The use of cumprod produces all the results in one call, but also produces unnecessary values of the factorial function. In order to get just what is needed, the statement v = t(n(:)) extracts the required values. Finally, the statement f = matrix(v,size(n)) reshapes the matrix of values so that the shape of the output argument is the same as the shape of the input argument.

The following function computes n! based on the prod function, which computes the product of the entries of its input argument.

function f = factorial_naive ( n )

f = prod (1:n)

endfunction

The factorial_naive function has two drawbacks. The first one is that it cannot manage matrix input arguments. Furthermore, it requires more memory than necessary.

In practice, the factorial function can be computed based on the gamma function.The following implementation of the factorial function is based on the equality 98.

function f = myfactorial ( n )

if ( or(n(:) < 0) | or(n(:) <> round(n(:))) ) then

error ("myfactorial: n must all be nonnegative integers");

end

f = gamma ( n + 1 )

endfunction

The myfactorial function also checks that the input argument n is nonnegative. It also checks that n is an integer by using the condition n(:) <> round(n(:)). Indeed, if the value of n is different from the value of round(n), this means that the input argument n is not an integer.

The main drawback of the factorialScilab function is that it uses more memory than necessary. It may fail to produce a result when it is given a large input argument. In the following session, we use the factorial function with a very large input integer. In this particular case, it is obvious that the correct result is Inf. This should have been the result of the function, which should not have generated an error. On the other hand, the myfactorial function works perfectly.

-->factorial (1.e10)

!--error 17


stack size exceeded!

-->myfactorial (1.e10)

ans =

Inf

We now consider the computation of the log-factorial function fln. We can use the gammaln function, which directly provides the correct result.

function flog = factoriallog ( n )

flog = gammaln(n+1)

endfunction

The advantage of this method is that matrix input arguments can be managed by the factoriallog function.

There is another possible implementation for the log-factorial function, based onthe logarithm function. We have

log(n!) = log(1 · 2 · 3 · · ·n) (113)

= log(1) + log(2) + log(3) + . . .+ log(n). (114)

The previous equation can be simplified since log(1) = 0. This leads to the followingimplementation.

function flog = factoriallog_lessnaive ( n )

flog = sum(log(2:n))

endfunction

The previous function has several drawbacks. The first problem is that this function may require a large array if n is large. Hence, using this function is limited to relatively small values of n. Moreover, it requires evaluating the log function at least n − 1 times, which leads to a performance issue. Finally, it is not possible to directly let the variable n be a matrix of doubles. All these issues make the function factoriallog_lessnaive a less than perfect implementation of the log-factorial function.

2.8 Stirling’s formula

In this section, we present Stirling’s formula [2, 19], which gives an asymptotic formula for the gamma function.

Let us recall the definition of the asymptotic symbol ∼.

Definition 2.9. (Asymptotic) Let (a_n)_{n≥0} and (b_n)_{n≥0} be two real sequences. We say that a_n is asymptotically equal to b_n if

lim_{n→∞} a_n / b_n = 1. (115)

In this case, we write a_n ∼ b_n.

Stirling’s formula gives the asymptotic rate of the gamma function.

Proposition 2.10. (Stirling’s formula) As the real number x → ∞,

Γ(x) ∼ e^(−x) x^(x−1/2) √(2π). (116)


We shall not give the proof of this formula here (see [2, 19] for a complete derivation).

The previous proposition allows to directly derive an asymptotic behavior for the factorial function.

Proposition 2.11. (Stirling’s formula) As the positive integer n → ∞,

n! ∼ (n/e)^n √(2πn). (117)

Proof. By the proposition 2.10, we have

n! = Γ(n + 1) ∼ e^(−n−1) (n + 1)^((n+1)−1/2) √(2π) (118)
= e^(−n−1) (n + 1)^(n+1/2) √(2π) (119)
∼ e^(−n−1) (n + 1)^n √(2πn), (120)

since (n + 1)^(1/2) ∼ n^(1/2). We can simplify the previous expression for large values of n. Indeed, since (1 + 1/n)^n → e, we have e^(−n−1) (n + 1)^n = e^(−n) n^n · e^(−1) (1 + 1/n)^n ∼ e^(−n) n^n, which concludes the proof.

In the following script, we compare Stirling’s formula and the factorial function for various values of n. Moreover, we compute the number of significant digits produced by Stirling’s formula, by using the equation d = −log10(|n! − S_n| / n!), where S_n is Stirling’s approximation given by the equation 117.

-->n = (1:20:185) ’;

-->f = factorial(n);

-->s = sqrt (2*%pi.*n).*(n./%e).^n;

-->d = -log10(abs(f-s)./f);

-->[n f s d]

ans =

1. 1. 0.9221370 1.1086689

21. 5.109D+19 5.089D+19 2.4022947

41. 3.345D+49 3.338D+49 2.692415

61. 5.076D+83 5.069D+83 2.8648116

81. 5.79D+120 5.79D+120 2.9878919

101. 9.42D+159 9.41D+159 3.0836832

121. 8.09D+200 8.08D+200 3.1621171

141. 1.89D+243 1.89D+243 3.2285294

161. 7.59D+286 7.58D+286 3.2861201

181. Inf Inf Nan

We see that the number of significant digits increases with n, reaching more than 3 when n is close to its upper limit.

It is not straightforward to prove Stirling’s formula. Still, we can easily prove the following proposition, which focuses on the log-factorial function.

Proposition 2.12. (Log-factorial for large n) As the positive integer n → ∞,

log(n!) ∼ n log(n) − n. (121)

Proof. By the property of the logarithm function, we have

log(n!) = log(1 · 2 · 3 · · ·n) (122)

= log(1) + log(2) + log(3) + . . .+ log(n). (123)


The previous equation can be simplified, since log(1) = 0. The log function is a nondecreasing function of x for x > 0. Therefore,

∫_{k−1}^{k} log(x) dx < log(k) < ∫_{k}^{k+1} log(x) dx, (124)

for k ≥ 1. We sum the previous inequality over k = 1, 2, . . . , n and get

∫_{0}^{n} log(x) dx < log(n!) < ∫_{1}^{n+1} log(x) dx. (125)

We must now compute the two integrals which appear in the previous inequalities. We recall that the anti-derivative of the log(x) function is x log(x) − x, since (x log(x) − x)′ = log(x) + x · (1/x) − 1 = log(x). Furthermore, we recall that the limit of the function x log(x) is zero when x converges to zero. Therefore,

∫_{0}^{n} log(x) dx = [x log(x) − x]_{0}^{n} (126)
= n log(n) − n (127)

and

∫_{1}^{n+1} log(x) dx = [x log(x) − x]_{1}^{n+1} (128)
= (n + 1) log(n + 1) − (n + 1) + 1 (129)
= (n + 1) log(n + 1) − n. (130)

We plug the two previous results into the equation 125 and get

n log(n)− n < log(n!) < (n+ 1) log(n+ 1)− n, (131)

which concludes the proof, since both the lower and the upper bound are asymptotically equal to n log(n) − n.
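The strict bounds in equation 131 can be checked numerically. The sketch below is written in Python purely for illustration (the document itself uses Scilab), with lgamma(n + 1) playing the role of log(n!):

```python
import math

# Check the bounds  n*log(n) - n < log(n!) < (n+1)*log(n+1) - n
# (equation 131) for several values of n, using lgamma(n+1) = log(n!).
for n in [2, 10, 100, 10_000, 1_000_000]:
    log_fact = math.lgamma(n + 1)
    lower = n * math.log(n) - n
    upper = (n + 1) * math.log(n + 1) - n
    assert lower < log_fact < upper
```

Both bounds are dominated by the n log(n) term as n grows, which is the content of equation 121.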

Stirling’s formula 116 is consistent with the asymptotic equation 117, which allows to directly derive the equation

log(n!) = log((n/e)^n √(2πn)) (132)
= log((n/e)^n) + log(√(2πn)) (133)
= n log(n/e) + log(√(2π)) + log(√n) (134)
= n log(n) − n log(e) + (1/2) log(2π) + (1/2) log(n) (135)
= n log(n) − n + (1/2) log(2π) + (1/2) log(n), (136)

since log(e) = 1. This immediately implies the equation 121.

In the following session, we compare the asymptotic equation 121 with the gammaln function. The session displays the columns [n f s d], where f is computed from the gammaln function, s is computed from the asymptotic equation 121 and d is the number of significant digits of the asymptotic equation. We see that, for n greater than 10^10, we have more than 10 significant digits in the asymptotic equation.


-->n = logspace (1,10,10)’;

-->f = gammaln(n+1);

-->s = n.*log(n)-n;

-->d = -log10(abs(f-s)./f);

-->[n f s d]

ans =

10. 15.104413 13.025851 0.8613409

100. 363.73938 360.51702 2.0526167

1000. 5912.1282 5907.7553 3.1309743

10000. 82108.928 82103.404 4.1721275

100000. 1051299.2 1051292.5 5.1972489

1000000. 12815518. 12815511. 6.2141578

10000000. 1.512D+08 1.512D+08 7.2263182

1.000D+08 1.742D+09 1.742D+09 8.2354866

1.000D+09 1.972D+10 1.972D+10 9.2426477

1.000D+10 2.203D+11 2.203D+11 10.248397

2.9 Computing permutations and log-permutations with Scilab

There is no Scilab function to compute the number of permutations (n)_j = n · (n − 1) · · · (n − j + 1).

We might be interested in simplifying the expression for the permutation number. We have

(n)_j = n · (n − 1) · · · (n − j + 1) (137)
= [n · (n − 1) · · · 1] / [(n − j) · (n − j − 1) · · · 1] (138)
= n! / (n − j)!. (139)

This leads to the following function permutations_verynaive.

function p = permutations_verynaive ( n , j )

p = factorial(n)./ factorial(n-j)

endfunction

In the following session, we see that the previous function works for small values of n and j.

-->n = [5 5 5 5 5 5]’;

-->j = [0 1 2 3 4 5]’;

-->[n j permutations_verynaive(n,j)]

ans =

5. 0. 1.

5. 1. 5.

5. 2. 20.

5. 3. 60.

5. 4. 120.

5. 5. 120.

In the following session, we compute the permutation number (171)_171 = 171!.

-->permutations_verynaive ( 171 , 171 )

ans =

Inf


This is caused by an overflow during the computation of the factorial function. There is unfortunately no way to fix this problem, since the result is, indeed, not representable as a double precision floating point number.

On the other hand, the permutations_verynaive function performs poorly in cases where n is large, whatever the value of j, as presented in the following session.

-->permutations_verynaive ( 171 , 0 )

ans =

Nan

There is certainly something to do about this problem, since, when j = 0, we have (n)_0 = 1, for any n ≥ 1.

The following permutations_naive function allows to compute (n)_j for positive integer values of n, j. It is based on the prod function, which computes the product of the given vector.

function p = permutations_naive ( n , j )

p = prod ( n-j+1 : n )

endfunction

In the following session, we check the values of the function (n)_j for n = 5 and j = 0, 1, . . . , 5.

-->n = 5;

-->for j = 0 : 5

--> p = permutations_naive ( n , j );

--> disp([n j p]);

-->end

5. 0. 1.

5. 1. 5.

5. 2. 20.

5. 3. 60.

5. 4. 120.

5. 5. 120.

The following session shows that permutations_naive behaves better than the previous function for small values of j.

-->permutations_naive ( 171 , 0 )

ans =

1.

The permutations_naive function still has several drawbacks. First, it requires more memory than necessary. For example, it may fail to compute (n)_n for values of n larger than 10^5.

-->permutations_naive ( 1.e7 , 1.e7 )

!--error 17

stack size exceeded!

Furthermore, the function permutations_naive does not manage matrix input arguments.

In order to accurately compute the permutation number, we may compute its


logarithm first. By the equation 139, we have

log((n)_j) = log(n! / (n − j)!) (140)
= log(n!) − log((n − j)!) (141)
= log(Γ(n + 1)) − log(Γ(n − j + 1)). (142)

The previous equation leads to the definition of the log-permutation function, as defined in the following function.

function plog = permutationslog ( n , j )

plog = gammaln(n+1)- gammaln(n-j+1);

endfunction

In order to compute the permutation number, we compute the exponential of the expression. This leads to the following function permutations, where we round the result in order to get integer results.

function p = permutations ( n , j )

p = exp(gammaln(n+1)- gammaln(n-j+1));

if ( and(round(n)==n) & and(round(j)==j) ) then

p = round ( p )

end

endfunction

The permutations function takes matrix input arguments, as presented in the following session.

-->n = [5 5 5 5 5 5]’;

-->j = [0 1 2 3 4 5]’;

-->[n j permutations(n,j)]

ans =

5. 0. 1.

5. 1. 5.

5. 2. 20.

5. 3. 60.

5. 4. 120.

5. 5. 120.

Finally, the permutations function requires the minimum amount of memory and performs correctly, even for large values of n.

-->permutations ( 1.e7 , 1.e7 )

ans =

Inf

-->permutations ( 1.e7 , 0 )

ans =

1.

2.10 The birthday problem

In this section, we consider a practical computation of a probability, based on permutations. We present in this example the Scilab functions which allow to perform the computations.


Assume that n > 0 persons are gathered in a room. Can we compute the probability that two persons in the room have the same birthday?

To perform this computation, we assume that the year is made of 365 days, and that each day is equally likely to be a birthday.

Let us denote by Ω ⊂ N^n the sample space, which is the space of all possible combinations of birthdays for n persons. It is defined by

Ω = {(i_1, . . . , i_n) / i_1, . . . , i_n = 1, 2, . . . , 365}. (143)

The searched event E is the event that two persons have the same birthday. To compute this probability more easily, we can compute the probability of the complementary event E^c, i.e., the event that all persons have distinct birthdays. Indeed, once we have computed P(E^c), we can deduce the searched probability with P(E) = 1 − P(E^c).

By hypothesis, all days in the year have the same probability, so that we can apply proposition 1.9 for uniform discrete distributions. Therefore, P(E^c) = #(E^c) / #(Ω).

The birthday of each person can be chosen from 365 days. Since the birthdays of the persons are independent, this leads to

#(Ω) = 365^n. (144)

Let us now compute the size of the complementary event E^c. The birthday of the first person can be chosen among 365 possible days. Once chosen, the birthday of the second person can be chosen among 365 − 1 = 364 days, so that the birthdays are different. By repeating this process for the n persons, we get the following size for the event E^c:

#(E^c) = 365 · 364 · · · (365 − n + 1) = (365)_n. (145)

Let us define the probability Q(E) as the probability of the complementary event:

Q(E) = P(E^c). (146)

We can combine 144 and 145 to compute the complementary probability

Q(E) = (365)_n / 365^n, (147)

which leads to the required probability

P(E) = 1 − Q(E) = 1 − (365)_n / 365^n. (148)

The following twobirthday_verynaive function returns the probability that two persons have the same birthday in a group of n persons, in a period of d days. In order to compute (365)_n, we choose the formula (365)_n = 365! / (365 − n)!.

function p = twobirthday_verynaive ( n , d )

p = 1 - factorial ( d )./ factorial( d - n ) ./ d^n

endfunction


We are going to use the previous function with d = 365. In the following session, we see that the twobirthday_verynaive function does not allow to compute any result, whatever the value of n.

-->n = (1:5) ’;

-->[n twobirthday_verynaive(n ,365)]

ans =

1. Nan

2. Nan

3. Nan

4. Nan

5. Nan

Indeed, the formula involves 365!, which cannot be represented as a double.

The following twobirthday_naive function uses the permutations function previously defined.

function p = twobirthday_naive ( n , d )

p = 1 - permutations ( d , n ) ./ d^n

endfunction

In the following session, we compute the probability that two persons have the same birthday, for n going from 1 to 5.

-->n = (1:5) ’;

-->[n twobirthday_naive(n ,365)]

ans =

1. 0.

2. 0.0027397260273972490197

3. 0.0082041658847813447863

4. 0.0163559124665503263785

5. 0.0271355736996392593596

We can explore larger values of n and see when the probability p breaks the p = 0.5 threshold. The following Scilab session shows how to compute the required probability for n = 20, 21, . . . , 25.

-->n = (20:25) ’;

-->[n twobirthday_naive(n ,365)]

ans =

20. 0.4114383835806317835093

21. 0.4436883351652584073221

22. 0.4756953076625709542213

23. 0.5072972343239776638057

24. 0.5383442579144532835755

25. 0.5686997039695230737877

This shows that if 23 or more persons are gathered, there is a favorable probability that two persons have the same birthday.

2.11 A modified birthday problem

We now consider a modified birthday problem. We have computed that 23 persons are sufficient to make a favorable bet that two persons are born on the same day. We are now interested in the event that two persons are born on the same day, at the same hour. More precisely, we would like to compute the number of persons in the group so that


the probability of having two persons with the same birth day and hour is greater than 0.5.

We assume that a year is made of 365 days and that a day is made of 24 hours. This computation should be easy to perform. It suffices to use the twobirthday_naive function with d = 365 · 24. In the following session, we explore the values of P(E) for n = 20, 21, . . . , 25.

-->n = (20:25) ’;

-->[n twobirthday_naive(n ,365*24)]

ans =

20. 0.0214717378650828294440

21. 0.0237058206494269452236

22. 0.0260462518925431707473

23. 0.0284922544755111806225

24. 0.0310430168113396964813

25. 0.0336976934779864567560

We see that the probability is much lower than previously. The problem is that, for larger values of n, the previous function fails, as can be seen in the following session.

-->twobirthday_naive (100 ,365*24)

ans =

Nan

The problem is that the permutation number (365 · 24)_100 required in the implementation is represented by Inf.

-->permutations ( 365*24 , 100 )

ans =

Inf

The same issue occurs for the power (365 · 24)^100. This leads to the ratio Inf/Inf, which is equal to the IEEE Nan. In order to solve this issue, we can use logarithms of the intermediate results. We have

log(Q(E)) = log((365)_n / 365^n) (149)
= log((365)_n) − log(365^n) (150)
= log((365)_n) − n log(365). (151)

This leads to the following twobirthday_lessnaive function, which uses the permutationslog function.

function p = twobirthday_lessnaive ( n , d )

q = exp(permutationslog ( d , n ) - n * log(d))

p = 1 - q

endfunction

We check in the following session that our less naive function allows to compute the required probability for n = 100.

-->twobirthday_lessnaive (100 ,365*24)

ans =

0.4329003041011145747063


In order to search for the number of persons which makes the probability greater than p = 0.5, we perform a while loop. We quit this loop when the probability breaks the threshold.

n = 1;

while ( %t )

p = twobirthday_lessnaive ( n , 365*24 );

if ( p > 0.5 ) then

mprintf("n=%d, p=%e\n",n,p)

break

end

n = n + 1;

end

The previous script produces the following output.

n=111, p=5.033485e-001

We are now interested in computing the probability of getting two persons with the same birth day and hour in a group of 500 persons. The following session shows the result of calling the twobirthday_lessnaive function with n = 500.

-->p = twobirthday_lessnaive ( 500 , 365*24 )

p =

0.9999995

This probability is very close to 1. In order to get more significant digits, we use the format function in the following session.

-->format (25)

-->p

p =

0.9999995054082023715480

This allows to get a larger number of significant digits. But the result is accurate to at most 17 significant digits. Since 6 of these digits are 9, there are only 17 − 6 = 11 digits available for the required probability p. The reason for this inaccuracy is that q is very close to zero, which makes p = 1 − q very close to 1. This implies that, because of our way of computing the probability p, we can at best expect 11 accurate digits for p. We emphasize that this is independent of the actual accuracy of the intermediate computations and is only generated by the way of representing the solution of the problem with double precision floating point numbers.

One possible solution is to compute q instead of p. Indeed, the q value is close to zero, where floating point numbers can be represented with limited but sufficient accuracy. The following twobirthday function allows to compute both p and q.

function [ p , q ] = twobirthday ( n , d )

q = exp(permutationslog ( d , n ) - n * log(d))

p = 1 - q

endfunction

In the following session, we display the result of the computation and use the mprintf function with the %.17e format in order to display 17 digits after the decimal point.


-->[p,q] = twobirthday ( 500 , 365*24 );

-->mprintf("p=%.17e\n", p)

p=9.99999505408202370e-001

-->mprintf("q=%.17e\n", q)

q=4.94591797644640870e-007

We now have all the available digits for q and we know that the probability of having two persons with the same birth day and hour in a group of n = 500 persons is close to p = 1 − 4.94591797644640870 · 10^(−7).

But this does not imply that all these digits are exact. Indeed, floating point evaluations of elementary operators like +, -, *, / and of elementary and special functions like exp, log and gamma are associated with rounding errors and various approximations.

We have computed the probability for n = 500 with the symbolic computation system Wolfram Alpha [16] and used the expression:

(365*24)!/(365*24 -500)!/((365*24)^500)

We found that the exact probability is in this case

z=4.94591797640207720e-7

rounded to 17 digits. In the following session, we compute the relative error between the computed and the exact probabilities.

-->z=4.94591797640207720e-7

z =

0.0000005

-->abs(q-z)/z

ans =

8.963D-12

-->-log10(abs(q-z)/z)

ans =

11.047534

We see that there are approximately 11 accurate digits in this case, which is sufficiently accurate for our purpose.

2.12 Combinations

In this section, we present combinations, which are unordered subsets of a given set.

The number of distinct subsets with j elements which can be chosen from a set A with n elements is the binomial coefficient and is denoted by C(n, j). The following proposition gives an explicit formula for this binomial number.

Proposition 2.13. (Binomial) The number of distinct subsets with j elements which can be chosen from a set A with n elements is the binomial coefficient, defined by

C(n, j) = [n · (n − 1) · · · (n − j + 1)] / [1 · 2 · · · j]. (152)

The following proof is based on the fact that subsets are unordered, while permutations are based on the order.


Proof. Assume that the set A has n elements and consider subsets with j > 0 elements. By proposition 2.3, the number of j-permutations of the set A is (n)_j = n · (n − 1) · · · (n − j + 1). Notice that the order does not matter in creating the subsets, so that the number of subsets is lower than the number of permutations: each subset is associated with several permutations. By proposition 2.2, there are j! ways to order a set with j elements, so that each subset corresponds to exactly j! permutations. Therefore, the number of subsets with j elements is given by C(n, j) = [n · (n − 1) · · · (n − j + 1)] / [1 · 2 · · · j], which concludes the proof.

The expression for the binomial coefficient can be simplified if we use the number of j-permutations and the factorial number, which leads to

C(n, j) = (n)_j / j!. (153)

The equality (n)_j = n! / (n − j)! leads to

C(n, j) = n! / ((n − j)! j!). (154)

This immediately leads to

C(n, j) = C(n, n − j). (155)

The following proposition shows a recurrence relation for binomial coefficients.

Proposition 2.14. For integers n > 0 and 0 < j < n, the binomial coefficients satisfy

C(n, j) = C(n − 1, j) + C(n − 1, j − 1). (156)

The proof of the recurrence relation of proposition 2.14 is given as an exercise.
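Although the proof is left as an exercise, the recurrence 156 (together with the symmetry relation 155) is easy to check numerically. The following sketch uses Python's math.comb, as an illustration outside the Scilab code of this document:

```python
from math import comb

# Check Pascal's recurrence C(n, j) = C(n-1, j) + C(n-1, j-1)
# and the symmetry C(n, j) = C(n, n-j) for all 0 < j < n < 20.
for n in range(1, 20):
    for j in range(1, n):
        assert comb(n, j) == comb(n - 1, j) + comb(n - 1, j - 1)
        assert comb(n, j) == comb(n, n - j)
```

The same kind of exhaustive check can of course be written with the nchoosek function defined in the next section.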

2.13 Computing combinations and log-combinations with Scilab

In this section, we show how to compute combinations with Scilab.

There is no Scilab function to compute the binomial number C(n, j). In order to compute the required combinations, we will use the gamma function. By the equation 154, we have

log(C(n, j)) = log(n!) − log((n − j)!) − log(j!) (157)
= log(Γ(n + 1)) − log(Γ(n − j + 1)) − log(Γ(j + 1)). (158)

The following Scilab function performs the computation of the binomial number for positive values of n and j.


function c = nchoosek ( n , j )

c = exp(gammaln(n+1)- gammaln(j+1)- gammaln(n-j+1))

if ( and(round(n)==n) & and(round(j)==j) ) then

c = round ( c )

end

endfunction

In the following session, we compute the value of the binomial coefficients for n = 0, 1, . . . , 5. The values in this table are known as Pascal’s triangle.

-->for n=0:5

--> for j=0:n

--> c = nchoosek ( n , j );

--> mprintf("%2d ",c);

--> end

--> mprintf("\n");

-->end

1

1 1

1 2 1

1 3 3 1

1 4 6 4 1

1 5 10 10 5 1

We now explain why we choose to use the exp and gammaln functions to perform the computation in the nchoosek function. Indeed, we could have used a more naive method, based on the prod function, as in the following example:

function c = nchoosek_naive ( n , j )

c = prod ( n : -1 : n-j+1 )/prod (1:j)

endfunction

For small integer values of n, the two previous functions produce the same result. Unfortunately, even for moderate values of n, the naive method fails. In the following session, we compute the value of C(n, j) with n = 10000 and j = 134.

-->nchoosek ( 10000 , 134 )

ans =

2.050D+307

-->nchoosek_naive ( 10000 , 134 )

ans =

Inf

The reason why the naive computation fails is that the products involved in the intermediate variables of the naive method generate an overflow. This means that the intermediate values are too large to be stored in a double precision floating point variable. This is a pity, since the final result can be stored in a double precision floating point variable. The nchoosek function, on the other hand, first computes the logarithm of the factorial number. This logarithm cannot overflow because, if x is a double precision floating point number, then log(x) can always be represented, since its exponent is always smaller than the exponent of x. In the end, the combination of the exp and gammaln functions allows to accurately compute the result, in the sense that, if the result is representable as a double precision floating point number, then nchoosek will produce a result as accurate as possible.


Notice that we use the round function in our implementation of the nchoosek function. This is because the nchoosek function in fact manages real double precision floating point input arguments. Consider the example where n = 4 and j = 1, and let us compute the associated binomial number C(n, j). In the following Scilab session, we use the format function so that we display at least 15 significant digits.

-->format (20);

-->n = 4;

-->j = 1;

-->c = exp(gammaln(n+1)- gammaln(j+1)- gammaln(n-j+1))

c =

3.99999999999999822

We see that there are 15 significant digits, which is the best that can be expected from the exp and gammaln functions. But the result is not an integer anymore, i.e. it is very close to the integer 4, but not exactly equal to it. This is why, in the nchoosek function, if n and j are both integers, we round the number c to the nearest integer with a call to the round function.

Finally, notice that our implementation of the nchoosek function uses the and function. This allows to use arrays of integers as input variables. In the following session, we compute C(5, j), for j = 0, 1, . . . , 5, in one single call. This is a consequence of the fact that exp and gammaln both accept matrix input arguments.

-->n = 5 * ones (6,1);

-->j = (0:5) ’;

-->c = nchoosek ( n , j );

-->[n j c]

ans =

5. 0. 1.

5. 1. 5.

5. 2. 10.

5. 3. 10.

5. 4. 5.

5. 5. 1.

As we will see later in this document, the number of combinations appears in several probability computations. For example, this function is used as an intermediate computation in the hypergeometric distribution function, which will be presented later in this document. We will see that the numerical issue associated with the use of floating point numbers is solved by the use of the logarithm of the number of combinations, and this is why we now focus on this computation.

Let us introduce the function clog as the logarithm of the binomial number C(n, j):

clog(n, j) = log(C(n, j)). (159)

The functions fln and clog are related by the equation

C(n, j) = n! / ((n − j)! j!), (160)


1♦ 2♦ 3♦ 4♦ 5♦ 6♦ 7♦ 8♦ 9♦ 10♦ J♦ Q♦ K♦
1♥ 2♥ 3♥ 4♥ 5♥ 6♥ 7♥ 8♥ 9♥ 10♥ J♥ Q♥ K♥
1♣ 2♣ 3♣ 4♣ 5♣ 6♣ 7♣ 8♣ 9♣ 10♣ J♣ Q♣ K♣
1♠ 2♠ 3♠ 4♠ 5♠ 6♠ 7♠ 8♠ 9♠ 10♠ J♠ Q♠ K♠

Figure 14: Cards of a 52 cards deck - "J" stands for Jack, "Q" stands for Queen and "K" stands for King

Name            Description                                      Example
no pair         none of the below combinations                   7♦ 3♣ 6♣ 3♥ 1♠
pair            two cards of the same rank                       Q♣ Q♦ 2♥ 3♣ 1♦
double pair     2 × two cards of the same rank                   2♦ 2♠ Q♠ Q♣
three of a kind three cards of the same rank                     2♦ 2♠ 2♥ 3♣ 1♦
straight        five cards in a sequence, not all the same suit  3♥ 4♠ 5♦ 6♠ 7♦
flush           five cards in a single suit                      2♦ 3♦ 7♦ J♦ K♦
full house      one pair and one triple, each of the same rank   2♦ 2♠ 2♥ Q♣ Q♥
four of a kind  four cards of the same rank                      5♦ 5♠ 5♣ 5♥ 2♥
straight flush  five in a sequence in a single suit              2♦ 3♦ 4♦ 5♦ 6♦
royal flush     10, J, Q, K, 1 in a single suit                  10♠ J♠ Q♠ K♠ 1♠

Figure 15: Winning combinations at poker

which implies

clog(n, j) = log(n!) − log((n − j)!) − log(j!). (161)

The following nchooseklog function computes clog(n, j) from the equation 161.

function c = nchooseklog ( n, k )

c = gammaln ( n + 1 ) - gammaln (k + 1) - gammaln (n - k + 1)

endfunction

2.14 The poker game

In the following example, we use Scilab to compute the probabilities of poker hands. The poker game is based on a 52 cards deck, which is presented in figure 14.

Each card has one of the 13 available ranks, from 1 to K, and one of the 4 available suits ♦, ♥, ♣ and ♠. Each player receives 5 cards randomly chosen in the deck. Each player tries to combine the cards to form a well-known combination of cards, as presented in figure 15. Depending on the combination, the player can beat, or be defeated by, another player. The winning combination is the rarest, that is, the one which has the lowest probability. In figure 15, the combinations are presented in decreasing order of probability.

Even if winning at this game requires some understanding of human psychology, understanding probabilities can help. Why does the four of a kind beat the full house?


To answer this question, we will compute the probability of each event. Since the order of the cards can be changed by the player, we are interested in combinations (and not in permutations). We make the assumption that the process of choosing the cards is really random, so that all combinations of 5 cards have the same probability, i.e. the distribution function is uniform. Since the order of the cards does not matter, the sample space Ω is the set of all combinations of 5 cards chosen from 52 cards. Therefore, the size of Ω is

#(Ω) = C(52, 5) = 2598960. (162)

The probability of a four of a kind is computed as follows. In a 52-card deck, there are 13 different four of a kind combinations. Since the 5th card is chosen at random from the 48 remaining cards, there are 13 · 48 different four of a kind hands. The probability of a four of a kind is therefore

P(four of a kind) = (13 · 48) / C(52, 5) = 624 / 2598960 ≈ 0.0002401. (163)

The probability of a full house is computed as follows. There are 13 different ranks in the deck, and, once a rank is chosen, there are C(4, 2) different pairs for one rank. Therefore the total number of pairs is 13 · C(4, 2). Once the pair is set, there are 12 different ranks to choose for the triple and there are C(4, 3) different triples for one rank. The total number of full houses is therefore 13 · C(4, 2) · 12 · C(4, 3). Notice that the triple can be chosen first, and the pair second, but this would lead exactly to the same result. Therefore, the probability of a full house is

P(full house) = (13 · 12 · C(4, 2) · C(4, 3)) / C(52, 5) = 3744 / 2598960 ≈ 0.0014406. (164)

The computation of all the probabilities of the winning combinations is given as an exercise.
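The two probabilities above are easy to cross-check with a few lines of code. The sketch below is written in Python purely for illustration; it reproduces the counts of equations 162, 163 and 164:

```python
from math import comb

deck = comb(52, 5)  # size of the sample space, equation 162

# Four of a kind: 13 ranks, times 48 choices for the fifth card.
four_of_a_kind = 13 * 48
# Full house: 13 ranks times C(4,2) suit choices for the pair,
# then 12 ranks times C(4,3) suit choices for the triple.
full_house = 13 * comb(4, 2) * 12 * comb(4, 3)

print(deck, four_of_a_kind, full_house)  # 2598960 624 3744
print(four_of_a_kind / deck)  # about 0.0002401
print(full_house / deck)      # about 0.0014406
```

Since 624 < 3744, the four of a kind is indeed rarer than the full house, which is why it beats it.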

2.15 Bernoulli trials

In this section, we present Bernoulli trials and the binomial discrete distribution function. We present a coin tossed several times as an example of such a process.

Definition 2.15. A Bernoulli trials process is a sequence of n > 0 experiments with the following rules.


Figure 16: A Bernoulli process with 3 trials. The letter "S" indicates "success" and the letter "F" indicates "failure".

1. Each experiment has two possible outcomes, which we may call success and failure.

2. The probability p ∈ [0, 1] of success is the same for each experiment.

In a Bernoulli process, the probability p of success is not changed by any knowledge of previous outcomes. For each experiment, the probability q of failure is q = 1 − p.

It is possible to represent a Bernoulli process with a tree diagram, such as the one in figure 16.

A complete experiment is a sequence of successes and failures, which can be represented by a sequence of S’s and F’s. Therefore the size of the sample space is #(Ω) = 2^n, which is equal to 2^3 = 8 in our particular case of 3 trials.

By definition, the result of each trial is independent of the results of the previous trials. Therefore, the probability of an event is the product of the probabilities of each outcome.

Consider the outcome x = "SFS" for example. The value of the distribution function f for this outcome is

f(x = "SFS") = pqp = p^2 q.   (165)

The table 17 presents the value of the distribution function for each outcome x ∈ Ω. We can check that the sum of the probabilities of all events is equal to 1. Indeed,

\sum_{i=1,8} f(x_i) = p^3 + p^2 q + p^2 q + p q^2 + p^2 q + p q^2 + p q^2 + q^3   (166)
                    = p^3 + 3 p^2 q + 3 p q^2 + q^3   (167)
                    = (p + q)^3   (168)
                    = 1.   (169)
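This identity can be checked numerically in Scilab for any value of p; the following sketch enumerates the eight outcomes of table 17 with the arbitrary value p = 0.3:

```scilab
// Sum of the probabilities of the 8 outcomes of a Bernoulli process
// with 3 trials, for an arbitrary success probability p (here p = 0.3).
p = 0.3;
q = 1 - p;
// outcomes SSS, SSF, SFS, SFF, FSS, FSF, FFS, FFF (see table 17)
f = [p^3, p^2*q, p^2*q, p*q^2, p^2*q, p*q^2, p*q^2, q^3];
mprintf("sum = %f\n", sum(f))   // equal to 1, up to rounding
```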

We denote by b(n, p, j) the probability that, in n Bernoulli trials with success probability p, there are exactly j successes. In the particular case where there are


x     f(x)
SSS   p^3
SSF   p^2 q
SFS   p^2 q
SFF   p q^2
FSS   p^2 q
FSF   p q^2
FFS   p q^2
FFF   q^3

Figure 17: Probabilities of a Bernoulli process with 3 trials

n = 3 trials, the figures 16 and 17 give the following results:

b(3, p, 3) = p^3   (170)
b(3, p, 2) = 3 p^2 q   (171)
b(3, p, 1) = 3 p q^2   (172)
b(3, p, 0) = q^3   (173)

The following proposition extends the previous analysis to the general case.

Proposition 2.16. (Binomial probability) In a Bernoulli process with n > 0 trials with success probability p ∈ [0, 1], the probability of exactly j successes is

b(n, p, j) = \binom{n}{j} p^j q^{n−j},   (174)

where 0 ≤ j ≤ n and q = 1 − p.

Proof. We denote by A ⊂ Ω the event that one process is associated with exactly j successes. By definition, the probability of the event A is

b(n, p, j) = P(A) = \sum_{x ∈ A} f(x).   (175)

Assume that an outcome x ∈ Ω is associated with exactly j successes. Since there are n trials, the number of failures is n − j. That means that the value of the distribution function of this outcome x ∈ A is f(x) = p^j q^{n−j}. Since all the outcomes x in the set A have the same distribution function value, we have

b(n, p, j) = #(A) p^j q^{n−j}.   (176)

The size of the set A is the number of subsets of j elements in a set of size n. Indeed, the order does not matter, since we only require that, during the whole process, the total number of successes is exactly j, no matter the order of the successes and failures. The number of outcomes with exactly j successes is therefore #(A) = \binom{n}{j}, which, combined with equation 176, concludes the proof.


Example 2.3 A fair coin is tossed six times. What is the probability that exactly 3 heads turn up? This process is a Bernoulli process with n = 6 trials. Since the coin is fair, the probability of success at each trial is p = 1/2. We can apply proposition 2.16 with j = 3 and get

b(6, 1/2, 3) = \binom{6}{3} (1/2)^3 (1/2)^3 = 0.3125,   (177)

so that the probability of having exactly 3 heads is 0.3125.

2.16 Computing the binomial distribution

In this section, we present the computation of the binomial distribution function. As we are going to see, there are numerical issues, so that computing this binomial distribution function is not as easy as it seems.

The following binopdf_naive function is a naive implementation of the equation 174. It takes as input arguments the number of trials n in the Bernoulli process, the probability of success of one trial pb and the number of successes x. It returns the probability of getting exactly x successes in the Bernoulli experiment.

function p = binopdf_naive ( x , n , pb )
    qb = 1 - pb
    p = nchoosek(n,x) * pb^x * qb^(n-x)
endfunction

We can check that our implementation is correct against two simple examples.

Example 2.4 A fair coin is tossed six times. What is the probability that exactly 3 heads turn up?

-->n = 6; pb = 0.5;

-->p = binopdf_naive ( 3 , n , pb )

p =

0.3125

Example 2.5 Assume that we work in a factory producing n = 200 puffins each day. The probability of producing a defective puffin is 2%. What is the probability that exactly 0 defective puffins are produced?

-->n = 200; pb = 2/100;

-->p = binopdf_naive ( 0 , n , pb )

p =

0.0175879

We can now consider an example where the success of each Bernoulli trial is far more likely. Assume that the probability of success is p = 1 − 10^{−20}, i.e. there is an extremely strong probability of success. Assume that we make n = 100 trials. What is the probability of having 90 successes?

-->n = 100; pb = 1 - 1.e-20;

-->binopdf_naive ( 90 , n , pb )

ans =

0.


In order to check our computation, we can use Wolfram Alpha [16], with the expression:

nchoosek(100,90) * (1-10^-20)^90 * (10^-20)^(100-90)

The exact result is

p = 1.73103094564399999... × 10^{−187}.   (178)

This number is small, but it is representable by the double precision floating point numbers used in Scilab. The reason for the failure of the naive implementation is that the probability of success is so close to 1 that it has been rounded to one by Scilab, as shown in the following session.

-->format("e" ,25)

-->pb = 1 - 1.e-20

pb =

1.000000000000000000D+00

Hence the probability of failure qb is represented by the floating point number zero, leading to an inaccurate computation. In fact, any failure probability smaller than the machine epsilon ε ≈ 10^{−16} would lead to the same issue.
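This rounding can be reproduced directly at the Scilab prompt; a minimal sketch:

```scilab
// The failure probability is lost when it is recomputed from pb:
pb = 1 - 1.e-20;
qb = 1 - pb;                  // qb is exactly 0: 1.e-20 is below the machine epsilon
mprintf("qb  = %e\n", qb)
mprintf("eps = %e\n", %eps)   // about 2.2e-16 in double precision
```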

The following binopdf_lessnaive function is a more accurate implementation of the binomial distribution function. Instead of computing the complementary probability qb from pb, it takes it directly as an input argument.

function p = binopdf_lessnaive ( x , n , pb , qb )
    p = nchoosek(n,x) .* pb.^x .* qb.^(n-x)
endfunction

The elementwise operators used in the binopdf_lessnaive function allow it to return a result when x is a vector. In the following session, we compute the same probability as before and show that the current implementation is accurate, with 15 significant digits (i.e. the maximum possible precision).

-->n = 100; pb = 1 - 1.e-20; qb = 1.e-20;

-->binopdf_lessnaive ( 90 , n , pb , qb )

ans =

1.731030945643998948D-187

In the previous example, there is an obvious difference between the naive and the accurate implementations. But there are cases where the difference between the two functions is less obvious, leading to the false feeling that the naive implementation is accurate. This happens in particular when the complementary probability q is close to zero, but larger than the machine precision, that is, larger than ε ≈ 10^{−16} in the context of Scilab. In this case, the computed value of qb=1-pb is nonzero, but does not have full accuracy: its trailing digits are driven by the rounding errors.
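The loss of accuracy can be observed with a value of q above the machine epsilon; the following sketch uses q = 10^{−10}:

```scilab
// qb is recomputed from pb: the subtraction produces a nonzero value,
// but roughly half of its significant digits are wrong.
format("e", 25)
pb = 1 - 1.e-10;
qb = 1 - pb      // close to 1.e-10, but not equal to it
```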

There is still something wrong with our less naive implementation. Indeed, consider the following case, where we use a probability pb close to one and a particularly chosen value x.

-->n = 1.e9; pb = 1 - 1.e-14; qb = 1.e-14;

-->x = 1.e9 - 48
x =
999999952.

-->binopdf_lessnaive ( x , n , pb , qb )
ans =
Nan

We can compute the exact result with Wolfram Alpha and get

e = 8.05538642991600721e-302

The previous number is close to the low limit available for double precision normalized floating point numbers. We can analyze what happens here by computing the intermediate terms which appear in the computation, as in the following session.

-->nchoosek(n,x)

ans =

Inf

-->pb^x

ans =

0.9999900080431765037048

-->qb^(n-x)

ans =

0.

Since the number of combinations is extremely large, it is represented by the floating point number Inf. Moreover, the term q^{n−x} is extremely small, and this is why it is represented by zero. The product Inf * 0 generates the IEEE value Nan, which stands for "Not a number".

A solution to this issue is to use the logarithm of the number of combinations. We consider the logarithm of the equation 174 and get

log(b(n, p, j)) = log(\binom{n}{j}) + j log(p) + (n − j) log(q).   (179)

This leads to the following implementation of the binopdf function.

function p = binopdf ( x , n , pb , qb )
    plog = nchooseklog(n,x) + x .* log(pb) + (n-x) .* log(qb)
    p = exp ( plog )
endfunction

In the following session, we check that our implementation gives an accurate result.

-->n = 1.e9; pb = 1 - 1.e-14; qb = 1.e-14;

-->x = 1.e9 - 48

x =

999999952.

-->p = binopdf ( x , n , pb , qb )

p =

8.055383937519171497D-302

By looking more closely at the previous result, we see that the order of magnitude is good, but that not all digits are exact. In the following session, we compute the number of decimal significant digits from the formula d = −log10(|c − e|/e), where e is the exact result and c is the computed result.

-->d = -log10(abs(p-e)/e)

d =

6.5094691877632762100347


We see that the accurate implementation has about 6 significant digits, which is far less than the maximum achievable precision. Indeed, the maximum achievable precision is 15 significant digits for Scilab doubles. In order to measure the sensitivity of the output arguments depending on the input arguments, we compute the condition number of the binomial distribution function for this particular value of x. In the following session, we compute the probability p1 of a slightly modified input argument x + 2 * %eps * x. Then we compute the relative difference of the input rx and the relative difference of the output ry. The condition number is defined as the ratio between these two relative differences.

-->p1 = binopdf ( x + 2 * %eps * x , n , pb , qb )

p1 =

8.055461212394521478D-302

-->rx = 2*%eps

rx =

4.440892098500626162D-16

-->ry = abs(p1 - p)/p

ry =

9.817532328886864651D-07

-->c = ry/rx

c =

2.210711746903634071D+09

We see that the condition number is close to 10^{10}. This means that a relatively small variation of the input argument x generates a relatively large change in the output probability p. In our particular case, a very small variation of x can make the probability vary suddenly from values close to zero to values close to 1.

Hence, the computation is numerically difficult and this explains why the accuracy of the binopdf function is sometimes less than maximum. We emphasize that this is not a problem which is specific to the binopdf function: it is, indeed, a problem which comes from the behaviour of the function itself, which varies greatly for particular values of the input argument x.

2.17 The hypergeometric distribution

In this section, we present the hypergeometric distribution function.

Consider an urn containing m > 1 balls. Assume that this urn contains k red balls and m − k blue balls. From this urn, draw n balls without replacement. We assume here that 1 ≤ k ≤ m, 1 ≤ n ≤ m and 1 ≤ x ≤ n. What is the probability of having x red balls? This distribution function is called the hypergeometric distribution function. We denote this distribution function by P(X = x) = h(x, m, k, n).

The sample space Ω is made of all the possible choices of n balls from a set of m balls. There are #(Ω) = \binom{m}{n} ways to perform this choice.

There are \binom{k}{x} ways to select x balls from the set of k red balls, and there are \binom{m−k}{n−x} ways to select n − x balls from the set of m − k blue balls.


Therefore, the probability of selecting x red balls is given by the hypergeometric distribution function, defined by

P(X = x) = h(x, m, k, n) = \binom{k}{x} \binom{m−k}{n−x} / \binom{m}{n}.   (180)

The actual computation of this distribution function is not straightforward, as we are going to see in the next section.

An example of application of this distribution function is given in the exercise 2.9, which considers the computation of the probability of the prediction of earthquakes "by chance". Another example of this distribution function is given in the next section.

2.18 Computing the hypergeometric distribution with Scilab

In this section, we consider the computation of the hypergeometric distribution function in Scilab.

In the following function, we use the nchoosek function and implement a naive formula for the hypergeometric distribution function.

function p = hygepdf_naive ( x , m , k , n )
    p = nchoosek(k,x) * nchoosek(m-k,n-x) / nchoosek(m,n)
endfunction

We can check that our implementation is correct against a simple example.

Example 2.6 Assume that we have a collection of m = 100 puffins and that k = 30 of them are defective. Assume that we select n = 10 puffins at random. What is the probability of having exactly x = 5 defective puffins? The following session computes this probability by using the hygepdf_naive function.

-->m = 100; k = 30; n = 10;

-->hygepdf_naive ( 5 , m , k , n )

ans =

0.0996373

We can see that the naive implementation fails to compute h(x = 200, m = 1030, k = 500, n = 515), whose exact value is 1.65570 × 10^{−10}. This example is used by Yalta in [21] to prove that Excel 2007 is inaccurate with respect to the hypergeometric function.

-->m=1030; k=500; n=515;

-->hygepdf_naive ( 200 , m , k , n )

ans =

0.

The reason for this failure is that some intermediate term involved in the computation of h(x, m, k, n) is too large to be represented in a double precision floating point number. Indeed, the computation of h(x = 200, m = 1030, k = 500, n = 515) involves the computation of the intermediate terms \binom{500}{200}, \binom{530}{315} and \binom{1030}{515}. The following session shows that the last term is represented as the Infinity number of the IEEE-754 standard.

-->nchoosek (1030 ,515)

ans =

Inf

We can solve the problem by computing first the logarithm of the probability and then exponentiating the result. Let us introduce the function hlog defined by

hlog(x, m, k, n) = log(h(x, m, k, n))   (181)
                 = log(\binom{k}{x}) + log(\binom{m−k}{n−x}) − log(\binom{m}{n}).   (182)

Hence, if we are able to compute accurately the logarithm of the hypergeometric distribution, we can easily compute the required probability by the equation

h(x, m, k, n) = exp(hlog(x, m, k, n)).   (183)

The computation of the log-combination function, i.e. the function clog(n, k) = log(\binom{n}{k}), has been presented in the section 2.13.

The following hygepdf function finally computes the hypergeometric function from the equations 182 and 183.

function p = hygepdf ( x, m, k, n )
    c1 = nchooseklog ( k, x )
    c2 = nchooseklog ( m-k, n-x )
    c3 = nchooseklog ( m, n )
    p_log = c1 + c2 - c3
    p = exp ( p_log )
endfunction

The previous implementation is a Scilab port of a Matlab implementation due to John Burkardt.

The following Scilab session shows that the previous implementation leads to an accurate floating point result, as opposed to the naive implementation.

-->m=1030; k=500; n=515;

-->hygepdf ( 200 , m , k , n )

ans =

1.656D-10
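As an additional sanity check, the probabilities h(x, m, k, n) for x = 0, 1, . . . , n should sum to one. A sketch, assuming the hygepdf and nchooseklog functions defined above are available:

```scilab
// Sum the hypergeometric probabilities over all possible numbers
// of red balls in the sample: the result should be close to 1.
m = 100; k = 30; n = 10;
s = 0;
for x = 0 : n
    s = s + hygepdf ( x, m, k, n );
end
mprintf("sum = %f\n", s)   // equal to 1, up to rounding
```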

2.19 Notes and references

Combinatorics topics presented in section 2 can be found in [7], chapter 3, "Combinatorics". The example of Poker hands is presented in [7], while the probabilities of all Poker hands can be found in Wikipedia [20].

Parts of the section 2.8, dedicated to Stirling's formula, are based on [7], chapter 3, "Combinatorics".

There is some confusion related to the hypergeometric distribution function, since many texts use various symbols for the distribution function, its parameters


and the order of these parameters. The notation for the hypergeometric distribution function h presented in section 2.17 is taken from Matlab. The letters chosen for the parameters and the order of the arguments are the ones chosen in Matlab. In an earlier version of this document, we considered [7], section 5.1, "Important Distributions". Indeed, Grinstead and Snell chose to denote the number of items (or balls) in the urn by the capital letter N, which may lead to bugs in Scilab implementations, because of the variable n, denoting the number of samples (number of balls selected). Also, in Matlab, the total number of balls in the urn (i.e. m) comes as the first parameter of the distribution function, while it is the last one in Grinstead and Snell. For practical reasons, we choose to keep Matlab's choice.

The gamma function presented in section 2.3 is covered in many textbooks, as in [1]. An in-depth presentation of the gamma function is done in [18].

2.20 Exercises

Exercise 2.1 (Recurrence relation of binomial) Prove the proposition 2.14, which is the following. For integers n > 0 and 0 < j < n, the binomial coefficients satisfy

\binom{n}{j} = \binom{n−1}{j} + \binom{n−1}{j−1}.   (184)
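Before proving the identity, it can be checked numerically for a few values; a sketch, assuming the nchoosek function used earlier in this document:

```scilab
// Numerical check of the recurrence for n = 8, j = 3:
// C(8,3) = C(7,3) + C(7,2), i.e. 56 = 35 + 21.
n = 8; j = 3;
d = nchoosek(n,j) - nchoosek(n-1,j) - nchoosek(n-1,j-1)
// d is zero
```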

Exercise 2.2 (Number of subsets) Assume that Ω is a finite set with n ≥ 0 elements. Prove that there are 2^n subsets of Ω. Count ∅ and Ω among the subsets of Ω.

Exercise 2.3 (Probabilities of Poker hands) Why does computing the probability of a straight flush force us to take into account the probability of the royal flush? Explain other possible conflicts between Poker hands. Compute the probabilities for all Poker hands in figure 15.

This exercise is partly given in [7], in section 3.2, ”Combinations”.

Exercise 2.4 (Bernoulli trial for a die experiment) A die is rolled n = 4 times. What is the probability that we obtain exactly one "6"? What is the probability for n = 1, 2, . . . , 12?

Exercise 2.5 (Probability of a flight crash) Assume that there are 20 000 airplane flights each day in the world. Assume that there is one accident every 500 000 flights. What is the probability of getting exactly 5 crashes in 22 days? What is the probability of getting at least 5 crashes in 22 days? What is the probability of getting exactly 3 crashes in 42 days? What is the probability of getting at least 3 crashes in 42 days? Consider now that the year is a sequence of 16 periods of 22 days (ignoring the 13 days left in the year). What is the probability of having one period in the year which contains at least 5 crashes?

This exercise is presented in ”La loi des series noires”, by Janvresse and de la Rue [9].

Exercise 2.6 (Binomial function maximum) Consider the discrete distribution function of a Bernoulli process, as defined by 2.16. Show that

b(n, p, j) = (p/q) ((n − j + 1)/j) b(n, p, j − 1),   (185)

for j ≥ 1. Compute jm ≥ 1 so that b(n, p, j) is maximum, i.e. so that

b(n, p, j) ≤ b(n, p, jm), 1 ≤ j ≤ n.   (186)

Consider the experiment presented in section 3.4, which consists in tossing a coin 10 times and counting the number of heads. With a Scilab simulation, can you compute which number of heads is the most likely to occur?

This exercise is given in [7], in the exercise part of section 3.2, ”Combinations”.


Exercise 2.7 (Binomial coefficients and Pascal's triangle) This exercise is given in [7], in chapter 3, "Combinatorics". Let a, b be two real numbers and let n be a positive integer. Prove the binomial theorem, which states that

(a + b)^n = \sum_{j=0,n} \binom{n}{j} a^j b^{n−j}.   (187)

The binomial coefficients \binom{n}{j} can be written in a triangle, where each row corresponds to n and each column corresponds to j, as in the following array

A =
    1.
    1.  1.
    1.  2.  1.
    1.  3.  3.  1.
    1.  4.  6.  4.  1.
(188)

Use the binomial theorem in order to prove that the sum of the terms in the n-th row is 2^n. Prove that if the terms are added with alternating signs, then the sum is zero.

Binomial coefficients can also be represented in a matrix called Pascal's matrix, where the binomial coefficients are stored in the diagonals of the matrix.

A =
    1.   1.   1.   1.   1.
    1.   2.   3.   4.   5.
    1.   3.   6.  10.  15.
    1.   4.  10.  20.  35.
    1.   5.  15.  35.  70.
(189)

Design a Scilab script to compute Pascal's triangle and Pascal's matrix and check that you find the same results as presented in 188 and 189.

Exercise 2.8 (Binomial identity) Prove the following binomial identity

\binom{2n}{n} = \sum_{j=0,n} \binom{n}{j}^2.   (190)

To help to prove this result, consider a set with 2n elements, where n elements are red and n elements are blue. Compute the number of ways to choose n elements in this set.

This exercise is given in [7], in chapter 3, ”Combinatorics”.

Exercise 2.9 (Earthquakes and predictions) Assume that a person "predicts" the dates of major earthquakes (with magnitude larger than 6.5 or with a large number of deaths, etc.) in the world during 3 years, i.e. in a period of 1096 days. Assume that the "specialist" predicts 169 earthquakes. Assume that, during the same period, 196 major earthquakes really occur, so that 33 earthquakes were correctly predicted by the "specialist". What is the probability that the earthquakes are predicted by chance?

This exercise is presented by Charpak and Broch in [4].

Exercise 2.10 (Log-factorial function) There is another possible implementation of the log-factorial function. Indeed, we have, by definition,

n! = 1 · 2 · . . . · n,   (191)

which implies

fln(n) = log(n!) = log(1 · 2 · . . . · n)   (192)
       = \sum_{i=1,n} log(i).   (193)

Propose an implementation of the log-factorial function based on formula 193.


3 Simulation of random processes with Scilab

In this section, we present how to simulate random events with Scilab. The problem of generating random numbers is more complex and will not be detailed in this chapter. We begin with a brief overview of random number generation and detail the random number generator used in the rand function. Then we analyze how to generate random numbers in the interval [0, 1] with the rand function. We present how to generate random integers in a given interval [0, m − 1] or [m1, m2]. In the final part, we present a practical simulation of a game based on tossing a coin.

3.1 Overview

In this section, we present a special class of random number generators, so that we can get a concrete idea of what such a generator is.

The goal of a uniform random number generator is to generate a sequence of real values u_n ∈ [0, 1] for n ≥ 0. Most uniform random number generators are based on the fraction

u_n = x_n / m   (194)

where m is a large integer and x_n is a positive integer so that 0 < x_n < m. In many random number generators, the integer x_{n+1} is computed from the previous element in the sequence, x_n.

The linear congruential generators [10] are based on the sequence

x_{n+1} = (a x_n + c) (mod m),   (195)

where

• m is the modulus satisfying m > 0,

• a is the multiplier satisfying 0 ≤ a < m,

• c is the increment satisfying 0 ≤ c < m,

• x0 is the starting value satisfying 0 ≤ x0 < m.

The parameters m, a, c and x_0 should satisfy several conditions so that the sequence x_n has good statistical properties. Indeed, naive approaches lead to poor results in this matter. For example, consider the case where x_0 = 0, a = c = 7 and m = 10. The following sequence of numbers is produced:

0.6 0.9 0. 0.7 0.6 0.9 0. 0.7 . . . (196)
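Such a generator is easy to implement; the following sketch (the function name lcg is ours) reproduces the congruential recurrence with the toy parameters above:

```scilab
function u = lcg ( n , x0 , a , c , m )
    // Generate n terms u_k = x_k / m of the linear congruential sequence
    // x_{k+1} = (a * x_k + c) (mod m).
    x = x0
    u = zeros(1, n)
    for k = 1 : n
        x = modulo(a * x + c, m)
        u(k) = x / m
    end
endfunction

// With x0 = 0, a = c = 7 and m = 10, the state cycles with period 4.
u = lcg ( 8 , 0 , 7 , 7 , 10 )
```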

Specific rules allow one to design the parameters of a uniform random number generator.

As a practical example, consider the Urand generator [12] which is used by Scilab in the rand function. Its parameters are

• m = 2^{31},


• a = 843314861,

• c = 453816693,

• x0 arbitrary.

The first 8 elements of the sequence are

0.2113249 0.7560439 0.0002211 0.3303271 (197)

0.6653811 0.6283918 0.8497452 0.6857310 . . . (198)

3.2 Generating uniform random numbers

In this section, we present some Scilab features which allow us to simulate a discrete random process.

We assume here that a good source of random numbers is provided. Scilab provides two functions which allow us to generate uniform real numbers in the interval [0, 1]. These functions are rand and grand. Globally, the grand function provides many more features than rand. The random number generators which are used are also of higher quality. For our purpose of presenting the simulation of random discrete events, the rand function will be sufficient, and it has the additional advantage of having a simpler syntax.
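For completeness, the grand function can produce the same kind of uniform matrix; a sketch (the "def" option selects grand's default uniform generator):

```scilab
// A 2-by-3 matrix of uniform random numbers in [0, 1), generated by grand:
u = grand(2, 3, "def")
```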

The simplest syntax of the rand function is

rand()

Each call to the rand function produces a new random number in the interval [0, 1], as presented in the following session.

-->rand()

ans =

0.2113249

-->rand()

ans =

0.7560439

-->rand()

ans =

0.0002211

A random number generator is based on a sequence of integers, where the first element in the sequence is the seed. The seed for the generator used in rand is hard-coded in the library as being equal to 0, and this is why the function always returns the same sequence at each startup of Scilab. This makes it easier to reproduce the behavior of a script which uses the rand function.

The seed can be queried with the function rand("seed"), while the function rand("seed",s) sets the seed to the value s. The use of the "seed" input argument is presented in the following session.

-->rand("seed")

ans =

0.

-->rand("seed" ,1)

-->rand()


ans =

0.6040239

-->rand()

ans =

0.0079647

In most random processes, several random numbers are to be used at the same time. Fortunately, the rand function can generate a matrix of random numbers, instead of a single value. The user must then provide the number of rows and columns of the matrix to generate, as in the following syntax.

rand(nr,nc)

The use of this feature is presented in the following session, where a 2 × 3 matrix of random numbers is generated.

-->rand (2,3)

ans =

0.6643966 0.5321420 0.5036204

0.9832111 0.4138784 0.6850569

3.3 Simulating random discrete events

In this section, we present how to use a uniform random number generator to generate integers in a given interval. Assume that, given a positive integer m, we want to generate random integers in the interval [0, m − 1]. To do this, we can use the rand function, and multiply the generated numbers by m. We must additionally use the floor function, which returns the largest integer smaller than or equal to the given number. The following function returns a matrix with size nr × nc, where entries are random integers in the set {0, 1, . . . , m − 1}.

function ri = generateInRange0M1 ( m , nbrows , nbcols )
    ri = floor(rand( nbrows , nbcols ) * m)
endfunction

In the following session, we generate random integers in the set {0, 1, . . . , 4}.

-->r = generateInRange0M1 ( 5 , 4 , 4 )

r =

2. 0. 3. 0.

1. 0. 2. 2.

0. 1. 1. 4.

4. 2. 4. 0.

To check that the generated integers are uniform in the interval, we compute the distribution of the integers for 10000 integers in the set {0, 1, . . . , 4}. We use the bar function to plot the result, which is presented in the figure 18. We check that the probability of each integer is close to 1/5 = 0.2.

-->r = generateInRange0M1 ( 5 , 100 , 100 );

-->counter = zeros (1,5);

-->for i = 1:100

-->for j = 1:100

--> k = r(i,j);

--> counter(k+1) = counter(k+1) + 1;

-->end

-->end

-->counter = counter / 10000;

-->counter
counter =
0.2023 0.2013 0.1983 0.1976 0.2005

-->bar(counter)

Figure 18: Distribution of random integers from 0 to 4.

We emphasize that the previous verifications allow us to check that the empirical distribution function is the expected one, but that does not guarantee that the uniform random number generator is of good quality. Indeed, consider the sequence x_n = n (mod 5). This sequence produces uniform integers in the set {0, 1, . . . , 4}, but, obviously, is far from being truly random. Testing uniform random number generators is a much more complicated problem and will not be presented here.
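The point can be made concrete: the deterministic sequence below passes the frequency check perfectly, although it is not random at all. A sketch:

```scilab
// x_n = n (mod 5): each value of {0,...,4} occurs exactly 2000 times
// out of 10000, so the empirical frequencies are exactly 0.2 --
// yet the sequence is periodic with period 5.
n = 10000;
x = modulo(0 : n-1, 5);
counter = zeros(1, 5);
for i = 1 : n
    counter(x(i)+1) = counter(x(i)+1) + 1;
end
disp(counter / n)   // 0.2 0.2 0.2 0.2 0.2
```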

It is easy to adapt the previous function to various needs. For example, the following function returns a matrix with size nr × nc, where entries are random integers in the set {1, 2, . . . , m}.

function ri = generateInRange1M ( m , nr , nc )
    ri = ceil(rand( nr , nc ) * m)
endfunction

The following function returns a matrix with size nr × nc, where entries are random integers in the set {m1, m1 + 1, . . . , m2}.

function ri = generateInRangeM12 ( m1 , m2 , nr , nc )
    f = m2 - m1 + 1
    ri = floor(rand( nr , nc ) * f) + m1
endfunction
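For example, this last function can simulate dice; a sketch (the output values depend on the state of the generator):

```scilab
// Ten rolls of a fair die: random integers in the set {1,...,6}.
rand("seed", 0)
r = generateInRangeM12 ( 1 , 6 , 1 , 10 )
```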


3.4 Simulation of a coin

Many practical experiments are very difficult to analyze by theory and, most of the time, very easy to experiment with on a computer. In this section, we give an example of a coin experiment which is simulated with Scilab. This experiment is simple, so that we can check that our simulation matches the result predicted by theory. In practice, when no theory is able to predict a probability, it is much more difficult to assess the result of a simulation.

The following Scilab function generates a random number with the rand function and uses the floor function in order to get a random integer, either 1, associated with "Head", or 0, associated with "Tail". It prints out the result and returns the value.

// tossacoin --
// Prints "Head" or "Tail" depending on the simulation.
// Returns 1 for "Head", 0 for "Tail"
function face = tossacoin ( )
    face = floor ( 2 * rand() );
    if ( face == 1 ) then
        mprintf ( "Head\n" )
    else
        mprintf ( "Tail\n" )
    end
endfunction

With such a function, it is easy to simulate the toss of a coin. In the following session, we toss a coin 4 times. The "seed" argument of rand is used so that the seed of the uniform random number generator is initialized to 0. This allows us to get consistent results across simulations.

rand("seed" ,0)

face = tossacoin ();

face = tossacoin ();

face = tossacoin ();

face = tossacoin ();

The previous script produces the following output.

Tail

Head

Tail

Tail

Assume that we are tossing a fair coin 10 times. What is the probability that we get exactly 5 heads?

This is a Bernoulli process, where the number of trials is n = 10 and the probability of success is p = 1/2. The probability of getting exactly j = 5 heads is given by the binomial distribution and is

P("exactly 5 heads in 10 tosses") = b(10, 1/2, 5)   (199)
                                  = \binom{10}{5} p^5 q^{10−5},   (200)

where p = 1/2 and q = 1 − p. The expected probability is therefore

P("exactly 5 heads in 10 tosses") ≈ 0.2460938.   (201)
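Equation 200 can be evaluated directly; a sketch, assuming the nchoosek function used earlier in this document:

```scilab
// Exact probability of 5 heads in 10 tosses: C(10,5) / 2^10 = 252/1024
p = nchoosek(10, 5) * (1/2)^5 * (1/2)^5
// p = 0.24609375
```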


The following Scilab session shows how to perform the simulation. We perform 10000 simulations of the process. The floor function is used in combination with the rand function to generate integers in the set {0, 1}. The sum function counts the number of heads in the experiment. If the number of heads is equal to 5, the number of successes is updated accordingly.

-->rand("seed" ,0);

-->nb = 10000;

-->success = 0;

-->for i = 1:nb

--> faces = floor ( 2 * rand (1,10) );

--> nbheads = sum(faces);

--> if ( nbheads == 5 ) then

--> success = success + 1;

--> end

-->end

-->pc = success / nb

pc =

0.2507

The computed probability is P = 0.2507 while the theory predicts P = 0.2460938, which means that we have less than two significant digits. It can be proved that, when we simulate an experiment of this type n times, we can expect the error to be less than or equal to 1/sqrt(n) at least 95% of the time. With n = 10000 simulations, this error bound corresponds to 0.01, which is the accuracy of our experiment.
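The same Monte Carlo experiment can be reproduced outside Scilab. The following Python sketch is an addition to the text (the reference implementation is the Scilab session above); with a fixed seed, its estimate should fall within the 1/sqrt(n) band around 0.2460938:

```python
import random

random.seed(0)  # fix the seed, like rand("seed", 0) in the Scilab session
nb = 10000
success = 0
for _ in range(nb):
    # 10 tosses: getrandbits(1) returns 0 ("Tail") or 1 ("Head")
    nbheads = sum(random.getrandbits(1) for _ in range(10))
    if nbheads == 5:
        success += 1
pc = success / nb
print(pc)  # close to the theoretical value 0.2460938
```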

3.5 Simulation of a Galton board

In this section, we define the binomial distribution function. We give the example of the Galton board, which is based on a Bernoulli process, and simulate this random process. We present the Scilab functions which allow us to manage the binomial distribution and present their use on the Galton board example.

Definition 3.1. (Binomial distribution function) Let n be a positive integer and let p be a real in the interval [0, 1]. Assume that the random variable B is the number of successes in a Bernoulli process with parameters n and p. The distribution function b(n, p, j) of B is the binomial distribution.

A Galton board is a board in which a ball is dropped at the top and deflected off a number of pins on its way down to the bottom of the board. The location of the ball at the end of one trial is the result of random deflections, either to the right or to the left. The figure 19 presents a Galton board.

The following function simulates one fall of a ball on a Galton board. It takes as its input argument the number n of steps in the Galton board, so that the final number of cups (where the ball falls) is n + 1. At each stage, a random number is generated with rand. If the random number is lower than 1/2, the ball is deflected to the left. If not, the ball is deflected to the right. Each time a ball is deflected to the right, the integer jmin is increased. After n deflections, the variable jmin is the index of the cup into which the ball has fallen. The same algorithm can be designed with a jmax index, with initial value n + 1. That index would be decreased each time the ball is deflected to the left.


Figure 19: A Galton board.

// simulgalton --
// Performs one simulation of the Galton board with n stages,
// and returns the index j=1,2,...,n+1 of the cup where the ball falls.
function j = simulgalton ( n , verbose )
    if exists("verbose","local")==0 then
        verbose = 0
    end
    jmin = 1
    for k = 1 : n
        if verbose == 1 then
            mprintf("Step #%d (%d)\n",k,jmin)
        end
        r = rand()
        if r<0.5 then
            if verbose == 1 then
                mprintf("To the left !\n")
            end
        else
            if verbose == 1 then
                mprintf("To the right !\n")
            end
            jmin = jmin + 1
        end
    end
    j = jmin
endfunction

In the following Scilab script, we perform 10 000 experiments of the Galton board. The cups variable stores the number of balls in each cup. For each experiment, we update the number of balls in the cup which has been randomly selected by the process. The bar function plots the result.

rand ( "seed" , 0 )
n = 10
cups = zeros(1,n+1)
nshots = 10000
for k = 1 : nshots
    j = simulgalton ( n );
    cups(j) = cups(j) + 1;
end
bar(1:n+1, cups)

Figure 20: Simulation of a Galton board with n = 10 stages and 100 simulations. The line is the binomial distribution function.

The figures 20, 21 and 22 present the results of the simulation of a Galton board with n = 10 stages and 100, 1000 and 10 000 simulations, as bar plots.

In the figure 23, we present Scilab functions which allow us to manage the binomial distribution. The cdfbin function, which computes the binomial cumulated density function, will be presented in the more general context of cumulated density functions.

In the following session, we use the binomial function to compute the probabilities of the binomial distribution function. The plot function generates the line which is presented in the previous figures, where the binomial distribution function is computed with n = 10 and p = 0.5.

pr = binomial (0.5,n)
plot (1:n+1, pr)

In the figures 20, 21 and 22, we see that the bar plot representing the Galton simulations converges toward the line plot representing the binomial distribution function.
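The same convergence can be reproduced outside Scilab. The following Python sketch (an addition to the text) repeats the Galton simulation and compares the empirical frequencies to b(n, 1/2, j):

```python
import random
from math import comb

random.seed(0)

def simulgalton(n):
    """One ball falling through n rows of pins; returns the cup index 1..n+1."""
    jmin = 1
    for _ in range(n):
        if random.random() >= 0.5:  # deflected to the right
            jmin += 1
    return jmin

n, nshots = 10, 10000
cups = [0] * (n + 1)
for _ in range(nshots):
    cups[simulgalton(n) - 1] += 1

# Empirical frequencies should approach b(n, 1/2, j) = C(n, j) / 2^n.
freq = [c / nshots for c in cups]
pmf = [comb(n, j) * 0.5**n for j in range(n + 1)]
```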

3.6 Generate random permutations

In this section, we present Scilab features which allow us to generate random permutations. This problem is similar to the problem of shuffling a deck of cards. This can be done by using the perms and grand functions.

The figure 24 presents the Scilab functions which allow us to generate random permutations.

The perms function computes all the permutations of the given vector of indexes. If the size of the input vector is n, then the size of the output matrix is n! × n.

Figure 21: Simulation of a Galton board with n = 10 stages and 1000 simulations.

Figure 22: Simulation of a Galton board with n = 10 stages and 10 000 simulations.

binomial                 returns a matrix containing b(n, p, j) for j = 1, 2, . . . , n
cdfbin                   computes the binomial cumulated density function

Figure 23: Scilab commands for the binomial distribution function.

perms                    returns all the permutations of the given vector
grand ( i , 'prm' , v )  generates i random permutations of the column vector v

Figure 24: Scilab commands to generate random permutations.

In the following session, we use perms to compute all the permutations of the vector (1, 2, 3).

-->perms (1:3)
 ans =
    3.    2.    1.
    3.    1.    2.
    2.    3.    1.
    2.    1.    3.
    1.    3.    2.
    1.    2.    3.

When the number of elements in the array is small, lower than 5 for example, we may generate all the possible permutations and randomly choose one of them. In the following session, we use n=4 and store in p all the possible permutations of the column vector (1, 2, 3, 4)^T. Then we generate a random integer j in the interval [1, n!], i.e. up to the number of rows of p, and use it to get the permutation stored in the j-th row of p.

-->n = 4
 n =
    4.
-->p = perms ( (1:n)' )
 p =
    4.    3.    2.    1.
    4.    3.    1.    2.
    4.    2.    3.    1.
[...]
    1.    3.    2.    4.
    1.    2.    4.    3.
    1.    2.    3.    4.
-->j = floor ( rand() * size(p,"r") ) + 1
 j =
    4.
-->v = p(j,:)
 v =
    4.    2.    1.    3.

The previous method is feasible, but only for very small values of n. Indeed, when n grows, the required memory grows as fast as n!, which is impractical for even moderate values of n.
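The enumerate-then-pick approach above can be written in Python (this sketch is an addition to the text) with itertools, and it runs into the same n! memory wall:

```python
import itertools
import random

random.seed(0)
n = 4
# All n! = 24 permutations of (1, ..., n); feasible only for very small n.
p = list(itertools.permutations(range(1, n + 1)))
v = random.choice(p)  # pick one row uniformly among the n! permutations
print(len(p))  # 24
```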

The following randperm function returns a random permutation of the numbers in the interval [1, n]. It is based on the grand function, which is used to generate uniform random numbers in the interval [0, 1[. Then, we use the gsort function in order to sort the array and compute the order of the integers. Combined, these two functions allow us to generate a random permutation, at the cost of sorting a large number of values.

function p = randperm ( n )
    [ignore , p] = gsort(grand(1,n,"def"),"c","i");
endfunction

For moderate values of n, the randperm function performs well, but for large values of n, sorting the array might be expensive.
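The sort-based trick translates directly to Python (an addition to the text): draw n uniform random keys and return the argsort of the keys.

```python
import random

def randperm(n):
    """Random permutation of 1..n, built like the Scilab randperm above:
    draw n uniform random keys and return the argsort of the keys."""
    keys = [random.random() for _ in range(n)]
    order = sorted(range(n), key=keys.__getitem__)  # argsort of the keys
    return [i + 1 for i in order]

print(randperm(10))
```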

The grand function provides an algorithm to compute a random permutation of a given array of values. This function is presented in figure 24. In the following session, we use the grand function three times and get three independent permutations of the vector (1:10)'. A side effect of the call to this function is that it updates the state of the uniform random number generator used by grand.

-->s = grand ( 1 , "prm" , (1:10)' )'
 s =
    5.    1.    4.    3.    8.    7.    2.    10.    6.    9.
-->s = grand ( 1 , "prm" , (1:10)' )'
 s =
    4.    2.    3.    9.    6.    8.    10.    7.    5.    1.
-->s = grand ( 1 , "prm" , (1:10)' )'
 s =
    7.    4.    8.    9.    5.    2.    10.    1.    6.    3.

The source code used by grand to produce random permutations has been implemented by Bruno Pinon. The algorithm is presented in [10], section 3.4.2 "Random sampling and shuffling". According to Knuth, this algorithm was first published by Moses and Oakford [13] in 1963 and by Durstenfeld [5] in 1964.

The following genprm function is a simplified version of this algorithm. We assume that the size of the input matrix x is n. The algorithm proceeds by performing n steps. At step i, we compute an integer k as a random integer uniform in the interval [i, n]. Then we exchange the values at indices i and k.

function x = genprm ( x )
    n = size(x,"*")
    for i = 1:n
        t = grand(1,1,"unf",0,1)
        k = floor ( t * (n - i + 1)) + i
        elt = x(k)
        x(k) = x(i)
        x(i) = elt
    end
endfunction

In the following session, we use the genprm function in order to generate three independent permutations of the vector 1:10.

-->genprm ( 1:10 )
 ans =
    10.    2.    9.    8.    5.    3.    4.    7.    6.    1.
-->genprm ( 1:10 )
 ans =
    7.    4.    2.    9.    3.    1.    6.    5.    10.    8.
-->genprm ( 1:10 )
 ans =
    5.    2.    9.    6.    3.    4.    10.    1.    7.    8.

We emphasize that the previous function is not designed to be used in practice, since grand(1,"prm",x) provides a faster implementation (based on compiled source code) of the same algorithm.
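The same Durstenfeld shuffle can be sketched in Python (this is an addition to the text; the reference version is the Scilab genprm above):

```python
import random

def genprm(x):
    """Durstenfeld / Fisher-Yates shuffle, as in the Scilab genprm above:
    at step i, swap x[i] with x[k] for a uniform random k in [i, n-1]."""
    x = list(x)
    n = len(x)
    for i in range(n):
        k = random.randrange(i, n)  # uniform integer in i..n-1
        x[i], x[k] = x[k], x[i]
    return x

print(genprm(range(1, 11)))
```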

3.7 References and notes

Some of the material of section 3, which introduces random number generators, is based on [10].


4 Acknowledgments

I would like to thank John Burkardt for his comments about the numerical computation of the permutation function. Thanks are also addressed to Samuel Gougeon, who suggested improving the performance of the computation of the Pascal matrix.


5 Answers to exercises

5.1 Answers for section 1

Answer of Exercise 1.1 (Head and tail) Assume that we have a coin which is tossed twice. We record the outcomes so that the order matters, i.e. the sample space is Ω = {HH, HT, TH, TT}. Assume that the distribution function is uniform, i.e. the head and the tail have equal probability. The size of the sample space Ω is #(Ω) = 4. The distribution is uniform so that P(x) = 1/4, for all x ∈ Ω.

1. What is the probability of the event A = {HH, HT, TH}? The number of elements in the event is #(A) = 3. By proposition 1.9, the probability of the event A is

P(A) = #(A) / #(Ω) = 3/4.   (202)

2. What is the probability of the event A = {HH, HT}?

P(A) = #(A) / #(Ω) = 2/4 = 1/2.   (203)

Answer of Exercise 1.2 (Two dice) Assume that we are rolling a pair of dice. Assume that each face has an equal probability. The sample space is

Ω = {(i, j) / i, j = 1, ..., 6}.   (204)

The size of the sample space is #(Ω) = 36. The distribution is uniform so that P(x) = 1/36, for all x ∈ Ω.

1. What is the probability of getting a sum of 7? A sum of 7 corresponds to the event

A = {(1, 6), (6, 1), (2, 5), (5, 2), (4, 3), (3, 4)}.   (205)

The number of elements in the event is #(A) = 6. The probability is therefore P(A) = 6/36 = 1/6.

2. What is the probability of getting a sum of 11? A sum of 11 corresponds to the event

A = {(5, 6), (6, 5)}.   (206)

The number of elements in the event is #(A) = 2. The probability is therefore P(A) = 2/36 = 1/18.

3. What is the probability of getting a double one, i.e. snake eyes? The event is made of the set A = {(1, 1)}, with #(A) = 1. Its probability is P(A) = 1/36.

Answer of Exercise 1.3 (de Méré's experiments) The two proofs are based on the fact that an event and its complementary event satisfy P(A) + P(Ac) = 1. Since P(A) is complex to compute directly, we compute instead P(Ac) and then use P(A) = 1 - P(Ac).

The event A is that, with four rolls of a die, at least one six turns up. The fact that a die is rolled four times corresponds to the sample space

Ω = {i / i = 1, ..., 6}^4.   (207)

The size of the sample space is #(Ω) = 6^4. To make the computation easier, we consider the complementary event Ac, which is

Ac = {i / i = 1, ..., 5}^4.   (208)

The size of Ac is #(Ac) = 5^4. The probability of event A is therefore P(A) = 1 - (5/6)^4 ≈ 0.5177469. Since P(A) > 1/2, de Méré wins consistently.

De Méré claims that in 24 rolls of two dice, a pair of 6 would turn up (event B). The sample space is now

Ω = ({(i, j) / i, j = 1, ..., 6})^24.   (209)

The size of the sample space is #(Ω) = 36^24. The complementary event is

Bc = ({(i, j) / i, j = 1, ..., 5} ∪ {(6, i) / i = 1, ..., 5} ∪ {(i, 6) / i = 1, ..., 5})^24.   (210)

The size of Bc is #(Bc) = (25 + 5 + 5)^24 = 35^24. The probability of event B is therefore P(B) = 1 - (35/36)^24 ≈ 0.4914039 < 1/2, which explains why de Méré loses consistently.

De Méré claims that 25 rolls are necessary to make the game favorable (event C). The same derivation leads to the probability P(C) = 1 - (35/36)^25 ≈ 0.5055315 > 1/2.

Can you compute with Scilab the probability of event A for a number of rolls equal to 1, 2, 3 or 4? The following Scilab session shows how to perform the computation.

-->i = 1:4
 i =
    1.    2.    3.    4.
-->1-(5/6)^i
 ans =
    0.1666667    0.3055556    0.4212963    0.5177469

Can you compute with Scilab the probability of the event B or C for a number of rolls equal to 10, 20, 24, 25 or 30? The following Scilab session shows how to perform the computation.

-->i = [10 20 24 25 30]
 i =
    10.    20.    24.    25.    30.
-->1-(35/36)^i
 ans =
    0.2455066    0.4307397    0.4914039    0.5055315    0.5704969

Answer of Exercise 1.4 (Independent events) Assume that Ω is a finite sample space. Assume that the two events A, B ⊂ Ω are independent. Let us prove that

P(B|A) = P(B).   (211)

By definition of the conditional probability, we have

P(B|A) = P(B ∩ A) / P(A),   (212)

where, by hypothesis, P(A) > 0. We have B ∩ A = A ∩ B, so that P(B ∩ A) = P(A ∩ B) = P(A|B)P(B). Moreover, A and B are independent, which implies P(A|B) = P(A). This leads to P(B ∩ A) = P(A)P(B). We plug this equality into the equation 212 and we get P(B|A) = P(B), which concludes the proof.

Answer of Exercise 1.5 (Boole's inequality) Assume that Ω is a finite sample space. Let (Ei)i=1,n be a sequence of finite subsets of Ω and n > 0. Let us prove Boole's inequality:

P( ∪i=1,n Ei ) ≤ Σi=1,n P(Ei).   (213)

To prove this result, we will use the following result.


Let E1, E2 ⊂ Ω be two sets, not necessarily disjoint. Then we have

P(E1 ∪ E2) ≤ P(E1) + P(E2).   (214)

The inequality 214 is the result of proposition 1.7, which states that P(E1 ∪ E2) = P(E1) + P(E2) - P(E1 ∩ E2). The inequality 214 can then be deduced from the fact that P(E1 ∩ E2) ≥ 0, by definition of a probability.

The inequality 213 is therefore true for n = 2. To finish the proof, we use induction. Let us assume that the inequality is true for n, and let us prove that it is true for n + 1. Let us denote by Fn ⊂ Ω the set defined by

Fn = ∪i=1,n Ei.   (215)

We want to compute the probability P( ∪i=1,n+1 Ei ) = P(Fn+1). We see that Fn+1 = En+1 ∪ Fn. We use the inequality 214, which leads to

P(Fn+1) = P(En+1 ∪ Fn) ≤ P(En+1) + P(Fn).   (216)

By hypothesis, the result is true for n, which implies

P(Fn) = P( ∪i=1,n Ei ) ≤ Σi=1,n P(Ei).   (217)

We now plug 217 into 216 and get

P(Fn+1) ≤ P(En+1) + Σi=1,n P(Ei) = Σi=1,n+1 P(Ei),   (218)

which concludes the proof.

Answer of Exercise 1.6 (Discrete conditional probability) In this exercise, we consider a laboratory blood test which is tested on healthy persons and on persons who have the disease. Assume that, when the person has the disease, the test is positive with probability 99%. However, this test also generates "false positives" with probability 1%. This means that, when a person does not have the disease, the test is positive with probability 1%. Assume that 0.5% of the population has the disease. Given that the test is positive for one person, what is the probability that this person has the disease?

We see that this situation is based on conditional probabilities, where we would like to compute posterior probabilities. This is why we use Bayes' formula 64. Let us denote by D the event that the person has the disease and by E the event that the test is positive. The event D is the hypothesis which allows us to decompose the sample space Ω of all persons into the subsets D (the persons who have the disease) and Dc (the healthy persons). We know that P(E|D) = 0.99 and P(E|Dc) = 0.01. We know that P(D) = 0.005, which implies P(Dc) = 1 - P(D) = 0.995. We now use Bayes' formula and have

P(D|E) = P(E|D)P(D) / ( P(E|D)P(D) + P(E|Dc)P(Dc) )   (219)
       = (0.99 × 0.005) / (0.99 × 0.005 + 0.01 × 0.995)   (220)
       ≈ 0.3322   (221)

with 4 significant digits. This might be surprising, since the probability of a false positive is 1%, which is quite small.

Consider the case where 5% of the population has the disease.

P(D|E) = (0.99 × 0.05) / (0.99 × 0.05 + 0.01 × 0.95)   (222)
       ≈ 0.8389   (223)


P(E|Dc)     P(D|E)
0.100000    0.047391
0.050000    0.090494
0.010000    0.332215
0.005000    0.498741
0.001000    0.832632
0.000500    0.908674
0.000100    0.980295

Figure 25: Probabilities of having the disease, given that the test is positive, with P(E|D) = 0.99 and P(D) = 0.005. The row P(E|Dc) = 0.01 corresponds to the data presented in the text of the exercise. The lower the probability of a false positive, the more accurate the result.

P(D)        P(D|E)
0.500000    0.990000
0.100000    0.916667
0.050000    0.838983
0.010000    0.500000
0.005000    0.332215
0.001000    0.090164
0.000500    0.047188

Figure 26: Probabilities of having the disease, given that the test is positive, with P(E|D) = 0.99 and P(E|Dc) = 0.01. The row P(D) = 0.005 corresponds to the data presented in the text of the exercise. The rarer the disease, the less reliable a positive test.

with 4 significant digits. This shows that when more people have the disease (which is certainly not desirable), the posterior probability is higher.

Consider the case where the probability of a false positive is 0.1% (but keep the probability of a true positive equal to 99%).

P(D|E) = (0.99 × 0.05) / (0.99 × 0.05 + 0.001 × 0.95)   (224)
       ≈ 0.9811   (225)

with 4 significant digits. This shows that when the probability of a false positive is lower, the probability of having the disease, given that the test is positive, is higher.

The previous results might be a little surprising, but they are the consequence of the false positives. These results are presented in the figures 25 and 26, which present the probability P(D|E) computed with varying parameters P(E|Dc) and P(D). The conclusion of these experiments is that false positives can reduce the reliability of the test if the disease is rare, or if the probability of a false positive is high.

To make the previous computations clearer, consider the example where the population counts 1000 persons. Since 0.5% of the population has the disease, this makes 0.005 × 1000 = 5 persons who have the disease and 0.995 × 1000 = 995 who do not have the disease. Of the 5 persons who have the disease, 0.99 × 5 = 4.95 will have a positive test. Similarly, of the 995 persons who do not have the disease, 0.01 × 995 = 9.95 will have a positive test. Therefore, given that the test is positive, the probability that the person has the disease is

4.95 / (4.95 + 9.95) ≈ 0.3322.   (226)
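As a cross-check outside Scilab, the same Bayes computation can be written in a few lines of Python (this snippet is an addition, not part of the original text):

```python
# Bayes' formula for the blood test, with the numbers of the exercise:
# P(D|E) = P(E|D) P(D) / (P(E|D) P(D) + P(E|Dc) P(Dc))
p_e_given_d = 0.99    # true positive rate P(E|D)
p_e_given_dc = 0.01   # false positive rate P(E|Dc)
p_d = 0.005           # prevalence P(D)

p_d_given_e = (p_e_given_d * p_d) / (
    p_e_given_d * p_d + p_e_given_dc * (1 - p_d))
print(round(p_d_given_e, 4))  # 0.3322
```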

5.2 Answers for section 2

Answer of Exercise 2.1 (Recurrence relation of binomials) Let us prove proposition 2.14, i.e. that, for integers n > 0 and 0 < j < n, the binomial coefficients C(n, j) satisfy

C(n, j) = C(n-1, j) + C(n-1, j-1).   (227)

The proof is based on the expansion of the binomial formula. By the definition 152 of the binomial number, we have

C(n-1, j) = (n-1)...((n-1) - j + 1) / ( j(j-1)...1 )   (228)
          = (n-1)...(n-j) / ( j(j-1)...1 ),   (229)

and

C(n-1, j-1) = (n-1)...((n-1) - (j-1) + 1) / ( (j-1)(j-2)...1 )   (230)
            = (n-1)...(n-j+1) / ( (j-1)(j-2)...1 ).   (231)

We can sum the two previous equalities and obtain

C(n-1, j) + C(n-1, j-1) = (n-1)...(n-j+1)(n-j) / ( j(j-1)...1 ) + (n-1)...(n-j+1) / ( (j-1)(j-2)...1 )   (232)
                        = ((n-j) + j) (n-1)...(n-j+1) / ( j(j-1)...1 )   (233)
                        = n(n-1)...(n-j+1) / ( j(j-1)...1 ),   (234)

which proves 227 and concludes the proof.
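Pascal's recurrence can also be verified numerically; the following Python loop (an addition to the text) checks equation 227 for a range of n:

```python
from math import comb

# Check C(n, j) = C(n-1, j) + C(n-1, j-1) for all 0 < j < n.
for n in range(2, 30):
    for j in range(1, n):
        assert comb(n, j) == comb(n - 1, j) + comb(n - 1, j - 1)
print("recurrence verified for n < 30")
```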

Answer of Exercise 2.2 (Number of subsets) Assume that Ωn is a finite set with n ≥ 0 elements. We consider that ∅ and Ωn are subsets of Ωn. Let us prove that there are 2^n subsets of Ωn.

In order to simplify the proof, for n ≥ 0, we can consider, without loss of generality, the set Ωn = {1, 2, . . . , n}.

It is easy to see that the claim is true for n = 0, since A1 = ∅ = Ω0 is the only subset of Ω0, so that there are 2^0 = 1 subsets. For n = 1, we can construct the sets A1 = ∅ and A2 = {1} = Ω1, so that there are 2^1 = 2 subsets. It is more interesting to construct the sequence of subsets for n = 2. Indeed, the list of subsets is A1 = ∅, A2 = {1}, A3 = {2}, A4 = {1, 2} = Ω2, so that the claim is true since there are 2^2 = 4 subsets.

We can make the proof by induction on n, since we have proved that the claim is true for n = 0, 1, 2. Assume that the claim is true for n and let us prove that there are 2^(n+1) subsets of the set Ωn+1. Let us denote by (Aj)j=1,2^n the sequence of subsets of the set Ωn. Since Ωn+1 = Ωn ∪ {n + 1}, we have

Aj ⊂ Ωn ⊂ Ωn+1,   (235)

for j = 1, ..., 2^n. We can construct additional subsets by considering the sets

Bj = Aj ∪ {n + 1},   (236)

for j = 1, ..., 2^n. For each j, we have Bj ⊂ Ωn+1. We have found 2^n sets (Aj) and 2^n sets (Bj) which are subsets of Ωn+1. The total number of subsets is therefore 2^n + 2^n = 2^(n+1), which concludes the proof.
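The 2^n count can also be checked by brute force; the following Python snippet (an addition to the text) enumerates the subsets of a 5-element set:

```python
from itertools import combinations

# Enumerate all subsets of {1, ..., n} by size j = 0, 1, ..., n;
# there are C(n, j) subsets of size j, and sum_j C(n, j) = 2^n in total.
n = 5
subsets = [set(s) for j in range(n + 1)
           for s in combinations(range(1, n + 1), j)]
print(len(subsets))  # 32, i.e. 2^5
```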

Answer of Exercise 2.3 (Probabilities of Poker hands) Why does computing the probability of a straight flush force us to take into account the probability of the royal flush?

Consider the event where the 5 cards are in sequence with a single suit. This might be a straight flush, if the last card in the sequence is not an ace. If the last card is an ace, then the hand is not a straight flush anymore, since it is a royal flush. Therefore, the event which is associated with a straight flush is "a sequence of cards with a single suit, but not a royal flush". When computing the probability of a straight flush, the simplest way is to compute the probability of the event "sequence of cards with a single suit", and to remove the probability of the royal flush.

Explain other possible conflicts between Poker hands. The following is a list of conflicts between Poker hands.

• A straight is a hand with 5 cards in sequence, where at least two cards have different suits. If the cards are in sequence and all have the same suit, this is not a straight anymore, this is a straight flush.

• A double pair is when there are two pairs in the hand. But the two pairs must have different ranks, since, if all four cards have the same rank, this is not a double pair anymore, this is a four of a kind.

• A flush is a hand where all the cards have the same suit. But the cards must not be in sequence, since they would not form a flush anymore, but would form a straight flush or a royal flush.

Compute the probabilities of all Poker hands in figure 15.

The probability of a pair is computed as follows. There are 13 different ranks in the deck and, once the rank is chosen, there are C(4, 2) different pairs of this rank. Therefore, there are 13 · C(4, 2) different pairs in the deck. The remaining 3 cards in the hand are chosen so that they have a different rank from the current pair. If not, there would be a three of a kind or even a four of a kind. Therefore, their ranks must be chosen among the remaining 12 ranks, and there are C(12, 3) different sets of 3 distinct ranks. For each of these cards, there are 4 different suits, so that there are C(12, 3) · 4^3 different combinations for the remaining 3 cards in the hand. The probability of a pair is therefore

P(pair) = 13 · C(4, 2) · C(12, 3) · 4^3 / C(52, 5) = 1098240 / 2598960 ≈ 0.4225690.   (237)

To compute the probability of a double pair, we must take into account the fact that the two pairs must have different ranks. If not, that would be a four of a kind instead of a double pair. This is why we begin by choosing two different ranks in the set of 13 available ranks. Once done, each pair can have two of the four suits. Therefore, the total number of double pairs in the first 4 cards is C(13, 2) · C(4, 2)^2. The 5th card must be chosen from the 11 remaining ranks (if not, one of the pairs would become a three of a kind). Once the rank of the 5th card is chosen, it can have one of the 4 available suits. Therefore, there are 11 · 4 different choices for the 5th card. In the end, the probability of a double pair is

P(double pair) = C(13, 2) · C(4, 2)^2 · 11 · 4 / C(52, 5) = 123552 / 2598960 ≈ 0.0475390.   (238)

The probability of a three of a kind is computed as follows. There are 13 different ranks and there are C(4, 3) ways to choose 3 cards in a set of 4. Therefore, there are 13 · C(4, 3) different ways to select the 3 first cards of the same rank. The last two cards must be chosen so that they do not create a four of a kind or a full house. There are C(12, 2) different ways to select two ranks in the set of the remaining 12 ranks. Once the ranks are chosen, each of the 2 cards can have one of the 4 suits, so that there are 4^2 ways to select the suits. All in all, there are C(12, 2) · 4^2 different ways to select the last 2 cards. In the end, the probability of a three of a kind is

P(three of a kind) = 13 · C(4, 3) · C(12, 2) · 4^2 / C(52, 5) = 54912 / 2598960 ≈ 0.0211285.   (239)

The probabilities of a four of a kind and of a full house have already been computed in the text. The probability of a straight flush is computed as follows. The following is a list of the 10 possible sequences:

1 2 3 4 5 – 2 3 4 5 6 – 3 4 5 6 7 – 4 5 6 7 8 – 5 6 7 8 9 – 6 7 8 9 10 – 7 8 9 10 J – 8 9 10 J Q – 9 10 J Q K – 10 J Q K 1,

where all the cards take the same suit among the 4 possible suits ♦, ♥, ♣ and ♠. Therefore, the total number of such hands is 4 · 10 = 40. In order for these hands to be straight flushes, and not royal flushes, we must remove the 4 royal flushes. Finally, the total number of straight flushes is 4 · 10 - 4 and the probability of this hand is

P(straight flush) = (4 · 10 - 4) / C(52, 5) = 36 / 2598960 ≈ 0.0000139.   (240)

The probability of the royal flush is easy to compute, since there are only 4 such hands in the deck. The probability of the royal flush is therefore

P(royal flush) = 4 / C(52, 5) = 4 / 2598960 ≈ 0.0000015.   (241)

The probability of a straight is computed as follows. The list of the 10 possible sequences is the same as above:

1 2 3 4 5 – 2 3 4 5 6 – 3 4 5 6 7 – 4 5 6 7 8 – 5 6 7 8 9 – 6 7 8 9 10 – 7 8 9 10 J – 8 9 10 J Q – 9 10 J Q K – 10 J Q K 1,

where each card can now have any of the 4 suits ♦, ♥, ♣ and ♠. Therefore, the total number of such hands is 4^5 · 10. But we require that a straight is neither a straight flush nor a royal flush, so that we have to remove these higher value hands. Using the previous computation for the straight flush, we get 4^5 · 10 - 4 · 10 different straights. Therefore, the probability of the straight is

P(straight) = (4^5 · 10 - 4 · 10) / C(52, 5) = 10200 / 2598960 ≈ 0.0039262.   (242)


Name               Number     Probability
total              2598960    1.
no pair            1302540    0.5011774
pair               1098240    0.4225690
double pair        123552     0.0475390
three of a kind    54912      0.0211285
straight           10200      0.0039262
flush              5108       0.0019654
full house         3744       0.0014406
four of a kind     624        0.0002401
straight flush     36         0.0000139
royal flush        4          0.0000015

Figure 27: Probabilities of Poker hands

The probability of a flush is computed as follows. There are 4 suits in the deck. Once the suit is chosen, there are C(13, 5) different sets of 5 cards which have this suit. The total number of such hands is therefore 4 · C(13, 5). But we have to remove the straight flushes and royal flushes from this count. This leads to 4 · C(13, 5) - 4 · 10 different flushes. Therefore, the probability of this hand is

P(flush) = (4 · C(13, 5) - 4 · 10) / C(52, 5) = 5108 / 2598960 ≈ 0.0019654.   (243)

The probability of the no pair event is simple to compute when we already know the probabilities of all other events:

P(no pair) = 1 - P(royal flush) - P(straight flush) - P(four of a kind) - P(full house)   (244)
             - P(flush) - P(straight) - P(three of a kind) - P(double pair) - P(pair)   (245)
           = 1302540 / 2598960 ≈ 0.5011774.   (246)

The results are summarized in the figure 27.
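All the counts in figure 27 can be re-derived mechanically; the following Python check (an addition, not part of the original text) reproduces the numbers used in the derivations above:

```python
from math import comb

hands = comb(52, 5)                                      # 2 598 960 possible hands
pair = 13 * comb(4, 2) * comb(12, 3) * 4**3              # 1 098 240
double_pair = comb(13, 2) * comb(4, 2)**2 * 11 * 4       # 123 552
three_of_a_kind = 13 * comb(4, 3) * comb(12, 2) * 4**2   # 54 912
straight = 4**5 * 10 - 4 * 10                            # 10 200
flush = 4 * comb(13, 5) - 4 * 10                         # 5 108
straight_flush = 4 * 10 - 4                              # 36
royal_flush = 4

print(round(pair / hands, 7))  # 0.422569
```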

Answer of Exercise 2.4 (Bernoulli trial for a die experiment) A fair die is rolled n = 4 times. What is the probability that we obtain exactly one "6"? For each roll, the probability of getting a six is p = 1/6. Since each roll is independent of the previous rolls, this is a Bernoulli process with n = 4 trials. Therefore the probability of getting exactly one "6" is

b(4, 1/6, 1) = C(4, 1) (1/6)^1 (5/6)^3 ≈ 0.3858025.   (247)

What is the probability for n = 1, 2, . . . , 12? The following Scilab function computes the probability of getting exactly one "6" in n tosses of a fair die.


// one6inNtoss --

// Computes the probability of getting exactly one "6" in n tosses of a fair die.

function bnpj = one6inNtoss ( n )

p = 1/6

q = 1-p

j = 1

bnpj = nchoosek(n,j) * p^j * q^(n-j)

endfunction

In the following session, we use the function one6inNtoss to compute the probability for n =1, 2, . . . , 12.

-->for n = 1:12

--> b = one6inNtoss ( n );

--> mprintf("In %d toss , p(one six)=%f\n",n,b);

-->end

In 1 toss , p(one six)=0.166667

In 2 toss , p(one six)=0.277778

In 3 toss , p(one six)=0.347222

In 4 toss , p(one six)=0.385802

In 5 toss , p(one six)=0.401878

In 6 toss , p(one six)=0.401878

In 7 toss , p(one six)=0.390714

In 8 toss , p(one six)=0.372109

In 9 toss , p(one six)=0.348852

In 10 toss , p(one six)=0.323011

In 11 toss , p(one six)=0.296094

In 12 toss , p(one six)=0.269176

Answer of Exercise 2.5 (Probability of a flight crash) Assume that there are 20 000 flights of airplanes each day in the world. Assume that there is one accident every 500 000 flights. What is the probability of getting exactly 5 crashes in 22 days?

There are several answers to this question, depending on the accuracy required.

1. Flight by flight. The first approach analyzes a Bernoulli process where each flight has a probability of crash.

2. Time decomposition. The second approach analyzes a Bernoulli process where each day has a probability of crash.

3. Poisson approximation. The third approach uses the Poisson approximation of the binomial distribution function.

This answer is based on the hypothesis that the flights are independent. Therefore, the process can be considered as a Bernoulli process where each flight has a crash probability equal to p = 1/500000, with q = 1 − p. The number of steps in the Bernoulli process is equal to the number of flights n. In 22 days, the number of flights is n = 22 × 20000. The probability of getting exactly 5 crashes in 22 days is

\[ P(\textrm{exactly 5 crashes in 22 days}) = \binom{22\times 20000}{5} p^5 q^{22\times 20000 - 5} \approx 0.0018241. \tag{248} \]
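Equation 248 can be checked in Scilab. Since the binomial coefficient is huge, a safe route is to work in log scale with the built-in gammaln function; the following is a sketch, not taken from the original text.

```scilab
// P(exactly j crashes) for a Bernoulli process with n flights.
p = 1/500000;
n = 22 * 20000;
j = 5;
// log C(n,j) = gammaln(n+1) - gammaln(j+1) - gammaln(n-j+1)
logprob = gammaln(n+1) - gammaln(j+1) - gammaln(n-j+1) ..
        + j*log(p) + (n-j)*log(1-p);
mprintf("P(exactly 5 crashes in 22 days) = %f\n", exp(logprob));
```

The printed value is about 0.0018241, in agreement with equation 248.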

Figure 28 presents the results for various numbers of crashes. What is the probability of at least 5 crashes in 22 days? The probability of getting at least 5 crashes can be computed from the probabilities of getting 0, 1, 2, 3 or 4 crashes. Therefore,

\[ P(\textrm{at least 5 crashes in 22 days}) = 1 - P(\textrm{0, 1, 2, 3 or 4 crashes in 22 days}) \tag{249} \]
\[ = 1 - \sum_{j=0}^{4} \binom{22\times 20000}{j} p^j q^{22\times 20000 - j} \tag{250} \]
\[ \approx 0.0021294. \tag{251} \]


Event                  Probability
0 crashes in 22 days   0.41478255
1 crash in 22 days     0.36500937
2 crashes in 22 days   0.16060408
3 crashes in 22 days   0.04711041
4 crashes in 22 days   0.01036424
5 crashes in 22 days   0.00182409
6 crashes in 22 days   0.00026753
7 crashes in 22 days   0.00003363
8 crashes in 22 days   0.00000370
9 crashes in 22 days   0.00000036
10 crashes in 22 days  0.00000003

Figure 28: Crash probabilities for 22 days, with one crash every 500 000 flights and 20 000 flights each day

Event                  Probability
0 crashes in 42 days   0.18637366
1 crash in 42 days     0.31310838
2 crashes in 42 days   0.26301125
3 crashes in 42 days   0.14728625
4 crashes in 42 days   0.06186013
5 crashes in 42 days   0.02078494
6 crashes in 42 days   0.00581976
7 crashes in 42 days   0.00139674
8 crashes in 42 days   0.00029331
9 crashes in 42 days   0.00005475
10 crashes in 42 days  0.00000920

Figure 29: Crash probabilities for 42 days, with one crash every 500 000 flights and 20 000 flights each day

What is the probability of getting exactly 3 crashes in 42 days? What is the probability of getting at least 3 crashes in 42 days?

The same computations can be performed for 42 days, which represent 6 weeks. Figure 29 presents the results.

The probability of having exactly 3 crashes in 42 days is

\[ P(\textrm{exactly 3 crashes in 42 days}) = \binom{42\times 20000}{3} p^3 q^{42\times 20000 - 3} \tag{252} \]
\[ \approx 0.14728625. \tag{253} \]

The probability of having at least 3 crashes in 42 days is

\[ P(\textrm{at least 3 crashes in 42 days}) = 1 - P(\textrm{0, 1 or 2 crashes in 42 days}) \tag{254} \]
\[ = 1 - \sum_{j=0}^{2} \binom{42\times 20000}{j} p^j q^{42\times 20000 - j} \tag{255} \]
\[ \approx 0.2375067. \tag{256} \]


We now present another approach to the same problem. The method is based on counting the number of crashes during a given time unit, for example one day. We consider that the process is a Bernoulli process where each day is associated with the probability that a crash occurs, which is obviously an approximation, since more than one crash may occur during one day. Since there is one crash every 500 000 flights, the probability that one flight has no crash is p = 499999/500000. By hypothesis, all flights are independent, therefore the probability of getting no accident in one day is

\[ P(\textrm{no crash in 1 day}) = \left(\frac{499999}{500000}\right)^{20000} \approx 0.9607894. \tag{257} \]

Hence, the probability of getting at least one crash in a day is

\[ P(\textrm{at least 1 crash in 1 day}) = 1 - \left(\frac{499999}{500000}\right)^{20000} \approx 0.0392106. \tag{258} \]

Therefore, the probability of having exactly 5 crashes in 22 days is

\[ P(\textrm{exactly 5 crashes in 22 days}) \approx \binom{22}{5} p^5 q^{22-5}, \tag{259} \]

where $p = 1 - (499999/500000)^{20000} \approx 0.0392106$ and $q = 1 - p$. This leads to

\[ P(\textrm{exactly 5 crashes in 22 days}) \approx 0.0012366. \tag{260} \]

We see that the result is close to, but different from, the previous probability, which was equal to 0.00182409. In fact, the result depends on the time unit that we consider for our calculation. Obviously, if we consider that the time unit is the half day, the formula becomes

\[ P(\textrm{exactly 5 crashes in 22 days}) \approx \binom{22\times 2}{5} p^5 q^{22\times 2 - 5}, \tag{261} \]

where $p = 1 - (499999/500000)^{20000/2}$ and $q = 1 - p$. This gives

\[ P(\textrm{exactly 5 crashes in 22 days}) \approx 0.0015155. \tag{263} \]

If we consider the hour as the time unit, we get P("exactly 5 crashes in 22 days") ≈ 0.0017973.

The third approach is based on the fact that, when n is large, the binomial distribution function is closely approximated by the Poisson distribution function, that is,

\[ b(n,p,j) \approx \frac{\lambda^j}{j!}\exp(-\lambda), \tag{264} \]

where λ = np. Here, the parameter λ is equal to λ = 22 × 20000/500000 = 0.88. The result is

\[ P(\textrm{exactly 5 crashes in 22 days}) \approx \frac{\lambda^5}{5!}\exp(-\lambda) \tag{265} \]
\[ \approx 0.0018241. \tag{266} \]
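The Poisson formula 265 is a one-liner in Scilab; a minimal sketch, assuming the built-in factorial function:

```scilab
// Poisson approximation with lambda = n*p.
lambda = 22 * 20000 / 500000;   // 0.88
j = 5;
ppoisson = lambda^j / factorial(j) * exp(-lambda);
mprintf("Poisson: P(exactly 5 crashes) = %f\n", ppoisson);
```

This prints a probability of about 0.0018241, as in equation 266.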

The three approaches are compared in figure 30. We can see that the flight by flight approach gives a result which is very close to the Poisson approximation, with 6 common digits. The approaches based on time decomposition give different results, with only 1 common digit. We can check that smaller time units lead to results which are closer to the flight by flight approach.

Consider now that the year is a sequence of 16 periods of 22 days (ignoring the 13 days left in the year). What is the probability of having one period in the year which contains at least 5 crashes?


Approach             Probability
Flight by flight     0.00182409
Time unit = day      0.0012366
Time unit = 1/2 day  0.0015155
Time unit = hour     0.0017973
Poisson              0.0018241

Figure 30: Probability of having exactly 5 crashes in 22 days, with one crash every 500 000 flights and 20 000 flights each day, for the different approaches

This decomposition corresponds to 16 × 22 = 352 days (instead of the usual 365 days of a regular year), where the remaining 13 days are ignored. We want to compute the probability that one of the 16 periods contains at least 5 crashes. We have

\[ P(\textrm{one period contains at least 5 crashes}) = 1 - P(\textrm{all periods contain at most 4 crashes}). \tag{267} \]

Since the periods are disjoint, the crashes in the 16 periods are independent, so that

\[ P(\textrm{all periods contain at most 4 crashes}) = P(\textrm{one period contains at most 4 crashes})^{16}. \tag{268} \]

Now the probability of having at most 4 crashes in one period is equal to

\[ P(\textrm{at most 4 crashes in 22 days}) = 1 - P(\textrm{at least 5 crashes in 22 days}). \tag{269} \]

We now plug 269 into 268 and get

\[ P(\textrm{all periods contain at most 4 crashes}) = (1 - P(\textrm{at least 5 crashes in 22 days}))^{16}. \tag{270} \]

We finally plug 270 into 267 and get

\[ P(\textrm{one period contains at least 5 crashes}) = 1 - (1 - P(\textrm{at least 5 crashes in 22 days}))^{16}. \tag{271} \]

It has already been computed that the probability of having at least 5 crashes in one period of 22 days is

\[ P(\textrm{at least 5 crashes in 22 days}) \approx 0.0021294, \tag{272} \]

so that the required probability is

\[ P(\textrm{one period contains at least 5 crashes}) \approx 1 - (1 - 0.0021294)^{16} \tag{273} \]
\[ \approx 1 - 0.9978706^{16} \tag{274} \]
\[ \approx 0.0335316, \tag{275} \]

which is approximately 3%. This is much larger than the original probability 0.0021294 ≈ 0.2% of having at least 5 crashes in 22 days.

The approach presented here is still simplified. Indeed, we should instead consider the probability that some window of 22 consecutive days in the year contains at least 5 crashes, over all possible such windows. In this problem, the periods are not disjoint anymore, so that the computations performed earlier cannot be applied. This kind of problem involves a method called scan statistics, which will not be presented here (see [9, 6]).

Answer of Exercise 2.6 (Binomial function maximum) Consider the discrete distribution function of a Bernoulli process, as defined by 2.16. Let us prove that

\[ b(n,p,j) = \frac{p}{q}\cdot\frac{n-j+1}{j}\, b(n,p,j-1), \tag{276} \]


for j ≥ 1. By definition, we have

\[ b(n,p,j-1) = \binom{n}{j-1} p^{j-1} q^{n-j+1}, \tag{277} \]

where q = 1 − p. By multiplying the previous equality by p/q, we find

\[ \frac{p}{q}\, b(n,p,j-1) = \binom{n}{j-1} p^{j} q^{n-j}. \tag{278} \]

By multiplying the previous equality by (n − j + 1)/j, we find

\[ \frac{p}{q}\cdot\frac{n-j+1}{j}\, b(n,p,j-1) = \frac{n-j+1}{j}\binom{n}{j-1} p^{j} q^{n-j}. \tag{279} \]

We can simply expand the term $\frac{n-j+1}{j}\binom{n}{j-1}$ and have

\[ \frac{n-j+1}{j}\binom{n}{j-1} = \frac{n-j+1}{j}\cdot\frac{n(n-1)\ldots(n-j+2)}{(j-1)(j-2)\ldots 1} \tag{280} \]
\[ = \frac{n(n-1)\ldots(n-j+2)(n-j+1)}{j(j-1)(j-2)\ldots 1} \tag{281} \]
\[ = \binom{n}{j}. \tag{282} \]

We now plug the previous result into 279 and get 276, which concludes the proof.

Let us compute jm ≥ 1 so that b(n, p, j) is maximum, i.e. so that

\[ b(n,p,j_m) \geq b(n,p,j), \quad 1 \leq j \leq n. \tag{283} \]

To find jm, we consider the equality 276, which shows that the growth of b(n, p, j) with j is governed by the ratio $\frac{p}{q}\cdot\frac{n-j+1}{j}$. If this ratio is greater than 1, then increasing j from j − 1 to j produces increasing values of b(n, p, j). Similarly, if this ratio is lower than 1, then increasing j produces decreasing values of b(n, p, j). Therefore, the index jm is the largest index which satisfies

\[ \frac{p}{q}\cdot\frac{n-j_m+1}{j_m} \geq 1. \tag{284} \]

This is equivalent to

\[ p(n-j_m+1) \geq q j_m, \tag{285} \]

since q > 0 and jm > 0. We expand the previous inequality and get

\[ pn - pj_m + p \geq (1-p)j_m = j_m - pj_m, \tag{286} \]

where the term −pjm appears on both sides. This can be simplified into

\[ pn + p \geq j_m, \tag{287} \]

which can be written as

\[ j_m \leq p(n+1). \tag{288} \]

Therefore, the index j which makes the discrete distribution function b(n, p, j) maximum is

\[ j_m = [p(n+1)], \tag{289} \]

where [x] denotes the largest integer lower than or equal to x.

Consider the experiment presented in section 3.4, which consists in tossing a coin 10 times and counting the number of heads. With a Scilab simulation, can you compute which number of heads is the most likely to occur?

The following function returns an estimate of the probability of getting j heads in 10 tosses of a coin. It is based on the same method presented in section 3.4, replacing the number of successes (which was equal to 5) by the variable j.

The following function returns the probability of getting j heads in 10 tosses of a coin. It isbased on the same method presented in section 3.4, replacing the number of successes (which wasequal to 5) by the variable j.


function p = tossingcoin ( j )

rand("seed" ,0)

nb = 10000;

success = 0;

for i = 1:nb

faces = floor ( 2 * rand (1,10) );

nbheads = sum(faces);

if ( nbheads == j ) then

success = success + 1;

end

end

p = success / nb

endfunction

By using this function with different values of j, we can easily determine the value of j which maximizes this probability. The following session performs a loop for j = 0, 1, . . . , 10 and prints the estimated probability for each value of j.

-->for j = 0:10

--> p = tossingcoin(j);

--> mprintf("P(j=%d)=%f\n",j,p);

-->end

P(j=0)=0.000800

P(j=1)=0.009700

P(j=2)=0.043900

P(j=3)=0.119600

P(j=4)=0.206800

P(j=5)=0.250700

P(j=6)=0.200000

P(j=7)=0.114400

P(j=8)=0.043700

P(j=9)=0.008800

P(j=10)=0.001600

We see that j = 5 maximizes the probability, which corresponds to the fact that the most probable number of heads in 10 tosses of a coin is 5. Indeed, this matches the value jm = [p(n + 1)] = [11/2] = 5 that we just found by theory.

Answer of Exercise 2.7 (Binomial coefficients and Pascal's triangle) Let a, b be two real numbers and let n be a positive integer. Let us prove the binomial theorem, which states that

\[ (a+b)^n = \sum_{j=0}^{n} \binom{n}{j} a^j b^{n-j}. \tag{290} \]

We expand the formula (a + b)^n and get

\[ (a+b)^n = (a+b)(a+b)\ldots(a+b). \tag{291} \]

The first term of this product is $a^n$, the second term is $a^{n-1}b$, and so forth, until the last term $b^n$. The expansion can then be written as the sum of terms $a^j b^{n-j}$, with j = 0, . . . , n, where each term is associated with a coefficient that we have to compute. Consider the term $a^j b^{n-j}$ and let us count the number of times this term appears in the expansion. This is equivalent to choosing j elements from a set of n elements. Indeed, the order of the elements does not count, since ab = ba. Therefore, each term $a^j b^{n-j}$ appears $\binom{n}{j}$ times, which concludes the proof.

The binomial coefficients $\binom{n}{j}$ can be written in a triangle, where each row corresponds to n and each column corresponds to j, as in the following array:

L =
    1.
    1.    1.
    1.    2.    1.
    1.    3.    3.    1.
    1.    4.    6.    4.    1.                    (292)

Let us use the binomial theorem in order to prove that the sum of the terms in the n-th row is 2^n. We apply the binomial theorem with a = b = 1 and get

\[ (1+1)^n = \sum_{j=0}^{n} \binom{n}{j} \tag{293} \]
\[ = 2^n. \tag{294} \]

Let us prove that if the terms are added with alternating signs, then the sum is zero. We apply the binomial theorem with a = 1 and b = −1 and we get

\[ (1-1)^n = \sum_{j=0}^{n} \binom{n}{j} (1)^j (-1)^{n-j} \tag{295} \]
\[ = (1)^0(-1)^{n}\binom{n}{0} + (1)^1(-1)^{n-1}\binom{n}{1} + \ldots + (1)^n(-1)^{0}\binom{n}{n} \tag{296} \]
\[ = (-1)^{n}\binom{n}{0} + (-1)^{n-1}\binom{n}{1} + \ldots + \binom{n}{n}. \tag{297} \]

This last equality proves that, if the sum begins with the last term, alternating the signs of the terms leads to a zero sum. Additionally, we have $\binom{n}{j} = \binom{n}{n-j}$, so that Pascal's triangle has a symmetry property. If we use this symmetry property in the binomial expansion, we have

\[ (a+b)^n = \sum_{j=0}^{n} \binom{n}{n-j} a^{n-j} b^{j}. \tag{298} \]

We use this equation with a = 1 and b = −1 and we get

\[ (1-1)^n = \sum_{j=0}^{n} \binom{n}{n-j} (1)^{n-j}(-1)^{j} \tag{299} \]
\[ = (1)^n(-1)^0\binom{n}{0} + (1)^{n-1}(-1)^1\binom{n}{1} + \ldots + (1)^0(-1)^n\binom{n}{n} \tag{300} \]
\[ = \binom{n}{0} - \binom{n}{1} + \ldots + (-1)^n\binom{n}{n}. \tag{301} \]

This last equality proves that, if the sum begins with the first term, alternating the signs of the terms leads to a zero sum.

Binomial coefficients can also be represented in a matrix called Pascal's matrix, where the binomial coefficients are stored in the anti-diagonals of the matrix. The following matrix is Pascal's matrix of order 5:

S =
    1.    1.    1.    1.    1.
    1.    2.    3.    4.    5.
    1.    3.    6.   10.   15.
    1.    4.   10.   20.   35.
    1.    5.   15.   35.   70.                    (302)

Let us design a Scilab script to compute Pascal's triangle and Pascal's matrix and check that we find the same results as presented in 292 and 302. The following pascallow function computes Pascal's lower triangular matrix of order n. It is directly based on the nchoosek function already defined in section 2.13.


function c = pascallow (n)

c = zeros(n,n);

for i = 1:n

c(i,1:i) = nchoosek (i-1,(1:i)-1);

end

endfunction

In the following session, we use the pascallow function to check that we get the matrix presentedin 292.

-->pascallow (5)

 ans  =

    1.    0.    0.    0.    0.

    1.    1.    0.    0.    0.

    1.    2.    1.    0.    0.

    1.    3.    3.    1.    0.

    1.    4.    6.    4.    1.

In order to compute Pascal's symmetric matrix, some algebra is required so that the matrix elements Sij are filled anti-diagonal by anti-diagonal, where an anti-diagonal is associated with a constant sum i + j. The following function computes Pascal's symmetric matrix of order n.

function c = pascalsym (n)

c = zeros(n,n);

for i = 1:n

c(i,1:n) = nchoosek (i+(1:n)-2,i-1);

end

endfunction

The following session shows a sample use of this function in order to check that we get the sameresult as presented in equation 302.

-->pascalsym (5)

ans =

1. 1. 1. 1. 1.

1. 2. 3. 4. 5.

1. 3. 6. 10. 15.

1. 4. 10. 20. 35.

1. 5. 15. 35. 70.

We can additionally define Pascal's upper triangular matrix, as in the following function.

function c = pascalup ( n )

c = zeros(n,n);

for i = 1:n

c(i,i:n) = nchoosek ( (i:n)-1 , i-1 );

end

endfunction

In the following session, we compute Pascal’s upper triangular matrix of order 5.

-->pascalup ( 5 )

ans =

1. 1. 1. 1. 1.

0. 1. 2. 3. 4.

0. 0. 1. 3. 6.

0. 0. 0. 1. 4.

0. 0. 0. 0. 1.

The lower, upper and symmetric Pascal matrices are related by the equality L * U = S, as shown in the following session.


-->L = pascallow ( 5 );

-->U = pascalup ( 5 );

-->S = pascalsym ( 5 );

-->L * U - S

ans =

0. 0. 0. 0. 0.

0. 0. 0. 0. 0.

0. 0. 0. 0. 0.

0. 0. 0. 0. 0.

0. 0. 0. 0. 0.
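The identity L * U = S can also be checked entrywise. Since the upper matrix satisfies U(i,j) = L(j,i), the (i,j) entry of the product L * U is given by a Vandermonde convolution, which is the same identity that appears in exercise 2.8:

```latex
(LU)_{ij} = \sum_{k \geq 1} \binom{i-1}{k-1}\binom{j-1}{k-1}
          = \binom{i+j-2}{i-1} = S_{ij}.
```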

Answer of Exercise 2.8 (Binomial identity) Let us prove the following binomial identity:

\[ \binom{2n}{n} = \sum_{j=0}^{n} \binom{n}{j}^2. \tag{303} \]

To help ourselves prove this result, we consider a set A with 2n elements, where n elements are red and n elements are blue. Let us compute the number of ways to choose n elements in this set. For example, we consider the case n = 3, so that the set is

\[ A = \{R_1, R_2, R_3, B_1, B_2, B_3\}. \tag{304} \]

We are searching for subsets $A_i \subset A$ of size n, where i = 1, . . . , imax, and imax is the positive integer to be computed. To organize our computation, we order the subsets depending on the number of red balls in the subset. The only possible subset of size 3 with no red element is

\[ A_1 = \{B_1, B_2, B_3\}. \tag{305} \]

The following is the list of all possible subsets of size 3 with 1 red element (and, therefore, 2 blue elements):

\[ A_2 = \{R_1, B_1, B_2\}, \quad A_3 = \{R_2, B_1, B_2\}, \quad A_4 = \{R_3, B_1, B_2\}, \tag{306} \]
\[ A_5 = \{R_1, B_2, B_3\}, \quad A_6 = \{R_2, B_2, B_3\}, \quad A_7 = \{R_3, B_2, B_3\}, \tag{307} \]
\[ A_8 = \{R_1, B_1, B_3\}, \quad A_9 = \{R_2, B_1, B_3\}, \quad A_{10} = \{R_3, B_1, B_3\}. \tag{308} \]

The other subsets can be computed with the same method, so that we finally find that there are, indeed, $\binom{6}{3} = 20$ subsets. Therefore, we write the term $\binom{2n}{n}$ as a sum over the number j of red elements, where 0 ≤ j ≤ n. We have

\[ \binom{2n}{n} = \sum_{j=0}^{n} C_j, \tag{309} \]

where $C_j$ is the number of subsets of A with n elements, of which j elements are red. There are $\binom{n}{j}$ ways to choose the j red elements and $\binom{n}{n-j}$ ways to choose the n − j blue elements, so that $C_j = \binom{n}{j}\binom{n}{n-j}$. But the symmetry property of the binomial coefficients states that $\binom{n}{j} = \binom{n}{n-j}$, which leads to

\[ C_j = \binom{n}{j}^2. \tag{310} \]

Finally, the previous equality can be plugged into 309, so that the equality 303 holds true, which concludes the proof.
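The identity 303 can be checked for n = 3:

```latex
\sum_{j=0}^{3} \binom{3}{j}^2 = 1 + 9 + 9 + 1 = 20 = \binom{6}{3}.
```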


Answer of Exercise 2.9 (Earthquakes and predictions) Assume that a person predicts the dates of major earthquakes (with magnitude larger than 6.5, or with a large number of deaths, etc.) in the world during 3 years, i.e. in a period of 1096 days. Assume that the "specialist" predicts 169 earthquakes. Assume that, during the same period, 196 major earthquakes really occur, so that 33 earthquakes were correctly predicted by the "specialist". What is the probability that the earthquakes are predicted by chance?

We consider the set of m = 1096 days, where k = 196 days are earthquake days and m − k = 1096 − 196 = 900 are not. In this set, we are picking n = 169 days, where x = 33 days are earthquake days.

The probability of selecting x earthquake days is given by the hypergeometric distribution function, defined by

\[ P(X = x) = h(x, m, k, n) = \frac{\binom{k}{x}\binom{m-k}{n-x}}{\binom{m}{n}}. \tag{311} \]

In our particular situation, we have

\[ P(X = 33) = \frac{\binom{196}{33}\binom{1096-196}{169-33}}{\binom{1096}{169}} \tag{312} \]
\[ \approx 0.0705625, \tag{313} \]

which is approximately 7%.

To know whether the prediction is based on chance, we compute, with Scilab, all the probabilities for x = 0, 1, . . . , 169. The following script computes the required probability and draws the plot presented in figure 31.

// The number of days in three years

m = 1096

// The number of days selected

n = 169

// The number of earthquake days in three years

k = 196

// The number of earthquake days selected

x = 33

// The probability of picking 169 days , where 33 are earthquakes.

p = hygepdf ( x , m , k , n )

// Plot the distribution

xdata = zeros(n+1,1);

pdata = zeros(n+1,1);

for i = 0:n

xdata(i+1) = i;

pdata(i+1) = hygepdf ( i , m , k , n );

end

plot(xdata ,pdata)

f=gcf();

f.children.title.text="Probability of random predictions";

f.children.x_label.text="Number of earthquakes";

f.children.y_label.text="Probability";

Obviously, the prediction of the "specialist" appears to be a random choice, since the number of correctly predicted earthquake days lies close to the most probable value of the hypergeometric distribution.



Figure 31: Probability of having x earthquake days while choosing n = 169 daysfrom m = 1096 days, where k = 196 days are earthquake days.

Answer of Exercise 2.10 (Log-factorial function) The following Scilab implementation is adapted from a Matlab source code written by John Burkardt [3]. The factoriallog_sum function uses the sum and log functions to perform a fast computation of fln(n) = log(n!) from equation 193.

function value = factoriallog_sum ( n )

value = sum(log(2:n));

endfunction

The previous implementation has the advantage of relying only on the logarithm function, but it has several drawbacks. The factoriallog_sum function does not accept matrix input arguments. Moreover, it requires O(n) memory for the intermediate vector log(2:n), making the computation intractable for values of n larger than about 10^5. Finally, the function might produce inaccurate results for large values of n, because of the accumulation of rounding errors in the sum.
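Since n! = Γ(n + 1), an alternative is to rely on Scilab's built-in gammaln function, which accepts matrix arguments, needs no O(n) intermediate vector, and stays accurate for large n. This is a sketch of that alternative, not part of the original exercise.

```scilab
// Log-factorial via the log-gamma function: log(n!) = gammaln(n+1).
// Works elementwise on matrix input, unlike factoriallog_sum.
function value = factoriallog_gamma ( n )
    value = gammaln(n+1)
endfunction
```

For example, factoriallog_gamma(10) and sum(log(2:10)) both evaluate log(3628800) ≈ 15.1044.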

References

[1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions withFormulas, Graphs, and Mathematical Tables. Dover Publications Inc., 1972.

[2] George E. Andrews, Richard Askey, and Ranjan Roy. Special Functions. Cam-bridge University Press, Cambridge, 1999.

[3] John Burkardt. Probability density functions. http://people.sc.fsu.edu/~burkardt/m_src/prob/prob.html.

[4] Georges Charpak and Henri Broch. Devenez sorciers, devenez savants. OdileJacob, 2002.


[5] Richard Durstenfeld. Algorithm 235: Random permutation. Commun. ACM,7(7):420, 1964.

[6] J. Glaz and N. Balakrishnan. Scan Statistics and Applications. Birkhäuser Boston, 1999.

[7] Charles M. Grinstead and J. Laurie Snell. Introduction to Probability, Second Edition. American Mathematical Society, 1997.

[8] J. M. Hammersley and D. C. Handscomb. Monte Carlo Methods. Chapmanand Hall, 1964.

[9] E. Janvresse and T. de la Rue. La loi des séries noires. La Recherche, 393:52–53, Janvier 2006. http://www.univ-rouen.fr/LMRS/Persopage/Janvresse/Publi/series_noires.pdf.

[10] D. E. Knuth. The Art of Computer Programming, Volume 2, SeminumericalAlgorithms. Third Edition, Addison Wesley, Reading, MA, 1998.

[11] M. Loeve. Probability Theory I, 4th Edition. Springer, 1963.

[12] Michael A. Malcolm and Cleve B. Moler. Urand: a universal random numbergenerator. Technical report, Stanford University, Stanford, CA, USA, 1973.

[13] Lincoln E. Moses and Robert V. Oakford. Tables of random permutations. Stanford University Press, Stanford, Calif., 1963.

[14] World Health Organization. Life table, United States of America, 2009. http://www.who.int/.

[15] Dmitry Panchenko. Introduction to probability and statistics. Course 18.05,Lecture #2, Properties of Probability, Finite Sample Spaces, Some Combina-torics, Lectures notes taken by Anna Vetter, 2005.

[16] Wolfram Research. Wolfram alpha. http://www.wolframalpha.com.

[17] Sheldon M. Ross. Introduction to probability and statistics for engineers and scientists. John Wiley and Sons, 1987.

[18] Pascal Sebah and Xavier Gourdon. Introduction to the gamma function. http://www.profesores.frc.utn.edu.ar/electronica/analisisdeseniales/aplicaciones/Funcion_Gamma.pdf.

[19] Edmund Taylor Whittaker and George Neville Watson. A Course of ModernAnalysis. Cambridge Mathematical Library, 1927.

[20] Wikipedia. Poker probability — wikipedia, the free encyclopedia, 2009.

[21] A. Talha Yalta. The accuracy of statistical distributions in microsoft excel 2007.Comput. Stat. Data Anal., 52(10):4579–4586, 2008.


Index

Bernoulli, 30
binomial, 25
combination, 25
combinatorics, 11
complementary, 2
conditional
    distribution function, 9
    probability, 9
disjoint, 2
event, 3
factorial, 13, 21
fair die, 8
gamma, 15
grand, 34
intersection, 2
outcome, 3
permutation, 12
permutations, 23
poker, 28
rand, 34
random, 3
rank, 28
sample space, 3
seed, 35
subset, 2
suit, 28
tree diagram, 12
uniform, 8
union, 2
Venn, 2