
STAT/MATH 394 A - PROBABILITY I – UW Autumn Quarter 2016 Néhémy Lim

Chapter 4 : Discrete Random Variables

1 Random variables

Example. We randomly select three people that attend a tennis match between Roger Federer and Novak Djokovic. We assume that each person in the sample is either a Federer fan or a Djokovic fan, but not a fan of both. A Federer fan is denoted by F and a Djokovic fan by D. This experiment yields the following sample space:

Ω = {DDD, DDF, DFD, DFF, FDD, FDF, FFD, FFF}

Note that we can assign a specific number to each of the outcomes in the sample space Ω. Indeed, we can define some variable X that represents the number of Federer fans in the sample. For instance, we associate the outcome DFD with the value X = 1. Conversely, for a given range of values of X, we can associate some event on Ω. For instance, {X ≥ 2} corresponds to the event E that at least 2 people in the sample are Federer fans: E = {DFF, FDF, FFD, FFF}. The variable X that we have defined is called a random variable.

The following discussion on the Borel σ-algebra on R is not crucial for the understanding of the chapter. So feel free to skip it and go directly to Definition 1.2.

Technical details on the Borel σ-algebra on R. Formally, the set of values that a random variable can take on should be endowed with a σ-algebra since the set of all events A is itself a σ-algebra. In this course, we will always consider random variables that take values in R. In this case, the σ-algebra that we will work with is called the Borel σ-algebra on R and is defined as follows.

Definition 1.1 (Borel σ-algebra on R). The Borel σ-algebra on R is the smallest (in the sense of set inclusion) σ-algebra that contains all open intervals in R. The Borel σ-algebra on R is denoted B(R).

Reminder: Open intervals in R are of three kinds :

• (a, b), with a, b ∈ R;

• (a,∞), with a ∈ R;

• (−∞, b), with b ∈ R

Proposition 1.1. B(R) also contains all closed intervals.


Definition 1.2 (Random variable). Let (Ω, A) be a measurable space of events on the sample space Ω. A real-valued random variable (rrv) X is a function with domain Ω, i.e. X : Ω → R, such that for any Borel set B ∈ B(R):

{ω ∈ Ω | X(ω) ∈ B} ∈ A (1)

The event {ω ∈ Ω | X(ω) ∈ B} is simply denoted {X ∈ B}.

Notations.

• X(Ω) denotes all the possible values that the random variable X can take on. For instance, in the first example, X(Ω) = {0, 1, 2, 3}.

For a, b ∈ R, we have the following commonly used notations :

• {X ∈ (a, b)} is denoted {a < X < b}

• {X ∈ [a, b)} is denoted {a ≤ X < b}

• {X ∈ (a, b]} is denoted {a < X ≤ b}

• {X ∈ [a, b]} is denoted {a ≤ X ≤ b}

• {X ∈ (−∞, b)} is denoted {X < b}

• {X ∈ (−∞, b]} is denoted {X ≤ b}

• {X ∈ (a, ∞)} is denoted {X > a}

• {X ∈ [a, ∞)} is denoted {X ≥ a}

• {X ∈ {a}} is denoted {X = a}

Definition 1.3 (Discrete random variable). A real-valued random variable X is said to be discrete if X can take :

• either a finite number of values : X(Ω) = {xi ∈ R, i = 1, . . . , n} for a given n ∈ N, n ≥ 1;

• or a countably infinite number of values : X(Ω) = {xi ∈ R, i ∈ I} for a given subset I ⊆ N.

Examples.

• X : the number of times you flip a coin until you get a heads. X is discrete since X(Ω) = {1, 2, . . .}.

• Y : time spent working on STAT394, in hours. Y can take any positive value and these values cannot be listed or indexed. Y takes an uncountably infinite number of values : Y (Ω) = [0, ∞). Thus Y is not discrete. (We will see in the next chapter that Y is a continuous rrv.)

We focus on discrete real-valued random variables in this chapter.

2 Probability Mass Function and Distribution Function

Definition 2.1 (Probability mass function). Let X be a discrete rrv on probability space (Ω, A, P) that takes values on X(Ω). The probability mass function (pmf) pX of X is the function with domain X(Ω) defined by :

pX(x) = P(X = x), for x ∈ X(Ω) (2)


Example. We now consider a slight variation of the first example. In a stadium, 10,000 people attend a tennis match between Roger Federer and Novak Djokovic. Select three people at random. A person is either a Federer fan or a Djokovic fan, but not a fan of both. 8,000 are Federer fans and 2,000 are Djokovic fans. What is the sample space of this experiment?

Assuming a uniform probability measure, compute the probabilities of the following events :

• F0 : “none of the three fans likes Federer”

• F1 : “exactly one of the three fans likes Federer”

• F2 : “exactly two of the three fans like Federer”

• F3 : “all of the three fans like Federer”

Representations of a discrete random variable. A discrete rrv can either be represented by:

• a table

• a bar graph or histogram

Below are the representations for some random variable X with probability mass function pX given by:

• pX(0) = 0.008

• pX(1) = 0.096

• pX(2) = 0.384

• pX(3) = 0.512

x       0       1       2       3
pX(x)   0.008   0.096   0.384   0.512

Table 1: Tabular form of pX
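These four values are what the Federer/Djokovic probabilities work out to when each selected person is a Federer fan with probability 0.8, independently of the others. As a quick sanity check, here is a small Python sketch (added here, not part of the original notes) that enumerates the eight outcomes of Ω and tabulates the pmf of X; the independence assumption is ours:

# Tabulate the pmf of X = number of Federer fans among three people,
# assuming each person is independently a Federer fan with probability 0.8.
from itertools import product

p_fan = 0.8
pmf = {x: 0.0 for x in range(4)}
for outcome in product("FD", repeat=3):          # the 8 outcomes of Omega
    prob = 1.0
    for person in outcome:
        prob *= p_fan if person == "F" else 1 - p_fan
    pmf[outcome.count("F")] += prob

print(pmf)                 # ≈ {0: 0.008, 1: 0.096, 2: 0.384, 3: 0.512}
print(sum(pmf.values()))   # ≈ 1.0, so pX is a valid pmf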

Theorem 2.1 (Distribution of a random variable). A probability mass function completely determines the distribution of a discrete real-valued random variable.

We will see other functions that completely determine the distribution of a random variable.

Definition 2.2 (Identically distributed random variables). Two discrete real-valued random variables X and Y that have exactly the same probability mass functions (same values on all points of their domain) are said to be identically distributed.


[Figure 1: Histogram of pX — a bar graph over x = 0, 1, 2, 3 with bar heights 0.008, 0.096, 0.384 and 0.512.]

Be careful ! If X and Y are identically distributed, this does not imply that X and Y are equal.

Example. Consider the experiment of tossing a fair coin three times. Define the random variables X as the number of heads observed and Y as the number of tails observed. Prove that X and Y are identically distributed but are not equal.

Property 2.1. Let X be a discrete rrv on probability space (Ω, A, P) that takes values on X(Ω) with pmf pX , then for any subset B ⊆ X(Ω)

P(X ∈ B) = ∑_{x ∈ B} pX(x)

Property 2.2. Let X be a discrete rrv with pmf pX , then the following holds :

• pX(x) ≥ 0, for x ∈ X(Ω)

• ∑_{x ∈ X(Ω)} pX(x) = 1

Proof.

• It is obvious that pX(x) = P(X = x) ≥ 0 since P is a probability measure ;

• ∑_{x ∈ X(Ω)} pX(x) = P(X ∈ X(Ω)) = P(Ω) = 1

Definition 2.3. A real-valued function p with countable domain B ⊂ R is said to be a valid pmf if the following holds :

• p(x) ≥ 0, for x ∈ B

• ∑_{x ∈ B} p(x) = 1


This means that if p is a valid pmf, then there exists some discrete rrv X that admits p as its pmf.

Example. Let p(x) = cx², for x = 1, 2, 3. Determine the constant c so that the function p satisfies the conditions of being a valid probability mass function.

• The first condition implies that c is nonnegative, since p(x) ≥ 0 and x² ≥ 0 ;

• The second condition is that p should sum to 1 :

∑_{x=1}^{3} p(x) = 1 ⇔ c · 1² + c · 2² + c · 3² = 1
                    ⇔ c(1² + 2² + 3²) = 1
                    ⇔ c = 1/(1² + 2² + 3²) = 1/14
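A quick check of this computation with exact rational arithmetic (a small added sketch, not in the original notes):

# Verify c = 1/14 makes p a valid pmf, using exact fractions.
from fractions import Fraction

c = Fraction(1, 1**2 + 2**2 + 3**2)      # c = 1/14
p = {x: c * x**2 for x in (1, 2, 3)}
assert sum(p.values()) == 1              # the pmf sums to exactly 1
print(c, p)                              # 1/14 {1: 1/14, 2: 2/7, 3: 9/14}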

Example. Determine the constant c so that the following function p satisfies the conditions of being a valid probability mass function :

p(x) = c (1/4)^x, for x ∈ N, x ≥ 1

Hint: If (un) is a geometric sequence with first term u0 = a and common ratio r, that is un+1 = r·un with −1 < r < 1, then the sum of the geometric series is : ∑_{n=0}^{∞} un = a/(1 − r).
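A numerical sanity check for this exercise (an added sketch that previews the answer, so skip it if you want to work the hint yourself): the series ∑_{x≥1} (1/4)^x equals (1/4)/(1 − 1/4) = 1/3, which forces c = 3.

# Approximate the geometric series numerically and recover c.
s = sum((1 / 4) ** x for x in range(1, 200))   # partial sum; terms vanish fast
print(s)        # ≈ 0.3333, the value of the series
print(1 / s)    # ≈ 3.0, the constant c that makes p sum to 1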

Definition 2.4 (Cumulative distribution function). Let (Ω, A, P) be a probability space. The (cumulative) distribution function (cdf) of a real-valued random variable X is the function FX given by

FX(x) = P(X ≤ x), for all x ∈ R (3)

Property 2.3. Let FX be the distribution function of a random variable X. Following are some properties of FX :

• FX is increasing : x ≤ y ⇒ FX(x) ≤ FX(y)

• lim_{x→∞} FX(x) = 1 and lim_{x→−∞} FX(x) = 0

• FX is càdlàg :

– FX is right continuous : lim_{x↓x0} FX(x) = FX(x0), for x0 ∈ R

– FX has left limits : lim_{x↑x0} FX(x) exists, for x0 ∈ R


Property 2.4. Let X be a discrete rrv that takes its values in X(Ω) and FX be the distribution function of X. Then, FX is piecewise constant and discontinuous at the points x ∈ X(Ω).

Example. Consider the experiment of tossing a fair coin three times. Let X be the number of heads observed. We previously saw that the corresponding probability mass function pX is given by the table :

x       0     1     2     3
pX(x)   1/8   3/8   3/8   1/8

• For instance, FX(2) = P(X ≤ 2) = pX(0) + pX(1) + pX(2) = 1/8 + 3/8 + 3/8 = 7/8

• What is FX(−2)? By definition, it is FX(−2) = P(X ≤ −2), but X cannot take any values below −2. Therefore FX(−2) = 0.

• What is FX(1.74)? By definition, it is FX(1.74) = P(X ≤ 1.74) = pX(0) + pX(1) = 1/8 + 3/8 = 1/2

• What is FX(4)? By definition, it is FX(4) = P(X ≤ 4) = pX(0) + pX(1) + pX(2) + pX(3) = 1/8 + 3/8 + 3/8 + 1/8 = 1. There are no more values beyond 3. So the distribution function remains constant, equal to 1.

Function FX is thus defined as follows :

FX(x) =  0     if x < 0
         1/8   if 0 ≤ x < 1
         1/2   if 1 ≤ x < 2
         7/8   if 2 ≤ x < 3
         1     if x ≥ 3

Below is a plot of the distribution function FX :

[Plot: FX(x) against x — a step function, flat between the jump points x = 0, 1, 2, 3, increasing from 0 to 1.]
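The cdf of a discrete rrv is obtained by accumulating the pmf over all values not exceeding x. A small Python sketch (added here, not part of the original notes) mirroring the bullet computations above:

# Step-function cdf for the three-coin-toss example, built from the pmf.
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

def F(x):
    """F_X(x) = P(X <= x): sum the pmf over all values <= x."""
    return sum(p for value, p in pmf.items() if value <= x)

for x in (-2, 1.74, 2, 4):
    print(x, F(x))    # -2 -> 0, 1.74 -> 0.5, 2 -> 0.875, 4 -> 1.0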

Theorem 2.2. A distribution function completely determines the distribution of a real-valued random variable.


3 Mathematical Expectation

Example. Toss a fair, six-sided die many times, say 100,000 times. How do you compute the average (or mean) of the tosses?

Assume that the first elements of the resulting sequence are : 1, 5, 4, 1, 2, 3, 6, 2, 5, 4, 2, . . . Then the mean would be :

(1 + 5 + 4 + 1 + 2 + . . . + 3) / 100,000
= (1 + 1 + . . . + 1)/100,000 + (2 + 2 + . . . + 2)/100,000 + (3 + 3 + . . . + 3)/100,000
  + (4 + 4 + . . . + 4)/100,000 + (5 + 5 + . . . + 5)/100,000 + (6 + 6 + . . . + 6)/100,000
= 1 · (number of 1s obtained)/100,000 + 2 · (number of 2s obtained)/100,000 + 3 · (number of 3s obtained)/100,000
  + 4 · (number of 4s obtained)/100,000 + 5 · (number of 5s obtained)/100,000 + 6 · (number of 6s obtained)/100,000
= 1 · frequency of 1 + 2 · frequency of 2 + 3 · frequency of 3
  + 4 · frequency of 4 + 5 · frequency of 5 + 6 · frequency of 6

Remarks:

• In reality, only in the long run will (roughly) one-sixth of the tosses equal each value xi ∈ {1, . . . , 6} : frequencies converge to probabilities.

• The mean is an average of the values weighted by their respective individual frequencies.

Definition 3.1 (Expected Value). Let X be a discrete rrv that takes its values in X(Ω). Let pX be the associated pmf. If ∑_{x ∈ X(Ω)} x·pX(x) is absolutely convergent, i.e. ∑_{x ∈ X(Ω)} |x|·pX(x) < ∞, then we say that X is integrable. In that case, the mathematical expectation (or expected value or mean) of X exists, is denoted by E[X] and is defined as follows :

E[X] = ∑_{x ∈ X(Ω)} x·pX(x) (4)

Definition 3.2 (Expected Value of a Function of a Random Variable). Let X be a discrete rrv that takes its values in X(Ω). Let pX be the associated pmf and let g : X(Ω) → R be a piecewise continuous function. If the random variable g(X) is integrable, then the mathematical expectation of g(X) exists, is denoted by E[g(X)] and is defined as follows :

E[g(X)] = ∑_{x ∈ X(Ω)} g(x)·pX(x) (5)


Example. A roulette wheel contains 38 numbers: zero (0), double zero (00), and the numbers 1, 2, 3, . . . , 36. Let X denote the number on which the ball lands and g(X) denote the amount of money paid to the gambler, such that:

• g(X) = $5 if X = 0

• g(X) = $10 if X = 00

• g(X) = $1 if X is even

• g(X) = $2 if X is odd

If I run a casino, how much would I have to charge each gambler to play in order to ensure that I made some money?
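The break-even charge is the expected payout E[g(X)] under the uniform distribution on the 38 pockets. A hedged sketch of that computation (added, not part of the original notes):

# Expected payout per spin, each of the 38 pockets equally likely.
from fractions import Fraction

pockets = ["0", "00"] + [str(i) for i in range(1, 37)]

def payout(pocket):
    if pocket == "0":
        return 5
    if pocket == "00":
        return 10
    return 1 if int(pocket) % 2 == 0 else 2

expected = sum(Fraction(1, 38) * payout(p) for p in pockets)
print(expected, float(expected))   # 69/38 ≈ 1.82

So the casino must charge more than about $1.82 per play to make money on average.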

Example. Let X be a discrete rrv with the following probability mass function (a numerical sketch follows the two questions below):

pX(x) = c/x², for x ∈ N, x ≥ 1

1. Determine the constant c.

2. What is the expected value of X?
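An added numerical sketch that previews both answers (skip it to work the exercise yourself): since ∑_{x≥1} 1/x² = π²/6 (the Basel problem), the constant is c = 6/π²; and since ∑_{x≥1} x · c/x² = c ∑_{x≥1} 1/x is a harmonic series, the partial sums grow without bound, so E[X] does not exist.

# Watch the partial sums of sum_x x * pX(x) = c * sum_x 1/x diverge.
import math

c = 6 / math.pi**2
for n in (10**2, 10**4, 10**6):
    partial = c * sum(1 / x for x in range(1, n + 1))
    print(n, partial)    # keeps growing, roughly like c * ln(n)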

Example. Suppose the pmf pX of a discrete random variable X is given by:

xi        0     1     2     3
pX(xi)    0.2   0.1   0.4   0.3

1. What is E[2]?

2. What is E[X]?

3. What is E[2X]?

Property 3.1. Let X be a discrete rrv that takes its values in X(Ω) with pmf pX .

• for all c ∈ R, E[c] = c

• If c ∈ R and g : X(Ω) → R is a piecewise continuous function such that g(X) is integrable, then we have :

E[cg(X)] = cE[g(X)] (6)

• If g : X(Ω) → R is a nonnegative piecewise continuous function such that g(X) is integrable, then we have :

E[g(X)] ≥ 0 (7)


• If g1 : X(Ω) → R and g2 : X(Ω) → R are piecewise continuous functions such that g1(X) and g2(X) are integrable and g1 ≤ g2, then we have :

E[g1(X)] ≤ E[g2(X)] (8)

Proof.

• Let us first show that for all c ∈ R, E[c] = c. Here, we consider the function g equal to the constant c :

E[c] = ∑_{x ∈ X(Ω)} c·pX(x) = c · ∑_{x ∈ X(Ω)} pX(x) = c

since ∑_{x ∈ X(Ω)} pX(x) = 1, pX being a pmf.

• Now, let us consider a piecewise continuous function g : X(Ω) → R such that g(X) is integrable :

E[c·g(X)] = ∑_{x ∈ X(Ω)} c·g(x)·pX(x) = c · ∑_{x ∈ X(Ω)} g(x)·pX(x) = c·E[g(X)]

• The last two statements are left as an exercise.

Example. Let us return to the same previous discrete random variable X:

x       0     1     2     3
pX(x)   0.2   0.1   0.4   0.3

1. What is E[X2]?

2. What is E[2X + 3X2]?

Property 3.2. Let X be a discrete rrv that takes its values in X(Ω) with pmf pX . If c1, c2 ∈ R and g1 : X(Ω) → R and g2 : X(Ω) → R are piecewise continuous functions such that g1(X) and g2(X) are integrable, then we have :

E[c1g1(X) + c2g2(X)] = c1E[g1(X)] + c2E[g2(X)] (9)

Proof. Do it by yourself !

Example. Using results from the previous example (a computational sketch follows these questions),

1. What is E[4X2]?

2. What is E[3X + 2X2]?
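A small helper for all four questions above (an added sketch; the function E here is illustrative, not part of the notes). It applies Definition 3.2 directly to the tabulated pmf, and the linearity answers can be cross-checked against Property 3.2:

# Evaluate E[g(X)] from a tabulated pmf.
pmf = {0: 0.2, 1: 0.1, 2: 0.4, 3: 0.3}

def E(g):
    return sum(g(x) * p for x, p in pmf.items())

print(E(lambda x: x**2))             # E[X^2]       = 4.4
print(E(lambda x: 2*x + 3*x**2))     # E[2X + 3X^2] = 16.8
print(E(lambda x: 4*x**2))           # E[4X^2]      = 17.6
print(E(lambda x: 3*x + 2*x**2))     # E[3X + 2X^2] = 14.2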


4 Variance

Motivating example. Consider two discrete random variables X and Y with respective probability mass functions pX and pY . Here are their respective tabular forms :

x       3     4     5
pX(x)   0.3   0.4   0.3

x       1     2     6     8
pY (x)  0.4   0.1   0.3   0.2

Show that the mean of X and the mean of Y are the same. Draw bar graphs corresponding to the two pmfs and observe the variability of the two distributions.

Definition 4.1 (Variance – Standard Deviation). Let X be a real-valued random variable. When E[X²] exists, the variance of X is defined as follows :

Var(X) = E[(X − E[X])²] (10)

Var(X) is sometimes denoted σ²X. The positive square root of the variance is called the standard deviation of X, and is denoted σX . That is:

σX = √Var(X) (11)

Let us return to the previous example. What is the variance and standard deviation of X? How does it compare to the variance and standard deviation of Y ?

As you can see, the expected variation in the random variable Y , as quantified by its variance and standard deviation, is much larger than the expected variation in the random variable X. Given the pmfs of the two random variables, this result should not be surprising.

Property 4.1. The variance of a real-valued random variable X satisfies the following properties :

• Var(X) ≥ 0

• If a, b ∈ R are two constants, then Var(aX + b) = a2Var(X)

Proof.

• The nonnegativity of the variance comes from the fact that the function g defined by g(x) = (x − E[X])² is nonnegative.


• Let a, b ∈ R be two constants, then we have :

Var(aX + b) = E[(aX + b − E[aX + b])²]
            = E[(aX + b − (aE[X] + b))²]
            = E[(aX − aE[X])²]
            = E[(a(X − E[X]))²]
            = E[a²(X − E[X])²]
            = a²E[(X − E[X])²]
            = a²Var(X)

The formula for the variance of a discrete random variable can be quite cumbersome to use. There is a slightly easier-to-work-with alternative formula.

Theorem 4.1 (König-Huygens formula). Let X be a real-valued random variable. When E[X²] exists, the variance of X is also given by :

Var(X) = E[X²] − (E[X])² (12)

Proof. We have the following :

Var(X) = E[(X − E[X])²]
       = E[X² − 2XE[X] + (E[X])²]
       = E[X²] − E[2XE[X]] + E[(E[X])²]
       = E[X²] − 2E[X]E[X] + (E[X])²
       = E[X²] − 2(E[X])² + (E[X])²
       = E[X²] − (E[X])²

Example. Use the alternative formula to verify that the variance of the random variable X is 0.6, as we calculated earlier.
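An added sketch applying the König-Huygens formula to both random variables of the motivating example (not part of the original notes):

# Var(X) = E[X^2] - (E[X])^2 for the two pmfs of Section 4.
def mean_var(pmf):
    m = sum(x * p for x, p in pmf.items())
    m2 = sum(x**2 * p for x, p in pmf.items())
    return m, m2 - m**2

print(mean_var({3: 0.3, 4: 0.4, 5: 0.3}))           # ≈ (4.0, 0.6)
print(mean_var({1: 0.4, 2: 0.1, 6: 0.3, 8: 0.2}))   # ≈ (4.0, 8.4)

Both variables have mean 4, but Var(Y) = 8.4 is fourteen times Var(X) = 0.6, quantifying the spread visible in the bar graphs.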

Example. The mean temperature in Victoria, B.C. is 50 degrees Fahrenheit with standard deviation 8 degrees Fahrenheit. What is the mean temperature in degrees Celsius? What is the standard deviation in degrees Celsius?

Recall that the conversion from Fahrenheit (F) to Celsius (C) is:

C = (5/9)(F − 32)
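This is a direct application of E[aX + b] = aE[X] + b and Var(aX + b) = a²Var(X) with a = 5/9 and b = −160/9; a tiny added sketch:

# Mean and sd transform under the affine map C = (5/9)F - 160/9.
a, b = 5 / 9, -160 / 9
mean_F, sd_F = 50, 8
print(a * mean_F + b)    # mean in Celsius: ≈ 10.0
print(abs(a) * sd_F)     # sd in Celsius: ≈ 4.44 (sd scales by |a|, not a²)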


5 Common Discrete Distributions

5.1 Discrete Uniform Distribution

Example. Toss a fair, six-sided die. What is the probability that the die lands on a specific side?

Definition 5.1 (Discrete Uniform Distribution). Let (Ω, A, P) be a probability space and let X be a random variable that can take n ∈ N, n ≥ 1 values on X(Ω) = {x1, . . . , xn}. X is said to have a discrete uniform distribution Un if its probability mass function is given by :

pX(xi) = P(X = xi) = 1/n, for i = 1, . . . , n (13)

Proof. Let us prove that the pmf of a discrete uniform distribution is actually a valid pmf :

• pX(xi) = 1/n ≥ 0, for i = 1, . . . , n ;

• Does the pmf sum to 1?

∑_{i=1}^{n} pX(xi) = ∑_{i=1}^{n} 1/n = 1

Property 5.1 (Mean and Variance for a Discrete Uniform Distribution). If X follows a discrete uniform distribution Un on X(Ω) = {1, . . . , n}, then

• its expected value is given by :

E[X] = (n + 1)/2 (14)

• its variance is given by :

Var(X) = (n² − 1)/12 (15)

Proof.

• Expectation :

E[X] = ∑_{i=1}^{n} i·pX(i) = (1/n) · ∑_{i=1}^{n} i = (1/n) · n(n + 1)/2 = (n + 1)/2

• Variance : Var(X) = E[X²] − (E[X])², where E[X²] is given by :

E[X²] = ∑_{i=1}^{n} i²·pX(i) = (1/n) · ∑_{i=1}^{n} i² = (1/n) · n(n + 1)(2n + 1)/6 = (n + 1)(2n + 1)/6

Therefore, Var(X) = (n + 1)(2n + 1)/6 − (n + 1)²/4 = (n² − 1)/12
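For the fair die (n = 6) these formulas give E[X] = 3.5 and Var(X) = 35/12 ≈ 2.92, which a short added simulation sketch (not in the original notes) confirms empirically:

# Empirical mean and variance of 100,000 fair-die rolls.
import random

random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]
m = sum(rolls) / len(rolls)
v = sum((r - m) ** 2 for r in rolls) / len(rolls)
print(m, v)    # ≈ 3.5 and ≈ 2.92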

5.2 Bernoulli distribution

Example. Select randomly three people that attend a tennis match between Roger Federer and Novak Djokovic. A person is either a Federer fan or a Djokovic fan, but not a fan of both. Assume that each person has a likelihood of p = 80% of being a Federer fan.

Compute the probabilities of the following events :

• E0 : “none of the three fans likes Federer”

• E1 : “exactly one of the three fans likes Federer”

• E2 : “exactly two of the three fans like Federer”

• E3 : “all of the three fans like Federer”

Questions.

• Which assumptions did we make to compute those probabilities?

• What pattern can we identify in the calculations?

Definition 5.2 (Bernoulli process). A Bernoulli or binomial process has the following features :

1. We repeat n ∈ N, n ≥ 1 identical trials

2. A trial can result in only two possible outcomes, that is, a certain event E, called success, occurs with probability p, thus event Ec, called failure, occurs with probability 1 − p

3. The probability of success p remains constant trial after trial. In this case, the process is said to be stationary.

4. The trials are mutually independent.

Are the following experiments Bernoulli processes?

• A coin is weighted in such a way so that there is a 70% chance of getting a head on any particular toss. Toss the coin, in exactly the same way, 100 times.


• A fair coin is tossed until it lands on heads.

• An urn contains 5 white balls and 5 black balls. We draw 6 balls from the urn without replacement. We are interested in the number of black balls drawn.

• 8,000 Federer fans and 2,000 Djokovic fans attend a tennis match. Select three fans randomly. We are interested in the number of Federer fans selected.

Rigorously, the last example is not a Bernoulli process. However, when the sample size n is small in relation to the population size N , the approximation by a Bernoulli process is tolerated.

Definition 5.3 (Bernoulli Distribution). Let (Ω, A, P) be a probability space. Let E ∈ A be an event labeled as success, that occurs with probability p. If the random variable X is the indicator function of event E, that is X = 1 if E occurs and X = 0 if E does not occur, then X is said to have a Bernoulli distribution Ber(p) and its probability mass function is given by :

pX(1) = P(X = 1) = p and pX(0) = P(X = 0) = 1− p (16)

Proof. Let us prove that the pmf of a Bernoulli distribution is actually a valid pmf :

• pX(1) = p ≥ 0 and pX(0) = 1 − p ≥ 0 ;

• Does the pmf sum to 1?

∑_{i=0}^{1} pX(i) = (1 − p) + p = 1

Property 5.2 (Mean and Variance for a Bernoulli Distribution). If X follows a Bernoulli distribution Ber(p), then

• its expected value is given by :

E[X] = p (17)

• its variance is given by :

Var(X) = p(1− p) (18)


Proof.

• Expectation :

E[X] = ∑_{i=0}^{1} i·pX(i) = 0 · (1 − p) + 1 · p = p

• Variance : Var(X) = E[X²] − (E[X])². Note that X² = X for X ∈ {0, 1}. Therefore E[X²] = E[X] = p and

Var(X) = p − p² = p(1 − p)

5.3 Binomial Distribution

Definition 5.4 (Binomial Distribution). Let (Ω, A, P) be a probability space. Let E ∈ A be an event labeled as success, that occurs with probability p. If n ∈ N, n ≥ 1 trials are performed according to a Bernoulli process, then the random variable X defined as the number of successes among the n trials is said to have a binomial distribution Bin(n, p) and its probability mass function is given by :

pX(x) = P(X = x) = C(n, x) p^x (1 − p)^{n−x}, for x = 0, . . . , n (19)

where C(n, x) = n!/(x!(n − x)!) denotes the binomial coefficient.

Proof. Let us prove that the pmf of a binomial distribution is actually a valid pmf :

• pX(x) = C(n, x) p^x (1 − p)^{n−x} ≥ 0, for x = 0, . . . , n ;

• Does the pmf sum to 1?

∑_{x=0}^{n} pX(x) = ∑_{x=0}^{n} C(n, x) p^x (1 − p)^{n−x} = (p + (1 − p))^n = 1 (binomial theorem seen in Chapter 1)

Property 5.3 (Mean and Variance for a Binomial Distribution). If X follows a binomial distribution Bin(n, p), then

• its expected value is given by :

E[X] = np (20)

• its variance is given by :

Var(X) = np(1− p) (21)


Proof.

• Expectation :

E[X] = ∑_{x=0}^{n} x·pX(x)
     = ∑_{x=0}^{n} x·C(n, x) p^x (1 − p)^{n−x}
     = ∑_{x=1}^{n} [n!/((x − 1)!(n − x)!)] p^x (1 − p)^{n−x}
     = ∑_{k=0}^{n−1} [n!/(k!(n − 1 − k)!)] p^{k+1} (1 − p)^{n−1−k}     (setting k = x − 1)
     = np · ∑_{k=0}^{n−1} [(n − 1)!/(k!(n − 1 − k)!)] p^k (1 − p)^{n−1−k}
     = np

since we recognize inside the last sum the pmf of Bin(n − 1, p), which sums to 1.

• Variance : Var(X) = E[X²] − (E[X])². We use the following trick : we subtract and add E[X]. We thus obtain : Var(X) = E[X²] − E[X] + E[X] − (E[X])² = E[X(X − 1)] + E[X] − (E[X])², where E[X(X − 1)] is given by :

E[X(X − 1)] = ∑_{x=0}^{n} x(x − 1)·pX(x)
            = ∑_{x=0}^{n} x(x − 1)·C(n, x) p^x (1 − p)^{n−x}
            = ∑_{x=2}^{n} [n!/((x − 2)!(n − x)!)] p^x (1 − p)^{n−x}
            = ∑_{k=0}^{n−2} [n!/(k!(n − 2 − k)!)] p^{k+2} (1 − p)^{n−2−k}     (setting k = x − 2)
            = n(n − 1)p² · ∑_{k=0}^{n−2} [(n − 2)!/(k!(n − 2 − k)!)] p^k (1 − p)^{n−2−k}
            = n(n − 1)p²

since we recognize inside the last sum the pmf of Bin(n − 2, p). Hence,

Var(X) = n(n − 1)p² + np − (np)² = np(1 − p)


Example. A student attends STAT394 three days a week. Assume that he oversleeps with probability 0.15.

• What is the probability that he misses one class in a week?

• What is the probability that he misses three classes in a month (12 classes)?

• What is the probability that he misses at least two classes in a month?

• What is the probability that he misses four classes in total in a given month if he already missed two classes in that same month?

• How many classes does the instructor of STAT394 expect that student to miss at the end of the quarter (30 classes)? What is the corresponding variance?
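An added sketch for the unconditional questions above, assuming oversleeping on different days is independent so the number of missed classes is Bin(n, 0.15) (the conditional question is left to the independence argument):

# Binomial pmf computations for the oversleeping student.
from math import comb

def binom_pmf(n, p, x):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p = 0.15
print(binom_pmf(3, p, 1))                             # one miss in a week  ≈ 0.325
print(binom_pmf(12, p, 3))                            # three misses/month  ≈ 0.172
print(1 - binom_pmf(12, p, 0) - binom_pmf(12, p, 1))  # at least two/month  ≈ 0.557
print(30 * p, 30 * p * (1 - p))                       # quarter: mean 4.5, variance 3.825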

5.4 Geometric Distribution

Example. I draw a card from a standard deck of 52 cards. If the card I draw is not an ace, I put it back in the deck, shuffle the cards and draw a new card. I repeat the process until I get an ace. What is the probability that I draw an ace for the first time at the fourth trial?

Definition 5.5 (Geometric Distribution). Let (Ω, A, P) be a probability space. Let E ∈ A be an event labeled as success, that occurs with probability p. If all the assumptions of a Bernoulli process are satisfied, except that the number of trials is not preset, then the random variable X defined as the number of trials until the first success is said to have a geometric distribution G(p) and its probability mass function is given by :

pX(x) = P(X = x) = (1 − p)^{x−1} p, for x ∈ N, x ≥ 1 (22)

Proof. Let us prove that the pmf of a geometric distribution is actually a valid pmf :

• pX(x) = (1 − p)^{x−1} p ≥ 0, for x ∈ N, x ≥ 1 ;

• Does the pmf sum to 1?

∑_{x=1}^{∞} pX(x) = ∑_{x=1}^{∞} (1 − p)^{x−1} p = p · 1/(1 − (1 − p)) = 1

Example. I draw a card from a standard deck of 52 cards. If the card I draw is not an ace, I put it back in the deck, shuffle the cards and draw a new card. I repeat the process until I get an ace. What is the probability that I have not drawn an ace yet after 6 trials?


Property 5.4 (Distribution function for a Geometric Distribution). If X follows a geometric distribution G(p), then the distribution function of X is given by :

FX(x) = 1 − (1 − p)^x, for x ∈ N (23)

Proof. For a given x ∈ N, we have that :

FX(x) = P(X ≤ x)
      = 1 − P(X > x)
      = 1 − P(X ≥ x + 1)     (since X can only take on integer values)

Let us calculate P(X ≥ x + 1) :

P(X ≥ x + 1) = ∑_{k=x+1}^{∞} pX(k)
             = ∑_{k=x+1}^{∞} (1 − p)^{k−1} p
             = p · ∑_{k=x+1}^{∞} (1 − p)^{k−1}
             = p · (1 − p)^x / (1 − (1 − p))
             = (1 − p)^x

Hence, the result holds.

Property 5.5 (Mean and Variance for a Geometric Distribution). If X follows a geometric distribution G(p), then

• its expected value is given by :

E[X] = 1/p (24)

• its variance is given by :

Var(X) = (1 − p)/p² (25)

Proof.

• Expectation :

E[X] = ∑_{x=1}^{∞} x·pX(x) = ∑_{x=1}^{∞} x(1 − p)^{x−1} p = p · ∑_{x=1}^{∞} x(1 − p)^{x−1}

Here, we notice that x(1 − p)^{x−1} is actually the derivative of −(1 − p)^x with respect to p. Therefore, we have that :

E[X] = −p · d/dp [ ∑_{x=1}^{∞} (1 − p)^x ]
     = −p · d/dp [ (1 − p)/p ]
     = −p · (−1/p²)
     = 1/p

• As seen in a previous proof, the variance can be written as follows : Var(X) = E[X(X − 1)] + E[X] − (E[X])², where E[X(X − 1)] is given by :

E[X(X − 1)] = ∑_{x=1}^{∞} x(x − 1)·pX(x)
            = ∑_{x=1}^{∞} x(x − 1)(1 − p)^{x−1} p
            = p(1 − p) · ∑_{x=1}^{∞} x(x − 1)(1 − p)^{x−2}

Here, we notice that x(x − 1)(1 − p)^{x−2} is actually the second derivative of (1 − p)^x with respect to p. Therefore, we have that :

E[X(X − 1)] = p(1 − p) · d²/dp² [ ∑_{x=1}^{∞} (1 − p)^x ]
            = p(1 − p) · d²/dp² [ (1 − p)/p ]
            = p(1 − p) · 2/p³
            = 2(1 − p)/p²

Hence,

Var(X) = 2(1 − p)/p² + 1/p − 1/p² = (1 − p)/p²


Example. A representative from the National Football League’s Marketing Division randomly selects people on a random street in Seattle until he finds a person who attended the last home football game. Let p, the probability that he succeeds in finding such a person, equal 0.20. And, let X denote the number of people he selects until he finds his first success.

• What is the probability that the marketing representative must select 4 people before he finds one who attended the last home football game?

• What is the probability that the marketing representative must select more than 6 people before he finds one who attended the last home football game?

• How many people should we expect (in the long run) the marketing representative needs to select before he finds one who attended the last home football game? What is the corresponding variance?
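An added sketch answering these three questions with the geometric pmf, the tail formula of Property 5.4, and the moments of Property 5.5 (here "success" means finding an attendee):

# Geometric distribution computations with p = 0.20.
p = 0.20

def geom_pmf(x):
    return (1 - p) ** (x - 1) * p

print(geom_pmf(4))             # P(X = 4) = 0.8^3 * 0.2 ≈ 0.1024
print((1 - p) ** 6)            # P(X > 6) = (1 - p)^6 ≈ 0.2621
print(1 / p, (1 - p) / p**2)   # E[X] = 5.0 and Var(X) = 20.0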

5.5 Hypergeometric Distribution

Example. A wallet contains three $100 bills and five $1 bills. You randomly choose four bills. What is the probability that you will choose exactly two $100 bills?

Definition 5.6 (Hypergeometric Distribution). Let (Ω, A, P) be a probability space. Let E ∈ A be an event labeled as success. If the experiment consists in drawing a sample of n items, without replacement, from a finite population of size N that contains exactly m successes, then the random variable X defined as the number of successes among the n items drawn is said to have a hypergeometric distribution HG(N, n, m) and its probability mass function is given by :

pX(x) = P(X = x) = C(m, x)·C(N − m, n − x) / C(N, n), for x = 0, . . . , min(n, m) (26)

Remark: As mentioned previously, when the sample size n is small in relation to the population size N , a hypergeometric distribution HG(N, n, m) can be approximated by a binomial distribution Bin(n, m/N).

Verify that approximation in the “Federer/Djokovic” example : 8,000 Federer fans and 2,000 Djokovic fans attend a tennis match. If three fans are randomly selected, compute the exact and approximate probability that two of them are Federer fans.
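An added sketch carrying out that verification (not part of the original notes):

# Exact HG(10000, 3, 8000) versus the Bin(3, 0.8) approximation for
# the probability that two of the three selected people are Federer fans.
from math import comb

N, n, m, x = 10_000, 3, 8_000, 2
exact = comb(m, x) * comb(N - m, n - x) / comb(N, n)
approx = comb(n, x) * 0.8**x * 0.2 ** (n - x)
print(exact, approx)    # ≈ 0.38407 versus 0.384

With n = 3 drawn from N = 10,000, the two answers agree to three decimal places, as the remark predicts.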

Property 5.6 (Mean and Variance for a Hypergeometric Distribution). If X follows a hypergeometric distribution HG(N, n, m), then

• its expected value is given by :

E[X] = n · m/N (27)


• its variance is given by :

Var(X) = [(N − n)/(N − 1)] · n · (m/N) · (1 − m/N) (28)

Example. In Hold’em Poker players make the best hand they can combining the two cards in their hand with the 5 cards (community cards) eventually turned up on the table. The deck has 52 cards and there are 13 of each suit (hearts, clubs, spades, diamonds). Assume a player has 2 clubs in the hand and there are 3 cards showing on the table, 2 of which are also clubs.

• What is the probability that neither of the next two cards turned are clubs?

• What is the probability that one of the next two cards turned is a club?

• What is the probability that both of the next two cards turned are clubs?
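An added hypergeometric sketch for all three questions, under the assumption that the 5 seen cards (2 in hand, 3 on the table) are the only information available, leaving 47 unseen cards of which 9 are clubs:

# Number of clubs among the next two cards ~ HG(47, 2, 9).
from math import comb

unseen, clubs, draw = 47, 9, 2
for x in range(3):
    prob = comb(clubs, x) * comb(unseen - clubs, draw - x) / comb(unseen, draw)
    print(x, prob)    # 0: ≈ 0.650, 1: ≈ 0.316, 2: ≈ 0.033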

5.6 Poisson Distribution

Let the discrete random variable X denote the number of times an event occurs in an interval of time or space. Then X may be a Poisson random variable with x ∈ N. Here is a list of examples of random variables that might obey the Poisson probability law :

• the number of typos on a printed page. (This is an example of an interval of space, the space being the printed page.)

• the number of cars passing through the intersection of 8th Avenue NE and NE 50th St in one minute. (This is an example of an interval of time, the time being one minute.)

• the number of customers at an ATM in 10-minute intervals.

• the number of students arriving during office hours.

Definition 5.7 (Approximate Poisson process). Let X denote the number of events in a given continuous interval. Then X follows an approximate Poisson process with parameter λ > 0 if :

(1) The numbers of events occurring in non-overlapping intervals are independent.

(2) The probability of exactly one event in a short interval of length h = 1/n is approximately λh = λ/n.

(3) The probability of exactly two or more events in a short interval is essentially zero.


From the approximate Poisson process to the Poisson distribution. Let X denote the number of events in a given continuous interval. Assume that X follows an approximate Poisson process with parameter λ > 0. Properties (2) and (3) imply that X obeys a binomial law where the number of trials n corresponds to the number of short intervals in the given continuous interval and the probability of success is p = λ/n, that is the probability of exactly one event in a short interval. Therefore the pmf of X is given by :

P(X = x) = C(n, x) (λ/n)^x (1 − λ/n)^{n−x}

Now, let us see how that pmf behaves when n tends towards infinity, that is when the short intervals get smaller and smaller. To this end, let us rewrite the terms in the pmf :

P(X = x) = [n!/(x!(n − x)!)] · (λ^x/n^x) · (1 − λ/n)^n · (1 − λ/n)^{−x}
         = (λ^x/x!) · [n!/((n − x)! n^x)] · (1 − λ/n)^n · (1 − λ/n)^{−x}
         = (λ^x/x!) · (n/n) · ((n − 1)/n) · · · ((n − x + 1)/n) · (1 − λ/n)^n · (1 − λ/n)^{−x}

• λ^x/x! is a constant with respect to n

• (n/n) · ((n − 1)/n) · · · ((n − x + 1)/n) → 1 · 1 · · · 1 = 1 as n → ∞

• (1 − λ/n)^{−x} → (1 − 0)^{−x} = 1 as n → ∞

• According to a classic result of calculus, (1 − λ/n)^n → e^{−λ} as n → ∞

Hence, we obtain the following result :

lim_{n→∞} P(X = x) = e^{−λ} λ^x / x!

Definition 5.8 (Poisson Distribution). Let (Ω, A, P) be a probability space. A random variable X is said to have a Poisson distribution P(λ), with λ > 0, if its probability mass function is given by :

pX(x) = P(X = x) = e^{−λ} λ^x / x!, for x ∈ N (29)

Proof. Let us prove that the pmf of a Poisson distribution is actually a valid pmf :

• pX(x) = e^{−λ} λ^x / x! ≥ 0, for x ∈ N ;

• Does the pmf sum to 1?

∑_{x=0}^{∞} pX(x) = ∑_{x=0}^{∞} e^{−λ} λ^x / x!
                  = e^{−λ} · ∑_{x=0}^{∞} λ^x / x!     (here we recognize the exponential series)
                  = e^{−λ} e^{λ}
                  = 1

Property 5.7 (Mean and Variance for a Poisson Distribution). If X follows a Poisson distribution P(λ), then

• its expected value is given by :

E[X] = λ (30)

• its variance is given by :

Var(X) = λ (31)

Proof.

• Expectation :

E[X] = ∑_{x=0}^{∞} x·pX(x)
     = ∑_{x=0}^{∞} x·e^{−λ} λ^x / x!
     = e^{−λ} · ∑_{x=1}^{∞} λ^x / (x − 1)!
     = e^{−λ} · ∑_{k=0}^{∞} λ^{k+1} / k!     (setting k = x − 1)
     = λ · e^{−λ} · ∑_{k=0}^{∞} λ^k / k!
     = λ · e^{−λ} e^{λ}
     = λ

• According to a previously used trick, the variance can be written as follows : Var(X) = E[X(X − 1)] + E[X] − (E[X])², where E[X(X − 1)] is given by :

E[X(X − 1)] = ∑_{x=0}^{∞} x(x − 1)·pX(x)
            = ∑_{x=0}^{∞} x(x − 1)·e^{−λ} λ^x / x!
            = e^{−λ} · ∑_{x=2}^{∞} λ^x / (x − 2)!
            = e^{−λ} · ∑_{k=0}^{∞} λ^{k+2} / k!     (setting k = x − 2)
            = λ² · e^{−λ} · ∑_{k=0}^{∞} λ^k / k!
            = λ² · e^{−λ} e^{λ}
            = λ²

Hence,

Var(X) = λ² + λ − λ² = λ

Example. Assume that a professor expects to be asked for an average of 3 recommendations per quarter.

(1) What is the probability that 3 students ask for a recommendation in a given quarter?

(2) What is the probability that the professor is asked for at least 2 recommendations in 2 quarters?
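An added sketch for both questions, under the assumption (ours) that quarterly request counts are independent Poisson variables, so the two-quarter count is Poisson with doubled parameter λ = 6:

# Poisson pmf computations for the recommendation example.
from math import exp, factorial

def pois(lam, x):
    return exp(-lam) * lam**x / factorial(x)

print(pois(3, 3))                   # (1) P(X = 3), lambda = 3: ≈ 0.224
print(1 - pois(6, 0) - pois(6, 1))  # (2) P(X >= 2), lambda = 6: ≈ 0.983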

Example. Five percent of Christmas tree light bulbs manufactured by a company are defective. The company’s Quality Control Manager is quite concerned and therefore randomly samples 100 bulbs coming off of the assembly line. Let X denote the number in the sample that are defective. What is the probability that the sample contains at most three defective bulbs?

Answer. X is a binomial random variable with parameters n = 100 (sample size) and p = 0.05, the probability that a bulb is defective. The desired probability is thus :

P(X ≤ 3) = ∑_{x=0}^{3} pX(x) = ∑_{x=0}^{3} C(100, x) 0.05^x · 0.95^{100−x}


Many standard calculators would have trouble calculating that probability using the pmf. But if you recall the way that we derived the Poisson distribution, it seems reasonable to approximate the binomial distribution with the Poisson distribution whose parameter λ should be the expected number of defective bulbs, that is λ = np = 100 · 0.05 = 5. This approximation holds as long as the number of trials n is large (and therefore, p is small since p = λ/n).

The exact calculation gives : P(X ≤ 3) ≈ 0.2578. Now, let us see how good the Poisson approximation is : ∑_{x=0}^{3} e^{−λ} λ^x / x! ≈ 0.2650, which is not too bad of an approximation.
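An added sketch reproducing both numbers above (not part of the original notes):

# Exact binomial tail P(X <= 3) for n = 100, p = 0.05, versus the
# Poisson approximation with lambda = np = 5.
from math import comb, exp, factorial

exact = sum(comb(100, x) * 0.05**x * 0.95 ** (100 - x) for x in range(4))
approx = sum(exp(-5) * 5**x / factorial(x) for x in range(4))
print(exact, approx)    # ≈ 0.2578 and ≈ 0.2650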

Property 5.8. Let X be a random variable that follows a binomial distribution Bin(n, p). Then, when n is large and p is small enough to make np moderate, X is approximately a Poisson random variable with parameter λ = np.

In general, the above approximation works well if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and p ≤ 0.10.
