Lesson4-1 Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson 4: Discrete Probability Distributions

Upload: derek-powers

Post on 30-Dec-2015


Page 1: Lesson 4:

Lesson4-1 Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data

Lesson 4:

Discrete Probability Distributions

Page 2: Lesson 4:

Outline

Random variables and probability distribution

Features of univariate probability distribution

Features of bivariate probability distribution

Marginal distribution and Conditional distribution

Expectation and conditional expectation

Variance, Covariance and Correlation Coefficient

Binomial Probability Distribution

Hypergeometric Probability Distribution

Poisson Probability Distribution

Page 3: Lesson 4:

Random Variables and probability distribution

A random variable is a numerical value determined by the outcome of an experiment. A random variable is often denoted by a capital letter, e.g., X or Y.

A probability distribution is the listing of all possible outcomes of an experiment and the corresponding probability.

Page 4: Lesson 4:

Types of Probability Distributions

A discrete probability distribution can assume only certain outcomes (the number of outcomes need not be finite). It applies to random variables that take discrete values. Examples: the number of students in a class; the number of children in a family.

A continuous probability distribution can assume an infinite number of values within a given range. It applies to random variables that take continuous values. Examples: the distance students travel to class; the time it takes an executive to drive to work; the amount of money spent on your last haircut.

Page 5: Lesson 4:

Types of Probability Distributions

Probability distributions may be classified according to the number of random variables they describe.

Number of random variables   Joint distribution
1                            Univariate probability distribution
2                            Bivariate probability distribution
3                            Trivariate probability distribution
…                            …
n                            Multivariate probability distribution

Page 6: Lesson 4:

Features of a Univariate Discrete Distribution

Let x1,…,xN be the list of all possible outcomes (N of them).

The main features of a discrete probability distribution are:

The probability of a particular outcome, P(xi), is between 0 and 1.00.

The sum of the probabilities of the various outcomes is 1.00. That is, P(x1) + … + P(xN) = 1.

The outcomes are mutually exclusive. That is, P(x1 and x2) = 0 and P(x1 or x2) = P(x1) + P(x2). Generally, for all i not equal to k, P(xi and xk) = 0 and P(xi or xk) = P(xi) + P(xk).

Outcome   Prob.
x1        P(x1)
x2        P(x2)
…         …
xN        P(xN)

Page 7: Lesson 4:

Features of a Univariate Discrete Distribution

Can the following be a probability distribution of a random variable?

x   Prob.
1   0.2
2   0.3
3   0.1
1   0.4

x   Prob.
1   0.6
2   0.3
3   0.1

event    Prob.
1 or 2   0.6
2 or 3   0.3
3 or 1   0.1
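The three candidate tables can be checked against the features listed on the previous slide. The sketch below is only an illustration: the function name `is_valid_distribution` and the encoding of each row as a set of outcomes are assumptions made here, not part of the lesson. Under this reading, a table qualifies only if every probability lies between 0 and 1, the probabilities sum to 1, and no two rows share an outcome.

```python
import math
from itertools import combinations


def is_valid_distribution(rows):
    """rows is a list of (event, probability) pairs; each event is a set of outcomes.

    Hypothetical helper: checks the three features from the slide.
    """
    probs = [p for _, p in rows]
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0)
    # Mutually exclusive rows must not share any outcome.
    disjoint = all(a.isdisjoint(b) for (a, _), (b, _) in combinations(rows, 2))
    return in_range and sums_to_one and disjoint


# The three tables from the slide, in order.
table_1 = [({1}, 0.2), ({2}, 0.3), ({3}, 0.1), ({1}, 0.4)]    # outcome 1 listed twice
table_2 = [({1}, 0.6), ({2}, 0.3), ({3}, 0.1)]
table_3 = [({1, 2}, 0.6), ({2, 3}, 0.3), ({3, 1}, 0.1)]       # overlapping events

for name, table in [("first", table_1), ("second", table_2), ("third", table_3)]:
    print(name, is_valid_distribution(table))
```

With this encoding only the second table passes. The first and third both sum to 1, but their rows are not mutually exclusive.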

Page 8: Lesson 4:

EXAMPLE: Univariate probability distribution

Consider a random experiment in which a coin is tossed three times. Let x be the number of heads. Let H represent the outcome of a head and T the outcome of a tail.

The possible outcomes for such an experiment will be: TTT, TTH, THT, THH, HTT, HTH, HHT, HHH.

Thus the possible values of x (number of heads) are 0, 1, 2, 3.

x=0: TTT
x=1: TTH, THT, HTT
x=2: THH, HTH, HHT
x=3: HHH

From the definition of a random variable, x as defined in this experiment is a random variable.

If the coin is fair, all eight outcomes are equally likely, so:

P(x=0) = 1/8
P(x=1) = 3/8
P(x=2) = 3/8
P(x=3) = 1/8
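The counting argument for the fair coin can be reproduced by enumerating the eight outcomes. This is a minimal sketch; the function name `heads_distribution` is an assumption introduced here, and exact fractions are used so the results match the slide's 1/8 and 3/8 directly.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_tosses=3):
    """Distribution of the number of heads for a fair coin, by enumeration."""
    outcomes = list(product("HT", repeat=n_tosses))    # TTT, TTH, ..., HHH
    counts = Counter(outcome.count("H") for outcome in outcomes)
    total = len(outcomes)
    # Each outcome is equally likely, so P(x) = (number of outcomes with x heads) / total.
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}


dist = heads_distribution()
for x, p in dist.items():
    print(f"P(x={x}) = {p}")
```

Equal weighting of outcomes holds only for a fair coin. For an unfair coin each outcome would need its own probability.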

Page 9: Lesson 4:

Features of a Bivariate Discrete Distribution

If X and Y are discrete random variables, we may define their joint probability function as $P_{XY}(x_i, y_j)$.

Let $(x_1, \ldots, x_R)$ and $(y_1, \ldots, y_S)$ be the lists of all possible outcomes for X and Y respectively.

The main features of a bivariate discrete probability distribution are:

The probability of a particular outcome, $P_{XY}(x_i, y_j)$, is between 0 and 1.

The sum of the probabilities of the various outcomes is 1.00. That is, $P_{XY}(x_1, y_1) + P_{XY}(x_2, y_1) + \cdots + P_{XY}(x_R, y_1) + \cdots + P_{XY}(x_R, y_S) = 1$.

The outcomes are mutually exclusive. That is, if $x_i \ne x_k$ or $y_j \ne y_l$, then $P_{XY}((x_i, y_j) \text{ and } (x_k, y_l)) = 0$ and $P_{XY}((x_i, y_j) \text{ or } (x_k, y_l)) = P_{XY}(x_i, y_j) + P_{XY}(x_k, y_l)$.

Page 10: Lesson 4:

Example: Bivariate Discrete Distribution

X takes 3 possible values and Y takes 4 possible values.

      y1         y2         y3         y4
x1    P(x1,y1)   P(x1,y2)   P(x1,y3)   P(x1,y4)
x2    P(x2,y1)   P(x2,y2)   P(x2,y3)   P(x2,y4)
x3    P(x3,y1)   P(x3,y2)   P(x3,y3)   P(x3,y4)

Page 11: Lesson 4:

EXAMPLE: Bivariate distribution

The joint distribution of the movement of Hang Seng Index (HSI) and weather is shown in the following table.

            Rainy   Not Rainy   Totals
HSI falls   0.15    0.4         0.55
HSI rises   0.2     0.25        0.45
Totals      0.35    0.65        1.0

Page 12: Lesson 4:

EXAMPLE: Bivariate distribution

Suppose …..

The joint distribution of the movement of Hang Seng Index (HSI) and weather is shown in the following table.

            Rainy   Not Rainy   Totals
HSI falls   0       a
HSI rises   0       b
Totals      0

$$P_{X|Y}(x \mid y) = P(X = x \mid Y = y) = \frac{P(x, y)}{P(y)} \quad \text{if } P(Y = y) > 0$$

$$P_{X|Y}(x \mid y) = 0 \quad \text{if } P(Y = y) = 0$$

Using the first formula alone, P(HSI falls | Rainy) = P(HSI falls, Rainy) / P(Rainy) = 0/0.

Forcing P(HSI falls | Rainy) = 0 in the definition eliminates the difficulty in interpreting 0/0.

Page 13: Lesson 4:

Marginal Distributions

The marginal probability function of X:

$$P_X(x) = \sum_y P_{XY}(x, y) = P_{XY}(x, y_1) + P_{XY}(x, y_2) + \cdots + P_{XY}(x, y_n)$$

P(HSI falls) = P(HSI falls and rainy) + P(HSI falls and not rainy)
P(HSI rises) = P(HSI rises and rainy) + P(HSI rises and not rainy)

The double sum:

$$\sum_x \sum_y P_{XY}(x, y)$$

= P(HSI falls and rainy) + P(HSI falls and not rainy) + P(HSI rises and rainy) + P(HSI rises and not rainy)

= P(HSI falls) + P(HSI rises) = 1

                      Y
              Rainy   Not Rainy   Totals
X HSI falls   0.15    0.4         0.55
  HSI rises   0.2     0.25        0.45
  Totals      0.35    0.65        1.0

Page 14: Lesson 4:

Marginal Distributions

The marginal probability function of X: $\sum_y P_{XY}(x, y) = P_X(x)$.

The marginal probability function of Y: $\sum_x P_{XY}(x, y) = P_Y(y)$.

The double sum: $\sum_y \sum_x P_{XY}(x, y) = 1$.

                      Y
              Rainy   Not Rainy   Totals
X HSI falls   0.15    0.4         0.55
  HSI rises   0.2     0.25        0.45
  Totals      0.35    0.65        1.0
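The row and column totals in the HSI table are exactly the marginal distributions. The following sketch recomputes them from the four joint probabilities; the dictionary layout and the helper name `marginal` are assumptions made for illustration. Exact fractions avoid floating-point rounding in the comparison with the table's totals.

```python
from fractions import Fraction as F

# Joint distribution from the slide: keys are (HSI movement, weather).
joint = {
    ("falls", "rainy"): F("0.15"),
    ("falls", "not rainy"): F("0.4"),
    ("rises", "rainy"): F("0.2"),
    ("rises", "not rainy"): F("0.25"),
}


def marginal(joint, axis):
    """Sum the joint probabilities over the other variable (axis 0 = X, axis 1 = Y)."""
    totals = {}
    for key, p in joint.items():
        totals[key[axis]] = totals.get(key[axis], 0) + p
    return totals


p_x = marginal(joint, 0)    # distribution of HSI movement
p_y = marginal(joint, 1)    # distribution of weather
print({k: float(v) for k, v in p_x.items()})
print({k: float(v) for k, v in p_y.items()})
```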

Page 15: Lesson 4:

Conditional Distributions

The conditional probability function of X given Y:

$$P_{X|Y}(x \mid y) = P(X = x \mid Y = y) = \frac{P_{XY}(x, y)}{P_Y(y)} \quad \text{if } P(Y = y) > 0$$

$$P_{X|Y}(x \mid y) = 0 \quad \text{if } P(Y = y) = 0$$

Note that $P_{X|Y}(x \mid y)$ when P(Y = y) = 0 is undefined using the top formula.

                      Y
              Rainy   Not Rainy   Totals
X HSI falls   0.15    0.4         0.55
  HSI rises   0.2     0.25        0.45
  Totals      0.35    0.65        1.0

Page 16: Lesson 4:

Conditional Distributions

For each fixed y this is a probability function for X, i.e. the conditional probability function is non-negative and

$$\begin{aligned}
\sum_x P_{X|Y}(x \mid y) &= P_{X|Y}(x_1 \mid y) + P_{X|Y}(x_2 \mid y) \\
&= \frac{P_{X,Y}(x_1, y)}{P_Y(y)} + \frac{P_{X,Y}(x_2, y)}{P_Y(y)} \\
&= \frac{P_{X,Y}(x_1, y) + P_{X,Y}(x_2, y)}{P_Y(y)} \\
&= 1.
\end{aligned}$$

                      Y
              Rainy   Not Rainy   Totals
X HSI falls   0.15    0.4         0.55
  HSI rises   0.2     0.25        0.45
  Totals      0.35    0.65        1.0

Page 17: Lesson 4:

Conditional Distributions

The conditional probability function of X given Y:

$$P_{X|Y}(x \mid y) = P(X = x \mid Y = y) \quad \text{if } P(Y = y) > 0$$

$$P_{X|Y}(x \mid y) = 0 \quad \text{if } P(Y = y) = 0$$

For each fixed y this is a probability function for X, i.e. the conditional probability function is non-negative and $\sum_x P_{X|Y}(x \mid y) = 1$.

By the definition of conditional probability, $P_{X|Y}(x \mid y) = P_{X,Y}(x, y) / P_Y(y)$.

E.g., P(HSI rises | Rainy) = 0.2/0.35.

When X and Y are independent, $P_{X|Y}(x \mid y)$ is equal to $P_X(x)$.

                      Y
              Rainy   Not Rainy   Totals
X HSI falls   0.15    0.4         0.55
  HSI rises   0.2     0.25        0.45
  Totals      0.35    0.65        1.0

Page 18: Lesson 4:

Example: Conditional Distributions

$P_{X|Y}(x \mid y) = P_{X,Y}(x, y) / P_Y(y)$.

$P_{Y|X}(y \mid x) = P_{X,Y}(x, y) / P_X(x)$.

                          Y
                  Rainy       Not Rainy   Totals   P(X|Rainy)   P(X|Not Rainy)
X HSI falls       0.15        0.4         0.55     0.15/0.35    0.4/0.65
  HSI rises       0.2         0.25        0.45     0.2/0.35     0.25/0.65
  Totals          0.35        0.65        1.0      1.0          1.0

  P(Y|HSI falls)  0.15/0.55   0.4/0.55    1.0
  P(Y|HSI rises)  0.2/0.45    0.25/0.45   1.0
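The conditional columns in the table follow mechanically from the joint probabilities. The sketch below computes P(X | Y = y) and applies the lesson's convention of returning 0 when P(Y = y) = 0; the helper name `conditional_x_given_y` and the dictionary layout are illustrative assumptions.

```python
from fractions import Fraction as F

joint = {
    ("falls", "rainy"): F("0.15"),
    ("falls", "not rainy"): F("0.4"),
    ("rises", "rainy"): F("0.2"),
    ("rises", "not rainy"): F("0.25"),
}


def conditional_x_given_y(joint, y):
    """Return {x: P(X = x | Y = y)}, using the convention P(x | y) = 0 when P(y) = 0."""
    xs = sorted({x for (x, _) in joint})
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    if p_y == 0:
        return {x: F(0) for x in xs}
    return {x: joint.get((x, y), F(0)) / p_y for x in xs}


given_rainy = conditional_x_given_y(joint, "rainy")
print(given_rainy)    # P(falls | rainy) = 0.15/0.35, P(rises | rainy) = 0.2/0.35

# A table whose Rainy column is all zeros, as in the earlier 0/0 example
# (the values 0.6 and 0.4 stand in for a and b and are made up here).
joint_dry = {("falls", "rainy"): F(0), ("falls", "not rainy"): F("0.6"),
             ("rises", "rainy"): F(0), ("rises", "not rainy"): F("0.4")}
print(conditional_x_given_y(joint_dry, "rainy"))
```

Since P(HSI falls | Rainy) = 3/7 differs from P(HSI falls) = 0.55, the two variables in this table are not independent.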

Page 19: Lesson 4:

Transformation of Random variables

A transformation of random variable(s) results in a new random variable.

For example, if X and Y are random variables, the following are also random variables:

Z = 2X
Z = 3 + 2X
Z = X^2
Z = log(X)
Z = X + Y
Z = X^2 + Y^2

Page 20: Lesson 4:

The Expectation (mean) of a Discrete Probability Distribution

The expectation (mean):

reports the central location of the data.

is the long-run average value of the random variable, that is, the average of the outcomes of many experiments.

is also referred to as its expected value, E(X), in a probability distribution.

is also known as the first moment of a random variable.

is a weighted average.

Page 21: Lesson 4:

Moments of a random variable

$E(X)$: first moment

$E(X^2)$: second moment

The n-th moment is defined as the expectation of the n-th power of a random variable: $E(X^n)$

$E[(X-\mu)^2]$: second centralized moment

$E[(X-\mu)^3]$: third centralized moment

The n-th centralized moment is defined as: $E\{[X-E(X)]^n\}$

Page 22: Lesson 4:

The Expectation (Mean) of Discrete Probability Distribution

For a univariate probability distribution, the expectation or mean E(X) is computed by the formula:

$$E(X) = \sum [x P(x)] = x_1 P(x_1) + x_2 P(x_2) + \cdots + x_n P(x_n)$$

For a bivariate probability distribution, the expectation or mean E(X) is computed by the formula:

$$\begin{aligned}
E(X) &= \sum_X \sum_Y [x P_{X,Y}(x, y)] \\
&= x_1 P_{X,Y}(x_1, y_1) + x_2 P_{X,Y}(x_2, y_1) + \cdots + x_n P_{X,Y}(x_n, y_1) \\
&\quad + \cdots \\
&\quad + x_1 P_{X,Y}(x_1, y_n) + x_2 P_{X,Y}(x_2, y_n) + \cdots + x_n P_{X,Y}(x_n, y_n)
\end{aligned}$$

Page 23: Lesson 4:

Conditional Mean of Bivariate Discrete Probability Distribution

For a bivariate probability distribution, the conditional expectation or conditional mean E(X|Y) is computed by the formula:

$$E(X \mid Y = y_i) = \sum_X [x P_{X|Y}(x \mid y_i)] = x_1 P_{X|Y}(x_1 \mid y_i) + x_2 P_{X|Y}(x_2 \mid y_i) + \cdots + x_n P_{X|Y}(x_n \mid y_i)$$

Unconditional expectation or mean of X, E(X):

$$E(X) = \sum_Y E(X \mid Y = y_i) P_Y(y_i) = E[E(X \mid Y)] = E[\mu_{X|Y}]$$
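The link between conditional and unconditional means can be checked numerically on the HSI table. The table itself has no numeric X, so the sketch below assumes a hypothetical coding, X = 1 if the HSI rises and X = 0 if it falls; that coding and the helper names are introduced here only for illustration.

```python
from fractions import Fraction as F

# Hypothetical coding (not in the lesson): X = 1 if HSI rises, X = 0 if it falls.
joint = {
    (0, "rainy"): F("0.15"),
    (0, "not rainy"): F("0.4"),
    (1, "rainy"): F("0.2"),
    (1, "not rainy"): F("0.25"),
}


def prob_y(joint, y):
    """Marginal probability P(Y = y)."""
    return sum(p for (_, yy), p in joint.items() if yy == y)


def cond_mean(joint, y):
    """E(X | Y = y) = sum over x of x * P(x, y) / P(y); assumes P(y) > 0."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / prob_y(joint, y)


ys = sorted({y for (_, y) in joint})
e_x_direct = sum(x * p for (x, _), p in joint.items())
e_x_iterated = sum(cond_mean(joint, y) * prob_y(joint, y) for y in ys)

print(cond_mean(joint, "rainy"), cond_mean(joint, "not rainy"))
print(e_x_direct, e_x_iterated)
```

Both routes give E(X) = 0.45, which under this coding is simply P(HSI rises).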

Page 24: Lesson 4:

Expectation of a linear transformed random variable

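If a and b are constants and X is a random variable, then

E(a) = a
E(bX) = bE(X)
E(a+bX) = a + bE(X)

$$\begin{aligned}
E(a+bX) &= \sum [(a+bx) P(a+bx)] \\
&= \sum [(a+bx) P(x)] \\
&= (a+bx_1)P(x_1) + (a+bx_2)P(x_2) + \cdots + (a+bx_n)P(x_n) \\
&= aP(x_1) + bx_1P(x_1) + aP(x_2) + bx_2P(x_2) + \cdots + aP(x_n) + bx_nP(x_n) \\
&= a[P(x_1) + P(x_2) + \cdots + P(x_n)] + b[x_1P(x_1) + x_2P(x_2) + \cdots + x_nP(x_n)] \\
&= a + bE(X)
\end{aligned}$$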

Page 25: Lesson 4:

The Variance of a Discrete Probability Distribution

The variance measures the amount of spread (variation) of a distribution.

The variance of a discrete distribution is denoted by the Greek letter $\sigma^2$ (sigma squared).

The standard deviation is the square root of $\sigma^2$.

Page 26: Lesson 4:

The Variance of a Discrete Probability Distribution

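For a univariate discrete probability distribution:

$$\begin{aligned}
V(X) &= E[(X-\mu)^2] \\
&= \sum [(x-\mu)^2 P(x)] \\
&= (x_1-\mu)^2 P(x_1) + (x_2-\mu)^2 P(x_2) + \cdots + (x_n-\mu)^2 P(x_n)
\end{aligned}$$

For a bivariate discrete probability distribution:

$$\begin{aligned}
V(X) &= E[(X-\mu_X)^2] \\
&= \sum_X \sum_Y [(x-\mu_X)^2 P_{X,Y}(x, y)] \\
&= (x_1-\mu_X)^2 P_{X,Y}(x_1, y_1) + (x_2-\mu_X)^2 P_{X,Y}(x_2, y_1) + \cdots + (x_n-\mu_X)^2 P_{X,Y}(x_n, y_1) \\
&\quad + \cdots \\
&\quad + (x_1-\mu_X)^2 P_{X,Y}(x_1, y_n) + (x_2-\mu_X)^2 P_{X,Y}(x_2, y_n) + \cdots + (x_n-\mu_X)^2 P_{X,Y}(x_n, y_n)
\end{aligned}$$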

Page 27: Lesson 4:

Variance of a linear transformed random variable

If a and b are constants and X is a random variable, then

V(a) = 0
V(bX) = b^2 V(X)
V(a+bX) = b^2 V(X)

$$\begin{aligned}
V(a+bX) &= E[(a + bX - (a + b\mu))^2] \\
&= E[(bX - b\mu)^2] \\
&= E[(b(X-\mu))^2] \\
&= E[b^2 (X-\mu)^2] \\
&= b^2 E[(X-\mu)^2] \\
&= b^2 V(X)
\end{aligned}$$
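The two linear-transformation rules, E(a+bX) = a + bE(X) and V(a+bX) = b^2 V(X), can be verified on the three-toss coin distribution from earlier in the lesson, using Z = 3 + 2X from the transformation slide. The helper names below are assumptions for illustration only.

```python
from fractions import Fraction as F

# Number of heads in three fair tosses (from the coin example).
pmf_x = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}


def mean(pmf):
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def linear_transform(pmf, a, b):
    """Distribution of Z = a + b*X: same probabilities, relabelled outcomes."""
    out = {}
    for x, p in pmf.items():
        z = a + b * x
        out[z] = out.get(z, 0) + p
    return out


a, b = 3, 2
pmf_z = linear_transform(pmf_x, a, b)

print(mean(pmf_x), variance(pmf_x))    # E(X), V(X)
print(mean(pmf_z), variance(pmf_z))    # E(Z), V(Z)
```

The constant a shifts the mean but drops out of the variance, which is why only b^2 appears in V(a+bX).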

Page 28: Lesson 4:

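The Covariance of a Bivariate Discrete Probability Distribution

Covariance measures how two random variables co-vary.

$$C(X, Y) = E[(X-\mu_X)(Y-\mu_Y)] = \sum_X \sum_Y [(x-\mu_X)(y-\mu_Y) P_{X,Y}(x, y)]$$

$$\begin{aligned}
C(X, Y) &= E[(X-\mu_X)(Y-\mu_Y)] \\
&= E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y] \\
&= E[XY] - \mu_X E[Y] - \mu_Y E[X] + \mu_X \mu_Y \\
&= E[XY] - \mu_X \mu_Y - \mu_Y \mu_X + \mu_X \mu_Y \\
&= E[XY] - \mu_X \mu_Y \\
&= E[XY] - E[X]E[Y]
\end{aligned}$$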

Page 29: Lesson 4:

Covariance of linear transformed random variables

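If a and b are constants and X and Y are random variables, then

C(a,b) = 0
C(a,bX) = 0
C(a+bX,Y) = bC(X,Y)

$$\begin{aligned}
C(a+bX, Y) &= E[(a + bX - (a + b\mu_X))(Y-\mu_Y)] \\
&= E[(bX - b\mu_X)(Y-\mu_Y)] \\
&= E[b(X-\mu_X)(Y-\mu_Y)] \\
&= b E[(X-\mu_X)(Y-\mu_Y)] \\
&= bC(X, Y)
\end{aligned}$$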

Page 30: Lesson 4:

Variance of a sum of random variables

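If a and b are constants and X and Y are random variables, then

V(X+Y) = V(X) + V(Y) + 2C(X,Y)
V(aX+bY) = a^2 V(X) + b^2 V(Y) + 2abC(X,Y)

$$\begin{aligned}
V(X+Y) &= E[(X + Y - \mu_X - \mu_Y)^2] \\
&= E\{[(X-\mu_X) + (Y-\mu_Y)]^2\} \\
&= E[(X-\mu_X)^2 + (Y-\mu_Y)^2 + 2(X-\mu_X)(Y-\mu_Y)] \\
&= E[(X-\mu_X)^2] + E[(Y-\mu_Y)^2] + 2E[(X-\mu_X)(Y-\mu_Y)] \\
&= V(X) + V(Y) + 2C(X, Y)
\end{aligned}$$

$$\begin{aligned}
V(aX+bY) &= E[(aX + bY - a\mu_X - b\mu_Y)^2] \\
&= E\{[(aX - a\mu_X) + (bY - b\mu_Y)]^2\} \\
&= E[a^2(X-\mu_X)^2 + b^2(Y-\mu_Y)^2 + 2(aX - a\mu_X)(bY - b\mu_Y)] \\
&= a^2 E[(X-\mu_X)^2] + b^2 E[(Y-\mu_Y)^2] + 2ab E[(X-\mu_X)(Y-\mu_Y)] \\
&= a^2 V(X) + b^2 V(Y) + 2abC(X, Y)
\end{aligned}$$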

Page 31: Lesson 4:

Correlation coefficient

The strength of the dependence between X and Y is measured by the correlation coefficient:

$$Corr(X, Y) = \frac{C(X, Y)}{\sqrt{V(X)V(Y)}}$$
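Covariance, correlation, and the variance-of-a-sum rule can all be checked on one small joint table. The sketch below reuses the HSI probabilities with a hypothetical 0/1 coding that is not in the lesson (X = 1 if the HSI rises, Y = 1 if it is rainy); the helper names are likewise assumptions.

```python
from fractions import Fraction as F
from math import sqrt

# Hypothetical coding: keys are (X, Y) with X = 1 if HSI rises, Y = 1 if rainy.
joint = {
    (0, 1): F("0.15"), (0, 0): F("0.4"),
    (1, 1): F("0.2"), (1, 0): F("0.25"),
}


def expect(g):
    """E[g(X, Y)] under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov_xy = expect(lambda x, y: x * y) - mu_x * mu_y          # E[XY] - E[X]E[Y]
corr_xy = float(cov_xy) / sqrt(float(var_x * var_y))


def var_linear(a, b):
    """V(aX + bY) computed directly from the definition of variance."""
    m = expect(lambda x, y: a * x + b * y)
    return expect(lambda x, y: (a * x + b * y - m) ** 2)


print(float(cov_xy), round(corr_xy, 4))
print(var_linear(1, 1) == var_x + var_y + 2 * cov_xy)
```

Computing V(aX+bY) from the definition and comparing it with a^2 V(X) + b^2 V(Y) + 2abC(X,Y) gives the same value for any constants, for example a = 2 and b = -3.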

Page 32: Lesson 4:

EXAMPLE

Dan Desch, owner of College Painters, studied his records for the past 20 weeks and reports the following number of houses painted per week:

Number of houses painted, x   Weeks   Probability, P(x)
10                            5       .25
11                            6       .30
12                            7       .35
13                            2       .10
Total                         20      1.00

Page 33: Lesson 4:

EXAMPLE

Compute the mean and variance of the number of houses painted per week:

x       P(x)
10      .25
11      .30
12      .35
13      .10
Total   1.00

$$\mu = E(x) = \sum [x P(x)] = (10)(.25) + (11)(.30) + (12)(.35) + (13)(.10) = 11.3$$

$$\begin{aligned}
\sigma^2 &= \sum [(x-\mu)^2 P(x)] \\
&= (10-11.3)^2(.25) + \cdots + (13-11.3)^2(.10) \\
&= 0.4225 + 0.0270 + 0.1715 + 0.2890 \\
&= 0.91
\end{aligned}$$
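The same mean and variance can be obtained directly from the weekly counts. This is a small sketch; the variable names are illustrative, and the probabilities are derived as weeks divided by the 20-week total.

```python
# Weeks in which x houses were painted, from the College Painters records.
weeks = {10: 5, 11: 6, 12: 7, 13: 2}
total_weeks = sum(weeks.values())                        # 20

# Relative frequencies serve as the probabilities P(x).
pmf = {x: n / total_weeks for x, n in weeks.items()}

mu = sum(x * p for x, p in pmf.items())                  # mean
var = sum((x - mu) ** 2 * p for x, p in pmf.items())     # variance

print(round(mu, 4), round(var, 4))
```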

Page 34: Lesson 4:

Binomial Probability Distribution

The binomial distribution has the following characteristics:

An outcome of an experiment is classified into one of two mutually exclusive categories, such as a success or failure.

The data collected are the results of counts in a series of trials.

The probability of success stays the same for each trial.

The trials are independent.

For example, consider tossing an unfair coin three times. H is labeled success and T is labeled failure. The data collected are the number of H in the three tosses. The probability of H stays the same for each toss. The results of the tosses are independent.

Page 35: Lesson 4:

Binomial Probability Distribution

To construct a binomial distribution, let

n be the number of trials,
x be the number of observed successes,
$\pi$ be the probability of success on each trial.

The formula for the binomial probability distribution is:

$$P(x) = {}_nC_x \, \pi^x (1-\pi)^{n-x}$$

Page 36: Lesson 4:

The density functions of binomial distributions with n=20 and different success rates p

Page 37: Lesson 4:

EXAMPLE

x = number of patients who will experience nausea following treatment with Phe-Mycin

n = 4, p = 0.1, q = 1 - p = 1 - 0.1 = 0.9

Find the probability that 2 of the 4 patients treated will experience nausea.

$$p(2) = P(x=2) = \frac{4!}{2!(4-2)!} (0.1)^2 (0.9)^{4-2} = 6(0.1)^2(0.9)^2 = 0.0486$$
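The binomial formula translates into a few lines of code. The sketch below uses `math.comb` from the Python standard library for the combination count; the function name `binomial_pmf` is an assumption made here. It reproduces the Phe-Mycin result and the fair-coin probabilities from the three-toss example.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(x successes in n independent trials, each with success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Phe-Mycin example: n = 4 patients, p = 0.1, exactly 2 experience nausea.
p_two = binomial_pmf(2, n=4, p=0.1)
print(round(p_two, 4))

# Three tosses of a fair coin: probabilities of 0, 1, 2, 3 heads.
coin = [binomial_pmf(x, n=3, p=0.5) for x in range(4)]
print(coin)
```

Unlike the counting argument for the fair coin, this formula also works when p is not 1/2.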

Page 38: Lesson 4:

Binomial Probability Distribution

The formula for the binomial probability distribution is:

$$P(x) = {}_nC_x \, \pi^x (1-\pi)^{n-x}, \qquad {}_nC_r = \frac{n!}{r!(n-r)!}$$

X = number of heads in three tosses. The possible outcomes are TTT, TTH, THT, THH, HTT, HTH, HHT, HHH. The coin is fair, i.e., P(head) = 1/2.

P(x=0) = 3C0 (0.5)^0 (1-0.5)^(3-0) = [3!/(0!3!)] (1)(1/8) = 1/8

P(x=1) = 3C1 (0.5)^1 (1-0.5)^(3-1) = [3!/(1!2!)] (1/2)(1/4) = 3/8

P(x=2) = 3C2 (0.5)^2 (1-0.5)^(3-2) = [3!/(2!1!)] (1/4)(1/2) = 3/8

P(x=3) = 3C3 (0.5)^3 (1-0.5)^(3-3) = [3!/(3!0!)] (1/8)(1) = 1/8

When the coin is not fair, the simple counting rule will not work.

Page 39: Lesson 4:

Mean & Variance of the Binomial Distribution

The mean is found by:

$$\mu = n\pi$$

The variance is found by:

$$\sigma^2 = n\pi(1-\pi)$$

Page 40: Lesson 4:

EXAMPLE

The Alabama Department of Labor reports that 20% of the workforce in Mobile is unemployed. From a sample of 14 workers, calculate the following probabilities:

Exactly three are unemployed.

At least three are unemployed.

At least one is unemployed.

Page 41: Lesson 4:

EXAMPLE

The probability of exactly 3:

$$P(3) = {}_{14}C_3 (.20)^3 (1-.20)^{11} = (364)(.0080)(.0859) = .2501$$

The probability of at least 3 is:

$$P(x \ge 3) = {}_{14}C_3 (.20)^3 (.80)^{11} + \cdots + {}_{14}C_{14} (.20)^{14} (.80)^0 = .250 + .172 + \cdots + .000 = .551$$

The probability of at least one being unemployed:

$$P(x \ge 1) = 1 - P(0) = 1 - {}_{14}C_0 (.20)^0 (1-.20)^{14} = 1 - .044 = .956$$

Page 42: Lesson 4:

EXAMPLE

Here $\pi$ = .2 and n = 14.

Hence, the mean is: $\mu = n\pi$ = 14(.2) = 2.8.

The variance is: $\sigma^2 = n\pi(1-\pi)$ = (14)(.2)(.8) = 2.24.
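The unemployment probabilities, mean, and variance can be recomputed in one pass. The sketch below assumes a helper named `binomial_pmf` (defined here, not part of the lesson) and uses the complement rule for the "at least" events. Computed without intermediate rounding, P(x ≥ 3) comes out at about .552; the slide's .551 is what results from summing terms that were each rounded to three decimals.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Binomial probability of exactly x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 14, 0.20

p_exactly_3 = binomial_pmf(3, n, p)
# Complement rule: P(x >= 3) = 1 - [P(0) + P(1) + P(2)].
p_at_least_3 = 1 - sum(binomial_pmf(k, n, p) for k in range(3))
# Complement rule: P(x >= 1) = 1 - P(0).
p_at_least_1 = 1 - binomial_pmf(0, n, p)

mean = n * p                 # n * pi
variance = n * p * (1 - p)   # n * pi * (1 - pi)

print(round(p_exactly_3, 4), round(p_at_least_3, 3), round(p_at_least_1, 3))
print(round(mean, 2), round(variance, 2))
```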

Page 43: Lesson 4:

Finite Population

A finite population is a population consisting of a fixed number of known individuals, objects, or measurements. Examples include: The number of students in this class. The number of cars in the parking lot.

Page 44: Lesson 4:

Hypergeometric Distribution

The hypergeometric distribution has the following characteristics:

There are only 2 possible outcomes.

The probability of a success is not the same on each trial.

It results from a count of the number of successes in a fixed number of trials.

Page 45: Lesson 4:

EXAMPLE

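In a bag containing 7 red chips and 5 blue chips you select 2 chips one after the other without replacement.

First draw: red (R1) with probability 7/12, blue (B1) with probability 5/12.

Second draw after R1: red (R2) with probability 6/11, blue (B2) with probability 5/11.

Second draw after B1: red (R2) with probability 7/11, blue (B2) with probability 4/11.

The probability of a success (red chip) is not the same on each trial.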

Page 46: Lesson 4:

Hypergeometric Distribution

The formula for finding a probability using the hypergeometric distribution is:

$$P(x) = \frac{({}_SC_x)({}_{N-S}C_{n-x})}{{}_NC_n}$$

where N is the size of the population, S is the number of successes in the population, and x is the number of successes in a sample of n observations.

Page 47: Lesson 4:

Hypergeometric Distribution

Use the hypergeometric distribution to find the probability of a specified number of successes or failures if:

the sample is selected from a finite population without replacement (recall that a criterion for the binomial distribution is that the probability of success remains the same from trial to trial), and

the size of the sample n is greater than 5% of the size of the population N.

Page 48: Lesson 4:

The density functions of hypergeometric distributions with N=100, n=20 and different success rates p (=S/N).

Page 49: Lesson 4:

EXAMPLE: Hypergeometric Distribution

The National Air Safety Board has a list of 10 reported safety violations. Suppose only 4 of the reported violations are actual violations and the Safety Board will only be able to investigate five of the violations. What is the probability that three of five violations randomly selected to be investigated are actually violations?

$$P(3) = \frac{({}_4C_3)({}_{10-4}C_{5-3})}{{}_{10}C_5} = \frac{({}_4C_3)({}_6C_2)}{{}_{10}C_5} = \frac{4(15)}{252} = .238$$
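The hypergeometric formula can be written with the same notation (N, S, n, x) used on the slides. The sketch below is illustrative; the function name `hypergeom_pmf` is an assumption. It reproduces the safety-violation result and, as a cross-check, the chance of drawing two red chips from the bag of 7 red and 5 blue chips, which the tree gives as (7/12)(6/11).

```python
from math import comb


def hypergeom_pmf(x, N, S, n):
    """P(x successes in a sample of n drawn without replacement
    from a population of N that contains S successes)."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


# Safety violations: N = 10 reports, S = 4 actual violations, n = 5 investigated.
p_three = hypergeom_pmf(3, N=10, S=4, n=5)
print(round(p_three, 3))

# Chips: N = 12 chips, S = 7 red, n = 2 drawn, both red.
p_two_red = hypergeom_pmf(2, N=12, S=7, n=2)
print(round(p_two_red, 4))
```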

Page 50: Lesson 4:

Poisson Probability Distribution

The binomial distribution becomes more skewed to the right (positive) as the probability of success becomes smaller.

The limiting form of the binomial distribution where the probability of success is small and n is large is called the Poisson probability distribution.

The formula for the binomial probability distribution is: $P(x) = {}_nC_x \, \pi^x (1-\pi)^{n-x}$

Page 51: Lesson 4:

Poisson Probability Distribution

The Poisson distribution can be described mathematically using the formula:

$$P(x) = \frac{\mu^x e^{-\mu}}{x!}$$

where $\mu$ is the mean number of successes in a particular interval of time, e is the constant 2.71828, and x is the number of successes.

Page 52: Lesson 4:

Poisson Probability Distribution

The mean number of successes $\mu$ can be determined in binomial situations by $n\pi$, where n is the number of trials and $\pi$ the probability of a success.

The variance of the Poisson distribution is also equal to $n\pi$.

X, the number of successes, generally has no specific upper limit.

The probability distribution is always skewed to the right. It becomes symmetrical when $\mu$ gets large.
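The statement that the Poisson distribution is the limiting form of the binomial can be illustrated numerically. In the sketch below the values n = 1000 and π = 0.004 are made-up illustration values (chosen so that nπ = 4), and the function names are assumptions. With n large and π small, the two probability functions are close for every x.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, mu):
    return mu ** x * exp(-mu) / factorial(x)


# Hypothetical illustration values: many trials, small success probability.
n, p = 1000, 0.004
mu = n * p    # mean of the approximating Poisson distribution

# Largest gap between the two probability functions over x = 0..20.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(21))
print(max_gap)
```

The gap shrinks further as n grows while nπ is held fixed.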

Page 53: Lesson 4:

EXAMPLE: Poisson Probability Distribution

The Sylvania Urgent Care facility specializes in caring for minor injuries, colds, and flu. For the evening hours of 6-10 PM the mean number of arrivals is 4.0 per hour. What is the probability of 2 arrivals in an hour?

$$P(x) = \frac{\mu^x e^{-\mu}}{x!} = \frac{4^2 e^{-4}}{2!} = .1465$$

Page 54: Lesson 4:

Example: Poisson Probabilities

x = number of Cleveland air traffic control errors during one week

$\mu$ = 0.4 (expected number of errors per week)

Find the probability that 3 errors will occur in a week.

$$p(3) = P(x=3) = \frac{e^{-0.4}(0.4)^3}{3!} = .0072$$
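Both Poisson examples follow from the same one-line function. The sketch below is illustrative only; the name `poisson_pmf` is an assumption, and the parameter `mu` plays the role of the mean number of successes per interval.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P(x occurrences) when the mean number of occurrences per interval is mu."""
    return mu ** x * exp(-mu) / factorial(x)


# Sylvania Urgent Care: mean of 4.0 arrivals per hour, probability of 2 arrivals.
p_arrivals = poisson_pmf(2, 4.0)

# Cleveland air traffic control: mean of 0.4 errors per week, probability of 3 errors.
p_errors = poisson_pmf(3, 0.4)

print(round(p_arrivals, 4), round(p_errors, 4))
```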

Page 55: Lesson 4:

Mean and Variance of a Poisson Random Variable

If x is a Poisson random variable with parameter $\mu$, then

Mean: $\mu_x = \mu$

Variance: $\sigma_x^2 = \mu$

Standard deviation: $\sigma_x = \sqrt{\sigma_x^2} = \sqrt{\mu}$

Page 56: Lesson 4:

Several Poisson Distributions

Page 57: Lesson 4:

What distributions to use?

Poisson considers the number of times an event occurs over an INTERVAL of TIME or SPACE. Note that we are not considering a sample of a given number of observations. Thus, if we are considering a sample of 10 observations and we are asked to compute the probability of having 6 successes, we should not use Poisson. Instead, we should consider Binomial or Hypergeometric.

Hypergeometric considers the number of successes in a sample when the probability of success varies across trials because the sampling is done without replacement. To compute the Hypergeometric probability, one will need to know N and S separately. Suppose we know that the probability of success is 0.3. We are considering a sample of 10 observations and we are asked to compute the probability of having 6 successes. We cannot use Hypergeometric because we do not have N and S separately. Instead, we have to use Binomial.

Page 58: Lesson 4:

What distributions to use? Example

In a shipment of 15 hard disks, 5 are defective. If 4 of the disks are inspected, what is the probability that exactly 1 is defective?

First, we recognize that it is not Poisson because "4 of the disks are inspected" (i.e., sample size = 4).

Second, it is sampling without replacement because if we were to inspect four disks for defects, we will not want to sample with replacement.

Third, both N (15 hard disks) and S (5 are defective) are given. Hence we will use Hypergeometric.

Page 59: Lesson 4:

What distributions to use? Example

From an inventory of 48 cars being shipped to local automobile dealers, 12 have had defective radios installed. What is the probability that one particular dealership receiving 8 cars obtains all with defective radios?

First, we recognize that it is not Poisson because 8 cars are "inspected" (i.e., sample size = 8).

Second, it is sampling without replacement because if we were to inspect all 8 cars for defects, we will not want to sample with replacement.

Third, both N (48 cars) and S (12 have defective radios) are given. Hence we will use Hypergeometric.

Page 60: Lesson 4:

What distributions to use? Example

The number of claims for missing baggage for a well-known airline in a small city averages nine per day. What is the probability that, on a given day, there will be fewer than three claims made?

First, we recognize that it is likely Poisson because of the phrase "on a given day".

Second, we are asked to compute the probability that the number of claims falls below some number. There is no limit on the number of claims that can arrive in a given day.

Third, the "average per day" is given. Hence we will use Poisson.

Page 61: Lesson 4:

What distributions to use? Example

When a customer places an order with Rudy's on-Line Office Supplies, a computerized accounting information system (AIS) automatically checks to see if the customer has exceeded his or her credit limit. Past records indicate that the probability of customers exceeding their credit limit is 0.05. Suppose that, on a given day, 20 customers place orders. What is the probability that zero customers will exceed their limits?

First, we recognize that it is not Poisson because 20 customers place orders (i.e., sample size = 20).

Second, the probability of drawing a particular type of customer appears the same across trials because "the probability of customers exceeding their credit limit is 0.05".

Hence we will use Binomial.
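Once the distribution has been chosen, each of the four example questions reduces to one function call. The slides stop at choosing the distribution, so the numerical answers below are computed here rather than taken from the lesson, and the function names are assumptions for illustration.

```python
from math import comb, exp, factorial


def hypergeom_pmf(x, N, S, n):
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, mu):
    return mu ** x * exp(-mu) / factorial(x)


# Hard disks: N = 15, S = 5 defective, n = 4 inspected, exactly 1 defective.
p_disks = hypergeom_pmf(1, N=15, S=5, n=4)

# Cars: N = 48, S = 12 with defective radios, n = 8 received, all 8 defective.
p_cars = hypergeom_pmf(8, N=48, S=12, n=8)

# Baggage claims: mean 9 per day, fewer than three claims means x = 0, 1 or 2.
p_claims = sum(poisson_pmf(k, 9.0) for k in range(3))

# Credit limits: n = 20 orders, p = 0.05, zero customers over the limit.
p_credit = binomial_pmf(0, n=20, p=0.05)

for label, value in [("disks", p_disks), ("cars", p_cars),
                     ("claims", p_claims), ("credit", p_credit)]:
    print(f"{label}: {value:.6g}")
```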

Page 62: Lesson 4:

- END -
