cumulative distribution function of a discrete random variable


Upload: muhammad-riezhuan-rabani

Posted on 07-Apr-2016



Normal approximation to the binomial distribution

We have seen that the Poisson distribution can be used to approximate the binomial distribution for large values of n and small values of p, provided that the appropriate conditions hold. The approximation is of practical use only if just a few terms of the Poisson distribution need to be calculated. When many, sometimes several hundred, terms must be calculated, the arithmetic becomes very tedious indeed, and we turn to the normal distribution for help.

It is possible, of course, to use high-speed computers to do the arithmetic, but the normal approximation to the binomial distribution removes the need for it in a fairly elegant way. In the problems that follow, the normal distribution is used to avoid very tedious arithmetic while still giving a very good approximate solution.

In this topic, we consider the normal approximation to the binomial distribution. Since the binomial distribution is discrete and the normal distribution is continuous, this may seem counterintuitive. However, a limiting process is involved: p is kept fixed and n → ∞. The result is known as the De Moivre-Laplace approximation.

We recall the binomial distribution as

$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \qquad x = 0, 1, \ldots, n, \quad q = 1 - p.$$

Stirling's approximation to n! is

$$n! \approx \sqrt{2\pi n}\, n^n e^{-n}.$$
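As a quick numerical illustration (a sketch, not part of the original derivation), the ratio of n! to Stirling's approximation $\sqrt{2\pi n}\,n^n e^{-n}$ can be computed in log form. Python's `math.lgamma(n + 1)` returns $\ln n!$, which avoids overflow for large n; the helper names `log_stirling` and `stirling_ratio` are hypothetical.

```python
import math


def log_stirling(n: int) -> float:
    """Natural log of Stirling's approximation sqrt(2*pi*n) * n**n * exp(-n)."""
    return 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n


def stirling_ratio(n: int) -> float:
    """Ratio n! / Stirling(n), using math.lgamma(n + 1) == log(n!)."""
    return math.exp(math.lgamma(n + 1) - log_stirling(n))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"n={n:5d}  n!/Stirling = {stirling_ratio(n):.6f}")
```

The ratio is slightly above 1 and approaches 1 as n grows (it behaves roughly like 1 + 1/(12n)), so the relative error of the approximation shrinks towards zero.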

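The relative error of this approximation tends to zero as n → ∞, in the sense that

$$\frac{n!}{\sqrt{2\pi n}\, n^n e^{-n}} \to 1.$$

Using Stirling's formula to approximate the factorials in the binomial model, we eventually find that, for large n,

$$P(X = x) \approx \frac{1}{\sqrt{2\pi npq}} \exp\left[-\frac{(x - np)^2}{2npq}\right],$$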


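so that X is approximately normally distributed with mean np and variance npq:

$$X \sim N(np,\ npq) \quad \text{(approximately)}.$$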

This result makes sense in light of the central limit theorem and the fact that X is the sum of n independent Bernoulli trials. Thus, the standardized quantity $(X - np)/\sqrt{npq}$ has approximately an N(0, 1) distribution. If p is close to 0.5 and n > 10, the approximation is fairly good; for other values of p, n must be larger.

In general, experience indicates that the approximation is fairly good as long as np > 5 when p ≤ 0.5, or nq > 5 when p > 0.5.
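The following sketch compares the exact binomial cdf with the De Moivre-Laplace approximation $\Pr[X \le k] \approx \Phi\big((k - np)/\sqrt{npq}\big)$ and encodes the rule of thumb above. The option `continuity=True` adds the usual continuity correction (k replaced by k + 0.5), a common refinement that the text does not discuss; the values n = 100, p = 0.3, k = 35 and all function names are illustrative assumptions.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact Pr[X <= k] for X ~ Bin(n, p), by summing the binomial pmf."""
    q = 1.0 - p
    return sum(math.comb(n, x) * p**x * q ** (n - x) for x in range(k + 1))


def normal_cdf(z: float) -> float:
    """Standard normal cdf Phi(z), expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binom_cdf_normal(k: int, n: int, p: float, continuity: bool = True) -> float:
    """Normal approximation Pr[X <= k] ~ Phi((k - np) / sqrt(npq)).

    With continuity=True, k is replaced by k + 0.5 (continuity correction).
    """
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    x = k + 0.5 if continuity else k
    return normal_cdf((x - mu) / sigma)


def rule_of_thumb_ok(n: int, p: float) -> bool:
    """Rule from the text: np > 5 when p <= 0.5, or nq > 5 when p > 0.5."""
    return n * p > 5 if p <= 0.5 else n * (1.0 - p) > 5


if __name__ == "__main__":
    n, p, k = 100, 0.3, 35  # illustrative values, not taken from the text
    print("rule of thumb satisfied:", rule_of_thumb_ok(n, p))
    print("exact     :", round(binom_cdf(k, n, p), 4))
    print("normal    :", round(binom_cdf_normal(k, n, p, continuity=False), 4))
    print("normal+cc :", round(binom_cdf_normal(k, n, p), 4))
```

Summing the exact pmf needs k + 1 terms, whereas the approximation needs a single evaluation of Φ, which is the saving the text describes.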

Problem 1:

Steel bars are made to a nominal length of 4 cm, but in fact the length is a normally distributed random variable with mean 4.01 cm and standard deviation 0.03 cm. Each steel bar costs 6p to make and may be used immediately if its length lies between 3.98 cm and 4.02 cm. If its length is less than 3.98 cm, the bar cannot be used but has a scrap value of 1p. If its length exceeds 4.02 cm, it can be shortened and used at a further cost of 2p.

Find the average cost per usable steel bar.

Solution:

Consider a batch of 100 steel bars, with length

$$X \sim N(4.01,\ 0.03^2).$$

Let C be the cost of a usable steel bar. It has two possible values: 6p (used as made) or 8p (shortened at a further 2p).

$$P(C=6) = P(3.98 < X < 4.02) = P\left(0 < Z < \frac{4.01 - 3.98}{0.03}\right) + P\left(0 < Z < \frac{4.02 - 4.01}{0.03}\right)$$

$$= P(0 < Z < 1) + P(0 < Z < 0.333) = 0.3413 + 0.1305 = 0.4718$$

Number of steel bars (out of 100) that cost 6p: 47.18


$$P(C=8) = P(X > 4.02) = P(Z > 0.333) = 0.5 - P(0 < Z < 0.333) = 0.5 - 0.1305 = 0.3695$$

Number of steel bars (out of 100) that cost 8p: 36.95

Cost:

For usable steel bars costing 8p each: 36.95 bars × 8p = 295.60p

For usable steel bars costing 6p each: 47.18 bars × 6p = 283.08p

For unusable steel bars, the net cost is 5p each (6p to make less 1p scrap value): 100 − 36.95 − 47.18 = 15.87 bars, and 15.87 × 5p = 79.35p

Therefore,

$$\text{average cost per usable steel bar} = \frac{295.60 + 283.08 + 79.35}{36.95 + 47.18} = \frac{658.03}{84.13} \approx 7.82\text{p}.$$
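The same calculation can be reproduced per bar rather than per batch of 100. This sketch evaluates the normal cdf with `math.erf` instead of table look-ups, so its probabilities can differ from the tabulated ones in the fourth decimal place; the variable names are illustrative. The expected cost per bar produced is 6p, plus 2p for each long bar, minus the 1p scrap value recovered for each short bar, which gives the same total as the three cost lines above.

```python
import math


def phi(z: float) -> float:
    """Standard normal cdf via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


MEAN, SD = 4.01, 0.03        # bar length X ~ N(4.01, 0.03^2), in cm
LOW, HIGH = 3.98, 4.02       # range in which a bar is used as made, in cm
MAKE, SCRAP, TRIM = 6, 1, 2  # making cost, scrap value, trimming cost (pence)

p_short = phi((LOW - MEAN) / SD)        # X < 3.98: scrapped
p_long = 1.0 - phi((HIGH - MEAN) / SD)  # X > 4.02: shortened for 2p more
p_ok = 1.0 - p_short - p_long           # 3.98 < X < 4.02: used as made

# Expected cost per bar produced, divided by the probability a bar is usable.
cost_per_bar = MAKE + TRIM * p_long - SCRAP * p_short
usable = p_ok + p_long
avg_cost_per_usable = cost_per_bar / usable

print(f"P(short)={p_short:.4f}  P(ok)={p_ok:.4f}  P(long)={p_long:.4f}")
print(f"average cost per usable bar = {avg_cost_per_usable:.2f}p")
```

The printed average agrees with the hand calculation of about 7.82p.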


Cumulative Distribution Function of a Discrete Random Variable

For a discrete or continuous random variable X, the cumulative distribution function, abbreviated as cdf and denoted by $F_X(x)$, is the probability of nonexceedance of x; it is sometimes referred to simply as the distribution function. That is,

$$F_X(x) = \Pr[X \le x].$$

We note that $F_X(x)$ is a monotonic function: by definition it is non-decreasing as x increases, and, as previously defined,

$$0 \le F_X(x) \le 1 \quad \text{for all possible } x.$$

Definition and properties: Cumulative distribution function, cdf. For a discrete or continuous random variable X, the cdf is the probability of nonexceedance of the value x. The cdf is a monotonic (non-decreasing), right-continuous function bounded by 0 and 1; it is continuous for a continuous random variable and a step function for a discrete one. In the discrete case it is obtained by summing values of the pmf.

In the case of a discrete random variable, $F_X(x)$ is the sum of the probabilities of all possible values of X that are less than or equal to the argument x. That is,

$$F_X(x) = \sum_{x_k \le x} p_X(x_k).$$

This is summed over all possible $x_k$ less than or equal to x.
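The summation above can be sketched as a small function that turns a pmf into its cdf. The name `make_discrete_cdf` and the fair-die example are hypothetical illustrations; exact fractions are used so that the step values print cleanly.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate


def make_discrete_cdf(pmf):
    """Build F_X(x) = Pr[X <= x] from a pmf given as {value: probability}."""
    values = sorted(pmf)
    # cumulative[i] = p_X(values[0]) + ... + p_X(values[i])
    cumulative = list(accumulate(pmf[v] for v in values))

    def cdf(x):
        # i = number of support points x_k with x_k <= x
        i = bisect_right(values, x)
        return cumulative[i - 1] if i > 0 else 0

    return cdf


# Hypothetical example: a fair six-sided die, p_X(k) = 1/6 for k = 1..6.
die_cdf = make_discrete_cdf({k: Fraction(1, 6) for k in range(1, 7)})
print(die_cdf(0), die_cdf(3), die_cdf(3.5), die_cdf(6))  # 0 1/2 1/2 1
```

Because the cdf only changes at the support points, `die_cdf(3.5)` equals `die_cdf(3)`: the function is a non-decreasing step function running from 0 to 1.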


Problem 2:

Maximum potential soil absorption capacity. The absorption capacity of a portion of terrain can be described through its curve number, CN. The curve number takes integer values in the range 1 to 100 and depends on soil properties and land use. As a first approximation, the values taken by the random variable CN in a region may be assumed to be equally likely (that is to say, uniformly distributed), with pmf

$$p_{CN}(cn) = \frac{1}{100}, \qquad cn = 1, 2, \ldots, 100.$$

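The corresponding cdf is given by

$$F_{CN}(cn) = \sum_{i=1}^{cn} \frac{1}{100} = \frac{cn}{100}, \qquad cn = 1, 2, \ldots, 100;$$

$F_{CN}(cn) = 0$ for $cn < 1$ and $F_{CN}(cn) = 1$ for $cn \ge 100$. Between integers the cdf stays constant, so for non-integer cn it equals $\lfloor cn \rfloor / 100$.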

For example, $F_{CN}(25) = \Pr[CN \le 25] = 0.25$. The pmf and cdf are shown in Fig. 3.1.3a.

The cdf of CN is a step function; it appears to be a smooth curve only because it has many small steps. The maximum potential soil absorption capacity S is related to CN as follows:

$$S = 25.4\left(\frac{1000}{CN} - 10\right),$$

where S is measured in millimeters. Accordingly, S can take values from 0 mm (CN = 100) to 25146 mm (CN = 1). For example, S = 762 mm for CN = 25, and $p_S(762) = 0.01$. Because S decreases as CN increases, the cdf of S at 762 mm is the sum of the probabilities of those outcomes of CN that yield a value of S less than or equal to 762 mm, that is, CN ≥ 25. Hence,

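$$F_S(s) = \sum_{i=cn}^{100} \frac{1}{100} = \frac{101 - cn}{100}, \qquad cn = \left\lceil \frac{25400}{s + 254} \right\rceil, \qquad 0 \le s \le 25146,$$

where cn is the smallest curve number whose value of S does not exceed s, obtained by inverting the relation above ($CN = 25400/(S + 254)$). For example, $F_S(762) = (101 - 25)/100 = 0.76$. Treating CN as continuous, this is approximately

$$F_S(s) \approx 1 - \frac{254}{s + 254}.$$

Outside this range, $F_S(s) = 0$ for $s < 0$ and $F_S(s) = 1$ for $s > 25146$.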

This cdf is shown in Fig. 3.1.3b

This is also a step function. Plotting S on a log scale shows its high and low values more clearly.

It is often convenient to consider S as a continuous random variable that can take any real value from 0 to 25146 mm.
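A minimal sketch, assuming CN is equally likely on the integers 1 to 100 as in Problem 2, that computes $F_{CN}$ and $F_S$ directly by summing the pmf and compares $F_S$ with the smooth expression $1 - 254/(s + 254)$, which treats CN as continuous. Exact rational arithmetic (`fractions.Fraction`) is used so that values such as S = 762 mm at CN = 25 compare exactly; the function names are hypothetical.

```python
from fractions import Fraction

P_CN = Fraction(1, 100)   # CN equally likely on 1..100 (Problem 2)
CN_VALUES = range(1, 101)


def cdf_cn(x):
    """F_CN(x) = Pr[CN <= x]: sum of p_CN over integer cn <= x."""
    return sum((P_CN for cn in CN_VALUES if cn <= x), Fraction(0))


def s_from_cn(cn):
    """S = 25.4 * (1000/CN - 10) in mm, kept exact with Fraction."""
    return Fraction(254, 10) * (Fraction(1000, cn) - 10)


def cdf_s(s):
    """Exact F_S(s) = Pr[S <= s]: sum p_CN over every cn whose S is <= s."""
    return sum((P_CN for cn in CN_VALUES if s_from_cn(cn) <= s), Fraction(0))


def cdf_s_continuous(s):
    """Smooth approximation 1 - 254/(s + 254), for 0 <= s <= 25146 mm."""
    return 1 - 254 / (s + 254)


if __name__ == "__main__":
    print("F_CN(25) =", cdf_cn(25))
    for s in (0, 762, 25146):
        print(f"s={s:5d} mm  exact F_S={float(cdf_s(s)):.2f}  "
              f"approx={cdf_s_continuous(s):.2f}")
```

At each attainable value of S the exact step cdf lies 0.01 above the smooth expression (for instance 0.76 versus 0.75 at 762 mm), because the sum from CN = cn to 100 contains 101 − cn terms rather than 100 − cn.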


Figure 3.1.3 (a) Probability mass function and cumulative distribution function of equally likely values of the curve number, CN. (b) Cumulative distribution function of the maximum soil potential retention S, as obtained from equally likely values of the curve number.