Normal distributions - University of Toronto, fisher.utstat.toronto.edu/~hadas/sta286/lecture...


STA286 week 6

Normal distributions

• The most important continuous probability distribution in the entire field of statistics is the normal distribution.

• All normal distributions have the same overall shape.

• The exact density curve for a particular normal distribution is specified by giving its mean μ and its variance σ².

• The mean is located at the center of the symmetric density curve and is the same as the median and the mode.

• Changing μ without changing σ moves the normal curve along the horizontal axis without changing its spread.


• The standard deviation controls the spread of a normal curve.

• The density function of the normal random variable is given by

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$

• Notation: A normal distribution with mean μ and variance σ² is denoted by N(μ, σ²).

• Note, there are other symmetric bell-shaped density curves that are not normal, e.g. the t distribution.
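As a rough illustration of the normal density, the following Python sketch evaluates it directly from the mean and standard deviation. The function name `normal_pdf` and the sample parameter values are illustrative assumptions, not part of the lecture.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Shifting mu moves the curve along the axis; the height at the center is unchanged.
peak_at_0 = normal_pdf(0.0, mu=0.0, sigma=1.0)
peak_at_5 = normal_pdf(5.0, mu=5.0, sigma=1.0)

# Doubling sigma halves the peak height, so the curve is more spread out.
peak_wide = normal_pdf(0.0, mu=0.0, sigma=2.0)

print(peak_at_0, peak_at_5, peak_wide)
```

The curve is symmetric about the mean, so `normal_pdf(mu - d, mu, sigma)` and `normal_pdf(mu + d, mu, sigma)` agree for any distance `d`.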


The 68-95-99.7 rule

In the normal distribution with mean μ and standard deviation σ:

• Approximately 68% of the observations fall within σ of the mean μ.

• Approximately 95% of the observations fall within 2σ of the mean μ.

• Approximately 99.7% of the observations fall within 3σ of the mean μ.
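The 68-95-99.7 rule can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `proportion_within` is an assumption made for illustration. It works on the standard normal curve because the proportion within k standard deviations of the mean is the same for every normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def proportion_within(k: float) -> float:
    """Proportion of a normal distribution lying within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.4f}")
```

The three proportions come out close to 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.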


Standardizing and z-scores

• If x is an observation from a distribution that has mean μ and standard deviation σ, the standardized value of x is given by

$$z = \frac{x - \mu}{\sigma}$$

• A standardized value is often called a z-score.

• A z-score tells us how many standard deviations the original observation falls away from the mean of the distribution.

• Standardizing is a linear transformation that transforms the data into the standard scale of z-scores. Therefore, standardizing does not change the shape of a distribution, but changes the values of the mean and standard deviation (to 0 and 1, respectively).


Example

• The heights of women are approximately normal with mean μ = 64.5 inches and standard deviation σ = 2.5 inches.

• The standardized height is

$$z = \frac{\text{height} - 64.5}{2.5}$$

• The standardized value (z-score) of height 68 inches is

$$z = \frac{68 - 64.5}{2.5} = 1.4$$

or 1.4 std. dev. above the mean.

• A woman 60 inches tall has standardized height

$$z = \frac{60 - 64.5}{2.5} = -1.8$$

or 1.8 std. dev. below the mean.
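The two standardized heights in this example can be reproduced with a few lines of Python. This is a minimal sketch; the function name `z_score` and the constant names are assumptions introduced for illustration.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd


MEAN_HEIGHT = 64.5  # inches, from the example
SD_HEIGHT = 2.5     # inches, from the example

z_68 = z_score(68, MEAN_HEIGHT, SD_HEIGHT)
z_60 = z_score(60, MEAN_HEIGHT, SD_HEIGHT)

print(f"68 inches -> z = {z_68:.1f}")
print(f"60 inches -> z = {z_60:.1f}")
```

A positive z-score means the observation is above the mean and a negative one means it is below, which matches the 1.4 and -1.8 in the example.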


The Standard Normal distribution

• The standard normal distribution is the normal distribution N(0, 1), that is, the mean μ = 0 and the standard deviation σ = 1.

• If a random variable X has normal distribution N(μ, σ), then the standardized variable

$$Z = \frac{X - \mu}{\sigma}$$

has the standard normal distribution.

• Areas under a normal curve represent proportions of observations from that normal distribution.

• There is no closed-form formula for areas under a normal curve. Calculations use either software or a table of areas. The table and most software calculate one kind of area: cumulative proportions. A cumulative proportion is the proportion of observations in a distribution that fall at or below a given value and is also the area under the curve to the left of a given value.


The standard normal tables

• Table A.3 gives cumulative proportions for the standard normal distribution. The table entry for each value z is the area under the curve to the left of z; the notation used is P(Z ≤ z).

e.g. P( Z ≤ 1.4 ) = 0.9192


Standard Normal Distribution

z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
0.3   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
0.4   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
0.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
0.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
0.7   .7580  .7611  .7642  .7673  .7703  .7734  .7764  .7794  .7823  .7852
0.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
0.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
1.4   .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
1.5   .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
1.6   .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
1.7   .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
1.8   .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
1.9   .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
2.0   .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
2.1   .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
2.2   .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
2.3   .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
2.4   .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
2.5   .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
2.9   .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
3.0   .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990

The table shows area to left of ‘z’ under standard normal curve

For a negative number, -z : Area below (-z) = Area above (z) =1 – Area below (z)
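Software typically obtains these table entries numerically. One common route, sketched here as an illustration rather than as the method behind Table A.3, uses the identity Φ(z) = ½(1 + erf(z/√2)), where erf is the error function available in Python's `math` module. The function name `phi` is an assumption.

```python
import math


def phi(z: float) -> float:
    """Cumulative proportion P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


area_below = phi(1.4)            # table entry for z = 1.4
area_below_negative = phi(-1.4)  # area below -z
area_above = 1.0 - phi(1.4)      # area above z

print(round(area_below, 4))
print(round(area_below_negative, 4), round(area_above, 4))
```

The last two numbers agree, which is the symmetry rule for negative z stated above.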


The standard normal tables - Example

• What proportion of the observations of a N(0,1) distribution takes values

a) less than z = 1.4 ?

b) greater than z = 1.4 ?

c) greater than z = -1.96 ?

d) between z = 0.43 and z = 2.15 ?
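The four parts of this example can be worked with software instead of the table. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are assumptions chosen for readability.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)

# a) less than z = 1.4: a direct cumulative proportion
less_than_1_4 = Z.cdf(1.4)

# b) greater than z = 1.4: one minus the cumulative proportion
greater_than_1_4 = 1.0 - Z.cdf(1.4)

# c) greater than z = -1.96
greater_than_neg_1_96 = 1.0 - Z.cdf(-1.96)

# d) between z = 0.43 and z = 2.15: difference of two cumulative proportions
between_0_43_and_2_15 = Z.cdf(2.15) - Z.cdf(0.43)

for label, value in [
    ("a", less_than_1_4),
    ("b", greater_than_1_4),
    ("c", greater_than_neg_1_96),
    ("d", between_0_43_and_2_15),
]:
    print(f"{label}) {value:.4f}")
```

To four decimals the answers are 0.9192, 0.0808, 0.9750 and 0.3178. They match what the table gives, for example .9842 - .6664 = .3178 for part d).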


Properties of Normal distribution

• If a random variable Z has a N(0,1) distribution then P(Z = z) = 0 for every value z: the area under the curve above any single point is 0.

• The area between any two points a and b (a < b) under the standard normal curve is given by

P(a ≤ Z ≤ b) = P(Z ≤ b) – P(Z ≤ a)

• As mentioned earlier, if a random variable X has a N(μ, σ) distribution, then the standardized variable

$$Z = \frac{X - \mu}{\sigma}$$

has a standard normal distribution and any calculations about X can be done using the following rules:

$$P(X \le a) = P\left(Z \le \frac{a - \mu}{\sigma}\right)$$

$$P(X > b) = 1 - P\left(Z \le \frac{b - \mu}{\sigma}\right)$$

$$P(a \le X \le b) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right)$$

• P(X = k) = 0 for all k.

• The solution to the equation P(X ≤ k) = p is k = μ + σ z_p, where z_p is the value z from the standard normal table that has area (and cumulative proportion) p below it, i.e. z_p is the 100p-th percentile of the standard normal distribution.
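These rules translate directly into code: standardize, then look up a standard normal area. The sketch below is one way to write them in Python with `statistics.NormalDist`. All four function names are assumptions, and the demonstration reuses the women's heights example (mean 64.5, standard deviation 2.5) from earlier in the lecture.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def prob_at_most(a: float, mu: float, sigma: float) -> float:
    """P(X <= a), computed as P(Z <= (a - mu) / sigma)."""
    return STANDARD_NORMAL.cdf((a - mu) / sigma)


def prob_greater(b: float, mu: float, sigma: float) -> float:
    """P(X > b), computed as 1 - P(Z <= (b - mu) / sigma)."""
    return 1.0 - STANDARD_NORMAL.cdf((b - mu) / sigma)


def prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a <= X <= b), computed as a difference of two standard normal areas."""
    return prob_at_most(b, mu, sigma) - prob_at_most(a, mu, sigma)


def value_with_area_below(p: float, mu: float, sigma: float) -> float:
    """Solve P(X <= k) = p for k, using k = mu + sigma * z_p."""
    return mu + sigma * STANDARD_NORMAL.inv_cdf(p)


p_68 = prob_at_most(68, mu=64.5, sigma=2.5)           # same as P(Z <= 1.4)
k_back = value_with_area_below(p_68, mu=64.5, sigma=2.5)
print(round(p_68, 4), round(k_back, 4))
```

Feeding the probability back through `value_with_area_below` recovers the original height of 68 inches, which shows that the percentile rule is the inverse of the cumulative-proportion rule.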


Questions

1. The marks of STA286 students have a N(65, 15) distribution. Find the proportion of students having marks (a) less than 50, (b) greater than 80, (c) between 50 and 80.

2. Scores on SAT verbal test follow approximately the N(505, 110) distribution. How high must a student score in order to place in the top 10% of all students taking the SAT?

3. The time it takes to complete a STA286 term test is normally distributed with mean 100 minutes and standard deviation 14 minutes. How much time should be allowed if we wish to ensure that at least 9 out of 10 students (on average) can complete it?


4. General Motors of Canada has a deal: ‘an oil filter and lube job in 25 minutes or the next one free’. Suppose that you worked for GM and knew that the time needed to provide these services was approximately normal with mean 15 minutes and std. dev. 2.5 minutes. How many minutes would you have recommended to put in the ad above if it was decided that about 5 free services for 100 customers was reasonable?

5. In a survey of patients of a rehabilitation hospital the mean length of stay in the hospital was 12 weeks with a std. dev. of 1 week. The distribution was approximately normal.

a) Out of 100 patients how many would you expect to stay longer than 13 weeks?

b) What is the percentile rank of a stay of 11.3 weeks?

c) What percentage of patients would you expect to be in longer than 12 weeks?

d) What is the length of stay at the 90th percentile?

e) What is the median length of stay?
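Questions 3, 4 and 5 state the mean and standard deviation explicitly, so they can be set up in a few lines. The following Python sketch is one possible setup, not an official solution. The variable names are assumptions, and Question 4 is read as "choose the advertised time so that about 5% of services take longer".

```python
from statistics import NormalDist

# Question 3: time allowed so that 90% of students finish (90th percentile).
test_time = NormalDist(mu=100, sigma=14)
time_allowed = test_time.inv_cdf(0.90)

# Question 4: advertised minutes so that about 5% of services run over (95th percentile).
service_time = NormalDist(mu=15, sigma=2.5)
advertised_minutes = service_time.inv_cdf(0.95)

# Question 5: length of stay with mean 12 weeks and standard deviation 1 week.
stay = NormalDist(mu=12, sigma=1)
expected_over_13 = 100 * (1.0 - stay.cdf(13))   # a) out of 100 patients
percentile_rank_11_3 = stay.cdf(11.3)           # b) proportion at or below 11.3 weeks
percent_over_12 = 100 * (1.0 - stay.cdf(12))    # c) percentage staying longer than 12 weeks
stay_90th = stay.inv_cdf(0.90)                  # d) 90th percentile
median_stay = stay.inv_cdf(0.50)                # e) median equals the mean

print(f"Q3: {time_allowed:.1f} minutes")
print(f"Q4: {advertised_minutes:.1f} minutes")
print(f"Q5: {expected_over_13:.1f}, {percentile_rank_11_3:.3f}, "
      f"{percent_over_12:.0f}%, {stay_90th:.2f}, {median_stay:.0f}")
```

Each answer uses either a cumulative proportion (`cdf`) or its inverse (`inv_cdf`), which are the software counterparts of reading the table forwards and backwards.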


Normal Approximation to the Binomial

• If X has a Binomial distribution with mean µ = np and variance σ² = npq, where q = 1 − p, then the limiting form of the distribution of

$$Z = \frac{X - np}{\sqrt{npq}}$$

as n → ∞, is the standard normal distribution.

• It turns out that the normal distribution provides a fairly good approximation even when n is not so large (section 6.5).

• As a rule of thumb, we will use this approximation for values of n and p that satisfy np ≥ 10 and n(1 − p) ≥ 10.

week 8


Example

• You are planning a sample survey of small businesses in your area. You will choose a SRS of businesses listed in the Yellow Pages. Experience shows that only about half the businesses you contact will respond.

(a) If you contact 150 businesses, it is reasonable to use the Bin(150; 0.5) distribution for the number X who respond. Explain why.

(b) What is the expected number (the mean) who will respond?

(c) What is the probability that 70 or fewer will respond?

(d) How large a sample must you take to increase the mean number of respondents to 100?
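Parts (b) to (d) can be checked numerically. The sketch below compares the normal approximation with the exact Bin(150, 0.5) probability. The function names are assumptions, and the continuity correction (using 70.5 instead of 70) is an optional refinement that the slides do not mention.

```python
import math
from statistics import NormalDist


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Bin(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float, continuity_correction: bool = False) -> float:
    """Approximate P(X <= k) using Z = (X - np) / sqrt(npq)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    x = k + 0.5 if continuity_correction else k
    return NormalDist().cdf((x - mean) / sd)


n, p = 150, 0.5
rule_of_thumb_ok = n * p >= 10 and n * (1 - p) >= 10

mean_respondents = n * p                                          # (b)
approx = normal_approx_cdf(70, n, p)                              # (c) plain approximation
approx_cc = normal_approx_cdf(70, n, p, continuity_correction=True)
exact = binomial_cdf(70, n, p)                                    # (c) exact, for comparison
sample_size_for_100 = math.ceil(100 / p)                          # (d) solve n * p = 100

print(rule_of_thumb_ok, mean_respondents, sample_size_for_100)
print(f"approx={approx:.4f}  with correction={approx_cc:.4f}  exact={exact:.4f}")
```

The plain approximation gives roughly 0.207, and the continuity-corrected value lands closer to the exact binomial probability.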



Exercise

• According to government data, 21% of American children under the age of six live in households with incomes less than the official poverty level. A study of learning in early childhood chooses a SRS of 300 children.

a) What is the mean number of children in the sample who come from poverty-level households? What is the standard deviation of this number?

b) Use the normal approximation to calculate the probability that at least 80 of the children in the sample live in poverty.


The Chi-Square distribution

• The Chi-Squared densities are a subset of the gamma family of distributions. They are obtained by letting α = ν/2 and λ = ½, where ν is a positive integer.

• The parameter of the Chi-Squared distribution, ν, is called the degrees of freedom.

• The Chi-Squared density is given by

$$f_X(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}, \qquad x > 0,$$

and 0 otherwise.

• The mean and variance of the Chi-Squared distribution are…


• Note:

• We can use Table A.5 in the Appendix to answer questions like: Find the value k for which $P(\chi^2_{10} \le k) = 0.975$. k is the upper 2.5 percentile of the $\chi^2_{10}$ distribution. Notation: $k = \chi^2_{0.025,\,10}$.
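The table lookup for 10 degrees of freedom can be reproduced without a statistics package. The sketch below relies on a standard closed form that holds only when the degrees of freedom are even (a Chi-Squared variable with 2m degrees of freedom has P(X ≤ x) = 1 − e^{−x/2} Σ_{j<m} (x/2)^j / j!), and it finds k by bisection. Both function names are assumptions, and the code is not a general Chi-Squared routine.

```python
import math


def chi_square_cdf_even_df(x: float, df: int) -> float:
    """P(X <= x) for a Chi-Squared variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    if x <= 0:
        return 0.0
    half = x / 2.0
    partial_sum = sum(half**j / math.factorial(j) for j in range(df // 2))
    return 1.0 - math.exp(-half) * partial_sum


def chi_square_quantile_even_df(p: float, df: int, tol: float = 1e-10) -> float:
    """Find k with P(X <= k) = p by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi_square_cdf_even_df(hi, df) < p:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi_square_cdf_even_df(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


k = chi_square_quantile_even_df(0.975, df=10)
print(f"k = {k:.3f}")
```

The result is about 20.483, the value a Chi-Squared table lists for an upper-tail area of 0.025 with 10 degrees of freedom.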

Weibull Distribution

• The continuous random variable X has a Weibull Distribution, with parameters α > 0 and β > 0, if its density function is given by

$$f_X(x) = \begin{cases} \alpha\beta\, x^{\beta-1} e^{-\alpha x^{\beta}}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$

• The mean and variance of the Weibull Distribution are…

• The Weibull distribution is applied to reliability and life-testing problems such as time to failure or life length.

• In general, the Weibull distribution does not have the lack of memory property.

• The cumulative distribution function is given by…

Example

Service life, in years, of a hearing aid battery is a random variable having a Weibull distribution with α = ½ and β = 2.

a) How long can such a battery be expected to last?

b) What is the probability that such a battery will be operating after 2 years?
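A possible numerical setup for this example is sketched below. It assumes the Weibull density f(x) = αβ x^(β−1) e^(−α x^β) for x > 0, and it uses two standard results for that parameterization that the slides leave elided: the mean α^(−1/β) Γ(1 + 1/β) and the survival probability P(X > t) = e^(−α t^β). The function names are illustrative assumptions.

```python
import math


def weibull_mean(alpha: float, beta: float) -> float:
    """Mean of a Weibull distribution with density alpha*beta*x**(beta-1)*exp(-alpha*x**beta)."""
    return alpha ** (-1.0 / beta) * math.gamma(1.0 + 1.0 / beta)


def weibull_survival(t: float, alpha: float, beta: float) -> float:
    """P(X > t) for the same parameterization."""
    return math.exp(-alpha * t**beta)


alpha, beta = 0.5, 2.0

expected_life = weibull_mean(alpha, beta)               # a) expected service life in years
prob_after_2_years = weibull_survival(2.0, alpha, beta)  # b) still operating after 2 years

print(f"expected life: {expected_life:.4f} years")
print(f"P(X > 2): {prob_after_2_years:.4f}")
```

Under these assumptions the expected life is about 1.25 years and the probability of lasting beyond 2 years is e^(−2), roughly 0.135.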


Failure Rate for the Weibull Distribution

• The time to failure, T, of a component is often described by the Weibull distribution.

• The Weibull distribution is helpful in determining the failure rate (also called hazard rate) in order to get a sense of wear or deterioration of the component.

• The reliability of a component is the probability that it will last for at least a specified time under specific experimental conditions.

• The reliability of a component at time t is given by

$$R(t) = P(T > t) = \int_t^{\infty} f_T(x)\,dx = 1 - F_T(t)$$

• The failure rate of a component is the change over time of the conditional probability that the component lasts an additional ∆t units of time given that it has lasted to time t.

• For the Weibull distribution, the failure rate at time t is given by:

$$Z(t) = \alpha\beta\, t^{\beta-1}, \qquad t > 0$$

• If β = 1, Z(t) = α, which is a constant. This special case is the Exponential distribution, which has the lack of memory property.

• If β > 1, Z(t) is an increasing function of t, indicating that the component wears over time.

• If β < 1, Z(t) is a decreasing function of t, indicating that the component strengthens over time.
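The three cases for β can be seen by tabulating the Weibull failure rate Z(t) = αβ t^(β−1) at a few times. This is a small illustrative sketch; the function name, the choice α = 1 and the sample times are all assumptions.

```python
def weibull_failure_rate(t: float, alpha: float, beta: float) -> float:
    """Failure (hazard) rate Z(t) = alpha * beta * t**(beta - 1) for t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    return alpha * beta * t ** (beta - 1.0)


times = [0.5, 1.0, 2.0, 4.0]
rates_by_beta = {
    beta: [weibull_failure_rate(t, alpha=1.0, beta=beta) for t in times]
    for beta in (0.5, 1.0, 2.0)
}

for beta, rates in rates_by_beta.items():
    print(f"beta={beta}: " + ", ".join(f"{r:.3f}" for r in rates))
```

With β = 0.5 the rates fall as t grows, with β = 1 they stay at α, and with β = 2 they rise, matching the three bullet points.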

Example

• The life of a certain automobile seal has the Weibull distribution with failure rate given by:

$$Z(t) = \frac{1}{\sqrt{t}}$$

• Find the probability that the seal is still intact after 4 years.
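One way to approach this example is to match the given failure rate 1/√t to the Weibull form αβ t^(β−1), which gives β = ½ and α = 2, and then use the reliability R(t) = e^(−α t^β). That reliability formula is a standard result for this Weibull parameterization and is stated here as an assumption, since the slides do not write it out. The names below are illustrative.

```python
import math

ALPHA, BETA = 2.0, 0.5  # chosen so that ALPHA * BETA * t**(BETA - 1) equals 1 / sqrt(t)


def seal_failure_rate(t: float) -> float:
    """Weibull failure rate with the matched parameters."""
    return ALPHA * BETA * t ** (BETA - 1.0)


def seal_reliability(t: float) -> float:
    """R(t) = P(T > t) = exp(-ALPHA * t**BETA)."""
    return math.exp(-ALPHA * t**BETA)


# Check that the matched parameters reproduce the given failure rate.
rate_matches = math.isclose(seal_failure_rate(4.0), 1.0 / math.sqrt(4.0))

prob_intact_4_years = seal_reliability(4.0)
print(rate_matches, f"{prob_intact_4_years:.4f}")
```

Under these assumptions the probability that the seal is still intact after 4 years is e^(−4), about 0.018.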