Math 227 Elementary Statistics, Bluman 5th edition


Post on 28-Mar-2020


CHAPTER 6

The Normal Distribution

Objectives

• Identify distributions as symmetrical or skewed.
• Identify the properties of the normal distribution.
• Find the area under the standard normal distribution, given various z values.
• Find the probabilities for a normally distributed variable by transforming it into a standard normal variable.
• Find specific data values for given percentages using the standard normal distribution.
• Use the central limit theorem to solve problems involving sample means for large samples.
• Use the normal approximation to compute probabilities for a binomial variable.

Introduction

• Many continuous variables have distributions that are bell-shaped; these are called approximately normally distributed variables.
• A normal distribution is also known as the bell curve or the Gaussian distribution.

Normal and Skewed Distributions

• The normal distribution is a continuous, bell-shaped distribution of a variable.
• If the data values are evenly distributed about the mean, the distribution is said to be symmetrical.
• If the majority of the data values fall to the left or right of the mean, the distribution is said to be skewed.

Left Skewed Distributions

• When the majority of the data values fall to the right of the mean, the distribution is said to be negatively or left skewed. The mean is to the left of the median, and the mean and the median are to the left of the mode.

Right Skewed Distributions

• When the majority of the data values fall to the left of the mean, the distribution is said to be positively or right skewed. The mean falls to the right of the median, and both the mean and the median fall to the right of the mode.

6.1 Normal Distribution

I. Continuous Probability Distributions

A continuous random variable is one that can theoretically take on any value on some line interval. We use $f(x)$ to represent a probability density function. Unfortunately, $f(x)$ does not give us the probability that the value x will be observed. To understand how a probability density function for a continuous random variable enables us to find probabilities, it is important to understand the relationship between probability and area. For the following given histogram, what is the probability that x is between 2.5 and 5.5?

[Figure: A. Frequency Histogram of C1 (x from 0 to 8, Frequency from 0 to 5). B. Relative Frequency Histogram of C1 (x from 0 to 8, Percent from 0 to 20).]

Use the given frequency histogram to calculate P(2.5 < x < 5.5) :

A: P(2.5 < x < 5.5) = (4 + 5 + 4) / (1 + 2 + 3 + 4 + 5 + 4 + 3 + 2 + 1) = 13 / 25 = 52%

Use the corresponding relative frequency histogram to calculate P(2.5 < x < 5.5) :

B: P(2.5 < x < 5.5) = 16% + 20% + 16% = 52%, which is the same as the area of the three middle bars of the relative frequency histogram. The width of each bar is one and the height is the given percentage.
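
The histogram calculation can be sketched in Python. This is only an illustration: the bar heights are taken from the sum in part A, and placing the nine bars at x = 0, 1, ..., 8 is an assumption read off the axis labels. The function name `interval_probability` is hypothetical.

```python
# Bar heights from the frequency histogram; the mapping of bars to
# x = 0..8 is an assumption based on the axis labels.
frequencies = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1}


def interval_probability(freq, low, high):
    """Relative frequency of the bars whose centre lies strictly between low and high."""
    total = sum(freq.values())
    inside = sum(count for x, count in freq.items() if low < x < high)
    return inside / total


p = interval_probability(frequencies, 2.5, 5.5)
print(f"P(2.5 < x < 5.5) = {p:.2f}")  # (4 + 5 + 4) / 25
```

Because each bar has width one, the relative frequency of a bar equals its area, which is why adding bar heights and adding areas give the same answer.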

For a continuous probability distribution,

1) $f(x) \ge 0$ for all values x of the random variable;

2) the total area under the graph of $f(x)$ is 1;

3) P (a < x < b) is given by the area under the graph of $f(x)$ for a < x < b.

Note : P (x = a) = 0 for continuous random variables.

This implies P(a ≤ x ≤ b) = P(a < x < b);

P(x ≥ a) = P(x > a);

and P(x ≤ a) = P(x < a).


II. The Normal Distribution

Continuous probability distributions can assume a variety of shapes. However, the most important distribution of continuous random variables in statistics is the normal distribution, which is approximately mound-shaped. Many naturally occurring random variables, such as IQs, human heights, weights, and times, have nearly normal distributions.

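• The mathematical equation for a normal distribution is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where
$e \approx 2.718$
$\pi \approx 3.14$
$\mu$ = population mean
$\sigma$ = population standard deviation

The mean is located at the center of the distribution. The distribution is symmetric about its mean $\mu$.

[Figure: normal curve centered at the mean, with total area 1 and area 0.5 on each side of the mean.]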
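
As a rough check of the density formula, the sketch below evaluates it directly and compares it with Python's standard-library `statistics.NormalDist`. The function name `normal_pdf` and the parameter values (mean 80, standard deviation 12) are illustrative assumptions.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


mu, sigma = 80.0, 12.0                 # illustrative values only
peak = normal_pdf(mu, mu, sigma)       # the curve is highest at the mean
left = normal_pdf(mu - 5, mu, sigma)   # 5 units below the mean
right = normal_pdf(mu + 5, mu, sigma)  # 5 units above the mean

print(peak, left, right)
print(NormalDist(mu, sigma).pdf(mu))   # library value for comparison
```

The equal heights at `mu - 5` and `mu + 5` reflect the symmetry of the curve about its mean.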

There is a correspondence between area and probability. Since the total area under the normal probability distribution is equal to 1, the symmetry implies that the area to the right of $\mu$ is 0.5 and the area to the left of $\mu$ is also 0.5.

Large values of $\sigma$ reduce the height of the curve and increase the spread.

Small values of $\sigma$ increase the height of the curve and reduce the spread.

Almost all values of a normal random variable lie in the interval $\mu \pm 3\sigma$.
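
These statements can be checked numerically. The sketch below assumes "almost all" refers to the interval within three standard deviations of the mean; the helper name `area_within` and the sample parameters are illustrative.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


within_3 = area_within(3)                    # roughly 0.997
right_half = 1 - NormalDist(80, 12).cdf(80)  # area to the right of the mean

# A larger standard deviation lowers the peak of the curve.
peak_narrow = NormalDist(0, 1).pdf(0)
peak_wide = NormalDist(0, 2).pdf(0)

print(round(within_3, 4), right_half, peak_narrow > peak_wide)
```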

III. Properties of the Normal Distribution

• The shape and position of the normal distribution curve depend on two parameters, the mean and the standard deviation.
• Each normally distributed variable has its own normal distribution curve, which depends on the values of the variable's mean and standard deviation.

Normal Distribution Properties

• The normal distribution curve is bell-shaped.
• The mean, median, and mode are equal and located at the center of the distribution.
• The normal distribution curve is unimodal (i.e., it has only one mode).
• The curve is symmetrical about the mean, which is equivalent to saying that its shape is the same on both sides of a vertical line passing through the center.
• The curve is continuous, i.e., there are no gaps or holes. For each value of X, there is a corresponding value of Y.
• The curve never touches the x axis. Theoretically, no matter how far in either direction the curve extends, it never meets the x axis, but it gets increasingly closer.

IV. The Standard Normal Distribution

• Since each normally distributed variable has its own mean and standard deviation, the shape and location of these curves will vary. In practical applications, one would need a table of areas under the curve for each variable. To simplify this, statisticians use the standard normal distribution.
• The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.

Recall: z Values

• The z value is the number of standard deviations that a particular X value is away from the mean. The formula for finding the z value is:

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \quad \text{or} \quad z = \frac{X - \mu}{\sigma}$$
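
The z value formula can be written as a small Python helper. The function name `z_score` is hypothetical, and the sample numbers (value 56, mean 80, standard deviation 12) are used only for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev


z = z_score(56, 80, 12)
print(z)  # 56 is two standard deviations below the mean
```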


Finding Areas Under the Standard Normal Distribution Curve

Area To the Left of Any z Value

• Look up the z value in the table and use the area given.

To the Right of Any z Value

• Look up the z value in the table to get the area.
• Subtract the area from 1.

Between Any Two z Values

• Look up both z values to get the areas.
• Subtract the smaller area from the larger area.
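
The three lookup rules can be mirrored in Python. This sketch assumes the standard library's `statistics.NormalDist` as a stand-in for the cumulative table; the helper names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_left(z):
    """Area to the left of z (the cumulative area a table lookup gives)."""
    return Z.cdf(z)


def area_right(z):
    """Area to the right of z: subtract the cumulative area from 1."""
    return 1.0 - Z.cdf(z)


def area_between(z1, z2):
    """Area between two z values: larger cumulative area minus the smaller."""
    low, high = sorted((z1, z2))
    return Z.cdf(high) - Z.cdf(low)


print(round(area_left(1.63), 4))
print(round(area_right(1.76), 4))
print(round(area_between(-2.02, 1.74), 4))
```

Computed values can differ from table-based answers in the last digit, because the table rounds each area to four decimal places before subtracting.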


Area Under the Curve

• The area under the curve is more important than the frequencies because the area corresponds to the probability!
• Note: In a continuous distribution, the probability of any exact Z value is 0, since the area would be represented by a vertical line above the value, and vertical lines in theory have no area. So P (a ≤ Z ≤ b) = P (a < Z < b).

Example 1 :

(a) Find P (z < 1.63)

From Table E
P (z < 1.63) = 0.9484

(b) Find P (-2.48 < z < 0)

From Table E
-2.48 → 0.0066
Area for z = 0? 0.5
P (-2.48 < z < 0) = 0.5 - 0.0066 = 0.4934

(c) Find P (-2.02 < z < 1.74)

From Table E
-2.02 → 0.0217
1.74 → 0.9591
P (-2.02 < z < 1.74) = 0.9591 - 0.0217 = 0.9374

(d) Find the probability that z is larger than 1.76.

From Table E
1.76 → 0.9608
+∞ → 1
P (z > 1.76) = 1 - 0.9608 = 0.0392

Example 2 : Assume the standard normal distribution. Fill in the blanks.

(a) P (0 < z < ____ ) = 0.4279

Add 0.5 to the given area of 0.4279 to get the cumulative area of 0.9279.
From Table E
0.9279 → 1.46
z = 1.46

(b) P (0 < z < ____ ) = 0.4997

Add 0.5 to the given area of 0.4997 to get the cumulative area of 0.9997.
From Table E
0.9997 → 3.43
z ≈ 3.43

(c) P ( _____ < z < 0) = 0.4370

0.5 - 0.4370 = 0.0630
From Table E
0.0630 → -1.53
z = -1.53 because the z-value is to the left of the mean.

(d) P (z < _____ ) = 0.9846

From Table E
0.9846 → 2.16
z = 2.16

(e) Find the z value to the left of the mean so that 71.90% of the area under the distribution curve lies to the right of it.

71.90% = 0.7190
1 - 0.7190 = 0.2810
From Table E
0.2810 → -0.58
z = -0.58

(f) Find two z values, one positive and one negative, so that the areas in the two tails total to 12%

0.12 ÷ 2 = 0.06 (one-tailed area)
From Table E, 0.06 lies midway between the areas for z = -1.55 (0.0606) and z = -1.56 (0.0594).
z = ±1.555
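
Working backward from an area to a z value can also be sketched in Python. The example below assumes `statistics.NormalDist.inv_cdf` as a replacement for reading the table in reverse, and covers parts (a), (c), (d), (e), and (f); the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

z_a = Z.inv_cdf(0.5 + 0.4279)  # (a) area between 0 and z is 0.4279
z_c = Z.inv_cdf(0.5 - 0.4370)  # (c) area between z and 0 is 0.4370
z_d = Z.inv_cdf(0.9846)        # (d) cumulative area is 0.9846
z_e = Z.inv_cdf(1 - 0.7190)    # (e) 71.90% of the area lies to the right
z_f = Z.inv_cdf(0.12 / 2)      # (f) lower tail area is half of 12%

for label, z in [("a", z_a), ("c", z_c), ("d", z_d), ("e", z_e)]:
    print(label, round(z, 2))
print("f", round(z_f, 3), round(-z_f, 3))  # the two tails are symmetric
```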

6.2 Applications of the Normal Distribution

I. Calculating Probabilities for a Non-Standard Normal Distribution

Consider a normal variable x with mean $\mu$ and standard deviation $\sigma$.

1. Standardize from x to z: $z = (x - \mu)/\sigma$.

2. Use Table E to find the area corresponding to z.

3. Adjust the area to answer the question.

Example 1 :

Let x be a normal random variable with mean 80 and standard deviation 12. What percentage of values are

(a) larger than 56?

P (x > 56)
Standardize from x to z: z = (56 - 80) / 12 = -2
P (x > 56) = P (z > -2)
From Table E
-2 → 0.0228
+∞ → 1
P (z > -2) = 1 - 0.0228 = 0.9772

(b) less than 62?

P (x < 62)
Standardize from x to z: z = (62 - 80) / 12 = -1.5
P (x < 62) = P (z < -1.5)
From Table E
-1.5 → 0.0668

(c) Between 85 and 98?

P (85 ≤ x ≤ 98)
Standardize from x to z: z = (85 - 80) / 12 ≈ 0.42 and z = (98 - 80) / 12 = 1.5
P (0.42 ≤ z ≤ 1.5)
From Table E
0.42 → 0.6628
1.5 → 0.9332
P (0.42 ≤ z ≤ 1.5) = 0.9332 - 0.6628 = 0.2704

(d) outside of 1.5 standard deviations of the mean

What is outside of 1.5 standard deviations of the mean? The two tails, z < -1.5 and z > 1.5.

From Table E
-1.5 → 0.0668

P (z < -1.5 or z > 1.5) = 2 · 0.0668 = 0.1336
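
The four parts of this example can be cross-checked in Python. The sketch assumes `statistics.NormalDist` with mean 80 and standard deviation 12; it works with x directly, which is equivalent to standardizing first. Variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=80, sigma=12)

p_a = 1 - X.cdf(56)               # (a) larger than 56
p_b = X.cdf(62)                   # (b) less than 62
p_c = X.cdf(98) - X.cdf(85)       # (c) between 85 and 98
p_d = 2 * X.cdf(80 - 1.5 * 12)    # (d) both tails beyond 1.5 standard deviations

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_d, 4))
```

Part (c) comes out slightly above the table answer of 0.2704 because the table method rounds z = 0.4167 to 0.42 before the lookup.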

Example 2 : (Ref: General Statistics by Chase/Bown, 4th ed.)

The length of time it takes for a ferry to reach a summer resort from the mainland is approximately normally distributed with mean 2 hours and standard deviation of 12 minutes. Over many past trips, what proportion of times has the ferry reached the island in

(a) less than 1 hour 45 minutes?

Working in minutes, $\mu$ = 120 and $\sigma$ = 12.
P (x < 105)
z = (105 - 120) / 12 = -1.25
P (z < -1.25)
From Table E
-1.25 → 0.1056

(b) more than 2 hours, 5 minutes?

P (x > 125)
z = (125 - 120) / 12 ≈ 0.42
P (z > 0.42)
From Table E
0.42 → 0.6628
+∞ → 1
P (z > 0.42) = 1 - 0.6628 = 0.3372

(c) between 1 hour, 50 minutes and 2 hours, 20 minutes?

P (110 ≤ x ≤ 140)
z = (110 - 120) / 12 ≈ -0.83 and z = (140 - 120) / 12 ≈ 1.67
P (-0.83 ≤ z ≤ 1.67)
From Table E
-0.83 → 0.2033
1.67 → 0.9525
P (-0.83 ≤ z ≤ 1.67) = 0.9525 - 0.2033 = 0.7492

II. Calculating a Cutoff Value

Work backward through the steps for calculating probabilities of a non-standard normal distribution.

1. Adjust the given percentage to the corresponding cumulative area.

2. Use Table E to find the corresponding z cutoff value.

3. Non-standardize from z to x: $x = \mu + z\sigma$.

Example 1 :

Employees of a company are given a test that is distributed normally with mean 100 and variance 25. The top 5% will be awarded top positions with the company. What score is necessary to get one of the top positions?

Normal distribution, $\mu = 100$, $\sigma^2 = 25$, $\sigma = 5$

The cutoff has an area of 0.05 to its right, so the area to its left is
1 - 0.05 = 0.95
From Table E
0.95 → 1.645
z = 1.645
Non-standardize
x = $\mu$ + z$\sigma$ = 100 + 1.645 · 5 = 108.225 ≈ 108.2

Example 2 :

Quiz scores were normally distributed with $\mu$ = 14 and $\sigma$ = 2.8. The lower 20% should receive tutorial service. Find the cutoff score.

Normal distribution, $\mu = 14$, $\sigma = 2.8$

From Table E
0.2 → -0.84
z = -0.84
Non-standardize
x = $\mu$ + z$\sigma$ = 14 + (-0.84) · 2.8 = 11.648 ≈ 11.65
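
Both cutoff examples can be sketched with an inverse lookup in Python. The helper name `cutoff` is hypothetical, and `statistics.NormalDist.inv_cdf` is assumed as a substitute for reading the table backward, so results can differ slightly from answers based on a rounded z value.

```python
import math
from statistics import NormalDist


def cutoff(mu, sigma, area_left):
    """Value x such that the area to the left of x equals area_left."""
    return NormalDist(mu, sigma).inv_cdf(area_left)


# Example 1: mean 100, variance 25, top 5% means 95% of the area lies to the left.
top_5 = cutoff(100, math.sqrt(25), 1 - 0.05)

# Example 2: mean 14, standard deviation 2.8, lower 20%.
low_20 = cutoff(14, 2.8, 0.20)

print(round(top_5, 2), round(low_20, 2))
```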

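Section 6 - 3 The Central Limit Theorem

I. Sampling Distribution of Sample Mean

Example 1 : Population Distribution Table

x       2     4     6     8
P(x)   1/4   1/4   1/4   1/4

(a) Find the population mean and population standard deviation of the population distribution table.

$\mu = \sum x P(x) = (2 + 4 + 6 + 8)/4 = 5$
$\sigma = \sqrt{\sum (x - \mu)^2 P(x)} = \sqrt{(9 + 1 + 1 + 9)/4} = \sqrt{5} \approx 2.236$

(b) Construct a probability histogram for x

[Figure: probability histogram of x, with bars of height 1/4 at x = 2, 4, 6, 8.]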

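Example 2 :

From the population distribution of Example 1, 2 values are randomly selected with replacement.

(a) List all possible combinations (the sample space) and $\bar{x}$ for each combination.

                 Second value
First value     2    4    6    8
     2          2    3    4    5
     4          3    4    5    6
     6          4    5    6    7
     8          5    6    7    8

Each entry is the sample mean $\bar{x}$ of the pair. There are 16 equally likely combinations.

(b) Construct a probability distribution table for $\bar{x}$.

x̄        2      3      4      5      6      7      8
P(x̄)   1/16   2/16   3/16   4/16   3/16   2/16   1/16

(c) Construct a probability histogram for $\bar{x}$.

[Figure: probability histogram of $\bar{x}$, with bar heights from 1/16 to 4/16 and the tallest bar at 5.]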

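(d) Find the mean of the sampling distribution of $\bar{x}$.

$\mu_{\bar{x}} = \sum \bar{x} P(\bar{x}) = (2 \cdot 1 + 3 \cdot 2 + 4 \cdot 3 + 5 \cdot 4 + 6 \cdot 3 + 7 \cdot 2 + 8 \cdot 1)/16 = 80/16 = 5$

(e) Find the standard deviation of the sampling distribution of $\bar{x}$.

$\sigma_{\bar{x}} = \sqrt{\sum \bar{x}^2 P(\bar{x}) - \mu_{\bar{x}}^2} = \sqrt{27.5 - 25} = \sqrt{2.5} \approx 1.581$

(f) Compare $\mu_{\bar{x}}$ with $\mu$

From (a) of Example 1, $\mu = 5$
From (d) of Example 2, $\mu_{\bar{x}} = 5$
This shows that $\mu_{\bar{x}} = \mu$

(g) Compare $\sigma_{\bar{x}}$ with $\sigma$

From (a) of Example 1, $\sigma \approx 2.236$
From (e) of Example 2, $\sigma_{\bar{x}} \approx 1.581$
This shows that $\sigma_{\bar{x}} \ne \sigma$; however $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 2.236/\sqrt{2} \approx 1.581$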
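
The comparison in (f) and (g) can be reproduced by brute-force enumeration. The sketch below assumes the population values 2, 4, 6, 8 are equally likely (as the histogram axis suggests) and that the 2 values are drawn with replacement; all variable names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]  # assumed equally likely, each with probability 1/4
n = 2                      # sample size

# All ordered samples of size n drawn with replacement (4 ** 2 = 16 of them).
samples = list(product(population, repeat=n))

# Probability distribution of the sample mean.
xbar_dist = {}
for sample in samples:
    xbar = Fraction(sum(sample), n)
    xbar_dist[xbar] = xbar_dist.get(xbar, 0) + Fraction(1, len(samples))

# Population mean and variance.
mu = Fraction(sum(population), len(population))
var = sum((x - mu) ** 2 for x in population) / len(population)

# Mean and variance of the sampling distribution of the sample mean.
mu_xbar = sum(m * p for m, p in xbar_dist.items())
var_xbar = sum((m - mu_xbar) ** 2 * p for m, p in xbar_dist.items())

print("mu =", mu, " mu_xbar =", mu_xbar)
print("sigma =", math.sqrt(var), " sigma_xbar =", math.sqrt(var_xbar))
print("sigma / sqrt(n) =", math.sqrt(var) / math.sqrt(n))
```

Exact fractions are used so that the equality of the two means, and of `var_xbar` with `var / n`, holds without rounding error.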

                       Population parameter    Sample statistics
Mean                   $\mu$                   $\mu_{\bar{x}} = \mu$
Standard deviation     $\sigma$                $\sigma_{\bar{x}} = \sigma/\sqrt{n}$

[Figure: Population Distribution (probability histogram of x) shown beside Sampling Distribution (probability histogram of $\bar{x}$).]

II. Central Limit Theorem

If the population distribution is normally distributed, the sampling distribution of $\bar{x}$ will be normally distributed for any sample size n.

If the population distribution is not normally distributed, the sampling distribution of $\bar{x}$ will be approximately normally distributed for any sample size n ≥ 30.

Example 1 : Population distribution

[Figure: probability histogram P(x) of the population.]

Given :

(a) Find $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ for n = 4

(b) Is the sampling distribution normally distributed?

According to the central limit theorem, $\bar{x}$ is NOT guaranteed to be normally distributed, because the population distribution is NOT normally distributed and n is NOT greater than or equal to 30.

(c) If n is changed from 4 to 36, is the sampling distribution normally distributed?

Yes, approximately, because n is greater than or equal to 30.

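Example 2 : (Ref: General Statistics by Chase/Bown, 4th ed.)

A population has mean 325 and variance 144. Suppose the distribution of the sample mean is generated by random samples of size 36.

(a) Find $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$

$\mu_{\bar{x}} = \mu = 325$
$\sigma_{\bar{x}} = \sigma/\sqrt{n} = 12/\sqrt{36} = 2$

(b) Find

Recall : Standardize $z = (x - \mu)/\sigma$
Now use : $z = (\bar{x} - \mu_{\bar{x}})/\sigma_{\bar{x}}$

[Figure: standard normal curve marked at z = 0 and z = 1, with areas 0.5 and 0.3413.]

(c) Find

[Figure: standard normal curve marked at z = -2 and z = 1, with cumulative areas 0.0228 and 0.8413.]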

Example 3 :

The average number of days spent in a North Carolina hospital for a coronary bypass in 1992 was 9 days and the standard deviation was 4 days (North Carolina Medical Database Commission, Consumer's Guide to Hospitalization Charges in North Carolina Hospitals, August 1994). What is the probability that a random sample of 30 patients will have an average stay longer than 9.5 days?

$\mu_{\bar{x}} = 9$, $\sigma_{\bar{x}} = 4/\sqrt{30} \approx 0.730$
z = (9.5 - 9) / 0.730 ≈ 0.68
From Table E
0.68 → 0.7517
P ($\bar{x}$ > 9.5) = P (z > 0.68) = 1 - 0.7517 = 0.2483

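Example 4 :

Suppose the test scores for an exam are normally distributed with $\mu$ = 75, $\sigma$ = 8

(a) What percentage of the students has a score greater than 85?

z = (85 - 75) / 8 = 1.25
From Table E
1.25 → 0.8944
P (x > 85) = P (z > 1.25) = 1 - 0.8944 = 0.1056, or 10.56%

(b) What is the probability that 4 randomly selected students will have a mean score higher than 85?

$\sigma_{\bar{x}} = 8/\sqrt{4} = 4$
z = (85 - 75) / 4 = 2.5
From Table E
2.5 → 0.9938
P ($\bar{x}$ > 85) = P (z > 2.5) = 1 - 0.9938 = 0.0062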
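
Probabilities for a sample mean, as in Examples 3 and 4, can be sketched in Python. The helper name `prob_mean_above` is hypothetical; it assumes the sample mean is (approximately) normal with standard deviation sigma / sqrt(n).

```python
import math
from statistics import NormalDist


def prob_mean_above(threshold, mu, sigma, n):
    """P(sample mean > threshold) for samples of size n.

    Assumes the sample mean is (approximately) normal with mean mu
    and standard deviation sigma / sqrt(n).
    """
    std_error = sigma / math.sqrt(n)
    return 1 - NormalDist(mu, std_error).cdf(threshold)


p_hospital = prob_mean_above(9.5, mu=9, sigma=4, n=30)  # Example 3
p_single = prob_mean_above(85, mu=75, sigma=8, n=1)     # Example 4 (a), one student
p_four = prob_mean_above(85, mu=75, sigma=8, n=4)       # Example 4 (b), mean of four

print(round(p_hospital, 4), round(p_single, 4), round(p_four, 4))
```

The hospital result comes out slightly below the table answer of 0.2483, since the table method rounds z to 0.68 first. Comparing the last two values shows how averaging four scores makes a mean above 85 much less likely than a single score above 85.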

Section 6 - 4 Normal Approximation to the Binomial Distribution

I. When to use a Normal distribution to approximate a Binomial distribution?

Recall that a binomial distribution is determined by n and p. When p is approximately 0.5, and as n increases, the shape of the binomial distribution becomes similar to that of the normal distribution. In order to use a normal distribution to approximate a binomial distribution, n must be sufficiently large. As a rule of thumb, n is sufficiently large if np ≥ 5 and nq ≥ 5, where q = 1 - p.

When using a normal distribution to approximate a binomial distribution, the mean and standard deviation of the normal distribution are the same as those of the binomial distribution. Now recall the formulas for finding the mean and standard deviation.

$$\mu = np, \qquad \sigma = \sqrt{npq}$$

II. Continuity Correction

• In addition to the condition np ≥ 5 and nq ≥ 5, a correction for continuity is used when employing a continuous distribution (Normal distribution) to approximate a discrete distribution (Binomial distribution).

Warning : The continuity correction should be used only when approximating a Binomial probability with a normal probability. Don't use the continuity correction with other normal probability problems.

Continuity correction → x ± 0.5

Example 1 : Use the continuity correction to rewrite each expression :

      Binomial Distribution → Normal Distribution
(a)   P (x > 6) → P (x > 6.5)
(b)   P (x ≤ 3) → P (x ≤ 3.5)
(c)   P (x ≤ 9) → P (x ≤ 9.5)
(d)   P (1 < x < 7) → P (1.5 < x < 6.5)
(e)   P (5 ≤ x ≤ 10) → P (4.5 ≤ x ≤ 10.5)
(f)   P (4 < x ≤ 6) → P (4.5 < x ≤ 6.5)

III. Using a Normal Distribution to approximate a Binomial Distribution

Step 1 : Check whether the normal distribution can be used. ( np ≥ 5 and nq ≥ 5)

Step 2 : Find the mean $\mu$ and standard deviation $\sigma$.

Step 3 : Write the problem in probability notation, using x.

Step 4 : Rewrite the problem by using the continuity correction factor. Continuity correction → x ± 0.5

Step 5 : Find the corresponding z value(s)

Step 6 : Use the z table to find the area and adjust it to answer the question.
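
The six steps can be collected into one Python function. This is a minimal sketch, not a definitive implementation: the name `normal_approx_binomial` and its `low`/`high` parameters (inclusive integer bounds, `None` for an open side) are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def normal_approx_binomial(n, p, low=None, high=None):
    """Approximate P(low <= X <= high) for X ~ Binomial(n, p).

    low and high are inclusive integer bounds; None leaves that side open.
    """
    q = 1 - p
    # Step 1: check that the approximation is appropriate.
    if n * p < 5 or n * q < 5:
        raise ValueError("need np >= 5 and nq >= 5")
    # Step 2: mean and standard deviation of the binomial.
    mu = n * p
    sigma = math.sqrt(n * p * q)
    dist = NormalDist(mu, sigma)
    # Steps 3-4: apply the continuity correction (0.5 outward on each bound).
    # Steps 5-6: cdf() standardizes and looks up the area in one call.
    upper = 1.0 if high is None else dist.cdf(high + 0.5)
    lower = 0.0 if low is None else dist.cdf(low - 0.5)
    return upper - lower


p1 = normal_approx_binomial(13, 0.4, low=10)             # 10 or more successes
p2 = normal_approx_binomial(250, 0.10, high=4)           # fewer than 5
p3 = normal_approx_binomial(800, 0.3, low=260, high=300) # 260 to 300 inclusive

print(round(p1, 4), p2, round(p3, 4))
```

A strict inequality such as "fewer than 5" is first rewritten with inclusive integer bounds (x ≤ 4), so the continuity correction is always half a unit outward from the bound.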

Example 1 : (Ref: General Statistics by Chase/Bown, 4th ed.)

Assume that the experiment is a binomial experiment. Find the probability of 10 or more successes, where n = 13 and p = 0.4.

(a) Use the Binomial table

P (x ≥ 10) = P (x = 10) + P (x = 11) + P (x = 12) + P (x = 13)
= 0.006 + 0.001 + 0+ + 0+
= 0.007

(An entry of 0+ denotes a positive probability that rounds to 0.000.)

(b) Use the normal approximation to the binomial

Step 1 : Check :
np ≥ 5: 13 · 0.4 = 5.2 ≥ 5
nq ≥ 5: 13 · 0.6 = 7.8 ≥ 5

Step 2 : Find $\mu$ and $\sigma$
$\mu = np = 5.2$
$\sigma = \sqrt{npq} = \sqrt{13 \cdot 0.4 \cdot 0.6} = \sqrt{3.12} \approx 1.766$

Step 3 :
P (x ≥ 10)

Step 4 :
Binomial Distribution → Normal Distribution
P (x ≥ 10) → P (x > 9.5)

Step 5 :
z = (9.5 - 5.2) / 1.766 ≈ 2.43

Step 6 :
From Table E
2.43 → 0.9925
P (z > 2.43) = 1 - 0.9925 = 0.0075
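
To see how close the approximation is, the exact binomial probability can be computed directly. The sketch below uses only the standard library; the function name `binomial_tail` is hypothetical.

```python
import math
from statistics import NormalDist


def binomial_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


n, p = 13, 0.4
exact = binomial_tail(n, p, 10)

# Normal approximation with continuity correction: P(x > 9.5).
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
approx = 1 - NormalDist(mu, sigma).cdf(9.5)

print(round(exact, 4), round(approx, 4))
```

The exact value is a little larger than the table sum of 0.007 because the table rounds each term to three decimals; the normal approximation of about 0.0075 lies close to it even though np = 5.2 only just satisfies the condition.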

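Example 2 :

A dealer states that 90% of all automobiles sold have air conditioning. If the dealer sells 250 cars, find the probability that fewer than 5 of them will not have air conditioning.

Here x counts the cars without air conditioning, so p = 0.10, q = 0.9, n = 250.

Step 1 : Check :
np ≥ 5: 250 · 0.1 = 25 ≥ 5
nq ≥ 5: 250 · 0.9 = 225 ≥ 5

Step 2 : Find $\mu$ and $\sigma$
$\mu = np = 25$
$\sigma = \sqrt{npq} = \sqrt{250 \cdot 0.1 \cdot 0.9} = \sqrt{22.5} \approx 4.743$

Step 3 :
P (x < 5)

Step 4 :
Binomial Distribution → Normal Distribution
P (x < 5) → P (x < 4.5)

Step 5 :
z = (4.5 - 25) / 4.743 ≈ -4.32

Step 6 :
z = -4.32 is below the smallest z value listed in Table E, for which the table gives 0.0001.
P (z < -4.32) ≈ 0.0001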

Example 3 :

In a corporation, 30% of the people elect to enroll in the financial investment program offered by the company. Find the probability that of 800 randomly selected people, between 260 and 300 inclusive have enrolled in the program.

p = 0.3, q = 0.7, n = 800

Step 1 : Check :
np ≥ 5: 800 · 0.3 = 240 ≥ 5
nq ≥ 5: 800 · 0.7 = 560 ≥ 5

Step 2 : Find $\mu$ and $\sigma$
$\mu = np = 240$
$\sigma = \sqrt{npq} = \sqrt{800 \cdot 0.3 \cdot 0.7} = \sqrt{168} \approx 12.96$

Step 3 :
P (260 ≤ x ≤ 300)

Step 4 :
Binomial Distribution → Normal Distribution
P (260 ≤ x ≤ 300) → P (259.5 ≤ x ≤ 300.5)

Step 5 :
z = (259.5 - 240) / 12.96 ≈ 1.50 and z = (300.5 - 240) / 12.96 ≈ 4.67

Step 6 :
From Table E
1.50 → 0.9332
4.67 → 0.9999
P (1.50 ≤ z ≤ 4.67) = 0.9999 - 0.9332 = 0.0667

Summary

• The normal distribution can be used to describe a variety of variables, such as heights, weights, and temperatures.
• The normal distribution is bell-shaped, unimodal, symmetric, and continuous; its mean, median, and mode are equal.
• Mathematicians use the standard normal distribution, which has a mean of 0 and a standard deviation of 1.
• The normal distribution can be used to describe a sampling distribution of sample means.
• These samples must be of the same size and randomly selected with replacement from the population.
• The central limit theorem states that as the size of the samples increases, the distribution of sample means will be approximately normal.
• The normal distribution can be used to approximate other distributions, such as the binomial distribution.
• For the normal distribution to be used as an approximation to the binomial distribution, the conditions np ≥ 5 and nq ≥ 5 must be met.
• A correction for continuity may be used for more accurate results.

Conclusions

• The normal distribution can be used to approximate other distributions to simplify the data analysis for a variety of applications.