Required Sample Size, Type II Error Probabilities

Chapter 23 Inference for Means: Part 2

Required Sample Size To Estimate a Population Mean

• If you desire a C% confidence interval for a population mean with an accuracy specified by you, how large does the sample size need to be?

• We will denote the accuracy by ME, which stands for Margin of Error.

Example: Sample Size to Estimate a Population Mean

• Suppose we want to estimate the unknown mean height of male students at NC State with a confidence interval.

• We want to be 95% confident that our estimate is within .5 inch of µ.

• How large does our sample size need to be?

Confidence Interval for µ

The confidence interval for µ is

$$\bar{x} \pm t^*_{n-1}\,\frac{s}{\sqrt{n}}$$

In terms of the margin of error ME, the CI for µ can be expressed as

$$\bar{x} \pm ME, \quad\text{so}\quad ME = t^*_{n-1}\,\frac{s}{\sqrt{n}}$$

So we can find the sample size by solving this equation for n:

$$ME = t^*_{n-1}\,\frac{s}{\sqrt{n}}$$

which gives

$$n = \left(\frac{t^*_{n-1}\, s}{ME}\right)^2$$

• Good news: we have an equation
• Bad news:
1. Need to know s
2. We don’t know n so we don’t know the degrees of freedom to find t*n-1

A Way Around this Problem: Approximate by Using the Standard Normal

Use the corresponding z* from the standard normal to form the equation

$$ME = z^*\,\frac{\sigma}{\sqrt{n}}$$

Solve for n:

$$n = \left(\frac{z^*\,\sigma}{ME}\right)^2$$

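[Figure: sampling distribution of x̄; the central area .95 (the confidence level) lies within ME = 1.96σ/√n on either side of µ.]

Set ME = 1.96σ/√n and solve for n:

$$n = \left(\frac{1.96\,\sigma}{ME}\right)^2 \quad (\text{estimate } \sigma \text{ with } s)$$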

Estimating σ
• Previously collected data or prior knowledge of the population
• If the population is normal or near-normal, then σ can be conservatively estimated by

$$\sigma \approx \frac{\text{range}}{6}$$

• 99.7% of obs. within 3σ of the mean

Example: sample size to estimate mean height µ of NCSU undergrad. male students

• We want to be 95% confident that we are within .5 inch of µ, so ME = .5; z* = 1.96
• Suppose previous data indicates that σ is about 2 inches.

$$n = \left(\frac{z^*\,\sigma}{ME}\right)^2 = \left(\frac{(1.96)(2)}{.5}\right)^2 = 61.47$$

• We should sample 62 male students
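The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `sample_size_for_mean` is hypothetical, and the critical value z* is passed in by hand (1.96 for 95% confidence) rather than looked up.

```python
import math


def sample_size_for_mean(z_star: float, sigma: float, me: float) -> int:
    """Smallest n with z* * sigma / sqrt(n) <= ME (hypothetical helper).

    Solves ME = z* * sigma / sqrt(n) for n and rounds up, because a
    fractional sample size must be rounded to the next whole unit.
    """
    if sigma <= 0 or me <= 0:
        raise ValueError("sigma and ME must be positive")
    return math.ceil((z_star * sigma / me) ** 2)


# Heights example from the slide: z* = 1.96, sigma about 2 inches, ME = .5 inch
n_heights = sample_size_for_mean(z_star=1.96, sigma=2.0, me=0.5)
print(n_heights)  # (1.96 * 2 / .5)^2 = 61.47, rounded up
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.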

Example: Sample Size to Estimate a Population Mean -Textbooks

• Suppose the financial aid office wants to estimate the mean NCSU semester textbook cost within ME = $25 with 98% confidence. How many students should be sampled? Previous data shows σ is about $85.

$$n = \left(\frac{z^*\,\sigma}{ME}\right)^2 = \left(\frac{(2.33)(85)}{25}\right)^2 = 62.76$$

round up to n = 63

Example: Sample Size to Estimate a Population Mean -NFL footballs

• The manufacturer of NFL footballs uses a machine to inflate new footballs

• The mean inflation pressure is 13.5 psi, but uncontrollable factors cause the pressures of individual footballs to vary from 13.3 psi to 13.7 psi

• After throwing 6 interceptions in a recent game, Peyton Manning complains that the balls are not properly inflated.

• The manufacturer wishes to estimate the mean inflation pressure to within .025 psi with a 99% confidence interval. How many footballs should be sampled?

Example: Sample Size to Estimate a Population Mean - NFL footballs (cont.)

• The manufacturer wishes to estimate the mean inflation pressure to within .025 psi with a 99% confidence interval. How many footballs should be sampled?
• 99% confidence ⇒ z* = 2.576; ME = .025
• σ = ? Inflation pressures range from 13.3 to 13.7 psi
• So range = 13.7 – 13.3 = .4; σ ≈ range/6 = .4/6 = .067

$$n = \left(\frac{z^*\,\sigma}{ME}\right)^2 = \left(\frac{(2.576)(.067)}{.025}\right)^2 = 47.66 \approx 48$$

• We should sample 48 footballs.
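The football example combines the range/6 estimate of σ with the sample-size formula. The sketch below shows one way to chain the two steps; the helper names `sigma_from_range` and `sample_size_for_mean` are hypothetical, and the range/6 rule assumes a normal or near-normal population, as stated in the slides.

```python
import math


def sigma_from_range(low: float, high: float) -> float:
    """Conservative estimate of sigma as range / 6 (near-normal population assumed)."""
    return (high - low) / 6.0


def sample_size_for_mean(z_star: float, sigma: float, me: float) -> int:
    """Round (z* * sigma / ME)^2 up to the next whole unit."""
    return math.ceil((z_star * sigma / me) ** 2)


# Pressures vary from 13.3 to 13.7 psi; 99% confidence uses z* = 2.576
sigma_hat = sigma_from_range(13.3, 13.7)
n_footballs = sample_size_for_mean(z_star=2.576, sigma=sigma_hat, me=0.025)
print(round(sigma_hat, 3), n_footballs)
```

Carrying the unrounded σ ≈ .0667 through the formula gives a slightly smaller value before rounding than using .067, but the required sample size is the same after rounding up.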

Significance Levels and Rejection Regions

Hypothesis Tests for µ

Levels and Rejection Regions, Right-Tail; n=26 (df=25)

$$H_0: \mu = \mu_0, \qquad t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$$

If HA: µ > µ0 and α = .10 then RR = {t: t > 1.316}

If HA: µ > µ0 and α = .05 then RR = {t: t > 1.708}

If HA: µ > µ0 and α = .01 then RR = {t: t > 2.485}

α      Rej Region
.10    t > 1.316
.05    t > 1.708
.01    t > 2.485

Hypothesis Testing for µ, Type II Error Probabilities (Right-tail example)

• Example
– A new billing system for a department store will be cost-effective only if the mean monthly account is more than $170.
– A sample of 401 accounts has a mean of $174 and s = $65.
– Can we conclude that the new system will be cost-effective?

Right-tail example: hypotheses, significance level

• Hypotheses
– The population of interest is the credit accounts at the store.
– We want to know whether the mean account for all customers is greater than $170.

H0: µ = 170
HA: µ > 170

– where µ is the mean account value for all customers
– We will choose significance level α = .05

A Right-Tail Test: Rejection Region

• The rejection region: reject H0 if the test statistic t satisfies t > t.05,n-1 = t.05,400 = 1.649
• We will reject H0 if the value of the test statistic t is greater than 1.649
• Results from the n = 401 randomly selected customers: x̄ = $174, s = $65

Right-tail example: test statistic and conclusion

– Hypotheses: H0: µ = 170, HA: µ > 170
– Data: x̄ = 174, s = 65
– Test statistic:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{174 - 170}{65/\sqrt{401}} = 1.23$$

– Recall that the rejection region is t > tα,n-1 = t.05,400 = 1.649

Since the test statistic t = 1.23, and 1.23 < 1.649, we do not reject the null hypothesis H0: µ = 170.
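As a quick check of the arithmetic in the right-tail example, the test statistic and the decision can be computed directly. This is an illustrative sketch: `t_statistic` is a hypothetical helper, and the critical value 1.649 is copied from the slide rather than computed, since the Python standard library has no t-distribution quantile function.

```python
import math


def t_statistic(x_bar: float, mu0: float, s: float, n: int) -> float:
    """One-sample t statistic: (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))


t = t_statistic(x_bar=174, mu0=170, s=65, n=401)
t_crit = 1.649          # t_{.05,400}, taken from the slide
reject_h0 = t > t_crit  # right-tail rejection rule

print(round(t, 2), reject_h0)
```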

Right-tail example: P-value and conclusion

P-value: The probability of observing a value of the test statistic as extreme or more extreme than t = 1.23, given that µ = 170, is

$$P\text{-value} = P(t_{400} > 1.23) = .1097$$

[Figure: t400 density with the area to the right of t = 1.23 shaded.]

Since the P-value > .05, we conclude that there is not sufficient evidence to reject H0: µ = 170.

Type II error is possible

Calculating β, the Probability of a Type II Error

• Calculating β for the t test is not at all straightforward and is beyond the level of this course
– The distribution of the test statistic t is quite complicated when H0 is false and HA is true
– However, we can obtain very good approximate values for β using z (the standard normal) in place of t.

Calculating β, the Probability of a Type II Error (cont.)

• We need to
1. Specify an appropriate significance level α;
2. Determine the rejection region in terms of z;
3. Then calculate the probability of not being in the rejection region when µ = µ1, where µ1 is a value of µ that makes HA true.

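Example (cont.): calculating β

– Hypotheses: H0: µ = 170, HA: µ > 170
– Choose α = .05
– Rejection region in terms of z: z > z.05 = 1.645
– Rejection region in terms of x̄:

$$z = \frac{\bar{x} - 170}{65/\sqrt{400}} > 1.645 \iff \bar{x} > 170 + 1.645\,\frac{65}{\sqrt{400}} = 175.34$$

[Figure: sampling distribution of x̄ centered at 170; the area α = 0.05 lies to the right of 175.34.]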

Express the rejection region directly, not in standardized terms

– The rejection region with α = .05: x̄ > 175.34. Do not reject H0 if x̄ ≤ 175.34.

Specify the alternative value under HA.

– Let the alternative value be µ = 180 (rather than just µ > 170)

[Figure: sampling distributions of x̄ under H0: µ = 170 and HA: µ = 180, with the cutoff 175.34 and the area α = .05 marked.]

– A Type II error occurs when a false H0 is not rejected. Suppose µ = 180, that is, H0 is false.

[Figure: sampling distributions of x̄ under H0: µ = 170 and H1: µ = 180 with cutoff 175.34 and α = .05; a false H0 is not rejected when x̄ < 175.34.]

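$$\beta(180) = P(\bar{x} < 175.34 \mid H_0 \text{ is false}) = P(\bar{x} < 175.34 \mid \mu = 180) = P\left(z < \frac{175.34 - 180}{65/\sqrt{400}}\right) = .0764$$

Power when µ = 180 = 1 − β(180) = .9236

[Figure: sampling distributions of x̄ under H0: µ = 170 and H1: µ = 180; β is the area under the H1 curve to the left of 175.34.]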
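The three steps (choose α, express the rejection region in terms of x̄, then find the probability of missing it under the alternative) can be sketched with the standard library's `statistics.NormalDist`. The function name `beta_right_tail` is hypothetical, and the code uses the normal approximation described in the slides, with s standing in for σ.

```python
from math import sqrt
from statistics import NormalDist


def beta_right_tail(mu0: float, mu_a: float, sigma: float, n: int, alpha: float) -> float:
    """Approximate Type II error probability for HA: mu > mu0 when mu = mu_a."""
    se = sigma / sqrt(n)                                  # std. dev. of x-bar
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # reject H0 if x-bar > cutoff
    # beta = P(x-bar <= cutoff) when x-bar is centered at the alternative mean
    return NormalDist(mu_a, se).cdf(cutoff)


beta_180 = beta_right_tail(mu0=170, mu_a=180, sigma=65, n=400, alpha=0.05)
power_180 = 1 - beta_180
print(round(beta_180, 3), round(power_180, 3))
```

The result differs slightly in the third decimal from the slide's .0764 because the slide rounds the z value to two decimals before using the normal table.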

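Effects on β of changing α

• Increasing the significance level α decreases the value of β, and vice versa

[Figure: sampling distributions of x̄ centered at µ = 170 and µ = 180; moving to a larger α2 > α1 gives a smaller β2 < β1.]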

Judging the Test

• A hypothesis test is effectively defined by the significance level α and by the sample size n.
• If the probability of a Type II error β is judged to be too large, we can reduce it by
– increasing α, and/or
– increasing the sample size.

Judging the Test

• Increasing the sample size reduces β

By increasing the sample size, the standard deviation of the sampling distribution of the mean decreases. Thus, the cutoff value of x̄ for the rejection region decreases.

$$\text{Recall RR: } z = \frac{\bar{x} - \mu}{s/\sqrt{n}} > z_\alpha, \quad\text{or}\quad \bar{x} > \mu + z_\alpha\,\frac{s}{\sqrt{n}}$$

Note what happens when n increases: α does not change, but β becomes smaller.

[Figure: sampling distributions of x̄ centered at µ = 170 and µ = 180; the cutoff x̄L moves left as n increases.]

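Judging the Test

• Increasing the sample size reduces β
• In the example, suppose n increases from 400 to 1000.

$$\bar{x}_L = \mu + z_\alpha\,\frac{s}{\sqrt{n}} = 170 + 1.645\,\frac{65}{\sqrt{1000}} = 173.38$$

$$\beta = P\left(Z < \frac{173.38 - 180}{65/\sqrt{1000}}\right) = P(Z < -3.22) \approx 0$$

• α remains 5%, but the probability of a Type II error drops dramatically.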
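To see the effect of n numerically, the cutoff and β can be tabulated for several sample sizes. This is a sketch under the same normal approximation; the helper `cutoff_and_beta` and the extra sample size 700 are illustrative choices, not values from the slides.

```python
from math import sqrt
from statistics import NormalDist


def cutoff_and_beta(mu0, mu_a, sigma, n, alpha):
    """Return (x-bar cutoff, beta) for a right-tail z test of H0: mu = mu0."""
    se = sigma / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    beta = NormalDist(mu_a, se).cdf(cutoff)
    return cutoff, beta


# alpha stays at .05 while n grows; 700 is an extra illustrative value
results = {n: cutoff_and_beta(170, 180, 65, n, 0.05) for n in (400, 700, 1000)}
for n, (cutoff, beta) in results.items():
    print(f"n={n:5d}  cutoff={cutoff:7.2f}  beta={beta:.4f}")
```

The cutoff moves toward 170 as n grows, so less of the distribution centered at 180 falls below it.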

A Left-Tail Test
• Self-Addressed Stamped Envelopes.
– The chief financial officer at FedEx believes that including a stamped self-addressed (SSA) envelope in the monthly invoice sent to customers will decrease the amount of time it takes for customers to pay their monthly bills.
– Currently, customers return their payments in 24 days on average, with a standard deviation of 6 days.
– Stamped self-addressed envelopes are included with the bills for 76 randomly selected customers. The number of days until they return their payment is recorded.

A Left-Tail Test: Hypotheses

• The parameter tested is the population mean payment period (µ) for customers who receive self-addressed stamped envelopes with their bill
• The hypotheses are:
H0: µ = 24
H1: µ < 24
• Use α = .05; n = 76.

A Left-Tail Test: Rejection Region

• The rejection region: reject H0 if the test statistic t satisfies t < −t.05,75 = −1.665
• We will reject H0 if the value of the test statistic t is less than −1.665
• Results from the 76 randomly selected customers: x̄ = 22.95 days, s = 6 days

A Left-Tail Test: Test Statistic

• The value of the test statistic t is:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{22.95 - 24}{6/\sqrt{76}} = -1.52$$

Since the rejection region is t < −t.05,75 = −1.665, and the test statistic t = −1.52 satisfies −1.52 > −1.665, we do not reject the null hypothesis. Note that the P-value = P(t75 < −1.52) = .066 > .05.

Since our decision is to not reject the null hypothesis, a Type II error is possible.

Left-Tail Test: Calculating β, the Probability of a Type II Error

• The CFO thinks that a decrease of one day in the average payment return time will cover the costs of the envelopes since customer checks can be deposited earlier.
• What is β(23), the probability of a Type II error when the true mean payment return time is 23 days?

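Left-tail test: calculating β (cont.)

– Hypotheses: H0: µ = 24, HA: µ < 24
– Choose α = .05
– Rejection region in terms of z: z < −z.05 = −1.645
– Rejection region in terms of x̄:

$$z = \frac{\bar{x} - 24}{6/\sqrt{75}} < -1.645 \iff \bar{x} < 24 - 1.645\,\frac{6}{\sqrt{75}} = 22.86$$

[Figure: sampling distribution of x̄ centered at 24; the area α = 0.05 lies to the left of 22.86.]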

Express the rejection region directly, not in standardized terms

– The rejection region with α = .05: x̄ < 22.86. Do not reject H0 if x̄ ≥ 22.86.

Specify the alternative value under HA.

– Let the alternative value be µ = 23 (rather than just µ < 24)

[Figure: sampling distributions of x̄ under H0: µ = 24 and HA: µ = 23, with the cutoff 22.86 and the area α = .05 marked.]

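$$\beta(23) = P(\bar{x} > 22.86 \mid H_0 \text{ is false}) = P(\bar{x} > 22.86 \mid \mu = 23) = P\left(z > \frac{22.86 - 23}{6/\sqrt{75}}\right) = P(z > -.20) = .580$$

Power when µ = 23 = 1 − β(23) = .420

[Figure: sampling distributions of x̄ under H0: µ = 24 and H1: µ = 23 with α = .05; β is the area under the H1 curve to the right of 22.86.]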
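The left-tail calculation mirrors the right-tail one, with the rejection region below the cutoff. The sketch below is a hypothetical helper using the normal approximation; it takes n = 75 to match the √75 in the slide's formula (the sample itself had 76 customers). With the cutoff 22.86 and standard error 6/√75, the standardized cutoff under µ = 23 is about −0.20.

```python
from math import sqrt
from statistics import NormalDist


def beta_left_tail(mu0: float, mu_a: float, sigma: float, n: int, alpha: float) -> float:
    """Approximate Type II error probability for HA: mu < mu0 when mu = mu_a."""
    se = sigma / sqrt(n)
    cutoff = mu0 - NormalDist().inv_cdf(1 - alpha) * se   # reject H0 if x-bar < cutoff
    # beta = P(x-bar >= cutoff) when x-bar is centered at the alternative mean
    return 1 - NormalDist(mu_a, se).cdf(cutoff)


se_75 = 6 / sqrt(75)
cutoff_75 = 24 - NormalDist().inv_cdf(0.95) * se_75
z_at_cutoff = (cutoff_75 - 23) / se_75      # standardized cutoff under mu = 23
beta_23 = beta_left_tail(mu0=24, mu_a=23, sigma=6, n=75, alpha=0.05)
print(round(cutoff_75, 2), round(z_at_cutoff, 2), round(beta_23, 3))
```

Because the alternative µ = 23 sits very close to the cutoff, the test misses a one-day decrease more often than not at this sample size.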

A Two-Tail Test for µ
• The Federal Communications Commission (FCC) wants competition between phone companies. The FCC wants to investigate whether AT&T rates differ from their competitor’s rates.
• According to data from the FCC, the mean monthly long-distance bill for all AT&T residential customers is $17.09.

A Two-Tail Test (cont.)
• A random sample of 100 AT&T customers is selected and their bills are recalculated using a leading competitor’s rates.
• The mean and standard deviation of the bills using the competitor’s rates are x̄ = $17.55, s = $3.87
• Can we infer that there is a difference between AT&T’s bills and the competitor’s bills (on the average)?

A Two-Tail Test (cont.)

• Is the mean different from 17.09?

H0: µ = 17.09
HA: µ ≠ 17.09

• n = 100; use α = .05

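A Two-Tail Test (cont.)

Rejection region: t < −t.025,99 or t > t.025,99, that is, t < −1.9842 or t > 1.9842

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{17.55 - 17.09}{3.87/\sqrt{100}} = 1.19$$

[Figure: t99 density with rejection regions of area α/2 = 0.025 in each tail, below −t = −1.9842 and above t = 1.9842.]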

A Two-Tail Test: Conclusion

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{17.55 - 17.09}{3.87/\sqrt{100}} = 1.19$$

[Figure: t99 density with rejection regions of area α/2 = 0.025 below −1.9842 and above 1.9842; the observed t = 1.19 and its mirror image −1.19 fall between them.]

There is insufficient evidence to conclude that there is a difference between the bills of AT&T and the competitor.

Also, by the P-value approach: the P-value = P(t < −1.19) + P(t > 1.19) = 2(.1184) = .2368 > .05

A Type II error is possible

Two-Tail Test: Calculating β, the Probability of a Type II Error

• The FCC would like to detect a decrease of $1.50 in the average competitor’s bill. (17.09 − 1.50 = 15.59)
• What is β(15.59), the probability of a Type II error when the true mean competitor’s bill is $15.59?

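Two-Tail Test: Calculating β (cont.)

Rejection region: z < −z.025 or z > z.025, that is, z < −1.96 or z > 1.96

In terms of x̄:

$$z = \frac{\bar{x} - 17.09}{3.87/\sqrt{100}} < -1.96 \iff \bar{x} < 17.09 - 1.96\,\frac{3.87}{\sqrt{100}} = 16.33$$

$$z = \frac{\bar{x} - 17.09}{3.87/\sqrt{100}} > 1.96 \iff \bar{x} > 17.09 + 1.96\,\frac{3.87}{\sqrt{100}} = 17.85$$

Reject H0 if x̄ < 16.33 or x̄ > 17.85. Do not reject H0 if 16.33 ≤ x̄ ≤ 17.85.

[Figure: sampling distribution of x̄ centered at 17.09 with rejection regions of area α/2 = 0.025 below 16.33 and above 17.85.]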

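Two-Tail Test: Calculating β (cont.)

$$\beta(15.59) = P(16.33 < \bar{x} < 17.85 \mid \mu = 15.59) = P\left(\frac{16.33 - 15.59}{3.87/\sqrt{100}} < z < \frac{17.85 - 15.59}{3.87/\sqrt{100}}\right) = P(1.912 < z < 5.84) = .028$$

Power when µ = 15.59 = 1 − β(15.59) = .972

[Figure: sampling distributions of x̄ under H0: µ = 17.09 and HA: µ = 15.59 with α = .05; β is the area under the HA curve between 16.33 and 17.85.]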
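For the two-tail case, β is the probability that x̄ lands between the two cutoffs when the true mean is the alternative value. A minimal sketch, assuming the normal approximation and a hypothetical helper name `beta_two_tail`:

```python
from math import sqrt
from statistics import NormalDist


def beta_two_tail(mu0: float, mu_a: float, sigma: float, n: int, alpha: float) -> float:
    """Approximate Type II error probability for HA: mu != mu0 when mu = mu_a."""
    se = sigma / sqrt(n)
    z_half = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z_half * se, mu0 + z_half * se   # do not reject between these
    sampling = NormalDist(mu_a, se)                        # x-bar under the alternative
    return sampling.cdf(upper) - sampling.cdf(lower)


beta_1559 = beta_two_tail(mu0=17.09, mu_a=15.59, sigma=3.87, n=100, alpha=0.05)
print(round(beta_1559, 3), round(1 - beta_1559, 3))
```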

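General formula: Type II Error Probability β(µA) for a Level α Test

$$
\begin{aligned}
H_A: \mu > \mu_0: &\quad \beta(\mu_A) = P\left(z < z_\alpha + \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}}\right) \\
H_A: \mu < \mu_0: &\quad \beta(\mu_A) = 1 - P\left(z < -z_\alpha + \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}}\right) \\
H_A: \mu \ne \mu_0: &\quad \beta(\mu_A) = P\left(z < z_{\alpha/2} + \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}}\right) - P\left(z < -z_{\alpha/2} + \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}}\right)
\end{aligned}
$$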
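The three cases of the general formula can be collected into one function. This is an illustrative sketch: the name `type2_error` and the `tail` argument values ("right", "left", "two") are hypothetical conventions, and the formula is the standardized normal approximation used throughout these slides.

```python
from math import sqrt
from statistics import NormalDist


def type2_error(mu0, mu_a, sigma, n, alpha, tail):
    """beta(mu_a) for a level-alpha z test; tail is 'right', 'left', or 'two'."""
    z = NormalDist()
    shift = (mu0 - mu_a) / (sigma / sqrt(n))   # (mu0 - muA) / (sigma / sqrt(n))
    if tail == "right":                        # HA: mu > mu0
        return z.cdf(z.inv_cdf(1 - alpha) + shift)
    if tail == "left":                         # HA: mu < mu0
        return 1 - z.cdf(-z.inv_cdf(1 - alpha) + shift)
    if tail == "two":                          # HA: mu != mu0
        z_half = z.inv_cdf(1 - alpha / 2)
        return z.cdf(z_half + shift) - z.cdf(-z_half + shift)
    raise ValueError("tail must be 'right', 'left', or 'two'")


# Inputs from the billing and long-distance examples
beta_billing = type2_error(170, 180, 65, 400, 0.05, "right")
beta_phone = type2_error(17.09, 15.59, 3.87, 100, 0.05, "two")
print(round(beta_billing, 3), round(beta_phone, 3))
```

Working in standardized units avoids computing the x̄ cutoffs separately for each example.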

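Sample Size n for which a level α test also has β(µA) = β

$$n = \left[\frac{\sigma\,(z_\alpha + z_\beta)}{\mu_0 - \mu_A}\right]^2 \quad\text{for a 1-tailed (right or left) test}$$

$$n = \left[\frac{\sigma\,(z_{\alpha/2} + z_\beta)}{\mu_0 - \mu_A}\right]^2 \quad\text{for a 2-tailed test (approx. solution)}$$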
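As a sketch of how the sample-size formula might be used, the function below returns the smallest whole n that meets a target β. The function name `sample_size_for_beta` is hypothetical, and the target β = .05 in the usage line is an assumed value chosen for illustration; it does not come from the slides.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_for_beta(mu0, mu_a, sigma, alpha, beta, two_tailed=False):
    """Smallest n for which a level-alpha z test has Type II error about beta at mu_a."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - (alpha / 2 if two_tailed else alpha))
    z_beta = z.inv_cdf(1 - beta)
    return ceil((sigma * (z_alpha + z_beta) / (mu0 - mu_a)) ** 2)


# Billing example with an assumed target of beta = .05 at mu = 180
n_needed = sample_size_for_beta(mu0=170, mu_a=180, sigma=65, alpha=0.05, beta=0.05)

# Check: at this n the right-tail beta should be at or just below the target
se = 65 / sqrt(n_needed)
cutoff = 170 + NormalDist().inv_cdf(0.95) * se
achieved_beta = NormalDist(180, se).cdf(cutoff)
print(n_needed, round(achieved_beta, 4))
```

The check at the end recomputes β directly from the cutoff, which guards against sign mistakes in the closed-form expression.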
