
AP Statistics – Ch. 10 Notes

Comparing Two Proportions

Situations in which we perform inference about the difference between two proportions ($p_1 - p_2$):

• Comparing the proportions of individuals with a certain characteristic in two different populations: The parameters of interest are the true proportions of individuals with the characteristic in each population, $p_1$ and $p_2$. We estimate these proportions by taking separate random samples from the two populations and calculating the proportion of individuals in each sample with the characteristic ($\hat{p}_1$ and $\hat{p}_2$).

• Comparing the proportions of successful outcomes for two treatment groups in a completely randomized experiment: The parameters of interest are the true proportions of successful outcomes for each treatment, $p_1$ and $p_2$. We estimate these proportions using the proportions of successes in the two treatment groups of our randomized experiment ($\hat{p}_1$ and $\hat{p}_2$).

We compare the populations or treatments by doing inference about the difference $p_1 - p_2$ between the parameters. The statistic that we use to estimate this difference in a confidence interval or hypothesis test is the difference between the two sample proportions, $\hat{p}_1 - \hat{p}_2$.

The Sampling Distribution of $\hat{p}_1 - \hat{p}_2$

Choose an SRS of size $n_1$ from Population 1 with proportion of successes $p_1$ and an independent SRS of size $n_2$ from Population 2 with proportion of successes $p_2$.

• Shape: When $n_1 p_1$, $n_1 q_1$, $n_2 p_2$, and $n_2 q_2$ are all at least 10, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately Normal.

• Center: $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$. That is, the difference in sample proportions is an unbiased estimator of the difference in population proportions.

• Spread: $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$, as long as the 10% condition is met.
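The shape, center, and spread rules above can be sketched in a few lines of Python. This is only an illustration: the function name `diff_props_sampling_dist` is made up for these notes, and the 10% condition is assumed to hold rather than checked.

```python
from math import sqrt


def diff_props_sampling_dist(p1, n1, p2, n2):
    """Return (normal_ok, mean, sd) for the sampling distribution of p1_hat - p2_hat.

    Hypothetical helper; assumes independent SRSs and that the 10% condition is met.
    """
    q1, q2 = 1 - p1, 1 - p2
    # Shape: all four expected counts must be at least 10.
    normal_ok = min(n1 * p1, n1 * q1, n2 * p2, n2 * q2) >= 10
    # Center: the difference in sample proportions is unbiased for p1 - p2.
    mean = p1 - p2
    # Spread: variances of independent sample proportions add.
    sd = sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    return normal_ok, mean, sd


normal_ok, mean, sd = diff_props_sampling_dist(0.8, 60, 0.4, 75)
print(normal_ok, round(mean, 4), round(sd, 4))
```

With $p_1 = 0.8$, $n_1 = 60$, $p_2 = 0.4$, $n_2 = 75$ (the values used in the literacy example in these notes), the mean is 0.4 and the standard deviation is about 0.0766.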

Example: A researcher reports that 80% of high school graduates but only 40% of high school dropouts would pass a basic literacy test. Assume the researcher’s claim is true. Suppose we give a basic literacy test to a random sample of 60 high school graduates and a separate random sample of 75 high school dropouts. Let

$\hat{p}_{\text{graduate}}$ and $\hat{p}_{\text{dropout}}$ be the proportions of graduates and dropouts in the samples who pass the test, respectively.

$p_{\text{graduate}} = 0.8$, $n_{\text{graduate}} = 60$, $p_{\text{dropout}} = 0.4$, $n_{\text{dropout}} = 75$

a) What is the shape of the sampling distribution of $\hat{p}_{\text{graduate}} - \hat{p}_{\text{dropout}}$? How do you know?

The sampling distribution of $\hat{p}_{\text{graduate}} - \hat{p}_{\text{dropout}}$ is approximately Normal because

$n_{\text{graduate}}\, p_{\text{graduate}} = 60(0.8) = 48 \ge 10$ and $n_{\text{dropout}}\, p_{\text{dropout}} = 75(0.4) = 30 \ge 10$

$n_{\text{graduate}}\, q_{\text{graduate}} = 60(0.2) = 12 \ge 10$ and $n_{\text{dropout}}\, q_{\text{dropout}} = 75(0.6) = 45 \ge 10$

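[Figure: Normal curve for the sampling distribution of $\hat{p}_{\text{graduate}} - \hat{p}_{\text{dropout}}$ with µ = 0.4 and σ = 0.0766; the value 0.2 is marked and the labeled area is 0.9955.]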

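b) Find the mean and standard deviation of the sampling distribution of $\hat{p}_{\text{graduate}} - \hat{p}_{\text{dropout}}$. Interpret these values in context.

$$\mu_{\hat{p}_{\text{graduate}} - \hat{p}_{\text{dropout}}} = p_{\text{graduate}} - p_{\text{dropout}} = 0.8 - 0.4 = 0.4$$

$$\sigma_{\hat{p}_{\text{graduate}} - \hat{p}_{\text{dropout}}} = \sqrt{\dfrac{p_{\text{graduate}}\, q_{\text{graduate}}}{n_{\text{graduate}}} + \dfrac{p_{\text{dropout}}\, q_{\text{dropout}}}{n_{\text{dropout}}}} = \sqrt{\dfrac{(0.8)(0.2)}{60} + \dfrac{(0.4)(0.6)}{75}} \approx 0.0766$$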

In repeated samples of 60 graduates and 75 dropouts, the proportion of graduates in the sample who pass the test will be about 0.4 higher, on average, than the proportion of dropouts in the sample who pass. The difference in the sample proportions will typically differ from the true difference in population proportions by about 0.0766.

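c) Find the probability that in your samples the proportion of graduates who pass the test is no more than 0.20 higher than the proportion of dropouts who pass.

$P(\hat{p}_{\text{graduate}} - \hat{p}_{\text{dropout}} \le 0.2) = \text{normalcdf}(\text{lower} = -\infty,\ \text{upper} = 0.2,\ \mu = 0.4,\ \sigma = 0.0766) \approx 0.0045$

(The complementary probability, $P(\hat{p}_{\text{graduate}} - \hat{p}_{\text{dropout}} \ge 0.2) = \text{normalcdf}(\text{lower} = 0.2,\ \text{upper} = \infty,\ \mu = 0.4,\ \sigma = 0.0766)$, is about 0.9955.)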

d) Suppose that the difference in the sample proportions (graduate – dropout) who pass the test is exactly

0.20. Based on your result in part (c), would this give you reason to doubt the researcher’s claim? Explain.

Since it is very unlikely that we would get a difference this low or lower if the researcher’s claim is true (probability = 0.0045), we have very good reason to doubt the researcher’s claim.
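The calculator's normalcdf command used in part (c) can be imitated with Python's standard library. The sketch below is an assumption-labeled stand-in: the name `normalcdf` and its argument order are chosen here to mirror the calculator, not taken from any library.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mu, sigma):
    """Area under a Normal(mu, sigma) curve between lower and upper.

    Hypothetical helper that mimics the TI normalcdf argument order.
    """
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Sampling distribution of p_hat_graduate - p_hat_dropout from the example.
p_at_most = normalcdf(float("-inf"), 0.2, mu=0.4, sigma=0.0766)
p_at_least = normalcdf(0.2, float("inf"), mu=0.4, sigma=0.0766)
print(round(p_at_most, 4), round(p_at_least, 4))
```

The two areas are complements: roughly 0.0045 for a difference of 0.20 or lower and roughly 0.9955 for a difference of 0.20 or higher.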

Just as we can construct confidence intervals and perform hypothesis tests for one-sample situations, we can do the same for two-sample situations.

When we are constructing a confidence interval for $p_1 - p_2$, we don’t know the values of $p_1$ or $p_2$, so we have to use $\hat{p}_1$ and $\hat{p}_2$ to estimate these values in the formula for standard deviation.

Standard Error (or Estimated Standard Deviation) of $\hat{p}_1 - \hat{p}_2$: $SE_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$

Two-Sample z Interval for a Difference between Two Proportions (or Two-Proportion z Interval)

An approximate level C confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$$

where $z^*$ is the critical value for the standard Normal curve with area C between $-z^*$ and $z^*$.

Conditions:

• Random: The data come from independent random samples from two different populations

• Normal: The counts of “successes” and “failures” in each sample ($n_1 \hat{p}_1$, $n_1 \hat{q}_1$, $n_2 \hat{p}_2$, and $n_2 \hat{q}_2$) are all at least 10.

• 10% Condition: Check that both populations are at least 10 times as large as their corresponding samples.

Confidence Interval for the Difference between Two Proportions on TI-83/TI-84 Calculators.
1. Choose “2-PropZInt” on the STAT → TESTS menu.
2. Enter the requested information:
x1: number of successes in sample 1
n1: sample size of sample 1
x2: number of successes in sample 2
n2: sample size of sample 2
C-Level: confidence level (as a decimal)
3. Choose “Calculate”
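For readers without a calculator, the same interval can be sketched in Python. This is a rough analogue of 2-PropZInt under the conditions listed above, not the calculator's actual implementation; the function name `two_prop_z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_interval(x1, n1, x2, n2, c_level):
    """Approximate level-C confidence interval for p1 - p2 (hypothetical helper)."""
    # Normal condition: at least 10 successes and 10 failures in each sample.
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        raise ValueError("Normal condition not met")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Critical value z* with area c_level between -z* and z*.
    z_star = NormalDist().inv_cdf((1 + c_level) / 2)
    # Standard error uses each sample's own proportion (no pooling).
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Counts from the teens/adults example in these notes: 935 of 1016 and 1461 of 2001.
low, high = two_prop_z_interval(935, 1016, 1461, 2001, c_level=0.95)
print(round(low, 3), round(high, 3))
```

Because the code keeps the unrounded sample proportions, its endpoints (roughly 0.165 and 0.216) can differ in the third decimal place from a hand calculation that rounds $\hat{p}$ first.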

Example: Did the proportion of U.S. adults who would report having read a book in the past year change between 2011 and 2016? In a random sample of 2,345 U.S. adults in December 2011, 79% reported having read a book in the past year. In another random sample of 1,520 U.S. adults in March 2016, 73% reported having read a book in the past year.

a) Calculate the standard error of the sampling distribution of the difference in the sample proportions (2016 – 2011). Interpret this value.

$$SE_{\hat{p}_{16} - \hat{p}_{11}} = \sqrt{\dfrac{\hat{p}_{16}\hat{q}_{16}}{n_{16}} + \dfrac{\hat{p}_{11}\hat{q}_{11}}{n_{11}}} = \sqrt{\dfrac{(0.73)(0.27)}{1520} + \dfrac{(0.79)(0.21)}{2345}} \approx 0.014$$

If we had taken many random samples of 2,345 U.S. adults in Dec. 2011 and 1,520 U.S. adults in Mar. 2016, the differences in the proportions of people in the samples who had read a book in the past year would typically differ from the true difference in the proportions of all U.S. adults who had read a book in the past year between Dec. 2011 and Mar. 2016 by about 0.014.

b) Construct and interpret a 90% confidence interval for the difference in the proportions of all U.S. adults

who would report having read a book in the past year in 2016 and in 2011 (2016-2011).

Two-sample z interval for $p_{16} - p_{11}$

$p_{16}$ and $p_{11}$ are the true proportions of U.S. adults who would report having read a book in the past year in 2016 and in 2011, respectively.

• Random? Independent random samples of U.S. adults were selected in 2011 and 2016.

• Normal?
$n_{16}\hat{p}_{16} = 1520(0.73) \approx 1110 \ge 10$ and $n_{11}\hat{p}_{11} = 2345(0.79) \approx 1853 \ge 10$
$n_{16}\hat{q}_{16} = 1520(0.27) \approx 410 \ge 10$ and $n_{11}\hat{q}_{11} = 2345(0.21) \approx 492 \ge 10$

• 10% Condition: 2,345 and 1,520 were less than 10% of all U.S. adults in Dec. 2011 and Mar. 2016, respectively.

$$(\hat{p}_{16} - \hat{p}_{11}) \pm z^* \sqrt{\dfrac{\hat{p}_{16}\hat{q}_{16}}{n_{16}} + \dfrac{\hat{p}_{11}\hat{q}_{11}}{n_{11}}} = (0.73 - 0.79) \pm 1.645 \sqrt{\dfrac{(0.73)(0.27)}{1520} + \dfrac{(0.79)(0.21)}{2345}} \approx -0.06 \pm 0.023$$

$$(-0.083,\ -0.037)$$

We are 90% confident that the interval from –0.083 to –0.037 captures the true difference in the proportions of U.S. adults who would have reported having read a book in the past year in 2016 and 2011 (2016 – 2011).

c) Based on your interval, is there convincing evidence that the proportion of U.S. adults who would report having read a book in the past year changed between 2011 and 2016? Explain. Yes. Since 0 is not in the 90% confidence interval, it is not plausible that there was no change in the population proportions. This means that there is convincing evidence that the proportion of U.S. adults who would report having read a book in the past year changed between 2011 and 2016. (Since all of the differences in the interval are negative, there is convincing evidence that this proportion dropped between 2011 and 2016.)

Example: Are teens or adults more likely to go online daily? The Pew Internet and American Life Project asked a random sample of 1016 teens and a separate random sample of 2001 adults how often they use the Internet. In these two surveys, 935 of the teens and 1461 of the adults said they go online every day. Construct and interpret

a 95% confidence interval for $p_{\text{teens}} - p_{\text{adults}}$.

Two-sample z interval for $p_{\text{teens}} - p_{\text{adults}}$, where $p_{\text{teens}}$ and $p_{\text{adults}}$ are the true proportions of U.S. teens and adults, respectively, who go online every day.

• Random? The data come from independent random samples of teens and adults.

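• Normal?
$n_{\text{teens}}\hat{p}_{\text{teens}} = 935 \ge 10$ and $n_{\text{adults}}\hat{p}_{\text{adults}} = 1461 \ge 10$
$n_{\text{teens}}\hat{q}_{\text{teens}} = 81 \ge 10$ and $n_{\text{adults}}\hat{q}_{\text{adults}} = 540 \ge 10$

• 10% Condition: There are more than $10(1016) = 10{,}160$ teens and $10(2001) = 20{,}010$ adults in the U.S.

$\hat{p}_{\text{teens}} = \dfrac{935}{1016} \approx 0.92$ and $\hat{p}_{\text{adults}} = \dfrac{1461}{2001} \approx 0.73$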

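$$(\hat{p}_{\text{teens}} - \hat{p}_{\text{adults}}) \pm z^* \sqrt{\dfrac{\hat{p}_{\text{teens}}\hat{q}_{\text{teens}}}{n_{\text{teens}}} + \dfrac{\hat{p}_{\text{adults}}\hat{q}_{\text{adults}}}{n_{\text{adults}}}} = (0.92 - 0.73) \pm 1.960 \sqrt{\dfrac{(0.92)(0.08)}{1016} + \dfrac{(0.73)(0.27)}{2001}}$$

$$= 0.19 \pm 0.026 = (0.164,\ 0.216)$$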

We are 95% confident that the interval from 0.164 to 0.216 captures the true difference in the proportions of all U.S. teens and adults who go online every day (teens – adults). In other words, we are 95% confident that the proportion of teens who are online daily is between 0.164 and 0.216 higher than the proportion of adults who are online daily.

Significance Tests for $p_1 - p_2$

Very often, we want to test the null hypothesis that there is no difference between two proportions, so we test $H_0: p_1 - p_2 = 0$ or, alternatively, $H_0: p_1 = p_2$. The alternative hypothesis specifies what kind of difference we expect.

Since the null hypothesis assumes that $p_1 = p_2$, we use a pooled sample proportion to calculate the standard deviation.

$$\hat{p}_C = \dfrac{\text{count of successes in both samples combined}}{\text{count of individuals in both samples combined}} = \dfrac{X_1 + X_2}{n_1 + n_2} = \text{pooled sample proportion}$$

Two-Sample z Test for the Difference between Two Proportions

To test the hypothesis $H_0: p_1 - p_2 = 0$ or $H_0: p_1 = p_2$

Test statistic: $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\hat{p}_C \hat{q}_C}{n_1} + \dfrac{\hat{p}_C \hat{q}_C}{n_2}}}$

P-value: Find the probability of getting a z statistic this large or larger in the direction specified by the alternative hypothesis $H_a$. The P-value is the shaded area. For a two-tailed test, it is the total area in both tails.

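Conditions:

• Random: The data come from independent random samples from two different populations

• Normal: The counts of “successes” and “failures” in each sample ($n_1 \hat{p}_1$, $n_1 \hat{q}_1$, $n_2 \hat{p}_2$, and $n_2 \hat{q}_2$) are all at least 10.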

• 10% Condition: Check that both populations are at least 10 times as large as their corresponding samples.

Hypothesis Test for the Difference between Two Proportions on TI-83/TI-84 Calculators.
1. Choose “2-PropZTest” on the STAT → TESTS menu.
2. Enter the requested information:
x1: number of successes in sample 1
n1: sample size of sample 1
x2: number of successes in sample 2
n2: sample size of sample 2
3. Specify which proportion the alternative hypothesis says is higher.
4. Choose “Calculate” to see results, or “Draw” to see a shaded Normal curve.
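The pooled test can also be sketched in Python as a rough analogue of 2-PropZTest. The function name `two_prop_z_test` and the `alternative` labels are assumptions made for this illustration, and the Random, Normal, and 10% conditions are taken as already checked.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative):
    """Two-sample z test of H0: p1 = p2 (hypothetical helper).

    alternative is "less" (p1 < p2), "greater" (p1 > p2), or "two-sided".
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled proportion: under H0 both samples share one success rate.
    p_c = (x1 + x2) / (n1 + n2)
    se = sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    z = ((p1_hat - p2_hat) - 0) / se
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


# Counts from the hearing-loss example in these notes: 450 of 3000 and 351 of 1800.
z, p_value = two_prop_z_test(450, 3000, 351, 1800, alternative="less")
print(round(z, 3), p_value)
```

For the hearing-loss counts this gives a z statistic of about −4.048 and a very small P-value, in line with the worked example.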

Example: Are teenagers going deaf? In a study of 3000 randomly selected teenagers in 1988-1994, 15% showed some hearing loss. In a similar study of 1800 teenagers in 2005-2006, 19.5% showed some hearing loss. (These data are reported in Arizona Daily Star, August 18, 2010.)

a) Do these data give convincing evidence that the proportion of all teens with hearing loss has increased?

Two-sample z test for a difference in proportions

$H_0: p_{1994} = p_{2006}$
$H_a: p_{1994} < p_{2006}$

where $p_{1994}$ and $p_{2006}$ are the true proportions of all teenagers who had hearing loss in 1988-1994 and 2005-2006, respectively.

• Random? Independent random samples of teenagers were selected from each time period.

• Normal?
$n_{1994}\hat{p}_{1994} = 3000(0.15) = 450 \ge 10$ and $n_{2006}\hat{p}_{2006} = 1800(0.195) = 351 \ge 10$
$n_{1994}\hat{q}_{1994} = 3000(0.85) = 2550 \ge 10$ and $n_{2006}\hat{q}_{2006} = 1800(0.805) = 1449 \ge 10$

• 10% Condition: There were more than $10(3000) = 30{,}000$ teens in 1988-1994 and $10(1800) = 18{,}000$ teens in 2005-2006.

$\hat{p}_{1994} = \dfrac{450}{3000} = 0.15$, $\hat{p}_{2006} = \dfrac{351}{1800} = 0.195$, $\hat{p}_C = \dfrac{450 + 351}{3000 + 1800} = 0.1669$

$$z = \dfrac{(\hat{p}_{1994} - \hat{p}_{2006}) - 0}{\sqrt{\dfrac{\hat{p}_C \hat{q}_C}{n_{1994}} + \dfrac{\hat{p}_C \hat{q}_C}{n_{2006}}}} = \dfrac{(0.15 - 0.195) - 0}{\sqrt{\dfrac{(0.1669)(0.8331)}{3000} + \dfrac{(0.1669)(0.8331)}{1800}}} \approx -4.048$$

P-value $= P(z < -4.048) \approx 0.000026$

Since the P-value is so low, we reject $H_0$. There is convincing evidence that the proportion of all teens with hearing loss increased from 1988-1994 to 2005-2006.

b) Between the two studies, Apple introduced the iPod. If the results of the test are statistically significant, can we blame iPods for the increased hearing loss in teenagers?

No. We did not do an experiment in which we randomly assigned some teens to listen to iPods and some to never listen to iPods, so we cannot make a conclusion about cause and effect. There are many other possible causes for the increase in hearing loss. For example, teens who listen to iPods may also like to listen to music in their cars, so the car stereos could be causing the hearing loss.

Example: A study published in the Archives of General Psychiatry examined the impact of depression on a patient’s ability to survive cardiac disease. Researchers identified 450 people with cardiac disease, evaluated them for depression, and followed the group for 4 years. Of the 361 patients with no depression, 67 died. Of the 89 patients with minor or major depression, 26 died. Do these data provide convincing evidence that among people who suffer from cardiac disease, depressed patients are more likely to die than non-depressed ones?

Two-sample z test for a difference in proportions

$H_0: p_{\text{dep}} = p_{\text{non}}$
$H_a: p_{\text{dep}} > p_{\text{non}}$

$p_{\text{dep}}$ and $p_{\text{non}}$ are the true proportions of all depressed and non-depressed patients suffering from cardiac disease, respectively, who died during the 4 year period of the study.

• Random: We were not told whether anything was done randomly, so the results should be treated with caution, but it’s reasonable to believe that the samples are representative of depressed and non-depressed patients with cardiac disease and will act in a similar way to random samples. The two groups should be unrelated to each other in terms of whether or not they die, so the samples can be treated as independent.

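• Normal:
$n_{\text{dep}}\hat{p}_{\text{dep}} = 26 \ge 10$ and $n_{\text{non}}\hat{p}_{\text{non}} = 67 \ge 10$
$n_{\text{dep}}\hat{q}_{\text{dep}} = 63 \ge 10$ and $n_{\text{non}}\hat{q}_{\text{non}} = 294 \ge 10$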

• 10% condition: It’s reasonable to assume there are more than 10(89) = 890 depressed and 10(361) = 3610 non-depressed patients with cardiac disease.

$\hat{p}_{\text{dep}} = \dfrac{26}{89} \approx 0.292$, $\hat{p}_{\text{non}} = \dfrac{67}{361} \approx 0.186$, $\hat{p}_C = \dfrac{26 + 67}{89 + 361} = \dfrac{93}{450} \approx 0.207$

$$z = \dfrac{(0.292 - 0.186) - 0}{\sqrt{\dfrac{(0.207)(0.793)}{89} + \dfrac{(0.207)(0.793)}{361}}} \approx 2.22$$

P-value $= P(z > 2.22) \approx 0.0131$

Since the P-value $< \alpha = 0.05$, we reject $H_0$. There is convincing evidence that among people who suffer from cardiac disease, depressed patients are more likely to die than non-depressed ones.

Comparing Two Means

Situations in which we perform inference about the difference between two means ($\mu_1 - \mu_2$):

• Comparing the mean of some quantitative variable for the individuals in two different populations: The parameters of interest are the population means in each population, $\mu_1$ and $\mu_2$. We estimate these means by taking separate random samples from each population and calculating the sample means $\bar{x}_1$ and $\bar{x}_2$.

• Comparing the average effectiveness of two treatments in a completely randomized experiment: The parameters of interest are the true mean responses for treatment 1 and treatment 2, $\mu_1$ and $\mu_2$. We use the mean response in the two groups, $\bar{x}_1$ and $\bar{x}_2$, to make the comparison.

We compare the populations or treatments by doing inference about the difference $\mu_1 - \mu_2$ between the parameters. The statistic that we use to estimate this difference in a confidence interval or hypothesis test is the difference between the two sample means, $\bar{x}_1 - \bar{x}_2$.

The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

Choose an SRS of size $n_1$ from Population 1 with mean $\mu_1$ and standard deviation $\sigma_1$ and an independent SRS of size $n_2$ from Population 2 with mean $\mu_2$ and standard deviation $\sigma_2$.

• Shape: When both population distributions are Normal, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is Normal. In other cases, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately Normal if the sample sizes are large enough ($n_1 \ge 30$ and $n_2 \ge 30$).

• Center: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$. That is, the difference in sample means is an unbiased estimator of the difference in population means.

• Spread: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$, as long as the 10% condition is met.

Example: A potato chip manufacturer buys potatoes from two different suppliers, Riderwood Farms and Camberley, Inc. The weights of potatoes from Riderwood Farms are approximately Normally distributed with a mean of 175 grams and a standard deviation of 25 grams. The weights of potatoes from Camberley are approximately Normally distributed with a mean of 180 grams and a standard deviation of 30 grams. When shipments arrive at the factory, inspectors randomly select a sample of 20 potatoes from each shipment and weigh them. They are surprised when the average weight of the potatoes in the sample from Riderwood Farms,

$\bar{x}_R$, is higher than the average weight of the potatoes in the sample from Camberley, $\bar{x}_C$.

a) Describe the shape, center, and spread of the sampling distribution of $\bar{x}_C - \bar{x}_R$. Interpret the values of the mean and standard deviation in context.

Shape: Since the distributions of potato weights from both farms are approximately Normal, the sampling distribution of $\bar{x}_C - \bar{x}_R$ is approximately Normal.

Center: $\mu_{\bar{x}_C - \bar{x}_R} = \mu_C - \mu_R = 180 - 175 = 5$ grams. In repeated samples of 20 potatoes from each farm, the average difference in the mean sample weights (Camberley – Riderwood) is approximately 5 grams. On average, the mean weight for the Camberley sample will be 5 grams higher than the mean weight for the Riderwood sample.

Spread: $\sigma_{\bar{x}_C - \bar{x}_R} = \sqrt{\dfrac{\sigma_C^2}{n_C} + \dfrac{\sigma_R^2}{n_R}} = \sqrt{\dfrac{30^2}{20} + \dfrac{25^2}{20}} \approx 8.73$ grams. In repeated samples of 20 potatoes from each farm, the difference in the mean weight of potatoes in the two samples will vary by an average of 8.73 grams from the true difference in mean weight of potatoes for the two farms (5 grams).

b) Find the probability that the mean weight of the Riderwood sample is larger than the mean weight of the Camberley sample. Should the inspectors have been surprised that the Riderwood sample had a higher mean weight than the Camberley sample?

$P(\bar{x}_R > \bar{x}_C) = P(\bar{x}_C - \bar{x}_R < 0) = \text{normalcdf}(\text{lower} = -\infty,\ \text{upper} = 0,\ \mu = 5,\ \sigma = 8.73) \approx 0.2834$

Since the mean weight of the Riderwood sample will be larger than the mean weight of the Camberley sample about 28% of the time, the inspectors should not be surprised.
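The potato calculation in parts (a) and (b) can be reproduced with a short Python sketch. The helper name `diff_means_sampling_dist` is made up for illustration, and the code assumes the sampling distribution is Normal (here because both populations are approximately Normal) and that the 10% condition holds.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_sampling_dist(mu1, sigma1, n1, mu2, sigma2, n2):
    """Sampling distribution of x1_bar - x2_bar for independent samples.

    Hypothetical helper; assumes the distribution is (approximately) Normal.
    """
    mean = mu1 - mu2
    sd = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return NormalDist(mean, sd)


# Camberley minus Riderwood, 20 potatoes per sample (values from the example).
dist = diff_means_sampling_dist(180, 30, 20, 175, 25, 20)
p_riderwood_higher = dist.cdf(0)  # P(x_bar_C - x_bar_R < 0)
print(round(dist.mean, 2), round(dist.stdev, 2), round(p_riderwood_higher, 3))
```

The distribution has mean 5 grams and standard deviation about 8.73 grams, and the area below 0 is about 0.28.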

c) Review from Ch. 6: Find the probability that a single potato from Riderwood Farms weighs more than a single potato from Camberley Farms.

Let R = weight of Riderwood potato and C = weight of Camberley potato.

$P(R > C) = P(R - C > 0)$

$\mu_{R-C} = \mu_R - \mu_C = 175 - 180 = -5$ grams

$\sigma^2_{R-C} = \sigma_R^2 + \sigma_C^2 = 25^2 + 30^2 = 1525$

$\sigma_{R-C} = \sqrt{1525} \approx 39.05$ grams

$P(R - C > 0) = \text{normalcdf}(\text{lower} = 0,\ \text{upper} = \infty,\ \mu = -5,\ \sigma = 39.05) \approx 0.4491$

Standard Error (or Estimated Standard Deviation) of $\bar{x}_1 - \bar{x}_2$: $SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

Two-Sample t Statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

Like any other z or t statistic, this statistic tells us how many standard deviations the sample statistic $\bar{x}_1 - \bar{x}_2$ is from its mean. The two-sample t statistic has approximately a t distribution. There are two options for determining the degrees of freedom.

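[Figure: Normal curve for $\bar{x}_C - \bar{x}_R$ with µ = 5 and σ = 8.73; the area to the left of 0 is 0.2834.]

[Figure: Normal curve for the difference in weight (R – C) with µ = –5 and σ = 39.05; the area to the right of 0 is 0.4491.]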

• Option 1 (Technology): Use the t distribution with degrees of freedom calculated from the data by the lovely formula below. With this option, the degrees of freedom may not be a whole number.

$$\text{df} = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$

• Option 2 (Conservative): Use the t distribution with degrees of freedom equal to the smaller of $n_1 - 1$ and $n_2 - 1$. This always gives a confidence interval that is wider than necessary (higher margin of error) for the desired confidence level, and a P-value that is greater than or equal to the true P-value.

Robustness of Two-Sample t Procedures

Two-sample t procedures are even more robust against non-Normality than the one-sample t procedures. This is especially true if the two populations being compared have distributions with similar shapes. The two-sample t procedures are most robust against non-Normality when the sample sizes are equal or very similar.
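The two degrees-of-freedom options described above can be written out as a small Python sketch. The function names `technology_df` and `conservative_df` are invented for these notes; the first simply evaluates the Option 1 formula.

```python
def technology_df(s1, n1, s2, n2):
    """Option 1: degrees of freedom computed from the data (may not be whole)."""
    v1 = s1**2 / n1  # s1^2 / n1
    v2 = s2**2 / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def conservative_df(n1, n2):
    """Option 2: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


# Sample standard deviations and sizes from the grocery bag example in these notes.
print(round(technology_df(1912.5, 5, 1474.2, 5), 1), conservative_df(5, 5))
```

For the grocery bag example later in these notes (standard deviations 1912.5 g and 1474.2 g, five bags per store), Option 1 gives about 7.5 degrees of freedom and Option 2 gives 4.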

Two-Sample t Interval for a Difference between Two Means (or Two-Sample t Interval)

An approximate level C confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$$

where $t^*$ is the critical value for confidence level C for the t distribution with degrees of freedom approximated by technology or the smaller of $n_1 - 1$ and $n_2 - 1$.

Conditions:

• Random: The data come from independent random samples from two different populations

• Normal/Large Sample Size: Both samples are large ($n_1 \ge 30$ and $n_2 \ge 30$) OR no strong skewness or outliers can be seen in the graph of either distribution of sample data. (You are checking to make sure it is reasonable to believe that both population distributions are approximately Normal.)

• 10% Condition: Check that both populations are at least 10 times as large as their corresponding samples.

Confidence Interval for the Difference between Two Means on TI-83/TI-84 Calculators.
1. Choose “2-SampTInt” on the STAT → TESTS menu.
2. Choose “Data” if you have a list of sample data. Choose “Stats” if you have values for $\bar{x}_1$, $\bar{x}_2$, $s_1$, and $s_2$.
3. Enter the requested information:
For “Data” option, input the sample values into two lists and indicate which lists they are in.
For “Stats” option,
x̄1: mean of sample 1
x̄2: mean of sample 2
Sx1: sample st. dev. of sample 1
Sx2: sample st. dev. of sample 2
n1: size of sample 1
n2: size of sample 2
C-level: confidence level (as a decimal)
Always choose “NO” pooling!
4. Choose “Calculate”
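The interval formula can also be sketched in Python from summary statistics. Python's standard library has no t distribution, so this sketch assumes the critical value $t^*$ is looked up separately (table or calculator) and passed in; the function name `two_sample_t_interval` is hypothetical.

```python
from math import sqrt


def two_sample_t_interval(x1_bar, s1, n1, x2_bar, s2, n2, t_star):
    """Confidence interval for mu1 - mu2 given a critical value t_star.

    Hypothetical helper: t_star is supplied by the caller for the chosen
    confidence level and degrees of freedom; it is not computed here.
    """
    se = sqrt(s1**2 / n1 + s2**2 / n2)  # unpooled standard error
    diff = x1_bar - x2_bar
    margin = t_star * se
    return diff - margin, diff + margin


# Summary statistics from the grocery bag example in these notes,
# with t* = 4.604 for 99% confidence and the conservative df = 4.
low, high = two_sample_t_interval(12825.8, 1912.5, 5, 9234, 1474.2, 5, t_star=4.604)
print(round(low), round(high))
```

With the grocery bag summary statistics and the conservative $t^* = 4.604$, the result is close to the hand-computed interval of about –1380 to 8564 grams; small differences come from rounding in the summary statistics.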

[Figure: parallel dotplots of capacity (grams) for Target and Bashas bags, with the axis running from 6000 to 16000.]

Example: Do plastic bags from Target or plastic bags from Bashas hold more weight? A group of AP Statistics students decided to investigate by filling a random sample of 5 bags from each store with common grocery items until the bags ripped. Then they weighed the contents of items in each bag to determine its capacity. Here are their results, in grams:

Target: 12,572 13,999 11,215 15,447 10,896

Bashas: 9,552 10,896 6,983 8,767 9,972

a) Draw parallel dotplots of the grocery bag capacities from the two stores. Just from looking at the graphs,

do you expect to find evidence of a significant difference in mean capacity between the bags from the two stores?

There is hardly any overlap in the capacities of the bags from the two stores. The Bashas bag with the highest capacity has the same capacity as the Target bag with the lowest capacity. I would expect to find evidence of a significant difference in mean capacity.

b) Construct and interpret a 99% confidence interval for the difference in mean capacity of plastic grocery

bags from Target and Bashas.

Two-sample t interval for $\mu_T - \mu_B$

$\mu_T$ and $\mu_B$ are the mean capacities of plastic bags at Target and Bashas, respectively, in grams.

• Random? Independent random samples of bags from the two stores were selected.

• Normal? The dotplots show no obvious skewness or outliers, so it’s safe to proceed.

• Independent? There are more than $10(5) = 50$ bags at each store.

$\bar{x}_T = 12{,}825.8$ g, $s_T = 1912.5$ g, $\bar{x}_B = 9234$ g, $s_B = 1474.2$ g

Using the conservative 5 – 1 = 4 df, $t^* = 4.604$ for a 99% interval.

$$(\bar{x}_T - \bar{x}_B) \pm t^* \sqrt{\dfrac{s_T^2}{n_T} + \dfrac{s_B^2}{n_B}} = (12825.8 - 9234) \pm 4.604 \sqrt{\dfrac{1912.5^2}{5} + \dfrac{1474.2^2}{5}}$$

$$= 3591.8 \pm 4972.1 \text{ grams} = (-1380.3,\ 8563.9) \text{ grams}$$

Using technology, with df = 7.5, the interval is $(-100.9,\ 7284.5)$ grams.

We are 99% confident that the interval from –100.9 to 7284.5 grams captures the true difference in the mean capacity of plastic grocery bags from the two stores (Target – Bashas).

c) Does your interval provide convincing evidence that there is a difference in the mean capacity between

the two stores? Justify your answer. Since the interval includes 0, it is plausible that there is no difference in the two means. Thus, we do not have convincing enough evidence to say that there is a difference in mean capacity for bags from the two stores. However, with a larger sample size, we would likely find a significant difference since it seems pretty clear that Target bags have a larger capacity.

Two-Sample t Test for the Difference between Two Means

To test the hypothesis $H_0: \mu_1 - \mu_2 = \text{hypothesized value}$, compute the two-sample t statistic

Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

P-value: Find the probability of getting a t statistic this large or larger in the direction specified by the alternative hypothesis $H_a$. Use the t distribution with degrees of freedom approximated by technology or the smaller of $n_1 - 1$ and $n_2 - 1$. The P-value is the shaded area. For a two-tailed test, it is the total area in both tails.

$H_a: \mu_1 - \mu_2 > \text{hypothesized value}$
$H_a: \mu_1 - \mu_2 < \text{hypothesized value}$
$H_a: \mu_1 - \mu_2 \ne \text{hypothesized value}$

Conditions:

• Random: The data come from independent random samples from two different populations

• Normal/Large Sample Size: Both samples are large ($n_1 \ge 30$ and $n_2 \ge 30$) OR no strong skewness or outliers can be seen in the graph of either distribution of sample data. (You are checking to make sure it is reasonable to believe that both population distributions are approximately Normal.)

• 10% Condition: Check that both populations are at least 10 times as large as their corresponding samples.

Significance Tests for the Difference between Two Means on TI-83/TI-84 Calculators.
1. Choose “2-SampTTest” on the STAT → TESTS menu.
2. Choose “Data” if you have a list of sample data. Choose “Stats” if you have values for $\bar{x}_1$, $\bar{x}_2$, $s_1$, and $s_2$.
3. Enter the requested information:
For “Data” option, input the sample values into two lists and indicate which lists they are in.
For “Stats” option,
x̄1: mean of sample 1
x̄2: mean of sample 2
Sx1: sample st. dev. of sample 1
Sx2: sample st. dev. of sample 2
n1: size of sample 1
n2: size of sample 2
Always choose “NO” pooling!
4. Specify which mean the alternative hypothesis says is higher.
5. Choose “Calculate” to see results, or “Draw” to see a shaded t distribution curve.
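The test statistic itself is simple to compute from summary statistics, as the Python sketch below shows. The function name `two_sample_t_statistic` is hypothetical, and the sketch stops at the t statistic: the P-value still has to come from a t distribution (calculator or table), since Python's standard library does not provide one.

```python
from math import sqrt


def two_sample_t_statistic(x1_bar, s1, n1, x2_bar, s2, n2, hypothesized_diff=0.0):
    """Unpooled two-sample t statistic for H0: mu1 - mu2 = hypothesized_diff."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of x1_bar - x2_bar
    return ((x1_bar - x2_bar) - hypothesized_diff) / se


# Rounded summary statistics from the paper towel example in these notes.
t = two_sample_t_statistic(117.6, 6.64, 30, 88.1, 6.30, 30)
print(round(t, 2))
```

Using the rounded summary statistics from the paper towel example, the result is a t statistic between 17.6 and 17.7. It may differ in the second decimal place from a value computed from the raw data.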

Example: In commercials for Bounty paper towels, the manufacturer claims that they are the “quicker picker-upper.” But are they also the stronger picker upper? Two AP Statistics students, Wesley and Maverick, decided to find out. They selected a random sample of 30 Bounty paper towels and a random sample of 30 generic paper towels and measured their strength when wet. To do this, they uniformly soaked each paper towel with 4 ounces of water, held two opposite edges of the paper towel, and counted how many quarters each paper towel could hold until ripping, alternating brands. Here are their results: Bounty: 106 111 106 120 103 112 115 125 116 120 126 125 116 117 114 118 126 120 115 116 121 113 111 128 124 125 127 123 115 114

Generic: 77 103 89 79 88 86 100 90 81 84 84 96 87 79 90 86 88 81 91 94 90 89 85 83 89 84 90 100 94 87

a) Display these distributions using parallel boxplots and briefly compare these distributions. Based only on the boxplots, discuss whether or not you think the mean for Bounty is significantly higher than the mean for generic.

Both distributions are approximately symmetric, but the generic distribution has three high outliers. The median number of quarters for the Bounty sample is much higher than the median number of quarters for the generic sample. The ranges of the two distributions are very similar, but the IQR of the Bounty distribution is larger. Since the medians are so far apart and there is almost no overlap in the two distributions, the mean number of quarters for the Bounty sample is almost certain to be significantly higher than the mean number of quarters for the generic sample. If the true means were really the same, it would be almost impossible to get so little overlap.

b) Use a significance test to determine whether there is convincing evidence that wet Bounty paper towels can hold more weight, on average, than wet generic paper towels can. Two-sample t test for a difference in means

$H_0: \mu_B = \mu_G$
$H_a: \mu_B > \mu_G$

$\mu_B$ and $\mu_G$ are the mean number of quarters that can be held by a wet Bounty and generic paper towel, respectively.

• Random? The students used independent random samples of paper towels from each brand.

• Normal? Even though there are three outliers in the generic distribution, both distributions are reasonably symmetric and both sample sizes are at least 30, so it’s safe to proceed.

• Independent? There are more than $10(30) = 300$ paper towels of each brand.

$\bar{x}_B = 117.6$, $s_B = 6.64$, $\bar{x}_G = 88.1$, $s_G = 6.30$

$$t = \dfrac{(\bar{x}_B - \bar{x}_G) - (\mu_B - \mu_G)}{\sqrt{\dfrac{s_B^2}{n_B} + \dfrac{s_G^2}{n_G}}} = \dfrac{(117.6 - 88.1) - 0}{\sqrt{\dfrac{6.64^2}{30} + \dfrac{6.30^2}{30}}} \approx 17.64$$

Conservative df = 30 – 1 = 29 and Technology df = 57.8

P-value ≈ 0 (with either degrees of freedom)

Since the P-value is so low, we reject $H_0$. There is very convincing evidence that wet Bounty paper towels can hold more quarters, on average, than wet generic paper towels.

[Figure: parallel boxplots of number of quarters for Generic and Bounty paper towels, with the axis running from 70 to 130.]

Bounty: Min = 103, $Q_1$ = 114, Med = 116.5, $Q_3$ = 124, Max = 128
$Q_1 - 1.5(\text{IQR}) = 114 - 1.5(10) = 99$
$Q_3 + 1.5(\text{IQR}) = 124 + 1.5(10) = 139$
No outliers

Generic: Min = 77, $Q_1$ = 84, Med = 88, $Q_3$ = 90, Max = 103
$Q_1 - 1.5(\text{IQR}) = 84 - 1.5(6) = 75$
$Q_3 + 1.5(\text{IQR}) = 90 + 1.5(6) = 99$
3 high outliers (100, 100, 103)
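The 1.5 × IQR outlier check above is easy to automate. The sketch below takes the quartiles as inputs rather than computing them, because software packages use slightly different quartile rules; the function names are made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """List the values that fall outside the 1.5*IQR fences, in sorted order."""
    low, high = outlier_fences(q1, q3)
    return sorted(x for x in data if x < low or x > high)


# Generic paper towel data from the example, with Q1 = 84 and Q3 = 90.
generic = [77, 103, 89, 79, 88, 86, 100, 90, 81, 84,
           84, 96, 87, 79, 90, 86, 88, 81, 91, 94,
           90, 89, 85, 83, 89, 84, 90, 100, 94, 87]
print(outlier_fences(84, 90), find_outliers(generic, 84, 90))
```

For the generic sample the fences are 75 and 99, so the flagged values are 100, 100, and 103, matching the hand calculation.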

c) Interpret the P-value from part (b) in the context of this question. If the two brands of paper towel really can hold the same number of quarters, on average, when wet, there is virtually no chance of getting samples in which the average number of quarters held by the Bounty paper towels would be at least this much higher than the average number of quarters held by the generic paper towels.

Independent Samples vs. Paired Samples – Which Is It?

a) To test the effect of background music on productivity, several workers are observed. For one month they had no music. For another month they had background music.

Paired – the same workers in both months

b) A random sample of 10 workers in Plant A are to be compared to a sample of 10 workers in Plant B.

Independent samples – different workers in the two plants

c) A new weight reducing diet was tried on ten women. The weight of each woman was measured before the diet, and again after being on the diet for ten weeks.

Paired – before and after weights measured on the same ten women

d) To compare the average weight gain of pigs fed two different rations, nine pairs of pigs were used. The pigs in each pair were litter-mates.

Paired – pigs were paired with a litter-mate and treatments were assigned within pairs

e) To test the effects of a new fertilizer, 100 plots are treated with one fertilizer, and 100 plots are treated with the other.

Independent samples – two separate sets of plots, no basis for matching one plot in the first group with one in the second group

f) A sample of college teachers is taken. We wish to compare the average salaries of male and female teachers.

Independent samples – no basis for matching a specific male teacher with a specific female teacher

g) A new fertilizer is tested on 100 plots. Each plot is divided in half. Fertilizer A is applied to one half and B to the other.

Paired – two measurements were taken on each plot

h) Consumers Union wants to compare two types of calculators. They get 100 volunteers and ask them to carry out a series of 50 routine calculations (such as figuring discounts, sales tax, totaling a bill, etc.). Each volunteer does each calculation on both types of calculator, and the time required for each calculation is recorded.

Paired – each volunteer used both types of calculators

Inference for Experiments: Important Differences

Parameters:

• Proportions of individuals like those in the study who would respond a certain way to each treatment.

• Mean response sizes for individuals like those in the study.

Caution: Avoid past tense when defining your parameters and in your conclusion. If you refer to how individuals did respond rather than how they would respond, you are talking about your treatment group and are referring to values of statistics ($\hat{p}$'s or $\bar{x}$'s) that can be calculated directly rather than the unknown parameters ($p$'s or $\mu$'s) for which you are doing inference. Also, do not refer to subjects when defining your parameter or in your conclusion for the same reason – subjects are people who actually took part in your experiment, not individuals similar to them.

Conditions:

• Random: The data come from two groups in a randomized experiment.

• Normal:

o For 2-sample inference about proportions:

$n_1\hat{p}_1$, $n_1\hat{q}_1$, $n_2\hat{p}_2$, and $n_2\hat{q}_2$ must all be at least 10. That is, there must be at least 10 successes and 10 failures in each treatment group.

o For 2-sample inference about means:

The two treatment groups are both large ($n_1 \geq 30$ and $n_2 \geq 30$) OR no strong skewness or outliers can be seen in the graph of either distribution of sample data. (You are checking to make sure it is reasonable to believe that the true distributions of responses to the two treatments are approximately Normal).

• Independent: The outcomes for the individuals in the study must be independent of each other. (The outcome for any individual shouldn’t give you any new information about the likely outcome for any other individual.) In a well-designed experiment that includes controls and random assignment, this should be true. (If you want to study the effects of the treatments, you must control any other variables that might affect the outcome, including any influence the subjects might have on each other.)

DO NOT CHECK THE 10% CONDITION! (You didn’t take a random sample, so it isn’t correct to check the 10% condition, and YOU WILL LOSE POINTS if you do!)

Example: High levels of cholesterol in the blood are associated with a higher risk of heart attacks. Will using a drug to lower blood cholesterol reduce heart attacks? The Helsinki Heart Study recruited middle-aged men with high cholesterol but no history of other serious medical problems to investigate this question. The volunteer subjects were assigned at random to one of two treatments: 2051 men took the drug gemfibrozil to reduce their cholesterol levels, and a control group of 2030 men took a placebo. During the next five years, 56 men in the gemfibrozil group and 84 men in the placebo group had heart attacks.

a) Do the results of this study give convincing evidence at the $\alpha = 0.01$ level that gemfibrozil is effective in preventing heart attacks?

Two-sample z test for a difference in proportions

$H_0: p_{gem} = p_{plac}$

$H_a: p_{gem} < p_{plac}$

$p_{gem}$ and $p_{plac}$ are the true proportions of men like those in the study who would suffer a heart attack while taking gemfibrozil or a placebo, respectively.

• Random: Subjects were randomly assigned to take either gemfibrozil or a placebo.

• Normal: $n_{gem}\hat{p}_{gem} = 56 \geq 10$, $n_{plac}\hat{p}_{plac} = 84 \geq 10$, $n_{gem}\hat{q}_{gem} = 1995 \geq 10$, $n_{plac}\hat{q}_{plac} = 1946 \geq 10$

• Independent: Whether or not one subject has a heart attack should be unrelated to whether or not any other subject has a heart attack.

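$$\hat{p}_{gem} = \frac{56}{2051} \approx 0.027 \qquad \hat{p}_{plac} = \frac{84}{2030} \approx 0.041 \qquad \hat{p}_C = \frac{56 + 84}{2051 + 2030} = \frac{140}{4081} \approx 0.034$$

$$z = \frac{(0.027 - 0.041) - 0}{\sqrt{\dfrac{(0.034)(0.966)}{2051} + \dfrac{(0.034)(0.966)}{2030}}} \approx -2.47 \qquad P\text{-val} = P(z < -2.47) \approx 0.0068$$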

Since P-val $< \alpha = 0.01$, we reject $H_0$. There is convincing evidence that gemfibrozil is more effective than a placebo at preventing heart attacks for men like those in the study.
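The calculation in part a) can be sketched in Python using only the standard library. The function name `two_proportion_z_test` is a hypothetical helper, not part of the notes; the Normal CDF is written with `math.erfc`, and the counts are those reported for the Helsinki Heart Study above.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z statistic and lower-tail P-value for Ha: p1 < p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Standard Normal CDF expressed with the complementary error function.
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))
    return z, p_value

# Gemfibrozil: 56 heart attacks out of 2051; placebo: 84 out of 2030.
z, p_value = two_proportion_z_test(56, 2051, 84, 2030)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

Because the code carries full precision instead of the rounded proportions 0.027 and 0.041, its results may differ slightly in later decimal places from a hand calculation.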

b) Interpret the P-value you got in part a) in the context of this experiment.

If gemfibrozil and placebo are equally effective at preventing heart attacks, there is only about a 0.0068 probability of seeing a difference (Gemfibrozil – placebo) in heart attack rate for the two groups as low or lower than the observed difference of –0.0141 just by the chance involved in random assignment.

The logic behind this example: There are two possible reasons why we might have observed a difference in the proportions of subjects in our two groups who experienced heart attacks. Either a lower proportion of the gemfibrozil group experienced heart attacks because gemfibrozil is more effective at preventing heart attacks than a placebo, or all 140 people in the study who experienced heart attacks would have had a heart attack regardless of which treatment they received, and the researchers just happened to put a higher proportion of them in the placebo group by chance.

Let’s assume that $H_0: p_G - p_C = 0$ is true. That is, there is no difference in the effectiveness of gemfibrozil and the placebo. All 140 people in the study who experienced heart attacks would have had a heart attack no matter which treatment they received. We can think about what would happen if we were to repeat the reassignment many times – dividing the 4081 subjects into two treatment groups, one with 2051 subjects, and one with 2030 subjects. Each time, we count how many of those who end up having a heart attack end up in each group, then we calculate the difference in sample proportions, $\hat{p}_G - \hat{p}_C$, for each randomization. The result is called a randomization distribution. The figure to the right shows the result of 3000 re-randomizations for this scenario. Notice that the distribution is approximately Normal with a mean of 0 and a standard deviation of 0.0057, which are very close to the same mean and standard deviation we get from the formulas used for situations where we have two independent random samples! Very convenient!
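A randomization distribution like the one described here can be simulated with a short Python sketch. This is an illustration under stated assumptions, not the software used for the notes: the seed is arbitrary, the function name `rerandomized_difference` is hypothetical, and each of the 140 heart-attack subjects is dealt into one of the remaining group slots at random, which is equivalent to re-dividing all 4081 subjects.

```python
import random
import statistics

def rerandomized_difference(n_events, n_gem, n_plac, rng):
    """Deal the heart-attack subjects into groups at random; return p_G - p_C."""
    gem_slots, total_slots = n_gem, n_gem + n_plac
    events_in_gem = 0
    for _ in range(n_events):
        # Each subject is equally likely to fill any remaining slot.
        if rng.random() < gem_slots / total_slots:
            events_in_gem += 1
            gem_slots -= 1
        total_slots -= 1
    return events_in_gem / n_gem - (n_events - events_in_gem) / n_plac

rng = random.Random(1)  # arbitrary seed, used only for repeatability
diffs = [rerandomized_difference(140, 2051, 2030, rng) for _ in range(3000)]
observed = 56 / 2051 - 84 / 2030  # about -0.0141

center = statistics.mean(diffs)
spread = statistics.stdev(diffs)
# Share of re-randomizations with a difference this low or lower.
p_estimate = sum(d <= observed for d in diffs) / len(diffs)
print(f"mean = {center:.4f}, SD = {spread:.4f}, estimated P-value = {p_estimate:.4f}")
```

The exact numbers change with the seed, but the mean should sit near 0 and the standard deviation near 0.0057.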

In the Helsinki Heart Study, the observed difference in the proportions of subjects who had a heart attack in the gemfibrozil and placebo groups was 0.0273 – 0.0414 = – 0.0141. Only 25 of the 3000 re-randomizations resulted in a difference in proportions this low or lower by chance, so the estimated P-value is about 0.0083, which isn’t far off from the P-value we calculated with our formulas. Basically, the P-value is the probability that we see at least this much of a difference in the sample proportions simply due to chance variation in random assignment if gemfibrozil is no more effective than a placebo at preventing heart attacks.

Example: Does increasing the amount of calcium in our diet reduce blood pressure? Observational studies have suggested a link, but researchers designed a randomized comparative experiment to investigate the question of causation. The subjects were 21 healthy men who volunteered to take part in the experiment. They were randomly assigned to two groups: 10 of the men received a calcium supplement for 12 weeks, while the control group of 11 men received a placebo pill that looked identical. The experiment was double-blind. The response variable is the decrease in systolic blood pressure for a subject after 12 weeks, in millimeters of mercury. (An increase appears as a negative number.) Here are the data:

Calcium: 7 –4 18 17 –3 –5 1 10 11 –2

Placebo: –1 12 –1 –3 3 –5 5 2 –11 –1 –3

a) Draw parallel dotplots of the data. Based only on the dotplots, do you suspect there will be a significant difference between the true mean decrease in systolic blood pressure for healthy men like those in the study who take calcium and those who take a placebo?

While the mean decrease in blood pressure for calcium does appear to be slightly higher than the mean decrease for placebo, there is a lot of overlap in the distributions of sample data. Additionally, the sample sizes are small, so I suspect the difference won’t be significant.

b) Do the data provide convincing evidence that a calcium supplement reduces blood pressure more than a placebo, on average, for subjects like the ones in this study?

Two-sample t test for a difference in means

$H_0: \mu_{calc} = \mu_{plac}$

$H_a: \mu_{calc} > \mu_{plac}$

$\mu_{calc}$ and $\mu_{plac}$ are the true mean reductions in systolic blood pressure for healthy men like those in the study who take a calcium supplement or a placebo, respectively, for 12 weeks.

• Random? Subjects were randomly assigned to either take a calcium supplement or a placebo.

• Normal? The dotplots above do not show extreme skewness or outliers, so it’s safe to proceed.

• Independent? One subject’s decrease in systolic blood pressure should be unrelated to any other subject’s decrease in systolic blood pressure.

[Figure: parallel dotplots of decrease in systolic blood pressure for the Calcium and Placebo groups]

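$$\bar{x}_{calc} = 5.000 \qquad s_{calc} = 8.743 \qquad \bar{x}_{plac} = -0.273 \qquad s_{plac} = 5.901$$

$$t = \frac{(\bar{x}_{calc} - \bar{x}_{plac}) - (\mu_{calc} - \mu_{plac})}{\sqrt{\dfrac{s_{calc}^2}{n_{calc}} + \dfrac{s_{plac}^2}{n_{plac}}}} = \frac{(5 - (-0.273)) - 0}{\sqrt{\dfrac{8.743^2}{10} + \dfrac{5.901^2}{11}}} \approx 1.604$$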

Conservative df = 10 – 1 = 9 and Technology df = 15.6

P-value $\approx 0.0644$ (with technology df = 15.6)

Since P-value $> \alpha = 0.05$, we fail to reject $H_0$. There is not convincing evidence that the true mean decrease in systolic blood pressure is higher for men like these who take calcium than for men like these who take a placebo.
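The test statistic and the technology degrees of freedom can be checked from the raw data with a short Python sketch. The helper name `welch_t` is an illustrative choice; the df formula is the Welch-Satterthwaite approximation, which is assumed here to be what "technology df" refers to. The P-value itself is left to a calculator or statistical software.

```python
import math
import statistics

calcium = [7, -4, 18, 17, -3, -5, 1, 10, 11, -2]
placebo = [-1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3]

def welch_t(sample1, sample2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite df."""
    v1 = statistics.variance(sample1) / len(sample1)  # s1^2 / n1
    v2 = statistics.variance(sample2) / len(sample2)  # s2^2 / n2
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(sample1) - 1) + v2 ** 2 / (len(sample2) - 1))
    return t, df

t_stat, df = welch_t(calcium, placebo)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```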

The logic behind this example: There are two possible reasons why we observed a difference in mean blood pressure reduction for the two groups as large as we did. Either the mean blood pressure reduction was higher for the calcium group because calcium is more effective at lowering blood pressure, or the two treatments are equally effective, and any differences we saw were simply because of chance variation due to random assignment.

Assume that $H_0: \mu_C - \mu_P = 0$ is true. That is, assume that calcium and the placebo are equally effective at lowering blood pressure. If we reassign the 21 subjects to the two groups many times, assuming the treatment doesn’t affect each individual’s change in blood pressure, and then calculate the new difference in sample mean decrease in systolic blood pressure ($\bar{x}_C - \bar{x}_P$) for each re-randomization, we can estimate how likely it is that we’d see a difference at least as extreme as the one we actually observed by chance if the two treatments don’t differ in effectiveness. The randomization distribution is approximately Normal with a mean of 0.014 and a standard deviation of 3.400, which agree well with the mean and standard deviation we calculated using the formulas for situations involving two independent random samples (0 and 3.29). Very convenient!

The observed difference in the mean reduction in blood pressure in the calcium and placebo groups was 5.000 – (–0.273) = 5.273. About 660 of the re-randomizations resulted in differences this high or higher by chance, so the estimated P-value is about 0.066, which isn’t far off from the P-value we calculated with our formulas. Basically, the P-value is the probability that we see at least this much of a difference in the sample means simply due to chance variation in random assignment if calcium is no more effective than a placebo at reducing blood pressure.
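The re-randomization idea for the calcium experiment can be sketched the same way in Python. The seed and the number of repetitions (10,000) are assumptions made for this illustration; the notes do not state how many re-randomizations were used, so the simulated values will only roughly match the ones quoted above.

```python
import random
import statistics

calcium = [7, -4, 18, 17, -3, -5, 1, 10, 11, -2]
placebo = [-1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3]
observed = sum(calcium) / len(calcium) - sum(placebo) / len(placebo)

rng = random.Random(2)   # arbitrary seed, used only for repeatability
n_reps = 10_000          # assumed number of re-randomizations
pooled = calcium + placebo
diffs = []
for _ in range(n_reps):
    # Under H0 each subject's response is fixed, so only the labels move.
    rng.shuffle(pooled)
    new_calcium, new_placebo = pooled[:10], pooled[10:]
    diffs.append(sum(new_calcium) / 10 - sum(new_placebo) / 11)

spread = statistics.stdev(diffs)
# Share of re-randomizations with a difference this high or higher.
p_estimate = sum(d >= observed for d in diffs) / n_reps
print(f"observed = {observed:.3f}, SD = {spread:.3f}, estimated P-value = {p_estimate:.3f}")
```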

AP Statistics – 12.1 Notes

Inference for Linear Regression

Regression Line: A line that describes how a response variable y changes as an explanatory variable x changes.

Correlation (r): A number between –1 and 1 that measures the direction and strength of the linear relationship between two variables.

Coefficient of Determination ($r^2$): The proportion of the variation in the values of y that is explained by the least-squares regression line.

Residuals: The differences between the observed value of the response variable and the value predicted by the regression equation. $\text{residual} = y - \hat{y}$

Residual Plot: A plot of the explanatory variable (x) vs. the residuals.

If we have all the data for a population, we can calculate the true regression line: $\mu_y = \alpha + \beta x$.

[Figure: Regression line based on the entire population (all eruptions of Old Faithful in a month). Population regression equation: $\mu_y = 33.97 + 10.36x$]

What can we do if we have a sample? Can a regression line based on a sample tell us anything about the true regression line of the population?

[Figure: Regression lines based on samples of 20 eruptions from that month.] Notice how the slope of the regression line is different for each sample, even though they all come from the same population.

Conditions for Regression Inference (LINER):

Suppose we have n observations on an explanatory variable x and a response variable y. Our goal is to study or predict the behavior of y for given values of x.

• Linear: The actual relationship between x and y is linear. The mean values of y for each value of x line up along the population (true) regression line $\mu_y = \alpha + \beta x$.

• Independent: Individual observations are independent of each other (or the 10% condition is met).

• Normal: For each value of x, the y-values are Normally distributed around the regression line.

• Equal Variance: The standard deviation of y, called $\sigma$, is the same for all values of x.

• Random: The data come from a well-designed random sample or randomized experiment.

How to Check Conditions:

• Linear: Look at the scatterplot to make sure the overall pattern is roughly linear. Make sure there are no curved patterns in the residual plot. Check to see that the residuals appear randomly scattered and are centered around the “residual = 0” line.

• Independent: Look at how the data were produced. If sampling is done without replacement, check the 10% condition. If the study is a randomized experiment, good design with proper controls and random assignment help ensure the independence of individual observations.

• Normal: Make a stemplot, histogram, or Normal probability plot of the residuals and check to make sure there isn’t extreme skewness, outliers, or other major departures from Normality.

• Equal Variance: Look at the scatter of the residuals above and below the “residual = 0” line in the residual plot. The amount of scatter should be roughly the same from the smallest to the largest x-value. Make sure you don’t see a “fan” pattern – a tight cluster at one end of the graph and a spread out pattern at the other end.

• Random: The data come from a well-designed random sample or randomized experiment.

Example: Many people believe that students learn better if they sit closer to the front of the classroom. Does sitting closer cause higher achievement or do better students simply choose to sit nearer to the front? To investigate, an AP Statistics teacher randomly assigned students to seat locations in his classroom for a particular chapter and recorded the test score for each student at the end of the chapter. The explanatory variable in this experiment is which row the student was assigned to (Row 1 is closest to the front and Row 7 is farthest away). Here are the results.

Row 1: 76, 77, 94, 99
Row 2: 83, 85, 74, 77
Row 3: 90, 88, 68, 78
Row 4: 94, 72, 101, 70, 79
Row 5: 76, 65, 90, 67, 96
Row 6: 88, 79, 90, 83
Row 7: 79, 76, 77, 63

Predictor Coef SE Coef T P

Constant 85.706 4.239 20.22 0.000

Row -1.1171 0.9472 -1.18 0.248

S = 10.0673 R-Sq = 4.7% R-Sq(adj) = 1.3%

a) What is the equation of the least-squares regression line?

$\widehat{\text{Test score}} = 85.706 - 1.1171(\text{Row})$

b) Interpret the slope of the least-squares regression line in this context.

For every row further from the front a student sits, our model predicts that the test score will decrease by 1.1171 points.

c) Interpret the value of $r^2$ in this context.

4.7% of the variation in test score can be explained by the approximate linear relationship with row number.

d) Why was it important to randomly assign the students to seats rather than letting each student choose where to sit?

Better students might all choose to sit near the front, which would make it look like sitting in the front improves test scores, when in reality, the students sitting there were just smarter.

[Figure: scatterplot of Score vs. Row]

e) Check to see if the conditions for inference are met. A residual plot and a histogram of the residuals are shown below.

[Figures: residual plot of Residual vs. Row; histogram of residuals (Frequency vs. Residual)]

Linear: The scatterplot shows a roughly linear relationship, and there are no curved patterns in the residual plot. The residuals are evenly scattered above and below zero.

Independent: Since students were randomly assigned to seats and were most likely monitored for cheating, knowing the score of one student should give no additional information about the scores of other students.

Normal: The histogram of residuals is roughly unimodal and symmetric, and there seem to be no serious departures from Normality.

Equal Variance: The residual plot shows roughly even scatter around the residual = 0 line. There is no systematic increase or decrease in variability as row # increases.

Random: The students were assigned to seats at random.

Parameters for the True Regression Line $\mu_y = \alpha + \beta x$

• α is the true y-intercept.

• β is the true slope.

• σ is the standard deviation. It describes the variability of the response y about the population (true) regression line. It basically says how tightly packed the observations are around the line.

Estimating $\alpha$ and $\beta$: In the regression line for a sample, $\hat{y} = a + bx$, the slope b is an unbiased estimator of the true slope $\beta$, and the intercept a is an unbiased estimator of the true intercept $\alpha$.

Estimating $\sigma$: We are often interested in how tightly the data are clustered around the regression line. Since $\sigma$ is unknown, we use s, which is the standard deviation of the residuals. Remember that s can be interpreted as the typical prediction error, or the typical or average distance of the observed values from the predicted values.

$$s = \sqrt{\frac{\sum \text{residuals}^2}{n - 2}} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$$

Inference about the True Slope, β

Usually, the most important parameter in a regression problem is the true slope, $\beta$. The slope tells us how much y is predicted to change, on average, each time x changes by 1 unit.

The standard error of the slope ($SE_b$) is the standard deviation of the sampling distribution of b – the standard deviation of the slopes of regression lines formed by taking repeated samples of the same size from the population. It measures how much the slopes of the sample regression lines from repeated samples typically vary from the slope of the population regression line. The graph at the left shows the slopes of the regression lines for 1000 samples of size $n = 20$ from the Old Faithful data.

Normally, we get the value of $SE_b$ from computer output, but the formula is

$$SE_b = \frac{s}{s_x \sqrt{n - 1}}$$
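As a rough check of the $SE_b$ formula, the Python sketch below applies it to the seating example from earlier in these notes. It uses only the value S = 10.0673 from the computer output and the row numbers of the 30 students (rows 4 and 5 had five students, the other rows had four). The variable names are illustrative.

```python
import math
import statistics

# Row number for each of the 30 students in the seating example.
rows = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 5 + [5] * 5 + [6] * 4 + [7] * 4

s = 10.0673                     # S from the computer output
n = len(rows)
s_x = statistics.stdev(rows)    # sample standard deviation of the x-values
se_b = s / (s_x * math.sqrt(n - 1))
print(f"n = {n}, s_x = {s_x:.4f}, SE_b = {se_b:.4f}")
```

The result can be compared with the "SE Coef" entry for Row in the output, 0.9472.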

• If $\beta = 0$, that means the mean of y does not change at all when x changes. In other words, it means there is no true linear relationship between x and y.

• When data from a random sample or a randomized experiment suggest that there is an association between two variables, there are two possible explanations for why the slope differs from 0. We do inference to decide which explanation seems more plausible.

o Explanation 1: There really is no association between the variables, and we got a nonzero slope due to sampling variability or the chance variation due to random assignment.

o Explanation 2: There really is an association between the two variables.

Inference about the true slope, $\beta$, involves a t curve with $n - 2$ degrees of freedom.

Confidence Interval for the True Slope, $\beta$:

statistic ± (critical value)(standard error of the statistic)

$$b \pm t^* SE_b$$

Use the t curve with $n - 2$ degrees of freedom.

Example: Here is the computer output from the previous example (test score vs. row #):


Predictor Coef SE Coef T P

Constant 85.706 4.239 20.22 0.000

Row -1.1171 0.9472 -1.18 0.248

S = 10.0673 R-Sq = 4.7% R-Sq(adj) = 1.3%

a) Identify the standard error of the slope, $SE_b$, from the computer output. Interpret this value in context.

$SE_b = 0.9472$. If this experiment were repeated many times, the slopes of the sample regression lines for predicting test scores from row number would vary by an average of about 0.9472 points per row from the true slope.

b) Calculate a 95% confidence interval for the true slope. Show your work. Interpret the interval in context.

t Interval for $\beta$

$n = 30$, df $= 30 - 2 = 28$, and for a 95% interval $t^* = 2.048$

$$b \pm t^* SE_b = -1.1171 \pm 2.048(0.9472) = -1.1171 \pm 1.9399 = (-3.0570, 0.8228)$$

We are 95% confident that the interval from –3.0570 to 0.8228 points per row captures the slope of the true regression line for predicting a student’s test score from the student’s row number.

c) Based on your interval, is there convincing evidence that seat location affects scores?

Because the interval of plausible slopes includes 0, we do not have convincing evidence that there is an association between test score and row number.
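The interval arithmetic in parts b) and c) can be sketched in a few lines of Python. The function name `slope_interval` is hypothetical; the slope and standard error are read from the computer output, and the critical value 2.048 is the table value quoted above for df = 28.

```python
def slope_interval(b, se_b, t_star):
    """Confidence interval b +/- t* x SE_b for the true slope."""
    margin = t_star * se_b
    return b - margin, b + margin

# Values read from the output above; t* = 2.048 comes from a t table.
low, high = slope_interval(-1.1171, 0.9472, 2.048)
contains_zero = low <= 0 <= high
print(f"({low:.4f}, {high:.4f}), contains 0: {contains_zero}")
```

When the interval contains 0, a slope of 0 is plausible, which is the reasoning used in part c).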

Example: For their second-semester project, two AP Statistics students decided to investigate the effect of sugar on the life of cut flowers. They went to the local grocery store and randomly selected 12 carnations. All the carnations seemed equally healthy when they were selected. When the students got home, they prepared 12 identical vases with exactly the same amount of water in each vase. They put one tablespoon of sugar in 3 vases, two tablespoons of sugar in 3 vases, and 3 tablespoons of sugar in 3 vases. In the remaining vases, they put no sugar. After the vases were prepared and placed in the same location, the students randomly assigned one flower to each vase and observed how many hours each flower continued to look fresh. Here are the data along with computer output from a least-squares regression analysis:


Predictor Coef SE Coef T P

Constant 181.200 3.635 49.84 0.000

Sugar (Tbsp.) 15.200 1.943 7.82 0.000

S = 7.52596 R-Sq = 86.0% R-Sq(adj) = 84.5%

a) Construct and interpret a 99% confidence interval for the slope of the true regression line.

t Interval for β

Linear: The scatterplot shows a roughly linear pattern, and there is no obvious curved pattern in the residual plot.

Independent: The flowers were all in different vases and were assigned to treatments randomly, so knowing how long one flower stays fresh shouldn’t give any additional information about how long the other flowers stay fresh.

Sugar (Tbsp.) Freshness (hrs.)

0 168

0 180

0 192

1 192

1 204

1 204

2 204

2 210

2 210

3 222

3 228

3 234

[Figures: scatterplot of Freshness (hrs.) vs. Sugar (Tbsp.); residual plot of Residual vs. Sugar (Tbsp.); histogram of residuals (Frequency vs. Residual)]

Normal: The histogram of residuals does not show any major skewness or outliers.

Equal Variance: The residual plot shows roughly even scatter around the residual = 0 line. There is no systematic increase or decrease in variation as sugar amount increases.

Random: Flowers were selected at random and were randomly assigned to vases.

$n = 12$, df $= 12 - 2 = 10$, $t^* = 3.169$

$$b \pm t^* SE_b = 15.2 \pm 3.169(1.943) = 15.2 \pm 6.16 = (9.04, 21.36) \text{ hours per Tbsp. of sugar}$$

We are 99% confident that the interval from 9.04 to 21.36 hours per tablespoon of sugar contains the slope of the true regression line for predicting hours of freshness from amount of sugar. Since all the values are positive, this is convincing evidence that there is a positive linear relationship between hours of freshness and amount of sugar.
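The computer output and the interval for this example can be reproduced from the twelve data points with a plain-Python sketch. The variable names are illustrative, and the critical value 3.169 is taken from the worked solution instead of being computed, since the standard library has no t distribution.

```python
import math

sugar = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
hours = [168, 180, 192, 192, 204, 204, 204, 210, 210, 222, 228, 234]
n = len(sugar)

x_bar, y_bar = sum(sugar) / n, sum(hours) / n
sxx = sum((x - x_bar) ** 2 for x in sugar)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sugar, hours))

b = sxy / sxx                   # least-squares slope
a = y_bar - b * x_bar           # least-squares intercept
residuals = [y - (a + b * x) for x, y in zip(sugar, hours)]
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
se_b = s / math.sqrt(sxx)       # equals s / (s_x * sqrt(n - 1))

t_star = 3.169                  # table value for 99% confidence, df = 10
low, high = b - t_star * se_b, b + t_star * se_b
print(f"a = {a:.3f}, b = {b:.3f}, s = {s:.5f}, SE_b = {se_b:.3f}")
print(f"99% interval: ({low:.2f}, {high:.2f})")
```

Writing $SE_b$ as `s / math.sqrt(sxx)` uses the identity $s_x \sqrt{n - 1} = \sqrt{\sum (x - \bar{x})^2}$, so it matches the formula given earlier.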

b) Would you feel confident predicting the hours of freshness if 10 tablespoons of sugar are used? Explain.

No. Since the experiment only used amounts of sugar up to 3 tablespoons, this would be extrapolation. We don’t know if the linear relationship would continue outside the range of the data.

Significance Tests about the True Slope, $\beta$:

Hypotheses:

$H_0: \beta = 0$ (There is no true linear relationship between x and y.)

$H_a: \beta > 0$ (There is a positive correlation – y increases as x increases)

-or- $\beta < 0$ (There is a negative correlation – y decreases as x increases)

-or- $\beta \neq 0$ (There is a linear relationship – y changes when x changes)

Test Statistic: $t = \dfrac{b}{SE_b}$ with $n - 2$ degrees of freedom.

P-value: The probability of getting a t statistic this large or larger in the direction specified by the alternative hypothesis.

• The P-value given by computer output is for a two-sided test. You must divide it by two if you are doing a one-sided test (this assumes the sample slope b falls in the direction specified by $H_a$).

Example: Do customers who stay longer at buffets give larger tips? An AP Statistics student who worked at an Asian buffet decided to investigate this question. While doing her job as a hostess, she obtained a random sample of receipts, which included the length of time (in minutes) the party was in the restaurant and the amount of the tip (in dollars). Do these data provide convincing evidence that customers who stay longer give larger tips? Here are the data and computer output.


Predictor Coef SE Coef T P

Constant 4.535 1.657 2.74 0.021

Time (min.) 0.03013 0.02448 1.23 0.247

S = 1.77931 R-Sq = 13.2% R-Sq(adj) = 4.5%

a) Describe what the scatterplot tells you about the relationship between the two variables.

There appears to be a weak, positive, linear association between amount of time spent at the buffet and amount of the tip. In general, those who stay longer leave larger tips.

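b) What is the equation of the least-squares regression line for predicting the amount of the tip from the length of the stay? Define any variables you use.

$\widehat{\text{Tip}} = 4.535 + 0.03013(\text{Time})$, where $\widehat{\text{Tip}}$ is the predicted tip in dollars and Time is the length of the stay in minutes.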

c) Interpret the slope and the y-intercept of the least-squares regression line in context.

Slope: For each additional minute spent at the buffet, our model predicts an increase of 3¢ in tip.

y-intercept: Our model predicts that someone who spends 0 minutes at the buffet will leave a $4.54 tip. This is extrapolation and is clearly nonsensical.

d) Carry out an appropriate test to determine whether the data provide convincing evidence that customers who stay longer give larger tips.

t test for $\beta$ (using $\alpha = 0.05$)

$H_0: \beta = 0$

$H_a: \beta > 0$

β is the true slope of the population regression line for predicting tip from length of stay.

Time (min.) Tip ($)

23 5.00

39 2.75

44 7.75

55 5.00

61 7.00

65 8.88

67 9.01

70 5.00

74 7.29

85 7.50

90 6.00

99 6.50

[Figures: scatterplot of Tip ($) vs. Time (min.); residual plot of Residual vs. Time (min.); histogram of residuals (Frequency vs. Residual)]

Linear: The scatterplot shows a weak, positive, linear relationship between length of stay and tip amount. The residual plot shows random scatter around the residual = 0 line with no clearly curved pattern.

Independent (10% condition): It is reasonable to believe that there are more than 10(12) = 120 receipts for the buffet.

Normal: The histogram of residuals does not show strong skewness or outliers.

Equal variance: The residual plot shows roughly even scatter around the residual = 0 line. There is no systematic increase or decrease in variation as time spent at the buffet increases.

Random: The receipts were randomly selected.

Test statistic: $t = 1.23$

P-value = 0.247 ÷ 2 = 0.1235, df = 12 – 2 = 10

Since P-value $> \alpha = 0.05$, we fail to reject $H_0$. We do not have convincing evidence that parties who stay longer at the buffet leave bigger tips.
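The test statistic and one-sided P-value can be approximated in Python without a statistics package. This is a sketch under stated assumptions: the helper names `t_pdf` and `upper_tail` are hypothetical, and the tail area is found by numerically integrating the t density with Simpson's rule instead of calling a built-in t distribution.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

b, se_b, n = 0.03013, 0.02448, 12   # slope, SE of slope, sample size from above
t_stat = b / se_b
p_one_sided = upper_tail(t_stat, n - 2)
print(f"t = {t_stat:.2f}, one-sided P-value = {p_one_sided:.4f}")
```

The one-sided value should be close to half of the two-sided P-value shown in the computer output.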

Example: A random sample of 11 used Honda CR-Vs from the 2002-2006 model years was selected from the inventory at www.carmax.com. The number of miles driven and the advertised price were recorded for each CR-V. A 95% confidence interval for the slope of the true least-squares regression line for predicting advertised price from number of miles (in thousands) driven is $(-122.3, -50.1)$. Based on this interval, what conclusion should we draw from a test of $H_0: \beta = 0$ versus $H_a: \beta \neq 0$ at the $\alpha = 0.05$ significance level?

Since 0 is not in the 95% confidence interval of plausible slopes, we reject $H_0$ at the $\alpha = 0.05$ significance level. There is convincing evidence of a linear relationship between the number of miles driven and the advertised price.
