comparing two proportions


Upload: roden

Post on 22-Mar-2016


DESCRIPTION

Statistics. Comparing Two Proportions. Be able to state the null and alternative hypotheses for testing the difference between two population proportions.

TRANSCRIPT

Page 1: Comparing Two Proportions

COMPARING TWO PROPORTIONS Statistics

Page 2: Comparing Two Proportions

WHAT YOU WILL LEARN Be able to state the null and alternative hypotheses

for testing the difference between two population proportions.

Know how to examine your data for violations of conditions that would make inference about the difference between the two population proportions unwise or invalid.

Understand that the formula for the standard error of the difference between two independent sample proportions is based on the principle that when finding the sum or difference of two independent random variables, their variances add.

Page 3: Comparing Two Proportions

TERMS Variances of independent random

variables added— The variance of a sum or difference of

independent random variables is the sum of the variances of those variables.

Page 4: Comparing Two Proportions

TERMS Sampling distribution—

The sampling distribution of $\hat{p}_1 - \hat{p}_2$ is, under appropriate assumptions, modeled by a Normal model with mean $\mu = p_1 - p_2$ and standard deviation

$$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

Page 5: Comparing Two Proportions

TERMS Two-proportion z-interval—

A two-proportion z-interval gives a confidence interval for the true difference in proportions, $p_1 - p_2$, in two independent groups.

The confidence interval is

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \times SE(\hat{p}_1 - \hat{p}_2),$$

where $z^*$ is a critical value from the standard Normal model corresponding to a specified confidence level.

Page 6: Comparing Two Proportions

TERMS Pooling—

When we have data from different sources that we believe are homogeneous, we can get a better estimate of the common proportion and its standard deviation. We can combine, or pool, the data into a single group for the purpose of estimating the common proportion. The resulting pooled standard error is based on more data and is thus more reliable (if the null hypothesis is true and the groups are truly homogeneous).

Page 7: Comparing Two Proportions

TERMS Two-proportion z-test—

Test the null hypothesis $H_0: p_1 - p_2 = 0$ by referring the statistic

$$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)}$$

to a standard Normal model.

Page 8: Comparing Two Proportions

EXAMPLE Who do you think is more intelligent, men or women? A Gallup poll asked 520 women and 506 men. 28% of the men thought men were more intelligent. 14% of the women thought men were more intelligent.

Questions that compare two percentages are much more common than questions about isolated percentages. Example– Treatment is better than placebo control. Example– This year’s results are better than last year’s.

Page 9: Comparing Two Proportions

COMPARING TWO PROPORTIONS We know the difference between the

two proportions of the random sample is 14%, but what is the true difference?

We would like to find the true difference and the margin of error.

For this we need to determine the standard deviation of the sampling distribution model for the difference in the proportions.

Page 10: Comparing Two Proportions

COMPARING TWO PROPORTIONS Remember– The variance of the sum

or difference of two independent random variables is the sum of their variances. (Chapter 16).

Why will this work?

Page 11: Comparing Two Proportions

COMPARING TWO PROPORTIONS How does this work? Consider grabbing a box of cereal. It claims there are 16 ounces in the

box. We know that this is not exact because

there is some variance from box to box. When you pour 2 ounces of cereal in a

bowl, there will be further variance from bowl to bowl.

How much cereal is left in the box?

Page 12: Comparing Two Proportions

COMPARING TWO PROPORTIONS According to our rule, the variance of the amount of cereal left in the box would now be the sum of the two variances.

We need the standard deviation, not the variance, so we take the square root of the summed variance.

Page 13: Comparing Two Proportions

COMPARING TWO PROPORTIONS Here are the formulas.

$$Var(X - Y) = Var(X) + Var(Y)$$

$$SD(X - Y) = \sqrt{SD^2(X) + SD^2(Y)} = \sqrt{Var(X) + Var(Y)}$$

These formulas apply only when X and Y are independent.
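The rule that variances add can be checked exactly on a small case. The sketch below is only an illustration: it assumes X and Y are two independent fair six-sided dice (a made-up example, not one from the slides) and builds the exact distribution of X − Y from the joint distribution.

```python
from fractions import Fraction
from itertools import product


def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    return sum((v - mean) ** 2 * p for v, p in dist.items())


# Hypothetical example: X and Y are independent fair six-sided dice.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Because X and Y are independent, each joint probability is the
# product of the two marginal probabilities.
diff = {}
for (x, px), (y, py) in product(die.items(), die.items()):
    diff[x - y] = diff.get(x - y, Fraction(0)) + px * py

var_x = variance(die)
var_y = variance(die)
var_diff = variance(diff)

# The variance of the difference equals the SUM of the variances.
print(var_x, var_y, var_diff)

# The standard deviation is the square root of the summed variance.
sd_diff = float(var_diff) ** 0.5
```

Note that the variances add even though the variables are subtracted; the standard deviations themselves do not add.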

Page 14: Comparing Two Proportions

COMPARING TWO PROPORTIONS The samples can have different sizes

and different proportion values. We use subscripts to keep the different

values straight. In comparing males and females, we

could use the subscripts of M and F or 1 and 2.

Page 15: Comparing Two Proportions

COMPARING TWO PROPORTIONS The standard deviations of the sample proportions are:

$$SD(\hat{p}_1) = \sqrt{\frac{p_1 q_1}{n_1}} \qquad SD(\hat{p}_2) = \sqrt{\frac{p_2 q_2}{n_2}}$$

Page 16: Comparing Two Proportions

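COMPARING TWO PROPORTIONS The variance of the difference in the proportions is:

$$Var(\hat{p}_1 - \hat{p}_2) = \left(\sqrt{\frac{p_1 q_1}{n_1}}\right)^2 + \left(\sqrt{\frac{p_2 q_2}{n_2}}\right)^2 = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}$$

The standard deviation is:

$$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$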

Page 17: Comparing Two Proportions

COMPARING TWO PROPORTIONS Since we usually don’t know the true values of $p_1$ and $p_2$, we use the sample proportions from the data we are given.

We use them to estimate the variances and find the standard error.

$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$
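As a rough illustration of the standard error formula, the sketch below applies it to the Gallup figures from the earlier example (28% of 506 men, 14% of 520 women). The function name `se_diff` is a hypothetical helper introduced here, not something from the slides.

```python
from math import sqrt


def se_diff(p1_hat, n1, p2_hat, n2):
    """Standard error of p1_hat - p2_hat for two independent samples."""
    q1_hat = 1 - p1_hat
    q2_hat = 1 - p2_hat
    return sqrt(p1_hat * q1_hat / n1 + p2_hat * q2_hat / n2)


# Gallup example: men are group 1, women are group 2.
se_gallup = se_diff(0.28, 506, 0.14, 520)
print(round(se_gallup, 4))  # about 0.0251
```

The observed difference of 14 percentage points is several times larger than this standard error of roughly 2.5 points.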

Page 18: Comparing Two Proportions

INDEPENDENCE ASSUMPTIONS Within each group the data should be based

on results for independent individuals. Randomization Condition–

The data in each group should be drawn independently and at random from a homogeneous population or generated by a randomized comparative experiment.

The 10% Condition— If the data are sampled without replacement, the

sample should not exceed 10% of the population.

Page 19: Comparing Two Proportions

INDEPENDENCE ASSUMPTIONS Since we are comparing two groups, we need to add the Independent Groups Assumption.

This is the most important assumption. Independent Groups Assumption—

The two groups we are comparing must also be independent of each other. Usually, the independence of the groups from each other is evident in the way data were collected.

Page 20: Comparing Two Proportions

SAMPLE SIZE CONDITION Each of the groups must be big

enough. Success/Failure Condition—

Both groups are big enough that at least 10 successes and at least 10 failures have been observed in each.

Page 21: Comparing Two Proportions

SAMPLING DISTRIBUTION The sampling distribution model for a difference between two independent proportions.

Provided that the sampled values are independent, the samples are independent, and the sample sizes are large enough, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is modeled by a Normal model with mean $\mu = p_1 - p_2$ and standard deviation

$$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

Page 22: Comparing Two Proportions

SAMPLING DISTRIBUTION If we have the sampling distribution

model and the standard deviation, we have what we need to find the margin of error for the differences in proportions.

Page 23: Comparing Two Proportions

SAMPLING DISTRIBUTION Two-proportion z-interval—

When the conditions are met, we are ready to find the confidence interval for the difference of two proportions, $p_1 - p_2$. The confidence interval is

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \times SE(\hat{p}_1 - \hat{p}_2)$$

where we find the standard error of the difference,

$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$

from the observed proportions. The critical value $z^*$ depends on the particular confidence level, C, that you specify.
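One way to carry out this interval calculation is sketched below, using Python's standard-library `statistics.NormalDist` to look up the critical value. The function name `two_prop_z_interval` is a hypothetical helper, and the Gallup figures (28% of 506 men, 14% of 520 women) are reused from the earlier example purely as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_interval(p1_hat, n1, p2_hat, n2, confidence=0.95):
    """Confidence interval for p1 - p2 from two independent samples."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard Normal.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    margin = z_star * se
    return diff - margin, diff + margin


# Gallup example: men (group 1) minus women (group 2), 95% confidence.
low, high = two_prop_z_interval(0.28, 506, 0.14, 520)
print(round(low, 3), round(high, 3))
```

Raising the `confidence` argument increases $z^*$ and therefore widens the interval.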

Page 24: Comparing Two Proportions

POOLING Consider this example— The National Sleep Foundation asked a random

sample of 1010 U.S. adults questions about their sleep habits. The study ensured that there was an equal number of men and women.

The question about snoring had 995 respondents; 37% of them reported that they snored at least a few nights a week during the past year.

26% of the 184 people under 30 snored, compared with 39% of the 811 in the older group.

Can the difference really be 13% or is it due to the natural fluctuations in the sample that was chosen?

Page 25: Comparing Two Proportions

POOLING This type of question uses a hypothesis

test. What would be the null hypothesis? H0: p1 – p2 = 0 or H0: p1 = p2

What would be the alternative hypothesis?

HA: p1 ≠ p2

Page 26: Comparing Two Proportions

POOLING The hypothesis is about a new

parameter– the difference in proportions.

We need to find the standard error for that.

$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$

But we can actually do better than this standard error.

Page 27: Comparing Two Proportions

POOLING The proportions and the standard

deviations are linked. There are two proportions in the standard

error formula, but look at the null hypothesis.

It claims the proportions are equal. To test the hypothesis, we assume that the

null hypothesis is true. This means that there is a single value for $\hat{p}$ in the SE formula.

Page 28: Comparing Two Proportions

POOLING How can we do this? If the null hypothesis is true, then among

all adults the two groups have the same proportion.

We will see 48 + 318 = 366 snorers out of a total of 184 + 811 = 995 adults who responded to the question.

The overall proportion of snorers was 366/995 = 0.3678.

Page 29: Comparing Two Proportions

POOLING Pooling– Combining the counts to get an overall proportion.

Whenever we have data from different sources or different groups but we believe that they really came from the same underlying population, we can pool them to get better estimates.

$$\hat{p}_{pooled} = \frac{Success_1 + Success_2}{n_1 + n_2}$$

Page 30: Comparing Two Proportions

POOLING When we have only proportions and not the counts, as in the snoring example, we have to reconstruct the number of successes by multiplying the sample sizes by the proportions.

$$Success_1 = n_1 \hat{p}_1 \quad \text{and} \quad Success_2 = n_2 \hat{p}_2$$

If these calculations don’t come out to whole numbers, round first.

There must have been a whole number of successes to begin with. (This is the only time you round in the middle of a calculation.)

Page 31: Comparing Two Proportions

POOLING We can then put the pooled value into the formula, substituting it for both sample proportions in the standard error formula.

$$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_1} + \frac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_2}}$$

Page 32: Comparing Two Proportions

POOLING Snoring--

$$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{0.3678 \times (1 - 0.3678)}{184} + \frac{0.3678 \times (1 - 0.3678)}{811}} = 0.039375$$
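The pooled calculation for the snoring data can be reproduced with a short script. This is a sketch, not part of the original slides: it uses the success counts quoted in the example (48 of 184 and 318 of 811), and the last two steps, the z-statistic and a two-sided P-value, go beyond what the slides compute and simply apply the two-proportion z-test definition. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the snoring example.
x_young, n_young = 48, 184   # snorers / respondents under 30
x_old, n_old = 318, 811      # snorers / respondents 30 and over

# Pool the counts to estimate the single proportion assumed under H0.
p_pooled = (x_young + x_old) / (n_young + n_old)
q_pooled = 1 - p_pooled

# Pooled standard error: the same p and q appear in both terms.
se_pooled = sqrt(p_pooled * q_pooled / n_young + p_pooled * q_pooled / n_old)

# Assumption beyond the slides: carry the test through to a z-statistic,
# using proportions recomputed from the counts (older minus younger).
z = (x_old / n_old - x_young / n_young) / se_pooled
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(p_pooled, 4), round(se_pooled, 6))
print(round(z, 2), p_value)
```

Because the proportions are recomputed from the counts, the difference used here is slightly larger than the rounded 13% quoted in the example.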

Page 33: Comparing Two Proportions

EXAMPLE-- #1 PAGE 507 A presidential candidate fears he has a

problem with women voters. His campaign staff plans to run a poll to assess the situation. They’ll randomly sample 300 men and 300 women, asking if they have a favorable impression of the candidate. Obviously, the staff can’t know this, but suppose the candidate has a positive image with 59% of males but with only 53% of females.

Page 34: Comparing Two Proportions

EXAMPLE-- #1 PAGE 507 What kind of sampling design is his

staff planning to use?

This is a stratified random sample, stratified by gender.

Page 35: Comparing Two Proportions

EXAMPLE-- #1 PAGE 507 What difference would you expect the

poll to show?

We would expect the difference in proportions in the sample to be the same as the difference in proportions in the population, with the percentage of the respondents with a favorable impression of the candidate 6% higher among males.

Page 36: Comparing Two Proportions

EXAMPLE-- #1 PAGE 507 Of course, sampling error means the poll won’t reflect the difference perfectly. What’s the standard error for the difference in the proportions?

The standard deviation of the difference in proportions is:

$$\sigma(\hat{p}_M - \hat{p}_F) = \sqrt{\frac{\hat{p}_M \hat{q}_M}{n_M} + \frac{\hat{p}_F \hat{q}_F}{n_F}} = \sqrt{\frac{(0.59)(0.41)}{300} + \frac{(0.53)(0.47)}{300}} \approx 0.04 = 4\%$$

Page 37: Comparing Two Proportions

EXAMPLE-- #1 PAGE 507 Sketch a sampling model for the size

difference in proportions of men and women with favorable impressions of this candidate that might appear in a poll like this.

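[Figure: Normal model for the difference in proportion with favorable impression (Male – Female), centered at 6%, with axis marks at -6%, -2%, 2%, 6%, 10%, 14%, and 18% and the 68%, 95%, and 99.7% regions indicated.]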

Page 38: Comparing Two Proportions

EXAMPLE-- #1 PAGE 507 Could the campaign be misled by the poll,

concluding that there really is no gender gap? Explain.

The campaign could certainly be misled by the poll. According to the model, a poll showing little difference could occur relatively frequently. That result is only 1.5 standard deviations below the expected difference in proportions.
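To put a rough number on "relatively frequently", the sketch below uses the Normal model from this example (mean 6%, standard deviation about 4%) to estimate the chance that a poll shows no gap, or a gap in the opposite direction. It assumes the supposed population values of 59% and 53% and is an illustration rather than part of the textbook solution.

```python
from math import sqrt
from statistics import NormalDist

# Supposed true proportions and planned sample sizes from the example.
p_m, p_f = 0.59, 0.53
n_m = n_f = 300

# Standard deviation of the sampling distribution of (p_hat_M - p_hat_F).
sd = sqrt(p_m * (1 - p_m) / n_m + p_f * (1 - p_f) / n_f)

# Normal sampling model centered at the true difference.
model = NormalDist(mu=p_m - p_f, sigma=sd)

# Probability that the observed difference is zero or negative.
prob_no_gap = model.cdf(0.0)

print(round(sd, 4), round(prob_no_gap, 3))
```

Under these assumptions the probability comes out at roughly 7%, so a poll that hides the gap would not be a rare event.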

Page 39: Comparing Two Proportions

EXAMPLE-- #4 PAGE 508 In October 2000 the U.S. Department of

Commerce reported the results of a large-scale survey on high school graduation. Researchers contacted more than 25,000 Americans aged 24 years to see if they had finished high school; 84.9% of the 12,460 males and 88.1% of the 12,678 females indicated that they had high school diplomas.

Page 40: Comparing Two Proportions

EXAMPLE-- #4 PAGE 508 Are the assumptions and conditions necessary for inference satisfied? Explain.

Randomization condition— Assume that the samples are representative of all recent graduates.

10% condition— Although large, the samples are less than 10% of all graduates.

Independent samples condition— The sample of men and the sample of women were drawn independently of each other.

Success/Failure condition— The samples are very large, certainly large enough for the methods of inference to be used.

Page 41: Comparing Two Proportions

EXAMPLE-- #4 PAGE 508 Create a 95% confidence interval for the difference in graduation rates between males and females.

$$(\hat{p}_F - \hat{p}_M) \pm z^* \sqrt{\frac{\hat{p}_F \hat{q}_F}{n_F} + \frac{\hat{p}_M \hat{q}_M}{n_M}} = (0.881 - 0.849) \pm 1.960 \sqrt{\frac{(0.881)(0.119)}{12{,}678} + \frac{(0.849)(0.151)}{12{,}460}} = (0.024, 0.040)$$

Page 42: Comparing Two Proportions

EXAMPLE-- #4 PAGE 508 Interpret your confidence interval.

We are 95% confident that the proportion of 24-year-old American women who have graduated from high school is between 2.4% and 4.0% higher than the proportion of American men the same age who have graduated from high school.

Page 43: Comparing Two Proportions

EXAMPLE-- #4 PAGE 508 Does this provide strong evidence that

girls are more likely than boys to complete high school? Explain.

Since the interval for the difference in proportions of high school graduates does not contain 0, there is strong evidence that women are more likely than men to complete high school.
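The interval and the "does it contain 0?" check can be reproduced with a few lines of code. This sketch uses the male proportion of 0.849 that appears in the interval calculation and the female sample size of 12,678 given in the problem statement; the function name `two_prop_z_interval` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_interval(p1_hat, n1, p2_hat, n2, confidence=0.95):
    """Confidence interval for p1 - p2 from two independent samples."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Females are group 1 and males group 2, matching (p_F - p_M) above.
low, high = two_prop_z_interval(0.881, 12678, 0.849, 12460)

# If 0 lies inside the interval, "no difference" is a plausible value.
contains_zero = low <= 0 <= high

print(round(low, 3), round(high, 3), contains_zero)
```

An interval that lies entirely above 0 is what supports the conclusion that the female graduation rate is higher.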

Page 44: Comparing Two Proportions

EXAMPLE– #6 PAGE 508 The painful wrist condition called carpal

tunnel syndrome can be treated with surgery or less invasive wrist splints. In September 2002, Time magazine reported on a study of 176 patients. Among the half that had surgery, 80% showed improvement after three months, but only 54% of those who used the wrist splints improved.

Page 45: Comparing Two Proportions

EXAMPLE– #6 PAGE 508 What’s the standard error of the difference in the two proportions?

$$SE(\hat{p}_{surg} - \hat{p}_{splint}) = \sqrt{\frac{\hat{p}_{surg}\,\hat{q}_{surg}}{n_{surg}} + \frac{\hat{p}_{splint}\,\hat{q}_{splint}}{n_{splint}}} = \sqrt{\frac{(0.80)(0.20)}{88} + \frac{(0.54)(0.46)}{88}} = 0.068$$

Page 46: Comparing Two Proportions

EXAMPLE– #6 PAGE 508 Construct a 95% confidence interval for this difference.

Randomization condition– It’s not clear whether or not this study was an experiment. If so, assume that the subjects were randomly allocated to treatment groups. If not, assume that the subjects are representative of all carpal tunnel sufferers.

10% condition— 88 subjects in each group are less than 10% of all carpal tunnel sufferers.

Independent samples condition— The improvement rates of the two groups are not related.

Success/Failure condition-- All are greater than 10, so the samples are large enough.

$$n\hat{p}_{surg} = (88)(0.80) \approx 70; \quad n\hat{q}_{surg} = (88)(0.20) \approx 18$$

$$n\hat{p}_{splint} = (88)(0.54) \approx 48; \quad n\hat{q}_{splint} = (88)(0.46) \approx 40$$

Page 47: Comparing Two Proportions

EXAMPLE– #6 PAGE 508 Since the conditions have been satisfied, we will find a two-proportion z-interval.

Page 48: Comparing Two Proportions

EXAMPLE– #6 PAGE 508 The two-proportion z-interval is:

$$(\hat{p}_{surg} - \hat{p}_{splint}) \pm z^* \sqrt{\frac{\hat{p}_{surg}\,\hat{q}_{surg}}{n_{surg}} + \frac{\hat{p}_{splint}\,\hat{q}_{splint}}{n_{splint}}} = (0.80 - 0.54) \pm 1.960 \sqrt{\frac{(0.80)(0.20)}{88} + \frac{(0.54)(0.46)}{88}} = (0.126, 0.394)$$
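The whole carpal tunnel calculation, from reconstructing the counts to the interval, can be sketched in one short script. It assumes 88 patients per group (half of 176) as in the worked solution, and the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

n_surg = n_splint = 88           # half of the 176 patients in each group
p_surg, p_splint = 0.80, 0.54    # reported improvement rates

# Reconstruct whole-number success and failure counts from the proportions.
counts = [
    round(n_surg * p_surg),            # surgery, improved
    round(n_surg * (1 - p_surg)),      # surgery, not improved
    round(n_splint * p_splint),        # splint, improved
    round(n_splint * (1 - p_splint)),  # splint, not improved
]

# Success/Failure condition: at least 10 of each outcome in each group.
condition_ok = all(c >= 10 for c in counts)

# Standard error and 95% two-proportion z-interval (surgery minus splint).
se = sqrt(p_surg * (1 - p_surg) / n_surg + p_splint * (1 - p_splint) / n_splint)
z_star = NormalDist().inv_cdf(0.975)
diff = p_surg - p_splint
low, high = diff - z_star * se, diff + z_star * se

print(counts, condition_ok)
print(round(se, 3), round(low, 3), round(high, 3))
```

The interval is wide because each group has only 88 patients; larger groups would shrink the standard error and narrow it.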

Page 49: Comparing Two Proportions

EXAMPLE– #6 PAGE 508 State an appropriate conclusion.

We are 95% confident that the proportion of patients who show improvement in carpal tunnel syndrome with surgery is between 12.6% and 39.4% higher than the proportion who show improvement with wrist splints.