
CH9: Testing the Difference Between Two Means or Two Proportions Santorico - Page 343

CH9: Testing the Difference Between Two Means, Two Proportions, and Two Variances


Section 9-1 Testing the Difference Between Two Means: Using the Z Test

Suppose we are interested in determining if a certain medication relieves patients’ headaches. We give the drug/treatment to one group and give a placebo to a control group and compare the mean incidences of patient relief from the headache between the two groups. If the treatment group had a statistically significant improvement in headache symptoms over the control group, then we can conclude the drug works.


So our question might be, “Is the mean incidence of headache relief different for the two groups?” Let \(\mu_1\) = mean headache relief from the treatment group and \(\mu_2\) = mean headache relief from the control group. Then our hypotheses would be:

\(H_0: \mu_1 = \mu_2\)

\(H_1: \mu_1 \ne \mu_2\)

Alternatively, we could state the hypotheses as:

\(H_0: \mu_1 - \mu_2 = 0\)

\(H_1: \mu_1 - \mu_2 \ne 0\)


Assumptions for the Test to Determine the Difference Between Two Means

The samples must be independent of each other. That is, there can be no relationship between the subjects in each sample.

The populations from which the samples come must be (approximately) normally distributed or the sample sizes of both groups should be at least 30.

The standard deviations of both populations must be known.


We can compare the groups by the difference in their population means, \(\mu_1 - \mu_2\), where \(\mu_1\) is the population mean for group 1 and \(\mu_2\) is the population mean for group 2.

We estimate \(\mu_1 - \mu_2\) with \(\bar{x}_1 - \bar{x}_2\).

The standard deviation of \(\bar{x}_1 - \bar{x}_2\) is

\[\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\]

When both populations are normally distributed or the sample size for each group is at least 30, then \(\bar{x}_1 - \bar{x}_2\) has a normal distribution.


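Formula for the z test for Comparing Two Means from Independent Populations

\(H_0: \mu_1 - \mu_2 = k\) (or \(\le k\) or \(\ge k\))

Note: We often let \(k = 0\), but it doesn’t have to be.

Test value:

\[z^* = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2) - k}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}\]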


The observed difference between the sample means may be due to chance, in which case the null hypothesis will not be rejected. If the difference is statistically significant, the null hypothesis is rejected and the researcher can conclude the population means are different. The same approach to finding critical values and P-values that was used in Section 8-2 will be used here (Table E or Table F with d.f. = ∞).


Example: Dr. Cribari would like to determine if there is a statistically significant difference between her two Math 2830 classes. To make this comparison she will compare the results from exam 1. Class one had 35 students take the exam with a mean of 82.6 and a population standard deviation of 1.41. Class two had 32 students take the exam with a mean of 84 and a population standard deviation of 3.63. Can Dr. Cribari conclude that there is a difference in the mean test grades between the two classes at α = 0.05?

Step 1 State the hypotheses and identify the claim.

\(H_0: \mu_1 = \mu_2\)

\(H_1: \mu_1 \ne \mu_2\) (CLAIM)

Step 2 Find the critical value(s) from the appropriate table.

The problem gives the population standard deviations, so we will be doing a z-test.

Two-sided test with \(\alpha = 0.05\): critical values = \(\pm 1.96\)

Step 3 Compute the test value and determine the P-value.

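\[z^* = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(82.6 - 84) - 0}{\sqrt{\dfrac{1.41^2}{35} + \dfrac{3.63^2}{32}}} = -2.05\]

p-value = 2(0.0202) = 0.0404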


Step 4 Make the decision to reject or not reject the null hypothesis.

Since the p-value is smaller than our \(\alpha\), the null hypothesis is rejected. [OR: Since our test value, -2.05, falls within the rejection region, the null hypothesis is rejected.]

Step 5 Summarize the results.

That is, there is evidence to support the claim that the exam 1 grades differ between the two sections of MATH2830.
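The exam example above can be checked numerically. The following is a minimal sketch using only Python's standard library; the function name `two_mean_z_test` and its parameter names are illustrative choices, not part of the course materials.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_z_test(xbar1, xbar2, sigma1, sigma2, n1, n2, k=0.0):
    """Return (z, two-sided p-value) for H0: mu1 - mu2 = k with known sigmas."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # std. dev. of xbar1 - xbar2
    z = ((xbar1 - xbar2) - k) / se
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_sided


# Exam 1 summary statistics from the example above.
z, p = two_mean_z_test(82.6, 84, 1.41, 3.63, 35, 32)

# Two-sided critical value for alpha = 0.05 (the table value is 1.96).
critical = NormalDist().inv_cdf(1 - 0.05 / 2)

print(f"z* = {z:.2f}, critical values = ±{critical:.2f}, p-value = {p:.4f}")
print("reject H0" if abs(z) > critical else "do not reject H0")
```

Because the sketch does not round z to -2.05 before looking up the tail area, its p-value may differ from the table-based 0.0404 in the third decimal place; the decision is the same.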


Example: A survey found that the average hotel room in New Orleans is $88.42 and the average room rate in Phoenix is $80.61. Assume that the data were obtained from two samples of 50 hotels each and that the (population) standard deviations were $5.62 and $4.83, respectively. At α = 0.01, can it be concluded that the average hotel room in New Orleans costs more than in Phoenix?

Step 1 State the hypotheses and identify the claim.


Step 2 Find the critical value(s) from the appropriate table.


Step 3 Compute the test value and determine the P-value.


Step 4 Make the decision to reject or not reject the null hypothesis.


Step 5 Summarize the results.


Formula for the z Confidence Interval for Difference Between Two Means

Assumptions:

1. The data for each group are independent random samples.

2. The data are from normally distributed populations and/or the sample sizes of the groups are at least 30.

3. The population standard deviations are (assumed) known.

Formula:

\[(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\]

Note: When \(n_1\) and \(n_2\) are at least 30, then \(s_1\) and \(s_2\) can be used in place of \(\sigma_1\) and \(\sigma_2\).


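Example: Two brands of cigarettes are selected, and their nicotine content is compared. The data are shown below. Find the 95% confidence interval of the true difference in the means.

Brand A: \(\bar{X}_1 = 28.6\) mg, \(\sigma_1 = 5.1\) mg, \(n_1 = 30\)

Brand B: \(\bar{X}_2 = 32.9\) mg, \(\sigma_2 = 4.4\) mg, \(n_2 = 40\)

\[(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (28.6 - 32.9) \pm 1.96 \sqrt{\frac{5.1^2}{30} + \frac{4.4^2}{40}} = (-4.3) \pm 2.278158\]

which gives the interval (-6.58, -2.02).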
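The interval arithmetic for the cigarette example can be reproduced with a short script. This is a sketch under the same assumptions as the formula (known population standard deviations); the helper name `two_mean_z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_z_interval(xbar1, xbar2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Return (lower, upper) for mu1 - mu2 using known population sigmas."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Brand A vs. Brand B nicotine content (mg) from the example above.
lower, upper = two_mean_z_interval(28.6, 32.9, 5.1, 4.4, 30, 40)
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

Passing a different `confidence` value (for example 0.98) changes only the z multiplier; the standard error term stays the same.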


At \(\alpha = 0.05\), is there convincing evidence that the mean amount of nicotine differs between the brands?


Example: For the hotel example, construct a 98% confidence interval of the true difference in the means.


Section 9-2: Testing the Difference Between Two Means of Independent Samples: Using the t Test

Many times the conditions set forth by the z test in Section 9-1 cannot be met (e.g., the population standard deviations are not known). In these cases, a t test is used to test the difference between means when the two samples are independent and when the samples are taken from two normally or approximately normally distributed populations.


Formula for the t Test for Testing the Difference Between Two Means: Independent Samples. Variances are assumed to be unequal:

\[t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\]

where the degrees of freedom are equal to the smaller of \(n_1 - 1\) or \(n_2 - 1\). We will use Table F to find our critical values and our p-values.


WARNING: Your calculator will perform a 2 sample t-test (it's #4 under STAT then TESTS). However, it uses a complicated formula to determine the degrees of freedom, which affects the confidence intervals and p-values it reports. We will come back to this point at the end of the section.


Example: A real estate agent wishes to determine whether tax assessors and real estate appraisers agree on the values of homes. A random sample of the two groups appraised 10 homes. Is there a significant difference in the values of the homes for each group? Let α = 0.05. Assume the data are from normally distributed populations.

Real Estate Appraisers: \(\bar{X}_1\) = $83,256, \(s_1\) = $3256, \(n_1 = 10\)

Tax Assessors: \(\bar{X}_2\) = $88,354, \(s_2\) = $2341, \(n_2 = 10\)

Step 1 State the hypotheses and identify the claim.

\(H_0: \mu_1 = \mu_2\)

\(H_1: \mu_1 \ne \mu_2\)

Sample standard deviations given! Use a t-test.


Step 2 Find the critical value(s) from the appropriate table.

A t-test means we use the t-table (Table F). We have 9 degrees of freedom since \(n_1 = 10\) and \(n_2 = 10\): the smaller of \(n_1 - 1\) and \(n_2 - 1\) is 9.

Information we need: two-tailed test, \(\alpha = 0.05\), d.f. = 9. The t critical value is 2.262.


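\[t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(83{,}256 - 88{,}354) - 0}{\sqrt{\dfrac{3256^2}{10} + \dfrac{2341^2}{10}}} = \frac{-5098}{1268.141} = -4.02\]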


[Figure: t distribution curve for the two-tailed test, with the test value t* = -4.02 marked inside the critical region.]

p-value \(= 2 \cdot P(t < -4.02) = 2(0.0015) = 0.003\)



The null hypothesis is rejected. This decision can be based on either of two facts:

The test value (-4.02) is within the critical region, since it is less than -2.262.

The p-value (0.003) is smaller than \(\alpha = 0.05\).


There is significant evidence that tax assessors and real estate appraisers disagree on the values of homes.
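The test value for the real estate example can be recomputed as follows. This is a minimal sketch: the function name is hypothetical, and the critical value 2.262 is simply copied from Table F as quoted above rather than computed.

```python
from math import sqrt


def two_mean_t_statistic(xbar1, xbar2, s1, s2, n1, n2, k=0.0):
    """Return (t, df) with df = min(n1, n2) - 1, the rule used in these notes."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)  # estimated std. dev. of xbar1 - xbar2
    t = ((xbar1 - xbar2) - k) / se
    return t, min(n1, n2) - 1


# Appraisers (group 1) vs. tax assessors (group 2) from the example above.
t, df = two_mean_t_statistic(83256, 88354, 3256, 2341, 10, 10)

T_CRITICAL = 2.262  # Table F: two-tailed, alpha = 0.05, d.f. = 9

print(f"t* = {t:.2f}, d.f. = {df}")
print("reject H0" if abs(t) > T_CRITICAL else "do not reject H0")
```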


Example: A researcher suggests that male nurses earn more than female nurses. A survey of 16 male nurses and 20 female nurses reports these data. Is there enough evidence to support the claim that male nurses earn more than female nurses? Use α = 0.01. Assume the data are from normally distributed populations.

Females: \(\bar{X}_1\) = $23,750, \(s_1\) = $250, \(n_1 = 20\)

Males: \(\bar{X}_2\) = $23,900, \(s_2\) = $300, \(n_2 = 16\)

Step 1 State the hypotheses and identify the claim.


Step 2 Find the critical value(s) from the appropriate table.


Step 3 Compute the test value and determine the P-value.


Confidence Intervals for the Difference of Two Means: Small Independent Samples

Variances assumed to be unequal:

\[(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]

where d.f. = smaller value of \(n_1 - 1\) or \(n_2 - 1\).


WARNING: The way our calculator determines the degrees of freedom is not the same as the book's. So you will NOT be able to use your calculator's STAT/TESTS function to calculate your confidence interval, because you will get a VERY different confidence interval. This is because the t-multiplier the calculator finds will be noticeably different from the one we use.
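To see how far apart the two degrees-of-freedom rules can be, here is a small sketch. It assumes the calculator's "complicated formula" is the Welch-Satterthwaite approximation, which is the usual choice for unpooled two-sample t procedures; the notes do not name the formula, so treat this as an illustration rather than a description of the calculator's internals.

```python
def welch_satterthwaite_df(s1, s2, n1, n2):
    """Approximate d.f. for the unpooled two-sample t procedure (assumed formula)."""
    a = s1**2 / n1
    b = s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


# Real estate example: s1 = 3256, s2 = 2341, n1 = n2 = 10.
df_welch = welch_satterthwaite_df(3256, 2341, 10, 10)
df_book = min(10, 10) - 1  # rule used in these notes

print(f"book d.f. = {df_book}, Welch-Satterthwaite d.f. = {df_welch:.2f}")
```

A larger d.f. corresponds to a smaller t multiplier, so under this assumption the calculator's interval would be narrower than the one found with d.f. = 9.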


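Example: Let’s find the 95% Confidence Interval for the first problem.

\[(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (83256 - 88354) \pm 2.262 \sqrt{\frac{3256^2}{10} + \frac{2341^2}{10}} = -5098 \pm 2868.535\]

which gives the interval (-7967, -2229).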
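The same interval can be reproduced in code. In this sketch the t multiplier is passed in by hand (2.262, read from Table F for d.f. = 9) because Python's standard library has no inverse t distribution; the function name is hypothetical.

```python
from math import sqrt


def two_mean_t_interval(xbar1, xbar2, s1, s2, n1, n2, t_multiplier):
    """Return (lower, upper) for mu1 - mu2 given a table-based t multiplier."""
    margin = t_multiplier * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Real estate example with the Table F multiplier for d.f. = 9, 95% confidence.
lower, upper = two_mean_t_interval(83256, 88354, 3256, 2341, 10, 10, t_multiplier=2.262)
print(f"95% CI for mu1 - mu2: ({lower:.0f}, {upper:.0f})")
```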

Example: Let’s find the 99% Confidence Interval for the second problem.


Section 9-3: Testing the Difference Between Two Means: Dependent Samples

So far we have only compared two means when the samples were independent. Samples are considered to be dependent when the subjects are paired or matched in some way.


Examples of paired data:

Each person is measured twice where the 2 measurements measure the same thing but under different conditions

Similar individuals are paired prior to an experiment and each member of a pair receives a different treatment

Two different variables are measured for each individual and there is interest in the amount of difference between the 2 variables


When using paired data, you are interested primarily in the “difference” and not the data itself. When samples are dependent, a special t test for dependent means is used. The test uses the difference in the values of the matched pairs.

IMPORTANT: We cannot use the t test we had learned for a difference in independent means.

To determine whether one set of observations tends to be larger than, or different from, the paired observations, we take the difference between the matched observations and perform the analysis on the differences.


A classic example is a study of weight loss, where each subject has a weight before and a weight after. We are interested in the CHANGE!

An aside: this study also used a placebo group. Why?


Hypotheses:

Right-tailed: \(H_0: \mu_D = 0\), \(H_1: \mu_D > 0\)

Left-tailed: \(H_0: \mu_D = 0\), \(H_1: \mu_D < 0\)

Two-tailed: \(H_0: \mu_D = 0\), \(H_1: \mu_D \ne 0\)

\(\mu_D\) = population mean of differences = \(\mu_1 - \mu_2\)

Here, \(\mu_1\) is the mean of the population of the first set of measurements and \(\mu_2\) is the mean of the population of the second set of measurements.

Actually, you can also use \(\mu_D = \mu_2 - \mu_1\) as long as you are consistent with your statement of hypotheses and calculation of D.


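Formulas for the t Test for Dependent Samples

\[t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}\]

with d.f. = \(n - 1\) and where

\[\bar{D} = \frac{\sum D}{n}\]

is the mean of the sample of differences and

\[s_D = \sqrt{\frac{n \sum D^2 - \left(\sum D\right)^2}{n(n - 1)}}\]

is the sample standard deviation of the sample of differences.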


The good news is we can find the mean of the differences, \(\bar{D}\), and the standard deviation of the differences, \(s_D\), using the LIST and STAT functions in your TI-83/84.

1. Go to STAT -> EDIT -> Edit.

2. Enter the first set of observations under L1.

3. Enter the second set of observations under L2.

4. Highlight L3 in the list, type L1 – L2 and hit enter. The set of differences should now be calculated.

5. Go to STAT -> CALC -> 1-Var Stats and hit enter. Type L3 (after 1-Var Stats on your screen) and hit enter. Your calculator will calculate the sample mean and standard deviation for you.
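The calculator steps above have a direct analogue in Python. The sketch below uses the Comp and TV hours from the 10-student example later in this section as the two lists, and also checks that the shortcut formula for the standard deviation of the differences agrees with the library's sample standard deviation. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Hours per week for the 10 students in the computer vs. TV example.
comp = [30, 20, 10, 10, 10, 0, 35, 20, 2, 5]  # plays the role of L1
tv = [2, 1.5, 14, 2, 6, 20, 14, 1, 14, 10]    # plays the role of L2

diffs = [c - t for c, t in zip(comp, tv)]     # L3 = L1 - L2
n = len(diffs)

d_bar = mean(diffs)   # mean of the differences
s_d = stdev(diffs)    # sample standard deviation (divides by n - 1)

# Shortcut formula from the notes; it should agree with stdev().
sum_d = sum(diffs)
sum_d_sq = sum(d * d for d in diffs)
s_d_shortcut = sqrt((n * sum_d_sq - sum_d**2) / (n * (n - 1)))

print(f"n = {n}, mean difference = {d_bar:.3f}")
print(f"s_D = {s_d:.3f} (library), {s_d_shortcut:.3f} (shortcut formula)")
```

Swapping the order of subtraction (TV minus Comp) flips the sign of the mean difference but leaves the standard deviation unchanged, which is why the notes stress being consistent about how D is defined.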


Example: A physical education director claims that by taking a special vitamin, a weight lifter can increase his strength. Eight athletes are selected and given a test of strength, using the standard bench press. After 2 weeks of regular training, supplemented with the vitamin, they are tested again. Test the effectiveness of the vitamin regimen at α = 0.05. Each value in these data represents the maximum number of pounds the athlete can bench press. Assume the variable is approximately normally distributed.


Step 1 State the hypotheses and identify the claim.

I will base my differences on: D = strength after – strength before.

\(H_0: \mu_D = 0\)

\(H_1: \mu_D > 0\)

Step 2 Find the critical value(s) from the appropriate table.

We have 8 lifters, which gives 7 degrees of freedom. Our \(\alpha = 0.05\). We have a right-tailed test, so the critical value will be positive. t critical value = 1.895



Use your calculator to get the mean difference and standard deviation of the differences.

\[t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} = \frac{2.375 - 0}{4.838 / \sqrt{8}} = 1.388\]

p-value = \(P(t > 1.388) = 0.104\), found using tcdf(1.388,E99,7) on the TI calculator.
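For readers without a TI calculator, the tail area that tcdf returns can be approximated by numerically integrating the t density. The sketch below uses Simpson's rule with the standard library only; the function names and the number of steps are arbitrary choices, and a statistics package would normally be used instead.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # half the area lies above 0 by symmetry


# Summary statistics from the weight-lifter example above.
d_bar, s_d, n = 2.375, 4.838, 8
t_star = (d_bar - 0) / (s_d / sqrt(n))
p_value = t_upper_tail(t_star, n - 1)

print(f"t* = {t_star:.3f}, right-tailed p-value = {p_value:.3f}")
```

For a two-tailed test the same helper can be doubled, for example `2 * t_upper_tail(4.02, 9)` for the real estate example in Section 9-2.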



The null hypothesis is NOT rejected. We can base this decision on either of two facts:

The p-value is larger than \(\alpha = 0.05\).

The test value (1.388) is smaller than the critical value (1.895). That is, our test value is within the non-rejection region.

[Figure: t distribution curve for the right-tailed test, showing the test value 1.388 in the non-rejection region, to the left of the rejection region.]



There is not sufficient evidence to support the physical education director's claim that by taking a special vitamin, a weight lifter can increase his strength.


Example: A sample of 10 college students in a class were asked how many hours per week they watch TV and how many hours a week they used a computer. Is there a difference in the mean number of hours a college student spends on a computer versus watching TV at α = 0.01? Assume the population of differences is approximately normally distributed.


The data:

Student   Comp   TV
1         30     2
2         20     1.5
3         10     14
4         10     2
5         10     6
6         0      20
7         35     14
8         20     1
9         2      14
10        5      10


Step 1 State the hypotheses and identify the claim.


Step 2 Find the critical value(s) from the appropriate table.


Step 3 Compute the test value and determine the P-value.


Step 4 Make the decision to reject or not reject the null hypothesis.


Confidence Interval for the Mean Difference

\[\bar{D} \pm t_{\alpha/2} \frac{s_D}{\sqrt{n}}\]

where d.f. = \(n - 1\).

Let’s find the 99% confidence interval for the mean difference of the last example of TV watching vs. Computer Usage.


Section 9-4: Testing the Difference Between Proportions

Let p1 be the proportion of a population having some characteristic of interest.

Similarly, let p2 be the proportion of a different population having that characteristic. We estimate these parameters by taking samples from each population and using the sample proportions as estimates.


Let \(x_1\) be the number of observations in sample 1 having the characteristic of interest and \(x_2\) be the number of observations in sample 2 having that characteristic.

The sample proportion for the first sample is \(\hat{p}_1 = \dfrac{x_1}{n_1}\) and the sample proportion for the second sample is \(\hat{p}_2 = \dfrac{x_2}{n_2}\).

We will learn how to perform a hypothesis test for the difference in population proportions.


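Hypotheses:

Right-tailed test: \(H_0: p_1 = p_2\) (or \(H_0: p_1 - p_2 = k\)) and \(H_1: p_1 > p_2\) (or \(H_1: p_1 - p_2 > k\))

Left-tailed test: \(H_0: p_1 = p_2\) (or \(H_0: p_1 - p_2 = k\)) and \(H_1: p_1 < p_2\) (or \(H_1: p_1 - p_2 < k\))

Two-tailed test: \(H_0: p_1 = p_2\) (or \(H_0: p_1 - p_2 = k\)) and \(H_1: p_1 \ne p_2\) (or \(H_1: p_1 - p_2 \ne k\))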


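Formula for the z Test for Comparing Two Proportions

\(H_0: p_1 - p_2 = 0\) (or more generally \(p_1 - p_2 = k\))

Test value:

\[z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}\]

where

\[\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}\]

and \(\bar{q} = 1 - \bar{p}\).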


What is \(\bar{p}\)? We are assuming in the null hypothesis that \(p_1 = p_2 = p\), where p is the value of the common proportion. Under this assumption, we should combine the information from both samples to estimate the common population proportion p.

\(\bar{p}\) is an estimate of p combining the information from both samples.


P-values: (computed as before, depending on the alternative hypothesis)

Right-tailed test: \(P(Z \ge z^*)\)

Left-tailed test: \(P(Z \le z^*)\)

Two-tailed test: \(2P(Z \ge |z^*|)\)

Since we are performing a z test we will use Table E for p-values and Table F (d.f. = ∞) for critical values.


Assumptions:

1. The samples are independent random samples.

2. All counts must be at least 5:

a. \(n_1 \hat{p}_1 = x_1\)

b. \(n_1 \hat{q}_1 = n_1 - x_1\)

c. \(n_2 \hat{p}_2 = x_2\)

d. \(n_2 \hat{q}_2 = n_2 - x_2\)


Example: It is believed that a sweetener called xylitol helps prevent ear infections. In a randomized experiment, \(n_1 = 165\) children took a placebo and 68 of them got ear infections. Another sample of \(n_2 = 159\) children took xylitol and 46 of them got ear infections. We believe that the proportion of ear infections in the placebo group will be greater than in the xylitol group. Test this hypothesis at α = 0.025.

Step 1 State the hypotheses and identify the claim.

\(H_0: p_1 = p_2\)

\(H_1: p_1 > p_2\) (CLAIM)

Step 2 Find the critical value(s) from the appropriate table.

This is a right-tailed test of two proportions, so the critical Z value is positive. Based on \(\alpha = 0.025\), we get Z = 1.96. (I pulled my critical value from the bottom of Table F.)



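Let’s find \(\bar{p}\) first:

\[\bar{p} = \frac{68 + 46}{165 + 159} = 0.352\]

Using this:

\[z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{\left(\dfrac{68}{165} - \dfrac{46}{159}\right) - 0}{\sqrt{0.352(1 - 0.352)\left(\dfrac{1}{165} + \dfrac{1}{159}\right)}} = \frac{0.122813}{0.0530751} = 2.31\]

p-value = \(P(Z > 2.31) = 0.0104\) (I found my p-value using the Z table, Table E).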



Given that our p-value = 0.0104 is smaller than our \(\alpha = 0.025\), the null hypothesis is rejected. We could also reach this conclusion by noting that our test value = 2.31 is greater than our critical value = 1.96.


There is significant evidence to support the claim that xylitol helps prevent ear infections. Specifically, children who took xylitol had a lower proportion of ear infections than children who were given a placebo.
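The xylitol calculation, including the check that all four counts are at least 5, can be sketched as follows. The function name `two_proportion_z_test` is hypothetical, and the sketch covers only the case \(H_0: p_1 = p_2\) with the pooled proportion.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, p_bar) for H0: p1 = p2 using the pooled proportion p_bar."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled estimate of the common p
    se = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p_hat1 - p_hat2) / se, p_bar


# Placebo group (1) and xylitol group (2) from the example above.
x1, n1, x2, n2 = 68, 165, 46, 159

# Assumption check: successes and failures in each sample are at least 5.
counts_ok = min(x1, n1 - x1, x2, n2 - x2) >= 5

z, p_bar = two_proportion_z_test(x1, n1, x2, n2)
p_value = 1 - NormalDist().cdf(z)  # right-tailed test

print(f"counts ok: {counts_ok}, p_bar = {p_bar:.3f}")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The sketch keeps \(\bar{p}\) unrounded, so the p-value may differ slightly in the fourth decimal place from the table-based 0.0104.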


Example: In a sample of 200 surgeons, 15% thought the government should control health care. In a sample of 200 general practitioners, 21% felt the same way. At α = 0.01, is there a difference in the proportions?

Step 1 State the hypotheses and identify the claim.


Step 2 Find the critical value(s) from the appropriate table.


Confidence Interval for the Difference Between Two Proportions

\[(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}\]


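Example: Let’s find the 95% confidence interval for the xylitol problem.

First, note that \(\hat{p}_1 = \dfrac{68}{165} = 0.412\), \(\hat{p}_2 = \dfrac{46}{159} = 0.289\), and \(z_{\alpha/2} = 1.96\).

\[(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} = (0.412 - 0.289) \pm 1.96 \sqrt{\frac{0.412(1 - 0.412)}{165} + \frac{0.289(1 - 0.289)}{159}} = 0.123 \pm 0.103\]

which gives the interval (0.020, 0.226).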

We can say with 95% confidence that the proportion of ear infections among children receiving xylitol is between 0.020 and 0.226 lower (that is, 2.0 to 22.6 percentage points lower) than among those receiving placebo.
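The interval for the xylitol data can be recomputed with the short sketch below. Note that the confidence interval uses the two separate sample proportions in the standard error, not the pooled proportion from the hypothesis test. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Return (lower, upper) for p1 - p2 using unpooled sample proportions."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    diff = p_hat1 - p_hat2
    return diff - margin, diff + margin


# Placebo group (68 of 165) minus xylitol group (46 of 159).
lower, upper = two_proportion_interval(68, 165, 46, 159)
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
print(f"as percentage points: {100 * lower:.1f} to {100 * upper:.1f}")
```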


Example: Let’s find the 90% confidence interval for the health care problem.


Decision Tree for Deciding Which Hypothesis Test to Use:
