
12.5 Differences between Means ($\sigma$'s known)

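Two populations: $(\mu_1, \sigma_1)$ & $(\mu_2, \sigma_2)$

Two samples: one from each population

Two sample means $\bar{x}_1$, $\bar{x}_2$ and sample sizes $n_1$ & $n_2$

Compare two population means: $H_0: \mu_1 - \mu_2 = \delta$ ($\delta = 0$ in most cases)

Alternatives: $\mu_1 - \mu_2 > \delta$; $\mu_1 - \mu_2 < \delta$; $\mu_1 - \mu_2 \ne \delta$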

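Let’s go through a two-sided alternative

$H_0: \mu_1 - \mu_2 = 0$ vs $H_A: \mu_1 - \mu_2 \ne 0$

Reject $H_0$ if $(\bar{x}_1 - \bar{x}_2)$ is too far from zero in either direction.

How far from zero might $(\bar{x}_1 - \bar{x}_2)$ be if $\mu_1 - \mu_2 = 0$?

The sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ is asymptotically normal with mean 0 and standard deviation $\sigma_{\bar{x}_1 - \bar{x}_2}$.

We need to know $\sigma_{\bar{x}_1 - \bar{x}_2}$.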

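Fact: If the sample means are from independent samples, then

$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

$$SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{SE^2_{\bar{x}_1} + SE^2_{\bar{x}_2}} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$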
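One way to see where this fact comes from is the general rule for the variance of a difference. The short derivation below is a sketch; the covariance notation is introduced here and does not appear on the slide.

```latex
\begin{aligned}
\operatorname{Var}(\bar{x}_1 - \bar{x}_2)
  &= \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2)
     - 2\operatorname{Cov}(\bar{x}_1, \bar{x}_2) \\
  &= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - 0
     \qquad \text{(independent samples, so the covariance is 0)}
\end{aligned}
```

When the two samples are not independent, the covariance term does not vanish, which is why paired data (Section 12.7) are analyzed through their differences instead.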

Thus under certain assumptions:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Correspondingly, a confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

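Assumptions

$\sigma_1$ & $\sigma_2$ are known

Normal populations or large sample sizes

Under the null hypothesis $H_0: \mu_1 - \mu_2 = \delta$,

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

is (asymptotically) standard normal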

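Rejection Regions:

| Alternative Hypothesis | Rejection Region |
|---|---|
| $\mu_1 - \mu_2 > \delta$ | $z > z_\alpha$ |
| $\mu_1 - \mu_2 < \delta$ | $z < -z_\alpha$ |
| $\mu_1 - \mu_2 \ne \delta$ | $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$ |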
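The three rejection regions can be sketched as a small decision function. This is a minimal illustration, not part of the slides: the function name `z_rejects` and the labels `"greater"`, `"less"`, `"two-sided"` are hypothetical, and the critical values come from Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_rejects(z, alpha, alternative):
    """Return True if z falls in the rejection region at level alpha.

    alternative is one of "greater", "less", "two-sided"
    (illustrative labels, not from the slides).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":      # HA: mu1 - mu2 > delta
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":         # HA: mu1 - mu2 < delta
        return z < -std_normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":    # HA: mu1 - mu2 != delta
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError(f"unknown alternative: {alternative!r}")


# A z of 1.6 is not extreme enough for a two-sided test at alpha = 0.05.
print(z_rejects(1.6, 0.05, "two-sided"))
```

Note that the same z can be rejected under a one-sided alternative and not under the two-sided one, because the one-sided critical value $z_\alpha$ is smaller than $z_{\alpha/2}$.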

Example 12.4

Two labs measure the specific gravity of metal. On average do the two labs give the same answer?

$\mu_1$ -- Population mean by lab 1

$\mu_2$ -- Population mean by lab 2

$H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$

$\sigma_1 = 0.02$, $n_1 = 20$, $\bar{x}_1 = 2.032$

$\sigma_2 = 0.03$, $n_2 = 25$, $\bar{x}_2 = 2.020$

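95% Confidence Interval

$$
\begin{aligned}
(\bar{x}_1 - \bar{x}_2) \pm z_{0.025}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
  &= (2.032 - 2.020) \pm 1.96\sqrt{\frac{0.02^2}{20} + \frac{0.03^2}{25}} \\
  &= 0.012 \pm 1.96\,(0.0075)
\end{aligned}
$$

from $-0.003$ to $0.027$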

Two-tailed Hypothesis Test

Two sample test

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{0.012}{0.0075} = 1.6$$

Rejection region: $|z| > z_{0.025} = 1.96$

Conclusion: Don’t reject $H_0$.
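The calculation in Example 12.4 can be reproduced with a short script. This is a sketch under the stated assumptions ($\sigma$'s known, independent samples); the helper name `two_sample_z` is hypothetical, and only the Python standard library is used.

```python
import math
from statistics import NormalDist


def two_sample_z(x1, x2, sigma1, sigma2, n1, n2, delta=0.0, alpha=0.05):
    """Two-sample z statistic and two-sided CI when the sigmas are known."""
    # Standard error of (x1 - x2) for independent samples
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (x1 - x2 - delta) / se
    # Two-sided critical value z_{alpha/2}
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_crit * se
    ci = (x1 - x2 - half_width, x1 - x2 + half_width)
    return z, se, ci


# Summary statistics from Example 12.4
z, se, ci = two_sample_z(2.032, 2.020, 0.02, 0.03, 20, 25)
print(f"SE = {se:.4f}, z = {z:.2f}")
print(f"95% CI: {ci[0]:.4f} to {ci[1]:.4f}")
print("reject H0" if abs(z) > 1.96 else "do not reject H0")
```

The interval and the two-sided test always agree: $H_0$ is retained exactly when the interval contains the hypothesized difference (0 here).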

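Rejection Regions

| Alternative Hypothesis | Rejection Region |
|---|---|
| $H_A: \mu_1 > \mu_2$ | $z > z_\alpha$ |
| $H_A: \mu_1 < \mu_2$ | $z < -z_\alpha$ |
| $H_A: \mu_1 \ne \mu_2$ | $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$ |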

Exercise

An investigation of two kinds of photocopying equipment showed that a random sample of 60 failures of one kind of equipment took on the average 84.2 minutes to repair, while a random sample of 60 failures of another kind of equipment took on the average 91.6 minutes to repair. If, on the basis of collateral information, it can be assumed that $\sigma_1 = \sigma_2 = 19.0$ minutes for such data, test at the 0.02 level of significance whether the difference between these two sample means is significant.

12.6 Differences Between Means (unknown equal variances)

Large samples: $n_1 \ge 30$; $n_2 \ge 30$

Small samples:

1. $\sigma_1 = \sigma_2$

2. $\sigma_1 \ne \sigma_2$

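Large Samples

$n_1 \ge 30$; $n_2 \ge 30$

Estimate $\sigma_1$ and $\sigma_2$ by $s_1$ and $s_2$

Set

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$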

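Rejection Regions

| Alternative Hypothesis | Rejection Region |
|---|---|
| $H_A: \mu_1 > \mu_2$ | $z > z_\alpha$ |
| $H_A: \mu_1 < \mu_2$ | $z < -z_\alpha$ |
| $H_A: \mu_1 \ne \mu_2$ | $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$ |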

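Small Samples

$\sigma_1 = \sigma_2 = \sigma$ unknown

Two populations are normal

Standard error

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Estimate the common variance $\sigma^2$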

Pooled standard deviation

Using both $s_1^2$ and $s_2^2$ to estimate $\sigma^2$, we combine these estimates, weighting each by its d.f. The combined estimate of $\sigma^2$ is $s_p^2$, the pooled estimate:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Estimate $\sigma$ by $s_p$
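The phrase "weighting each by its d.f." can be made explicit by rewriting the pooled estimate as a weighted average. The weights $w_1$, $w_2$ below are notation introduced here for illustration only.

```latex
s_p^2 = w_1 s_1^2 + w_2 s_2^2,
\qquad w_i = \frac{n_i - 1}{n_1 + n_2 - 2},
\qquad w_1 + w_2 = 1
% Special case n_1 = n_2: w_1 = w_2 = 1/2, so s_p^2 = (s_1^2 + s_2^2)/2
```

Because the weights sum to 1, $s_p^2$ always lies between $s_1^2$ and $s_2^2$, and the larger sample pulls it toward its own variance.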

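Two-Sample T-test

T-test (t distribution with df $= n_1 + n_2 - 2$)

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where $\delta$ is the hypothesized $\mu_1 - \mu_2$

$100(1-\alpha)\%$ CI

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$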

Example 12.5

Compare blood pressures

Two populations: common variance

$\alpha = 0.05$

$n_1 = 10$, $s_1 = 16.2$, $\bar{x}_1 = 125$

$n_2 = 12$, $s_2 = 14.3$, $\bar{x}_2 = 137$

$$s_p^2 = \frac{(10 - 1)(16.2)^2 + (12 - 1)(14.3)^2}{10 + 12 - 2} = 230.6$$

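CI & test

$s_p = 15.2$

df $= 10 + 12 - 2 = 20$

Critical value $t_{0.025} = 2.086$

t statistic: reject $H_0$ if $|t| > 2.086$

$$t = \frac{125 - 137}{15.2\sqrt{\dfrac{1}{10} + \dfrac{1}{12}}} = \frac{-12}{6.51} = -1.84$$

Conclusion? Don’t reject, since $|t| = 1.84 < 2.086$.

CI: $-12 \pm 2.086\,(6.51) = -12 \pm 13.6$, i.e. from $-25.6$ to $1.6$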
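The pooled calculation in Example 12.5 can be checked numerically. The sketch below uses a hypothetical helper `pooled_t`; the critical value 2.086 is taken from the slide and not computed, since the Python standard library has no t distribution.

```python
import math


def pooled_t(x1, x2, s1, s2, n1, n2, delta=0.0):
    """Pooled two-sample t statistic (assumes equal population variances)."""
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    t = (x1 - x2 - delta) / se
    return t, df, sp2, se


T_CRIT = 2.086  # t_{0.025} with 20 d.f., as quoted on the slide

t_stat, df, sp2, se = pooled_t(125, 137, 16.2, 14.3, 10, 12)
half_width = T_CRIT * se
ci = (125 - 137 - half_width, 125 - 137 + half_width)

print(f"sp^2 = {sp2:.1f}, SE = {se:.2f}, df = {df}, t = {t_stat:.2f}")
print(f"95% CI for mu1 - mu2: {ci[0]:.1f} to {ci[1]:.1f}")
print("reject H0" if abs(t_stat) > T_CRIT else "do not reject H0")
```

Without intermediate rounding the statistic comes out slightly larger in magnitude (about -1.85) than the slide's -1.84, which uses the rounded values 15.2 and 6.51. The conclusion is the same.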

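What happens when variances are not equal?

Testing: $H_0: \mu_1 - \mu_2 = \delta$

Normal populations

$\sigma_1$ and $\sigma_2$ are not necessarily equal

$\sigma_1$ and $\sigma_2$ unknown

$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \quad \text{estimated by} \quad s^2_{\bar{x}_1 - \bar{x}_2} = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$$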

Two sample t-test with unequal variances

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

d.f. $= \min(n_1 - 1,\ n_2 - 1)$
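A minimal sketch of the unequal-variance statistic with the d.f. rule above. The function name `unequal_variance_t` is hypothetical. The numbers reuse the summary statistics of Example 12.5 purely for illustration; the slides themselves do not apply this test to that example.

```python
import math


def unequal_variance_t(x1, x2, s1, s2, n1, n2, delta=0.0):
    """t statistic without pooling, with d.f. = min(n1 - 1, n2 - 1)."""
    # Each variance is estimated separately; no common sigma is assumed
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t = (x1 - x2 - delta) / se
    df = min(n1 - 1, n2 - 1)
    return t, df


# Illustration only: summary statistics borrowed from Example 12.5
t_stat, df = unequal_variance_t(125, 137, 16.2, 14.3, 10, 12)
print(f"t = {t_stat:.2f}, d.f. = {df}")
```

With these inputs the d.f. drop from 20 (pooled) to 9, so the critical value is larger and the test is more conservative than the pooled version.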

Exercise

In a department store’s study designed to test whether or not the mean balance outstanding on 30-day charge accounts is the same in its two suburban branch stores, random samples yielded the following results:

| Branch | $n$ | $\bar{x}$ (dollars) | $s$ (dollars) |
|---|---|---|---|
| 1 | 80 | 64.20 | 16.00 |
| 2 | 100 | 71.41 | 22.13 |

Use the 0.05 level of significance to test the null hypothesis $\mu_1 - \mu_2 = 0$.

12.7 Paired Data

1982 study of trace metals in South Indian River. 6 random locations.

T = top water zinc concentration (mg/L)

B = bottom water zinc concentration (mg/L)

| Location | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Top | 0.415 | 0.238 | 0.390 | 0.410 | 0.605 | 0.609 |
| Bottom | 0.430 | 0.266 | 0.567 | 0.531 | 0.707 | 0.716 |

One of the first things to do when analyzing data is to PLOT the data

This is not a useful way to plot the data. There is not a clear distinction between bottom water and top water zinc—even though Bottom>Top at all 6 locations.

[Figure: zinc concentration (axis from 0 to 0.8) plotted by group, Top and Bottom]

A better way

[Figure: zinc concentration (axis from 0.2 to 0.7) for Top and Bottom, one point per location]

Connect points in the same pair.

A better way

[Figure: scatter plot of the paired Top and Bottom values, both axes from 0 to 0.8, with the reference line Bottom=Top]

The plot suggests that Bottom>Top. Is it true?

That is equivalent to asking: is it true that the difference > 0?

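| Location | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Top | 0.415 | 0.238 | 0.390 | 0.410 | 0.605 | 0.609 |
| Bottom | 0.430 | 0.266 | 0.567 | 0.531 | 0.707 | 0.716 |
| D=B-T | 0.015 | 0.028 | 0.177 | 0.121 | 0.102 | 0.107 |

$H_0: \mu_D = 0$ vs $H_A: \mu_D > 0$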

First check the assumption that the population is normal

[Figure: normal plot of the ordered differences (x) against Expected Z]
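The coordinates of such a normal plot can be computed directly. This is a sketch under one loudly stated assumption: the plotting positions $(i - 0.5)/n$ used for the expected Z values are a common convention, but the slides do not say which convention their plot uses. The function name `normal_plot_points` is hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(values):
    """Pair each ordered value with an expected standard normal quantile.

    Assumption: plotting positions (i - 0.5) / n; other conventions exist.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    expected_z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected_z, ordered))


# D = B - T for the six locations
diffs = [0.015, 0.028, 0.177, 0.121, 0.102, 0.107]
points = normal_plot_points(diffs)
for z, d in points:
    print(f"{z:6.2f}  {d:.3f}")
```

If the differences come from a normal population, the printed pairs should fall roughly on a straight line when plotted.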

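Doing a one-sided test

$H_0: \mu_D = 0$ vs $H_A: \mu_D > 0$

$$t = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}} = \frac{0.092 - 0}{0.061/\sqrt{6}} = \frac{0.092}{0.025} = 3.68$$

$t_{0.05}$ at 5 d.f. is 2.015, so any $t$ greater than 2.015 is evidence against $H_0$. Since $3.68 > 2.015$, we reject $H_0: \mu_B - \mu_T = 0$ in favor of $H_A: \mu_B - \mu_T > 0$.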
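The one-sided paired test on the zinc data can be reproduced from the raw measurements. This is a sketch using only the standard library; the critical value 2.015 is taken from the slide and is not computed.

```python
import math
from statistics import mean, stdev

# Zinc concentrations (mg/L) at the six locations
top = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609]
bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716]
T_CRIT = 2.015  # t_{0.05} with 5 d.f., as quoted on the slide

# Work with the within-pair differences D = B - T
diffs = [b - t for b, t in zip(bottom, top)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)  # sample standard deviation (divides by n - 1)

# One-sample t statistic on the differences, H0: mu_D = 0
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))
reject = t_stat > T_CRIT  # one-sided alternative mu_D > 0

print(f"mean D = {d_bar:.3f}, s_D = {s_d:.3f}, t = {t_stat:.2f}")
print("reject H0" if reject else "do not reject H0")
```

Without intermediate rounding the statistic is about 3.70; the slide's 3.68 comes from using the rounded values 0.092 and 0.025.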

Another example

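The average weekly losses of man-hours due to accidents in 10 industrial plants before and after installation of an elaborate safety program:

| Plant | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Before | 45 | 73 | 46 | 124 | 33 | 57 | 83 | 34 | 26 | 17 |
| After | 36 | 60 | 44 | 119 | 35 | 51 | 77 | 29 | 24 | 11 |
| Diff (B-A) | 9 | 13 | 2 | 5 | -2 | 6 | 6 | 5 | 2 | 6 |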

Is the safety program effective? (level=0.05)

Two Populations: Before and After

Normal? Independent?

No, No

Normal Probability Plots

Small sample sizes

Somewhat skewed to the right

[Figure: normal probability plots of "before" and "after" against Quantiles of Standard Normal]

Normal Probability Plot for Difference

Looks better

[Figure: normal probability plot of diff]

Consider the Differences

Paired observations: before and after the installation of the safety program are from the same plants (dependent)

Data from different plants may be independent

Diff: 9 13 2 5 -2 6 6 5 2 6

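Set up a Test: Paired T-Test

"Effective" means the program reduces the accidents, i.e., before > after ($\mu_D > 0$)

$\mu_D$ = difference of average accidents

$H_0: \mu_D = 0$ vs $H_A: \mu_D > 0$

The procedure is the same as the one-sample t-test

$$t = \frac{\bar{x}_D - \delta}{s_D/\sqrt{n}}, \qquad \text{df} = n - 1$$

where $\delta$ is the hypothesized value of $\mu_D$ (0 here)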

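Rejection Regions for Paired T-test

| Alternative Hypothesis | Rejection Region |
|---|---|
| $\mu_D > \delta$ | $t > t_\alpha$ |
| $\mu_D < \delta$ | $t < -t_\alpha$ |
| $\mu_D \ne \delta$ | $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$ |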

Paired t-test

One-tailed test

Critical value: df $= 9$, $t_{0.05} = 1.833$

Sample mean & standard deviation: $\bar{x}_D = 5.2$; $s_D = 4.08$

t-statistic:

$$t = \frac{\bar{x}_D - \delta}{s_D/\sqrt{n}} = \frac{5.2 - 0}{4.08/\sqrt{10}} = 4.03$$

Conclusion: reject $H_0$ since $t = 4.03 > 1.833$
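To see why the pairing matters for the safety-program data, the sketch below computes the paired statistic and, purely for contrast, the pooled two-sample statistic that would result if the before and after samples were (incorrectly) treated as independent. The contrast is an illustration added here, not a calculation from the slides.

```python
import math
from statistics import mean, stdev

before = [45, 73, 46, 124, 33, 57, 83, 34, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 24, 11]
n = len(before)

# Paired analysis: one-sample t statistic on the differences (roughly 4.03)
diffs = [b - a for b, a in zip(before, after)]
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(n))

# For contrast only: pooled two-sample statistic that ignores the pairing.
# It treats the samples as independent, which is not the case here.
sp2 = ((n - 1) * stdev(before) ** 2 + (n - 1) * stdev(after) ** 2) / (2 * n - 2)
t_unpaired = (mean(before) - mean(after)) / (math.sqrt(sp2) * math.sqrt(2 / n))

print(f"paired t   = {t_paired:.2f}")    # roughly 4.03
print(f"unpaired t = {t_unpaired:.2f}")  # roughly 0.37
```

The plant-to-plant variation is large compared with the before/after change, so it swamps the unpaired statistic. Taking differences within each plant removes that variation.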
