
Comparison of 2 Population Means

• Goal: To compare 2 populations/treatments wrt a numeric outcome

• Sampling Design: Independent Samples (Parallel Groups) vs Paired Samples (Crossover Design)

• Data Structure: Normal vs Non-normal

• Sample Sizes: Large (n1,n2>20) vs Small

Independent Samples

• Units in the two samples are different

• Sample sizes may or may not be equal

• Large-sample inference based on Normal Distribution (Central Limit Theorem)

• Small-sample inference depends on distribution of individual outcomes (Normal vs non-Normal)

Parameters/Estimates (Independent Samples)

• Parameter: $\mu_1 - \mu_2$

• Estimator: $\bar{x}_1 - \bar{x}_2$

• Estimated standard error: $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

• Shape of sampling distribution:

– Normal if data are normal

– Approximately normal if n1,n2>20

– Non-normal otherwise (typically)

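Large-Sample Test of $\mu_1 - \mu_2$

• Null hypothesis: The population means differ by $\Delta_0$ (which is typically 0):

$$H_0: \mu_1 - \mu_2 = \Delta_0$$

• Alternative hypotheses:

– 1-sided: $H_A: \mu_1 - \mu_2 > \Delta_0$

– 2-sided: $H_A: \mu_1 - \mu_2 \neq \Delta_0$

• Test statistic:

$$z_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$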

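Large-Sample Test of $\mu_1 - \mu_2$

• Decision rule:

– 1-sided alternative ($H_A: \mu_1 - \mu_2 > \Delta_0$)

• If $z_{obs} \geq z_\alpha$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$

• If $z_{obs} < z_\alpha$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$

– 2-sided alternative ($H_A: \mu_1 - \mu_2 \neq \Delta_0$)

• If $z_{obs} \geq z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$

• If $z_{obs} \leq -z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 < \Delta_0$

• If $-z_{\alpha/2} < z_{obs} < z_{\alpha/2}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$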

Large-Sample Test of $\mu_1 - \mu_2$

• Observed significance level (P-value)

– 1-sided alternative ($H_A: \mu_1 - \mu_2 > \Delta_0$)

• $P = P(Z \geq z_{obs})$ (from the standard normal distribution)

– 2-sided alternative ($H_A: \mu_1 - \mu_2 \neq \Delta_0$)

• $P = 2P(Z \geq |z_{obs}|)$ (from the standard normal distribution)

• If P-value $\leq \alpha$, reject the null hypothesis

Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$

• Confidence coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value if it were applied over all possible samples

• Rule:

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$

• For 95% confidence intervals, $z_{.025} = 1.96$

• Confidence intervals and 2-sided tests give identical conclusions at the same α-level:

– If the entire interval is above 0, conclude $\mu_1 > \mu_2$

– If the entire interval is below 0, conclude $\mu_1 < \mu_2$

– If the interval contains 0, do not reject $\mu_1 = \mu_2$
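The large-sample test, P-value and confidence interval rules can be checked numerically with a short sketch. This is a minimal Python version using only the standard library, assuming summary statistics (means, standard deviations, sample sizes) are available; the helper name `large_sample_z` is hypothetical, and $z_{\alpha/2}$ is taken from `statistics.NormalDist().inv_cdf`. The usage line plugs in the vitamin C summary statistics from the example in this section.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_z(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0, alpha=0.05):
    """Large-sample z-test and (1 - alpha)100% CI for mu1 - mu2 (hypothetical helper)."""
    est = xbar1 - xbar2                          # point estimate of mu1 - mu2
    se = sqrt(s1**2 / n1 + s2**2 / n2)           # estimated standard error
    z_obs = (est - delta0) / se
    std_normal = NormalDist()                    # standard normal: mean 0, sd 1
    p_one = std_normal.cdf(-z_obs)               # 1-sided P for HA: mu1 - mu2 > delta0
    p_two = 2 * std_normal.cdf(-abs(z_obs))      # 2-sided P for HA: mu1 - mu2 != delta0
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for alpha = 0.05
    ci = (est - z_crit * se, est + z_crit * se)
    return z_obs, p_one, p_two, ci

# Vitamin C example: placebo (group 1) vs ascorbic acid (group 2)
z, p_one, p_two, ci = large_sample_z(2.2, 0.12, 155, 1.9, 0.10, 208)
print(f"z = {z:.1f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With these inputs the sketch should reproduce z ≈ 25.3 and the interval (0.277, 0.323); the P-values are so small that they are effectively 0 in double precision.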

Example: Vitamin C for Common Cold

• Outcome: Number of Colds During Study Period for Each Student

• Group 1: Given Placebo

• Group 2: Given Ascorbic Acid (Vitamin C)

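• Group 1 (placebo): $\bar{x}_1 = 2.2,\quad s_1 = 0.12,\quad n_1 = 155$

• Group 2 (vitamin C): $\bar{x}_2 = 1.9,\quad s_2 = 0.10,\quad n_2 = 208$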

Source: Pauling (1971)

2-Sided Test to Compare Groups

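• $H_0: \mu_1 - \mu_2 = 0$ (No difference in trt effects)

• $H_A: \mu_1 - \mu_2 \neq 0$ (Difference in trt effects)

• Test statistic:

$$z_{obs} = \frac{(2.2 - 1.9) - 0}{\sqrt{\dfrac{(0.12)^2}{155} + \dfrac{(0.10)^2}{208}}} = \frac{0.3}{0.0119} = 25.3$$

• Decision rule (α = 0.05): Conclude $\mu_1 - \mu_2 > 0$ since $z_{obs} = 25.3 > z_{.025} = 1.96$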

95% Confidence Interval for $\mu_1 - \mu_2$

• Point estimate: $\bar{x}_1 - \bar{x}_2 = 2.2 - 1.9 = 0.3$

• Estimated std. error: $\sqrt{\dfrac{(0.12)^2}{155} + \dfrac{(0.10)^2}{208}} = 0.0119$

• Critical value: $z_{.025} = 1.96$

• 95% CI: $0.30 \pm 1.96(0.0119) \equiv 0.30 \pm 0.023 \equiv (0.277,\ 0.323)$. The entire interval is > 0.

Small-Sample Test for Normal Populations (P. 538)

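• Case 1: Common variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)

• Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$

• Alternative hypotheses:

– 1-sided: $H_A: \mu_1 - \mu_2 > \Delta_0$

– 2-sided: $H_A: \mu_1 - \mu_2 \neq \Delta_0$

• Test statistic (where $s_p^2$ is a "pooled" estimate of $\sigma^2$):

$$t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$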

Small-Sample Test for Normal Populations

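• Decision rule (based on the t-distribution with $\nu = n_1 + n_2 - 2$ df):

– 1-sided alternative

• If $t_{obs} \geq t_{\alpha,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$

• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$

– 2-sided alternative

• If $t_{obs} \geq t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$

• If $t_{obs} \leq -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 < \Delta_0$

• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$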

Small-Sample Test for Normal Populations

• Observed significance level (P-value): special tables are needed; P-values are printed by statistical software packages

– 1-sided alternative

• $P = P(t \geq t_{obs})$ (from the t distribution)

– 2-sided alternative

• $P = 2P(t \geq |t_{obs}|)$ (from the t distribution)

• If P-value $\leq \alpha$, reject the null hypothesis

Small-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$ - Normal Populations

• Confidence coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value if it were applied over all possible samples

• Rule:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

• Interpretations same as for large-sample CIs
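The pooled-variance (Case 1) statistic and interval can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the helper name `pooled_t` is hypothetical, and because the standard library has no t quantile function, the critical value $t_{\alpha/2,\,n_1+n_2-2}$ is passed in from a table. The usage line uses the maze-learning summary statistics from later in this section with $t_{.025,22} = 2.074$.

```python
from math import sqrt

def pooled_t(xbar1, s1, n1, xbar2, s2, n2, t_crit, delta0=0.0):
    """Pooled-variance t statistic and CI for mu1 - mu2, assuming sigma1 = sigma2.

    t_crit is t_{alpha/2, n1+n2-2}, looked up in a t table (not computed here).
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance s_p^2
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)             # s_p * sqrt(1/n1 + 1/n2)
    est = xbar1 - xbar2
    t_obs = (est - delta0) / se
    ci = (est - t_crit * se, est + t_crit * se)
    return sp2, df, t_obs, ci

# Maze learning: adults (group 1) vs children (group 2)
sp2, df, t_obs, ci = pooled_t(13.28, 4.47, 14, 18.28, 9.93, 10, t_crit=2.074)
print(f"sp^2 = {sp2:.2f}, df = {df}, t = {t_obs:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

This should agree with the hand calculation: $s_p^2 = 52.15$, $t_{obs} = -1.67$ on 22 df, and a 95% CI of about (-11.20, 1.20).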

Small-Sample Inference for Normal Populations (P.529)

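• Case 2: $\sigma_1^2 \neq \sigma_2^2$

• Don't pool variances:

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

• Use "adjusted" degrees of freedom (Satterthwaite's approximation):

$$\nu^* = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$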
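Satterthwaite's formula is easy to get wrong by hand, so here is a minimal Python sketch of the unpooled standard error, the corresponding t statistic, and the adjusted degrees of freedom $\nu^*$. The function names `welch_df` and `welch_t` are hypothetical; the usage lines use the maze-learning summary statistics from later in this section.

```python
from math import sqrt

def welch_df(s1, n1, s2, n2):
    """Satterthwaite's approximate degrees of freedom for unpooled variances."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # s1^2/n1 and s2^2/n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def welch_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """t statistic using the unpooled standard error."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    return (xbar1 - xbar2 - delta0) / se, se

nu = welch_df(4.47, 14, 9.93, 10)
t_obs, se = welch_t(13.28, 4.47, 14, 18.28, 9.93, 10)
print(f"SE = {se:.2f}, t = {t_obs:.2f}, df* = {nu:.2f}")
```

With these inputs it prints SE = 3.36, t = -1.49 and df* = 11.63, matching the hand calculation for Case 2 (SPSS, working from unrounded standard deviations, reports 11.621).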

Example - Maze Learning (Adults/Children)

• Groups: Adults (n1=14) / Children (n2=10)

• Outcome: Average # of Errors in Maze Learning Task

• Raw Data on next slide

                 Adults (i=1)   Children (i=2)
Mean             13.28          18.28
Std Dev          4.47           9.93
Sample Size      14             10

• Conduct a 2-sided test of whether mean scores differ

• Construct a 95% Confidence Interval for true difference

Source: Gould and Perrin (1916)

Example - Maze Learning (Adults/Children)

Name  Group  Trials  Errors  Average
H     1      41      728     17.76
W     1      25      333     13.32
Mac   1      33      453     13.73
McG   1      31      528     17.03
L     1      41      335     8.17
R     1      48      553     11.52
Hv    1      24      217     9.04
Hy    1      32      711     22.22
F     1      46      839     18.24
Wd    1      47      473     10.06
Rh    1      35      532     15.20
D     1      69      538     7.80
Hg    1      27      213     7.89
Hp    1      27      375     13.89
Hl    2      42      254     6.05
McS   2      89      1559    17.52
Lin   2      38      1089    28.66
B     2      20      254     12.70
N     2      49      599     12.22
T     2      40      520     13.00
J     2      50      828     16.56
Hz    2      40      516     12.90
Lev   2      54      2171    40.20
K     2      58      1331    22.95

Group  n   Mean   Std Dev
1      14  13.28  4.47
2      10  18.28  9.93

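Example - Maze Learning: Case 1 - Equal Variances

$$s_p^2 = \frac{(14-1)(4.47)^2 + (10-1)(9.93)^2}{14 + 10 - 2} = 52.15$$

$H_0: \mu_1 - \mu_2 = 0 \qquad H_A: \mu_1 - \mu_2 \neq 0 \qquad (\alpha = 0.05)$

TS: $$t_{obs} = \frac{13.28 - 18.28}{\sqrt{52.15\left(\dfrac{1}{14} + \dfrac{1}{10}\right)}} = \frac{-5.00}{2.99} = -1.67$$

RR: $|t_{obs}| \geq t_{.025,22} = 2.074$

95% CI: $-5.00 \pm 2.074(2.99) \equiv -5.00 \pm 6.20 \equiv (-11.2,\ 1.2)$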

No significant difference between 2 age groups

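Example - Maze Learning: Case 2 - Unequal Variances

$$\frac{s_1^2}{n_1} = \frac{(4.47)^2}{14} = 1.43, \qquad \frac{s_2^2}{n_2} = \frac{(9.93)^2}{10} = 9.86$$

$$\nu^* = \frac{(1.43 + 9.86)^2}{\dfrac{(1.43)^2}{13} + \dfrac{(9.86)^2}{9}} = \frac{127.46}{10.96} = 11.63$$

$H_0: \mu_1 - \mu_2 = 0 \qquad H_A: \mu_1 - \mu_2 \neq 0 \qquad (\alpha = 0.05)$

TS: $$t_{obs} = \frac{13.28 - 18.28}{\sqrt{\dfrac{(4.47)^2}{14} + \dfrac{(9.93)^2}{10}}} = \frac{-5.00}{3.36} = -1.49$$

RR: $|t_{obs}| \geq t_{.025,\,11.63} \approx 2.19$

95% CI: $-5.00 \pm 2.19(3.36) \equiv -5.00 \pm 7.36 \equiv (-12.36,\ 2.36)$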

No significant difference between 2 age groups

SPSS Output

Group Statistics (AVE_ERR)

GROUP   N    Mean      Std. Deviation   Std. Error Mean
Adult   14   13.2761   4.46784          1.19408
Child   10   18.2759   9.93279          3.14102

Independent Samples Test (AVE_ERR)

Levene's Test for Equality of Variances: F = 4.420, Sig. = .047

t-test for Equality of Means:

                              t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference (Lower, Upper)
Equal variances assumed       -1.672   22       .109              -4.9998           2.99017                 (-11.20101, 1.20145)
Equal variances not assumed   -1.488   11.621   .163              -4.9998           3.36034                 (-12.34787, 2.34831)

Small Sample Test to Compare Two Medians - Nonnormal Populations

• Two independent samples (parallel groups)

• Procedure (Wilcoxon Rank-Sum Test):

– Rank measurements across samples from smallest (1) to largest (n1+n2). Ties take average ranks.

– Obtain the rank sum for each group (W1, W2)

– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $W_2 \leq W_0$

– 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(W_1, W_2) \leq W_0$

– Values of $W_0$ are given in many texts for various sample sizes and significance levels. P-values are printed by statistical software packages.
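The ranking step above can be sketched in Python. This minimal sketch assigns average ranks to ties and sums the ranks for each group; the helper names `average_ranks` and `rank_sums` are hypothetical. The data are the adult and child average-error scores from the maze-learning raw data in this section.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sums(sample1, sample2):
    """Wilcoxon rank sums (W1, W2) from ranking both samples together."""
    ranks = average_ranks(list(sample1) + list(sample2))
    w1 = sum(ranks[:len(sample1)])
    return w1, sum(ranks) - w1

adults = [17.76, 13.32, 13.73, 17.03, 8.17, 11.52, 9.04,
          22.22, 18.24, 10.06, 15.20, 7.80, 7.89, 13.89]
children = [6.05, 17.52, 28.66, 12.70, 12.22, 13.00, 16.56, 12.90, 40.20, 22.95]
w1, w2 = rank_sums(adults, children)
print(w1, w2)
```

For the maze data this gives W1 = 158 and W2 = 142, the same rank sums obtained in the worked example; as a check, W1 + W2 must equal N(N+1)/2 = 300 for N = 24.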

Normal Approximation (Supp PP5-7)

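• Under the null hypothesis of no difference in the two groups (let $W = W_1$ from the last slide, and $N = n_1 + n_2$):

$$\mu_W = \frac{n_1(N+1)}{2}, \qquad \sigma_W = \sqrt{\frac{n_1 n_2 (N+1)}{12}}$$

• A z-statistic can be computed, and an approximate P-value obtained from the Z-distribution:

$$z_{obs} = \frac{W - \mu_W}{\sigma_W} = \frac{W - n_1(N+1)/2}{\sqrt{n_1 n_2 (N+1)/12}}$$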
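The normal approximation can be written as a small Python sketch. It assumes no correction for ties (there are no tied values in the maze data), and the helper name `rank_sum_z` is hypothetical; the 2-sided P-value comes from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_z(w1, n1, n2):
    """Normal approximation to the Wilcoxon rank-sum statistic W = W1 (no tie correction)."""
    n = n1 + n2
    mu_w = n1 * (n + 1) / 2                    # mean of W under H0
    sigma_w = sqrt(n1 * n2 * (n + 1) / 12)     # standard deviation of W under H0
    z = (w1 - mu_w) / sigma_w
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return mu_w, sigma_w, z, p_two_sided

mu_w, sigma_w, z, p = rank_sum_z(158, 14, 10)  # maze learning: W1 = 158
print(f"mu_W = {mu_w}, sigma_W = {sigma_w:.2f}, z = {z:.3f}, P = {p:.3f}")
```

This gives $\mu_W = 175$, $\sigma_W = 17.08$, $z \approx -0.995$ and a 2-sided P-value of about 0.320, consistent with the "Z" and "Asymp. Sig." values in the SPSS output for this example.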

Example - Maze Learning

Subject  Group  Trials  Tot_Err  Ave_Err  Rank  Adult  Child  Rank*Ad  Rank*Ch
Hl       2      42      254      6.05     1     0      1      0        1
D        1      69      538      7.80     2     1      0      2        0
Hg       1      27      213      7.89     3     1      0      3        0
L        1      41      335      8.17     4     1      0      4        0
Hv       1      24      217      9.04     5     1      0      5        0
Wd       1      47      473      10.06    6     1      0      6        0
R        1      48      553      11.52    7     1      0      7        0
N        2      49      599      12.22    8     0      1      0        8
B        2      20      254      12.70    9     0      1      0        9
Hz       2      40      516      12.90    10    0      1      0        10
T        2      40      520      13.00    11    0      1      0        11
W        1      25      333      13.32    12    1      0      12       0
Mac      1      33      453      13.73    13    1      0      13       0
Hp       1      27      375      13.89    14    1      0      14       0
Rh       1      35      532      15.20    15    1      0      15       0
J        2      50      828      16.56    16    0      1      0        16
McG      1      31      528      17.03    17    1      0      17       0
McS      2      89      1559     17.52    18    0      1      0        18
H        1      41      728      17.76    19    1      0      19       0
F        1      46      839      18.24    20    1      0      20       0
Hy       1      32      711      22.22    21    1      0      21       0
K        2      58      1331     22.95    22    0      1      0        22
Lin      2      38      1089     28.66    23    0      1      0        23
Lev      2      54      2171     40.20    24    0      1      0        24

W1 = 158    W2 = 142

Example - Maze Learning

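Under $H_0$: distributions are identical; $n_1 = 14$, $n_2 = 10$, $N = n_1 + n_2 = 24$

$$\mu_W = \frac{n_1(N+1)}{2} = \frac{14(24+1)}{2} = 175$$

$$\sigma_W = \sqrt{\frac{n_1 n_2 (N+1)}{12}} = \sqrt{\frac{14(10)(24+1)}{12}} = \sqrt{291.67} = 17.08$$

$$z_{obs} = \frac{W - \mu_W}{\sigma_W} = \frac{158 - 175}{17.08} = \frac{-17}{17.1} \approx -1.00$$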

As with the t-test, no evidence of population group differences

Computer Output - SPSS

Ranks (AVE_ERR)

GROUP   N    Mean Rank   Sum of Ranks
Adult   14   11.29       158.00
Child   10   14.20       142.00
Total   24

Test Statistics(b) (AVE_ERR)

Mann-Whitney U                   53.000
Wilcoxon W                       158.000
Z                                -.995
Asymp. Sig. (2-tailed)           .320
Exact Sig. [2*(1-tailed Sig.)]   .341(a)

a. Not corrected for ties.
b. Grouping Variable: GROUP
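SPSS reports both a Mann-Whitney U and a Wilcoxon W; the two are equivalent, since for each group U = W - n(n+1)/2, where n is that group's sample size. The sketch below (hypothetical helper name `mann_whitney_u`) converts the maze rank sums into U statistics; U1 counts the (adult, child) pairs in which the adult value is larger, and U1 + U2 = n1·n2.

```python
def mann_whitney_u(w1, n1, w2, n2):
    """Convert Wilcoxon rank sums into Mann-Whitney U statistics (no ties assumed).

    U1 = W1 - n1(n1+1)/2 counts pairs where the group 1 value exceeds the group 2 value.
    """
    u1 = w1 - n1 * (n1 + 1) / 2
    u2 = w2 - n2 * (n2 + 1) / 2
    return u1, u2

u1, u2 = mann_whitney_u(158, 14, 142, 10)
print(u1, u2, min(u1, u2))
```

Here U1 = 158 - 105 = 53 and U2 = 142 - 55 = 87, so U1 + U2 = 140 = 14 × 10, and the smaller value, 53, is the Mann-Whitney U shown in the SPSS table.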

Inference Based on Paired Samples (Crossover Designs)

• Setting: Each treatment is applied to each subject or pair (preferably in random order)

• Data: di is the difference in scores (Trt1-Trt2) for subject (pair) i

• Parameter: $\mu_D$ - Population mean difference

• Sample statistics:

$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}, \qquad s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}}$$

Test Concerning $\mu_D$

• Null hypothesis: $H_0: \mu_D = 0$ (almost always 0)

• Alternative hypotheses:

– 1-sided: $H_A: \mu_D > 0$

– 2-sided: $H_A: \mu_D \neq 0$

• Test statistic:

$$t_{obs} = \frac{\bar{d}}{s_d/\sqrt{n}}$$

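Test Concerning $\mu_D$

• Decision rule (based on the t-distribution with $\nu = n - 1$ df):

– 1-sided alternative

• If $t_{obs} \geq t_{\alpha,\nu}$ ==> Conclude $\mu_D > 0$

• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_D = 0$

– 2-sided alternative

• If $t_{obs} \geq t_{\alpha/2,\nu}$ ==> Conclude $\mu_D > 0$

• If $t_{obs} \leq -t_{\alpha/2,\nu}$ ==> Conclude $\mu_D < 0$

• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_D = 0$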

Confidence Interval for $\mu_D$

$$\bar{d} \pm t_{\alpha/2,\,n-1}\frac{s_d}{\sqrt{n}}$$
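The paired t statistic and interval depend only on $\bar{d}$, $s_d$ and $n$, so they can be computed from summary statistics. The sketch below is a minimal illustration with a hypothetical helper name `paired_t`; as before, the critical value $t_{\alpha/2,\,n-1}$ is passed in from a t table because the standard library has none. The usage line uses the antiperspirant summary statistics from the example in this section with $t_{.025,19} = 2.093$.

```python
from math import sqrt

def paired_t(d_bar, s_d, n, t_crit, mu0=0.0):
    """Paired t statistic and CI for mu_D from summary statistics.

    t_crit is t_{alpha/2, n-1}, taken from a t table (not computed here).
    """
    se = s_d / sqrt(n)                 # standard error of d_bar
    t_obs = (d_bar - mu0) / se
    half_width = t_crit * se
    return se, t_obs, (d_bar - half_width, d_bar + half_width)

se, t_obs, ci = paired_t(0.15, 0.248, 20, t_crit=2.093)
print(f"SE = {se:.4f}, t = {t_obs:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

This reproduces the worked example: SE ≈ 0.0555, $t_{obs} \approx 2.70$ and a 95% CI of (0.034, 0.266).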

Example Antiperspirant Formulations

• Subjects - 20 Volunteers’ armpits

• Treatments - Dry Powder vs Powder-in-Oil

• Measurements - Average rating by judges

– Higher scores imply more disagreeable odor

• Summary Statistics (Raw Data on next slide):

$\bar{d} = 0.15 \qquad s_d = 0.248 \qquad n = 20$

Source: E. Jungermann (1974)


Subject   Dry Powder   Powder-in-Oil   Difference
1         2            1.9             0.1
2         2.8          2.4             0.4
3         1.3          1.5             -0.2
4         1.8          1.8             0
5         1.9          1.8             0.1
6         2.8          2.4             0.4
7         2            2.2             -0.2
8         1.5          1.5             0
9         1.9          1.7             0.2
10        2.9          2.8             0.1
11        2.9          2.7             0.2
12        2.3          1.5             0.8
13        2.3          2.5             -0.2
14        3.6          3.2             0.4
15        2.2          2.1             0.1
16        2.1          1.9             0.2
17        2.5          2.6             -0.1
18        2.4          2               0.4
19        3.1          2.9             0.2
20        2            1.9             0.1

Mean                                   0.15
Std Dev                                0.248151058
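The summary statistics $\bar{d}$ and $s_d$ can be recomputed from the raw ratings in the table above. This short sketch uses `statistics.mean` and `statistics.stdev` (the latter uses the n - 1 divisor, as in the formula for $s_d$); the list names are illustrative.

```python
from statistics import mean, stdev

dry = [2, 2.8, 1.3, 1.8, 1.9, 2.8, 2, 1.5, 1.9, 2.9,
       2.9, 2.3, 2.3, 3.6, 2.2, 2.1, 2.5, 2.4, 3.1, 2]
oil = [1.9, 2.4, 1.5, 1.8, 1.8, 2.4, 2.2, 1.5, 1.7, 2.8,
       2.7, 1.5, 2.5, 3.2, 2.1, 1.9, 2.6, 2, 2.9, 1.9]

d = [a - b for a, b in zip(dry, oil)]   # differences: Dry Powder minus Powder-in-Oil
d_bar = mean(d)
s_d = stdev(d)                          # sample standard deviation, divisor n - 1
print(f"n = {len(d)}, d_bar = {d_bar:.2f}, s_d = {s_d:.4f}")
```

It should print n = 20, d_bar = 0.15 and s_d = 0.2482, matching the Mean and Std Dev rows of the table.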


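$H_0: \mu_D = 0$ (No difference in formulation effects)

$H_A: \mu_D \neq 0$ (Formulation effects differ)

TS: $$t_{obs} = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{0.15}{0.248/\sqrt{20}} = \frac{0.15}{.0555} = 2.70$$

RR: $|t_{obs}| \geq t_{.025,\,n-1} = t_{.025,19} = 2.093$

P-value: $2P(t \geq 2.70)$

95% CI for $\mu_D$: $$\bar{d} \pm t_{.025,\,n-1}\frac{s_d}{\sqrt{n}} \equiv 0.15 \pm 2.093(.0555) \equiv 0.15 \pm 0.116 \equiv (0.034,\ 0.266)$$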

Evidence that scores are higher (more unpleasant) for the dry powder (formulation 1)

Small-Sample Test For Nonnormal Data

• Paired samples (crossover design)

• Procedure (Wilcoxon Signed-Rank Test):

– Compute differences $d_i$ (as in the paired t-test) and obtain their absolute values (ignoring 0s)

– Rank the observations by $|d_i|$ (smallest = 1), averaging ranks for ties

– Compute $W^+$ and $W^-$, the rank sums for the positive and negative differences, respectively

– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $W^- \leq T_0$

– 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(W^+, W^-) \leq T_0$

– Values of $T_0$ are given in many texts for various sample sizes and significance levels. P-values are printed by statistical software packages.
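The signed-rank procedure above translates directly into code. This is a minimal sketch with a hypothetical helper name `signed_rank_sums`: it drops zero differences, ranks the absolute values with average ranks for ties, and sums the ranks by sign. The usage line uses the caffeine differences (mg13 - mg5) from the example in this section.

```python
def signed_rank_sums(diffs):
    """Wilcoxon signed-rank sums (W+, W-): drop zeros, rank |d| with average ranks for ties."""
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied absolute differences
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average of ranks i+1 .. j+1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return w_plus, w_minus, len(nonzero)

# Caffeine example: differences mg13 - mg5 for the 9 cyclists
diffs = [-4.92, -25.85, 15.92, 6.23, 4.34, -3.78, 1.98, 9.18, 1.15]
print(signed_rank_sums(diffs))
```

For the caffeine data it returns W+ = 28 and W- = 17 on n = 9 nonzero differences, as in the hand calculation.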

Normal Approximation (Supp PP18-21)

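• Under the null hypothesis of no difference in the two groups (W is a signed-rank sum such as $W^+$, and n is the number of nonzero differences):

$$\mu_W = \frac{n(n+1)}{4}, \qquad \sigma_W = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$

• A z-statistic can be computed, and an approximate P-value obtained from the Z-distribution:

$$z_{obs} = \frac{W - \mu_W}{\sigma_W} = \frac{W - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$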
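As with the rank-sum test, the normal approximation for the signed-rank sum is a one-line formula once $\mu_W$ and $\sigma_W$ are known. A minimal sketch follows, assuming no tie correction; the helper name `signed_rank_z` is hypothetical. Using W = W+ = 28 from the caffeine example gives a positive z, while SPSS (which works with MG5 - MG13) reports the same magnitude with the opposite sign.

```python
from math import sqrt
from statistics import NormalDist

def signed_rank_z(w, n):
    """Normal approximation for a Wilcoxon signed-rank sum W based on n nonzero differences."""
    mu_w = n * (n + 1) / 4                              # mean of W under H0
    sigma_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)      # standard deviation of W under H0
    z = (w - mu_w) / sigma_w
    return mu_w, sigma_w, z, 2 * NormalDist().cdf(-abs(z))

mu_w, sigma_w, z, p = signed_rank_z(28, 9)
print(f"mu_W = {mu_w}, sigma_W = {sigma_w:.2f}, z = {z:.2f}, P = {p:.3f}")
```

This gives $\mu_W = 22.5$, $\sigma_W = 8.44$, $z = 0.65$ and a 2-sided P-value of about 0.515, in line with the SPSS output.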

Example - Caffeine and Endurance

• Subjects: 9 well-trained cyclists

• Treatments: 13mg Caffeine (Condition 1) vs 5mg (Condition 2)

• Measurements: Minutes until exhaustion

• This is a subset of a larger study (we'll see it later)

• Step 1: Take absolute values of differences (eliminating 0s)

• Step 2: Rank the absolute differences (averaging ranks for ties)

• Step 3: Sum the ranks for the positive and negative (signed) differences

Source: Pasman, et al (1995)


Cyclist   mg13    mg5     mg13-mg5
1         37.55   42.47   -4.92
2         59.30   85.15   -25.85
3         79.12   63.20   15.92
4         58.33   52.10   6.23
5         70.54   66.20   4.34
6         69.47   73.25   -3.78
7         46.48   44.50   1.98
8         66.35   57.17   9.18
9         36.20   35.05   1.15

Original Data


Cyclist   mg13    mg5     mg13-mg5   abs(diff)
1         37.55   42.47   -4.92      4.92
2         59.30   85.15   -25.85     25.85
3         79.12   63.20   15.92      15.92
4         58.33   52.10   6.23       6.23
5         70.54   66.20   4.34       4.34
6         69.47   73.25   -3.78      3.78
7         46.48   44.50   1.98       1.98
8         66.35   57.17   9.18       9.18
9         36.20   35.05   1.15       1.15

Absolute Differences

Cyclist   mg13    mg5     mg13-mg5   abs(diff)   rank
9         36.20   35.05   1.15       1.15        1
7         46.48   44.50   1.98       1.98        2
6         69.47   73.25   -3.78      3.78        3
5         70.54   66.20   4.34       4.34        4
1         37.55   42.47   -4.92      4.92        5
4         58.33   52.10   6.23       6.23        6
8         66.35   57.17   9.18       9.18        7
3         79.12   63.20   15.92      15.92       8
2         59.30   85.15   -25.85     25.85       9

Ranked Absolute Differences

W+ = 1+2+4+6+7+8=28

W- = 3+5+9=17


Under the null hypothesis of no difference in the two groups:

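$$\mu_W = \frac{n(n+1)}{4} = \frac{9(9+1)}{4} = \frac{90}{4} = 22.5$$

$$\sigma_W = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{9(9+1)(18+1)}{24}} = \sqrt{\frac{1710}{24}} = 8.44$$

$$z_{obs} = \frac{W - \mu_W}{\sigma_W} = \frac{28 - 22.5}{8.44} = \frac{5.5}{8.44} = 0.65$$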

There is no evidence that endurance times differ for the 2 doses (we will see later that both are higher than no dose)

SPSS Output

Ranks (MG5 - MG13)

                 N      Mean Rank   Sum of Ranks
Negative Ranks   6(a)   4.67        28.00
Positive Ranks   3(b)   5.67        17.00
Ties             0(c)
Total            9

a. MG5 < MG13
b. MG5 > MG13
c. MG5 = MG13

Test Statistics(b) (MG5 - MG13)

Z                        -.652(a)
Asymp. Sig. (2-tailed)   .515

a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test

Note that SPSS is taking MG5-MG13, while we used MG13-MG5

Data Sources

• Pauling, L. (1971). "The Significance of the Evidence about Ascorbic Acid and the Common Cold," Proceedings of the National Academy of Sciences of the United States of America, 11: 2678-2681

• Gould, M.C. and F.A.C. Perrin (1916). “A Comparison of the Factors Involved in the Maze Learning of Human Adults and Children,” Journal of Experimental Psychology, 1:122-???

• Jungermann, E. (1974). “Antiperspirants: New Trends in Formulation and Testing Technology,” Journal of the Society of Cosmetic Chemists 25:621-638

• Pasman, W.J., M.A. van Baak, A.E. Jeukendrup, and A. de Haan (1995). “The Effect of Different Dosages of Caffeine on Endurance Performance Time,” International Journal of Sports Medicine, 16:225-230
