Comparison of 2 Population Means
• Goal: To compare 2 populations/treatments wrt a numeric outcome
• Sampling Design: Independent Samples (Parallel Groups) vs Paired Samples (Crossover Design)
• Data Structure: Normal vs Non-normal
• Sample Sizes: Large (n1,n2>20) vs Small
Independent Samples
• Units in the two samples are different
• Sample sizes may or may not be equal
• Large-sample inference based on Normal Distribution (Central Limit Theorem)
• Small-sample inference depends on distribution of individual outcomes (Normal vs non-Normal)
Parameters/Estimates (Independent Samples)
• Parameter: $\mu_1 - \mu_2$
• Estimator: $\bar{x}_1 - \bar{x}_2$
• Estimated standard error: $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
• Shape of sampling distribution:
– Normal if data are normal
– Approximately normal if n1,n2>20
– Non-normal otherwise (typically)
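Large-Sample Test of $\mu_1 - \mu_2$
• Null hypothesis: The population means differ by $\Delta_0$ (which is typically 0): $H_0: \mu_1 - \mu_2 = \Delta_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1 - \mu_2 > \Delta_0$
– 2-Sided: $H_A: \mu_1 - \mu_2 \neq \Delta_0$
• Test Statistic: $z_{obs} = \dfrac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$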
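This is a minimal Python sketch of the test statistic above. It computes $z_{obs}$ from summary statistics using only the standard library. The function name `large_sample_z` is illustrative and does not come from any package. It assumes independent samples with n1, n2 > 20, so that the normal approximation applies. The usage line plugs in the Vitamin C summary statistics from the example later in this section.

```python
import math


def large_sample_z(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Large-sample z statistic for H0: mu1 - mu2 = delta0 (independent samples)."""
    # Estimated standard error of xbar1 - xbar2, using each sample's own variance
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((xbar1 - xbar2) - delta0) / se


# Vitamin C example summary statistics (placebo = group 1, ascorbic acid = group 2)
z_obs = large_sample_z(2.2, 0.12, 155, 1.9, 0.10, 208)
print(round(z_obs, 1))
```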
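Large-Sample Test of $\mu_1 - \mu_2$
• Decision Rule:
– 1-sided alternative ($H_A: \mu_1 - \mu_2 > \Delta_0$)
• If $z_{obs} \geq z_{\alpha}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $z_{obs} < z_{\alpha}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$
– 2-sided alternative ($H_A: \mu_1 - \mu_2 \neq \Delta_0$)
• If $z_{obs} \geq z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $z_{obs} \leq -z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 < \Delta_0$
• If $-z_{\alpha/2} < z_{obs} < z_{\alpha/2}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$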
Large-Sample Test of $\mu_1 - \mu_2$
• Observed Significance Level (P-Value)
– 1-sided alternative ($H_A: \mu_1 - \mu_2 > \Delta_0$)
• $P = P(z \geq z_{obs})$ (From the std. Normal distribution)
– 2-sided alternative ($H_A: \mu_1 - \mu_2 \neq \Delta_0$)
• $P = 2P(z \geq |z_{obs}|)$ (From the std. Normal distribution)
• If P-Value $\leq \alpha$ then reject the null hypothesis
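The P-value rules above can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8+). The helper name `z_p_value` is hypothetical. Its 1-sided branch assumes the upper-tail alternative $H_A: \mu_1 - \mu_2 > \Delta_0$ used on these slides.

```python
from statistics import NormalDist


def z_p_value(z_obs, two_sided=True):
    """P-value of an observed z statistic from the standard normal distribution.

    The 1-sided version assumes HA: mu1 - mu2 > delta0 (upper tail).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if two_sided:
        return 2 * (1 - std_normal.cdf(abs(z_obs)))
    return 1 - std_normal.cdf(z_obs)


alpha = 0.05
p = z_p_value(25.3)  # Vitamin C example, 2-sided
print(p <= alpha)
```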
Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value if it were applied over all possible samples
• Rule: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
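This is a sketch of the interval rule, again assuming large independent samples. Rather than hard-coding 1.96, the hypothetical `large_sample_ci` helper obtains $z_{\alpha/2}$ from `NormalDist().inv_cdf(1 - alpha / 2)`. With the Vitamin C numbers it should reproduce the interval (0.277, 0.323) reported in the example.

```python
import math
from statistics import NormalDist


def large_sample_ci(xbar1, s1, n1, xbar2, s2, n2, alpha=0.05):
    """(1 - alpha)100% large-sample confidence interval for mu1 - mu2."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}; about 1.96 for alpha = 0.05
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return diff - z_crit * se, diff + z_crit * se


lo, hi = large_sample_ci(2.2, 0.12, 155, 1.9, 0.10, 208)
print(f"({lo:.3f}, {hi:.3f})")
```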
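Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
• For 95% Confidence Intervals, $z_{.025} = 1.96$
• Confidence Intervals and 2-sided tests give identical conclusions at the same $\alpha$-level:
– If the entire interval is above $\Delta_0$, conclude $\mu_1 - \mu_2 > \Delta_0$
– If the entire interval is below $\Delta_0$, conclude $\mu_1 - \mu_2 < \Delta_0$
– If the interval contains $\Delta_0$, do not reject $\mu_1 - \mu_2 = \Delta_0$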
Example: Vitamin C for Common Cold
• Outcome: Number of Colds During Study Period for Each Student
• Group 1: Given Placebo
• Group 2: Given Ascorbic Acid (Vitamin C)
• Group 1 summary: $\bar{x}_1 = 2.2$, $s_1 = 0.12$, $n_1 = 155$
• Group 2 summary: $\bar{x}_2 = 1.9$, $s_2 = 0.10$, $n_2 = 208$
Source: Pauling (1971)
2-Sided Test to Compare Groups
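• $H_0: \mu_1 - \mu_2 = 0$ (No difference in trt effects)
• $H_A: \mu_1 - \mu_2 \neq 0$ (Difference in trt effects)
• Test Statistic: $z_{obs} = \dfrac{(2.2 - 1.9) - 0}{\sqrt{\frac{(0.12)^2}{155} + \frac{(0.10)^2}{208}}} = \dfrac{0.3}{0.0119} \approx 25.3$
• Decision Rule ($\alpha = 0.05$): Conclude $\mu_1 - \mu_2 > 0$ since $z_{obs} = 25.3 > z_{.025} = 1.96$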
95% Confidence Interval for $\mu_1 - \mu_2$
• Point Estimate: $\bar{x}_1 - \bar{x}_2 = 2.2 - 1.9 = 0.3$
• Estimated Std. Error: $\sqrt{\frac{(0.12)^2}{155} + \frac{(0.10)^2}{208}} = 0.0119$
• Critical Value: $z_{.025} = 1.96$
• 95% CI: $0.30 \pm 1.96(0.0119) \equiv 0.30 \pm 0.023 \equiv (0.277, 0.323)$. The entire interval is > 0
Small-Sample Test for Normal Populations (P. 538)
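• Case 1: Common Variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
• Null Hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1 - \mu_2 > \Delta_0$
– 2-Sided: $H_A: \mu_1 - \mu_2 \neq \Delta_0$
• Test Statistic (where $s_p^2$ is a “pooled” estimate of $\sigma^2$):
$t_{obs} = \dfrac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$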
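This is one way to compute the pooled statistic in Python, assuming equal population variances as in Case 1. The function `pooled_t` is an illustrative name. It returns $t_{obs}$, $s_p^2$ and the $n_1 + n_2 - 2$ degrees of freedom, so that the t-based decision rule can be applied. The usage line uses the maze-learning summary statistics (adults vs children) that appear later in this section.

```python
import math


def pooled_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Pooled-variance t statistic, assuming sigma1^2 = sigma2^2.

    Returns (t_obs, sp2, df) where sp2 is the pooled variance estimate.
    """
    df = n1 + n2 - 2
    # Pooled estimate of the common variance sigma^2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return ((xbar1 - xbar2) - delta0) / se, sp2, df


# Maze-learning summary statistics: adults (group 1) vs children (group 2)
t_obs, sp2, df = pooled_t(13.28, 4.47, 14, 18.28, 9.93, 10)
print(round(t_obs, 2), round(sp2, 2), df)
```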
Small-Sample Test for Normal Populations
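• Decision Rule: (Based on t-distribution with $\nu = n_1 + n_2 - 2$ df)
– 1-sided alternative
• If $t_{obs} \geq t_{\alpha,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$
– 2-sided alternative
• If $t_{obs} \geq t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $t_{obs} \leq -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 < \Delta_0$
• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$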
Small-Sample Test for Normal Populations
• Observed Significance Level (P-Value)
• Special Tables Needed, Printed by Statistical Software Packages
– 1-sided alternative
• $P = P(t \geq t_{obs})$ (From the t distribution)
– 2-sided alternative
• $P = 2P(t \geq |t_{obs}|)$ (From the t distribution)
• If P-Value $\leq \alpha$ then reject the null hypothesis
Small-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$ (Normal Populations)
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value if it were applied over all possible samples
• Rule: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\, n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
• Interpretations same as for large-sample CIs
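This is a sketch of the small-sample interval. The Python standard library has no t-distribution quantile function, so the hypothetical `pooled_t_ci` takes $t_{\alpha/2, n_1+n_2-2}$ as an argument. Here that is 2.074 for 22 df, as used in the maze-learning example. If SciPy is available, `scipy.stats.t.ppf(0.975, 22)` is one way to obtain it.

```python
import math


def pooled_t_ci(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Equal-variance CI for mu1 - mu2.

    t_crit is t_{alpha/2, n1+n2-2}, supplied by the caller (for example from a t table).
    """
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    margin = t_crit * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# t_{.025,22} = 2.074 from a t table
lo, hi = pooled_t_ci(13.28, 4.47, 14, 18.28, 9.93, 10, t_crit=2.074)
print(f"({lo:.2f}, {hi:.2f})")
```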
Small-Sample Inference for Normal Populations (P.529)
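• Case 2: $\sigma_1^2 \neq \sigma_2^2$
• Don’t pool variances: $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
• Use “adjusted” degrees of freedom (Satterthwaite’s Approximation):
$\nu^* = \dfrac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$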
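Satterthwaite's formula is easy to get wrong by hand. The following Python sketch uses only the standard library, and `welch_se_df` is an illustrative name. It returns the unpooled standard error and $\nu^*$. With the maze-learning statistics it gives values close to the 3.36 and 11.63 worked out on the Case 2 slide.

```python
import math


def welch_se_df(s1, n1, s2, n2):
    """Unpooled standard error and Satterthwaite's approximate degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # estimated variance of each sample mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df


se, df = welch_se_df(4.47, 14, 9.93, 10)
t_obs = (13.28 - 18.28) / se
print(round(se, 2), round(df, 2), round(t_obs, 2))
```

The approximate df always falls between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$. That is why it is smaller here than the 22 df of the pooled test.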
Example - Maze Learning (Adults/Children)
• Groups: Adults (n1=14) / Children (n2=10)
• Outcome: Average # of Errors in Maze Learning Task
• Raw Data on next slide
              Adults (i=1)   Children (i=2)
Mean          13.28          18.28
Std Dev       4.47           9.93
Sample Size   14             10
• Conduct a 2-sided test of whether mean scores differ
• Construct a 95% Confidence Interval for true difference
Source: Gould and Perrin (1916)
Example - Maze Learning (Adults/Children)
Name  Group  Trials  Errors  Average
H     1      41      728     17.76
W     1      25      333     13.32
Mac   1      33      453     13.73
McG   1      31      528     17.03
L     1      41      335     8.17
R     1      48      553     11.52
Hv    1      24      217     9.04
Hy    1      32      711     22.22
F     1      46      839     18.24
Wd    1      47      473     10.06
Rh    1      35      532     15.20
D     1      69      538     7.80
Hg    1      27      213     7.89
Hp    1      27      375     13.89
Hl    2      42      254     6.05
McS   2      89      1559    17.52
Lin   2      38      1089    28.66
B     2      20      254     12.70
N     2      49      599     12.22
T     2      40      520     13.00
J     2      50      828     16.56
Hz    2      40      516     12.90
Lev   2      54      2171    40.20
K     2      58      1331    22.95
Summary by group:
Group  n   Mean   Std Dev
1      14  13.28  4.47
2      10  18.28  9.93
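Example - Maze Learning: Case 1 - Equal Variances
• $H_0: \mu_1 - \mu_2 = 0$ vs $H_A: \mu_1 - \mu_2 \neq 0$ ($\alpha = 0.05$)
• $s_p^2 = \dfrac{(14-1)(4.47)^2 + (10-1)(9.93)^2}{14 + 10 - 2} = 52.15$
• TS: $t_{obs} = \dfrac{13.28 - 18.28}{\sqrt{52.15\left(\frac{1}{14} + \frac{1}{10}\right)}} = \dfrac{-5.00}{2.99} = -1.67$
• RR: $|t_{obs}| \geq t_{.025,22} = 2.074$
• 95% CI: $-5.00 \pm 2.074(2.99) \equiv -5.00 \pm 6.20 \equiv (-11.20, 1.20)$
No significant difference between 2 age groups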
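Example - Maze Learning: Case 2 - Unequal Variances
• $H_0: \mu_1 - \mu_2 = 0$ vs $H_A: \mu_1 - \mu_2 \neq 0$ ($\alpha = 0.05$)
• $\frac{s_1^2}{n_1} = \frac{(4.47)^2}{14} = 1.43, \qquad \frac{s_2^2}{n_2} = \frac{(9.93)^2}{10} = 9.86$
• $\nu^* = \dfrac{(1.43 + 9.86)^2}{\frac{(1.43)^2}{13} + \frac{(9.86)^2}{9}} = \dfrac{127.46}{10.96} = 11.63$
• TS: $t_{obs} = \dfrac{13.28 - 18.28}{\sqrt{\frac{(4.47)^2}{14} + \frac{(9.93)^2}{10}}} = \dfrac{-5.00}{3.36} = -1.49$
• RR: $|t_{obs}| \geq t_{.025,11.63} = 2.19$
• 95% CI: $-5.00 \pm 2.19(3.36) \equiv -5.00 \pm 7.36 \equiv (-12.36, 2.36)$
No significant difference between 2 age groups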
SPSS Output
Group Statistics (AVE_ERR)
GROUP   N    Mean      Std. Deviation   Std. Error Mean
Adult   14   13.2761   4.46784          1.19408
Child   10   18.2759   9.93279          3.14102
Independent Samples Test (AVE_ERR)
Levene's Test for Equality of Variances: F = 4.420, Sig. = .047
t-test for Equality of Means:
Variances                     t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -1.672   22       .109              -4.9998           2.99017                 -11.20101      1.20145
Equal variances not assumed   -1.488   11.621   .163              -4.9998           3.36034                 -12.34787      2.34831
Small Sample Test to Compare Two Medians - Nonnormal Populations
• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
– Rank measurements across samples from smallest (1) to largest ($n_1 + n_2$). Ties take average ranks.
– Obtain the rank sum for each group ($W_1$, $W_2$)
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $W_2 \leq W_0$
– 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(W_1, W_2) \leq W_0$
– Values of $W_0$ are given in many texts for various sample sizes and significance levels. P-values are printed by statistical software packages.
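The ranking step, including average ranks for ties, can be sketched as follows. Both functions are hypothetical helpers written for illustration. They compute only the rank sums and leave the comparison with tabled $W_0$ values to the reader. Applied to the maze-learning averages, they should give $W_1 = 158$ and $W_2 = 142$.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sums(sample1, sample2):
    """Wilcoxon rank-sum statistics (W1, W2) from ranks in the combined sample."""
    ranks = average_ranks(list(sample1) + list(sample2))
    n1 = len(sample1)
    return sum(ranks[:n1]), sum(ranks[n1:])


# Maze-learning average errors
adults = [17.76, 13.32, 13.73, 17.03, 8.17, 11.52, 9.04,
          22.22, 18.24, 10.06, 15.20, 7.80, 7.89, 13.89]
children = [6.05, 17.52, 28.66, 12.70, 12.22, 13.00, 16.56, 12.90, 40.20, 22.95]
w1, w2 = rank_sums(adults, children)
print(w1, w2)
```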
Normal Approximation (Supp PP5-7)
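• Under the null hypothesis of no difference in the two groups (let $W = W_1$ from the last slide), with $N = n_1 + n_2$:
$\mu_W = \dfrac{n_1(N+1)}{2}, \qquad \sigma_W = \sqrt{\dfrac{n_1 n_2 (N+1)}{12}}$
• A z-statistic can be computed and P-value (approximate) can be obtained from Z-distribution:
$z_{obs} = \dfrac{W - \mu_W}{\sigma_W} = \dfrac{W_1 - n_1(N+1)/2}{\sqrt{n_1 n_2 (N+1)/12}}$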
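This is a sketch of the normal approximation, with no correction for ties, matching the formulas above. The name `rank_sum_z` is illustrative, and the P-value uses `statistics.NormalDist`. For the maze data ($W_1 = 158$, $n_1 = 14$, $n_2 = 10$) it should give $\mu_W = 175$, $\sigma_W \approx 17.08$ and $z \approx -0.995$, in line with the SPSS output.

```python
import math
from statistics import NormalDist


def rank_sum_z(w1, n1, n2):
    """Normal approximation for the rank-sum statistic W = W1 (no tie correction)."""
    N = n1 + n2
    mu_w = n1 * (N + 1) / 2
    sigma_w = math.sqrt(n1 * n2 * (N + 1) / 12)
    z = (w1 - mu_w) / sigma_w
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return mu_w, sigma_w, z, p_two_sided


mu_w, sigma_w, z, p = rank_sum_z(158, 14, 10)
print(mu_w, round(sigma_w, 2), round(z, 3), round(p, 3))
```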
Example - Maze Learning
Subject  Group  Trials  Tot_Err  Ave_Err  Rank  Adult  Child  Rank*Ad  Rank*Ch
Hl       2      42      254      6.05     1     0      1      0        1
D        1      69      538      7.80     2     1      0      2        0
Hg       1      27      213      7.89     3     1      0      3        0
L        1      41      335      8.17     4     1      0      4        0
Hv       1      24      217      9.04     5     1      0      5        0
Wd       1      47      473      10.06    6     1      0      6        0
R        1      48      553      11.52    7     1      0      7        0
N        2      49      599      12.22    8     0      1      0        8
B        2      20      254      12.70    9     0      1      0        9
Hz       2      40      516      12.90    10    0      1      0        10
T        2      40      520      13.00    11    0      1      0        11
W        1      25      333      13.32    12    1      0      12       0
Mac      1      33      453      13.73    13    1      0      13       0
Hp       1      27      375      13.89    14    1      0      14       0
Rh       1      35      532      15.20    15    1      0      15       0
J        2      50      828      16.56    16    0      1      0        16
McG      1      31      528      17.03    17    1      0      17       0
McS      2      89      1559     17.52    18    0      1      0        18
H        1      41      728      17.76    19    1      0      19       0
F        1      46      839      18.24    20    1      0      20       0
Hy       1      32      711      22.22    21    1      0      21       0
K        2      58      1331     22.95    22    0      1      0        22
Lin      2      38      1089     28.66    23    0      1      0        23
Lev      2      54      2171     40.20    24    0      1      0        24
Sum of Rank*Ad: $W_1 = 158$; Sum of Rank*Ch: $W_2 = 142$
Example - Maze Learning
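• Under $H_0$: Distributions are Identical, with $n_1 = 14$, $n_2 = 10$, $N = 14 + 10 = 24$:
$\mu_W = \dfrac{n_1(N+1)}{2} = \dfrac{14(24+1)}{2} = 175, \qquad \sigma_W = \sqrt{\dfrac{n_1 n_2 (N+1)}{12}} = \sqrt{\dfrac{14(10)(24+1)}{12}} = \sqrt{291.67} = 17.08$
$z_{obs} = \dfrac{W - \mu_W}{\sigma_W} = \dfrac{158 - 175}{17.1} = \dfrac{-17}{17.1} \approx -1.00$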
As with the t-test, no evidence of population group differences
Computer Output - SPSS
Ranks (AVE_ERR)
GROUP   N    Mean Rank   Sum of Ranks
Adult   14   11.29       158.00
Child   10   14.20       142.00
Total   24
Test Statistics(b) (AVE_ERR)
Mann-Whitney U                   53.000
Wilcoxon W                       158.000
Z                                -.995
Asymp. Sig. (2-tailed)           .320
Exact Sig. [2*(1-tailed Sig.)]   .341(a)
a. Not corrected for ties.
b. Grouping Variable: GROUP
Inference Based on Paired Samples (Crossover Designs)
• Setting: Each treatment is applied to each subject or pair (preferably in random order)
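• Data: $d_i$ is the difference in scores (Trt1-Trt2) for subject (pair) i
• Parameter: $\mu_D$ - Population mean difference
• Sample Statistics:
$\bar{d} = \dfrac{\sum_{i=1}^{n} d_i}{n}, \qquad s_d^2 = \dfrac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}, \qquad s_d = \sqrt{s_d^2}$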
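The sample statistics above are sketched here in Python. The helper `paired_summary` is hypothetical, and it uses the $n - 1$ divisor for $s_d$. The inputs are the 20 antiperspirant ratings listed in the example later in this section (Dry Powder minus Powder-in-Oil).

```python
import math


def paired_summary(x1, x2):
    """Differences d_i = x1_i - x2_i, their mean, and their sample SD (n - 1 divisor)."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    dbar = sum(d) / n
    s_d = math.sqrt(sum((di - dbar) ** 2 for di in d) / (n - 1))
    return d, dbar, s_d


dry = [2, 2.8, 1.3, 1.8, 1.9, 2.8, 2, 1.5, 1.9, 2.9,
       2.9, 2.3, 2.3, 3.6, 2.2, 2.1, 2.5, 2.4, 3.1, 2]
oil = [1.9, 2.4, 1.5, 1.8, 1.8, 2.4, 2.2, 1.5, 1.7, 2.8,
       2.7, 1.5, 2.5, 3.2, 2.1, 1.9, 2.6, 2, 2.9, 1.9]
d, dbar, s_d = paired_summary(dry, oil)
print(round(dbar, 2), round(s_d, 3))
```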
Test Concerning $\mu_D$
• Null Hypothesis: $H_0: \mu_D = 0$ (almost always 0)
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_D > 0$
– 2-Sided: $H_A: \mu_D \neq 0$
• Test Statistic: $t_{obs} = \dfrac{\bar{d}}{s_d/\sqrt{n}}$
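This is a small sketch of the paired test statistic. `paired_t` is an illustrative name. Its `mu0` argument (default 0) is an assumption that generalizes the slide's $H_0: \mu_D = 0$. With the antiperspirant summary ($\bar{d} = 0.15$, $s_d = 0.248$, $n = 20$) it should give $t_{obs} \approx 2.70$ on 19 df.

```python
import math


def paired_t(dbar, s_d, n, mu0=0.0):
    """Paired t statistic for H0: mu_D = mu0; returns (t_obs, df) with df = n - 1."""
    se = s_d / math.sqrt(n)  # standard error of the mean difference
    return (dbar - mu0) / se, n - 1


t_obs, df = paired_t(0.15, 0.248, 20)
print(round(t_obs, 2), df)
```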
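Test Concerning $\mu_D$
• Decision Rule: (Based on t-distribution with $\nu = n - 1$ df)
– 1-sided alternative
• If $t_{obs} \geq t_{\alpha,\nu}$ ==> Conclude $\mu_D > 0$
• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_D = 0$
– 2-sided alternative
• If $t_{obs} \geq t_{\alpha/2,\nu}$ ==> Conclude $\mu_D > 0$
• If $t_{obs} \leq -t_{\alpha/2,\nu}$ ==> Conclude $\mu_D < 0$
• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_D = 0$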
Confidence Interval for $\mu_D$
$\bar{d} \pm t_{\alpha/2,\, n-1}\dfrac{s_d}{\sqrt{n}}$
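This is a sketch of the interval in Python. As in the pooled case, the critical value $t_{\alpha/2, n-1}$ is passed in from a table, because the standard library lacks t quantiles. `paired_ci` is a hypothetical name, and 2.093 is $t_{.025,19}$ as used in the antiperspirant example.

```python
import math


def paired_ci(dbar, s_d, n, t_crit):
    """CI for mu_D: dbar +/- t_crit * s_d / sqrt(n), with t_crit = t_{alpha/2, n-1}."""
    margin = t_crit * s_d / math.sqrt(n)
    return dbar - margin, dbar + margin


lo, hi = paired_ci(0.15, 0.248, 20, t_crit=2.093)
print(f"({lo:.3f}, {hi:.3f})")
```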
Example Antiperspirant Formulations
• Subjects - 20 Volunteers’ armpits
• Treatments - Dry Powder vs Powder-in-Oil
• Measurements - Average Rating by Judges
– Higher scores imply more disagreeable odor
• Summary Statistics (Raw Data on next slide): $\bar{d} = 0.15$, $s_d = 0.248$, $n = 20$
Source: E. Jungermann (1974)
Example Antiperspirant Formulations
Subject  Dry Powder  Powder-in-Oil  Difference
1        2           1.9            0.1
2        2.8         2.4            0.4
3        1.3         1.5            -0.2
4        1.8         1.8            0
5        1.9         1.8            0.1
6        2.8         2.4            0.4
7        2           2.2            -0.2
8        1.5         1.5            0
9        1.9         1.7            0.2
10       2.9         2.8            0.1
11       2.9         2.7            0.2
12       2.3         1.5            0.8
13       2.3         2.5            -0.2
14       3.6         3.2            0.4
15       2.2         2.1            0.1
16       2.1         1.9            0.2
17       2.5         2.6            -0.1
18       2.4         2              0.4
19       3.1         2.9            0.2
20       2           1.9            0.1
Differences: Mean 0.15, Std Dev 0.248151058
Example Antiperspirant Formulations
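• $H_0: \mu_D = 0$ (No difference in formulation effects)
• $H_A: \mu_D \neq 0$ (Formulation effects differ)
• TS: $t_{obs} = \dfrac{\bar{d}}{s_d/\sqrt{n}} = \dfrac{0.15}{0.248/\sqrt{20}} = \dfrac{0.15}{.0555} = 2.70$
• RR: $|t_{obs}| \geq t_{.025,\, n-1} = t_{.025,19} = 2.093$
• P-value $= 2P(t \geq 2.70)$
• 95% CI for $\mu_D$: $\bar{d} \pm t_{.025,\, n-1}\dfrac{s_d}{\sqrt{n}} \equiv 0.15 \pm 2.093(.0555) \equiv 0.15 \pm 0.116 \equiv (0.034, 0.266)$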
Evidence that scores are higher (more unpleasant) for the dry powder (formulation 1)
Small-Sample Test For Nonnormal Data
• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test)
– Compute Differences $d_i$ (as in the paired t-test) and obtain their absolute values (ignoring 0s)
– Rank the observations by $|d_i|$ (smallest = 1), averaging ranks for ties
– Compute $W^+$ and $W^-$, the rank sums for the positive and negative differences, respectively
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $W^- \leq T_0$
– 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(W^+, W^-) \leq T_0$
– Values of $T_0$ are given in many texts for various sample sizes and significance levels. P-values are printed by statistical software packages.
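The signed-rank procedure can be sketched in standard-library Python as shown below. `signed_rank_sums` is an illustrative helper, not a library function. Differences are rounded to two decimals before ranking. This is an assumption, made so that floating-point noise does not split ties between equal $|d_i|$. On the caffeine data it should reproduce $W^+ = 28$ and $W^- = 17$.

```python
def signed_rank_sums(diffs):
    """Return (W+, W-): drop zero differences, rank |d| (average ranks for ties)."""
    nonzero = [d for d in diffs if d != 0]
    abs_vals = [abs(d) for d in nonzero]
    order = sorted(range(len(abs_vals)), key=lambda i: abs_vals[i])
    ranks = [0.0] * len(abs_vals)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences
        while j + 1 < len(order) and abs_vals[order[j + 1]] == abs_vals[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return w_plus, w_minus


# Caffeine example: minutes to exhaustion, 13mg vs 5mg, cyclists 1-9
mg13 = [37.55, 59.30, 79.12, 58.33, 70.54, 69.47, 46.48, 66.35, 36.20]
mg5 = [42.47, 85.15, 63.20, 52.10, 66.20, 73.25, 44.50, 57.17, 35.05]
# Round to 2 decimals (assumption) so float noise cannot split true ties
diffs = [round(a - b, 2) for a, b in zip(mg13, mg5)]
w_plus, w_minus = signed_rank_sums(diffs)
print(w_plus, w_minus)
```

As a check, $W^+ + W^-$ should equal $n(n+1)/2$, which is 45 for the 9 nonzero differences here.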
Normal Approximation (Supp PP18-21)
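• Under the null hypothesis of no difference in the two groups, with $n$ nonzero differences:
$\mu_W = \dfrac{n(n+1)}{4}, \qquad \sigma_W = \sqrt{\dfrac{n(n+1)(2n+1)}{24}}$
• A z-statistic can be computed and P-value (approximate) can be obtained from Z-distribution:
$z_{obs} = \dfrac{W - \mu_W}{\sigma_W} = \dfrac{W - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$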
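This is a sketch of the large-sample approximation. It uses the same formulas and applies no tie correction. `signed_rank_z` is a hypothetical name. Applied to $W = W^+ = 28$ with $n = 9$, it should give $\mu_W = 22.5$, $\sigma_W \approx 8.44$, $z \approx 0.65$ and a two-sided P-value near the .515 reported by SPSS.

```python
import math
from statistics import NormalDist


def signed_rank_z(w, n):
    """Normal approximation for a signed-rank sum W with n nonzero differences (no tie correction)."""
    mu_w = n * (n + 1) / 4
    sigma_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mu_w) / sigma_w
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return mu_w, sigma_w, z, p_two_sided


mu_w, sigma_w, z, p = signed_rank_z(28, 9)
print(mu_w, round(sigma_w, 2), round(z, 2), round(p, 3))
```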
Example - Caffeine and Endurance
• Step 1: Take absolute values of differences (eliminating 0s)
• Step 2: Rank the absolute differences (averaging ranks for ties)
• Step 3: Sum Ranks for positive and negative true differences
• Subjects: 9 well-trained cyclists
• Treatments: 13mg Caffeine (Condition 1) vs 5mg (Condition 2)
• Measurements: Minutes Until Exhaustion
• This is a subset of a larger study (we’ll see it later)
Source: Pasman, et al (1995)
Example - Caffeine and Endurance
Cyclist  mg13   mg5    mg13-mg5
1        37.55  42.47  -4.92
2        59.30  85.15  -25.85
3        79.12  63.20  15.92
4        58.33  52.10  6.23
5        70.54  66.20  4.34
6        69.47  73.25  -3.78
7        46.48  44.50  1.98
8        66.35  57.17  9.18
9        36.20  35.05  1.15
Original Data
Example - Caffeine and Endurance
Cyclist  mg13   mg5    mg13-mg5  abs(diff)
1        37.55  42.47  -4.92     4.92
2        59.30  85.15  -25.85    25.85
3        79.12  63.20  15.92     15.92
4        58.33  52.10  6.23      6.23
5        70.54  66.20  4.34      4.34
6        69.47  73.25  -3.78     3.78
7        46.48  44.50  1.98      1.98
8        66.35  57.17  9.18      9.18
9        36.20  35.05  1.15      1.15
Absolute Differences
Cyclist  mg13   mg5    mg13-mg5  abs(diff)  rank
9        36.20  35.05  1.15      1.15       1
7        46.48  44.50  1.98      1.98       2
6        69.47  73.25  -3.78     3.78       3
5        70.54  66.20  4.34      4.34       4
1        37.55  42.47  -4.92     4.92       5
4        58.33  52.10  6.23      6.23       6
8        66.35  57.17  9.18      9.18       7
3        79.12  63.20  15.92     15.92      8
2        59.30  85.15  -25.85    25.85      9
Ranked Absolute Differences
W+ = 1+2+4+6+7+8=28
W- = 3+5+9=17
Example - Caffeine and Endurance
Under the null hypothesis of no difference in the two groups:
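$\mu_W = \dfrac{n(n+1)}{4} = \dfrac{9(9+1)}{4} = \dfrac{90}{4} = 22.5$
$\sigma_W = \sqrt{\dfrac{n(n+1)(2n+1)}{24}} = \sqrt{\dfrac{9(9+1)(18+1)}{24}} = \sqrt{\dfrac{1710}{24}} = 8.44$
$z_{obs} = \dfrac{W - \mu_W}{\sigma_W} = \dfrac{28 - 22.5}{8.44} = \dfrac{5.5}{8.44} = 0.65$ (using $W = W^+ = 28$)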
There is no evidence that endurance times differ for the 2 doses (we will see later that both are higher than no dose)
SPSS Output
Ranks (MG5 - MG13)
                 N      Mean Rank   Sum of Ranks
Negative Ranks   6(a)   4.67        28.00
Positive Ranks   3(b)   5.67        17.00
Ties             0(c)
Total            9
a. MG5 < MG13
b. MG5 > MG13
c. MG5 = MG13
Test Statistics(b) (MG5 - MG13)
Z                        -.652(a)
Asymp. Sig. (2-tailed)   .515
a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test
Note that SPSS is taking MG5-MG13, while we used MG13-MG5
Data Sources
• Pauling, L. (1971). “The Significance of the Evidence about Ascorbic Acid and the Common Cold,” Proceedings of the National Academy of Sciences of the United States of America, 11:2678-2681
• Gould, M.C. and F.A.C. Perrin (1916). “A Comparison of the Factors Involved in the Maze Learning of Human Adults and Children,” Journal of Experimental Psychology, 1:122-???
• Jungermann, E. (1974). “Antiperspirants: New Trends in Formulation and Testing Technology,” Journal of the Society of Cosmetic Chemists, 25:621-638
• Pasman, W.J., M.A. van Baak, A.E. Jeukendrup, and A. de Haan (1995). “The Effect of Different Dosages of Caffeine on Endurance Performance Time,” International Journal of Sports Medicine, 16:225-230