unit 8: inference for two samples - university of...

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 1

Statistics 571: Statistical MethodsRamón V. León

Unit 8: Inference for Two Samples


Introductory Remarks• A majority of statistical studies whether experimental or

observational are comparative• Simplest type of comparative study compares two

populations• Two principal designs for comparative studies

– Using independent samples– Using matched pairs

• Graphical methods for informal comparisons• Formal comparisons of means and variances of normal

populations– Confidence intervals– Hypothesis tests


Independent Samples DesignExample: Compare Control Group to Treatment Group •Children are randomly assigned to two groups. One group is taught using an old teaching method and the other group using a new teaching method. To avoid one possible source of confounding both methods are taught by the same teacher.

1

2

1 2

1 2

:Sample 1: , ,...,

Sample 2: , ,...,n

n

x x x

y y y

Independent samples design

•Independent sample design relies on random assignment to make the two groups equal (on the average) on all attributes except for the treatment used (treatment factor).

•The two samples are independent

Possibly different numbers


Matched Pairs DesignExample: Comparison of Two Methods of Teaching Spelling.•Pretest the children and match pairs of children with similar test scores.

•Randomly assign one child of each pair to the control group andthe other to the treatment group

•The pretest score is called a blocking factor

yny2y1Sample 2:xnx2x1Sample 1:n…21Pair:

The x’s are the changes in the test scores of the children taught by the old method.The y’s are the changes in the test scores of the children taught by the new method.

The same number


Pros and Cons of Each Design• In the independent sample design the two groups may not be

quite equal on some attribute, especially if the sample sizes are small, because randomization assures equality only on the average. Does one group have a higher pretest score than the other? Difference in the groups could be the result of this difference not the treatment factor.

• The matched pair design enables more precise comparisons to be made between treatment groups because of smaller experimental error

• However, it can be difficult to form matched pairs, especially when dealing with human subjects

• The two types of designs can be used in observational studies where the two groups are not formed by randomization. Then a cause and effect relationship cannot be established.


Hospitalization Cost: Independent Sample Design


Graphical Methods for Comparing Independent Samples: Side-by-Side Box Plots

Outliers present and variation seems different


Side-by-Side Box Plots:Log Costs

•No outliers for LogeCost•Assumption of equal variance appears reasonable for the two groups in the log scale


Another Graphical

Methods for Comparing

Two Independent

Samples

( ) ( )

Plot of the order statistics ordered pairs ( , )

which are the i quantiles

n+1of the respective samples

i ix y

Plot suggests that treatment group costs are less than control group costs. But is this true?

Book discusses how to prepare this graph when the two samples are not of the same size.

Q-Q Plot


Graphical Displays of Data from Matched Pairs

• Plot the pairs (xi, yi) in a scatter plot. Using the 45° line as a reference, one can judge whether the two sets of values are similar or whether one set tends to be larger than the other

• Plots of the differences or the ratios of the pairs may prove to be useful

• Will show a JMP plot that is very informative• A Q-Q plot is meaningless for paired data because the

same quantiles based on the order observations do not, in general, come from the same pair.


Comparing Means of Two Populations:Independent Samples Design

(Large Samples Case)

1 21 2 1 2

12 2

2 1 2

Suppose that the observations , ,..., and , ,...,

are independent random samples from two populations with means

and and variances and . Both means and variancesare assumed to

n nx x x y y y

µ

µ σ σ

1

2 1 2 1

2

be unknown. The goal is to compare and in terms of their difference - . We assume that and are large (say 30).

nn

µµ µ µ

>


Comparing Means of Two Populations:Independent Samples Design

1 22 21 2

1 2

1 22 21 1 2 2

1 2

( ) ( ) ( )

( ) ( ) ( )

Therefore the standarized r.v.( ) has mean = 0 and variance = 1

If and are large, then Z is approximately (0,1) byth

E X Y E X E Y

Var X Y Var X Var Yn n

X YZn n

n n N

µ µ

σ σ

µ µσ σ

− = − = −

− = + = +

− − −=

+

e Central Limit Theorem though we did not assume the samples came from normal populations. (We also use fact that the difference of independent normal r.v.'s is also normal.)


Large Sample (Approximate) 100(1-α)% CI for µ1−µ2

( ) ( )2 2 2 21 2 1 2

2 1 2 21 2 1 2

2 2Note has been substituted for because samples arelarge, i.e., bigger than 30.

i i

s s s sx y z x y zn n n n

s

α αµ µ

σ

− − + ≤ − ≤ − + +

Example 8.2: Two large calculus classes of 150 students taught by two different professors (99% CI for difference of means)

2 2 2 21 2

.0051 2

(15) (12)(75 70) 2.576 [0.960,9.040]150 150

s sx y zn n

− ± + = − ± + =

Note interval does not contain zero so there is a α =.01 level significant difference between the two professors


Large Sample (Approximate) Test of Hypothesis

0 1 2 0 1 1 2 0 0: vs. : (Typically 0)H Hµ µ δ µ µ δ δ− = − ≠ =

02 21 1 2 2

( )Test statistics: x yzs n s n

δ− −=

+

Example 8.2: Final exam grades for large calculus classes of 150 students taught by two different professors. (Same test used on both classes.)

02 2 2 21 1 2 2

( ) (75 70) 0 3.188(15) 150 (12) 150

[ 3.188] 2[1 (3.188)] .0014Result is significant at the .01 level.

x yzs n s n

P value P z

δ

α

− − − −= = =

+ +

− = ≥ = − Φ =

=


Inference for Small Samples2 21 2Case 1: Variances and assumed equal.σ σ

2 22 22 1 1 2 2

1 2 1 22 2 2

1 2

Pooled estimate of the common variance:( ) ( )( 1) ( 1)

( 1) ( 1) 2Note: ( ) / 2 if sample sizes are equal

i iX X Y Yn S n SSn n n n

S S S

− + −− + −= =

− + − + −

= +

∑ ∑

1 21 2

1 2

( ) has -distribution with 2 d.f.1 1

X YT t n nS n n

µ µ− − −= + −

+

Assumption of normal populations is important since we cannot invoke the CLT


Inference for Small Sample: Confidence Intervals and Hypothesis Tests

1 2 1 22, 2 1 2 2, 21 2 1 2

Two-sided 100(1- )% CI is given by:

1 1 1 1n n n nx y t s x y t s

n n n nα α

α

µ µ+ − + −− − + ≤ − ≤ − + +

1 2

0 1 2 0 1 1 2 0

0

1 2

0 2, 2

Test of Hypothesis: : vs. :

Test statistics: 1 1

Reject if n n

H Hx yts

n n

H t t α

µ µ δ µ µ δδ

+ −

− = − ≠− −

=+

>

2 21 2Case 1: Variances and assumed equal.σ σ


JMP: Checking Assumptions for Log Costs


Normal Plots by Group

After log transformation, control group data seems normal


Normal Plots by Group

After log transformation, treatment group data seems normal


Testing for Normality

Control:

Treatment:


Hospitalization Cost Example•Work with log cost since then the normal assumption holds•Assumption of equal variance is also reasonable as seen bythe side-by-side box plots of loge cost (Slide 8)

2 2

Pooled estimate of the common variance:

29(1.167) 29(0.905) 1.045 with 58 d.f.58

s += =

58,.0251 2

95% CI:

1 1 1 1[ (8.251 8.084) 2.002 1.045 [ 0.373,0.707]30 30

x y t sn n

− ± + = − ± × + = −

Difference in cost in logarithmic scale are not significant. Contrast this conclusion with apparent difference seen on the Q-Q plot in Figure 8.1 (Slide 9)

SE( - ) 0.2698x y =

Contains zero


Interpretation of Difference in Means on the Log Scale

Mean (log Cost) = Median (log Cost) = log (Median Cost)

Because distribution of log cost is symmetric

Because the log preserves ordering

0.373 (log ) (log ) 0.707 0.373 log( ) log( ) 0.707

0.373 log 0.707

.689 exp( 0.373) exp(0.707) 2.028

C T

C T

C

T

C

T

Mean Cost Mean CostMedian Cost Median Cost

Median CostMedian Cost

Median CostMedian Cost

− ≤ − ≤− ≤ − ≤

− ≤ ≤

= − ≤ ≤ =

95% confidence interval for the ratio of median costs

This Interpretation is not in your textbook


JMP Analysis of Hospitalization Cost Example


Means Diamonds and x-Axis ProportionalA means diamond illustrates a sample mean and 95% confidence interval, as shown by the schematic in "Means Diamonds and X-Proportional Options". The line across each diamond represents the group mean. The vertical span of each diamond represents the 95%confidence interval for each group. Overlap marks are drawn above and below the group mean. For groups with equal sample sizes, overlapping marks indicate that the two group means are not significantly different at the 95% confidence level. If the X-Axis Proportional display option is in effect, the horizontal extent of each group along the x-axis (the horizontal size of the diamond) is proportional to the sample size of each level of the xvariable. It follows that the narrower diamonds are usually the taller ones because fewer data points yield a less precise estimate of the group mean. The confidence interval computation assumes that variances are equal across observations. Therefore, the height of the confidence interval (diamond) is proportional to the reciprocal of the square root of the number of observations in the group.


Inference for Small Samples2 21 2Case 2: Variances and unequal.σ σ

1 22 2

1 2

1 2

( ) does not have a Student - distributionX YT tS Sn n

µ µ− − −=

+

T is not a pivotal quantity since it depends on ratio of sigmas.

However, when n1 and n2 are large T has an approximate N(0,1) distribution



( ) ( )

1 22 2

1 2

1 2

21 2

2 21 1 2 2

2 22 21 2

1 21 2

For small samples( ) has an approximately -distribution

( )with degrees of freedom( 1) ( 1)

where SEM( ) and SEM( )

X YT tS Sn n

w ww n w n

s sw x w yn n

µ µ

ν

− − −=

+

+=

− + −

= = = =

Note: d.f. are estimated from the data and are not a function of the samples sizes alone

Note: ν is not usually an integer but is rounded down to the nearest integer



1 2

2 2 2 21 2 1 2

, 2 1 2 , 21 2 1 2

Approximate 100(1- )% two-sided CI for :

s s s sx y t x y tn n n nν α ν α

α µ µ

µ µ

−

− − + ≤ − ≤ − − +

0 1 2 0 1 1 2 0

02 21 1 1 1

0 , 2

Test statistics for : vs. :

is

Reject if .

H Hx yt

s n s n

H t tν α

µ µ δ µ µ δδ

− = − ≠− −

=+

>


Hospitalization Costs: Inference Using Separate Variances

2 2 2 2 221 2 1 2

1 21 2

2

2 2 2Since we have 2

where is the pooled sample variance. Hence,

s s s s sn n n sn n n n n

s

+ = = + = × = =

0 0 02 2 21 1 1 1

0.6191 12

That is, with this is the same as the with pooled variance

x y x y x yts n ns n s n s n

t t

δ δ δ− − − − − −= = = =

++

equal sample sizes

The textbook calculates the d.f. to be ν = 54.6 = 54Recall that assuming equal variance the d.f. were n1+ n2 - 2 = 58

Since the t distributions with 54 and 58 d.f. do not differ by much the results not assuming equal variances do not differ by much from those gotten assuming equal variances.

Recall that we are working with log cost


Hospitalization Costs: Inference Using Separate Variances – Conclusions

• Result with equal variances and unequal variance do not differ by much because– The two sample variances are not too different– The two sample sizes are equal

• In other cases there can be considerable loss in d.f. as a result of using separate variance estimates

• If there is any doubt about the equality of variances do not usethe pooled variance method since this method is not robustagainst the violation of the assumption of equal variances. This is particularly so when sample sizes are unequal.


Testing for the Equality of VariancesSection 8.4 covers the classical F test for the equality of two variances and associated confidence intervals. However, this method is not robust against departures from normality. For example, p-values can be off by a factor of 10 if the distributions have shorter or longer tails than the normal.A robust alternative is Levene’s Test. His test applies the two-sample t-test to the absolute value of the difference of each observation and the group mean

1 1 1

2 2 2

| |, 1, 2, ,| |, 1, 2, ,

i

i

Y Y i nY Y i n

− = ⋅⋅⋅

− = ⋅⋅⋅This method works well even though these absolute deviations are not independent.

In the Brown-Forsythe test the response is the absolute value of the difference of each observation and the group median


JMP Analysis Without Assuming Equal Variances


JMP 4 Analysis Without Assuming Equal Variances

Note that (.6181)2 = 0.3820

Classical Test for equality of several variances is not robust against departure from the normality assumption

These tests can be used when there aremore than two populations

ν = 54.6 = 54


Independent Sample Design: Sample Size Determination Assuming Equal Variances

0 1 2 1 1 22

21 2

: 0 vs. : 0

( )2

H H

z zn n n α β

µ µ µ µ

σδ

− = − ≠

+ = = =

Hospital cost example: Assume σ = 1 in loge scale costs (consistent with data)Assume we want to detect change of a factor of two in the median cost with at least .90 power. The alpha level is .05.

21 2

1

2 .025 .10

2

Then (log ) (log ) log log 2 0.693

1, 1.960, and 1.282

1.0(1.960 1.282)2 43.75 44.693

median CostE Y E Ymedian Cost

z z z z

n

α β

δ

σ

= − = =

= = = = =

+ = =

Because we assume a known variance this n is a slight underestimate of sample size

Smallest difference of practical importance that we want to detect


Sample Size Determination With JMP


Sample Size Determination With JMP: Method 1

.693/2= δ/2


Sample Size Determination With JMP

Sample sizehere is 2n


Sample Size Determination With JMP:Exploring a Range of Standard Deviations


Sample Size Determination With JMPExploring a Range of Standard Deviations


Sample Size Determination With JMP: A Simpler Way


Sample Size Determination With JMP: A Simpler Way

Extra Paramsis only for multi-factor designs. Leave this field zero in simple cases.

2n


Matched Pairs Design

Example: Clinical study to compare two methods of measuringcardiac output discussed in Example 4.11. Method A is invasivewhile Method B is noninvasive. To control for difference betweenpersons, both methods are used on each person participating in the study. The purpose is to compare the two methods in terms ofthe mean difference.


Cardiac Output Data


Cardiac Output JMP FileYou can downloadthis file in the coursehome page followingthe Textbook Data Setslink.

Methods used for matched pair are the one-sample methods already learned applied to the A-B data.


Cardiac Output JMP Analysis


Testing Equality of Means

Test that the meandifference is zero

Example 8.6 in the textbook has a slight rounding error in the calculation of the mean difference

Distribution JMP platform:


Confidence Interval for Difference of Means


JMP’s Matched Pairs Platform: Another Way to Compare Matched Pairs


JMP’s Matched Pairs Platform

X axis measures how different the pairs are. The more scatter in this axis the more likely the results are to generalize to other persons outside the sample.

Significant difference between the methods because significance bands do not contain 0.


Statistical Justification of Matched Pairs Design


Sample Size Determination2

22

( ) (One-Sided Test)

( ) (Two-Sided Test)

D

D

z zn

z zn

α β

α β

σδ

σδ

+ =

+ =

•One needs a planning value for σD

•This formulas come from the one-sample formulas applied to the differences


Comparing Variances of Two Populations

•Application arises when comparing instrument precision oruniformities of products. Also when checking assumptions.

•The method discussed in the book (F test) is applicable only under the assumption of normality of the data. The F test is highly sensitive to even modest departures from normality.

• In case of nonnormal data there are nonparametric and other robust methods for comparing data dispersion. We recallsome supported by JMP.


Example 8.7 (Hospitalization Costs: Test of Equality of Variances)


Comparing Variances with JMP

These tests are alternatives to the F test used inExample 8.7 on Page 288. These tests can be usedwith more than two samples. Levene’s test is relatively robust against violations of the normalityassumption . See JMP’s help for details.


Comparing Variances of Two Populations: F Test

1

1

21 2 1 1

21 2 1 1

Independent sample design:Sample 1: , ,..., is a random sample from ( , )

Sample 2: , ,..., is a random sample from ( , )n

n

x x x N

y y y N

µ σ

µ σ2 2

1 11 22 2

2 2

has an F distribution 1 and 1 d.f. respectivelySF n nS

σσ

= − −


An Important Industrial Application:Example 8.8

Do the two labs have equal measurement precision?


Example 8.8: Calculation Details

d.f. = (12 - 1) + (12 - 1) + (12 - 1) = 33

d.f. = (6 - 1) + (6 - 1) + (6 - 1) = 15, ,

, ,1

Used the fact that1

m nn m

ff α

α−

=

unit 8: inference for two samples - university of...

Documents