unit 8: inference for two samples - university of...

59
7/14/2004 Unit 8 - Stat 571 - Ramón V. León 1 Statistics 571: Statistical Methods Ramón V. León Unit 8: Inference for Two Samples

Upload: others

Post on 19-Feb-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 1

Statistics 571: Statistical MethodsRamón V. León

Unit 8: Inference for Two Samples

Page 2: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 2

Introductory Remarks• A majority of statistical studies whether experimental or

observational are comparative• Simplest type of comparative study compares two

populations• Two principal designs for comparative studies

– Using independent samples– Using matched pairs

• Graphical methods for informal comparisons• Formal comparisons of means and variances of normal

populations– Confidence intervals– Hypothesis tests

Page 3: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 3

Independent Samples DesignExample: Compare Control Group to Treatment Group •Children are randomly assigned to two groups. One group is taught using an old teaching method and the other group using a new teaching method. To avoid one possible source of confounding both methods are taught by the same teacher.

1

2

1 2

1 2

:Sample 1: , ,...,

Sample 2: , ,...,n

n

x x x

y y y

Independent samples design

•Independent sample design relies on random assignment to make the two groups equal (on the average) on all attributes except for the treatment used (treatment factor).

•The two samples are independent

Possibly different numbers

Page 4: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 4

Matched Pairs DesignExample: Comparison of Two Methods of Teaching Spelling.•Pretest the children and match pairs of children with similar test scores.

•Randomly assign one child of each pair to the control group andthe other to the treatment group

•The pretest score is called a blocking factor

yny2y1Sample 2:xnx2x1Sample 1:n…21Pair:

The x’s are the changes in the test scores of the children taught by the old method.The y’s are the changes in the test scores of the children taught by the new method.

The same number

Page 5: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 5

Pros and Cons of Each Design• In the independent sample design the two groups may not be

quite equal on some attribute, especially if the sample sizes are small, because randomization assures equality only on the average. Does one group have a higher pretest score than the other? Difference in the groups could be the result of this difference not the treatment factor.

• The matched pair design enables more precise comparisons to be made between treatment groups because of smaller experimental error

• However, it can be difficult to form matched pairs, especially when dealing with human subjects

• The two types of designs can be used in observational studies where the two groups are not formed by randomization. Then a cause and effect relationship cannot be established.

Page 6: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 6

Hospitalization Cost: Independent Sample Design

Page 7: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 7

Graphical Methods for Comparing Independent Samples: Side-by-Side Box Plots

Outliers present and variation seems different

Page 8: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 8

Side-by-Side Box Plots:Log Costs

•No outliers for LogeCost•Assumption of equal variance appears reasonable for the two groups in the log scale

Page 9: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 9

Another Graphical

Methods for Comparing

Two Independent

Samples

( ) ( )

Plot of the order statistics ordered pairs ( , )

which are the i quantiles

n+1of the respective samples

i ix y

Plot suggests that treatment group costs are less than control group costs. But is this true?

Book discusses how to prepare this graph when the two samples are not of the same size.

Q-Q Plot

Page 10: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10

Graphical Displays of Data from Matched Pairs

• Plot the pairs (xi, yi) in a scatter plot. Using the 45° line as a reference, one can judge whether the two sets of values are similar or whether one set tends to be larger than the other

• Plots of the differences or the ratios of the pairs may prove to be useful

• Will show a JMP plot that is very informative• A Q-Q plot is meaningless for paired data because the

same quantiles based on the order observations do not, in general, come from the same pair.

Page 11: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 11

Comparing Means of Two Populations:Independent Samples Design

(Large Samples Case)

1 21 2 1 2

12 2

2 1 2

Suppose that the observations , ,..., and , ,...,

are independent random samples from two populations with means

and and variances and . Both means and variancesare assumed to

n nx x x y y y

µ

µ σ σ

1

2 1 2 1

2

be unknown. The goal is to compare and in terms of their difference - . We assume that and are large (say 30).

nn

µµ µ µ

>

Page 12: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 12

Comparing Means of Two Populations:Independent Samples Design

1 22 21 2

1 2

1 22 21 1 2 2

1 2

( ) ( ) ( )

( ) ( ) ( )

Therefore the standarized r.v.( ) has mean = 0 and variance = 1

If and are large, then Z is approximately (0,1) byth

E X Y E X E Y

Var X Y Var X Var Yn n

X YZn n

n n N

µ µ

σ σ

µ µσ σ

− = − = −

− = + = +

− − −=

+

e Central Limit Theorem though we did not assume the samples came from normal populations. (We also use fact that the difference of independent normal r.v.'s is also normal.)

Page 13: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 13

Large Sample (Approximate) 100(1-α)% CI for µ1−µ2

( ) ( )2 2 2 21 2 1 2

2 1 2 21 2 1 2

2 2Note has been substituted for because samples arelarge, i.e., bigger than 30.

i i

s s s sx y z x y zn n n n

s

α αµ µ

σ

− − + ≤ − ≤ − + +

Example 8.2: Two large calculus classes of 150 students taught by two different professors (99% CI for difference of means)

2 2 2 21 2

.0051 2

(15) (12)(75 70) 2.576 [0.960,9.040]150 150

s sx y zn n

− ± + = − ± + =

Note interval does not contain zero so there is a α =.01 level significant difference between the two professors

Page 14: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 14

Large Sample (Approximate) Test of Hypothesis

0 1 2 0 1 1 2 0 0: vs. : (Typically 0)H Hµ µ δ µ µ δ δ− = − ≠ =

02 21 1 2 2

( )Test statistics: x yzs n s n

δ− −=

+

Example 8.2: Final exam grades for large calculus classes of 150 students taught by two different professors. (Same test used on both classes.)

02 2 2 21 1 2 2

( ) (75 70) 0 3.188(15) 150 (12) 150

[ 3.188] 2[1 (3.188)] .0014Result is significant at the .01 level.

x yzs n s n

P value P z

δ

α

− − − −= = =

+ +

− = ≥ = − Φ =

=

Page 15: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 15

Inference for Small Samples2 21 2Case 1: Variances and assumed equal.σ σ

2 22 22 1 1 2 2

1 2 1 22 2 2

1 2

Pooled estimate of the common variance:( ) ( )( 1) ( 1)

( 1) ( 1) 2Note: ( ) / 2 if sample sizes are equal

i iX X Y Yn S n SSn n n n

S S S

− + −− + −= =

− + − + −

= +

∑ ∑

1 21 2

1 2

( ) has -distribution with 2 d.f.1 1

X YT t n nS n n

µ µ− − −= + −

+

Assumption of normal populations is important since we cannot invoke the CLT

Page 16: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 16

Inference for Small Sample: Confidence Intervals and Hypothesis Tests

1 2 1 22, 2 1 2 2, 21 2 1 2

Two-sided 100(1- )% CI is given by:

1 1 1 1n n n nx y t s x y t s

n n n nα α

α

µ µ+ − + −− − + ≤ − ≤ − + +

1 2

0 1 2 0 1 1 2 0

0

1 2

0 2, 2

Test of Hypothesis: : vs. :

Test statistics: 1 1

Reject if n n

H Hx yts

n n

H t t α

µ µ δ µ µ δδ

+ −

− = − ≠− −

=+

>

2 21 2Case 1: Variances and assumed equal.σ σ

Page 17: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 17

JMP: Checking Assumptions for Log Costs

Page 18: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 18

Normal Plots by Group

After log transformation, control group data seems normal

Page 19: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 19

Normal Plots by Group

After log transformation, treatment group data seems normal

Page 20: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 20

Testing for Normality

Control:

Treatment:

Page 21: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 21

Hospitalization Cost Example•Work with log cost since then the normal assumption holds•Assumption of equal variance is also reasonable as seen bythe side-by-side box plots of loge cost (Slide 8)

2 2

Pooled estimate of the common variance:

29(1.167) 29(0.905) 1.045 with 58 d.f.58

s += =

58,.0251 2

95% CI:

1 1 1 1[ (8.251 8.084) 2.002 1.045 [ 0.373,0.707]30 30

x y t sn n

− ± + = − ± × + = −

Difference in cost in logarithmic scale are not significant. Contrast this conclusion with apparent difference seen on the Q-Q plot in Figure 8.1 (Slide 9)

SE( - ) 0.2698x y =

Contains zero

Page 22: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 22

Interpretation of Difference in Means on the Log Scale

Mean (log Cost) = Median (log Cost) = log (Median Cost)

Because distribution of log cost is symmetric

Because the log preserves ordering

0.373 (log ) (log ) 0.707 0.373 log( ) log( ) 0.707

0.373 log 0.707

.689 exp( 0.373) exp(0.707) 2.028

C T

C T

C

T

C

T

Mean Cost Mean CostMedian Cost Median Cost

Median CostMedian Cost

Median CostMedian Cost

− ≤ − ≤− ≤ − ≤

− ≤ ≤

= − ≤ ≤ =

95% confidence interval for the ratio of median costs

This Interpretation is not in your textbook

Page 23: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 23

JMP Analysis of Hospitalization Cost Example

Page 24: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 24

JMP Analysis of Hospitalization Cost Example

Page 25: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 25

JMP Analysis of Hospitalization Cost Example

Page 26: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 26

Means Diamonds and x-Axis ProportionalA means diamond illustrates a sample mean and 95% confidence interval, as shown by the schematic in "Means Diamonds and X-Proportional Options". The line across each diamond represents the group mean. The vertical span of each diamond represents the 95%confidence interval for each group. Overlap marks are drawn above and below the group mean. For groups with equal sample sizes, overlapping marks indicate that the two group means are not significantly different at the 95% confidence level. If the X-Axis Proportional display option is in effect, the horizontal extent of each group along the x-axis (the horizontal size of the diamond) is proportional to the sample size of each level of the xvariable. It follows that the narrower diamonds are usually the taller ones because fewer data points yield a less precise estimate of the group mean. The confidence interval computation assumes that variances are equal across observations. Therefore, the height of the confidence interval (diamond) is proportional to the reciprocal of the square root of the number of observations in the group.

Page 27: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 27

JMP Analysis of Hospitalization Cost Example

Page 28: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 28

Inference for Small Samples2 21 2Case 2: Variances and unequal.σ σ

1 22 2

1 2

1 2

( ) does not have a Student - distributionX YT tS Sn n

µ µ− − −=

+

T is not a pivotal quantity since it depends on ratio of sigmas.

However, when n1 and n2 are large T has an approximate N(0,1) distribution

Page 29: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 29

Inference for Small Samples2 21 2Case 2: Variances and unequal.σ σ

( ) ( )

1 22 2

1 2

1 2

21 2

2 21 1 2 2

2 22 21 2

1 21 2

For small samples( ) has an approximately -distribution

( )with degrees of freedom( 1) ( 1)

where SEM( ) and SEM( )

X YT tS Sn n

w ww n w n

s sw x w yn n

µ µ

ν

− − −=

+

+=

− + −

= = = =

Note: d.f. are estimated from the data and are not a function of the samples sizes alone

Note: ν is not usually an integer but is rounded down to the nearest integer

Page 30: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 30

Inference for Small Samples2 21 2Case 2: Variances and unequal.σ σ

1 2

2 2 2 21 2 1 2

, 2 1 2 , 21 2 1 2

Approximate 100(1- )% two-sided CI for :

s s s sx y t x y tn n n nν α ν α

α µ µ

µ µ

− − + ≤ − ≤ − − +

0 1 2 0 1 1 2 0

02 21 1 1 1

0 , 2

Test statistics for : vs. :

is

Reject if .

H Hx yt

s n s n

H t tν α

µ µ δ µ µ δδ

− = − ≠− −

=+

>

Page 31: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 31

Hospitalization Costs: Inference Using Separate Variances

2 2 2 2 221 2 1 2

1 21 2

2

2 2 2Since we have 2

where is the pooled sample variance. Hence,

s s s s sn n n sn n n n n

s

+ = = + = × = =

0 0 02 2 21 1 1 1

0.6191 12

That is, with this is the same as the with pooled variance

x y x y x yts n ns n s n s n

t t

δ δ δ− − − − − −= = = =

++

equal sample sizes

The textbook calculates the d.f. to be ν = 54.6 = 54Recall that assuming equal variance the d.f. were n1+ n2 - 2 = 58

Since the t distributions with 54 and 58 d.f. do not differ by much the results not assuming equal variances do not differ by much from those gotten assuming equal variances.

Recall that we are working with log cost

Page 32: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 32

Hospitalization Costs: Inference Using Separate Variances – Conclusions

• Result with equal variances and unequal variance do not differ by much because– The two sample variances are not too different– The two sample sizes are equal

• In other cases there can be considerable loss in d.f. as a result of using separate variance estimates

• If there is any doubt about the equality of variances do not usethe pooled variance method since this method is not robustagainst the violation of the assumption of equal variances. This is particularly so when sample sizes are unequal.

Page 33: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 33

Testing for the Equality of VariancesSection 8.4 covers the classical F test for the equality of two variances and associated confidence intervals. However, this method is not robust against departures from normality. For example, p-values can be off by a factor of 10 if the distributions have shorter or longer tails than the normal.A robust alternative is Levene’s Test. His test applies the two-sample t-test to the absolute value of the difference of each observation and the group mean

1 1 1

2 2 2

| |, 1, 2, ,| |, 1, 2, ,

i

i

Y Y i nY Y i n

− = ⋅⋅⋅

− = ⋅⋅⋅This method works well even though these absolute deviations are not independent.

In the Brown-Forsythe test the response is the absolute value of the difference of each observation and the group median

Page 34: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 34

JMP Analysis Without Assuming Equal Variances

Page 35: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 35

JMP 4 Analysis Without Assuming Equal Variances

Note that (.6181)2 = 0.3820

Classical Test for equality of several variances is not robust against departure from the normality assumption

These tests can be used when there aremore than two populations

ν = 54.6 = 54

Page 36: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 36

Independent Sample Design: Sample Size Determination Assuming Equal Variances

0 1 2 1 1 22

21 2

: 0 vs. : 0

( )2

H H

z zn n n α β

µ µ µ µ

σδ

− = − ≠

+ = = =

Hospital cost example: Assume σ = 1 in loge scale costs (consistent with data)Assume we want to detect change of a factor of two in the median cost with at least .90 power. The alpha level is .05.

21 2

1

2 .025 .10

2

Then (log ) (log ) log log 2 0.693

1, 1.960, and 1.282

1.0(1.960 1.282)2 43.75 44.693

median CostE Y E Ymedian Cost

z z z z

n

α β

δ

σ

= − = =

= = = = =

+ = =

Because we assume a known variance this n is a slight underestimate of sample size

Smallest difference of practical importance that we want to detect

Page 37: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 37

Sample Size Determination With JMP

Page 38: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 38

Sample Size Determination With JMP: Method 1

.693/2= δ/2

Page 39: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 39

Sample Size Determination With JMP

Sample sizehere is 2n

Page 40: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 40

Sample Size Determination With JMP:Exploring a Range of Standard Deviations

Page 41: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 41

Sample Size Determination With JMPExploring a Range of Standard Deviations

Page 42: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 42

Sample Size Determination With JMP: A Simpler Way

Page 43: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 43

Sample Size Determination With JMP: A Simpler Way

Extra Paramsis only for multi-factor designs. Leave this field zero in simple cases.

2n

Page 44: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 44

Matched Pairs Design

Example: Clinical study to compare two methods of measuringcardiac output discussed in Example 4.11. Method A is invasivewhile Method B is noninvasive. To control for difference betweenpersons, both methods are used on each person participating in the study. The purpose is to compare the two methods in terms ofthe mean difference.

Page 45: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 45

Cardiac Output Data

Page 46: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 46

Cardiac Output JMP FileYou can downloadthis file in the coursehome page followingthe Textbook Data Setslink.

Methods used for matched pair are the one-sample methods already learned applied to the A-B data.

Page 47: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 47

Cardiac Output JMP Analysis

Page 48: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 48

Testing Equality of Means

Test that the meandifference is zero

Example 8.6 in the textbook has a slight rounding error in the calculation of the mean difference

Distribution JMP platform:

Page 49: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 49

Confidence Interval for Difference of Means

Page 50: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 50

JMP’s Matched Pairs Platform: Another Way to Compare Matched Pairs

Page 51: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 51

JMP’s Matched Pairs Platform

X axis measures how different the pairs are. The more scatter in this axis the more likely the results are to generalize to other persons outside the sample.

Significant difference between the methods because significance bands do not contain 0.

Page 52: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 52

Statistical Justification of Matched Pairs Design

Page 53: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 53

Sample Size Determination2

22

( ) (One-Sided Test)

( ) (Two-Sided Test)

D

D

z zn

z zn

α β

α β

σδ

σδ

+ =

+ =

•One needs a planning value for σD

•This formulas come from the one-sample formulas applied to the differences

Page 54: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 54

Comparing Variances of Two Populations

•Application arises when comparing instrument precision oruniformities of products. Also when checking assumptions.

•The method discussed in the book (F test) is applicable only under the assumption of normality of the data. The F test is highly sensitive to even modest departures from normality.

• In case of nonnormal data there are nonparametric and other robust methods for comparing data dispersion. We recallsome supported by JMP.

Page 55: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 55

Example 8.7 (Hospitalization Costs: Test of Equality of Variances)

Page 56: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 56

Comparing Variances with JMP

These tests are alternatives to the F test used inExample 8.7 on Page 288. These tests can be usedwith more than two samples. Levene’s test is relatively robust against violations of the normalityassumption . See JMP’s help for details.

Page 57: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 57

Comparing Variances of Two Populations: F Test

1

1

21 2 1 1

21 2 1 1

Independent sample design:Sample 1: , ,..., is a random sample from ( , )

Sample 2: , ,..., is a random sample from ( , )n

n

x x x N

y y y N

µ σ

µ σ2 2

1 11 22 2

2 2

has an F distribution 1 and 1 d.f. respectivelySF n nS

σσ

= − −

Page 58: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 58

An Important Industrial Application:Example 8.8

Do the two labs have equal measurement precision?

Page 59: Unit 8: Inference for Two Samples - University of Tennesseeweb.utk.edu/~leon/stat571/2004SummerPDFs/571Unit8.pdf · 7/14/2004 Unit 8 - Stat 571 - Ramón V. León 10 Graphical Displays

7/14/2004 Unit 8 - Stat 571 - Ramón V. León 59

Example 8.8: Calculation Details

d.f. = (12 - 1) + (12 - 1) + (12 - 1) = 33

d.f. = (6 - 1) + (6 - 1) + (6 - 1) = 15, ,

, ,1

Used the fact that1

m nn m

ff α

α−

=