
Page 1: Intermediate Applied Statistics  STAT 460

Intermediate Applied Statistics STAT 460

Lecture 10, 10/1/2004

Instructor: Aleksandra (Seša) Slavković, [email protected]

TA: Wang Yu, [email protected]

Page 2: Intermediate Applied Statistics  STAT 460

Failure Times Example from Sleuth, p. 171

[Figure: Boxplots of TIME by COMPOUND, for compounds 1–5; means are indicated by solid circles.]

Page 3: Intermediate Applied Statistics  STAT 460

One-way ANOVA

One-way ANOVA: TIME versus COMPOUND

Analysis of Variance for TIME
Source      DF       SS      MS     F      P
COMPOUND     4    401.3   100.3  5.02  0.002
Error       45    899.2    20.0
Total       49   1300.5

                           Individual 95% CIs For Mean
                           Based on Pooled StDev
Level    N    Mean  StDev  --+---------+---------+---------+----
1       10  10.693  4.819              (------*------)
2       10   6.050  2.915  (------*------)
3       10   8.636  3.291        (-------*------)
4       10   9.798  5.806           (------*-------)
5       10  14.706  4.863                      (------*------)
                           --+---------+---------+---------+----
Pooled StDev = 4.470         4.0       8.0      12.0      16.0
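The F statistic in the table is just the ratio of the two mean squares, and each mean square is its sum of squares divided by its degrees of freedom. A quick Python check of the arithmetic, using only the values printed in the output above:

```python
# Check the ANOVA table arithmetic from the output above.
ss_compound, df_compound = 401.3, 4
ss_error, df_error = 899.2, 45

ms_compound = ss_compound / df_compound   # mean square = SS / DF
ms_error = ss_error / df_error

f_stat = ms_compound / ms_error           # F = MS(treatment) / MS(error)

print(round(ms_compound, 1), round(ms_error, 1), round(f_stat, 2))
# → 100.3 20.0 5.02, matching the MS and F columns of the table
```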

Page 4: Intermediate Applied Statistics  STAT 460

ANOVA

So far we have talked about testing the null hypothesis

H0: μ1 = μ2 = μ3 = . . . = μk

versus the alternative hypothesis

HA: at least one pair of groups has μi ≠ μj.

Next, we want to know where the difference comes from.

Page 5: Intermediate Applied Statistics  STAT 460

Approaches to Comparisons

A. Pairwise comparisons
B. Multiple comparison with a control group
C. Multiple comparison with the “best” group
D. Linear contrasts

Each of these procedures (e.g., pairwise confidence intervals for the difference) is available, in addition to the corresponding tests, in most statistical packages.

Page 6: Intermediate Applied Statistics  STAT 460

All Pairwise Comparisons

In this approach, we don’t just want to know whether there is any significant difference; we want to know which groups are significantly different from which other groups.

The most obvious approach to testing pairwise comparisons is just to do a t-test on each pair of groups. The problem with this is sometimes called “compound uncertainty” or “capitalizing on chance.”

Note: By significantly we mean systematically or demonstrably (statistical significance), not greatly or importantly (practical significance).

Page 7: Intermediate Applied Statistics  STAT 460

[Diagram: a machine with five independent parts, each labeled “Fails 5% of time.”]

If there are five independent parts and each fails 5% of the time, then the whole machine fails 1 − (0.95)^5 ≈ 23% of the time!
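The same complement rule applies to a family of hypothesis tests; the arithmetic for the machine example is one line of Python:

```python
# If each of 5 independent parts fails 5% of the time, the chance that the
# machine fails (i.e., at least one part fails) is 1 - P(no part fails).
p_fail_each = 0.05
n_parts = 5

p_machine_fails = 1 - (1 - p_fail_each) ** n_parts
print(round(p_machine_fails, 2))   # → 0.23, i.e. about 23%
```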

Page 8: Intermediate Applied Statistics  STAT 460

All Pairwise Comparisons

If you have c tests, each of which has its individual false positive rate controlled at α, then the total false positive rate may be as high as 1 − (1 − α)^c ≈ cα.

If we want to do all possible pairwise comparisons, then the number of tests will be

c = k(k − 1)/2

Page 9: Intermediate Applied Statistics  STAT 460

All Pairwise Comparisons

If we want to do all possible pairwise comparisons, then the number of tests will be

c = k(k − 1)/2

where k is the number of groups. If there are 5 groups then there are 10 tests!
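The count of pairwise tests is just “k choose 2,” which can be checked in a couple of lines of Python:

```python
from math import comb

# Number of pairwise comparisons among k groups: k(k-1)/2.
k = 5
n_tests = k * (k - 1) // 2
print(n_tests)                 # → 10
assert n_tests == comb(k, 2)   # same thing as "k choose 2"
```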

Page 10: Intermediate Applied Statistics  STAT 460

All Pairwise Comparisons

So we have to decide whether we want to control only individual type I error risk, or experiment-wide (or “familywise”) type I error risk.

We have to be much more conservative to control the latter than we would need to be to control the former. In fact, we might have to make α so low that β (the type II error rate) becomes very high, i.e., power becomes low.

Page 11: Intermediate Applied Statistics  STAT 460

All Pairwise Comparisons

One possible answer is to control individual type I error risk for planned comparisons (a few comparisons which were of special interest as you were planning the study) and control experiment-wide type I error risk for unplanned comparisons (those you are making after the study has been done, just for exploratory purposes).

Suppose for now that there aren’t any comparisons of special interest; we just want to look at all possible pairwise comparisons. Then we definitely should try to control experiment-wide error.

Page 12: Intermediate Applied Statistics  STAT 460

Method 1: Fisher’s Protected LSD (Least Significant Difference)

A simple approach to trying to control experiment-wide error:

1. First do an overall F-test to see if there are any differences at all.
2. If this test does not reject H0, then conclude that no differences can be found.
3. If it does reject H0, then go ahead and do all possible two-group t-tests. (You might use a standard deviation pooled over all groups instead of each pair separately, but otherwise act just as in the two-sample case.)

Page 13: Intermediate Applied Statistics  STAT 460

Method 1: Fisher’s Protected LSD (Least Significant Difference)

The only problem with that approach is that it doesn’t work. At least according to some experts, the LSD procedure does not really control experiment-wide error. You still have the multiple comparisons problem.

Page 14: Intermediate Applied Statistics  STAT 460

Method 2: Bonferroni Correction

One way to do this is to use a “Bonferroni correction” on α. The idea here is that we first set a familywise α, say .05, and then figure out how small the individual α* would need to be in order to keep the familywise type I error rate at α. For example, if α* = .01 and c = 5, then the familywise rate is at most cα* = .05.

Page 15: Intermediate Applied Statistics  STAT 460

Method 2: Bonferroni Correction

Bonferroni Inequality: For independent events A1, …, Ac of equal probability,

P(A1 ∪ A2 ∪ … ∪ Ac) = 1 − (1 − P(Ai))^c ≤ c · P(Ai)

Page 16: Intermediate Applied Statistics  STAT 460

Method 2: Bonferroni Correction

So if we just set α* = α/c, then the experiment-wide error rate will be controlled at α.

An advantage of this approach is that it is a simple idea (just divide up your acceptable risk equally among all the tests you want to do).

A disadvantage is that it can be inefficient. The individual α* = α/c can become quite small, e.g., .05/21 ≈ .002.
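As an illustration (not from the slides), a tiny Python helper that does this division for all-pairwise comparisons among k groups:

```python
# Bonferroni: split the familywise alpha equally across the c tests.
def bonferroni_alpha(alpha, k):
    """Per-test alpha* for all k(k-1)/2 pairwise comparisons among k groups."""
    c = k * (k - 1) // 2
    return alpha / c

print(round(bonferroni_alpha(0.05, 5), 4))   # 10 tests → 0.005
print(round(bonferroni_alpha(0.05, 7), 4))   # 21 tests → 0.0024 (the .05/21 case)
```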

Page 17: Intermediate Applied Statistics  STAT 460

Method 3: Tukey’s (HSD) Tests

Tukey’s “Honest Significant Difference” test takes a different approach that does not use the t or F distributions, but the distribution of the “studentized range” or Q statistic:

Q = (Ȳmax − Ȳmin) / SE

In a Tukey test, the Q distribution is used to find a boundary on differences between group averages, such that if the null hypothesis were true, then 95% of the time no pair of groups would have a difference larger than this boundary.

So any pair of groups with a difference larger than this critical value is declared significantly different.

Page 18: Intermediate Applied Statistics  STAT 460

Confidence Intervals

Much like the usual t-intervals, the Bonferroni and Tukey confidence intervals for μi − μj have the form

ȳi − ȳj ± m · SE(ȳi − ȳj),  where  SE(ȳi − ȳj) = sp √(1/ni + 1/nj)

and m is some multiplier. Here ȳi − ȳj is the best estimate for μi − μj, and m · SE(ȳi − ȳj) is the “margin of error” or “half-width” of the interval. But m is chosen differently for each procedure.
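Plugging in the pooled standard deviation from the ANOVA output (sp = 4.470, ni = nj = 10) reproduces the “SE of Difference” of 1.999 that appears in the Minitab comparison output for this example:

```python
import math

# SE of a difference of two group means, using a pooled SD:
#   SE = s_p * sqrt(1/n_i + 1/n_j)
s_p = 4.470          # pooled StDev from the ANOVA output
n_i = n_j = 10       # ten failure times per compound

se_diff = s_p * math.sqrt(1 / n_i + 1 / n_j)
print(round(se_diff, 3))   # → 1.999, matching "SE of Difference" in the output
```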

Page 19: Intermediate Applied Statistics  STAT 460

For a t-interval or LSD t-interval, m = t_{α/2}.

For a Bonferroni t-interval, m = t_{α*/2}, with α* = α/c.

For a Tukey confidence interval, m = q_{α}(k, dfError)/√2, where q is the critical value of the studentized range.

(Here t_{α/2} actually stands for t_{α/2, dfe}, the critical value with the error degrees of freedom.)
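Although the output in these slides is from Minitab, the three multipliers can be sketched in Python for this example (k = 5 groups, 45 error degrees of freedom); this assumes scipy ≥ 1.7, which provides the studentized range distribution:

```python
from scipy.stats import t, studentized_range

alpha = 0.05
k = 5                   # number of groups (compounds)
df_error = 45           # error degrees of freedom from the ANOVA table
c = k * (k - 1) // 2    # 10 pairwise comparisons

# t / LSD multiplier: the ordinary two-sided t critical value
m_t = t.ppf(1 - alpha / 2, df_error)

# Bonferroni multiplier: same, but with alpha* = alpha / c
m_bonf = t.ppf(1 - (alpha / c) / 2, df_error)

# Tukey multiplier: studentized range critical value divided by sqrt(2)
q_crit = studentized_range.ppf(1 - alpha, k, df_error)  # Minitab reports 4.02
m_tukey = q_crit / 2 ** 0.5

print(round(m_t, 2), round(m_bonf, 2), round(q_crit, 2), round(m_tukey, 2))
```

Note that m_tukey times the SE of 1.999 gives a half-width of about 5.68 hours, which matches the Tukey interval (−1.040, 10.326) around the estimated difference of 4.643 hours shown later in the output.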

Page 20: Intermediate Applied Statistics  STAT 460

Method 4: Scheffé’s Test

Also available are Scheffé tests and intervals, which use

m = √( dfTreat · F_{dfTreat, dfError}(1 − α) )

but they lead to a test that may be too conservative.

Page 21: Intermediate Applied Statistics  STAT 460

How to compare specific groups in an ANOVA

If you have a few planned comparisons of theoretical importance in mind, then you can just do them with a slightly adjusted t-test: a different value for sp and different degrees of freedom for the t statistic, based on the fact that you pool variance over all groups, not just two.

If you want to do many comparisons then you may wish to try some means of controlling compound uncertainty (i.e., correcting for multiple comparisons).

In the case where you want to test for all pairwise differences, choices include Fisher’s LSD (too lenient), Bonferroni, and Tukey (the latter also known as Tukey-Kramer when the sample sizes are different). There are also various other procedures I didn’t (and won’t) mention.

Page 22: Intermediate Applied Statistics  STAT 460

Tukey's pairwise comparisons

Family error rate = 0.0500Individual error rate = 0.00670

Critical value = 4.02

Intervals for (column level mean) - (row level mean)

           1        2        3        4
2     -1.040
      10.326
3     -3.626   -8.269
       7.740    3.097
4     -4.788   -9.431   -6.845
       6.578    1.935    4.521
5     -9.696  -14.339  -11.753  -10.591
       1.670   -2.973   -0.387    0.775

This is an abbreviation for: “We are 95% confident, according to the Tukey procedure, that the signed difference μ1 − μ2 is between −1.040 hours and +10.326 hours.” So we can’t conclude that μ1 ≠ μ2.

Page 23: Intermediate Applied Statistics  STAT 460

Tukey's pairwise comparisons

Family error rate = 0.0500Individual error rate = 0.00670

Critical value = 4.02

Intervals for (column level mean) - (row level mean)

           1        2        3        4
2     -1.040
      10.326
3     -3.626   -8.269
       7.740    3.097
4     -4.788   -9.431   -6.845
       6.578    1.935    4.521
5     -9.696  -14.339  -11.753  -10.591
       1.670   -2.973   -0.387    0.775

Conclude μ2≠μ5 and μ3≠μ5. More specifically, μ2<μ5 and μ3<μ5.

Page 24: Intermediate Applied Statistics  STAT 460

Tukey Simultaneous Tests
Response Variable TIME
All Pairwise Comparisons among Levels of COMPOUND

COMPOUND = 1 subtracted from:
Level     Difference  SE of       T-Value  Adjusted
COMPOUND  of Means    Difference           P-Value
2             -4.643       1.999   -2.322   0.1567
3             -2.057       1.999   -1.029   0.8406
4             -0.895       1.999   -0.448   0.9914
5              4.013       1.999    2.007   0.2792

COMPOUND = 2 subtracted from:
Level     Difference  SE of       T-Value  Adjusted
COMPOUND  of Means    Difference           P-Value
3              2.586       1.999    1.294   0.6965
4              3.748       1.999    1.875   0.3454
5              8.656       1.999    4.330   0.0008

COMPOUND = 3 subtracted from:
Level     Difference  SE of       T-Value  Adjusted
COMPOUND  of Means    Difference           P-Value
4              1.162       1.999   0.5812   0.9772
5              6.070       1.999   3.0363   0.0309

COMPOUND = 4 subtracted from:
Level     Difference  SE of       T-Value  Adjusted
COMPOUND  of Means    Difference           P-Value
5              4.908       1.999    2.455   0.1196
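The adjusted p-values above can be screened directly against the familywise 0.05 level; the following sketch just transcribes them from the output and picks out the significant pairs:

```python
# Adjusted p-values transcribed from the Tukey output above, keyed by pair.
adj_p = {
    (1, 2): 0.1567, (1, 3): 0.8406, (1, 4): 0.9914, (1, 5): 0.2792,
    (2, 3): 0.6965, (2, 4): 0.3454, (2, 5): 0.0008,
    (3, 4): 0.9772, (3, 5): 0.0309,
    (4, 5): 0.1196,
}

# Pairs whose adjusted p-value is below the familywise alpha of 0.05.
significant = [pair for pair, p in adj_p.items() if p < 0.05]
print(significant)   # → [(2, 5), (3, 5)]
```

This reproduces the conclusion stated earlier: only μ2 vs. μ5 and μ3 vs. μ5 differ significantly.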

Page 25: Intermediate Applied Statistics  STAT 460

[Diagram: compounds ordered by mean failure time, 2 3 4 1 5; one underline joins 2, 3, 4, 1 (“not significantly different”) and another joins 4, 1, 5 (“not significantly different”).]

Page 26: Intermediate Applied Statistics  STAT 460

Multiple Comparisons with a Control

A different situation arises when you have a control group and are only interested in the comparison of other groups to the control group, not to one another.

Then there is something like Tukey’s test, but modified, called Dunnett’s test.

Page 27: Intermediate Applied Statistics  STAT 460

Multiple Comparisons with a Control

Tukey confidence intervals are for μi − μj, and there is one of them for every pair of distinct groups, which comes out to k(k − 1)/2 of them.

Dunnett confidence intervals are for μi - μControl and there are only k-1 of them, one for each noncontrol group.
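For the k = 5 compounds in this example, the bookkeeping is simple; a quick check in Python:

```python
# Comparing every group with every other group vs. only with a control:
k = 5
n_tukey = k * (k - 1) // 2    # all pairwise intervals
n_dunnett = k - 1             # one interval per non-control group
print(n_tukey, n_dunnett)     # → 10 4
```

Fewer intervals means each one can be narrower at the same familywise confidence level, which is the payoff for restricting attention to the control comparisons.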

Page 28: Intermediate Applied Statistics  STAT 460

Multiple Comparisons with a Control

Suppose compound 5 is thought of as a control group because it is the compound that has been in use previously. We want to know if any of the other compounds have different population mean survival times than compound 5.

Page 29: Intermediate Applied Statistics  STAT 460

Multiple Comparisons with a Control

Dunnett's comparisons with a control

Family error rate = 0.0500Individual error rate = 0.0149

Critical value = 2.53

Control = level (5) of COMPOUND

Intervals for treatment mean minus control mean

Level    Lower   Center    Upper  -----+---------+---------+---------+--
1       -9.074   -4.013    1.048               (------------*------------)
2      -13.717   -8.656   -3.595   (-----------*------------)
3      -11.131   -6.070   -1.009         (------------*-----------)
4       -9.969   -4.908    0.153              (------------*-----------)
                                  -----+---------+---------+---------+--
                                    -12.0      -8.0      -4.0       0.0

For compounds 2 and 3, we are confident that their difference from the control is nonzero.

Page 30: Intermediate Applied Statistics  STAT 460

Multiple Comparisons with a Control

Dunnett Simultaneous Tests
Response Variable TIME
Comparisons with Control Level
COMPOUND = 5 subtracted from:

Level     Difference  SE of       T-Value  Adjusted
COMPOUND  of Means    Difference           P-Value
1             -4.013       1.999   -2.007   0.1555
2             -8.656       1.999   -4.330   0.0003
3             -6.070       1.999   -3.036   0.0141
4             -4.908       1.999   -2.455   0.0597

Again we conclude that compounds 2 and 3 are different from the control group (actually, poorer).

Page 31: Intermediate Applied Statistics  STAT 460

Multiple comparison with the best

There is also a kind of test called Hsu’s multiple comparison with the best (MCB).

This test tries to decide which groups could plausibly have come from the population with the highest (or lowest, if you prefer) mean.

The group with the highest (lowest) sample average is automatically the leading candidate, but any group that does not significantly differ from this group is still included as a possible best.

So Hsu’s confidence intervals, one for each group, would be for μi − μbest.

This approach is harder to understand than the others, perhaps because it seems to be doing more than one thing at the same time.

Page 32: Intermediate Applied Statistics  STAT 460

Next Lecture: Linear Contrasts