Revisiting an Old Topic: Probability of Replication D. Lizotte, E. Laber & S. Murphy Johns Hopkins Biostatistics September 23, 2009


Revisiting an Old Topic: Probability of Replication

D. Lizotte, E. Laber & S. Murphy

Johns Hopkins Biostatistics

September 23, 2009


Outline

• Scientific Background

• Our Estimand: Probability of Selection

• Estimators

• STAR*D

• Where to go from here?


Scientific Background

First experiment results in – or – ,

– what is the chance that we will replicate this result in a subsequent experiment?

– Prob. of Concurrence or Prob. of Replication

– Killeen (2005), followed by great controversy in psychology (Cumming (2005, 2006, 2008); MacDonald (2005); Doros & Geier (2005); Iverson (2008); Iverson, Wagenmakers & Lee (2008); Ashby & O'Brien (2008); Iverson, Lee & Wagenmakers (2009); …)


Scientific Background

Similar problem but discredited:

• Post-hoc power / observed power: assuming the observed standardized effect size is the truth, calculate the probability of rejecting the null hypothesis (Hoenig & Heisey, 2001).
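
A minimal sketch of the observed-power calculation described above, assuming a two-sided z-test with known variance (the slide does not name a test, so this choice and the function name `observed_power` are assumptions). Under this assumption observed power depends on the data only through the observed z statistic, just as the p-value does, so it carries no information beyond the p-value.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def observed_power(z_obs: float, alpha: float = 0.05) -> float:
    """Power of a two-sided level-alpha z-test when the observed z
    is treated as if it were the true standardized effect."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability that a new z ~ N(z_obs, 1) falls in either rejection tail.
    return _STD_NORMAL.cdf(z_obs - z_crit) + _STD_NORMAL.cdf(-z_obs - z_crit)


if __name__ == "__main__":
    for z in (0.5, 1.0, 1.96, 3.0):
        p_two_sided = 2 * (1 - _STD_NORMAL.cdf(abs(z)))
        print(f"z={z:4.2f}  p={p_two_sided:.3f}  observed power={observed_power(z):.3f}")
```

A result that sits exactly on the significance boundary has observed power of about one half in this setup.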


Scientific Background

First experiment results in – or – ,

– what is the chance that we will replicate this result in a subsequent experiment?

• Why is this question so attractive?

• Scientists (including statisticians!) often want to answer this question with 1 – p-value


Scientific Background

• First experiment results in – or – ,

– what is the chance that we will replicate this result in a subsequent experiment?

• 1 – p-value does not address this question.

– Goodman (1992), Cumming (2008)

– 1 – p-value is not an estimator.


Scientific Background

• Much confusion about estimand:

– , what is the chance that we will replicate this result in a subsequent experiment?

• Do we want to “estimate”

1) or

2) or

3) or

4) ?

• Good frequentist properties are desired.


Our Estimand

• Probabilities of Selection 2)

• The probability of selection is a composite measure of signal, noise, and sample size
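
To make "composite measure of signal, noise, and sample size" concrete, here is a minimal sketch. It assumes, since the slide's formula did not survive, that the probability of selection is the chance that a future two-group experiment with n subjects per group yields a positive difference in sample means, and that this difference is normally distributed. The function name and the parameters `delta` (true mean difference), `sigma` (common standard deviation), and `n` are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def selection_probability(delta: float, sigma: float, n: int) -> float:
    """P(difference in sample means > 0) for two groups of size n,
    assuming the difference is N(delta, 2 * sigma^2 / n)."""
    se = sigma * (2.0 / n) ** 0.5
    return PHI(delta / se)


if __name__ == "__main__":
    # Same signal-to-noise ratio, different sample sizes.
    for n in (10, 25, 100, 400):
        print(n, round(selection_probability(delta=0.2, sigma=1.0, n=n), 3))
```

Under these assumptions the three ingredients enter only through the single ratio delta / (sigma * sqrt(2 / n)).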


Our Estimand

• Advantages (The Hope) over the concept of p-value

– Close to what many scientists want.

– The intuitive interpretation is correct.

– Does not rely on the correctness of a data generating model for meaning.

– Less ambitious than 3)

• Disadvantages

– We changed the question.

– Some may think that there is no need for a confidence interval; this is wrong.

– Non-regular


Estimators

• Why is this a hard problem?

– The desire for good frequentist properties.

– The fact that effect sizes tend to be small relative to the noise.

– This is a non-regular problem: bias is of the same order as variance.

• Back of the envelope calculations:


Estimators

• Use plug-in estimator

• Plug-in estimator is 1 – p-value (Goodman, 1992)!

– Nonregular

• Near a uniform distribution if

• If n is large, close to 0 or 1 otherwise

– We can expect to be small.
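
The behaviour of the plug-in estimator can be checked by simulation. This sketch assumes a normally distributed estimate with known standard error se = sigma / sqrt(n) and a plug-in estimator of the form Phi(estimate / se), i.e. one minus a one-sided p-value; the symbols `theta`, `sigma`, and `n` and the function name are assumptions, since the slide's formulas were lost.

```python
import random
from statistics import NormalDist, fmean

PHI = NormalDist().cdf  # standard normal CDF


def simulate_plug_in(theta, sigma, n, n_sims=20000, seed=0):
    """Draw plug-in estimates Phi(theta_hat / se), where
    theta_hat ~ N(theta, se^2) and se = sigma / sqrt(n)."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5
    return [PHI(rng.gauss(theta, se) / se) for _ in range(n_sims)]


def decile_shares(values):
    """Share of values falling in each tenth of [0, 1]."""
    counts = [0] * 10
    for v in values:
        counts[min(int(v * 10), 9)] += 1
    return [c / len(values) for c in counts]


if __name__ == "__main__":
    # Zero effect: the estimator is spread evenly over [0, 1] for any n.
    print("theta = 0  :", [round(s, 2) for s in decile_shares(simulate_plug_in(0.0, 1.0, 50))])
    # Fixed nonzero effect and large n: the estimator piles up near 1.
    print("theta = 0.5:", round(fmean(simulate_plug_in(0.5, 1.0, 400)), 4))
```

With zero effect the estimator never settles down as n grows, which is one way to see the non-regularity.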


Estimators

• Try a Bayesian approach.

– Random sample from a ,

– Flat prior on , known

– Use as an estimator of

• Bayesian methods do not eliminate non-regularity.


Estimators

Focus on MSE in formulating estimators for .

1) Assume is approximately normal with mean and variance

   1) Flat prior (e.g. Killeen’s p_rep)

   2) Normal prior:

   3) Prior is a mixture: N(0,1) with probability w, point mass on with probability 1-w
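
A minimal sketch of the first two prior choices, assuming the estimate is N(theta, se^2) with known se and that the target is the probability that a future estimate of the same precision is positive. The flat-prior version reproduces the usual form of Killeen's p_rep. The normal prior is taken here to be N(0, tau^2); the slide's exact prior was lost, so `tau` and the function names are assumptions.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def p_rep(z: float) -> float:
    """Flat prior: theta | data ~ N(theta_hat, se^2), so a future estimate
    is N(theta_hat, 2 * se^2) and P(future > 0) = Phi(z / sqrt(2))."""
    return PHI(z / 2 ** 0.5)


def p_norm(theta_hat: float, se: float, tau: float) -> float:
    """N(0, tau^2) prior (assumed form): shrink the estimate toward 0,
    then add the sampling variance of the future estimate."""
    shrink = tau ** 2 / (tau ** 2 + se ** 2)
    post_mean = shrink * theta_hat
    post_var = shrink * se ** 2
    return PHI(post_mean / (post_var + se ** 2) ** 0.5)


if __name__ == "__main__":
    theta_hat, se = 0.5, 0.25  # z = 2
    print("1 - one-sided p:", round(PHI(theta_hat / se), 3))
    print("p_rep          :", round(p_rep(theta_hat / se), 3))
    print("p_norm, tau=se :", round(p_norm(theta_hat, se, tau=se), 3))
```

Both Bayesian versions are pulled toward 1/2 relative to 1 – p-value, and a tighter prior pulls harder.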


Estimators

Focus on MSE in formulating estimators for .

2) Single bootstrap (Efron & Tibshirani, 1989).

• This is 1 – p-value. No assumption of approximate normality. If is approximately normal then this is approximately the plug-in estimator:

3) Double bootstrap

• This is a bagged plug-in estimator. It bags 1 – (bootstrap p-value). No assumption of approximate normality.
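
The two bootstrap estimators can be sketched for a two-group comparison. This is an illustration under assumptions, not the authors' code: the statistic is taken to be the difference in sample means, "selection" means that difference is positive, and all function names and resample counts are hypothetical.

```python
import random
from statistics import fmean


def mean_diff(a, b):
    return fmean(a) - fmean(b)


def resample(xs, rng):
    """Draw len(xs) values from xs with replacement."""
    return rng.choices(xs, k=len(xs))


def single_bootstrap(a, b, n_boot=500, rng=None):
    """Share of bootstrap resamples in which group a beats group b
    (i.e. 1 minus a one-sided bootstrap p-value)."""
    rng = rng if rng is not None else random.Random()
    hits = sum(mean_diff(resample(a, rng), resample(b, rng)) > 0 for _ in range(n_boot))
    return hits / n_boot


def double_bootstrap(a, b, n_outer=100, n_inner=100, rng=None):
    """Bagged version: recompute the single-bootstrap estimate on each
    outer resample and average the results."""
    rng = rng if rng is not None else random.Random()
    total = 0.0
    for _ in range(n_outer):
        a_star, b_star = resample(a, rng), resample(b, rng)
        total += single_bootstrap(a_star, b_star, n_inner, rng)
    return total / n_outer


if __name__ == "__main__":
    rng = random.Random(2009)
    group_a = [rng.gauss(0.3, 1.0) for _ in range(25)]  # simulated data
    group_b = [rng.gauss(0.0, 1.0) for _ in range(25)]
    print("single bootstrap:", single_bootstrap(group_a, group_b, rng=rng))
    print("double bootstrap:", round(double_bootstrap(group_a, group_b, rng=rng), 3))
```

The averaging in the outer loop is what makes the double bootstrap smoother than the single bootstrap, at the cost of n_outer × n_inner resamples.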


Why a double bootstrap?

Double bootstrap estimator for .

• Bagging is used to trade variance for bias when estimators are unstable (Bühlmann & Yu, 2002).

• The bootstrap estimator of is unstable; if it does not converge as the sample size increases.

• Under local alternatives such as the bootstrap estimator is inconsistent as well.


Double Bootstrap

Double bootstrap estimator for .

If has an approximate normal distribution then the double bootstrap estimator is

That is, the double bootstrap reduces to p_rep in this case.
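
A sketch of why this reduction holds, in notation that is assumed here because the slide's symbols were lost: \hat\theta is the estimate, s its standard error, \hat\theta^* an outer bootstrap replicate, and Z a standard normal variable. The only ingredient is the identity E[\Phi(a + bZ)] = \Phi(a / \sqrt{1 + b^2}).

```latex
% Inner bootstrap on an outer resample gives the plug-in value \Phi(\hat\theta^*/s).
% Under approximate normality, \hat\theta^* \mid \text{data} \approx N(\hat\theta, s^2),
% i.e. \hat\theta^*/s = \hat\theta/s + Z with Z \sim N(0,1).
\begin{align*}
\hat p_{\text{double}}
  &= E^*\!\left[\Phi\!\left(\frac{\hat\theta^*}{s}\right)\right]
   \approx E\!\left[\Phi\!\left(\frac{\hat\theta}{s} + Z\right)\right] \\
  &= \Phi\!\left(\frac{\hat\theta / s}{\sqrt{1 + 1}}\right)
   = \Phi\!\left(\frac{\hat\theta}{\sqrt{2}\, s}\right)
   = p_{\text{rep}} .
\end{align*}
```

The factor \sqrt{2} is the same one that appears when a flat prior is used, because both arguments add one extra sampling variance to the estimate.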


MSE Plots

• Two groups, each of size 25

• Two distributions (normal, bimodal)

• Two definitions of – –

• Compare

– p_rep, p_norm, p_mix, single bootstrap, double bootstrap


Estimators

Instead of a point estimator, consider a confidence interval for .

Assume has an approximate normal distribution; then

In this case a confidence interval for can be found from a confidence interval for the standardized effect size:


STAR*D

• Sequenced Treatment Alternatives to Relieve Depression

• Large multi-site study focused on individuals whose depression did not remit with citalopram

• In this trial each individual can proceed through up to 4 stages of treatment. An individual moves to the next stage if not responding to the present treatment.

• Each stage involves a randomization.


STAR*D

• These are data from 683 individuals who did not respond to citalopram and preferred a switch in treatment.

• These individuals were randomized among venlafaxine, bupropion, and sertraline.

• Outcome: Time until remission.

• We model the area under the survival curve from entry into this stage of treatment until 30 months (i.e. the mean of min(T, 30)).
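
The link between the area under the survival curve and min(T, 30) can be sketched with a Kaplan-Meier estimate. This is an illustration only: the slides do not say how censoring was handled in the analysis, and the function name, argument names, and toy data are hypothetical.

```python
def restricted_mean(times, events, tau):
    """Area under the Kaplan-Meier curve from 0 to tau.
    times: follow-up times; events: True if remission was observed,
    False if the time is censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, area, prev_t = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] < tau:
        t = data[i][0]
        n_events = n_leaving = 0
        # Group all subjects sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += bool(data[i][1])
            n_leaving += 1
            i += 1
        area += surv * (t - prev_t)  # rectangle up to this time
        if n_events:
            surv *= 1 - n_events / at_risk  # Kaplan-Meier step down
        at_risk -= n_leaving
        prev_t = t
    return area + surv * (tau - prev_t)  # last rectangle up to tau


if __name__ == "__main__":
    times = [5, 10, 40]
    # No censoring: equals the plain average of min(T, 30).
    print(restricted_mean(times, [True, True, True], tau=30))
    print(sum(min(t, 30) for t in times) / len(times))
    # Second subject censored at time 10.
    print(restricted_mean(times, [True, False, True], tau=30))
```

Without censoring the area is just the sample mean of min(T, tau); with censoring the Kaplan-Meier curve redistributes the censored subject's weight to later times.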


STAR*D

Regression formula at level 2:


STAR*D

• For each s,

• Double Bootstrap

– Inner-most bootstrap counts proportion of “votes” in which

– Outer-most bootstrap averages over the proportion across the bootstrap samples


Discussion

• Definition of the probability of selection when there are more than two treatments.

• Confidence intervals for comparisons between more than two treatments.

• Is there a minimax estimator of the selection probability?

• Is there hope for the replication probability?


Truth in Advertising: STAR*D

Missing Data + Study Drop-Out

• 1200 subjects begin level 2 (e.g. stage 1)

• 42% study dropout during level 2

• 62% study dropout by 30 weeks.

• Approximately 13% item missingness for important variables observed after the start of the study but prior to dropout.


This seminar can be found at: http://www.stat.lsa.umich.edu/~samurphy/seminars/HopkinsBiostat09.23.09.ppt

Email me with questions or if you would like a copy!

[email protected]


Our Estimand

• The probability of selection is a composite measure of signal, noise and sample size

• The p-value is a composite measure of estimated signal, estimated noise and sample size.