Optimality Considerations in Testing Massive Numbers of Hypotheses

Peter H. Westfall, Ananda Bandulasiri
Texas Tech University
Hypotheses; FWE and FDR

H0i (point null) vs. H1i, i = 1,…,k. k is large!
A decision algorithm for classifying the k tests produces R total rejections, with V erroneous.
FWE = P(V > 0); FDR = E(V/R+), where R+ = max(R, 1).
To control FWE: Hochberg, Westfall-Young, …
To control FDR: Benjamini and Hochberg, …
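A minimal Python sketch (added here for concreteness, not from the talk) of the two step-up procedures just named; the function names and the alpha = 0.05 default are illustrative:

```python
import numpy as np

def hochberg(pvals, alpha=0.05):
    """Hochberg step-up (FWE control): reject the j smallest p-values
    for the largest j with p_(j) <= alpha / (k - j + 1)."""
    p = np.asarray(pvals)
    k = len(p)
    order = np.argsort(p)
    crit = alpha / (k - np.arange(1, k + 1) + 1.0)
    passing = np.nonzero(p[order] <= crit)[0]
    reject = np.zeros(k, dtype=bool)
    if passing.size:
        reject[order[:passing[-1] + 1]] = True
    return reject

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up (FDR control): reject the j smallest
    p-values for the largest j with p_(j) <= j * alpha / k."""
    p = np.asarray(pvals)
    k = len(p)
    order = np.argsort(p)
    crit = alpha * np.arange(1, k + 1) / k
    passing = np.nonzero(p[order] <= crit)[0]
    reject = np.zeros(k, dtype=bool)
    if passing.size:
        reject[order[:passing[-1] + 1]] = True
    return reject
```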
Scale Up, wrt k

FWE-controlling methods do not scale up as k → ∞:
reject H0i when pi ~ α/k, a cutoff that shrinks to 0.
FDR-controlling methods do scale up as k → ∞:
reject H0i when pi ≤ τk, where τk → τ as k → ∞, with 0 < τ.
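A simulation sketch of this contrast (an illustration added here, assuming a mixture with 80% true nulls and alternative mean 3): the Bonferroni-style cutoff α/k vanishes, while the realized BH cutoff settles near a positive τ.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha, pi0, mu1 = 0.05, 0.8, 3.0          # illustrative: 80% true nulls

for k in (100, 1000, 10000, 100000):
    null = rng.random(k) < pi0
    z = rng.standard_normal(k) + np.where(null, 0.0, mu1)
    p = np.sort(2 * norm.sf(np.abs(z)))
    j = np.nonzero(p <= alpha * np.arange(1, k + 1) / k)[0]
    bh_cutoff = p[j[-1]] if j.size else 0.0
    print(f"k={k:6d}  alpha/k={alpha/k:.1e}  realized BH cutoff={bh_cutoff:.4f}")
```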
FDR Convergence as k → ∞

[Figure: CDF of T and CDF of the critical value. The critical t (3 df) for FDR(.05) is 4.93; the corresponding marginal (unadjusted) p-value is 0.0160.]
Application: EEG Responses to Light Stimuli

43 time-series responses; 62 scalp locations; 70 independent replications; 5 treatments: (1) G60% (2) R90% (3) G80% (4) R100% (5) G100%.
Average EEG Curves

[Figure: Average stimulus-evoked EEG curves (location 50); x-axis: Time Bin (0-50); y-axis: EEG (mV); one curve per treatment, Trt1-Trt5.]
Questions of Interest

Validity check: Differences should exist between responses for different intensities.
Main question: Are there differences between red and green stimuli? When? Where?
Number of tests: k = (10 treatment comparisons) × (62 scalp spots) × (43 time locations) = 26,660.
[Figures: Histograms of t-statistics with null reference, one per pairwise comparison: G60 v R90, G60 v G80, G60 v R100, G60 v G100, R90 v G80, R90 v R100, R90 v G100, G80 v R100, G80 v G100, R100 v G100.]
Results

Westfall-Young FWE-controlling method:
no significant R100 v G100 comparisons;
significant comparisons for all other contrasts.

Benjamini-Hochberg FDR-controlling method:
23 significant R100 v G100 comparisons.

Claim: The FWE-controlling method gave the right answer.
A Comment

FDR scales up better as k → ∞, but that does not necessarily mean the results are “better,” even for large k.
Scale Up, wrt n

Model for the test statistics Zi, i = 1,…,k:
Zi | δi ~ N(√n δi, 1), where δi = μi/σxi is the standardized effect size, and
δi ~ F.

Suppose P(δi = 0) = 0. Then FDR does not scale up as n → ∞.
FWE might scale up, but only serendipitously, if n and k diverge at appropriate rates.
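A simulation sketch of the n → ∞ claim (added here, under the assumption δi ~ N(0, 0.1²), so that P(δi = 0) = 0): as n grows, BH ends up rejecting essentially every hypothesis, however tiny the effect, so FDR control says less and less.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
k, alpha = 10000, 0.05
delta = rng.normal(0.0, 0.1, k)    # P(delta_i = 0) = 0: every null barely false

for n in (10, 100, 1000, 10000, 100000):
    z = np.sqrt(n) * delta + rng.standard_normal(k)
    p = np.sort(2 * norm.sf(np.abs(z)))
    j = np.nonzero(p <= alpha * np.arange(1, k + 1) / k)[0]
    rejections = j[-1] + 1 if j.size else 0
    print(f"n={n:6d}  BH rejections: {rejections} / {k}")
```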
Efron’s Method

JASA 99 (2004), 96-104.
Estimate an “empirical null distribution,” f0(z), from the center of the histogram of z’s.
Estimate the combined distribution, f(z).
Estimate a “local FDR” for each zi, as fdr(zi) = f0(zi)/f(zi).
Choose as “interesting” the cases with fdr(zi) < 0.1.
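A rough Python sketch of the idea (not Efron’s actual estimator, which fits f by Poisson regression on histogram counts and estimates the null more carefully): fit a normal to the central z’s for f0, use a kernel density estimate for f, and flag fdr < 0.1.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def local_fdr(z, center_frac=0.5):
    """Local fdr sketch: empirical null fit to the central z's,
    combined density by kernel density estimation."""
    z = np.asarray(z)
    lo, hi = np.quantile(z, [(1 - center_frac) / 2, (1 + center_frac) / 2])
    center = z[(z >= lo) & (z <= hi)]
    mu0, s0 = center.mean(), center.std(ddof=1)   # crude; ignores truncation
    f0 = norm.pdf(z, mu0, s0)                     # empirical null f0(z)
    f = gaussian_kde(z)(z)                        # combined density f(z)
    return f0 / f

# interesting = np.nonzero(local_fdr(z) < 0.1)[0]
```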
Discussion

P(δi = 0) > 0 is usually false, but a reasonable approximation for small n.
As n → ∞ we need more realistic models: “P(δi = 0) > 0” is never exactly true, but even if it were,
unobserved covariates,
failed model assumptions, and
imperfect sampling procedures
make the “empirical null” sensible.
Results from Efron’s Method

Significant differences only for 2 v 3, 2 v 4, 2 v 5.
No significant R100 v G100 comparisons (the right answer).
What is the “Right Answer”?

Methods that have “good” utility are “right”.
MCPs must have reasonable utility, otherwise they would have disappeared long ago.
DECISION THEORY: The right answer is to maximize utility/minimize loss.
Loss Functions

Each effect δ gets one of three classifications: “UE”, “OE”, or “NC” (no classification). Losses:
L_UE(δ) = 1 - exp(-C1(δ + δ0)) for δ > -δ0, and 0 otherwise
L_OE(δ) = L_UE(-δ)
L_NC(δ) = k{1 - 2/(1 + exp(C2 δ²))}
[Figure: Loss functions over δ ∈ [-1, 1], showing L_UE and L_NC, with loss levels 0 and k marked; shapes governed by C1 and C2.]
Optimal Decision Rule:

Classify to:
“UE” when δ < -δ0
“OE” when δ > δ0
“NC” when |δ| < δ0,
regardless of the distribution of δ.

Problem: We observe x = δ + ε, not δ.
Here distributions *do* matter.
Assumptions

x | δ ~ N(δ, σx²) (say σx = 0.223)
δ = 0 with probability π0
δ ~ N(0, τ²) with probability 1 - π0
Baseline Parameters

Model parameters:
σx = 0.223
α = 0.05
τ = 1.00
π0 = 0.80

Loss function parameters:
C1 = 0.4
C2 = 0.2
k = 0.99
δ0 = 0.223
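Under these assumptions the Bayes rule classifies each observed x by minimizing posterior expected loss. A sketch follows, using the loss forms as written above (themselves a reconstruction) and the baseline parameters; the grid integration is an implementation choice, not from the talk.

```python
import numpy as np
from scipy.stats import norm

sx, tau, pi0 = 0.223, 1.00, 0.80              # baseline model parameters
C1, C2, kL, d0 = 0.4, 0.2, 0.99, 0.223        # baseline loss parameters

def L_UE(d):
    return np.where(d > -d0, 1 - np.exp(-C1 * (d + d0)), 0.0)

def L_OE(d):
    return L_UE(-d)

def L_NC(d):
    return kL * (1 - 2 / (1 + np.exp(C2 * d ** 2)))

def classify(x, ngrid=4001):
    """Choose the action with the smallest posterior expected loss given x."""
    b = tau ** 2 / (tau ** 2 + sx ** 2)           # posterior shrinkage factor
    w0 = pi0 * norm.pdf(x, 0.0, sx)               # x-density under delta = 0
    w1 = (1 - pi0) * norm.pdf(x, 0.0, np.sqrt(tau ** 2 + sx ** 2))
    p0 = w0 / (w0 + w1)                           # P(delta = 0 | x)
    d = np.linspace(-6, 6, ngrid)
    dx = d[1] - d[0]
    post = norm.pdf(d, b * x, np.sqrt(b) * sx)    # delta | x, delta != 0
    risks = {name: p0 * float(L(0.0)) + (1 - p0) * (L(d) * post).sum() * dx
             for name, L in (("UE", L_UE), ("OE", L_OE), ("NC", L_NC))}
    return min(risks, key=risks.get)

print([classify(x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)])
```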
Test Stats & Multiple Tests

For testing H0i: δi = 0, for i = 1,…,k, the test statistics are

ti = x̄i / (si/√n),

and the p-values are

pi = 2{1 - P(Tn-1 ≤ |ti|)}, where Tn-1 has a t distribution with n-1 df.

FWE- and FDR-controlling methods use the pi; Efron’s method uses the ti.
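A small helper sketch (illustrative, not from the talk) computing both quantities from a k × n data matrix:

```python
import numpy as np
from scipy.stats import t as t_dist

def t_and_p(X):
    """One-sample t statistics and two-sided p-values, row by row.
    X: (k, n) array, one row of n replicates per hypothesis."""
    k, n = X.shape
    xbar = X.mean(axis=1)
    s = X.std(axis=1, ddof=1)
    t_stats = xbar / (s / np.sqrt(n))
    pvals = 2 * t_dist.sf(np.abs(t_stats), df=n - 1)
    return t_stats, pvals
```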
Average Loss

Simulation: δ’s generated, then x’s | δ’s. Independence.
All combinations of:
p = (400, 2000, 10000, 50000)
n = (10, 20, 40, 80, 160)
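A compact sketch of one cell of such a design (illustrative; the 1/√n scaling of σx and the loss forms are assumptions made here), comparing the BH rule’s average loss to the “never classify” baseline, as in the figures that follow:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
sx0, tau, pi0 = 0.223, 1.00, 0.80             # baseline model parameters
C1, C2, kL, d0 = 0.4, 0.2, 0.99, 0.223        # baseline loss parameters
alpha, p = 0.05, 10000

def loss(action, d):
    if action == "NC":
        return kL * (1 - 2 / (1 + np.exp(C2 * d ** 2)))
    if action == "OE":                         # L_OE(d) = L_UE(-d)
        d = -d
    return np.where(d > -d0, 1 - np.exp(-C1 * (d + d0)), 0.0)

for n in (10, 20, 40, 80, 160):
    sx = sx0 * np.sqrt(10.0 / n)               # assumed: sd(x) shrinks as 1/sqrt(n)
    null = rng.random(p) < pi0
    delta = np.where(null, 0.0, rng.normal(0.0, tau, p))
    x = delta + rng.normal(0.0, sx, p)
    pv = 2 * norm.sf(np.abs(x / sx))
    ps = np.sort(pv)                            # BH step-up on the p-values
    j = np.nonzero(ps <= alpha * np.arange(1, p + 1) / p)[0]
    cutoff = ps[j[-1]] if j.size else 0.0
    act = np.where(pv <= cutoff, np.where(x < 0, "UE", "OE"), "NC")
    avg = np.mean([float(loss(a, d)) for a, d in zip(act, delta)])
    never = float(np.mean(loss("NC", delta)))
    print(f"n={n:4d}  BH average loss = {avg:.4f} ({avg / never:.1%} of never-classify)")
```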
[Figure: Average loss as % of the “never classify” loss for FDR, FWE, and Efron. Left panel: p = 10000, versus n = 10, 20, 40, 80, 160. Right panel: n = 10, versus p = 400, 2000, 10000, 50000.]
Concluding Comments

Consider scale up in both p and n.
FWE is often OK (serendipity).
Efron’s method is promising for scaling up both ways.
Recommendations: Either
a. learn to recognize situations where FWE/FDR/Efron/… have good utility, or
b. bite the bullet and construct loss functions.