Optimality Considerations in Testing Massive Numbers of Hypotheses

Peter H. Westfall, Ananda Bandulasiri
Texas Tech University
Hypotheses; FWE and FDR

H0i (point null) vs. H1i, i = 1,…,k. k is large!
A decision algorithm for classifying the k tests produces R total rejections, with V erroneous.
FWE = P(V > 0); FDR = E(V/R+), where R+ = max(R, 1).
To control FWE: Hochberg, Westfall-Young, …
To control FDR: Benjamini and Hochberg, …
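A minimal Python sketch (added here for concreteness, not from the talk) of the two step-up procedures just named; the function names and the alpha = 0.05 default are illustrative:

```python
import numpy as np

def hochberg(pvals, alpha=0.05):
    """Hochberg step-up (FWE control): reject the j smallest p-values
    for the largest j with p_(j) <= alpha / (k - j + 1)."""
    p = np.asarray(pvals)
    k = len(p)
    order = np.argsort(p)
    crit = alpha / (k - np.arange(1, k + 1) + 1.0)
    passing = np.nonzero(p[order] <= crit)[0]
    reject = np.zeros(k, dtype=bool)
    if passing.size:
        reject[order[:passing[-1] + 1]] = True
    return reject

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up (FDR control): reject the j smallest
    p-values for the largest j with p_(j) <= j * alpha / k."""
    p = np.asarray(pvals)
    k = len(p)
    order = np.argsort(p)
    crit = alpha * np.arange(1, k + 1) / k
    passing = np.nonzero(p[order] <= crit)[0]
    reject = np.zeros(k, dtype=bool)
    if passing.size:
        reject[order[:passing[-1] + 1]] = True
    return reject
```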
Scale Up, wrt k

FWE-controlling methods do not scale up as k → ∞:
reject H0i when pi ~ α/k, a cutoff that shrinks to 0.
FDR-controlling methods do scale up as k → ∞:
reject H0i when pi ≤ τk, where τk → τ as k → ∞, with 0 < τ.
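A simulation sketch of this contrast (an illustration added here, assuming a mixture with 80% true nulls and alternative mean 3): the Bonferroni-style cutoff α/k vanishes, while the realized BH cutoff settles near a positive τ.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha, pi0, mu1 = 0.05, 0.8, 3.0          # illustrative: 80% true nulls

for k in (100, 1000, 10000, 100000):
    null = rng.random(k) < pi0
    z = rng.standard_normal(k) + np.where(null, 0.0, mu1)
    p = np.sort(2 * norm.sf(np.abs(z)))
    j = np.nonzero(p <= alpha * np.arange(1, k + 1) / k)[0]
    bh_cutoff = p[j[-1]] if j.size else 0.0
    print(f"k={k:6d}  alpha/k={alpha/k:.1e}  realized BH cutoff={bh_cutoff:.4f}")
```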
FDR Convergence as k → ∞

[Figure: CDF of T and CDF of the critical value. The critical t (3 df) for FDR(.05) is 4.93; the corresponding marginal (unadjusted) p-value is 0.0160.]
Application: EEG Responses to Light Stimuli

43 time-series responses; 62 scalp locations; 70 independent replications; 5 treatments: (1) G60% (2) R90% (3) G80% (4) R100% (5) G100%.
Average EEG Curves

[Figure: Average stimulus-evoked EEG curves (location 50); x-axis: Time Bin (0-50); y-axis: EEG (mV); one curve per treatment, Trt1-Trt5.]
Questions of Interest

Validity check: Differences should exist between responses for different intensities.
Main question: Are there differences between red and green stimuli? When? Where?
Number of tests: k = (10 treatment comparisons) × (62 scalp spots) × (43 time locations) = 26,660.
[Figures: Histograms of t-statistics with null reference, one per pairwise comparison: G60 v R90, G60 v G80, G60 v R100, G60 v G100, R90 v G80, R90 v R100, R90 v G100, G80 v R100, G80 v G100, R100 v G100.]
Results

Westfall-Young FWE-controlling method:
no significant R100 v G100 comparisons;
significant comparisons for all other contrasts.

Benjamini-Hochberg FDR-controlling method:
23 significant R100 v G100 comparisons.

Claim: The FWE-controlling method gave the right answer.
A Comment

FDR scales up better as k → ∞, but that does not necessarily mean the results are “better,” even for large k.
Scale Up, wrt n

Model for the test statistics Zi, i = 1,…,k:
Zi | δi ~ N(√n δi, 1), where δi = μi/σxi is the standardized effect size, and
δi ~ F.

Suppose P(δi = 0) = 0. Then FDR does not scale up as n → ∞.
FWE might scale up, but only serendipitously, if n and k diverge at appropriate rates.
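A simulation sketch of the n → ∞ claim (added here, under the assumption δi ~ N(0, 0.1²), so that P(δi = 0) = 0): as n grows, BH ends up rejecting essentially every hypothesis, however tiny the effect, so FDR control says less and less.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
k, alpha = 10000, 0.05
delta = rng.normal(0.0, 0.1, k)    # P(delta_i = 0) = 0: every null barely false

for n in (10, 100, 1000, 10000, 100000):
    z = np.sqrt(n) * delta + rng.standard_normal(k)
    p = np.sort(2 * norm.sf(np.abs(z)))
    j = np.nonzero(p <= alpha * np.arange(1, k + 1) / k)[0]
    rejections = j[-1] + 1 if j.size else 0
    print(f"n={n:6d}  BH rejections: {rejections} / {k}")
```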
Efron’s Method

JASA 99 (2004), 96-104.
Estimate an “empirical null distribution,” f0(z), from the center of the histogram of z’s.
Estimate the combined distribution, f(z).
Estimate a “local FDR” for each zi, as fdr(zi) = f0(zi)/f(zi).
Choose as “interesting” the cases with fdr(zi) < 0.1.
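A rough Python sketch of the idea (not Efron’s actual estimator, which fits f by Poisson regression on histogram counts and estimates the null more carefully): fit a normal to the central z’s for f0, use a kernel density estimate for f, and flag fdr < 0.1.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def local_fdr(z, center_frac=0.5):
    """Local fdr sketch: empirical null fit to the central z's,
    combined density by kernel density estimation."""
    z = np.asarray(z)
    lo, hi = np.quantile(z, [(1 - center_frac) / 2, (1 + center_frac) / 2])
    center = z[(z >= lo) & (z <= hi)]
    mu0, s0 = center.mean(), center.std(ddof=1)   # crude; ignores truncation
    f0 = norm.pdf(z, mu0, s0)                     # empirical null f0(z)
    f = gaussian_kde(z)(z)                        # combined density f(z)
    return f0 / f

# interesting = np.nonzero(local_fdr(z) < 0.1)[0]
```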
Discussion

P(δi = 0) > 0 is usually false, but a reasonable approximation for small n.
As n → ∞ we need more realistic models: “P(δi = 0) > 0” is never exactly true, but even if it were,
unobserved covariates,
failed model assumptions, and
imperfect sampling procedures
make the “empirical null” sensible.
Results from Efron’s Method

Significant differences only for 2 v 3, 2 v 4, 2 v 5.
No significant R100 v G100 comparisons (the right answer).
What is the “Right Answer”?

Methods that have “good” utility are “right”.
MCPs must have reasonable utility, otherwise they would have disappeared long ago.
DECISION THEORY: The right answer is to maximize utility/minimize loss.
Loss Functions

Each effect δ gets one of three classifications: “UE”, “OE”, or “NC” (no classification). Losses:
L_UE(δ) = 1 - exp(-C1(δ + δ0)) for δ > -δ0, and 0 otherwise
L_OE(δ) = L_UE(-δ)
L_NC(δ) = k{1 - 2/(1 + exp(C2 δ²))}
[Figure: Loss functions over δ ∈ [-1, 1], showing L_UE and L_NC, with loss levels 0 and k marked; shapes governed by C1 and C2.]
Optimal Decision Rule:

Classify to:
“UE” when δ < -δ0
“OE” when δ > δ0
“NC” when |δ| < δ0,
regardless of the distribution of δ.

Problem: We observe x = δ + ε, not δ.
Here distributions *do* matter.
Assumptions

x | δ ~ N(δ, σx²) (say σx = 0.223)
δ = 0 with probability π0
δ ~ N(0, τ²) with probability 1 - π0
Baseline Parameters

Model parameters:
σx = 0.223
α = 0.05
τ = 1.00
π0 = 0.80

Loss function parameters:
C1 = 0.4
C2 = 0.2
k = 0.99
δ0 = 0.223
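Under these assumptions the Bayes rule classifies each observed x by minimizing posterior expected loss. A sketch follows, using the loss forms as written above (themselves a reconstruction) and the baseline parameters; the grid integration is an implementation choice, not from the talk.

```python
import numpy as np
from scipy.stats import norm

sx, tau, pi0 = 0.223, 1.00, 0.80              # baseline model parameters
C1, C2, kL, d0 = 0.4, 0.2, 0.99, 0.223        # baseline loss parameters

def L_UE(d):
    return np.where(d > -d0, 1 - np.exp(-C1 * (d + d0)), 0.0)

def L_OE(d):
    return L_UE(-d)

def L_NC(d):
    return kL * (1 - 2 / (1 + np.exp(C2 * d ** 2)))

def classify(x, ngrid=4001):
    """Choose the action with the smallest posterior expected loss given x."""
    b = tau ** 2 / (tau ** 2 + sx ** 2)           # posterior shrinkage factor
    w0 = pi0 * norm.pdf(x, 0.0, sx)               # x-density under delta = 0
    w1 = (1 - pi0) * norm.pdf(x, 0.0, np.sqrt(tau ** 2 + sx ** 2))
    p0 = w0 / (w0 + w1)                           # P(delta = 0 | x)
    d = np.linspace(-6, 6, ngrid)
    dx = d[1] - d[0]
    post = norm.pdf(d, b * x, np.sqrt(b) * sx)    # delta | x, delta != 0
    risks = {name: p0 * float(L(0.0)) + (1 - p0) * (L(d) * post).sum() * dx
             for name, L in (("UE", L_UE), ("OE", L_OE), ("NC", L_NC))}
    return min(risks, key=risks.get)

print([classify(x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)])
```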
Test Stats & Multiple Tests

For testing H0i: δi = 0, for i = 1,…,k, the test statistics are

ti = x̄i / (si/√n),

and the p-values are

pi = 2{1 - P(Tn-1 ≤ |ti|)}, where Tn-1 has a t distribution with n-1 df.

FWE- and FDR-controlling methods use the pi; Efron’s method uses the ti.
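A small helper sketch (illustrative, not from the talk) computing both quantities from a k × n data matrix:

```python
import numpy as np
from scipy.stats import t as t_dist

def t_and_p(X):
    """One-sample t statistics and two-sided p-values, row by row.
    X: (k, n) array, one row of n replicates per hypothesis."""
    k, n = X.shape
    xbar = X.mean(axis=1)
    s = X.std(axis=1, ddof=1)
    t_stats = xbar / (s / np.sqrt(n))
    pvals = 2 * t_dist.sf(np.abs(t_stats), df=n - 1)
    return t_stats, pvals
```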
Average Loss

Simulation: δ’s generated, then x’s | δ’s. Independence.
All combinations of:
p = (400, 2000, 10000, 50000)
n = (10, 20, 40, 80, 160)
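A compact sketch of one cell of such a design (illustrative; the 1/√n scaling of σx and the loss forms are assumptions made here), comparing the BH rule’s average loss to the “never classify” baseline, as in the figures that follow:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
sx0, tau, pi0 = 0.223, 1.00, 0.80             # baseline model parameters
C1, C2, kL, d0 = 0.4, 0.2, 0.99, 0.223        # baseline loss parameters
alpha, p = 0.05, 10000

def loss(action, d):
    if action == "NC":
        return kL * (1 - 2 / (1 + np.exp(C2 * d ** 2)))
    if action == "OE":                         # L_OE(d) = L_UE(-d)
        d = -d
    return np.where(d > -d0, 1 - np.exp(-C1 * (d + d0)), 0.0)

for n in (10, 20, 40, 80, 160):
    sx = sx0 * np.sqrt(10.0 / n)               # assumed: sd(x) shrinks as 1/sqrt(n)
    null = rng.random(p) < pi0
    delta = np.where(null, 0.0, rng.normal(0.0, tau, p))
    x = delta + rng.normal(0.0, sx, p)
    pv = 2 * norm.sf(np.abs(x / sx))
    ps = np.sort(pv)                            # BH step-up on the p-values
    j = np.nonzero(ps <= alpha * np.arange(1, p + 1) / p)[0]
    cutoff = ps[j[-1]] if j.size else 0.0
    act = np.where(pv <= cutoff, np.where(x < 0, "UE", "OE"), "NC")
    avg = np.mean([float(loss(a, d)) for a, d in zip(act, delta)])
    never = float(np.mean(loss("NC", delta)))
    print(f"n={n:4d}  BH average loss = {avg:.4f} ({avg / never:.1%} of never-classify)")
```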
[Figure: Average loss as % of the “never classify” loss for FDR, FWE, and Efron. Left panel: p = 10000, versus n = 10, 20, 40, 80, 160. Right panel: n = 10, versus p = 400, 2000, 10000, 50000.]
Concluding Comments

Consider scale up in both p and n.
FWE is often OK (serendipity).
Efron’s method is promising for scaling up both ways.
Recommendations: Either
a. learn to recognize situations where FWE/FDR/Efron/… have good utility, or
b. bite the bullet and construct loss functions.