multiple testing in the survival analysis of microarray data
Post on 13-Jan-2016
22 Views
Preview:
DESCRIPTION
TRANSCRIPT
Multiple Testing in the Survival Analysis of
Microarray Data
José A. Correa, Florida Atlantic University
Sandrine Dudoit, Univ. California Berkeley
Darlene R. Goldstein, École PolytechniqueFédérale de Lausanne
Contact: linkage@stat.berkeley.edu
Software: http://www.math.fau.edu/correa/
cDNA gene expression data
Data on m genes for n samples
Genes
mRNA samples
Gene expression level of gene i in mRNA sample j
= (normalized) Log( Red intensity / Green intensity)
sample1 sample2 sample3 sample4 sample5 …
1 0.46 0.30 0.80 1.51 0.90 ...2 -0.10 0.49 0.24 0.06 0.46 ...3 0.15 0.74 0.04 0.10 0.20 ...4 -0.45 -1.03 -0.79 -0.56 -0.32 ...5 -0.06 1.06 1.35 1.09 -1.09 ...
Multiple Testing Problem• Simultaneously test m null
hypotheses, one for each gene j Hj: no association between
expression level of gene j and the covariate or response
• Because microarray experiments simultaneously monitor expression levels of thousands of genes, there is a large multiplicity issue
• Would like some sense of how ‘surprising’ the observed results are
Hypothesis Truth vs. Decision
# not rejected
# rejected totals
# true H U V (F +) m0
# non-true H
T S m1
totals m - R R m
Truth
Decision
Type I (False Positive) Error Rates
• Per-family Error Rate
PFER = E(V)
• Per-comparison Error Rate PCER = E(V)/m
• Family-wise Error Rate
FWER = p(V ≥ 1)
• False Discovery RateFDR = E(Q), whereQ = V/R if R > 0; Q = 0 if R = 0
Strong vs. Weak Control
• All probabilities are conditional on which hypotheses are true
• Strong control refers to control of the Type I error rate under any combination of true and false nulls
• Weak control refers to control of the Type I error rate only under the complete null hypothesis (i.e. all nulls true)
• In general, weak control without other safeguards is unsatisfactory
Comparison of Type I Error Rates
• In general, for a given multiple testing procedure,
PCER FWER PFER, and
FDR FWER,
with FDR = FWER under the complete null
Adjusted p-values (p*)
• If interest is in controlling, e.g., the FWER, the adjusted p-value for hypothesis Hj is:
pj* = inf {: Hj is rejected at FWER }
• Hypothesis Hj is rejected at FWER if pj
*
• Adjusted p-values for other Type I error rates are similarly defined
Some Advantages of p-value Adjustment
• Test level (size) does not need to be determined in advance
• Some procedures most easily described in terms of their adjusted p-values
• Usually easily estimated using resampling
• Procedures can be readily compared based on the corresponding adjusted p-values
A Little Notation
• For hypothesis Hj, j = 1, …, m
observed test statistic: tj
observed unadjusted p-value: pj
• Ordering of observed (absolute) tj: {rj}
such that |tr1| |tr2
| … |trm|
• Ordering of observed pj: {rj}
such that |pr1| |pr2
| … |prm|
• Denote corresponding RVs by upper case letters (T, P)
Control of the FWER
• Bonferroni single-step adjusted p-valuespj* = min (mpj, 1)
• Holm (1979) step-down adjusted p-valuesprj* = maxk = 1…j {min ((m-k+1)prk, 1)}
• Hochberg (1988) step-up adjusted p-values (Simes inequality)
prj* = mink = j…m {min ((m-k+1)prk, 1) }
Control of the FWER
• Westfall & Young (1993) step-down minP adjusted p-values
prj* = maxk = 1…j { p(maxl{rk…rm} Pl prk H0C )}
• Westfall & Young (1993) step-down maxT adjusted p-values
prj* = maxk = 1…j { p(maxl{rk…rm} |Tl| ≥ |trk| H0
C )}
Westfall & Young (1993) Adjusted p-values
• Step-down procedures: successively smaller adjustments at each step
• Take into account the joint distribution of the test statistics
• Less conservative than Bonferroni, Holm, or Hochberg adjusted p-values
• Can be estimated by resampling but computer-intensive (especially for minP)
maxT vs. minP
• The maxT and minP adjusted p-values are the same when the test statistics are identically distributed (id)
• When the test statistics are not id, maxT adjustments may be unbalanced (not all tests contribute equally to the adjustment)
• maxT more computationally tractable than minP
• maxT can be more powerful in ‘small n, large m’ situations
Control of the FDR• Benjamini & Hochberg (1995): step-up
procedure which controls the FDR under some dependency structures
prj* = mink = j…m { min ([m/k] prk, 1) }
• Benjamini & Yuketieli (2001): conservative step-up procedure which controls the FDR under general dependency structures
prj* = mink = j…m { min (m [1/j]/k] prk, 1) }
• Yuketieli & Benjamini (1999): resampling based adjusted p-values for controlling the FDR under certain types of dependency structures
Identification of Genes Associated with Survival
• Data: survival yi and gene expression xij for individuals i = 1, …, n and genes j = 1, …, m
• Fit Cox model for each gene singly:
h(t) = h0(t) exp(jxij)
• For any gene j = 1, …, m, can test Hj: j = 0
• Complete null H0C: j = 0 for all j = 1, …, m
• The Hj are tested on the basis of the Wald statistics tj and their associated p-values pj
Datasets
• Lymphoma (Alizadeh et al.)40 individuals, 4026 genes
• Melanoma (Bittner et al.)15 individuals, 3613 genes
• Both available at http://lpgprot101.nci.nih.gov:8080/GEAW
Results: Lymphoma
Results: Melanoma
Other Proposals from the Microarray Literature
• ‘Neighborhood Analysis’, Golub et al.– In general, gives only weak control of FWER
• ‘Significance Analysis of Microarrays (SAM)’ (2 versions)– Efron et al. (2000): weak control of PFER– Tusher et al. (2001): strong control of PFER
• SAM also estimates ‘FDR’, but this ‘FDR’ is defined as E(V|H0
C)/R, not E(V/R)
Controversies
• Whether multiple testing methods (adjustments) should be applied at all
• Which tests should be included in the family (e.g. all tests performed within a single experiment; define ‘experiment’)
• Alternatives
– Bayesian approach
– Meta-analysis
Situations where inflated error rates are a concern
• It is plausible that all nulls may be true• A serious claim will be made whenever
any p < .05 is found• Much data manipulation may be
performed to find a ‘significant’ result• The analysis is planned to be
exploratory but wish to claim ‘sig’ results are real
• Experiment unlikely to be followed up before serious actions are taken
Discussion (I)
• Lack of significant findings
– Small sample sizes
– FWER-controlling procedures may be too stringent in microarray applications
– FDR could perhaps be made even more powerful by taking into account the joint distribution of gene expression levels
Discussion (II)
• Computational considerations
– All computing done in the R statistical environment (Ihaka and Gentleman)
– For max T, Cox model analysis was repeated for each of 100,800 random permutations of survival times
– Exact maximum likelihood calculation took about 60 hours per machine in cluster of 24 PCs, each with 1 GHz Pentium III and 256 MB memory
– Time can be reduced substantially by using a score approximation to obtain parameter estimates, and by calling C language code from within R
References
• Alizadeh et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403: 503-511
• Benjamini and Hochberg (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSSB 57: 289-200
• Benjamini and Yuketieli (2001) The control of false discovery rate in multiple hypothesis testing under dependency. Annals of Statistics
• Bittner et al. (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406: 536-540
• Efron et al. (2000) Microarrays and their use in a comparative experiment. Tech report, Stats, Stanford
• Golub et al. (1999) Molecular classification of cancer. Science 286: 531-537
References
• Hochberg (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75: 800-802
• Holm (1979) A simple sequentially rejective multiple testing procedure. Scand. J Statistics 6: 65-70
• Ihaka and Gentleman (1996) R: A language for data analysis and graphics. J Comp Graph Stats 5: 299-314
• Tusher et al. (2001) Significance analysis of microarrays applied to transcriptional responses to ionizing radiation. PNAS 98: 5116 -5121
• Westfall and Young (1993) Resampling-based multiple testing: Examples and methods for p-value adjustment. New York: Wiley
• Yuketieli and Benjamini (1999) Resampling based false discovery rate controlling multiple test procedures for correlated test statistics. J Stat Plan Inf 82: 171-196
Acknowledgements
• Debashis Ghosh
• Erin Conlon
top related