Dissertation Defense
Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs
with Remarks on Family Selection
Garrett Saunders – Dissertation Defense
April 15, 2014
Dissertation Defense Garrett Saunders
Contents
1 Introduction
2 FWER Control within Gene Ontology Graphs
3 A Power Improving Multiplicity Correction for Large-Scale SNP Selection in LD Based QTL Mapping
4 QTL Mapping: Hypotheses and Approaches
5 Discussion
1. Introduction
“If enough statistics are computed, some of them will be sure to show structure.” (Diaconis 1985)
Figure: Probability of at least one type I error, P(any type I error), against the number of simultaneous tests (m), for m from 0 to 100.
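The growth of P(any type I error) with m can be sketched numerically. The snippet below assumes m independent tests of true null hypotheses, each run at a per-test level of α = 0.05. Both the independence and the value of α are assumptions made for illustration; the slide does not state them.

```python
def prob_any_type1_error(m: int, alpha: float = 0.05) -> float:
    """P(at least one type I error) among m independent true nulls, each tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


# Print the probability for a few values of m (hypothetical grid).
for m in (1, 10, 20, 50, 100):
    print(m, round(prob_any_type1_error(m), 3))
```

Under these assumptions the probability is already above one half by m = 20 and is close to 1 by m = 100.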
1. Introduction
Figure: Timeline of multiple comparison procedure (MCP) milestones: Cournot (1843), Tippett (1931), the great spurt of MCP activity in the 40's & 50's, Miller (1966), Marcus et al. (1976), Hochberg & Tamhane ('87), Westfall & Young ('93), and Hsu ('96). The timeline is overlaid with the number of MCP articles from four leading journals by year from 1965 to 2008, and a marker denotes a world-wide conference on MCPs.
1. Introduction
“The Vitality of [the] field in the future – as a research area – depends upon [the researcher’s] ability to continue and address the real needs of statistical analysis in current problems” (Benjamini, 2010).
1. Introduction
                        Declared          Declared
                        non-significant   significant   Total
True null hypothesis    U                 V             m0
False null hypothesis   T                 S             m − m0
Total                   m − R             R             m
The Per Comparison Error Rate (PCER): E(V/m)
The Familywise Error Rate (FWER): P(V ≥ 1)
The False Discovery Rate (FDR): E(V/R)
1. Introduction
Selecting a Family of Hypotheses
A subjective, but important decision.
“Any collection of inferences for which it is meaningful to take into account some combined measure of errors.” (Hochberg & Tamhane 1987)
Gatekeeping (Bretz et al. 2009)
1. Introduction
The Bonferroni Adjustment: test each $H_i$ at level $\alpha/m$
Boole’s Inequality: $P(A \cup B) \le P(A) + P(B)$, or generally $P\left(\bigcup_i A_i\right) \le \sum_i P(A_i)$
Let $R_i$ denote the event that hypothesis $H_i$ is rejected.
Then, if $P_{H_i}(R_i) = \alpha_i$,
$\mathrm{FWER} = P_{H_i}\left(\bigcup_i R_i\right) \le \sum_i P_{H_i}(R_i) = \sum_i \alpha_i$
If $\alpha_i = \alpha/m$ for all $i$, then $\mathrm{FWER} \le \alpha$.
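A minimal sketch of the Bonferroni adjustment described above. The function names and the example p-values are hypothetical and chosen only for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H_i when its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def bonferroni_adjust(p_values):
    """Adjusted p-values m * p_i, capped at 1; reject when the adjusted value is <= alpha."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


# Hypothetical raw p-values for m = 3 tests.
p = [0.001, 0.02, 0.04]
print(bonferroni_reject(p))
print(bonferroni_adjust(p))
```

Both forms give the same decisions: comparing p_i with α/m is equivalent to comparing m·p_i with α.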
1. Introduction
Weighted Bonferroni Adjustment: test $H_i$ at level $\alpha_i$, s.t. $\sum_i \alpha_i \le \alpha$
Since
$\mathrm{FWER} = P_{H_i}\left(\bigcup_i R_i\right) \le \sum_i P_{H_i}(R_i) = \sum_i \alpha_i$
So long as $\sum_i \alpha_i \le \alpha$,
then $\mathrm{FWER} \le \alpha$.
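The weighted version can be sketched the same way. The function name, the example levels, and the small numerical tolerance are assumptions for illustration.

```python
def weighted_bonferroni_reject(p_values, levels, alpha=0.05):
    """Test H_i at its own level alpha_i, after checking that the levels sum to at most alpha."""
    if sum(levels) > alpha + 1e-12:  # tolerance guards against floating-point round-off
        raise ValueError("local levels must sum to at most alpha")
    return [p <= a for p, a in zip(p_values, levels)]


# Hypothetical example: most of alpha is spent on the first hypothesis.
print(weighted_bonferroni_reject([0.03, 0.02, 0.004], [0.04, 0.005, 0.005]))
```

Unequal levels let prior interest in some hypotheses translate into more power for them, at the cost of the others.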
1. Introduction
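Holm’s Sequential Bonferroni: test ordered $H_{(j)}$ at level $\alpha/(m-j+1)$
Let $P_i$ denote the p-value for $H_i$.
Let $I \subseteq \{1, \dots, m\}$ index the true $H_i$, $|I| = k \le m$.
Then,
$P\left(P_i > \frac{\alpha}{k} \text{ for all } i \in I\right) = 1 - P\left(P_i \le \frac{\alpha}{k} \text{ for some } i \in I\right) \ge 1 - \sum_{i \in I} P\left(P_i \le \frac{\alpha}{k}\right) \ge 1 - k\,\frac{\alpha}{k} = 1 - \alpha.$
Since $m - j + 1 \ge k$ at the step $j$ where the first true $H_i$ is tested, $\mathrm{FWER} \le \alpha$.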
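The step-down rule above can be sketched as follows. The function name and example p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure: compare the j-th smallest p-value with alpha / (m - j + 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for j, i in enumerate(order, start=1):
        if p_values[i] <= alpha / (m - j + 1):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; all larger p-values are retained
    return reject


# Hypothetical p-values: plain Bonferroni (alpha/3) would reject only the first.
print(holm_reject([0.01, 0.02, 0.03]))
```

Each rejection relaxes the threshold for the next ordered hypothesis, which is where the gain over the single-step Bonferroni adjustment comes from.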
1. Introduction
Closed Testing: reject $w_i$ iff all $w_j \subset w_i$ are rejected at level $\alpha$
Let $W$ be a set of hypotheses. $W$ is closed under intersection if:
for any two hypotheses $H_i, H_j \in W$, $w = H_i \cap H_j$ is also in $W$.
An example
Consider the elementary hypotheses $H_1$, $H_2$, and $H_3$.
Let $w_1 = H_1$, $w_2 = H_2$, $w_3 = H_3$, and
$w_4 = H_1 \cap H_2$, $w_5 = H_1 \cap H_3$, $w_6 = H_2 \cap H_3$, and $w_7 = H_1 \cap H_2 \cap H_3$
$W = \{w_1, \dots, w_7\}$ is a set of hypotheses closed under intersections.
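A small sketch of closed testing on this example. The slide does not say which level-α test is applied to each intersection hypothesis, so the Bonferroni local test used here is an assumption, as are the function names and p-values.

```python
from itertools import combinations


def closure(names):
    """All non-empty intersections of the elementary hypotheses, stored as frozensets of names."""
    return [frozenset(c) for r in range(1, len(names) + 1) for c in combinations(names, r)]


def closed_test(p_values, alpha=0.05):
    """Reject H_i only if every intersection hypothesis involving H_i is rejected locally.

    Local test (an assumption): Bonferroni, i.e. reject the intersection of a set S
    when min_{i in S} p_i <= alpha / |S|.
    """
    names = list(p_values)
    family = closure(names)
    local = {w: min(p_values[h] for h in w) <= alpha / len(w) for w in family}
    return {h: all(local[w] for w in family if h in w) for h in names}


# Hypothetical p-values for H1, H2, H3.
print(len(closure(["H1", "H2", "H3"])))  # size of W = {w1, ..., w7}
print(closed_test({"H1": 0.01, "H2": 0.04, "H3": 0.03}))
```

With Bonferroni local tests the closed procedure reproduces Holm's decisions, which is one way to see why Holm's method controls the FWER.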
1. Introduction
Closed Testing: reject wi iff all wj ⊂ wi are rejected at level α
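$W = \{w_1, \dots, w_7\}$ is a set of hypotheses closed under intersections.
Figure: Diagram of the closed family $W$: the elementary hypotheses $w_1 = H_1$, $w_2 = H_2$, $w_3 = H_3$; the pairwise intersections $w_4 = H_1 \cap H_2$, $w_5 = H_1 \cap H_3$, $w_6 = H_2 \cap H_3$; and the global intersection $w_7 = H_1 \cap H_2 \cap H_3$.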
1. Introduction
Generalized Weighted Bonferroni Testing
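Figure: Graph on $H_1$, $H_2$, $H_3$ with initial level $\alpha/3$ at each node and a transition weight of 1/2 on each of the six directed edges.
$\sum_{i=1}^{m} \alpha_i \le \alpha$
$0 \le g_{ij} \le 1$, $g_{ii} = 0$, and $\sum_{k=1}^{m} g_{ik} \le 1$ for all $i, j = 1, \dots, m$.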
1. Introduction
Generalized Weighted Bonferroni Testing
Figure: Updated graphs. (a) $H_2$ and $H_3$ remain, each at level $\alpha/2$, with weight 1 on the edges between them. (b) Only $H_3$ remains, at level $\alpha$.
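The graph updates can be sketched in code. The update rule below is the standard one for sequentially rejective graphical procedures (Bretz et al. 2009); it is not spelled out on the slides, so treat it as an assumption, along with the function name and example p-values.

```python
def graphical_bonferroni(p_values, levels, weights):
    """Sequentially rejective graphical procedure.

    After rejecting H_j, each remaining H_l receives alpha_j * g_jl, and edge weights become
    g_lk <- (g_lk + g_lj * g_jk) / (1 - g_lj * g_jl) for remaining l != k.
    Returns the rejected indices (in order) plus the final levels and weights.
    """
    m = len(p_values)
    a = list(levels)
    g = [list(row) for row in weights]
    active = set(range(m))
    rejected = []
    while True:
        candidates = [i for i in sorted(active) if p_values[i] <= a[i]]
        if not candidates:
            break
        j = candidates[0]
        active.remove(j)
        rejected.append(j)
        new_g = [[0.0] * m for _ in range(m)]
        for l in active:
            a[l] += a[j] * g[j][l]  # pass on the level of the rejected node
            for k in active:
                if k == l:
                    continue
                denom = 1.0 - g[l][j] * g[j][l]
                if denom > 0.0:
                    new_g[l][k] = (g[l][k] + g[l][j] * g[j][k]) / denom
        a[j] = 0.0
        g = new_g
    return rejected, a, g


alpha = 0.05
half = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
# Hypothetical p-values; only H1 is rejected, which leaves H2 and H3 at alpha/2 with weight 1.
print(graphical_bonferroni([0.01, 0.9, 0.9], [alpha / 3] * 3, half))
```

Rejecting $H_1$ in the three-node graph with weights of 1/2 gives levels of $\alpha/2$ and weights of 1 for the two remaining nodes, and rejecting one of those leaves the last node at level $\alpha$.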
2. FWER Control within GO Graphs
Figure: Gene Ontology graph with numbered nodes (labels 1 to 113). Legend for nodes 1 to 22:
1 GO:0008150, 2 GO:0000003, 3 GO:0008152, 4 GO:0009987, 5 GO:0016265, 6 GO:0016032, 7 GO:0002376, 8 GO:0040011, 9 GO:0050896, 10 GO:0071840, 11 GO:0051704,
12 GO:0019953, 13 GO:0022414, 14 GO:0044281, 15 GO:0044238, 16 GO:0006807, 17 GO:0044237, 18 GO:0009058, 19 GO:0043170, 20 GO:0009056, 21 GO:0006928, 22 GO:0051301
2. FWER Control within GO Graphs
(Toy) Example GO Graph
Figure: Toy GO graph with three rows of nodes: A; B and F; C, D, and E.
2. FWER Control within GO Graphs
Focus Level Method (Goeman and Mansmann 2008)
Applies a top-down and a bottom-up approach.
Figure: Panels (a), (b), and (c) show the toy GO graph on nodes A, B, F, C, D, E and its expansions into sets of nodes (such as CDE, CD, DE, CE, CF, and DF) used by the Focus Level method.
2. FWER Control within GO Graphs
Focus Level Method vs. Short Focus Level
Figure: Left, the Focus Level graphs (a), (b), and (c) on the toy GO graph. Right, the Short Focus Level graphs on the same toy GO graph, with the initial level placed at A ($\alpha$), at B and F ($\alpha/2$ each), or at C, D, and E ($\alpha/3$ each), and transition weights of 1, 1/2, and 1/3 on the edges.
2. FWER Control within GO Graphs
Table: Summary of power calculations for Simulation 1.
                                 Node                          Mean Computation
  n   Method    A      B      F      C      D      E           Time (sec)
  5   FL      0.447  0.428  0.132  0.142  0.135  0.130         0.426134
      SFL     0.447  0.366  0.120  0.092  0.083  0.122         0.001778
 20   FL      0.574  0.567  0.180  0.186  0.192  0.179         0.102097
      SFL     0.574  0.552  0.178  0.184  0.188  0.179         0.001789
100   FL      0.642  0.635  0.202  0.220  0.207  0.201         0.355848
      SFL     0.642  0.623  0.201  0.217  0.204  0.201         0.001793
FL: Focus Level
SFL: Short Focus Level
2. FWER Control within GO Graphs
Simulation 2
Figure: Simulation 2 graph on nodes 01 to 14, drawn in four rows: 01; 02, 03, 04; 05 to 12; 13, 14.
(The closure of this graph contains 574 nodes.)
2. FWER Control within GO Graphs
Table: Results of the power analysis under Simulation 2.
       GO:01  GO:02  GO:03  GO:04  GO:06  GO:07  GO:10  GO:11  GO:13
FL     0.995  0.968  0.890  0.462  0.512  0.872  0.380  0.399  0.344
SFL    0.995  0.988  0.952  0.543  0.837  0.949  0.489  0.476  0.445
FL: Focus Level
SFL: Short Focus Level
Computation Time
FL     3:42:938
SFL    0:00:015
2. FWER Control within GO Graphs
Figure: The Gene Ontology graph with numbered nodes (labels 1 to 113) and the legend for nodes 1 to 22 (1 GO:0008150 through 22 GO:0051301), as at the start of this section.
(Computation took 3 minutes and 23 seconds. Original graph contained 5,687 nodes.)
3. FWER Control in LD QTL Mapping
Figure: Diagram of five markers, numbered 1 to 5.
Want to know if:
a QTL exists,
the QTL is linked to any markers.
3. FWER Control in LD QTL Mapping
Figure: Markers 1 to 5, each carrying the pair of hypotheses $H_0^L$ and $H_0^D$.
$H_0^L$: No QTL exists.
$H_0^D$: QTL is unlinked with marker.
3. FWER Control in LD QTL Mapping
Figure: Markers 1 to 5, each carrying the pair of hypotheses $H_0^L$ and $H_0^D$.
$L(p, q, D, \mu_1, \dots, \mu_G, \sigma \mid Y, M) = \prod_{i=1}^{n} \sum_{g=1}^{G} \omega_{g|M_i}(p, q, D)\, f(Y_i \mid \mu_g, \sigma)$
($D$ is not identifiable under $H_0^L$)
3. FWER Control in LD QTL Mapping
Figure: A) Demonstration of the GBA testing scheme for a single marker: $H_0^L$ starts at level $\alpha$ and $H_0^D$ at level 0. B) The updated graph after finding $H_0^L$ significant: $H_0^D$ is now tested at level $\alpha$.
3. FWER Control in LD QTL Mapping
Figure: Demonstration of the hierarchy of the GBA testing scheme for three markers. Each $H_0^{L_k}$ starts at level $\alpha/3$ and each $H_0^{D_k}$ at level 0; the edges carry weights of 1 and 1/2.
3. FWER Control in LD QTL Mapping
Figure: Demonstration of the GBA testing scheme for three markers assuming that hypotheses $H_0^{L_1}$ and $H_0^{L_3}$ from the initial graph in Figure 2 are rejected. The remaining nodes are $H_0^{L_2}$, $H_0^{D_1}$, and $H_0^{D_3}$ at level $\alpha/3$ each and $H_0^{D_2}$ at level 0; the edges carry weights of 1 and 1/2.
3. FWER Control in LD QTL Mapping
Figure: A) The updated graph from Figure 3 assuming the hypothesis $H_0^{D_1}$ of Figure 3 is rejected at the $\alpha/3$-level: $H_0^{L_2}$ and $H_0^{D_3}$ are at level $\alpha/2$ and $H_0^{D_2}$ is at level 0. B) Graph resulting from the rejection of the hypothesis $H_0^{D_3}$ at the $\alpha/2$-level: $H_0^{L_2}$ is at level $\alpha$ and $H_0^{D_2}$ at level 0.
3. FWER Control in LD QTL Mapping
Conditions under which the GBA simplifies to an IUT (+Holm).
Let $H_0^U$ denote the union hypothesis $H_0^L \cup H_0^D$.
Let $P^U$ denote the p-value for the IUT of $H_0^U$.
Let $k$ denote the marker with $\arg\min_i P_i^U < \alpha/m$.
Then,
$m\,p_k^L = P_k^L \le P_k^D = m \max\{p_k^L, p_k^D\} = m\,p_k^M = P_k^M \le \alpha$
where $m$ is the number of markers, $p_k^L$ and $p_k^D$ are raw p-values for marker $k$, $P_k$ are GBA adjusted p-values for marker $k$, and $p_k^M$ denotes the raw IUT p-value for marker $k$.
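The intersection-union test (IUT) rejects the union null hypothesis only when both component tests reject, so its raw p-value is the larger of the two component p-values. A minimal sketch of that p-value and its Bonferroni adjustment over m markers follows; the function names and the example p-values are hypothetical.

```python
def iut_p_value(p_l, p_d):
    """Raw IUT p-value for one marker: the larger of the two component p-values."""
    return max(p_l, p_d)


def bonferroni_iut(p_l_list, p_d_list):
    """Bonferroni-adjusted IUT p-values, m * max(p_L, p_D), capped at 1."""
    m = len(p_l_list)
    return [min(1.0, m * iut_p_value(pl, pd)) for pl, pd in zip(p_l_list, p_d_list)]


# Hypothetical raw p-values for m = 2 markers.
print(bonferroni_iut([0.001, 0.2], [0.004, 0.01]))
```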
3. FWER Control in LD QTL Mapping
Figure: Power comparison between the graphical Bonferroni adjustment (GBA) and standard Bonferroni adjustment under different sample sizes (n = 100, 300, 500), numbers of SNPs (1 to 1000), and heritability (A: H2 = 0.1, B: H2 = 0.4).
3. FWER Control in LD QTL Mapping
Figure: The control of leaf shape for different genotypes (AA, Aa, aa) of the QTL identified by marker 1 on PC 4.
3. FWER Control in LD QTL Mapping
Figure: The negative log of the GBA-adjusted p-values for $H_0^D$ for each SNP in the mouse HDL cholesterol QTL mapping project, plotted by chromosome (1 to 19 and X).
4. QTL Mapping: Hypotheses and Approaches
$H_A$: a linked QTL.
$L(p, q, D, \mu_1, \dots, \mu_G, \sigma \mid Y, M) = \prod_{i=1}^{n} \sum_{g=1}^{G} \omega_{g|M_i}(p, q, D)\, f(Y_i \mid \mu_g, \sigma)$   (1)
$H_0^2$: an unlinked QTL.
$L(q, \mu_1, \dots, \mu_G, \sigma \mid Y) = \prod_{i=1}^{n} \sum_{g=1}^{G} \omega_g(q)\, f(Y_i \mid \mu_g, \sigma)$   (2)
$H_0^1$: no QTL.
$L(\mu, \sigma \mid Y) = \prod_{i=1}^{n} f(Y_i \mid \mu, \sigma)$   (3)
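The three likelihoods share one mixture form, which can be sketched as below. Two assumptions are made for illustration: $f$ is taken to be a normal density, and the mixing weights $\omega$ are passed in as a precomputed table, since the slides do not give their formula in terms of $p$, $q$, and $D$. All names and numbers are hypothetical.

```python
import math


def normal_pdf(y, mu, sigma):
    """Normal density, used here as an assumed form for f(Y_i | mu_g, sigma)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_log_likelihood(y, weights, mus, sigma):
    """log L = sum_i log sum_g w[i][g] * f(y_i | mu_g, sigma), as in likelihoods (1) and (2)."""
    return sum(
        math.log(sum(w * normal_pdf(y_i, mu, sigma) for w, mu in zip(w_i, mus)))
        for y_i, w_i in zip(y, weights)
    )


def null_log_likelihood(y, mu, sigma):
    """log L for the no-QTL model (3)."""
    return sum(math.log(normal_pdf(y_i, mu, sigma)) for y_i in y)


# Hypothetical phenotypes and genotype weights for G = 3 genotypes.
y = [1.2, 0.7, 1.9, 1.1]
w = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.25, 0.5, 0.25]]
print(mixture_log_likelihood(y, w, [1.0, 1.0, 1.0], 0.5), null_log_likelihood(y, 1.0, 0.5))
```

When all genotype means coincide, the weights drop out and the mixture collapses to the no-QTL likelihood, which is the sense in which $D$ cannot be identified when no QTL exists.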
4. QTL Mapping: Hypotheses and Approaches
Test for association between QTL and phenotype Y.
$H_0^L: \mu_1 = \mu_2 = \mu_3 \equiv \mu$ vs
$H_1^L$: one of the equalities above does not hold.
4. QTL Mapping: Hypotheses and Approaches
Test for linkage between SNP and QTL.
$H_0^D: D = 0$ vs $H_1^D: D \ne 0$.   (4)
$\chi^2_D = \dfrac{n D^2}{p(1-p)\,q(1-q)} \sim \chi^2_1$   (5)
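Statistic (5) and its nominal $\chi^2_1$ p-value can be computed as sketched below. The function names and the input values are hypothetical; the tail probability uses the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$.

```python
import math


def ld_chi_square(n, D, p, q):
    """Statistic (5): n * D^2 / (p (1 - p) q (1 - q))."""
    return n * D ** 2 / (p * (1.0 - p) * q * (1.0 - q))


def chi2_1_sf(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical values: n = 100 individuals, D = 0.05, allele frequencies p = q = 0.5.
stat = ld_chi_square(100, 0.05, 0.5, 0.5)
print(stat, chi2_1_sf(stat))
```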
4. QTL Mapping: Hypotheses and Approaches
Figure: Likelihood Ratio Test of $H_0^1$ against $H_A$ for synthetic data simulated under the null hypothesis of no QTL, $H_0^1$. The plot shows the empirical distribution functions $F_n(x)$ of the likelihood ratio test statistic (LRTS) for n = 100, 300, and 500, with reference curves labeled ν = 1 and ν = 10.
4. QTL Mapping: Hypotheses and Approaches
Figure: The empirical cumulative distribution functions corresponding to the test of $D = 0$ for three scenarios (no QTL, known QTL, unlinked QTL), with reference curves labeled ν = 1 and ν = 10.
4. QTL Mapping: Hypotheses and Approaches
The (Bivariate) Null Kernel Method
1 Simulate $s$ data sets, each of size $n$, based on the model assumptions of the joint hypothesis $H_0$.
2 Calculate $T_i$ and $U_i$ for $i = 1, \dots, s$.
3 Estimate the joint density $f$ of $T$ and $U$ using a kernel density estimation technique on the $T_i$ and $U_i$.
4 Compute the cdf $F$ of $f$ by $F(c) = \int_{A(c)} f$, where $A(c) = \{(t, u) \mid f(t, u) \ge c\}$.
5 The joint p-value for the calculated statistics $t$ and $u$ can then be obtained by the formula $p = 1 - F(f(t, u))$.
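The five steps can be sketched in plain Python. Several choices below are assumptions made for illustration, not the author's implementation: the null statistics are stand-in draws of independent standard normals, the kernel is a product Gaussian with Scott's rule bandwidths, and $F(c)$ is approximated by the share of simulated null pairs whose estimated density is at least $c$. All names are hypothetical.

```python
import math
import random


def fit_null_kernel(null_stats):
    """Steps 3-5: kernel density estimate of (T, U) under H0 and the resulting p-value function."""
    s = len(null_stats)

    def sd(values):
        mean = sum(values) / s
        return math.sqrt(sum((v - mean) ** 2 for v in values) / (s - 1))

    shrink = s ** (-1.0 / 6.0)  # Scott's rule in two dimensions: s^(-1/(d+4)) with d = 2
    h_t = sd([t for t, _ in null_stats]) * shrink
    h_u = sd([u for _, u in null_stats]) * shrink
    norm = 1.0 / (s * 2.0 * math.pi * h_t * h_u)

    def density(t, u):
        return norm * sum(
            math.exp(-0.5 * (((t - a) / h_t) ** 2 + ((u - b) / h_u) ** 2))
            for a, b in null_stats
        )

    # Density heights at the simulated null pairs, reused for every p-value.
    null_heights = [density(a, b) for a, b in null_stats]

    def p_value(t, u):
        c = density(t, u)
        # F(c): estimated null probability of the region where the density is at least c.
        F_c = sum(h >= c for h in null_heights) / s
        return 1.0 - F_c

    return p_value


# Steps 1-2 (stand-in): s = 300 simulated null pairs (T_i, U_i).
random.seed(1)
null_stats = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(300)]
p_value = fit_null_kernel(null_stats)
print(p_value(0.0, 0.0), p_value(4.0, 4.0))
```

A pair of statistics in a low-density region of the null distribution gets a small p-value regardless of the direction in which it departs from the bulk, so no parametric form for the joint null distribution is needed.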
4. QTL Mapping: Hypotheses and Approaches
The Null Kernel Method vs. Hotelling’s T2 (bivariate test of location)
$T^2 = n(\bar{X} - \mu_0)' S^{-1} (\bar{X} - \mu_0) \sim \dfrac{2(n-1)}{n-2} F_{2,\,n-2}$   (6)
Figure: Visualization of the Null Kernel method as applied to a sample of 1,000 T and U statistics simulated under the bivariate normal null distribution with zero mean, unit variances, and covariance of 0.3. Axes: T-statistic and U-statistic; contours at 0.05, 0.01, 0.001, and 0.0001.
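Equation (6) can be evaluated directly for bivariate data. The sketch below uses the closed-form inverse of the 2×2 sample covariance matrix and the fact that an $F$ variable with 2 numerator degrees of freedom has tail probability $(1 + 2x/\nu)^{-\nu/2}$. The function name and the data are hypothetical.

```python
def hotelling_t2_bivariate(data, mu0=(0.0, 0.0)):
    """Hotelling's T^2 for bivariate data and its p-value under T^2 ~ [2(n-1)/(n-2)] F_{2,n-2}."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = mx - mu0[0], my - mu0[1]
    # Quadratic form d' S^{-1} d via the closed-form inverse of a 2x2 matrix.
    t2 = n * (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    # With x = T^2 (n-2) / (2(n-1)) and nu = n-2, the tail (1 + 2x/nu)^(-nu/2) simplifies to:
    p = (1.0 + t2 / (n - 1)) ** (-(n - 2) / 2.0)
    return t2, p


# Hypothetical sample of n = 4 bivariate observations with mean (1, 1).
print(hotelling_t2_bivariate([(2, 1), (0, 1), (1, 2), (1, 0)]))
```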
4. QTL Mapping: Hypotheses and Approaches
The Null Kernel Method vs. Hotelling’s T2 (bivariate test of location)
Figure: Comparison of P-values ($-\log_{10}(p)$) obtained from either the Null Kernel method or Hotelling’s $T^2$ test. Axes: Null Kernel $-\log_{10}(p)$ and $T^2$ $-\log_{10}(p)$; the plot is divided into quadrants I to IV.
4. QTL Mapping: Hypotheses and Approaches
The Null Kernel Method vs. Hotelling’s T2 (bivariate test of location)
Figure: The $-\log_{10}$ of the P-values from the Null Kernel and Hotelling’s $T^2$ methods for data simulated consistent with the null hypothesis. Axes: Null Kernel $-\log_{10}(p)$ and $T^2$ $-\log_{10}(p)$; the plot is divided into quadrants I to IV.
4. QTL Mapping: Hypotheses and Approaches
QTL mapping Simulation Study
Figure: Visualization of the Null Kernel estimated (null) density for the bivariate data corresponding to the test of $H_0^D$, $\chi^2_D$, and $H_0^L$, $\chi^2_L$. Contour levels: 0.1, 0.05, 0.001, 0.0001, and 1e−5.
4. QTL Mapping: Hypotheses and Approaches
QTL mapping Simulation Study
Figure: The resulting adjusted P-values from each of the permutation, simulation, and theoretical approaches against the results of the Null Kernel method, shown as $-\log_{10}(p)$ along Chromosomes 1 to 4.
4. QTL Mapping: Hypotheses and Approaches
Mice HDL QTL mapping study
Figure: The negative log of the Holm adjusted P-values for the Null Kernel approach, plotted by chromosome (1 to 19 and X).
4. QTL Mapping: Hypotheses and Approaches
Figure: The joint plot of the observed test statistics ($\chi^2_D$ and $\chi^2_L$) for the mouse HDL QTL mapping data.
5. Discussion
“Professional statisticians... bear an obligation to offer alternatives (or entirely new approaches) that meet real needs and are practical as well.” (J. W. Tukey, as quoted in Benjamini and Braun 2001)
5. Discussion
1 Introduction
2 FWER Control within Gene Ontology Graphs
    Extended GBA methods to Restricted Hypotheses (Theorem 1).
    Introduced the Short Focus Level method (code in: mvGST).
    Quantified the computational advantage.
3 A Power Improving Multiplicity Correction for Large-Scale SNP Selection in LD Based QTL Mapping
    Introduced a GBA approach for LD based QTL mapping.
    Protects model identifiability and strong FWER control.
    Quantified the power increase numerically and practically.
4 QTL Mapping: Hypotheses and Approaches
    Detailed problems of χ2 assumptions in LD based QTL mapping.
    Introduced the Null Kernel method.
    Showed power and computational advantages of the NK method.
5 Discussion
Acknowledgements
This work was supported by
Utah Agricultural Experiment Station (UAES) project number UTA01062, associated with the W2112 multi-state project “Reproductive Performance in Domestic Ruminants”
Utah State University VPR Research Catalyst Grant.