family-wise error rate control in qtl mapping and gene ...jrstevens/stat5570/garrett_small.pdf ·...

Dissertation Defense

Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs

with Remarks on Family Selection

Garrett Saunders – Dissertation Defense

April 15, 2014


Contents

1 Introduction
2 FWER Control within Gene Ontology Graphs
3 A Power Improving Multiplicity Correction for Large-Scale SNP Selection in LD Based QTL Mapping
4 QTL Mapping: Hypotheses and Approaches
5 Discussion


1. Introduction

“If enough statistics are computed, some of them will be sure to show structure.” (Diaconis 1985)

[Figure: P(any Type I error) as a function of the number of simultaneous tests, m, from 0 to 100.]
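A minimal sketch of the quantity plotted above. It rests on assumptions the slide does not state: the m tests are independent, every null hypothesis is true, and each test is run at α = 0.05. Under those assumptions P(any Type I error) = 1 − (1 − α)^m. The function name `fwer_independent` is illustrative.

```python
def fwer_independent(m: int, alpha: float = 0.05) -> float:
    """P(at least one Type I error) for m independent level-alpha tests
    when all m null hypotheses are true: 1 - (1 - alpha)**m."""
    return 1.0 - (1.0 - alpha) ** m


for m in (1, 5, 10, 20, 50, 100):
    print(f"m = {m:3d}: P(any Type I error) = {fwer_independent(m):.3f}")
```

Under these assumptions the probability is already about 0.40 at m = 10 and exceeds 0.99 at m = 100.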



[Figure: Timeline of multiple comparison procedures (MCPs): Cournot (1843), Tippett (1931), a great spurt of MCP activity in the 40's & 50's, Miller (1966), Marcus et al. (1976), Hochberg & Tamhane ('87), Westfall & Young ('93), Hsu ('96), with further marks at '00 and '13. Overlaid: the number of MCP articles from four leading journals by year from 1965 to 2008. A legend symbol denotes a world-wide conference on MCPs.]



“The Vitality of [the] field in the future – as a research area – depends upon [the researcher’s] ability to continue and address the real needs of statistical analysis in current problems” (Benjamini, 2010).



                          Declared          Declared
                          non-significant   significant   Total
True null hypothesis      U                 V             m0
False null hypothesis     T                 S             m − m0
Total                     m − R             R             m

The Per Comparison Error Rate (PCER): E(V/m)

The Familywise Error Rate (FWER): P(V ≥ 1)

The False Discovery Rate (FDR): E(V/R), with V/R taken to be 0 when R = 0
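The three error rates can be estimated by simulation. The sketch below uses a hypothetical setting: one-sided z-tests, m = 100 hypotheses of which m0 = 90 are true nulls, and false nulls shifted by 3 standard deviations. It applies no multiplicity adjustment and rejects whenever p ≤ α. All names and parameter values are illustrative only.

```python
import math
import random


def simulate_error_rates(m=100, m0=90, alpha=0.05, shift=3.0, reps=2000, seed=1):
    """Monte Carlo estimates of PCER = E(V/m), FWER = P(V >= 1) and
    FDR = E(V/R) (V/R := 0 when R = 0) when every hypothesis with p <= alpha
    is rejected, i.e. with no multiplicity adjustment.

    Hypothetical setup: one-sided z-tests; hypotheses 0..m0-1 are true nulls
    with z ~ N(0, 1), the remaining m - m0 are false with z ~ N(shift, 1).
    """
    rng = random.Random(seed)
    pcer = fwer = fdr = 0.0
    for _ in range(reps):
        v = r = 0  # false rejections V and total rejections R
        for i in range(m):
            z = rng.gauss(shift if i >= m0 else 0.0, 1.0)
            p = 0.5 * math.erfc(z / math.sqrt(2.0))  # one-sided p-value P(Z > z)
            if p <= alpha:
                r += 1
                if i < m0:
                    v += 1
        pcer += v / m
        fwer += 1.0 if v >= 1 else 0.0
        fdr += v / r if r > 0 else 0.0
    return pcer / reps, fwer / reps, fdr / reps


pcer, fwer, fdr = simulate_error_rates()
print(f"PCER ~ {pcer:.3f}, FWER ~ {fwer:.3f}, FDR ~ {fdr:.3f}")
```

Without adjustment, the PCER stays near (m0/m)α, while the FWER in this setting is close to 1.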



Selecting a Family of Hypotheses

A subjective, but important decision.

“Any collection of inferences for which it is meaningful to take into account some combined measure of errors.” (Hochberg & Tamhane 1987)

Gatekeeping (Bretz et al. 2009)



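The Bonferroni Adjustment: test each $H_i$ at level $\alpha/m$.

Boole’s Inequality:

$$P(A \cup B) \le P(A) + P(B), \quad \text{or generally} \quad P\Big(\bigcup_i A_i\Big) \le \sum_i P(A_i).$$

Let $R_i$ denote the event that hypothesis $H_i$ is rejected, and let $I$ index the true null hypotheses.

Then, if $P_{H_i}(R_i) = \alpha_i$,

$$\mathrm{FWER} = P\Big(\bigcup_{i \in I} R_i\Big) \le \sum_{i \in I} P_{H_i}(R_i) = \sum_{i \in I} \alpha_i \le \sum_{i=1}^{m} \alpha_i.$$

If $\alpha_i = \alpha/m$ for all $i$, then $\mathrm{FWER} \le \alpha$.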
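Here is a short sketch of the Bonferroni adjustment. It returns both the rejection decisions (p_i ≤ α/m) and the equivalent adjusted p-values min(1, m p_i). The function name and example p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Bonferroni: reject H_i when p_i <= alpha / m.

    Also returns the adjusted p-values min(1, m * p_i), which are compared
    with alpha directly."""
    m = len(pvalues)
    adjusted = [min(1.0, m * p) for p in pvalues]
    rejected = [p <= alpha / m for p in pvalues]
    return adjusted, rejected


adjusted, rejected = bonferroni([0.001, 0.012, 0.03, 0.2])
print(rejected)  # [True, True, False, False]
```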



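Weighted Bonferroni Adjustment: test $H_i$ at level $\alpha_i$, such that $\sum_i \alpha_i \le \alpha$.

Since

$$\mathrm{FWER} = P\Big(\bigcup_{i \in I} R_i\Big) \le \sum_{i \in I} P_{H_i}(R_i) = \sum_{i \in I} \alpha_i,$$

where the union and the sum run over the indices $I$ of the true null hypotheses,

so long as $\sum_i \alpha_i \le \alpha$, then $\mathrm{FWER} \le \alpha$.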
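This sketch shows the weighted version. Each H_i is tested at its own level α_i, and the only requirement is ∑α_i ≤ α. The allocation in the example is hypothetical.

```python
def weighted_bonferroni(pvalues, levels, alpha=0.05):
    """Reject H_i when p_i <= alpha_i, where the alpha_i sum to at most alpha."""
    if sum(levels) > alpha + 1e-12:  # small tolerance for floating-point sums
        raise ValueError("the levels alpha_i must sum to at most alpha")
    return [p <= a for p, a in zip(pvalues, levels)]


# Hypothetical allocation favouring H_1: alpha_1 = 0.03, alpha_2 = alpha_3 = 0.01
print(weighted_bonferroni([0.02, 0.02, 0.004], [0.03, 0.01, 0.01]))
# [True, False, True]
```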



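Holm’s Sequential Bonferroni: test the ordered hypotheses $H_{(j)}$ at level $\alpha/(m-j+1)$, stopping at the first non-rejection.

Let $P_i$ denote the p-value for $H_i$.

Let $I \subseteq \{1, \ldots, m\}$ index the true $H_i$, $|I| = k \le m$.

Then, since a valid p-value satisfies $P(P_i \le u) \le u$ under $H_i$,

$$P\Big(P_i > \frac{\alpha}{k} \text{ for all } i \in I\Big) = 1 - P\Big(P_i \le \frac{\alpha}{k} \text{ for some } i \in I\Big) \ge 1 - \sum_{i \in I} P\Big(P_i \le \frac{\alpha}{k}\Big) \ge 1 - k\,\frac{\alpha}{k} = 1 - \alpha.$$

Consider the first step $j$ at which a true null hypothesis is tested. The $j - 1$ hypotheses rejected before it are all false, so $j - 1 \le m - k$, i.e. $m - j + 1 \ge k$. Its critical value $\alpha/(m-j+1)$ is therefore at most $\alpha/k$. Any false rejection thus requires $P_i \le \alpha/k$ for some $i \in I$, an event of probability at most $\alpha$, so $\mathrm{FWER} \le \alpha$.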
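Here is Holm's step-down procedure as a short sketch. It returns the decisions and the usual Holm-adjusted p-values, which are running maxima of (m − j + 1)p_(j) capped at 1. The example p-values are hypothetical. They are chosen so that Holm rejects three hypotheses, whereas plain Bonferroni at α/4 = 0.0125 would reject only one.

```python
def holm(pvalues, alpha=0.05):
    """Holm's step-down test: compare the j-th smallest p-value with
    alpha / (m - j + 1) and stop at the first non-rejection."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = [False] * m
    for j, i in enumerate(order, start=1):
        if pvalues[i] > alpha / (m - j + 1):
            break  # this and all larger p-values are retained
        rejected[i] = True
    return rejected


def holm_adjusted(pvalues):
    """Holm-adjusted p-values: running maximum of (m - j + 1) * p_(j), capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 0.0
    for j, i in enumerate(order, start=1):
        running = max(running, (m - j + 1) * pvalues[i])
        adjusted[i] = min(1.0, running)
    return adjusted


p = [0.001, 0.014, 0.02, 0.2]
print(holm(p))  # [True, True, True, False]
print(holm_adjusted(p))
```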



Closed Testing: reject $w_i$ iff all $w_j \subseteq w_i$ (including $w_i$ itself) are rejected by a local level-α test

Let W be a set of hypotheses. W is closed under intersection if:

for any two hypotheses $H_i, H_j \in W$, $w = H_i \cap H_j$ is also in W.

An example

Consider the elementary hypotheses $H_1$, $H_2$, and $H_3$.

Let $w_1 = H_1$, $w_2 = H_2$, $w_3 = H_3$, and

$w_4 = H_1 \cap H_2$, $w_5 = H_1 \cap H_3$, $w_6 = H_2 \cap H_3$, and $w_7 = H_1 \cap H_2 \cap H_3$.

$W = \{w_1, \ldots, w_7\}$ is a set of hypotheses closed under intersections.



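[Figure: The closure lattice. Top row: $w_1 = H_1$, $w_2 = H_2$, $w_3 = H_3$; middle row: $w_4 = H_1 \cap H_2$, $w_5 = H_1 \cap H_3$, $w_6 = H_2 \cap H_3$; bottom: $w_7 = H_1 \cap H_2 \cap H_3$.]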
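Below is a brute-force sketch of closed testing. It enumerates every intersection hypothesis H_J of m elementary hypotheses, 2^m − 1 in total. For m = 3 these are the seven nodes w_1, ..., w_7 above. Using a Bonferroni local test for each H_J (reject if min_{j∈J} p_j ≤ α/|J|) is an assumption made for illustration. With that choice, the closed procedure gives the same decisions as Holm's method.

```python
from itertools import combinations


def closed_test_bonferroni(pvalues, alpha=0.05):
    """Closed testing with a Bonferroni local test for every intersection H_J.

    H_J (J a non-empty subset of the indices) is rejected locally if
    min_{j in J} p_j <= alpha / |J|.  The elementary H_i is rejected iff every
    H_J with i in J (every w_j contained in w_i) is rejected locally.
    """
    m = len(pvalues)
    local = {}
    for size in range(1, m + 1):
        for J in combinations(range(m), size):
            local[J] = min(pvalues[j] for j in J) <= alpha / size
    return [all(rej for J, rej in local.items() if i in J) for i in range(m)]


# m = 3 gives the seven nodes w_1, ..., w_7 of the lattice
print(closed_test_bonferroni([0.01, 0.04, 0.03]))  # [True, False, False]
```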



Generalized Weighted Bonferroni Testing

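[Figure: Graph with nodes $H_1$, $H_2$, $H_3$, each with initial level α/3, and a transition weight of 1/2 on every directed edge between distinct nodes.]

$$\sum_{i=1}^{m} \alpha_i \le \alpha$$

$$0 \le g_{ij} \le 1, \quad g_{ii} = 0, \quad \text{and} \quad \sum_{k=1}^{m} g_{ik} \le 1 \quad \text{for all } i, j = 1, \ldots, m.$$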



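[Figure: (a) The updated graph after rejecting $H_1$: $H_2$ and $H_3$ each have level α/2, with a transition weight of 1 in each direction. (b) The graph after also rejecting $H_2$: $H_3$ alone, with level α.]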

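The sequentially rejective graphical procedure can be sketched with the update rules of Bretz et al. (2009). When H_j is rejected, each remaining H_l gains α_j g_jl, and the transition weights are rerouted as g_lk ← (g_lk + g_lj g_jk)/(1 − g_lj g_jl). Weights are passed as fractions of α. This is an illustrative implementation, not code from the dissertation. Take the three-hypothesis graph above (α/3 each, all g_ij = 1/2). Rejecting H_1 gives H_2 and H_3 the level α/2 and a transition weight of 1, matching panel (a), and the procedure then coincides with Holm's method.

```python
def graphical_procedure(pvalues, weights, transitions, alpha=0.05):
    """Sequentially rejective graphical weighted Bonferroni test.

    weights[i]        initial share of alpha for H_i (sum(weights) <= 1)
    transitions[i][j] g_ij, the fraction of H_i's level passed to H_j
    Returns the set of indices of rejected hypotheses.
    """
    m = len(pvalues)
    a = [alpha * w for w in weights]  # current local levels alpha_i
    g = [list(row) for row in transitions]
    active = set(range(m))
    rejected = set()
    while True:
        candidates = [i for i in active if pvalues[i] <= a[i]]
        if not candidates:
            return rejected
        j = min(candidates, key=lambda i: pvalues[i])
        active.remove(j)
        rejected.add(j)
        # H_j passes its level along its outgoing edges
        for l in active:
            a[l] += a[j] * g[j][l]
        # reroute edges that used to go through H_j
        new_g = [[0.0] * m for _ in range(m)]
        for l in active:
            for k in active:
                if l != k:
                    denom = 1.0 - g[l][j] * g[j][l]
                    if denom > 0.0:
                        new_g[l][k] = (g[l][k] + g[l][j] * g[j][k]) / denom
        g = new_g
        a[j] = 0.0


# The three-hypothesis graph from the slide: alpha/3 each, g_ij = 1/2
w = [1 / 3, 1 / 3, 1 / 3]
G = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
print(sorted(graphical_procedure([0.01, 0.04, 0.02], w, G)))  # [0, 1, 2]
```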



2. FWER Control within GO Graphs

[Figure: An example Gene Ontology (GO) graph with nodes numbered up to 113. Legend (nodes 1 to 22): 1 GO:0008150, 2 GO:0000003, 3 GO:0008152, 4 GO:0009987, 5 GO:0016265, 6 GO:0016032, 7 GO:0002376, 8 GO:0040011, 9 GO:0050896, 10 GO:0071840, 11 GO:0051704, 12 GO:0019953, 13 GO:0022414, 14 GO:0044281, 15 GO:0044238, 16 GO:0006807, 17 GO:0044237, 18 GO:0009058, 19 GO:0043170, 20 GO:0009056, 21 GO:0006928, 22 GO:0051301.]



(Toy) Example GO Graph

[Figure: toy GO graph with root A, nodes B and F below it, and nodes C, D, and E at the bottom level.]



Focus Level Method (Goeman and Mansmann 2008)

Applies a top-down and a bottom-up approach.

[Figure, panels (a) to (c): the toy GO graph and the node sets used by the Focus Level method, including CDE, CD, CE, DE, CDF, CF, DF, and BF.]



Focus Level Method and Short Focus Level

[Figure: Left, the Focus Level panels (a) to (c). Right, three Short Focus Level graphs on the toy GO graph, with initial levels of α/3 on each of three nodes, of α/2 on each of B and F, and of α on A, respectively, and transition weights of 1/2, 1/3, and 1 along the graph edges.]



Table: Summary of power calculations for Simulation 1.

              Mean node power                               Computation
n     Method  A      B      F      C      D      E          Time (sec)
5     FL      0.447  0.428  0.132  0.142  0.135  0.130      0.426134
5     SFL     0.447  0.366  0.120  0.092  0.083  0.122      0.001778
20    FL      0.574  0.567  0.180  0.186  0.192  0.179      0.102097
20    SFL     0.574  0.552  0.178  0.184  0.188  0.179      0.001789
100   FL      0.642  0.635  0.202  0.220  0.207  0.201      0.355848
100   SFL     0.642  0.623  0.201  0.217  0.204  0.201      0.001793

FL: Focus Level; SFL: Short Focus Level



Simulation 2

[Figure: Simulation 2 graph with nodes labeled 01 to 14.]

(The closure of this graph contains 574 nodes.)



Table: Results of the power analysis under Simulation 2.

       GO:01  GO:02  GO:03  GO:04  GO:06  GO:07  GO:10  GO:11  GO:13
FL     0.995  0.968  0.890  0.462  0.512  0.872  0.380  0.399  0.344
SFL    0.995  0.988  0.952  0.543  0.837  0.949  0.489  0.476  0.445

FL: Focus Level; SFL: Short Focus Level

Computation Time

FL     3:42:938
SFL    0:00:015



[Figure: the example GO graph from the start of this section (nodes numbered up to 113, with the same legend).]

(Computation took 3 minutes and 23 seconds. Original graph contained 5,687 nodes.)



3. FWER Control in LD QTL Mapping

[Figure: five markers, labeled 1 to 5.]

Want to know if:

a QTL exists,

the QTL is linked to any markers.



[Figure: markers 1 to 5, each with its own pair of hypotheses $H_0^L$ and $H_0^D$.]

$H_0^L$: No QTL exists.

$H_0^D$: The QTL is unlinked to the marker.




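$$L(p, q, D, \mu_1, \ldots, \mu_G, \sigma \mid Y, M) = \prod_{i=1}^{n} \sum_{g=1}^{G} \omega_{g \mid M_i}(p, q, D)\, f(Y_i \mid \mu_g, \sigma)$$

(D is not identifiable under $H_0^L$: when $\mu_1 = \cdots = \mu_G$, each inner sum reduces to $f(Y_i \mid \mu, \sigma)$ because the weights $\omega_{g \mid M_i}$ sum to 1, so the likelihood does not depend on D.)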
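Here is a small numerical check of the non-identifiability remark. The conditional weights ω_{g|M_i}(p, q, D) are not spelled out on this slide, so the sketch takes them as input. The two weight sets below are hypothetical stand-ins for two different values of D. With equal means (H_0^L) the log-likelihood is the same for both sets; with unequal means it differs.

```python
import math


def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_log_likelihood(y, weights, mus, sigma):
    """log L = sum_i log sum_g w[i][g] * f(y_i | mu_g, sigma).

    weights[i] stands in for (omega_{1|M_i}, ..., omega_{G|M_i}); in the model
    these depend on p, q and D, here they are supplied directly."""
    return sum(
        math.log(sum(w * normal_pdf(yi, mu, sigma) for w, mu in zip(wi, mus)))
        for yi, wi in zip(y, weights)
    )


y = [0.3, -1.2, 2.0]
w_a = [[0.25, 0.50, 0.25]] * len(y)  # hypothetical weights for one value of D
w_b = [[0.60, 0.30, 0.10]] * len(y)  # hypothetical weights for another value of D

equal = [1.0, 1.0, 1.0]     # H_0^L: mu_1 = mu_2 = mu_3
unequal = [-1.0, 0.0, 1.0]  # an alternative
print(mixture_log_likelihood(y, w_a, equal, 1.0), mixture_log_likelihood(y, w_b, equal, 1.0))
print(mixture_log_likelihood(y, w_a, unequal, 1.0), mixture_log_likelihood(y, w_b, unequal, 1.0))
```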



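[Figure: A) $H_0^L$ with level α and $H_0^D$ with level 0, joined by an edge of weight 1. B) $H_0^D$ alone, with level α.]

Figure: A) Demonstration of the GBA testing scheme for a single marker. B) The updated graph after finding $H_0^L$ significant.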



[Figure: for each marker i = 1, 2, 3, hypothesis $H_0^{L_i}$ has level α/3 and $H_0^{D_i}$ has level 0; the edges carry weights of 1 and 1/2.]

Figure: Demonstration of the hierarchy of the GBA testing scheme for three markers.



[Figure: $H_0^{L_2}$ (level α/3), $H_0^{D_1}$ (α/3), $H_0^{D_2}$ (0), and $H_0^{D_3}$ (α/3), with edge weights of 1 and 1/2.]

Figure: Demonstration of the GBA testing scheme for three markers assuming that hypotheses $H_0^{L_1}$ and $H_0^{L_3}$ from the initial graph in Figure 2 are rejected.



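[Figure: A) $H_0^{L_2}$ (level α/2), $H_0^{D_2}$ (0), and $H_0^{D_3}$ (α/2), with edge weights of 1. B) $H_0^{L_2}$ (α) and $H_0^{D_2}$ (0), joined by an edge of weight 1.]

Figure: A) The updated graph from Figure 3 assuming the hypothesis $H_0^{D_1}$ of Figure 3 is rejected at the α/3-level. B) Graph resulting from the rejection of the hypothesis $H_0^{D_3}$ at the α/2-level.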



Conditions under which the GBA simplifies to an IUT (+Holm).

Let $H_0^U$ denote the union hypothesis $H_0^L \cup H_0^D$.

Let $p^U$ denote the p-value for the IUT of $H_0^U$.

Let $k = \arg\min_i p^U_i$ denote the marker with the smallest IUT p-value, and suppose $p^U_k < \alpha/m$.

Then,

$$m\,p^L_k = P^L_k \le P^D_k = m \max\{p^L_k, p^D_k\} = m\,p^M_k = P^M_k \le \alpha,$$

where m is the number of markers, $p^L_k$ and $p^D_k$ are the raw p-values for marker k, $P^L_k$ and $P^D_k$ are the GBA-adjusted p-values for marker k, $p^M_k$ denotes the raw IUT p-value for marker k, and $P^M_k$ is its Bonferroni-adjusted counterpart.
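This sketch shows the simplified procedure the slide refers to. Each marker gets an IUT p-value p^M_k = max{p^L_k, p^D_k}, and Holm's adjustment is then applied across the m markers. The function name and example p-values are hypothetical. This is only the simplified form, not the full GBA. For the marker with the smallest IUT p-value, the adjusted value is m p^M_k, the last quantity in the chain above.

```python
def iut_holm(p_link, p_ld, alpha=0.05):
    """IUT per marker, then Holm across markers.

    p_link[k], p_ld[k]: raw p-values p^L_k and p^D_k for marker k.
    The union hypothesis H^U_k is rejected only if both component tests
    reject, so its IUT p-value is p^M_k = max(p^L_k, p^D_k).
    """
    m = len(p_link)
    p_iut = [max(pl, pd) for pl, pd in zip(p_link, p_ld)]
    order = sorted(range(m), key=lambda k: p_iut[k])
    adjusted = [0.0] * m
    running = 0.0
    for j, k in enumerate(order, start=1):
        running = max(running, (m - j + 1) * p_iut[k])  # Holm step-down
        adjusted[k] = min(1.0, running)
    return p_iut, adjusted, [a <= alpha for a in adjusted]


p_iut, adjusted, rejected = iut_holm([0.001, 0.2, 0.004], [0.003, 0.001, 0.02])
print(rejected)  # [True, False, True]
```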



[Figure, panels A (H² = 0.1) and B (H² = 0.4): power (0 to 1) against the number of SNPs (1 to 1000) for GBA and Bonferroni at n = 100, 300, and 500.]

Figure: Power comparison between the graphical Bonferroni adjustment (GBA) and the standard Bonferroni adjustment under different sample sizes, numbers of SNPs, and heritabilities (A: H² = 0.1, B: H² = 0.4).



[Figure: two panels titled "PC4 SNP 1", plotting y against x for genotypes AA, Aa, and aa.]

Figure: The control of leaf shape for different genotypes (AA, Aa, aa) of the QTL identified by marker 1 on PC 4.



[Figure: −log adjusted p-value by chromosome (1 to 19 and X) and position (Mb).]

Figure: The negative log of the GBA-adjusted p-values for $H_0^D$ for each SNP in the mouse HDL cholesterol QTL mapping project.

Link

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3284324/figure/fig3/


4. QTL Mapping: Hypotheses and Approaches





Test for association between QTL and phenotype Y.

$H_0^L: \mu_1 = \mu_2 = \mu_3 \equiv \mu$ vs.

$H_1^L$: at least one of the equalities above does not hold.



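Test for linkage between SNP and QTL.

$$H_0^D: D = 0 \quad \text{vs} \quad H_1^D: D \neq 0 \qquad (4)$$

$$\chi^2_D = \frac{n D^2}{p(1-p)\,q(1-q)} \sim \chi^2_1 \quad \text{(asymptotically, under } H_0^D\text{)} \qquad (5)$$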
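This is a sketch of statistic (5). It assumes that D is the usual linkage disequilibrium coefficient D = p_AB − pq, where p and q are the allele frequencies at the SNP and the QTL and p_AB is the frequency of the corresponding haplotype. The slide itself does not define these quantities. The χ²_1 upper tail is computed as erfc(√(x/2)), so only the standard library is needed.

```python
import math


def ld_chi_square(p_ab, p, q, n):
    """Return D = p_AB - p*q, chi2_D = n D^2 / (p(1-p) q(1-q)) and its
    chi-square(1) upper-tail p-value (nominal, as in equation (5))."""
    d = p_ab - p * q
    chi2 = n * d * d / (p * (1 - p) * q * (1 - q))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # P(chi2_1 > chi2) = P(|Z| > sqrt(chi2))
    return d, chi2, p_value


# D is about 0.1 and chi2_D about 33.3 for these hypothetical frequencies
print(ld_chi_square(p_ab=0.3, p=0.5, q=0.4, n=200))
```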



[Figure: empirical CDFs $F_n(x)$ of the likelihood ratio test statistic (LRTS), x from 0 to 40, for n = 100, 300, and 500, with reference curves labeled ν = 1 and ν = 10.]

Figure: Likelihood Ratio Test of $H_0^1$ against $H_A$ for synthetic data simulated under the null hypothesis of no QTL, $H_0^1$.



[Figure: empirical CDFs, x from 0 to 40, for the scenarios no QTL, known QTL, and unlinked QTL, with reference curves labeled ν = 1 and ν = 10.]

Figure: The empirical cumulative distribution functions corresponding to the test of D = 0 for three scenarios.



The (Bivariate) Null Kernel Method

1 Simulate s data sets, each of size n, based on the model assumptions of the joint hypothesis $H_0$.
2 Calculate $T_i$ and $U_i$ for $i = 1, \ldots, s$.
3 Estimate the joint density f of T and U using a kernel density estimation technique on the $T_i$ and $U_i$.
4 Compute $F(c) = \int_{A(c)} f$, where $A(c) = \{(t, u) \mid f(t, u) \ge c\}$; F(c) is the null probability mass of the region where the density is at least c.
5 The joint p-value for the calculated statistics t and u can then be obtained by the formula $p = 1 - F(f(t, u))$.
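Below is a self-contained sketch of the five Null Kernel steps for a generic bivariate null. It makes several assumptions that are not on the slide:

- The kernel is a product Gaussian with Scott's-rule bandwidths.
- For brevity, the simulator returns the pair (T_i, U_i) directly instead of generating a data set of size n.
- The integral in step 4 is approximated roughly by Monte Carlo: it counts the simulated null pairs whose estimated density is below f(t, u).

The example null (standard normal T and U with covariance 0.3, the setting used in the visualization of the method) and all function names are hypothetical.

```python
import math
import random


def gaussian_kde_2d(points):
    """Return a product-Gaussian kernel density estimate f(t, u) built from
    the (t, u) pairs in `points`, with Scott's-rule bandwidths s**(-1/6) * sd."""
    s = len(points)

    def sd(values):
        mean = sum(values) / s
        return math.sqrt(sum((v - mean) ** 2 for v in values) / (s - 1))

    factor = s ** (-1.0 / 6.0)  # Scott's rule in d = 2 dimensions
    ht = sd([t for t, _ in points]) * factor
    hu = sd([u for _, u in points]) * factor
    norm = 1.0 / (s * 2.0 * math.pi * ht * hu)

    def density(t, u):
        return norm * sum(
            math.exp(-0.5 * (((t - ti) / ht) ** 2 + ((u - ui) / hu) ** 2))
            for ti, ui in points
        )

    return density


def null_kernel_pvalue(t, u, simulate_null, s=500, seed=0):
    """Joint p-value for observed statistics (t, u) via the Null Kernel steps."""
    rng = random.Random(seed)
    sims = [simulate_null(rng) for _ in range(s)]  # steps 1-2: (T_i, U_i) under H0
    f = gaussian_kde_2d(sims)                      # step 3: kernel density estimate
    f_obs = f(t, u)
    # steps 4-5: p = 1 - F(f(t, u)) is the null mass where the density is
    # below f(t, u); approximate it by the share of simulated pairs there
    below = sum(1 for ti, ui in sims if f(ti, ui) < f_obs)
    return below / s


def correlated_normal_pair(rng, rho=0.3):
    """Hypothetical null: T, U standard normal with covariance rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2


print(null_kernel_pvalue(0.0, 0.0, correlated_normal_pair))   # near the centre: large p
print(null_kernel_pvalue(3.0, -3.0, correlated_normal_pair))  # far in the tails: small p
```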



The Null Kernel Method vs. Hotelling’s T² (bivariate test of location)

$$T^2 = n(\bar{X} - \mu_0)' S^{-1} (\bar{X} - \mu_0) \sim \frac{2(n-1)}{n-2} F_{2,\,n-2} \qquad (6)$$

[Figure: T-statistic (horizontal) against U-statistic (vertical), each from −4 to 4, with contours labeled 0.0001, 0.001, 0.01, and 0.05.]

Figure: Visualization of the Null Kernel method as applied to a sample of 1,000 T and U statistics simulated under the bivariate normal null distribution with zero mean, unit variances, and covariance 0.3.
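Equation (6) can be evaluated with the standard library alone. The F distribution with 2 numerator degrees of freedom has the closed-form tail P(F_{2,ν} > x) = (1 + 2x/ν)^(−ν/2). The sketch below computes T² and its p-value for a bivariate sample; the function name and toy data are illustrative.

```python
def hotelling_t2_bivariate(xs, mu0=(0.0, 0.0)):
    """One-sample Hotelling T^2 for bivariate data, equation (6):
    T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0) ~ 2(n-1)/(n-2) F_{2, n-2}.
    The p-value uses P(F_{2,v} > x) = (1 + 2x/v) ** (-v/2)."""
    n = len(xs)
    m1 = sum(x for x, _ in xs) / n
    m2 = sum(y for _, y in xs) / n
    s11 = sum((x - m1) ** 2 for x, _ in xs) / (n - 1)
    s22 = sum((y - m2) ** 2 for _, y in xs) / (n - 1)
    s12 = sum((x - m1) * (y - m2) for x, y in xs) / (n - 1)
    det = s11 * s22 - s12 ** 2
    d1, d2 = m1 - mu0[0], m2 - mu0[1]
    # quadratic form with the inverse of [[s11, s12], [s12, s22]]
    t2 = n * (s22 * d1 ** 2 - 2.0 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    nu = n - 2
    f_stat = t2 * nu / (2.0 * (n - 1))
    p_value = (1.0 + 2.0 * f_stat / nu) ** (-nu / 2.0)
    return t2, p_value


# T^2 is 18.75 and p = 1/7.25 (about 0.138), up to rounding
print(hotelling_t2_bivariate([(1, 2), (2, 1), (3, 4), (4, 3)]))
```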




[Figure: scatter plot comparing −log10(p) from the Null Kernel method and from T², divided into regions I to IV.]

Figure: Comparison of P-values (−log10(p)) obtained from either the Null Kernel method or Hotelling’s T² test.




[Figure: scatter plot comparing −log10(p) from the Null Kernel method and from T² for null data, divided into regions I to IV.]

Figure: The −log10 of the P-values from the Null Kernel and Hotelling’s T² methods for data simulated consistent with the null hypothesis.



QTL mapping Simulation Study

[Figure: contours (0.1, 0.05, 0.001, 0.0001, 1e−5) of the estimated null density, with axes $\chi^2_D$ and $\chi^2_L$.]

Figure: Visualization of the Null Kernel estimated (null) density for the bivariate data corresponding to the test of $H_0^D$, $\chi^2_D$, and of $H_0^L$, $\chi^2_L$.




[Figure: −log10(p) (0 to 10) along chromosomes 1 to 4, comparing the Null Kernel method with the other approaches.]

Figure: The resulting adjusted P-values from each of the permutation, simulation, and theoretical approaches against the results of the Null Kernel method.



Mice HDL QTL mapping study

[Figure: −log adjusted p-value by chromosome (1 to 19 and X) and position (Mb).]

Figure: The negative log of the Holm-adjusted P-values for the Null Kernel approach.



[Figure: joint plot of the observed $\chi^2_D$ and $\chi^2_L$ statistics.]

Figure: The joint plot of the observed test statistics for the mouse HDL QTL mapping data.



5. Discussion

“Professional statisticians... bear an obligation to offer alternatives (or entirely new approaches) that meet real needs and are practical as well.” (J. W. Tukey, as quoted in Benjamini and Braun 2001)



1 Introduction
2 FWER Control within Gene Ontology Graphs
  Extended GBA methods to Restricted Hypotheses (Theorem 1).
  Introduced the Short Focus Level method (code in: mvGST).
  Quantified the computational advantage.
3 A Power Improving Multiplicity Correction for Large-Scale SNP Selection in LD Based QTL Mapping
  Introduced a GBA approach for LD based QTL mapping.
  Protects model identifiability and strong FWER control.
  Quantified the power increase numerically and practically.
4 QTL Mapping: Hypotheses and Approaches
  Detailed problems of χ² assumptions in LD based QTL mapping.
  Introduced the Null Kernel method.
  Showed power and computational advantages of the NK method.
5 Discussion


Acknowledgements

This work was supported by

Utah Agricultural Experiment Station (UAES) project number UTA01062, associated with the W2112 multi-state project “Reproductive Performance in Domestic Ruminants”

Utah State University VPR Research Catalyst Grant.
