Download - Beyond the traditional simulation design
Beyond the traditional simulation designfor evaluating type 1 error: from ‘theoretical’ to ‘empirical’ null
by
Ting Zhang
A thesis submitted in conformity with the requirements for the degree of Master of Science
Department of Public Health Sciences University of Toronto
c© Copyright 2018 by Ting Zhang
Abstract
Beyond the traditional simulation design
for evaluating type 1 error: from ‘theoretical’ to ‘empirical’ null
Ting Zhang
Master of Science
Department of Public Health Sciences
University of Toronto
2018
When evaluating a new statistical test, its important to check the type 1 error (T1E) control, often
achieved by the ‘theoretical’ design S0. In whole-genome association analyses, people scan through
large numbers of genes (Gs) for the ones associated with an outcome(Y). Y comes from an unknown
alternative and is not associated with the majority of Gs, This reality can be represented by two
‘empirical designs, where S1.1 simulates Y from G then evaluates its association with independent
Gnew; while S1.2 evaluates the association between permutated Yperm and G. Using scale tests,
location tests with single and multiple Gs as examples, we show that not all designs are equal. For
certain statistics, T1E inflation can only be revealed under 1empirical designs and doesnt diminish
as sample size increases. This important observation calls for new practices for methods evaluation
and interpretation of T1E control.
ii
Acknowledgements
The author have no conflict of interest to declare.
The authors would like to thank Dr. Lei Sun for suggesting the topic of this thesis and her kind
supervision.
The author would like to thank Dr. David Soave and Dr. Jerry Lawless for helpful discussions;
and the PhD candidates in Biostatistics, University of Toronto, Bo Chen, Kuan Liu and Kaviul
Anam for assistance and support in this research.
The author would like to thank my dearest friend Xi Luo and my family. Without their love and
caring, I could never have gone this far.
This research was funded by the Natural Sciences and Engineering Research Council of Canada
(NSERC, 250053-2013), and the Canadian Institutes of Health Research (CIHR, 201309MOP-
310732-G-CEAA-117978) to LS.
iii
Contents
Preliminary
Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
1 Introduction 1
2 Methods and Materials 6
2.1 Three GxE interaction scenarios and corresponding statistical tests . . . . . . . . . . 6
2.1.1 Scenario 1: single G, and E missing . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.2 Scenario 2: single SNP, E is known . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.3 Scenario 3: multiple Gs, and E known . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Three T1E simulation designs: the ‘theoretical’ null S0, and the ‘empirical’ null S1.1
and S1.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1 Data-generating models for the three T1E simulation designs and under the
three interaction scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Simulation Studies 14
3.1 Summary of Simulation models and parameter values . . . . . . . . . . . . . . . . . 14
3.2 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4 Discussion 24
Bibliography 26
iv
5 Appendix 31
5.1 Conditional and Marginal Distribution of Y under Alternative Model . . . . . . . . . 31
5.2 Asymptotic Distribution of Likelihood Ratio Statistics for Variance under Non-normality 32
5.3 Score Test on GxE Interaction Scan . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.3.1 Dependency of Test Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.3.2 Under Non-Normality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.4 Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
v
List of Tables
3.1 Summary of the two data-generating models for indirect and direct GxE interaction
testing, and evaluating the ‘theoretical’ null simulation designs S0 vs. the two the
‘empirical’ null simulation designs S1.1 and S1.2, as described in Sections 2.3.1 and
2.3.2. Assume E was available for direct GxE testing, the Aschard et al. (2013) model
coincides with Model I of Rao and Province (2016), except E was Gnon−repeating. T1E
rate is first estimated from nrep.in simulation replicates in an inner loop in which
E is fixed (similar to one whole-genome GxE interaction scan), then averaged over
nrep.out simulation replicates in an outer loop in which E varies. . . . . . . . . . . 15
3.2 Summary of the data-generating models for multivariate GxE interaction testing,
and evaluating the ‘theoretical’ null simulation designs S0 vs. the two the ‘empirical’
null simulation designs S1.1 and S1.2, as described in Section 2.3.3. T1E rate is first
estimated from nrep.in simulation replicates in an inner loop in which E is fixed
(similar to one whole-genome gene-based GxE interaction scan), then averaged over
nrep.out simulation replicates in an outer loop in which E varies. . . . . . . . . . . 16
3.3 Simulation results of interaction scenario 1: single G, and E missing. Empirical T1E
rates of LRTm and Scorem location tests for mean difference in Y across the three
G groups, and of LRTv and Levene scale tests for variance difference in Y , based on
the ‘theoretical’ null design of S0 and the alternative ‘empirical’ null designs of S1.1
and S1.2. The alternative Y1 data were generated using the Aschard’s genetic model
as described in Table 3.1. Empirical T1E rates outside α ± 3√α× (1− α)/nrep.in
are bolded in red. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
vi
3.4 Simulation results of interaction scenario 1: single G, and E missing; effect of the
nominal α level. Empirical T1E rates of LRTm and Scorem location tests for mean
difference in Y across the three G groups, and of LRTv and Levene scale tests for
variance difference in Y , based on the ‘theoretical’ null design of S0 and the al-
ternative ‘empirical’ null designs of S1.1 and S1.2. The alternative Y1 data were
generated using the Aschard’s genetic model as described in Table 3.1, focusing on
the extreme case of large interaction effect, βGE = 1. Empirical T1E rates outside
α± 3√α× (1− α)/nrep.in are boded in red. . . . . . . . . . . . . . . . . . . . . . 18
3.5 Simulation results of interaction scenario 1: single G, and E missing; effect of sample
size n. Empirical T1E rates of LRTm and Scorem location tests for mean difference
in Y across the three G groups, and of LRTv and Levene scale tests for variance
difference in Y , based on the ‘theoretical’ null design of S0 and the alternative ‘em-
pirical’ null designs of S1.1 and S1.2. The alternative Y1 data were generated using
the Cao’s genetic model as described in Table 3.1, and using two difference sample
sizes of n = 103 and 104. Empirical T1E rates outside α ± 3√α× (1− α)/nrep.in
are bolded in red. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.6 Simulation results of interaction scenario 2: single G, and E known. Empirical T1E
rates of the LRTGE and ScoreGE tests, based on the ‘theoretical’ null design of S0 and
the alternative ‘empirical’ null designs of S1.1 and S1.2. The alternative Y1 data were
generated using the Aschard’s genetic model as described in Table 3.1 when βGE = 1,
but E was assumed to be known in this case and direction interaction testing was
possible. Empirical T1E rates outside α± 3√α× (1− α)/nrep.in are bolded in red. 22
3.7 Simulation results of interaction scenario 3: multiple Gs, and E known. Empirical
T1E rates of the FGE , SKATGE and BurdenGE tests for jointly testing for multiple
interaction effects, based on the ‘theoretical’ null design of S0 and the alternative
‘empirical’ null designs of S1.1 and S1.2. Models and parameters values are given in
Table 3.2. Empirical T1E rates outside α± 3√α× (1− α)/nrep.in are bolded in red. 23
vii
List of Figures
S1 Comparison of the asymptotic distribution (black solid) and finite-sample distribution
(red dashed) of LRTv under the ‘empirical’ null S1.1 or S1.2, with the asymptotic
distribution (χ22, blue dot-dashed) of LRTv under the ‘theoretical’ null S0. Vertical
lines correspond the 99.9% quantile cutoffs for α = 0.001. . . . . . . . . . . . . . . . 20
S1 Marginal and conditional histogram (Top), and normal Q-Q plot (Bottom) of the
empirical phenotype data Y1 when βGE = 1 based on the Aschard’s model as described
in Table 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
S2 Marginal and conditional histogram (Top), and normal Q-Q plot (Bottom) of the em-
pirical phenotype data Y1 when βGE = 0.2 based on the Aschard’s model as described
in Table 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
S3 Histogram of p-values of the LRTv test under S0 (Left column), S1.1 (Middle col-
umn) and S1.2 (Right column) null simulation designs. Simulation of the empirical
phenotype data Y1 was based on the Aschard’s genetic model as described in Table 1
with βGE = 0 (Top row), βGE = 0.2 (Top row), and βGE = 0.6 (Top row). Red line
represents the Uniform (0, 1) distribution. . . . . . . . . . . . . . . . . . . . . . . . . 37
S4 Histograms with theoretical normal distribution (Top row), and normal QQ-plots
(Bottom row) of phenotype Y1 simulated based on the Aschard’s genetic model as
described in Table 2 with βGE = 1 for before transformation (Left column), after
square root (Middle column), and after log transformations (Right column). . . . . . 38
viii
S5 Histogram of p-values of the LRTv test under the S0 (Left column), S1.1 (Middle
column) and S1.2 (Right column) null simulation designs, when phenotype Y1 was
simulated based on the Aschard’s genetic model as described in Table 1 with βGE = 1
for before transformation (Top row), after square root transformation (Middle row),
and after log transformation (Bottom row). . . . . . . . . . . . . . . . . . . . . . . . 39
S6 Scatter plot of nrep.out = 100 estimated T1E rates for whole-genome interaction
GxE scans, each estimated from nrep.in = 105 simulated replicates (or SNPs) with
a fixed E, under the ‘theoretical’ null design S0 as described in Table 1, for sample
size n = 104 (Left) and 105 (Right). . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
ix
Chapter 1
Introduction
Type 1 error (T1E) control evaluation using simulations is always the first step in understanding
the performance of any newly developed statistical test. A test statistics is considered as robust
when it is applicable to a wide range of probability distributions and maintains a correct type 1
error. To formulate the problem more precisely, let us consider the current large-scale genome-wide
association studies (GWAS) or next-generation sequencing (NGS) studies of complex and heritable
traits. A typical GWAS compares two large groups of individuals, one healthy control group and one
case group affected by a disease. And it then scan through millions or more genetic markers (Gs)
across the genome, one at a time, for the ones associated with the disease trait (Y ), while accounting
for covariates. And for Y -G association tests, type 1 error is made when a test claims statistical
significant association between independent Y -G pair, or more technically, p-value being greater
than a threshold value α and null hypothesis of independence being rejected. And if the proportion
of type 1 errors in a large numbers of tests on simulated Y -G pairs equals to the nominal level α,
we say the test maintain a correct T1E control. Many Y -G association tests have been developed
and they often require the assumption of (approximately) normally distributed errors. However,
independence between Y -G does not restrict the distribution of Y to be normal, and among all these
tests, some are more robust to non-normality than others. For example, Bartlett test for variance
heterogeneity has been shown to have large inflated T1E rates when the error term e follows a t-
or χ2-distribution (Struchalin et al. 2010), and the likelihood ratio test (LRT) is similarly sensitive
(Cao et al. 2014), while Levene’s test appears to be more robust (Soave, Corvol, et al. 2015; Soave
and L. Sun 2017).
1
Chapter 1. Introduction 2
Standard T1E simulation design, denoted as S0, generates phenotype data Y0 = e under the ‘the-
oretical’ null model of no association, then independently generates genotype data G and estimates
the empirical T1E of a certain test statistics for H0 : βG = 0 from a fitted model Y0 = bGG; for
notation simplicity and without loss of generality, intercept and additional covariates Zs are omitted
from the conceptual expression of the regression model. A method is generally considered sound
if T1E is well controlled under the e ∼ N(0, σ2) assumption, and robustness is then evaluated by
assuming other distribution forms for e, typically t or χ2 distribution. Given statistical accuracy of
T1E control, statistical efficiency in terms of power will be studied by generating phenotype under
an alternative, Y1 = G + e and e ∼ N(0, σ2). The sensitivity problems mentioned before can be
revealed by altering error term distribution in T1E evaluation. But it would not match the power
estimation settings, where a normal error is often assumed.
In practice, GWAS and NGS receive one empirical Y that is truly associated with one or more
Gs among all candidates. And the rest, large numbers of Gs are under the null hypothesis of no
association. When examining these independent Y -G pairs, the marginal distribution of Y can be
viewed as identical to some distribution under alternative hypothesis. Now consider two alternative
simulation designs to evaluate T1E control that generate Y similar to the real case. Suppose we are
evaluating a test statistic T for Y -G association, design S1.1 simulates Y1 = G+e from an alternative
hypothesis, then it independently generates Gnew and applies the association test T to fitted model
Y1 = bGGnew. T1E rates is calculated from proportion of significant T (bG) at a nominal level α.
Design S1.2 permutes the simulated Y1 and evaluates T1E rate from fitted model Y perm1 = bGG. The
two designs both achieves the goal that (1) Y is from an ‘empirical’ distribution, identical to some
alternative hypothesis; and (2) a null relationship between Y -G pair is maintained in evaluating
T1E rate. Note that Y1 is in contrast to Y0 in distribution both marginally and conditionally , and
the residual in fitted models may or may be normally distributed.
An important question can then be asked as to whether the ‘empirical’ S1.1 and S1.2 designs
lead to similar T1E conclusion as the S0 design for one specific test statistics. In particular, even if
the e ∼ N(0, σ2) assumption is true and the test appears to be accurate based on the S0 evaluation,
do we expect it to perform well in real data which are better represented by the S1.1 and S1.2
simulation designs? The answers would depend on the type of test statistics used.
Efron (2004) has brought up the discussion of the ‘theoretical’ vs. ‘empirical’ null more than a
decade ago. Focusing on controlling the false discovery rate (FDR), Efron (2004) outlined several
Chapter 1. Introduction 3
possible sources of non-normality including unobserved covariates and hidden correlation, that may
cause the FDR under ‘theoretical’ and ‘empirical’ null to be different. He also proposed an em-
pirical Bayes approach to solve the problem and address the importance of choosing a correct null
hypothesis. In this thesis, we study the practical implications of T1E evaluation based on the the
commonly used ‘theoretical’ null simulation design S0 in the context of whole-genome association
scans. We show that while a method may appear to be accurate under S0 assuming normality, it
can have incorrect T1E rates under the ‘empirical’ null of S1.1 or S1.2, also ‘assuming normality’.
The fundamental cause of the discrepancy is that, in evaluating Y1 = bGGnew (or Y perm1 = bGG),
the marginal distribution of Y1 may not be normal even if it was generated with a normal error
term, and the Y1 condition on the true causal G and other covariates is normal. Here, discrepancy
firstly leads to a conclusion that the test statistic is not robust to non-normality, and thus can have
much more false discoveries than expected in real data application. However, instead of focusing on
test statistics level, we hope to address the problem in standard T1E evaluation design. In the big-
ger picture, that many statistics show discrepancy between ‘theoretical’ and ‘empirical’ null designs
shows the flaws of S0 design since its not reflecting the real distribution of Y in data generating
process.
As a proof-of-principle, we will focus on testing the gene-environment(GxE) and gene-gene (GxG)
effects first. Traditional association tests focus on location parameters bGE in the interaction model
Y = bGG+ bEE + bGEGxE(or bGG for GxG interaction). Such interaction effects are expected for
complex traits and we studied three scenarios for the interaction tests.
The first scenario assumes that data on E are not available in practice. Thus, direct GxE inter-
action analysis is not possible. In that case, because the un-modelled interaction induces variance
heterogeneity in Y when conditional only on G, scale tests such as Levene’s test, originally devel-
oped for model diagnostics, can be used to indirectly test for the interaction effect. In contrast to
traditional location tests, which focus on mean difference between groups, scale tests refer to those
focusing on variance differences. We will investigate the several scale tests recently proposed for this
purpose (Pare et al. 2010; Aschard et al. 2013; Cao et al. 2014; Soave, Corvol, et al. 2015). It is
worth-noting that the causes of variance heterogeneity are multifaceted beyond potential interactions
(X. Sun et al. 2013; Dudbridge and Fletcher 2014; Wood et al. 2014). We show that, depending on
the robustness of a test, T1E conclusion may differ between the “theoretical’ and ‘empirical’ null.
The second scenario assumes E was available for direct modelling of the GxE interaction effect.
Chapter 1. Introduction 4
Previously, Voorman et al. (2011) and Rao and Province (2016) showed that T1E rate of testing GxG
in a whole-genome interaction scan can be sensitive to reasons beyond the traditional causes of mean
or variance model mis-specifications. The authors focused on the SNPnon−repeatingxSNPrepeating
analysis, where SNPrepeating represents a fixed SNP of primary interest and one is interested in
studying its interaction with all other SNPs. Statistically, this is similar to GxE that we will be
studying here, because E does not vary between SNPs in a genome-wide GxE interaction scan.
Focusing on inflated or deflated genomic inflation factor λGC (Devlin and Roeder 1999), Rao and
Province (2016) demonstrated a larger variation in λGC (similar to a larger variation in T1E rates
between different whole-genome association scans), when testing the interaction effect as compared
to the main effect under the ‘theoretical’ null. They attributed this to dependence between the
interaction test statistics, because SNPrepeating (or the fixed E in our setting) is not changing
between tests. They also noted that increasing sample size mitigates the problem. Here, we use this
opportunity to revisit direct GxE interaction testing in a GWAS setting. We show that, while T1E
rates are indeed more variable between simulation replicates (i.e. between genome-wide interaction
scans) under the conventional ‘theoretical’ null due to dependency between the tests as in Rao and
Province (2016), the average T1E rate is correct regardless of the sample size. However, under the
‘empirical’ null, a different picture emerges as in the scale test setting.
The third scenario extends the above to a multivariate setting, where the fixed E interacts with
multiple different Gs as in gene-based interaction studies. We will examine SKAT-type of variance
component test (Wu et al. 2011), together with burden-type of sum test (Madsen and Browning
2009) and the classical F-test, that jointly evaluate multiple interaction effects. Generally speaking,
departure from normality is of a less concern when an outcome Y is influenced by multiple genetic
and environmental factors (Falconer 1960, Mackay 2009). However, we will show that the distinction
between the ‘theoretical’ and ‘empirical’ null remains relevant for multivariate models.
In Chapter 2, we first describe three genetic settings in Section 2.1 for testing interaction effect,
including when environmental factor E is unknown; when E is known and interacts with one SNP;
and when E is know and interacts with multiple SNPs. Under each scenario, we briefly review all
the tests being investigated. And in Section 2.2 we introduced three simulation designs S0, S1.1 and
S1.2e, and how they will be applied to each scenario. Following detailed simulation settings and
parameters in Section 3.1, We then provide numerical results from extensive simulation studies in
Section 3.2. Importantly, we note that even if the departure from normality is generally minor in
Chapter 1. Introduction 5
practice and appears to pass standard diagnostic tests for non-normality, the different null simulation
designs can still noticeably affect conclusion regarding T1E control for some tests. Finally, in
Chapter 4, we remark that future T1E evaluation and interpretation should go beyond the traditional
‘theoretical’ null and adopt the alternative ‘empirical’ null simulation designs. All the figures and
tables are included in Chapter 5 and some analytical explanations of the simulation results are
included in Chapter 6.
Chapter 2
Methods and Materials
2.1 Three GxE interaction scenarios and corresponding sta-
tistical tests
For association study of a complex trait Y using a sample of size n, we first define genotype data Gi
for individual i at each SNP under the study. As in tradition, Gi denotes the number of copies of
the minor allele, coded additively as Gi = 0, 1 and 2. And under Hardy-Weinberg equilibrium, Gi
is assumed to come from a binomial distribution, Gi ∼ Binomial(2, 1 − f), where f is minor allele
frequency(MAF).
Consider the simplest case where only one SNP is causal, the true generating model is:
Y = βGG+ βEE + βGEG× E + e, e ∼ N(0, σ2) (2.1)
where βG is the main effect of a causal SNP, βE is the environmental effect, and βGE is the gene-
environment interaction effect. Note that the error term in the true data generating model is
denoted as e and assumed to be normal. We wish to identify the SNP whose genotype G has
non-zero interaction effect βGE .
6
Chapter 2. Methods and Materials 7
2.1.1 Scenario 1: single G, and E missing
Suppose information regarding E was not collected, then the fitted model in real application can
only account for the main effect of G:
Y = bGG+ ε (2.2)
However, it is straightforward to show that variances of Y stratified by the three genotype groups
of G differ if βGE 6= 0:
V ar(Y |G) = (βE + βGEG)2V ar(E) + σ2 = σ2G = V ar(ε). (2.3)
That is, ε in the fitted model of (2.2) can behave quite differently from e, the error term in the
generating model of (2.1). Thus, when E is missing and direct interaction modelling is not feasible,
scale tests for heteroscedasticity can be utilized to identify G associated with variance of Y (Pare
et al. 2010). A joint location-scale testing framework can provide robustness against either βG = 0 or
βGE = 0, and it can improve power if both main and interaction effects are present (Soave, Corvol,
et al. 2015). Here we focus on studying the more sensitive scale tests, because power of the joint
test depends on the individual components.
Different scale tests have been studied in this context, and chief among them are the Levene’s
test (Levene et al. 1960) considered by Pare et al. (2010) and Soave, Corvol, et al. (2015), and
the likelihood ratio test considered by Cao et al. (2014). Levene’s test for variance heterogeneity
between k groups is an ANOVA of the absolute deviation of each observation yi from its group mean
or median. The resulting test statistic Levene follows a F(k−1, n−k) distribution under normality,
and it is asymptotically χ2k−1/(k − 1) distributed; k = 3 in our case. Using median instead of
mean to measure the spread within each group has been proved to be more robust to non-normality,
particularly for t-distributed or skewed data (Brown and Forsythe 1974; Soave and L. Sun 2017).
And we will be using the median version of Levene in this thesis.
The variance likelihood ratio test (LRTv) considered by Cao et al. (2014) contrasts the null model
of no variance difference with the alternative model,
Y = βGG+ e, e ∼ N(0, σ2) vs. Y = βGG+ e, e ∼ N(0, σ2G), (2.4)
and conduct the corresponding LRT for H0 : σ2G=0 = σ2
G=1 = σ2G=2. The corresponding test statistic
Chapter 2. Methods and Materials 8
LRTv is asymptotically χ2(2) distributed. For comparison purpose, we will also examine the likelihood
ratio test(LRTm) and score test (Scorem) for testing the main effect, H0 : bG = 0.
Cao et al. (2014) has pointed out that LRTv is sensitive to the normality assumption, but under
normality they have demonstrated that LRTv has good T1E control. However, they implicitly
assumed that the test would work well as long as the error term e in the phenotype-generating
model is normally distributed, regardless of the ‘theoretical’ null or ‘empirical’ null. The work here
is to show why LRTv may have T1E issue in practice. Indeed, Soave, Corvol, et al. (2015) applied
LRTv to a GWAS of lung function measures in cystic fibrosis subjects. Despite the fact that the lung
measures were approximately normally distributed and permuted prior to the variance association
analysis, the histogram of GWAS p-values clearly showed an increased T1E rate (Figure S2.G of
Soave, Corvol, et al. 2015); the actual application was a joint LRTm and LRTv test but the T1E
issue was due to the LRTv component.
2.1.2 Scenario 2: single SNP, E is known
Assume that E was known, model (2.1) also allows us to evaluate T1E control for our second study
of directly testing for the interaction effect bGE . Continuing from the location tests from the first
study, likelihood ratio test and the score test were examined again. The likelihood ratio test, denoted
as LRT (bGE) considered testing:
Y = bGG+ bEE + ε, vs. Y = bGG+ bEE + bGE(G× E) + ε, (2.5)
where ε ∼ N(0, σ2). In the rest of this thesis, we will use the full model Y = bG+bE+bGE(G×E)+ε
as ‘fitted model’ to refer to both, for tests for nested models like LRT (bGE) here. The corresponding
statistics LRT (bGE) and the classic score test statistics Score(bGE) are both asymptotically χ2(1)
distributed. That is, under the ‘theoretical’ null that βGE = 0 in the true phenotype-generating
model (2.1), the density function f0(T ) = χ2(1), where the test statistic T is either LRT (bGE) or
Score(bGE). This is an important note for our study here. We will show that if βGE 6= 0 but the
association test is conducted for Gnew under the ‘empirical’ null, continuing using χ2(1) to obtain
p-value can lead to T1E result that is quite different from that obtained under the ‘theoretical’ null.
More test statistics could be examined under scenario 2. We use these two mainly to demonstrate
the T1E control problem in standard design, not to list all sensitive test statistics.
Chapter 2. Methods and Materials 9
2.1.3 Scenario 3: multiple Gs, and E known
A complex trait is influenced by multiple factors. For example, intelligence has been found to be
associated with more than a hundred of SNPs (Hill et al. 2018, Dadaev et al. 2018). Without loss
of generality, the simple phenotype-generating model (2.1) can be extended to include J SNPs,
Y =
J∑j=1
βGjGj + βEE +
J∑j=1
βGEj(Gj × E) + e, e ∼ N(0, σ2), (2.6)
where βGj is the main effect of SNP j, βE is the environmental effect, and βGEj is the interaction
effect of Gj × E.
To detect any of the J interaction effects, the classical F-test, denoted as FGE , can be applied
to the following fitted model,
Y =
J∑j=1
bGjGj + bEE +
J∑j=1
bGEj(Gj × E) + ε, (2.7)
and test H0 : bGE1= . . . = bGEJ
= 0 simultaneously. Under the ‘theoretical’ null that βGEj= 0 ∀j
in the phenotype-generating model (2.6), the FGE test statistic is F(J, n− 2J − 1) distributed and
is known to have good T1E control. However, it is not clear how the test would behave in practice
under the ‘empirical’ null. In that case, a set of Gnewj ’s under the study do not interact with E to
influence Y , but βGEj6= 0 and e ∼ N(0, σ2) in the true phenotype-generating model (2.6).
For rare variants where MAFs are low, multivariate testing is common. In addition, different
joint testing methods have been proposed, such as the burden-type sum test (B. Li and Leal 2008;
Madsen and Browning 2009; Price et al. 2010), and SKAT-type variance component test (Wu et al.
2011); see Derkach, Lawless, L. Sun, et al. 2014 for a review. Although these tests were originally
developed for main effects of rare variants, they can be applied to common variants (Xinyi Lin, Lee,
Christiani, et al. 2013) and used for interaction effects (Xinyi Lin, Lee, Wu, et al. 2016 Section 3).
Briefly, the BurdenGE interaction test first aggregates the allele counts across the J SNPs to
obtain G∗ =∑Jj=1Gj . It then tests H0 : bG∗E = 0 using the fitted model of
Y = bG∗G∗ + bEE + bG∗E(G∗ × E) + ε. (2.8)
However, the BurdenGE test has T1E issue even under the ‘theoretical’ null; this was studied in
Chapter 2. Methods and Materials 10
Section 3 of Xinyi Lin, Lee, Wu, et al. (2016). For completeness of method evaluation, we include
BurdenGE in our study of ‘theoretical’ vs. ‘empirical’ null.
Extending the earlier SKAT work for main effect, Xinyi Lin, Lee, Wu, et al. 2016 then used
it to study interaction effect. Without going to the technical details, the main component of the
SKATGE interaction test is∑Jj=1 wjScore
2(bGEj), where Score(bGEj
) is the score test statistic for
each bGEj in the fitted model (2.7), and wj depends on the MAF of SNP j. We will study the
SKATGE , as well as the BurdenGE and FGE for gene-based interaction studies of both common
and rare variants.
2.2 Three T1E simulation designs: the ‘theoretical’ null S0,
and the ‘empirical’ null S1.1 and S1.2
The three simulation designs for evaluate T1E can be conceptualized as follows.
The ‘theoretical’ null’ S0: simulates Y0 under the null,
independently generates Gnew,
and evaluates T1E from the Y0 −Gnew null relationship.
The ‘empirical’ null’ S1.1: simulates Y1 under an alternative based on G,
independently generates Gnew,
and evaluates T1E from the Y1 −Gnew null relationship.
The ‘empirical’ null’ S1.2: permutes the Y1,
and evaluates T1E from the Y perm1 −G null relationship.
(2.9)
The exact implementation depends on the test to be evaluated. Thus, we describe below, in details,
how data are being generated for the three T1E simulation designs (S0, S1.1, and S1.2), and under
the different interaction testing scenarios (E missing or not, and single or multiple Gs).
Chapter 2. Methods and Materials 11
2.2.1 Data-generating models for the three T1E simulation designs and
under the three interaction scenarios
Scenario 1: single G, and E missing
Consider the true phenotype-generating model (2.1), the ‘theoretical’ null simulation design S0
assumes βGE = 0; for simplicity we also assume βG = 0. Thus, S0 simulates phenotype data using
Y0 = βEE + e, e ∼ N(0, σ2). (2.10)
It then independently simulates genotype data for an non-associated SNP,
Gnew ∼ Binomial(2, 1− f), (2.11)
where f is the MAF. Finally, because E was assumed missing in practice, S0 uses the following fitted
model,
Y0 = bGGnew + ε, ε ∼ N(0, σ2G), (2.12)
to detect the variance heterogeneity present in ε, based on the Levene and LRTv tests as described
in Section 2.1.1. Theoretical design S0 can be formulated in another way where we generate true
phenotype from Y0 = βEE+βGG+e, e ∼ N(0, σ2), with main effect of some genotype G (Model II of
Rao and Province (2016)), and apply variance test using fitted model Y0 = bGGnew+ε, ε ∼ N(0, σ2G).
The two type of ‘theoretical’ null designs are not essentially different due to independently generated
Gnew. Similar conclusion that main effect in data generating model does not affect our study of
interaction effect can be drawn in the other two scenarios for the same reason.
The ‘empirical’ null simulation design S1.1, however, first simulates phenotype data under an
alternative hypothesis. That is,
Y1 = βGG+ βEE + βGE(G× E) + e, e ∼ N(0, σ2). (2.13)
Note that G is a truly associated SNP, and G ∼ Binomial(2, 1 − f ′). S1.1 then independently
simulates genotype data for a non-associated SNP Gnew as in (2.11). Note that the MAF f does
not have to be the same as f ′ for G. Similarly, because E was assumed missing in practice, S1.1
Chapter 2. Methods and Materials 12
uses the fitted model of
Y1 = bGGnew + ε, ε ∼ N(0, σ2G) (2.14)
to obtain the Levene and LRTv variance test statistic.
The ‘empirical’ null simulation design S1.2 first permutes the Y1 generated above. Because Y perm1
is no longer associated with the Y1-generating G, it then uses the fitted model of
Y perm1 = bGG+ ε, ε ∼ N(0, σ2G) (2.15)
to assess T1E control of a test.
In summary, the true phenotype-generating models are Y0 = βEE + e or Y1 = βGG + βEE +
βGEG×E + e, where e ∼ N(0, σ2) in both cases. Gnew is genotype data for a non-associated SNP.
The three T1E simulation designs estimate T1E rate of a heteroscedasticity test for V ar(ε) using,
respectively, the fitted models of
S0: Y0 = bGGnew + ε,
S1.1: Y1 = bGGnew + ε,
S1.2: Y perm1 = bGG+ ε.
(2.16)
It’s apparent that both ‘empirical’ null designs simulated phenotype data Y under some alter-
native hypothesis, which is as expected in real data applications. And the tested Y -G pairs are
independent as the null hypothesis claims so that we can evaluate T1E rate control. The facts
explained why we define S1.1 and S1.2 as ‘empirical’ null designs.
Scenario 2: single G, and E known
In this case, the data generating model is the same as above, namely, Y0 = βEE + e, e ∼ N(0, σ2),
for S0 design, and Y1 = βGG + βEE + βGE(G × E) + e, e ∼ N(0, σ2) for S1.1 and S1.2 designs.
Because E is known in this second interaction scenario, the three T1E simulation designs estimate
the T1E rate by testing H0 : bGE = 0 using, respectively, the fitted models of
S0: Y0 = bGGnew + bEE + bGE(Gnew × E) + ε,
S1.1: Y1 = bGGnew + bEE + bGE(Gnew × E) + ε,
S1.2: Y perm1 = bGG+ bEEperm + bGE(G× Eperm) + ε.
(2.17)
Chapter 2. Methods and Materials 13
Note that Eperm represents the fact that the permutation must be performed jointly for Y1 and
E. This is to maintain the Y1 − E association while breaking the Y1 − G association. We can
easily conclude that E(bGE) = 0 in all three fitted models in (2.17), even the Y -GxE may not be
independent. Details can be found in Appendix.
For the ‘theoretical’ null S0, we assumed Y0 = βEE + e without the main G effect in the true
generating model (similar to Model I of Rao and Province (2016)). Alternatively, we could consider
Y0 = βGG+ βEE + e with the main effect (Model II of Rao and Province (2016)). In that case, the
fitted model would be Y0 = bGG+ bEE + bGE(G× E) + ε, and bGE is expected to be zero.
The work of Rao and Province (2016) studied the effect of dependency in SNPnon−repeatingxSNPrepeating
interaction analysis on T1E control. Similarly, we can assume E is fixed to represent the fact that,
in a real genome-wide GxE interaction scan, the E does not change from SNP to SNP. Thus the
test statistics T (bGE)s are dependent in a scan. This dependency can lead to larger variation in
genomic inflation factors. However, we show that this dependency is not the source of the T1E issue
addressed here in Appendix.
Scenario 3: multiple Gs, and E known
By now, it should be clear how the three T1E simulation designs would be implemented in this
setting. The true phenotype-generating models are Y0 = βEE + e, e ∼ N(0, σ2) for S0, and Y1 =∑Jj=1 βGj
Gj +βEE+∑Jj=1 βGEj
(Gj×E)+e, e ∼ N(0, σ2) for S1.1 and S1.2. For the SKATGE and
FGE tests, the three T1E simulation designs estimate the T1E rate of jointly testing H0 : bGEj=
0, j = 1, . . . J , using, respectively, the fitted models of
S0: Y0 =
∑Jj=1 bGj
Gnewj+ bEE +
∑Jj=1 bGEj
(Gnewj× E) + ε,
S1.1: Y1 =∑Jj=1 bGjGnewj + bEE +
∑Jj=1 bGEj (Gnewj × E) + ε,
S1.2: Y perm1 =∑Jj=1 bGj
Gj + bEEperm +
∑Jj=1 bGEj
(Gj × Eperm) + ε.
(2.18)
For the BurdenGE test based on G∗ =∑Jj=1Gj , the three T1E simulation designs estimate the
T1E rate of testing H0 : bG∗E = 0, using, respectively, the fitted models of
S0: Y0 = bG∗G∗new + bEE + bG∗E(G∗new × E) + ε,
S1.1: Y1 = bG∗G∗new + bEE + bG∗E(G∗new × E) + ε,
S1.2: Y perm1 = bG∗G∗ + bEE
perm + bG∗E(G∗ × Eperm) + ε.
(2.19)
Chapter 3
Simulation Studies
3.1 Summary of Simulation models and parameter values
For indirect or direct GxE interaction study of a single SNP (i.e. interaction scenarios 1 and 2), Table
3.1 provides the details of the data-generating models and parameter values used. To evaluate the
Levene and LRTv tests for variance heterogeneity, besides the data generating model as described in
Section 2.3.1 and as used by Aschard et al. (2013), we also considered the model adopted by Cao et
al. (2014) for a more extensive comparison. Cao et al. (2014) used model (2.4) to directly simulate
variance homogeneity or heterogeneity in Y stratified by G. In contrast, Aschard et al. (2013)
used model (2.1) to indirectly simulate variance heterogeneity that has better genetic epidemiology
interpretation, because the size of βGE corresponds to power of scale tests under alternatives. For
direct testing of the interaction effect, conveniently, the data-generating model of Aschard et al.
(2013) in Table 3.1 is conceptually the same as the simulation model I of Rao and Province (2016),
except SNPrepeating is a fixed E here.
To best mimic real data condition, we also used a ‘double-loop’ simulation design. Consider the
‘theoretical’ null of Y0 = βEE + e for GxE interaction analysis. Within each of nrep.out replicates
(e.g. 100) in an outer simulation loop, we simulate Y0 based on a fixed E, Y0 = βEEfixed + e.
We then simulate nrep.in replicates (e.g. 104) of Gnew, test bGE based on the fitted model of
Y0 = bGGnew + bEEfixed + bGE(Gnew × Efixed) + ε, and estimate the T1E rate using the nrep.in
replicates. This is similar to one single whole-genome GxE interaction scan. Finally, we can average
the T1E rate over nrep.out replicates to account for sampling variation inherent in the simulation of
14
Chapter 3. Simulation Studies 15
a Efixed for one single whole-genome interaction scan. This can be done similarly for the ‘empirical’
null.
Table 3.1: Summary of the two data-generating models for indirect and direct GxE interactiontesting, and evaluating the ‘theoretical’ null simulation designs S0 vs. the two the ‘empirical’ nullsimulation designs S1.1 and S1.2, as described in Sections 2.3.1 and 2.3.2. Assume E was availablefor direct GxE testing, the Aschard et al. (2013) model coincides with Model I of Rao and Province(2016), except E was Gnon−repeating. T1E rate is first estimated from nrep.in simulation replicatesin an inner loop in which E is fixed (similar to one whole-genome GxE interaction scan), thenaveraged over nrep.out simulation replicates in an outer loop in which E varies.
Introduce Introducevariance heterogeneity by σ2
G variance heterogeneity by G× E(Cao et al. 2014) (Aschard et al. 2013)
Or, Directly test βGE(assuming E was available)
Null Model Y0 = e, Y0 = βEE + e,for S0 e ∼ N(0, σ2) e ∼ N(0, σ2)
Alternative Models Y1 = βGG+ e, Y1 = βGG+ βEE + βGEG× E + e,for S1.1 and S1.2 e ∼ N(0, σ2
G) e ∼ N(0, σ2)
ParametersMAF=0.4 for G and Gnew MAF=0.4 for G and Gnew; P(E=1)=0.3
βG=0.3 βG=0.01, βE=0.3, βGE=0.1, 0.2,· · · ,1σ20=0.23, σ2
1=0.25, σ22=0.29 σ2 = 1
Sample size n=103 or 104 n=103 or 104
Nominal T1E α=0.05 α=0.05, 0.01, 0.001, 10−5
Replications nrep.in= 105 nrep.in=105, or 107 for α=10−5
nrep.out= 100 nrep.out= 100
For each combination of parameter values in Table 3.1 that generates Y1 under an alternative,
instead of studying power, we focused on evaluating T1E control, contrasting the proposed ‘empirical’
null with the previously considered ‘theoretical’ null, as described in Section 2.
Table 3.2 provides the parameter values for gene-based interaction analysis (i.e interaction sce-
nario 3). The number of SNPs was chosen to be J = 11 as in Xinyi Lin, Lee, Wu, et al. (2016),
among which only seven interact with E. That is, βGEj6= 0 only for some j = 1, . . . , J as detailed in
Chapter 3. Simulation Studies 16
Table 3.2. Two sets of MAF levels were considered, with one presents gene-based interaction studies
of common variants and the other of rare variants.
Table 3.2: Summary of the data-generating models for multivariate GxE interaction testing, andevaluating the ‘theoretical’ null simulation designs S0 vs. the two the ‘empirical’ null simulationdesigns S1.1 and S1.2, as described in Section 2.3.3. T1E rate is first estimated from nrep.in
simulation replicates in an inner loop in which E is fixed (similar to one whole-genome gene-basedGxE interaction scan), then averaged over nrep.out simulation replicates in an outer loop in whichE varies.
Null ModelY0 =
∑Jj=1 βGj
Gj + βEE + e, e ∼ N(0, σ2)for S0
Alternative ModelsY1 =
∑Jj=1 βGjGj + βEE +
∑Jj=1 βGEj (Gj × E) + e, e ∼ N(0, σ2)
for S1.1 and S1.2
MAF for ~Gj and ~Gnewj
small: (2.15, 2.58, 2.58, 4.16, 2.57, 2.61, 4.95, 2.58, 2.57, 2.58, 3.68)× 10−2
large: (2.15, 2.58, 2.58, 4.16, 2.57, 2.61, 4.95, 2.58, 2.57, 2.58, 3.68)× 10−1
Parameters
J=11 ; P(E=1)=0.3~βG = (−0.218, 0, 0,−0.476, 0, 0,−0.151,−0.845, 0.0945, 0,−0.133)
βE = 0.3~βGE = (0.1,−0.1, 0, 0, 0.1, 0.1, 0,−0.1, 0,−0.1, 0)
σ2=0.27
Sample size n=103
Nominal T1E α=0.05, 0.01, 0.001
Replications nrep.in=105; nrep.out=100
3.2 Simulation results
For each of the three GxE interaction scenarios, for each of the statistical tests under the study,
and for each of the three T1E simulation designs, we recorded the corresponding T1E rate. We bold
in red colour the ones that exceed the α ± 3√α× (1− α)/nrep.in range, where α is the nominal
T1E rate, and rep.in is the number of simulation replicates used to estimate the empirical T1E
rate for each of the nrep.out replicates; each replicate in an outer loop represents a whole-genome
association scan. Thus, α± 3√α× (1− α)/nrep.in is a conservative interval.
Chapter 3. Simulation Studies 17
Scenario 1: single G, and E missing
To understand the table results below, each cell indicates an averaged estimated T1E rate from a
full loop (nrep.out×nrep.in), evaluated at a nominal level α. If nominal level α = 0.05, we write
α = 5× 10−2 and each cell is in the same 10−2 scale and we can directly compare the results with 5.
Table 3.3: Simulation results of interaction scenario 1: single G, and E missing. Empirical T1Erates of LRTm and Scorem location tests for mean difference in Y across the three G groups, andof LRTv and Levene scale tests for variance difference in Y , based on the ‘theoretical’ null designof S0 and the alternative ‘empirical’ null designs of S1.1 and S1.2. The alternative Y1 data weregenerated using the Aschard’s genetic model as described in Table 3.1. Empirical T1E rates outsideα± 3
√α× (1− α)/nrep.in are bolded in red.
α = 5× 10−2, n = 103
βGE 0.0 0.2 0.4 0.6 0.8 1
Location
LRTm
S0 5.029 5.027 5.026 5.027 5.027 5.026S1.1 5.039 5.021 5.021 5.019 5.014 5.010S1.2 4.997 5.023 5.022 5.020 5.017 5.011
Scorem
S0 5.002 4.998 4.997 4.998 4.998 4.997S1.1 5.014 4.992 4.993 4.991 4.985 4.981S1.2 4.974 4.994 4.993 4.990 4.988 4.983
Scale
LRTv
S0 5.035 5.083 5.081 5.081 5.081 5.079S1.1 5.029 5.188 5.262 5.507 5.979 6.757S1.2 5.031 5.198 5.274 5.519 5.994 6.756
LeveneS0 4.956 4.898 4.898 4.898 4.898 4.896
S1.1 4.989 4.901 4.904 4.905 4.909 4.907S1.2 4.906 4.922 4.912 4.911 4.915 4.908
Table 3.3 above shows that location tests for phenotypical mean differences (LRTm and Scorem)
are generally robust to the choice of ‘theoretical’ null S0 vs. ‘empirical’ null S1.1 or S1.2. That is, for
a location test with correct T1E control under S0, it has correct T1E control under ‘empirical’ designs
as well. Similarly, if it has inflated T1E rates under S0, it is supposed to have the same inflation
problem showing under S1.1 and S1.2. On the contrary, LRTv test for variance heterogeneity is not
as ‘robust’. While S0 design concludes that LRTv has correct T1E rate control across the parameter
values considered, S1.1 and S1.2 reveals its problem in T1E control with ‘empirical’ null phenotype
Chapter 3. Simulation Studies 18
data. For example, in some settings, the T1E rate of LRTv is 0.07 for the nominal α = 0.05 level.
The empirical T1E rates of the Levene test were slightly deflated but not significantly.
While the increased T1E rates under the S1.1 and S1.2 ‘empirical’ null designs appear to be mild
and occur in extreme models (i.e. large GxE interaction effect), results in Table 3.4 demonstrate
that the T1E issue of LRTv, revealed under the ‘empirical’ null simulation designs of S1.1 and S1.2,
can be more severe at the tail. For example, for the nominal α = 1× 10−5 level, the empirical T1E
rate of LRTv can be as high as 11.5× 10−5. Because the genome-wide significance level for GWAS
is α = 5 × 10−8 (Dudbridge and Gusnanto 2008), an inflation of false positive findings can be of a
real problem in practice. Further, results in Table 3.5 confirm that increasing sample size n (from
103 to 104) does not mitigate the discrepancy in T1E conclusion drawn from the ‘theoretical’ vs.
‘empirical’ null.
Table 3.4: Simulation results of interaction scenario 1: single G, and E missing; effect of the nominalα level. Empirical T1E rates of LRTm and Scorem location tests for mean difference in Y acrossthe three G groups, and of LRTv and Levene scale tests for variance difference in Y , based on the‘theoretical’ null design of S0 and the alternative ‘empirical’ null designs of S1.1 and S1.2. Thealternative Y1 data were generated using the Aschard’s genetic model as described in Table 3.1,focusing on the extreme case of large interaction effect, βGE = 1. Empirical T1E rates outsideα± 3
√α× (1− α)/nrep.in are boded in red.
n = 103 α 5× 10−2 1× 10−2 1× 10−3 1× 10−5
Location
LRTm
S0 5.016 0.999 0.985 0.988S1.1 5.009 1.007 0.988 0.990S1.2 5.011 1.009 1.033 1.013
Scorem
S0 5.008 0.998 0.989 0.990S1.1 4.982 0.998 0.982 0.991S1.2 4.982 0.999 0.983 0.997
Scale
LRTv
S0 5.009 1.002 1.024 1.033S1.1 6.923 1.636 2.059 11.599S1.2 6.920 1.639 2.042 11.624
LeveneS0 4.955 0.961 0.938 0.971
S1.1 4.964 0.978 0.932 0.952S1.2 4.962 0.970 0.953 0.958
Chapter 3. Simulation Studies 19
Table 3.5: Simulation results of interaction scenario 1: single G, and E missing; effect of sample sizen. Empirical T1E rates of LRTm and Scorem location tests for mean difference in Y across the threeG groups, and of LRTv and Levene scale tests for variance difference in Y , based on the ‘theoretical’null design of S0 and the alternative ‘empirical’ null designs of S1.1 and S1.2. The alternative Y1 datawere generated using the Cao’s genetic model as described in Table 3.1, and using two differencesample sizes of n = 103 and 104. Empirical T1E rates outside α ± 3
√α× (1− α)/nrep.in are
bolded in red.
α = 5× 10−2
n = 103 n = 104
Location
LRTm
S0 5.011 5.012S1.1 4.992 5.003S1.2 4.934 4.989
Scorem
S0 5.011 5.012S1.1 4.993 5.003S1.2 4.934 4.989
Scale
LRTv
S0 5.103 5.165S1.1 7.034 7.125S1.2 7.007 7.020
LeveneS0 4.924 4.945
S1.1 4.905 4.965S1.2 4.874 4.825
The root cause is that, under the S1.1 and S1.2 designs, Y1 marginally is not normally distributed,
even if it was generated (conditional on the true G) using a normally distributed error e term. Using
LRTv as an example for the test statistic T of interest. Cao et al. (2014) has shown that LRTv is
accurate based on f0(T ), which is χ2(2) derived for Y0−Gnew under the ‘theoretical’ null S0 condition
and assuming e ∼ N(0, σ2). However, when LRTv is applied to Y1 − Gnew, even if e ∼ N(0, σ2)
for generating Y1 conditional on the true G, the correct asymptotic distribution of LRTv under
the ‘empirical’ null is a weighted sum of independent χ2(1) (Appendix and Theorem 3.4.1(1) of
Yanagihara, Tonda, and Matsumoto (2005)), we call it asymptotic distribution under ‘empirical’
null. Thus, assessing LRTv using χ2(2) (asymptotic distribution under ‘theoretical’ null) can have
T1E issue if the data were under the ‘empirical’ null S1.1 condition; similarly for S1.2.
Figure S1 compares the asymptotic distribution (black solid curve) with the finite-sample dis-
tribution from simulation results (red dashed curve) of LRTv under the ‘empirical’ null, as well as
Chapter 3. Simulation Studies 20
with the asymptotic distribution (χ2(2), blue dot-dashed curve) of LRTv under the ‘theoretical’ null.
While the the finite-sample distribution approximates well the asymptotic distribution derived un-
der the ‘empirical’ null, it is clear that the distributions of LRTv differ between the ‘empirical’ and
‘theoretical’ null; the difference is more visible on the scale of critical value for statistical significance
(the vertical lines). The true significance threshold for LRTv under the ‘empirical ’ null is further
away to the tail, as compared with the threshold of χ2(2) under the ‘theoretical’ null. Thus, applying
LRTv to empirical GWAS or NGS while using the significance threshold of χ2(2) can lead to inflated
T1E rate.
Figure S1: Comparison of the asymptotic distribution (black solid) and finite-sample distribution(red dashed) of LRTv under the ‘empirical’ null S1.1 or S1.2, with the asymptotic distribution(χ2
2, blue dot-dashed) of LRTv under the ‘theoretical’ null S0. Vertical lines correspond the 99.9%quantile cutoffs for α = 0.001.
0 5 10 15
0.0
0.1
0.2
0.3
0.4
0.5
Density of LRT variance test statistics
N = 1000000 Bandwidth = 0.08798
Density
Asymptotic dist. | `empirical' nullFinite-sample dist. | `empirical' nullAsymptotic dist. | `theoretical' null
In practice, it is routine (and recommended) to display and examine the empirical distribution
of a trait under the study. However, Figure S1 shows that even under the most extreme setting
where βGE = 1, the marginal histogram of Y1 appears to be approximately normal visually, unless a
formal diagnostic test for normality was conducted. The slightly right-skewed empirical distribution
of Y1 is the result of mixing six conditional distributions of Y1, each perfectly normally distributed
conditional on the causal G and E. This is the key difference between the ‘theoretical’ and ‘empirical’
Chapter 3. Simulation Studies 21
null simulation designs, regardless of the sample size. For a less extreme case where βGE = 0.2,
although both the histogram and Q-Q plot (Figure S2) suggest that normal distribution is a good
fit (passing the Shapiro-Wilk normality test), the T1E discrepancy between the ‘theoretical’ and
‘empirical’ null remains, albeit less severe, as shown in Table 3.3 (column βGE = 0.2) and Figure S3
(middle row).
Tables 3.3, 3.4 and 3.5 also provide T1E results for testing phenotypic mean (as opposed to
variance) difference across the genotype groups. Although location testing for main effect is generally
quite robust to the assumption of normality, problem can arise when testing for interaction effects,
beyond any apparent model mis-specifications (Rao and Province 2016).
Scenario 2: single G, and E known
In direct testing for the GxE interaction effect (SNPrepeating×SNPnon−repeating to be more precise),
Rao and Province (2016) used the classical ‘theoretical’ null simulation design S0, with or without
the main G effect. Regardless, Figures 1B-1C of Rao and Province (2016) showed that the variation
in the resulting λGC was substantially bigger when testing bGE than testing bG. Their Figures 1D
and 1E also demonstrated that the variation diminishes as sample size increases. However, we note
that this observation was made before averaging across the 414 simulated interaction scans/data
sets; each scan contained 20,000 SNPs from which a λGC value was estimated.
The results of Rao and Province (2016) are consistent with ours shown in Figure S6. Figure
S6 showed that scan-specific estimated T1E rates are indeed variable due to test dependency, and
become less so as sample size increases; 100 GxE whole-genome interaction scans with 105 SNPs in
each scan and E was fixed within a scan. However, it is important to note that the average T1E rate
across nrep.out simulated scans reflects better the long-run behaviour of a method. Indeed, results
in Table 3.6 show that the T1E rate of testing bGE , estimated from 105×100 (nrep.in× nrep.out)
simulated replicates, is well controlled, under the conventional ‘theoretical’ null simulation design
of S0. However, this is not the case under the ‘empirical’ null simulation designs of S1.1 or S1.2.
Similar to the LRTv scale test for variance heterogeneity, the discrepancy in T1E control of LRTGE
and ScoreGE , between the two types of null simulation designs, becomes more prominent at the tail
and persists as sample increases (Table 3.6).
Chapter 3. Simulation Studies 22
Table 3.6: Simulation results of interaction scenario 2: single G, and E known. Empirical T1Erates of the LRTGE and ScoreGE tests, based on the ‘theoretical’ null design of S0 and the al-ternative ‘empirical’ null designs of S1.1 and S1.2. The alternative Y1 data were generated usingthe Aschard’s genetic model as described in Table 3.1 when βGE = 1, but E was assumed to beknown in this case and direction interaction testing was possible. Empirical T1E rates outsideα± 3
√α× (1− α)/nrep.in are bolded in red.
n = 103 n = 104
α = 5× 10−2 1× 10−2 1× 10−3 5× 10−2 1× 10−2 1× 10−3
LRTGE
S0 5.034 1.017 1.029 5.033 1.007 0.979S1.1 7.091 1.763 2.422 6.886 1.709 2.410S1.2 7.100 1.771 2.389 6.972 1.707 2.339
ScoreGE
S0 4.982 1.003 1.002 5.021 1.004 0.972S1.1 7.028 1.738 2.363 6.874 1.705 2.400S1.2 7.040 1.747 2.339 6.960 1.703 2.326
Scenario 3: multiple Gs, and E known
Table 3.7 contains T1E results for gene-based interaction studies of both common and rare variants.
When the MAFs are between 0.2 and 0.5 (common), we observed that the T1E rates of all tests
considered are significantly different between the ‘theoretical’ null and ‘empirical’ null. This result is
consistent with that of the interaction scenarios 1 and 2 above. Our simulation results also confirmed
that the BurdenGE interaction test has T1E issue even under the ‘theoretical’ null, as previously
noted in Xinyi Lin, Lee, Wu, et al. (2016). When the MAFs are reduced ten-fold to be between 0.02
and 0.05 (rare), although SKATGE appears to be robust, the BurdenGE and FGE tests continue
to display significant differences between the ‘theoretical’ and ‘empirical’ null simulation designs for
evaluating their T1E control.
Chapter 3. Simulation Studies 23
Table 3.7: Simulation results of interaction scenario 3: multiple Gs, and E known. EmpiricalT1E rates of the FGE , SKATGE and BurdenGE tests for jointly testing for multiple interactioneffects, based on the ‘theoretical’ null design of S0 and the alternative ‘empirical’ null designs ofS1.1 and S1.2. Models and parameters values are given in Table 3.2. Empirical T1E rates outsideα± 3
√α× (1− α)/nrep.in are bolded in red.
small MAF (rare) large MAF (common)
α = 5× 10−2 1× 10−2 1× 10−3 5× 10−2 1× 10−2 1× 10−3
FGE
S0 4.991 0.982 0.985 4.983 1.001 0.992S1.1 5.448 1.130 2.363 6.905 1.591 2.043S1.2 5.448 1.130 1.238 6.818 1.596 1.945
SKATGE
S0 4.904 0.949 0.895 4.934 0.976 0.917S1.1 5.038 1.074 1.059 6.460 1.411 1.640S1.2 5.034 1.065 1.110 6.378 1.394 1.651
BurdenGE
S0 6.169 3.751 15.049 13.709 4.280 7.440S1.1 8.432 2.213 3.144 5.666 1.232 1.410S1.2 8.408 2.208 3.148 5.712 1.237 1.420
Chapter 4
Discussion
In this work, we highlight the importance of distinguishing the ‘theoretical’ and ‘empirical’ null
distributions, first noted by Efron (2004), in a different application context. Starting with scale
tests for variance heterogeneity and through simulation studies, we showed that conclusions of type
1 error control of a statistical test could differ depending on the choice of the null simulation designs.
For example, the LRTv variance test appears to be accurate under the ‘theoretical’ null but invalid
under the ‘empirical’ null (Tables 3.3, 3.4 and 3.5, and Figure S3). Such differences indicates firstly
the test statistics we are examining does not have correct T1E rate control under ‘empirical’ null
hypothesis, which is expected from real data sets. And more importantly, such problem cannot be
revealed by traditional ‘theoretical’ null designs.
Cao et al. (2014) has pointed out the sensitivity and limitation of LRTv when the error term e is
not normally distributed. However, they implicitly assumed that the test would work well as long as
e ∼ N(0, σ2), regardless of ‘theoretical’ null or ‘empirical’ null. In our simulation study, although all
the e’s for generating the phenotype data were normal, the increased T1E rates under the ‘empirical’
null are, fundamentally, attributed to the sensitivity of LRTv to departure from normality. This is
because the marginal distribution of the empirical outcome data are not normal (Figures S1 and
S2). Thus, tests shown to be extremely sensitive to the assumption of normality are particularly
vulnerable when applied to real data, which are better represented by the ‘empirical’ null than the
‘theoretical’ null.
Conversely, power calculation for LRTv using the significance threshold of the ‘theoretical’ null
distribution can be too optimistic. Indeed, this study was motivated by the observation made in
24
Chapter 4. Discussion 25
Table S7 of Soave, Corvol, et al. (2015) that, “further analysis using permutation estimation of
p-values showed that power of the LRT under asymptotic analysis was greatly inflated. The work
here provides insights on why this is the case. The asymptotic power was obtained using the χ2(2)
distribution derived under the ‘theoretical’ null while controlling the T1E rate at α, as in Cao et
al. (2014). The permutation-based power was obtained by using the empirical 1− α quantile cutoff
of the LRTv statistic applied to permuted phenotype. This is equivalent to controlling the T1E
rate at α under the ‘empirical’ null of S1.2. Under the ‘empirical’ null, however, we showed that
LRTv follows a different distribution and the corresponding significance quantile cutoff is further
to the tail, as compared with the f0(T ) = χ2(2) distribution under the ‘theoretical’ null (Figure
S1). This leads to (correct) smaller permutation-based power. The (incorrect) higher asymptotic-
based power is a result of increased T1E rate, when the data were generated under the ‘empirical’
null condition but the test statistic was evaluated using χ2(2) derived under the ‘theoretical’ null
condition. Permutation-based S1.2 null design can estimate the true distribution of a test statistic
T when applied to a real data set, and identify the correct significance threshold for T under the
‘empirical’ null. But, permutation must be carried out carefully in practice, for example, in the
presence of sample correlation (Abney 2015).
In practice, investigators often rely on visual inspection of histograms of outcome data as illus-
trated in Figures S1 and S2. We have noted that the departure from normality does not have to
be severe to have an effect on tests such as LRTv, as observed in a LRTv scan of lung function in
cystic fibrosis subjects by Soave, Corvol, et al. (2015). Furthermore, for data appear to deviate from
normal such as those displayed in Figure S1, even if investigators chose to perform some standard
normal transformations, the T1E issue can persist. For example, let us consider the phenotype data
simulated based on Aschard’s genetic model, as described in Table 3.1 where βGE = 1 (Figure S1).
After square-root or log transformations (Goh and Yap 2009), although the empirical marginal dis-
tribution of the phenotype improved as expected (Figure S4), the severity of T1E inflation of LRTv
in fact worsened under the ‘empirical’ S1.1 and S1.2 null (Figure S5).
Voorman et al. (2011) showed that spurious false positives can occur in genome-wide scans for
GxE interactions, particularly in the presence of model mis-specification. Rao and Province (2016)
also presented inflated or deflated genomic inflation factors in a GxG interaction scan when one SNP
is anchored (i.e. SNPrepeatingxSNPnon−repeating), using the conventional ‘theoretical’ null simula-
tion design without any apparent model mis-specifications. Based on our GxE simulation studies
Chapter 4. Discussion 26
where E was fixed within each scan, we note that the large variation in λGC estimate demonstrated
by Rao and Province (2016) corresponds to the sampling variation inherent in estimating T1E rate
from nrep.in replicates (or SNPs) across nrep.out replicates (or scans). This, however, does not
translate to T1E issue based on the classical frequentist interpretation. Results in Table 3.6 show
that, under the ‘theoretical’ null of S0, there is no T1E issue in GxE interaction studies even if E
was fixed within a genome-wide scan. But, similar to scale test of variance, T1E results differ using
the ‘empirical’ S1.1 or S1.2 null simulation designs. Additional theoretical insights are provided in
Appendix Section 3.
Departure from normality is generally weakened under multivariate assumptions. However, the
topic addressed here remain relevant. To demonstrate this, in addition to studying interaction
between E and G of a single SNP, we also examined testing for interaction effects between E
and multiple Gs as in gene-based interaction studies. We reached the same conclusion that T1E
conclusions could differ between the ‘theoretical’ and ‘empirical’ null.
In practice, with real data available, ‘empirical’ null designs can be implemented as well to
evaluate T1E rate control for a test statistics. S1.1 generates new genotype data and fits models
using real phenotype Y and environmental factors E. S1.2 permutes the real Y -E pair jointly and
fit a model with Y perm, G, and Eperm. From the simulation results, S1.1 and S1.2 seem to reach
the same conclusion in all cases. However, with real data, S1.1 can hardly generates Gs while
maintaining the same linkage relationship. Permutation approach S1.2 keeps the genotype data,
but it could not maintain correlation structures in phenotype. The ‘empirical’ null designs S1.1 and
S1.2 are two basic designs to mimic real data and obtain true T1E rate evaluation. However, due
to the difficulties in reproducing real data together with possible relationships, new designs may be
developed to improve the current ones.
To conclude, although we only presented three examples (i.e. scale tests for variance heterogene-
ity, and direct tests for one or multiple interaction effects jointly), the findings here have important
implications for future evaluation of type 1 error control and interpretation. The newer tests be-
ing developed are increasingly complex, and they often go beyond the first moment such as the
scale tests studied here. Conventional simulation design S0 under the ‘theoretical’ null can lead
to misleading conclusion regarding the accuracy of a test, which in turn affects power study. The
alternative simulation designs S1.1 and S1.2 under the ‘empirical’ null, can reveal the true behaviour
of a test when applied to real data.
Bibliography
[1] Mark Abney. “Permutation Testing in the Presence of Polygenic Variation”. In: Genetic Epi-
demiology 39.4 (2015), pp. 249–258.
[2] Hugues Aschard et al. “A nonparametric test to detect quantitative trait loci where the phe-
notypic distribution differs by genotypes”. In: Genetic epidemiology 37.4 (2013), pp. 323–333.
[3] Morton B Brown and Alan B Forsythe. “Robust tests for the equality of variances”. In: Journal
of the American Statistical Association 69.346 (1974), pp. 364–367.
[4] Ying Cao et al. “A versatile omnibus test for detecting mean and variance heterogeneity”. In:
Genetic epidemiology 38.1 (2014), pp. 51–59.
[5] Tokhir Dadaev et al. “Fine-mapping of prostate cancer susceptibility loci in a large meta-
analysis identifies candidate causal variants”. In: Nature communications 9.1 (2018), p. 2256.
[6] Andriy Derkach, Jerry F Lawless, Lei Sun, et al. “Pooled association tests for rare genetic
variants: a review and some new results”. In: Statistical Science 29.2 (2014), pp. 302–321.
[7] Bernie Devlin and Kathryn Roeder. “Genomic control for association studies.” In: Biometrics
55.4 (1999), pp. 997–1004.
[8] Frank Dudbridge and Olivia Fletcher. “Gene-environment dependence creates spurious gene-
environment interaction”. In: The American Journal of Human Genetics 95.3 (2014), pp. 301–
307.
[9] Frank Dudbridge and Arief Gusnanto. “Estimation of significance thresholds for genomewide
association scans”. In: Genetic epidemiology 32.3 (2008), pp. 227–234.
[10] Bradley Efron. “Large-scale simultaneous hypothesis testing: the choice of a null hypothesis”.
In: Journal of the American Statistical Association 99.465 (2004), pp. 96–104.
27
BIBLIOGRAPHY 28
[11] Friedhelm Eicker. “Limit theorems for regressions with unequal and dependent errors”. In:
Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. Vol. 1.
1. 1967, pp. 59–82.
[12] Douglas Scott Falconer. Introduction to quantitative genetics. Oliver and Boyd; Edinburgh;
London, 1960.
[13] Liang Goh and Von Bing Yap. “Effects of normalization on quantitative traits in association
test”. In: BMC bioinformatics 10.1 (2009), p. 415.
[14] E Gomez-Sanchez-Manzano, MA Gomez-Villegas, and JM Marfffdfffdn. “Sequences of elliptical
distributions and mixtures of normal distributions”. In: Journal of multivariate analysis 97.2
(2006), pp. 295–310.
[15] WD Hill et al. “A combined analysis of genetically correlated traits identifies 187 loci and a
role for neurogenesis and myelination in intelligence”. In: Molecular psychiatry (2018), p. 1.
[16] Azmeri Khan and Glen D Rayner. “Robustness to non-normality of common tests for the
many-sample location problem”. In: Journal of Applied Mathematics & Decision Sciences 7.4
(2003), pp. 187–206.
[17] Julia Kozlitina and William R Schucany. “A robust distribution-free test for genetic association
studies of quantitative traits”. In: Statistical applications in genetics and molecular biology 14.5
(2015), pp. 443–464.
[18] Seunggeun Lee, Michael C Wu, and Xihong Lin. “Optimal tests for rare variant effects in
sequencing association studies”. In: Biostatistics 13.4 (2012), pp. 762–775.
[19] Howard Levene et al. “Robust tests for equality of variances”. In: Contributions to probability
and statistics 1 (1960), pp. 278–292.
[20] Bingshan Li and Suzanne M Leal. “Methods for detecting associations with rare variants for
common diseases: application to analysis of sequence data”. In: The American Journal of
Human Genetics 83.3 (2008), pp. 311–321.
[21] Xinyi Lin, Seunggeun Lee, David C Christiani, et al. “Test for interactions between a ge-
netic marker set and environment in generalized linear models”. In: Biostatistics 14.4 (2013),
pp. 667–681.
BIBLIOGRAPHY 29
[22] Xinyi Lin, Seunggeun Lee, Michael C Wu, et al. “Test for rare variants by environment inter-
actions in sequencing association studies”. In: Biometrics 72.1 (2016), pp. 156–164.
[23] Trudy FC Mackay. “Q&A: Genetic analysis of quantitative traits”. In: Journal of biology 8.3
(2009), p. 23.
[24] Bo Eskerod Madsen and Sharon R Browning. “A groupwise association test for rare mutations
using a weighted sum statistic”. In: PLoS genetics 5.2 (2009), e1000384.
[25] Benjamin M Neale et al. “Testing for an unusual distribution of rare variants”. In: PLoS
genetics 7.3 (2011), e1001322.
[26] Guillaume Pare et al. “On the use of variance per genotype as a tool to identify quantitative
trait interaction effects: a report from the Women’s Genome Health Study”. In: PLoS genetics
6.6 (2010), e1000981.
[27] Alkes L Price et al. “Pooled association tests for rare variants in exon-resequencing studies”.
In: The American Journal of Human Genetics 86.6 (2010), pp. 832–838.
[28] Tara J Rao and Michael A Province. “A Framework for Interpreting Type I Error Rates from a
Product-Term Model of Interaction Applied to Quantitative Traits”. In: Genetic epidemiology
40.2 (2016), pp. 144–153.
[29] David Soave, Harriet Corvol, et al. “A joint location-scale test improves power to detect as-
sociated SNPs, gene sets, and pathways”. In: The American Journal of Human Genetics 97.1
(2015), pp. 125–138.
[30] David Soave and Lei Sun. “A generalized Levene’s scale test for variance heterogeneity in the
presence of sample correlation and group uncertainty”. In: Biometrics (2017).
[31] Maksim V Struchalin et al. “Variance heterogeneity analysis for detection of potentially inter-
acting genetic loci: method and its limitations”. In: BMC genetics 11.1 (2010), p. 92.
[32] Xiangqing Sun et al. “What is the significance of difference in phenotypic variability across
SNP genotypes?” In: The American Journal of Human Genetics 93.2 (2013), pp. 390–397.
[33] Tetsuji Tonda, Hirofumi Wakaki, et al. “Asymptotic expansion of the null distribution of the
likelihood ratio statistic for testing the equality of variances in a nonnormal one-way ANOVA
model”. In: Hiroshima Mathematical Journal 33.1 (2003), pp. 113–126.
BIBLIOGRAPHY 30
[34] Arend Voorman et al. “Behavior of QQ-plots and genomic control in studies of gene-environment
interaction”. In: PloS one 6.5 (2011), e19416.
[35] Andrew R Wood et al. “Another explanation for apparent epistasis”. In: Nature 514.7520
(2014), E3.
[36] Michael C Wu et al. “Rare-variant association testing for sequencing data with the sequence
kernel association test”. In: The American Journal of Human Genetics 89.1 (2011), pp. 82–93.
[37] Hirokazu Yanagihara, Tetsuji Tonda, and Chieko Matsumoto. “The effects of nonnormality
on asymptotic distributions of some likelihood ratio criteria for testing covariance structures
under normal assumption”. In: Journal of Multivariate Analysis 96.2 (2005), pp. 237–264.
Chapter 5
Appendix
5.1 Conditional and Marginal Distribution of Y under Alter-
native Model
The conditional and marginal mean/variance of phenotype Y:
µij = E[Y |E = i, G = j] = jβG + iβE + ijβGE
σij = Var[Y |E = i, G = j] = σ2
E[Y ] =
1∑i=0
2∑j=0
ωij · µij
Var[Y ] =
1∑i=0
2∑j=0
ωij · ((µij − E[Y ])2 + σ2)
(5.1)
Where ωij = 21(j=1)f2−j(1−f)jP[E = 1]i(1−P[E = 1])1−i, where f is the minor allele frequency
(MAF) of the SNP. And it can be easily concluded that Y ∼∑1i=0
∑2j=0 ωijN (µij , σ
2).
31
Chapter 5. Appendix 32
5.2 Asymptotic Distribution of Likelihood Ratio Statistics
for Variance under Non-normality
The null hypothesis for testing equal variance in 3 genotype groups are:
H0 : σ20 = σ2
1 = σ22
Likelihood Ratio test statistic equals to:
LRTv = −1
2
2∑j=0
log σ2j − log σ2
It is known that under normality, LRTvL−→ χ2
(2). When Y is not normal but with E[||Y ||4] <∞ and
nj/n = O(1) for each group G = j, denote ρj = nj/n, ρ = (ρ0, ρ1, ρ2)′. And uj = (√
nj
2 ·σ2i−σ
2
σ2 ),
u = (u0, u1, u2)′. The covariance matrix of u can be written as I3⊗Ω for some 3x3 matrix Ω. Then
the characteristic function of LRTv becomes:
C(t) =
2∏j=0
(1− 2itλj)−1 + o(1)
where λj are the eigenvalues of Ω. And the test statistics converges to:
LRTvL−→
2∑j=0
λjχ2(1)
When Y is normally distributed the eigenvalues will reduce to a zero and two 1s and the asymptotic
distribution will reduce to∑2j=1 χ
2(1) = χ2
(2).
5.3 Score Test on GxE Interaction Scan
5.3.1 Dependency of Test Statistics
The working model in the whole-genome interaction scan is:
E[Y 0] = β1G+ β2E + β3G× E
Chapter 5. Appendix 33
Score statistics:
T =S(0)− S(β3)√V ar(S(β3))
=
n∑i=1
1
σ2(GiEi) · (yi − βGGi − βEEi)
/√n
σ2E[G2E2]
Since Y ∼ N (0, 1), σ2 is the variance of normal error term, T becomes:
T =
( n∑i=1
1
σGiEi · εi
)/√n · E[G2E2] ∼ N
(0,
1n
∑ni=1G
2iE
2i
E[G2E2]
)
For a p-SNP scan, the distributions of T1, · · · , Tp have dependent variances. When∑ni=1 1[Gi =
0] (1−maf)2 · n, or∑ni=1 1[Ei = 1] P[E = 1] · n, the genomic inflation factor λ is expected to
be lower than 1 and vice versa.
Small-sized sample can be largely deviated from expectation by chance. As n→∞, 1n
∑ni=1(GiEi)
2 →
E[(GE)2], and thus λ becomes stabled around 1 (Figure S5). Averaged over outer loop for type 1
error control, location tests will not reveal any problem under S0 design (Table 6).
5.3.2 Under Non-Normality
The phenotype data Y 1 under alternative ‘empirical’ null is generated from:
Y 1 = βGG+ βEE + βGEGE + e, e ∼ N (0, 1)
With conditional expectation and variance:
E[Y 1|E] = βGE[G] + (βE + βGEE[G]) · E
Var[Y 1|E] = β2GVar[G] + β2
GEVar[G] · E2
For example, in S1.1, Gnew |= Y,E in the working model:
E[Y 1] = β1Gnew + β2E + β3Gnew × E
E has linear mean effect on Y with increased variance. Thus error term ε has variance depending
on E.
Chapter 5. Appendix 34
Consequently, the asymptotic distribution of score test statistics is expected to have variance larger
than 1. Thus empirical type 1 error is larger than nominal level in general regardless sample size.
Chapter 5. Appendix 35
5.4 Figures
Figure S1: Marginal and conditional histogram (Top), and normal Q-Q plot (Bottom) of the empir-ical phenotype data Y1 when βGE = 1 based on the Aschard’s model as described in Table 1.
Chapter 5. Appendix 36
Figure S2: Marginal and conditional histogram (Top), and normal Q-Q plot (Bottom) of the em-pirical phenotype data Y1 when βGE = 0.2 based on the Aschard’s model as described in Table1.
Chapter 5. Appendix 37
Figure S3: Histogram of p-values of the LRTv test under S0 (Left column), S1.1 (Middle column)and S1.2 (Right column) null simulation designs. Simulation of the empirical phenotype data Y1 wasbased on the Aschard’s genetic model as described in Table 1 with βGE = 0 (Top row), βGE = 0.2(Top row), and βGE = 0.6 (Top row). Red line represents the Uniform (0, 1) distribution.
Chapter 5. Appendix 38
Figure S4: Histograms with theoretical normal distribution (Top row), and normal QQ-plots (Bot-tom row) of phenotype Y1 simulated based on the Aschard’s genetic model as described in Table 2with βGE = 1 for before transformation (Left column), after square root (Middle column), and afterlog transformations (Right column).
Histogram of Y
Y
Den
sity
−2 0 2 4
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
Histogram of rtY
rtY
Den
sity
−4 −2 0 2
0.0
0.1
0.2
0.3
0.4
Histogram of logY
logY
Den
sity
−4 −3 −2 −1 0 1 2 3
0.0
0.1
0.2
0.3
0.4
−3 −2 −1 0 1 2 3
−2
−1
01
23
4
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
−4
−2
02
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
−3
−2
−1
01
23
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Chapter 5. Appendix 39
Figure S5: Histogram of p-values of the LRTv test under the S0 (Left column), S1.1 (Middle column)and S1.2 (Right column) null simulation designs, when phenotype Y1 was simulated based on theAschard’s genetic model as described in Table 1 with βGE = 1 for before transformation (Top row),after square root transformation (Middle row), and after log transformation (Bottom row).
Chapter 5. Appendix 40
Figure S6: Scatter plot of nrep.out = 100 estimated T1E rates for whole-genome interaction GxEscans, each estimated from nrep.in = 105 simulated replicates (or SNPs) with a fixed E, under the‘theoretical’ null design S0 as described in Table 1, for sample size n = 104 (Left) and 105 (Right).
0 20 40 60 80 100
0.03
0.04
0.05
0.06
0.07
sample size n=1000
Est
imat
ed ty
pe 1
err
or
0 20 40 60 80 100
0.03
0.04
0.05
0.06
0.07
sample size n=10000
Est
imat
ed ty
pe 1
err
or