NONPARAMETRIC COVARIANCE ESTIMATION IN FUNCTIONAL MAPPING OF COMPLEX DYNAMIC TRAITS
By
JOHN STEPHEN F. YAP
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2008
© 2008 John Stephen F. Yap
To my mother, Rhoda.
ACKNOWLEDGMENTS
I realize that I have been in school for most of my life, and this dissertation is the
culmination of my formal education, but certainly not the end of learning. I would like to
thank everyone who has contributed to the accumulation of my knowledge and the honing of
my skills, those who have helped me in all aspects of my career, and all who have affected
my life. In particular, thanks to
The people in my early years of education: Mabel Nakasas, my first tutor, for her tireless
effort even when I was daydreaming or falling asleep while she was teaching me Math;
Rico Santos who gave us the opportunity to better our Math skills; Mr. and Mrs. Yeban
for all their support and mentorship; Dr. Aurello Ramos, Jr. for giving me a job at LSC;
Dr. Augusto Hermosilla for all his precious pieces of advice regarding my career; all my
colleagues at the Ateneo de Manila University Math Department for all their friendship
and support.
All my recommenders: Dr. Jose Marasigan, Dr. Reginald Marcelo, and Dr. Gerry Salas
(for initial admission to graduate school); Dr. Stephen Agard, Dr. Dennis Cook, Dr. Chris
Bingham, and Dr. John Baxter (for admission to the Ph.D. program in Statistics at the
University of Florida); Dr. Rongling Wu, Dr. James Hobert, Dr. Mark Yang, and Dr.
Wendy London (for job applications).
Dr. Wendy London for giving me the opportunity to work at COG and learn about
children’s cancer and my COG colleagues Patrick McGrady, Chenguang Wang, and
Stephen Linda.
My colleagues, officemates and friends in the Statistics Department at UF: Aixin Tan
who has helped me a lot in my statistics career, Song Wu for all his help in statistical
genetics, Yao Li, Vivekananda Roy, Ruitao Liu, Bong-Rae Kim, Jie Yang, Jiahan Li,
Tezcan Orazgat, Tian Liu, Hongying Li, Wei Hou, Qin Li, Jiangtao Liu, and Guifang Fu.
Pinoy UF for their camaraderie and awesomeness.
Jordana Arzadon, Tita Rhoda and family, Tita Esther and Uncle Jim, Tita Rizza and
Manang Grace for being my surrogate family in the US.
Clouded Fury: Tobs, Alvin, Allan, and Anne.
Charlie, Junior, and Little Kitten.
Greg and the potluck gang.
My mom who has always supported me in everything that I did.
My Ph.D. Committee: Dr. Malay Ghosh, Dr. Ronald Randles, Dr. Xueli Liu, and Dr.
Wendy London.
My adviser, Dr. Rongling Wu, for being very patient, supportive, and generous, and for
being the best adviser ever!
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER
1 INTRODUCTION
1.1 Basic Genetics and QTL Mapping
1.1.1 Terminology
1.1.2 Experimental Crosses
1.1.3 Linkage and Markers
1.1.4 Interval Mapping
1.2 Functional Mapping of QTL
1.2.1 Model Formulation
1.2.2 Parameter Estimation via the EM Algorithm
1.2.3 Hypothesis Tests
1.3 Other QTL Mapping Models
1.4 Goals
2 NONPARAMETRIC COVARIANCE ESTIMATION IN FUNCTIONAL MAPPING OF QTL
2.1 Introduction
2.2 Covariance Estimation
2.2.1 Modified Cholesky Decomposition and Regression Interpretation
2.2.2 Regularized Covariance Estimators
2.2.3 Ridge Regression and LASSO
2.2.4 Penalized Likelihood
2.3 Covariance Estimation in Functional Mapping
2.3.1 Computing the Penalized Likelihood Estimates
2.3.2 From EM to ECM Algorithm
2.3.3 Selection of Tuning Parameter
2.4 Numerical Results
2.4.1 Simulations
2.4.2 Real Data Analysis
2.5 Summary and Discussion
3 NONPARAMETRIC COVARIANCE ESTIMATION IN FUNCTIONAL MAPPING OF REACTION NORMS TO TWO ENVIRONMENTAL SIGNALS
3.1 Introduction
3.2 Functional Mapping of Reaction Norms to Multiple Environmental Signals
3.2.1 Likelihood
3.2.2 Mean and Covariance Models
3.2.3 Hypothesis Tests
3.3 Spatio-temporal Covariance Functions
3.3.1 Introduction
3.3.2 Basic Ideas, Notation, and Assumptions
3.3.3 Separable Covariance Structures
3.3.4 Nonseparable Covariance Structures
3.3.4.1 Spectral method by Cressie and Huang (1999)
3.3.4.2 Monotone function method by Gneiting (2002)
3.4 Simulations
3.5 Summary and Discussion
4 CONCLUDING REMARKS
4.0.1 Summary
4.0.2 Future Directions
APPENDIX
A DERIVATION OF EM ALGORITHM FORMULAS
B DERIVATION OF EQUATION 2-9
C MINIMIZATION OF 2-33
D DEFINITION OF KRONECKER PRODUCT
E DERIVATION OF EQUATION 3-20
REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES
1-1 Conditional genotype probability in a backcross
1-2 Conditional genotype probability in an F2
2-1 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates (Σ_NP, Normal Data).
2-2 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates (Σ_AR(1), Normal Data).
2-3 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates (Σ_NP, Data from t-distribution).
2-4 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates (Σ_AR(1), Data from t-distribution).
2-5 Available markers and phenotype data of a linkage map in an F2 population of mice (data from Vaughn et al., 1999).
3-1 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (Nonseparable Model).
3-2 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (Nonseparable Model).
3-3 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (Σ_NP).
3-4 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (Σ_AR(1)).
3-5 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (C1 with n = 400 and σ² = 2, 4).
3-6 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (C1 with n = 400, increased irradiance and temperature levels, and σ² = 1, 2).
LIST OF FIGURES
1-1 Experimental crosses from pure inbred line parents P1 and P2
1-2 Crossing-over
1-3 Weights of mice measured every week for 10 weeks
1-4 Hypothetical plot of LR vs. linkage map
2-1 Penalized likelihood in curve estimation
2-2 Log-likelihood ratio (LR) plots based on simulated data under three different covariance structures
2-3 The profile of the log-likelihood ratios (LR) between the full model (there is a QTL) and reduced (there is no QTL) model for body mass growth trajectories across the genome in a mouse F2 population
2-4 Log-likelihood ratio (LR) plots for chromosomes 6 and 7 of the mice data
2-5 Three growth curves, each representing a genotype at each of seven QTLs detected on mouse chromosomes 1, 4, 6, 7, 10, 11, and 15 for growth trajectories of mice in an F2 population
3-1 Reaction norm surface of photosynthetic rate as a function of irradiance and temperature
3-2 Boxplots of the values of the log-likelihood under the alternative model, H1
3-3 Covariance plots
3-4 Contour plots
4-1 Formation of a phenotype by a landscape
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
NONPARAMETRIC COVARIANCE ESTIMATION IN FUNCTIONAL MAPPING OF COMPLEX DYNAMIC TRAITS
By
John Stephen F. Yap
August 2008
Chair: Rongling Wu
Major: Statistics
One of the fundamental objectives in agricultural, biological, and biomedical
research is the identification of genes that control the developmental pattern of complex
traits and their responses to the environment, and of the way these genes interact in a
coordinated manner to determine the final expression of the trait. Recently, a
new statistical framework, called functional mapping, has been developed to identify
and map quantitative trait loci (QTLs) that determine developmental trajectories by
integrating biologically meaningful mathematical models of trait progression into a
mixture model for unknown QTL genotypes. Functional mapping has emerged as a
powerful statistical tool for mapping QTLs that control the responsiveness (reaction norm)
of a trait to developmental and environmental signals.
From a statistical perspective, functional mapping, designed to study the genetic
regulation and network of quantitative variation in dynamic complex traits, is essentially a
joint mean-covariance likelihood model. Appropriate choices of models for the mean
and covariance structures are critical to statistical inference about QTL
locations and actions/interactions. While a battery of statistical and mathematical models
has been proposed for modeling the mean vector, the analysis of covariance structure has
been mostly limited to parametric structures such as the first-order autoregressive (AR(1)) or structured
antedependence (SAD) models. In functional mapping of reaction norms that respond
to two environmental signals, a model expressed as a Kronecker product of two AR(1)
structures has been proposed to test differences in the genetic control of responses to
different environments. For real longitudinal data sets, parametric modeling may
be too simple to capture the complex pattern and structure of the covariance. There
is a pressing need to develop a robust approach for modeling any possible structure of
longitudinal covariance, ultimately broadening the use of functional mapping.
Our study proposes a nonparametric covariance estimator in functional mapping
of quantitative trait loci. We adopt Huang et al.'s (2006) approach of invoking the
modified Cholesky decomposition and converting the problem into modeling a sequence of
regressions of responses. A regularized positive-definite covariance estimator is obtained
using a normal penalized likelihood with an L2 penalty. This approach is embedded
within the mixture likelihood framework of functional mapping by using a reparameterized
version of the derivative of the log-likelihood. We extend the idea of functional mapping
to model the covariance structure of interaction effects between the two environmental
signals in a non-separable way. The extended model allows the quantitative test of several
fundamental biological questions. Is there a pleiotropic QTL that regulates genotypic
responses to different environmental signals? What is the difference in the timing and
duration of QTL expression between environment-specific responsiveness? How is an
environment-dependent QTL regulated by a development-related QTL? We performed
various simulation studies to examine the statistical properties of the new models and to
demonstrate the advantages of the proposed estimator. By analyzing real examples from
genetic studies, we illustrate the use and usefulness of the methodology. The new
methods will provide a useful tool for genome-wide scanning for the existence, distribution
and interactions of QTLs underlying a dynamic trait important to agriculture, biology and
health sciences.
CHAPTER 1
INTRODUCTION
Many biological traits are quantitatively inherited. Examples of such traits
include the height of trees, the weight or body mass of animals, the yield of agricultural
crops, and even disease progression and drug response. Genetic mapping of quantitative
traits and subsequent cloning of the underlying genes have become a major focus
in agricultural, biological, and biomedical research. Since the publication of the seminal
mapping paper by Lander and Botstein (1989), a large literature has developed
on statistical methods for mapping complex traits (reviewed
in Jansen, 2000; Hoeschele, 2000; Wu et al., 2007b). Although the idea of associating a
continuously varying phenotype with a discrete trait (marker) dates back to the work
of Sax (1923), it was Lander and Botstein (1989) who first established an explicit
principle for linkage analysis. They also provided a tractable statistical algorithm for
dissecting a quantitative trait into its individual genetic locus components, referred to as
quantitative trait loci (QTLs).
The success of Lander and Botstein in developing a powerful method for linkage
analysis of a complex trait has roots in two different developments. First, the rapid
development of molecular technologies in the mid-1980s led to the generation of a
virtually unlimited number of markers that specify the genome structure and organization
of any organism (Drayna et al., 1984). Second, almost simultaneously, improved statistical
and computational techniques, such as the EM algorithm (Dempster et al., 1977), made it
possible to tackle complex genetic and genomic problems.
Lander and Botstein’s (1989) model for interval mapping of QTLs is regarded as
appropriate for an ideal (simplified) situation, in which the segregation patterns of all
markers can be predicted on the basis of the Mendelian laws of inheritance and a trait
under study is strictly controlled by one QTL on a chromosome. This work was extended
and improved by many researchers (Jansen and Stam, 1994; Zeng, 1994; Haley et al., 1994;
Xu, 1996), with successful identification of so-called “outcrossing” QTLs in real-world data
sets of pigs (Andersson et al., 1994) and pine (Knott et al., 1997). A general framework
for QTL analysis in outcrossing populations was recently established by Lin et al. (2003).
While most statistical methods for QTL mapping simply combine statistics and genetics,
Ma et al. (2002) pioneered a novel mapping framework, called functional mapping, in which
statistics, genetics, and biology are integrated through mathematical equations to pose
and address biological hypotheses. Functional mapping provides a powerful tool for
mapping and identifying QTLs that control the developmental pattern of complex traits.
This chapter provides an overview of basic genetic concepts related to QTL mapping
of complex traits, with emphasis on the fundamental procedures of functional mapping
(Ma et al., 2002), a statistical and genetic model for mapping QTLs that underlie a
complex dynamic trait. The chapter is organized as follows. Section 1.1
introduces basic genetic concepts and describes how QTL mapping, via interval mapping,
is done using the idea of linkage in structured populations called experimental crosses.
Section 1.2 introduces the functional mapping model. Section 1.3 describes a few other
QTL mapping methods, and Section 1.4 states the main goals of this dissertation
and outlines the remaining chapters.
1.1 Basic Genetics and QTL Mapping
1.1.1 Terminology
A gene exists in alternative forms, or alleles. The alleles usually occur as pairs at specific
locations (loci) on a chromosome and are transmitted from one generation to the next. An
offspring (or progeny) inherits one allele from each parent, and the two alleles together
constitute the offspring's genotype. The usual textbook notation for a particular gene is a
letter of the alphabet, with the uppercase and lowercase forms denoting the alleles. Thus, with alleles
'A' and 'a', the possible genotypes are AA, Aa, and aa. The genotype determines the trait,
or phenotype, and variation due to a QTL arises because different QTL genotypes produce
different phenotypes. However, because environmental factors also contribute to the total
phenotypic variation, it is difficult to infer an offspring's genotype from its phenotype.

Figure 1-1. Experimental crosses from pure inbred line parents P1 and P2: F1 × P1 or F1 × P2 produces a backcross, while F1 × F1 produces an intercross, or F2. (Adapted from Broman, 1997.)
1.1.2 Experimental Crosses
QTLs can be identified through experimental crosses. The simplest
experimental cross, called a backcross, starts by mating two pure inbred line parents (P1
and P2, with genotypes AA and aa, respectively) that differ in the phenotype under
consideration (Figure 1-1). Each parent contributes a chromosome strand to create an
offspring, called the first filial generation or F1, which has genotype Aa. If the F1 is mated to one of its
parents, say P2, the offspring is called a backcross, with genotype either Aa or aa. During
meiosis (the production of sex cells, or gametes), each parental strand replicates and
exchanges genetic material with other strands. This exchange is known as crossing-over.
Thus, the strand contributed by the F1 parent to produce the backcross contains alleles
from both P1 and P2. If the F1 is selfed (mated with itself) or mated with another F1,
the offspring is called an F2, with possible genotypes AA, Aa, or aa. This experiment is called
an intercross.
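The segregation ratios these crosses produce at a single locus (1:1 in a backcross, 1:2:1 in an F2) can be enumerated exactly. The sketch below assumes Mendelian segregation, with each parent transmitting either of its two alleles with probability 1/2, and ignores linkage; the function name offspring_genotypes is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict:
    """Exact offspring genotype probabilities at one locus.

    Assumes Mendelian segregation: each parent transmits either of its two
    alleles with probability 1/2. Genotypes are written uppercase allele first.
    """
    counts = Counter()
    for a1, a2 in product(parent1, parent2):
        counts["".join(sorted(a1 + a2))] += 1  # "aA" -> "Aa"
    total = sum(counts.values())
    return {g: Fraction(c, total) for g, c in counts.items()}


print(offspring_genotypes("AA", "aa"))  # F1: P1 x P2
print(offspring_genotypes("Aa", "aa"))  # backcross: F1 x P2
print(offspring_genotypes("Aa", "Aa"))  # intercross: F1 x F1
```

Exact fractions are used so that the expected ratios appear without rounding.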
1.1.3 Linkage and Markers
Suppose we consider another gene, with alleles 'B' and 'b', carried by P1 and P2. The
parental genotypes in this case are AABB and aabb, respectively. Crossing-over during meiosis
is illustrated in Figure 1-2. The chromosome strands from P1 and P2 pair up, and each
replicates to form a tetrad. Crossing-over, as illustrated here, occurs at one point between
the inner strands and involves an exchange of genetic material. The end result is four
chromosomes that either resemble the parental strands (nonrecombinant, NR) or do not
(recombinant, R). Crossing-over can also occur more than once and can involve the
two outer strands. In general, a strand is recombinant for the two genes when an odd
number of crossovers occurs between them.
Genes on the same chromosome have an association called linkage: during meiosis they
tend to remain on the same strand. This means that there
will be more nonrecombinant than recombinant chromosomes. If r is the recombination
fraction, or the proportion of recombinant chromosomes, then the proportion of
nonrecombinant chromosomes is 1 − r. Because of linkage, it is generally true that r < 1/2.
The value of r depends on the distance between the gene loci. Genes that are far apart usually
have high values of r because the long stretch of chromosome between them gives
crossing-over a greater chance to occur. If two genes are very close to each other,
it is likely that no crossing-over will occur between them and that they will end up on the same
chromosome.
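The odd-number rule and the bound r < 1/2 can be made quantitative under an extra assumption: that crossovers between two loci d cM apart occur as a Poisson process with mean λ = d/100 per meiotic product (no interference, the same assumption behind the Haldane map function in Section 1.1.4). The recombination fraction is then the probability of an odd count, e^{−λ} sinh λ = (1 − e^{−2λ})/2, which approaches but never exceeds 1/2. The sketch below checks this by simulation; the function names are illustrative.

```python
import math
import random


def recombination_fraction_exact(d_cM: float) -> float:
    """P(odd number of crossovers) for a Poisson count with mean d/100.

    Summing the odd terms of the Poisson series gives exp(-lam) * sinh(lam),
    which equals (1 - exp(-2 * lam)) / 2.
    """
    lam = d_cM / 100.0  # expected crossovers per meiotic product (Morgans)
    return math.exp(-lam) * math.sinh(lam)


def recombination_fraction_mc(d_cM: float, n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate: simulate crossover counts, score odd ones as recombinant."""
    rng = random.Random(seed)
    lam = d_cM / 100.0
    recombinant = 0
    for _ in range(n):
        # Count Poisson-process events in [0, lam) using exponential gaps.
        k, t = 0, rng.expovariate(1.0)
        while t < lam:
            k += 1
            t += rng.expovariate(1.0)
        recombinant += k % 2  # odd number of crossovers -> recombinant strand
    return recombinant / n


for d in (5, 30, 100, 300):
    print(d, round(recombination_fraction_exact(d), 4), round(recombination_fraction_mc(d), 4))
```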
Linkage provides a way of locating a QTL on a chromosome by using known or
identified loci called genetic or molecular markers. Markers do not directly affect the
phenotype controlled by the QTL, and as such they are said to be phenotypically neutral. But they
may affect other visible phenotypes, such as eye color, making it possible to distinguish
their genotypes. If a marker is closely linked with a QTL, then their alleles are likely to
end up on the same chromosome in a backcross or F2 offspring. The resulting
marker genotype, which can be observed, is therefore informative in predicting the QTL genotype. Thus,
a prerequisite for QTL mapping is the construction of a linkage map of markers that
spans an entire chromosome or the set of all chromosomes in an organism (genome). The
more markers there are, the greater the chance of QTL detection. Some of the popularly
used markers include restriction fragment length polymorphisms (RFLPs), amplified
fragment length polymorphisms (AFLPs), and single nucleotide polymorphisms (SNPs).

Figure 1-2. Crossing-over: (1) parental chromosome strands align; (2) each strand replicates to form a tetrad, and crossing-over starts between the inner strands; (3) recombinant (R) or nonrecombinant (NR) gametes.
1.1.4 Interval Mapping
In this section, we show how marker genotypes are used in QTL
mapping. The most popular approach to QTL mapping in experimental crosses is called
interval mapping (Lander and Botstein, 1989; Broman, 2001). This approach is also the
basis of functional mapping. We start by discussing the concept of genetic distance.
The map distance between two loci, expressed in centiMorgans (cM), is defined
as the expected number of crossovers between the loci per 100 meiotic products. Assuming
that crossovers occur at random and independently of each other, the Haldane map
function (Haldane, 1919; Wu et al., 2007c) relates a distance of d cM to the
recombination fraction r as follows:

$$ d = -\frac{100}{2}\log(1-2r) \quad\text{or}\quad r = \frac{1}{2}\left(1 - e^{-2d/100}\right). \tag{1–1} $$
The distances between markers across the genome are known and usually expressed in cM.
However, QTL mapping models use probabilities that are expressed in terms of r.
Thus, when a linkage map of markers is given in cM, the distances are converted to r using Eq. 1–1.
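A minimal sketch of the conversion in Eq. 1–1, applied to a hypothetical marker map (the positions below are invented for illustration, and the function names are not from any particular package). Because 1 − 2r = e^{−2d/100} is multiplicative in distance, the recombination fractions of adjacent intervals combine as r1 + r2 − 2r1r2, which the last lines check.

```python
import math


def cm_to_r(d_cM: float) -> float:
    """Haldane map function: recombination fraction from distance in cM (Eq. 1-1)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))


def r_to_cm(r: float) -> float:
    """Inverse Haldane map function: distance in cM from 0 <= r < 1/2 (Eq. 1-1)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -100.0 * math.log(1.0 - 2.0 * r) / 2.0


# Hypothetical marker positions (cM) along one chromosome
positions_cM = [0.0, 12.5, 30.0, 47.2]
interval_r = [cm_to_r(b - a) for a, b in zip(positions_cM, positions_cM[1:])]
print([round(r, 4) for r in interval_r])

# Distances add, so recombination fractions combine as r1 + r2 - 2*r1*r2
r1, r2 = interval_r[0], interval_r[1]
print(round(cm_to_r(30.0), 6), round(r1 + r2 - 2 * r1 * r2, 6))
```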
For simplicity, we assume a backcross population with two possible genotypes at a locus,
denoted by 1 (for Aa) and 0 (for aa). Consider an interval on a chromosome with two linked
markers, M and N, as endpoints, and let r be the recombination fraction between them.
We refer to M as the left marker and N as the right marker. Suppose there exists a QTL,
Q, within the marker interval. Let r1 be the recombination fraction between M and Q and
r2 that between Q and N. It is easy to show using Eq. 1–1 that r = r1 + r2 − 2r1r2. The QTL
genotypes are unknown, but their conditional probabilities given the marker genotypes can
be derived; they are shown in Table 1-1. As an illustration, if
an offspring has genotype 1 (Mm) at marker M and 0 (nn) at marker N, then the marker
interval genotype is 10. The conditional probability that the QTL has genotype 1 (Qq) is

$$ P(1 \mid 10) = \frac{(1-r_1)\,r_2}{r}. $$
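Both results quoted above follow in a few lines. The derivation below assumes, as Eq. 1–1 does, that crossovers in the sub-intervals M–Q and Q–N occur independently, and writes the F1 chromosomes as MQN/mqn, so that the backcross parent P2 contributes only m, q, and n:

```latex
% Additivity of map distance, d = d_1 + d_2, combined with Eq. 1-1:
1 - 2r = e^{-2(d_1 + d_2)/100} = (1 - 2r_1)(1 - 2r_2) = 1 - 2r_1 - 2r_2 + 4r_1 r_2
\;\Longrightarrow\; r = r_1 + r_2 - 2r_1 r_2 .

% Marker interval genotype 10: the F1 gamete carries M and n.
P(MQn) = \tfrac{1}{2}(1 - r_1)\,r_2, \qquad P(Mqn) = \tfrac{1}{2}\,r_1 (1 - r_2),
P(Mn) = \tfrac{1}{2}\left[(1 - r_1) r_2 + r_1 (1 - r_2)\right] = \tfrac{1}{2}\,r,
P(1 \mid 10) = \frac{P(MQn)}{P(Mn)} = \frac{(1 - r_1)\,r_2}{r}.
```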
Denote the conditional QTL genotype probability by $p_{k|i}$, where k indexes the genotype and i
indexes the progeny. Thus, the example above gives the value of $p_{1|i}$ for an individual i
with marker interval genotype 10. We omit an index for the marker interval to simplify notation.
Table 1-1. Conditional genotype probability in a backcross

Marker     QTL genotype
interval   1                     0
11         (1−r1)(1−r2)/(1−r)    r1r2/(1−r)
10         (1−r1)r2/r            r1(1−r2)/r
01         r1(1−r2)/r            (1−r1)r2/r
00         r1r2/(1−r)            (1−r1)(1−r2)/(1−r)
For an intercross or F2 population, there are three possible genotypes denoted by 2
(for AA), 1 (for Aa), and 0 (for aa). The conditional QTL probabilities given the marker
interval genotypes can also be derived and are shown in Table 1-2.
Table 1-2. Conditional genotype probability in an F2

Marker     QTL genotype
interval   2                              1                                     0
22         (1−r1)²(1−r2)²/(1−r)²          2r1(1−r1)r2(1−r2)/(1−r)²              r1²r2²/(1−r)²
21         (1−r1)²r2(1−r2)/[r(1−r)]       r1(1−r1)(1−2r2+2r2²)/[r(1−r)]         r1²r2(1−r2)/[r(1−r)]
20         (1−r1)²r2²/r²                  2r1(1−r1)r2(1−r2)/r²                  r1²(1−r2)²/r²
12         r1(1−r1)(1−r2)²/[r(1−r)]       (1−2r1+2r1²)r2(1−r2)/[r(1−r)]         r1(1−r1)r2²/[r(1−r)]
11         2r1(1−r1)r2(1−r2)/(1−2r+2r²)   (1−2r1+2r1²)(1−2r2+2r2²)/(1−2r+2r²)   2r1(1−r1)r2(1−r2)/(1−2r+2r²)
10         r1(1−r1)r2²/[r(1−r)]           (1−2r1+2r1²)r2(1−r2)/[r(1−r)]         r1(1−r1)(1−r2)²/[r(1−r)]
02         r1²(1−r2)²/r²                  2r1(1−r1)r2(1−r2)/r²                  (1−r1)²r2²/r²
01         r1²r2(1−r2)/[r(1−r)]           r1(1−r1)(1−2r2+2r2²)/[r(1−r)]         (1−r1)²r2(1−r2)/[r(1−r)]
00         r1²r2²/(1−r)²                  2r1(1−r1)r2(1−r2)/(1−r)²              (1−r1)²(1−r2)²/(1−r)²
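The entries of Table 1-2 can be checked by brute force: enumerate the eight F1 gametes (phase MQN/mqn, with no crossover interference assumed), pair them to form F2 offspring, and condition on the marker genotypes. Genotypes are coded as the number of P1 alleles (2, 1, 0), matching the table; the function names and the values r1 = 0.1, r2 = 0.15 are illustrative.

```python
from collections import defaultdict
from itertools import product


def f1_gamete_probs(r1: float, r2: float) -> dict:
    """Gamete haplotype probabilities from an F1 with phase MQN/mqn.

    Alleles are coded 1 (from P1: M, Q, N) and 0 (from P2: m, q, n).
    Recombination in the M-Q and Q-N sub-intervals is assumed independent.
    """
    probs = {}
    for m, q, n in product((0, 1), repeat=3):
        p = 0.5                              # which homolog is transmitted at M
        p *= r1 if m != q else 1.0 - r1      # recombinant between M and Q?
        p *= r2 if q != n else 1.0 - r2      # recombinant between Q and N?
        probs[(m, q, n)] = p
    return probs


def f2_qtl_given_markers(r1: float, r2: float) -> dict:
    """Map (M genotype, N genotype, QTL genotype) -> P(QTL genotype | markers) in an F2."""
    gametes = f1_gamete_probs(r1, r2)
    joint = defaultdict(float)    # P(M, N, QTL genotypes)
    marker = defaultdict(float)   # P(M, N genotypes)
    for (g1, p1), (g2, p2) in product(gametes.items(), repeat=2):
        m, q, n = (a + b for a, b in zip(g1, g2))  # count P1 alleles per locus
        joint[(m, n, q)] += p1 * p2
        marker[(m, n)] += p1 * p2
    return {key: p / marker[key[:2]] for key, p in joint.items()}


r1, r2 = 0.1, 0.15
r = r1 + r2 - 2 * r1 * r2
probs = f2_qtl_given_markers(r1, r2)
# Row "11", QTL genotype 1 of Table 1-2
table_entry = (1 - 2*r1 + 2*r1**2) * (1 - 2*r2 + 2*r2**2) / (1 - 2*r + 2*r**2)
print(round(probs[(1, 1, 1)], 6), round(table_entry, 6))
```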
Tables 1-1 and 1-2 were obtained using a three-point analysis of genes (see, for
example, Chapter 4 of Wu et al., 2007c). A QTL is usually searched for by scanning the
genome at consecutive, equally spaced positions. For example, a given chromosome
is scanned from the leftmost marker to the opposite end every 2 or 4 cM. At each
search position, the numerical values of the conditional QTL genotype probabilities can be
calculated using Tables 1-1 and 1-2. These conditional probabilities form the weights
for the phenotype densities in the mixture model in Section 1.2.1. Notice that for a
given marker interval genotype in either table, $\sum_{k=1}^{J} p_{k|i} = 1$, where J is the number of genotypes
(J = 2 and 3 for a backcross and an intercross, respectively). This means that the entries in any
given row of either table add up to 1.
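A sketch of the scanning step for a backcross: at search points every 2 cM inside a marker interval, the distances to the flanking markers are converted with the Haldane map function and the Table 1-1 weights are evaluated. The interval length, the step, and the function names are hypothetical inputs for illustration; each row of weights should sum to 1, as noted above.

```python
import math


def haldane_r(d_cM: float) -> float:
    """Recombination fraction from distance in cM (Eq. 1-1)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))


def backcross_weights(r1: float, r2: float) -> dict:
    """Table 1-1: marker interval genotype -> (P(QTL = 1), P(QTL = 0))."""
    r = r1 + r2 - 2.0 * r1 * r2
    return {
        "11": ((1 - r1) * (1 - r2) / (1 - r), r1 * r2 / (1 - r)),
        "10": ((1 - r1) * r2 / r, r1 * (1 - r2) / r),
        "01": (r1 * (1 - r2) / r, (1 - r1) * r2 / r),
        "00": (r1 * r2 / (1 - r), (1 - r1) * (1 - r2) / (1 - r)),
    }


def scan_interval(length_cM: float, step_cM: float = 2.0):
    """Yield (position from left marker, weights) at equally spaced interior points."""
    pos = step_cM
    while pos < length_cM:
        yield pos, backcross_weights(haldane_r(pos), haldane_r(length_cM - pos))
        pos += step_cM


for pos, w in scan_interval(20.0):
    print(f"{pos:5.1f} cM  P(QTL=1 | 10) = {w['10'][0]:.3f}")
```

As expected, a QTL close to the left marker M tends to share M's genotype, so P(1 | 10) falls as the search point moves toward N.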
A complete interval mapping model uses the phenotype data in addition to the
markers. Section 1.2.1 shows how functional mapping, which is based
on interval mapping, incorporates phenotype data.
1.2 Functional Mapping of QTL
1.2.1 Model Formulation
Assume that a segregating QTL gives rise to J genotypes in a mapping population of n individuals.
Suppose each individual is measured for a trait at m equidistant points,
with phenotype observation vector $y_i = (y_{i1}, \ldots, y_{im})'$ for $i = 1, \ldots, n$.
Assuming a multivariate normal distribution, the density of the phenotype of individual i with genotype k is

$$ f_k(y_i) = (2\pi)^{-m/2}\,|\Sigma|^{-1/2}\exp\left\{-\tfrac{1}{2}(y_i - g_k)'\,\Sigma^{-1}(y_i - g_k)\right\}, \tag{1–2} $$

where $g_k$ is the mean vector of genotype k, Σ is the covariance matrix, and k = 1, ..., J.
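For numerical work, Eq. 1–2 is usually evaluated on the log scale. The sketch below assumes NumPy and uses a Cholesky factor L of Σ (Σ = LL′), which gives both log|Σ| = 2 Σ_j log L_jj and the quadratic form as ‖L⁻¹(y_i − g_k)‖² without forming Σ⁻¹ explicitly. This is a generic technique, not necessarily how the dissertation's own software computes the density; the function name is illustrative.

```python
import numpy as np


def mvn_logpdf(y, g, sigma) -> float:
    """log f_k(y) of Eq. 1-2 for mean vector g and covariance sigma."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    m = y.size
    L = np.linalg.cholesky(sigma)               # sigma = L @ L.T (lower triangular)
    z = np.linalg.solve(L, y - g)               # z = L^{-1} (y - g)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log |sigma|
    return float(-0.5 * (m * np.log(2.0 * np.pi) + log_det + z @ z))


sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
print(mvn_logpdf([1.0, 2.0], [0.5, 1.5], sigma))
```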
The likelihood function can be represented by a multivariate mixture model

$$ L(\Omega) = \prod_{i=1}^{n}\left[\sum_{k=1}^{J} p_{k|i}\, f_k(y_i \mid \Omega)\right], \tag{1–3} $$

where Ω is the parameter vector, specified below, and $p_{k|i}$ is the probability of
a QTL genotype given the genotypes of the two flanking markers (Section 1.1.4). As stated
earlier, a QTL is searched for at different points throughout the genome. At any given
point, the numerical value of $p_{k|i}$ can be computed from Tables 1-1 and 1-2. Thus,
for a given search position, the log-likelihood, log L(Ω), can be maximized to obtain the
maximum likelihood estimates (MLEs) of the mean parameters, $\Omega_\mu$, and the covariance
parameters, $\Omega_\Sigma$, so that $\Omega = (\Omega_\mu, \Omega_\Sigma)$.
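A vectorized sketch of log L(Ω) in Eq. 1–3, assuming NumPy, a covariance Σ common to all genotypes (as stated below for this chapter), and conditional probabilities p_{k|i} already computed at the current search position. Each log f_k(y_i) is obtained through a Cholesky factor, and the sum over genotypes uses the log-sum-exp trick so that very small densities do not underflow. The function name, array names, and shapes are illustrative.

```python
import numpy as np


def mixture_loglik(Y, P, G, sigma) -> float:
    """log L(Omega) of Eq. 1-3.

    Y: (n, m) phenotypes; P: (n, J) probabilities p_{k|i};
    G: (J, m) genotype mean vectors g_k; sigma: (m, m) common covariance.
    """
    Y, P, G = (np.asarray(a, dtype=float) for a in (Y, P, G))
    n, m = Y.shape
    J = G.shape[0]
    L = np.linalg.cholesky(sigma)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    # Residuals y_i - g_k for every individual and genotype: shape (n, J, m)
    R = Y[:, None, :] - G[None, :, :]
    Z = np.linalg.solve(L, R.reshape(n * J, m).T).T.reshape(n, J, m)
    log_f = -0.5 * (m * np.log(2.0 * np.pi) + log_det + np.sum(Z**2, axis=2))
    # log sum_k p_{k|i} f_k(y_i), computed stably (log-sum-exp over k)
    with np.errstate(divide="ignore"):
        log_w = np.log(P) + log_f
    top = log_w.max(axis=1)
    return float(np.sum(top + np.log(np.exp(log_w - top[:, None]).sum(axis=1))))


rng = np.random.default_rng(0)
n, m, J = 6, 4, 3
Y = rng.normal(size=(n, m))
G = rng.normal(size=(J, m))
P = rng.dirichlet(np.ones(J), size=n)  # each row sums to 1, like p_{k|i}
sigma = 0.5 * np.eye(m) + 0.5           # compound-symmetric example covariance
print(mixture_loglik(Y, P, G, sigma))
```

In a genome scan this function would be maximized over Ω at every search position, with P recomputed from the marker data each time.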
Functional mapping is a general joint mean-covariance model (Pourahmadi, M., 1999,
2000; Pan and Mckenzie, 2003; Wu and Pourahmadi, 2003; Wu et al., 2007b) for QTL
mapping. Its generality stems from a wealth of possibilities in specifying the mean and
covariance structures to solve a large number of problems. With regard to the mean,
many phenomena have structural forms and can be modeled parametrically. For example,
growth (Figure 1-3) can be modeled by a logistic function defined by

$$ g_k = [g_k(t)]_{m\times 1} = \left[\frac{a_k}{1 + b_k e^{-r_k t}}\right]_{m\times 1} \tag{1–4} $$
(Niklas, 1994; West et al., 2001; Zhao et al., 2004). This model has several desirable descriptive properties. The curve starts with an exponential phase and reaches an inflection point, where the rate of growth is maximal; growth then continues asymptotically towards the value $a$. The curve takes the value $a/(1+b)$ at the onset of growth, $t = 0$, and the value $a/2$ at the inflection point, $t = (\log b)/r$. These properties can be used to derive hypothesis tests (Ma et al., 2002; Zhao et al., 2004). Other parametric mean models include the sigmoid Emax equation, which relates drug concentration to drug effect (Lin et al., 2007), the Richards and Gompertzian curves for time-dependent tumor growth (Li et al., 2006), and the biexponential model for HIV-1 dynamics (Wang et al., 2006).
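The two landmarks of the logistic curve can be checked numerically. The following is a minimal Python sketch (standard library only); the parameter values are arbitrary illustrations, not estimates from any data set, and the function names are hypothetical.

```python
import math

def logistic(t, a, b, r):
    """Logistic growth curve g(t) = a / (1 + b * exp(-r * t)), as in Eq. 1-4."""
    return a / (1.0 + b * math.exp(-r * t))

def logistic_landmarks(a, b, r):
    """Return (g(0), inflection time, g at the inflection time); assumes b > 0, r > 0."""
    t_inflection = math.log(b) / r
    return a / (1.0 + b), t_inflection, logistic(t_inflection, a, b, r)

# Arbitrary illustrative parameters (not fitted to the mouse data).
a, b, r = 40.0, 9.0, 0.8
g0, t_star, g_star = logistic_landmarks(a, b, r)
print(f"g(0) = {g0:.3f}, inflection at t = {t_star:.3f}, g(t*) = {g_star:.3f}")
```

A numerical derivative of `logistic` peaks at `t_star`, confirming that the inflection point is where growth is fastest.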
In the absence of structural forms, semiparametric (Cui et al., 2006; Wu et al., 2007d;
Yang et al., 2007) or nonparametric (Zhao, W., 2005a; Yang, J., 2006; Yang et al., 2007)
approaches can be used to model the mean.
Figure 1-3. Weights of mice measured every week for 10 weeks (x-axis: age in weeks; y-axis: weight in g). Data from the study of Vaughn et al. (1999).

The covariance $\Sigma$ is assumed to be the same for each genotype group $k$. The usual parametric model is the first-order autoregressive model, AR(1),

$$\Sigma = \sigma^2\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{m-1}\\
\rho & 1 & \rho & \cdots & \rho^{m-2}\\
\rho^2 & \rho & 1 & \cdots & \rho^{m-3}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\rho^{m-1} & \rho^{m-2} & \rho^{m-3} & \cdots & 1
\end{pmatrix}, \qquad (1–5)$$

which is popular in the longitudinal data literature (Diggle et al., 2002; Verbeke and Molenberghs, 2005). This model assumes variance stationarity (an equal residual variance $\sigma^2$ at each time point) and covariance stationarity (covariances that depend only on the lag between time points and decay geometrically, as $\sigma^2\rho^{|j_2 - j_1|}$). Explicit forms for the inverse and determinant are easily obtained by matrix algebra:
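$$\Sigma^{-1} = \frac{1}{\sigma^2(1-\rho^2)}\begin{pmatrix}
1 & -\rho & 0 & \cdots & 0 & 0\\
-\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0\\
0 & -\rho & 1+\rho^2 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & 1+\rho^2 & -\rho\\
0 & 0 & 0 & \cdots & -\rho & 1
\end{pmatrix} \qquad (1–6)$$

and

$$|\Sigma| = [\sigma^2(1-\rho^2)]^{m-1}\sigma^2. \qquad (1–7)$$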
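The closed forms 1–6 and 1–7 are easy to verify numerically. The sketch below (Python, standard library only; helper names are hypothetical) builds the AR(1) matrix 1–5, forms the tridiagonal inverse 1–6, and compares the log-determinant 1–7 with one computed by Gaussian elimination. The tridiagonal formula assumes $m \ge 2$ and $|\rho| < 1$.

```python
import math

def ar1_cov(m, sigma2, rho):
    """AR(1) covariance of Eq. 1-5: entry (j, k) is sigma2 * rho**|j - k|."""
    return [[sigma2 * rho ** abs(j - k) for k in range(m)] for j in range(m)]

def ar1_inverse(m, sigma2, rho):
    """Tridiagonal inverse of Eq. 1-6 (valid for m >= 2 and |rho| < 1)."""
    c = 1.0 / (sigma2 * (1.0 - rho ** 2))
    P = [[0.0] * m for _ in range(m)]
    for j in range(m):
        # Corner diagonal entries are 1; interior diagonal entries are 1 + rho^2.
        P[j][j] = c * (1.0 if j in (0, m - 1) else 1.0 + rho ** 2)
        if j + 1 < m:
            P[j][j + 1] = P[j + 1][j] = -c * rho
    return P

def ar1_logdet(m, sigma2, rho):
    """log|Sigma| from Eq. 1-7: (m - 1) * log[sigma2 * (1 - rho^2)] + log(sigma2)."""
    return (m - 1) * math.log(sigma2 * (1.0 - rho ** 2)) + math.log(sigma2)

def logdet_by_elimination(S):
    """Generic log-determinant of a positive-definite matrix (no pivoting needed)."""
    A = [row[:] for row in S]
    total = 0.0
    for p in range(len(A)):
        total += math.log(A[p][p])
        for i in range(p + 1, len(A)):
            f = A[i][p] / A[p][p]
            for j in range(p, len(A)):
                A[i][j] -= f * A[p][j]
    return total

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

m, sigma2, rho = 5, 2.0, 0.6
identity = matmul(ar1_cov(m, sigma2, rho), ar1_inverse(m, sigma2, rho))
print(max(abs(identity[i][j] - (i == j)) for i in range(m) for j in range(m)))
```

Because only $O(m)$ nonzero entries of $\Sigma^{-1}$ are needed, each likelihood evaluation under AR(1) avoids a general matrix inversion.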
Because the inverse and determinant are available in closed form, an AR(1) model is computationally efficient. Furthermore, when the ECM algorithm is used, CM-step solutions can be obtained for the parameters of logistic (Ma et al., 2002) or rational function (Yap et al., 2007) mean models. Despite these advantages, the AR(1) assumptions of variance and covariance stationarity may not always hold, especially for real data. This is evident from Figure 1-3, where the data appear to "fan out" over time instead of being stationary. Wu et al. (2004b) tried to resolve this problem by applying a transform-both-sides (Carroll and Ruppert, 1984) log-transformation to the data to achieve approximate stationarity, after which AR(1) can be used on the transformed data (Zhao et al., 2004). However, such a transformation may not produce a stationary covariance, in which case AR(1) is still not an appropriate model. To avoid the stationarity assumption altogether, Zhao et al. (2005) proposed using the structured antedependence (SAD) model (Zimmerman and Nunez-Anton, 2001), which can handle non-stationary data and is more robust and less data-dependent than AR(1). The elements of an SAD covariance structure of order 1 are given by

$$\operatorname{var}(y_{ij}) = \frac{1-\phi^{2j}}{1-\phi^{2}}\,\nu^{2} \quad\text{and}\quad \operatorname{cov}(y_{ij_1}, y_{ij_2}) = \phi^{\,j_2-j_1}\,\frac{1-\phi^{2j_1}}{1-\phi^{2}}\,\nu^{2}, \quad 1 \le j_1 \le j_2 \le m, \qquad (1–8)$$

where $\phi$ is the generalized autoregressive parameter, $\nu^2$ is the innovation variance, and $j = 1, \ldots, m$. Eq. 1–8 shows that the variances are not constant and that the covariances depend not only on the lag $j_2 - j_1$ but also on the reference point $j_1$. Nevertheless, Zhao et al. still recommend modeling data by SAD in conjunction with AR(1).
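The following Python sketch (standard library only; function names hypothetical) checks Eq. 1–8 under an assumption: that, with a constant antedependence parameter, the SAD(1) model corresponds to the recursion $y_{i1} = e_{i1}$, $y_{ij} = \phi\, y_{i,j-1} + e_{ij}$ with independent innovations $e_{ij}$ of variance $\nu^2$. The closed form and the recursion should then give the same matrix.

```python
def sad1_cov_closed_form(m, phi, nu2):
    """SAD(1) covariance from Eq. 1-8 with time indices j = 1..m (requires phi**2 != 1)."""
    S = [[0.0] * m for _ in range(m)]
    for j1 in range(1, m + 1):
        for j2 in range(j1, m + 1):
            c = phi ** (j2 - j1) * (1 - phi ** (2 * j1)) / (1 - phi ** 2) * nu2
            S[j1 - 1][j2 - 1] = S[j2 - 1][j1 - 1] = c
    return S

def sad1_cov_recursive(m, phi, nu2):
    """Same covariance from y_1 = e_1, y_j = phi * y_{j-1} + e_j, var(e_j) = nu2 (assumed model)."""
    S = [[0.0] * m for _ in range(m)]
    S[0][0] = nu2
    for j in range(1, m):
        S[j][j] = phi ** 2 * S[j - 1][j - 1] + nu2      # var(y_j)
        for k in range(j):
            S[j][k] = S[k][j] = phi * S[j - 1][k]       # cov(y_j, y_k), k < j
    return S

A = sad1_cov_closed_form(6, 0.7, 1.5)
print([round(A[j][j], 4) for j in range(6)])            # variances grow with j
```

The increasing diagonal reproduces the non-stationary variances described above, which an AR(1) structure cannot capture.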
1.2.2 Parameter Estimation via the EM Algorithm
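The log-likelihood function corresponding to Eq. 1–3 can be written as

$$\log L(\Omega) = \sum_{i=1}^{n}\log\left[\sum_{k=1}^{J} p_{k|i}\, f_k(y_i \mid \Omega)\right] \qquad (1–9)$$

and the MLEs can be obtained by solving the likelihood (or normal) equations

$$\frac{\partial}{\partial\theta}\log L(\Omega) = 0, \qquad (1–10)$$

where $\theta \in \Omega$. For example, for the logistic mean and AR(1) covariance models, $\Omega_\mu = \{a_k, b_k, r_k : k = 1, \ldots, J\}$ and $\Omega_\Sigma = \{\sigma^2, \rho\}$, so that $\Omega = (\Omega_\mu, \Omega_\Sigma)$. The left side of Eq. 1–10 can be rewritten as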
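$$\begin{aligned}
\frac{\partial}{\partial\theta}\log L(\Omega) &= \sum_{i=1}^{n}\frac{\sum_{k=1}^{J} p_{k|i}\,\frac{\partial}{\partial\theta} f_k(y_i\mid\Omega)}{\sum_{h=1}^{J} p_{h|i}\, f_h(y_i\mid\Omega)}\\
&= \sum_{i=1}^{n}\sum_{k=1}^{J}\frac{p_{k|i}\, f_k(y_i\mid\Omega)}{\sum_{h=1}^{J} p_{h|i}\, f_h(y_i\mid\Omega)}\cdot\frac{\partial}{\partial\theta}\log f_k(y_i\mid\Omega)\\
&= \sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\,\frac{\partial}{\partial\theta}\log f_k(y_i\mid\Omega) \qquad (1–11)
\end{aligned}$$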
where

$$P_{k|i} = \frac{p_{k|i}\, f_k(y_i\mid\Omega)}{\sum_{h=1}^{J} p_{h|i}\, f_h(y_i\mid\Omega)} \qquad (1–12)$$

is interpreted as the posterior probability that progeny $i$ has QTL genotype $k$ (McLachlan and Peel, 2000; Ma et al., 2002). Let $P = \{P_{k|i} : k = 1, \ldots, J;\ i = 1, \ldots, n\}$. At the $(j+1)$th iteration, the Expectation-Maximization (EM) algorithm (Dempster et al., 1977) proceeds as follows:
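1. The current value of $\Omega$ is $\Omega^{(j)}$.
2. E-step. Update $P^{(j)}$ using $\Omega^{(j)}$:
$$P^{(j)}_{k|i} = \frac{p_{k|i}\, f_k(y_i\mid\Omega^{(j)})}{\sum_{h=1}^{J} p_{h|i}\, f_h(y_i\mid\Omega^{(j)})}. \qquad (1–13)$$
3. M-step. Solve
$$\sum_{i=1}^{n}\sum_{k=1}^{J} P^{(j)}_{k|i}\,\frac{\partial}{\partial\theta}\log f_k(y_i\mid\Omega) = 0 \qquad (1–14)$$
to get $\Omega^{(j+1)}$.
4. Repeat until some convergence criterion is met.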
The values at convergence are the MLEs of Ω.
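To make the E- and M-steps concrete, here is a minimal Python sketch for a deliberately simplified case that is not the functional mapping model itself: a single measurement per individual ($m = 1$), unstructured genotype means $\mu_k$ in place of a logistic curve, and a common variance $\sigma^2$. For this toy model, Eq. 1–14 has the closed-form solution $\mu_k = \sum_i P_{k|i}y_i / \sum_i P_{k|i}$ and $\sigma^2 = \sum_i\sum_k P_{k|i}(y_i - \mu_k)^2/n$. The function names, data, and starting values are hypothetical.

```python
import math

def normal_pdf(y, mu, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_mixture(y, prior, mu, var, tol=1e-10, max_iter=500):
    """EM for a univariate mixture with known weights p_{k|i} (toy version of 1-13, 1-14).

    y     : list of phenotypes, one per individual
    prior : prior[i][k] = p_{k|i}, fixed at the current search position
    mu    : starting genotype means; var : starting common variance
    Returns (mu, var, trace), where trace holds the log-likelihood at each iteration.
    """
    n, J = len(y), len(mu)
    mu = list(mu)
    trace = []
    for _ in range(max_iter):
        # E-step (Eq. 1-13): posterior probabilities P_{k|i} under the current values.
        P, loglik = [], 0.0
        for i in range(n):
            w = [prior[i][k] * normal_pdf(y[i], mu[k], var) for k in range(J)]
            total = sum(w)
            loglik += math.log(total)
            P.append([wk / total for wk in w])
        trace.append(loglik)
        if len(trace) > 1 and abs(trace[-1] - trace[-2]) < tol:
            break
        # M-step (Eq. 1-14): weighted means, then the pooled weighted variance.
        for k in range(J):
            weight = sum(P[i][k] for i in range(n))
            mu[k] = sum(P[i][k] * y[i] for i in range(n)) / weight
        var = sum(P[i][k] * (y[i] - mu[k]) ** 2 for i in range(n) for k in range(J)) / n
    return mu, var, trace

# Made-up data: the first five individuals are likely genotype 0, the rest genotype 1.
y = [0.1, -0.4, 0.3, 0.0, -0.2, 2.9, 3.3, 3.1, 2.7, 3.0]
prior = [[0.9, 0.1]] * 5 + [[0.1, 0.9]] * 5
mu_hat, var_hat, trace = em_mixture(y, prior, mu=[0.5, 2.0], var=1.0)
```

The recorded log-likelihood never decreases from one iteration to the next, which is the defining property of EM and a useful sanity check in any implementation.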
Appendix A derives Eqs. 1–13 and 1–14 from a missing-data argument. The same derivation also shows how the algorithm extends from maximum likelihood to maximum penalized likelihood (Section 2.3.2). For a more thorough treatment of the EM algorithm, the reader is referred to McLachlan and Peel (2000).
Although the EM algorithm provides efficient computation of the model parameters in functional mapping, other methods can also be used, such as Newton-Raphson or the Nelder-Mead simplex algorithm (Nelder and Mead, 1965; Zhao et al., 2004), a direct search procedure for nonlinear optimization that does not require derivatives. These methods are particularly useful when no closed-form formulas for the parameter estimates can be obtained.
1.2.3 Hypothesis Tests
After obtaining the MLEs, we can formulate the hypothesis that a QTL affects the genotype mean patterns as

$$\begin{aligned}
H_0 &: a_1 = \cdots = a_J,\ b_1 = \cdots = b_J,\ r_1 = \cdots = r_J\\
H_1 &: \text{at least one of the equalities above does not hold,}
\end{aligned}$$

where $H_0$ is the reduced (or null) model, under which a single logistic curve fits the phenotype data, and $H_1$ is the full (or alternative) model, under which more than one logistic curve is needed to fit the phenotype data because a QTL exists. Note that the likelihood function corresponding to the null model is simply
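$$L(\Omega) = \prod_{i=1}^{n} f(y_i\mid\Omega), \qquad (1–15)$$

where

$$f(y_i\mid\Omega) = (2\pi)^{-m/2}|\Sigma|^{-1/2}\exp\left\{-(y_i - g)'\Sigma^{-1}(y_i - g)/2\right\} \qquad (1–16)$$

and

$$g = [g(t)]_{m\times 1} = \left[\frac{a}{1 + b e^{-rt}}\right]_{m\times 1} \qquad (1–17)$$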
in the case of a logistic mean model. A number of other important hypotheses can be
tested, as outlined in Wu et al. (2004a).
The evidence for the existence of a QTL can be displayed graphically by plotting the log-likelihood ratio (LR) test statistic

$$LR = -2\log\left[\frac{L(\tilde\Omega)}{L(\hat\Omega)}\right] \qquad (1–18)$$

over the entire linkage map, where $\tilde\Omega$ and $\hat\Omega$ denote the MLEs under $H_0$ and $H_1$, respectively. The peak of the LR plot, hereafter referred to as maxLR, suggests a putative QTL because it corresponds to the position at which $H_1$ is most strongly favored over $H_0$. The distribution of LR is difficult to determine because of two major issues: the QTL position is unidentifiable under $H_0$, and a multiple testing problem arises because tests across the genome are not mutually independent (Wu et al., 2007c). However, the permutation test of Doerge and Churchill (1996), a nonparametric method, can be used to find an approximate null distribution and a significance threshold for the existence of a QTL. In a permutation test, the phenotype data are randomly permuted relative to the marker data many times, the functional mapping model is applied to each permuted data set, and a threshold is determined from the resulting set of maxLR values. The idea is to dissociate the markers and phenotypes so that repeated application of the model to permuted data produces an approximate empirical null distribution.
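The permutation procedure can be sketched in a few lines of Python (standard library only). To keep the example short and self-contained, the sketch replaces the functional mapping LR with a much simpler stand-in: a single-marker, single-measurement normal likelihood ratio $n\log(\text{RSS}_0/\text{RSS}_1)$ evaluated only at the markers. The permutation logic, which recomputes maxLR after shuffling phenotypes relative to markers, is the part being illustrated; all names and data are hypothetical.

```python
import math
import random

def single_marker_lr(genotypes, y):
    """Stand-in LR: n * log(RSS0 / RSS1) for a normal model with genotype-group means."""
    n = len(y)
    grand_mean = sum(y) / n
    rss0 = sum((v - grand_mean) ** 2 for v in y)
    groups = {}
    for g, v in zip(genotypes, y):
        groups.setdefault(g, []).append(v)
    rss1 = sum(sum((v - sum(vs) / len(vs)) ** 2 for v in vs) for vs in groups.values())
    return n * math.log(rss0 / rss1)

def max_lr(markers, y):
    """maxLR: the largest statistic over all tested positions (here, the markers)."""
    return max(single_marker_lr(g, y) for g in markers)

def permutation_threshold(markers, y, n_perm=200, alpha=0.05, seed=1):
    """Empirical (1 - alpha) quantile of maxLR over random permutations of y."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)            # breaks any marker-phenotype association
        stats.append(max_lr(markers, y_perm))
    stats.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return stats[index]

# Hypothetical data: 20 individuals, two markers; phenotypes shifted for marker-1 genotype 1.
marker1 = [0] * 10 + [1] * 10
marker2 = [0, 1] * 10
y = [0.1 * ((7 * i) % 5) + (2.0 if i >= 10 else 0.0) for i in range(20)]
observed = max_lr([marker1, marker2], y)
threshold = permutation_threshold([marker1, marker2], y)
print(observed, threshold, observed > threshold)
```

Taking the maximum over all tested positions within each permutation is what makes the threshold account for the multiple testing problem across the genome.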
Figure 1-4 shows a hypothetical plot of the log-likelihood ratio test statistic over the linkage map of a chromosome. The markers are located at 0, 32, 46, 58, 68, 82, 100 and 112 cM, and the QTL search was done at 2 cM intervals starting from the leftmost marker at 0 cM. The two horizontal lines are thresholds based on permutation tests. The LR plot crosses the solid red line, suggesting a significant QTL, but does not cross the broken green line.
1.3 Other QTL Mapping Models
As mentioned in the previous section, functional mapping is based on interval mapping. Unfortunately, interval mapping may be inadequate because only one marker interval is considered at a time, without regard to other QTLs that may exist outside the interval. Multiple QTL models (MQM; Jansen and Stam, 1994) and composite interval mapping (CIM; Zeng, 1994) were proposed to address this issue. To increase the power of interval mapping, both methods use a subset of marker loci outside the marker interval under consideration as covariates in a partial regression analysis, so that the effects of these markers stand in for the effects of other QTLs. The difficulty with these methods, however, is selecting the appropriate markers to include in the model.

Figure 1-4. Hypothetical plot of LR (y-axis) against location on the linkage map (x-axis, in cM). The map consists of markers spaced out at 0, 32, 46, 58, 68, 82, 100 and 112 cM along the chromosome, and the QTL search was done at 2 cM intervals. The location corresponding to the peak of the plot, maxLR, suggests the QTL position. A threshold that the plot crosses (red horizontal line) indicates a significant QTL; a threshold that it does not cross (green horizontal broken line) indicates that the QTL is not significant.
Multiple interval mapping (MIM; Kao et al., 1999) uses multiple marker intervals simultaneously to identify multiple QTLs. This model allows estimation of the main and epistatic (interaction) effects of all detected QTLs. However, this method raises the issues of model comparison and of searching through the space of possible models (Broman, 2001).
Other approaches include Bayesian methods (Satagopan et al., 1996; Sillanpaa and Arjas, 1999; Xu and Yi, 2000) and the use of a genetic algorithm (Carlborg et al., 2000; Broman, 2001).
1.4 Goals
Although other QTL mapping methods have merits, such as the ability to model multiple QTLs and their interactions, they are mostly limited to single-trait information or to measurements at one time point. When the trait being observed is affected by a QTL over a continuous course of time, these models fall short in capturing the true QTL dynamics. Many attempts to model this type of phenomenon are hindered by structural complexity and intensive computation. Fortunately, a novel approach called functional mapping (Ma et al., 2002; Section 1.2) provides a useful framework for genetic mapping through mean and covariance modeling of multivariate or longitudinal traits. Because it requires estimating only a small number of model parameters, it is computationally efficient and can be used on data with limited sample sizes. Functional mapping has shown potential as a powerful statistical method in QTL mapping.
Although parametric models such as AR(1) and SAD are suitable covariance structures for likelihood-based functional mapping, severe bias can be introduced into the estimation process if the true covariance structure differs substantially from the assumed one. Specifically, a biased covariance estimate can affect the estimates of the QTL location, the QTL effects (the estimated mean model), and even the value of maxLR, which is needed in permutation tests for significance. Thus, there is a need for a robust estimator that can provide more accurate and precise results. To this end, we propose a nonparametric approach. A nonparametric covariance estimator was proposed by Huang et al. (2006) for the null model (Section 1.2.3). These authors used a penalized likelihood procedure to solve a set of regression equations obtained from the modified Cholesky decomposition of the covariance matrix. Their estimator is regularized and guaranteed to be positive-definite. In this dissertation, we extend their model to the mixture likelihood framework of functional mapping.
The remaining chapters are organized as follows.
In Chapter 2, we describe the modified Cholesky decomposition approach to covariance estimation and Huang et al.'s nonparametric procedure. Because they are the main concepts behind that procedure, we also discuss ridge regression and LASSO techniques for solving a regression problem, as well as penalized likelihood. We then show the extension to the mixture likelihood case and apply our model to simulated and real longitudinal data.
In Chapter 3, we extend our proposed estimator to functional mapping of reaction norms with two environmental signals. We consider photosynthetic rate as the reaction norm, and irradiance and temperature as the two environmental signals. The previously proposed covariance model was parametric and separable. Simulation results show that when the underlying data structure is nonseparable, our nonparametric estimator is more reliable.
Chapter 4 concludes the dissertation.
CHAPTER 2
NONPARAMETRIC COVARIANCE ESTIMATION IN FUNCTIONAL MAPPING OF QTL
2.1 Introduction
Covariance estimation is an important aspect of multivariate methods such as principal component analysis, discriminant analysis, multivariate regression, and longitudinal data modeling (Wong et al., 2003; Bickel and Levina, 2008; Levina et al., 2008; Rothman et al., 2007). The most commonly used covariance estimator is the sample covariance matrix, which is unbiased and, when the sample size exceeds the dimension, positive-definite (Huang et al., 2006 and 2007b). However, with high-dimensional data such as gene arrays, fMRI, or spectroscopy, the sample covariance matrix is highly unstable, and it is singular when the dimension is at least the sample size. In longitudinal data modeling, covariance models with definite structures, such as compound symmetry and AR(1), are popular choices. Besides being parsimonious, these structures guarantee positive-definite estimators. However, unsuitably chosen or misused parametric models can result in severe estimation bias. Hence, there is a need for more robust estimators.
The main difficulties associated with covariance estimation are the number of parameters to be estimated, which grows quadratically with the dimension, and the positive-definiteness constraint. Pourahmadi (1999) showed that the latter can be handled by using the modified Cholesky decomposition. Several subsequent studies that followed Pourahmadi's suggestion proposed ways of regularization to obtain efficient covariance estimators (Wu and Pourahmadi, 2003; Huang et al., 2006 and 2007b; Levina et al., 2008). In this chapter, we adopt the penalized likelihood approach proposed by Huang et al. (2006) and extend it to the mixture likelihood framework of functional mapping. Such an extension is possible if the posterior probability representation of the derivative of the log-likelihood function, Eq. 1–11, is used. The result is a nonparametric covariance estimator in functional mapping. The term nonparametric, which usually refers to distribution-free methods, may not be exactly appropriate: as we shall see in this chapter, the estimator is really for unstructured covariances, and it is nonparametric in this sense. But because nonparametric is the term used in the literature (Huang et al., 2006 and 2007b), we adhere to it.
This chapter is organized as follows. In Section 2.2, we discuss the modified Cholesky decomposition and its regression interpretation, review methods proposed in the literature for regularized covariance estimation, and discuss ridge regression (Hoerl and Kennard, 1970), the least absolute shrinkage and selection operator or LASSO (Tibshirani, 1996), and penalized likelihood. In Section 2.3, we describe how Huang et al.'s approach can be extended to functional mapping. Section 2.4 is devoted to simulations and to an analysis of real data from an intercross progeny of mice using our proposed methodology. The last section (2.5) summarizes the chapter and provides some discussion.
2.2 Covariance Estimation
2.2.1 Modified Cholesky Decomposition and Regression Interpretation
Whereas the Cholesky decomposition expresses a symmetric positive-definite matrix in terms of a lower triangular matrix and its transpose, the modified Cholesky decomposition expresses it in terms of a unit lower triangular matrix and a diagonal matrix. That is, the symmetric positive-definite covariance matrix $\Sigma$ of a zero-mean random longitudinal vector $y = (y_1, \ldots, y_m)'$ can be uniquely diagonalized as

$$T\Sigma T' = D, \qquad (2–1)$$
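where

$$T = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0\\
-\phi_{21} & 1 & 0 & \cdots & 0\\
-\phi_{31} & -\phi_{32} & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
-\phi_{m1} & -\phi_{m2} & \cdots & -\phi_{m,m-1} & 1
\end{pmatrix}$$

and

$$D = \begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0\\
0 & \sigma_2^2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \sigma_m^2
\end{pmatrix}.$$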
Besides ensuring positive-definiteness, the modified Cholesky decomposition gives its components a meaningful statistical interpretation (Pourahmadi, 1999): the subdiagonal entries of $T$ are the negatives of the regression coefficients when each $y_t$ ($t = 2, \ldots, m$) is regressed on its predecessors $y_{t-1}, \ldots, y_1$, and the entries of $D$ are the corresponding prediction error variances. More precisely, $y_1 = \varepsilon_1$ and, for $t = 2, \ldots, m$,

$$y_t = \sum_{j=1}^{t-1}\phi_{tj}\, y_j + \varepsilon_t, \qquad (2–2)$$

where $-\phi_{tj}$ is the $(t,j)$th entry ($j < t$) of $T$ and $\sigma_t^2 = \operatorname{var}(\varepsilon_t)$ is the $t$th diagonal element of $D$. The $\phi_{tj}$ ($j = 1, \ldots, t-1$; $t = 2, \ldots, m$) and the $\sigma_t^2$ ($t = 1, \ldots, m$) are referred to as generalized autoregressive parameters (GARPs) and innovation variances (IVs), respectively. Note that Eq. 2–2 can be generalized to the nonzero-mean case, $E(y) = E[(y_1, \ldots, y_m)'] = (\mu_1, \ldots, \mu_m)' = \mu$, as

$$y_t - \mu_t = \sum_{j=1}^{t-1}\phi_{tj}(y_j - \mu_j) + \varepsilon_t. \qquad (2–3)$$

Furthermore, Eq. 2–2 can be written in matrix form as

$$Ty = \varepsilon, \qquad (2–4)$$

where $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_m)'$ and $\operatorname{cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$. Thus,

$$\operatorname{cov}(Ty) = T\operatorname{cov}(y)T' = T\Sigma T' = D = \operatorname{cov}(\varepsilon),$$

which yields Eq. 2–1.
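The regression interpretation gives a direct way to compute $T$ and $D$ from a known $\Sigma$: for each $t$, solve the normal equations of the population regression of $y_t$ on $y_1, \ldots, y_{t-1}$. The Python sketch below (standard library only; helper names hypothetical; 0-based indices) does this and can be checked against Eq. 2–1.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for p in range(n):
        piv = max(range(p, n), key=lambda r: abs(M[r][p]))
        M[p], M[piv] = M[piv], M[p]
        for r in range(p + 1, n):
            f = M[r][p] / M[p][p]
            for c in range(p, n + 1):
                M[r][c] -= f * M[p][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def modified_cholesky(S):
    """Return (T, D) with T S T' = diag(D), using the regressions of Eq. 2-2."""
    m = len(S)
    T = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    D = [S[0][0]]                                # sigma_1^2 = var(y_1)
    for t in range(1, m):
        A = [row[:t] for row in S[:t]]           # covariance of y_1, ..., y_{t-1}
        c = [S[j][t] for j in range(t)]          # cov(y_j, y_t) for j < t
        phi = solve(A, c)                        # GARPs phi_{t1}, ..., phi_{t,t-1}
        for j in range(t):
            T[t][j] = -phi[j]
        D.append(S[t][t] - sum(p * cj for p, cj in zip(phi, c)))  # innovation variance
    return T, D

m, rho, s2 = 4, 0.5, 2.0
S = [[s2 * rho ** abs(i - j) for j in range(m)] for i in range(m)]
T, D = modified_cholesky(S)
TST = matmul(matmul(T, S), transpose(T))         # should equal diag(D), Eq. 2-1
```

For an AR(1) matrix, the only nonzero GARPs are $\phi_{t,t-1} = \rho$, and the innovation variances are $\sigma^2$ followed by $\sigma^2(1-\rho^2)$; the sketch recovers these values up to rounding.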
2.2.2 Regularized Covariance Estimators
Ill-conditioned covariance estimators can cause overfitting of data by the model.
Thus, there is a need for regularization to prevent this kind of problem. Several
approaches that provide regularized covariance estimators have been proposed and we
shall explore some of them in this section.
Pourahmadi (1999) recognized that the GARPs and the logarithms of the IVs are unconstrained parameters and hence can be modeled in terms of covariates. His approach was to estimate the Cholesky components $T$ and $D$ instead of estimating $\Sigma$ directly. That is, one finds estimates $\hat T$ and $\hat D$ of $T$ and $D$, respectively, so that $\hat\Sigma = \hat T^{-1}\hat D(\hat T^{-1})'$ is a positive-definite estimator of $\Sigma$. He did this by suggesting a link function $g(\cdot)$ for $\Sigma$ defined by

$$g(\Sigma) = \log D + T + T' - 2I \qquad (2–5)$$

(Wu and Pourahmadi, 2003), where $\log D$ denotes the diagonal matrix obtained by taking the logarithm of each diagonal element of $D$, and $I$ is the identity matrix. This formulation is analogous to a link function for the mean in generalized linear model theory (McCullagh and Nelder, 1989).
Although the Cholesky components still have the same number of parameters
to estimate as any unstructured covariance matrix Σ, this number can be reduced
considerably by using covariates and modeling the entries of T and D either parametrically,
nonparametrically, or in a Bayesian way (Pourahmadi, 1999; Daniels and Pourahmadi,
2002; Wu and Pourahmadi, 2003). Pourahmadi (1999) illustrated the parametric approach
by using time lags as covariates for the entries of T and D in analyzing the cattle data
(Kenward, 1987). Wu and Pourahmadi (2003) and Huang et al. (2007b) each used a
nonparametric approach by capitalizing on the regression representation Eq. 2–2.
Intuitively, for longitudinal data, the regression coefficients of measurements far back in time are expected to be small. That is, the lag-$j$ regression coefficient $\phi_{t,t-j}$ is expected to be small for fixed $t$ and large $j$. This means that, within a given row of $T$, the entries are expected to decrease monotonically in magnitude as they move away from the main diagonal. Thus, a reasonable estimate of $T$ is a banded (or tapered) matrix $\hat T$ in which the first $m_0$ subdiagonals of $T$ are estimated or smoothed and the remaining subdiagonals are set to zero, with $m_0$ chosen by the Akaike Information Criterion (AIC; Akaike, 1974). Wu and Pourahmadi (2003) used local polynomial estimators to estimate the first $m_0$ subdiagonals of $T$, while Huang et al. (2007b) used B-splines. The latter authors claim that their method is more efficient than the former in terms of risk reduction, as shown in their simulations. Moreover, because their method employs maximum likelihood estimation, the EM algorithm can be used to handle missing data, and simultaneous modeling of the mean and covariance can be accommodated; neither of these is possible under Wu and Pourahmadi's approach. In the Bayesian paradigm, Furrer and Bengtsson (2006) considered a banded sample covariance matrix instead of $T$ (Bickel and Levina, 2008).
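A small Python sketch (standard library only; helper names hypothetical) of the banding step and of why the reconstruction stays positive-definite: whatever real values the retained GARPs take, and whatever real log-IVs are used to form $D$, the matrix $\hat T^{-1}\hat D(\hat T^{-1})'$ is symmetric and positive-definite. How the retained subdiagonals are estimated (local polynomials or B-splines) and how $m_0$ is chosen are not shown; the numerical values are arbitrary.

```python
import math

def band(T, m0):
    """Keep the main diagonal and the first m0 subdiagonals of T; zero all others."""
    m = len(T)
    return [[T[i][j] if 0 <= i - j <= m0 else 0.0 for j in range(m)] for i in range(m)]

def unit_lower_inverse(T):
    """Inverse of a unit lower-triangular matrix by forward substitution."""
    m = len(T)
    L = [[0.0] * m for _ in range(m)]
    for col in range(m):
        for i in range(m):
            e = 1.0 if i == col else 0.0
            L[i][col] = e - sum(T[i][k] * L[k][col] for k in range(i))
    return L

def sigma_from_components(T, D):
    """Sigma = T^{-1} D (T^{-1})'; positive-definite whenever every D[k] > 0."""
    m = len(T)
    L = unit_lower_inverse(T)
    return [[sum(L[i][k] * D[k] * L[j][k] for k in range(m)) for j in range(m)]
            for i in range(m)]

# Arbitrary unconstrained values: GARPs enter T as -phi, IVs enter on the log scale.
T_full = [[1.0, 0.0, 0.0, 0.0],
          [-0.8, 1.0, 0.0, 0.0],
          [-0.3, 2.5, 1.0, 0.0],
          [0.1, -0.4, -1.7, 1.0]]
log_iv = [0.0, -1.0, 0.5, -2.0]
D = [math.exp(v) for v in log_iv]
T_banded = band(T_full, m0=1)
Sigma_hat = sigma_from_components(T_banded, D)
```

Because positive-definiteness only requires positive diagonal entries of $D$, modeling the log-IVs removes the constraint entirely, which is the point of working with the Cholesky components.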
Although Huang et al.’s nonparametric approach of smoothing the first few
subdiagonals of the T matrix and setting the rest to zero produces a statistically efficient
estimator of Σ, it may not be adequate in cases when the diagonals are not smooth or
when there may be small but nonzero elements in $T$. An alternative approach is to use penalized likelihood, as proposed by Huang et al. (2006). By imposing roughness penalties on the negative log-likelihood function, this procedure essentially provides a regularized solution to the sequence of regression equations 2–2. The class of $L_p$ penalties with $p = 1, 2$ is considered under the general framework of penalized likelihood for regression models (Fan and Li, 2001). The $L_1$ and $L_2$ penalties shrink the GARPs in a similar fashion as LASSO and ridge regression, respectively. Moreover, the $L_1$ penalty can shrink some of the GARPs exactly to zero and thus provides a selection scheme for the regression coefficients. This is, however, different from banding $T$, where $\phi_{t,t-j}$ is set to zero for all $j > m_0$: with the $L_1$ penalty, the zeros can be irregularly placed in any given row of $T$. Levina et al. (2008) proposed a similar penalized likelihood procedure called adaptive banding. Their method uses a nested lasso penalty, which sets $\phi_{t,t-j}$ to zero for all $j > k$, where $k$ may
be different for each row of T . Adaptive banding is particularly useful in the case when
sparsity in the inverse covariance matrix is desirable such as in graphical models (Yuan
and Lin, 2007). Other approaches include using the L1 penalty directly on the covariance
matrix (Banerjee et al., 2006; Bickel and Levina, 2008), hierarchical priors to allow zeros
in T (Smith and Kohn, 2002; Rothman et al., 2007) or in the inverse covariance matrix
(Wong et al., 2003; Rothman et al., 2007), and Steinian shrinkage toward the identity
(Ledoit and Wolf, 2003; Bickel and Levina, 2008).
We adopt the L2 penalty approach of Huang et al. (2006) and propose an extension
of this method to covariance estimation in the mixture likelihood framework of functional
mapping. Such extension is possible by capitalizing on the posterior probability representation
of the mixture log-likelihood used in the implementation of the EM algorithm (Section
2.3.1). Estimation is then carried out by using the ECM algorithm (Meng and Rubin,
1993) with two CM-steps (Section 2.3.2).
2.2.3 Ridge Regression and LASSO
In this section, we discuss two techniques used in solving regression problems: ridge
regression and LASSO.
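Assume the linear regression model $y = X\beta + \varepsilon$, where $y = (y_1, y_2, \ldots, y_n)'$ is the vector of responses,

$$X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}$$

is the design matrix with $\operatorname{rank}(X) = p$, $\beta = (\beta_1, \ldots, \beta_p)'$ is the parameter vector, $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is the vector of independent and identically distributed (iid) random errors satisfying $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = \sigma^2 I$, and $I$ is the identity matrix. The ordinary least squares (OLS) estimate

$$\hat\beta = (X'X)^{-1}X'y \qquad (2–6)$$

minimizes the residual sum of squares

$$\varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) = \sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 \qquad (2–7)$$

and has the minimum variance among linear unbiased estimators of $\beta$; when $\varepsilon$ is normal, it is also the maximum likelihood estimator. A drawback of the OLS estimator 2–6 is that it is not unique when the design matrix $X$ is less than full rank, i.e., $\operatorname{rank}(X) < p$.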
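Assume $X'X$ is in correlation form and let its eigenvalues be given by

$$\lambda_{\max} = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p = \lambda_{\min} > 0. \qquad (2–8)$$

The expected squared distance of $\hat\beta$ from $\beta$ can be expressed in terms of the eigenvalues as

$$E[(\hat\beta - \beta)'(\hat\beta - \beta)] = \sigma^2\operatorname{Trace}(X'X)^{-1} = \sigma^2\sum_{i=1}^{p}\frac{1}{\lambda_i} \qquad (2–9)$$

and the variance, when $\varepsilon$ is normal, is given by

$$V[(\hat\beta - \beta)'(\hat\beta - \beta)] = 2\sigma^4\operatorname{Trace}(X'X)^{-2} = 2\sigma^4\sum_{i=1}^{p}\left(\frac{1}{\lambda_i}\right)^2 \qquad (2–10)$$

(Hoerl and Kennard, 1970; see Appendix B for a derivation). Eq. 2–9 can also be expressed as

$$E[\hat\beta'\hat\beta] = \beta'\beta + \sigma^2\sum_{i=1}^{p}\frac{1}{\lambda_i}. \qquad (2–11)$$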
Ideally, $X'X$ is nearly the identity matrix, so that the predictor variables are nearly orthogonal, i.e., weakly correlated. In the presence of multicollinearity, however, this is not the case, and $X'X$ has at least one small eigenvalue. Multicollinearity occurs when there are near-linear dependencies or high correlations among the regressors (predictor variables). Mathematically, this means that there exist constants $c_1, \ldots, c_p$, not all zero, such that

$$\sum_{j=1}^{p} c_j x_j \approx 0,$$

where $x_j$ denotes the $j$th column of $X$.
As can be seen from Eqs. 2–9 and 2–10, small eigenvalues can make the expected value and variance of the squared distance from $\hat\beta$ to $\beta$ large or, as shown by Eq. 2–11, make the estimated regression coefficients themselves too large in magnitude. Similarly, the variance of $\hat\beta$ can be inflated, since $\operatorname{var}(\hat\beta) = \sigma^2(X'X)^{-1}$. As a result, the mean squared error (MSE) is also inflated, and predictions based on the OLS estimator (2–6) are not very reliable.
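A two-predictor example makes the effect of small eigenvalues concrete. If $X'X$ is the $2\times 2$ correlation matrix with off-diagonal entry $r$, its eigenvalues are $1 + r$ and $1 - r$, so Eqs. 2–9 and 2–10 become explicit functions of $r$. The Python sketch below (standard library only; the function name is hypothetical; $\sigma^2 = 1$ by default) evaluates them.

```python
def ols_distance_moments(r, sigma2=1.0):
    """Mean and variance of (b - beta)'(b - beta), Eqs. 2-9 and 2-10, where b is the
    OLS estimate and X'X is the 2x2 correlation matrix [[1, r], [r, 1]] (|r| < 1)."""
    eigenvalues = (1.0 + r, 1.0 - r)
    mean = sigma2 * sum(1.0 / lam for lam in eigenvalues)
    var = 2.0 * sigma2 ** 2 * sum((1.0 / lam) ** 2 for lam in eigenvalues)
    return mean, var

for r in (0.0, 0.5, 0.9, 0.99):
    mean, var = ols_distance_moments(r)
    print(f"r = {r:4.2f}: E = {mean:9.3f}, Var = {var:12.3f}")
```

As $r$ approaches 1, the smaller eigenvalue $1 - r$ approaches 0 and both quantities grow without bound, even though the model and error variance are unchanged.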
One way of resolving multicollinearity is ridge regression. The idea of ridge regression is to make $X'X$ closer to the identity matrix, and hence better conditioned, by replacing it with $X'X + \kappa I$, where $\kappa$ is some positive number and $I$ is the identity matrix. The resulting estimator is

$$\hat\beta_r = (X'X + \kappa I)^{-1}X'y, \qquad (2–12)$$

which is essentially a version of $\hat\beta$ shrunk toward 0. The matrix $X'X + \kappa I$ has larger eigenvalues $\lambda_i + \kappa$, $i = 1, \ldots, p$, and therefore $\hat\beta_r$ has smaller prediction variances. Note, however, that there is a trade-off, because unlike $\hat\beta$, $\hat\beta_r$ is biased. Nevertheless, for a suitable choice of $\kappa$, $\hat\beta_r$ has a lower MSE and gives better overall prediction. Hoerl and Kennard (1970) provide ways of selecting $\kappa$.
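A minimal Python sketch of the ridge estimator 2–12 (standard library only; helper names hypothetical). For simplicity it works with raw rather than correlation-form columns. Setting $\kappa = 0$ recovers the OLS estimate 2–6, and for a nearly collinear design the length of the coefficient vector shrinks as $\kappa$ grows. The design matrix and responses are made-up numbers.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for p in range(n):
        piv = max(range(p, n), key=lambda r: abs(M[r][p]))
        M[p], M[piv] = M[piv], M[p]
        for r in range(p + 1, n):
            f = M[r][p] / M[p][p]
            for c in range(p, n + 1):
                M[r][c] -= f * M[p][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ridge(X, y, kappa):
    """Ridge estimate (X'X + kappa I)^{-1} X'y of Eq. 2-12; kappa = 0 gives OLS (2-6)."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) + (kappa if a == c else 0.0)
            for c in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)

# Made-up, nearly collinear design: the second column is almost equal to the first.
X = [[1.0, 1.01], [2.0, 1.98], [3.0, 3.02], [4.0, 3.99]]
y = [2.0, 4.1, 5.9, 8.0]
for kappa in (0.0, 0.1, 1.0, 10.0):
    beta = ridge(X, y, kappa)
    print(kappa, [round(b, 3) for b in beta], round(sum(b * b for b in beta) ** 0.5, 3))
```

With $\kappa = 0$ the two nearly identical columns tend to receive unstable, offsetting coefficients; adding $\kappa$ to every eigenvalue stabilizes the solution at the cost of some bias.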
LASSO, like ridge regression, reduces variance by sacrificing bias. LASSO also shrinks the regression coefficients towards 0 but goes further by allowing some of them to be exactly 0. This is desirable because the resulting model is parsimonious and easier to interpret: only the coefficients with strong effects are included in the model. The LASSO estimate of $\beta$, which we denote by $\hat\beta_l$, can be obtained by minimizing

$$\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 \quad\text{subject to}\quad \sum_{j=1}^{p}|\beta_j| \le t, \qquad (2–13)$$

where $t \ge 0$ is a tuning parameter. This is a quadratic programming problem with linear inequality constraints, and Tibshirani (1996) provides efficient and stable algorithms to solve it. Problem 2–13 is equivalent to minimizing the penalized residual sum of squares

$$\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p}|\beta_j| \qquad (2–14)$$

(Gill et al., 1981; Tibshirani, 1996), where $\lambda$ is a tuning parameter.
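Tibshirani (1996) solves 2–13 by quadratic programming; a simpler method now widely used for the penalized form 2–14 is cyclic coordinate descent with soft-thresholding, which is what the following Python sketch implements instead (standard library only; names hypothetical). Each update minimizes 2–14 exactly in $\beta_j$ with the other coefficients held fixed, giving $\beta_j = S(\rho_j, \lambda/2)/\sum_i x_{ij}^2$, where $\rho_j$ is the inner product of the $j$th column with the partial residual and $S(z, \gamma) = \operatorname{sign}(z)\max(|z| - \gamma, 0)$.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=1000, tol=1e-12):
    """Minimize sum_i (y_i - sum_j x_ij b_j)^2 + lam * sum_j |b_j| (Eq. 2-14)
    by cyclic coordinate descent (not Tibshirani's original QP algorithm)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    resid = list(y)                              # residuals y - X beta, with beta = 0
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # rho_j = x_j' (partial residual that excludes coordinate j)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n))
            new = soft_threshold(rho, lam / 2.0) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]   # orthogonal columns
y = [3.0, 0.2, 3.0, 0.2]
for lam in (0.0, 2.0, 100.0):
    print(lam, lasso_cd(X, y, lam))
```

With orthogonal columns each coefficient is a soft-thresholded OLS coefficient: here $\lambda = 2$ shrinks the first coefficient from 3 to 2.5 and sets the second exactly to 0, which is the selection property that ridge regression lacks.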
Ridge regression can also be expressed as a constrained optimization problem, namely the minimization of

$$\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 \quad\text{subject to}\quad \sum_{j=1}^{p}\beta_j^2 \le t, \qquad (2–15)$$

where $t \ge 0$ is a tuning parameter, or equivalently, the minimization of

$$\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2 \qquad (2–16)$$

(Tibshirani, 1996), where $\lambda$ is a tuning parameter. Therefore, 2–16 also leads to 2–12 with $\kappa = \lambda$.
2.2.4 Penalized Likelihood
Let $L(\theta)$ be a loss function of a parameter $\theta$. An estimation procedure usually involves minimizing $L(\theta)$ to obtain an estimator $\hat\theta$ that is said to have a "good" fit. However, there may be an aspect of $\theta$, say $p(\theta)$, that the modeler wants to control in addition to goodness of fit. This can be achieved by minimizing, instead of $L(\theta)$, the penalized likelihood, which has the generic form

$$L(\theta) + \lambda p(\theta), \qquad (2–17)$$

where $p(\theta)$ is called the penalty function and $\lambda > 0$ is a tuning parameter. The idea behind penalized likelihood is, in some sense, similar to the decomposition of the mean squared error

$$\text{mse} = \text{bias}^2 + \text{variance}, \qquad (2–18)$$

where a balance between bias and variance is often desired. In this context, the fit and $p(\theta)$ correspond to bias and variance, respectively. As an example, consider the model

$$y_i = f(x_i) + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (2–19)$$

where $y_i$ is the response in a regression on $x_i$, the $\varepsilon_i$ are iid $N(0, \sigma^2)$, and $f$ is a function to be estimated. Here, $\theta = f$ and $L(f) = \sum_{i=1}^{n}(y_i - f(x_i))^2$, the residual sum of squares, which equals the negative log-likelihood up to scaling and constants. A completely unbiased estimate of $f$ is a curve $\hat f$ that fits all the $y_i$ exactly. However, this curve has a high variance because of its rapid local variation. We say that $\hat f$ in this case is "rough", so if we want to control the roughness of $\hat f$, we can use a measure of it as the penalty $p(f)$ in 2–17. At the other extreme, an estimate of $f$ can be too smooth and severely biased. Figure 2-1 shows three estimates of $f$ with varying degrees of roughness. The "in between" curve seems to be the best estimate because it has a good balance between fit and roughness. The tuning parameter $\lambda$ controls the relative importance of these two.
A popular choice for assessing roughness is the integrated squared second derivative (ISSD)

$$p_{\text{ISSD}}(f) = \int [f''(x)]^2\,dx, \qquad (2–20)$$

which measures the curvature of a function (Ramsay and Silverman, 1997; Green, 1999). Highly variable functions have high values of $p_{\text{ISSD}}$. A linear function has $p_{\text{ISSD}} = 0$ and therefore has the least curvature.
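The bias-variance trade-off of Figure 2-1 can be reproduced with a discrete version of the roughness penalty. The Python sketch below (standard library only; names hypothetical) replaces $\int[f''(x)]^2dx$ by the sum of squared second differences of the fitted values on an equally spaced grid, a common discrete stand-in rather than the exact ISSD, and minimizes $\sum_i (y_i - f_i)^2 + \lambda\sum_i (f_{i+1} - 2f_i + f_{i-1})^2$ by solving $(I + \lambda\Delta'\Delta)f = y$, where $\Delta$ is the second-difference matrix. The data are made up.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for p in range(n):
        piv = max(range(p, n), key=lambda r: abs(M[r][p]))
        M[p], M[piv] = M[piv], M[p]
        for r in range(p + 1, n):
            f = M[r][p] / M[p][p]
            for c in range(p, n + 1):
                M[r][c] -= f * M[p][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def roughness(f):
    """Discrete analogue of the ISSD penalty (2-20): sum of squared second differences."""
    return sum((f[i + 1] - 2.0 * f[i] + f[i - 1]) ** 2 for i in range(1, len(f) - 1))

def penalized_smooth(y, lam):
    """Minimize sum (y_i - f_i)^2 + lam * roughness(f) via (I + lam * Delta'Delta) f = y."""
    n = len(y)
    Delta = [[0.0] * n for _ in range(n - 2)]    # (n - 2) x n second-difference matrix
    for r in range(n - 2):
        Delta[r][r], Delta[r][r + 1], Delta[r][r + 2] = 1.0, -2.0, 1.0
    A = [[(1.0 if a == c else 0.0) + lam * sum(Delta[r][a] * Delta[r][c] for r in range(n - 2))
          for c in range(n)] for a in range(n)]
    return solve(A, list(y))

y = [0.0, 1.2, 0.3, 1.5, 0.8, 2.1, 1.4, 2.6]    # made-up noisy observations
for lam in (0.0, 1.0, 100.0):
    fit = penalized_smooth(y, lam)
    print(lam, round(roughness(fit), 4), round(sum((a - b) ** 2 for a, b in zip(y, fit)), 4))
```

With $\lambda = 0$ the fit reproduces the data exactly (the rough extreme), while large $\lambda$ drives the fit toward a straight line, which has zero roughness (the smooth extreme).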
Another example of penalized likelihood is ridge regression, which we have seen in Section 2.2.3. In ridge regression, 2–16 has the same form as 2–17 with $\theta = \beta = (\beta_1, \ldots, \beta_p)'$. The aspect of $\beta$ being controlled is the size of its elements (the regression coefficients), quantified by $p_r(\beta) = \beta'\beta = \sum_{j=1}^{p}\beta_j^2$. LASSO also controls the size of the regression coefficients, but uses the penalty $p_l(\beta) = \sum_{j=1}^{p}|\beta_j|$ instead. Ridge regression and LASSO are special cases of a family of penalized regressions called bridge regression, which imposes a penalty function of the form $\sum_{j=1}^{p}|\beta_j|^\gamma$, $\gamma \ge 1$ (Fu, 1998).
Figure 2-1. Penalized likelihood in curve estimation (y plotted against x; curves labeled smooth, in between, and rough). Depending on the penalty, the estimated curve can be rough, smooth, or in between. The in between curve illustrates a balance between bias and variance.
Penalized likelihood is also used in model selection. Suppose model $M_i$ has parameter vector $\theta_i$ and likelihood function $L_i(\theta_i)$, $i = 1, 2$. Let $\Lambda = L_1(\hat\theta_1)/L_2(\hat\theta_2)$, where $\hat\theta_i = \arg\max L_i(\theta_i)$. Suppose model $M_2$ is nested within model $M_1$. Then, by the likelihood ratio test, the simpler model $M_2$ is rejected if $2\log\Lambda$ exceeds a certain percentile of the $\chi^2_{p_1 - p_2}$ distribution, where $p_i$ is the number of parameters in $M_i$. However, if the amount of data is large, the more complex model $M_1$ may be selected even when the simpler model $M_2$ is true (Lindley, 1957; Green, 1999). Various approaches that use penalized likelihood have been developed to alleviate this problem. One such approach is the Bayesian Information Criterion (BIC; Schwarz, 1978), which, for model $M_i$, is defined as

$$BIC_i = -2\log L_i(\hat\theta_i) + p_i\log n, \qquad (2–21)$$
where $-2\log L_i(\hat\theta_i)$ is the loss function, $p_i\log n$ is the penalty term, and $n$ is the sample size. The model with the lowest BIC is selected as the best model. Models $M_1$ and $M_2$ can be compared by

$$BIC_1 - BIC_2 = -2\log\Lambda + (p_1 - p_2)\log n, \qquad (2–22)$$

which is a penalized form of the likelihood ratio test statistic. Other approaches, such as Akaike's Information Criterion (AIC; Akaike, 1974),

$$-2\log\Lambda + 2(p_1 - p_2), \qquad (2–23)$$

and Mallows' $C_p$ (Mallows, 1973),

$$-2\log\Lambda + (p_1 - p_2), \qquad (2–24)$$

use different forms of the penalty. The main idea behind these criteria is to penalize more complex models, in favor of simpler ones, based on the number of parameters in each of them.
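A minimal Python sketch (standard library only; names and data hypothetical) comparing a one-mean model $M_2$ with a group-means model $M_1$ for normal data, using the maximized Gaussian log-likelihood $-\tfrac{n}{2}[\log(2\pi\,\text{RSS}/n) + 1]$. Each model also estimates one variance, so $p_1 = (\text{number of groups}) + 1$ and $p_2 = 2$. Negative differences favor the more complex model $M_1$.

```python
import math

def bic(loglik, n_params, n):
    """BIC = -2 log L + p log n, as in Eq. 2-21."""
    return -2.0 * loglik + n_params * math.log(n)

def aic(loglik, n_params):
    """AIC = -2 log L + 2 p."""
    return -2.0 * loglik + 2.0 * n_params

def gaussian_max_loglik(rss, n):
    """Maximized normal log-likelihood when sigma^2 is estimated by rss / n."""
    return -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)

def compare(y, groups):
    """Return (BIC1 - BIC2, AIC1 - AIC2) for M1: group means versus M2: one common mean."""
    n = len(y)
    mean = sum(y) / n
    rss2 = sum((v - mean) ** 2 for v in y)
    by_group = {}
    for g, v in zip(groups, y):
        by_group.setdefault(g, []).append(v)
    rss1 = sum(sum((v - sum(vs) / len(vs)) ** 2 for v in vs) for vs in by_group.values())
    p1, p2 = len(by_group) + 1, 2                # means plus one variance in each model
    l1, l2 = gaussian_max_loglik(rss1, n), gaussian_max_loglik(rss2, n)
    return bic(l1, p1, n) - bic(l2, p2, n), aic(l1, p1) - aic(l2, p2)

groups = [0, 0, 0, 1, 1, 1]
print(compare([0.1, 0.0, -0.1, 5.1, 5.0, 4.9], groups))   # clear group effect
print(compare([1.0, 2.0, 3.0, 1.0, 2.0, 3.0], groups))    # no group effect
```

When the group means coincide, $\log\Lambda = 0$ and the differences reduce to the pure penalties $(p_1 - p_2)\log n$ and $2(p_1 - p_2)$, so the simpler model is preferred, as Eqs. 2–22 and 2–23 predict.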
Recall that LASSO shrinks the regression coefficients towards zero and can even set some of them exactly to zero. Therefore, in addition to controlling the size of the regression coefficients through $p_l(\beta) = \sum_{j=1}^{p}|\beta_j|$, LASSO also performs model selection by producing simpler, more parsimonious regression models.
For more about penalized likelihood see Ramsay and Silverman (1997) and Green
(1999), and for asymptotic theory, Cox et al. (1990).
2.3 Covariance Estimation in Functional Mapping
2.3.1 Computing the Penalized Likelihood Estimates
Let the $k$th genotype density be written as
$$f_k(y_i) = (2\pi)^{-m/2}|\Sigma|^{-1/2}\exp\Big\{-\tfrac12(y_i - g_k)^T\Sigma^{-1}(y_i - g_k)\Big\} = (2\pi)^{-m/2}|\Sigma|^{-1/2}\exp\Big\{-\tfrac12(y_i^k)^T\Sigma^{-1}y_i^k\Big\} \tag{2–25}$$
where $y_i^k = y_i - g_k$, $k = 1,\dots,J$.
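Recall that the log-likelihood function is
$$\log L(\Omega) = \sum_{i=1}^n\log\Big[\sum_{k=1}^J p_{k|i}f_k(y_i|\Omega)\Big] \tag{2–26}$$
and that
$$\frac{\partial}{\partial\theta}\log L(\Omega) = \sum_{i=1}^n\sum_{k=1}^J P_{k|i}\frac{\partial}{\partial\theta}\log f_k(y_i|\Omega) \tag{2–27}$$
where
$$P_{k|i} = \frac{p_{k|i}f_k(y_i|\Omega)}{\sum_{h=1}^J p_{h|i}f_h(y_i|\Omega)} \tag{2–28}$$
and $\theta \in \Omega = (\Omega_\mu, \Omega_\Sigma)$.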
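The quantities 2–26 and 2–28 can be computed stably on the log scale. The sketch below is ours, not the dissertation's code: it assumes NumPy, a covariance Σ common to all genotypes as in 2–25, and marker-based genotype probabilities $p_{k|i}$ supplied as an n × J array with positive entries; the function names are hypothetical.

```python
import numpy as np


def mvn_logpdf(Y: np.ndarray, mean: np.ndarray, Sigma: np.ndarray) -> np.ndarray:
    """Row-wise log f_k(y_i) for the multivariate normal density of Eq. 2-25."""
    m = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)                 # Sigma = L L'
    R = np.linalg.solve(L, (Y - mean).T)          # L^{-1}(y_i - g_k), one column per i
    quad = np.sum(R ** 2, axis=0)                 # (y_i - g_k)' Sigma^{-1} (y_i - g_k)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))     # log|Sigma|
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + quad)


def posterior_probs(Y, means, Sigma, prior):
    """Posterior probabilities P_{k|i} (Eq. 2-28) and log L(Omega) (Eq. 2-26).

    Y: (n, m) phenotypes; means: (J, m) genotype mean vectors g_k;
    prior: (n, J) marker-based genotype probabilities p_{k|i} (assumed > 0).
    """
    logf = np.column_stack([mvn_logpdf(Y, g, Sigma) for g in means])   # (n, J)
    logw = np.log(prior) + logf                                        # log p_{k|i} f_k(y_i)
    mx = logw.max(axis=1, keepdims=True)                               # log-sum-exp trick
    log_mix = mx[:, 0] + np.log(np.exp(logw - mx).sum(axis=1))         # log sum_k p_{k|i} f_k
    P = np.exp(logw - log_mix[:, None])
    return P, float(log_mix.sum())


rng = np.random.default_rng(1)
Y = rng.normal(size=(5, 3))                                  # 5 hypothetical individuals
means = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]])
Sigma = np.eye(3) + 0.5
prior = np.full((5, 3), 1.0 / 3.0)
P, loglik = posterior_probs(Y, means, Sigma, prior)
print(np.round(P, 3), round(loglik, 3))
```

Working on the log scale avoids underflow when m is large and the densities are tiny.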
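It follows from $T\Sigma T' = D$ (Eq. 2–1) that $\Sigma^{-1} = T'D^{-1}T$ and, because $T$ is unit lower triangular, $|\Sigma| = |D|$. Therefore, if $\Omega_\mu$ is given,
$$
\begin{aligned}
\frac{\partial}{\partial\theta}\log L(\Omega_\Sigma)
&= \sum_{i=1}^n\sum_{k=1}^J P_{k|i}\frac{\partial}{\partial\theta}\Big(-\frac{m}{2}\log 2\pi - \frac12\log|\Sigma| - \frac12{y_i^k}'\Sigma^{-1}y_i^k\Big)\\
&= -\frac12\sum_{i=1}^n\sum_{k=1}^J P_{k|i}\frac{\partial}{\partial\theta}\Big(\log|\Sigma| + {y_i^k}'\Sigma^{-1}y_i^k\Big)\\
&= -\frac12\sum_{i=1}^n\sum_{k=1}^J P_{k|i}\frac{\partial}{\partial\theta}\Big(\sum_{t=1}^m\log\sigma_t^2 + \sum_{t=1}^m\frac{(\varepsilon_{it}^k)^2}{\sigma_t^2}\Big)
\end{aligned}
\tag{2–29}
$$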
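where $\varepsilon_{i1}^k = y_{i1}^k$ and $\varepsilon_{it}^k = y_{it}^k - \sum_{j=1}^{t-1}\phi_{tj}y_{ij}^k$ for $t = 2,\dots,m$. It is therefore implicitly assumed that $\sigma_t^2 = \operatorname{var}(\varepsilon_t^k)$ for $k = 1,\dots,J$. Note that if $\varepsilon^k = (\varepsilon_1^k,\dots,\varepsilon_m^k)'$ and $y^k = (y_1^k,\dots,y_m^k)'$, then $\varepsilon^k = Ty^k$, so that $\operatorname{var}(\varepsilon^k) = T\Sigma T' = D$. Consequently, since $\Sigma = T^{-1}D(T')^{-1}$, we have ${y^k}'\Sigma^{-1}y^k = {\varepsilon^k}'D^{-1}\varepsilon^k = \sum_{t=1}^m(\varepsilon_t^k)^2/\sigma_t^2$, which gives the last line of 2–29.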
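To make the identities behind 2–29 concrete, the following sketch computes T and D from a given Σ by regressing each $y_t$ on its predecessors, and then checks numerically that $T\Sigma T' = D$ and that $\log|\Sigma| + y'\Sigma^{-1}y = \sum_t\log\sigma_t^2 + \sum_t\varepsilon_t^2/\sigma_t^2$. The AR(1) matrix and the vector y are arbitrary illustrative inputs, and the function name is ours.

```python
import numpy as np


def modified_cholesky(Sigma: np.ndarray):
    """Return (T, D) with T unit lower triangular and T Sigma T' = D diagonal.

    Row t of T holds -phi_tj, where phi_t1, ..., phi_t,t-1 are the coefficients of
    the population regression of y_t on y_1, ..., y_{t-1}; D holds the innovation
    variances sigma_1^2, ..., sigma_m^2.
    """
    m = Sigma.shape[0]
    T = np.eye(m)
    d = np.empty(m)
    d[0] = Sigma[0, 0]
    for t in range(1, m):
        phi = np.linalg.solve(Sigma[:t, :t], Sigma[:t, t])   # regression coefficients
        T[t, :t] = -phi
        d[t] = Sigma[t, t] - Sigma[:t, t] @ phi               # innovation variance
    return T, np.diag(d)


idx = np.arange(5)
Sigma = 3.0 * 0.6 ** np.abs(idx[:, None] - idx[None, :])    # illustrative AR(1) matrix
T, D = modified_cholesky(Sigma)
y = np.array([1.0, -0.5, 2.0, 0.3, -1.2])                   # arbitrary residual vector y^k
eps = T @ y                                                  # innovations epsilon = T y
lhs = np.linalg.slogdet(Sigma)[1] + y @ np.linalg.solve(Sigma, y)
rhs = np.sum(np.log(np.diag(D))) + np.sum(eps ** 2 / np.diag(D))
print(np.isclose(lhs, rhs))
```

For an AR(1) matrix every row of T has a single nonzero off-diagonal entry, $-\rho$, and all innovation variances after the first equal $\sigma^2(1 - \rho^2)$.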
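Define the penalized negative log-likelihood as
$$-2\log L(\Omega_\Sigma) + \lambda p(\phi_{tj}). \tag{2–30}$$
Assuming the $L_2$ penalty $p(\phi_{tj}) = \sum_{t=2}^m\sum_{j=1}^{t-1}\phi_{tj}^2$, we have
$$-2\log L(\Omega_\Sigma) + \lambda p(\phi_{tj}) = -2\sum_{i=1}^n\log\Big[\sum_{k=1}^J p_{k|i}f_k(y_i|\Omega_\Sigma)\Big] + \lambda\sum_{t=2}^m\sum_{j=1}^{t-1}\phi_{tj}^2. \tag{2–31}$$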
Our problem would be immediately solved if 2–31 could be expressed in the same form as 2–16, with the $\phi_{tj}$'s corresponding to the $\beta_j$'s. However, the first term on the right side of 2–31 cannot be written explicitly in terms of the $\phi_{tj}$'s except in the derivative form 2–27. Thus, by taking the derivative of 2–31 and using 2–29, we get
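$$
\begin{aligned}
\frac{\partial}{\partial\theta}\big[-2\log L(\Omega_\Sigma) + \lambda p(\phi_{tj})\big]
&= -2\sum_{i=1}^n\sum_{k=1}^J P_{k|i}\frac{\partial}{\partial\theta}\log f_k(y_i|\Omega_\Sigma) + \lambda\frac{\partial}{\partial\theta}\sum_{t=2}^m\sum_{j=1}^{t-1}\phi_{tj}^2\\
&= \sum_{i=1}^n\sum_{k=1}^J P_{k|i}\frac{\partial}{\partial\theta}\Big(\sum_{t=1}^m\log\sigma_t^2 + \sum_{t=1}^m\frac{(\varepsilon_{it}^k)^2}{\sigma_t^2}\Big) + \lambda\frac{\partial}{\partial\theta}\sum_{t=2}^m\sum_{j=1}^{t-1}\phi_{tj}^2\\
&= \frac{\partial}{\partial\theta}\Big[\sum_{i=1}^n\sum_{k=1}^J P_{k|i}\Big(\sum_{t=1}^m\log\sigma_t^2 + \sum_{t=1}^m\frac{(\varepsilon_{it}^k)^2}{\sigma_t^2}\Big) + \lambda\sum_{t=2}^m\sum_{j=1}^{t-1}\phi_{tj}^2\Big]\\
&= \frac{\partial}{\partial\theta}\Big[\sum_{i=1}^n\sum_{k=1}^J P_{k|i}\Big(\log\sigma_1^2 + \frac{(\varepsilon_{i1}^k)^2}{\sigma_1^2}\Big)\Big] + \frac{\partial}{\partial\theta}\sum_{t=2}^m\Big[\sum_{i=1}^n\sum_{k=1}^J P_{k|i}\Big(\log\sigma_t^2 + \frac{(\varepsilon_{it}^k)^2}{\sigma_t^2}\Big) + \lambda\sum_{j=1}^{t-1}\phi_{tj}^2\Big].
\end{aligned}
$$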
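Notice that, without the derivative and with the posterior probabilities $P_{k|i}$ held fixed at their current values (which is what allows the derivative to be moved outside the sums), the third line in the preceding calculation has the same form as 2–16 when written in terms of $\phi_{tj}$, because $\varepsilon_{it}^k = y_{it}^k - \sum_{j=1}^{t-1}\phi_{tj}y_{ij}^k$ for $t = 2,\dots,m$. Moreover, the last line separates into a term involving only $\sigma_1^2$ and, for each $t = 2,\dots,m$, a term involving only $\sigma_t^2$ and $\phi_{t1},\dots,\phi_{t,t-1}$. Thus, we need to minimize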
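$$\sum_{i=1}^n\sum_{k=1}^J P_{k|i}\Big(\log\sigma_1^2 + \frac{(\varepsilon_{i1}^k)^2}{\sigma_1^2}\Big) \tag{2–32}$$
and
$$\sum_{i=1}^n\sum_{k=1}^J P_{k|i}\Big(\log\sigma_t^2 + \frac{(\varepsilon_{it}^k)^2}{\sigma_t^2}\Big) + \lambda\sum_{j=1}^{t-1}\phi_{tj}^2 \tag{2–33}$$
for each $t = 2,\dots,m$.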
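The minimizer of 2–32 can be obtained by solving
$$\frac{\partial}{\partial\sigma_1^2}\Big[\sum_{i=1}^n\sum_{k=1}^J P_{k|i}\Big(\log\sigma_1^2 + \frac{(\varepsilon_{i1}^k)^2}{\sigma_1^2}\Big)\Big] = \sum_{i=1}^n\sum_{k=1}^J P_{k|i}\Big(\frac{1}{\sigma_1^2} - \frac{(y_{i1}^k)^2}{\sigma_1^4}\Big) = 0,$$
which, because $\varepsilon_{i1}^k = y_{i1}^k$ and $\sum_{k=1}^J P_{k|i} = 1$ for every $i$, yields
$$\hat\sigma_1^2 = \frac{1}{n}\sum_{i=1}^n\sum_{k=1}^J P_{k|i}(y_{i1}^k)^2. \tag{2–34}$$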
For $t = 2,\dots,m$, 2–33 can be minimized by alternating minimization over $\sigma_t^2$ and $\phi_{tj}$, $j = 1, 2,\dots,t-1$ (see Appendix C). The solutions are
$$\hat\sigma_t^2 = \frac{1}{n}\sum_{i=1}^n\sum_{k=1}^J P_{k|i}\Big(y_{it}^k - \sum_{j=1}^{t-1}y_{ij}^k\phi_{tj}\Big)^2 \tag{2–35}$$
and
$$\hat\phi_{t(t)} = (H_t + \lambda I_t)^{-1}g_t \tag{2–36}$$
where $\phi_{t(t)} = (\phi_{t1},\phi_{t2},\dots,\phi_{t,t-1})'$, and $H_t$, $I_t$ and $g_t$ are given in Appendix C. Notice the similarity of 2–36 to 2–12, and that in formulas 2–34, 2–35 and 2–36 the posterior probabilities $P_{k|i}$ serve as weights for the genotype groups $k = 1,\dots,J$. A nonparametric covariance estimate $\hat\Sigma_{NP}$ can therefore be obtained as $\hat\Sigma_{NP} = \hat T^{-1}\hat D(\hat T^{-1})'$, where the diagonal elements of $\hat D$ are given by 2–34 and 2–35, and the below-diagonal elements of $\hat T$, namely $-\hat\phi_{tj}$, are given by 2–36.
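The covariance CM-step defined by 2–34 to 2–36 amounts to a sequence of weighted ridge regressions, one for each t, with the posterior probabilities as weights. The sketch below is a minimal illustration under stated assumptions: Appendix C is not reproduced here, so $H_t$ and $g_t$ are our own derivation from 2–33, namely $H_t = \sum_i\sum_k P_{k|i}\,y_{i(t)}^k{y_{i(t)}^k}'/\sigma_t^2$ and $g_t = \sum_i\sum_k P_{k|i}\,y_{i(t)}^k y_{it}^k/\sigma_t^2$ with the hypothetical notation $y_{i(t)}^k = (y_{i1}^k,\dots,y_{i,t-1}^k)'$, and their scaling may differ from the dissertation's by a constant absorbed into λ. The alternation starts from φ = 0, and the function name is ours.

```python
import numpy as np


def sigma_np(R, P, lam, n_iter=200, tol=1e-10):
    """Sketch of the L2-penalized nonparametric covariance estimate (Eqs. 2-34 to 2-36).

    R: (n, J, m) residuals y_i^k = y_i - g_k for every individual i and genotype k.
    P: (n, J) posterior probabilities P_{k|i} (each row sums to 1).
    lam: tuning parameter lambda.
    Returns (Sigma_hat, T_hat, d_hat) with Sigma_hat = T^{-1} D T^{-T}.
    """
    n, J, m = R.shape
    w = P.reshape(-1)                    # weight P_{k|i} for row (i, k)
    X = R.reshape(n * J, m)              # row (i, k) holds y_i^k
    T, d = np.eye(m), np.empty(m)
    d[0] = w @ X[:, 0] ** 2 / n          # Eq. 2-34
    for t in range(1, m):
        Z, y = X[:, :t], X[:, t]         # regress y_it^k on y_i1^k, ..., y_i,t-1^k
        Zw = Z * w[:, None]
        phi, s2 = np.zeros(t), w @ y ** 2 / n
        for _ in range(n_iter):          # alternate between phi_t(t) and sigma_t^2
            H, g = Zw.T @ Z / s2, Zw.T @ y / s2
            phi_new = np.linalg.solve(H + lam * np.eye(t), g)    # Eq. 2-36
            s2 = w @ (y - Z @ phi_new) ** 2 / n                  # Eq. 2-35
            converged = np.max(np.abs(phi_new - phi)) < tol
            phi = phi_new
            if converged:
                break
        T[t, :t], d[t] = -phi, s2        # row t of T holds -phi_tj
    T_inv = np.linalg.inv(T)
    return T_inv @ np.diag(d) @ T_inv.T, T, d


rng = np.random.default_rng(2)
n, J, m = 200, 3, 4
R = rng.normal(size=(n, J, m))                 # hypothetical residuals
P = rng.dirichlet(np.ones(J), size=n)          # hypothetical posterior probabilities
Sig, T, d = sigma_np(R, P, lam=0.5)
print(np.round(Sig, 3))
```

The estimate is positive definite by construction, since it is assembled from a unit lower triangular T and positive innovation variances. With λ = 0 and a single genotype group, the construction reproduces the divide-by-n sample second-moment matrix, which is a convenient sanity check.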
The preceding calculations were based on the $L_2$ penalty $p(\phi_{tj}) = \sum_{t=2}^m\sum_{j=1}^{t-1}\phi_{tj}^2$. If the $L_1$ penalty $p(\phi_{tj}) = \sum_{t=2}^m\sum_{j=1}^{t-1}|\phi_{tj}|$ is used instead, a closed-form solution like 2–36 cannot be obtained for the $\phi_{tj}$'s, and an iterative algorithm is needed. This is carried out by using an iterative local quadratic approximation of $\sum_{j=1}^{t-1}|\phi_{tj}|$ (Fan and Li, 2001; Ojelund et al., 2001). The reader is referred to Huang et al. (2006) for additional details.
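For the $L_1$ penalty, the local quadratic approximation replaces $|\phi_j|$ near a current value $\phi_j^{(0)} \ne 0$ by $|\phi_j^{(0)}| + (\phi_j^2 - \phi_j^{(0)2})/(2|\phi_j^{(0)}|)$, so each step becomes a ridge-type solve like 2–36 with a coefficient-specific penalty $\tfrac{\lambda}{2}\operatorname{diag}(1/|\phi_j^{(0)}|)$. The sketch below handles one row t with $\sigma_t^2$ held fixed. The zero threshold `eps`, the starting value and the function name are our own choices, and Huang et al. (2006) should be consulted for the details of their implementation.

```python
import numpy as np


def lqa_l1_row(Z, y, w, s2, lam, n_iter=200, eps=1e-6, tol=1e-10):
    """Minimize sum_i w_i (y_i - Z_i phi)^2 / s2 + lam * sum_j |phi_j| by LQA.

    Each step replaces |phi_j| by its quadratic approximation at the current
    iterate, which turns the L1 problem into a ridge-like linear solve.
    Coefficients whose magnitude drops below eps are fixed at zero.
    """
    H = (Z * w[:, None]).T @ Z / s2
    g = (Z * w[:, None]).T @ y / s2
    phi = np.linalg.solve(H + 1e-8 * np.eye(len(g)), g)    # start: (nearly) unpenalized fit
    for _ in range(n_iter):
        active = np.abs(phi) > eps
        new = np.zeros_like(phi)
        if active.any():
            Ha = H[np.ix_(active, active)]
            ridge = 0.5 * lam * np.diag(1.0 / np.abs(phi[active]))
            new[active] = np.linalg.solve(Ha + ridge, g[active])
        if np.max(np.abs(new - phi)) < tol:
            return new
        phi = new
    return phi


# With Z = I, w = 1 and s2 = 1 the exact answer is soft-thresholding at lam / 2.
Z = np.eye(3)
y = np.array([3.0, 0.2, -1.5])
phi_hat = lqa_l1_row(Z, y, w=np.ones(3), s2=1.0, lam=1.0)
print(phi_hat)
```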
2.3.2 From EM to ECM Algorithm
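The EM algorithm can also be used for penalized likelihood estimation (Green, 1990). In the E-step of the EM algorithm (Section 1.2.2), a penalty term $p(\Omega)$ on the model parameters can be introduced into the complete data log-likelihood (Eq. A–1) to obtain the penalized complete data log-likelihood
$$\log L_c^P(\Omega) = \sum_{i=1}^n\sum_{k=1}^J x_{ik}\big[\log p_{k|i} + \log f_k(y_i|\Omega)\big] - \lambda p(\Omega). \tag{2–37}$$
The penalty is subtracted because the log-likelihood is being maximized; up to a rescaling of $\lambda$, this is equivalent to minimizing a penalized negative log-likelihood such as 2–30. Taking the conditional expectation of 2–37 does not affect the penalty term, because the expectation is taken with respect to the missing variable $x$. Thus, at the E-step, we have
$$Q^P(\Omega|\Omega^{(j)}) = \sum_{i=1}^n\sum_{k=1}^J P^{(j)}_{k|i}\big[\log p_{k|i} + \log f_k(y_i|\Omega)\big] - \lambda p(\Omega). \tag{2–38}$$
The M-step therefore involves solving
$$\frac{\partial}{\partial\theta}Q^P(\Omega|\Omega^{(j)}) = \sum_{i=1}^n\sum_{k=1}^J P^{(j)}_{k|i}\frac{\partial}{\partial\theta}\log f_k(y_i|\Omega) - \lambda\frac{\partial}{\partial\theta}p(\Omega) = 0 \tag{2–39}$$
to get $\Omega^{(j+1)}$, where $\theta \in \Omega$.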
The formulas derived for $\Omega_\Sigma$ in the preceding section cannot be used directly in the EM algorithm because we assumed that $\Omega_\mu$ was given. We instead use a variant of the EM algorithm called the Expectation and Conditional Maximization (ECM) algorithm (Meng and Rubin, 1993), with the parameter set $\Omega$ partitioned into mean and covariance parameters, $\Omega_\mu$ and $\Omega_\Sigma$, respectively. The ECM algorithm differs from EM in that the M-step is replaced by conditional maximizations, each with respect to one partition of $\Omega$ while the other is held fixed. More precisely, for the $(j+1)$th iteration, the ECM algorithm proceeds as follows:
1. Initialize $\Omega^{(j)} = (\Omega_\mu^{(j)}, \Omega_\Sigma^{(j)})$.
2. E-Step. Update $P^{(j)}$ using $\Omega^{(j)}$.
3. CM-Steps.
• Conditional on $P^{(j)}$ and $\Omega_\mu^{(j)}$, solve for $\Omega_\Sigma^{(j+1)}$ using 2–34, 2–35 and 2–36 (Section 2.3.1).
• Conditional on $P^{(j)}$ and $\Omega_\Sigma^{(j+1)}$, solve for $\Omega_\mu^{(j+1)}$.
4. Repeat steps (2)–(3) until some convergence criterion is met.
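The order of the E-step and the two CM-steps can be illustrated on a deliberately simplified toy model: univariate phenotypes, J = 2 genotypes with known probabilities $p_{k|i}$, genotype means playing the role of $\Omega_\mu$ and a common variance playing the role of $\Omega_\Sigma$, so that both CM-steps have closed forms. This is not the functional mapping model; it only shows the structure of the iteration and that each iteration should not decrease the log-likelihood. All names and data are hypothetical.

```python
import numpy as np


def ecm_toy(y, prior, mu, s2, max_iter=1000, tol=1e-10):
    """ECM for a toy univariate mixture: y_i ~ sum_k p_{k|i} N(mu_k, s2).

    Omega_mu = genotype means mu (length J); Omega_Sigma = common variance s2.
    prior: (n, J) known genotype probabilities p_{k|i}.
    Returns the final (mu, s2) and the log-likelihood at each iterate.
    """
    n = len(y)
    trace = []
    for _ in range(max_iter):
        # E-step: posterior probabilities P_{k|i} at the current parameters (cf. Eq. 2-28)
        logw = np.log(prior) - 0.5 * (np.log(2 * np.pi * s2) + (y[:, None] - mu) ** 2 / s2)
        mx = logw.max(axis=1, keepdims=True)
        log_mix = mx[:, 0] + np.log(np.exp(logw - mx).sum(axis=1))
        P = np.exp(logw - log_mix[:, None])
        trace.append(log_mix.sum())
        if len(trace) > 1 and abs(trace[-1] - trace[-2]) < tol:
            break
        # CM-step 1: covariance parameter given P and the current means
        s2 = np.sum(P * (y[:, None] - mu) ** 2) / n
        # CM-step 2: mean parameters given P and the updated covariance parameter
        mu = (P * y[:, None]).sum(axis=0) / P.sum(axis=0)
    return mu, s2, np.array(trace)


rng = np.random.default_rng(3)
n = 2000
prior = rng.dirichlet([2.0, 2.0], size=n)               # hypothetical p_{k|i}
k = (rng.random(n) < prior[:, 1]).astype(int)           # genotype 1 with probability prior[:, 1]
y = np.array([0.0, 3.0])[k] + rng.normal(size=n)        # true means 0 and 3, variance 1
mu_hat, s2_hat, trace = ecm_toy(y, prior, mu=np.array([-1.0, 1.0]), s2=4.0)
print(np.round(mu_hat, 2), round(s2_hat, 2), len(trace))
```

In this toy model the mean update does not depend on the variance, so ECM coincides with EM; in functional mapping the mean CM-step depends on $\Omega_\Sigma$, which is why the conditional ordering matters.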
Unless a structure such as AR(1) is imposed on the covariance matrix, it is difficult to find closed-form CM-step solutions for the mean parameters in functional mapping. Hence, estimation in this case is carried out by using the Nelder-Mead simplex algorithm (Nelder and Mead, 1965; Zhao et al., 2004), which is readily available in popular software; see, for example, the built-in function fminsearch in Matlab or optim in R.
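With $\Omega_\Sigma$ and the $P_{k|i}$ held fixed, the mean CM-step for genotype k minimizes $\sum_i P_{k|i}(y_i - g_k)'\Sigma^{-1}(y_i - g_k)$. Because $\sum_i P_{k|i}(y_i - \bar y_k) = 0$ for the weighted mean $\bar y_k = \sum_i P_{k|i}y_i/\sum_i P_{k|i}$, this is equivalent to a generalized least-squares fit of the mean curve to $\bar y_k$. The sketch below performs that fit with SciPy's Nelder-Mead implementation (`scipy.optimize.minimize` with `method="Nelder-Mead"`), analogous to fminsearch or optim. It assumes the logistic form $g(t) = a/(1 + b e^{-rt})$ for Eq. 1–4, which is not reproduced in this section, and hypothetical measurement times 1, ..., 10.

```python
import numpy as np
from scipy.optimize import minimize


def logistic(t, a, b, r):
    """Assumed logistic growth curve g(t) = a / (1 + b exp(-r t))."""
    return a / (1.0 + b * np.exp(-r * t))


def cm_step_logistic(t, Y, w, Sigma, start):
    """Nelder-Mead CM-step for one genotype's (a, b, r), with Sigma and the
    weights w_i = P_{k|i} held fixed."""
    ybar = w @ Y / w.sum()                    # posterior-weighted mean curve
    Sigma_inv = np.linalg.inv(Sigma)

    def objective(theta):
        resid = ybar - logistic(t, *theta)
        return resid @ Sigma_inv @ resid      # GLS criterion (ybar - g)' Sigma^{-1} (ybar - g)

    res = minimize(objective, x0=np.asarray(start, dtype=float), method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-12, "maxiter": 10000})
    return res.x


t = np.arange(1, 11, dtype=float)                       # hypothetical times 1, ..., 10
idx = np.arange(10)
Sigma = 3.0 * 0.6 ** np.abs(idx[:, None] - idx[None, :])
rng = np.random.default_rng(4)
Y = np.tile(logistic(t, 30.0, 5.0, 0.5), (20, 1))      # noise-free curves, for checking only
w = rng.random(20)                                      # hypothetical posterior weights
theta_hat = cm_step_logistic(t, Y, w, Sigma, start=[28.0, 4.5, 0.45])
print(np.round(theta_hat, 3))
```

Reducing the data to the weighted mean curve keeps the cost of each objective evaluation independent of the sample size.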
In a backcross population design, Ma et al. (2002) provide closed-form iteration formulas for
$$\Omega^{(j+1)} = \{a_k^{(j+1)}, b_k^{(j+1)}, r_k^{(j+1)}, \sigma^{2(j+1)}, \rho^{(j+1)} \mid k = 1, 2\} \tag{2–40}$$
when the mean model is a logistic curve and the covariance structure is AR(1). Similar formulas can also be obtained when the mean model is a rational function and the covariance structure is AR(1) (Yap et al., 2007).
2.3.3 Selection of Tuning Parameter
The tuning parameter λ can be selected through either 5- or 10-fold cross-validation or generalized cross-validation (Huang et al., 2006; Fan and Li, 2001).
For a K-fold cross-validation, let Z denote the full data set. Z is randomly split into
K subsets of about the same size. Each subset, say Zs (s = 1, ..., K), is used to validate
the log-likelihood based on the parameters estimated using the data Z \ Zs. The value
of λ that maximizes the average of all cross-validated log-likelihoods is used to select an
estimate for Σ.
The cross-validated log-likelihood criterion is given by
$$C(\lambda) = \frac{1}{K}\sum_{s=1}^K\log L_s(\hat\Omega_{-s}) \tag{2–41}$$
where $\hat\Omega_{-s}$ is an estimate of $\Omega$ based on the data set $Z\setminus Z_s$, and $L_s$ is the likelihood based on $Z_s$. The selected value $\hat\lambda$ is the one that maximizes $C(\lambda)$.
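A minimal sketch of K-fold selection of λ by 2–41 follows. To stay self-contained it makes a simplifying assumption of our own: a single genotype group (J = 1, so every $P_{k|i} = 1$) with the mean estimated by the training-fold sample mean, in place of a full ECM fit on $Z\setminus Z_s$. The λ grid, the simulated data and the function names are hypothetical.

```python
import numpy as np


def ridge_cholesky_cov(X, lam, n_alt=50):
    """Single-group (J = 1) L2-penalized modified-Cholesky estimate from centered X (n, m)."""
    n, m = X.shape
    T, d = np.eye(m), np.empty(m)
    d[0] = X[:, 0] @ X[:, 0] / n
    for t in range(1, m):
        Z, y = X[:, :t], X[:, t]
        s2 = y @ y / n
        for _ in range(n_alt):                        # fixed number of alternations, for brevity
            phi = np.linalg.solve(Z.T @ Z / s2 + lam * np.eye(t), Z.T @ y / s2)
            s2 = np.sum((y - Z @ phi) ** 2) / n
        T[t, :t], d[t] = -phi, s2
    T_inv = np.linalg.inv(T)
    return T_inv @ np.diag(d) @ T_inv.T


def gaussian_loglik(X, Sigma):
    """Sum over rows of log N_m(x_i; 0, Sigma)."""
    m = Sigma.shape[0]
    logdet = np.linalg.slogdet(Sigma)[1]
    quad = np.einsum("ij,jk,ik->i", X, np.linalg.inv(Sigma), X)
    return float(np.sum(-0.5 * (m * np.log(2 * np.pi) + logdet + quad)))


def cv_select_lambda(X, lambdas, K=5, seed=0):
    """Return the lambda maximizing C(lambda) = (1/K) sum_s log L_s(Omega_hat_{-s}), Eq. 2-41."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), K)
    scores = []
    for lam in lambdas:
        total = 0.0
        for test in folds:
            train = np.setdiff1d(np.arange(len(X)), test)
            mu = X[train].mean(axis=0)                    # mean fitted on Z \ Z_s
            Sig = ridge_cholesky_cov(X[train] - mu, lam)  # covariance fitted on Z \ Z_s
            total += gaussian_loglik(X[test] - mu, Sig)   # validated on Z_s
        scores.append(total / K)
    return lambdas[int(np.argmax(scores))], np.array(scores)


rng = np.random.default_rng(5)
idx = np.arange(6)
Sigma_true = 3.0 * 0.6 ** np.abs(idx[:, None] - idx[None, :])
X = rng.multivariate_normal(np.zeros(6), Sigma_true, size=150)
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam, cv_scores = cv_select_lambda(X, grid)
print(best_lam, np.round(cv_scores, 2))
```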
Note that there are really two sets of tuning parameters in our setting: one under the null model and another under the alternative. However, because the log-likelihood under the null model is constant throughout a marker interval, we assume that the corresponding tuning parameter has been estimated accordingly, and in the succeeding sections we refer to the tuning parameters simply as those for the alternative model.
2.4 Numerical Results
2.4.1 Simulations
In this section, the performance of the nonparametric covariance estimator, ΣNP
(Section 2.3.1), is assessed and compared to an AR(1)-structured estimator, ΣAR(1)
(Section 1.2.1). We investigate data generated from both multivariate normal and
t-distributions. We begin with the former.
Assuming an F2 population (J = 3 genotype groups; Section 1.1.2) for QTL mapping, we simulated 6 equally spaced markers on a chromosome 100 cM long, with 1 QTL between the second and third markers, 12 cM from the second marker (or 32 cM from the leftmost marker on the chromosome). Each phenotype associated with the simulated QTL had m = 10 measurements and was sampled from a multivariate normal distribution, using logistic curves as genotype means, under 3 different covariance structures. The mean parameters were a1 = 30, a2 = 28.5, a3 = 27.5, b1 = b2 = b3 = 5, and r1 = r2 = r3 = 0.5, and the covariance structures were as follows:
• $\Sigma_1$ = AR(1) with $\sigma^2 = 3$, $\rho = 0.6$;
• $\Sigma_2 = \sigma^2[(1-\rho)I + \rho J]$, with $\sigma^2 = 3$, $\rho = 0.5$, where $J$ is a matrix of 1's and $I$ is the identity matrix (compound symmetry);
• an unstructured covariance matrix
$$\Sigma_3 = \begin{pmatrix}
0.72 & 0.39 & 0.45 & 0.48 & 0.50 & 0.53 & 0.60 & 0.64 & 0.68 & 0.68\\
0.39 & 1.06 & 1.61 & 1.60 & 1.50 & 1.48 & 1.55 & 1.47 & 1.35 & 1.29\\
0.45 & 1.61 & 3.29 & 3.29 & 3.17 & 3.09 & 3.19 & 3.04 & 2.78 & 2.53\\
0.48 & 1.60 & 3.29 & 3.98 & 4.07 & 4.01 & 4.17 & 4.18 & 4.00 & 3.69\\
0.50 & 1.50 & 3.17 & 4.07 & 4.70 & 4.68 & 4.66 & 4.78 & 4.70 & 4.36\\
0.53 & 1.48 & 3.09 & 4.07 & 4.68 & 5.56 & 6.23 & 6.87 & 7.11 & 6.92\\
0.60 & 1.55 & 3.19 & 4.17 & 4.66 & 6.23 & 8.59 & 10.16 & 10.80 & 10.70\\
0.64 & 1.47 & 3.04 & 4.18 & 4.78 & 6.87 & 10.16 & 12.74 & 13.80 & 13.80\\
0.68 & 1.35 & 2.78 & 4.00 & 4.70 & 7.11 & 10.80 & 13.80 & 15.33 & 15.35\\
0.68 & 1.29 & 2.53 & 3.69 & 4.36 & 6.92 & 10.70 & 13.80 & 15.35 & 15.77
\end{pmatrix}.$$
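The two structured covariances, and a draw of phenotypes for one genotype group, can be generated as below. This is a simplified sketch: it assumes measurement times 1, ..., 10 and the logistic form $a/(1 + b e^{-rt})$ for the mean, and it omits the simulation of marker and QTL genotypes; the function names are ours.

```python
import numpy as np


def ar1_cov(m, sigma2, rho):
    """AR(1): Sigma[s, t] = sigma2 * rho^|s - t|."""
    idx = np.arange(m)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])


def cs_cov(m, sigma2, rho):
    """Compound symmetry: sigma2 * [(1 - rho) I + rho J], with J a matrix of ones."""
    return sigma2 * ((1.0 - rho) * np.eye(m) + rho * np.ones((m, m)))


def logistic(t, a, b, r):
    """Assumed logistic mean curve a / (1 + b exp(-r t))."""
    return a / (1.0 + b * np.exp(-r * t))


m = 10
Sigma1 = ar1_cov(m, 3.0, 0.6)
Sigma2 = cs_cov(m, 3.0, 0.5)
t = np.arange(1, m + 1, dtype=float)                   # hypothetical measurement times
rng = np.random.default_rng(6)
g1 = logistic(t, 30.0, 5.0, 0.5)                       # genotype-1 mean (a1 = 30, b1 = 5, r1 = 0.5)
Y1 = rng.multivariate_normal(g1, Sigma1, size=100)     # 100 hypothetical genotype-1 phenotypes
print(Y1.shape, np.round(Y1.mean(axis=0), 1))
```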
Σ1 and Σ2 were considered previously by Huang et al. (2006), and Σ1 by Levina et al. (2008). Σ3 has increasing diagonal elements and decreasing long-term dependence, which is typical of longitudinal growth data; it is based on the sample covariance matrix of a real data set.
Functional mapping was applied to the simulated data, with n = 100 and 400
samples, using a logistic model for the mean, and ΣNP and ΣAR(1) for the covariance
matrix. The simulated chromosomes were searched at every 4 cM (i.e. 0, 4, 8, ..., 100) for
a total of 26 search points across 5 marker intervals. The estimated model parameters
at each point were used to construct an LR plot for the QTL linkage map. For $\hat\Sigma_{NP}$, the LR plot could be constructed from parameter estimates based on individual tuning parameters $\lambda_c$ ($c = 1,\dots,26$), each separately cross-validated. However, we focused only on the $\lambda$'s corresponding to the maximum LR in each marker interval. An initial LR plot was constructed using an arbitrary $\lambda_0$ ($\lambda_c = \lambda_0$ for all $c = 1,\dots,26$), and the maximum in each marker interval was located. At the point corresponding to each maximum, $\lambda = \lambda_d$ ($d = 1,\dots,5$) was selected using 5-fold cross-validation. The final model parameter estimates were based on the $\lambda_d$ that produced the maximum LR, or maxLR.
In Figure 2-2, the broken-line LR plot is the result of this procedure, while the solid one is based on individual $\lambda_c$'s that have each been separately cross-validated. For n = 400, these two plots are indistinguishable. This is because the cross-validated $\lambda$'s at the search points within a marker interval are not very different from one another, so using a single $\lambda$ for each marker interval (the one that produces the maximum LR) does not significantly alter the general shape of the LR plot. The two dotted-line plots were based on $\lambda_c$, for all $c = 1,\dots,26$, set to two different arbitrary values.
Figure 2-2. Log-likelihood ratio (LR) plots based on simulated data under three different covariance structures. The solid line plot is based on cross-validated (CV) tuning parameters at each search point (individual λ's). The broken line plot is based on cross-validated tuning parameters (max λ's) corresponding to the maximum LR in each marker interval. The dotted line plots are based on two different arbitrary tuning parameter values, each assumed at all search points. [Panels: LR versus chromosome position (0 to 100 cM) for Σ1, Σ2 and Σ3 at n = 100 and n = 400, with the maxLR marked in each panel.]
To evaluate an estimate $\hat\Sigma_l$ ($l = 1, 2, 3$) of the true covariance structure $\Sigma_l$, a number of criteria can be used. Among them are the matrix norm losses
$$L(\Sigma_l, \hat\Sigma_l) = \frac{\|\Sigma_l - \hat\Sigma_l\|}{\|\Sigma_l\|},$$
where $\|\cdot\|$ denotes the operator norm ($\|\Sigma\|_{op}^2$ is the largest eigenvalue of $\Sigma\Sigma'$), the Frobenius norm ($\|\Sigma\|_F^2 = \sum_{i=1}^m\sum_{j=1}^m|\sigma_{ij}|^2$), or the matrix 1-norm ($\|\Sigma\|_1 = \max_j\sum_i|\sigma_{ij}|$); the entropy loss
$$L_E(\Sigma_l, \hat\Sigma_l) = \operatorname{tr}(\Sigma_l^{-1}\hat\Sigma_l) - \log|\Sigma_l^{-1}\hat\Sigma_l| - m \tag{2–42}$$
and the quadratic loss
$$L_Q(\Sigma_l, \hat\Sigma_l) = \operatorname{tr}\big[(\Sigma_l^{-1}\hat\Sigma_l - I)^2\big] \tag{2–43}$$
where $I$ is the identity matrix. These losses are all nonnegative and equal zero when $\hat\Sigma_l = \Sigma_l$. There is no agreement as to which of these losses is appropriate for a particular situation, but any of them may be used and the results would be qualitatively the same (Levina et al., 2008). Here, we use $L_E$ and $L_Q$, which were also used by Wu and Pourahmadi (2003), Huang et al. (2006, 2007b), and Levina et al. (2008). The corresponding risk functions are defined by
$$r_E(\Sigma_l, \hat\Sigma_l) = E[L_E(\Sigma_l, \hat\Sigma_l)]$$
and
$$r_Q(\Sigma_l, \hat\Sigma_l) = E[L_Q(\Sigma_l, \hat\Sigma_l)].$$
An estimator $\hat\Sigma_i$ is deemed better than $\hat\Sigma_j$, for some $i \ne j$, if its risk is smaller, i.e. $r_E(\Sigma, \hat\Sigma_i) < r_E(\Sigma, \hat\Sigma_j)$ or $r_Q(\Sigma, \hat\Sigma_i) < r_Q(\Sigma, \hat\Sigma_j)$. The risk functions are estimated via Monte Carlo simulation.
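The two losses and their Monte Carlo risk estimates can be computed as follows. In this sketch the estimator is passed in as a function; the usage with the sample covariance matrix is only an illustration and is not one of the estimators compared in the tables. The function names are ours.

```python
import numpy as np


def entropy_loss(Sigma, Sigma_hat):
    """L_E = tr(Sigma^{-1} Sigma_hat) - log|Sigma^{-1} Sigma_hat| - m (Eq. 2-42)."""
    A = np.linalg.solve(Sigma, Sigma_hat)
    return float(np.trace(A) - np.linalg.slogdet(A)[1] - Sigma.shape[0])


def quadratic_loss(Sigma, Sigma_hat):
    """L_Q = tr[(Sigma^{-1} Sigma_hat - I)^2] (Eq. 2-43)."""
    B = np.linalg.solve(Sigma, Sigma_hat) - np.eye(Sigma.shape[0])
    return float(np.trace(B @ B))


def monte_carlo_risk(Sigma, estimator, n, reps=100, seed=0):
    """Estimate r_E and r_Q by averaging losses over reps simulated data sets,
    returning (mean, Monte Carlo standard error) for each."""
    rng = np.random.default_rng(seed)
    losses = np.empty((reps, 2))
    for b in range(reps):
        X = rng.multivariate_normal(np.zeros(len(Sigma)), Sigma, size=n)
        S_hat = estimator(X)
        losses[b] = entropy_loss(Sigma, S_hat), quadratic_loss(Sigma, S_hat)
    mean = losses.mean(axis=0)
    se = losses.std(axis=0, ddof=1) / np.sqrt(reps)
    return (mean[0], se[0]), (mean[1], se[1])


def sample_cov(X):
    """Illustrative estimator only: the divide-by-n sample covariance."""
    return np.cov(X, rowvar=False, bias=True)


idx = np.arange(3)
Sigma = 3.0 * 0.6 ** np.abs(idx[:, None] - idx[None, :])
(rE, seE), (rQ, seQ) = monte_carlo_risk(Sigma, sample_cov, n=100)
print(round(rE, 3), round(seE, 3), round(rQ, 3), round(seQ, 3))
```

For example, $\hat\Sigma = 2\Sigma$ gives $L_E = m(1 - \log 2)$ and $L_Q = m$, which is a quick way to check an implementation.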
One hundred simulation runs were carried out, and the averages over all runs of the estimated QTL location, the logistic mean parameter estimates, maxLR, and the entropy and quadratic losses were recorded, together with their Monte Carlo standard errors (SE). The results are shown in Tables 2-1 and 2-2. For Σ1, ΣAR(1) does well, as expected, but ΣNP also does a good job. Both provide better precision with increased sample size. The maxLR values are comparable: 38.52 and 112.03 from Table 2-1 versus 37.78 and 128.21 from Table 2-2, for n = 100 and 400, respectively.
For Σ2 and Σ3, ΣNP does better than ΣAR(1). ΣAR(1) shows high values for both averaged losses, which translate into substantially biased estimates of QTL location and poor mean parameter estimates, particularly for Σ3 in the second and third genotype groups. Increased sample size does not help and even makes the mean parameter estimates worse in the case of Σ3. The maxLR values for ΣNP and ΣAR(1) are very different in these cases. Because the averaged losses for ΣNP are much smaller, we expect the corresponding maxLR values to be closer to the true ones.
To assess the robustness of our proposed nonparametric estimator, we also simulated data from a multivariate t-distribution with ν = 5 degrees of freedom. That is, samples were taken from
$$Y = g_k + \frac{X}{\sqrt{Z/\nu}} \tag{2–44}$$
where $X \sim N(0, \Sigma)$ and $Z \sim \chi^2(\nu)$ are independent, and $g_k$ is the logistic mean for genotype $k = 1, 2, 3$. The results are presented in Tables 2-3 and 2-4. We excluded the column for maxLR because it is not appropriate in this scenario. The results show that, despite inflated average losses, ΣNP still outperforms ΣAR(1) under Σ2 and Σ3; under Σ1, for which the AR(1) structure is correct, ΣAR(1) has the smaller losses. Notice that the quadratic loss is severely inflated because of the heavy tails of the t-distribution. It may not be a reliable measure of performance, but we present the results here for illustration.
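Sampling from 2–44 takes only a few lines. In this sketch X and Z are drawn independently. For ν > 2 the covariance of Y is $\nu/(\nu-2)\,\Sigma$, i.e. $(5/3)\Sigma$ when ν = 5, which is plausibly one reason that losses measured against Σ are inflated. The mean vector, Σ and the function name are hypothetical.

```python
import numpy as np


def sample_mvt(mean, Sigma, nu, size, rng):
    """Draw Y = g + X / sqrt(Z / nu) with X ~ N(0, Sigma) and Z ~ chi^2(nu) independent (Eq. 2-44)."""
    X = rng.multivariate_normal(np.zeros(len(mean)), Sigma, size=size)
    Z = rng.chisquare(nu, size=size)
    return mean + X / np.sqrt(Z / nu)[:, None]


rng = np.random.default_rng(7)
g = np.array([10.0, 20.0])                       # hypothetical mean vector g_k
Sigma = np.array([[3.0, 1.8], [1.8, 3.0]])
Y = sample_mvt(g, Sigma, nu=5, size=200_000, rng=rng)
print(np.round(np.cov(Y, rowvar=False), 2))      # roughly (5/3) * Sigma
```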
Table 2-1. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses, and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n), based on 100 simulation replicates (ΣNP, normal data). Parameters a1, b1, r1 refer to QTL genotype 1; similarly for genotypes 2 and 3.

| Covariance | n | QTL location | a1 | b1 | r1 | a2 | b2 | r2 | a3 | b3 | r3 | maxLR | LE | LQ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Σ1 | 100 | 32.84 (0.99) | 30.11 (0.07) | 5.04 (0.04) | 0.50 (0.00) | 28.52 (0.05) | 4.97 (0.03) | 0.50 (0.00) | 27.47 (0.07) | 5.06 (0.04) | 0.50 (0.00) | 38.52 (1.27) | 0.53 (0.01) | 1.00 (0.02) |
| Σ1 | 400 | 31.52 (0.28) | 30.00 (0.03) | 4.99 (0.01) | 0.50 (0.00) | 28.49 (0.02) | 5.01 (0.01) | 0.50 (0.00) | 27.52 (0.03) | 4.97 (0.02) | 0.50 (0.00) | 112.03 (1.80) | 0.14 (0.00) | 0.28 (0.01) |
| Σ2 | 100 | 32.56 (0.76) | 30.07 (0.06) | 4.98 (0.03) | 0.50 (0.00) | 28.55 (0.04) | 4.99 (0.02) | 0.50 (0.00) | 27.38 (0.06) | 5.07 (0.04) | 0.51 (0.00) | 47.05 (1.32) | 0.44 (0.01) | 0.83 (0.02) |
| Σ2 | 400 | 31.68 (0.26) | 30.04 (0.02) | 4.97 (0.01) | 0.50 (0.00) | 28.48 (0.02) | 5.01 (0.01) | 0.50 (0.00) | 27.54 (0.02) | 4.98 (0.01) | 0.50 (0.00) | 145.83 (2.00) | 0.13 (0.00) | 0.26 (0.01) |
| Σ3 | 100 | 33.24 (2.22) | 30.07 (0.10) | 5.04 (0.03) | 0.50 (0.00) | 28.59 (0.06) | 5.01 (0.02) | 0.50 (0.00) | 27.66 (0.09) | 5.01 (0.02) | 0.50 (0.00) | 19.57 (0.59) | 0.56 (0.01) | 1.09 (0.02) |
| Σ3 | 400 | 32.32 (1.19) | 29.99 (0.04) | 5.00 (0.01) | 0.50 (0.00) | 28.50 (0.03) | 5.00 (0.01) | 0.50 (0.00) | 27.62 (0.05) | 5.01 (0.01) | 0.50 (0.00) | 38.90 (1.06) | 0.14 (0.00) | 0.29 (0.01) |
| True values | | 32 | 30 | 5 | 0.5 | 28.5 | 5 | 0.5 | 27.5 | 5 | 0.5 | | | |
Table 2-2. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses, and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n), based on 100 simulation replicates (ΣAR(1), normal data). Parameters a1, b1, r1 refer to QTL genotype 1; similarly for genotypes 2 and 3.

| Covariance | n | QTL location | a1 | b1 | r1 | a2 | b2 | r2 | a3 | b3 | r3 | maxLR | LE | LQ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Σ1 | 100 | 33.24 (0.77) | 29.99 (0.06) | 5.03 (0.04) | 0.50 (0.00) | 28.48 (0.05) | 4.99 (0.03) | 0.50 (0.00) | 27.57 (0.07) | 5.04 (0.05) | 0.50 (0.00) | 37.78 (1.09) | 0.02 (0.00) | 0.04 (0.00) |
| Σ1 | 400 | 31.80 (0.32) | 30.01 (0.03) | 4.97 (0.02) | 0.50 (0.00) | 28.50 (0.02) | 5.02 (0.01) | 0.50 (0.00) | 27.51 (0.03) | 4.98 (0.02) | 0.50 (0.00) | 128.21 (1.98) | 0.01 (0.00) | 0.01 (0.00) |
| Σ2 | 100 | 35.28 (1.57) | 30.36 (0.09) | 4.63 (0.05) | 0.48 (0.00) | 28.54 (0.07) | 5.04 (0.04) | 0.50 (0.00) | 27.12 (0.09) | 5.51 (0.07) | 0.52 (0.00) | 64.68 (2.53) | 2.15 (0.06) | 6.57 (0.38) |
| Σ2 | 400 | 31.96 (0.54) | 30.51 (0.04) | 4.62 (0.02) | 0.48 (0.00) | 28.42 (0.03) | 5.08 (0.02) | 0.50 (0.00) | 27.14 (0.04) | 5.35 (0.03) | 0.51 (0.00) | 193.84 (4.65) | 2.66 (0.04) | 9.94 (0.25) |
| Σ3 | 100 | 46.48 (2.74) | 30.39 (0.38) | 5.33 (0.09) | 0.51 (0.00) | 28.01 (0.35) | 4.99 (0.07) | 0.52 (0.00) | 27.85 (0.39) | 5.20 (0.09) | 0.51 (0.00) | 112.66 (2.83) | 9.64 (0.13) | 73.15 (2.06) |
| Σ3 | 400 | 43.64 (2.64) | 30.60 (0.30) | 5.28 (0.06) | 0.51 (0.00) | 27.64 (0.34) | 4.93 (0.07) | 0.52 (0.00) | 28.38 (0.33) | 5.34 (0.08) | 0.50 (0.00) | 288.87 (6.09) | 10.14 (0.07) | 80.36 (1.12) |
| True values | | 32 | 30 | 5 | 0.5 | 28.5 | 5 | 0.5 | 27.5 | 5 | 0.5 | | | |
Table 2-3. Averaged QTL position, mean curve parameters, entropy and quadratic losses, and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n), based on 100 simulation replicates (ΣNP, data from t-distribution). Parameters a1, b1, r1 refer to QTL genotype 1; similarly for genotypes 2 and 3.

| Covariance | n | QTL location | a1 | b1 | r1 | a2 | b2 | r2 | a3 | b3 | r3 | LE | LQ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Σ1 | 100 | 32.52 (1.34) | 30.07 (0.08) | 5.02 (0.04) | 0.50 (0.00) | 28.58 (0.06) | 5.02 (0.04) | 0.50 (0.00) | 27.53 (0.09) | 5.07 (0.06) | 0.50 (0.00) | 2.56 (0.12) | 10.51 (0.75) |
| Σ1 | 400 | 32.88 (0.49) | 30.03 (0.04) | 5.01 (0.02) | 0.50 (0.00) | 28.46 (0.03) | 5.00 (0.02) | 0.50 (0.00) | 27.59 (0.03) | 4.99 (0.02) | 0.50 (0.00) | 1.84 (0.06) | 6.24 (0.25) |
| Σ2 | 100 | 32.56 (1.08) | 30.15 (0.07) | 4.94 (0.03) | 0.50 (0.00) | 28.54 (0.05) | 5.02 (0.03) | 0.50 (0.00) | 27.47 (0.08) | 5.09 (0.04) | 0.50 (0.00) | 2.27 (0.11) | 8.81 (0.66) |
| Σ2 | 400 | 32.84 (0.03) | 30.06 (0.03) | 4.97 (0.02) | 0.50 (0.00) | 28.48 (0.03) | 5.01 (0.01) | 0.50 (0.00) | 27.53 (0.03) | 5.02 (0.02) | 0.50 (0.00) | 1.78 (0.05) | 5.86 (0.22) |
| Σ3 | 100 | 40.92 (2.76) | 29.95 (0.13) | 5.03 (0.03) | 0.50 (0.00) | 28.63 (0.09) | 5.00 (0.02) | 0.50 (0.00) | 27.78 (0.14) | 5.05 (0.04) | 0.50 (0.00) | 2.68 (0.14) | 11.82 (1.26) |
| Σ3 | 400 | 33.08 (1.37) | 29.95 (0.06) | 5.00 (0.01) | 0.50 (0.00) | 28.56 (0.04) | 5.02 (0.01) | 0.50 (0.00) | 27.51 (0.06) | 4.99 (0.02) | 0.50 (0.00) | 1.90 (0.06) | 6.55 (0.26) |
| True values | | 32 | 30 | 5 | 0.5 | 28.5 | 5 | 0.5 | 27.5 | 5 | 0.5 | | |
Table 2-4. Averaged QTL position, mean curve parameters, entropy and quadratic losses, and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n), based on 100 simulation replicates (ΣAR(1), data from t-distribution). Parameters a1, b1, r1 refer to QTL genotype 1; similarly for genotypes 2 and 3.

| Covariance | n | QTL location | a1 | b1 | r1 | a2 | b2 | r2 | a3 | b3 | r3 | LE | LQ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Σ1 | 100 | 34.00 (1.12) | 30.04 (0.08) | 5.01 (0.04) | 0.50 (0.00) | 28.61 (0.06) | 5.00 (0.03) | 0.50 (0.00) | 27.51 (0.08) | 5.06 (0.06) | 0.51 (0.00) | 1.65 (0.10) | 5.03 (0.39) |
| Σ1 | 400 | 33.04 (0.40) | 29.98 (0.03) | 4.99 (0.02) | 0.50 (0.00) | 28.48 (0.03) | 5.01 (0.02) | 0.50 (0.00) | 27.61 (0.03) | 4.98 (0.02) | 0.50 (0.00) | 1.61 (0.07) | 4.75 (0.28) |
| Σ2 | 100 | 38.92 (1.91) | 30.57 (0.13) | 4.62 (0.06) | 0.48 (0.00) | 28.48 (0.09) | 5.09 (0.05) | 0.50 (0.00) | 27.13 (0.14) | 5.58 (0.09) | 0.52 (0.00) | 6.24 (0.25) | 35.25 (2.50) |
| Σ2 | 400 | 32.16 (0.48) | 30.61 (0.05) | 4.55 (0.02) | 0.48 (0.00) | 28.35 (0.04) | 5.13 (0.02) | 0.51 (0.00) | 27.22 (0.05) | 5.30 (0.03) | 0.51 (0.00) | 7.35 (0.17) | 45.86 (1.75) |
| Σ3 | 100 | 49.12 (2.96) | 29.71 (0.50) | 5.23 (0.11) | 0.58 (0.06) | 28.80 (0.38) | 5.21 (0.08) | 0.51 (0.00) | 27.04 (0.49) | 5.37 (0.19) | 0.53 (0.01) | 22.04 (0.56) | 301.53 (14.94) |
| Σ3 | 400 | 42.64 (2.39) | 30.78 (0.38) | 5.38 (0.09) | 0.51 (0.00) | 28.21 (0.35) | 5.08 (0.08) | 0.52 (0.00) | 27.12 (0.36) | 5.05 (0.08) | 0.52 (0.00) | 24.45 (0.49) | 366.54 (15.65) |
| True values | | 32 | 30 | 5 | 0.5 | 28.5 | 5 | 0.5 | 27.5 | 5 | 0.5 | | |
2.4.2 Real Data Analysis
We study a real mice data set from an experiment by Vaughn et al. (1999). Briefly,
the data consists of an F2 (J = 3 genotype groups; Section 1.1.2) population of 259 male
and 243 female progeny with 96 markers in a total of 19 chromosomes. The mice were
measured for their body mass at 10 weekly intervals starting at age 7 days. Corrections
were made for the effects due to dam, litter size at birth, parity, and sex (Cheverud et al.,
1996; Kramer et al., 1998). A plot of the weight data is shown in Figure 1-3.
Functional mapping was first used to analyze this data in Zhao et al. (2004), who
investigated QTL × sex interaction. The authors used a logistic curve (Eq. 1–4) to model
the genotype means and employed the transform-both-sides (TBS; Section 1.2.1) technique
for variance stabilization in order to utilize an AR(1) structure. Their method identified 4 of the 19 chromosomes as each carrying significant QTLs, and they concluded that there were sex differences in body mass growth in mice. Zhao et al. (2005) applied an SAD covariance
structure in functional mapping and found 3 QTLs. Liu and Wu (2007) likewise analyzed
the same data using a Bayesian approach in functional mapping and detected only 3
significant QTLs.
Here, we applied our proposed nonparametric estimator, ΣNP, in a genome-wide scan for growth QTLs without regard to sex. We scanned the linkage map at intervals of 4 cM. Figure 2-3 shows the LR plots for all 19 chromosomes; they were obtained using λ's that were cross-validated at each search point. We conducted a permutation test (Doerge and Churchill, 1996; Section 1.2.3) to identify significant QTLs. For every permutation run, we calculated $\text{maxLR}_e$ for chromosome $e = 1,\dots,19$ using the same general procedure as in the simulations (Section 2.4.1). In this mice data set, however, some markers were either missing or not genotyped, and we used only the available markers (Table 2-5). Thus, every marker interval had a different set of available phenotype data. We believe this did not affect the results because of the large sample size of the available data; we examined chromosomes 6 and 7 and found this to be the case. Figure 2-4 shows LR plots based on tuning parameters cross-validated at each search point (solid line) and on using, at every search point in a marker interval, the tuning parameter corresponding to the maximum LR in that interval (broken line; our procedure). The dotted line plots were again based on arbitrary tuning parameters and are presented here to illustrate shape consistency. Each permutation run yielded the maximum of $\text{maxLR}_e$ over $e = 1,\dots,19$, i.e., the genome-wide maxLR. The two horizontal lines in Figure 2-3 correspond to the 95% (broken) and 99% (solid) thresholds based on 100 permutation test runs. There were 9 chromosomes with significant QTLs (1, 4, 6, 7, 9, 10, 11, 14 and 15) based on the 95% threshold, but only 7 under the 99% threshold (1, 4, 6, 7, 10, 11 and 15). The two chromosomes that did not reach the 99% threshold (9 and 14) barely reached the 95% threshold. For this mice data set, we recommend using the 99% threshold because there were only 100 permutation test runs. Zhao et al. (2004) identified QTLs on chromosomes 6, 7, 11 and 15, and Zhao et al. (2005) and Liu and Wu (2007) found QTLs on chromosomes 6, 7 and 10, all at the 95% threshold. Our findings confirm the results of these previous studies that used the functional mapping method, and we detected additional QTLs. Although our results differ from the others, the evidence is not sufficient to conclude that the additional QTLs detected by our model are nonexistent. In fact, using simple interval mapping, Vaughn et al. (1999) identified 17 QTLs, although most of them were only suggestive.
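The threshold computation in the permutation test described above can be sketched as below. The scan itself is not shown: `perm_maxlr` is assumed to hold, for each permutation run, the $\text{maxLR}_e$ of every chromosome obtained by rerunning the scan after shuffling phenotypes against genotypes. Both function names and the made-up permutation values are hypothetical.

```python
import numpy as np


def genome_wide_thresholds(perm_maxlr, levels=(0.95, 0.99)):
    """Empirical genome-wide LR thresholds from a permutation test.

    perm_maxlr: (n_perm, n_chrom) array; entry [b, e] is maxLR_e from permutation run b.
    Each run contributes its genome-wide maximum max_e maxLR_e, and the thresholds are
    the corresponding empirical quantiles of these maxima.
    """
    genome_max = perm_maxlr.max(axis=1)
    return {level: float(np.quantile(genome_max, level)) for level in levels}


def permute_phenotypes(Y, rng):
    """Shuffle whole phenotype curves (rows of Y) relative to the marker data, which
    removes any genotype-phenotype association while keeping each curve intact."""
    return Y[rng.permutation(len(Y))]


# Illustration with made-up permutation maxima for 100 runs and 19 chromosomes.
rng = np.random.default_rng(8)
perm_maxlr = rng.chisquare(df=3, size=(100, 19)) * 3.0
thresholds = genome_wide_thresholds(perm_maxlr)
print(thresholds)
```

With only 100 runs, the 99% threshold is essentially determined by the largest one or two genome-wide maxima, which is one reason to treat it cautiously.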
The estimated genotype mean curves for the detected QTLs are shown in Figure 2-5. The three genotypes at each QTL have different growth curves, indicating temporal genetic effects of the QTL on the growth of mouse body mass. Some QTLs, like those on chromosomes 6, 7 and 10, act in an additive manner because the heterozygote (Qq, broken curves) is intermediate between the two homozygotes (QQ, solid curves; qq, dotted curves). Others, such as the QTL on chromosome 11, act in a dominant manner, since the heterozygote is very close to one of the homozygotes.
Figure 2-3. LR plots (LR versus test position) from the genome-wide scan of all 19 chromosomes of the mice data. The genomic position corresponding to the peak of each curve is the maximum likelihood estimate of the QTL location, indicated by vertical broken lines. The ticks on the x-axis indicate the positions of markers on each chromosome. The map distances (in centi-Morgans) between markers were calculated using the Haldane mapping function. The thresholds for claiming the genome-wide existence of a QTL are shown by horizontal lines (99% and 95% cut-offs). [Scale bar: 20 cM.]
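The Haldane mapping function mentioned in the caption above converts a recombination fraction r into a map distance $d = -\tfrac12\ln(1 - 2r)$ Morgans, and back via $r = \tfrac12(1 - e^{-2d})$. A minimal sketch in centi-Morgans follows; the function names are ours.

```python
import math


def haldane_cM(r: float) -> float:
    """Haldane map distance in centi-Morgans for a recombination fraction 0 <= r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cM: float) -> float:
    """Inverse Haldane function: recombination fraction for a map distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))


print(round(haldane_r(20.0), 4), round(haldane_cM(haldane_r(12.0)), 6))
```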
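Table 2-5. Available markers and phenotype data of a linkage map in an F2 population of mice (data from Vaughn et al., 1999). Entries are the numbers of progeny with available data in each marker interval.

| Chromosome | Interval 1 | Interval 2 | Interval 3 | Interval 4 | Interval 5 | Interval 6 | Interval 7 | Interval 8 |
|---|---|---|---|---|---|---|---|---|
| 1 | 378 | 433 | 483 | 467 | 450 | 440 | 466 | |
| 2 | 414 | 404 | 453 | 465 | 430 | | | |
| 3 | 477 | 491 | 489 | 476 | 475 | | | |
| 4 | 461 | 475 | 481 | 481 | 491 | | | |
| 5 | 441 | 439 | 449 | 381 | 385 | | | |
| 6 | 467 | 483 | 485 | 481 | | | | |
| 7 | 407 | 424 | 459 | 452 | 378 | 372 | 428 | 415 |
| 8 | 395 | 453 | 472 | | | | | |
| 9 | 498 | 496 | 498 | | | | | |
| 10 | 401 | 406 | 481 | 490 | 497 | | | |
| 11 | 431 | 451 | 468 | 464 | 446 | | | |
| 12 | 497 | 489 | 483 | 488 | | | | |
| 13 | 450 | 443 | 466 | | | | | |
| 14 | 443 | 475 | 495 | | | | | |
| 15 | 491 | 494 | 468 | | | | | |
| 16 | 498 | | | | | | | |
| 17 | 371 | 394 | | | | | | |
| 18 | 487 | 479 | 420 | | | | | |
| 19 | 445 | 468 | 468 | | | | | |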
Figure 2-4. Log-likelihood ratio (LR) plots for chromosomes 6 and 7 of the mice data. The solid line plot is based on cross-validated (CV) tuning parameters at each search point (individual λ's). The broken line plot is based on cross-validated tuning parameters (max λ's) corresponding to the maximum LR in each marker interval. The dotted line plots are based on two different arbitrary tuning parameter values, each assumed at all search points. Slight differences between the solid and broken line plots may be due to different sample sizes among marker intervals (see Table 2-5).
Figure 2-5. Estimated mean growth curves (weight in g versus time in weeks) of genotypes 1, 2 and 3 at the QTLs detected on chromosomes 1, 4, 6, 7, 10, 11 and 15.
2.5 Summary and Discussion
Covariance estimation is an important aspect in modeling longitudinal data.
It is difficult, however, because of the large number of parameters to estimate and
the positive-definite constraint. Many longitudinal data models resort to structured covariances which, although positive-definite and computationally favorable because of their reduced number of parameters, can be highly biased if the assumed structure is wrong. However, Pourahmadi (1999,
2000) recognized that a positive-definite estimator can be found if modeling is done
through the components of the modified Cholesky decomposition of the covariance matrix
which converts the problem into modeling a set of regression equations. Huang et al.
(2006) employed LASSO and ridge regression techniques through L1 and L2 penalties,
respectively, in a normal penalized likelihood framework to obtain a regularized covariance
estimator. Using these penalties allows shrinkage in the elements of T , even setting some
of them to zero in the case of the L1 penalty.
In this chapter, we adopted Huang et al.'s L2 penalty approach in functional mapping. This penalty works best when the true T matrix has many small elements. Using the L1 penalty gives a better estimator when some of the elements of T are actually zero. However, we believe that the difference in results between the two penalties will not be significant unless the dimension is very large. Nonetheless, the L1 penalty can be easily incorporated into our scheme. We have shown how to integrate Huang et al.'s procedure into the mixture likelihood framework of functional mapping. The key was to utilize the posterior probability representation of the derivative of the log-likelihood, 2–27, and apply an L2 penalty to the negative log-likelihood. Estimation was then carried out using the ECM algorithm (Section 2.3.2) with two CM-steps, based on a partition of the mean and covariance parameters. When the true covariance structure was not AR(1) (Σ2 and Σ3), our simulations showed better accuracy and precision in the estimates of QTL location, genotype mean parameters, and maxLR values for ΣNP than for ΣAR(1); when it was AR(1) (Σ1), ΣNP performed nearly as well as ΣAR(1) in QTL location and mean parameter estimation. The maxLR values are important because the complete LR plot quantifies the evidence for the existence of a QTL, and LR values change noticeably when very different covariance structures are used. This, of course, assumes multivariate normal data. In our analysis of the mice data, although several chromosomes were found to have significant QTLs, chromosomes 6 and 7 (Figure 2-3) showed the strongest evidence for QTL existence. The LR plots are also used in permutation tests (Section 1.2.3) to find a significance threshold. More precise estimates of the covariance structure mean better estimates of the peak of the LR plot and therefore more reliable permutation test results.
With regard to the use of our proposed model, we suggest a preliminary analysis of the data that checks for variance and covariance stationarity. If both conditions are satisfied, then ΣAR(1) may be an appropriate model. If only the variance is nonstationary, that is, covariance stationarity is not an issue, then a TBS method (Section 1.2.1) coupled with ΣAR(1) is applicable. If neither form of stationarity holds, then an SAD (Section 1.2.1) or ΣNP model may be more useful. Although we did not assess the comparative performance of these two models, we think that SAD becomes more computationally intensive if the data exhibit long-term dependence, in which case ΣNP may be more appropriate. ΣNP should also be considered if other parametric structures are suspect.
CHAPTER 3
NONPARAMETRIC COVARIANCE ESTIMATION IN FUNCTIONAL MAPPING OF REACTION NORMS TO TWO ENVIRONMENTAL SIGNALS
3.1 Introduction
Phenotypic plasticity of a quantitative trait occurs when the trait's phenotype changes with a changing environment. Such environment-dependent changes, also called reaction norms, are ubiquitous in biology. For example, thermal reaction norms show how performance, such as caterpillar growth rate (Kingsolver et al., 2004) or growth rate and body size in ectotherms (Angilletta et al., 2004), varies continuously with temperature (Yap et al., 2007). Another example is the flowering time of Arabidopsis thaliana in response to changing light intensity (Stratton, 1998). However, reaction norms and their genetic basis are difficult to model because of the inherent complexity of the interplay among the many factors involved. An added difficulty is that they are "infinite-dimensional", requiring an infinite number of measurements to be completely described (Kirkpatrick and Heckman, 1989). Wu et al. (2007a) proposed a functional mapping-based model (Section 1.2) that addresses the latter difficulty by using a biologically relevant mathematical function to model reaction norms. The authors considered a parametric model of photosynthetic rate as a function of light irradiance and temperature and studied the genetic mechanism of this process. They showed, through extensive simulations, that in a backcross population with one or two QTLs, their method accurately and precisely estimated the QTL location(s) and the parameters of the mean model. However, they assumed the covariance matrix to be a Kronecker product of two AR(1) structures, each modeling a reaction norm due to one environmental factor. This type of covariance model is said to be separable. Although computationally attractive, such a model only captures separate reaction norm effects and fails to incorporate interactions. A more general approach is therefore needed.
64
In the spatio-temporal (or space-time) literature, both separable and nonseparable covariance structures are used to model random processes in the spatial and temporal domains. A covariance is nonseparable if, unlike a separable one, it cannot be expressed as a Kronecker product of a spatial and a temporal covariance matrix. The random processes may be the concentration of pollutants in the atmosphere, groundwater contaminants, wind speed, or even disposable household incomes. The covariance matters mainly because it characterizes the random process, which is needed for optimal prediction (kriging) of its unobserved portions. Unlike separable structures, nonseparable structures can model interactions between the spatial and temporal components of the process. It therefore seems natural to use nonseparable structures to model reaction norms to two environmental factors. More concretely, we treat the photosynthetic rate as a random process, and irradiance and temperature as its (one-dimensional) spatial and temporal domains, respectively.
In this chapter, we show through simulations that, in functional mapping of reaction norms to two environmental signals, (1) nonseparable structures can be used as covariance models and to generate data from processes that exhibit interactions; (2) the separable model proposed by Wu et al. (2007a), which we shall call ΣAR(1), may not be appropriate for such data; and (3) the nonparametric covariance estimator ΣNP developed in Chapter 2 is a more reliable covariance model than ΣAR(1). By (1), we mean that a nonseparable model can successfully analyze data generated by the same nonseparable model. With regard to (2), our results are surprising because, for some values of the process variance and some numbers of levels in the environmental signals, the estimated QTL location and mean model parameters are generally robust to a biased separable covariance estimate, ΣAR(1), of a nonseparable underlying structure. That is, if the covariance of data generated from a nonseparable structure is estimated by the separable model ΣAR(1), the estimate is biased, as expected, but the QTL location and mean model parameters are still accurately and precisely estimated. However, the estimated maxLR (Section 1.2.3) is not accurate because the true underlying covariance structure and its (biased) estimate ΣAR(1) produce different log-likelihood values. Recall that maxLR is important because it is used in permutation tests to assess the significance of QTL existence. When both the variance and the number of levels in the environmental signals are increased, however, the estimated QTL location is severely biased, while the mean parameters are only mildly affected. ΣNP provides consistently better results than ΣAR(1). Of course, if nonseparable covariance models themselves are used to analyze data that exhibit interactions, the results are expected to be much better still. In practice, however, the underlying structure of the data is unknown, and it is very difficult to identify an appropriate nonseparable model. Modelers often employ strategies that are ad hoc or specific to a problem, and no general guidelines are available for this type of problem. We will, however, use nonseparable covariance models to generate simulated data with interactions and use these data to compare ΣNP and ΣAR(1).
This chapter is organized as follows. In Section 3.2, we describe the functional mapping model proposed by Wu et al. (2007a) for reaction norms. In Section 3.3, we discuss separable and nonseparable models used in spatio-temporal modeling. In Section 3.4, we present a simulation study using some of the nonseparable structures introduced in Section 3.3, and we conclude with a summary and discussion in Section 3.5. Throughout this chapter, the terms covariance matrix, covariance structure, and covariance function are used interchangeably.
3.2 Functional Mapping of Reaction Norms to Multiple Environmental Signals
Wolf (2002) described a reaction norm as a surface landscape determined by genetic and environmental factors. The surface is obtained by plotting a phenotypic trait against environmental factors such as temperature, light intensity, or humidity, and it corresponds to a specific genetic effect such as an additive, dominant, or epistatic effect (Wu et al., 2007c). In three dimensions, at least, features of the surface such as “slope”, “curvature”, “peak”, “valley”, and “ridge” can be described mathematically, and these can help elucidate how the underlying factors affect the phenotype.
An example of a reaction norm that forms a surface landscape is photosynthesis (Wu et al., 2007a), the process by which plants and other organisms convert light energy into chemical energy. It is an important but complex process because it involves several factors, such as the age of a leaf (where photosynthesis takes place in most plants), the concentration of carbon dioxide in the environment, temperature, light irradiance, and the nutrients and water available in the soil. A mathematical expression for the rate of single-leaf photosynthesis, P, without photorespiration is
$$P = \frac{1}{2\theta}\left(\alpha I + P_m - \sqrt{(\alpha I + P_m)^2 - 4\theta\alpha I P_m}\right) \tag{3–1}$$
(Thornley and Johnson, 1990), where θ ∈ (0, 1) is a dimensionless parameter, α is the photochemical efficiency, I is the irradiance, and Pm is the asymptotic photosynthetic rate at saturating irradiance. Pm is a piecewise linear function of the temperature T:
$$P_m = \begin{cases} P_m(20)\,\dfrac{T - T^*}{20 - T^*}, & T \ge T^* \\ 0, & T < T^* \end{cases} \tag{3–2}$$
where Pm(20) is the value of Pm at the reference temperature of 20°C and T* is the temperature at which photosynthesis stops. T* is chosen from a range of temperatures, such as 5–25°C, to provide a good fit to observed data.
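As an illustration of Eqs. 3–1 and 3–2, the following minimal Python sketch (assuming NumPy) evaluates the photosynthetic rate on a grid. The function names photosynthetic_rate and max_photosynthetic_rate are hypothetical, T* = 5 follows the choice made in the next paragraph, and the grid uses the irradiance and temperature levels of the simulations in Section 3.4.

```python
import numpy as np

def max_photosynthetic_rate(T, pm20, t_star=5.0):
    """Eq. 3-2: Pm is linear in T above the cutoff T* and zero below it."""
    T = np.asarray(T, dtype=float)
    return np.where(T >= t_star, pm20 * (T - t_star) / (20.0 - t_star), 0.0)

def photosynthetic_rate(I, T, alpha, pm20, theta, t_star=5.0):
    """Eq. 3-1: non-rectangular hyperbola for single-leaf photosynthesis."""
    I = np.asarray(I, dtype=float)
    pm = max_photosynthetic_rate(T, pm20, t_star)
    s = alpha * I + pm
    disc = s**2 - 4.0 * theta * alpha * I * pm
    # disc >= 0 whenever theta <= 1; clip only guards against rounding error
    return (s - np.sqrt(np.clip(disc, 0.0, None))) / (2.0 * theta)

# Surface of Figure 3-1: irradiance along rows, temperature along columns
irradiance = np.array([0.0, 50.0, 100.0, 200.0, 300.0])
temperature = np.array([15.0, 20.0, 25.0, 30.0])
I_grid, T_grid = np.meshgrid(irradiance, temperature, indexing="ij")
P = photosynthetic_rate(I_grid, T_grid, alpha=0.02, pm20=1.0, theta=0.9)
print(P.round(3))
```

P is the smaller root of θP² − (αI + Pm)P + αIPm = 0, so it always lies between 0 and min(αI, Pm); this is why the surface saturates at high irradiance.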
Wu et al. (2007a) studied the reaction norm of photosynthetic rate, defined by Eqs. 3–1 and 3–2, as a function of irradiance (I) and temperature (T); that is, the authors considered P = P(I, T). Here, we assume that T* = 5°C, so that the reaction norm model parameters are (α, Pm(20), θ). The surface landscape describing the reaction norm P(I, T) with parameters (α, Pm(20), θ) = (0.02, 1, 0.9) is shown in Figure 3-1. As stated earlier, each reaction norm surface corresponds to a specific genetic effect. Thus, if a QTL is at work, its genotypes produce different surfaces, each defined by a distinct set of model parameters.
[Figure 3-1: surface plot of photosynthetic rate (P) against irradiance (I, 0 to 300) and temperature (T, 15 to 30)]
Figure 3-1. Reaction norm surface of photosynthetic rate as a function of irradiance and temperature. The model is based on Eqs. 3–1 and 3–2 with parameters (α, Pm(20), θ) = (0.02, 1, 0.9). Adapted from Wu et al. (2007a).
3.2.1 Likelihood
We consider only a backcross design (Section 1.1.2) with one QTL. Extensions to more complicated designs and to the two-QTL case, as in Wu et al. (2007a), are straightforward. Assume a backcross plant population of size n with a single QTL affecting the phenotypic trait of photosynthetic rate. The photosynthetic rate of each progeny i (= 1, ..., n) is measured at different irradiance (s = 1, ..., S) and temperature (t = 1, ..., T) levels. The indices s and t are chosen for consistency with the spatio-temporal covariance models used later. The phenotype measurements of progeny i can be written in vector form as
$$y_i = [\underbrace{y_i(1,1), \ldots, y_i(1,T)}_{\text{irradiance } 1}, \ldots, \underbrace{y_i(S,1), \ldots, y_i(S,T)}_{\text{irradiance } S}]'. \tag{3–3}$$
The progeny are genotyped for molecular markers to construct a genetic linkage map for locating the segregating QTL in the population. That is, the marker genotypes are observed and are used, along with the phenotype measurements, to infer the QTL (Section 1.1). Because we assume a backcross design, the QTL has two possible genotypes (as do the markers), which we index by k = 1, 2. The likelihood function based on the phenotype and marker data can be formulated as
$$L(\Omega) = \prod_{i=1}^{n}\left[\sum_{k=1}^{2} p_{k|i}\, f_k(y_i \mid \Omega)\right] \tag{3–4}$$
where pk|i is the conditional probability of QTL genotype k given the genotype of the flanking marker interval for progeny i (Section 1.1.4). We assume a multivariate normal density fk for the phenotype vector yi with genotype-specific mean
$$\mu_k = [\underbrace{\mu_k(1,1), \ldots, \mu_k(1,T)}_{\text{irradiance } 1}, \ldots, \underbrace{\mu_k(S,1), \ldots, \mu_k(S,T)}_{\text{irradiance } S}]' \tag{3–5}$$
and covariance matrix Σ = cov(yi).
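Eq. 3–4 is usually evaluated on the log scale. The sketch below is a minimal NumPy version under stated assumptions: the conditional probabilities pk|i are already computed from the marker data (Section 1.1.4) and passed in as an array, and the helper names mvn_logpdf and mixture_loglik are hypothetical. A log-sum-exp step keeps the two-component sum numerically stable.

```python
import numpy as np

def mvn_logpdf(Y, mean, Sigma):
    """Row-wise multivariate normal log density via a Cholesky factor."""
    Y = np.atleast_2d(Y)
    d = Y.shape[1]
    L = np.linalg.cholesky(Sigma)              # Sigma = L L'
    z = np.linalg.solve(L, (Y - mean).T)       # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z**2, axis=0))

def mixture_loglik(Y, probs, means, Sigma):
    """log L(Omega) from Eq. 3-4: sum_i log sum_k p_{k|i} f_k(y_i).

    Y: (n, d) phenotypes; probs: (n, K) conditional QTL-genotype
    probabilities; means: (K, d) genotype-specific mean vectors (Eq. 3-5).
    """
    log_f = np.column_stack([mvn_logpdf(Y, mu, Sigma) for mu in means])
    with np.errstate(divide="ignore"):         # log(0) = -inf is harmless here
        a = np.log(probs) + log_f
    m = a.max(axis=1, keepdims=True)           # log-sum-exp over genotypes
    return float(np.sum(m[:, 0] + np.log(np.exp(a - m).sum(axis=1))))
```

In practice the genotype-specific means would come from the mean model in Section 3.2.2 and Σ from whichever covariance model is being fitted.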
3.2.2 Mean and Covariance Models
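The mean vector for photosynthetic rate in 3–5 can be modeled using Eqs. 3–1 and 3–2 as
$$\mu_k(s,t) = \frac{1}{2\theta_k}\left(\alpha_k s + P_{mk}(t) - \sqrt{(\alpha_k s + P_{mk}(t))^2 - 4\theta_k \alpha_k s P_{mk}(t)}\right) \tag{3–6}$$
where
$$P_{mk}(t) = \begin{cases} P_{mk}(20)\,\dfrac{t - T^*}{20 - T^*}, & t \ge T^* \\ 0, & t < T^* \end{cases} \tag{3–7}$$
and k = 1, 2. In Eqs. 3–6 and 3–7, s and t stand for the irradiance and temperature values at the corresponding levels.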
Wu et al. (2007a) used a separable structure (Mitchell et al., 2005) for the ST × ST covariance matrix Σ:
$$\Sigma_{AR(1)} = \Sigma_1 \otimes \Sigma_2 \tag{3–8}$$
where Σ1 and Σ2 are the (S × S) and (T × T) covariance matrices among different irradiance and temperature levels, respectively, and ⊗ is the Kronecker product operator (see Appendix D). Note that Σ1 and Σ2 are unique only up to a scalar multiple, because cΣ1 ⊗ (1/c)Σ2 = Σ1 ⊗ Σ2 for any c > 0. Each of Σ1 and Σ2 is modeled using an AR(1) structure with a common error variance σ2 and correlation parameters ρ1 and ρ2, respectively:
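$$\Sigma_1 = \sigma^2 \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_1^{S-1} \\ \rho_1 & 1 & \cdots & \rho_1^{S-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_1^{S-1} & \rho_1^{S-2} & \cdots & 1 \end{pmatrix}, \qquad \Sigma_2 = \sigma^2 \begin{pmatrix} 1 & \rho_2 & \cdots & \rho_2^{T-1} \\ \rho_2 & 1 & \cdots & \rho_2^{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_2^{T-1} & \rho_2^{T-2} & \cdots & 1 \end{pmatrix} \tag{3–9}$$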
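A minimal NumPy sketch of Eqs. 3–8 and 3–9 follows; ar1 and sigma_ar1 are hypothetical helper names. Because np.kron places Σ1 on the outside, the rows of the result follow the ordering of Eq. 3–3 (irradiance outer, temperature varying fastest), and entry ((s, t), (s′, t′)) equals σ⁴ρ1^|s−s′| ρ2^|t−t′|.

```python
import numpy as np

def ar1(m, rho, sigma2):
    """m x m AR(1) covariance sigma2 * rho^|i-j| (Eq. 3-9)."""
    idx = np.arange(m)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def sigma_ar1(S, T, sigma2, rho1, rho2):
    """Separable ST x ST covariance of Eq. 3-8 in the ordering of Eq. 3-3."""
    return np.kron(ar1(S, rho1, sigma2), ar1(T, rho2, sigma2))

Sigma = sigma_ar1(S=5, T=4, sigma2=1.0, rho1=0.6, rho2=0.6)
print(Sigma.shape)  # (20, 20)
```

The last check in the test below illustrates the scale non-identifiability noted above: multiplying Σ1 by c and dividing Σ2 by c leaves the Kronecker product unchanged.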
Separable covariance structures, however, cannot model interactions between the reaction norms to temperature and irradiance. A more general model is therefore needed for this purpose.
Note that with 3–6, 3–7, 3–8, and 3–9, Ω = {α1, Pm1(20), θ1, α2, Pm2(20), θ2, σ2, ρ1, ρ2} in 3–4. These model parameters may be estimated using the ECM algorithm, but the closed-form solutions at the CM-steps could be very complicated. A more efficient alternative is the Nelder-Mead simplex algorithm (Section 2.3.2).
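To illustrate the derivative-free approach, the sketch below uses SciPy's scipy.optimize.minimize with method="Nelder-Mead" to estimate (σ2, ρ1, ρ2) of ΣAR(1). It is a simplification, not the full procedure: the mean is fixed at zero, so the six mean parameters of Ω are omitted (in the full model they would enter through the mixture likelihood 3–4). The log/tanh reparametrization, which keeps σ2 > 0 and |ρ| < 1, and the helper names are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def ar1(m, rho, sigma2):
    idx = np.arange(m)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def neg_loglik(params, Y, S, T):
    """Negative zero-mean MVN log-likelihood (constant dropped) under
    Sigma_AR(1) = AR1(S) kron AR1(T); params are unconstrained."""
    sigma2 = np.exp(params[0])             # sigma2 > 0
    rho1, rho2 = np.tanh(params[1:])       # rho in (-1, 1)
    Sigma = np.kron(ar1(S, rho1, sigma2), ar1(T, rho2, sigma2))
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, Y.T)
    n = Y.shape[0]
    # 0.5 * n * log|Sigma| + 0.5 * sum of squared whitened residuals
    return n * np.sum(np.log(np.diag(L))) + 0.5 * np.sum(z**2)

def fit_ar1_nelder_mead(Y, S, T):
    simplex = np.vstack([np.zeros(3), 0.5 * np.eye(3)])   # starting simplex
    res = minimize(neg_loglik, x0=np.zeros(3), args=(Y, S, T),
                   method="Nelder-Mead",
                   options={"initial_simplex": simplex, "xatol": 1e-6,
                            "fatol": 1e-6, "maxiter": 5000})
    return np.exp(res.x[0]), np.tanh(res.x[1]), np.tanh(res.x[2])

# Usage on simulated zero-mean data (true sigma2 = 1, rho1 = rho2 = 0.6)
rng = np.random.default_rng(1)
S, T = 5, 4
true_Sigma = np.kron(ar1(S, 0.6, 1.0), ar1(T, 0.6, 1.0))
Y = rng.multivariate_normal(np.zeros(S * T), true_Sigma, size=2000)
sigma2_hat, rho1_hat, rho2_hat = fit_ar1_nelder_mead(Y, S, T)
```

Nelder-Mead needs only function values, which is what makes it attractive when the CM-step updates have no convenient closed form.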
3.2.3 Hypothesis Tests
The features of the surface landscape are important because they can serve as a basis for formulating hypothesis tests. Let H0 and H1 denote the null and alternative hypotheses, respectively. Then the existence of a QTL that determines the reaction norm surfaces can be formulated as
H0 : α1 = α2, Pm1(20) = Pm2(20), θ1 = θ2
H1 : at least one of the equalities above does not hold.
That is, if the reaction norm surfaces of the two genotypes are distinct (in terms of their respective estimated parameters), then a QTL possibly exists. Of course, a slight difference in parameter estimates does not automatically mean that a QTL exists, but the significance of the results can be assessed by permutation tests based on the log-likelihood ratio between the null and alternative hypotheses (Section 1.2.3). A procedure described in Wu et al. (2004a) can be used to test the additive effects of a QTL. Other hypotheses can also be formulated and tested, such as the genetic control of the reaction norm to each environmental factor, interaction effects between environmental factors on the phenotype, and the marginal slope of the reaction norm with respect to each environmental factor or the gradient of the reaction norm itself. The reader is referred to Wu et al. (2007a) for more details.
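The permutation idea can be sketched generically: shuffle the phenotype vectors across individuals while keeping the marker data fixed, recompute the test statistic (for example, the genome-wide maxLR), and take an upper quantile of the resulting null distribution. In the sketch below, permutation_threshold is a hypothetical name, and the statistic is passed in as a callable because computing maxLR requires the full mapping model; the toy mean-difference statistic in the usage lines is only a stand-in.

```python
import numpy as np

def permutation_threshold(y, genotypes, stat, n_perm=1000, level=0.05, seed=0):
    """Empirical (1 - level) quantile of stat under the null of no QTL.

    y: (n, d) phenotypes; genotypes: marker data for the same n individuals;
    stat(y, genotypes) -> float, e.g. the maximum LR over the chromosome.
    """
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    null_stats = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(n)          # break the phenotype-marker link
        null_stats[b] = stat(y[perm], genotypes)
    return float(np.quantile(null_stats, 1.0 - level)), null_stats

# Toy usage: a stand-in statistic at a single marker
rng = np.random.default_rng(42)
g = rng.integers(0, 2, size=100)
y = rng.normal(size=(100, 3)) + 0.8 * g[:, None]   # marker-linked shift

def mean_diff(y, g):
    return float(np.abs(y[g == 1].mean() - y[g == 0].mean()))

thr, _ = permutation_threshold(y, g, mean_diff, n_perm=500)
print(mean_diff(y, g) > thr)
```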
3.3 Spatio-temporal Covariance Functions
3.3.1 Introduction
In this section, we investigate parametric nonseparable spatio-temporal covariance structures for functional mapping of photosynthetic rate as a reaction norm to the environmental factors irradiance and temperature. As stated earlier, the main idea is to treat irradiance as a one-dimensional spatial variable and temperature as a temporal variable. Nonparametric methods are also available, but they are limited to either the spatial case (Sampson and Guttorp, 1992; Li et al., 2007) or the time series case (Li et al., 2007) and do not address the joint spatio-temporal case. Schabenberger and Gotway (2005) noted that the statistical methods available for analyzing spatio-temporal processes are not yet as fully developed as those for spatial data or time series alone, mainly because joint spatio-temporal analysis is very difficult. One major difficulty is constructing a covariance function that is positive definite. Until recently, researchers often resorted to parametric separable models to circumvent this difficulty. Aside from their computational benefits, separable models allow conditional analyses of the process with respect to the spatial and temporal domains, which can then be combined into a joint spatio-temporal model. This strategy is helpful and is often used as an exploratory tool before fitting a nonseparable model. Unlike separable models, nonseparable models cannot be expressed as a Kronecker product of two matrices. They are more general, however (and usually more complicated, with many more parameters), because they can model interactions between the spatial and temporal processes, and some of them include separable models as special cases.
The construction of valid (positive-definite) nonseparable covariance models has advanced greatly in recent years. Schabenberger and Gotway (2005) describe four main approaches: (1) Gneiting's (2002) monotone function method, (2) Cressie and Huang's (1999) spectral method, (3) mixtures (Ma, 2007), and (4) Jones and Zhang's (1997) partial differential equation approach. Approaches (1) and (2) rely mainly on statistical principles, whereas (3) and (4) are mostly mathematical in nature. We discuss (1) and (2) in Section 3.3.4 and use examples derived from these approaches in the simulations (Section 3.4).
3.3.2 Basic Ideas, Notation, and Assumptions
A spatio-temporal random process can be represented by
$$Y(s, t), \quad (s, t) \in \mathbb{R}^d \times \mathbb{R}, \tag{3–10}$$
where d ∈ ℤ+ and observations are collected at N spatio-temporal coordinates (s1, t1), (s2, t2), ..., (sN, tN). The data are only a partial realization of the process because, for practical reasons, the process cannot be observed at every coordinate. Gneiting (2002) notes that, mathematically, the space-time domain ℝ^d × ℝ and the purely spatial domain ℝ^(d+1) are equivalent. This means that space-time covariance functions on ℝ^d × ℝ and spatial covariance functions on ℝ^(d+1) belong to the same class. However, the notation ℝ^d × ℝ is used to highlight the distinction between the two domains. In this study, we are concerned only with the case d = 1, so from here on we use ℝ instead of ℝ^d for the spatial domain. Aside from the examples mentioned in the introduction (Section 3.1), Y may also represent ozone levels, disease incidence, ocean current patterns, water temperatures, etc. In our study, Y represents photosynthetic rate.
If var(Y(s, t)) < ∞ for all (s, t) ∈ ℝ × ℝ, then the mean E[Y(s, t)] and the covariance cov(Y(s, t), Y(s + u, t + v)), where u and v are the spatial and temporal lags, respectively, both exist. We assume that the covariance is stationary in space and time, so that for some function C,
$$\operatorname{cov}(Y(s,t), Y(s+u, t+v)) = C(u, v). \tag{3–11}$$
That is, the covariance function C depends only on the lags and not on the values of the coordinates themselves. Stationarity is often assumed to allow estimation of the covariance function from the data (Cressie and Huang, 1999). We also assume that the covariance function is isotropic, meaning that it depends only on the absolute lags and not on the direction or orientation of the coordinates relative to each other:
$$\operatorname{cov}(Y(s,t), Y(s+u, t+v)) = C(|u|, |v|). \tag{3–12}$$
Stationary and isotropic covariance functions are said to be translation-invariant and rotation-invariant (about the origin), respectively (Waller and Gotway, 2004). Note that C(u, 0) and C(0, v) correspond to purely spatial and purely temporal covariance functions, respectively.
To be a valid covariance function, C must be positive definite. That is, for any positive integer k, any (s1, t1), ..., (sk, tk) ∈ ℝ × ℝ, and any real coefficients a1, ..., ak,
$$\sum_{i=1}^{k}\sum_{j=1}^{k} a_i a_j\, C(s_i - s_j, t_i - t_j) \ge 0. \tag{3–13}$$
Strictly speaking, Eq. 3–13 defines a nonnegative-definite function. However, this is how positive definiteness is defined in the literature, and we adhere to this convention.
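Condition 3–13 says that every matrix [C(si − sj, ti − tj)] must have nonnegative eigenvalues. The sketch below checks this numerically at a finite set of coordinates. Such a check can only refute validity, never prove it, since 3–13 must hold for every finite set of points. The helper name is_valid_on and the two test functions are illustrative choices: exp(−|u| − |v|) is a product of two valid exponential covariances, whereas 1 − (u² + v²) is not a covariance function.

```python
import numpy as np

def is_valid_on(C, coords, tol=1e-10):
    """Check Eq. 3-13 at the given (s, t) coordinates via the eigenvalues
    of the matrix [C(s_i - s_j, t_i - t_j)]."""
    s, t = coords[:, 0], coords[:, 1]
    K = C(s[:, None] - s[None, :], t[:, None] - t[None, :])
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

coords = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [2.0, 0.0]])
print(is_valid_on(lambda u, v: np.exp(-np.abs(u) - np.abs(v)), coords))  # True
print(is_valid_on(lambda u, v: 1.0 - (u**2 + v**2), coords))             # False
```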
In spatio-temporal analysis, the ultimate goal is usually optimal prediction (kriging) of an unobserved part of the random process using an appropriate covariance function model. In this study, we instead use a nonseparable covariance to calculate the mixture likelihood associated with functional mapping.
3.3.3 Separable Covariance Structures
A covariance function C(u, v|θ) of a spatio-temporal process is separable if it can be expressed as
$$C(u, v \mid \theta) = C_1(u \mid \theta_1)\, C_2(v \mid \theta_2), \tag{3–14}$$
where C1(u|θ1) and C2(v|θ2) are purely spatial and purely temporal covariance functions, respectively, and θ = (θ1, θ2)′. This representation implies that the observed joint process has the same covariance as the product of two independent spatial and temporal processes. A formulation in terms of the joint process is
$$C(u, v \mid \theta) = \frac{C(u, 0 \mid \theta)\, C(0, v \mid \theta)}{\sigma^2}, \tag{3–15}$$
where σ2 = C(0, 0) is the variance of the process.
Representation 3–14 gives separable models a practical advantage: models for C(u, v|θ) can be easily constructed by selecting suitable, readily available choices for each of C1(u|θ1) and C2(v|θ2). Because the product of a valid spatial and a valid temporal covariance function is itself valid, C(u, v|θ) is then guaranteed to be positive definite. An example is
$$C(u, v \mid a, b) = \exp(-a|u| - b|v|), \tag{3–16}$$
where C1(u|a) = exp(−a|u|) and C2(v|b) = exp(−b|v|). Notice that for any two spatial lags u1 and u2, C(u1, v|a, b) and C(u2, v|a, b) are proportional to each other as functions of v. This means that the plots of the temporal covariances have the same shape at these spatial lags. This property is important in the spectral construction of valid nonseparable models proposed by Cressie and Huang (1999) (Section 3.3.4.1). For separable models, the processes in the spatial and temporal domains do not act on each other, and hence the selection of an appropriate model for C(u, v|θ) can be facilitated by separate (conditional) exploratory analyses of the spatial and temporal patterns.
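The proportionality property can be checked numerically. The sketch below compares the ratio C(u1, v)/C(u2, v) over a range of temporal lags v for the separable model 3–16 (rescaled by σ2 = 2, an illustrative value) and for Example 1 of Cressie and Huang (Eq. 3–21 in Section 3.3.4.1). The ratio is constant in v only in the separable case. It also confirms identity 3–15. Parameter values and function names are hypothetical.

```python
import numpy as np

def c_sep(u, v, a=1.0, b=0.5, sigma2=2.0):
    """Separable covariance sigma2 * exp(-a|u| - b|v|) (Eq. 3-16 times sigma2)."""
    return sigma2 * np.exp(-a * np.abs(u) - b * np.abs(v))

def c_ch1(u, v, a=1.0, b=1.0, sigma2=2.0):
    """Cressie-Huang Example 1 (Eq. 3-21), a nonseparable covariance."""
    denom = a**2 * v**2 + 1.0
    return sigma2 / np.sqrt(denom) * np.exp(-b**2 * u**2 / denom)

v = np.linspace(0.0, 3.0, 7)
for name, C in [("separable", c_sep), ("nonseparable", c_ch1)]:
    ratio = C(0.5, v) / C(1.5, v)         # temporal covariances at two spatial lags
    print(name, ratio.max() - ratio.min())  # spread of the ratio across v

# Eq. 3-15 for the separable model: C(u, v) = C(u, 0) C(0, v) / sigma2
u = 0.7
print(np.allclose(c_sep(u, v), c_sep(u, 0.0) * c_sep(0.0, v) / 2.0))
```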
A more general definition of separability is as a Kronecker product, as in Eq. 3–8. If Σ1 is U × U and Σ2 is V × V, then from Eq. 3–8 it can be shown that
$$\Sigma_{AR(1)}^{-1} = \Sigma_1^{-1} \otimes \Sigma_2^{-1}, \qquad |\Sigma_{AR(1)}| = |\Sigma_1|^{V}\,|\Sigma_2|^{U},$$
where | · | denotes the determinant of a matrix. Thus, another advantage of separable models is computational efficiency, particularly in likelihood models where the inverse and determinant of the covariance matrix must be calculated. The inverse of a large UV × UV covariance matrix can be calculated from the inverses of its Kronecker component matrices Σ1 and Σ2, which have dimensions U × U and V × V, respectively. For example, inverting a 100 × 100 matrix may require only the inversion of two 10 × 10 matrices. A similar argument applies to the determinant. Note that ΣAR(1) can be put in the form 3–14 as
$$C(u, v \mid \sigma^2, \rho_1, \rho_2) = \sigma^2\rho_1^{u} \cdot \sigma^2\rho_2^{v} = \sigma^4\rho_1^{u}\rho_2^{v}, \tag{3–17}$$
where u = 0, 1, ..., U − 1 and v = 0, 1, ..., V − 1. This model assumes equidistant (regularly spaced) coordinates. Thus, any two consecutive coordinates have the same correlation as any other consecutive pair, even if the distances between them differ. A more appropriate model might be
$$C(u, v \mid \sigma^2, \rho_1, \rho_2, a, b) = \sigma^4\rho_1^{u/a}\rho_2^{v/b}, \tag{3–18}$$
where a and b are scale parameters. However, this model is more complex than ΣAR(1) in the sense that it has more parameters to estimate (5 vs. 3). Choosing between the two models is therefore a model selection problem.
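Both Kronecker identities can be verified numerically. In the sketch below, the AR(1) parameter values are arbitrary illustrations. The determinant is compared on the log scale, and the final line shows that the naive product |Σ1||Σ2| is not the determinant of the Kronecker product; the correct exponents are V and U.

```python
import numpy as np

def ar1(m, rho, sigma2):
    idx = np.arange(m)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

U, V = 5, 4
S1, S2 = ar1(U, 0.6, 1.5), ar1(V, 0.3, 0.8)
Sigma = np.kron(S1, S2)                                  # UV x UV

# Inverse from the two small inverses: (S1 kron S2)^-1 = S1^-1 kron S2^-1
inv_small = np.kron(np.linalg.inv(S1), np.linalg.inv(S2))
print(np.allclose(inv_small, np.linalg.inv(Sigma)))      # True

# Determinant: |S1 kron S2| = |S1|^V |S2|^U, compared on the log scale
logdet_big = np.linalg.slogdet(Sigma)[1]
logdet_small = V * np.linalg.slogdet(S1)[1] + U * np.linalg.slogdet(S2)[1]
print(np.isclose(logdet_big, logdet_small))              # True
naive = np.linalg.slogdet(S1)[1] + np.linalg.slogdet(S2)[1]
print(np.isclose(logdet_big, naive))                     # False
```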
3.3.4 Nonseparable Covariance Structures
In this section, we review two methods for constructing nonseparable spatio-temporal models: the spectral approach (Cressie and Huang, 1999) and the monotone function approach (Gneiting, 2002). The discussion is for d = 1, but it can be generalized to d > 1.
3.3.4.1 Spectral method by Cressie and Huang (1999)
We assume that the covariance function C is continuous. If C is positive definite, then the process has a spectral distribution (Matérn, 1960; Cressie and Huang, 1999). If the spectral density exists, then Bochner's theorem (Bochner, 1955) states that C can be represented as
$$C(u, v) = \iint e^{i(u\omega + v\tau)}\, g(\omega, \tau)\, d\omega\, d\tau, \tag{3–19}$$
where g(ω, τ) is the spectral density. It can be shown (Cressie and Huang, 1999; Appendix E) that Eq. 3–19 can be expressed as
$$C(u, v) = \int e^{iu\omega}\, \rho(\omega, v)\, \kappa(\omega)\, d\omega. \tag{3–20}$$
Equation 3–20 can be used to find valid covariance functions by selecting appropriate forms for ρ(ω, v) and κ(ω); in Cressie and Huang's construction, ρ(ω, ·) is a continuous autocorrelation function for each ω and κ(ω) > 0. To obtain nonseparable structures, ρ(ω, v) must depend on ω; otherwise, C(u, v) will be separable.
Cressie and Huang gave seven examples of valid nonseparable covariance functions constructed from particular choices of ρ(ω, v) and κ(ω) using Eq. 3–20. We present three of them here and use the first two in the simulations.
Example 1
The three-parameter nonseparable stationary covariance function given as Example 1 of Cressie and Huang (1999) is
$$C(u, v) = \frac{\sigma^2}{\sqrt{a^2 v^2 + 1}}\exp\left(-\frac{b^2 u^2}{a^2 v^2 + 1}\right), \tag{3–21}$$
where a, b ≥ 0 are the scaling parameters of time and space, respectively, and σ2 = C(0, 0).
Example 2
The three-parameter nonseparable stationary covariance function given as Example 4 of Cressie and Huang (1999) is
$$C(u, v) = \frac{\sigma^2 (a|v| + 1)}{(a|v| + 1)^2 + b^2 |u|^2}, \tag{3–22}$$
where a, b ≥ 0 are the scaling parameters of time and space, respectively, and σ2 = C(0, 0).
Example 3
The four-parameter nonseparable stationary covariance function given as Example 6 of Cressie and Huang (1999) is
$$C(u, v) = \sigma^2 \exp\left(-a|v| - b^2|u|^2 - c|v||u|^2\right), \tag{3–23}$$
where a, b ≥ 0 are the scaling parameters of time and space, respectively, c is a space-time interaction parameter, and σ2 = C(0, 0). Note that when c = 0, 3–23 reduces to a separable model.
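A minimal sketch of Examples 1 and 2 follows. It builds the ST × ST covariance matrix on the irradiance and temperature levels of Section 3.4, in the ordering of Eq. 3–3, and checks symmetry, the variance σ2 on the diagonal, and nonnegative eigenvalues. Two assumptions are involved: the lags are taken as differences of the actual levels (the text does not state the units used), and the parameter values are those listed in the “True” rows of Table 3-1. The function names are hypothetical.

```python
import numpy as np

def ch_example1(u, v, sigma2, a, b):
    """Eq. 3-21 (Example 1 of Cressie and Huang, 1999)."""
    d = a**2 * v**2 + 1.0
    return sigma2 / np.sqrt(d) * np.exp(-b**2 * u**2 / d)

def ch_example2(u, v, sigma2, a, b):
    """Eq. 3-22 (Example 4 of Cressie and Huang, 1999)."""
    d = a * np.abs(v) + 1.0
    return sigma2 * d / (d**2 + b**2 * u**2)

def cov_matrix(C, irr, temp, **params):
    """ST x ST matrix in the ordering of Eq. 3-3 (irradiance outer).
    Lags are differences of the actual levels (an assumption)."""
    s, t = np.meshgrid(irr, temp, indexing="ij")
    s, t = s.ravel(), t.ravel()
    return C(s[:, None] - s[None, :], t[:, None] - t[None, :], **params)

irr = np.array([0.0, 50.0, 100.0, 200.0, 300.0])     # levels from Section 3.4
temp = np.array([15.0, 20.0, 25.0, 30.0])
K1 = cov_matrix(ch_example1, irr, temp, sigma2=1.0, a=0.5, b=0.01)
K2 = cov_matrix(ch_example2, irr, temp, sigma2=1.0, a=1.0, b=0.01)
for K in (K1, K2):
    print(K.shape, np.linalg.eigvalsh(K).min())
```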
3.3.4.2 Monotone function method by Gneiting (2002)
Although various nonnegative integrable functions can be used as a spectral density, the spectral method is limited when no closed-form solution can be obtained for 3–19 or 3–20. Gneiting (2002) developed an approach that does not rely on Fourier transform pairs and thus avoids this limitation.
Let φ(x) and ψ(x) be functions defined for x ≥ 0. Suppose that φ(x) is completely monotone and that ψ(x) is positive with a completely monotone derivative. Then
$$C(u, v) = \frac{\sigma^2}{\sqrt{\psi(|v|^2)}}\,\varphi\!\left(\frac{|u|^2}{\psi(|v|^2)}\right), \quad (u, v) \in \mathbb{R} \times \mathbb{R}, \tag{3–24}$$
where σ2 = C(0, 0) > 0, is a valid spatio-temporal covariance model, which is nonseparable in general. φ(x) and ψ(x) can be associated with spatial and temporal structures, respectively, and Gneiting (2002) provides a list of functions that can be used for each. For example, using
$$\varphi(x) = \exp(-b x^{\beta}), \quad b > 0,\ \beta \in (0, 1],$$
and
$$\psi(x) = (a x^{\alpha} + 1)^{\gamma}, \quad a > 0,\ \alpha \in (0, 1],\ \gamma \in [0, 1],$$
leads to
$$C(u, v) = \frac{\sigma^2}{(a|v|^{2\alpha} + 1)^{\gamma/2}}\exp\left(-\frac{b|u|^{2\beta}}{(a|v|^{2\alpha} + 1)^{\beta\gamma}}\right), \quad (u, v) \in \mathbb{R} \times \mathbb{R}. \tag{3–25}$$
Multiplying 3–25 by the purely temporal covariance function (a|v|^{2α} + 1)^{−δ}, v ∈ ℝ, with δ ≥ 0, produces
$$C(u, v) = \frac{\sigma^2}{(a|v|^{2\alpha} + 1)^{\delta + \gamma/2}}\exp\left(-\frac{b|u|^{2\beta}}{(a|v|^{2\alpha} + 1)^{\beta\gamma}}\right), \quad (u, v) \in \mathbb{R} \times \mathbb{R}, \tag{3–26}$$
where a, b > 0 are the scaling parameters of time and space, respectively; α, β ∈ (0, 1] are the smoothness parameters of time and space, respectively; γ ∈ [0, 1]; and σ2 ≥ 0. A useful reparametrization of 3–26 is
$$C(u, v) = \frac{\sigma^2}{(a|v|^{2\alpha} + 1)^{\tau}}\exp\left(-\frac{b|u|^{2\beta}}{(a|v|^{2\alpha} + 1)^{\beta\gamma}}\right), \quad (u, v) \in \mathbb{R} \times \mathbb{R}, \tag{3–27}$$
where τ ≥ 1/2 replaces δ + γ/2. The parameter γ controls space-time interaction: γ = 0 yields a separable structure, any γ > 0 yields a nonseparable one, and increasing values of γ indicate strengthening spatio-temporal interaction.
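The role of γ can be illustrated with a short sketch of Eq. 3–27. It tests separability through identity 3–15, C(u, v)C(0, 0) = C(u, 0)C(0, v), on a grid of lags, and the accompanying check confirms that α = β = 1/2 and τ = 1 reduce 3–27 to the simplified form used in Section 3.4. The parameter values (σ2 = 1, a = 1, b = 0.01, γ = 0.6) mirror the “True” row for C3 in Table 3-2; the lag grid and the function name gneiting are illustrative assumptions.

```python
import numpy as np

def gneiting(u, v, sigma2, a, b, alpha, beta, gamma, tau):
    """Eq. 3-27: Gneiting-class space-time covariance."""
    psi = a * np.abs(v) ** (2 * alpha) + 1.0
    return sigma2 / psi**tau * np.exp(-b * np.abs(u) ** (2 * beta) / psi ** (beta * gamma))

params = dict(sigma2=1.0, a=1.0, b=0.01, alpha=0.5, beta=0.5, tau=1.0)
u = np.array([0.0, 50.0, 100.0, 300.0])[:, None]   # spatial lags (rows)
v = np.array([0.0, 5.0, 10.0, 15.0])[None, :]       # temporal lags (columns)

for gamma in (0.0, 0.6):
    C = gneiting(u, v, gamma=gamma, **params)
    # Separable iff C(u, v) C(0, 0) = C(u, 0) C(0, v) at every lag (Eq. 3-15)
    sep = np.allclose(C * C[0, 0], C[:, :1] * C[:1, :])
    print(f"gamma={gamma}: separable={sep}")
# gamma=0.0: separable=True
# gamma=0.6: separable=False
```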
3.4 Simulations
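In this section, we investigate the performance of three nonseparable covariance structures, 3–21 and 3–22 (Examples 1 and 2 of Section 3.3.4.1) and 3–27, denoted as follows:
$$C_1(u, v) = \frac{\sigma^2}{\sqrt{a^2 v^2 + 1}}\exp\left(-\frac{b^2 u^2}{a^2 v^2 + 1}\right), \quad a, b \ge 0;\ \sigma^2 > 0, \tag{3–28}$$
$$C_2(u, v) = \frac{\sigma^2 (a|v| + 1)}{(a|v| + 1)^2 + b^2 |u|^2}, \quad a, b \ge 0;\ \sigma^2 > 0, \tag{3–29}$$
$$C_3(u, v) = \frac{\sigma^2}{(a|v|^{2\alpha} + 1)^{\tau}}\exp\left(-\frac{b|u|^{2\beta}}{(a|v|^{2\alpha} + 1)^{\beta\gamma}}\right), \quad a, b \ge 0;\ \alpha, \beta \in (0, 1];\ \tau \ge 1/2;\ \gamma \in [0, 1];\ \sigma^2 > 0. \tag{3–30}$$
To simplify our analysis, we assume for C3(u, v) that α = 1/2, β = 1/2, and τ = 1, so that
$$C_3(u, v) = \frac{\sigma^2}{a|v| + 1}\exp\left(-\frac{b|u|}{(a|v| + 1)^{\gamma/2}}\right), \quad a, b \ge 0;\ \gamma \in [0, 1];\ \sigma^2 > 0. \tag{3–31}$$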
We then generate data using these nonseparable structures to simulate interaction effects between the two environmental signals in functional mapping of a reaction norm. Simulations of spatio-temporal processes using separable and nonseparable covariance structures were studied by Huang et al. (2007a). The generated data are analyzed using the nonparametric estimator ΣNP developed in Chapter 2 and using ΣAR(1), to assess their performance. We also want to test whether the separable model ΣAR(1) can be used to analyze data generated from nonseparable covariance structures in three different cases. Covariance fit is assessed using entropy and quadratic losses (Section 2.4.1).
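For reference, a minimal sketch of the two losses follows. It assumes the standard definitions L_E(Σ, Σ̂) = tr(Σ⁻¹Σ̂) − log|Σ⁻¹Σ̂| − p and L_Q(Σ, Σ̂) = tr[(Σ⁻¹Σ̂ − I)²]; Section 2.4.1 should be consulted for the exact forms used in the tables. Both losses are zero when Σ̂ = Σ. For Σ̂ = cΣ, they equal p(c − log c − 1) and p(c − 1)², respectively, which the test uses as a check.

```python
import numpy as np

def entropy_loss(Sigma, Sigma_hat):
    """L_E = tr(Sigma^-1 Sigma_hat) - log|Sigma^-1 Sigma_hat| - p."""
    p = Sigma.shape[0]
    M = np.linalg.solve(Sigma, Sigma_hat)          # Sigma^-1 Sigma_hat
    _, logdet = np.linalg.slogdet(M)
    return float(np.trace(M) - logdet - p)

def quadratic_loss(Sigma, Sigma_hat):
    """L_Q = tr[(Sigma^-1 Sigma_hat - I)^2]."""
    M = np.linalg.solve(Sigma, Sigma_hat) - np.eye(Sigma.shape[0])
    return float(np.trace(M @ M))

# Example: true AR(1) correlation matrix versus the identity as an estimate
Sigma = 0.6 ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
print(entropy_loss(Sigma, np.eye(4)), quadratic_loss(Sigma, np.eye(4)))
```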
Using a backcross design (two genotype groups; Section 1.1.2) for the QTL mapping population, we simulated genotypes at 6 markers equally spaced (20 cM apart) on a chromosome 100 cM long. One QTL was simulated between the fourth and fifth markers, 12 cM from the fourth marker (that is, 72 cM from the leftmost marker of the chromosome). The two possible QTL genotypes determined two distinct mean photosynthetic rate reaction norm surfaces defined by Eqs. 3–1 and 3–2 (see Figure 3-1). The surface parameters for the two genotype groups were (α1, Pm1(20), θ1) = (0.02, 2, 0.9) and (α2, Pm2(20), θ2) = (0.01, 1.5, 0.9). Phenotype observations were obtained by sampling from a multivariate normal distribution with the mean surface evaluated at irradiance levels {0, 50, 100, 200, 300} and temperature levels {15, 20, 25, 30}, and with covariance matrix Cl(u, v), l = 1, 2, 3.
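The data-generating step can be sketched as follows. The chapter does not say how marker genotypes were simulated, so this sketch makes several assumptions. Recombination follows the Haldane map function with no interference, backcross genotypes are coded 0/1 for k = 1, 2, and the identity covariance is only a placeholder for Cl(u, v) evaluated on the 20-point grid. The names haldane_r, mean_surface, and simulate_backcross are hypothetical. The marker and QTL positions, surface parameters, and levels are those stated above.

```python
import numpy as np

def haldane_r(d_cm):
    """Recombination fraction from map distance (Haldane map function, assumed)."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.asarray(d_cm) / 100.0))

def mean_surface(alpha, pm20, theta, irr, temp, t_star=5.0):
    """Eqs. 3-6 and 3-7 flattened in the order of Eq. 3-5."""
    s, t = np.meshgrid(irr, temp, indexing="ij")
    pm = np.where(t >= t_star, pm20 * (t - t_star) / (20.0 - t_star), 0.0)
    b = alpha * s + pm
    return ((b - np.sqrt(b**2 - 4 * theta * alpha * s * pm)) / (2 * theta)).ravel()

def simulate_backcross(n, positions_cm, means, Sigma, qtl_index, seed=0):
    """Backcross genotypes (0/1) along one chromosome plus MVN phenotypes
    whose mean vector depends on the QTL genotype."""
    rng = np.random.default_rng(seed)
    r = haldane_r(np.diff(positions_cm))
    geno = np.empty((n, len(positions_cm)), dtype=int)
    geno[:, 0] = rng.integers(0, 2, size=n)
    for j, rj in enumerate(r, start=1):
        flip = rng.random(n) < rj                    # recombination event
        geno[:, j] = np.where(flip, 1 - geno[:, j - 1], geno[:, j - 1])
    q = geno[:, qtl_index]
    noise = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=n)
    y = means[q] + noise
    markers = np.delete(geno, qtl_index, axis=1)     # the QTL is unobserved
    return markers, q, y

irr = np.array([0.0, 50.0, 100.0, 200.0, 300.0])
temp = np.array([15.0, 20.0, 25.0, 30.0])
means = np.vstack([mean_surface(0.02, 2.0, 0.9, irr, temp),
                   mean_surface(0.01, 1.5, 0.9, irr, temp)])
positions = np.array([0.0, 20.0, 40.0, 60.0, 72.0, 80.0, 100.0])  # QTL at 72 cM
Sigma = np.eye(20)   # placeholder: substitute C_l(u, v) evaluated on the grid
markers, q, y = simulate_backcross(200, positions, means, Sigma, qtl_index=4)
```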
The functional mapping model was applied to the marker and phenotype data with sample sizes n = 200 and 400. The surface defined by Eqs. 3–1 and 3–2 was used as the mean model and Cl(u, v) as the covariance model to analyze the data generated using Cl(u, v); that is, we analyzed data generated by the same mean and covariance models used in the analysis. One hundred simulation runs were carried out, and the averages over all runs of the estimated QTL location, mean parameter estimates, maxLR, and entropy and quadratic losses (see Section 2.4.1), together with the respective Monte Carlo standard errors (SE), were recorded. The results are shown in Tables 3-1 and 3-2; Table 3-2 also includes the results for ΣAR(1). Both tables show accurate and precise estimates of the QTL location, mean surface, and covariance parameters.
Next, ΣNP and ΣAR(1) were used to analyze the data generated by each of Cl(u, v), l = 1, 2, 3. Tables 3-3 and 3-4 show the results of these respective analyses. The results for ΣNP are very good. Those for ΣAR(1), however, are somewhat unexpected: the estimated QTL location and mean parameters are accurate and precise, which would imply that these aspects of the model are robust to misspecification of the covariance structure. Even the maxLR values are very close to the corresponding values in Tables 3-1 and 3-2, which should be close to the true values. The average losses, however, are inflated for C1 and C2. On closer inspection, it turns out that maxLR is misleading in this situation. What should be considered instead are the log-likelihood values under the null and alternative models from which maxLR is derived (a misspecified covariance can lower both log-likelihoods by similar amounts, leaving their difference nearly unchanged). Figure 3-2 provides box plots of the log-likelihood values under the alternative model based on the 100 simulation runs. These plots reveal clearly biased estimates of C1 and C2 by ΣAR(1), and the degrees of bias are consistent with the average losses. The results for the null model are very similar and are not presented here. We also provide the covariance plots and corresponding contour plots of Cl(u, v), l = 1, 2, 3, and of their ΣAR(1) estimates in Figures 3-3 and 3-4.
We conducted further simulations under C1 with n = 400, the case in which ΣAR(1) performed worst. We considered two scenarios: an increased variance (σ2 = 2, 4), and increased numbers of irradiance levels (0, 50, 100, 150, 200, 250, 300) and temperature levels (15, 18, 21, 24, 27, 30). The results are shown in Tables 3-5 and 3-6, respectively. Under both scenarios, the estimate of the QTL location is severely biased if ΣAR(1) is used, but not if ΣNP is used.
Table 3-1. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses (LE, LQ), and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n), based on 100 simulation replicates (nonseparable model).

Covariance | n | QTL location | α1 | Pm1(20) | θ1 | α2 | Pm2(20) | θ2 | maxLR | LE | LQ | σ2 | a | b
C1 | 200 | 71.96 (0.32) | 0.02 (0.00) | 2.00 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.54 (0.02) | 0.88 (0.01) | 131.46 (2.31) | 0.02 (0.00) | 0.19 (0.04) | 1.00 (0.00) | 0.50 (0.00) | 0.01 (0.00)
C1 | 400 | 72.00 (0.20) | 0.02 (0.00) | 2.01 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.52 (0.01) | 0.89 (0.01) | 262.11 (3.00) | 0.01 (0.00) | 1.13 (0.02) | 1.00 (0.00) | 0.50 (0.00) | 0.01 (0.00)
C1 | True | 72.00 | 0.02 | 2.00 | 0.90 | 0.01 | 1.50 | 0.90 | – | – | – | 1.00 | 0.50 | 0.01
C2 | 200 | 72.00 (0.30) | 0.02 (0.00) | 2.00 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.54 (0.01) | 0.88 (0.01) | 149.63 (2.37) | 0.02 (0.00) | 0.19 (0.04) | 1.00 (0.00) | 1.02 (0.02) | 0.01 (0.00)
C2 | 400 | 71.84 (0.18) | 0.02 (0.00) | 2.01 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.52 (0.01) | 0.89 (0.01) | 299.29 (3.06) | 0.01 (0.00) | 0.13 (0.02) | 1.00 (0.00) | 1.01 (0.01) | 0.01 (0.00)
C2 | True | 72.00 | 0.02 | 2.00 | 0.90 | 0.01 | 1.50 | 0.90 | – | – | – | 1.00 | 1.00 | 0.01
Table 3-2. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (max LR), entropy and quadratic losses, and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n), based on 100 simulation replicates (Nonseparable Model).
(α1, Pm1(20) and θ1 refer to QTL genotype 1, and α2, Pm2(20) and θ2 to QTL genotype 2.)
Covariance | n | QTL location | α1 | Pm1(20) | θ1 | α2 | Pm2(20) | θ2 | max LR | LE | LQ | σ2 | a | b | c
C3 | 200 | 71.96 (0.34) | 0.02 (0.00) | 2.01 (0.01) | 0.89 (0.01) | 0.01 (0.00) | 1.55 (0.02) | 0.87 (0.01) | 126.80 (2.20) | 0.02 (0.00) | 0.19 (0.04) | 1.00 (0.00) | 1.03 (0.02) | 0.01 (0.00) | 0.62 (0.02)
C3 | 400 | 71.92 (0.20) | 0.02 (0.00) | 2.01 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.52 (0.01) | 0.89 (0.01) | 253.38 (2.83) | 0.01 (0.00) | 0.13 (0.02) | 1.00 (0.00) | 1.01 (0.01) | 0.01 (0.00) | 0.61 (0.02)
True | | 72.00 | 0.02 | 2.00 | 0.90 | 0.01 | 1.50 | 0.90 | - | - | - | 1.00 | 1.00 | 0.01 | 0.60

Covariance | n | QTL location | α1 | Pm1(20) | θ1 | α2 | Pm2(20) | θ2 | max LR | LE | LQ | σ2 | ρ1 | ρ2
ΣAR(1) | 200 | 72.12 (0.44) | 0.02 (0.00) | 2.01 (0.01) | 0.89 (0.01) | 0.01 (0.00) | 1.55 (0.02) | 0.88 (0.01) | 75.08 (1.80) | 0.02 (0.00) | 0.19 (0.04) | 1.00 (0.00) | 0.60 (0.00) | 0.60 (0.00)
ΣAR(1) | 400 | 72.00 (0.26) | 0.02 (0.00) | 2.01 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.52 (0.01) | 0.89 (0.01) | 149.14 (2.33) | 0.01 (0.00) | 0.13 (0.02) | [missing] (0.00) | [missing] (0.00) | [missing] (0.00)
True | | 72.00 | 0.02 | 2.00 | 0.90 | 0.01 | 1.50 | 0.90 | - | - | - | 1.00 | 0.60 | 0.60
Table 3-3. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (max LR), entropy and quadratic losses, and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n), based on 100 simulation replicates (ΣNP).
(α1, Pm1(20) and θ1 refer to QTL genotype 1, and α2, Pm2(20) and θ2 to QTL genotype 2.)
Covariance | n | QTL location | α1 | Pm1(20) | θ1 | α2 | Pm2(20) | θ2 | max LR | LE | LQ
C1 | 200 | 71.68 (0.28) | 0.02 (0.00) | 2.02 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.52 (0.02) | 0.88 (0.01) | 99.39 (5.10) | 1.04 (0.01) | 2.03 (0.02)
C1 | 400 | 72.16 (0.23) | 0.02 (0.00) | 2.00 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.52 (0.01) | 0.88 (0.01) | 189.34 (2.08) | 0.53 (0.00) | 1.06 (0.01)
C2 | 200 | 71.88 (0.29) | 0.02 (0.00) | 2.00 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.53 (0.01) | 0.88 (0.01) | 109.47 (5.65) | 1.00 (0.01) | 1.96 (0.02)
C2 | 400 | 71.92 (0.17) | 0.02 (0.00) | 2.00 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.52 (0.01) | 0.89 (0.01) | 214.76 (2.31) | 0.52 (0.00) | 1.02 (0.01)
C3 | 200 | 72.12 (0.37) | 0.02 (0.00) | 2.01 (0.01) | 0.89 (0.01) | 0.01 (0.00) | 1.54 (0.02) | 0.87 (0.01) | 102.73 (1.90) | 0.88 (0.01) | 1.70 (0.02)
C3 | 400 | 72.08 (0.20) | 0.02 (0.00) | 2.01 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.52 (0.01) | 0.89 (0.01) | 198.32 (2.36) | 0.48 (0.00) | 0.94 (0.01)
True | | 72.00 | 0.02 | 2.00 | 0.90 | 0.01 | 1.50 | 0.90 | | |
Table 3-4. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (max LR), entropy and quadratic losses, and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n), based on 100 simulation replicates (ΣAR(1)).
(α1, Pm1(20) and θ1 refer to QTL genotype 1, and α2, Pm2(20) and θ2 to QTL genotype 2.)
Covariance | n | QTL location | α1 | Pm1(20) | θ1 | α2 | Pm2(20) | θ2 | max LR | LE | LQ
C1 | 200 | 72.32 (0.45) | 0.02 (0.00) | 2.03 (0.01) | 0.90 (0.01) | 0.01 (0.00) | 1.53 (0.02) | 0.87 (0.01) | 122.96 (2.43) | 19.43 (0.07) | 681.78 (6.16)
C1 | 400 | 71.72 (0.27) | 0.02 (0.00) | 2.03 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.51 (0.01) | 0.89 (0.01) | 245.85 (3.33) | 19.45 (0.05) | 684.11 (4.40)
C2 | 200 | 71.96 (0.34) | 0.02 (0.00) | 2.01 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.55 (0.02) | 0.87 (0.01) | 130.93 (2.30) | 4.83 (0.02) | 58.60 (1.01)
C2 | 400 | 71.84 (0.20) | 0.02 (0.00) | 2.01 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.52 (0.01) | 0.89 (0.01) | 262.50 (3.06) | 4.83 (0.02) | 58.61 (0.77)
C3 | 200 | 72.00 (0.35) | 0.02 (0.00) | 2.01 (0.01) | 0.89 (0.01) | 0.01 (0.00) | 1.54 (0.02) | 0.87 (0.01) | 124.98 (2.23) | 0.60 (0.00) | 1.51 (0.10)
C3 | 400 | 71.96 (0.22) | 0.02 (0.00) | 2.01 (0.01) | 0.89 (0.00) | 0.01 (0.00) | 1.52 (0.01) | 0.89 (0.01) | 250.76 (2.97) | 0.60 (0.00) | 1.43 (0.08)
True | | 72.00 | 0.02 | 2.00 | 0.90 | 0.01 | 1.50 | 0.90 | | |
[Figure: panels for C1, C2 and C3 at n = 200 (left) and n = 400 (right); each panel shows boxplots labelled NP, Cl and AR(1), with the log-likelihood under H1 on the vertical axis.]
Figure 3-2. Boxplots of the values of the log-likelihood under the alternative model, H1. Significantly biased estimates by ΣAR(1) are apparent for C1.
[Figure: surface plots of C1(u,v), C2(u,v) and C3(u,v), one row each. Left panels, titled TRUE NONSEPARABLE COVARIANCE, plot Cl against |u| and |v|; right panels, titled AR(1), show the corresponding ΣAR(1) estimates.]
Figure 3-3. Covariance plots. Plots of Cl, l = 1, 2, 3, versus irradiance (|u|) and temperature (|v|) lags are in the left column. The right column shows the estimates of Cl by ΣAR(1).
[Figure: contour plots of C1(u,v), C2(u,v) and C3(u,v) over |u| and |v|. Left panels are titled TRUE NONSEPARABLE COVARIANCE; right panels, titled AR(1), show the corresponding ΣAR(1) estimates.]
Figure 3-4. Contour plots. Contour plots of Cl, l = 1, 2, 3, are in the left column. The right column shows the contour plots of the estimates of Cl by ΣAR(1).
Table 3-5. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (max LR), entropy and quadratic losses, and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n), based on 100 simulation replicates (C1 with n = 400 and σ2 = 2, 4).
(α1, Pm1(20) and θ1 refer to QTL genotype 1, and α2, Pm2(20) and θ2 to QTL genotype 2; H0 and H1 are the log-likelihoods under the null and alternative models.)
Covariance | σ2 | QTL location | α1 | Pm1(20) | θ1 | α2 | Pm2(20) | θ2 | H0 | H1 | max LR | LE | LQ
ΣAR(1) | 2 | 72.40 (0.44) | 0.02 (0.00) | 2.05 (0.01) | 0.89 (0.01) | 0.01 (0.00) | 1.52 (0.02) | 0.87 (0.01) | -5437 (7.36) | -5373 (7.31) | 128.51 (2.45) | 19.45 (0.05) | 684.37 (4.44)
ΣAR(1) | 4 | 74.20 (0.69) | 0.02 (0.00) | 2.11 (0.02) | 0.88 (0.01) | 0.01 (0.00) | 1.52 (0.03) | 0.84 (0.02) | -8175 (7.32) | -8141 (7.31) | 65.55 (1.80) | 19.44 (0.05) | 683.82 (4.46)
C1 | 2 | 71.96 (0.29) | 0.02 (0.00) | 2.01 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.54 (0.02) | 0.88 (0.01) | -4088 (7.17) | -4021 (7.16) | 133.41 (2.15) | 0.01 (0.00) | 0.13 (0.02)
C1 | 4 | 71.96 (0.44) | 0.02 (0.00) | 2.03 (0.01) | 0.89 (0.01) | 0.01 (0.00) | 1.57 (0.03) | 0.86 (0.02) | -6822 (7.16) | -6788 (7.16) | 69.07 (1.57) | 0.01 (0.00) | 0.13 (0.02)
NP | 2 | 72.16 (0.29) | 0.02 (0.00) | 2.01 (0.01) | 0.89 (0.00) | 0.01 (0.00) | 1.54 (0.02) | 0.87 (0.01) | -3967 (6.87) | -3912 (6.89) | 109.79 (1.66) | 0.53 (0.00) | 1.05 (0.01)
NP | 4 | 71.64 (0.49) | 0.02 (0.00) | 2.01 (0.01) | 0.89 (0.01) | 0.01 (0.00) | 1.57 (0.03) | 0.84 (0.02) | -6713 (6.89) | -6684 (6.93) | 59.92 (1.27) | 0.53 (0.00) | 1.04 (0.01)
True | | 72.00 | 0.02 | 2.00 | 0.90 | 0.01 | 1.50 | 0.90 | | | | |
Table 3-6. Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (max LR), entropy and quadratic losses, and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n), based on 100 simulation replicates (C1 with n = 400, increased irradiance and temperature levels, and σ2 = 1, 2).
(α1, Pm1(20) and θ1 refer to QTL genotype 1, and α2, Pm2(20) and θ2 to QTL genotype 2; H0 and H1 are the log-likelihoods under the null and alternative models.)
Covariance | σ2 | QTL location | α1 | Pm1(20) | θ1 | α2 | Pm2(20) | θ2 | H0 | H1 | max LR | LE | LQ
ΣAR(1) | 1 | 72.16 (0.36) | 0.02 (0.00) | 2.04 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.48 (0.01) | 0.88 (0.01) | -1278 (14.01) | -1063 (14.15) | 430.01 (4.78) | [illegible] (0.45) | [illegible] (261.88)
ΣAR(1) | 2 | 78.44 (0.84) | 0.02 (0.00) | 2.15 (0.02) | 0.91 (0.00) | 0.01 (0.00) | 1.48 (0.02) | 0.86 (0.01) | -6992 (14.08) | -6876 (14.16) | 231.86 (3.62) | [illegible] (0.44) | [illegible] (257.89)
C1 | 1 | 71.76 (0.18) | 0.02 (0.00) | 2.01 (0.00) | 0.90 (0.00) | 0.01 (0.00) | 1.51 (0.01) | 0.89 (0.00) | 4913 (11.04) | 5068 (11.10) | 309.86 (3.17) | 0.01 (0.00) | 0.31 (0.04)
C1 | 2 | 71.76 (0.24) | 0.02 (0.00) | 2.01 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.52 (0.01) | 0.88 (0.01) | -821.08 (11.10) | -743.76 (11.12) | 154.64 (2.22) | 0.01 (0.00) | 0.31 (0.04)
NP | 1 | 71.726 (0.18) | 0.02 (0.00) | 2.01 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.51 (0.01) | 0.89 (0.00) | 5431 (11.22) | 5537 (11.11) | 212.64 (2.20) | 2.34 (0.01) | 4.55 (0.03)
NP | 2 | 72.126 (0.34) | 0.02 (0.00) | 2.01 (0.01) | 0.90 (0.00) | 0.01 (0.00) | 1.49 (0.01) | 0.89 (0.01) | -336 (10.44) | -273 (10.42) | 127.37 (1.72) | 2.37 (0.01) | 4.53 (0.03)
True | | 72.00 | 0.02 | 2.00 | 0.90 | 0.01 | 1.50 | 0.90 | | | | |
3.5 Summary and Discussion
In this chapter, we studied covariance modeling in functional mapping of photosynthetic rate as a reaction norm to two environmental signals, irradiance and temperature. When the two signals interact, as simulated by nonseparable covariance structures, our analysis showed that ΣNP is a more reliable estimator than ΣAR(1). The advantage of ΣNP over ΣAR(1) becomes more pronounced as the variance of the reaction norm process and the number of signal levels increase.
A few issues need to be discussed. First, ΣNP was developed in chapter 2 from a sequence of regressions obtained from the modified Cholesky decomposition of the covariance matrix of a one-dimensional (longitudinal) vector whose variables have a natural ordering. In this chapter, the phenotype vector consists of observations taken at S irradiance levels and T temperature levels, i.e.,
$$y_i = \big[\underbrace{y_i(1,1), \ldots, y_i(1,T)}_{\text{irradiance } 1}, \ldots, \underbrace{y_i(S,1), \ldots, y_i(S,T)}_{\text{irradiance } S}\big]'. \tag{3–32}$$
While the order of the variables in this vector is predefined, there is no natural ordering as in longitudinal data. Instead of ΣNP, a more appropriate method might be the sparse permutation invariant covariance estimator (SPICE) proposed by Rothman et al. (2008), which is invariant to variable permutations. SPICE is derived by decomposing the inverse of the covariance matrix as
$$\Sigma^{-1} = C'C, \tag{3–33}$$
where C = [c_tj] is a lower triangular matrix. In terms of the components of the sequence of regression equations,
$$c_{tj} = -\frac{\phi_{tj}}{\sigma_{tt}}, \quad j < t, \qquad c_{tt} = \frac{1}{\sigma_{tt}}, \tag{3–34}$$
where φtj and σ²tt are the GARPs and IVs (Section 2.2.1). However, our simulation results suggest that ΣNP can still be applied directly to observations that have no natural variable ordering, such as 3–32. Furthermore, Rothman et al. stated that, under variable permutations, the L1 penalty in Huang et al.'s (2006) method can still potentially produce reasonable estimates.
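To make the link between 3–33 and 3–34 concrete, here is a minimal Python/NumPy sketch (an illustration under stated assumptions, not the code used for the simulations in this chapter). It computes the GARPs φtj and IVs σ²tt of a known covariance matrix through the sequence of population regressions, assembles the lower triangular factor C, and checks that C′C reproduces Σ⁻¹. The function names `modified_cholesky` and `spice_factor` are hypothetical. Permuting the variables changes φtj and σ²tt but not Σ⁻¹, which is the ordering issue discussed above.

```python
import numpy as np


def modified_cholesky(sigma):
    """GARPs and IVs of a covariance matrix via sequential population regressions.

    Row t of `phi` holds phi_tj (j < t) from regressing variable t on
    variables 1, ..., t-1; `iv[t]` is the corresponding innovation variance.
    """
    m = sigma.shape[0]
    phi = np.zeros((m, m))
    iv = np.empty(m)
    iv[0] = sigma[0, 0]
    for t in range(1, m):
        coef = np.linalg.solve(sigma[:t, :t], sigma[:t, t])
        phi[t, :t] = coef
        iv[t] = sigma[t, t] - sigma[t, :t] @ coef
    return phi, iv


def spice_factor(phi, iv):
    """Lower triangular C with c_tt = 1/sigma_tt and c_tj = -phi_tj/sigma_tt (j < t)."""
    T = np.eye(phi.shape[0]) - phi          # unit lower triangular matrix
    return T / np.sqrt(iv)[:, None]         # scale row t by 1/sigma_tt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6))
    sigma = A @ A.T + 6 * np.eye(6)         # an arbitrary positive definite matrix
    C = spice_factor(*modified_cholesky(sigma))
    print(np.allclose(C.T @ C, np.linalg.inv(sigma)))        # True

    perm = rng.permutation(6)               # reorder the variables
    sigma_p = sigma[np.ix_(perm, perm)]
    C_p = spice_factor(*modified_cholesky(sigma_p))
    print(np.allclose(C_p.T @ C_p, np.linalg.inv(sigma_p)))  # True, although C_p differs from C
```

The factor C depends on the chosen ordering, while the estimand Σ⁻¹ does not; this is why a permutation-invariant penalty is attractive when, as in 3–32, the ordering is arbitrary.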
A second issue pertains to the use of nonseparable models in functional mapping, for which the simulations in this chapter showed very good results. Using such a model is a good idea if it closely reflects the structure of the data. Unfortunately, this is often not the case; in fact, it is often not even known whether the data exhibit interactions. Before deciding on a model, spatio-temporal modelers use tests for separability (Mitchell et al., 2005; Fuentes, 2005). If separable models are appropriate, there is a wealth of options. Otherwise, it is difficult to choose among the many complex nonseparable models, because there are as yet no general guidelines to help one decide on a specific nonseparable model. The model C3 used in the simulations (Section 3.4) has an easy-to-interpret interaction parameter γ ∈ [0, 1]. However, despite an interaction “strength” of γ = 0.6, the separable model ΣAR(1) estimated the data generated by C3 quite well. Thus, the gain from using a nonseparable model instead of a separable one may not be worth the added complexity. Another option is to use separable approximations to nonseparable covariances (Genton, 2007). The nonseparable covariances that we considered were assumed to be stationary and isotropic (Section 3.3.2). These two assumptions may not always hold for real data. Although this was not specifically addressed here, ΣNP may work for data that do not satisfy these assumptions.
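As an illustration of the separable-approximation option, the sketch below computes the Frobenius-nearest Kronecker product A ⊗ B to a covariance matrix ordered as in 3–32 (irradiance outer, temperature inner). It uses the rearrangement-plus-SVD construction of Van Loan and Pitsianis, which, to our understanding, is the basis of Genton's (2007) approach. The test covariance C(u, v) = (v² + 1)^(−1/2) exp{−u²/(v² + 1)} is a Cressie–Huang-type nonseparable example chosen only for illustration; it is not one of C1, C2 or C3, and the irradiance and temperature grids are hypothetical.

```python
import numpy as np


def rearrange(sigma, s, t):
    """Map an (s*t) x (s*t) matrix to (s*s) x (t*t), one flattened t x t block per row,
    so that rearrange(kron(A, B)) equals the outer product of vec(A) and vec(B)."""
    blocks = sigma.reshape(s, t, s, t).transpose(0, 2, 1, 3)
    return blocks.reshape(s * s, t * t)


def nearest_kronecker(sigma, s, t):
    """Return (A, B) minimizing ||sigma - kron(A, B)||_F via a rank-one SVD."""
    U, svals, Vt = np.linalg.svd(rearrange(sigma, s, t))
    A = np.sqrt(svals[0]) * U[:, 0].reshape(s, s)
    B = np.sqrt(svals[0]) * Vt[0].reshape(t, t)
    if np.trace(A) < 0:                     # resolve the sign ambiguity of the SVD
        A, B = -A, -B
    return A, B


def nonseparable_cov(irr, temp):
    """Illustrative C(u, v) = exp(-u^2 / (v^2 + 1)) / sqrt(v^2 + 1) on a grid."""
    u = irr[:, None, None, None] - irr[None, None, :, None]    # irradiance lags
    v = temp[None, :, None, None] - temp[None, None, None, :]  # temperature lags
    c = np.exp(-u**2 / (v**2 + 1)) / np.sqrt(v**2 + 1)
    s, t = len(irr), len(temp)
    return c.reshape(s * t, s * t)          # index (i, p) -> i * t + p, as in 3-32


if __name__ == "__main__":
    irr = np.linspace(0.0, 3.0, 4)          # hypothetical irradiance levels
    temp = np.linspace(0.0, 2.0, 3)         # hypothetical temperature levels
    sigma = nonseparable_cov(irr, temp)
    A, B = nearest_kronecker(sigma, len(irr), len(temp))
    rel_err = np.linalg.norm(sigma - np.kron(A, B)) / np.linalg.norm(sigma)
    print(f"relative Frobenius error of the separable approximation: {rel_err:.3f}")
```

The size of the residual (the norm of the discarded singular values) gives a rough, model-free indication of how far a given covariance is from separability.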
In this chapter, we considered only two interacting environmental signals: irradiance and temperature. However, the reaction norm of photosynthetic rate is a very complex process, with more environmental signals at play than these two. The spatial domain of spatio-temporal nonseparable covariance models can be extended to more than one dimension. For example, a two-dimensional spatial domain models an area on a flat surface, while a three-dimensional domain models space. However, this extension cannot be used to increase the number of signals unless the signals have the same unit of measurement or one assumes separability, i.e., no interaction. Thus, it is difficult to simulate data from more than two signals with interactions. Nevertheless, the proposed nonparametric estimator can, in principle, handle this type of data.
The analyses conducted in this chapter were all based on simulated data, so our proposed model remains theoretical and not (yet) practical. However, we hope that our theoretical framework will either motivate researchers to conduct experiments and studies that produce data our model can analyze, or at least lead research in a direction that we consider theoretically possible.
CHAPTER 4
CONCLUDING REMARKS
4.0.1 Summary
In this dissertation, we provided a new nonparametric covariance estimator for
functional mapping of complex dynamic traits. The estimator is positive-definite because
it was derived through the modified Cholesky decomposition. It was also regularized by
the application of ridge regression and LASSO techniques to a sequence of regressions
obtained from a statistical interpretation of the components of the modified Cholesky
decomposition. The estimator was obtained by using the ECM algorithm whereby the
posterior probability that each progeny had a particular genotype was utilized as the
weights in the closed form formulas 2–34, 2–35 and 2–36 based on an L2 penalty. This
provided not only an intuitive interpretation but more importantly, a way to extend a
null model covariance to an alternative (or mixture) model covariance. Thus, the nested
LASSO (Levina et al., 2008) and SPICE (Rothman et al., 2008) models can potentially be
implemented by our method to produce other regularized estimators.
We considered two main applications of functional mapping: traits measured
across time or longitudinal traits (chapter 2) and reaction norms to two environmental
signals (chapter 3). For longitudinal traits, simulations showed that our estimator yields more precise estimates of the QTL location and its effects than AR(1). For reaction norms, simulations again showed our estimator to be more reliable than Wu et al.'s (2007a) proposed estimator in the presence of interaction effects between the two environmental signals. The interaction effects were simulated using nonseparable covariance structures. In both applications, the nonparametric estimator is more flexible because it does not assume a parametric or structural form, and it is therefore suited to analyzing data with varying structures. It can thus be used as an alternative to, or a guide for, parametric modeling of the covariance in the practical deployment of functional mapping.
4.0.2 Future Directions
Although functional mapping only considers one QTL at a time, a breakthrough
model was developed by Yang et al. (2007). The authors proposed composite functional
mapping which is an integration of functional mapping and composite interval mapping
(Jansen and Stam, 1994; Zeng, 1994; section 1.3). Composite functional mapping allows
modeling of marker effects beyond the interval considered by using a partial regression
analysis. This significantly improves the accuracy and precision of functional mapping
in multiple QTL detection. However, composite functional mapping assumes an AR(1)
covariance structure. It would be advantageous to incorporate our proposed nonparametric
estimator into this method to further improve its power.
The development of complex traits is the consequence of interactions among a
multitude of genetic and environmental factors that each trigger an impact on every step
of trait development. This process is inherently complicated, but can be illustrated by a
landscape of phenotype formed by genetic and environmental variables (Rice 2002; Wolf
2002). The surface of such a phenotype landscape defines the phenotype determined by
a particular combination of underlying genetic (such as additive, dominant or epistatic)
and environmental factors (such as temperature, light or moisture) that interact with each
other through developmental pathways. The number of underlying factors contributing
to phenotypic variation defines the number of dimensions of the landscape space. In
theory, the number of underlying factors can be unlimited, implying that a landscape can
exist in very-high-dimensional space (i.e., hyperspace) (Wolf 2002). Figure 4-1 shows a
hypothetical landscape, where the phenotype of an individual is determined by the values
of two underlying factors. By characterizing the topographical features of such a landscape, we can address the fundamental question of how each underlying factor contributes to the expression of a particular trait, individually or through an interactive web. These features typically include “slope”, “curvature”, “peak-valley” and “ridge”. The description of the topography of a three-dimensional landscape (Fig. 4-1) is most intuitive, but the same descriptors can also be applied to hyperdimensional landscapes, although the intuitive interpretation of the features becomes increasingly abstract with increased dimensionality (Wolf 2002). It is worthwhile to develop a model for testing whether these landscapes are the same for the different genetic machineries that regulate phenotypic responsiveness to multiple environments. For example, we do not know whether the same genetic system regulates the reaction norms of a plant to photoperiod and to temperature. If different genetic systems are involved, how do their genetic elements interact with each other in a complicated web to determine the outcome of reaction norms?
Figure 4-1. Formation of a phenotype by a landscape. The phenotypic formation is a function of the values of underlying factors 1 and 2 (u1 and u2) that interact during trait development. Two shaded ovals represent two different areas on the surface, one steeper (pointing to Inset A) and the other flatter (pointing to Inset B). The steeper one is associated with a dramatic change in phenotypic expression contributed by a small change in the underlying factors (indicated by the distribution in Inset A), whereas the flatter one is associated with a different pattern in which dramatic changes in the underlying factors lead to only a minor change in phenotypic expression (indicated by the distribution in Inset B). Adapted from Wolf (2002).
Because biological traits are derived from developmental processes and physiological regulatory mechanisms, complex multivariate systems that undergo such processes should be carefully studied. Considering the complex interplay among numerous interacting genetic loci, developmental processes, and environmental aspects of trait expression, we need to develop a systems approach that integrates analytic and synthetic methods, encompassing both holism and reductionism. Such a systems approach enables the study of all elements in a network in response to developmental or environmental perturbations. Synthetic analyses of biological information from different elements through mathematical modeling provide new insights into the operation of the system. For example, to better study the genetics of seed production in the common bean, one should decompose this system into its elements, seed size and seed number, and study the developmental trajectories of these elements and their developmental interactions during ontogeny. We will need a systems approach that studies the genetic etiology of development through the connection of its three fundamental aspects: allometry, ontogeny and plasticity.
APPENDIX A
DERIVATION OF EM ALGORITHM FORMULAS
Let x = (x′1, ..., x′n)′, where xi = (xi1, ..., xiJ)′, i = 1, ..., n, is a vector that indicates the genotype group to which yi = (yi1, ..., yim)′ belongs. We assume that the xi's are independent realized values from multinomial(1, pi) distributions, where pi = (p1|i, ..., pJ|i)′. Thus, xik = 1 or 0, depending on whether or not yi belongs to genotype group k = 1, ..., J. In reality, x is unknown (or missing), so y = (y′1, ..., y′n)′ can be viewed as incomplete data. The complete data are (x′, y′)′, with log-likelihood
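$$\log L_c(\Omega) = \log \prod_{i=1}^{n}\prod_{k=1}^{J}\left[p_{k|i}\, f_k(y_i \mid \Omega)\right]^{x_{ik}} = \sum_{i=1}^{n}\sum_{k=1}^{J} x_{ik}\left[\log p_{k|i} + \log f_k(y_i \mid \Omega)\right]. \tag{A–1}$$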
The EM algorithm at the (j + 1)th iteration proceeds as follows:
1. The current value of Ω is Ω(j).
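2. E-Step. Calculate the conditional expectation of the complete data log-likelihood, (1–11), given the observed data y and Ω(j):
$$\begin{aligned} Q(\Omega \mid \Omega^{(j)}) &= E\left[\log L_c(\Omega) \mid y, \Omega^{(j)}\right] \\ &= \sum_{i=1}^{n}\sum_{k=1}^{J} E(X_{ik} \mid y_i, \Omega^{(j)})\left[\log p_{k|i} + \log f_k(y_i \mid \Omega)\right] \\ &= \sum_{i=1}^{n}\sum_{k=1}^{J} P(X_{ik} = 1 \mid y_i, \Omega^{(j)})\left[\log p_{k|i} + \log f_k(y_i \mid \Omega)\right]. \end{aligned} \tag{A–2}$$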
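By Bayes' theorem,
$$\begin{aligned} P(X_{ik} = 1 \mid y_i, \Omega^{(j)}) &= \frac{P(y_i, X_{ik} = 1 \mid \Omega^{(j)})}{P(y_i \mid \Omega^{(j)})} \\ &= \frac{P(y_i \mid X_{ik} = 1, \Omega^{(j)})\, P(X_{ik} = 1 \mid \Omega^{(j)})}{\sum_{h=1}^{J} P(y_i \mid X_{ih} = 1, \Omega^{(j)})\, P(X_{ih} = 1 \mid \Omega^{(j)})} \\ &= \frac{p_{k|i}\, f_k(y_i \mid \Omega^{(j)})}{\sum_{h=1}^{J} p_{h|i}\, f_h(y_i \mid \Omega^{(j)})} \\ &= P^{(j)}_{k|i}, \end{aligned} \tag{A–3}$$
so that A–2 becomes
$$Q(\Omega \mid \Omega^{(j)}) = \sum_{i=1}^{n}\sum_{k=1}^{J} P^{(j)}_{k|i}\left[\log p_{k|i} + \log f_k(y_i \mid \Omega)\right]. \tag{A–4}$$
Therefore, this step involves updating $P^{(j)}_{k|i}$ using Ω(j) as in 1–13.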
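3. M-Step. Solve
$$\frac{\partial}{\partial \theta} Q(\Omega \mid \Omega^{(j)}) = \sum_{i=1}^{n}\sum_{k=1}^{J} P^{(j)}_{k|i}\, \frac{\partial}{\partial \theta} \log f_k(y_i \mid \Omega) = 0 \tag{A–5}$$
to get Ω(j+1).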
4. Repeat until some convergence criterion is met.
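A minimal NumPy sketch of the E-step in A–3 follows. It assumes, for illustration only, that each f_k is a multivariate normal density with a genotype-specific mean vector and a covariance matrix common to all genotypes; the names `mvn_logpdf` and `posterior_probs` are hypothetical. The computation is done on the log scale and normalized with a log-sum-exp, which avoids underflow when the m-dimensional densities are very small.

```python
import numpy as np


def mvn_logpdf(y, mean, cov):
    """Log-density of N(mean, cov) at each row of y (shape n x m)."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (y - mean).T)           # whitened residuals, m x n
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    m = cov.shape[0]
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + np.sum(z**2, axis=0))


def posterior_probs(y, prior, means, cov):
    """E-step (A-3): P(X_ik = 1 | y_i, Omega^(j)) for all i and k.

    y: n x m phenotypes; prior: n x J matrix of p_{k|i};
    means: J x m mean vectors; cov: m x m covariance matrix.
    Assumption: f_k is the N(means[k], cov) density.
    """
    log_f = np.column_stack([mvn_logpdf(y, mu, cov) for mu in means])  # n x J
    log_num = np.log(prior) + log_f
    log_den = np.logaddexp.reduce(log_num, axis=1, keepdims=True)
    return np.exp(log_num - log_den)
```

Each row of the returned matrix sums to one, and the matrix plays the role of the weights $P^{(j)}_{k|i}$ in A–4 and in the M-step.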
APPENDIX B
DERIVATION OF EQUATION 2-9
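Suppose X′X is in correlation form. Then the eigenvalue decomposition of X′X is
$$V'(X'X)V = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{pmatrix},$$
where the λi's are the eigenvalues of X′X and the columns of the orthogonal matrix V = (v1, ..., vk) are the associated eigenvectors (Myers, 1990). Here, vi = (vi1, ..., vik)′. Thus, since V is orthogonal,
$$(X'X)^{-1} = V \begin{pmatrix} 1/\lambda_1 & 0 & \cdots & 0 \\ 0 & 1/\lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\lambda_k \end{pmatrix} V'.$$
Eq. 2–9 then follows.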
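The spectral form of (X′X)⁻¹ can be checked numerically. The Python/NumPy sketch below (the simulated design matrix is hypothetical) scales the columns of X to correlation form, builds (X′X)⁻¹ from the eigenvalues and eigenvectors, and evaluates the diagonal elements as Σi v²ij/λi, one direct consequence of the spectral form. Small eigenvalues inflate these diagonal terms, which is the usual motivation for ridge regression.

```python
import numpy as np


def correlation_form(X):
    """Center each column and scale it to unit length, so X'X has unit diagonal."""
    Xc = X - X.mean(axis=0)
    return Xc / np.sqrt(np.sum(Xc**2, axis=0))


def inverse_via_eigen(xtx):
    """(X'X)^{-1} = V diag(1/lambda) V' from V'(X'X)V = diag(lambda)."""
    lam, V = np.linalg.eigh(xtx)            # columns of V are the eigenvectors v_i
    return V @ np.diag(1.0 / lam) @ V.T, lam, V


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    X = rng.standard_normal((50, 4))
    X[:, 3] = X[:, 2] + 0.05 * rng.standard_normal(50)   # a nearly collinear column
    Z = correlation_form(X)
    inv, lam, V = inverse_via_eigen(Z.T @ Z)
    print(np.allclose(inv, np.linalg.inv(Z.T @ Z)))               # True
    # j-th diagonal element: sum over i of v_ij^2 / lambda_i
    print(np.allclose(np.diag(inv), np.sum(V**2 / lam, axis=1)))  # True
```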
APPENDIX C
MINIMIZATION OF 2-33
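For fixed φtj, j = 1, 2, ..., t − 1, 2–33 is minimized with respect to σ²t by solving
$$\frac{\partial}{\partial \sigma_t^2}\left[\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\left(\log \sigma_t^2 + \frac{\varepsilon_{kit}^2}{\sigma_t^2}\right)\right] = \sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\left(\frac{1}{\sigma_t^2} - \frac{\varepsilon_{kit}^2}{\sigma_t^4}\right) = 0,$$
which, because $\sum_{k=1}^{J} P_{k|i} = 1$ for each i, yields
$$\sigma_t^2 = \frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\left(y^k_{it} - \sum_{j=1}^{t-1} y^k_{ij}\,\phi_{tj}\right)^2, \tag{C–1}$$
since $\varepsilon_{kit} = y^k_{it} - \sum_{j=1}^{t-1} y^k_{ij}\,\phi_{tj}$.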
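For fixed σ²t, the log σ²t term in 2–33 is constant, so minimizing 2–33 with respect to φtj is equivalent to minimizing
$$\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\,\frac{\left(y^k_{it} - \sum_{j=1}^{t-1} y^k_{ij}\,\phi_{tj}\right)^2}{\sigma_t^2} + \lambda \sum_{j=1}^{t-1} \phi_{tj}^2. \tag{C–2}$$
Let $\phi_{t(t)} = (\phi_{t1}, \phi_{t2}, \ldots, \phi_{t,t-1})'$ and $y^k_{i(t)} = (y^k_{i1}, y^k_{i2}, \ldots, y^k_{i,t-1})'$. The first term of C–2 is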
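$$\begin{aligned} \frac{1}{\sigma_t^2}\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\Big(y^k_{it} - \sum_{j=1}^{t-1} y^k_{ij}\,\phi_{tj}\Big)^2 &= \frac{1}{\sigma_t^2}\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\big(y^k_{it} - y^{k\prime}_{i(t)}\phi_{t(t)}\big)^2 \\ &= \frac{1}{\sigma_t^2}\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\big((y^k_{it})^2 - 2\,y^k_{it}\,y^{k\prime}_{i(t)}\phi_{t(t)} + \phi_{t(t)}'\,y^k_{i(t)}y^{k\prime}_{i(t)}\,\phi_{t(t)}\big) \\ &= \sum_{i=1}^{n}\sum_{k=1}^{J} \frac{P_{k|i}(y^k_{it})^2}{\sigma_t^2} - 2\Big(\sum_{i=1}^{n}\sum_{k=1}^{J} \frac{P_{k|i}\,y^k_{it}\,y^{k\prime}_{i(t)}}{\sigma_t^2}\Big)\phi_{t(t)} + \phi_{t(t)}'\Big(\sum_{i=1}^{n}\sum_{k=1}^{J} \frac{P_{k|i}\,y^k_{i(t)}y^{k\prime}_{i(t)}}{\sigma_t^2}\Big)\phi_{t(t)} \\ &= c_t - 2g_t'\phi_{t(t)} + \phi_{t(t)}' H_t\, \phi_{t(t)}, \end{aligned}$$
where
$$c_t = \sum_{i=1}^{n}\sum_{k=1}^{J} \frac{P_{k|i}(y^k_{it})^2}{\sigma_t^2}, \qquad H_t = \sum_{i=1}^{n}\sum_{k=1}^{J} \frac{P_{k|i}\,y^k_{i(t)}y^{k\prime}_{i(t)}}{\sigma_t^2}, \qquad g_t = \sum_{i=1}^{n}\sum_{k=1}^{J} \frac{P_{k|i}\,y^k_{it}\,y^k_{i(t)}}{\sigma_t^2}.$$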
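Let I_t be the (t − 1) × (t − 1) identity matrix. Then C–2 becomes
$$\begin{aligned} \sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\,\frac{\left(y^k_{it} - \sum_{j=1}^{t-1} y^k_{ij}\,\phi_{tj}\right)^2}{\sigma_t^2} + \lambda \sum_{j=1}^{t-1} \phi_{tj}^2 &= c_t - 2g_t'\phi_{t(t)} + \phi_{t(t)}' H_t\, \phi_{t(t)} + \lambda\, \phi_{t(t)}' I_t\, \phi_{t(t)} \\ &= c_t - 2g_t'\phi_{t(t)} + \phi_{t(t)}'(H_t + \lambda I_t)\,\phi_{t(t)}. \end{aligned}$$
Setting its gradient $-2g_t + 2(H_t + \lambda I_t)\phi_{t(t)}$ to zero shows that it is minimized by
$$\phi_{t(t)} = (H_t + \lambda I_t)^{-1} g_t \tag{C–3}$$
for fixed σ²t.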
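The conditional maximization steps C–1 and C–3 can be alternated for each t. The following Python/NumPy sketch is illustrative only: it assumes that y^k_it denotes the observation of progeny i at time t centered at the mean curve of genotype k, that the weights P_{k|i} come from an E-step such as A–3, and that λ is given; the function name `ridge_ecm` is hypothetical. Because H_t and g_t both carry the factor 1/σ²t, C–3 equals (σ²t H_t + λσ²t I_t)⁻¹ σ²t g_t, so the code works with the unscaled cross-products σ²t H_t and σ²t g_t.

```python
import numpy as np


def ridge_ecm(Yk, P, lam, n_iter=100, tol=1e-10):
    """Alternate C-3 (phi given sigma_t^2) and C-1 (sigma_t^2 given phi) for each t.

    Yk:  J x n x m array, Yk[k, i, t] = y^k_{it} (assumed genotype-centered data)
    P:   n x J posterior weights P_{k|i}, each row summing to one
    lam: ridge penalty lambda
    Returns phi (m x m, lower triangular GARPs) and iv (innovation variances).
    """
    J, n, m = Yk.shape
    W = P.T                                          # J x n weights
    phi = np.zeros((m, m))
    iv = np.empty(m)
    iv[0] = np.sum(W * Yk[:, :, 0] ** 2) / n         # C-1 with no regressors
    for t in range(1, m):
        X = Yk[:, :, :t]                             # y^k_{i(t)}, J x n x t
        y = Yk[:, :, t]                              # y^k_{it},   J x n
        H0 = np.einsum("ki,kia,kib->ab", W, X, X)    # sigma_t^2 * H_t
        g0 = np.einsum("ki,ki,kia->a", W, y, X)      # sigma_t^2 * g_t
        s2 = np.sum(W * y**2) / n                    # starting value for sigma_t^2
        for _ in range(n_iter):
            coef = np.linalg.solve(H0 + lam * s2 * np.eye(t), g0)  # C-3
            new_s2 = np.sum(W * (y - X @ coef) ** 2) / n           # C-1
            converged = abs(new_s2 - s2) < tol
            s2 = new_s2
            if converged:
                break
        phi[t, :t] = coef
        iv[t] = s2
    return phi, iv
```

With λ = 0 and a single genotype with unit weights, each row of `phi` reduces to ordinary least squares, which is a convenient sanity check; any λ > 0 shrinks the GARPs toward zero.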
APPENDIX D
DEFINITION OF KRONECKER PRODUCT
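If
$$A_{m\times n} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \quad \text{and} \quad B_{p\times q} = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1q} \\ b_{21} & b_{22} & \cdots & b_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ b_{p1} & b_{p2} & \cdots & b_{pq} \end{pmatrix},$$
then the Kronecker product of A and B is the mp × nq matrix
$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix}.$$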
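The block definition above translates directly into code. The Python/NumPy sketch below builds A ⊗ B block by block and compares it with `numpy.kron`; it then forms a separable covariance σ² R(ρ1) ⊗ R(ρ2), where R(ρ) is an AR(1) correlation matrix. Reading ΣAR(1) in this way is our assumption for illustration, suggested by the parameters σ2, ρ1 and ρ2 reported for it in the Chapter 3 tables; the numbers of irradiance and temperature levels are hypothetical.

```python
import numpy as np


def kron_blocks(A, B):
    """A kron B following the definition: block (i, j) equals a_ij * B."""
    m, n = A.shape
    p, q = B.shape
    out = np.empty((m * p, n * q), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(n):
            out[i * p:(i + 1) * p, j * q:(j + 1) * q] = A[i, j] * B
    return out


def ar1_corr(size, rho):
    """AR(1) correlation matrix with (i, j) entry rho^|i - j|."""
    idx = np.arange(size)
    return rho ** np.abs(idx[:, None] - idx[None, :])


if __name__ == "__main__":
    A = np.arange(1.0, 7.0).reshape(2, 3)
    B = np.array([[0.0, 1.0], [1.0, 0.0]])
    print(np.array_equal(kron_blocks(A, B), np.kron(A, B)))   # True

    # hypothetical separable covariance for 4 irradiance x 5 temperature levels
    sigma2, rho1, rho2 = 1.0, 0.6, 0.6
    Sigma_sep = sigma2 * np.kron(ar1_corr(4, rho1), ar1_corr(5, rho2))
    print(Sigma_sep.shape)                                     # (20, 20)
```

A useful property for separable models is that the eigenvalues of A ⊗ B are all products of an eigenvalue of A and an eigenvalue of B, so positive definiteness of both factors carries over to the product.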
APPENDIX E
DERIVATION OF EQUATION 3-20
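By Fourier transformation,
$$\begin{aligned} g(\omega, \tau) &= \left(\frac{1}{2\pi}\right)^2 \iint e^{-i(u\omega + v\tau)}\, C(u, v)\, du\, dv \\ &= \frac{1}{2\pi}\int e^{-iv\tau}\left[\frac{1}{2\pi}\int e^{-iu\omega}\, C(u, v)\, du\right] dv \\ &= \frac{1}{2\pi}\int e^{-iv\tau}\, h(\omega, v)\, dv, \end{aligned} \tag{E–1}$$
where
$$h(\omega, v) = \int_{-\infty}^{\infty} e^{iv\tau}\, g(\omega, \tau)\, d\tau \tag{E–2}$$
is the inverse Fourier transform of g in τ, i.e., the spatial spectral density at temporal lag v. Using E–2, 3–19 becomes
$$C(u, v) = \int e^{iu\omega}\, h(\omega, v)\, d\omega. \tag{E–3}$$
Let
$$h(\omega, v) = \rho(\omega, v)\, \kappa(\omega), \tag{E–4}$$
where ρ(ω, v) is a valid continuous autocorrelation function in v for each ω and κ(ω) > 0. If ∫ρ(ω, v)dv < ∞ and ∫κ(ω)dω < ∞, then in terms of E–4, E–3 becomes 3–20.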
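A numerical check of E–3 under E–4 may help. The Python/NumPy sketch below uses the illustrative choices ρ(ω, v) = exp(−ω²v²/4), a Gaussian autocorrelation in v for each ω, and κ(ω) = exp(−ω²/4) > 0. These are our choices for the example, not the functions used in Chapter 3. It integrates e^{iuω}h(ω, v) over ω with the trapezoidal rule and compares the result with the closed form √(4π/(v² + 1)) exp{−u²/(v² + 1)}, which follows from the Gaussian integral ∫cos(uω)e^{−aω²}dω = √(π/a) e^{−u²/(4a)} with a = (v² + 1)/4. The resulting C(u, v) is nonseparable, because its dependence on u changes with v.

```python
import numpy as np


def trapezoid(f, x):
    """Composite trapezoidal rule (written out to avoid NumPy version differences)."""
    return np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0


def h(omega, v):
    """h(omega, v) = rho(omega, v) * kappa(omega) for the illustrative choices."""
    rho = np.exp(-omega**2 * v**2 / 4.0)    # Gaussian autocorrelation in v
    kappa = np.exp(-omega**2 / 4.0)         # positive and integrable in omega
    return rho * kappa


def cov_numeric(u, v, omega_max=60.0, num=200_001):
    """C(u, v) = integral of e^{i u omega} h(omega, v) d omega, as in E-3.

    h is even in omega, so the imaginary part vanishes and the cosine suffices."""
    omega = np.linspace(-omega_max, omega_max, num)
    return trapezoid(np.cos(u * omega) * h(omega, v), omega)


def cov_closed(u, v):
    """Closed form of the same integral."""
    return np.sqrt(4.0 * np.pi / (v**2 + 1.0)) * np.exp(-u**2 / (v**2 + 1.0))


if __name__ == "__main__":
    for u, v in [(0.0, 0.0), (1.0, 0.5), (2.0, 3.0)]:
        print(u, v, cov_numeric(u, v), cov_closed(u, v))
```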
REFERENCES
[1] Andersson, L., Haley, C.S., Ellegren, H., Knott, S.A., Johansson, M., Andersson, K., Andersson-Eklund, L., Edfors-Lilja, I., Fredholm, M., Hansson, I., Hakansson, J., Hakansson, J. and Lundstrom, K. (1994). “Genetic mapping of quantitative trait loci for growth and fatness in pigs”, Science 263, 1771-1774.
[2] Angilletta, Jr., M.J. and Sears, M.W. (2004). “Evolution of thermal reaction norms for growth rate and body size in ectotherms: an introduction to the symposium”, Integr. Comp. Biol. 44, 401-402.
[3] Akaike, H. (1974). “A new look at the statistical model identification”, IEEE Transactions on Automatic Control 19(6), 716-723.
[4] Banerjee, O., d’Aspremont, A. and El Ghaoui, L. (2006). “Sparse covariance selection via robust maximum likelihood estimation”, Proceedings of ICML.
[5] Bickel, P. and Levina, E. (2008). “Regularized estimation of large covariance matrices”, Ann. Statist. 36(1), 199-227.
[6] Bochner, S. (1955). Harmonic Analysis and the Theory of Probability, University of California Press, Berkeley and Los Angeles.
[7] Broman, K. (1997). Identifying quantitative trait loci in experimental crosses, Ph.D. Dissertation, Department of Statistics, University of California, Berkeley.
[8] Broman, K. (2001). “Review of statistical methods for QTL mapping in experimental crosses”, Lab Animal 30, no. 7, 44-52.
[9] Carlborg, O., Andersson, L. and Kinghorn, B. (2000). “The use of a genetic algorithm for simultaneous mapping of multiple interacting quantitative trait loci”, Genetics 155, 2003-2010.
[10] Carroll, R.J. and Ruppert, D. (1984). “Power transformations when fitting theoretical models to data”, J. Am. Statist. Assoc. 79, 321-328.
[11] Cox, D.D. and O’Sullivan, F. (1990). “Asymptotic analysis of penalized likelihood and related estimators”, Annals of Statistics 18, 1676-1695.
[12] Cressie, N. and Huang, H-C. (1999). “Classes of nonseparable, spatio-temporal stationary covariance functions”, J. Am. Statist. Assoc. 94, no. 448, 1330-1340.
[13] Cui, H.J., Zhu, J. and Wu, R. (2006). “Functional mapping for genetic control of programmed cell death”, Physiol. Genom. 25, 458-469.
[14] Daniels, M.J. and Pourahmadi, M. (2002). “Bayesian analysis of covariance matrices and dynamic models for longitudinal data”, Biometrika 89, 553-566.
[15] de Boor, C. (2001). “A Practical Guide to Splines”, Revised ed., Springer, New York.
[16] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). “Maximum likelihood from incomplete data via the EM algorithm”, J. Roy. Statist. Soc. B 39, 1-38.
[17] Diggle, P.J., Heagerty, P., Liang, K.Y. and Zeger, S.L. (2002). Analysis of Longitudinal Data, Oxford University Press, UK.
[18] Doerge, R.W. (2002). “Mapping and analysis of quantitative trait loci in experimental populations”, Nat. Rev. Genet. 3, 43-52.
[19] Doerge, R.W. and Churchill, G.A. (1996). “Permutation tests for multiple loci affecting a quantitative character”, Genetics 142, 285-294.
[20] Drayna, D., Davies, K., Hartley, D., Mandel, J.L., Camerino, G., Williamson, R. and White, R. (1984). “Genetic mapping of the human X-chromosome by using restriction fragment length polymorphisms”, Proc. Natl. Acad. Sci. USA 81, 2836-2839.
[21] Eilers, P.H.C. and Marx, B.D. (1996). “Flexible smoothing with B-splines and penalties”, Statistical Science 11, no. 2, 89-121.
[22] Fan, J. and Li, R. (2001). “Variable selection via nonconcave penalized likelihood and its oracle properties”, J. Am. Statist. Assoc. 96, 1348-1360.
[23] Fu, W. (1998). “Penalized regressions: The bridge versus the lasso”, J. Comput. Graph. Statist. 7, 397-416.
[24] Fuentes, M. (2005). “Testing separability of spatial-temporal covariance functions”, Journal of Statistical Planning and Inference 136, no. 2, 447-466.
[25] Furrer, R. and Bengtsson, T. (2007). “Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants”, Journal of Multivariate Analysis 98, no. 2, 227-255.
[26] Genton, M. (2007). “Separable approximations of space-time covariance matrices”, Environmetrics 18, 681-695.
[27] Gill, P., Murray, W. and Wright, M. (1981). Practical Optimization, Academic Press, New York.
[28] Gneiting, T. (2002). “Nonseparable, stationary covariance functions for space-time data”, J. Am. Statist. Assoc. 97, no. 458, 590-600.
[29] Gneiting, T., Genton, M. and Guttorp, P. (2006). “Geostatistical space-time models, stationarity, separability and full symmetry”, in Statistical Methods for Spatio-temporal Systems (Monographs on Statistics and Applied Probability), B. Finkenstadt, L. Held and V. Isham, editors, Chapman & Hall/CRC.
[30] Green, P. (1990). “On use of the EM algorithm for penalized likelihood estimation”, J. Roy. Statist. Soc. B 52, 443-452.
[31] Green, P. (1999). “Penalized likelihood”, Encyclopedia of Statistical Sciences 3, 578-586.
[32] Griffiths, A.J., Wessler, S.R., Lewontin, R.C., Gelbart, W.G., Suzuki, D.T. and Miller, J.H. (2005). Introduction to Genetic Analysis, W.H. Freeman and Company, New York.
[33] Haldane, J.B.S. (1919). “The combination of linkage values and the calculation of distance between the loci of linked factors”, Journal of Genetics 8, 299-309.
[34] Haley, C.S., Knott, S.A. and Elsen, J.M. (1994). “Genetic mapping of quantitative trait loci in cross between outbred lines using least squares”, Genetics 136, 1195-1207.
[35] Hoerl, A. and Kennard, R. (1970). “Ridge regression: biased estimation for nonorthogonal problems”, Technometrics 12, 55-67.
[36] Hoeschele, I. (2000). Mapping quantitative trait loci in outbred pedigrees. In: Handbook of Statistical Genetics, edited by D.J. Balding, M. Bishop and C. Cannings, Wiley, New York, 567-597.
[37] Huang, H.C., Martinez, F., Mateu, J. and Montes, F. (2007a). “Model comparison and selection for stationary space-time models”, Comp. Statistics and Data Analysis 51, 4577-4596.
[38] Huang, J., Liu, L. and Liu, N. (2007b). “Estimation of large covariance matrices of longitudinal data with basis function approximations”, J. Comput. Graph. Statist. 16, 189-209.
[39] Huang, J., Liu, N., Pourahmadi, M. and Liu, L. (2006). “Covariance selection and estimation via penalised normal likelihood”, Biometrika 93, 85-98.
[40] Ibanez, M.V. and Simo, A. (2007). “Spatio-temporal modeling of perimetric test data”, Statistical Methods in Medical Research 16, no. 6, 497-522.
[41] Jansen, R.C. (2000). Quantitative trait loci in inbred lines. In: Handbook of Statistical Genetics, edited by D.J. Balding, M. Bishop and C. Cannings, Wiley, New York, 567-597.
[42] Jansen, R.C. and Stam, P. (1994). “High resolution of quantitative traits into multiple loci via interval mapping”, Genetics 136, 1447-1455.
[43] Jones, R.H. and Zhang, Y. (1997). “Models for continuous stationary space-time process”, in Modelling Longitudinal and Spatially Correlated Data, Lecture Notes in Statistics 122, Springer, New York, 289-298.
[44] Kao, C.H., Zeng, Z-B. and Teasdale, R.D. (1999). “Multiple interval mapping for quantitative trait loci”, Genetics 152, 1203-1216.
[45] Knott, S.A., Neale, D.B., Sewell, M.M. and Haley, C.S. (1997). “Multiple markermapping of quantitative trait loci in an outbred pedigree of loblolly pine”, Theor.Appl. Genet. 94 810-820.
[46] Kramer, M.G., Vaughn, T.T., Pletscher, L.S., King-Ellison, K. Erikson, C. andCheverud, J.M. (1998). “Genetic variation in body weight growth and compositionin the intercross of large (LG/J) and small (SM/J) inbred strains of mice”, Geneticsand Molecular Biology 21, 211-218.
[47] Kenward, M.G. (1987). “A method for comparing profiles of repeatedmeasurements”, Appl. Statist 36, 296-308.
[48] Kingsolver, J.G., Izem, R. and Ragland, G.J. (2004). “Plasticity of size and growthin fluctuating thermal environments: comparing reaction norms and performancecurves”, Integr. Comp. Biol. 44, 450-460.
[49] M. Kirkpatrick and N. Heckman, “A quantitative genetic model for growth, shape,reaction norms, and other infinite-dimensional characters”, J. Math. Biol. 27,429-450, 1989.
[50] Kolovos, A., Christakos, G., Hristopulos, D.T. and Serre, M.L. (2004). “Methodsfor generating non-separable spatiotemporal covariance models with potentialenvironmental applications”, Advances in Water Resources 27, 815-830.
[51] Krishnaiah, P. (1985). “Multivariate Analysis”, Elsevier Science Publishers B.V.,New York.
[52] Lander, E.S. and Botstein, D. (1989). “Mapping Mendelian factors underlyingquantitative traits using RFLP linkage maps”, Genetics 121, 185-199.
[53] Ledoit, O. and Wolf, M. (2003). “A well-conditioned estimator for large-dimensionalcovariance matrices”, Journal of Multivariate Analysis 88, 365-411.
[54] Levina, E., Rothman, A. and Zhu, J. (2008). “Sparse estimation of large covariancematrices via a nested lasso penalty”, Ann. Appl. Statist. 2, no. 1, 245-263.
[55] Li, H.Y., Kim, B-R. and Wu, R.L. (2006). “Identification of quantitative traitnucleotides that regulate cancer growth: A simulation approach”, J. Theor. Biol.242, 426-439.
[56] Li, Y., Wang, N., Hong, M., Turner, N., Lupton, J. and Carroll, R. (2007). “Nonparametric estimation of correlation functions in longitudinal and spatial data, with applications to colon carcinogenesis experiments”, Annals of Statistics 35, no. 4, 1608-1643.
[57] Lin, M., Li, H.Y., Hou, W., Johnson, J.A. and Wu, R.L. (2007). “Modeling sequence-sequence interactions for drug response”, Bioinformatics 23, no. 10, 1251-1257.
[58] Lin, M., Lou, X-Y., Chang, M. and Wu, R.L. (2003). “A general statistical framework for mapping quantitative trait loci in non-model systems: Issue for characterizing linkage phases”, Genetics 165, 901-913.
[59] Lindley, D.V. (1957). “A statistical paradox”, Biometrika 44, 187-192.
[60] Liu, T., Liu, X-L., Chen, Y.M. and Wu, R.L. (2007). “A unifying differential equation model for functional genetic mapping of circadian rhythms”, Theor. Biol. Medical Model. 4, 5.
[61] Liu, T. and Wu, R.L. (2007). “A general Bayesian framework for functional mapping of dynamic complex traits”, Genetics (tentatively accepted 2007).
[62] Liu, T., Zhao, W., Tian, L. and Wu, R.L. (2005). “An algorithm for molecular dissection of tumor progression”, J. Math. Biol. 50, 336-354.
[63] Long, F., Chen, Y.Q., Cheverud, J.M. and Wu, R.L. (2006). “Genetic mapping of allometric scaling laws”, Genet. Res. 87, 207-216.
[64] Lynch, M. and Walsh, B. (1998). Genetics and Analysis of Quantitative Traits, Sinauer, Sunderland, MA.
[65] Ma, C. (2007). “Recent developments on the construction of spatial-temporal covariance models”, Stoch Environ Res Risk Assess, Springer-Verlag, 22, s39-s47.
[66] Ma, C., Casella, G. and Wu, R.L. (2002). “Functional mapping of quantitative trait loci underlying the character process: A theoretical framework”, Genetics 161, 1751-1762.
[67] Madsen, H. and Thyregod, P. (2001). “Calibration with absolute shrinkage”, J. Chemomet. 15, 497-509.
[68] Mallows, C.L. (1973). “Some Comments on Cp”, Technometrics 15, 661-675.
[69] Matern, B. (1986). Spatial Variation, 2nd Ed., Springer, New York.
[70] McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, Chapman and Hall, London.
[71] McLachlan, G. and Peel, D. (2000). Finite Mixture Models, John Wiley and Sons, Inc., New York.
[72] Meng, X-L. and Rubin, D. (1993). “Maximum likelihood estimation via the ECM algorithm: A general framework”, Biometrika 80, 267-278.
[73] Mitchell, M.W., Genton, M.G. and Gumpertz, M.L. (2005). “Testing for separability of space-time covariances”, Environmetrics 16, 819-831.
[74] Molenberghs, G. and Verbeke, G. (2005). Models for Discrete Longitudinal Data, Springer Science+Business Media, Inc., New York.
[75] Myers, R. (1990). Classical and Modern Regression with Applications, PWS-Kent Publishing Company, Boston.
[76] Nelder, J.A. and Mead, R. (1965). “A simplex method for function minimization”, Comput. J. 7, 308-313.
[77] Newton, H.J. (1988). TIMESLAB: A Time Series Analysis Laboratory, Wadsworth & Brooks/Cole, Pacific Grove, CA.
[78] Niklas, K.J. (1994). Plant Allometry: The Scaling of Form and Process, University of Chicago Press, Chicago.
[79] Nychka, D., Wikle, C. and Royle, A. (2002). “Multiresolution models for nonstationary spatial covariance functions”, Statistical Modeling 2, 315-331.
[80] Ojelund, H., Madsen, H. and Thyregod, P. (2001). “Calibration with absolute shrinkage”, J. Chemomet. 15, 497-509.
[81] Pan, J.X. and Mackenzie, G. (2003). “On modelling mean-covariance in longitudinal studies”, Biometrika 90, 239-244.
[82] Porcu, E. and Mateu, J. (2006). “Nonseparable stationary anisotropic space-time covariance functions”, Stoch Environ Res. Risk Assess 21, 113-122.
[83] Porcu, E., Mateu, J., Zini, A. and Pini, R. (2007). “Modelling spatio-temporal data: A new variogram and covariance structure proposal”, Statistics and Probability Letters 77, 83-89.
[84] Pourahmadi, M. (1999). “Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterization”, Biometrika 86, 677-690.
[85] Pourahmadi, M. (2000). “Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix”, Biometrika 87, 425-435.
[86] Ramsay, J.O. and Silverman, B.W. (1997). Functional Data Analysis, Springer-Verlag, New York.
[87] Rothman, A., Bickel, P., Levina, E. and Zhu, J. (2007). “Sparse permutation invariant covariance estimation”, Dept. of Statistics, Univ. of Michigan (Technical Report no. 467).
[88] Sampson, P. and Guttorp, P. (1992). “Nonparametric estimation of nonstationary spatial covariance structure”, J. Am. Statist. Assoc. 87, 108-119.
[89] Satagopan, J.M., Yandell, B.S., Newton, M.A. and Osborn, T.C. (1996). “A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo”, Genetics 144, 805-816.
[90] Sax, K. (1923). “The association of size difference with seed-coat pattern and pigmentation in Phaseolus vulgaris”, Genetics 8, 552-560.
[91] Schabenberger, O. and Gotway, C. (2005). Statistical Methods for Spatial Data Analysis, Chapman and Hall/CRC, Boca Raton.
[92] Schwarz, G. (1978). “Estimating the dimension of a model”, Annals of Statistics 6, no. 2, 461-464.
[93] Sillanpaa, M.J. and Arjas, E. (1999). “Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data”, Genetics 151, 1605-1619.
[94] Smith, M. and Kohn, R. (2002). “Parsimonious covariance matrix estimation for longitudinal data”, J. Am. Statist. Assoc. 97, no. 460, 1141-1153.
[95] Stein, M. (2005). “Space-time covariance functions”, J. Am. Statist. Assoc. 100, no. 469, 310-321.
[96] Stratton, D. (1998). “Reaction norm functions and QTL-environment interactions for flowering time in Arabidopsis thaliana”, Heredity 81, 144-155.
[97] Stroud, J. (2001). “Dynamic models for spatiotemporal data”, J. R. Statist. Soc. B 63, 673-698.
[98] Tibshirani, R. (1996). “Regression shrinkage and selection via the Lasso”, J. Roy. Statist. Soc. B 58, 267-288.
[99] Vaughn, T., Pletscher, S., Peripato, A., King-Ellison, K., Adams, E., Erikson, C. and Cheverud, J. (1999). “Mapping of quantitative trait loci for murine growth: A closer look at genetic architecture”, Genet. Res. 74, 313-322.
[100] Thornley, J.H.M. and Johnson, I.R. (1990). Plant and Crop Modelling: A Mathematical Approach to Plant and Crop Physiology, Clarendon Press, Oxford.
[101] Waller, L. and Gotway, C. (2004). Applied Spatial Statistics for Public Health Data, Wiley-Interscience, Hoboken, NJ.
[102] Wang, Z., Hou, W. and Wu, R.L. (2006). “A statistical model to analyse quantitative trait locus interactions for HIV dynamics from the virus and human genomes”, Statist. Med. 25, 495-511.
[103] Wang, Z. and Wu, R.L. (2004). “A statistical model for high-resolution mapping of quantitative trait loci determining HIV dynamics”, Statist. Med. 23, 3033-3051.
[104] Weiss, R. (2005). Modeling Longitudinal Data, Springer-Verlag, New York.
[105] West, G.B., Brown, J.H. and Enquist, B.J. (2001). “A general model for ontogenetic growth”, Nature 413, 628-631.
[106] Wolf, J.B. (2002). “The geometry of phenotypic evolution in developmental hyperspace”, Proceedings of the National Academy of Sciences of the USA 99, 15849-15851.
[107] Wong, F., Carter, C.K. and Kohn, R. (2003). “Efficient estimation of covariance selection models”, Biometrika 90, 809-830.
[108] Wu, J., Zeng, Y., Huang, J., Hou, W., Zhu, J. and Wu, R.L. (2007a). “Functional mapping of reaction norms to multiple environmental signals”, Genet. Res. Camb. 89, 27-38.
[109] Wu, R.L., Hou, W., Cui, Y., Li, H.Y., Wu, S., Ma, C-X. and Zeng, Y. (2007b). “Modeling the genetic architecture of complex traits with molecular markers”, Recent Patents on Nanotechnology 1, 41-49.
[110] Wu, R.L., Ma, C-X. and Casella, G. (2007c). Statistical Genetics of Quantitative Traits: Linkage, Maps, and QTL, Springer-Verlag, New York.
[111] Wu, R.L., Ma, C-X., Lin, M. and Casella, G. (2004a). “A general framework for analyzing the genetic architecture of developmental characteristics”, Genetics 166, 1541-1551.
[112] Wu, R.L., Ma, C-X., Lin, M., Wang, Z. and Casella, G. (2004b). “Functional mapping of quantitative trait loci underlying growth trajectories using a transform-both-sides logistic model”, Biometrics 60, 729-738.
[113] Wu, R.L., Ma, C-X., Littell, R. and Casella, G. (2002). “A statistical model for the genetic origin of allometric scaling laws in biology”, J. Theor. Biol. 217, 275-287.
[114] Wu, W.B. and Pourahmadi, M. (2003). “Nonparametric estimation of large covariance matrices of longitudinal data”, Biometrika 90, 831-844.
[115] Wu, S., Yang, J. and Wu, R.L. (2007d). “Semiparametric functional mapping of quantitative trait loci governing long-term HIV dynamics”, Bioinformatics 23, 569-576.
[116] Xu, S.Z. (1996). “Mapping quantitative trait loci using four-way crosses”, Genet. Res. 68, 175-181.
[117] Xu, S.Z. and Yi, N.J. (2000). “Mixed model analysis of quantitative trait loci”, Proc. Natl. Acad. Sci. USA 97, 14542-14547.
[118] Yang, J. (2006). Nonparametric functional mapping of quantitative trait loci, Ph.D. Dissertation, Department of Statistics, University of Florida.
[119] Yang, R.Q., Gao, H.J., Wang, X., Zeng, Z-B. and Wu, R.L. (2007). “A semiparametric model for composite functional mapping of dynamic quantitative traits”, Genetics 177, 1859-1870.
[120] Yap, J.S., Wang, C.G. and Wu, R.L. (2007). “A simulation approach for functional mapping of quantitative trait loci that regulate thermal performance curves”, PLoS ONE 2, no. 6, e554.
[121] Yuan, M. and Lin, Y. (2007). “Model selection and estimation in the Gaussian graphical model”, Biometrika 94, no. 1, 19-35.
[122] Zeng, Z-B. (1994). “Precision mapping of quantitative trait loci”, Genetics 136, 1457-1468.
[123] Zhao, W. (2005a). Statistical modelling for functional mapping of longitudinal and multiple longitudinal traits: structured antedependence model and wavelet dimensionality reduction, Ph.D. Dissertation, Department of Statistics, University of Florida.
[124] Zhao, W., Chen, Y., Casella, G., Cheverud, J.M. and Wu, R.L. (2005b). “A non-stationary model for functional mapping of complex traits”, Bioinformatics 21, 2469-2477.
[125] Zhao, W., Ma, C-X., Cheverud, J.M. and Wu, R.L. (2004). “A unifying statistical model for QTL mapping of genotype × sex interaction for developmental trajectories”, Physiol. Genomics 19, 218-227.
[126] Zimmerman, D. and Nunez-Anton, V. (2001). “Parametric modeling of growth curve data: An overview (with discussions)”, Test 10, 1-73.
BIOGRAPHICAL SKETCH
John Stephen F. Yap was born in the town of Tagoloan, Misamis Oriental, Philippines
to Rhoda and Lizardo Yap. He has an older brother, Mark. John earned a B.S. in
mathematics from Ateneo de Manila University in the Philippines and upon graduation
worked as an actuarial assistant at Watson Wyatt. He also spent a year as an assistant
instructor in the Mathematics Department of Ateneo de Manila University. John
obtained an M.S. in mathematics with emphasis in actuarial science from the University
of Minnesota in Minneapolis and a Ph.D. in statistics from the University of Florida
in Gainesville. He will work at the Food and Drug Administration as a mathematical
statistician.